Tutorial On Machine Learning

September 18, 2024

Python and R tutorial on Machine Learning

[Image: tree classifiers. Photo by Lukasz Szmigiel on Unsplash]

This section is dedicated to tutorials on linear algebra (the mathematical foundations of machine learning), machine learning algorithms (clustering, linear regression, classification, and so on), data science basics (data frames, data visualization, etc.), and the principles of graph theory.

In this section, you will find the Jupyter notebooks for the tutorials I published on Medium. I suggest reading each tutorial and its companion code in the order given in the table below. For practical reasons, I have split some tutorials into more than one part, so that one part can concentrate on the theory and the others on the programming. Tutorials dedicated only to theory have no linked Jupyter notebook, since the notebooks contain the Python code used for the models and the graphs. I wrote and tested the code in Google Colab to make it reproducible.

I am also progressively adding some R tutorials, and I have uploaded the R scripts so you can test them. The table below lists the Colab notebooks, the R scripts, and the companion articles.

Moreover, you may find here some Colab notebooks that do not have a theoretical tutorial yet. I decided to upload the code before I had finished writing the theoretical part (this is indicated in the table). I am convinced that the code alone is already useful. I will later publish the written article on Medium, with details and comments on the code.

You can open a GitHub issue for any request, comment, or problem you encounter.

Index

  • Tutorial List - The list of tutorials and corresponding code
  • Utility - A list of functions and code you can use for your projects.
  • Scripts - A list of scripts you can execute on your PC.

Tutorials

| Tutorial | Notebook | Description |
| --- | --- | --- |
| Data manipulation | notebook | Common data manipulation tasks and data issues - MEDIUM ARTICLE NOT YET PUBLISHED |
| Pandas Cheatsheet | notebook | Introduction to Pandas library - MEDIUM ARTICLE NOT YET PUBLISHED |
| Python Data Visualization | notebook | Introduction to data visualization with Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Regular expression in Python | notebook | Regular expression in Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Matrix operations for machine learning | notebook | Matrix operations for machine learning in Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Matrix operations for machine learning - part 2 | notebook | Matrix operations for machine learning in Python, the second part - MEDIUM ARTICLE NOT YET PUBLISHED |
| Tree classifiers | -- | Introduction to tree classifiers, theory and math explained simply - MEDIUM ARTICLE NOT YET PUBLISHED |
| Tree classifiers | notebook | Training of tree classifiers - MEDIUM ARTICLE NOT YET PUBLISHED |
| Visualize decision tree | notebook | Visualization of decision tree - MEDIUM ARTICLE NOT YET PUBLISHED |
| Train and visualize decision tree in R | R-script | Plot and visualize a decision tree in R - MEDIUM ARTICLE NOT YET PUBLISHED |
| Evaluation metrics for classification - part I | notebook | How to calculate, code, and interpret evaluation metrics for classification - MEDIUM ARTICLE NOT YET PUBLISHED |
| Evaluation metrics for classification - part II | -- | Part II about imbalanced datasets and multiclass classification - MEDIUM ARTICLE NOT YET PUBLISHED |
| Linear Regression - OLS | notebook | Linear regression introduction, least squares method - MEDIUM ARTICLE NOT YET PUBLISHED |
| Evaluation metrics for regression | notebook | Evaluation metrics for regression - MEDIUM ARTICLE NOT YET PUBLISHED |
| Train and visualize regression tree | notebook | Train and visualize a regression decision tree in Python - MEDIUM ARTICLE NOT YET PUBLISHED |
| Linear regression in R | R-script | Train and visualize a linear regression model in R - MEDIUM ARTICLE NOT YET PUBLISHED |
| Introduction to Python iGraph | Notebook | A notebook to refresh the use of Python iGraph |
| Introduction to R iGraph | Notebook | A notebook to refresh the use of R iGraph |
| Introduction to point processing | Jupyter Notebook | Whether you are doing medical image analysis or you use Photoshop, you are using point processing |
| Introduction to Thresholding | Jupyter Notebook | A simple but powerful system for segmenting images |
| A practical guide to neighborhood image processing | Jupyter Notebook | Love thy neighbors: How the neighbors are influencing a pixel |
| A practical guide to morphological image processing | Jupyter Notebook | Simple but powerful operations to analyze images |
| Dividi et Impera: A Practical Guide to BLOB Analysis and Extraction with Python | Jupyter Notebook | Simple yet powerful techniques to extract objects. |
| Harnessing the power of colors in Python | Jupyter Notebook | Color images have more hidden information than you think |
| Image Segmentation with Simple and Elegant Methods | Jupyter Notebook | Why the need for a deep learning model with hundreds of layers? Sometimes, there are simpler and faster models. |
| A Guide to Geometric Transformation with Python | Jupyter Notebook | Why the need for Photoshop when you can have fun with Python |
| Graph ML: A Gentle Introduction to Graphs | -- | A deep introduction to these mysterious creatures. |
| Graph ML: fantastic graphs and where to find them | -- | Why use a graph? For which applications? |
| Graph ML: introduction to NetworkX | Jupyter Notebook | How to start handling graphs in Python using the most popular library |
| Graph ML: Graph traversal algorithms in a nutshell | Jupyter Notebook | A quick glance at breadth-first and depth-first search algorithms for graph machine learning |
| Graph ML: Introduction to Python iGraph | Jupyter Notebook | Python iGraph is a widely used library for handling graphs. How do you start using it, and why? |
| Graph ML: How Do you Visualize a Large network? | Jupyter Notebook | Seeing is understanding: How to visualize large networks |


Utility

I provide some ready-to-use functions and classes as Python files that you can import into your own code. You can find them in the utility folder.

Check the utility folder for usage examples and explanations. Each function is documented, and you can access its documentation.
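As a minimal sketch of what accessing that documentation can look like, assuming the utilities use standard Python docstrings (the function `scale_values` below is hypothetical and is not part of the utility folder):

```python
import inspect


def scale_values(values, factor=2.0):
    """Multiply every element of `values` by `factor` and return a new list."""
    return [v * factor for v in values]


# inspect.getdoc returns the cleaned-up docstring of a documented object.
doc = inspect.getdoc(scale_values)
print(doc)

# Calling help(scale_values) would print the signature with the same docstring.
```

In a Colab or Jupyter cell, typing `scale_values?` displays the same information.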

For example, if you want to use my regression_report function in Colab, you can import it this way:

import sys
import os

user = "SalvatoreRa"
repo = "tutorial"
src_dir = "machine%20learning/utility"  # no trailing slash: the f-string below adds it
pyfile = "regression_report.py"  # name of the .py file to download

# Download the raw file into the current working directory
url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src_dir}/{pyfile}"
!wget --no-cache --backups=1 {url}

# Make sure the download directory is on the import path
sys.path.append(os.path.abspath("."))

# Import the function
from regression_report import regression_report

Alternatively, in Colab you can download a utility file with the wget Python package and import it like this:

!pip install wget
import wget

# Download the file into the current working directory
wget.download('https://raw.githubusercontent.com/SalvatoreRa/tutorial/main/machine%20learning/utility/utils_NA.py')
from utils_NA import *  # the module name matches the downloaded file
import torch
import seaborn as sns

# Generate different types of NA (df is your dataset, loaded beforehand)
X_miss_mcar = produce_NA(df, p_miss=0.4, mecha="MCAR")
X_miss_mar = produce_NA(df, p_miss=0.4, mecha="MAR", p_obs=0.5)
X_miss_mnar = produce_NA(df, p_miss=0.4, mecha="MNAR", opt="logistic", p_obs=0.5)
X_miss_quant = produce_NA(df, p_miss=0.4, mecha="MNAR", opt="quantile", p_obs=0.5, q=0.3)
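To give an idea of what the MCAR case means, here is a minimal standard-library sketch. MCAR (missing completely at random) means every cell has the same chance of being missing, independent of any value in the data. The function `add_mcar_missing` is hypothetical and is not the implementation of `produce_NA`; it only assumes that `p_miss` plays the role of the per-cell missing probability.

```python
import math
import random


def add_mcar_missing(rows, p_miss, seed=0):
    """Return a copy of `rows` where each cell becomes NaN with probability p_miss."""
    rng = random.Random(seed)  # local generator so results are reproducible
    return [
        [math.nan if rng.random() < p_miss else value for value in row]
        for row in rows
    ]


data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
masked = add_mcar_missing(data, p_miss=0.4)

# Count how many cells were replaced by NaN.
n_missing = sum(math.isnan(v) for row in masked for v in row)
print(f"{n_missing} of 9 cells are missing")
```

MAR and MNAR differ in that the missing probability depends on observed or unobserved values respectively, which this sketch does not attempt to model.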

| File | Description |
| --- | --- |
| Regression report | Prints several regression metrics (similar to the classification report of scikit-learn) |
| Upset plot | Plots an upset plot to visualize missing data and their distribution in the columns |
| Random NA generation | Introduces random missing values into a dataset. |
| Utils NA | A set of utils to generate and insert NA in your dataset |
| DR_utils | A set of utils for dimensionality reduction techniques |
| Correlation_utils | A set of utils for correlation dimension |
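To illustrate the kind of output the Regression report entry refers to, here is a minimal standard-library sketch that computes a few common regression metrics. The function `simple_regression_report` and its choice of metrics are assumptions made for illustration; this is not the code of `regression_report`, whose metrics and signature may differ.

```python
import math


def simple_regression_report(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # undefined when y_true is constant
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}


report = simple_regression_report([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
for name, value in report.items():
    print(f"{name:<5}{value:.3f}")
```

Returning a dictionary keeps the metrics available for further use, while the loop prints them in a compact, report-like layout.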


Scripts

Here you can find a list of scripts that were used to generate images for the tutorials or that can be used to analyze data and models. You can easily adapt them to your needs.

For example, if you want to use my MAR script on your PC, you can simply run it this way:

python3 MAR.py

Or alternatively:

python3.8 MAR.py

| File | Description |
| --- | --- |
| MAR | Loop to test different algorithms for MAR missing value imputation. The script is generating missing values, testing different imputation methods, and generating the plots |
| MNAR | Loop to test different algorithms for MNAR missing value imputation. The script is generating missing values, testing different imputation methods, and generating the plots |
| MCAR | Loop to test different algorithms for MCAR missing value imputation. The script is generating missing values, testing different imputation methods, and generating the plots |
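The general shape of such a loop (generate missing values, impute them with several methods, score each method) can be sketched with the standard library alone. This is only an illustration under simplifying assumptions, not the code of the scripts above: the function `compare_imputers` is hypothetical, values are hidden completely at random (MCAR rather than MAR or MNAR), only mean and median imputation are compared, and no plots are produced.

```python
import math
import random
import statistics


def compare_imputers(values, p_miss=0.3, seed=0):
    """Hide some values at random, impute them, and return the RMSE per method."""
    rng = random.Random(seed)
    hidden = [i for i in range(len(values)) if rng.random() < p_miss]
    hidden_set = set(hidden)
    observed = [v for i, v in enumerate(values) if i not in hidden_set]
    if not observed:
        raise ValueError("all values were hidden; lower p_miss")

    # Each imputer maps the observed values to a single fill value.
    imputers = {"mean": statistics.mean, "median": statistics.median}

    scores = {}
    for name, imputer in imputers.items():
        fill = imputer(observed)
        squared = [(values[i] - fill) ** 2 for i in hidden]
        # RMSE over the hidden cells only; 0.0 when nothing was hidden.
        scores[name] = math.sqrt(sum(squared) / len(squared)) if squared else 0.0
    return scores


column = [1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 50.0, 5.0, 5.5, 6.0]
scores = compare_imputers(column)
print(scores)
```

Scoring only the hidden cells is what makes the comparison meaningful, because those are the only positions where the true value is known but was withheld from the imputer.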


Contributing

License

This project is licensed under the MIT License

Bugs/Issues

Comment or open an issue on GitHub.