Cheatsheet - Achronus/Machine-Learning-101 GitHub Wiki
General
- import numpy as np
- import pandas as pd
- import matplotlib.pyplot as plt
Scikit-Learn (Sklearn)
General
- .fit() - learns a model's internal parameters from the data
- .transform() - applies the learned parameters to transform new or existing data
- .fit_transform() - runs fit and then transform on the same data
- .predict() - makes predictions with a fitted model
- from xgboost import XGBClassifier - gradient-boosted tree classifier from the separate XGBoost library (follows the same fit/predict interface)
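A minimal sketch of how these methods fit together, using StandardScaler and LogisticRegression as stand-ins. The toy data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration): one feature, binary label.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0, 0, 1, 1])
X_new = np.array([[0.0], [5.0]])

scaler = StandardScaler()
scaler.fit(X_train)                         # learns the mean and standard deviation
X_train_scaled = scaler.transform(X_train)  # applies them to the training data
X_new_scaled = scaler.transform(X_new)      # reuses the same parameters on new data

# fit_transform() gives the same result as fit() followed by transform().
same = StandardScaler().fit_transform(X_train)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)          # learns the model coefficients
predictions = model.predict(X_new_scaled)   # one predicted label per row
```

Note that new data is only ever passed to transform(), never to fit(), so it is scaled with the parameters learned from the training set.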
Data Preprocessing
- from sklearn.preprocessing import ...
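- LabelEncoder - encodes categorical labels as integers
- OneHotEncoder - creates dummy (one-hot) variables from categorical features
- StandardScaler - standardises features (zero mean, unit variance)
- MinMaxScaler - normalises features to a fixed range (0 to 1 by default)
- Imputer - replaces missing values in a dataset (removed in scikit-learn 0.22; use from sklearn.impute import SimpleImputer instead)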
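A short sketch of the preprocessing classes in use. The data is made up for illustration, and because Imputer is no longer available in recent scikit-learn releases, the sketch assumes SimpleImputer from sklearn.impute in its place.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import (
    LabelEncoder,
    MinMaxScaler,
    OneHotEncoder,
    StandardScaler,
)

# Made-up example data: a categorical column and a numeric column with a gap.
colours = np.array(["red", "green", "blue", "green"])
ages = np.array([[20.0], [np.nan], [40.0], [30.0]])

# Classes are sorted alphabetically: blue=0, green=1, red=2.
labels = LabelEncoder().fit_transform(colours)  # → [2, 1, 0, 1]

# One column per category; OneHotEncoder expects a 2D input.
one_hot = OneHotEncoder().fit_transform(colours.reshape(-1, 1)).toarray()

# The missing age is replaced by the mean of the known values (30.0).
ages_filled = SimpleImputer(strategy="mean").fit_transform(ages)

standardised = StandardScaler().fit_transform(ages_filled)  # zero mean, unit variance
normalised = MinMaxScaler().fit_transform(ages_filled)      # rescaled to [0, 1]
```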
Model Selection
- from sklearn.model_selection import ...
- train_test_split - used for splitting data into test sets and training sets
- cross_val_score - used for K-Fold Cross Validation
- GridSearchCV - used for grid search (exhaustive hyperparameter tuning with cross-validation)
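The three helpers above can be sketched together as follows. The Iris dataset, the KNeighborsClassifier model and the parameter grid are assumptions chosen only to keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled example dataset (an assumption for this sketch): 150 rows, 4 features.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 5-fold cross validation on the training set: one accuracy score per fold.
scores = cross_val_score(KNeighborsClassifier(), X_train, y_train, cv=5)

# Grid search: try each candidate value with 5-fold cross validation.
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7]},
    cv=5,
)
grid.fit(X_train, y_train)
best_k = grid.best_params_["n_neighbors"]

# The refitted best model is evaluated once on the held-out test set.
test_accuracy = grid.score(X_test, y_test)
```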
Accuracy & Predictions
- from sklearn.metrics import confusion_matrix - counts correct and incorrect predictions per class, from which the accuracy of a trained model can be calculated
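A small worked example, with made-up labels, of reading accuracy off a confusion matrix:

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions for six samples.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred)
# [[2 1]
#  [1 2]]

# Correct predictions sit on the diagonal.
accuracy = cm.trace() / cm.sum()  # 4 correct out of 6
```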
Models
- from sklearn.preprocessing import PolynomialFeatures - generates polynomial feature terms, used for Polynomial Regression
- from sklearn.svm import ...
- from sklearn.linear_model import ...
- LinearRegression - used for Linear Regression (single and multiple variables)
- LogisticRegression - used for Logistic Regression
- from sklearn.tree import ...
- DecisionTreeRegressor - used for Decision Tree Regression
- DecisionTreeClassifier - used for Decision Tree Classification
- from sklearn.ensemble import ...
- RandomForestRegressor - used for Random Forest Regression
- RandomForestClassifier - used for Random Forest Classification
- from sklearn.neighbors import KNeighborsClassifier - K-Nearest Neighbours classification model
- from sklearn.naive_bayes import GaussianNB - Gaussian Naive Bayes classification model
- import scipy.cluster.hierarchy as sch - SciPy module that can be used to create dendrograms for Hierarchical Clustering
- from sklearn.cluster import ...
- AgglomerativeClustering - A Hierarchical Clustering Model
- KMeans - K-Means clustering model
- from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA - Linear Discriminant Analysis model
- from sklearn.decomposition import ...
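A minimal sketch showing that the models listed above share the same fit/predict pattern, with one regression, one classification and one clustering example. All data is made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model.
X = np.arange(-3, 4, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2
X_poly = PolynomialFeatures(degree=2).fit_transform(X)
regressor = LinearRegression().fit(X_poly, y)
y_fit = regressor.predict(X_poly)

# Classification: two well-separated groups on a single feature.
X_cls = np.array([[0.0], [1.0], [10.0], [11.0]])
y_cls = np.array([0, 0, 1, 1])
classifier = DecisionTreeClassifier(random_state=0).fit(X_cls, y_cls)
cls_pred = classifier.predict([[0.5], [10.5]])

# Clustering: no labels are given; K-Means assigns each row to a cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_cls)
clusters = kmeans.labels_
```

The cluster numbers themselves are arbitrary; only which rows share a cluster is meaningful.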
Keras
- from keras.models import Sequential - builds a model as a linear stack of layers
- from keras.layers import ...
- Dense - a fully connected layer
- Dropout - applies dropout to a layer's output to reduce overfitting
- Flatten - flattens multi-dimensional output (e.g. from convolutional layers) into a vector
- Conv2D - a basic 2D convolutional layer
- MaxPooling2D - applies max pooling to the output of a convolutional layer
- from keras.wrappers.scikit_learn import KerasClassifier - wraps a Keras model as a scikit-learn classifier so it can be used with tools such as cross_val_score and GridSearchCV (removed from recent Keras releases; the separate SciKeras package provides a replacement)
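A rough sketch of how these layers stack into a small convolutional network. The input shape (28x28 greyscale images), layer sizes, optimiser and loss are assumptions for illustration, and the Input layer is an addition not listed above.

```python
from keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPooling2D
from keras.models import Sequential

model = Sequential([
    Input(shape=(28, 28, 1)),                           # assumed image size
    Conv2D(32, kernel_size=(3, 3), activation="relu"),  # 32 learned 3x3 filters
    MaxPooling2D(pool_size=(2, 2)),                     # halves height and width
    Flatten(),                                          # feature maps to a vector
    Dense(64, activation="relu"),                       # fully connected layer
    Dropout(0.5),                                       # drops half the units in training
    Dense(10, activation="softmax"),                    # one output per class (assumed 10)
])

# Assumed training setup for integer class labels.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Once compiled, the model follows the same pattern as the Scikit-Learn models: model.fit(...) to train and model.predict(...) to make predictions.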