Machine Learning Journal - GeorgeIniatis/Blood_Brain_Barrier_Drug_Prediction GitHub Wiki
Experiment 1: Try to build Classification models
Classification models will make use of the Class as the label
Two different model categories:
Category 1: Models with just the Chemical Descriptors used as features
Category 2: Models with Chemical Descriptors, Side Effects and Indications used as features (does the addition of Side Effects and Indications to the Chemical Descriptors improve our predictive performance?)
Training sets:
For category 1 the whole dataset will be used, excluding the entries used in the Test set
For category 2 a subset of the dataset will be used, those entries that have Side Effects and Indications available, again excluding the entries used in the Test set
Test set:
Will be used to compare the models against each other
20% subset of the dataset entries that have Chemical Descriptors, Side Effects and Indications. This allows us to compare the performance of the two different categories of models using the same test set (see the splitting sketch below)
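A minimal splitting sketch using scikit-learn, assuming a pandas DataFrame `dataset` with a `Class` column and a hypothetical `Has_Side_Effects_And_Indications` flag marking the subset; the column names are assumptions, not the repository's actual schema:

```python
# Sketch: shared, stratified 20% test set for both model categories.
# Assumes `dataset` is a pandas DataFrame with the columns noted above.
from sklearn.model_selection import train_test_split

# Entries with Side Effects and Indications available (hypothetical flag column)
subset = dataset[dataset["Has_Side_Effects_And_Indications"] == 1]

# Stratified 20% test set drawn from that subset, shared by both categories
subset_train, test_set = train_test_split(
    subset, test_size=0.2, stratify=subset["Class"], random_state=42
)

# Category 1 trains on the whole dataset minus the test set entries
category_1_train = dataset.drop(index=test_set.index)
# Category 2 trains only on the subset, minus the test set entries
category_2_train = subset_train
```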
Experiment 2: Try to build Regression models
Regression models will make use of the LogBB as the label
Training set:
A subset of the dataset will be used, those entries that have LogBB available, again excluding the entries used in the Test set
Test set:
Will be used to compare the models against each other
20% subset of the dataset entries that have LogBB available (see the splitting sketch below)
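A minimal sketch of the regression split, assuming `dataset` carries a `LogBB` column with NaN where the value is unavailable (the column name is an assumption):

```python
# Sketch: restrict to entries with LogBB, then hold out 20% for testing.
from sklearn.model_selection import train_test_split

logbb_subset = dataset[dataset["LogBB"].notna()]
regression_train, regression_test = train_test_split(
    logbb_subset, test_size=0.2, random_state=42
)
```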
Experiment 3: Try to find the most relevant Side Effects and Indications
Using RFECV
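A minimal RFECV sketch with scikit-learn, assuming a feature DataFrame `X` whose columns are the Side Effects and Indications, and binary labels `y`; the Logistic Regression estimator and F1 scoring are illustrative choices, not settled decisions:

```python
# Sketch: recursive feature elimination with cross-validation to rank
# Side Effects / Indications by relevance.
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),  # illustrative base estimator
    step=1,                                       # drop one feature per iteration
    cv=StratifiedKFold(5),
    scoring="f1",
)
selector.fit(X, y)

# Columns surviving elimination are the most relevant features
selected_features = X.columns[selector.support_]
```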
Models
Classification:
Dummy Classifier
Logistic Regression
Support Vector Classifier
K-Nearest Neighbour Classifier
Random Forest Classifier
Decision Tree Classifier
Stochastic Gradient Descent Classifier
Regression:
Dummy Regressor
Linear Regression
Support Vector Regression
K-Nearest Neighbour Regressor
Random Forest Regressor
Decision Tree Regressor
Stochastic Gradient Descent Regressor
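The listed models all map directly to scikit-learn estimators; a sketch instantiating them with default hyperparameters (tuning happens later via cross-validation):

```python
# Sketch: the candidate models as scikit-learn estimators, keyed by name.
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.linear_model import (LogisticRegression, LinearRegression,
                                  SGDClassifier, SGDRegressor)
from sklearn.svm import SVC, SVR
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

classifiers = {
    "Dummy": DummyClassifier(strategy="stratified"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "SGD": SGDClassifier(),
}

regressors = {
    "Dummy": DummyRegressor(strategy="mean"),
    "Linear Regression": LinearRegression(),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
    "Random Forest": RandomForestRegressor(),
    "Decision Tree": DecisionTreeRegressor(),
    "SGD": SGDRegressor(),
}
```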
Metrics
Will not rely on a single metric, as doing so can lead to seriously misleading conclusions about a model's performance
Classification Models
Sensitivity/Recall:
How many of the actual positives are labelled as positive by our model
tp / (tp + fn)
Precision:
How many of the positive predictions were actually positive
tp / (tp + fp)
F1 Score:
Harmonic mean of precision and recall
Other versions (F-beta scores) add more or less weight to precision or recall
Matthews correlation coefficient
Others that could be used:
ROC curve & AUC
PR curve (Better for class imbalance)
What do we care about most? False Positives or False Negatives?
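A sketch computing these metrics with scikit-learn, assuming a fitted classifier `clf` that exposes `predict_proba` and a held-out test set `(X_test, y_test)`; all names are placeholders:

```python
# Sketch: report several classification metrics side by side rather than
# relying on any single one.
from sklearn.metrics import (
    recall_score, precision_score, f1_score, fbeta_score,
    matthews_corrcoef, roc_auc_score, average_precision_score,
)

y_pred = clf.predict(X_test)
y_scores = clf.predict_proba(X_test)[:, 1]  # assumes predict_proba is available

print("Recall:", recall_score(y_test, y_pred))        # tp / (tp + fn)
print("Precision:", precision_score(y_test, y_pred))  # tp / (tp + fp)
print("F1:", f1_score(y_test, y_pred))                # harmonic mean
print("F2 (recall-weighted):", fbeta_score(y_test, y_pred, beta=2))
print("MCC:", matthews_corrcoef(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_scores))
# Average precision summarises the PR curve (better under class imbalance)
print("Average Precision:", average_precision_score(y_test, y_scores))
```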
Regression Models
Negated Mean Absolute Error
R2
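A sketch reporting both regression metrics via cross-validation, assuming a regressor `reg` and training data `(X_train, y_train)` (placeholder names):

```python
# Sketch: negated MAE and R2 via scikit-learn's built-in scorers.
from sklearn.model_selection import cross_val_score

neg_mae = cross_val_score(reg, X_train, y_train, cv=5,
                          scoring="neg_mean_absolute_error")
r2 = cross_val_score(reg, X_train, y_train, cv=5, scoring="r2")
print("Negated MAE:", neg_mae.mean(), "R2:", r2.mean())
```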
Evaluation
Dummy models
Test set for each experiment
Permutation testing for model robustness
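A sketch of permutation testing with scikit-learn's `permutation_test_score`, assuming an estimator `clf` and data `(X, y)` (placeholder names); a small p-value suggests the model has learned a real relationship rather than fitting noise:

```python
# Sketch: compare the true score against scores obtained on shuffled labels.
from sklearn.model_selection import permutation_test_score

score, perm_scores, p_value = permutation_test_score(
    clf, X, y, scoring="f1", cv=5, n_permutations=100, random_state=42
)
print(f"Score: {score:.3f}, p-value: {p_value:.4f}")
```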
Common Practices
Data will be scaled
Some data exploration will be performed
Cross validation will be used to find the best hyperparameters for our models
Multiple metrics will be reported for each of our models
The models will take the class imbalance into account
The test sets will be stratified, preserving the class imbalance, so that the conclusions drawn from them are sound (a combined sketch follows below)
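A sketch combining these practices, assuming training data `(X_train, y_train)`; the SVC and the parameter grid are illustrative only:

```python
# Sketch: scaling inside a pipeline (so each CV fold is scaled independently),
# stratified cross-validation, hyperparameter search, and class weights to
# account for class imbalance.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(class_weight="balanced")),  # weights offset class imbalance
])

param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipeline, param_grid, cv=StratifiedKFold(5), scoring="f1")
search.fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
```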