Meeting 9 - GeorgeIniatis/Blood_Brain_Barrier_Drug_Prediction GitHub Wiki

Meeting Minutes

This meeting focused on Q&A after a short status report recap.

Q: What metric should I try to optimise? Precision, recall, F1 score, or AUC?

  • Don't rely on a single metric, as it can lead to very wrong conclusions about the model's performance
  • The main ones are precision, recall, F1 score, and ROC AUC
  • Report several together, e.g. precision + recall + F1 score
  • The F1 score can be generalised (the F-beta score) to give more weight to precision or recall
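
The advice above can be sketched with scikit-learn's metric functions. The labels here are a toy example, not project data; `fbeta_score` with `beta=2` is the recall-weighted variant mentioned in the last bullet (`beta=0.5` would favour precision instead):

```python
# Sketch: reporting several metrics side by side instead of relying on one.
# y_true / y_pred are illustrative placeholders, not real project labels.
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
# beta=2 weights recall more heavily than precision
f2 = fbeta_score(y_true, y_pred, beta=2)

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} F2={f2:.3f}")
```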

Q: Should I add a class weight to the models?

  • Yes, since we have a class imbalance and the models' performance seems to improve when class weights are used
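
As a minimal sketch of the point above, most scikit-learn classifiers accept `class_weight="balanced"`, which up-weights each class in inverse proportion to its frequency. The dataset here is synthetic, for illustration only:

```python
# Sketch: class_weight="balanced" sets each class's weight to
# n_samples / (n_classes * count_of_that_class), so the minority
# class contributes more to the loss. Synthetic imbalanced data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)
```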

Q: When using cross validation do I need to further evaluate my data using an independent test set?

  • Optimise the models using cross-validation, then compare their performance on an independent test set
  • The test set should be roughly 20% of the original dataset, with the same class imbalance (i.e. a stratified split)

Q: Any SK-Learn best practices I should be aware of?

  • Tinkering with your data too much (e.g. tuning preprocessing choices against the test set) can make the model lose its predictive ability in the real world
  • Scale the data, especially for models sensitive to feature magnitudes (e.g. SVMs, k-nearest neighbours)
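
One common way to apply both points at once is to put the scaler inside a `Pipeline`: the scaler is then fit only on the training folds during cross-validation, so test data never leaks into the scaling statistics. A minimal sketch on synthetic data:

```python
# Sketch: StandardScaler wrapped in a Pipeline so scaling statistics
# are learned only from training data, never from held-out data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipe.fit(X, y)  # in practice, pass the pipeline to cross_val_score
```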

Q: Any models that would work best and I should definitely give a try?

  • Logistic Regression
  • SVM with a linear kernel (should also give other kernels a try)
  • Random Forest
  • K-nearest neighbours (will most likely run into the same problems as t-SNE, PCA, and UMAP)
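
The four suggestions above can be compared under identical cross-validation settings so their scores are directly comparable. This is only a sketch on synthetic imbalanced data; the scaling pipelines and `class_weight` choices follow the earlier answers (KNN has no `class_weight` parameter, so it is used as-is):

```python
# Sketch: same CV protocol for every candidate model family.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

models = {
    "LogReg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "SVM-linear": make_pipeline(
        StandardScaler(), SVC(kernel="linear", class_weight="balanced")
    ),
    "RandomForest": RandomForestClassifier(class_weight="balanced", random_state=0),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# F1 on 5 folds for each model; report more metrics in practice
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}
```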

Action Plan

  • Model work
  • Have a look at the material sent over by the supervisor