Meeting 9 - GeorgeIniatis/Blood_Brain_Barrier_Drug_Prediction GitHub Wiki

Meeting Minutes

This meeting focused on Q&A after a short status report recap.

Q: What metric should I try to optimise? Precision, recall, F1 score, or AUC?

  • Don't rely on a single metric, as it can lead to very wrong conclusions about the model's performance
  • The main ones are precision, recall, F1 score, and ROC AUC
  • Report several together, e.g. precision + recall + F1 score
  • The F1 score can be generalised (the F-beta score) to give more weight to precision or recall
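
The advice above can be sketched with scikit-learn's metric functions. The labels here are a toy example, not project data; `fbeta_score` with `beta=2` is the recall-weighted variant mentioned in the last bullet (`beta=0.5` would favour precision instead):

```python
# Sketch: reporting several metrics side by side instead of relying on one.
# y_true / y_pred are illustrative placeholders, not real project labels.
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
# beta=2 weights recall more heavily than precision
f2 = fbeta_score(y_true, y_pred, beta=2)

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} F2={f2:.3f}")
```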

Q: Should I add a class weight to the models?

  • Yes, since we have a class imbalance and the models' performance seems to improve when class weights are used
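
As a minimal sketch of the point above, most scikit-learn classifiers accept `class_weight="balanced"`, which up-weights each class in inverse proportion to its frequency. The dataset here is synthetic, for illustration only:

```python
# Sketch: class_weight="balanced" sets each class's weight to
# n_samples / (n_classes * count_of_that_class), so the minority
# class contributes more to the loss. Synthetic imbalanced data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)
```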

Q: When using cross validation do I need to further evaluate my data using an independent test set?

  • Optimise the models using cross-validation, then compare their performance on an independent test set
  • The test set should be roughly 20% of the original dataset, with the same class imbalance (i.e. a stratified split)

Q: Any SK-Learn best practices I should be aware of?

  • Tinkering with your data too much (e.g. tuning preprocessing choices against the test set) can make the model lose its predictive ability in the real world
  • Scale the data, especially for models sensitive to feature magnitudes (e.g. SVMs, k-nearest neighbours)
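
One common way to apply both points at once is to put the scaler inside a `Pipeline`: the scaler is then fit only on the training folds during cross-validation, so test data never leaks into the scaling statistics. A minimal sketch on synthetic data:

```python
# Sketch: StandardScaler wrapped in a Pipeline so scaling statistics
# are learned only from training data, never from held-out data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipe.fit(X, y)  # in practice, pass the pipeline to cross_val_score
```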

Q: Any models that would work best and I should definitely give a try?

  • Logistic Regression
  • SVM with a linear kernel (should also give other kernels a try)
  • Random Forest
  • K-nearest neighbours (will most likely run into the same problems as t-SNE, PCA, and UMAP)
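
The four suggestions above can be compared under identical cross-validation settings so their scores are directly comparable. This is only a sketch on synthetic imbalanced data; the scaling pipelines and `class_weight` choices follow the earlier answers (KNN has no `class_weight` parameter, so it is used as-is):

```python
# Sketch: same CV protocol for every candidate model family.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, weights=[0.8, 0.2], random_state=0)

models = {
    "LogReg": LogisticRegression(class_weight="balanced", max_iter=1000),
    "SVM-linear": make_pipeline(
        StandardScaler(), SVC(kernel="linear", class_weight="balanced")
    ),
    "RandomForest": RandomForestClassifier(class_weight="balanced", random_state=0),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# F1 on 5 folds for each model; report more metrics in practice
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    for name, model in models.items()
}
```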

Action Plan

  • Model work
  • Have a look at the material sent over by the supervisor