Meeting 3 - GeorgeIniatis/Blood_Brain_Barrier_Drug_Prediction GitHub Wiki

Meeting Minutes

This meeting mainly focussed on Q/A

Q: Many datasets use different thresholds for logBB and therefore to define BBB permeability. Should I just choose one and stick to it?

A threshold already used in published papers will be adopted. The choice could lead to some interesting discussion in the dissertation.
This led to a discussion about the nature of the problem itself and what strategy should be followed. It was agreed with the supervisor to focus first on creating a binary classification model and then possibly move on to regression. This is again something that could be discussed in the dissertation.
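As a minimal sketch of the binary classification framing, the snippet below turns logBB values into BBB+/BBB- labels. The cutoff of -1.0 is only a placeholder assumption (the minutes do not record which threshold was chosen), and the names `ASSUMED_LOGBB_THRESHOLD` and `label_bbb` are hypothetical.

```python
from typing import Optional

# Placeholder cutoff: the minutes do not record the value that was chosen.
ASSUMED_LOGBB_THRESHOLD = -1.0


def label_bbb(logbb: Optional[float],
              threshold: float = ASSUMED_LOGBB_THRESHOLD) -> Optional[int]:
    """Return 1 (BBB+) if logBB >= threshold, 0 (BBB-) otherwise.

    Missing measurements stay missing (None) instead of being guessed.
    """
    if logbb is None:
        return None
    return int(logbb >= threshold)


# Illustrative values only, not real measurements.
logbb_values = [0.5, -1.0, -1.7, None]
labels = [label_bbb(value) for value in logbb_values]
print(labels)
```

Keeping the threshold as a parameter makes it easy to re-label the same data under the cutoffs used by different datasets and compare the resulting class balance.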

Q: While doing some dataset exploration I used PubChem to compare some of the descriptors and I noticed some differences, e.g. a different number of hydrogen bond acceptors. Should I take the data as it is in the dataset or try to update it with the values available on PubChem?

The supervisor will have a look at the examples. Advised to have a look at RDKit, which takes SMILES as input and can be used to compute the descriptors
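A rough sketch of what computing descriptors with RDKit could look like is given below. It assumes RDKit is installed; the function name `compute_descriptors`, the dictionary keys, and the particular set of descriptors are illustrative choices, not something decided in the meeting.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski


def compute_descriptors(smiles: str) -> dict:
    """Compute a few common descriptors from a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when the SMILES cannot be parsed.
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {
        "mol_weight": Descriptors.MolWt(mol),
        "logp": Descriptors.MolLogP(mol),
        "tpsa": Descriptors.TPSA(mol),
        "h_bond_donors": Lipinski.NumHDonors(mol),
        "h_bond_acceptors": Lipinski.NumHAcceptors(mol),
    }


# Ethanol, used purely as a small example molecule.
print(compute_descriptors("CCO"))
```

Recomputing every descriptor from SMILES with a single tool would at least keep the values consistent with each other, even if they differ from those in the source datasets.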

Q: I read an interesting paper discussing the idea of having as descriptors the drug/compound side effects. Would it be something interesting to follow and try to implement?

The supervisor agreed that this is an interesting idea that could be followed and would definitely lead to some good results
The side effects would be available for only some of the drugs and possibly none of the compounds
The supervisor pointed me to another database called FAERS
We could either use FAERS or SIDER to get the side effects of drugs. Decision will be made when appropriate

Q: Should I read more papers or focus on the dataset?

The supervisor advised to focus on the dataset

Some important comments were made

One of the challenges will be converting the name of the drug/compound to its SMILES representation
Drugs can have multiple SMILES representations. Could PubChem be used to clear this up?
DrugBank and WikiData could be useful
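One possible way to go from a name to SMILES is PubChem's PUG REST interface, sketched below with the standard library only. The helper names `smiles_lookup_url` and `fetch_smiles_records` are hypothetical, and the property name in the URL is an assumption that should be checked against the current PubChem documentation, since the property names on offer may change. Only the URL is built when the snippet runs; the network call is left to the caller.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_lookup_url(name: str) -> str:
    """Build a PUG REST URL asking for the SMILES of a compound name."""
    # Percent-encode the name so spaces and slashes are safe in the path.
    encoded = quote(name, safe="")
    return f"{PUG_REST}/compound/name/{encoded}/property/CanonicalSMILES/JSON"


def fetch_smiles_records(name: str, timeout: float = 10.0) -> list:
    """Return the raw property records PubChem reports for a name.

    One name can match several compound records, so a list is returned
    and the choice between them is left to the caller.
    """
    with urlopen(smiles_lookup_url(name), timeout=timeout) as response:
        payload = json.load(response)
    return payload["PropertyTable"]["Properties"]


print(smiles_lookup_url("aspirin"))
```

Using the SMILES reported by a single source for every compound would give one consistent representation per record, which is one way PubChem could help with the multiple-representations issue.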

Action Plan

Continue working on the dataset
Have a look at RDKit and PubChem API
Have a look at Automated Google Searches

Update

There seems to be some debate on what an actual hydrogen bond acceptor is and how to count them, which would explain the irregularities found. Something that could be discussed in the dissertation.
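The effect of competing definitions can be seen within RDKit itself, which exposes more than one acceptor count. The sketch below assumes RDKit is installed and compares its pattern-based count with the simpler "number of nitrogens and oxygens" count; the function name `acceptor_counts` and the dictionary keys are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import Lipinski


def acceptor_counts(smiles: str) -> dict:
    """Return two different hydrogen bond acceptor counts for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {
        # Count based on RDKit's substructure pattern for acceptors.
        "pattern_based": Lipinski.NumHAcceptors(mol),
        # Count of all nitrogen and oxygen atoms.
        "n_plus_o_count": Lipinski.NOCount(mol),
    }


# Acetamide: a small molecule containing both a nitrogen and an oxygen.
print(acceptor_counts("CC(N)=O"))
```

Recording which definition produced each value would make it possible to tell genuine data errors apart from differences in convention.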