Report 3 - GeorgeIniatis/Blood_Brain_Barrier_Drug_Prediction GitHub Wiki

Worked on the dataset
- Process followed available at Dataset Creation Journal
- Retrieved all the chemical descriptors I wanted from the PubChem API. No need for RDKit as far as I can see
- Retrieved side effects from SIDER dataset
- Current size of dataset: 2107 entries, after removing duplicates and unknown compounds. Available in the repo (Dataset.xlsx) along with the code used to create it (modify_dataset.py)
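
The retrieval and clean-up steps above could be sketched roughly as follows. This is a minimal illustration and not the code in modify_dataset.py: the function names, the descriptor list, and the choice to deduplicate on the PubChem CID are all assumptions. It uses the PubChem PUG REST `compound/name/.../property/.../JSON` endpoint.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Illustrative descriptor choice (assumption); the set actually used may differ.
DESCRIPTORS = ["MolecularWeight", "XLogP", "TPSA",
               "HBondDonorCount", "HBondAcceptorCount"]


def build_property_url(name, properties=DESCRIPTORS):
    """Build the PUG REST URL that returns the given properties for a compound name."""
    return (f"{PUG_REST}/compound/name/{quote(name)}"
            f"/property/{','.join(properties)}/JSON")


def fetch_descriptors(name):
    """Return the descriptor dict for one compound, or None if PubChem does not know it."""
    try:
        with urlopen(build_property_url(name), timeout=30) as response:
            payload = json.load(response)
    except OSError:
        # HTTPError (e.g. 404 for an unknown name) and URLError both derive from OSError.
        return None
    return payload["PropertyTable"]["Properties"][0]


def drop_duplicates_and_unknowns(records):
    """Keep the first record per CID and skip compounds with no PubChem match.

    `records` is a list of (name, descriptors) pairs, where descriptors is the
    dict returned by fetch_descriptors or None.
    """
    seen_cids = set()
    kept = []
    for name, descriptors in records:
        if descriptors is None:
            continue  # unknown compound
        cid = descriptors["CID"]
        if cid in seen_cids:
            continue  # same compound under another name
        seen_cids.add(cid)
        kept.append((name, descriptors))
    return kept
```

Deduplicating on the CID rather than on the name means synonyms of the same compound collapse into a single row, which may or may not match how the duplicates were actually removed.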
Had a short look at Automated Google Searches
- Not entirely familiar with web scraping
- Will search for a tutorial/resources
- Thinking of running the Google Searches on the drug names in the SIDER dataset to discover their BBB permeability, so that we end up with a larger set of drugs with known side effects
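
As a starting point for the idea above, the sketch below only turns drug names into search URLs; fetching and parsing the results is left out. It assumes the names come from a tab-separated file with the name in the second column (as I understand SIDER's drug names file to be laid out), and the function names and query wording are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode


def read_drug_names(tsv_text):
    """Return the sorted, unique drug names from tab-separated text.

    Assumes each row is `<compound id>\t<drug name>` (assumption about the SIDER file).
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    names = set()
    for row in reader:
        if len(row) >= 2 and row[1].strip():
            names.add(row[1].strip())
    return sorted(names)


def build_search_url(drug_name):
    """Build a Google search URL asking about BBB permeability for one drug."""
    # Hypothetical query wording; quoting the name keeps multi-word names together.
    query = f'"{drug_name}" blood brain barrier permeability'
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For example, `[build_search_url(n) for n in read_drug_names(text)]` gives one URL per unique drug name, which could then be handed to whichever scraping approach the tutorials suggest.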
Questions/Topics to discuss:
- General feedback about the project so far. Anything to improve?