Report 3 - GeorgeIniatis/Blood_Brain_Barrier_Drug_Prediction GitHub Wiki

  • Worked on the dataset
    • The process followed is documented in the Dataset Creation Journal
    • Retrieved all the chemical descriptors I wanted from the PubChem API. No need for RDKit as far as I can see
    • Retrieved side effects from SIDER dataset
    • Current size of dataset: 2107 entries, after removing duplicates and unknown compounds. Available in the repo (Dataset.xlsx) along with the code used to create it (modify_dataset.py)
  • Had a short look at Automated Google Searches
    • Not entirely familiar with web scraping
    • Will search for a tutorial/resources
    • Thinking of using the drug names in the SIDER dataset as Google search queries to find each drug's BBB permeability, so that we end up with a larger set of drugs with both known side effects and known permeability
  • Questions/Topics to discuss:
    • General feedback about the project so far. Anything to improve?
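
For reference, the descriptor retrieval and clean-up steps described under "Worked on the dataset" could look roughly like the sketch below. It is not the code in modify_dataset.py. It assumes descriptors are looked up by drug name through PubChem's PUG REST property endpoint, that the descriptor list in `PROPERTIES` is only an illustrative choice, and that duplicates are identified by PubChem CID. The helper names (`build_property_url`, `parse_properties`, `fetch_descriptors`, `deduplicate`) are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Illustrative descriptor list (assumption); the dataset may use a different set.
PROPERTIES = [
    "MolecularWeight",
    "XLogP",
    "TPSA",
    "HBondDonorCount",
    "HBondAcceptorCount",
    "RotatableBondCount",
]


def build_property_url(name, properties=PROPERTIES):
    """Build the PUG REST URL that returns the given properties for a compound name."""
    encoded_name = quote(name, safe="")  # e.g. spaces become %20
    return f"{PUG_REST}/compound/name/{encoded_name}/property/{','.join(properties)}/JSON"


def parse_properties(payload):
    """Return the first property record from a PUG REST response, or None if empty."""
    rows = payload.get("PropertyTable", {}).get("Properties", [])
    return rows[0] if rows else None


def fetch_descriptors(name):
    """Query PubChem for one compound; return None when the name is unknown (HTTP 404)."""
    try:
        with urlopen(build_property_url(name), timeout=30) as response:
            return parse_properties(json.load(response))
    except HTTPError as error:
        if error.code == 404:
            return None
        raise


def deduplicate(records):
    """Drop unknown compounds (None) and keep the first record per PubChem CID."""
    seen = set()
    unique = []
    for record in records:
        if record is None or record["CID"] in seen:
            continue
        seen.add(record["CID"])
        unique.append(record)
    return unique
```

In use, something like `deduplicate(fetch_descriptors(name) for name in drug_names)` would give one record per compound, where `drug_names` stands in for whatever name column the dataset uses. PubChem asks clients to keep request rates low, so a short pause between calls is advisable for a list of this size.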