LAB1 - VIJAYAYERUVA/CS5560-KDM GitHub Wiki
CS5560 Knowledge Discovery Management
Lab Assignment #1
Lab Submission before September 10th, 2018 (11:59 PM)
Submit a short lab report including screenshots and source code & data to GitHub. Post your GitHub URL through the Lab1 form at https://goo.gl/forms/MP2n7LQGmm2atsZo2
Name: Vijaya Kumari Yeruva
Student Id: 11
Research topic: Drug Abuse/ Substance Abuse
Collected 10 unique PubMed abstracts, along with their titles, related to drug abuse. The data was collected using two keywords, Drug Abuse and Drug Addiction, with the help of the provided code (Retrive_abstract).
Example abstract: The sexual behaviors of 15- to 24-year-olds increase the risk of this population to acquire sexually transmitted infections (STIs). The present study aimed to describe the sexual behavior in the transition to adulthood Brazilian population and its association with STI history. We analyzed cross-sectional data collected from 8562 sexually active women and men who participated in the National Survey of Human Papillomavirus Prevalence (POP-Brazil). This large-scale survey enrolled participants from 26 Brazilian capitals and the Federal District. Professionals from primary care facilities were trained to collect data utilizing a standardized questionnaire with questions on sociodemographic, sexual behavior, and drug use. We constructed a Poisson model with robust variance for both crude and adjusted analysis to investigate the associations between the variables. To adjust the distribution of the sample to the study population, we weighted the measures by the population size in each city and by gender. There were differences in several aspects from sexual behavior between genders. This is the first report regarding sexual behavior in a nationally representative population sample in Brazil. This study provides more valid estimates of sexual behavior and associated STIs, identifying important differences in sexual behavior and identifying predictors for referred STIs among females and males.
Statistics:
Code snippet to read abstracts and their titles from the XML data:
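As a rough illustration of this step (not the lab's actual code), the sketch below reads titles and abstracts from XML with Python's standard library. It assumes the usual PubMed layout, where each Article element holds an ArticleTitle and one or more Abstract/AbstractText elements. The sample document and the function name read_abstracts are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed PubMed XML layout; real files hold many articles.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Sample title on drug abuse</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def read_abstracts(xml_text):
    """Return a list of (title, abstract) pairs, one per Article element."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("Article"):
        title_node = article.find("ArticleTitle")
        # itertext() also collects text nested inside inline markup such as <i>.
        title = "".join(title_node.itertext()).strip() if title_node is not None else ""
        parts = ["".join(node.itertext()).strip()
                 for node in article.findall("Abstract/AbstractText")]
        # Structured abstracts have several AbstractText sections; join them.
        records.append((title, " ".join(p for p in parts if p)))
    return records


for title, abstract in read_abstracts(SAMPLE_XML):
    print(title, "->", abstract)
```

Articles without an abstract yield an empty string, so they can be filtered out before computing statistics.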
Output:
Code snippet to count the number of words in the given abstracts:
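A minimal sketch of word counting, assuming a simple tokenizer that lowercases the text and keeps only alphabetic words (with internal hyphens or apostrophes). The tokenization rule and the names tokenize and word_stats are assumptions for illustration; the lab's own code may split words differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return alphabetic words; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def word_stats(abstracts):
    """Return the word count of each abstract and the overall word frequencies."""
    per_abstract = [len(tokenize(a)) for a in abstracts]
    freq = Counter(w for a in abstracts for w in tokenize(a))
    return per_abstract, freq


# Hypothetical two-abstract input.
counts, freq = word_stats(["Drug abuse is drug use.", "Well-known risk"])
print("Words per abstract:", counts)
print("Total words:", sum(counts), "Distinct words:", len(freq))
```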
Output:
Code snippet to count the number of words in the given abstracts that are verified by WordNet:
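One way to sketch this check is to treat a word as verified when WordNet returns at least one synset for it. The example below assumes NLTK and its WordNet corpus for the real lookup (wordnet_is_known), and uses a small stand-in lexicon so it runs without NLTK installed. All names here are hypothetical, and the lab may have used a different WordNet library.

```python
def count_verified(words, is_known):
    """Split words into those accepted by is_known and those rejected."""
    known, unknown = [], []
    for word in words:
        (known if is_known(word) else unknown).append(word)
    return known, unknown


def wordnet_is_known(word):
    """Real lookup; assumes NLTK is installed and nltk.download('wordnet') has been run."""
    from nltk.corpus import wordnet
    return bool(wordnet.synsets(word))


# Stand-in lexicon (an assumption) so the sketch runs without NLTK.
toy_lexicon = {"drug", "abuse", "study"}
known, unknown = count_verified(["drug", "abuse", "popbrazil"], toy_lexicon.__contains__)
print(len(known), "verified;", len(unknown), "not found:", unknown)
```

Passing wordnet_is_known instead of the toy lexicon's membership test switches the same counting code to the real dictionary.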
Code snippet to count the number of words in the given abstracts that are verified by the BioPortal API:
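The sketch below shows one possible shape for such a lookup. It assumes the BioPortal REST search endpoint at data.bioontology.org/search with q and apikey query parameters, and a JSON response whose collection entries carry a prefLabel field. These details should be checked against the BioPortal documentation. The function names and the placeholder key are hypothetical, and a "not found" HTTP response is treated as an empty result rather than an error.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://data.bioontology.org/search"  # assumed endpoint


def build_search_url(term, api_key):
    """Build the search URL for one term; the API key is sent as a query parameter."""
    return SEARCH_URL + "?" + urlencode({"q": term, "apikey": api_key})


def labels_from_response(payload):
    """Extract preferred labels from a parsed response (assumed JSON shape)."""
    return [hit["prefLabel"] for hit in payload.get("collection", []) if "prefLabel" in hit]


def lookup(term, api_key):
    """Network call; returns [] when the service answers 404 for the term."""
    try:
        with urlopen(build_search_url(term, api_key)) as resp:
            return labels_from_response(json.load(resp))
    except HTTPError as err:
        if err.code == 404:
            return []
        raise


# No network access here: only the URL is shown, with a placeholder key.
print(build_search_url("drug abuse", "YOUR_API_KEY"))
```

A term counts as verified when lookup returns a non-empty list, and the returned labels can then be written next to the term in the output file.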
API Key:
Adding labels to the output file whenever the term is available in the BioPortal API:
Output:
Challenges Faced:
Most of the time, the BioPortal API threw a ‘FileNotFoundException’ error, because most of the terms are not available in the existing API.
Statistics for individual abstracts:
Abstract1:
Abstract2:
Abstract3:
Abstract4:
Abstract5:
Abstract6:
Abstract7:
Abstract8:
Abstract9:
Abstract10:
Code snippet to count the number of nouns and verbs in POS tagging, and PERSON and LOCATION entities in NER tagging:
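A minimal sketch of the counting step, assuming the text has already been tagged by a tool that emits Penn Treebank POS tags (NN*, VB*) and NER labels such as PERSON and LOCATION. The tagger itself is not shown, and the hand-tagged sample and the name count_pos_ner are hypothetical.

```python
from collections import Counter


def count_pos_ner(tokens):
    """tokens: iterable of (word, pos_tag, ner_tag) triples from an external tagger."""
    counts = Counter()
    for _word, pos, ner in tokens:
        if pos.startswith("NN"):      # NN, NNS, NNP, NNPS
            counts["nouns"] += 1
        elif pos.startswith("VB"):    # VB, VBD, VBG, VBN, VBP, VBZ
            counts["verbs"] += 1
        if ner in ("PERSON", "LOCATION"):
            counts[ner] += 1
    return counts


# Hand-tagged sample (an assumption, not real tagger output).
sample = [
    ("Professionals", "NNS", "O"),
    ("collected", "VBD", "O"),
    ("data", "NNS", "O"),
    ("in", "IN", "O"),
    ("Brazil", "NNP", "LOCATION"),
]
stats = count_pos_ner(sample)
print(stats["nouns"], stats["verbs"], stats["PERSON"], stats["LOCATION"])
```

This counts entity tokens, so a two-word name contributes two PERSON counts; merging adjacent tokens with the same label would give entity mentions instead.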
Code snippet to count the number of triplets:
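Assuming the triplets are (subject, relation, object) tuples produced by an external extraction tool, the counting itself can be sketched as below. The sample triplets and the name count_triplets are hypothetical, and the extraction step is not shown.

```python
def count_triplets(triplets):
    """Return (total, distinct) counts; distinctness ignores case and outer whitespace."""
    normalized = {tuple(part.strip().lower() for part in t) for t in triplets}
    return len(triplets), len(normalized)


# Hypothetical extractor output with one near-duplicate.
sample_triplets = [
    ("survey", "enrolled", "participants"),
    ("Survey", "enrolled", "participants "),
    ("professionals", "collect", "data"),
]
total, distinct = count_triplets(sample_triplets)
print("Total triplets:", total, "Distinct triplets:", distinct)
```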