LAB1 - VIJAYAYERUVA/CS5560-KDM GitHub Wiki

CS5560 Knowledge Discovery Management

Lab Assignment #1

Lab Submission before September 10th, 2018 (11:59 PM)

Submit a short lab report including screenshots and source code & data to GitHub. Post your GitHub URL through the Lab1 form at https://goo.gl/forms/MP2n7LQGmm2atsZo2

Name: Vijaya Kumari Yeruva

Student Id: 11

Research topic: Drug Abuse/ Substance Abuse

Dataset:

Collected 10 unique PubMed abstracts, along with their titles, related to drug abuse. The data was retrieved with two search keywords, "Drug Abuse" and "Drug Addiction", using the provided code (Retrive_abstract).

Abstracts

Example abstract: The sexual behaviors of 15- to 24-year-olds increase the risk of this population to acquire sexually transmitted infections (STIs). The present study aimed to describe the sexual behavior in the transition to adulthood Brazilian population and its association with STI history. We analyzed cross-sectional data collected from 8562 sexually active women and men who participated in the National Survey of Human Papillomavirus Prevalence (POP-Brazil). This large-scale survey enrolled participants from 26 Brazilian capitals and the Federal District. Professionals from primary care facilities were trained to collect data utilizing a standardized questionnaire with questions on sociodemographic, sexual behavior, and drug use. We constructed a Poisson model with robust variance for both crude and adjusted analysis to investigate the associations between the variables. To adjust the distribution of the sample to the study population, we weighted the measures by the population size in each city and by gender. There were differences in several aspects from sexual behavior between genders. This is the first report regarding sexual behavior in a nationally representative population sample in Brazil. This study provides more valid estimates of sexual behavior and associated STIs, identifying important differences in sexual behavior and identifying predictors for referred STIs among females and males.

Statistics:

All

Code snippet to read the abstracts and their titles from the XML data:

Abstracts

Output:

output
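The snippet above is only available as a screenshot, so here is a minimal Python sketch of the same step. It assumes the XML follows the usual PubMed export layout (`PubmedArticle`, `ArticleTitle`, `AbstractText`); the sample document and the function name `read_titles_and_abstracts` are made up for illustration and are not the lab's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a PubMed XML export (values are made up).
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Sample title one</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Sample title two</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def read_titles_and_abstracts(xml_text):
    """Return a list of (title, abstract) pairs, one per PubmedArticle."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        title_el = article.find(".//ArticleTitle")
        # itertext() also collects text nested inside inline markup.
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        # Structured abstracts have several AbstractText sections; join them.
        parts = ["".join(el.itertext()).strip() for el in article.iter("AbstractText")]
        abstract = " ".join(p for p in parts if p)
        records.append((title, abstract))
    return records


for title, abstract in read_titles_and_abstracts(SAMPLE_XML):
    print(title, "|", abstract)
```

Articles without an abstract yield an empty string rather than being skipped, so the number of records always matches the number of articles retrieved.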

Code snippet to count the number of words in the given abstracts:

Abstracts

Output:

output
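As a rough illustration of the counting step, the sketch below tokenizes with a simple regular expression and tallies words with `collections.Counter`. The tokenization rule (alphabetic words, optionally joined by hyphens or apostrophes, lowercased, numbers dropped) is an assumption; the lab's own code may split words differently and therefore report different totals.

```python
import re
from collections import Counter

# Assumed tokenization rule: alphabetic words, allowing internal hyphens/apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def tokenize(text):
    """Lowercase the text and return its word tokens (numbers are dropped)."""
    return [w.lower() for w in WORD_RE.findall(text)]


def word_stats(abstracts):
    """Return (total word count, distinct word count, per-word Counter)."""
    counts = Counter()
    for text in abstracts:
        counts.update(tokenize(text))
    return sum(counts.values()), len(counts), counts


total, distinct, counts = word_stats(["Drug abuse and drug addiction.", "Drug use"])
print(total, distinct, counts.most_common(1))
```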

Code snippet to count the number of words in the given abstracts that are verified by WordNet:

Abstracts
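The idea of "verified by WordNet" can be sketched as filtering the word list through a lookup predicate. To keep the example runnable without extra downloads, it uses a small stand-in lexicon (`TOY_LEXICON`, hypothetical); the comment at the end shows how NLTK's WordNet interface could be plugged in instead, assuming NLTK and its WordNet corpus are installed. This is not the lab's own code.

```python
def count_verified(words, is_known):
    """Split words into those accepted by is_known and those rejected."""
    verified = [w for w in words if is_known(w)]
    rejected = [w for w in words if not is_known(w)]
    return verified, rejected


# Stand-in lexicon so the sketch runs without external data (hypothetical).
TOY_LEXICON = {"drug", "abuse", "study", "population"}


def in_toy_lexicon(word):
    return word.lower() in TOY_LEXICON


verified, rejected = count_verified(["Drug", "naloxone", "abuse"], in_toy_lexicon)
print(len(verified), "verified;", len(rejected), "not found")

# With NLTK and its WordNet corpus installed, the predicate could instead be:
#   from nltk.corpus import wordnet
#   is_known = lambda w: bool(wordnet.synsets(w))
```

Keeping the rejected words is useful here, because those are the candidates for a domain-specific lookup such as BioPortal.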

Code snippet to count the number of words in the given abstracts that are verified by the BioPortal API:

API Key:

Abstracts
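For orientation, a term lookup against BioPortal can be sketched as building a search URL and reading the `collection` array of the JSON response. The endpoint and the `q` / `apikey` parameters follow the public BioPortal REST search API as I understand it; treat them, the `prefLabel` field, and the helper names below as assumptions to check against the API documentation. The sketch makes no network call, so the API key is just a placeholder.

```python
import json
from urllib.parse import urlencode

# Assumed BioPortal REST search endpoint.
SEARCH_ENDPOINT = "http://data.bioontology.org/search"


def build_search_url(term, api_key):
    """Build the search URL for one term; the key is passed as a query parameter."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": term, "apikey": api_key})


def matched_labels(response_text):
    """Return the preferred labels of the matches in a search response body."""
    payload = json.loads(response_text)
    return [item.get("prefLabel", "") for item in payload.get("collection", [])]


print(build_search_url("drug abuse", "YOUR_API_KEY"))
# A term counts as verified when the response lists at least one match.
print(bool(matched_labels('{"collection": [{"prefLabel": "Drug abuse"}]}')))
```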

Adding labels to the output file whenever a term is found through the BioPortal API:

Abstracts

Output:

Abstracts
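One way to picture the labeling step: write each term on its own line and append a label only when the lookup returned one. The tab-separated layout, the `labels` dictionary, and the label values are assumptions made for this sketch; the actual output format is the one shown in the screenshot.

```python
import io


def write_labeled_terms(terms, labels, out):
    """Write one line per term; append the label only when one is known."""
    for term in terms:
        label = labels.get(term.lower())
        out.write(f"{term}\t{label}\n" if label else f"{term}\n")


# Hypothetical lookup results: term (lowercased) -> label returned by the lookup.
labels = {"cocaine": "Drug"}
buffer = io.StringIO()  # stands in for the output file
write_labeled_terms(["Cocaine", "the"], labels, buffer)
print(buffer.getvalue(), end="")
```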

Challenges Faced:

Most of the time, the BioPortal API call threw a ‘FileNotFoundException’, because most of the terms are not available in the existing API.
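An exception of this kind is consistent with the server answering HTTP 404 for a term it does not know, which Java's `HttpURLConnection` typically surfaces as a `FileNotFoundException`. That reading is an assumption, since the failing request is not shown. Under that assumption, a missing term can be treated as a normal "not found" result instead of an error. The Python sketch below shows the pattern with `urllib`; the `opener` parameter is a hypothetical hook so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def term_exists(url, opener=urllib.request.urlopen):
    """Return True if the lookup succeeds, False if the server answers 404."""
    try:
        with opener(url) as response:
            response.read()
        return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # term not in the ontology: expected, not fatal
        raise  # other HTTP errors (bad key, rate limit) still surface
```

Only 404 is swallowed, so an invalid API key or a server outage is still reported instead of being silently counted as "term not found".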

Statistics for individual abstracts:

Abstract1:

Abstracts

Abstract2:

Abstracts

Abstract3:

Abstracts

Abstract4:

Abstracts

Abstract5:

Abstracts

Abstract6:

Abstracts

Abstract7:

Abstracts

Abstract8:

Abstracts

Abstract9:

Abstracts

Abstract10:

Abstracts

Code snippet to count the number of nouns and verbs from POS tagging, and of PERSON and LOCATION entities from NER tagging:

Abstracts
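Once a tagger has produced POS and NER tags, the counting itself is straightforward. The sketch below assumes Penn Treebank POS tags (nouns start with `NN`, verbs with `VB`) and per-token NER tags, and it counts tokens rather than merged entity mentions. The tagged sample is hand-written for illustration and is not tagger output.

```python
from collections import Counter


def count_tags(tagged_tokens):
    """Count nouns, verbs, PERSON and LOCATION tokens in (word, pos, ner) triples."""
    counts = Counter()
    for _word, pos, ner in tagged_tokens:
        if pos.startswith("NN"):      # NN, NNS, NNP, NNPS
            counts["nouns"] += 1
        elif pos.startswith("VB"):    # VB, VBD, VBG, VBN, VBP, VBZ
            counts["verbs"] += 1
        if ner in ("PERSON", "LOCATION"):
            counts[ner] += 1
    return counts


# Hand-written sample; "O" marks tokens outside any named entity.
SAMPLE = [
    ("Maria", "NNP", "PERSON"),
    ("studied", "VBD", "O"),
    ("drug", "NN", "O"),
    ("use", "NN", "O"),
    ("in", "IN", "O"),
    ("Brazil", "NNP", "LOCATION"),
]

print(dict(count_tags(SAMPLE)))
```

Because the count is per token, a two-word name such as a first and last name would add two to the PERSON total.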

Code snippet to count the number of triplets:

Abstracts
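Assuming the triplets are (subject, relation, object) triples from an information-extraction step and are stored one per line with tab-separated fields, counting them could look like the sketch below. The file layout and the sample lines are assumptions for illustration only.

```python
def count_triplets(lines):
    """Return (total, distinct) counts of well-formed subject/relation/object lines."""
    triplets = []
    for line in lines:
        fields = [f.strip() for f in line.split("\t")]
        # Keep only lines with exactly three non-empty fields.
        if len(fields) == 3 and all(fields):
            triplets.append(tuple(fields))
    return len(triplets), len(set(triplets))


sample_lines = [
    "study\tprovides\testimates",
    "study\tprovides\testimates",   # duplicate extraction
    "not a triplet",
    "survey\tenrolled\tparticipants",
]
print(count_triplets(sample_lines))
```

Reporting both totals makes it visible when the extractor emits the same triplet several times for one sentence.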