ICP3 - VIJAYAYERUVA/CS5560-KDM GitHub Wiki

In Class Programming 3:

Deadline: 7th Sept 2018 (11:59 AM)

Dataset:

  1. Collected the abstracts using the Retrieve Abstracts code

  2. Removed the documents that had null values instead of abstracts

  3. The dataset consists of 5 abstracts from my topic (Drug Abuse)
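
The cleaning step in item 2 can be sketched as follows. This is a minimal illustration, not the actual Retrieve Abstracts code: the function name `drop_missing_abstracts` is hypothetical, and it assumes a missing abstract shows up as `None`, an empty string, or the literal text "null".

```python
from typing import List, Optional


def drop_missing_abstracts(records: List[Optional[str]]) -> List[str]:
    """Keep only the entries that contain real abstract text."""
    cleaned = []
    for text in records:
        # Assumption: a missing abstract is None, blank, or the string "null".
        if text is None:
            continue
        stripped = text.strip()
        if not stripped or stripped.lower() == "null":
            continue
        cleaned.append(stripped)
    return cleaned


raw = [
    "There may be substantial overlap in the risk factors for substance use.",
    None,
    "null",
    "   ",
]
abstracts = drop_missing_abstracts(raw)
print(len(abstracts), "abstract(s) kept")
```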

Example:

There may be substantial overlap in the risk factors for substance use and substance use disorders (SUD). Identifying risk factors for substance use initiation is essential for understanding the etiology and natural history of SUD and to develop empirically-based preventive interventions to reduce initiation.


OpenIE:

  1. Passed the 5 abstracts from the dataset to the OpenIE API

  2. OpenIE extracts meaningful (subject, relation, object) triples from each sentence, along with a score for each triple

  3. Most of the triples are appropriate; however, a few of them are meaningless.

Example:

[(disorders,overlap in,risk factors for substance use,0.4489748241314442), (overlap,use,disorders,1.0), (overlap,use,SUD,1.0), (disorders,overlap in,risk factors,0.4489748241314442)]
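
The output above can be loaded into structured records for inspection. The sketch below is only an illustration: the `Triple` type and both helper functions are hypothetical names, and the parser assumes the subject and relation contain no commas. It does not call the OpenIE API itself.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str
    score: float


def parse_triples(line: str) -> List[Triple]:
    """Parse '(subject,relation,object,score)' groups from one output line."""
    triples = []
    for body in re.findall(r"\(([^()]*)\)", line):
        # The score is the last comma-separated field.
        rest, score = body.rsplit(",", 1)
        # Assumption: subject and relation contain no commas.
        subject, relation, obj = rest.split(",", 2)
        triples.append(
            Triple(subject.strip(), relation.strip(), obj.strip(), float(score))
        )
    return triples


def keep_confident(triples: List[Triple], threshold: float) -> List[Triple]:
    """Keep the triples whose score is at least the threshold."""
    return [t for t in triples if t.score >= threshold]


line = (
    "[(disorders,overlap in,risk factors for substance use,0.4489748241314442), "
    "(overlap,use,disorders,1.0), (overlap,use,SUD,1.0), "
    "(disorders,overlap in,risk factors,0.4489748241314442)]"
)
triples = parse_triples(line)
for t in keep_confident(triples, threshold=0.5):
    print(t)
```

A score threshold is only a heuristic. In this example the two triples scored 1.0 are not obviously more meaningful than the ones scored about 0.45, so the results still need a manual review.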


WordNet Synonyms:

  1. WordNet is a large lexical database of English.

  2. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

  3. Synsets are interlinked by means of conceptual-semantic and lexical relations.

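  4. Most of WordNet's relations connect words from the same part of speech (POS).

  5. Most of the synonyms are appropriate (example: risk); however, a few of them are inappropriate (example: may, which is used as a modal verb in the abstract but returns month names and Crataegus (hawthorn) species).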

Example:

may:Apr,April,Aug,August,Crataegus aestivalis,Crataegus apiifolia,Crataegus biltmoreana,Crataegus calpodendron,Crataegus coccinea,Crataegus coccinea mollis

risk:campaign,cause,chance,conditional probability,contingent probability,crapshoot,cross section,crusade,danger,drive

substance:acculturation,acknowledgement,acknowledgment,activator,adulterant,adulterator,agent,allergen,anomalous communication,antigen
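
The "may" result suggests the lookup matched the word's noun senses regardless of how it is used in the sentence. One possible remedy is to restrict the lookup to the word's part of speech in context. The sketch below illustrates the idea with a toy table. `TOY_LEXICON` and `related_words` are hypothetical stand-ins, not WordNet's API, and the entries only reuse a few words from the example output above.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a WordNet lookup, keyed by (word, part of speech).
# The entries reuse words from the example output; this is not real WordNet access.
TOY_LEXICON: Dict[Tuple[str, str], List[str]] = {
    ("risk", "noun"): ["chance", "danger"],
    ("may", "noun"): ["April", "August"],
}


def related_words(word: str, pos: str) -> List[str]:
    """Return related words only for the requested part of speech."""
    return TOY_LEXICON.get((word.lower(), pos), [])


# "risk" is used as a noun in the abstract, so the noun lookup applies.
print(related_words("risk", "noun"))

# "may" is used as a verb in the abstract, so the noun entries are skipped.
print(related_words("may", "verb"))
```

With this restriction, a word whose only entries belong to a different part of speech returns an empty list instead of unrelated terms.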
