Miniproject: Invasive species - petermr/CEVOpen GitHub Wiki
- Kanishka Parashar
- The dictionary contains information about aggressive invasive plants present worldwide.
- It covers 469 invasive plant species.
- For each species the dictionary records: term, name, taxon_name, taxon_common_name, synonyms, wikidataID, wikidataURL, Wikipedia page, image, and a map of where the species is present.
Link to dictionary: https://github.com/petermr/CEVOpen/blob/master/dictionary/Invasive_species/invasive_plant.xml
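
Here is a minimal Python sketch for reading the dictionary. It assumes the usual ami dictionary layout: a `<dictionary>` root with one `<entry>` element per species, attributes such as `term`, `name` and `wikidataID`, and child `<synonym>` elements. The two-entry sample is made up for illustration and is not taken from invasive_plant.xml. The function name `entries_from_root` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the ASSUMED ami dictionary layout (not copied from invasive_plant.xml).
SAMPLE = """<dictionary title="invasive_plant">
  <entry term="Lantana camara" name="Lantana camara">
    <synonym>lantana</synonym>
  </entry>
  <entry term="Mikania micrantha" name="Mikania micrantha">
    <synonym>mile-a-minute</synonym>
    <synonym>bitter vine</synonym>
  </entry>
</dictionary>"""


def entries_from_root(root):
    """Collect term, name, Wikidata ID and synonyms for every <entry> element.

    The attribute and element names are assumptions about the dictionary format.
    """
    entries = []
    for entry in root.iter("entry"):
        entries.append({
            "term": entry.get("term"),
            "name": entry.get("name"),
            "wikidataID": entry.get("wikidataID"),  # None when the attribute is missing
            "synonyms": [s.text.strip() for s in entry.findall("synonym") if s.text],
        })
    return entries


if __name__ == "__main__":
    entries = entries_from_root(ET.fromstring(SAMPLE))
    n_synonyms = sum(len(e["synonyms"]) for e in entries)
    print(f"{len(entries)} entries, {n_synonyms} synonyms")
    for e in entries:
        print(e["term"], "->", e["synonyms"])
```

To read the real file, pass `ET.parse("invasive_plant.xml").getroot()` to the same function. Check the attribute names against the file first.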
- Development of the dictionary plant_invasive as a tool for searching and annotating scientific articles.
- Testing getpapers, a web scraper for open-access scientific literature, and using it to create the corpus "invasive".
- Running a dictionary-based search within the created corpus and drawing relationships between plants and countries.
- Creation of a SPARQL query using a VALUES clause. Link: Invasive Species
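
The SPARQL query itself is only linked above. If it used SPARQL's `VALUES` clause to restrict a Wikidata query to the dictionary's own items, a query of that kind could be generated from a list of Wikidata IDs as sketched below. The selected property (P225, taxon name), the label-service line and the function name `build_values_query` are illustrative choices, not taken from the original query.

```python
import re


def build_values_query(qids, lang="en"):
    """Build a Wikidata SPARQL query limited to the given items with a VALUES clause.

    Hypothetical helper; the selected variables are an illustrative choice.
    """
    for qid in qids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    # P225 is Wikidata's "taxon name" property; OPTIONAL keeps items that lack it.
    return f"""SELECT ?item ?itemLabel ?taxonName WHERE {{
  VALUES ?item {{ {values} }}
  OPTIONAL {{ ?item wdt:P225 ?taxonName . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}"""


if __name__ == "__main__":
    # Placeholder IDs: replace with the wikidataID values from the dictionary.
    print(build_values_query(["Q123", "Q456"]))
```

You can paste the printed query into the Wikidata Query Service (https://query.wikidata.org). `VALUES` binds `?item` to exactly the listed IDs, so the rest of the query only ever sees the dictionary's species.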
- getpapers is a simple, powerful tool for querying repositories of scholarly articles with a one-line command.
- It downloads the freely available full-text research papers that match the query, in XML and/or PDF format, to your local machine.
- Query code:
`getpapers -q "(invasive plant species)" -k 100 -x -o invasive -f invasive/log.txt`
- `getpapers` starts the download, and `-q` gives the query to search for, enclosed in double quotes. Several terms can be combined with Boolean operators, as in "(invasive plant species) AND (essential oil)". `-o` sets the output directory, here `invasive`. `-x` downloads the full-text XML files (adding `-p` would also download PDFs). `-k 100` limits the search to 100 papers, and `-f invasive/log.txt` saves the run log to a file as well as printing it to the terminal.
- getpapers was used to create the corpus "invasive" of papers on invasive plant species.
General code syntax: `getpapers -q "<query>" -o <output directory> -x -p -k <number of papers required>` (`-x` for XML, `-p` for PDF)
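
When the download is scripted (for example, to rerun it with different queries), the flags above can be assembled in Python. The helper below is hypothetical: `build_getpapers_cmd` is not part of getpapers. It covers only the `-q`, `-o`, `-x`, `-p`, `-k` and `-f` options described here. If you enable the commented-out `subprocess.run` line, it assumes getpapers is installed and on the `PATH`.

```python
import shlex


def build_getpapers_cmd(query, outdir, limit=None, xml=True, pdf=False, logfile=None):
    """Assemble a getpapers command line as an argument list (hypothetical helper)."""
    cmd = ["getpapers", "-q", query, "-o", outdir]
    if xml:
        cmd.append("-x")           # download full-text XML
    if pdf:
        cmd.append("-p")           # download full-text PDF
    if limit is not None:
        cmd += ["-k", str(limit)]  # maximum number of papers
    if logfile:
        cmd += ["-f", logfile]     # also write the log to this file
    return cmd


if __name__ == "__main__":
    cmd = build_getpapers_cmd("(invasive plant species)", "invasive",
                              limit=100, logfile="invasive/log.txt")
    print(shlex.join(cmd))
    # getpapers -q '(invasive plant species)' -o invasive -x -k 100 -f invasive/log.txt
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to download
```

Passing a list rather than a single string to `subprocess.run` avoids shell-quoting problems with the parentheses and spaces in the query.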
- ami is a framework for gathering, searching and transforming scholarly publications, oriented towards STEM (Science, Technology, Engineering, Medicine, Mathematics).
`ami section` splits each research paper into its front, body, back, floats and groups. Sectioning the downloaded files creates a tree structure that makes the content of each paper easier to explore. The command is run from the command prompt.
General code syntax: `ami -p <cproject> section`
Query code:
`ami -p Invasive_species section`
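
To explore the tree that `ami section` produces, a short Python walk over the project directory is enough. This sketch assumes that each downloaded paper has its own folder inside the project (`Invasive_species` here). It also assumes that ami writes a `sections/` subfolder in each paper folder, whose immediate subfolders are the sections. Those folder names should be checked against the actual output. `section_tree` is a hypothetical name.

```python
from pathlib import Path


def section_tree(project_dir):
    """Map each paper folder to the names of its top-level section folders.

    ASSUMPTION: ami section writes a 'sections' folder inside every paper (CTree) folder.
    """
    tree = {}
    for ctree in sorted(Path(project_dir).iterdir()):
        sections = ctree / "sections"
        if ctree.is_dir() and sections.is_dir():
            tree[ctree.name] = sorted(p.name for p in sections.iterdir() if p.is_dir())
    return tree


if __name__ == "__main__":
    project = Path("Invasive_species")  # the CProject used in the command above
    if project.is_dir():
        for paper, parts in section_tree(project).items():
            print(paper, parts)
    else:
        print(f"{project} not found; run getpapers and ami section first")
```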
`ami search` searches the project repository for the dictionary terms and reports the frequency of each term, together with a histogram for the corpus.
General code syntax: `ami -p <cproject directory> search --dictionary <path>`
Query code:
`ami -p Invasive_species search --dictionary invasive_plant`
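
The idea behind the frequency table and histogram can be illustrated with a plain-regex approximation of a dictionary search. Every term and its synonyms are matched case-insensitively on word boundaries, and each hit is credited to the main term. This is not how ami implements `search`, and its matching rules may differ (for example, in tokenisation). The function name `count_terms` and the sample sentences are made up for illustration.

```python
import re
from collections import Counter


def count_terms(texts, dictionary):
    """Count hits per dictionary term; synonyms are credited to their main term.

    dictionary maps each term to a list of synonyms (an assumed, simplified format).
    """
    patterns = {}
    for term, synonyms in dictionary.items():
        # Try longer variants first so "Lantana camara" is matched before "Lantana".
        variants = sorted([term, *synonyms], key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in variants)
        patterns[term] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    counts = Counter()
    for text in texts:
        for term, pattern in patterns.items():
            counts[term] += len(pattern.findall(text))
    return counts


if __name__ == "__main__":
    # Made-up sentences standing in for paper text.
    texts = ["Lantana camara is spreading. Lantana is toxic to cattle.",
             "Mikania micrantha (mile-a-minute) grows beside lantana camara."]
    dictionary = {"Lantana camara": ["Lantana"], "Mikania micrantha": ["mile-a-minute"]}
    for term, n in count_terms(texts, dictionary).most_common():
        print(f"{term:<20} {'#' * n} {n}")  # crude text histogram
```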
search_lib downloads PAPERS into PROJECTS, finds SECTIONS, indexes them with DICTIONARIES and/or PATTERNS into a searchable KNOWLEDGEBASE, and analyses the result for new INSIGHTS.
General code syntax: `python search_lib.py --dict --sect --proj`
Link for search_lib result (Tester 3- Kanishka Parashar): https://github.com/petermr/openDiagram/wiki/Test-Report-for-Search_lib
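
The step "index into a searchable KNOWLEDGEBASE" can be pictured as an inverted index from dictionary terms to the paper sections that mention them. The sketch below only illustrates that idea and is not search_lib's code. The input shape (a mapping from paper ID to {section name: text}), the function names `build_index` and `papers_with`, and the sample data are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(papers, dictionary):
    """Map each dictionary term to the (paper_id, section) pairs that mention it.

    Hypothetical input shapes: papers is {paper_id: {section_name: text}},
    dictionary is {term: [synonyms]}.
    """
    index = defaultdict(set)
    for term, synonyms in dictionary.items():
        alternation = "|".join(re.escape(v) for v in [term, *synonyms])
        pattern = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
        for paper_id, sections in papers.items():
            for section, text in sections.items():
                if pattern.search(text):
                    index[term].add((paper_id, section))
    return dict(index)


def papers_with(index, *terms):
    """Return the IDs of papers that mention every one of the given terms."""
    per_term = [{paper_id for paper_id, _ in index.get(t, ())} for t in terms]
    return set.intersection(*per_term) if per_term else set()


if __name__ == "__main__":
    # Hypothetical sectioned papers.
    papers = {
        "PMC_A": {"body": "Lantana camara invades pastures.", "back": "Funding details."},
        "PMC_B": {"body": "Mikania and lantana were both recorded."},
    }
    dictionary = {"Lantana camara": ["lantana"], "Mikania micrantha": ["Mikania"]}
    index = build_index(papers, dictionary)
    print(sorted(index["Lantana camara"]))
    print(papers_with(index, "Lantana camara", "Mikania micrantha"))
```

Keeping the section name in each index entry lets a later query ask, for instance, whether a species is mentioned in the body of a paper or only in its references.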
- getpapers collected the freely available papers from Europe PMC once the command was executed.
Figure: Showing output of getpapers.
- Result of ami section: the papers in the project directory are split into sections.
Figure: Output of `ami section`.
- The results of ami search are given as a table, a histogram, and a results folder for each paper.
Figure: Output of ami search as a table of term frequencies.
Figure: Plot of .SVG file.
State of Dictionary | content |
---|---|
entry | 831 |
term | 470 |
synonym | 1340 |
wikidataID | 463 |
Map | 19 |
wikipediaPage | 401 |
taxon_name | 916 |
taxon_common_name | 460 |
image | 639 |
map | 19 |
language (non-English) | Chinese, Portuguese, Swedish, Cebuano, Spanish, German, Urdu, Bohemian |
Name | 1383 |
wikidataURL | 460 |
Searching with dictionary | content | link |
---|---|---|
test corpus (corpora used) | 100 papers | https://github.com/petermr/CEVOpen/tree/master/minicorpora/invasive |
example of search commands: | `ami -p invasive section`, `ami -p invasive search --dictionary invasive_plant.xml` | |
brief tutorial on how to search with your dictionary | https://github.com/petermr/CEVOpen/wiki/Miniproject:-Invasive-species | |
use of dictionary | demonstration that other colleagues can use, and have used, your dictionary: Radhu. | |
analysis of outputs | The United States is the most heavily invaded, followed by Europe and China. | |
analysis of result | https://github.com/petermr/CEVOpen/wiki/Mini-Project:-%E2%80%9CSemantic-analysis-of-the-literature-on-Plant-Invasive-Species%E2%80%9D |
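
One simple way to draw plant-country relationships from the search results is to count how often each plant and each country are found in the same paper. The sketch below assumes that the per-paper hits for the plant dictionary and for a country list have already been extracted. The paper IDs and hits in the example are made up, and `cooccurrence` is a hypothetical helper, not an ami or search_lib function. Counting papers per country gives a ranking of the kind summarised in the analysis row above.

```python
from collections import Counter
from itertools import product


def cooccurrence(papers):
    """Count plant-country pairs found in the same paper, and papers per country.

    Hypothetical input: {paper_id: {"plants": iterable, "countries": iterable}}.
    """
    pairs = Counter()
    papers_per_country = Counter()
    for hits in papers.values():
        plants = set(hits.get("plants", ()))
        countries = set(hits.get("countries", ()))
        papers_per_country.update(countries)  # each country counted once per paper
        pairs.update(product(sorted(plants), sorted(countries)))
    return pairs, papers_per_country


if __name__ == "__main__":
    # Made-up hits for illustration only.
    papers = {
        "PMC_A": {"plants": ["Lantana camara"], "countries": ["India", "Australia"]},
        "PMC_B": {"plants": ["Lantana camara", "Mikania micrantha"], "countries": ["India"]},
    }
    pairs, per_country = cooccurrence(papers)
    for (plant, country), n in pairs.most_common():
        print(f"{plant} - {country}: {n}")
    print(per_country.most_common())
```

Co-occurrence in the same paper is a loose signal: a country may be mentioned for reasons unrelated to invasion. Restricting the count to particular sections, such as the results, would tighten it.
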
state of report | content |
---|---|
methods section | https://github.com/petermr/CEVOpen/wiki/Mini-Project:-%E2%80%9CSemantic-analysis-of-the-literature-on-Plant-Invasive-Species%E2%80%9D#chapter-5- methodology |
results section | https://github.com/petermr/CEVOpen/wiki/Mini-Project:-%E2%80%9CSemantic-analysis-of-the-literature-on-Plant-Invasive-Species%E2%80%9D#chapter-6- result and discussion |