Miniproject: Invasive species - petermr/CEVOpen GitHub Wiki

InvasiveSpecies

Owner and collaborators

  • Kanishka Parashar

Overview

  1. The dictionary contains information about aggressive invasive plants present worldwide.
  2. It consists of 469 invasive plant species.
  3. The dictionary contains: terms, name, taxon_name, taxon_common_name, synonyms, wikidataIDs, wikidataURL, Wikipedia page, image, and a map of where each invasive species is present.

Link to dictionary: https://github.com/petermr/CEVOpen/blob/master/dictionary/Invasive_species/invasive_plant.xml
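
As a rough illustration of how a dictionary of this kind can be inspected, the sketch below parses a small inline XML sample with Python's standard library and counts its entries and synonyms. The element and attribute names (dictionary, entry, term, name, wikidataID, synonym), the two sample entries, and the placeholder IDs are assumptions made for illustration; check them against invasive_plant.xml before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the real invasive_plant.xml may use a different layout,
# and the wikidataID values here are placeholders, not real identifiers.
SAMPLE_DICTIONARY = """
<dictionary title="invasive_plant">
  <entry term="Lantana camara" name="Lantana camara" wikidataID="Q_PLACEHOLDER_1">
    <synonym>lantana</synonym>
  </entry>
  <entry term="Pueraria montana" name="Pueraria montana" wikidataID="Q_PLACEHOLDER_2">
    <synonym>kudzu</synonym>
    <synonym>Japanese arrowroot</synonym>
  </entry>
</dictionary>
"""


def summarise(xml_text):
    """Return the title, entry count, synonym count and terms of a dictionary."""
    root = ET.fromstring(xml_text)
    entries = root.findall("entry")
    return {
        "title": root.get("title"),
        "entries": len(entries),
        "synonyms": sum(len(e.findall("synonym")) for e in entries),
        "terms": [e.get("term") for e in entries],
    }


summary = summarise(SAMPLE_DICTIONARY)
print(summary)
```

To try this on the real file, read its text from disk and pass it to summarise in place of the sample string.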

Objectives:

  • Development of the dictionary plant_invasive to serve as a tool for searching and annotating scientific articles.
  • Testing of getpapers, a web scraper for open-access scientific literature, and using it to create the corpus "invasive".
  • Running a dictionary-based search within the created corpus and drawing relationships between plants and countries.

Methods and Methodology:

Getpapers:

  • getpapers is a simple, powerful tool for querying repositories of scholarly articles with a single one-line command.
  • It downloads the freely available full text of the matching research papers, in XML format, to your local machine.

Retrieving papers from Europe PMC (EUPMC) using getpapers

  • Query code: getpapers -q "(invasive plant species)" -k 100 -x -o invasive -f invasive/log.txt
  • The command getpapers starts the process, and -q introduces the query to be searched. The query is entered in double quotes, as in "(invasive plant species) AND (essential oil)". The -o option sets the output directory, and the parameter that follows it is the name of that directory, which is invasive in our case. The -x and -p flags request XML and PDF files respectively (only -x is used in the query above), -k 100 limits the search to 100 papers only, and -f names the file in which the log is saved.
  • getpapers was used to create the corpus "invasive" of invasive plant species.

General code syntax: getpapers -q "<query>" -o <output directory> -x -p -k <number of papers required>
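
When the same query has to be repeated with different limits or output directories, it can help to assemble the command from parameters. The sketch below does only that: it builds the argument list from the flags described above (-q, -o, -x, -p, -k, -f) and prints it, without running getpapers. The function name build_getpapers_command and its keyword arguments are hypothetical helpers, not part of getpapers.

```python
import shlex


def build_getpapers_command(query, outdir, limit=None, xml=True, pdf=False, logfile=None):
    """Assemble a getpapers command line as a list of arguments (nothing is run)."""
    args = ["getpapers", "-q", query, "-o", outdir]
    if xml:
        args.append("-x")  # request full-text XML
    if pdf:
        args.append("-p")  # request PDF files
    if limit is not None:
        args += ["-k", str(limit)]  # cap the number of papers
    if logfile:
        args += ["-f", logfile]  # save the log to this file
    return args


command = build_getpapers_command(
    "(invasive plant species)", "invasive", limit=100, logfile="invasive/log.txt"
)
# shlex.join quotes the query so the line can be pasted into a shell.
print(shlex.join(command))
```

On a machine where getpapers is installed, the same list could be handed to subprocess.run(command, check=True) to start the download.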

ami:

  • ami is a framework for gathering, searching, and transforming scholarly publications, oriented towards STEM (Science, Technology, Engineering, Medicine, Mathematics).

ami section:

ami section splits each research paper into front, body, back, floats and groups. Sectioning the downloaded files creates a tree structure that makes the content of each file easier to explore. Sectioning is done with the section command of ami, which runs from the command prompt.

General code syntax: ami -p <cproject> section

Query code:

            ami -p Invasive_species section
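
To make the idea of sectioning concrete, the following sketch splits a tiny article into its top-level parts and collects the plain text of each. It is not how ami implements sectioning; it is a minimal stand-in that assumes JATS-style tags (front, body, back, floats-group), and the sample article is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical article; real full-text XML is far larger and richer.
SAMPLE_ARTICLE = """
<article>
  <front><article-meta><title-group><article-title>Example title</article-title></title-group></article-meta></front>
  <body><sec><title>Introduction</title><p>Some text.</p></sec></body>
  <back><ref-list><ref id="r1"/></ref-list></back>
  <floats-group><fig id="f1"/></floats-group>
</article>
"""


def split_top_level_sections(xml_text):
    """Map each top-level child tag of the article to its plain text content."""
    root = ET.fromstring(xml_text)
    sections = {}
    for child in root:
        # Join all text fragments, then collapse runs of whitespace.
        sections[child.tag] = " ".join(" ".join(child.itertext()).split())
    return sections


sections = split_top_level_sections(SAMPLE_ARTICLE)
for tag, text in sections.items():
    print(f"{tag}: {text!r}")
```

The dictionary keys mirror the top of the tree that sectioning produces; a fuller version would recurse into each part instead of flattening it to text.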

ami search:

ami search searches for and analyses the dictionary terms in your project repository, and reports the frequency of each term together with a histogram for the corpus.

General code syntax: ami -p <cproject directory> search --dictionary <path>

Query code:

            ami -p Invasive_species search --dictionary invasive_plant
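
The kind of frequency table that a dictionary-based search reports can be approximated in a few lines. The sketch below counts case-insensitive, whole-phrase matches of each term in a piece of text; it is a simplified stand-in and not ami's actual search algorithm, and the terms and text are hypothetical.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count case-insensitive whole-phrase occurrences of each dictionary term."""
    counts = Counter()
    for term in terms:
        # Word boundaries stop a term from matching inside a longer word.
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if hits:
            counts[term] = hits
    return counts


# Hypothetical terms and text, used only to exercise the function.
terms = ["Lantana camara", "kudzu", "water hyacinth"]
text = (
    "Lantana camara spreads quickly. Kudzu and lantana camara are both "
    "invasive; kudzu smothers trees."
)

frequencies = count_terms(text, terms)
for term, n in frequencies.most_common():
    print(f"{term}\t{n}")
```

Terms with no matches are left out of the result, so the printed table lists only the plants that actually occur in the text.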

Search_lib:

search_lib downloads PAPERS into PROJECTS, finds SECTIONS, indexes them with DICTIONARIES and/or PATTERNS into a searchable KNOWLEDGEBASE, and analyses it for new INSIGHTS.

General code syntax: python search_lib.py --dict --sect --proj

Link for search_lib result (Tester 3- Kanishka Parashar): https://github.com/petermr/openDiagram/wiki/Test-Report-for-Search_lib

Result and Discussion

Result of getpapers.

  • Once the command was executed, getpapers collected the freely available papers from Europe PMC.

Figure: Output of getpapers.

Result of AMI section

  • ami section splits the papers in the project directory into sections.

Figure: Output of ami section.

Result of AMI search

  • The results are given as a table and a histogram, and as result files within each paper's folder.

Figure: Output of ami search as a table of term frequencies.

Figure: Plot of the SVG file.

Summary

State of dictionary

  • entry: 831
  • term: 470
  • synonym: 1340
  • wikidataID: 463
  • Map: 19
  • wikipediaPage: 401
  • taxon_name: 916
  • taxon_common_name: 460
  • image: 639
  • map: 19
  • language (non-English): Chinese, Portuguese, Swedish, Cebuano, Spanish, German, Urdu, Bohemian
  • Name: 1383
  • wikidataURL: 460

Searching with dictionary

  • test corpus (corpora used): 100 papers, https://github.com/petermr/CEVOpen/tree/master/minicorpora/invasive
  • example of search commands: ami -p invasive section, ami -p invasive search --dictionary invasive_plant.xml
  • brief tutorial on how to search with your dictionary: https://github.com/petermr/CEVOpen/wiki/Miniproject:-Invasive-species
  • use of dictionary (demonstration that other colleagues can and have used your dictionary): Radhu
  • analysis of outputs: The United States has seen the highest level of invasion, followed by Europe and China.
  • analysis of result: https://github.com/petermr/CEVOpen/wiki/Mini-Project:-%E2%80%9CSemantic-analysis-of-the-literature-on-Plant-Invasive-Species%E2%80%9D

State of report

  • methods section: https://github.com/petermr/CEVOpen/wiki/Mini-Project:-%E2%80%9CSemantic-analysis-of-the-literature-on-Plant-Invasive-Species%E2%80%9D#chapter-5- methodology
  • results section: https://github.com/petermr/CEVOpen/wiki/Mini-Project:-%E2%80%9CSemantic-analysis-of-the-literature-on-Plant-Invasive-Species%E2%80%9D#chapter-6- result and discussion