Scoping through TPS Corpus - petermr/CEVOpen GitHub Wiki

Scoping through TPS Corpus:

Date 2/8/2021

I queried https://europepmc.org/ with the following searches and got these results:

| Query | Number of hits |
| --- | --- |
| terpene synthase | 4308 |
| terpene synthase plant | 3447 |
| terpene synthase plant volatile | 1200 |
| terpene synthase plant TPS | 650 |
| terpene synthase TPS plant volatile | 376 |
| terpene synthase TPS plant volatile compounds | 355 (312 research articles; only 188 mention both TPS and compounds) |
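As a rough illustration of how much each added search term narrows the result set, the hit counts above can be turned into percentages. This is only a sketch; the helper name `share` is made up for this example, and the numbers are the ones recorded in this log.

```python
# Hit counts copied from the Europe PMC searches recorded above (2/8/2021).
hits = [
    ("terpene synthase", 4308),
    ("terpene synthase plant", 3447),
    ("terpene synthase plant volatile", 1200),
    ("terpene synthase plant TPS", 650),
    ("terpene synthase TPS plant volatile", 376),
    ("terpene synthase TPS plant volatile compounds", 355),
]


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


broadest = hits[0][1]
for query, count in hits:
    print(f"{query}: {count} hits ({share(count, broadest)}% of the broadest query)")

# 188 of the 312 research articles mention both TPS and compounds.
print(f"Both terms: {share(188, 312)}% of the research articles")
```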

I continued work on the TPS corpus on 3/8/2021, 4/8/2021 and 5/8/2021.

For the 312 papers, I checked the availability of PMCID, plant, compound and TPS nomenclature.
Date 5/8/2021

Please find the TPS corpus (312 papers).
Date 6/8/2021 I continued improving the scoping through the TPS corpus.
Date 9/8/2021

Please find the improved TPS corpus (91 papers).
Date 10/8/2021 and 11/8/2021

Continued improving the scoping through the TPS corpus and the INYAS presentation slides.

Out of 312 papers, only 188 papers mention both TPS and volatile compounds.
Date 16/8/2021

Arabidopsis

Camellia sinensis

Cinnamomum

Citrus

Lavandula

Nicotiana

Solanum

Vitis vinifera

INYAS Interns:

TPS genes for different species.

Camellia

Oryza

Malus

Citrus

Mentha

Vitis

Zea

Develop corpus "terpene synthase oryza"

Extract terms from papers.

Create dictionary and test.

Prenyltransferases from medicinal plants

Classify those TPS for each subspecies

Check whether AtTPS1 is related to OsTPS1 or something similar in the Oryza corpus.
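One possible first step for the AtTPS1/OsTPS1 check is to pull TPS gene names out of text and group them by number. The sketch below assumes names follow a two-letter species prefix + `TPS` + number pattern (as in AtTPS1 and OsTPS1); real papers may use other conventions, and the function name and sample sentence are invented for illustration. Sharing a number does not by itself show that two genes are related; it only flags candidates to look at.

```python
import re

# Assumed naming pattern: two-letter species prefix, "TPS", then a number.
TPS_NAME = re.compile(r"\b([A-Z][a-z])TPS(\d+)\b")


def group_by_number(text):
    """Map each TPS number to the sorted species prefixes that carry it."""
    groups = {}
    for prefix, number in TPS_NAME.findall(text):
        groups.setdefault(int(number), set()).add(prefix)
    return {number: sorted(prefixes) for number, prefixes in groups.items()}


# Made-up sentence, used only to exercise the pattern.
sample = "AtTPS1 and OsTPS1 were compared; OsTPS3 was also mentioned."
print(group_by_number(sample))
```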
Date 18/8/2021

Please find the TPS volatile corpus (121 papers).
Date 19/8/2021 Created a template for the 5 KARYA projects: https://github.com/petermr/CEVOpen/wiki/crop5
Date 23/8/2021 I created the testtps dictionary.

testtps

full data table testtps
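For readers who have not seen a term dictionary before, here is a minimal sketch of building one as XML. The layout (a `dictionary` element with a `title` and one `entry` per term) is an assumption made for this example and may not match the format ami3 actually expects; the two enzyme names are placeholders, not the contents of testtps.

```python
import xml.etree.ElementTree as ET


def build_dictionary(title, terms):
    """Build a <dictionary> element with one <entry> per unique term."""
    root = ET.Element("dictionary", {"title": title})
    for term in sorted(set(terms)):
        # Assumed attributes: "term" for matching, "name" for display.
        ET.SubElement(root, "entry", {"term": term, "name": term})
    return root


# Placeholder terms; duplicates are collapsed by build_dictionary.
root = build_dictionary(
    "testtps",
    ["limonene synthase", "linalool synthase", "limonene synthase"],
)
print(ET.tostring(root, encoding="unicode"))
```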
Date 24/8/2021, 25/8/2021 and 26/8/2021 Extracting volatile compounds from the 121 papers (point 9).

volatiles from 121 corpus
Date 27/8/2021, 30/8/2021
Date 31/8/2021, 1/9/2021 Helped KARYA interns install pygetpapers and ami3. 2/9/2021 Meeting. 3/9/2021 Helped the NIPGR intern with the same.
Date 6/9/2021 Installing software for https://github.com/petermr/crops/tree/main/metadata_analysis and setting up a virtual environment.

`python -m venv project_env` (creates the environment)

`project_env\Scripts\activate.bat` (activates the environment)

(Warning: This Python interpreter is in a conda environment, but the environment has not been activated. Libraries may fail to load. To activate this environment run above command)

Installed Anaconda and then ran `C:\Users\user\anaconda3\Scripts\activate base`

`pip install scispacy`

copy requirements.txt into sagar jadhav

Use conda to install and manage different versions of Python

`conda create --name project_env python=3.6.0`

`conda activate project_env`
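Since the warning above came from an environment that was not activated, it can help to confirm which interpreter and environment are in use before installing anything. The sketch below uses only the standard library; `describe_interpreter` is a name invented for this example, and `CONDA_DEFAULT_ENV` is the environment variable that `conda activate` normally sets.

```python
import os
import sys


def describe_interpreter():
    """Report which Python is running and which conda env, if any, is active."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        "prefix": sys.prefix,
        # None here usually means no conda environment has been activated.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If `conda_env` prints `None` or `base` when `project_env` was expected, the activation step probably did not take effect in that terminal.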
Installed Python 3.6 and PyCharm. The metadata analysis script runs, but ami3 is not installed on my Mac.
Installed ami3 on my Mac and set the path.
Finding species that are highly represented in the literature: `pygetpapers -q "terpene synthase TPS plant" -o TPS -p -k 650`
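If the same download needs repeating for other queries, the command can be assembled in Python instead of retyped. This is a sketch only: the flags `-q`, `-o`, `-p` and `-k` are copied from the command in this log rather than checked against the pygetpapers documentation, and `pygetpapers_command` is a made-up helper name.

```python
import shlex


def pygetpapers_command(query, output_dir, limit):
    """Build the argument list for the pygetpapers call recorded in this log."""
    return ["pygetpapers", "-q", query, "-o", output_dir, "-p", "-k", str(limit)]


cmd = pygetpapers_command("terpene synthase TPS plant", "TPS", 650)
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in cmd))

# To run it directly (requires pygetpapers on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems with multi-word queries.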

To run the metadata analysis script by Shweata, I followed this protocol:

Create a folder, open the folder in PyCharm and run the commands, or click on **add interpreter**, then click on conda environment, select Python 3.6 and select the conda path. Run the commands:

`conda create --name project_env python=3.6.0`

`conda activate project_env`

Ran the metadata analysis script. The first run downloaded the papers (`pygetpapers -q "terpene synthase TPS plant" -o TPS -p -k 620`) and then reported that lxml was not installed, so I ran `pip install lxml`. I then commented out the download step to avoid downloading the papers again. Instead of Citrus, I added TPS. I also uncommented lines 164, 165 and 166.
Please find the TPS metadata analysis output: TPS metadata analysis
Extracting "TPS-containing sentences": I used . (dot) as the separator in line 144 of Shweata's script [`words = text.split(".")`] and also removed line 175.
Please find the TPS sentences extraction: TPS Sentences Extraction
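The sentence extraction step can be sketched as follows. This is not Shweata's script; it only illustrates the `text.split(".")` idea mentioned above, with a made-up function name and a made-up sample text.

```python
def sentences_containing(text, keyword="TPS"):
    """Split text on full stops and keep the pieces that contain keyword."""
    pieces = [piece.strip() for piece in text.split(".")]
    return [piece for piece in pieces if keyword in piece]


# Invented sample text, used only to show the behaviour.
sample = (
    "Terpene synthases produce volatile compounds. "
    "The AtTPS1 gene is discussed here. "
    "This sentence has no gene name."
)
print(sentences_containing(sample))
```

Splitting on a bare dot is simple but also breaks at abbreviations such as "A. thaliana" and at decimal numbers, so some extracted sentences may be fragments.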
Created TPS pathway dictionaries: TPS pathway, TPSpathway
Created a dictionary for abbreviations of binomial nomenclature: abbreviation binomial
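A small sketch of how binomial abbreviations could be generated, assuming the common "G. species" style (for example "V. vinifera"). The actual dictionary may use a different convention; `abbreviate_binomial` is a name invented for this example, and the input names come from the lists earlier in this log.

```python
def abbreviate_binomial(name):
    """Abbreviate 'Genus species' to 'G. species'; return other input unchanged."""
    parts = name.split()
    if len(parts) != 2:
        # Genus-only or longer names are left alone in this sketch.
        return name
    genus, species = parts
    return f"{genus[0].upper()}. {species.lower()}"


names = ["Camellia sinensis", "Vitis vinifera", "Citrus"]
abbreviations = {name: abbreviate_binomial(name) for name in names}
print(abbreviations)
```

Note that abbreviations are ambiguous across genera sharing an initial, so a lookup in the reverse direction needs the surrounding context.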
CROP TPS diction

Git cloned pyami. Set the path by running `open -e .bash_profile` and then adding the following lines: `export P2_HOME=/Users/sagar/pyami` and `export PATH=$PATH:$P2_HOME/py4ami`
Installed PyCharm. Created the folder valdict in pyami. Added an interpreter: conda env, Python 3.8, select the conda path. Run the commands:

`conda create --name project_env python=3.8.0`

`conda activate project_env`, then save and close.

Reopen the folder. Run `pip install pytest`. Running test_pyamidict.py gave an lxml error, so I ran `pip install lxml`. Running test_pyamidict.py again gave a py4ami module-not-found error, so I ran `pip install py4ami`. It then gave the error `ImportError: cannot import name 'AMIDict' from 'py4ami.dict_lib'`.
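One way to investigate an ImportError like this is to check which copy of the module Python actually loads and whether it defines the expected name. The cause here is not known; a possible explanation, stated only as a guess, is that the pip-installed py4ami differs from the cloned pyami source. The helper `probe` below is invented for this example and uses only the standard library.

```python
import importlib


def probe(module_name, attribute):
    """Return (module file or None, whether attribute exists) for a module."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError as well.
        return None, False
    return getattr(module, "__file__", None), hasattr(module, attribute)


# Module and class names taken from the error message in this log.
location, found = probe("py4ami.dict_lib", "AMIDict")
print("loaded from:", location)
print("AMIDict present:", found)
```

If the printed location points into site-packages rather than the cloned pyami folder, the test is running against the installed package and not the local source.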

TPS enzyme dictionary

wiki binomial abbreviation

18/11/2021 Documentation of crops repository.

19/11/2021 Uploading corpora to crops repository

Move the file (folder) you'd like to upload to GitHub into the local directory that was created when you cloned the repository.

Open Terminal.

Change the current working directory to your local repository: `cd crops`

Stage the file for commit to your local repository: `git add .`

Commit the file that you've staged in your local repository: `git commit -m "Add existing file"`

Push the changes in your local repository to GitHub.com: `git push`

Give your username. Generate a token by going to Settings, then Developer settings. Copy the token and paste it in as the password.