GENE : TPS - petermr/CEVOpen GitHub Wiki
GENE DICTIONARY:
INTRODUCTION
Plants have evolved metabolic pathways that synthesize volatile organic compounds (VOCs), which aid pollination, defense against herbivore attack, and tolerance of abiotic stress. Among VOCs, terpenes account for a large proportion. These terpenes are synthesized mainly by the methylerythritol phosphate (MEP) and mevalonic acid (MVA) pathways. Terpene synthase (TPS) enzymes from these two pathways play a crucial role in converting one terpene to another. Though TPSs from some organisms have been identified and well characterized, there is a large gap between the functional annotation and the actual enzymatic activity of individual TPSs in plants. Here, we attempt to collect and classify all TPS genes that are available from genomics studies. Additionally, analyzing TPS gene density in monocots and dicots supports the biochemical traits of those species.
Giulia, Vasant and Sagar started the gene dictionary.
METHOD
-
I obtained 7572 UniProtKB AC/IDs from the following links.
UNIPROTKB AC/IDs:Terpene synthase
UNIPROTKB AC/IDs:Terpene synthase C
Please find both lists combined above.
-
I also searched UniProt for "terpene synthase AND reviewed:yes" and found 945 additional TPS genes.
-
After removing duplicates between points 1 and 2 and entries annotated as "deleted", I have a total of 8409 TPS genes.
-
Gene names, synonyms, organism names (and IDs), protein information, Enzyme Commission numbers and Gene Ontology (molecular information) terms were retrieved from UniProt (https://www.uniprot.org/uploadlists/).
-
Gene identifiers such as AT5G23960 for Arabidopsis (TPS21) are used in the literature (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268506/table/t02_01/).
-
So, I mapped primary and secondary identifiers from Phytomine (https://phytozome.jgi.doe.gov/phytomine/template.do?name=Proteins%20with%20Two%20PFAM%20Domains&scope=all).
-
I found 7 new plant species in (https://digitalcommons.wustl.edu/cgi/viewcontent.cgi?filename=5&article=9240&context=open_access_pubs&type=additional). From these 7 species and the PFAM domain information, I retrieved gene identifiers for 510 new TPS genes.
-
Next, I collected 3032 new TPS genes from the following sources:
Phytomine https://phytozome.jgi.doe.gov/phytomine/begin.do
http://www.nipgr.ac.in/terzyme.html
http://radish.kazusa.or.jp/cgi-bin/keyword.cgi
https://viggs.dna.affrc.go.jp/
-
Finally, the dictionary now contains 11951 TPS genes (8409 + 510 + 3032).
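The merging and de-duplication described in the steps above can be sketched in Python. This is only an illustration, not the script that was actually used: the record layout (an accession paired with an annotation string), the function name `merge_accessions`, and the sample accessions are all assumptions made for the example.

```python
def merge_accessions(*sources, deleted_marker="deleted"):
    """Merge (accession, annotation) records from several sources.

    Keeps the first occurrence of each accession and skips any record
    whose annotation contains `deleted_marker` (case-insensitive).
    """
    seen = set()
    merged = []
    for records in sources:
        for accession, annotation in records:
            key = accession.strip().upper()
            if not key or key in seen:
                continue  # empty or already collected
            if deleted_marker in annotation.lower():
                continue  # entry flagged as deleted
            seen.add(key)
            merged.append(key)
    return merged


# Hypothetical sample records, not real dictionary content.
keyword_hits = [("ACC0001", "Terpene synthase"), ("ACC0002", "deleted")]
reviewed_hits = [("ACC0001", "Terpene synthase"), ("ACC0003", "Terpene synthase C")]

unique = merge_accessions(keyword_hits, reviewed_hits)
print(unique)  # ['ACC0001', 'ACC0003']
```

Normalizing accessions to upper case before comparing is a design choice of this sketch, so that the same ID typed in a different case is not counted twice.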
RESULTS
- TPS GENE CLASSIFICATION
Type of TPS | Number of TPS Genes |
---|---|
Monoterpene synthase | 1062 |
Sesquiterpene synthase | 2273 |
Diterpene synthase | 681 |
Prenyl transferase | 179 |
Triterpenoid synthase | 94 |
Uncharacterized | 3237 |
Unannotated genome, terpene synthase domain-containing protein | 4425 |
Total TPS Genes | 11,951 |
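As a quick consistency check, the category counts in the classification table above can be summed and compared with the total reported in the Method section. The sketch below copies the counts from the table. The percentage column is an extra derived here for illustration, not a result from the original analysis.

```python
# Counts copied from the classification table above.
counts = {
    "Monoterpene synthase": 1062,
    "Sesquiterpene synthase": 2273,
    "Diterpene synthase": 681,
    "Prenyl transferase": 179,
    "Triterpenoid synthase": 94,
    "Uncharacterized": 3237,
    "Unannotated genome, terpene synthase domain-containing protein": 4425,
}

total = sum(counts.values())

# The Method section reports the total as 8409 + 510 + 3032.
assert total == 8409 + 510 + 3032

for name, n in counts.items():
    print(f"{name}: {n} ({100 * n / total:.1f}%)")
print("Total:", total)
```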
-
TPS Gene distribution in different species
-
Creating eo_Gene dictionary and minicorpus:
I used the following 2 approaches:
A) I created a txt file containing a list of all gene names.
B) I created a txt file containing a list of all species names and terms such as TPS, TPS1, TPS2 and so on.
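The term list for approach B could be generated with a few lines of Python rather than by hand. The sketch below is an assumption about how such a file might be built: the species names, the upper limit on the TPS index, the file name `species_tps_terms.txt` and the one-term-per-line layout are all hypothetical.

```python
import tempfile
from pathlib import Path


def build_terms(species, max_index=3):
    """Return species names followed by 'TPS', 'TPS1', ..., 'TPS<max_index>'."""
    terms = list(species)
    terms.append("TPS")
    terms.extend(f"TPS{i}" for i in range(1, max_index + 1))
    # Drop duplicates while preserving the original order.
    return list(dict.fromkeys(terms))


# Hypothetical species list; the real one would come from the dictionary sources.
species = ["Arabidopsis thaliana", "Vitis vinifera"]
terms = build_terms(species, max_index=3)

# Written to a temporary directory so the sketch leaves no files behind;
# in practice out_path would point at the real input file.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "species_tps_terms.txt"  # hypothetical file name
    out_path.write_text("\n".join(terms) + "\n", encoding="utf-8")
    written = out_path.read_text(encoding="utf-8").splitlines()

print(written)
```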
I used the following command to create the dictionary:
amidict -v --dictionary eo_Gene --directory gene --input genee.txt create --informat list --outformats xml
Please find the dictionary here.
getpapers -q "(terpene synthase)" -o corporaTPS -x -p -k 500 -f corporaTPS/log.txt
This downloaded 500 papers.
getpapers -q "(terpene synthase) AND (characterisation) AND (characterization)" -o corpusTPS -x -p -k 500 -f corporaTPS/log.txt
This downloaded around 35 papers.
-
Testing the above eo_Gene dictionaries:
ami -p "corporaTPS" section
ami -p "corporaTPS" search --dictionary eo_Gene.xml
-
eo_Gene1
eo_gene
-
Difficulties:
a) Papers name the same genes inconsistently. Some mention TPS in Vitis vinifera as VvTPS and others as VviTPS, and some use TPS1 while others use TPS01.
b) Gene names are in tables, figures or supplementary files.
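One possible way to handle difficulty a) is to normalize gene names before matching them against the dictionary. The sketch below is an assumption, not part of the existing workflow: the regular expression, the function name `normalize_tps` and the choice of `TPS<n>` as the canonical form are all hypothetical. It only covers the prefix and zero-padding variants mentioned above.

```python
import re

# Optional 2-3 letter species prefix (e.g. Vv, Vvi), then TPS,
# then an optional separator and an optional number.
TPS_NAME = re.compile(r"^(?P<prefix>[A-Z][a-z]{1,2})?TPS[-_]?(?P<num>\d+)?$")


def normalize_tps(name):
    """Return a canonical 'TPS<n>' form, or None if the name is not recognised."""
    match = TPS_NAME.match(name.strip())
    if match is None:
        return None
    num = match.group("num")
    # int() removes zero padding, so '01' and '1' give the same result.
    return "TPS" if num is None else f"TPS{int(num)}"


for raw in ["VvTPS01", "VviTPS1", "TPS01", "TPS1", "AtTPS21", "limonene"]:
    print(raw, "->", normalize_tps(raw))
```

This sketch discards the species prefix, so the species would have to be tracked separately (for example from the paper's metadata) to avoid merging genes from different plants.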
Creating TPS corpus:
-
Date 2/8/2021
I queried https://europepmc.org/ with the following searches and got these results:
Query | Number of hits |
---|---|
terpene synthase | 4308 |
terpene synthase plant | 3447 |
terpene synthase plant volatile | 1200 |
terpene synthase plant TPS | 650 |
terpene synthase TPS plant volatile | 376 |
terpene synthase TPS plant volatile compounds | 355 (312 research articles) |
-
I continued building the TPS corpus on 3/8/2021, 4/8/2021 and 5/8/2021.
For the 312 papers, I recorded the PMCID, plant, compound and whether TPS nomenclature was available.
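The hit counts for the queries listed above could also be collected programmatically. The sketch below assumes the Europe PMC REST search endpoint and the `hitCount` field of its JSON response. The helper names are hypothetical, and the counts change over time, so the numbers recorded on 2/8/2021 will not be reproduced exactly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed Europe PMC REST search endpoint.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query):
    """Build a search URL that asks for a single JSON result page."""
    params = {"query": query, "format": "json", "pageSize": 1}
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_hit_count(payload):
    """Extract the total number of hits from a JSON response body."""
    return int(json.loads(payload)["hitCount"])


def fetch_hit_count(query):
    """Query Europe PMC over the network and return the hit count."""
    with urlopen(build_search_url(query)) as response:
        return parse_hit_count(response.read().decode("utf-8"))


queries = [
    "terpene synthase",
    "terpene synthase plant",
    "terpene synthase plant volatile",
    "terpene synthase plant TPS",
    "terpene synthase TPS plant volatile",
    "terpene synthase TPS plant volatile compounds",
]

# Only the URLs are printed here; no network request is made.
for query in queries:
    print(build_search_url(query))
```

Calling `fetch_hit_count(query)` for each entry in `queries` would perform the actual requests. It is left out of the loop so the sketch runs without network access.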
-
Date 5/8/2021
Please find the TPS corpus of 312 papers.