GENE : TPS - petermr/CEVOpen GitHub Wiki

GENE DICTIONARY:

INTRODUCTION

Plants have evolved metabolic pathways that synthesize volatile organic compounds (VOCs), which help in pollination, defense against herbivore attack, and the response to abiotic stress. Among VOCs, terpenes account for a large proportion. Terpenes are mainly synthesized by the methylerythritol phosphate (MEP) and mevalonic acid (MVA) pathways. Terpene synthase (TPS) enzymes acting downstream of these two pathways play a crucial role in producing the diversity of terpenes. Although TPSs from some organisms have been identified and well characterized, there is still a large gap between the functional annotation and the actual enzymatic activity of individual TPSs in plants. Here, we attempt to collect and classify all TPS genes that are available from genomics studies. In addition, we analyze TPS gene density in monocots and dicots, which supports the biochemical traits of those species.

Giulia, Vasant and Sagar started the Gene dictionary.


METHOD

  1. I retrieved 7572 UniProtKB AC/IDs from the following links.

    UNIPROTKB AC/IDs:Terpene synthase

    UNIPROTKB AC/IDs:Terpene synthase C

    The two lists above are combined here:

    UNIPROTKB AC/ID retrieval for TPS

  2. I also searched UniProt for "terpene synthase AND reviewed:yes" and found 945 new TPS genes.

    uniprot terpene synthase

  3. After removing duplicates between steps 1 and 2 and entries annotated as "deleted", I have a total of 8409 TPS genes.

  4. Gene names, synonyms, organism names (and IDs), protein information, Enzyme Commission numbers and Gene Ontology (molecular information) terms were retrieved from UniProt (https://www.uniprot.org/uploadlists/).

    uniprot

    uniprot 8409

  5. Gene identifiers such as AT5G23960 for Arabidopsis (TPS21) are used in the literature (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268506/table/t02_01/).

  6. I therefore mapped primary and secondary identifiers from Phytomine (https://phytozome.jgi.doe.gov/phytomine/template.do?name=Proteins%20with%20Two%20PFAM%20Domains&scope=all).

  7. I found 7 new plant species in (https://digitalcommons.wustl.edu/cgi/viewcontent.cgi?filename=5&article=9240&context=open_access_pubs&type=additional). From these 7 species and the PFAM domain information, I retrieved gene identifiers for 510 new TPS genes.

    510 TPS

  8. Next, I collected 3032 new TPS genes from the following sources:

    3032 TPS

    Phytomine https://phytozome.jgi.doe.gov/phytomine/begin.do

    http://www.nipgr.ac.in/terzyme.html

    http://radish.kazusa.or.jp/cgi-bin/keyword.cgi

    www.rosaceae.org

    www.solgenomics.net

    www.citrusgenomedb.org

    www.pulsedb.org

    https://viggs.dna.affrc.go.jp/

    www.cucurbitgenomics.org

    www.banana-genome-hub.southgreen.fr

    www.morus.swu.edu.cn

  9. Finally, the dictionary now contains 11951 TPS genes (8409 + 510 + 3032).

    eo_Gene with classification in excel
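The merging in step 3 and the total in step 9 can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the function name `merge_accessions` and the toy accessions are hypothetical, and the only figures taken from the steps above are 8409, 510 and 3032.

```python
def merge_accessions(*lists, deleted=()):
    """Merge accession lists, dropping duplicates and deleted entries.

    Accessions are compared case-insensitively after trimming whitespace;
    the order of first appearance is preserved.
    """
    deleted_keys = {acc.strip().upper() for acc in deleted}
    seen = set()
    merged = []
    for accessions in lists:
        for acc in accessions:
            key = acc.strip().upper()
            if not key or key in seen or key in deleted_keys:
                continue
            seen.add(key)
            merged.append(key)
    return merged


# Toy placeholder accessions (hypothetical, not real results).
step1 = ["A0A001", "A0A002", "A0A003"]
step2 = ["A0A003", "A0A004", "a0a001 "]
deleted = ["A0A002"]

merged = merge_accessions(step1, step2, deleted=deleted)
print(merged)  # ['A0A001', 'A0A003', 'A0A004']

# Totals reported in steps 3, 7 and 8.
uniprot_total, new_species_total, other_databases_total = 8409, 510, 3032
dictionary_total = uniprot_total + new_species_total + other_databases_total
print(dictionary_total)  # 11951
```

Normalizing case and whitespace before comparing is a design choice made for this sketch; exported ID lists often differ only in such details.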



RESULTS

  1. TPS GENE CLASSIFICATION
Type of TPS                                                     Number of TPS genes
Monoterpene synthase                                            1062
Sesquiterpene synthase                                          2273
Diterpene synthase                                              681
Prenyl transferase                                              179
Triterpenoid synthase                                           94
Uncharacterized                                                 3237
Unannotated genome, terpene synthase domain-containing protein  4425
Total TPS genes                                                 11951

TPS Classification
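As a quick consistency check on the classification table, the counts can be summed and converted to shares. The sketch below uses only the numbers from the table; the helper name `shares` and the grouping into "classified" types are assumptions made for illustration.

```python
# Counts copied from the classification table.
counts = {
    "Monoterpene synthase": 1062,
    "Sesquiterpene synthase": 2273,
    "Diterpene synthase": 681,
    "Prenyl transferase": 179,
    "Triterpenoid synthase": 94,
    "Uncharacterized": 3237,
    "Unannotated genome, terpene synthase domain-containing protein": 4425,
}


def shares(table):
    """Return each category's percentage of the overall total."""
    total = sum(table.values())
    return {name: 100.0 * n / total for name, n in table.items()}


total = sum(counts.values())
print(total)  # 11951

# Entries assigned to a specific enzyme type (the first five rows).
classified = sum(n for name, n in counts.items()
                 if name.endswith(("synthase", "transferase")))
print(classified)  # 4289

for name, pct in sorted(shares(counts).items(), key=lambda kv: -kv[1]):
    print(f"{name:<64}{pct:5.1f}%")
```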



  1. TPS Gene distribution in different species

    all taxons

    monocots

    Dicots

    Dicots

    Gymnosperms and others

    gene density monocot


  1. Creating eo_Gene dictionary and minicorpus:

    I used the following 2 approaches:

    A) I created a txt file containing a list of all gene names.

    B) I created a txt file containing a list of all species names and terms such as TPS, TPS1, TPS2 and so on.

    I used the following command to create the dictionary:

    amidict -v --dictionary eo_Gene --directory gene --input genee.txt create --informat list --outformats xml

    The dictionary can be found here:

    Gene1

    eogene

    getpapers -q "(terpene synthase)" -o corporaTPS -x -p -k 500 -f corporaTPS/log.txt

    This downloaded 500 papers.

    getpapers -q "(terpene synthase) AND (characterisation) AND (characterization)" -o corpusTPS -x -p -k 500 -f corporaTPS/log.txt

    This downloaded around 35 papers.

  2. Testing the above eo_Gene dictionaries:

    ami -p "corporaTPS" section

    ami -p "corporaTPS" search --dictionary eo_Gene.xml

  3. eo_Gene1

    Gene1

    eo_gene

    gene

  4. Difficulties:

    a) Papers name the same genes inconsistently: TPS in Vitis vinifera appears as both VvTPS and VviTPS, and gene numbers are written either as TPS1 or as TPS01.

    b) Gene names are in tables, figures or supplementary files.
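One possible way to handle approach B together with the naming problem in difficulty a) is to generate every spelling variant up front and feed the resulting list to the dictionary step. The sketch below is an assumption about how that could be done, not the method used here: the helper names `tps_variants` and `canonical_number` are hypothetical, and the prefixes Vv and Vvi are taken from the Vitis vinifera example above.

```python
import re


def tps_variants(prefixes, max_number, pad_width=2):
    """Return spellings such as VvTPS1, VvTPS01, VviTPS1, VviTPS01.

    Each number is emitted both plain and zero-padded; duplicates
    (for example 10 and "10") are removed while keeping order.
    """
    terms = []
    for prefix in prefixes:
        for n in range(1, max_number + 1):
            for form in (str(n), str(n).zfill(pad_width)):
                terms.append(f"{prefix}TPS{form}")
    return list(dict.fromkeys(terms))


# Optional species prefix, the literal "TPS", optional leading zeros, digits.
_TPS_NAME = re.compile(r"^([A-Za-z]*?)TPS0*(\d+)$")


def canonical_number(name):
    """Return the gene number of a TPS-style name, or None if it has none."""
    match = _TPS_NAME.match(name)
    return int(match.group(2)) if match else None


terms = tps_variants(["Vv", "Vvi"], max_number=2)
print(terms)
# ['VvTPS1', 'VvTPS01', 'VvTPS2', 'VvTPS02',
#  'VviTPS1', 'VviTPS01', 'VviTPS2', 'VviTPS02']

print(canonical_number("VviTPS01"), canonical_number("TPS1"))  # 1 1
```

Writing `"\n".join(terms)` to a text file would give a one-term-per-line list of the kind described in approach B.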



Creating TPS corpus:

  1. Date 2/8/2021

    I queried https://europepmc.org/ with the following searches and got these results:

    Query                                           Number of hits
    terpene synthase                                4308
    terpene synthase plant                          3447
    terpene synthase plant volatile                 1200
    terpene synthase plant TPS                      650
    terpene synthase TPS plant volatile             376
    terpene synthase TPS plant volatile compounds   355 (Research articles 312)
  2. I continued work on the TPS corpus on 3/8/2021, 4/8/2021 and 5/8/2021.

    For the 312 papers, I checked the availability of PMCID, plant, compound and TPS nomenclature.

  3. Date 5/8/2021

    Please find the TPS corpus (312 papers).
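The hit counts in step 1 were read from the Europe PMC website. A scripted version could use the Europe PMC REST search service, as in the sketch below. This is an assumption about how the counts might be reproduced, not what was done here: the helper names are hypothetical, the REST service may interpret a query differently from the website, and counts retrieved today will differ from the 2021 figures.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC REST search endpoint.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

# Queries copied from the table in step 1.
QUERIES = [
    "terpene synthase",
    "terpene synthase plant",
    "terpene synthase plant volatile",
    "terpene synthase plant TPS",
    "terpene synthase TPS plant volatile",
    "terpene synthase TPS plant volatile compounds",
]


def build_search_url(query):
    """Build a search URL that returns JSON with a minimal result page."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "idlist",
        "pageSize": 1,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_hit_count(payload):
    """Extract the total number of hits from a JSON response body."""
    return int(json.loads(payload)["hitCount"])


def hit_count(query, timeout=30):
    """Query Europe PMC (needs network access) and return the hit count."""
    with urlopen(build_search_url(query), timeout=timeout) as response:
        return parse_hit_count(response.read().decode("utf-8"))
```

Calling `hit_count(q)` for each `q` in `QUERIES` would then give a table comparable to the one in step 1. Nothing in the block contacts the network until `hit_count` is called.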