9.2_GO_enrichment - bennestor/hakea

GO enrichment

GO terms of Hakea genes with eggNOG mapper

I used eggNOG (http://eggnog-mapper.embl.de/) filtered to just Viridiplantae mapper to generate GO terms, but there are lots of mappers around.

45,786 sequences were annotated
24,052 sequences annotated with GO terms

Running GO term enrichment with Revigo

Used this script: https://philippbayer.github.io/Amphibolis_Posidonia_Comparison/GOenrichment.html#Revigo

Main scripts are topgo.R and revigo.R

plotrevigo.R is handy for enrichment network figures

GO enrichment of genes specific to Hakea (gene families highly divergent from other species)

   #Extract Hakea gene ids from orthogroups 
   cat Orthogroups.txt | grep -Fw -f hakea_specific.ids | tr ' ' '\n' | grep -E -v 'O.*' | sort -u > hakea_specific_gene.txt  
   #5868 sequences
   
   #Run topGO script in R
   compare('hakea_specific_gene.txt', './topgo_out/GO_results_hakea_specific_genes.csv', allgenes, annAT)
   
   #Run Revigo script in R
   write.csv((results_list[["GO_results_hakea_specific_genes.csv"]] %>% dplyr::filter(Eliminated == FALSE) %>% dplyr::select(TermID, Name, Value, Frequency) %>% arrange(Value)), "Revigo_hakea_specific_genes.csv")

GO enrichment of genes specific to Proteaceae GO enrichment is done on the Hakea gene IDs as a representative for the overall enriched genes in the Proteaceae-specific orthogroup (since I don’t have GO terms for Telopea and Macadamia).

   #Get ids of Proteaceae-specific genes
   cat Orthogroups.GeneCount.tsv | awk -F "\t" '$5>0 && $6>0 && $8>0 && $2==0 && $3==0 && $4==0 && $7==0 && $9==0' | cut -f 1 > proteaceae_specific.ids 
   #516 orthogroups 
   
   #Extract Hakea gene IDs from orthogroups (gemoma_nr.ids is a list of Hakea gene IDs, can make by grepping “>” then tr -d ‘>’ or similar). 
   cat Orthogroups.txt | grep -Fw -f proteaceae_specific.ids | tr ' ' '\n' | grep -Fw -f gemoma_nr.ids | sort -u > proteaceae_specific_gene.txt 
   #1962 sequences
   
   #Run topGO script in R
   compare('proteaceae_specific_gene.txt', './topgo_out/GO_results_proteaceae_specific_genes.csv', allgenes, annAT)
   
   #Run revigo script in R
   write.csv((results_list[["GO_results_proteaceae_specific_genes.csv"]] %>% dplyr::filter(Eliminated == FALSE) %>% dplyr::select(TermID, Name, Value, Frequency) %>% arrange(Value)), "Revigo_proteaceae_specific_genes.csv")

Genes in expanded gene families in Hakea

   #Getting orthogroup IDs from CAFE output
   cat Base_change.tab | cut -f 1,5 | grep "+" | grep -v "+0" | cut -f 1 > expanded_orthogroups.ids
   
   #Append ids of large filtered families 
   cat Base_change.tab | cut -f 1,5 | grep "+" | grep -v "+0" | cut -f 1 >> ../k2_e_1/expanded_orthogroups.ids 
   
   #Extract Hakea gene IDs from significantly expanded orthogroups 
   cat Orthogroups.txt | grep -Fw -f significant_orthogroups.ids | grep -Fw -f expanded_orthogroups.ids | tr ' ' '\n' | grep -Fw -f gemoma_nr.ids | sort -u > expanded_genes.txt  
   #527 significantly expanded orthogroups 
   #8104 hakea genes significantly expanded
   
   #Run topGO script in R
   compare('expanded_genes.txt', './topgo_out/GO_results_expanded_genes.csv', allgenes, annAT)
   
   #Run Revigo script in R
   write.csv((results_list[["GO_results_expanded_genes.csv"]] %>% dplyr::filter(Eliminated == FALSE) %>% dplyr::select(TermID, Name, Value, Frequency) %>% arrange(Value)), "Revigo_expanded_genes.csv")

Genes in contracted gene families in Hakea

   #Append ids of large filtered families 
   cat Base_change.tab | cut -f 1,5 | grep "\-" | cut -f 1 >>../k2_e_1/contracted_orthogroups.ids 
   
   #Extract Hakea gene IDs from contracted orthogroups 
   cat Orthogroups.txt | grep -Fw -f significant_orthogroups.ids | grep -Fw -f contracted_orthogroups.ids | tr ' ' '\n' | grep -Fw -f gemoma_nr.ids | sort -u > contracted_genes.txt  
   #748 significantly contracted orthogroups 
   #1487 Hakea genes 
   
   #Run topGO script in R
   compare('contracted_genes.txt', './topgo_out/GO_results_contracted_genes.csv', allgenes, annAT)
   
   #Run Revigo script in R
   write.csv((results_list[["GO_results_contracted_genes.csv"]] %>% dplyr::filter(Eliminated == FALSE) %>% dplyr::select(TermID, Name, Value, Frequency) %>% arrange(Value)), "Revigo_contracted_genes.csv")

Genes in expanded and contracted gene families in Proteaceae IDs of expanded gene families can be retrieved and then used similarly to the above methods to filter Orthogroups.txt and extract a list of expanded genes.

   #Append IDs of large filtered families
   cat Base_change.tab | cut -f 1,12 | grep "+" | grep -v "+0" | cut -f 1 >> ../k2_e_1/proteaceae_expanded_orthogroups.ids

Similarly, for contracted genes

   #Append ids of large filtered families 
   cat Base_change.tab | cut -f 1,12 | grep "\-" | cut -f 1 >>../k2_e_1/proteaceae_contracted_orthogroups.ids

9.2_GO_enrichment - bennestor/hakea_genome GitHub Wiki

GO enrichment

⚠️ GitHub.com Fallback ⚠️

9.2_GO_enrichment - bennestor/hakea_genome GitHub Wiki

GO enrichment

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️