From ontology annotation results to enrichment computation - wdlingit/cop GitHub Wiki

Based on a whole genome functional annotation table, it would be great to extract ontological information into files in GMT for GSEA. This would be helpful for doing downstream analysis like GSEA or ontology enrichment.

From functional annotation tables to gene-ontologyID entries

Suppose that we have a table made by using Trinotate that is with many columns of different kinds of annotation.

ubuntu@ubuntu20:~$ head -1 myTrinotate.tsv | perl -ne 'chomp; @t=split(/\t/); for($i=0;$i<@t;$i++){ print "$i\t$t[$i]\n" }'
0       #gene_id
1       transcript_id
2       sprot_Top_BLASTX_hit
3       infernal
4       prot_id
5       prot_coords
6       sprot_Top_BLASTP_hit
7       Pfam
8       SignalP
9       TmHMM
10      eggnog
11      Kegg
12      gene_ontology_BLASTX
13      gene_ontology_BLASTP
14      gene_ontology_Pfam
15      EggNM.seed_ortholog
16      EggNM.evalue
17      EggNM.score
18      EggNM.eggNOG_OGs
19      EggNM.max_annot_lvl
20      EggNM.COG_category
21      EggNM.Description
22      EggNM.Preferred_name
23      EggNM.GOs
24      EggNM.EC
25      EggNM.KEGG_ko
26      EggNM.KEGG_Pathway
27      EggNM.KEGG_Module
28      EggNM.KEGG_Reaction
29      EggNM.KEGG_rclass
30      EggNM.BRITE
31      EggNM.KEGG_TC
32      EggNM.CAZy
33      EggNM.BiGG_Reaction
34      EggNM.PFAMs
35      transcript
36      peptide

Suppose that we would like to extract GO annotations (columns 12,13,14 and 23) and KEGG Pathway/Module annotations (columns 26 & 27) for further use. The first thing to do is to extract them into a list of gene-ontologyID entries. For the GO part, we did the following.

ubuntu@ubuntu20:~$ cat myTrinotate.tsv | perl -ne 'chomp; @t=split(/\t/); print "$t[0]\t".join("\t",@t[12..14,23])."\n"' | perl -ne 'chomp; @t=split(/\t/); $id=shift @t; for $x (@t){ while($x=~/(GO:\d{7})/g){ print "$id\t$1\n" } }' > myGO.txt

ubuntu@ubuntu20:~$ head myGO.txt
Species_18805  GO:0016779
Species_18770  GO:0003677
Species_18770  GO:0006310
Species_18770  GO:0015074
Species_18750  GO:0005524
Species_18750  GO:0003677
Species_18750  GO:0009035
Species_18750  GO:0009307
Species_18750  GO:0005524
Species_18750  GO:0003677

ubuntu@ubuntu20:~$ cat myGO.txt | perl -ne '@t=split; $hash{$t[0]}=1; if(eof){ $cnt=keys %hash; print "$cnt\n" }'
2277

The first command is composed of two piped perl one-liners. The first one-liner perl -ne 'chomp; @t=split(/\t/); print "$t[0]\t".join("\t",@t[12..14,23])."\n"' is to extract gene IDs and corresponding GO informations into lines. The second one-liner is to take the first token from every line as the gene ID and extract every GOIDs from rest tokens and print gene ID-GOID entries. The last perl one-liner showed that 2277 genes were associated with at least one GO terms.

For the KEGG part, we did the following.

ubuntu@ubuntu20:~$ cat myTrinotate.tsv | perl -ne 'next if $.==1; chomp; @t=split(/\t/); print "$t[0]\t$t[26]\n" if $t[26] ne "."' | perl -ne 'chomp; @t=split(/\t/); @s=split(/,/,$t[1]); for $x (@s){ print "$t[0]\t$x\n" if $x=~/^map/ }' > myKO.txt

ubuntu@ubuntu20:~$ cat myTrinotate.tsv | perl -ne 'next if $.==1; chomp; @t=split(/\t/); print "$t[0]\t$t[27]\n" if $t[27] ne "."' | perl -ne 'chomp; @t=split(/\t/); @s=split(/,/,$t[1]); for $x (@s){ print "$t[0]\t$x\n" }' >> myKO.txt

ubuntu@ubuntu20:~$ head myKO.txt
Species_18635  map00230
Species_18635  map00240
Species_18635  map01100
Species_18635  map03030
Species_18635  map03410
Species_18635  map03420
Species_18635  map03440
Species_18705  map00230
Species_18705  map00240
Species_18705  map01100

ubuntu@ubuntu20:~$ cat myKO.txt | perl -ne '@t=split; $hash{$t[0]}=1; if(eof){ $cnt=keys %hash; print "$cnt\n" }'
1106

The first command is to skip the first line and extract column 26 as avoiding no data (.). KEGG IDs in this column that started with k were found to be duplicated with those started with map so only those entries with map12345 were outputted (if $x=~/^map/). The second command is to skip the first line and extract column 27 as avoiding no data (.). Note that the output was appended (>>) to the previous output file.

Transfer the gene-GOID entries into a GOBU tree

(You may consider this part as a self-advertisement)

As we mentioned to transfer information into GMT format, we are not going to do this for the GO part because I would like to honor the true path rule (geneX is associated with GO_A AND GO_A is_a child of GO_B => geneX is associated with GO_B). To do that, we build a GOBU tree and do corresponding GO computation using GOBU.

Download the software and extract it. Make sure that java 8 or above is available.

ubuntu@ubuntu20:~$ wget https://downloads.sourceforge.net/project/gobu/20230301/gobu20230301.zip

ubuntu@ubuntu20:~$ unzip gobu20230301.zip

The first step to build the GOBU tree is to build a backbone (myTree) and prepare leaves (myGO) for attaching them to the backbone.

ubuntu@ubuntu20:~$ cat myTrinotate.tsv | perl -ne 'next if $.==1; @t=split; print "reference\tRP:$t[0]\n"' > myTree.0.m

ubuntu@ubuntu20:~$ cat myGO.txt | perl -ne 'chomp; @t=split(/\t/); print "RP:$t[0]\t$t[1]\n"' > myGO.m

The GOBU java commands need to be run in the install(extracted) directory. The first command is to transfer the tab-delimited file (myTree.0.m) into a tree file.

ubuntu@ubuntu20:~$ cd gobu/

ubuntu@ubuntu20:~/gobu$ java -classpath gobu.jar bio301.goutil.gobu.data.TreeFileMaker -I ../myTree.0.m -O ../myTree.0.tree -OBO data/gene_ontology.1_2.obo
program: TreeFileMaker
input filename (-I): ../myTree.0.m
output filename (-O): ../myTree.0.tree
OBO filename (-OBO): data/gene_ontology.1_2.obo

The second command is to attach the GO annotation leaves to the tree.

ubuntu@ubuntu20:~/gobu$ java -classpath gobu.jar bio301.goutil.gobu.data.AddAnnotation -I ../myTree.0.tree -S ../myGO.m -O ../myTree.1.tree -OBO data/gene_ontology.1_2.obo
program: AddAnnotation
input filename (-I): ../myTree.0.tree
subtree filename (-S): ../myGO.m
output filename (-O): ../myTree.1.tree
OBO filename (-OBO): data/gene_ontology.1_2.obo
coverage check (-C): false

This last GOBU command is optional. This is to sort the GO annotation items in user data to follow their actual order in the GO hierarchy. Sorting them would make it rather comfortable when browsing the GOBU tree in the GOBU program. The program would report GOIDs (in numbers) that were not found in the GO OBO file. This would be due to the inconsistency between the annotation database and the GO vocabulary database. As we have 36198 GO associations in this case, the number of missing GOIDs is relatively small and acceptable.

ubuntu@ubuntu20:~/gobu$ java -classpath gobu.jar bio301.goutil.gobu.data.TreeSorter -I ../myTree.1.tree -O ../myTree.sorted.tree -OBO data/gene_ontology.1_2.obo
program: TreeSorter
input filename (-I): ../myTree.1.tree
output filename (-O): ../myTree.sorted.tree
OBO filename (-OBO): data/gene_ontology.1_2.obo

55114 not in GO tree, its parent: Species_00040
5623 not in GO tree, its parent: Species_00310
18065 not in GO tree, its parent: Species_01190
6732 not in GO tree, its parent: Species_01385
5451 not in GO tree, its parent: Species_01470
8144 not in GO tree, its parent: Species_01820
48037 not in GO tree, its parent: Species_02080
51186 not in GO tree, its parent: Species_04510
44421 not in GO tree, its parent: Species_06450
44425 not in GO tree, its parent: Species_07000
70035 not in GO tree, its parent: Species_07020
51181 not in GO tree, its parent: Species_07045
51704 not in GO tree, its parent: Species_09560
453 not in GO tree, its parent: Species_14320

Transfer the gene-KEGG ID entries into a GMT file

Recall that there is a description field in the GMT format and we got only KEGG IDs in file myKO.txt. We applied the KEGG REST API for getting names of KEGG IDs.

ubuntu@ubuntu20:~$ cat myKO.txt | perl -ne 'chomp; @t=split; $hash{$t[1]}=1; if(eof){ for $x (sort keys %hash){ print "$x\n"} }' | perl -ne 'chomp; $restURL="https://rest.kegg.jp/get/$_"; $cmd="wget -q -O /dev/stdout $restURL"; $msg=`$cmd`; $name=""; if($msg=~/^NAME\s+(.+)$/m){ $name="$1" } print "$_\t$name\n"; sleep 1' > KO.names

The first piped perl one-liner is to output a non-redundant list of KEGG IDs for query because every calling of the KEGG REST API takes a few seconds. The second piped perl one-liner is to use wget to access the API and extract NAME from the output. In this practice, a number of KEGG IDs were found without API output and they were found not in the database. This might be due to the inconsistency between the annotation database and the KEGG database. Note that the KEGG server has some kind of DOS avoiding mechanism so we use sleep 1 to make 1 second intervals between queries.

The next command collects gene IDs with the same KEGG IDs into lists, and then output lines of KEGG ID, KEGG NAME, and gene IDs (the GMT format).

ubuntu@ubuntu20:~$ cat myKO.txt | perl -ne 'if($.==1){ open(FILE,"<KO.names"); while($line=<FILE>){ chomp $line; @s=split(/\t/,$line); $defHash{$s[0]}=$s[1]; } close FILE } chomp; @t=split(/\t/); $hash{$t[1]}{$t[0]}=1; if(eof){ for $set (sort keys %hash){ @arr=keys %{$hash{$set}}; print "$set\t$defHash{$set}\t".join("\t",@arr)."\n" if length($defHash{$set})>0 } }' > KO.gmt

This file can be used as a database file for GSEA. Note that this described procedure can be similarly applied to other annotation categories.

Extract gene lists for parameter investigation

Assume that we have a file DEG.tsv for differential comparison results and we are not sure which p-value threshold would be appropriate. In this case, generate a few DEG lists based on different thresholds and investigate their enrichment results would be helpful.

ubuntu@ubuntu20:~$ head -1 DEG.tsv | perl -ne 'chomp; @t=split(/\t/); for($i=0;$i<@t;$i++){ print "$i\t$t[$i]\n" }'
0       #geneID
1       Length
2       ctrl_rep1
3       ctrl_rep2
4       trt_rep1
5       trt_rep2
6       weightedMultiRatio
7       T-TEST
8       adjP
9       regulation
10      logFC

ubuntu@ubuntu20:~$ mkdir Lists

ubuntu@ubuntu20:~$ ls DEG.tsv | perl -ne 'chomp; /(.+)\./; print "!!!FILE: $1\n"; system "cat $_"; print "!!!END\n"' | perl -ne 'chomp; if(/^!!!FILE: (\S+)/){ $f=$1; %pHash=(); }elsif(/^#/){}elsif(/^!!!END/){ for $pTh (0.05,0.01,0.005,0.001){ open(FILE,">Lists/$f.$pTh.list"); for $g (sort keys %pHash){ print FILE "$g\n" if $pHash{$g}[0]<=$pTh } close FILE; } }else{ @t=split(/\t/); push @{$pHash{$t[0]}},$t[7] if (length($t[0])>0) && (abs($t[10])>=1) }'

ubuntu@ubuntu20:~$ wc -l Lists/*.list
   2 Lists/DEG.0.001.list
  11 Lists/DEG.0.005.list
  22 Lists/DEG.0.01.list
 128 Lists/DEG.0.05.list
 163 total

The first piped perl one-liner is to send contents of DEG table(s) together with table file information. This is for processing multiple DEG tables at one time. Given that the coding is correct, it would be better to avoid inputting commands several times to prevent manual mistakes. The second piped perl one-liner is to recognize source filenames of the tables (if(/^!!!FILE: (\S+)/){ $f=$1; %pHash=(); }), store p-value information into a hash table (push @{$pHash{$t[0]}},$t[7]) if abs(log2-fold-change)>=1, and save different lists in Lists directory for different p-value threshold.

Enrichment computation, GO

To compute GO enrichment using GOBU in command line (instead of using GUI), we will have to further append list information in front of gene IDs.

ubuntu@ubuntu20:~$ ls Lists/*.list | perl -ne 'chomp; /.+\/(.+)\.list/; $name=$1; print "!!!FILE: $name\n"; system "cat $_"' | perl -ne 'chomp; if(/^!!!FILE:\s(\S+)/){ $tag=$1; open(FILE,">Lists/$1.grouplist"); }else{ chomp; print FILE "$tag\t$_\n" }'

ubuntu@ubuntu20:~$ wc -l Lists/*
   2 Lists/DEG.0.001.grouplist
   2 Lists/DEG.0.001.list
  11 Lists/DEG.0.005.grouplist
  11 Lists/DEG.0.005.list
  22 Lists/DEG.0.01.grouplist
  22 Lists/DEG.0.01.list
 128 Lists/DEG.0.05.grouplist
 128 Lists/DEG.0.05.list
 326 total

ubuntu@ubuntu20:~$ diff Lists/DEG.0.001.grouplist Lists/DEG.0.001.list
1,2c1,2
< DEG.0.001     Species_16535
< DEG.0.001     Species_16895
---
> Species_16535
> Species_16895

The .grouplist files are for saving different selections into one single file. For this walkthrough, we save them separately for other purpose. The next command is for doing GO enrichment on the .grouplist files.

ubuntu@ubuntu20:~$ mkdir Counts
ubuntu@ubuntu20:~$ cd gobu/

ubuntu@ubuntu20:~/gobu$ ls ../Lists/*.grouplist | perl -ne 'chomp; /.+\/(.+)\.grouplist/; $cmd="java -classpath gobu.jar:plugins/multiview.jar bio301.goutil.gobu.data.GOEnrichment -R ../myTree.sorted.tree -M $_ -OBO data/gene_ontology.1_2.obo -O ../Counts/$1.go.out"; print "\nCMD: $cmd\n"; #system $cmd'

CMD: java -classpath gobu.jar:plugins/multiview.jar bio301.goutil.gobu.data.GOEnrichment -R ../myTree.sorted.tree -M ../Lists/DEG.0.001.grouplist -OBO data/gene_ontology.1_2.obo -O ../Counts/DEG.0.001.go.out

CMD: java -classpath gobu.jar:plugins/multiview.jar bio301.goutil.gobu.data.GOEnrichment -R ../myTree.sorted.tree -M ../Lists/DEG.0.005.grouplist -OBO data/gene_ontology.1_2.obo -O ../Counts/DEG.0.005.go.out

CMD: java -classpath gobu.jar:plugins/multiview.jar bio301.goutil.gobu.data.GOEnrichment -R ../myTree.sorted.tree -M ../Lists/DEG.0.01.grouplist -OBO data/gene_ontology.1_2.obo -O ../Counts/DEG.0.01.go.out

CMD: java -classpath gobu.jar:plugins/multiview.jar bio301.goutil.gobu.data.GOEnrichment -R ../myTree.sorted.tree -M ../Lists/DEG.0.05.grouplist -OBO data/gene_ontology.1_2.obo -O ../Counts/DEG.0.05.go.out

(surely the sharp mark was removed for executing the commands)

So we have a number of GO enrichment output tables by inputting different DEG lists. They are all in tab-delimited format and columns mean:

  1. group: source list name, would be useful when the computed .grouplist file contains multiple lists
  2. Term Name for GO term name
  3. GOID
  4. Term Type: in (C)omponent, (F)unction, or (P)rocess
  5. Depth of the GO term in the GO hierarchy
  6. cnt (the first one): number of whole genome genes associated with this GO term
  7. cnt (the second one): number of input genes associated with this GO term
  8. bin: classical enrichment p-values made by applying the binomial model
  9. fsh: classical enrichment p-values made by applying fisher exact test (special thanks to Dr. Oyvind Langsrud for his fast code for fisher exact test)
  10. elim: enrichment p-values by applying TopGO elim method (PMID: 16606683)

Enrichment computation, KEGG

For the KEGG part, we have a GMT file and DEG lists. We apply GmtIntersectFisher.pl in this repository for classical enrichment. Running this script requires a java class from rackj for fisher exact test. Note that 3659 is the number of whole genome genes of this case.

ubuntu@ubuntu20:~$ wget https://downloads.sourceforge.net/project/rackj/0.99a/rackJ.tar.gz
ubuntu@ubuntu20:~$ tar -zxvf rackJ.tar.gz

ubuntu@ubuntu20:~$ git clone https://github.com/wdlingit/cop.git

ubuntu@ubuntu20:~$ ./cop/scripts/GmtIntersectFisher.pl
Usage: GmtIntersect4fisher.pl <geneFile> <GMT> <bkgNumber> <rackjJARpath>

ubuntu@ubuntu20:~$ ls Lists/*.list | perl -ne 'chomp; /.+\/(.+)\.list/; $cmd="./cop/scripts/GmtIntersectFisher.pl $_ KO.gmt 3659 rackJ/rackj.jar > Counts/$1.ko.out"; print "$cmd\n"; #system $cmd'
./cop/scripts/GmtIntersectFisher.pl Lists/DEG.0.001.list KO.gmt 3659 rackJ/rackj.jar > Counts/DEG.0.001.ko.out
./cop/scripts/GmtIntersectFisher.pl Lists/DEG.0.005.list KO.gmt 3659 rackJ/rackj.jar > Counts/DEG.0.005.ko.out
./cop/scripts/GmtIntersectFisher.pl Lists/DEG.0.01.list KO.gmt 3659 rackJ/rackj.jar > Counts/DEG.0.01.ko.out
./cop/scripts/GmtIntersectFisher.pl Lists/DEG.0.05.list KO.gmt 3659 rackJ/rackj.jar > Counts/DEG.0.05.ko.out

(again, we surely removed the sharp mark for executing the commands)

The output .ko.out files have no header lines. Their column meaning:

  1. KEGG ID
  2. NAME of the KEGG ID
  3. number of input genes associated with this KEGG ID
  4. number of non-input genes (whole genome exclude input) associated with this KEGG ID
  5. number of input genes not associated with this KEGG ID
  6. number of non-input genes not associated with this KEGG ID
  7. fisher exact test p-value, left tailed
  8. fisher exact test p-value, right tailed (the enrichment p-value)
  9. fisher exact test p-value, two tailed

Summary of the enrichment results, heatmaps

With these enrichment tables, the plan is to combine them to show which ontology terms were enriched under which threshold setting. Heatmap is a feasible visualization for this idea. We use genOntoHeatmapData.pl, which is also from this repository, for generating heatmap data matrix with p-values transformed by -log10(p).

ubuntu@ubuntu20:~$ ./cop/scripts/genOntoHeatmapData.pl
Usage: genOntoHeatmapData.pl [-term <string>] [-x <p-value cutoff>] <p-value column> [<input name> <input file>]+
     -x <float>     : p-value cutoff (default: 0.05)
     -term <string> : combination of columns for term definitions (default: "1")

ubuntu@ubuntu20:~$ ./cop/scripts/genOntoHeatmapData.pl -term 3,2 -x 0.01 9 DEG_0.05 Counts/DEG.0.05.go.out DEG_0.01 Counts/DEG.0.01.go.out DEG_0.005 Counts/DEG.0.005.go.out DEG_0.001 Counts/DEG.0.001.go.out > heatmap.go.txt

ubuntu@ubuntu20:~$ ./cop/scripts/genOntoHeatmapData.pl -term 1,2 8 DEG_0.05 Counts/DEG.0.05.ko.out DEG_0.01 Counts/DEG.0.01.ko.out DEG_0.005 Counts/DEG.0.005.ko.out DEG_0.001 Counts/DEG.0.001.ko.out > heatmap.ko.txt

ubuntu@ubuntu20:~$ wc -l heatmap.*.txt
  56 heatmap.go.txt
  19 heatmap.ko.txt
  75 total

The first execution of genOntoHeatmapData.pl is for generating heatmap matrix data of GO enrichment results, where the 9th column (fsh) is for the enrichment p-values to be visualized in the heatmap. Option -term 3,2 is to specify the combination of GOID_GO Term as row names in the matrix data. Option -x 0.01 is to constrain that only GO terms with enrichment p-value<=0.01 in some DEG list would be included in the heatmap data. In other words, GO terms without any significant p-values (<=0.01) will not be included. It is advisable to adjust -x to obtain a heatmap matrix data of an appropriate size.

The second execution of genOntoHeatmapData.pl is for KEGG enrichment results, where the 8th column (right tailed fisher exact test p-value) is for the p-values to be visualized. Option -term 1,2 is to specify the combination of KEGG ID_KEGG NAME as row names in the matrix data.

By standardizing the enrichment results from different annotation categories in to the same heatmap matrix data, the R script heatmap.R can be used to draw the heatmaps. To use it, make sure R is available with R packages pheatmap and RColorBrewer installed. It is strongly suggested to modify the command inside heatmap.R to fit your needs.

ubuntu@ubuntu20:~$ mkdir Heatmap

ubuntu@ubuntu20:~$ ls heatmap.*.txt | perl -ne 'chomp; /(.+)\.txt/; $cmd="Rscript cop/scripts/heatmap.R $_ Heatmap/$1.png"; print "\nCMD: $cmd\n"; #system $cmd'

CMD: Rscript cop/scripts/heatmap.R heatmap.go.txt Heatmap/heatmap.go.png

CMD: Rscript cop/scripts/heatmap.R heatmap.ko.txt Heatmap/heatmap.ko.png

(remove the sharp mark for executing the commands)

The following two pictures are the two heatmaps. It can be observed that some ontology terms are mostly enriched in different DEG lists. This somehow tells us functional enrichment in different scales. Controlling the gene lists by parameters like fold-changes and/or regulation direction would surely give us other perspectives.

GO heatmap KEGG heatmap

Summary of the enrichment results, dataframe => point plots

The script genOntoDataframe.pl can be use to generate a dataframe of the enrichment results so that they can be further adopted by other tools. In this example, we applied genOntoDataframe.pl for making point using the R ggplot2 package.

ubuntu@ubuntu20:~$ ./cop/scripts/genOntoDataframe.pl
Usage: genOntoDataframe.pl [options] [<input name> <input file>]+
     -filter <col> <filter> : column to be filtered by the filter (default: none)
     -term <string>         : combination of columns for term definitions (default: "1")
     -data <col> <header>   : assign data column and header name (default: no columns)

ubuntu@ubuntu20:~$ ./cop/scripts/genOntoDataframe.pl -term 3,2 -data 9 pVal -data 7 count -filter 7 ">1" -filter 9 "<0.01" DEG_0.05 Counts/DEG.0.05.go.out DEG_0.01 Counts/DEG.0.01.go.out DEG_0.005 Counts/DEG.0.005.go.out DEG_0.001 Counts/DEG.0.001.go.out > dataframe.go.txt

ubuntu@ubuntu20:~$ head dataframe.go.txt
TERM    SET     pVal    count
GO:0004129_cytochrome-c oxidase activity        DEG_0.05        0.0035598925574887356   2
GO:0004386_helicase activity    DEG_0.05        0.008814439484553419    4
GO:0004386_helicase activity    DEG_0.01        0.00879306622578951     2
GO:0004386_helicase activity    DEG_0.005       0.06992305632225482     1
GO:0005215_transporter activity DEG_0.05        0.004705390123695814    15
GO:0005215_transporter activity DEG_0.01        0.12205315350291851     3
GO:0005215_transporter activity DEG_0.005       0.12318805016331086     2
GO:0005215_transporter activity DEG_0.001       0.10892799213816683     1
GO:0005886_plasma membrane      DEG_0.05        0.008183812329780885    27

Descriptions on the options:

  1. -term 3,2: use the combination of "columns 3+2" as the TERM name. Note that we are extracting data from GO enrichment tables so "columns 3+2" means GOID+GO term.
  2. -data 9 pVal: append a column with header pVal where the data cells are the fisher exact test enrichment p-values
  3. -data 7 count: append a column with header count where the data cells are numbers of input genes associated with the GO term
  4. -filter 7 ">1": extract only terms with number of associated input genes>1 in some gene list
  5. -filter 9 "<0.01": extract only terms with a fisher exact test enrichment p-value<0.01 in some gene list
  6. DEG_0.05 Counts/DEG.0.05.go.out...: use DEG_0.05 as SET for data from file Counts/DEG.0.05.go.out

Accordingly, the output file can be read by the R script point.R (requires ggplot2) and transformed into a point plot. The point plot would be organized with ontology terms as rows and gene lists as columns. The points would be sized by the intersection gene size and colored by enrichment p-avlues (afater -log10(p) transform).

ubuntu@ubuntu20:~$ Rscript cop/scripts/point.R dataframe.go.txt point.go.png

GO point plot

⚠️ **GitHub.com Fallback** ⚠️