4.4.7 Motif discovery - WangLabTHU/GPro GitHub Wiki
hcwang and qxdu edited on January 21, 2023, 1 version
The sequences generated by GPro can be used to perform de novo TF motif discovery or to test the enrichment of known TF motifs from motif databases using the MEME Suite [1]. The MEME Suite is an integrated set of web-based tools for studying sequence motifs in proteins, DNA, and RNA. Its functions include motif discovery, motif enrichment, motif scanning, and motif comparison.
As an example, we used SEA [2] to test the enrichment of known TF motifs and MEME [3] for de novo motif discovery. We start with sequences generated by the methods described in Quick Start.
After executing the code in Quick Start, we obtain the following results:
/sampleFolder
├── checkpoints
│   ├── cnn_k15
│   └── wgan
├── optimization
│   ├── Filter
│   │   ├── ExpIter.csv
│   │   ├── ExpIter.txt
│   │   └── compared_with_natural.pdf
│   └── Gradient
│       └── ...
└── evaluation
    ├── kmer_WGAN.png
    ├── mutagenesis_CNNK15.png
    ├── regression_CNNK15.png
    ├── saliency_CNNK15.png
    ├── seqlogo_CNNK15.png
    ├── seqs.txt
    └── pred.txt
We take the sequences generated by the Filter optimizer as an example. These sequences are stored in /sampleFolder/optimization/Filter/ExpIter.txt in FASTA format:
>0
ACTTGCTGCAAAAATTTGCTTGTCATGTTGCTTTTGCTTACCATCTTGTC
>1
GTTACAAGTAGCGCCTTGCTTTTCACTTCAGCTGTTGCTAAGGTGTATCG
>2
TGTGCTCTGAAAGTCTGGCTTTTGACACTGTTTTCTGCTATAACTTATTC
>3
TCTGTCTTAAGGCGTTGACATTTTATTTTGCTATCCGTCGTAACTCTGCT
>4
TTCTGCCTTCCTATGTTGCTACCCGCTTTCTTTATGTTATCATAAACTGC
>5
GTGTGCGTATTTTTGTTGCTAATTCGGCGTTTTGTGATAAAATTCCGGCT
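Before uploading the file, it can be useful to confirm that it parses as FASTA and contains only DNA letters. The following is a minimal sketch in plain Python; `read_fasta` and `check_dna` are hypothetical helper names introduced here for illustration and are not part of GPro.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            if name in records:
                raise ValueError(f"duplicate header: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            # Append so that sequences wrapped over several lines still work.
            records[name] += line.upper()
    return records


def check_dna(records: Dict[str, str]) -> None:
    """Raise ValueError if a sequence is empty or has letters other than ACGT."""
    for name, seq in records.items():
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"invalid DNA sequence in record {name!r}")


# The first two records from the listing above, used as in-memory input.
example = [
    ">0",
    "ACTTGCTGCAAAAATTTGCTTGTCATGTTGCTTTTGCTTACCATCTTGTC",
    ">1",
    "GTTACAAGTAGCGCCTTGCTTTTCACTTCAGCTGTTGCTAAGGTGTATCG",
]
records = read_fasta(example)
check_dna(records)
print(len(records), "records parsed")
```

To check the real file, pass an open file handle instead of the list, for example `with open("/sampleFolder/optimization/Filter/ExpIter.txt") as fh: records = read_fasta(fh)`.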
ExpIter.txt can then be used as the input to SEA. We used the SEA web server with default settings, except in the sections Input the sequences and Input the motifs. We set ExpIter.txt in Input the sequences, and E. coli DNA and DPINTERACT in Input the motifs, as shown in the figure.
After about 20 seconds, SEA returns the motif enrichment results, as shown in the figure.
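The walkthrough above uses the web server. As an alternative, a local MEME Suite installation provides a `sea` command that takes the primary sequences via `--p` and a motif file via `--m`. The sketch below only assembles such a command and runs it if `sea` is found on the PATH. The helper names, the output directory, and the motif-file path (a hypothetical local copy of the DPINTERACT motifs in MEME format) are assumptions for illustration, and the local defaults may not match the web server settings exactly.

```python
import shutil
import subprocess
from typing import List


def build_sea_command(seq_file: str, motif_file: str, out_dir: str) -> List[str]:
    """Assemble a SEA command line (hypothetical helper, not part of GPro)."""
    # --p: primary sequences, --m: motif file, --oc: output dir (overwritten).
    return ["sea", "--p", seq_file, "--m", motif_file, "--oc", out_dir]


def run_if_available(cmd: List[str]) -> bool:
    """Run cmd only if its executable is on PATH; return whether it was run."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


sea_cmd = build_sea_command(
    "/sampleFolder/optimization/Filter/ExpIter.txt",
    "motif_databases/dpinteract.meme",  # hypothetical path to a local motif file
    "sea_out",                          # hypothetical output directory
)
print(" ".join(sea_cmd))
```

Calling `run_if_available(sea_cmd)` would execute the analysis when the MEME Suite is installed, and simply return `False` otherwise.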
As with SEA, we perform de novo motif discovery using MEME, again with the sequences generated by the Filter optimizer. The only setting we need to change is to provide ExpIter.txt in the Input the primary sequences section, as shown in the figure.
The run takes about half an hour. The motifs discovered by MEME are then presented, as shown in the figure.
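For completeness, a comparable run could be launched with the `meme` command of a local MEME Suite installation. This is a sketch only: `build_meme_command` is a hypothetical helper, and the model (`zoops`), the number of motifs, and the output directory are illustrative values, not settings confirmed by the walkthrough above.

```python
import shlex
from typing import List


def build_meme_command(
    seq_file: str,
    out_dir: str,
    n_motifs: int = 3,
    both_strands: bool = True,
) -> List[str]:
    """Assemble a MEME command line for DNA input (hypothetical helper)."""
    cmd = [
        "meme", seq_file,
        "-dna",                      # treat the input as DNA
        "-oc", out_dir,              # output directory, overwritten if present
        "-mod", "zoops",             # zero or one motif occurrence per sequence
        "-nmotifs", str(n_motifs),   # how many motifs to report
    ]
    if both_strands:
        cmd.append("-revcomp")       # also search the reverse complement
    return cmd


meme_cmd = build_meme_command(
    "/sampleFolder/optimization/Filter/ExpIter.txt",
    "meme_out",  # hypothetical output directory
)
print(shlex.join(meme_cmd))
```

The printed command can be pasted into a shell, or passed to `subprocess.run(meme_cmd, check=True)` on a machine where the MEME Suite is installed.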
[1] Bailey T L, Johnson J, Grant C E, Noble W S. The MEME Suite. Nucleic Acids Research. 2015 Jul 1;43(W1):W39-49.
[2] Bailey T L, Grant C E. SEA: simple enrichment analysis of motifs. BioRxiv. 2021 Aug 24:2021-08.
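[3] Bailey T L, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the International Conference on Intelligent Systems for Molecular Biology. 1994;2:28-36.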