4.4.7 Motif discovery - WangLabTHU/GPro GitHub Wiki

hcwang and qxdu edited this page on Jan 21, 2023 (1 revision).

Introduction

The sequences generated by GPro can be used for de novo transcription factor (TF) motif discovery, or to test whether known TF motifs from motif databases are enriched among them, using the MEME Suite [1]. The MEME Suite is an integrated set of web-based tools for studying sequence motifs in proteins, DNA, and RNA [1]. Its functions include motif discovery, motif enrichment analysis, motif scanning, and motif comparison.

As an example, we used SEA [2] to test for enrichment of known TF motifs and MEME [3] for de novo motif discovery, starting from sequences generated with the methods described in Quick Start.

Guidance for SEA

After executing the code in Quick Start, we obtain the following results:

/sampleFolder
    ├── checkpoints
    │   ├── cnn_k15
    │   └── wgan
    ├── optimization
    │   ├── Filter
    │   │   ├── ExpIter.csv
    │   │   ├── ExpIter.txt
    │   │   └── compared_with_natural.pdf
    │   └── Gradient
    │       └── ...
    └── evaluation
        ├── kmer_WGAN.png
        ├── mutagenesis_CNNK15.png
        ├── regression_CNNK15.png
        ├── saliency_CNNK15.png
        ├── seqlogo_CNNK15.png
        ├── seqs.txt
        └── pred.txt

We take the sequences generated by the Filter optimizer as an example. They are stored in /sampleFolder/optimization/Filter/ExpIter.txt in FASTA format:

>0
ACTTGCTGCAAAAATTTGCTTGTCATGTTGCTTTTGCTTACCATCTTGTC
>1
GTTACAAGTAGCGCCTTGCTTTTCACTTCAGCTGTTGCTAAGGTGTATCG
>2
TGTGCTCTGAAAGTCTGGCTTTTGACACTGTTTTCTGCTATAACTTATTC
>3
TCTGTCTTAAGGCGTTGACATTTTATTTTGCTATCCGTCGTAACTCTGCT
>4
TTCTGCCTTCCTATGTTGCTACCCGCTTTCTTTATGTTATCATAAACTGC
>5
GTGTGCGTATTTTTGTTGCTAATTCGGCGTTTTGTGATAAAATTCCGGCT
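Before uploading ExpIter.txt, it can be worth checking that every record is a clean DNA sequence, since characters outside A/C/G/T would be handled differently by the web tools. The following is a minimal Python sketch; the helpers `read_fasta` and `check_dna_records` are our own illustrative names, not part of GPro, and the path simply follows the Quick Start layout.

```python
from pathlib import Path

def read_fasta(text):
    """Parse FASTA text into a list of (id, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def check_dna_records(records):
    """Raise if any sequence contains non-ACGT characters; otherwise summarize."""
    for rec_id, seq in records:
        bad = set(seq) - set("ACGT")
        if bad:
            raise ValueError(f"record {rec_id} has non-ACGT characters: {sorted(bad)}")
    lengths = sorted({len(seq) for _, seq in records})
    return {"n_records": len(records), "lengths": lengths}

if __name__ == "__main__":
    # Path from the Quick Start output layout; adjust to your own sampleFolder.
    path = Path("/sampleFolder/optimization/Filter/ExpIter.txt")
    if path.exists():
        print(check_dna_records(read_fasta(path.read_text())))
```

For the records listed above, the summary should report a single sequence length of 50.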

ExpIter.txt can then be used directly as the input to SEA. We used the SEA web server with default settings, except for the "Input the sequences" and "Input the motifs" sections: we uploaded ExpIter.txt under "Input the sequences" and selected the DPINTERACT database from the E. coli DNA category under "Input the motifs", as shown in the figure.
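If the MEME Suite is installed locally, the same analysis can in principle be run from the command line instead of the web server. The sketch below only assembles and (optionally) launches a `sea` command using its `--p` (primary sequences), `--m` (motif file), and `--oc` (output directory) options; it assumes a local MEME Suite release that includes SEA, and the location of the DPINTERACT motif file (`motif_databases/ECOLI/dpinteract.meme`) is a hypothetical path that depends on how the motif databases were downloaded. Check both against the SEA documentation for your installed version.

```python
import shlex
import shutil
import subprocess
from pathlib import Path

def build_sea_command(sequences, motifs, out_dir="sea_out"):
    """Assemble a SEA command line: primary sequences, motif file, output directory."""
    return ["sea", "--p", str(sequences), "--m", str(motifs), "--oc", str(out_dir)]

if __name__ == "__main__":
    # Hypothetical locations: adjust both paths to your own setup.
    seqs = Path("/sampleFolder/optimization/Filter/ExpIter.txt")
    motifs = Path("motif_databases/ECOLI/dpinteract.meme")
    cmd = build_sea_command(seqs, motifs)
    print(shlex.join(cmd))
    # Only launch SEA if it is on PATH and both input files actually exist.
    if shutil.which("sea") and seqs.exists() and motifs.exists():
        subprocess.run(cmd, check=True)
```

Keeping the command construction in its own function makes it easy to inspect the exact invocation before running a potentially long job.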

After about 20 seconds, SEA returns the motif enrichment results, as shown in the figure.
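The question behind an enrichment result is whether a motif occurs in more of the input sequences than in a control set. As a toy illustration only (this is not SEA's algorithm: SEA works with the motif models from the selected database and builds its own control set, and its statistics differ in detail), the sketch below counts sequences containing one exact word, or its reverse complement, in the primary set and in a shuffled control set, then applies a one-sided Fisher's exact test. The word, the mononucleotide shuffle, and the helper names are all assumptions made for illustration.

```python
import math
import random

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]
    (rows: primary/control; columns: with/without a match), testing whether
    the primary set has more matches than expected."""
    n1, n2, k = a + b, c + d, a + c
    total = math.comb(n1 + n2, k)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    return sum(math.comb(n1, x) * math.comb(n2, k - x)
               for x in range(a, min(n1, k) + 1)) / total

def has_site(seq, site):
    """True if `site` or its reverse complement occurs in `seq`."""
    rc = site[::-1].translate(str.maketrans("ACGT", "TGCA"))
    return site in seq or rc in seq

def toy_enrichment(primary, site, seed=0):
    rng = random.Random(seed)
    # Control: each primary sequence with its bases shuffled (keeps base composition only).
    control = ["".join(rng.sample(s, len(s))) for s in primary]
    a = sum(has_site(s, site) for s in primary)
    c = sum(has_site(s, site) for s in control)
    p = fisher_greater(a, len(primary) - a, c, len(control) - c)
    return a, c, p
```

With only a handful of sequences the test has little power, so small example sets like the six records above mainly serve to exercise the code.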

Guidance for MEME

Following steps similar to those for SEA, we perform de novo motif discovery with MEME, again using the sequences generated by the Filter optimizer. The only setting to change is to upload ExpIter.txt in the "Input the primary sequences" section, as shown in the figure.
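For a local MEME installation, an equivalent run can be sketched as a command line. The options used here (`-dna`, `-oc`, `-mod`, `-nmotifs`, `-minw`, `-maxw`, `-revcomp`) are standard `meme` options, but the values chosen (zero-or-one occurrence per sequence, three motifs, widths 6 to 20, both strands) are illustrative assumptions, not a reproduction of the web server's defaults; compare them with the MEME documentation before relying on them.

```python
import shlex
import shutil
import subprocess
from pathlib import Path

def build_meme_command(sequences, out_dir="meme_out", mod="zoops",
                       nmotifs=3, minw=6, maxw=20, revcomp=True):
    """Assemble a MEME command line for de novo discovery in DNA sequences."""
    cmd = ["meme", str(sequences), "-dna", "-oc", str(out_dir),
           "-mod", mod, "-nmotifs", str(nmotifs),
           "-minw", str(minw), "-maxw", str(maxw)]
    if revcomp:
        cmd.append("-revcomp")  # also consider motif sites on the reverse strand
    return cmd

if __name__ == "__main__":
    seqs = Path("/sampleFolder/optimization/Filter/ExpIter.txt")  # Quick Start layout
    cmd = build_meme_command(seqs)
    print(shlex.join(cmd))
    if shutil.which("meme") and seqs.exists():
        subprocess.run(cmd, check=True)
```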

The run takes about half an hour, after which MEME displays the discovered motifs, as shown in the figure.
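MEME reports each motif as a probability matrix and a sequence logo. To make the logo's letter heights concrete, the sketch below builds a position probability matrix from a set of aligned sites and computes per-column information content as 2 bits minus the Shannon entropy. It assumes a uniform background and no small-sample correction, so the numbers will not match MEME's own output exactly; the example sites and helper names are hypothetical.

```python
import math

def position_probability_matrix(sites, pseudocount=0.0):
    """Per-position base frequencies for equal-width aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same width")
    ppm = []
    for i in range(width):
        column = [s[i] for s in sites]
        counts = {b: column.count(b) + pseudocount for b in "ACGT"}
        total = sum(counts.values())
        ppm.append({b: counts[b] / total for b in "ACGT"})
    return ppm

def information_content(ppm):
    """Bits per column: 2 - Shannon entropy (uniform DNA background)."""
    return [2.0 - sum(-p * math.log2(p) for p in col.values() if p > 0) for col in ppm]

def consensus(ppm):
    """Most frequent base at each position (ties resolved in ACGT order)."""
    return "".join(max(col, key=col.get) for col in ppm)

sites = ["TTGCT", "TTGCT", "TTGCA", "TTGCT"]  # hypothetical aligned sites
ppm = position_probability_matrix(sites)
print(consensus(ppm), [round(x, 3) for x in information_content(ppm)])
```

A fully conserved column carries 2 bits, while the last column here (three T, one A) carries about 1.19 bits, which is why it would be drawn shorter in a logo.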

Citations

[1] Bailey TL, Johnson J, Grant CE, Noble WS. The MEME Suite. Nucleic Acids Research. 2015 Jul 1;43(W1):W39-49.
[2] Bailey TL, Grant CE. SEA: simple enrichment analysis of motifs. bioRxiv. 2021 Aug 24:2021-08.
[3] Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the International Conference on Intelligent Systems for Molecular Biology. 1994;2:28-36.