De novo assembly phylogenetics - CGRL-QB3-UCBerkeley/seqCapture GitHub Wiki
Introduction:
This module takes the cleaned reads generated by cleanpe as input and assembles them with SPAdes. SPAdes produces one final assembly by combining results from multiple k-mer lengths (default: 21, 33, and 55). For phylogenetic datasets, each individual must be assembled separately.
Command and options:
(seqCapture) $ seqCapture assemble
Usage: seqCapture assemble [options]
Options:
-reads DIR Directory with all sequence reads
-kmer INT,INT,INT... Kmer lengths chosen for SPAde assemblies
[21,33,55] (no space)
-lib CHAR ... Particular libraries to process?
(e.g. AAA BBB CCC). If -lib is not
used then process all libraries in
The folder (-reads)
-out CHAR Directory where results will go
-np INT number of processors used for assembly
Prepare input for the run:
After cleanpe finishes, the cleaned reads of each sample are stored in the directory "cleaned_reads_dir", which is the input for this step.
(seqCapture) $ ls cleaned_reads_dir/
Sample1_1_final.fq Sample1_2_final.fq Sample1_u_final.fq Sample1.contam.out Sample2_1_final.fq Sample2_2_final.fq Sample2_u_final.fq Sample2.contam.out ...... SampleN_1_final.fq SampleN_2_final.fq SampleN_u_final.fq SampleN.contam.out
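Before starting the assemblies, it can help to confirm that every sample has both paired-read files. The sketch below is not part of seqCapture: `check_pairs` is a hypothetical helper that assumes only the `<sample>_1_final.fq` / `<sample>_2_final.fq` naming shown in the listing above, and it reports any sample whose second read file is missing or empty.

```shell
# Hypothetical helper (not a seqCapture command): report samples whose
# <sample>_2_final.fq mate file is missing or empty.
check_pairs() {
    dir="${1%/}"                          # drop a trailing slash, if any
    missing=0
    for r1 in "$dir"/*_1_final.fq; do
        [ -e "$r1" ] || continue          # glob matched nothing
        sample="$(basename "$r1" _1_final.fq)"
        if [ ! -s "$dir/${sample}_2_final.fq" ]; then
            echo "$sample: missing or empty ${sample}_2_final.fq"
            missing=1
        fi
    done
    return "$missing"                     # 0 = all paired, 1 = problem found
}
```

For example, `check_pairs cleaned_reads_dir/ && echo "all samples paired"` prints the confirmation only when no sample is reported.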
Usage examples:
Assemble every sample in "cleaned_reads_dir" and store the raw assembly of each sample in "raw_assemblies_dir", using k-mer lengths of 21, 33, 55, 77, 99, and 127 (-kmer 21,33,55,77,99,127) and allocating 10 CPUs (-np 10) to the SPAdes assemblies.
(seqCapture) $ seqCapture assemble -reads /path/to/cleaned_reads_dir/ -kmer 21,33,55,77,99,127 -out raw_assemblies_dir -np 10
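The value passed to -kmer can be sanity-checked before a long run. The sketch below is a hypothetical helper, not a seqCapture feature. It assumes the list is handed to SPAdes unchanged and applies the constraints SPAdes documents for its own k-mer option: odd values, below 128, in ascending order, comma-separated without spaces.

```shell
# Hypothetical helper (not a seqCapture command): validate a k-mer list
# such as "21,33,55" against SPAdes' documented k-mer constraints.
validate_kmers() {
    list="$1"
    case "$list" in
        # empty, non-digit/comma characters, stray commas, or leading zeros
        ''|*[!0-9,]*|,*|*,|*,,*|0*|*,0*)
            echo "invalid k-mer list: '$list'" >&2
            return 1 ;;
    esac
    prev=0
    for k in $(printf '%s' "$list" | tr ',' ' '); do
        # each value must be odd, below 128, and larger than the previous one
        if [ $((k % 2)) -eq 0 ] || [ "$k" -ge 128 ] || [ "$k" -le "$prev" ]; then
            echo "invalid k-mer: $k" >&2
            return 1
        fi
        prev="$k"
    done
    return 0
}
```

A call such as `validate_kmers 21,33,55,77,99,127 && echo ok` succeeds, while a list containing an even value or a space is rejected.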
Output:
Individual assemblies are stored in "raw_assemblies_dir", one FASTA file per sample:
(seqCapture) $ ls raw_assemblies_dir/
Sample1.fasta Sample2.fasta ...... SampleN.fasta
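A quick way to check that every sample produced an assembly is to count the contigs in each FASTA file. The sketch below is a hypothetical helper, not part of seqCapture. It assumes only that each output file is standard FASTA, so every contig starts with a ">" header line.

```shell
# Hypothetical helper (not a seqCapture command): print
# "<file><TAB><number of contigs>" for each assembly in a directory.
count_contigs() {
    dir="${1%/}"                              # drop a trailing slash, if any
    for fa in "$dir"/*.fasta; do
        [ -e "$fa" ] || continue              # glob matched nothing
        # grep -c exits non-zero when the count is 0, so ignore its status
        n="$(grep -c '^>' "$fa" || true)"
        printf '%s\t%s\n' "$(basename "$fa")" "$n"
    done
}
```

Running `count_contigs raw_assemblies_dir/` lists one line per sample. A count of 0 points to an empty assembly that is worth investigating before downstream steps.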