Batch processing of many samples - linzhi2013/MitoZ GitHub Wiki
Suppose you have paired-end raw data files that look like this:
$ ls /abspath/to/fastq/sampleID*.fq.gz
/abspath/to/fastq/sampleID_1.1.fq.gz
/abspath/to/fastq/sampleID_1.2.fq.gz
/abspath/to/fastq/sampleID_2.1.fq.gz
/abspath/to/fastq/sampleID_2.2.fq.gz
/abspath/to/fastq/sampleID_3.1.fq.gz
/abspath/to/fastq/sampleID_3.2.fq.gz
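Then you can do the following. The awk command joins every two consecutive lines of the ls output, so each line of sample_fq.list holds one fq1/fq2 pair (this relies on ls listing each *.1.fq.gz file right before its *.2.fq.gz mate):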
$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID
$ ls /abspath/to/fastq/sampleID*.fq.gz | awk 'NR%2{printf "%s ",$0;next;}1' > sample_fq.list
$ cat sample_fq.list
/abspath/to/fastq/sampleID_1.1.fq.gz /abspath/to/fastq/sampleID_1.2.fq.gz
/abspath/to/fastq/sampleID_2.1.fq.gz /abspath/to/fastq/sampleID_2.2.fq.gz
/abspath/to/fastq/sampleID_3.1.fq.gz /abspath/to/fastq/sampleID_3.2.fq.gz
$ cat sample_fq.list | perl -ne '
chomp;
my @a=split /\s+/; # split the paths of fq1 and fq2 into the array @a
my $b=(split /\//, $a[0])[-1]; # get the basename of fq1, e.g. "sampleID_1.1.fq.gz"
my $sample=(split /\./, $b)[0]; # extract the sample ID, e.g. "sampleID_1". Adjust this if your files are named differently.
mkdir $sample;
chdir $sample;
`echo "mitoz all --fq1 $a[0] --fq2 $a[1] --outprefix $sample --thread_number 8 --clade Chordata --genetic_code 2 --insert_size 250 --fastq_read_length 150 --assembler mitoassemble --kmers 51,71,91 --requiring_taxa Chordata" > mitoz.sh`;
`qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh` ;
chdir "../" ; '
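If Perl is not convenient, the awk and Perl steps above can also be sketched in plain bash. This is only a sketch under stated assumptions: the helper names pair_fastq and submit_assemblies and the DRY_RUN variable are hypothetical, chosen for this illustration; the block assumes the <sample>.1.fq.gz / <sample>.2.fq.gz naming shown above, absolute paths (each job runs inside its own sample directory), and an SGE-style qsub. Unlike the awk step, pair_fastq matches mates by file name instead of relying on ls order, and warns about any read-1 file without a mate.

```shell
#!/usr/bin/env bash
# Sketch only: a bash equivalent of the awk + Perl steps above.
# Assumptions: read files are named <sample>.1.fq.gz / <sample>.2.fq.gz,
# paths are absolute, and the cluster runs SGE (qsub). The mitoz and qsub
# options are copied from the Perl example; adjust them for your data.

# pair_fastq DIR PATTERN (hypothetical helper): print "fq1 fq2" for each
# read-1 file matching DIR/PATTERN.1.fq.gz whose read-2 mate exists.
pair_fastq() {
    local dir=$1 pattern=$2 fq1 fq2
    for fq1 in "$dir"/$pattern.1.fq.gz; do
        [ -e "$fq1" ] || continue                 # the glob matched nothing
        fq2="${fq1%.1.fq.gz}.2.fq.gz"             # swap the read-1 suffix for read-2
        if [ -e "$fq2" ]; then
            printf '%s %s\n' "$fq1" "$fq2"
        else
            echo "warning: no mate found for $fq1" >&2
        fi
    done
}

# submit_assemblies LIST (hypothetical helper): for each "fq1 fq2" line,
# create a directory named after the sample ID, write mitoz.sh into it and
# submit it with qsub. With DRY_RUN=1 the qsub command is only printed.
submit_assemblies() {
    local list=$1 fq1 fq2 base sample
    while read -r fq1 fq2 || [ -n "$fq1" ]; do
        [ -n "$fq1" ] || continue                 # skip blank lines
        base=$(basename "$fq1")                   # e.g. sampleID_1.1.fq.gz
        sample=${base%%.*}                        # text before the first dot, e.g. sampleID_1
        mkdir -p "$sample"
        (   # subshell, so the cd does not leak into the loop
            cd "$sample" || exit 1
            echo "mitoz all --fq1 $fq1 --fq2 $fq2 --outprefix $sample --thread_number 8 --clade Chordata --genetic_code 2 --insert_size 250 --fastq_read_length 150 --assembler mitoassemble --kmers 51,71,91 --requiring_taxa Chordata" > mitoz.sh
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "[$sample] qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh"
            else
                # read stdin from /dev/null so qsub cannot consume the list file
                qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh < /dev/null
            fi
        )
    done < "$list"
}
```

For example, after sourcing the functions in /my/workdir/projectID (quote the pattern so the function, not your shell, expands it):
$ pair_fastq /abspath/to/fastq 'sampleID*' > sample_fq.list
$ DRY_RUN=1 submit_assemblies sample_fq.list
$ submit_assemblies sample_fq.list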
About the qsub command
You can make your jobs run on specific nodes, or exclude some nodes:
# to run on only these nodes
qsub -cwd -l vf=100g -l h='(node1|node2|node3)' -q all.q -pe smp 8 mitoz.sh
# or exclude these nodes
qsub -cwd -l vf=100g -l h='!(node1|node2|node3)' -q all.q -pe smp 8 mitoz.sh
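When the node list is long, the host expression passed to -l h= can be generated instead of typed by hand. The helper host_expr below is hypothetical (it is not part of MitoZ or SGE); it only builds the (node1|node2|...) or !(node1|node2|...) string used in the two examples above.

```shell
#!/usr/bin/env bash
# Sketch: build the host expression for "-l h=" from a list of node names.
# host_expr is a hypothetical helper, not part of MitoZ or SGE.

# host_expr MODE NODE...: MODE "exclude" gives !(n1|n2|...);
# any other MODE gives (n1|n2|...).
host_expr() {
    local mode=$1
    shift
    local IFS='|'                 # "$*" joins the node names with the first character of IFS
    local joined="($*)"
    if [ "$mode" = exclude ]; then
        printf '!%s\n' "$joined"
    else
        printf '%s\n' "$joined"
    fi
}
```

For example:
$ qsub -cwd -l vf=100g -l h="$(host_expr exclude node1 node2 node3)" -q all.q -pe smp 8 mitoz.sh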
Batch annotation
If you want to annotate many samples, you can provide multiple FASTA files to the --fastafiles option of the mitoz annotate command, e.g.
--fastafiles /abspath/to/mitogenome/sampleID_1.fasta /abspath/to/mitogenome/sampleID_2.fasta
See https://github.com/linzhi2013/MitoZ/wiki/The-'annotate'-subcommand for details.
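A shell glob can supply all the FASTA files to --fastafiles in one run. The function annotate_batch and the DRY_RUN variable below are hypothetical, for illustration only; the mitoz annotate options are the ones used in the per-sample example further down this page, and the output prefix is whatever you pass as the first argument.

```shell
#!/usr/bin/env bash
# Sketch: pass every given FASTA file to a single mitoz annotate run.
# annotate_batch is a hypothetical helper. With DRY_RUN=1 the command is
# printed instead of executed.

# annotate_batch OUTPREFIX FASTA...
annotate_batch() {
    local outprefix=$1
    shift
    local -a cmd
    cmd=(mitoz annotate --outprefix "$outprefix" --fastafiles "$@"
         --thread_number 8 --clade Chordata --requiring_taxa Chordata)
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"         # show the command, joined with spaces
    else
        "${cmd[@]}"                       # run it, keeping each path a separate argument
    fi
}
```

For example (the prefix batch_annotation is only a placeholder; the shell expands the glob into one argument per file):
$ annotate_batch batch_annotation /abspath/to/mitogenome/sampleID*.fasta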
Alternatively, you can submit one annotation job per sample, as in the assembly example above.
Suppose you have FASTA files that look like this:
$ ls /abspath/to/mitogenome/sampleID*.fasta
/abspath/to/mitogenome/sampleID_1.fasta
/abspath/to/mitogenome/sampleID_2.fasta
/abspath/to/mitogenome/sampleID_3.fasta
Then you can do the following:
$ mkdir -p /my/workdir/projectID
$ cd /my/workdir/projectID
$ ls /abspath/to/mitogenome/sampleID*.fasta > fasta.list
$ cat fasta.list | perl -ne '
chomp;
my $b=(split /\//, $_)[-1]; # to get the basename, e.g. "sampleID_1.fasta"
my $sample=(split /\./, $b)[0]; # to get the sample ID, e.g. "sampleID_1". You may need to change this.
mkdir $sample;
chdir $sample;
`echo "mitoz annotate --outprefix $sample --fastafiles $_ --thread_number 8 --clade Chordata --requiring_taxa Chordata" > mitoz.sh`;
`qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh` ;
chdir "../" ; '
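The Perl loop above can also be sketched in plain bash. The function submit_annotations and the DRY_RUN variable are hypothetical; the block assumes fasta.list holds one absolute FASTA path per line (each job runs inside its own sample directory) and an SGE-style qsub, and it copies the mitoz and qsub options from the Perl example.

```shell
#!/usr/bin/env bash
# Sketch only: a bash equivalent of the Perl annotation loop above.
# Assumes one absolute FASTA path per line in LIST and an SGE qsub.
# With DRY_RUN=1 the qsub command is printed instead of executed.

# submit_annotations LIST (hypothetical helper)
submit_annotations() {
    local list=$1 fasta sample
    while read -r fasta || [ -n "$fasta" ]; do
        [ -n "$fasta" ] || continue               # skip blank lines
        sample=$(basename "$fasta")               # e.g. sampleID_1.fasta
        sample=${sample%%.*}                      # text before the first dot, e.g. sampleID_1
        mkdir -p "$sample"
        (   # subshell, so the cd does not leak into the loop
            cd "$sample" || exit 1
            echo "mitoz annotate --outprefix $sample --fastafiles $fasta --thread_number 8 --clade Chordata --requiring_taxa Chordata" > mitoz.sh
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "[$sample] qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh"
            else
                # read stdin from /dev/null so qsub cannot consume the list file
                qsub -cwd -l vf=100g -q all.q -pe smp 8 mitoz.sh < /dev/null
            fi
        )
    done < "$list"
}
```

For example, from /my/workdir/projectID:
$ DRY_RUN=1 submit_annotations fasta.list
$ submit_annotations fasta.list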