data - gsudre/autodenovo GitHub Wiki

09/13/2017

I didn't want to wait for bcbio to finish before getting the final files aligned to hg38, so I created symlinks in fake_trios/bam pointing to the BAMs under work/. I need to update them later!

while read -r s; do ln -s "/data/NCR_SBRB/fake_trios/gatk_unique_and_population/work/align/${s}/${s}-sort.bam" "${s}-sort.bam"; done < ../sample_ids.txt
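These links will have to be re-pointed once bcbio finishes, so a small helper along these lines could turn the update into a one-liner. It is a sketch under assumptions: the function `link_bams` and the `@ID@` placeholder are hypothetical, link names keep the `<id>-sort.bam` form used above, and the location of the final BAMs is left for the caller to supply. It uses `ln -sfn` so rerunning with a new template replaces the existing links, and it skips samples whose BAM doesn't exist instead of leaving dangling links.

```shell
#!/usr/bin/env bash
# Hypothetical helper: (re)create one symlink per sample ID in the current directory.
# Usage: link_bams IDS_FILE TEMPLATE
#   TEMPLATE is a source path in which every "@ID@" is replaced by the sample ID.
# Links are always named <ID>-sort.bam, whatever the source file is called.
link_bams() {
  local ids_file=$1 template=$2 s src
  while IFS= read -r s || [ -n "$s" ]; do
    [ -z "$s" ] && continue                 # skip blank lines
    src=${template//@ID@/$s}                # fill in the sample ID
    if [ ! -e "$src" ]; then
      echo "missing: $src" >&2              # report instead of creating a dangling link
      continue
    fi
    # -f replaces an existing link; -n treats an existing link as a plain file,
    # so rerunning with a different template re-points the links
    ln -sfn "$src" "${s}-sort.bam"
  done < "$ids_file"
}
```

For example, from fake_trios/bam: `link_bams ../sample_ids.txt /data/NCR_SBRB/fake_trios/gatk_unique_and_population/work/align/@ID@/@ID@-sort.bam`, and later the same call with a template for wherever the final BAMs end up.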

09/20/2017

To take advantage of tools like XHMM, and to test my scripts, I handpicked most (if not all) of the possible trios and quartets in my family data and called the set big_fake_simplex. I then adapted Sijun's script to call variants in single samples; joint calling comes later. The goal is to produce the joint-called VCFs for use with triodenovo, and to use the BAMs with the other CNV tools.

Note that I could also feed the mpileup VCF from denovogear into triodenovo, or the GATK VCF into denovogear, so there are several combinations to try.

In any case, this is how I built and submitted the swarm:

while read s; do echo "bash ~/autodenovo/gatk_upToSingleCalls.sh $s" >> swarm.bfs; done < sample_ids.txt
swarm -f swarm.bfs -t 16 -g 55 --job-name gatk1 --logdir trash --time=48:00:00 --gres=lscratch:100
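The loop above appends with `>>`, so running it twice leaves every command in swarm.bfs twice. A sketch of a helper that rebuilds the file from scratch instead; the name `make_swarm` is hypothetical, and the command written for each sample is the same as above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one swarm command per sample, overwriting any previous file.
# Usage: make_swarm IDS_FILE SWARM_FILE
make_swarm() {
  local ids_file=$1 swarm_file=$2 s
  : > "$swarm_file"                          # create or truncate the swarm file
  while IFS= read -r s || [ -n "$s" ]; do
    [ -z "$s" ] && continue                  # skip blank lines
    # "~" is written literally, as in the original loop; bash expands it when the line runs
    printf 'bash ~/autodenovo/gatk_upToSingleCalls.sh %s\n' "$s" >> "$swarm_file"
  done < "$ids_file"
}
```

Running `make_swarm sample_ids.txt swarm.bfs` and comparing `wc -l swarm.bfs` with the number of samples before calling `swarm` is a cheap sanity check. In the swarm call above, `-t 16` and `-g 55` request 16 threads and 55 GB of memory per command, and `--gres=lscratch:100` requests 100 GB of local scratch.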

09/22/2017

Then, to figure out which samples had finished:

while read s; do if [ -e VCF/${s}/${s}.g.vcf.idx ]; then echo $s; fi; done < sample_ids.txt
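While jobs are still running, the inverse check may be more useful: list the samples that are not done yet and print a summary. This sketch treats the presence of `VCF/<id>/<id>.g.vcf.idx` as completion, exactly as the loop above does; the function name `check_done` and its return convention are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split samples into finished/pending using the .g.vcf.idx marker.
# Usage: check_done IDS_FILE [VCF_DIR]
# Prints "pending: <id>" for each unfinished sample, then a summary line.
# Returns non-zero if any sample is still pending.
check_done() {
  local ids_file=$1 vcf_dir=${2:-VCF} s
  local done_n=0 pending_n=0
  while IFS= read -r s || [ -n "$s" ]; do
    [ -z "$s" ] && continue                  # skip blank lines
    if [ -e "$vcf_dir/$s/$s.g.vcf.idx" ]; then
      done_n=$((done_n + 1))
    else
      pending_n=$((pending_n + 1))
      echo "pending: $s"
    fi
  done < "$ids_file"
  echo "finished: $done_n, pending: $pending_n"
  [ "$pending_n" -eq 0 ]                     # exit status: 0 only when everything is done
}
```

Because it returns non-zero while anything is pending, it could also guard the next step, e.g. `check_done sample_ids.txt && bash ~/autodenovo/gatk_jointCalling.sh /data/NCR_SBRB/big_fake_simplex/sample_ids.txt`.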

09/25/2017

Now that all samples have finished, it's time for joint calling:

bash ~/autodenovo/gatk_jointCalling.sh /data/NCR_SBRB/big_fake_simplex/sample_ids.txt

I might need to tune some of these parameters to make it run faster. At the moment I'm on an interactive node with -c 32 --mem 120G, but it doesn't look like it's going to finish in time: the current prediction is 50h :( Adding the multithreading option brings it down to 6h, so not bad.