dadi - christianparobek/cambodiaWGS GitHub Wiki
One way to get at the demography question, using inference (rather than coalescent MCMC simulations). Specifically, using dadi. This is what Chang et al. used.
####Installation
Sandeep helped me set it up on Kure as a python module:
Get the site-packages directory working
module add python/2.6.5
cd ~
mkdir -p lib/python2.6/site-packages/ ## (This is where dadi will live)
cd /netscr/prchrist/installs ## (Dir where you download the zip file- can be whatever you want)
wget https://dadi.googlecode.com/files/dadi-1.6.3.zip
Install dadi
bsub -Is /bin/bash
unzip dadi-1.6.3.zip
cd dadi-1.6.3
export PYTHONPATH=$PYTHONPATH:/nas02/home/p/r/prchrist/lib/python2.6/site-packages
python setup.py install --prefix=/nas02/home/p/r/prchrist
You should now see the installation under
/nas02/home/p/r/prchrist/Library/lib/python2.6/site-packages
Add the follwing line to the bottom of your ~/.bashrc file and save the change:
export PYTHONPATH=$PYTHONPATH:/nas02/home/p/r/prchrist/Library/lib/python2.6/site-packages
Now, log out and back into Kure and try to import dadi into a Python session (note this package appears to require a X connection to work). To get dadi up and running:
bsub -q int -Is /bin/bash
python
import dadi
####Running dadi
Before I can do anything, I need to convert our SNP data into dadi format. Here is a discussion.
What I ended up doing is recorded in the script vcf2dadi.py.
####Plotting AFS
We're first going to plot AFS from Pf and Pv by site type (syn, nonsyn, genic, intergenic). Also want to graph and model against the AFS from a constant-size WF population. To model the population, I compiled ms on Kure. At a minimum, I need these parameters: ms num_indivs num_reps_to_run and an extra parameter either giving the number of sites to model, or the theta. For theta, can use mutation rate from here - 1.0–9.7×10−9 mutations per base pair per generation.
Notes about dadi:
dadicalculates theta differently than the other parameters. Apparently, theta is easy to find once we've optimized the other parameters. Supplying a fixed value of theta can be useful if we have it and don't want to fit it from the data.
####2015-10-07
My CP3 group (from JP's paper) is actually a single genotype (at least at the synonymous sites)! All 19 of them! So can't use it in dadi simulations. So must just look at CP1, 2, 4.
####2016-02-15
Systematically snakemake-izing dadi analysis:
- Making dadi-formatted files
- Calculating the AFS
- Graphing the AFS
- Running 1-pop models on each dataset