dadi - christianparobek/cambodiaWGS GitHub Wiki

One way to get at the demography question, using inference (rather than coalescent MCMC simulations). Specifically, using dadi. This is what Chang et al. used.

####Installation

Sandeep helped me set it up on Kure as a python module:

Get the site-packages directory working

module add python/2.6.5
cd ~
mkdir -p lib/python2.6/site-packages/ ## (This is where dadi will live)
cd /netscr/prchrist/installs ## (Dir where you download the zip file- can be whatever you want)
wget https://dadi.googlecode.com/files/dadi-1.6.3.zip

Install dadi

bsub -Is /bin/bash
unzip dadi-1.6.3.zip
cd dadi-1.6.3
export PYTHONPATH=$PYTHONPATH:/nas02/home/p/r/prchrist/lib/python2.6/site-packages
python setup.py install --prefix=/nas02/home/p/r/prchrist

You should now see the installation under

/nas02/home/p/r/prchrist/Library/lib/python2.6/site-packages

Add the follwing line to the bottom of your ~/.bashrc file and save the change:

export PYTHONPATH=$PYTHONPATH:/nas02/home/p/r/prchrist/Library/lib/python2.6/site-packages

Now, log out and back into Kure and try to import dadi into a Python session (note this package appears to require a X connection to work). To get dadi up and running:

bsub -q int -Is /bin/bash
python
import dadi

####Running dadi Before I can do anything, I need to convert our SNP data into dadi format. Here is a discussion.

What I ended up doing is recorded in the script vcf2dadi.py.

####Plotting AFS We're first going to plot AFS from Pf and Pv by site type (syn, nonsyn, genic, intergenic). Also want to graph and model against the AFS from a constant-size WF population. To model the population, I compiled ms on Kure. At a minimum, I need these parameters: ms num_indivs num_reps_to_run and an extra parameter either giving the number of sites to model, or the theta. For theta, can use mutation rate from here - 1.0–9.7×10−9 mutations per base pair per generation.

Notes about dadi:

####2015-10-07 My CP3 group (from JP's paper) is actually a single genotype (at least at the synonymous sites)! All 19 of them! So can't use it in dadi simulations. So must just look at CP1, 2, 4.

####2016-02-15 Systematically snakemake-izing dadi analysis:

  • Making dadi-formatted files
  • Calculating the AFS
  • Graphing the AFS
  • Running 1-pop models on each dataset