Start - YeoLab/peak-simulator GitHub Wiki

Welcome to the CLIP-seq peak simulator wiki.

This wiki documents the distributions and assumptions used by the CLIP-seq peak simulator, and will be updated as development continues.

At the moment I am simply reimplementing a peak simulation algorithm, originally written in R, in Python so that it can be extended in the future (Zhang, Z. D., Rozowsky, J., Snyder, M., Chang, J. & Gerstein, M. Modeling ChIP sequencing in silico with applications. PLoS Computational Biology 4, e1000158 (2008)).

I have followed the gamma background distribution, but slightly changed the attachment algorithm.
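As a rough illustration of a gamma background, the sketch below draws one non-negative weight per genomic position using only the Python standard library. The function name `gamma_background` and the `shape` and `scale` values are placeholders chosen for this example; they are not taken from the simulator or from Zhang et al.

```python
import random

def gamma_background(n_positions, shape=1.0, scale=1.0, seed=None):
    """Draw one background weight per position from a gamma distribution.

    shape and scale are illustrative assumptions, not fitted parameters.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n_positions)]

# Example: background weights for a 1000-position region.
background = gamma_background(1000, shape=1.0, scale=1.0, seed=0)
```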

Using the average background weight across each peak as a starting point, I repeatedly select a peak, with the selection biased toward peaks that already have higher weights. Zhang et al. claim that this produces a power-law-like distribution of peak heights, but I observe peaks of approximately equal height. This is being worked on.
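The selection step described above might look roughly like the following. This is a minimal sketch of weight-biased (preferential) selection, not the code in this repository: the helper names, the half-open `(start, end)` peak intervals, the fixed `increment`, and the number of steps are all assumptions made for illustration.

```python
import random

def mean_weight(background, start, end):
    """Average background weight over the half-open interval [start, end)."""
    return sum(background[start:end]) / (end - start)

def preferential_attachment(initial_weights, n_steps, increment=1.0, seed=None):
    """Repeatedly pick a peak with probability proportional to its current
    weight and add `increment` to the chosen peak (hypothetical update rule)."""
    rng = random.Random(seed)
    weights = list(initial_weights)
    indices = range(len(weights))
    for _ in range(n_steps):
        chosen = rng.choices(indices, weights=weights, k=1)[0]
        weights[chosen] += increment
    return weights

# Toy example: three peaks over a six-position background.
background = [0.5, 1.5, 1.0, 1.0, 2.0, 2.0]
peaks = [(0, 2), (2, 4), (4, 6)]
start_weights = [mean_weight(background, s, e) for s, e in peaks]
final_weights = preferential_attachment(start_weights, n_steps=100, seed=1)
```

In a sketch like this, how far the final heights spread apart depends on the size of `increment` relative to the starting weights, so those two quantities are worth varying when comparing against the expected power-law-like behaviour.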

I plan on adding other methods of distributing weight in the future.

Reads are distributed according to the weight at the starting location of each read divided by the total number of reads to be assigned. This method is currently sub-optimal because it assigns only fractions of reads rather than all of them, and it will not work for very low coverage simulations.
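To make the low-coverage problem concrete, the sketch below assumes that each position receives a share of the reads proportional to its weight, which is one reading of the description above. It then shows how truncating those fractional shares loses reads, and contrasts that with drawing whole read start positions at random in proportion to weight. The sampling approach is an alternative technique shown for comparison, not what the simulator currently does, and all function names are hypothetical.

```python
import random

def expected_reads(weights, n_reads):
    """Fractional number of reads per position, proportional to weight."""
    total = sum(weights)
    return [n_reads * w / total for w in weights]

def sample_read_starts(weights, n_reads, seed=None):
    """Alternative: draw n_reads start positions with probability
    proportional to weight, so every read is assigned to one position."""
    rng = random.Random(seed)
    starts = rng.choices(range(len(weights)), weights=weights, k=n_reads)
    counts = [0] * len(weights)
    for position in starts:
        counts[position] += 1
    return counts

weights = [1, 2, 3, 4]
fractional = expected_reads(weights, n_reads=3)  # every share is below 2 reads
truncated = [int(x) for x in fractional]         # whole reads only; most are lost
sampled = sample_read_starts(weights, n_reads=3, seed=0)
```

With only three reads to place, truncating the fractional shares assigns a single read, while the sampled counts always sum to the requested three.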

Additionally, reads are not assigned in a strand-specific manner, and the genome cannot be converted into a transcriptome.