CNV_WES - gsudre/autodenovo GitHub Wiki

09/08/2017

Here I'm just waiting for the aligned BAMs from bcbio-nextgen before I start playing. In the meantime, I can use my currently aligned BAMs to check that the software works. I tried the sample data for XHMM, and it seems to work. But from reading their paper it looks like there will be lots of sample-specific configuration, so I need to make sure it works with my data (which is not even final).

In sum, the software packages I want to test are:

There's also CoNIFER, but most review papers don't show it in a great light, and the code is a bit old. In fact, it's breaking because PyTables changed its naming convention from camelCase to snake_case (underscores), which totally breaks CoNIFER. I fixed that, but other things are still breaking. So, for now I'll skip CoNIFER.
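For the record, the kind of fix the naming change calls for can be sketched as a small shim. This is only an illustration, not the patch I actually applied: `camel_to_snake` and `SnakeCaseShim` are hypothetical helpers, and the regex is a rough heuristic that may not cover every renamed PyTables function.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase identifier to snake_case (rough heuristic)."""
    # Put an underscore before any capital that follows a lowercase
    # letter or a digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


class SnakeCaseShim:
    """Wrap an object so old camelCase lookups fall back to snake_case.

    Hypothetical helper: it only renames attributes, it does not adapt
    changed arguments or behaviour.
    """

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Called only when normal lookup fails, so _wrapped itself is safe.
        try:
            return getattr(self._wrapped, name)
        except AttributeError:
            return getattr(self._wrapped, camel_to_snake(name))


print(camel_to_snake("openFile"))     # open_file
print(camel_to_snake("createGroup"))  # create_group
```

In principle, wrapping the imported module (something like `tables = SnakeCaseShim(tables)`) would let old-style calls such as `openFile` resolve to `open_file`, but that only covers renames, which is consistent with other things still breaking after the naming fix.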

09/12/2017

It looks like XHMM has been used for de novo research in its papers (http://onlinelibrary.wiley.com/doi/10.1002/0471142905.hg0723s81/pdf, http://www.cell.com/ajhg/pdf/S0002-9297(12)00417-X.pdf), so we can potentially just adapt that analysis/protocol to the other tools as well? The question here is whether we should get a consensus across tools first and then filter for de novo properties, or filter within each tool and then check for consensus... CNVnator also has an approach to do that in their paper (http://genome.cshlp.org/content/21/6/974.full.pdf+html). XHMM does use the fact that the samples are all trios to optimize the HMM parameters, so we could do that, but it does require a large (>25?) number of trios to get to a nice population-based statistic. Maybe we should consider the CNVnator approach instead?
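To make the ordering question concrete, here is a toy sketch of both options. Everything in it is an assumption for illustration: the `CNV` tuple, the 50% reciprocal-overlap match, the "at least 2 callers" consensus rule, and the simple "not seen in either parent" de novo definition. None of it comes from the XHMM or CNVnator papers.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # "DEL" or "DUP"


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions; 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def matches_any(call, others, min_ro=0.5):
    return any(reciprocal_overlap(call, o) >= min_ro for o in others)


def de_novo(child, father, mother, min_ro=0.5):
    """Child calls with no matching call in either parent."""
    return [c for c in child if not matches_any(c, father + mother, min_ro)]


def consensus(callsets, min_callers=2, min_ro=0.5):
    """Calls supported by at least min_callers call sets, de-duplicated."""
    kept = []
    for calls in callsets:
        for call in calls:
            support = sum(matches_any(call, other, min_ro) for other in callsets)
            if support >= min_callers and not matches_any(call, kept, min_ro):
                kept.append(call)
    return kept


def consensus_then_filter(calls_by_tool):
    members = ("child", "father", "mother")
    merged = {m: consensus([t[m] for t in calls_by_tool.values()]) for m in members}
    return de_novo(merged["child"], merged["father"], merged["mother"])


def filter_then_consensus(calls_by_tool):
    per_tool = [de_novo(t["child"], t["father"], t["mother"])
                for t in calls_by_tool.values()]
    return consensus(per_tool)


# Toy trio: both tools call the deletion in the child, one tool also sees it in the father.
deletion = CNV("chr1", 100, 200, "DEL")
trio_calls = {
    "toolA": {"child": [deletion], "father": [], "mother": []},
    "toolB": {"child": [deletion], "father": [deletion], "mother": []},
}
print(consensus_then_filter(trio_calls))  # the deletion survives
print(filter_then_consensus(trio_calls))  # empty list
```

With these toy rules the two orders disagree: consensus-first drops the father's call because only one tool made it, so the child's deletion looks de novo, while filter-first lets that single parental call veto the candidate. Which behaviour is preferable depends on how much we trust single-tool calls in the parents.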

I also just ran into this tool (cnvScan, https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-016-2374-2, https://github.com/PubuduSaneth/cnvScan/wiki/cnvScan-implementation), which does a lot of what I was thinking of doing. Reading the paper, they use a clever method to filter out false positive CNV calls, and then annotate them using clinical databases. But they don't actually merge them, so we could use SURVIVOR for that. Or, since we're doing a de novo approach, we could use that strategy for filtering anyway. Also, for the sake of completeness, they evaluated 5 callers: ExCopyDepth, ExomeDepth, ExomeCopy, CoNIFER, and XHMM. Maybe we should add them to our list as well? ExCopyDepth is their own implementation, taking features from ExomeDepth and ExomeCopy, so I won't mess with that, as the code is not out there. ExomeDepth has very poor documentation, and seems to be tailored only for hg19. I'll skip it for now. ExomeCopy seems a bit more user-friendly, but I think I'll hold off for now, as it looks like it needs the sample matrix as well. It might be needed in the future, but let's stick with the current approaches for now, and work on combining them. We can always add other ones in the future.