Arabidopsis lyrata - PacificBiosciences/DevNet GitHub Wiki

The data set comprises 84 runs of A. lyrata, yielding ~45X coverage of the diploid (2n) A. lyrata genome. The library was prepared from ~24 µg of genomic DNA. The sheared DNA was split into two parts, which were size-selected on a BluePippin at 7 kb and 15 kb. Both libraries were run with the P5 chemistry.

Running the data through HGAP and the Celera Assembler yields a diploid assembly of ~353 Mb with an N50 of 252 kb.
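For readers who want to check the size and N50 figures against a downloaded FASTA file, the calculation can be sketched as follows. This is a minimal illustration, not the tool used to produce the numbers above; the helper names `fasta_lengths` and `n50` are hypothetical, and N50 is taken here to be the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly size.

```python
def fasta_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0
```

Calling `sum(fasta_lengths(path))` gives the total assembly size in bases, and `n50(fasta_lengths(path))` gives the N50 in bases, where `path` points to whichever assembly FASTA file you downloaded.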

The directory "9-terminator" contains the output of the Celera Assembler. We typically treat asm.ctg.fasta + asm.deg.fasta as the main draft assembly. The file asm.deg.fasta holds the contigs that the Celera Assembler classifies as degenerate; these are likely to be collapsed repeats.
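Combining the two files into a single draft can be done with any FASTA-aware tool or a plain concatenation. The sketch below is one hedged way to do it in Python; the function name `concatenate_fasta` and the output file name in the usage note are assumptions made for illustration and are not part of the data set.

```python
def concatenate_fasta(input_paths, output_path):
    """Write the records of each input FASTA file, in order, to output_path.

    Blank lines are dropped and every line is newline-terminated, so a
    record from one file never runs into the header of the next.
    Returns the number of records (header lines) written.
    """
    records = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue
                    if line.startswith(">"):
                        records += 1
                    out.write(line + "\n")
    return records
```

For example, `concatenate_fasta(["9-terminator/asm.ctg.fasta", "9-terminator/asm.deg.fasta"], "draft.fasta")` would write both contig sets to a hypothetical `draft.fasta`, assuming the directory has been downloaded to the current working directory.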

The directory "Quiver_Polished" contains the final assembly after polishing with one round of the Quiver software (quiver.1.xml). This step should reduce the error rate of the assembly.

The dataset can be downloaded here: http://datasets.pacb.com.s3.amazonaws.com/2014/Arabidopsis-lyrata/list.html