H_sapiens_54x_release - PacificBiosciences/DevNet GitHub Wiki

Instrument:  PacBio RS II
Chemistry:  C3
Enzyme: P5

Summary

The dataset released here contains the raw sequence data from PacBio(R) SMRT(R) Sequencing of CHM1htert, a human cell line derived from a hydatidiform mole, as a resource for general community exploration. One ~20 kb long-insert shotgun library was prepared from the same DNA sample. Size selection was performed using 7.5 kb and 10 kb elution cutoffs on a BluePippin(TM) DNA size-selection system from SAGE Science. The genome was sequenced using P5-C3 chemistry and 3-hour SMRT Cell acquisitions to generate ~167 Gb of sequence data.




Sequencing Data Statistics
Total number of reads: 21,856,161
Total number of post-filtered bases: 167,851,128,644 bp

Read length statistics
Half of sequenced bases in reads greater than: 10,739 bp
5% of reads longer than: 19,060 bp
Average read length: 7,680 bp

SMRTbell template statistics
Longest DNA insert sequenced: 42,774 bp
Average throughput/SMRT Cell: 608 Mb
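
As a quick sanity check, the figures above can be cross-checked with a little arithmetic. The sketch below recomputes the average read length from the total bases and read count, and estimates fold coverage. The helper names mean_read_length and fold_coverage are hypothetical, and the 3.1 Gb genome size is an assumption chosen for illustration; it is not stated on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cross-checking the published statistics.

# Mean read length in bp: total bases divided by read count, rounded.
mean_read_length() {
    awk -v bases="$1" -v reads="$2" 'BEGIN { printf "%.0f\n", bases / reads }'
}

# Fold coverage: total bases divided by genome size, to one decimal place.
fold_coverage() {
    awk -v bases="$1" -v genome="$2" 'BEGIN { printf "%.1f\n", bases / genome }'
}

# Total post-filtered bases and total reads are taken from the statistics above.
mean_read_length 167851128644 21856161    # prints 7680

# ASSUMPTION: a human genome size of roughly 3.1 Gb, used only for illustration.
fold_coverage 167851128644 3100000000     # prints 54.1
```

Under that assumed genome size, the estimate agrees with the ~54x coverage in the release name.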

Download Dataset

To access the dataset, please navigate to http://datasets.pacb.com/2014/Human54x/fast.html. To reference the blog post, please visit http://blog.pacificbiosciences.com/2014/02/data-release-54x-long-read-coverage-for.html

To download, you can use wget or curl to work through the list of files. For example, in bash, save the list as a file named file_list and use a simple loop to download the files:

for f in $(cat file_list); do wget "$f"; done

A more powerful download command is:

cat file_list | xargs -n 1 -P 4 wget --continue -P data/ #Download four files at once. Continue downloads if they are interrupted.
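
If the saved list contains blank lines, Windows line endings, or duplicate entries, it may help to clean it before handing it to xargs. The following is a minimal sketch, not an official PacBio script: the function names clean_list and download_all are hypothetical, and it assumes file_list holds one URL per line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the parallel wget command shown above.

# Keep only lines that look like HTTP(S) URLs, strip carriage returns,
# and drop duplicate entries.
clean_list() {
    tr -d '\r' < "$1" | grep -E '^https?://' | sort -u
}

# Download every URL in the cleaned list, four at a time, resuming any
# partially downloaded files. The destination defaults to data/.
download_all() {
    local list="$1" dest="${2:-data}"
    mkdir -p "$dest"
    clean_list "$list" | xargs -n 1 -P 4 wget --continue -P "$dest"
}

# Usage (requires network access, so it is left commented out here):
#   download_all file_list data
```

Because wget is called with --continue, rerunning download_all after an interruption should pick up partial files rather than start them over.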

Please contact us via Twitter @PacBio. We would appreciate it if you followed us on Twitter so that we can reply by direct message.