Human_Microbiome_Project_MockB_Shotgun - PacificBiosciences/DevNet GitHub Wiki

Instrument:  PacBio RS II
Chemistry:  C2 & C3
Enzyme: P4 & P5
Data were collected with both P4-C2 and P5-C3, as indicated in the table below.

Just as the de novo assembly of individual genomes is dramatically improved by the long read lengths of SMRT® Sequencing, the assembly of metagenomes should also benefit, with a significantly better ability to distinguish between members of the community. As a proof of concept to test this hypothesis, Pacific Biosciences sequenced a mock community from the Human Microbiome Project and assembled the data with HGAP, the same algorithm used to assemble single microbial genomes.

Posted here are shotgun sequencing data from the Mock Community B sample from the Human Microbiome Project. These files contain the sequencing read data only, not an assembly. The data are split into several sets to keep individual file sizes manageable. Some preliminary example results of the assembly are also shown at the bottom of the page.

The mock community was obtained through BEI Resources, NIAID, NIH as part of the Human Microbiome Project: Genomic DNA from Microbial Mock Community B (Even, High Concentration), v5.1H, for Whole Genome Shotgun Sequencing, HM-276D. http://www.beiresources.org/Catalog/otherProducts/HM-276D.aspx

Data files:

SMRT Cell summary statistics:

| Data file | Sample Name | Binding Tube | Polymerase | Chemistry | Acq Time (min) | Post-Filter # Subreads | Post-Filter # Bases |
|---|---|---|---|---|---|---|---|
| hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 86148 | 379137670 |
| hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 81868 | 360470933 |
| hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 74202 | 325257630 |
| hmp_set1 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 56766 | 239910920 |
| hmp_set1 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 77952 | 334252160 |
| hmp_set1 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 84656 | 403816210 |
| hmp_set1 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 96587 | 478724129 |
| hmp_set2 | BEI high even metagenomic_75 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 88561 | 426776736 |
| hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 60997 | 277183852 |
| hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 53994 | 236239138 |
| hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 37144 | 153559945 |
| hmp_set2 | BEI high even metagenomic_25 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 16972 | 71041660 |
| hmp_set2 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 82662 | 330381041 |
| hmp_set2 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 68918 | 252186962 |
| hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 72172 | 263299931 |
| hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 82525 | 326780667 |
| hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 71059 | 268035345 |
| hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 64942 | 261233340 |
| hmp_set3 | BEI high even metagenomic_50nM | BEI high even metagenomic_bt_1 | P4 | C2 | 180 | 45375 | 176070055 |
| hmp_set3 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 57429 | 197140754 |
| hmp_set3 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 36045 | 124101501 |
| hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 54251 | 216380686 |
| hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 79171 | 308755441 |
| hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 60845 | 237953531 |
| hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 82003 | 317202529 |
| hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 79972 | 316034059 |
| hmp_set4 | BEI high even metagenomic_50 pM | BEI high even metagenomic_bt_1 | P4 | C2 | 120 | 72257 | 274021988 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 71388 | 383313431 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 71115 | 373244879 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 70456 | 366785650 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 69065 | 357261288 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 66999 | 346936491 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 70243 | 349011680 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 68937 | 337872541 |
| hmp_set5 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 65010 | 320675096 |
| hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 80562 | 381691078 |
| hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 66540 | 324761102 |
| hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 78035 | 371404939 |
| hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 73278 | 349280176 |
| hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 61150 | 289192246 |
| hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 66913 | 314699834 |
| hmp_set6 | BEI high even metagenomic_50pM | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 63319 | 291515819 |
| hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 58617 | 262376178 |
| hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 59557 | 266423085 |
| hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 43703 | 196536274 |
| hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 49685 | 224514646 |
| hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 44409 | 209207825 |
| hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 40569 | 181415565 |
| hmp_set7 | BEI high even metagenomic_ | BEI high even metagenomic_bt_2 | P5 | C3 | 150 | 35858 | 162613189 |
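
As a quick sanity check on the table above, the mean subread length for a group of SMRT Cells is the post-filter base count divided by the post-filter subread count. The following is a minimal sketch in Python (a language chosen for illustration; the page itself does not name one). It uses four rows copied from the table, and the names `CELLS` and `summarize_by_chemistry` are hypothetical, not part of any PacBio tool.

```python
from collections import defaultdict

# Four SMRT Cells copied from the summary table above:
# (data file, polymerase, chemistry, post-filter subreads, post-filter bases)
CELLS = [
    ("hmp_set1", "P4", "C2", 86148, 379137670),
    ("hmp_set1", "P4", "C2", 81868, 360470933),
    ("hmp_set5", "P5", "C3", 71388, 383313431),
    ("hmp_set5", "P5", "C3", 71115, 373244879),
]


def summarize_by_chemistry(cells):
    """Total subreads and bases per polymerase-chemistry pair, plus mean subread length."""
    totals = defaultdict(lambda: {"subreads": 0, "bases": 0})
    for _data_file, polymerase, chemistry, subreads, bases in cells:
        key = f"{polymerase}-{chemistry}"
        totals[key]["subreads"] += subreads
        totals[key]["bases"] += bases

    summary = {}
    for key, t in totals.items():
        summary[key] = {
            "subreads": t["subreads"],
            "bases": t["bases"],
            # Mean subread length in bases: total bases / total subreads.
            "mean_subread_length": t["bases"] / t["subreads"],
        }
    return summary


if __name__ == "__main__":
    for key, stats in summarize_by_chemistry(CELLS).items():
        print(f"{key}: {stats['subreads']} subreads, {stats['bases']} bases, "
              f"mean subread length {stats['mean_subread_length']:.0f} bases")
```

For these four cells the mean subread length works out to roughly 4.4 kb for P4-C2 and 5.3 kb for P5-C3. Those figures cover only the rows included in the sketch, not the full data set; extending `CELLS` with the remaining rows gives the per-chemistry totals for the whole release.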

The preliminary results of this analysis are promising: the assemblies generally show improved contiguity compared with publicly available short-read data sets and assemblies. In addition, using epigenetic information to associate contigs with one another may further improve the shotgun metagenome assembly. This would be a novel validation method available only with PacBio sequencing, which detects epigenetic modifications during single-molecule sequencing. We expect methylation-based associations between contigs to prove more reliable than other strategies, such as binning by GC content, because the methylation profile of each species in the community should, as a fundamental biological principle, be consistent across its genome.
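
For readers unfamiliar with the GC-content baseline mentioned above, here is a minimal sketch of how contigs might be binned by GC content. It is not the method used for this data set. The contig names and sequences are toy values invented for illustration, and `gc_content` and `bin_by_gc` are hypothetical helper names.

```python
from collections import defaultdict

# Toy contigs, invented for illustration only (not from the HMP data).
TOY_CONTIGS = {
    "contig_1": "ATGCGCATTA",
    "contig_2": "TTGCGCAATA",
    "contig_3": "GGCCGCGCAT",
}


def _gc_counts(seq):
    """Return (number of G/C bases, number of A/C/G/T bases); other symbols such as N are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc, gc + at


def gc_content(seq):
    """Fraction of unambiguous bases that are G or C."""
    gc, total = _gc_counts(seq)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / total


def bin_by_gc(contigs, bin_width_percent=5):
    """Group contig names by GC percentage, keyed by the lower edge of each bin."""
    bins = defaultdict(list)
    for name, seq in contigs.items():
        gc, total = _gc_counts(seq)
        if total == 0:
            raise ValueError(f"{name} contains no A/C/G/T bases")
        percent = 100 * gc // total  # integer arithmetic avoids float edge cases
        lower_edge = percent // bin_width_percent * bin_width_percent
        bins[lower_edge].append(name)
    return dict(bins)


if __name__ == "__main__":
    for lower_edge, names in sorted(bin_by_gc(TOY_CONTIGS).items()):
        print(f"GC {lower_edge}-{lower_edge + 5}%: {', '.join(names)}")
```

In this toy input, `contig_1` and `contig_2` fall into the same bin even though nothing says they come from the same organism. That ambiguity between species with similar GC content is the weakness the paragraph above expects methylation profiles to avoid.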

Some example results of the shotgun metagenome assembly are shown in the figures below, comparing preliminary PacBio results with MetaVelvet assemblies of Illumina® data. (Illumina benchmark from Treangen and Koren et al., MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. MetaVelvet was chosen for comparison because it produced the fewest contigs for the example genomes shown below; that is, the best short-read assemblies are used as the comparison.)

Assembly comparisons
