IsoSeq Human MCF7 Transcriptome - PacificBiosciences/DevNet GitHub Wiki

PacBio blog post: Updated! Data Release: Human MCF-7 Transcriptome

The dataset linked from this page contains the polished results of transcriptome sequencing for the human MCF7 breast cancer cell line using PacBio® SMRT® Sequencing and Iso-Seq™ Analysis. The libraries were prepared using the full-length cDNA protocol [1].

The MCF7 dataset has two releases. The initial release, from 2013, used:

  • Agarose gel size selection of 1-2 kb, 2-3 kb, and 3-6 kb fractions, plus a library without size selection
  • P4-C2 chemistry with 2-hour movies
  • 119 SMRT Cells in total

The second release, from 2015, used:

  • SageELF® system size selection of 1-2 kb, 2-3 kb, 3-5 kb, and 5-10 kb fractions
  • P5-C3 chemistry with 4-hour movies
  • 28 SMRT Cells in total

To obtain a non-redundant, high-quality, full-length set of transcripts, we applied an isoform-level clustering algorithm followed by consensus calling using Quiver. High-quality consensus sequences were then mapped back to the human genome (hg19) and redundant transcripts were collapsed to create the polished dataset below. Additional processing was done to identify fusion gene candidates. For a schematic of the bioinformatics process, see here.
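The final collapse step can be pictured with a simplified Python sketch. It assumes that two mapped transcripts are redundant when they share chromosome, strand, and exactly the same intron chain, and that single-exon transcripts merge only when their coordinates are identical. These are illustrative assumptions, not necessarily the exact rules used to build this dataset, and the `Transcript` type and `collapse_by_intron_chain` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record for one mapped transcript; exons are (start, end) genome coordinates."""
    name: str
    chrom: str
    strand: str
    exons: Tuple[Interval, ...]

    def intron_chain(self) -> Tuple[Interval, ...]:
        """Introns are the gaps between consecutive exons, taken in genomic order."""
        ex = sorted(self.exons)
        return tuple((ex[i][1], ex[i + 1][0]) for i in range(len(ex) - 1))


def collapse_by_intron_chain(transcripts: List[Transcript]) -> Dict[tuple, List[str]]:
    """Group transcripts that share chromosome, strand and intron chain.

    Assumption for this sketch: single-exon transcripts (empty intron chain)
    are grouped only with transcripts that have identical exon coordinates.
    """
    groups: Dict[tuple, List[str]] = {}
    for t in transcripts:
        chain = t.intron_chain()
        if chain:
            key = (t.chrom, t.strand, "introns", chain)
        else:
            key = (t.chrom, t.strand, "mono-exon", tuple(sorted(t.exons)))
        groups.setdefault(key, []).append(t.name)
    return groups


if __name__ == "__main__":
    txs = [
        Transcript("c1", "chr1", "+", ((100, 200), (300, 400))),
        Transcript("c2", "chr1", "+", ((120, 200), (300, 450))),  # same intron, different ends
        Transcript("c3", "chr1", "+", ((100, 200), (350, 400))),  # different intron
    ]
    print(list(collapse_by_intron_chain(txs).values()))  # [['c1', 'c2'], ['c3']]
```

Keying on the intron chain rather than on full exon coordinates lets transcripts that differ only at their outer ends (for example, from 5' truncated cDNA) fall into one group, while any change in splice sites keeps them apart.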

The 2013 release dataset can be downloaded here. It contains the raw movie files for the 119 SMRT Cells as well as the final set of non-redundant, high-quality transcript sequences.

The 2015 release dataset can be downloaded here. It contains the raw movie files for the additional 28 SMRT Cells. However, the final output files (file names starting with IsoSeq_MCF7_2015edition_polished) contain the combined output of both releases.

A public UCSC Genome Browser session containing the GFF files listed below is available at: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Magdoll&hgS_otherUserSessionName=2015_MCF7

This track contains the following two GFF files:

  • IsoSeq_MCF7_2015edition_polished.unimapped.gff
  • IsoSeq_MCF7_2015edition_polished.fusion.gff

It therefore shows the final polished output from both the 2013 and 2015 datasets.

Description of Files

  • IsoSeq_MCF7_2015edition_polished.unimapped.fasta - Polished FASTA sequences of non-chimeric transcripts only.

  • IsoSeq_MCF7_2015edition_polished.unimapped.gff - Alignment of the above to hg19.

  • IsoSeq_MCF7_2015edition_polished.fusion.fasta - Polished FASTA sequences of the fusion gene candidates.

  • IsoSeq_MCF7_2015edition_polished.fusion.gff - Alignment of the above to hg19. Each fusion candidate is named using the format + followed by the suffix _1, _2, etc., so that the track loads properly in the UCSC browser.
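To work with the fusion GFF programmatically, the per-segment records can be grouped back into candidates by stripping the numeric suffix. The Python sketch below rests on two assumptions: that the attribute column is GTF-style and contains `transcript_id "..."`, and that every segment ID ends in `_<number>`. The exact naming format is not reproduced on this page, so check both against the file; `group_fusion_segments` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed GTF-style attribute, e.g. transcript_id "NAME_1";  (layout is an assumption)
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
# Assumed per-segment suffix: an underscore followed by digits at the end of the ID
SEGMENT_SUFFIX = re.compile(r"_(\d+)$")


def group_fusion_segments(gff_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each fusion candidate's base name to its segment IDs, in file order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a 9-column GFF/GTF feature line
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # no transcript_id attribute on this line
        seg_id = match.group(1)
        base = SEGMENT_SUFFIX.sub("", seg_id)
        if seg_id not in groups[base]:
            groups[base].append(seg_id)  # record each segment once, even across many exon lines
    return dict(groups)
```

For example, `with open("IsoSeq_MCF7_2015edition_polished.fusion.gff") as fh: candidates = group_fusion_segments(fh)` would map each candidate to its segment IDs, provided the file matches the assumed layout.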

References

[1] Official Iso-Seq Landing Page

[2] Iso-Seq GitHub Wiki

[3] MCF-7 dataset release PacBio blogpost


For Research Use Only. Not for use in diagnostic procedures. © Copyright 2013 - 2015, Pacific Biosciences of California, Inc. All rights reserved. These data are provided as-is and without any warranty, and Pacific Biosciences assumes no responsibility for any errors or omissions in the data provided. Use of this data is offered to individuals who understand and accept the associated terms and conditions. The data being provided is subject to change without notice. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences data, products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at http://www.pacificbiosciences.com/licenses.html. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell, and Iso-Seq are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science, Inc. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.

Visit the PacBio Developer's Network Website for the most up-to-date links to downloads, documentation and more.
