Read extraction - aechchiki/SIB_LongReadsWorkshop_Zurich17 GitHub Wiki

The raw output of both the MinION and the PacBio RSII is stored in Hierarchical Data Format (HDF5). This is an archive file format designed to store large amounts of data while allowing rapid access to its contents. In our case, these files contain not only the raw reads, but also metadata generated during the sequencing run and the basecalling.

Data in HDF5 format can be explored using the command-line tools that ship with HDF5, e.g.:

h5dump <HDF_file>       # examine contents of HDF file and dump content to ASCII

For the purposes of this tutorial, we only need the read sequences and their base qualities, which can easily be stored in a fastq file for subsequent processing.

Our aim is thus to extract the basecalled reads from HDF format to fastq format.
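For orientation, here is a minimal Python sketch of what a single fastq record holds once it has been extracted. It is not part of the workshop material: the function name `parse_fastq_record` and the example read are hypothetical, and the sketch assumes the common four-line record layout with Phred+33 quality encoding.

```python
from typing import List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq_record(text: str, offset: int = 33) -> FastqRecord:
    """Parse one four-line fastq record (hypothetical helper, for illustration).

    Assumes Phred+33 encoding: each quality character's ASCII code
    minus 33 gives the Phred score of the corresponding base.
    """
    lines = text.strip().splitlines()
    if len(lines) != 4:
        raise ValueError("expected exactly 4 lines in a fastq record")
    header, sequence, separator, quality = lines
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("malformed fastq header or separator line")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return FastqRecord(
        name=header[1:],
        sequence=sequence,
        qualities=[ord(char) - offset for char in quality],
    )


# Made-up example read, not taken from a real sequencing run.
example = "@read_1\nACGT\n+\n!+5I\n"
record = parse_fastq_record(example)
print(record.name, record.sequence, record.qualities)
# prints: read_1 ACGT [0, 10, 20, 40]
```

Each read therefore carries an identifier, the base sequence, and one quality value per base, which is the information the extraction step has to pull out of the HDF5 files.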

Next

Go to tutorial Extraction of MinION reads.

Go to tutorial Extraction of PacBio reads.

Go back to Table of contents.
