Formats_Sequel - aechchiki/SIB_LongReadsWorkshop_Zurich18 GitHub Wiki

PacBio Sequel format: BAM

Section: Data [3/5].

The new standard for PacBio is to store the sequences in a "classical" BAM format (alignment file). However, these BAM files carry PacBio-specific tags and, possibly, header fields, which you can look up in detail on the dedicated website.

Data in Sequel format don't need extraction (lucky you!), so you can use them directly in most downstream PacBio-specific software.

Here is what a "real" run looks like.

For a given movie, three files are reported.

[   ] m54006_170729_232022.subreads.bam        2017-07-30 09:28   13G
[   ] m54006_170729_232022.subreads.bam.pbi    2017-07-30 09:28   22M
[TXT] m54006_170729_232022.subreadset.xml      2017-07-30 09:26   13K

The .bam.pbi file is the index: a table of semantic information about each read and its alignment, which some PacBio downstream software requires. For your information, you can generate such an index with the pbindex utility. As usual, the xml file contains sequencing run metadata.
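As a quick sanity check, the three files described above can be verified per movie before starting any analysis. This is a minimal sketch: the function name `check_movie_files` is made up for illustration, and it assumes the files follow the naming pattern shown in the listing.

```shell
# Report which of the three per-movie files are present.
# Usage: check_movie_files <directory> <movie_name>
# (check_movie_files is a hypothetical helper, not a PacBio tool)
check_movie_files() {
    dir=$1
    movie=$2
    missing=0
    for suffix in subreads.bam subreads.bam.pbi subreadset.xml; do
        if [ -f "$dir/$movie.$suffix" ]; then
            echo "found:   $movie.$suffix"
        else
            echo "missing: $movie.$suffix"
            missing=1
        fi
    done
    # A missing .pbi can be rebuilt from the BAM with: pbindex <movie>.subreads.bam
    return "$missing"
}
```

For the run shown above, this would be called as `check_movie_files . m54006_170729_232022`; the function returns a non-zero status if any file is missing.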

Some programs, though, need the input in fasta/q format instead of bam. For future reference, please see the PacBio BAM manipulation manual.

You can also convert your "old" basecalled RSII files into Sequel-like bam (for pipeline compatibility): convert the basecalled bax.h5 files with bax2bam (see https://github.com/PacificBiosciences/PacBioFileFormats/wiki/BAM-recipes), then align the result with pbalign. Both utilities are also available through Bioconda ;) check them out.

BAM conversion

Data

If you have time and want to try it, you can also convert this format using the utilities from PacBio's Bioconda packages (as on the previous page). As an exercise, try converting a bam file into fastq. For this you need a PacBio bam file and its corresponding index file:

# a tiny subset of the Avian dataset
wget https://drive.switch.ch/index.php/s/rmVRnGXbfmuzTfx/download -O PBbam.tar.gz
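The download still has to be unpacked before use. The sketch below assumes the file really is a gzipped tar archive, as its .tar.gz name suggests; the names of the files inside are not known here, so it lists them first. `unpack_archive` is a hypothetical helper name.

```shell
# List, then extract, a gzipped tar archive into a target directory.
# Usage: unpack_archive <archive.tar.gz> <target_dir>
# (unpack_archive is a hypothetical helper name)
unpack_archive() {
    archive=$1
    target=$2
    mkdir -p "$target"
    tar -tzf "$archive"               # show what is inside before extracting
    tar -xzf "$archive" -C "$target"  # extract into the target directory
}
```

For example: `unpack_archive PBbam.tar.gz PBbam`. You should see a bam file and its .pbi index among the listed files.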

Bioconda recipe: bam2fastx

If you haven't done it before, you need to:

  • install conda (3.7):

    • get the installer:
    • wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
  • follow instructions:

    • launch it! bash Miniconda3-latest-Linux-x86_64.sh -f
    • press "enter" until end of License terms
    • enter "yes" (to accept the License terms)
    • enter a non-existing location for installation (you can keep the suggested one - for example: /home/training/miniconda3/)
    • enter "yes" (to prepend the install location to PATH in the .bashrc)
    • open a new terminal -> tadah!
    • check with: which conda
      • this should point to your installation location (following the example: /home/training/miniconda3/bin)
      • if not, prepend the location to EVERY conda command here below! (e.g. conda -> /home/training/miniconda3/bin/conda)
  • setup the necessary channels:

    • conda config --add channels defaults
    • conda config --add channels bioconda
    • conda config --add channels conda-forge

! NEW STUFF

  • install your favourite PacBio tool, in this specific case bam2fastx:
    • conda install bam2fastx
      • enter "yes" (when asked to proceed to installation)
    • check with: which bam2fastq (the bam2fastx package installs the bam2fastq and bam2fasta executables)
      • this should point to your installation location (following the example: /home/training/miniconda3/bin)
      • if not, prepend the location to EVERY bam2fastq command here below! (e.g. bam2fastq -> /home/training/miniconda3/bin/bam2fastq)
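The two "check with: which ..." steps above can be wrapped in one small helper. This is only a sketch: `check_tool` is a made-up name, the default prefix is the example location used on this page, and it assumes the converter executable is called bam2fastq, as in the conversion command below.

```shell
# Print where a command resolves on PATH, or a hint if it is not found.
# Usage: check_tool <command> [install_prefix]
# (check_tool is a hypothetical helper name)
check_tool() {
    tool=$1
    prefix=${2:-/home/training/miniconda3}  # example location; adjust to yours
    if path=$(command -v "$tool"); then
        echo "$tool -> $path"
    else
        echo "$tool not found on PATH; try $prefix/bin/$tool" >&2
        return 1
    fi
}
```

Typical calls would be `check_tool conda` and `check_tool bam2fastq`. If the reported path is not inside your conda installation, use the full path as described above.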

To access usage and local documentation, use the -h flag (e.g. bam2fastq -h).

For example, conversion to fastq (gzipped by default) is simply:

bam2fastq -o <output> <file.pb.bam>

Note: when downloading real data, make sure to also download the index, otherwise the command will fail. Little question: how do you generate the index? Find out ;)
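To avoid a failed run because of a missing index, the conversion can be guarded by a check for the .pbi file. This is a minimal sketch, assuming the index sits next to the BAM as <file>.bam.pbi; `pb_bam_to_fastq` is a hypothetical wrapper name, and the bam2fastq call is the same as shown above.

```shell
# Convert a PacBio BAM to FASTQ, refusing to start without the .pbi index.
# Usage: pb_bam_to_fastq <output_prefix> <file.bam>
# (pb_bam_to_fastq is a hypothetical wrapper name)
pb_bam_to_fastq() {
    out_prefix=$1
    bam=$2
    if [ ! -f "$bam.pbi" ]; then
        echo "error: index $bam.pbi not found; create it with: pbindex $bam" >&2
        return 1
    fi
    bam2fastq -o "$out_prefix" "$bam"
}
```

With the workshop data this could be called as `pb_bam_to_fastq PBbam_converted <file.pb.bam>`, where the output prefix is an arbitrary name of your choice.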

You're done!

If you had issues converting, or just didn't have time for that, here is a subset of the original dataset, converted to fastq:

# pacbio FASTQ from BAM
wget https://drive.switch.ch/index.php/s/LzXCP94TanTXaF0/download -O PBbam_subset.fastq.gz
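Whether you converted the data yourself or downloaded the subset, a quick look at the result is worthwhile. The sketch below counts the records in a gzipped fastq, assuming the usual four lines per record; `count_fastq_reads` is a made-up helper name.

```shell
# Count the records in a gzipped FASTQ file, assuming four lines per record.
# Usage: count_fastq_reads <file.fastq.gz>
# (count_fastq_reads is a hypothetical helper name)
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}
```

For example: `count_fastq_reads PBbam_subset.fastq.gz`. To peek at the first record instead, `gzip -dc PBbam_subset.fastq.gz | head -n 4` does the job.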

