Blasr Output Format - pb-cdunn/blasr GitHub Wiki

Blasr can print alignments using the following file formats.

(a) blasr option: -m 0

Blasr-like, human-readable output with |'s connecting matched nucleotides.

(b) blasr option: -m 1

Space-delimited summary of alignments containing 13 fields:

qName tName qStrand tStrand score percentSimilarity tStart tEnd tLength qStart qEnd qLength nCells

(c) blasr option: -m 2

XML format.

(d) blasr option: -m 3

Vulgar format (deprecated).

(e) blasr option: -m 4

Space-delimited summary of alignments containing 13 fields:

qName tName score percentSimilarity qStrand qStart qEnd qLength tStrand tStart tEnd tLength mapQV

(f) blasr option: -m 5

Space-delimited machine-parsable format containing 19 fields:

qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand score numMatch numMismatch numIns numDel mapQV qAlignedSeq matchPattern tAlignedSeq
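Because the 19 fields are space-delimited and appear in a fixed order, a line of -m 5 output can be split into a record keyed by field name. The following is a minimal sketch, not part of blasr itself. It assumes that no field contains whitespace and that the coordinate, length, score, count and mapQV columns hold integers. The function name `parse_m5_line` and the example line are made up for illustration.

```python
# Field names in the order listed above for -m 5 output.
M5_FIELDS = (
    "qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand "
    "score numMatch numMismatch numIns numDel mapQV "
    "qAlignedSeq matchPattern tAlignedSeq"
).split()

# Assumption: these columns hold integers; all others are kept as strings.
M5_INT_FIELDS = {
    "qLength", "qStart", "qEnd", "tLength", "tStart", "tEnd",
    "score", "numMatch", "numMismatch", "numIns", "numDel", "mapQV",
}


def parse_m5_line(line):
    """Split one -m 5 line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(M5_FIELDS):
        raise ValueError(
            f"expected {len(M5_FIELDS)} fields, got {len(values)}"
        )
    record = dict(zip(M5_FIELDS, values))
    for name in M5_INT_FIELDS:
        record[name] = int(record[name])
    return record


# Made-up line for illustration only, not real blasr output.
example = (
    "read1 10 0 10 + chr1 100 5 15 + -50 9 1 0 0 254 "
    "ACGTACGTAC |||||*|||| ACGTAGGTAC"
)
record = parse_m5_line(example)
print(record["tName"], record["tStart"], record["tEnd"], record["mapQV"])
```

The same approach should carry over to -m 1 and -m 4 output by swapping in the corresponding field list.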

(g) blasr option: -sam

SAM format. User-defined tags are:

  (1) "XS": 1 plus the first base of SEQ (inclusive) in the 0-based coordinates of the zmw unrolled polymerase read, where SEQ is the mandatory SAM field in column 10.

  (2) "XE": 1 plus the last base of SEQ (exclusive) in the 0-based coordinates of the zmw unrolled polymerase read.

  (3) "XL": number of aligned query bases

  (4) "XQ": length of zmw unrolled polymerase read.

  (5) "XT": number of continuous reads, always 1 for blasr.

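  (6) "YS": first base of the query subread (inclusive) in the 0-based coordinates of the zmw unrolled polymerase read. This is the YS in the subread name movie/zmw/YS_YE.

  (7) "YE": last base of the query subread (exclusive) in the 0-based coordinates of the zmw unrolled polymerase read. This is the YE in the subread name movie/zmw/YS_YE.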

  (8) "ZM": zmw (hole) number.
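In a SAM record, optional tags follow the 11 mandatory tab-separated columns and take the form TAG:TYPE:VALUE. The sketch below collects the tags listed above into a dict and converts XS and XE back to a 0-based, half-open interval by subtracting the 1 that both definitions add. It assumes the tags are written with the integer type code `i`. The function names and the example record are hypothetical.

```python
def parse_sam_tags(sam_line):
    """Return the optional tags of one SAM alignment line as a dict."""
    columns = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in columns[11:]:
        tag, type_code, value = field.split(":", 2)
        tags[tag] = int(value) if type_code == "i" else value
    return tags


def seq_interval(tags):
    """0-based [start, end) of SEQ in the polymerase read, from XS and XE."""
    # Both tags are defined above as "1 plus" the 0-based coordinate.
    return tags["XS"] - 1, tags["XE"] - 1


# Made-up record for illustration only, not real blasr output.
example = "\t".join([
    "movie/223/0_3000", "0", "chr1", "101", "254", "10M", "*", "0", "0",
    "ACGTACGTAC", "*",
    "XS:i:1", "XE:i:11", "XL:i:10", "XQ:i:10000", "XT:i:1",
    "YS:i:0", "YE:i:3000", "ZM:i:223",
])
tags = parse_sam_tags(example)
print(tags["ZM"], seq_interval(tags))
```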

Notes: For background, see the reading material and video on the PacBio Sequencing Template (SMRTBell).

In short, each zmw produces only one unrolled polymerase read, which may contain multiple subreads. A subread name 'movie/zmw/YS_YE' describes a subread that starts at YS and ends at YE in the coordinates of the zmw polymerase read. For example, zmw 223 of movie m130812_185809_42141_c100533960310000001823079711101380_s1_p0 may produce an unrolled polymerase read of length 10000, and this polymerase read may contain three subreads:

m130812_185809_42141_c100533960310000001823079711101380_s1_p0/223/0_3000 -> starts from base 0, ends at base 3000 of polymerase read
m130812_185809_42141_c100533960310000001823079711101380_s1_p0/223/3050_6000 -> starts from base 3050, ends at base 6000 of polymerase read
m130812_185809_42141_c100533960310000001823079711101380_s1_p0/223/6050_9800 -> starts from base 6050, ends at base 9800 of polymerase read
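The naming scheme above can be taken apart mechanically. The following is a small sketch (the helper `parse_subread_name` is hypothetical, not a blasr function) that splits a name of the form movie/zmw/YS_YE and derives the subread length as YE minus YS, which holds because YS is inclusive and YE is exclusive.

```python
def parse_subread_name(name):
    """Split 'movie/zmw/YS_YE' into (movie, zmw, ys, ye)."""
    # Split from the right so only the last two '/' separators are used.
    movie, zmw, span = name.rsplit("/", 2)
    ys, ye = span.split("_")
    return movie, int(zmw), int(ys), int(ye)


movie_name = "m130812_185809_42141_c100533960310000001823079711101380_s1_p0"
for span in ("0_3000", "3050_6000", "6050_9800"):
    movie, zmw, ys, ye = parse_subread_name(f"{movie_name}/223/{span}")
    # YS is inclusive and YE is exclusive, so the length is YE - YS.
    print(zmw, ys, ye, ye - ys)
```

For the three subreads in the example this gives lengths of 3000, 2950 and 3750 bases.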

Tips: If the blasr option -header is specified and the output file format is -m 1, -m 4 or -m 5, then the field names are printed on the first line of the output file.
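When the output was written with -header, the first line can drive the parsing instead of a hard-coded field list. The sketch below rests on an assumption that is not confirmed here: that the header line holds the field names separated by whitespace, in the same order as the data columns. The function `read_with_header` and the sample text are hypothetical.

```python
import io


def read_with_header(handle):
    """Yield one dict per alignment line, keyed by the header's field names."""
    # Assumption: the first line lists the field names, whitespace-separated.
    header = handle.readline().split()
    for line in handle:
        if line.strip():  # skip blank lines
            yield dict(zip(header, line.split()))


# Made-up -m 4 style output for illustration only.
sample = io.StringIO(
    "qName tName score percentSimilarity qStrand qStart qEnd qLength "
    "tStrand tStart tEnd tLength mapQV\n"
    "read1 chr1 -50 90.0 0 0 10 10 0 5 15 100 254\n"
)
for row in read_with_header(sample):
    print(row["qName"], row["tName"], row["mapQV"])
```

All values are left as strings here; convert the numeric columns as needed.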