Blasr Output Format - pb-cdunn/blasr GitHub Wiki
Blasr can print alignments using the following file formats.
(a) blasr option: -m 0
BLAST-like human-readable output with |'s connecting matched nucleotides.
(b) blasr option: -m 1
Space-delimited summary of alignments containing 13 fields:
qName tName qStrand tStrand score percentSimilarity tStart tEnd tLength qStart qEnd qLength nCells
(c) blasr option: -m 2
XML format.
(d) blasr option: -m 3
Vulgar format (deprecated).
(e) blasr option: -m 4
Space-delimited summary of alignments containing 13 fields:
qName tName score percentSimilarity qStrand qStart qEnd qLength tStrand tStart tEnd tLength mapQV
(f) blasr option: -m 5
Space-delimited machine-parsable format containing 19 fields:
qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand score numMatch numMismatch numIns numDel mapQV qAlignedSeq matchPattern tAlignedSeq
(g) blasr option: -sam
SAM format. User-defined tags are:
(1) "XS": 1 plus (first base of SEQ in 0-based coordinates of the zmw unrolled polymerase read), inclusive, where SEQ is the mandatory SAM field in column 10.
(2) "XE": 1 plus (last base of SEQ in 0-based coordinates of the zmw unrolled polymerase read), exclusive.
(3) "XL": number of aligned query bases
(4) "XQ": length of zmw unrolled polymerase read.
(5) "XT": number of continuous reads, always 1 for blasr.
(6) "YS": first base of the query subread in 0-based coordinates of the zmw unrolled polymerase read, inclusive. YS and YE appear in the subread name as movie/zmw/YS_YE.
(7) "YE": last base of the query subread in 0-based coordinates of the zmw unrolled polymerase read, exclusive.
(8) "ZM": zmw (hole) number.
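The user-defined tags above travel in the optional columns of each SAM record (column 12 onward), written as TAG:TYPE:VALUE. The following is a minimal sketch of reading them in Python. The function name sam_tags is hypothetical, the example record is invented for illustration, and the assumption that blasr writes these tags with the integer type "i" should be checked against real output.

```python
def sam_tags(line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    columns = line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns are the mandatory SAM fields; tags follow them.
    for field in columns[11:]:
        tag, value_type, value = field.split(":", 2)
        if value_type == "i":      # integer-typed tag
            value = int(value)
        elif value_type == "f":    # float-typed tag
            value = float(value)
        tags[tag] = value          # other types are kept as strings
    return tags


# Invented record, for illustration only (not real blasr output).
example = "\t".join([
    "movie/223/3050_6000", "0", "chr1", "101", "254", "10M", "*", "0", "0",
    "ACGTACGTAC", "*",
    "XS:i:3061", "XE:i:3071", "XL:i:10", "XQ:i:10000", "XT:i:1",
    "YS:i:3050", "YE:i:6000", "ZM:i:223",
])

tags = sam_tags(example)
subread_length = tags["YE"] - tags["YS"]  # YE is exclusive, so no +1 is needed
```

Because YS is inclusive and YE is exclusive, the subread length is simply YE minus YS.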
Notes: Reading material for understanding PacBio Sequencing Template (SMRTBell), video.
In short, each zmw produces only one unrolled polymerase read, which may contain multiple subreads. A subread name 'movie/zmw/YS_YE' describes a subread that starts at YS and ends at YE in the coordinates of the zmw polymerase read. For example, zmw 223 of movie m130812_185809_42141_c100533960310000001823079711101380_s1_p0 may produce an unrolled polymerase read of length 10000, and this polymerase read may contain three subreads:
m130812_185809_42141_c100533960310000001823079711101380_s1_p0/223/0_3000 -> starts from base 0, ends at base 3000 of polymerase read
m130812_185809_42141_c100533960310000001823079711101380_s1_p0/223/3050_6000 -> starts from base 3050, ends at base 6000 of polymerase read
m130812_185809_42141_c100533960310000001823079711101380_s1_p0/223/6050_9800 -> starts from base 6050, ends at base 9800 of polymerase read
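The naming scheme above can be unpacked mechanically. Here is a small Python sketch, assuming every subread name follows the movie/zmw/YS_YE pattern exactly; the Subread type and parse_subread_name function are hypothetical helpers, not part of blasr.

```python
from typing import NamedTuple


class Subread(NamedTuple):
    movie: str
    zmw: int
    start: int  # YS, 0-based, inclusive
    end: int    # YE, 0-based, exclusive


def parse_subread_name(name):
    """Split 'movie/zmw/YS_YE' into its parts."""
    # Split from the right: the movie name contains underscores but no slashes.
    movie, zmw, span = name.rsplit("/", 2)
    start, end = span.split("_")
    return Subread(movie, int(zmw), int(start), int(end))


subread = parse_subread_name(
    "m130812_185809_42141_c100533960310000001823079711101380_s1_p0/223/3050_6000"
)
length = subread.end - subread.start  # exclusive end, so this is the base count
```

For the second subread in the example, this gives zmw 223 and a length of 2950 bases.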
Tips: If the blasr option -header is specified and the output file format is -m 1, -m 4 or -m 5, then the field names are printed on the first line of the output file.
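As a rough illustration of consuming the space-delimited formats, the Python sketch below turns each output line into a dictionary keyed by the field names listed on this page. The function parse_records is a hypothetical helper, the sample line is invented, and the sketch assumes that a header, when present, occupies exactly the first line and that no field value contains whitespace.

```python
# Field names copied from the -m 4 and -m 5 descriptions on this page.
M4_FIELDS = ("qName tName score percentSimilarity qStrand qStart qEnd qLength "
             "tStrand tStart tEnd tLength mapQV").split()
M5_FIELDS = ("qName qLength qStart qEnd qStrand tName tLength tStart tEnd "
             "tStrand score numMatch numMismatch numIns numDel mapQV "
             "qAlignedSeq matchPattern tAlignedSeq").split()


def parse_records(lines, fields, has_header=False):
    """Yield one dict per alignment line; values are left as strings."""
    rows = iter(lines)
    if has_header:
        next(rows, None)  # assumption: the first line holds the field names
    for line in rows:
        values = line.split()
        if not values:
            continue  # ignore blank lines
        if len(values) != len(fields):
            raise ValueError(
                f"expected {len(fields)} fields, got {len(values)}: {line!r}"
            )
        yield dict(zip(fields, values))


# Invented -m 4 style input, for illustration only (not real blasr output).
sample = [
    " ".join(M4_FIELDS),
    "movie/223/0_3000 chr1 -1200 98.5 0 0 300 3000 0 100 400 5000 254",
]
records = list(parse_records(sample, M4_FIELDS, has_header=True))
```

Passing M5_FIELDS instead would handle -m 5 output the same way. Values are kept as strings so the caller can decide which columns to convert to numbers.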