Specifying SMRT Pipe inputs - pb-dyim/SMRT-Analysis GitHub Wiki

The input file is an XML file specifying the sequencing data to process. Generally, you specify the inputs as URIs (Uniform Resource Identifiers), which are resolved by code internal to SMRT Pipe. In practice, this is most useful to large enterprise users who have a data management scheme and are able to modify the SMRT Pipe code to include their own resolver.

The simpler way to specify inputs is to give the fully resolved path to each input file, which is almost always a bas.h5 file. The script fofnToSmrtpipeInput.py is provided to convert a list of bas.h5 file names (a "file of file names", or fofn) to the input format expected by SMRT Pipe. If my_inputs.fofn looks like

/share/data/run_1/m100923_005722_00122_c15301919401091173_s0_p0.bas.h5
/share/data/run_2/m100820_063008_00118_c04442556811011070_s0_p0.bas.h5

then it can be converted to a SMRT Pipe input XML file by entering:

fofnToSmrtpipeInput.py my_inputs.fofn > my_inputs.xml

Following is the resulting XML file:

<?xml version="1.0"?>
<pacbioAnalysisInputs>
  <dataReferences>
    <url ref="run:0000000-0000"><location>/share/data/run_1/m100923_005722_00122_c15301919401091173_s0_p0.bas.h5</location></url>
    <url ref="run:0000000-0001"><location>/share/data/run_2/m100820_063008_00118_c04442556811011070_s0_p0.bas.h5</location></url>
  </dataReferences>
</pacbioAnalysisInputs>
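
To make the mapping from fofn lines to XML elements concrete, here is a minimal Python sketch that builds the same kind of document. It is not the source of fofnToSmrtpipeInput.py. The function name fofn_to_input_xml is hypothetical, and the sequential "run:0000000-NNNN" numbering is an assumption taken from the ref attributes in the example, not a documented rule.

```python
import xml.etree.ElementTree as ET


def fofn_to_input_xml(fofn_text):
    """Build a SMRT Pipe-style input XML string from the text of a fofn.

    Hypothetical helper for illustration; the real conversion is done by
    fofnToSmrtpipeInput.py.
    """
    # A fofn holds one path per line; blank lines are skipped.
    paths = [line.strip() for line in fofn_text.splitlines() if line.strip()]

    root = ET.Element("pacbioAnalysisInputs")
    refs = ET.SubElement(root, "dataReferences")
    for index, path in enumerate(paths):
        # Assumption: references are numbered sequentially from 0000.
        url = ET.SubElement(refs, "url", ref="run:0000000-%04d" % index)
        ET.SubElement(url, "location").text = path

    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = (
        "/share/data/run_1/m100923_005722_00122_c15301919401091173_s0_p0.bas.h5\n"
        "/share/data/run_2/m100820_063008_00118_c04442556811011070_s0_p0.bas.h5\n"
    )
    print(fofn_to_input_xml(example))
```

The sketch writes each element on a single line without indentation. For real analyses, prefer the provided script, which also handles the optional job annotations.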

To run an analysis using these two bas.h5 files as input, enter the following command:

smrtpipe.py --params=settings.xml xml:my_inputs.xml
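
Because every input is given as a fully resolved path, a mistyped or moved file may only surface once the job is running. The following is a minimal sketch of an optional pre-flight check, assuming the input XML uses the url/location layout shown in this section. The function name missing_inputs is hypothetical and is not part of SMRT Pipe.

```python
import os
import sys
import xml.etree.ElementTree as ET


def missing_inputs(xml_text):
    """Return the <location> paths in an input XML that are not files on disk.

    Hypothetical helper for illustration; not part of SMRT Pipe.
    """
    root = ET.fromstring(xml_text)
    # Collect the text of every <location> element, ignoring empty ones.
    locations = [loc.text.strip() for loc in root.iter("location") if loc.text]
    return [path for path in locations if not os.path.isfile(path)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_inputs.py my_inputs.xml
    with open(sys.argv[1]) as handle:
        for path in missing_inputs(handle.read()):
            print("missing input: %s" % path)
```

Running this against my_inputs.xml before launching smrtpipe.py prints one line per input file that cannot be found, and prints nothing when all paths resolve.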

The SMRT Pipe input format also lets you specify annotations, such as job IDs, job names, and job comments, for use in a job-management environment. The fofnToSmrtpipeInput.py script has command-line options for setting these optional attributes.

Note: To get help for a script, execute the script with the --help option and no additional arguments. For example:

fofnToSmrtpipeInput.py --help