Explanations of Parameters - PouletAxel/SIPMeta GitHub Wiki
Parameters
simple vs subtraction
Simple creates average plots using one dataset. Subtraction specifies that average plots of the difference between two datasets should be calculated.
Loops file
The loops file is a tab-delimited text file with at least 6 columns:
Anchor1_chromosome Anchor1_start Anchor1_end Anchor2_chromosome Anchor2_start Anchor2_end
Chromosome names should match the SIP input files or the names in the .hic file if using that option. The name of the loops file should end in .txt.
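As an illustration of the six-column layout described above, the following minimal sketch reads a loops file and checks that every row has at least six columns. The function name `read_loops` is hypothetical and not part of SIPMeta, and skipping a non-numeric first line as a header is an assumption rather than a documented SIPMeta rule.

```python
import csv
import io

def read_loops(handle):
    """Return a list of (anchor1, anchor2) tuples from a tab-delimited loops file.

    Each anchor is (chromosome, start, end). Hypothetical helper, not SIPMeta code.
    """
    loops = []
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) < 6:
            raise ValueError(f"line {line_no}: expected at least 6 columns, got {len(row)}")
        if line_no == 1 and not row[1].isdigit():
            continue  # assumption: a non-numeric first line is a header
        chrom1, start1, end1, chrom2, start2, end2 = row[:6]
        loops.append(((chrom1, int(start1), int(end1)),
                      (chrom2, int(start2), int(end2))))
    return loops

# Made-up example row; extra columns beyond the sixth are ignored.
example = "1\t560000\t565000\t1\t800000\t805000\n"
loops = read_loops(io.StringIO(example))
print(loops)
```

For a file on disk, the same function can be called with `open("my_loops.txt", newline="")`, where the file name is a placeholder.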
SIP or .hic inputs
By default, the program runs on data output by SIP. However, users can specify a .hic file created by Juicer instead. If using SIP files, set RawData to the directory containing the SIP output for each chromosome. If using a .hic file, RawData is the .hic file itself. When using .hic files, you must also specify the path to Juicer Tools, a chromosome size file listing the chromosomes in your .hic file, and an output directory for the data that will be dumped using Juicer Tools.
Note: This output directory with the dumped files can be used as the RawData input when rerunning SIPMeta with different parameters or with a different loops file. Alternatively, users can create their own SIP-like input files with the following format:
chr1 chr2 value distanceNormalizedValue
560000 565000 1499.26335.84271154556146
If generating your own processed files, divide each chromosome into small chunks (10 Mb) and place all the chunks of a chromosome in a directory named after that chromosome. Each chromosome should have its own directory, containing one file per chunk. The file name recognized by SIPMeta must contain the chromosome name, the start coordinate of the chunk, and the end coordinate of the chunk (e.g., 1_5000000_14999999.txt).
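To make the naming convention concrete, here is a small sketch that splits a chunk file name into chromosome, start, and end. `parse_chunk_name` is a hypothetical helper, not a SIPMeta function. Splitting from the right is a design choice so that chromosome names containing underscores still parse. Treating the end coordinate as inclusive is an assumption based on the example name.

```python
from pathlib import Path

def parse_chunk_name(path):
    """Split '<chromosome>_<start>_<end>.txt' into (chromosome, start, end)."""
    stem = Path(path).stem                    # e.g. "1_5000000_14999999"
    chrom, start, end = stem.rsplit("_", 2)   # split from the right: the name may contain "_"
    return chrom, int(start), int(end)

chrom, start, end = parse_chunk_name("1/1_5000000_14999999.txt")
# Assumption: the end coordinate is inclusive, so the span is end - start + 1.
print(chrom, start, end, end - start + 1)
```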
The program will run for each chromosome specified in the loops file.
-sMetaPlot
The number of rows / columns desired in the output matrix. In other words, how many pixels to display in the metaplot.
-sImg
Size, in bins, of the submatrices produced by SIP, or the number of bins to walk in each dumping step by Juicer Tools. Setting -sImg 2000 at 5 kb resolution produces 10 Mb chunks. Note that the resolution is assigned automatically from the resolution found in the input loops file.
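The chunk size quoted above follows from multiplying the number of bins by the bin resolution. A one-line sketch of that arithmetic (the function name `chunk_size_bp` is illustrative only):

```python
def chunk_size_bp(s_img, resolution_bp):
    """Chunk length in base pairs for a given -sImg (bins) and resolution (bp per bin)."""
    return s_img * resolution_bp

size = chunk_size_bp(2000, 5000)
print(size, "bp =", size // 1_000_000, "Mb")
```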
-norm
The normalization scheme to use. We recommend Knight-Ruiz (KR / balanced) from Juicer, but the chosen normalization must be present in the .hic file. This parameter is only used when dumping from a .hic file.
-res
Resolution in bp (default 5000 bp)
-c
Specifies which matplotlib color range to use.
-z
Set this option to compute separate Z-scores for each ring in the bullseye plot. When set, an Aggregate Domain Analysis (ADA) score is also reported, which is based on the number of positive Z-scores in the bottom right quadrant vs the other quadrants.
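The idea of a per-ring Z-score can be sketched as follows. This is not SIPMeta's or bullseye.py's code. It assumes that a ring is the set of pixels at the same Manhattan distance from the center pixel, and that each value is standardized with the mean and population standard deviation of its own ring. The ADA score is not reproduced here because its exact formula is not given above.

```python
from statistics import mean, pstdev

def ring_zscores(matrix):
    """Z-score each pixel against the other pixels in its ring.

    Assumption: ring index = Manhattan distance from the center pixel.
    """
    center = len(matrix) // 2
    rings = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            rings.setdefault(abs(i - center) + abs(j - center), []).append(value)
    stats = {d: (mean(vals), pstdev(vals)) for d, vals in rings.items()}
    result = []
    for i, row in enumerate(matrix):
        out_row = []
        for j, value in enumerate(row):
            mu, sd = stats[abs(i - center) + abs(j - center)]
            # A ring with no spread (e.g. the single center pixel) gets a Z-score of 0.
            out_row.append((value - mu) / sd if sd > 0 else 0.0)
        result.append(out_row)
    return result

# Made-up 3x3 matrix used only to exercise the function.
z = ring_zscores([[1, 2, 1],
                  [2, 9, 4],
                  [1, 4, 3]])
for row in z:
    print(row)
```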
-t
Specifies a threshold for treating distance-normalized values as extremes. The default is -1, which means no threshold is applied.
-prefix
Prefix to use for the names of the output files.
-s
If set, the bullseye transformation is performed but the edges will be trimmed to create a square plot. This plot will contain the Manhattan distance correction in a square shape.
-min
Lowest value in the heatmap color scale. Default is the minimum value in the average matrix.
-max
Highest value in the heatmap color scale. Default is the maximum value in the average matrix.
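The default behaviour of -min and -max described above can be pictured with this short sketch, in which a limit that is not supplied falls back to the corresponding extreme of the average matrix. `color_limits` is a hypothetical helper, and whether bullseye.py resolves the limits in exactly this way is an assumption.

```python
def color_limits(matrix, vmin=None, vmax=None):
    """Return (low, high) limits for the color scale.

    A limit left as None falls back to the matrix minimum or maximum.
    """
    values = [v for row in matrix for v in row]
    low = min(values) if vmin is None else vmin
    high = max(values) if vmax is None else vmax
    return low, high

avg = [[0.5, 1.0], [2.0, 4.5]]        # made-up average matrix
print(color_limits(avg))              # both limits taken from the data
print(color_limits(avg, vmax=3.0))    # user-supplied maximum, default minimum
```

Limits resolved this way could then be passed as `vmin` and `vmax` to a matplotlib call such as `imshow`.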
-cpu
The number of CPUs to use.
Output Files
_bullseye.png (image of average signal in the bullseye format, accounting for Manhattan distance as in the image on the right.)
_normal.png (image of average signal in the traditional square format. Does not account for Manhattan distance.)
_APA.tab (text file with the value of the center, corner, and APA score)
_matrix.tab (average matrix fed into bullseye.py)
Note: The _matrix.tab file can be used as input to bullseye.py to quickly create new images with different color schemes or color scales, or to perform Z-score normalization of each ring and obtain ADA scores. Use python3 bullseye.py --help for more information. If starting with a .hic file, a folder with the dumped data, containing the distance-normalized values for each chunk of every chromosome, will also be created.