Configure the parameters for Fast-Higashi - ma-compbio/Higashi GitHub Wiki

All customizable parameters are stored in a JSON config file. The path to this JSON config file will be needed when running the program. For examples of the configuration JSON file, see the tutorials linked in this wiki.

Higashi and Fast-Higashi share most of their parameters. The JSON file you prepared for Higashi can be reused directly for Fast-Higashi once a few additional parameters are added.

Fast-Higashi only parameters

If you also plan to run Higashi and have already prepared its config JSON file, simply add the following parameters to that file and you are good to go.

| params | Type | Required/Optional | description | example |
| --- | --- | --- | --- | --- |
| resolution_fh | list | Required | The resolution of the contact maps | [500000] |
| batch_id | str | Optional | The name of the batch id information stored in label_info.pickle. The corresponding information is used to remove batch effects | "batch id" |
| blacklist | str | Optional | Path of the ENCODE blacklist file (https://github.com/Boyle-Lab/Blacklist). Used to filter out contacts | "/home/rzhang/Higashi/hg19-blacklist.v2.bed" |
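
As a minimal sketch, the snippet below adds the Fast-Higashi-only parameters to an existing Higashi config. The helper name `add_fast_higashi_params` is hypothetical and not part of the Higashi package, and the values are the examples from the table above rather than recommendations.

```python
import json


def add_fast_higashi_params(config, resolution_fh, batch_id=None, blacklist=None):
    """Return a copy of a Higashi config dict with the Fast-Higashi-only keys added.

    Hypothetical helper for illustration; it is not part of the Higashi package.
    """
    updated = dict(config)  # shallow copy, so the caller's dict is left untouched
    updated["resolution_fh"] = list(resolution_fh)  # required, stored as a list
    if batch_id is not None:
        updated["batch_id"] = batch_id  # optional: key in label_info.pickle
    if blacklist is not None:
        updated["blacklist"] = blacklist  # optional: path to a blacklist BED file
    return updated


# A (truncated) Higashi config; in practice this would come from json.load().
higashi_config = {"data_dir": "/sn-m3C-seq", "resolution": 1000000}

fh_config = add_fast_higashi_params(higashi_config, [500000], batch_id="batch id")
print(json.dumps(fh_config, indent=2))
```

The result can then be written back to a JSON file with `json.dump` and passed to Fast-Higashi in the same way as any other config file.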

Other parameters shared with Higashi

If you plan to only run Fast-Higashi, you will need the following parameters as well.

| params | Type | Required/Optional | description | example |
| --- | --- | --- | --- | --- |
| data_dir | str | Required | Directory where the data are stored | "/sn-m3C-seq" |
| input_format | str | Optional | How the data are stored. Either "higashi_v1" or "higashi_v2". "higashi_v1" stores the scHi-C dataset as one big table named data.txt. "higashi_v2" stores the contact pairs of each cell as an individual table and lists the paths to these files in filelist.txt | "higashi_v1" |
| header_included | bool | Required when input_format="higashi_v2" | Whether each table includes a header row | true |
| contact_header | list | Required when input_format="higashi_v2" and header_included is false | The header of the contact pairs. Must include ["chrom1", "pos1", "chrom2", "pos2"]. When "count" is not included, the program assumes count=1 for all contact pairs | ["chrom1", "pos1", "chrom2", "pos2", "count"] |
| structured | bool | Required | Whether the data.txt file is structured (the interaction pairs of each cell occupy consecutive rows rather than being scattered randomly). A data.txt organized this way can save a lot of memory and processing time | true |
| temp_dir | str | Required | Directory where the temporary files will be stored. An empty folder will be created if it doesn't exist | "../Temp/sn-m3C_1Mb" |
| genome_reference_path | str | Required | Path of the genome reference file from the UCSC Genome Browser. Used to generate bin nodes | "../hg19.chrom.sizes.txt" |
| chrom_list | list | Required | List of chromosomes to train the model on. The naming convention should match data.txt and the genome reference file | ["chr1", "chr2", "chr3", "chr4", "chr5"] |
| resolution | int | Required | Resolution for imputation | 1000000 |
| resolution_cell | int | Required | Resolution for generating the attributes of the cell nodes. 1Mb is recommended for data with lower coverage per cell, 500Kb for data with higher coverage per cell | 1000000 |
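
The requirement rules in the two tables above can be sketched as a small pre-flight check. This is an illustration only: `check_config` and `REQUIRED_KEYS` are hypothetical names, the checks mirror the "Required/Optional" columns of this page, and Fast-Higashi's own validation may differ. The example config combines the example values from the tables with the "higashi_v2" input format.

```python
import json

# Keys marked "Required" in the tables on this page (assumption: this list
# mirrors the page, not Fast-Higashi's internal checks).
REQUIRED_KEYS = [
    "resolution_fh", "data_dir", "structured", "temp_dir",
    "genome_reference_path", "chrom_list", "resolution", "resolution_cell",
]
CONTACT_COLUMNS = ["chrom1", "pos1", "chrom2", "pos2"]


def check_config(config):
    """Return a list of problems found in a Fast-Higashi config dict."""
    problems = [f"missing required key: {key}"
                for key in REQUIRED_KEYS if key not in config]
    if config.get("input_format") == "higashi_v2":
        if "header_included" not in config:
            problems.append("header_included is required for higashi_v2")
        elif not config["header_included"]:
            header = config.get("contact_header")
            if header is None:
                problems.append(
                    "contact_header is required when header_included is false")
            else:
                missing = [col for col in CONTACT_COLUMNS if col not in header]
                if missing:
                    problems.append(f"contact_header lacks columns: {missing}")
    return problems


config = {
    "data_dir": "/sn-m3C-seq",
    "input_format": "higashi_v2",
    "header_included": False,
    "contact_header": ["chrom1", "pos1", "chrom2", "pos2", "count"],
    "structured": True,
    "temp_dir": "../Temp/sn-m3C_1Mb",
    "genome_reference_path": "../hg19.chrom.sizes.txt",
    "chrom_list": ["chr1", "chr2", "chr3", "chr4", "chr5"],
    "resolution": 1000000,
    "resolution_cell": 1000000,
    "resolution_fh": [500000],
}

problems = check_config(config)
config_json = json.dumps(config, indent=2)  # text to save as the config file
print(problems if problems else config_json)
```

Returning a list of problems instead of raising on the first one lets all missing or inconsistent entries be fixed in a single pass before a long run is started.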