Single cell A/B compartment calling

The Higashi-analysis pipeline runs on the imputed contact maps produced by Higashi-main. Please first execute Higashi-main as described here.

How Higashi calculates single cell A/B compartment scores

We designed an approach for A/B compartment score annotation based on the widely-used method proposed in Lieberman-Aiden et al., Science, 2009. In the original compartment calling method, the Hi-C contact map is normalized and transformed into a Pearson correlation matrix. The sign of the first principal component (referred to as PC1 for simplicity) of this correlation matrix is then used to define A/B compartments.

We developed a new method that calculates continuous single cell A/B compartment scores that are directly comparable across cells and sensitive enough to capture subtle compartment shifts. The first two steps, i.e., normalization and transformation into a Pearson correlation matrix, are performed for each single cell exactly as in the original method. However, instead of performing PCA on each individual Pearson correlation matrix, we apply PCA once to the Pearson correlation matrix of the pooled scHi-C data and save the PCA projection matrix. We then use this bulk projection matrix to transform each single-cell Pearson correlation matrix into a continuous one-dimensional vector of compartment scores.
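The idea of fitting PCA once on the pooled map and reusing the projection for every cell can be sketched with NumPy as follows. This is a minimal illustration and not the code in scCompartment.py: the observed/expected normalization by diagonal mean is an assumed, commonly used choice, and the function names (`oe_normalize`, `fit_projection`, `compartment_scores`) are hypothetical.

```python
import numpy as np

def oe_normalize(contact):
    """Observed/expected normalization: divide each diagonal by its mean.
    (Assumed normalization; the source only says the map 'is normalized'.)"""
    n = contact.shape[0]
    oe = np.zeros((n, n), dtype=float)
    for k in range(n):
        expected = np.diagonal(contact, k).mean()
        if expected > 0:
            idx = np.arange(n - k)
            oe[idx, idx + k] = contact[idx, idx + k] / expected
            oe[idx + k, idx] = contact[idx + k, idx] / expected
    return oe

def correlation_matrix(contact):
    """Pearson correlation between rows of the normalized map."""
    corr = np.corrcoef(oe_normalize(contact))
    return np.nan_to_num(corr)  # empty bins give NaN rows; set them to 0

def fit_projection(pooled_contact):
    """Fit PCA once on the pooled correlation matrix; keep the column mean and PC1."""
    corr = correlation_matrix(pooled_contact)
    mean = corr.mean(axis=0)
    _, _, vt = np.linalg.svd(corr - mean, full_matrices=False)
    return mean, vt[0]

def compartment_scores(cell_contact, mean, pc1):
    """Project one cell's correlation matrix with the pooled projection."""
    return (correlation_matrix(cell_contact) - mean) @ pc1
```

Because every cell is projected onto the same axis, the resulting scores share one sign convention and one scale, which is what makes them comparable across cells.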

Usage

Please execute the following code:

cd higashi/
python scCompartment.py [-c CONFIG] [--calib_file FILE] [--calib] [--neighbor] [-o OUTPUT]

optional arguments:
--calib_file FILE     The path to the calibration file (CG ratio, CpG density or bulk A/B compartment 
                      annotations etc.)
--calib               Calibrate the sign of the called A/B compartments. When using this option, 
                      `calib_file` is required. 
-n, --neighbor        Call compartments on the imputed maps with neighboring cell information utilized.
-o, --output          Output file name (stored in the `temp_dir`). (default: scTAD.hdf5)

required arguments:
-c CONFIG             The path to the configuration JSON file that you created for Higashi-main.

The FILE should have the following format (a tab-separated text file):

chr1	0	  0.0323533
chr1	1000000	  0.033473
chr1	2000000	  0.0275663
chr1	3000000	  0.0193224
chr1	4000000	  0.012299
chr1	5000000	  0.020483
chr1	6000000	  0.017645
chr1	7000000	  0.01735
chr1	8000000	  0.016412

Note: Higashi assumes that larger values correspond to a higher likelihood of a bin belonging to the A compartment. If your calibration file does not follow this convention, please adapt the signal accordingly (for example, by negating the values).
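To make the role of the calibration file concrete, here is a minimal sketch of reading the three-column format shown above and using it to fix the sign of a score vector. It is an illustration under assumptions, not the logic of scCompartment.py: the helper names `parse_calibration` and `calibrate_sign` are hypothetical, and flipping on a negative Pearson correlation is one simple way to implement sign calibration.

```python
import numpy as np

def parse_calibration(text):
    """Parse 'chrom<TAB>start<TAB>value' rows into (chrom, start, value) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, value = line.split()  # tolerates extra spaces after tabs
        rows.append((chrom, int(start), float(value)))
    return rows

def calibrate_sign(scores, calib_values):
    """Flip the scores if they are negatively correlated with the calibration
    track, so that larger scores mean 'more A-like'."""
    scores = np.asarray(scores, dtype=float)
    calib_values = np.asarray(calib_values, dtype=float)
    r = np.corrcoef(scores, calib_values)[0, 1]
    return -scores if r < 0 else scores
```

If the calibration track itself follows the opposite convention (larger values mean B-like), negating its values before use has the same effect as the adaptation described in the note above.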

We also provide a script, CpG_density.py, to calculate the CpG density. It requires the genome reference FASTA file (for instance, hg19.fa from the UCSC Genome Browser). To execute the script:

cd higashi
python CpG_density.py [-g FASTA] [-w WINDOW] [-o OUTPUT]


required arguments:
-g FASTA              The path to the genome reference fasta file.
-w WINDOW             The window size, which should be the same as the imputation resolution.
-o OUTPUT             The name of the output CpG density file, which will be used in the single cell 
                      A/B compartment calling process.

The script generates a file cpg_density.txt in the higashi/ folder, which can be passed directly as the `--calib_file` FILE to the single cell A/B compartment calling script above.
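For readers who want to see what such a track contains, the following is a minimal sketch of a windowed CpG density calculation. It is not the implementation of CpG_density.py: the definition used here (number of `CG` dinucleotides inside a window divided by the window length) is an assumption, CpGs that span a window boundary are ignored, and the function name is hypothetical.

```python
def cpg_density(chrom, sequence, window):
    """Return calibration-style rows 'chrom<TAB>start<TAB>density' for one sequence."""
    seq = sequence.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        # 'CG' cannot overlap itself, so str.count gives the exact number
        density = chunk.count("CG") / len(chunk)
        rows.append(f"{chrom}\t{start}\t{density}")
    return rows
```

With a real genome, `sequence` would be one chromosome read from the FASTA file and `window` would match the imputation resolution, as required by the `-w` argument.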

Note: Higashi also supports using a projection matrix calculated from real bulk Hi-C instead of the pooled scHi-C contact maps. To do that, include the bulk_path parameter in the configuration file and set it to the path of the bulk Hi-C file in .mcool format.
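As a small convenience, the bulk_path parameter can be added to an existing configuration JSON file programmatically. The sketch below assumes the configuration is a flat JSON object; the helper name and any file paths passed to it are hypothetical placeholders.

```python
import json

def add_bulk_path(config_path, mcool_path):
    """Set 'bulk_path' in the configuration JSON file and return the updated config."""
    with open(config_path) as handle:
        config = json.load(handle)
    config["bulk_path"] = mcool_path  # path to the bulk Hi-C .mcool file
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=2)
    return config
```

All other keys in the configuration file are left untouched.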