Change Log - ma-compbio/Higashi GitHub Wiki

2022-12-10

  • Add Tian et al. (bioRxiv) to the gallery
  • Add conda support for Fast-Higashi with noarch build.

2022-11-27

  • Add more tutorials on Fast-Higashi
  • Adjust the Fast-Higashi API to be more user-friendly.
  • Automatically adjust the number of CPU threads for Fast-Higashi
  • Add a function that uses the ENCODE blacklist to filter out contacts
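
The blacklist filtering mentioned above can be pictured with a minimal sketch. It is not the Fast-Higashi implementation: the function names, the `(chrom1, pos1, chrom2, pos2)` contact layout, and the assumption that blacklist intervals are half-open and non-overlapping within a chromosome are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def build_index(blacklist):
    """Group (start, end) intervals by chromosome, sorted by start.

    `blacklist` is an iterable of (chrom, start, end) tuples, as in a BED file.
    Intervals are assumed to be half-open and non-overlapping per chromosome.
    """
    index = {}
    for chrom, start, end in blacklist:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def is_blacklisted(index, chrom, pos):
    """Return True if `pos` falls inside a blacklisted interval on `chrom`."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def filter_contacts(contacts, blacklist):
    """Drop every contact with at least one end inside a blacklisted region."""
    index = build_index(blacklist)
    return [
        c for c in contacts
        if not is_blacklisted(index, c[0], c[1])
        and not is_blacklisted(index, c[2], c[3])
    ]


blacklist = [("chr1", 100, 200)]
contacts = [
    ("chr1", 150, "chr1", 500),  # first end is blacklisted
    ("chr1", 50, "chr1", 500),   # kept
    ("chr1", 200, "chr2", 150),  # kept: 200 is outside the half-open interval
]
kept = filter_contacts(contacts, blacklist)
```

Sorting the intervals once and using a binary search keeps each lookup logarithmic in the number of blacklisted regions per chromosome.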

2022-11-08

  • Major update
    • Fast-Higashi now supports batch effect correction by
      • normalizing the L1 sum of each diagonal per batch to be consistent with the bulk data (motivated by BandNorm/HiCCompare/MultiHiCCompare)
      • normalizing the coverage of each bin per batch to be consistent with the bulk data
    • The updated Fast_process.py enables fast processing of scHi-C data for Fast-Higashi.
    • Fast-Higashi memory consumption is optimized by using int16/int32 instead of long where appropriate, cutting memory usage by at least half.
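
As a rough illustration of the diagonal-based correction listed above, the sketch below rescales every diagonal of the cells in one batch so that the pooled L1 sum of that diagonal equals the one in a reference matrix. It assumes dense square matrices stored as nested lists and treats "bulk" as a single reference matrix; the function names are hypothetical, and the scaling actually used in Fast-Higashi may differ.

```python
def diagonal_sums(matrix):
    """Return the L1 sum of each upper diagonal (offset d >= 0) of a square matrix."""
    n = len(matrix)
    return [sum(abs(matrix[i][i + d]) for i in range(n - d)) for d in range(n)]


def normalize_batch_to_bulk(batch_cells, bulk):
    """Rescale each diagonal so the batch-wide L1 sum matches the bulk matrix."""
    n = len(bulk)
    bulk_sums = diagonal_sums(bulk)

    # Pool the diagonal sums over every cell in the batch.
    batch_sums = [0.0] * n
    for cell in batch_cells:
        for d, s in enumerate(diagonal_sums(cell)):
            batch_sums[d] += s

    # One scale factor per diagonal; diagonals with no signal are left untouched.
    scales = [
        bulk_sums[d] / batch_sums[d] if batch_sums[d] > 0 else 1.0
        for d in range(n)
    ]

    # Apply the factor of diagonal |i - j| to every entry (i, j).
    return [
        [[cell[i][j] * scales[abs(i - j)] for j in range(n)] for i in range(n)]
        for cell in batch_cells
    ]


bulk = [[4.0, 2.0], [2.0, 4.0]]
batch = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
normalized = normalize_batch_to_bulk(batch, bulk)
```

The same pattern applies to the coverage-based correction, with per-bin row sums taking the place of per-diagonal sums.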

2022-4-20

  • Conda install now supports all platforms (Note on Nov 27: this did not work; still looking into options)

2022-1-25

  • Roadmap

    • Complete the API of all the CLI functions
  • Major update

    • Speed improvement for model training, enabled by:
      • removing implicit csr_matrix generation
      • a new dataloader scheme
      • moving some of the data processing to Process.py
      • using multiple CPU processes for sparse COO matrix generation
    • The Code directory is renamed to higashi in preparation for building conda packages
    • Higashi is now on conda: conda install -c ruochiz higashi
    • Add tutorials for 4DN sci-Hi-C (Kim et al.) and Ramani et al.
  • Feature update

    • Automatic batch size selection, which improves performance on large datasets (large number of cells or high resolutions)
    • New Higashi_Wrapper.py, which allows running Higashi in a Jupyter notebook or in custom scripts.
  • Bug fix

    • Fix Inf values in the sqrt_norm() function
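
The body of sqrt_norm() is not shown in this change log, so the following is only a guess at the kind of problem involved. It assumes a square-root coverage normalization, where each entry is divided by the square root of the product of its row and column coverage, and shows one way to avoid dividing by zero for bins with no contacts. With array libraries such a division typically yields Inf or NaN; in plain Python it raises ZeroDivisionError. The name `sqrt_norm_safe` is hypothetical.

```python
import math


def sqrt_norm_safe(matrix):
    """Divide entry (i, j) by sqrt(coverage_i * coverage_j), skipping empty bins.

    `matrix` is a square, non-negative contact matrix given as nested lists.
    Entries whose row or column has zero coverage are set to 0.0 instead of
    being divided by zero.
    """
    n = len(matrix)
    coverage = [sum(row) for row in matrix]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(coverage[i] * coverage[j])
            if denom > 0:
                out[i][j] = matrix[i][j] / denom
    return out


# Bin 0 has no contacts at all, so its row and column stay at zero.
result = sqrt_norm_safe([[0.0, 0.0], [0.0, 4.0]])
```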

2021-10-22

  • Major update
    • Add support for the list of contact pairs format (consistent with scHiCluster).

2021-05-31

  • Major update
    • Higashi now supports the ZINB (zero-inflated negative binomial) regression loss mode. It is recommended to use zinb instead of the ranking mode. Classification mode still works well on low-coverage datasets.
    • File structure is redesigned, with far fewer temporary files generated
    • Runtime optimization: training is faster thanks to parallelized graph construction
    • Memory optimization: memory usage is reduced by using dynamic graph construction
    • Improved imputation accuracy, especially for bins with no captured contacts in the original scHi-C contact maps
  • Feature update
    • Add a default behavior for Merge2Cool.py (merge all cells when no list is provided)
    • Add --output options for scTAD.py and scCompartment.py

Thanks to @zengguangjie for identifying bugs.

  • Bug fix
    • Higashi now supports the latest pytorch version.
    • When a single cell is given as input, the program no longer throws an exception.
    • The description and the behavior of the neighbor_num parameter are now consistent with the hyperparameter k described in the paper
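
For readers unfamiliar with the ZINB loss mode introduced in this entry, here is a minimal sketch of a zero-inflated negative binomial negative log-likelihood for a single count. It uses the common mean/dispersion parameterization; the parameter names `mu`, `theta`, and `pi` are assumptions for illustration, and this is not Higashi's actual loss code.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under a negative binomial with mean mu > 0
    and dispersion (inverse overdispersion) theta > 0."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_nll(y, mu, theta, pi):
    """Negative log-likelihood of count y under a ZINB distribution.

    `pi` in [0, 1) is the probability of an extra (structural) zero.
    """
    if y == 0:
        # A zero can come from the inflation component or from the NB itself.
        nb_zero = math.exp(nb_log_pmf(0, mu, theta))
        return -math.log(pi + (1.0 - pi) * nb_zero)
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi) + nb_log_pmf(y, mu, theta))


# Probabilities over all counts should sum to one.
total = sum(math.exp(-zinb_nll(y, 2.0, 1.5, 0.3)) for y in range(300))
```

The zero-inflation term is what lets the model explain the many empty entries of a sparse scHi-C contact map without forcing the negative binomial mean toward zero.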

2021-04-16

  • Feature update
    • Higashi2Scool.py is now functioning properly. The corresponding documentation will be updated soon.

2021-04-01

Thanks to @tarak77 for beta-testing some of the new features and identifying bugs.

  • Feature update
    • Support selecting groups of cells and saving the merged imputation results in .cool format (Merge2Cool.py).
    • Remove the requirement of cell_name in data.txt
    • Higashi-vis now supports displaying cell names stored in label_info.pickle under the reserved key cell_name_higashi
    • Use imshow in pyplot instead of seaborn.heatmap for faster rendering
    • More reasonable multiprocessing for scCompartment and scTAD (1 to 3 processes for I/O-intensive jobs and about 20 processes for computationally intensive jobs)
    • The output of scCompartment.py now also includes compartment_zscore and compartment_raw, which correspond to the z-score normalized scA/B compartment values and the unnormalized ones, respectively.
    • Improve the post-processing steps by merging multiple I/O-intensive jobs into one process.
    • Improve the documentation of the code usage.
  • Bug fix
    • Fix scCompartment.py for chromosomes with only one arm (non-human species).
    • Higashi2SCool.py is not functioning correctly. (Will be fixed in the next version)
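
The relation between the raw and z-score normalized compartment values mentioned in this entry can be sketched as follows. This is a generic z-score, not the code in scCompartment.py; whether the script uses the population or sample standard deviation, and whether it normalizes per chromosome or per cell, is not stated here and is assumed for illustration.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center values on their mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant track carries no contrast; return zeros rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw compartment scores for four bins.
compartment_raw = [1.0, 2.0, 3.0, 4.0]
compartment_zscore = zscore(compartment_raw)
```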

2021-03-20

  • Feature update
    • Much faster imputation with pytorch sparse operations
    • Further improve the imputation results and reduce potential batch effects (corresponding options added to the configuration file)
  • Bug fix
    • Fix the parser for scA/B compartment calling
    • Fix the parser for scTAD calling

2021-03-13

  • Feature update
    • Runtime and memory optimization for processing structured dataframe (with multiprocessing support).
    • Add options for not imputing (deprecated after 2021-04-01 update)
    • Add options for customizable epoch numbers and automatically loading previous models
    • Improve the imputation results
  • Bug fix
    • Fix an error where, when data.txt includes a chromosome that is not listed in chrom_list, interactions from that chromosome were randomly included.
    • Fix the batch effect removal, which used the wrong version of the code in the previous release.
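
The chrom_list fix above amounts to dropping contacts on chromosomes that were not requested. A minimal sketch of that behavior is shown below; the dictionary keys `chrom1` and `chrom2` and the function name are assumptions for illustration, not the actual parsing code.

```python
def drop_unlisted_chromosomes(contacts, chrom_list):
    """Keep only contacts whose two ends both lie on chromosomes in chrom_list."""
    allowed = set(chrom_list)
    return [
        c for c in contacts
        if c["chrom1"] in allowed and c["chrom2"] in allowed
    ]


contacts = [
    {"chrom1": "chr1", "pos1": 100, "chrom2": "chr1", "pos2": 900},
    {"chrom1": "chrM", "pos1": 10, "chrom2": "chrM", "pos2": 40},
    {"chrom1": "chr1", "pos1": 100, "chrom2": "chrM", "pos2": 40},
]
kept = drop_unlisted_chromosomes(contacts, ["chr1", "chr2"])
```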

2021-03-05

  • Feature update

    • Post-processing of the Higashi-main results
      • Merge hdf5 results from multiple processes
      • Match the distribution of contact map values between the output and the population Hi-C
    • Higashi-vis update
      • Include read_count / kernel density estimation / kernel density estimation local as color schemes for Higashi-vis
      • Include local neighborhood selection function for Higashi-vis
      • Include more colormap options for Higashi-vis
      • Include compartment calling options for Higashi-vis
    • Higashi-analysis update
      • Include A/B sign calibration function and the corresponding script
  • Bug fix

    • Fix the calculation of weights of the neighborhood information
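
One common way to match the value distribution of one contact map to another, as in the post-processing item of this entry, is rank-based quantile matching. The sketch below illustrates that idea only; this change log does not say which matching method Higashi uses, and the function name is hypothetical.

```python
def match_distribution(values, reference):
    """Replace each value by the reference value at the same rank quantile.

    The ordering of `values` is preserved, while the set of output values is
    drawn from `reference`.
    """
    n, m = len(values), len(reference)
    ref_sorted = sorted(reference)
    # Indices of `values` from smallest to largest.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Map the rank in [0, n - 1] onto a position in [0, m - 1].
        pos = round(rank * (m - 1) / (n - 1)) if n > 1 else 0
        out[idx] = ref_sorted[pos]
    return out


imputed = [0.3, 0.1, 0.2]
population = [10, 30, 20]
matched = match_distribution(imputed, population)
```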

2021-02

  • Feature update
    • Add single-cell TAD calling code
    • Add single-cell compartment calling code
    • We now use fbpca to handle PCA of extremely large feature matrices
    • Beta version of removing batch effects of scHi-C (by including batch_id as part of the input)
    • Memory usage optimization (The memory usage is now 20% of the previous version on the sn-m3c-seq dataset)
    • Remove the optional smoothing and quantile normalization options for computational efficiency
    • Allow customizable UMAP/TSNE parameters for Higashi-vis
    • Include linear-conv+rwr imputation results for visualization