Higashi Usage API - ma-compbio/Higashi GitHub Wiki

Initialize the Higashi instance

from higashi.Higashi_wrapper import Higashi
higashi_model = Higashi(config_path)

Data processing

Run the following command to process the input data. This only needs to be run once per dataset.

higashi_model.process_data()

This function performs the following tasks:

  • generates a dictionary that maps genomic bin loci to node ids
  • extracts data from data.txt and converts it into hyperedges (triplets)
  • creates contact maps from the sparse scHi-C data for visualization, for the baseline model, and for generating node attributes
  • (Optional) runs linear convolution + random walk with restart (scHiCluster) to impute the contact maps, for use as a baseline and for visualization
  • (Optional) processes co-assayed signals
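To make the first two tasks concrete, here is a minimal sketch of mapping genomic bins to node ids and turning contact records into (cell, node, node) triplets. It is an illustration only, not Higashi's implementation: the input layout (cell_id, chrom1, pos1, chrom2, pos2), the chromosome sizes, the resolution, and the helper names `build_bin_index` and `contacts_to_triplets` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical inputs: chromosome sizes in bp and the bin resolution.
chrom_sizes = {"chr1": 3_000_000, "chr2": 2_000_000}
resolution = 1_000_000


def build_bin_index(chrom_sizes, resolution):
    """Map each (chrom, bin_index) pair to a consecutive node id."""
    node_id = {}
    for chrom, size in chrom_sizes.items():
        n_bins = -(-size // resolution)  # ceiling division
        for b in range(n_bins):
            node_id[(chrom, b)] = len(node_id)
    return node_id


def contacts_to_triplets(rows, node_id, resolution):
    """Count contacts per (cell, node_a, node_b) triplet, with node_a <= node_b."""
    counts = Counter()
    for cell, c1, p1, c2, p2 in rows:
        a = node_id[(c1, p1 // resolution)]
        b = node_id[(c2, p2 // resolution)]
        counts[(cell, min(a, b), max(a, b))] += 1
    return counts


# Hypothetical contact records: (cell_id, chrom1, pos1, chrom2, pos2).
rows = [
    (0, "chr1", 150_000, "chr1", 2_400_000),
    (0, "chr1", 2_100_000, "chr1", 900_000),
    (1, "chr2", 1_200_000, "chr2", 1_900_000),
]

node_id = build_bin_index(chrom_sizes, resolution)
triplets = contacts_to_triplets(rows, node_id, resolution)
print(dict(triplets))  # {(0, 0, 2): 2, (1, 4, 4): 1}
```

The first two records fall into the same pair of bins in cell 0, so they collapse into a single triplet with a count of 2.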

The function above is equivalent to calling the following in sequence:

higashi_model.generate_chrom_start_end()
higashi_model.extract_table()
higashi_model.create_matrix()

Before each step is executed, a message is printed to indicate progress, which helps with debugging.

Train the Higashi model

Step 0: model initialization

higashi_model.prep_model()

Step 1: train Higashi to obtain embeddings

higashi_model.train_for_embeddings()

Step 2: train Higashi for contact map imputation without using neighboring information

higashi_model.train_for_imputation_nbr_0()
higashi_model.impute_no_nbr()

Step 3: train Higashi for contact map imputation with neighboring information

higashi_model.train_for_imputation_with_nbr()
higashi_model.impute_with_nbr()
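The stages above can be collected into a single helper that runs them in order. This is a sketch only: the function name `run_higashi_pipeline` and its `impute` flag are hypothetical conveniences introduced here, and the only calls made are the methods listed on this page.

```python
def run_higashi_pipeline(model, impute=True):
    """Run the documented stages in order on an existing Higashi instance."""
    model.process_data()          # only needed once per dataset
    model.prep_model()            # once per instance, before training/imputation
    model.train_for_embeddings()  # Step 1: embeddings
    if impute:
        # Step 2: imputation without neighboring information
        model.train_for_imputation_nbr_0()
        model.impute_no_nbr()
        # Step 3: imputation with neighboring information
        model.train_for_imputation_with_nbr()
        model.impute_with_nbr()
```

With Higashi installed, this would be called as run_higashi_pipeline(Higashi(config_path)). If the data has already been processed in an earlier session, the process_data() call can be dropped from the helper.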

**Extra notes:** Higashi saves the model parameters and embeddings every 5 epochs, so you can check during training whether the embeddings look good. For instance, suppose you are unsure how many epochs Higashi needs to converge on a new dataset and set embedding_epoch to 120 to be on the safe side. During training, you find that the embeddings converge at around epoch 58. Instead of waiting for all 120 epochs to finish, you can wait until the model completes epoch 60 (since parameters are saved every 5 epochs) and then interrupt the function. The saved parameters will be loaded automatically the next time.
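The arithmetic in this example can be written as a one-line helper. This is a small sketch, assuming checkpoints are written at exact multiples of the save interval; the function name `first_checkpoint_at_or_after` is hypothetical and not part of Higashi.

```python
import math


def first_checkpoint_at_or_after(epoch, save_every=5):
    """Return the first checkpointed epoch at or after the given epoch."""
    return math.ceil(epoch / save_every) * save_every


print(first_checkpoint_at_or_after(58))  # 60
```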

A few notes:

  • process_data() only needs to be called once unless the input data changes, for instance when chrom_list or the data source is changed.
  • prep_model() must be called before any training or imputation function, but only once after higashi_model = Higashi(...).
  • Trained weights of Higashi are automatically saved in temp_dir. You can continue with the next stage of training or imputation directly if the previous stage has completed or was intentionally interrupted.