Usage tips - NoamKaplan/dna-triangulation GitHub Wiki

  1. Design: All triangulation algorithms (augmentation and de novo), together with some helper functions, are contained in the triangulation.py module. Each of the other scripts is an executable that interfaces with this module.
  2. Parallelization: Scripts that perform heavy computations are designed to run in distributed form, either as multiple processes on a single machine or as multiple processes across several machines.
  3. RAM: Some scripts currently require loading large matrices into RAM. Some of these requirements may be avoided by using HDF5 files instead of raw text files. Please contact the author for further information.
  4. Seeds: karyotype.py and scaffold_chr.py use randomization, but rely on seeds. Seeds serve two purposes. First, they make results reproducible: without a fixed seed, each run of the code may give different results. Second, they are useful when running parallel optimizations, e.g. when using chromosome_scaffold.py. To run several iterations in parallel on different machines, assign each run its own seed.
  5. Debugging: To catch run-time errors in chromosome_scaffold.py, run it with -p 1 (no multithreading) so that errors are written to STDERR.
  6. Data: The code expects a Hi-C interaction matrix as input. The matrices used in the paper are provided here: http://my5c.umassmed.edu/triangulation/.
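The two roles of seeds described in tip 4 can be illustrated with a minimal, self-contained Python sketch. It does not call any script from this repository: `run_once`, its toy cost function, and the parameters `n_items` and `n_steps` are hypothetical stand-ins for one seeded randomized optimization run. The sketch only assumes that each run draws all of its randomness from a generator initialized with the given seed.

```python
import random


def run_once(seed, n_items=8, n_steps=200):
    """Toy randomized optimization (hypothetical, not the repository's algorithm).

    Tries to order the items 0..n_items-1 so that neighbors are close in value.
    """
    rng = random.Random(seed)  # private generator, fully determined by the seed
    order = list(range(n_items))
    rng.shuffle(order)  # random starting order

    def cost(perm):
        # Sum of absolute differences between neighboring items.
        return sum(abs(a - b) for a, b in zip(perm, perm[1:]))

    best = cost(order)
    for _ in range(n_steps):
        i, j = rng.sample(range(n_items), 2)
        order[i], order[j] = order[j], order[i]  # propose a random swap
        new = cost(order)
        if new <= best:
            best = new  # keep the swap
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
    return best, order


# Purpose 1: the same seed reproduces exactly the same result.
first = run_once(seed=1)
again = run_once(seed=1)

# Purpose 2: distinct seeds give independent runs. Here they run in a loop,
# but each call could equally be launched on a different machine.
results = {seed: run_once(seed) for seed in range(4)}
best_seed = min(results, key=lambda s: results[s][0])
```

Because every run depends only on its seed, the best run found across machines can be regenerated later on a single machine by reusing `best_seed`. How a seed is actually passed to karyotype.py or chromosome_scaffold.py is not shown here; consult each script's own usage message.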