Developer Setup - raphael-group/hatchet Wiki

To contribute to HATCHet, we recommend the following steps:

  • Clone and check out the develop branch of HATCHet

  • Install the Gurobi solver and an academic license (for the fastest testing times), then set the GUROBI_HOME and GRB_LICENSE_FILE environment variables. If you cannot install Gurobi, see http://compbio.cs.brown.edu/hatchet/README.html#using-a-solver for alternative approaches. The steps that our CI currently runs to install Gurobi are roughly:

          wget https://packages.gurobi.com/9.0/gurobi9.0.2_linux64.tar.gz -O gurobi9.0.2_linux64.tar.gz
          tar xvzf gurobi9.0.2_linux64.tar.gz
          (cd gurobi902/linux64/src/build && make)
          (cd gurobi902/linux64/lib && ln -f -s ../src/build/libgurobi_c++.a libgurobi_c++.a)
          export GUROBI_HOME=$(realpath gurobi902)

          (cd gurobi902/linux64/bin && ./grbgetkey -q <your_gurobi_key_here> --path ${GUROBI_HOME})
          export GRB_LICENSE_FILE=${GUROBI_HOME}/gurobi.lic
  • Install the commonly used bioinformatics tools that HATCHet relies on. You will have to set environment variables to tell HATCHet where it can find these tools; see https://github.com/raphael-group/hatchet/blob/develop/.github/workflows/main.yml for how our CI does this. Instead of setting environment variables, you may modify the paths section of the included hatchet.ini.

The required tools and files currently include:

- SAMtools
  - set HATCHET_PATHS_SAMTOOLS to the folder where the samtools executable can be found.

- BCFtools
  - set HATCHET_PATHS_BCFTOOLS to the folder where the bcftools executable can be found.

- Tabix
  - set HATCHET_PATHS_TABIX and HATCHET_PATHS_BGZIP to the folder where the tabix
    and bgzip executables can be found.

- Mosdepth
  - set HATCHET_PATHS_MOSDEPTH to the folder where the mosdepth executable can be found.

- Picard Tools
  - set HATCHET_PATHS_PICARD to the folder where picard.jar
    (or picard if you installed picard tools from conda) can be found.

- Shapeit 2
  - set HATCHET_PATHS_SHAPEIT to the folder where the shapeit executable can be found.

- Phasing reference panel files
    - These files can be found at https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz.
      Set HATCHET_DOWNLOAD_PANEL_REFPANELDIR to the folder where you extract this archive.
      You should see a 1000GP_Phase3 folder inside this folder.
      HATCHet may download additional chain files into this folder if needed,
      so make sure that it is a writable location.

- Reference human genome
    - We recommend the one at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz.
      Set HATCHET_PATHS_REFERENCE to the full path of the decompressed .fa file.

- Testing data for HATCHet
    - We provide some testing data for HATCHet at https://zenodo.org/record/4046906.
      You will want to set HATCHET_TESTS_BAM_DIRECTORY to the folder where you extract all those files.
      One possible way to do this is:
          pip3 install zenodo-get
          python3 -m zenodo_get 10.5281/zenodo.4046906 --output-dir=testdata
          export HATCHET_TESTS_BAM_DIRECTORY=$(realpath testdata)
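
If the executables listed above are already on your PATH (for example, inside a conda environment), the HATCHET_PATHS_* variables can be derived rather than typed by hand. The following is a minimal sketch under that assumption; `tool_dir` is a hypothetical helper, not part of HATCHet, and the executable names (`samtools`, `bcftools`, `tabix`, `bgzip`, `mosdepth`, `shapeit`) are assumed to match your installation.

```shell
# Hypothetical helper (not part of HATCHet): print the folder that contains
# an executable found on PATH, or fail if the executable is not found.
tool_dir() {
    command -v "$1" >/dev/null 2>&1 || return 1
    dirname "$(command -v "$1")"
}

# Each entry is <VARIABLE_SUFFIX>:<executable name>; the names are assumptions.
for pair in SAMTOOLS:samtools BCFTOOLS:bcftools TABIX:tabix BGZIP:bgzip \
            MOSDEPTH:mosdepth SHAPEIT:shapeit; do
    var="HATCHET_PATHS_${pair%%:*}"
    exe="${pair##*:}"
    if dir=$(tool_dir "$exe"); then
        export "$var=$dir"
    else
        echo "warning: $exe not found on PATH; set $var manually" >&2
    fi
done
```

HATCHET_PATHS_PICARD, HATCHET_PATHS_REFERENCE and HATCHET_DOWNLOAD_PANEL_REFPANELDIR are left out of the loop because they point at a jar, a file and a data folder rather than at an executable on PATH, so they still need to be set explicitly.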
  • Create and activate a new conda environment with Python 3.8 or 3.9 (preferred):
          conda create --name hatchet python=3.9 && conda activate hatchet
  • Start a new branch and install HATCHet in developer mode with the dev extras:
          cd <path_to_hatchet_repo>
          git checkout -b <your_awesome_branch_name>
          pip install -e ".[dev]"

If the pip install step fails because of an error in C++ compilation, you may need to set the environment variable CXXFLAGS to -pthread.
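
As a minimal sketch of that workaround: the snippet below appends `-pthread` to whatever CXXFLAGS already contains instead of overwriting it. Appending is a choice made here for safety, not something the HATCHet documentation prescribes; plain `export CXXFLAGS=-pthread` matches the instruction literally.

```shell
# Append -pthread to any existing CXXFLAGS rather than overwriting them.
# If CXXFLAGS is unset or empty, the result is just "-pthread".
export CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }-pthread"

# Then re-run the install from the repository root:
#   pip install -e ".[dev]"
```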

  • Install the pre-commit hook. This will allow you to identify style/formatting/coding issues every time you commit your code. Pre-commit automatically formats the files in your repository according to certain standards, and/or warns you if certain best practices are not followed.
         pre-commit install
  • Run HATCHet Check. This is crucial to (quickly) see if HATCHet is likely to work for your setup or not.
         hatchet check
  • Run the unit tests. This step may take up to an hour, but this is crucial to see if HATCHet is working correctly.
         pytest tests

If any tests fail, do not proceed; carefully go through the procedure above again. If you still cannot resolve the failures, contact us through GitHub issues.

NOTE: some of the steps in test_steps.py will fail with newer samtools/bcftools versions (e.g., 1.9). Try using version 1.7 of each as used in the GitHub Actions YAML file.
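
To see quickly which versions are installed, a check along the following lines may help. It is only a sketch: `check_version` is a hypothetical helper, it looks for the tools on PATH rather than in the HATCHET_PATHS_* folders, and it assumes that the first line of `<tool> --version` has the form `<name> <version>`.

```shell
# Hypothetical helper (not part of HATCHet): succeed only if the first line
# of "<tool> --version" reports the expected version as its second field.
check_version() {
    tool=$1
    expected=$2
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found on PATH"
        return 1
    fi
    found=$("$tool" --version | head -n 1 | awk '{print $2}')
    if [ "$found" = "$expected" ]; then
        echo "$tool $found: OK"
    else
        echo "$tool $found: expected $expected"
        return 1
    fi
}

# Report on both tools without aborting the shell if a check fails.
check_version samtools 1.7 || true
check_version bcftools 1.7 || true
```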

  • Tweak/modify the code, make HATCHet better!

  • Add new tests for any features you add. Re-run the unit tests to make sure you didn't break anything.

         pytest tests
  • Push your code to GitHub and open a PR against the develop branch. We intend to follow the Gitflow workflow to accept contributions to HATCHet and release new versions.

Our CI will automatically run the pre-commit and pytest steps for PRs towards the protected branches, so running these steps on your local installation will prevent surprises for you later.

Thank you for contributing to HATCHet!