Custom Conda Environment on HPC Without Using Containers (Worked Example) - TGAC/knowledge_base GitHub Wiki

This guide provides a worked example of how to set up a custom conda environment on the HPC (High-Performance Computing) system, without relying on containers or your $HOME directory. The example uses Miniforge, a minimal installer for conda, to create an isolated environment for software development or installation. This setup is particularly useful for managing dependencies and ensuring that your software does not interfere with other installations on the HPC.

This approach is especially useful for:

  • Installing packages not available through the default Conda channels.
  • Managing your own software stack.
  • Maintaining clean, isolated environments for different projects.

We will use Miniforge for this example.

[!NOTE] ℹ️ Why Miniforge?
Miniforge is a minimal Conda installer that supports community-driven packaging and defaults to conda-forge, a reliable, open-source package repository.
What’s the difference between Anaconda, Conda, Miniconda, Mamba, Mambaforge, Micromamba?

I have added two sections to this guide:

1. Custom Conda Environment on HPC Without Using Containers (Miniforge Example):

  • This section provides a step-by-step guide to setting up a custom conda environment on the HPC without using containers. It uses Miniforge as the base installer and demonstrates how to install packages, configure channels, and create a wrapper script for easy access.

2. Test the Miniforge Installation to Install a Package from GitHub:

  • This section tests the Miniforge installation by installing a package from GitHub. It demonstrates how to use the conda environment for software development and installation without affecting your $HOME directory or other installations on the HPC. It also shows how to create a wrapper script for the installed package, allowing for easy access and management of the software.

Section 1: Custom Conda Environment on HPC Without Using Containers (Miniforge Example)

Log into the HPC Head Node

From your workstation, log into the HPC head node:

ssh $(whoami)@hpc.nbi.ac.uk

Connect to the Software Node

From the HPC head node, connect to the software node by typing either software or ssh software23. If prompted, enter your password.

software

Install Miniforge in a Custom Directory

The worked example is installed in the /ei/software/testing/ tree. You may pick a scratch space or project directory where you want Conda to be installed. This will ensure that the installation does not interfere with your $HOME directory or other installations on the HPC.

Create a bash variable for the base Miniforge installation directory. This will be used throughout the guide to refer to the installation path.

install_base=/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example

Install Miniforge in the specified directory. This will create a new directory structure under /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64. This installation will take about 3 minutes to complete.

# create the directory and navigate to it
mkdir -p ${install_base}/src
cd ${install_base}/src

# Download Miniforge for Linux and install it to the base (~3 minutes)
wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p ${install_base}/x86_64
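
Before running the installer, it can help to confirm that the target prefix is suitable. The sketch below is a hypothetical helper (the name `check_prefix` is illustrative and not part of Miniforge or conda) that rejects a prefix inside $HOME or one that already exists. It assumes the installer's batch mode will not overwrite an existing directory unless an update is explicitly requested.

```shell
# Hypothetical pre-flight check; the name check_prefix is illustrative only.
# Returns 0 if the prefix looks safe, 1 if it is inside $HOME, 2 if it already exists.
check_prefix() {
    local prefix="${1%/}"
    local home="${HOME:-/nonexistent}"
    home="${home%/}"
    case "${prefix}/" in
        "${home}/"*)
            echo "refusing: ${prefix} is inside \$HOME" >&2
            return 1
            ;;
    esac
    if [ -e "${prefix}" ]; then
        echo "refusing: ${prefix} already exists" >&2
        return 2
    fi
    return 0
}

# Example use with the variable from this guide (only runs if install_base is set):
if [ -n "${install_base:-}" ]; then
    check_prefix "${install_base}/x86_64" && echo "ok to install"
fi
```

Run the check before the `bash Miniforge3-Linux-x86_64.sh -b -p ...` step; after a successful install it will report that the prefix already exists, which is expected.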

Activate the conda base environment

After the installation is complete, you need to activate the conda base environment. This will set up the necessary environment variables and paths for using conda and its packages.

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
$ eval "$(${install_base}/x86_64/bin/conda shell.bash hook)"

# check if mamba and conda are in the path

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ which mamba
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/mamba

(base) $ whereis conda
conda: /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/conda /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/condabin/conda
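
The `which` and `whereis` checks above can also be scripted. The following is a minimal sketch, assuming you want a yes/no answer on whether the first match on PATH sits under the install base. The function name `resolves_under` is hypothetical. It walks PATH directly because, once the conda hook has been evaluated, `conda` is also defined as a shell function.

```shell
# Hypothetical helper: succeed only if the first executable named $1 on PATH
# lives under the base directory $2.
resolves_under() {
    local tool="$1" base="${2%/}" dir
    local IFS=:
    for dir in $PATH; do
        if [ -f "${dir}/${tool}" ] && [ -x "${dir}/${tool}" ]; then
            case "${dir}/" in
                "${base}/"*) return 0 ;;
            esac
            echo "${tool} resolves to ${dir}/${tool}, outside ${base}" >&2
            return 1
        fi
    done
    echo "${tool} not found on PATH" >&2
    return 1
}

# Example use with the variable from this guide (only runs if install_base is set):
if [ -n "${install_base:-}" ]; then
    for t in conda mamba; do
        resolves_under "$t" "${install_base}/x86_64" && echo "$t: ok"
    done
fi
```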

Check conda details and update channels with the new Pixi package manager

Here we will check the conda details, update the channels to include the Pixi package manager, and ensure that the conda configuration is set up correctly.

# check conda info
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda info

     active environment : base
    active env location : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64
            shell level : 1
       user config file : /hpc-home/kaithakg/.condarc
 populated config files : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc
          conda version : 25.3.0
    conda-build version : not installed
         python version : 3.12.10.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=icelake
                          __conda=25.3.0=0
                          __glibc=2.34=0
                          __linux=5.14.0=0
                          __unix=0=0
       base environment : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64  (writable)
      conda av data dir : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/pkgs
                          /hpc-home/kaithakg/.conda/pkgs
       envs directories : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/envs
                          /hpc-home/kaithakg/.conda/envs
               platform : linux-64
             user-agent : conda/25.3.0 requests/2.32.3 CPython/3.12.10 Linux/5.14.0-503.26.1.el9_5.x86_64 almalinux/9.6 glibc/2.34 solver/libmamba conda-libmamba-solver/25.3.0 libmambapy/2.1.1
                UID:GID : 9404:3658
             netrc file : None
           offline mode : False


[!IMPORTANT] As you can see below, I do not have a .condarc in my $HOME, even though conda info lists a user config file path. I deliberately keep the conda configuration in the installation base directory rather than in $HOME. This is good practice, as it avoids conflicts with other conda installations or configurations.

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ cat /hpc-home/kaithakg/.condarc
cat: /hpc-home/kaithakg/.condarc: No such file or directory

Check Default Conda Channels

# check the current conda channels, miniforge has conda-forge channel by default
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show channels
channels:
  - conda-forge

Check .condarc From the install base

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ cat ${install_base}/x86_64/.condarc
channels:
  - conda-forge

Add bioconda Channel to Conda Config

# add bioconda channel to the conda config

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config -p ${install_base}/x86_64 --add channels bioconda
(base) $ cat ${install_base}/x86_64/.condarc
channels:
  - bioconda
  - conda-forge

# you can add others - nvidia and pytorch, like so
(base) $ conda config -p ${install_base}/x86_64 --add channels nvidia
(base) $ conda config -p ${install_base}/x86_64 --add channels pytorch

Check Updated Conda Channels


/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show channels
channels:
  - bioconda
  - conda-forge

Great! we have added the bioconda channel to our conda configuration.

Now add the new channel alias for the Pixi package manager

[!IMPORTANT] The Pixi package manager is a new package manager that is currently being used to manage packages on the HPC. It is a replacement for the Anaconda channels that were previously used. Pixi is developed by Prefix, the organisation that hosts the repo.prefix.dev channels used below. You can find more information about Prefix and the Pixi package manager in the RC Documentation.

# Before adding channel_alias
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show-sources
==> /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc <==
channels:
  - bioconda
  - conda-forge

Add Pixi package manager channel_alias

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ vim ${install_base}/x86_64/.condarc
(base) $ cat ${install_base}/x86_64/.condarc
channels:
  - bioconda
  - conda-forge
channel_alias:
  https://repo.prefix.dev

# After adding channel_alias
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show-sources
==> /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc <==
channel_alias: https://repo.prefix.dev
channels:
  - bioconda
  - conda-forge

We have now added the Pixi package manager channel alias to our conda configuration. This will allow us to install packages from the Pixi package manager. We can see that the channel alias is set to https://repo.prefix.dev, which is the Pixi package manager URL.
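
The edit above was made by hand in vim. As a non-interactive alternative, conda itself can write the key with `conda config --file <path-to-.condarc> --set channel_alias <url>`. The sketch below shows the same idea in plain shell so that it can be tried without conda. The helper name `ensure_channel_alias` is hypothetical, and it only appends the key when no `channel_alias:` line is present.

```shell
# Hypothetical helper: add a channel_alias entry to a .condarc only if none is present.
ensure_channel_alias() {
    local condarc="$1" alias_url="$2"
    if grep -q '^channel_alias:' "$condarc" 2>/dev/null; then
        echo "channel_alias already set in ${condarc}; leaving it unchanged" >&2
        return 0
    fi
    # Make sure the file ends with a newline before appending.
    if [ -s "$condarc" ] && [ -n "$(tail -c 1 "$condarc")" ]; then
        echo >> "$condarc"
    fi
    printf 'channel_alias: %s\n' "$alias_url" >> "$condarc"
}

# Example use with the variable from this guide (only runs if install_base is set):
if [ -n "${install_base:-}" ]; then
    ensure_channel_alias "${install_base}/x86_64/.condarc" "https://repo.prefix.dev"
fi
```

Because the helper leaves an existing entry alone, running it twice does not duplicate the key.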

# Now check conda info

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda info

     active environment : base
    active env location : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64
            shell level : 1
       user config file : /hpc-home/kaithakg/.condarc
 populated config files : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc
          conda version : 25.3.0
    conda-build version : not installed
         python version : 3.12.10.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=icelake
                          __conda=25.3.0=0
                          __glibc=2.34=0
                          __linux=5.14.0=0
                          __unix=0=0
       base environment : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64  (writable)
      conda av data dir : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.prefix.dev/bioconda/linux-64
                          https://repo.prefix.dev/bioconda/noarch
                          https://repo.prefix.dev/conda-forge/linux-64
                          https://repo.prefix.dev/conda-forge/noarch
          package cache : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/pkgs
                          /hpc-home/kaithakg/.conda/pkgs
       envs directories : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/envs
                          /hpc-home/kaithakg/.conda/envs
               platform : linux-64
             user-agent : conda/25.3.0 requests/2.32.3 CPython/3.12.10 Linux/5.14.0-503.26.1.el9_5.x86_64 almalinux/9.6 glibc/2.34 solver/libmamba conda-libmamba-solver/25.3.0 libmambapy/2.1.1
                UID:GID : 9404:3658
             netrc file : None
           offline mode : False

Install packages using mamba

Now that we have set up the conda environment and added the Pixi package manager channel alias, we can install packages using mamba, which is a faster version of conda.
For this example, we will install some common packages that are often used in bioinformatics.

[!IMPORTANT] If you are installing custom software, for example from GitHub, this is where you would install all the dependencies for your software. You can generally find the dependencies in the setup.py, requirements.txt, environment.yml, or pyproject.toml file in the GitHub repository of the software you are trying to install.

It takes about 5 minutes to install the packages listed below.

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ which mamba
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/mamba
(base) $ mamba install -y git tabulate numpy pandas
...
...
Transaction finished
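
When the dependencies live in a requirements.txt file, a small filter can turn that file into a reviewable list of package specs. This is a sketch under assumptions: the helper name `list_requirements` is hypothetical, it only handles simple one-spec-per-line files, and it drops option lines such as `-r` or `-e`. Package names on PyPI and conda-forge do not always match, so check the output before passing it to mamba.

```shell
# Hypothetical helper: print one package spec per line from a requirements.txt-style file.
# Strips leading whitespace, comments, trailing whitespace, option lines and blank lines.
list_requirements() {
    sed -e 's/^[[:space:]]*//' \
        -e 's/^#.*$//' \
        -e 's/[[:space:]]\{1,\}#.*$//' \
        -e 's/[[:space:]]*$//' \
        -e '/^-/d' \
        -e '/^$/d' \
        "$1"
}

# Example use (only runs if a requirements.txt exists in the current directory):
if [ -f requirements.txt ]; then
    list_requirements requirements.txt
fi
```

After reviewing the list, one option is `mamba install -y $(list_requirements requirements.txt)`.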

Create a wrapper script for the conda environment

Create a wrapper script to source the conda environment and set the PATH variable. This will allow you to easily activate the environment without needing to type the full path each time. The wrapper script will be placed in the /ei/software/testing/bin directory. It can also be placed in your project directory, if you have one, or any other directory that is in your PATH.

cd /ei/software/testing/bin
# Write the wrapper with a quoted heredoc so ${tool}, ${location} and $PATH are not expanded now
(base) $ cat > python_miniforge-25.3.0-3_py3.12_example << 'EOF'
#!/bin/bash
tool="python_miniforge/25.3.0-3_py3.12_example"
location="/ei/software/testing"
echo "${tool} is sourced from ${location} location"
export PATH="${location}/${tool}/x86_64/bin:$PATH"
EOF

We can now source this script to activate the conda environment and set the PATH variable. This will allow us to use the installed packages without needing to type the full path each time.
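
One thing to be aware of: a wrapper that always prepends to PATH adds the same directory again each time it is sourced. A minimal sketch of an idempotent prepend is shown below. The function name `path_prepend` is hypothetical, and the directory in the example is simply the Miniforge bin directory used in this guide.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                          # already present, nothing to do
        *) PATH="$1${PATH:+:${PATH}}" ;;      # prepend, avoiding a trailing colon
    esac
    export PATH
}

# Example use inside a wrapper script (only runs if the directory exists):
miniforge_bin="/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin"
if [ -d "${miniforge_bin}" ]; then
    path_prepend "${miniforge_bin}"
fi
```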

The following section shows how to source the wrapper script and activate the conda environment to install a package from GitHub.

Section 2: Test the miniforge installation to install a package from GitHub

Now that we have set up the conda environment and created a wrapper script, we can test the installation by installing a package from GitHub. This will demonstrate how to use the conda environment for software development and installation without affecting your $HOME directory.

We will install a custom package using the conda environment. This package can be anything you want, but for this example, we will install a simple Python vizgen_data_transfer package, which is a Python wrapper for managing data transfer processes related to Vizgen projects.

This package will be installed in the /ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64 directory, which is a custom location for the package.

This allows you to manage your software installations without cluttering your $HOME directory or interfering with other installations on the HPC. It also allows you to easily share your software with others by simply sharing the installation directory.

Activate the conda environment and install the package from GitHub

# Activate the conda environment
$ source /ei/software/testing/bin/python_miniforge-25.3.0-3_py3.12_example
python_miniforge/25.3.0-3_py3.12_example is sourced from /ei/software/testing location

Clone the GitHub repository and install the package

Here we will clone the vizgen_data_transfer repository from GitHub, create a wheel package, and install it in the custom location we specified earlier. This will allow us to use the package without needing to install it in our $HOME directory or the base conda environment.

mkdir -p /ei/software/testing/vizgen_data_transfer/0.1.0_example/src && \
cd /ei/software/testing/vizgen_data_transfer/0.1.0_example/src && \
git clone https://github.com/EI-CoreBioinformatics/vizgen_data_transfer.git

# Change to the cloned directory, create a wheel package, and install it in a custom location
cd vizgen_data_transfer

# Check the pip location, it should point to the Miniforge installation
which pip
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/pip

version=0.1.0_example && \
pip wheel -w dist . && \
pip install dist/*whl --prefix=/ei/software/testing/vizgen_data_transfer/${version}/x86_64
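
With `--prefix`, the installed modules on Linux typically end up under `lib/pythonX.Y/site-packages` inside that prefix, where X.Y is the version of the interpreter that ran pip. Rather than hardcoding the version, the path can be derived from the interpreter. This is a sketch under that layout assumption, and the helper name `site_packages_for` is hypothetical.

```shell
# Hypothetical helper: print the site-packages directory that a --prefix install
# would use for the given prefix ($1) and interpreter ($2, default: python).
# Assumes the usual Linux layout <prefix>/lib/pythonX.Y/site-packages.
site_packages_for() {
    local prefix="${1%/}" py="${2:-python}" pyver
    pyver="$("$py" -c 'import sys; print("python%d.%d" % sys.version_info[:2])')" || return 1
    printf '%s/lib/%s/site-packages\n' "${prefix}" "${pyver}"
}

# Example use (only runs if a python interpreter is on PATH):
if command -v python >/dev/null 2>&1; then
    site_packages_for "/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64"
fi
```

It is worth confirming that the printed directory actually exists after the install, for example with `ls`.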

Create a wrapper script for the vizgen_data_transfer package

As before, we will create a wrapper script to easily source/activate the vizgen_data_transfer package. This wrapper will set the necessary environment variables and paths for the package to work correctly. The script will be placed in the /ei/software/testing/bin directory, similar to the Miniforge wrapper script.

# Create a wrapper script for the vizgen_data_transfer package
cd /ei/software/testing/bin
# Use a quoted heredoc so $PATH is written literally and expanded only when the wrapper is sourced
cat > vizgen_data_transfer-0.1.0_example << 'EOF'
#!/bin/bash

source /ei/software/testing/bin/python_miniforge-25.3.0-3_py3.12_example
export PATH=/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/bin:$PATH
export PYTHONPATH=/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/lib/python3.12/site-packages

echo "vizgen_data_transfer/0.1.0_example is sourced from /ei/software/testing location"
EOF
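
The wrapper sets PYTHONPATH outright, which replaces any value already in the environment. If you need to keep existing entries, for instance when stacking several such wrappers, a prepend that avoids a stray trailing colon is one option. The function name `pythonpath_prepend` below is hypothetical.

```shell
# Hypothetical helper: prepend a directory to PYTHONPATH while keeping existing entries.
# The ${VAR:+...} form adds the ":" separator only when PYTHONPATH is already non-empty,
# which avoids an empty entry in the search path.
pythonpath_prepend() {
    PYTHONPATH="$1${PYTHONPATH:+:${PYTHONPATH}}"
    export PYTHONPATH
}

# Example use inside a wrapper script (only runs if the directory exists):
pkg_site="/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/lib/python3.12/site-packages"
if [ -d "${pkg_site}" ]; then
    pythonpath_prepend "${pkg_site}"
fi
```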

Test the vizgen_data_transfer package from a new terminal

Now that we have created the wrapper script for the vizgen_data_transfer package, we can test it by sourcing the script and running the package command.

# Open a new terminal and source the vizgen_data_transfer package
$ source /ei/software/testing/bin/vizgen_data_transfer-0.1.0_example
python_miniforge/25.3.0-3_py3.12_example is sourced from /ei/software/testing location
vizgen_data_transfer/0.1.0_example is sourced from /ei/software/testing location

$ which vizgen_data_transfer
/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/bin/vizgen_data_transfer

$ vizgen_data_transfer -h
usage: vizgen_data_transfer [-h] [--copy_type COPY_TYPE [COPY_TYPE ...]] [--threads THREADS] [--disk] [--vizgen_config VIZGEN_CONFIG] [--debug] run_id

        Script for Vizgen data transfer


positional arguments:
  run_id                Provide run name, for example: 202310261058_VZGEN1_VMSC10202

options:
  -h, --help            show this help message and exit
  --copy_type COPY_TYPE [COPY_TYPE ...]
                        Provide copy type, for example: raw_data, analysis, output (default: ['raw_data', 'analysis', 'output'])
  --threads THREADS     Number of threads to use for copying (default: 8)
  --disk                Enable this option if run has to be copied from the Windows external Hard disk 'G:\Vizgen data Z drive' instead of the default Z: Drive on the analysis machine [default:False]
  --vizgen_config VIZGEN_CONFIG
                        Path to vizgen config file [default:/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/lib/python3.12/site-packages/vizgen_data_transfer/etc/.vizgen_config.toml]
  --debug               Enable this option for debugging [default:False]

Contact: Gemy George Kaithakottil ([email protected])

This confirms that the vizgen_data_transfer package is installed and working correctly.

Summary

In this guide, we set up a custom conda environment using Miniforge on the HPC. We installed packages through the Pixi (repo.prefix.dev) channel alias and created a wrapper script to activate the environment. We also installed a package from GitHub into its own prefix and created a wrapper script for it. This setup allows for software development and installation on the HPC without affecting your $HOME directory.