Conda ecosystem (pixi, micromamba, conda, mamba) - danielnachun/dotfiles GitHub Wiki

Why use the conda ecosystem?

There are many options for package managers in the modern era, each of which has its own advantages and disadvantages. This repository is centered around the notion of the Conda ecosystem as a "universal package manager" that can manage all of your packages. It has several features that make it suitable for this role:

It can build packages for any language, and has well-developed infrastructure for many popular languages.
It is platform independent, supporting Linux, macOS and Windows.
It can build both native and platform-independent packages.
Native packages are built as relocatable binaries, so nothing needs to be built from source.
It can manage both command line executables and libraries used by languages like R and Python.
All packages can be installed in an unprivileged environment without administrative access (i.e. sudo).

Package managers

There are several package managers that can install and manage Conda packages. Four of them are in widespread use:

conda - the original package manager, written in Python. Its original dependency solver was also written in Python and is quite slow; since conda 23.10 the default is the libmamba solver, the same one used by mamba.
mamba - developed to improve the speed of dependency resolution in conda. It passes most functions to conda, but replaces the slow Python-based dependency solver in conda with a much faster solver written in C++, derived from libsolv.
micromamba - a full rewrite of mamba in C++ that is deployed as a single statically linked executable. It can serve as a near complete replacement of conda in most use cases.
pixi - a new Conda package manager written in Rust as a single, statically linked binary like micromamba. Although it installs Conda packages, its design is inspired more by Cargo and npm: it provides global packages, and lockfiles that install environments in specific project directories rather than in a single shared folder.

Key concepts

There are several concepts that are important to understanding how the Conda ecosystem works.

Package manager - the software that installs, uninstalls and upgrades packages. In the context of this repository, this will be one of pixi, micromamba, mamba or conda.
Package - a piece of software that can be installed by a package manager.
Channel - a collection of Conda packages hosted on anaconda.org. This repository uses the dnachun, conda-forge, and bioconda channels.
Environment - a locally installed collection of packages.
Dependency - a package that another package needs to function.
Dependency conflict - a scenario in which two or more packages need dependencies which cannot be installed in the same environment. The risk of dependency conflicts can be reduced by keeping environments simple, but cannot always be avoided.

Installation

Each of the package managers described above has one or more methods available to install them. However, the configuration in this repository is only guaranteed to function properly if you use the instructions provided in the README, which make sure that the proper Conda package manager is installed and configured correctly.

pixi

Pixi is the future of the Conda ecosystem and will eventually be the sole package manager used by this repository. There are two ways to use pixi: global packages and lockfiles.

Global packages

Global packages are packages with at least one executable in their bin folder that can be run from the command line. Each global package lives in its own environment, and pixi creates a wrapper script for each executable. You call the wrapper instead of the original executable, and it activates the environment seamlessly. This allows you to use global packages as though they were not installed in an environment at all, avoiding the need for a base environment or for activating a specific environment.

To install a global package, use this command

pixi global install PACKAGE_NAME

You can also install multiple packages at once by listing them

pixi global install PACKAGE1 PACKAGE2

To remove a global package:

pixi global remove PACKAGE

To update all packages, run the update command without a package name:

pixi global update

or update a specific package:

pixi global update PACKAGE

By default, pixi installs global package environments to $HOME/.pixi/envs and the wrapper scripts to $HOME/.pixi/bin, and uses the rattler folder inside the user cache directory for the package cache. The cache directory defaults to $HOME/.cache on Linux (or $XDG_CACHE_HOME, if that variable is set) and $HOME/Library/Caches on macOS. If you are working in an environment such as an HPC cluster where the home folder has limited space, you can install global packages in a different folder and use a different cache by setting the PIXI_HOME and RATTLER_CACHE_DIR environment variables like so:

PIXI_HOME=SOME_FOLDER RATTLER_CACHE_DIR=SOME_OTHER_FOLDER pixi global install PACKAGE_NAME
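To avoid repeating these variables on every command, they can be exported once, for example from a shell startup file. The following is a minimal sketch rather than the configuration used by this repository: PIXI_STORAGE is a hypothetical variable introduced only for illustration, and the sketch assumes that wrapper scripts are written to the bin folder inside PIXI_HOME, mirroring the default $HOME/.pixi layout.

```shell
# PIXI_STORAGE is a hypothetical variable for this sketch; point it at a
# folder with more space than $HOME (for example, project or scratch storage).
PIXI_STORAGE="${PIXI_STORAGE:-${TMPDIR:-/tmp}/pixi-storage}"

# Global environments and wrapper scripts go under PIXI_HOME.
export PIXI_HOME="${PIXI_STORAGE}/pixi"
# Downloaded packages are cached under RATTLER_CACHE_DIR.
export RATTLER_CACHE_DIR="${PIXI_STORAGE}/rattler"

mkdir -p "${PIXI_HOME}/bin" "${RATTLER_CACHE_DIR}"

# Assumption: wrappers are placed in ${PIXI_HOME}/bin, so that folder must
# be on PATH. Add it only if it is not already there.
case ":${PATH}:" in
    *":${PIXI_HOME}/bin:"*) ;;
    *) export PATH="${PIXI_HOME}/bin:${PATH}" ;;
esac
```

With these exports in place, a plain pixi global install PACKAGE_NAME should use the new locations.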

If you receive an error that a package could not be found:

Cannot solve the request because of: No candidates were found for PACKAGE_NAME *.

this is usually due to one of two issues:

You have misspelled the package name
The package is not available for the platform you are trying to install it on

In the latter case, the best solution is to make a new Conda package for your platform!
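One way to tell these two causes apart is to search the channel directly. The helper below is a sketch under assumptions: check_package is a hypothetical name, conda-forge is used only as a default channel, and the --channel and --platform options of pixi search are given as I understand them from current pixi releases, so confirm them with pixi search --help before relying on this.

```shell
# Hypothetical helper: look a package up in a channel for a given platform.
# Usage: check_package PACKAGE_NAME [PLATFORM] [CHANNEL]
check_package() {
    local pkg="$1"
    local platform="${2:-linux-64}"      # e.g. linux-64, osx-arm64, win-64
    local channel="${3:-conda-forge}"    # e.g. conda-forge, bioconda, dnachun

    if ! command -v pixi >/dev/null 2>&1; then
        echo "pixi was not found on PATH" >&2
        return 127
    fi

    # A result for one platform but not another points to a missing build;
    # no result on any platform suggests a misspelled name.
    pixi search --channel "$channel" --platform "$platform" "$pkg"
}
```

For example, check_package PACKAGE_NAME osx-arm64 bioconda would look for a macOS ARM build in bioconda.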

If you receive an error that an entrypoint could not be found:

No executable entrypoint found in package PACKAGE_NAME, are you sure it exists?

this means the package you are trying to install does not have any executables in its bin folder. This may be because the package is a library that does not provide any executables, or because the package was not built correctly and is missing its executables from the bin folder.

The solution to this problem depends on how you intend to use the package. If the package is truly a library-only package, meaning you would import it into a language like Python or R, or link to it from a language like C/C++ or Fortran, you will need to add it to a lockfile or a micromamba environment. If the package is supposed to provide executables, you may find that these are provided in a similarly named package, as it is common, in conda-forge in particular, to separate executables and libraries. If the package was actually built incorrectly as described above, it would need to be rebuilt.

Adding packages to global environments

Pixi now supports adding packages to a global environment. The syntax is as follows:

pixi global install --environment ENV_NAME PKG1 PKG2 ...

For example, to install r-tidyverse in an existing r-base global environment:

pixi global install --environment r-base r-tidyverse

or to add jupyterlab to an existing python global environment:

pixi global install --environment python jupyterlab

Please note that because pixi does not support post-link scripts, a workaround is currently needed after installing some Bioconductor packages:

find ${HOME}/.pixi/envs/r-base/bin -name '*bioconductor-*-post-link.sh' | \
   xargs -I % bash -c "PREFIX=${HOME}/.pixi/envs/r-base PATH=${HOME}/.pixi/envs/r-base/bin:${PATH} %"

Lockfiles

Lockfiles are a modern take on environments, heavily inspired by Cargo and the Rust ecosystem. Rather than a globally visible environment like that provided by Conda-based package managers, pixi lockfiles are intended to be placed in the root directory of a source tree where the environment is relevant, although it is still possible to access the environment from outside this source tree.

To create a lockfile, navigate to the folder where your environment will be needed, and initialize it:

pixi init .

You can then begin to add packages:

pixi add PACKAGE_NAME

By default, the newest version of the package is chosen, but you can specify a different version:

pixi add PACKAGE_NAME=VERSION

To use the environment created by the lockfile, you should usually navigate to the source tree where you want to use it. Once in that directory, you can use

pixi run COMMAND

to run any command in the environment specified by the lockfile. This is usually the most convenient way to use the environment, but if you need an interactive shell in the environment (similar to activating an environment with Conda package managers), you can use:

pixi shell

Unlike activating environments, pixi shell launches a new shell process. This means that to "deactivate" the environment, you need to use exit to leave the new shell process. This also means that this command is not suitable for use in non-interactive scripts. For simple scripts which only need to run a few lines, you can just use pixi run as described above. However, for long or complex scripts, at the top of your script, you can instead use

eval "$(pixi shell-hook --shell SHELL)"

where SHELL should usually be bash or zsh. This is equivalent to activating an environment non-interactively with a Conda package manager.
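As a rough sketch of this script pattern, the helper below captures the hook output before evaluating it, so that a failing pixi call stops the function instead of silently evaluating nothing. The function name activate_pixi_env and its argument are hypothetical; only pixi shell-hook --shell comes from the text above.

```shell
# Hypothetical helper: load the environment of a pixi project into the
# current non-interactive bash shell.
# Usage: activate_pixi_env [PROJECT_DIR]
activate_pixi_env() {
    local project_dir hook
    # Folder that contains pixi.toml (defaults to the current folder).
    project_dir="${1:-.}"

    # Capture the activation code first so that a failure can be detected.
    hook="$(cd "$project_dir" && pixi shell-hook --shell bash)" || {
        echo "could not get a shell hook for ${project_dir}" >&2
        return 1
    }

    # Apply the activation code to the current shell.
    eval "$hook"
}
```

A long script could call activate_pixi_env near the top and then run its remaining commands inside the environment, without prefixing each one with pixi run.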

While you should usually navigate to the location of the lockfile to use its environment, all of the above commands also take the --manifest-path argument which allows you to point them to a pixi.toml located anywhere. For example, for pixi run, use:

pixi run --manifest-path PATH_TO_MANIFEST/pixi.toml COMMAND

This approach should be used sparingly as it somewhat defeats the purpose of pixi, which improves on the concept of environments by linking them to specific folders rather than making them global. It is much easier to keep track of what an environment is for when it is associated with a folder.

TODO: add more troubleshooting for lock files

Micromamba

Micromamba is the latest in the line of traditional Conda package managers and provides many advantages over conda and mamba. However, the approach taken by pixi of separating global individual packages from folder-specific environments is significantly easier to use and should be preferred whenever possible. There are still a few niche use cases for micromamba:

A globally accessible environment for interactive work that needs a version of Python older than the latest stable version - some Python libraries do not support the latest stable version of Python and thus cannot be used by the global Python environment. This is not a concern for R, as strict management in CRAN and Bioconductor requires that all packages support the latest stable versions of R. This may also apply to other languages like Perl, Ruby and Lua but these are not widely used for interactive data analyses. When possible, consider updating the library to work with the newest version of Python instead of using an environment like this.
A globally accessible environment for an older version of a CLI tool than the latest stable version - on occasion you may need an older version of a tool. When possible, try to update your setup to use the latest version of the tool instead of depending on an older version.

Managing environments

Micromamba does not require a base environment. Do not create one or install anything in it. Pixi global packages are your replacement for brittle base environments. To create a new environment use:

micromamba create -n ENV_NAME PACKAGE1 PACKAGE2 ...

You do not need to specify all the packages right away - you can always add more later:

micromamba install -n ENV_NAME PACKAGE

Like pixi lockfiles, micromamba environments can be run without activating them:

micromamba run -n ENV_NAME COMMAND

This is the preferred approach for using micromamba environments. However, for more complex interactive work or shell scripts, you may need to activate the environment:

micromamba activate ENV_NAME

and deactivate it:

micromamba deactivate

If you need to activate an environment in a shell script, add this line to the top:

. $HOME/micromamba/etc/profile.d/micromamba.sh
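Put together, the top of such a script might look like the sketch below. The function name activate_micromamba_env and its optional second argument are hypothetical; the profile script path is the one given above, and the sketch simply checks that the file exists before sourcing it.

```shell
# Hypothetical helper: source the micromamba profile script, then activate
# an environment in the current shell.
# Usage: activate_micromamba_env ENV_NAME [ROOT_PREFIX]
activate_micromamba_env() {
    local env_name="$1"
    local root_prefix="${2:-${HOME}/micromamba}"
    local profile="${root_prefix}/etc/profile.d/micromamba.sh"

    if [ ! -r "$profile" ]; then
        echo "micromamba profile script not found: ${profile}" >&2
        return 1
    fi

    # Defines the micromamba shell function that "activate" relies on.
    . "$profile"
    micromamba activate "$env_name"
}
```

A script would then call activate_micromamba_env ENV_NAME once before the commands that need the environment.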

To delete an environment, first remove it:

micromamba env remove -n ENV_NAME

You should also clean the package cache after removing the environment to remove unused packages:

micromamba clean -ay

To install a micromamba environment in a different folder than ${HOME}/micromamba, specify --root-prefix and --prefix:

micromamba create --root-prefix=SOME_FOLDER/micromamba --prefix=SOME_FOLDER/micromamba/envs/ENV_NAME PKG1 PKG2 ...

All other micromamba commands can be used in the same way, so long as the --root-prefix and --prefix options are both provided. Note that --root-prefix is needed so that the package cache is not stored in ${HOME}/micromamba.
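Because both options have to be repeated on every call, a small wrapper can reduce typing mistakes. This is only a sketch: MM_ROOT and mm_in are hypothetical names, SOME_FOLDER is the same placeholder used above, and the wrapper assumes every subcommand you pass accepts --root-prefix and --prefix in the same position as the create example.

```shell
# SOME_FOLDER is a placeholder; MM_ROOT and mm_in are hypothetical names.
MM_ROOT="SOME_FOLDER/micromamba"

# Usage: mm_in SUBCOMMAND ENV_NAME [ARGS...]
mm_in() {
    local subcommand="$1"
    local env_name="$2"
    shift 2

    # Pass the root prefix and the environment prefix on every call so the
    # package cache and the environment both stay out of ${HOME}/micromamba.
    micromamba "$subcommand" \
        --root-prefix="${MM_ROOT}" \
        --prefix="${MM_ROOT}/envs/${env_name}" \
        "$@"
}
```

For example, mm_in create ENV_NAME PKG1 PKG2 expands to the create command shown above, and mm_in install ENV_NAME PACKAGE adds a package to the same environment.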