Python - theunissenlab/lab-documentation GitHub Wiki

As of November 2020, our workstations are running Ubuntu 16.04, which ships python3.5 by default. However, python3.5 has passed its end-of-life (EOL) date, meaning it no longer receives security or bug fixes.

Thus, if possible, new projects should start with the latest Python (e.g. python3.9, which should have an EOL date sometime in 2025), and existing projects should try to upgrade their Python version.

On Ubuntu

Here is how to install python3.9 if it is not already installed on your system. This installs 3.9 alongside the existing python installation(s) on your system rather than replacing them, which is a good thing: many critical system programs depend on those older versions.

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.9 python3.9-dev python3.9-venv

When creating a new python virtualenv for a project, use python3.9 -m venv ENVNAME.
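As a quick sketch of the full cycle (shown with the generic python3 command; on the lab workstations you would substitute python3.9 as recommended above):

```shell
# Create, activate, use, and leave a virtual environment.
python3 -m venv ENVNAME    # create the env in ./ENVNAME
. ENVNAME/bin/activate     # activate it (prompt gains "(ENVNAME)")
python -m pip --version    # pip now points inside ENVNAME
deactivate                 # return to the system python
```

While the env is active, `python` and `pip` refer to the copies inside ENVNAME, so installs never touch the system Python.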

On Windows

If you (god forbid) need to set up a Python development environment on Windows, this section collects tips and tricks.

Install Python

Download and run the installer for the latest version of Python, or whatever version you want to install.

During installation, make sure you check the option to add Python to your PATH.

Set up a virtual environment

Managing Python packages and dependencies is hard enough already, so you really don't want to screw something up on Windows and be unable to fix it. Use a virtual environment (with virtualenv) so that the Python packages you install are neatly contained in an environment stored in a single directory.

Install virtualenv (one time ever)

> pip install virtualenv

Create an environment (one time per project)

> python -m virtualenv ENV_FOLDER_NAME

Replace ENV_FOLDER_NAME with the name of the environment you want to create (usually env is a good choice).

Activate an environment (every time you start a new command line prompt)

Go to the directory your environment is in and run

> ENV_FOLDER_NAME\Scripts\activate

Installing dependencies

Many scientific dependencies that are trivial to install on Ubuntu or Mac are challenging to install on Windows. For example, plain pip installs of numpy, scipy, scikit-learn, and pytables, among others, will most likely fail.

To install these, go to https://www.lfd.uci.edu/~gohlke/pythonlibs/ (hopefully this site is still active while you're reading this) and download the wheels built for your version of Python. Install the ones you need, starting with numpy+mkl, which all the others depend on.

Install them like this:

> pip install FULL_PATH_TO_.whl_FILE_YOU_DOWNLOADED

UPDATE 2025 September:

Most of our work is done on Savio now, which runs Rocky Linux 8.10. Python is NOT installed by default; you must set up your own environment. Per the Savio documentation, the most popular way to do this is via conda (the anaconda3 module).

A standard conda env comes with a variety of packages, although you will still need to install many yourself, such as torch. Keep in mind that packages often ship separate builds for different underlying hardware (e.g. GPU vs. CPU-only builds), and the wrong build will not run on your hardware.

Env Setup

You can use the following sbatch script to set up your environment. In practice, you would typically just install the relevant packages in an srun job, or, if you're only using CPUs, you can use the scratch node to install the relevant packages in your conda env (after running source activate <your_conda_env>).

Savio Website Guide

#!/bin/bash
#SBATCH --job-name=setup_env
#SBATCH --account=<your_account>
#SBATCH --partition=<relevant_env>
#SBATCH --qos=<mostly for gpus>
#SBATCH --nodes=1
#SBATCH --gres=<relevant GPU/CPU>
#SBATCH --cpus-per-task=<cpus_per_task>
#SBATCH --time=00:10:00

module load anaconda3

# Create or update the environment
conda create --name new_audio_env python=3.11 -y
source activate new_audio_env

# Install correct PyTorch + CUDA + Lightning stack
conda install pytorch=2.2.2 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y
conda install -c conda-forge lightning torchmetrics -y

# Optional: pin compatible numpy/scipy if needed
conda install numpy=1.26 scipy -y

# Confirm working GPU access
python -c "import torch; print('Torch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available()); print('CUDA version:', torch.version.cuda); print('Device:', torch.cuda.get_device_name(0))"
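The script above can be saved (as, say, setup_env.sh; a placeholder filename) and submitted from a login node like so:

```shell
# Submit the environment-setup script and monitor it.
# "setup_env.sh" is a placeholder filename for the script above.
sbatch setup_env.sh   # Slurm prints the assigned job ID
squeue -u "$USER"     # check whether the job is pending or running
```

Once the job finishes, check the slurm-<jobid>.out file for the output of the torch/CUDA check at the end of the script.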

Using GPUs

To set up GPU support correctly, it's better to install either via a separate setup script like the one above or inside an srun job on a GPU node. You can't do it from the scratch node, because the environment will download the CPU version of your modules (such as torch), which will be a huge headache to decouple and remove. If this happens, you can always just create another conda env. Here is the guide for installing software. The Savio chat bot is also pretty good for 90% of questions.
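A sketch of the srun route, mirroring the setup script above (account, partition, and QoS values are placeholders you'd adapt):

```shell
# Get an interactive shell on a GPU node, then install inside the env
# so conda resolves the GPU (CUDA) builds rather than CPU-only ones.
srun --account=<your_account> --partition=<gpu_partition> --qos=<gpu_qos> \
     --gres=gpu:1 --cpus-per-task=4 --time=00:30:00 --pty bash -i

# Then, inside the interactive shell:
module load anaconda3
source activate new_audio_env
conda install pytorch=2.2.2 pytorch-cuda=12.1 -c pytorch -c nvidia -y
```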

There are a couple of parameters to keep in mind when using GPUs. CPU:GPU ratio: this is roughly how many CPU cores are available or expected per GPU. E.g. a ratio of 4:1 means there are ~4 CPU cores per GPU on that node (or that the node is provisioned to give about that ratio to GPU-using jobs). It helps you size your job so you don't under-utilize CPUs or GPUs.
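For example, on a node provisioned at roughly 4:1, a job requesting one GPU would usually also request four CPU cores (the partition name here is a placeholder):

```shell
#SBATCH --partition=<gpu_partition>  # placeholder; pick the real GPU partition
#SBATCH --gres=gpu:1                 # one GPU
#SBATCH --cpus-per-task=4            # ~4 cores per GPU matches a 4:1 node
```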

Below are general steps you should follow to set up and run your GPU workflows, tuned to Savio’s hardware setup. You may need to adapt to your project’s needs (CUDA version, frameworks, etc.).

  1. Choose the Right Partition / Node Type

Pick a GPU partition matching your resource needs. If you need very large GPU memory (unlikely), you might, for example, choose the nodes with A40s (48 GB), or L40s if available.

Consider the CPU:GPU ratio. If your GPU tasks also require preprocessing or CPU work (data loading, augmentation, etc.), the ratio matters: a node with a 2:1 ratio may be CPU-starved, while 8:1 may leave some CPUs idle unless the GPU work is heavy.

  2. Load or Install Software Environment

Confirm which CUDA version(s) are installed on the target node type (many GPU nodes already have CUDA, cuDNN, etc.).

Use the module system (if Savio offers modules) to load GPU-related tools (e.g. module load cuda).

  3. Test Small First

Run a small test job, e.g. a simple GPU program (like nvidia-smi, or a small ML job), to verify that the GPU is accessible and everything works.

Monitor GPU usage to ensure you are actually using the GPU (and not accidentally running on the CPU only, or hitting bottlenecks, e.g. I/O).
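A minimal version of such a smoke test, run from a login node (account and partition are placeholders; the env name matches the setup script above):

```shell
# Confirm the GPU is visible to the driver and to torch before launching
# real work. Account/partition values are placeholders.
srun --account=<your_account> --partition=<gpu_partition> --gres=gpu:1 \
     --time=00:05:00 bash -lc '
  nvidia-smi
  module load anaconda3
  source activate new_audio_env
  python -c "import torch; print(torch.cuda.is_available())"
'
```

If the last line prints False, the env most likely picked up a CPU-only torch build; see the note above about creating a fresh conda env.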

Savio Jupyter Notebook

There's a Savio Jupyter notebook service, which is also a great way to debug. I haven't tried it myself, but it should be very helpful.

Data storage

I'd refer to the Savio documentation for proper data storage. Our typical workflow involves staging data on global scratch and using local user scratch for saving code and sbatch jobs.
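As a sketch of that staging step (the transfer-node hostname and scratch path follow Savio's documented layout; adjust to your username and project):

```shell
# Copy a local dataset to Savio global scratch via the data transfer node.
rsync -av mydata/ "$USER"@dtn.brc.berkeley.edu:/global/scratch/users/"$USER"/mydata/
```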