Recording model performance with Weights and Biases

Whether you're training your model locally or on the cluster, it's important to be able to monitor its progress and performance over time. Weights & Biases (wandb) is a tool that automatically generates plots of whatever losses and metrics you log and makes them viewable on the web.

Logging Metrics

To record metrics, follow the official documentation at https://docs.wandb.ai/.
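As a minimal sketch (the project name and loss values here are placeholders, not part of this project), logging a metric once per epoch looks like:

import wandb

# Start a run; "my-project" is a placeholder project name
wandb.init(project="my-project")

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)  # stand-in for your real loss value
    wandb.log({"epoch": epoch, "train_loss": train_loss})

wandb.finish()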

Visualizing chemical data

Weights & Biases can log arbitrary 3D object files. While training a model, if you'd like to visualize some chemical data (e.g., a PDB file), you can use PyMOL's Python API to load the PDB file and export it as a 3D object file (e.g., glTF).

The preferred 3D object format is glTF. To export a PyMOL session to glTF, we must install the alpha build of PyMOL from Schrödinger, along with collada2gltf, a library they've ported that handles the conversion. To install this version of PyMOL with conda, do the following:

conda install -c schrodinger/label/alpha pymol --force-reinstall
conda install -c schrodinger collada2gltf

Now that you have the prerequisites, you can log during training any file type that PyMOL can load. Given a PDB file saved at pdb_path:

import pymol
import wandb

# Launch PyMOL headless and quiet (needed when driving PyMOL from a script)
pymol.finish_launching(["pymol", "-qc"])

# Replace the extension (".pdb"), not the bare substring "pdb", so that a
# directory named "pdb" in the path is not accidentally rewritten
gltf_path = pdb_path.replace(".pdb", ".gltf")

# Load the PDB file into PyMOL and export it as glTF
pymol.cmd.load(pdb_path, "some_label")
pymol.cmd.save(gltf_path, quiet=True)
pymol.cmd.delete("all")  # clear the PyMOL session

# Log the 3D object file with Weights & Biases
wandb.log({"structure": wandb.Object3D(gltf_path)})

Running a sweep

First, create your project. Once the project exists, navigate to the Sweeps tab in the sidebar and click the "Create Sweep" button.

This will bring up a page where you can write a YAML configuration that controls the parameters of the sweep; a sketch of such a file is shown below.
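As a rough example (the program name, metric, and parameter ranges here are placeholders, not from this project), a sweep configuration might look like:

program: train.py
method: bayes
metric:
  name: val_loss
  goal: minimize
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
  batch_size:
    values: [16, 32, 64]

Your training script reads these values from wandb.config after wandb.init() is called. After setting the sweep to your specifications, click the "Initialize Sweep" button. This will bring you to the sweep overview page, from which you can copy the "Launch agent" command. It will look something like: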

wandb agent <username>/<projectname>/<sweepname>

Running the command as-is will launch a single agent that works through the sweep serially. Fortunately, we can launch multiple agents in parallel to get through the sweep faster. Below is a sample Slurm submission script compatible with launching multiple agents.

#!/bin/bash
#SBATCH -J Sweep
#SBATCH -t 672:00:00
#SBATCH -x g012,g013
#SBATCH -N 1
#SBATCH -p dept_gpu
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
export PATH=/net/pulsar/home/koes/dkoes/local/bin:$PATH
export LD_LIBRARY_PATH=/net/pulsar/home/koes/dkoes/local/lib:$LD_LIBRARY_PATH
export PYTHONPATH=/net/pulsar/home/koes/dkoes/local/python:$PYTHONPATH
which python3
module load cuda/11.1
echo Running on `hostname`
echo ld_library_path $LD_LIBRARY_PATH
echo pythonpath $PYTHONPATH
echo path $PATH
echo pwd `pwd` 
# log in to Weights & Biases
wandb login <my personal wandb login code>

# run the pKa regression sweep; --count 1 makes each agent exit after one run
wandb agent --count 1 <my complete sweep name>

exit

You can then submit an array job using the Slurm script; each job in the array will launch one agent for your sweep.

sbatch --array=1-<end number>%30 <script name>
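The %30 suffix caps the array at 30 jobs running simultaneously; adjust this limit to suit your cluster.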