Known issues - til-ai/til-25 GitHub Wiki

Known issues and how to work around them.


Template Code

"ImportError: attempted relative import with no known parent package" when running CV container

Line 12 of til-25/cv/cv_server.py reads:

from .cv_manager import CVManager

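This would work if the package were run as a module, but it fails out of the box under uvicorn, most likely because uvicorn imports cv_server as a top-level module, so Python has no parent package to resolve the relative import against. For a working version, replace that line with an absolute import: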

from cv_manager import CVManager
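A small, self-contained sketch can reproduce the error outside the container. It assumes (our hypothesis, not something the template confirms) that uvicorn imports cv_server as a top-level module, and it writes throwaway modules named helper_mod, rel_server and abs_server (all hypothetical stand-ins for cv_manager.py and cv_server.py) to a temporary directory. The relative import fails with the same ImportError, while the absolute import succeeds.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_relative_vs_absolute() -> tuple[bool, bool]:
    """Import two throwaway top-level modules and report which imports succeed."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp)
        # Hypothetical stand-ins for cv_manager.py and cv_server.py.
        (src / "helper_mod.py").write_text("VALUE = 1\n")
        (src / "rel_server.py").write_text("from .helper_mod import VALUE\n")
        (src / "abs_server.py").write_text("from helper_mod import VALUE\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            try:
                # Imported as a top-level module, rel_server has no parent
                # package, so its relative import raises ImportError.
                importlib.import_module("rel_server")
                relative_ok = True
            except ImportError as exc:
                print(f"relative import failed: {exc}")
                relative_ok = False
            # The absolute import finds helper_mod through sys.path instead.
            absolute_ok = importlib.import_module("abs_server").VALUE == 1
        finally:
            # Undo the sys.path change and forget the throwaway modules.
            sys.path.remove(tmp)
            for name in ("helper_mod", "rel_server", "abs_server"):
                sys.modules.pop(name, None)
    return relative_ok, absolute_ok


if __name__ == "__main__":
    print(demo_relative_vs_absolute())
```

Running it prints the ImportError message before returning the two flags, which mirrors what happens to cv_server.py when it is loaded without a parent package.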

Vertex AI Workbench

Data/team directories are missing on startup

On startup, the directories in /home/jupyter are sometimes missing. Don't panic: your data is not lost! The data in those directories isn't stored on the instance's local filesystem; the directories are Google Cloud Storage buckets mounted using gcsfuse. As such, they just need to be recreated and remounted:

cd /home/jupyter
mkdir -p "$TEAM_TRACK"
mkdir -p "$TEAM_NAME"
sudo mount "$TEAM_TRACK"
sudo mount "$TEAM_NAME"
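To tell whether the directories are missing entirely or exist but are not mounted, a quick check along these lines may help. It is a sketch, assuming that TEAM_TRACK and TEAM_NAME hold directory names under /home/jupyter (an absolute path also works, since pathlib then drops the /home/jupyter prefix). mount_status is a hypothetical helper, and os.path.ismount is used only as a proxy for "the gcsfuse mount is active".

```python
import os
from pathlib import Path
from typing import Iterable


def mount_status(paths: Iterable[os.PathLike]) -> dict[str, str]:
    """Classify each path as 'missing', 'mounted' or 'not mounted'."""
    status = {}
    for p in paths:
        path = Path(p)
        if not path.exists():
            status[str(path)] = "missing"
        elif os.path.ismount(path):
            status[str(path)] = "mounted"
        else:
            # The directory exists but nothing is mounted on it yet.
            status[str(path)] = "not mounted"
    return status


if __name__ == "__main__":
    home = Path("/home/jupyter")
    names = [os.environ.get("TEAM_TRACK"), os.environ.get("TEAM_NAME")]
    # Skip variables that are unset in the current shell.
    targets = [home / name for name in names if name]
    for path, state in mount_status(targets).items():
        print(f"{path}: {state}")
```

A "missing" result means both the mkdir and mount steps are needed; "not mounted" means only the mount step is.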

Unresponsive or 502 Error on JupyterLab instance

Sometimes, when you click on Open JupyterLab, the JupyterLab environment fails to load, instead displaying "502. That's an error. That's all we know." The JupyterLab instance can also sometimes become unresponsive. This is generally caused by a transient network issue and is usually fixed by a hard refresh of the browser page (Ctrl-Shift-R on Windows or Cmd-Shift-R on Mac).

NVIDIA driver not recognized

Sometimes, the NVIDIA drivers stop being recognized on the instance. You can check for this by running the nvidia-smi command.
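To run the same check from a script (for example, before launching a long training job), a small wrapper around nvidia-smi may be handy. This is a sketch under our own assumptions: it treats a missing nvidia-smi binary or a non-zero exit status as a sign that the driver is not being recognized, and the helper names command_succeeds and nvidia_driver_recognized are hypothetical.

```python
import shutil
import subprocess
import sys
from typing import Sequence


def command_succeeds(cmd: Sequence[str], timeout: float = 30.0) -> bool:
    """Return True if cmd is found, runs, and exits with status 0."""
    if shutil.which(cmd[0]) is None:
        return False
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def nvidia_driver_recognized() -> bool:
    """Heuristic: nvidia-smi is installed and exits without error."""
    return command_succeeds(["nvidia-smi"])


if __name__ == "__main__":
    if nvidia_driver_recognized():
        print("nvidia-smi ran successfully")
    else:
        print(
            "nvidia-smi failed or is missing; the NVIDIA driver may not be recognized",
            file=sys.stderr,
        )
```

The same check can be rerun after the reinstall steps to confirm the fix.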

Try reinstalling the CUDA drivers. First, purge the existing NVIDIA packages:

sudo apt-get purge 'nvidia-*'
sudo apt-get update
sudo apt-get autoremove

Then stop and start the instance, and install the CUDA toolkit and drivers:

wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-debian10-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo dpkg -i cuda-repo-debian10-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo cp /var/cuda-repo-debian10-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
sudo apt-get install -y cuda-drivers

At the last step (re-installing CUDA drivers), select "yes" for both prompts.

If the issue persists, ping @tech from your team's private Discord channel.


Dependencies

transformers incompatibility with torch_xla

Versions of the Hugging Face transformers library newer than 4.37.0 may not work on GCP instances because of an incompatibility with the torch_xla library. If you intend to use transformers, we recommend pinning transformers==4.37.0 unless you have a specific need for a newer version.
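To confirm which transformers version an environment actually has before debugging torch_xla errors, a minimal check like this may help. It relies only on the standard library (importlib.metadata); the helper names and the simple numeric version parsing (which ignores suffixes such as .dev0 or rc1) are our own assumptions, and the packaging library's Version class is the more robust choice if it is installed.

```python
import re
from importlib import metadata
from typing import Optional

PINNED = "4.37.0"  # the version recommended above


def numeric_version(version: str) -> tuple[int, ...]:
    """Parse the leading dotted-integer part, e.g. '4.40.0.dev0' -> (4, 40, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def newer_than_pinned(version: str, pinned: str = PINNED) -> bool:
    """True if version is numerically newer than the pinned version."""
    return numeric_version(version) > numeric_version(pinned)


def installed_version(dist: str = "transformers") -> Optional[str]:
    """Return the installed distribution's version, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("transformers is not installed")
    elif newer_than_pinned(found):
        print(f"transformers {found} is newer than {PINNED}; consider pinning {PINNED}")
    else:
        print(f"transformers {found} is at or below {PINNED}")
```

If it reports a newer version, reinstalling with pip install "transformers==4.37.0" applies the recommended pin.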

pettingzoo incompatibility with python>=3.13

We pin PettingZoo to version 1.25.0 in til-25-environment, which is the newest PettingZoo release at the time of writing. However, it does not formally support Python >= 3.13 and will likely throw an error during installation, so we advise using Python 3.12 for your RL training and inference.
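Because the failure shows up at install time, a cheap guard at the top of a setup or training script can surface the problem with a clearer message. This is a minimal sketch; the function name is hypothetical and the 3.13 cut-off is taken from the note above rather than from anything enforced by til-25-environment.

```python
import sys
from typing import Optional, Sequence

# First Python version that PettingZoo 1.25.0 does not formally support, per the note above.
UNSUPPORTED_FROM = (3, 13)


def python_ok_for_pettingzoo(version_info: Optional[Sequence[int]] = None) -> bool:
    """True if the (major, minor) Python version is below 3.13."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < UNSUPPORTED_FROM


if __name__ == "__main__":
    if python_ok_for_pettingzoo():
        print("Python version is below 3.13")
    else:
        print(f"Python {sys.version.split()[0]} detected; use Python 3.12 for PettingZoo 1.25.0")
```

Passing an explicit tuple such as (3, 12, 4) makes the check easy to exercise without switching interpreters.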