# Infrastructure
This page documents the hardware and software infrastructure on which TIL-AI is built.
## Contents

- [Repo structure](#repo-structure)
- [Software infrastructure](#software-infrastructure)
  - [Containerization strategy](#containerization-strategy)
  - [Evaluation methodology](#evaluation-methodology)
- [Hardware](#hardware)
## Repo structure
Take a look at the til-ai/til-25 template repository. You'll find a subdirectory for each challenge: `asr/`, `cv/`, `ocr/`, and `rl/`, each containing:

- A `src/` directory, where your code lives:
  - `*_manager.py`, which manages your model. This is where your inference and computation takes place.
  - `*_server.py`, which runs a local FastAPI server that talks to the rest of the competition infrastructure.
- `Dockerfile`, which is used to build your Docker image.
- `requirements.txt`, which lists the dependencies you need to have bundled into your Docker image.
- `README.md`, which contains specifications for the format of each challenge.
You'll also find a final subdirectory, `test/`. This contains tools to test and score your model locally.
There are two Git submodules, `til-25-finals` and `til-25-environment`. `finals` contains code that will be pulled into your repo for Semifinals and Finals. `environment` contains the `til_environment` package, which you'll need to train and test your RL model, and which is installed by `pip` during setup. Make sure you run `git submodule update --init` during first-time setup.
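
For example, first-time setup for a repository created from the template might look something like this (the repository URL is a placeholder for your team's own copy):

```bash
# Clone your team's repository and pull in the til-25-finals and
# til-25-environment submodules.
git clone https://github.com/<your-team>/<your-repo>.git
cd <your-repo>
git submodule update --init

# Equivalently, clone and initialize the submodules in one step:
# git clone --recurse-submodules https://github.com/<your-team>/<your-repo>.git
```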
> [!WARNING]
> Don't delete or modify the contents of `til-25-finals/`, `til-25-environment/`, or the `.gitmodules` file. Doing so would cause issues later when you need to update your repo with code for Semifinals and Finals. You can add submodules without issue.
## Software infrastructure
### Containerization strategy
Each of your models is run in its own isolated Docker container. The image contains a model manager (which runs your model and code), and a FastAPI server, which provides an interface between your container and the TIL-AI infrastructure.
The FastAPI server exposes a prediction endpoint (like `/asr`) on a particular port (like `5001`). Once your FastAPI server has started, the TIL-AI evaluator sends a POST request to your endpoint (like `localhost:5001/asr`) containing the test data your model is to evaluate. Your route handler function receives and parses this data, passes it to your model manager to get predictions, and returns a response containing the prediction results.
Here are the endpoints and ports for each challenge:
| Challenge | Endpoint | Port |
|---|---|---|
| ASR | `/asr` | 5001 |
| CV | `/cv` | 5002 |
| OCR | `/ocr` | 5003 |
| RL | `/rl` | 5004 |
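
To make this concrete, here's a minimal, self-contained sketch of what an ASR server in this style might look like. It is not the template's actual code: the request and response field names (`instances`, `predictions`) and the manager's interface are assumptions for illustration only, and the authoritative formats are specified in each challenge's `README.md`.

```python
from fastapi import FastAPI, Request


class ASRManager:
    """Stand-in for the real model manager that would live in src/asr_manager.py."""

    def __init__(self) -> None:
        # A real manager would load model weights here, once, at startup.
        self.model = None

    def asr(self, instance: dict) -> str:
        # A real manager would run inference on the audio carried in `instance`.
        return "placeholder transcript"


app = FastAPI()
manager = ASRManager()  # instantiated once so the model stays loaded between requests


@app.post("/asr")
async def asr(request: Request) -> dict:
    # Parse the JSON body sent by the evaluator.
    body = await request.json()

    # Hand each input to the manager, which runs inference and returns a prediction.
    predictions = [manager.asr(instance) for instance in body.get("instances", [])]

    # Return the predictions to the evaluator.
    return {"predictions": predictions}
```

You could then serve this locally with something like `uvicorn asr_server:app --host 0.0.0.0 --port 5001` and POST test data to `localhost:5001/asr`.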
The reason you build separate containers for each challenge (as opposed to having a single server that handles all endpoints) is to make development and testing easier. Because each container is an entirely isolated environment, you can use different Python versions, Python and Linux packages, device drivers, and configurations for each. This can be especially useful because some ML libraries are particular about specific versions of dependencies like `numpy`, or device drivers like CUDA. Containerizing your code reduces the likelihood of dependency conflicts.
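
For local development, you might build and exercise one of these containers roughly as follows. The image name, tag, and request body are placeholders, and this assumes the ASR server listens on port 5001 inside the container, per the table above; the tools in `test/` remain the proper way to test and score your model.

```bash
# Build the ASR image from its challenge subdirectory (name and tag are placeholders).
docker build -t my-team-asr:latest asr/

# Run it with GPU access, mapping the container's port 5001 to the host.
docker run --rm --gpus all -p 5001:5001 my-team-asr:latest

# From another shell, send a test request to the prediction endpoint.
# (The JSON body is a placeholder; see asr/README.md for the real schema.)
curl -X POST localhost:5001/asr \
  -H "Content-Type: application/json" \
  -d '{"instances": []}'
```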
### Evaluation methodology
During Qualifiers, we use Vertex AI online prediction to obtain predictions from your containers for the hidden test data. A Vertex AI Endpoint is created from each submitted container, and HTTP requests are made to that endpoint to trigger predictions.
During Semifinals and Finals, we set up a local competition server that interfaces with all your models in real time.
The interfaces for your servers will remain broadly unchanged from Qualifiers through Finals.
## Hardware
Below are the hardware specifications of the machines on which your models will run.
| | Qualifiers | Semifinals and Finals |
|---|---|---|
| Machine type | Virtual | Physical |
| CPU | Intel CPU with 8 logical CPU cores | Intel Core i7-14700KF with 28 logical CPU cores |
| RAM | 30 GB | 32 GB |
| GPU | Nvidia Tesla T4 with 16 GB VRAM | Nvidia RTX 5070 Ti with 16 GB VRAM |
| Disk space | 350 GB, excluding 150 GB boot disk | 1 TB |
| Operating system | Debian 11 | Ubuntu 24.04 LTS |
| CUDA Toolkit version | 11.8 | 12.8 |
> [!IMPORTANT]
> During Semifinals and Finals, all your models must run simultaneously on the physical machine, sharing CPU, GPU, and RAM. We recommend against simply choosing the largest possible models for Qualifiers, because they may not be able to run simultaneously on the Finals machines. There's nothing stopping you from swapping to a different model for Semifinals and Finals after qualifying, but you should carefully consider whether that's the best use of your team's resources.
The GPU for Semifinals and Finals is based on the Blackwell architecture, while the GPU provided by the online development environment is based on Turing.
> [!WARNING]
> We urge you to avoid changing low-level configurations or having your code rely on obscure features of specific driver versions, because this increases the risk of something breaking between the online and physical environments, which you might not have enough time to fix.