environments acpt pytorch 1.13 cuda11.7 - Azure/azureml-assets GitHub Wiki
Recommended environment for deep learning with PyTorch on Azure. It contains the Azure ML SDK with the latest compatible versions of Ubuntu, Python, PyTorch, and CUDA/ROCm, combined with optimizers such as ORT Training, DeepSpeed, MSCCL, ORT MoE, and more. The image introduces a preview of a new fast checkpointing capability called Nebula.
Azure Container Registry: mcr.microsoft.com/azureml/curated/acpt-pytorch-1.13-cuda11.7
Version: 49
PyTorch : 1.13
GPU : Cuda11
OS : Ubuntu20.04
Training
Preview
Python : 3.8
DeepSpeed : 0.8.2
ONNXRuntime : 1.14.1
torch_ORT : 1.14.0
Checkpointing:Nebula : 0.16.2 (Preview)
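The versions listed above can be checked from inside a running container. The following is a minimal sketch using only the Python standard library (available in Python 3.8). The distribution names in `EXPECTED` are assumptions and may not match the names actually installed in the image; compare them against `pip list` before relying on the result. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Dict, Optional

# Version prefixes taken from the tag list above. The distribution names
# (dictionary keys) are assumptions; adjust them to match `pip list`.
EXPECTED = {
    "torch": "1.13",
    "deepspeed": "0.8.2",
    "torch-ort": "1.14.0",
}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each package whose installed version does not start with the
    expected prefix to the version found (None if not installed)."""
    mismatches = {}
    for name, prefix in expected.items():
        found = installed_version(name)
        if found is None or not found.startswith(prefix):
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}*, found {found}")
```

A prefix match is used because installed versions often carry a build suffix (for example a CUDA tag), so an exact string comparison would report false mismatches.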
View in Studio: https://ml.azure.com/registries/azureml/environments/acpt-pytorch-1.13-cuda11.7/version/49
Docker image: mcr.microsoft.com/azureml/curated/acpt-pytorch-1.13-cuda11.7:49
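The Studio URL and Docker image above both follow from the environment name and version. The sketch below derives them, along with a registry asset ID of the form `azureml://registries/<registry>/environments/<name>/versions/<version>`. That asset ID format is an assumption about how the Azure ML SDK v2 and CLI address registry assets, and the function names are hypothetical.

```python
REGISTRY = "azureml"
ENV_NAME = "acpt-pytorch-1.13-cuda11.7"
VERSION = 49


def docker_image(name: str, version: int) -> str:
    """Docker image reference, matching the 'Docker image' line above."""
    return f"mcr.microsoft.com/azureml/curated/{name}:{version}"


def studio_url(registry: str, name: str, version: int) -> str:
    """Studio link, matching the 'View in Studio' line above."""
    return (
        f"https://ml.azure.com/registries/{registry}"
        f"/environments/{name}/version/{version}"
    )


def registry_asset_id(registry: str, name: str, version: int) -> str:
    """Registry asset ID (assumed format, not stated on this page)."""
    return (
        f"azureml://registries/{registry}"
        f"/environments/{name}/versions/{version}"
    )


if __name__ == "__main__":
    print(docker_image(ENV_NAME, VERSION))
    print(studio_url(REGISTRY, ENV_NAME, VERSION))
    print(registry_asset_id(REGISTRY, ENV_NAME, VERSION))
```

Pinning the version number rather than a floating label keeps a training job reproducible when a newer build of the environment is published.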
FROM mcr.microsoft.com/aifx/acpt/stable-ubuntu2004-cu117-py38-torch1131:biweekly.202410.2
# Install pip dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt --no-cache-dir
# Inference requirements
COPY --from=mcr.microsoft.com/azureml/o16n-base/python-assets:20230419.v1 /artifacts /var/
RUN /var/requirements/install_system_requirements.sh && \
    cp /var/configuration/rsyslog.conf /etc/rsyslog.conf && \
    cp /var/configuration/nginx.conf /etc/nginx/sites-available/app && \
    ln -sf /etc/nginx/sites-available/app /etc/nginx/sites-enabled/app && \
    rm -f /etc/nginx/sites-enabled/default
ENV SVDIR=/var/runit
ENV WORKER_TIMEOUT=400
EXPOSE 5001 8883 8888
# Support the DeepSpeed launcher's requirement of passwordless SSH login
RUN apt-get update && \
    apt-get install -y openssh-server openssh-client
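The SSH packages matter because the DeepSpeed multi-node launcher reaches worker machines over passwordless SSH, using a hostfile with one `hostname slots=N` entry per node. As a minimal sketch, the helper below renders such a hostfile. The host names and slot counts are placeholders, and `render_hostfile` is a hypothetical helper, not part of DeepSpeed.

```python
from typing import Dict


def render_hostfile(slots_by_host: Dict[str, int]) -> str:
    """Render hostfile text: one '<hostname> slots=<gpu count>' line per node."""
    lines = []
    for host, slots in slots_by_host.items():
        if slots < 1:
            raise ValueError(f"{host}: slots must be at least 1, got {slots}")
        lines.append(f"{host} slots={slots}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder host names; each must be reachable over passwordless SSH.
    print(render_hostfile({"worker-1": 4, "worker-2": 4}), end="")
```

The resulting text can be written to a file and passed to the launcher with its `--hostfile` option.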