NeMo Curator
NeMo Curator is part of NVIDIA's NeMo framework and is used here to manage and deploy neural network models for tasks such as automatic speech recognition (ASR). This guide walks through setting up NeMo Curator in a Miniconda-based Docker environment, following a typical AI project workflow.
Table of Contents
Installation
Configuration
Implementation
Usage
Troubleshooting
Installation
Step 1: Base Image
We use Miniconda3 as the base image for managing Python environments:
FROM continuumio/miniconda3
Step 2: Install System Dependencies
System dependencies such as wget are installed for convenience:
RUN apt-get update && apt-get install -y wget && apt-get clean
Step 3: Set Up the Conda Environment
Conda is updated, the conda-forge channel is added, Mamba is installed for faster package management, and a new environment team4_env is created with Python 3.11:
RUN conda update -n base conda -y
RUN conda config --add channels defaults && conda config --add channels conda-forge
RUN conda install -c conda-forge mamba -y
RUN mamba create -n team4_env python=3.11 -y
Step 4: Install NeMo
Install NeMo in the created Conda environment. NeMo is distributed on PyPI, so it is installed with pip inside the environment; the SHELL instruction switches the build shell to bash so that source activate works in RUN commands:
SHELL ["/bin/bash", "-c"]
RUN source activate team4_env && pip install "nemo_toolkit[all]" && mamba clean --all -f -y
You can also install other necessary packages from a requirements.txt file (with mamba install --file the entries must be plain conda package specifications; for pip-style requirements, use pip install -r instead):
COPY requirements.txt /app/requirements.txt
RUN source activate team4_env && mamba install --yes --file /app/requirements.txt
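For reference, a minimal requirements.txt could look like the lines below; the package names are illustrative assumptions for an audio-processing project, not a prescribed list:
pyyaml
soundfile
librosa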
Configuration
Step 1: Environment Variables
Define environment variables to specify the data path and the configuration file for NeMo Curator:
ENV NEMO_DATA_PATH=/data
ENV CURATOR_CONFIG=/app/curator_config.yaml
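As a quick sketch of how application code can pick these variables up at runtime (the fallback defaults below are illustrative, not part of NeMo Curator):
import os

# Read the paths exported by the Dockerfile; fall back to the same defaults
# so the snippet also runs outside the container.
data_path = os.environ.get("NEMO_DATA_PATH", "/data")
config_path = os.environ.get("CURATOR_CONFIG", "/app/curator_config.yaml")
print(f"Data path: {data_path}")
print(f"Config file: {config_path}")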
Step 2: Configuration File
Create the NeMo Curator configuration file (curator_config.yaml) and copy it into the container:
COPY curator_config.yaml /app/curator_config.yaml
Example configuration file (curator_config.yaml):
model:
  type: "asr"
  pretrained_model: "QuartzNet15x5"
data:
  path: "/data/audio_files"
Implementation
Step 1: Running NeMo Curator
To run NeMo Curator for ASR tasks, execute a command such as the following inside the container (adjust the module path to match your installed NeMo version):
python -m nemo.collections.asr.models.automatic_speech_recognition --config-file /app/curator_config.yaml
Step 2: Using NeMo Curator in Your Application
In your Python application, load the pre-trained ASR model using NeMo:
import nemo.collections.asr as nemo_asr
# Load a pre-trained ASR model
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="QuartzNet15x5")
# Transcribe an audio file
transcription = asr_model.transcribe(paths2audio_files=['/data/audio_files/sample.wav'])
print(transcription)
Usage
Performing ASR Inference
Once NeMo Curator is set up, you can run ASR inference on audio data:
audio_file = '/data/audio_files/sample.wav'
transcription = asr_model.transcribe(paths2audio_files=[audio_file])
print(f"Transcription: {transcription}")
Troubleshooting
Step 1: Installation Issues
Check that NeMo is correctly installed and that the environment is set up properly. You can verify the NeMo installation from inside the container (activate team4_env before importing):
docker exec -it <container_name> /bin/bash
source activate team4_env
python -c "import nemo"
Step 2: Configuration Problems
Ensure that the configuration file (curator_config.yaml) is correctly formatted YAML and that paths such as NEMO_DATA_PATH point to valid locations inside the container.
Step 3: Logs and Debugging
To debug any issues with model loading or inference, inspect the container logs:
docker logs <container_name>
If the model doesn't load or transcriptions fail, check that the audio file paths exist and are accessible from inside the container.
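If a transcription fails, a quick check with Python's standard wave module (PCM WAV files only) can confirm that the audio file exists and is readable; the path below is the hypothetical sample from the earlier examples:
import os
import wave

audio_file = "/data/audio_files/sample.wav"  # hypothetical sample path

if not os.path.isfile(audio_file):
    raise FileNotFoundError(f"Audio file not found: {audio_file}")

# Open the WAV file and report its basic properties.
with wave.open(audio_file, "rb") as wav:
    duration = wav.getnframes() / wav.getframerate()
    print(f"{audio_file}: {wav.getnchannels()} channel(s), "
          f"{wav.getframerate()} Hz, {duration:.2f} s")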