Kaldi HPC

2021

Source Code
- galaxy/tools/amp_stt/kaldi_hpc.xml
  Tool configuration detailing tool execution, input file, output file, and labeling.
- galaxy/tools/amp_stt/kaldi_hpc.py
  Python script that creates an HPC job, submits it to the dropbox, and returns the AMP speech-to-text JSON.

Dependencies
- The source code, dependencies, and documentation for building and running Kaldi in a Singularity container can be found here: https://github.com/AudiovisualMetadataPlatform/kaldi-pua-singularity
- To run on HPC, the batch tools for transporting and running batch jobs can be found here: https://github.com/AudiovisualMetadataPlatform/hpc_batch

Running the tool
- The tool can be invoked from the Galaxy UI like any other tool. Before running it, the user must ingest the input file into Galaxy with the Get Data / Upload from computer tool.
  When ingesting, choose binary (the default) as the file format. The file is then copied into a designated location in the Galaxy file system. When invoked, the HPC tool creates a job file, places it in the specified dropbox, and submits it to HPC for execution.

Parameters
- $input_audio: the audio file to transcribe.
- $kaldi_transcript_json: JSON result from Kaldi.
- $kaldi_transcript_text: text result from Kaldi.
- $amp_transcript: AMP speech-to-text JSON.
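
The "create a job file and place it in the dropbox" step described under Running the tool can be illustrated with a minimal Python sketch. This is not the code in kaldi_hpc.py: the function name `write_job_file`, the JSON job layout, and the `.job.json` file name are all hypothetical, and the real job format is whatever hpc_batch defines. Submission to HPC is not modeled here. The sketch only shows one safe way to drop a file into a watched directory, by writing under a temporary name and renaming it once complete.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def write_job_file(dropbox_dir, input_audio, outputs):
    """Write a job description into dropbox_dir and return its path.

    Hypothetical layout for illustration only; the real job file
    format is defined by the hpc_batch tools.
    """
    dropbox = Path(dropbox_dir)
    dropbox.mkdir(parents=True, exist_ok=True)

    job = {
        "job_id": uuid.uuid4().hex,  # assumed identifier scheme
        "input_audio": str(input_audio),
        "outputs": {name: str(path) for name, path in outputs.items()},
    }

    # Write under a temporary name first, then rename, so anything
    # polling the dropbox never sees a half-written job file.
    fd, tmp_name = tempfile.mkstemp(dir=str(dropbox), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(job, handle, indent=2)

    job_path = dropbox / f"{job['job_id']}.job.json"
    os.replace(tmp_name, job_path)
    return job_path
```

A caller would pass the Galaxy-resolved paths for the parameters above, for example `write_job_file(dropbox, input_audio, {"kaldi_transcript_json": ..., "kaldi_transcript_text": ..., "amp_transcript": ...})`, where `dropbox` stands for the configured dropbox directory.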

Document generated by Confluence on Feb 25, 2025 10:39
