Model Training - sabflik/MPAi GitHub Wiki

Model Training

Developer's Note: The model and the scripts it uses are the subject of a separate project and may change between versions of MPAi. These notes are not intended to be comprehensive; they record our findings and process for potential future developers, and are adapted from the notes left by the developer.

This section is for advanced developers wishing to modify the HTK functionality. For most MPAi use cases, the HTK engine C# class will perform the needed tasks.

The HTK resources are not immediately obvious to an inexperienced developer. The author of the HTK tool MPAi uses left a series of scripts that automate the tasks MPAi needs, so that future developers do not need advanced knowledge of HTK. Unless you have experience with the HTK tools, Batch, and Perl programming, changing these scripts is not advised.

Folder Layout

The folder structure of the HTK folder is important to the way the batch files operate - certain files are expected to be in certain locations, and changing the locations can cause errors.

Original (They are necessary and should be there by default)

  • HTK/Batches/ - contains all the batch files, which can help you automatically train a language model.
  • HTK/Dictionaries/ - contains lexicon.txt, which defines all the words in your dictionary as well as their pronunciation. These words need to be sorted alphabetically.
  • HTK/Grammars/ - contains the Grammar.gram file to let you define the grammar for recognition.
  • HTK/HMMs/ - contains the language models you have trained; initially only hmm0 and hmm4 are present.
  • HTK/Params/ - contains the parameters you will use in the training process.
  • HTK/Perls/ - contains all the Perl scripts.
  • HTK/Tools/ - contains all of the tools compiled from HTK.
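
The requirement that lexicon.txt be sorted alphabetically can be checked before training. The following is a minimal sketch, not part of MPAi: it assumes each non-empty line starts with the word, followed by whitespace and its pronunciation, and that plain case-sensitive string order is what the scripts expect. Check both assumptions against the existing lexicon.txt. The sample entries are invented for illustration.

```python
def unsorted_entries(lexicon_text):
    """Return (previous_word, word) pairs where the order goes backwards."""
    words = []
    for line in lexicon_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the word is the first whitespace-separated token.
        words.append(line.split()[0])

    problems = []
    for previous, current in zip(words, words[1:]):
        # Assumption: plain string comparison matches the expected order.
        if current < previous:
            problems.append((previous, current))
    return problems


if __name__ == "__main__":
    # Invented sample entries; the real pronunciations live in lexicon.txt.
    sample = "hau  h au\nhoe  h oe\nhee  h ee\n"
    for previous, current in unsorted_entries(sample):
        print(f"'{current}' should come before '{previous}'")
```

An empty result means the words are already in order under these assumptions.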

Created (They are automatically created during the training process)

  • HTK/Evaluations/ - contains the results of language model evaluations.
  • HTK/MFCs/ - contains all of the MFC files, which have the HTK-readable information of an audio recording.
  • HTK/MLFs/ - contains all of the MLF files, which have the important output from HTK.
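
Because the batch files depend on this layout, it can help to verify it before running anything. The sketch below is a hypothetical helper, not part of MPAi. It uses only the folder names listed above, and the path to the HTK folder is supplied by the caller.

```python
from pathlib import Path

# Folder names taken from the lists above.
ORIGINAL_FOLDERS = ["Batches", "Dictionaries", "Grammars", "HMMs",
                    "Params", "Perls", "Tools"]
CREATED_FOLDERS = ["Evaluations", "MFCs", "MLFs"]


def check_layout(htk_root):
    """Return (missing_original, not_yet_created) folder names."""
    root = Path(htk_root)
    missing = [name for name in ORIGINAL_FOLDERS
               if not (root / name).is_dir()]
    not_yet_created = [name for name in CREATED_FOLDERS
                       if not (root / name).is_dir()]
    return missing, not_yet_created


if __name__ == "__main__":
    missing, pending = check_layout("HTK")
    if missing:
        print("Missing required folders:", ", ".join(missing))
    if pending:
        print("Not created yet (expected before training):", ", ".join(pending))
```

A missing original folder is an error. A missing created folder only means that training has not been run yet.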

Usage

The HTK/Batches files are the means by which we interact with the HTK model. They handle the passing of values between the different HTK tools, and the file input and output. Many appear to be for the purposes of testing and were not used during our project.

Retraining

This HTK folder is generated for MPAi specifically. To create your own model, or retrain the existing model, you will need 3 things:

  1. Audio recordings of the new words, named using the MPAi convention <category>-word-<word>-<label>.wav (for example, oldfemale-word-hau-R0001M.wav). These must all be in a single folder.
  2. A lexicon.txt file with the new words. Look at the current file for the format. It must be in the same location as the current file.
  3. A new grammar.gram file with the new words. As with the lexicon, see the existing file for the format.
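
The recording naming convention in item 1 can be validated with a short script before training. This is a minimal sketch, not code from MPAi. It assumes that none of the category, word, or label fields contains a hyphen and that the extension is lowercase .wav. The function name is hypothetical.

```python
import re

# <category>-word-<word>-<label>.wav
# Assumption: the three fields never contain a hyphen themselves.
NAME_PATTERN = re.compile(
    r"^(?P<category>[^-]+)-word-(?P<word>[^-]+)-(?P<label>[^-]+)\.wav$"
)


def parse_recording_name(filename):
    """Return the fields of a recording name, or None if it does not match."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_recording_name("oldfemale-word-hau-R0001M.wav"))
    print(parse_recording_name("R0001M.wav"))
```

Running this over every file in the recordings folder and listing the names that return None is a quick way to catch misnamed files.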

With these, run DataPreparer.bat to convert these files to a machine-readable format. This will generate the remainder of the files in the HTK/Dicts folder. Next, run HMMsGenerater.bat to generate the underlying recognition model. This will be stored in the HTK/HMMs/hmmX folders. After these scripts are run, the data model is retrained and ready to use.
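
The two-step sequence above could be scripted as follows. This is a sketch under several assumptions that these notes do not confirm: that the batch files are started from the HTK/Batches folder, that they take no arguments, that they do not wait for user input, and that they return a non-zero exit code on failure. Check the batch files themselves before relying on it. The `run` parameter exists only so the sequence can be exercised without actually launching anything.

```python
import subprocess
from pathlib import Path

# Order matters: data preparation first, then HMM generation.
TRAINING_STEPS = ["DataPreparer.bat", "HMMsGenerater.bat"]


def retrain(batches_dir, run=subprocess.run):
    """Run the training batch files in order; stop at the first failure."""
    batches = Path(batches_dir)
    completed = []
    for script in TRAINING_STEPS:
        # "cmd /c" runs the batch file and then exits (Windows only).
        # check=True raises CalledProcessError on a non-zero exit code.
        run(["cmd", "/c", script], cwd=batches, check=True)
        completed.append(script)
    return completed


if __name__ == "__main__":
    # Assumed location, relative to the current working directory.
    retrain("HTK/Batches")
```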

Analysis

Analysis of text files is done by one of two batch files. ModelEvaluater.bat is used by the program to return a result; it writes its output to the RecMLF.mlf file and terminates. ModelEvaluaterModified.bat was created for testing purposes: it outputs the results to the command line, then waits for user input before terminating. This allows HTK to be run and used independently of MPAi, and it has been left for the next developer.
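
To inspect RecMLF.mlf outside MPAi, a small parser is enough. The sketch below assumes the file follows the standard HTK master label file layout: a `#!MLF!#` header, a quoted file pattern per recording, one label per line with optional leading start and end times and a trailing score, and a single `.` closing each entry. It also assumes that labels are never purely numeric. The sample text is invented, and the real output of ModelEvaluater.bat may contain additional labels.

```python
def parse_mlf(text):
    """Map each quoted file pattern in an MLF to its list of labels."""
    results = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line == "#!MLF!#":
            continue
        if line.startswith('"') and line.endswith('"'):
            # Start of a new entry, e.g. "*/name.rec"
            current = line.strip('"')
            results[current] = []
        elif line == ".":
            # End of the current entry.
            current = None
        elif current is not None:
            # Skip optional integer start/end times; take the first other token.
            label = next((t for t in line.split() if not t.isdigit()), None)
            if label is not None:
                results[current].append(label)
    return results


if __name__ == "__main__":
    # Invented sample in the standard MLF layout.
    sample = (
        "#!MLF!#\n"
        '"*/oldfemale-word-hau-R0001M.rec"\n'
        "0 5000000 hau -3000.5\n"
        ".\n"
    )
    print(parse_mlf(sample))
```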

Other Scripts

The developer also left some utility scripts; his notes on these are reproduced below.

‘Livetest.bat’ is the batch file used for live recognition. Double-click it to start live testing of the created HMMs.

‘RecordingRenamer.bat’ is the batch file that renames old recordings as follows: R001M → oldfemale-word-hau-R001M. You can change the mapping between old and new names by editing the hash table in the relevant Perl script.
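
The renaming rule can be illustrated with a lookup table keyed on the old name. This is a sketch of the idea only, not a port of the Perl script, whose actual table layout is not documented here. The table below holds just the single example from the note above, and the function and variable names are hypothetical.

```python
from pathlib import Path

# Hypothetical table: old name -> (category, word).
# Only the example from the note above is included.
NAME_TABLE = {"R001M": ("oldfemale", "hau")}


def new_name(old_filename, table=NAME_TABLE):
    """Return the MPAi-style name for an old recording, or None if unmapped."""
    old = Path(old_filename)
    entry = table.get(old.stem)
    if entry is None:
        return None
    category, word = entry
    # The old name is kept as the trailing label; the extension is preserved.
    return f"{category}-word-{word}-{old.stem}{old.suffix}"


if __name__ == "__main__":
    print(new_name("R001M.wav"))
```

Returning None for unmapped names lets a caller report them instead of renaming files incorrectly.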
