How to auto create tiers and speech segments in Praat - lingdb/Sound-Comparisons GitHub Wiki
Introduction
This Praat script, auto-create-tiers-and-speech-segments.praat, performs the following functions:
Detect and segment speech
- Create the default Sound Comparisons TextGrid.
- Automatically detect silences and (what is presumably) speech in elicitation recordings.
- Allow the user to configure speech/silence detection parameters and add additional left and right margins to those that are auto-detected.
- Debug mode to better see what the auto-detect settings are doing and adjust them and the margins accordingly.
- Automatically create segments for speech and label them as such.
- Allow the user to review and manually change the automatic segmentation results, or to start over using new parameters.
Insert glosses
- Auto-detect and open Sound Comparisons index files and automatically insert the glosses they contain into the segments identified as containing speech. (Currently supports Malakula and Brazilian languages).
- Allow the user to modify the automatic gloss insertion results by manually moving boundaries and/or changing whether a given segment is treated as speech or silence, without losing work.
Finalize processing
- Remove debugging and auto-detection tiers, if present.
- Prompt the user to save the newly-created TextGrid file.
If you're feeling exceedingly lucky, you can choose to run all three functions at once and hope everything falls into place. However, you'll get the greatest flexibility and the best results by running each of the three functions separately, adjusting the script parameters after the first one and making manual adjustments after the second one.
Installation
Praat
We're going to assume that you have Praat installed in C:\Praat (Windows) or ~/Praat (Linux, and presumably macOS). If not, it's probably a good idea to move your installation to this directory (and adjust your shortcuts to the program file). We're going to call both of these locations [Praat Directory].
Script
Download the auto-create-tiers-and-speech-segments.praat file from here and save it in [Praat Directory]. From this directory, you can manually open and run the script through Praat, but this is clumsy and inefficient, so we're going to add a pair of dynamic buttons to the Praat Objects window.
- In the Objects window, open the script in Praat (Praat|Open Praat script...).
- In the Script window, select File|Add to dynamic menu... to open the Add to dynamic menu window.
- In the Add to dynamic menu window, type Sound in the Class 1 field, 1 in the Number 1: field, and SndCmp: AutoCreate in the Command: field. Leave all other fields with their default values. Click on OK once the window looks as follows:
- Now repeat the first two steps.
- In the Add to dynamic menu window, type TextGrid in the Class 1 field, 1 in the Number 1: field, and SndCmp: AutoCreate in the Command: field. Leave all other fields with their default values and click on OK.
Now, when you select either a Sound or a TextGrid object in the Objects window, you'll have a new button called SndCmp: AutoCreate in the dynamic menu on the right, as seen in this screenshot (note that the button name is slightly different here):

Index files
The Insert glosses function reads glosses from index files (currently Brazil_index.txt and Malakula_index.txt; you can find them here).
Download these files to [Praat Directory]. By default, the script is set to look for them in C:\Praat (for Windows systems), but this can easily be changed, either by editing the script (which makes the change permanent) or by typing the location into the Index file path field of the script's interface.
Usage: Detect and segment speech
The first step in using the script is to create the default TextGrid and auto-detect speech, optionally adding margins to the left and/or right of the auto-detected speech intervals.
Basic usage
- Open an audio file as a Sound object in the Objects window (Open|Read from file...). Note that the script will not work with audio opened as a LongSound object.
- Make sure the recently opened Sound object is selected.
- Click on the SndCmp: AutoCreate dynamic button at the bottom left of the Objects window.
- The script's interface will now appear, as seen in the following screenshot:

- Select the Detect and segment speech option.
- Click on OK.
- You will now have a TextGrid object in the Objects window. Select both the Sound and TextGrid objects, and then click on the View & Edit button at the top right.
Now you can review the automatic segmentation. Segments identified as having speech contain s in the adjustments tier (~ADJUST~), while segments identified as containing silence have empty intervals.
At this stage, it's recommended that you not make any manual adjustments -- rather, you should check to see whether the default values used by the script have produced accurate auto-segmentation in most cases. If so, then proceed to the Insert glosses section where you'll learn how to make manual adjustments. If not, continue on to Advanced options so you can try to get better results with the auto-detection algorithm.
Advanced options
The advanced options of the Detect and segment speech function allow you to customize the parameters used by the speech auto-detection algorithm. You'll almost certainly never get 100% accuracy, but that's to be expected. The idea is to adjust these parameters in order to get high enough accuracy to reduce the manual adjustments you'll have to make to a minimum.
Note that each time you run this function, the previous TextGrid is erased and a new one is generated.
Silence threshold (dB)
This value determines what the algorithm considers to be silence. Remember that we're talking about linguistic silence here, rather than acoustic silence, so for our purposes background noise counts as silence.
This value must always be a negative number!
This is not an absolute number, but rather the difference in dB between the peak intensity in the audio file (the loudest sound it contains) and the level below which audio will be considered silence. We could express this as Silence_Level = Peak_Intensity + Silence_Threshold, where Silence_Threshold is negative (in other words, the silence level sits that many dB below the peak).
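The relationship between the peak intensity and the threshold can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not code taken from the Praat script; the function name and the example values (an 80 dB peak and a -25 dB threshold) are hypothetical.

```python
def silence_cutoff_db(peak_intensity_db: float, silence_threshold_db: float) -> float:
    """Return the absolute level (in dB) below which audio counts as silence.

    The threshold is relative to the peak and must be negative, so it is
    added to the peak intensity rather than subtracted from it.
    """
    if silence_threshold_db >= 0:
        raise ValueError("the silence threshold must be a negative number")
    return peak_intensity_db + silence_threshold_db


# Hypothetical values: a recording peaking at 80 dB with a -25 dB threshold.
cutoff = silence_cutoff_db(80.0, -25.0)
print(cutoff)  # 55.0
```

With these example values, anything quieter than 55 dB would be treated as silence. A threshold closer to 0 raises the cutoff, so more of the recording is classified as silence.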
Min silence interval duration (ms)
This is the minimum length, in milliseconds, that is required for a section of audio to be classified as silence. If you are seeing many words being broken up into two where there are stops or affricates (or silence for any other reason), you might consider increasing this value.
However, if words are spaced closely together (such as in a very fast elicitation session), a value here that is too high may cause two words to be interpreted as one.
Min speech interval duration (ms)
This is the minimum length, in milliseconds, that is required for a section of audio to be classified as speech.
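To see how the two minimum durations interact, here is a simplified Python sketch. It assumes the audio has already been reduced to one intensity value per fixed-length frame, and it is not the algorithm the script or Praat actually uses; the function name, the 10 ms frame length, and the sample values are all assumptions made for illustration. Silent runs that are too short are absorbed into the surrounding speech first, and then speech runs that are too short are absorbed into the surrounding silence.

```python
from itertools import groupby


def label_runs(frames_db, frame_ms, cutoff_db, min_silence_ms, min_speech_ms):
    """Return [(label, duration_ms), ...], where "s" is speech and "" is silence."""
    labels = ["s" if level >= cutoff_db else "" for level in frames_db]

    def absorb_short(current, target, min_ms):
        # Flip every run of `target` shorter than min_ms to the opposite label.
        other = "" if target == "s" else "s"
        out = []
        for label, run in groupby(current):
            count = len(list(run))
            if label == target and count * frame_ms < min_ms:
                label = other
            out.extend([label] * count)
        return out

    labels = absorb_short(labels, "", min_silence_ms)  # short pauses become speech
    labels = absorb_short(labels, "s", min_speech_ms)  # short blips become silence
    return [(label, len(list(run)) * frame_ms) for label, run in groupby(labels)]


# Hypothetical recording: a word with a 50 ms stop closure, then a 20 ms click.
frames = [40] * 30 + [70] * 20 + [40] * 5 + [70] * 20 + [40] * 30 + [70] * 2 + [40] * 30
result = label_runs(frames, frame_ms=10, cutoff_db=55,
                    min_silence_ms=100, min_speech_ms=100)
print(result)  # [('', 300), ('s', 450), ('', 620)]
```

In this example the 50 ms closure no longer splits the word in two, and the 20 ms click is not reported as speech. Raising min_silence_ms far enough would eventually merge separate words as well, which is the trade-off described above.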
Additional left margin (ms)
This sets the additional left margin that is added to the auto-detected speech intervals. The auto-segmentation algorithm tends to produce extremely tight margins, often cutting off the very beginning and end of sounds. This allows you to fix that.
Note that for playback purposes (to avoid speaker popping), it's a good idea to have at least a 100 ms left margin.
Additional right margin (ms)
This sets the additional right margin that is added to the auto-detected speech intervals. The auto-segmentation algorithm tends to produce extremely tight margins, often cutting off the very beginning and end of sounds. This allows you to fix that.
You'll probably want a shorter right margin than left margin.
Debug segmentation
This option shows the raw auto-detected speech and silence intervals (without added margins) in a tier called ~AUTO-DET~, allowing you to see the effects of the additional margin settings.
Note that if you add too large a margin, you may end up producing overlapping intervals. This option will make it apparent if this happens.
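The effect of the margins, and the overlap problem mentioned above, can be illustrated with a small Python sketch. It is not taken from the script; the function names and the interval times (in milliseconds) are hypothetical, and the intervals are assumed to be sorted by start time.

```python
def add_margins(intervals, left_ms, right_ms):
    """Widen each (start_ms, end_ms) speech interval by the given margins."""
    return [(max(0, start - left_ms), end + right_ms) for start, end in intervals]


def find_overlaps(intervals):
    """Return index pairs (i, i + 1) of consecutive intervals that overlap."""
    return [
        (i, i + 1)
        for i in range(len(intervals) - 1)
        if intervals[i][1] > intervals[i + 1][0]
    ]


# Hypothetical raw detections: the first two words are only 100 ms apart.
raw = [(500, 900), (1000, 1400), (2000, 2300)]
widened = add_margins(raw, left_ms=100, right_ms=50)
print(widened)                 # [(400, 950), (900, 1450), (1900, 2350)]
print(find_overlaps(widened))  # [(0, 1)]
```

Here the combined margins (150 ms) exceed the 100 ms gap between the first two words, so their widened intervals overlap. Smaller margins, or a manual boundary adjustment later on, would avoid this.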
Usage: Insert glosses
This function takes the TextGrid created by the Detect and segment speech function and inserts glosses from an index file into intervals detected as containing speech. These intervals are marked with s in the ~ADJUST~ tier. The glosses are inserted into the Rfc-Form tier.
After using this function once, open the TextGrid, make any manual adjustments you see fit, and then run the function again. The Insert glosses function does not overwrite any changes you make to intervals or speech markers (s).
You can repeat this process as much as you like. Once you're satisfied with the state of the TextGrid, run the Finalize processing function.
Basic usage
- Select the TextGrid in Praat's Objects window (be careful not to select the Sound object).
- Click the SndCmp: AutoCreate button.
- Select the Insert glosses option.
- Select an option from the Type of elicitation menu (see below).
- If the location of the index files is not correct in the Index file path field, correct it.
- Click on OK.
- Aside from an informational message in the Praat Info window, there will be no visible changes.
- Select the TextGrid and Sound objects together and then click on the View & Edit button.
You will now see a TextGrid with 14 or 15 tiers, depending on whether you selected the Debug segmentation option when running the Detect and segment speech function (which creates the ~AUTO-DET~ tier).
The two tiers that are relevant to the Insert glosses function are ~ADJUST~, where segments believed to contain speech are marked with s, and Rfc-Form, where glosses are inserted from the automatically selected index file. Below is an example:

Now you can move on to manually adjusting the segmentation and speech identification (see below).
Advanced options
The default Insert glosses setting (Normal) assumes that your elicitation session (or the edited recording thereof) consists of one repetition of each item in the index file, like this: one | two | three | four | five.
However, there are four other options for automatically inserting glosses:
Doubled words (all). This is for when each item in the index file was elicited twice in a row, like this:
one | one | two | two | three | three | four | four | five | five
Doubled words (suppress first word). This is for when each item in the index file was elicited twice in a row, but you only want to label the second one. Its output looks like this:
_____ | one | _____ | two | _____ | three | _____ | four | _____ | five
Doubled words (suppress second word). This is for when each item in the index file was elicited twice in a row, but you only want to label the first one. The output looks like this:
one | _____ | two | _____ | three | _____ | four | _____ | five | _____
List applied twice. This is for when you go through the list once and then repeat it a second time, as follows:
one | two | three | four | five | one | two | three | four | five
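The five options can be summarized in a short Python sketch. It only illustrates the labelling patterns described above and is not the script's own implementation; the function names are hypothetical, the mode strings are the option names used on this page, and an empty string stands for an unlabelled interval.

```python
def expand_glosses(glosses, mode="Normal"):
    """Return the sequence of labels to hand out to successive speech intervals."""
    blank = ""
    if mode == "Normal":
        return list(glosses)
    if mode == "Doubled words (all)":
        return [label for gloss in glosses for label in (gloss, gloss)]
    if mode == "Doubled words (suppress first word)":
        return [label for gloss in glosses for label in (blank, gloss)]
    if mode == "Doubled words (suppress second word)":
        return [label for gloss in glosses for label in (gloss, blank)]
    if mode == "List applied twice":
        return list(glosses) * 2
    raise ValueError(f"unknown elicitation type: {mode}")


def insert_glosses(markers, glosses, mode="Normal"):
    """Give each interval marked "s" the next label; silent intervals stay empty."""
    sequence = iter(expand_glosses(glosses, mode))
    return [next(sequence, "") if marker == "s" else "" for marker in markers]


# Hypothetical tier: silence and speech alternate, each word elicited twice.
markers = ["", "s", "", "s", "", "s", "", "s"]
labels = insert_glosses(markers, ["one", "two"], "Doubled words (suppress first word)")
print(labels)  # ['', '', '', 'one', '', '', '', 'two']
```

Because labels are handed out in order to whichever intervals carry an s, adding or removing a marker shifts every gloss after it. This is why the function needs to be re-run after manual adjustments.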
Manual adjustments
Segmentation
Manually adjusting the segmentation mainly involves moving the blue segment boundaries, which is done by holding the SHIFT key and dragging the boundaries to the left or right with the mouse (if you don't hold the SHIFT key, you only move the boundary in one tier, which should never be done).
It can also be done by adding segment boundaries. This should always be done by pressing CTRL-F9 in order to create the boundaries at the exact same point in time in all tiers.
There are three main cases where you'll want to manually adjust the segmentation.
To correct incorrectly auto-detected speech-silence boundaries.
This is simply a matter of moving one or more segment boundaries to a more appropriate location.
To deal with single words that were detected as two (in conjunction with removing the s from one of the two segments).
In the following image, the word "quatro" has an extremely long pause between the two syllables, and as a result was detected as two words, "quatro" and "cinco".

In order to fix this, do the following:
- Delete one of the two instances of s in the ~ADJUST~ tier. In this case we will delete the one over "cinco", but it doesn't matter which one you remove. Also, note that it's not necessary to delete the word in the Rfc-Form tier; it will be automatically removed when you re-run the Insert glosses function.
- Move the two segment boundaries that contain "cinco" to the right, so they're located after the end of the word "quatro".
- Move the right boundary of the segment that contains "quatro" until it's past the end of the word. At this point the TextGrid should look like this:
- Once all manual adjustments have been made, re-run the script's Insert glosses function. The results will look like this:

To insert a segment for a word that was not detected.
It's possible that one or more words will not be detected by the automatic algorithm. This is especially likely to occur when significant background noise forces you to use a Silence threshold that is closer to 0 (for example, -20 dB rather than -40 dB).
When this happens, the solution is to simply create a new interval for the undetected word, mark it as speech, and re-run the Insert glosses function of the script. The steps are as follows:
- Select a point about 100 ms before the beginning of the word.
- Press CTRL-F9. This will insert a segment boundary in all tiers at the position of the cursor.
- Do the same after the end of the word.
- Type an s in the ~ADJUST~ tier of the interval you just created.
- Re-run the Insert glosses function of the script.
Speech identification
Manually adjusting speech identification is done by adding or removing the character s in the ~ADJUST~ tier.
In the previous example, we saw one way to use the removal of s (along with moving segment boundaries) to manually fix the identification of speech.
Another use of this technique is for when background noise is incorrectly identified as speech. To remedy this, simply delete the s from the ~ADJUST~ tier.
Finally, if a segment contains speech but is identified incorrectly as silence, inserting s will fix it.
Usage: Finalize processing
This function is used when you're satisfied with the results of your work and wish to save them. It simply removes the extra tiers and prompts you to save the TextGrid in a folder of your choice.