Prepare excel file - selmling/Analytics-and-Data-Exploration GitHub Wiki
Steps for getting speech transcriptions prepped for analysis:
-
Find transcription .txt files
-
Look through them all for blatant errors or mistakes
-
Make sure the formatting is consistent across all the files (onset, offset, transcript). You shouldn't have to change anything here.
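If there are many files, the consistency check can be partly automated. The following is only a rough sketch: it assumes each line of a transcription file holds onset, offset, and transcript separated by tabs, which may not match your files (change `delimiter` if not). The function names are made up for this example.

```python
from pathlib import Path

EXPECTED_COLUMNS = 3  # onset, offset, transcript (assumed layout)


def find_malformed_lines(text, delimiter="\t"):
    """Return (line_number, line) pairs that do not split into three fields."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if len(line.split(delimiter)) != EXPECTED_COLUMNS:
            problems.append((number, line))
    return problems


def check_folder(folder, delimiter="\t"):
    """Map each .txt file name in `folder` to its malformed lines.

    Only files with at least one problem appear in the result.
    """
    report = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        problems = find_malformed_lines(text, delimiter)
        if problems:
            report[path.name] = problems
    return report
```

Calling `check_folder("path/to/transcripts")` returns an empty dictionary when every file follows the assumed layout. It does not replace reading through the files for content errors.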
-
-
Set up your python environment
-
Anaconda is the best way to install Python on Mac and PC. Download and install it before moving on to the next step: https://www.anaconda.com/distribution/#download-section
-
Find the Terminal app on your Mac and open it (keep it in the Dock as well), then type in
conda install pip
and press Enter.
-
Install the Python library you need by entering the following command into the terminal and pressing Enter
-
pip install pandas
-
glob and os are part of the Python standard library. They come with Python and do not need to be installed with pip.
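To confirm the environment is ready, you can check that the libraries named in this step (glob, os, pandas) can be found by Python. This is a small sketch using only the standard library, and `missing_modules` is a name invented for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["glob", "os", "pandas"])
    print("Missing:", missing if missing else "none")
```

If pandas shows up as missing, the install command did not run in the same Python environment you are checking from.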
-
-
-
Run the txt2xlsx.py Python script and check that it worked (look in the converted_files folder). Watch this to figure out how to run Python scripts.
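Checking the converted_files folder by eye works for a handful of files. For a larger batch, something like the sketch below could list the .txt files that have no matching output. It assumes the script writes one .xlsx per .txt with the same base name, which is a guess about txt2xlsx.py rather than something confirmed here. `unconverted_files` is a hypothetical helper name.

```python
from pathlib import Path


def unconverted_files(txt_folder, converted_folder="converted_files"):
    """Return names of .txt files with no same-named .xlsx in converted_folder.

    Assumes output files keep the input's base name (an assumption).
    """
    converted = {p.stem for p in Path(converted_folder).glob("*.xlsx")}
    return sorted(
        p.name for p in Path(txt_folder).glob("*.txt") if p.stem not in converted
    )
```

An empty list means every transcription file has a counterpart. It says nothing about whether the contents converted correctly, so still open a few files to check.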
-
Remove the unwanted characters from the transcript columns in Excel:
-
Use find and replace to remove “...”, “’” and “‘” (weird apostrophes), and “—”. Most of these are not plain ASCII characters and can cause problems when the text is read in Python. Only put the character that is inside the double quotes into the replace window within Excel.
- The ellipsis may appear in more than one form (for example three periods "..." or the single character "…"). Search through the document, copy each form you find, and paste it into the replace function.
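The same removal can be sketched in Python for a single transcript string. The character list below is my reading of this step (both ellipsis forms, curly apostrophes, and the long dash), written as Unicode escapes so the code stays unambiguous. Treat it as a starting point and extend it as needed. `clean_transcript` is an invented name.

```python
# Characters to delete (assumed from the step above; adjust as needed).
UNWANTED = [
    "...",      # three periods
    "\u2026",   # single-character ellipsis
    "\u2018",   # left curly apostrophe
    "\u2019",   # right curly apostrophe
    "\u2014",   # long dash
]


def clean_transcript(text):
    """Delete every unwanted character sequence and trim outer whitespace."""
    for chars in UNWANTED:
        text = text.replace(chars, "")
    return text.strip()
```

As in the Excel procedure, the characters are deleted rather than swapped for plain equivalents, so a word like "don’t" becomes "dont".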
-
Once it finds those and you delete them, you'll have empty cells. Remove those empty cells by highlighting all the cells and then pressing
Control+G
then click Special > Blanks > OK
and then right-click within one of the selected cells, click Delete, and select Shift cells up
(also see this video for further details)
-
Save the Excel file
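For reference, the blank-removal step above amounts to dropping entries whose transcript ended up empty after cleaning. A minimal sketch, assuming each row is an (onset, offset, transcript) tuple and using the made-up name `drop_empty_rows`:

```python
def drop_empty_rows(rows):
    """Keep only rows whose transcript (third field) has visible text.

    `rows` is assumed to be a list of (onset, offset, transcript) tuples.
    """
    return [row for row in rows if str(row[2]).strip()]
```

This version drops the whole row, so each remaining transcript stays next to its own onset and offset.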
-