Prepare excel file - selmling/Analytics-and-Data-Exploration GitHub Wiki

Steps for getting speech transcriptions prepped for analysis:

  1. Find transcription .txt files

    • Look through them all for blatant errors or mistakes.

    • Make sure the formatting is consistent across all the files (onset, offset, transcript). You shouldn't have to change anything here.
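
If you have a lot of files, the consistency check in step 1 can be partly automated. The sketch below is only an illustration: it assumes each line of a .txt file holds three tab-separated fields (onset, offset, transcript), which may not match your files, so adjust `sep` if yours are delimited differently. The function name `find_malformed_lines` is made up for this example.

```python
import glob


def find_malformed_lines(lines, sep="\t", expected_fields=3):
    """Return (line_number, text) pairs for lines that do not split
    into the expected number of fields. Blank lines are ignored."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        if len(text.split(sep)) != expected_fields:
            problems.append((number, text))
    return problems


if __name__ == "__main__":
    # Assumption: the transcription files sit in the current folder.
    for path in sorted(glob.glob("*.txt")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, text in find_malformed_lines(handle):
                print(f"{path}, line {number}: {text!r}")
```

Anything it prints is a line worth opening and looking at by hand; no output means every non-blank line had three fields.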

  2. Set up your python environment

    • Anaconda is the best way to install Python on Mac and PC. Download and install it before moving on to the next step: https://www.anaconda.com/distribution/#download-section

    • Find the Terminal app on your Mac and open it (keep it in the Dock as well). Type conda install pip and press Enter.

    • Install the Python library you need by entering the following command into the terminal and pressing Enter. The script also uses glob and os, but those are part of Python's standard library, so they do not need to be installed with pip.

      • pip install pandas
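
To confirm the environment from step 2 is ready, you can try importing the modules from Python. This is a minimal sketch, and the helper name `missing_modules` is made up for this example. Note that glob and os ship with Python itself, so only pandas should ever show up as missing.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# An empty list means everything needed is available.
print(missing_modules(["glob", "os", "pandas"]))
```

If pandas is listed, run pip install pandas again and re-check.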

  3. Run the txt2xlsx.py Python script and check that it worked (look in the converted_files folder). Watch this to figure out how to run Python scripts.
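
Checking the converted_files folder by eye gets tedious with many files. The sketch below lists any .txt file that has no matching .xlsx file. It assumes that txt2xlsx.py keeps the same base file name and only changes the extension, which is a guess about the script rather than something confirmed here. The function name `unconverted` is made up for this example.

```python
import glob
import os


def unconverted(txt_paths, xlsx_paths):
    """Return the .txt paths whose base name has no matching .xlsx path."""
    converted = {
        os.path.splitext(os.path.basename(path))[0] for path in xlsx_paths
    }
    return sorted(
        path
        for path in txt_paths
        if os.path.splitext(os.path.basename(path))[0] not in converted
    )


if __name__ == "__main__":
    # Assumption: run from the folder that holds the .txt files
    # and the converted_files folder.
    txt_files = glob.glob("*.txt")
    xlsx_files = glob.glob(os.path.join("converted_files", "*.xlsx"))
    for path in unconverted(txt_files, xlsx_files):
        print("No converted file found for:", path)
```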

  4. Remove the unwanted characters from the transcript columns in Excel:

    • Use Find and Replace to delete “...”, “’” and “‘” (curly apostrophes), and “—”, most of which are non-ASCII characters that can cause problems in Python. Enter only the character inside the double quotes into the Find field in Excel, and leave the replacement empty.

      • The ellipsis can appear in more than one form, for example three separate periods or the single character "…". If a search finds nothing, look through the document for an ellipsis, copy it, and paste it into the Find field.

    • Once those characters are found and deleted, you'll have empty cells. To remove them, highlight all the cells, press Control+G, then click Special > Blanks > OK. Right-click within one of the selected cells, click Delete, and select Shift cells up (also see this video for further details).

    • Save the Excel file.
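
If you would rather do the clean-up from step 4 in Python, the same idea can be sketched with plain string replacement. This is an illustration under assumptions, not a replacement for the Excel steps: it treats the unwanted characters as exactly the ones listed above (the three-period ellipsis, the single-character ellipsis, both curly apostrophes, and the em dash), and the names `UNWANTED`, `clean_transcript`, and `clean_column` are made up for this example.

```python
# Characters to delete, written as escapes so they are easy to tell apart.
UNWANTED = [
    "...",     # ellipsis typed as three periods
    "\u2026",  # ellipsis as a single character
    "\u2019",  # right curly apostrophe
    "\u2018",  # left curly apostrophe
    "\u2014",  # em dash
]


def clean_transcript(text):
    """Delete every unwanted character sequence from one transcript cell."""
    for token in UNWANTED:
        text = text.replace(token, "")
    return text


def clean_column(values):
    """Clean each cell, then drop cells that end up empty or whitespace-only."""
    cleaned = [clean_transcript(value) for value in values]
    return [value for value in cleaned if value.strip()]


print(clean_column(["...", "hello\u2014there", "\u2018", "good morning"]))
```

Dropping the empty entries plays the same role as deleting the blank cells and shifting up in Excel. As in the Excel version, deleting apostrophes changes words like "it’s" into "its".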