Prepare excel file - selmling/Analytics-and-Data-Exploration GitHub Wiki

Steps for getting speech transcriptions prepped for analysis:

Find transcription .txt files
- Look through them all for blatant errors or mistakes
- Make sure the formatting is consistent across all the files (onset, offset, transcript). You shouldn't have to change anything here
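The consistency check above can also be done with a short script. This is a minimal sketch, assuming each line of a .txt file holds three tab-separated fields (onset, offset, transcript) and that there is no header row. The tab delimiter and the function names are assumptions for illustration, so adjust them to match your files.

```python
import glob
import os


def check_transcript_file(path, delimiter="\t", expected_fields=3):
    """Return a list of (line_number, problem) tuples for one transcript file."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip completely blank lines
            fields = line.split(delimiter)
            if len(fields) != expected_fields:
                problems.append(
                    (line_number, f"expected {expected_fields} fields, found {len(fields)}")
                )
            elif not fields[2].strip():
                problems.append((line_number, "empty transcript"))
    return problems


def check_folder(folder):
    """Check every .txt file in folder; return {filename: problems} for files with problems."""
    report = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        problems = check_transcript_file(path)
        if problems:
            report[os.path.basename(path)] = problems
    return report
```

Calling `check_folder("path/to/transcripts")` returns an empty dictionary when every file follows the assumed layout; otherwise it lists the line numbers worth a manual look.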
Set up your python environment
- Anaconda is the best way to install Python on Mac and PC. Download and install it before moving on to the next step: https://www.anaconda.com/distribution/#download-section
- Find the Terminal app on your Mac and open it (keep it in the Dock as well), then type conda install pip and press Enter
- Install the Python libraries you need by entering the following command into the terminal and pressing Enter. glob and os are part of Python's standard library and come with every Python installation, so only pandas has to be installed
  - pip install pandas
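Before running the script, it can help to confirm that the environment is ready. The sketch below assumes the script needs the three libraries named above (glob, pandas, os) and reports any that Python cannot find. `missing_modules` is a hypothetical helper name, not part of any library.

```python
import importlib.util


def missing_modules(names=("glob", "os", "pandas")):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

If a name is reported as missing, install it from the terminal and run the check again.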
Run the txt2xlsx.py Python script and check that it worked by looking in the converted_files folder. Watch this to figure out how to run Python scripts.
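Checking the converted_files folder by eye works for a handful of files. For larger batches, a sketch like the following may help. It assumes that each .txt file produces an .xlsx file with the same base name, which may not match how txt2xlsx.py actually names its output, and `missing_conversions` is a hypothetical helper name.

```python
import os


def missing_conversions(txt_folder, converted_folder="converted_files"):
    """Return .txt base names that have no matching .xlsx in converted_folder."""
    txt_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(txt_folder)
        if name.lower().endswith(".txt")
    }
    if not os.path.isdir(converted_folder):
        # Nothing was converted at all, so every transcript is missing.
        return sorted(txt_stems)
    xlsx_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(converted_folder)
        if name.lower().endswith(".xlsx")
    }
    return sorted(txt_stems - xlsx_stems)
```

An empty list means every transcript has a converted counterpart under the naming assumption above.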
Remove the unwanted characters from the transcript columns in Excel:
- Use find and replace to remove “...”, “’” (weird apostrophes), “—”, and “‘”. These are mostly non-ASCII characters as far as Python is concerned. Put only the character that is inside the double quotes into the find and replace window in Excel
  - The ellipsis may appear in more than one form, for example as the single character “…”. Search through the document for each version, copy it, and paste it into the replace function
- Once those are found and deleted, you'll have empty cells. To remove them, highlight all the cells, press Control+G, click Special > Blanks > OK, then right-click one of the selected cells, click Delete, and select Shift cells up (also see this video for further details)
- Save the Excel file
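For reference, the same cleanup can be sketched in Python instead of Excel's find and replace. This is an alternative technique, not the workflow described above: it assumes the rows have already been read as (onset, offset, transcript) tuples, and it drops a whole row when its transcript is empty after cleaning, which differs slightly from deleting blank cells and shifting up. The names `UNWANTED`, `clean_transcript`, and `clean_rows` are hypothetical.

```python
# Characters named in the steps above, written as escapes so they are unambiguous:
# three plain dots, the single-character ellipsis, right and left curly
# apostrophes, and the em dash.
UNWANTED = ["...", "\u2026", "\u2019", "\u2018", "\u2014"]


def clean_transcript(text):
    """Delete every unwanted character sequence and trim surrounding whitespace."""
    for token in UNWANTED:
        text = text.replace(token, "")
    return text.strip()


def clean_rows(rows):
    """Clean each (onset, offset, transcript) row; drop rows left with no transcript."""
    cleaned = []
    for onset, offset, transcript in rows:
        transcript = clean_transcript(transcript)
        if transcript:
            cleaned.append((onset, offset, transcript))
    return cleaned
```

As in the Excel steps, the characters are deleted rather than replaced, so a curly apostrophe inside a word is simply removed.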