Progress report 2 - ShelinaGobardhan/FOTD-Portfolio- GitHub Wiki

We started this project by creating frequency lists of all the words in the book and in the movie script. We used these commands:

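# Book: turn every run of non-letters into a newline (one word per line),
# sort the words, and count duplicates. Each output line is "count word".
tr -sc '[A-Z][a-z]' '[\012*]' < azkabanbook.txt |
sort |
uniq -c > azkabanbook.csv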

# Movie script: same pipeline, applied to the script text.
tr -sc '[A-Z][a-z]' '[\012*]' < azkabanscript.txt |
sort |
uniq -c > azkabanscript.csv
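
As a rough illustration of what these commands produce, the sketch below wraps the same pipeline in a shell function and runs it on a one-line sample. The function name `word_freq` and the sample sentence are made up for this example; only the `tr | sort | uniq -c` pipeline comes from the commands above.

```shell
# Hypothetical helper: the same pipeline as above, reading from standard input.
word_freq() {
  tr -sc '[A-Z][a-z]' '[\012*]' | sort | uniq -c
}

# Made-up sample sentence; prints one "count word" line per distinct word.
printf 'Harry saw Harry\n' | word_freq

# With the real files this would be, for example:
#   word_freq < azkabanbook.txt > azkabanbook.csv
```

Two things are worth keeping in mind when reading the resulting lists: uppercase and lowercase forms are counted separately (so "The" and "the" are two entries), and the columns are separated by whitespace rather than commas, despite the .csv extension.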

We also have to take into account that, in the movie script, each line of dialogue is introduced by the name of the speaking character followed by a ':', for example Harry: “bla, bla”. Since these speaker names would otherwise be counted as words, we created two versions of the movie script: one with the speaker names and one without.
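
One possible way to produce the version without speaker names is sketched below. This is not necessarily how we did it; it assumes that every speaker label sits at the start of a line, consists only of letters, spaces, and periods, and ends with a ':'. The function name `strip_speakers` and the output file name in the comment are hypothetical.

```shell
# Hypothetical helper: remove a leading "Name:" label from each line.
# Assumption: labels contain only letters, spaces, and periods before the ':'.
strip_speakers() {
  sed -E 's/^[[:space:]]*[A-Za-z .]+:[[:space:]]*//'
}

# Made-up sample: the first line loses its "Harry:" label,
# the second line has no label and passes through unchanged.
printf 'Harry: bla, bla\nHe raises his wand.\n' | strip_speakers

# With the real script this would be, for example:
#   strip_speakers < azkabanscript.txt > azkabanscript_nonames.txt
```

A pattern like this also strips the start of any ordinary line in which only letters, spaces, and periods precede the first colon, so the result should be spot-checked against the original script.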