Progress report Dialects - ShelinaGobardhan/FOTD-Portfolio- GitHub Wiki
Some characters, such as Rubeus Hagrid and Stan Shunpike (the conductor of the Knight Bus), speak in a dialect. How is this dialect portrayed in the book and in the movie script?
Chosen utterances of dialect:
A. Obvious utterances of Stan Shunpike's dialect:
- gerunds and words like “something” ending in "-in'" vs. "-ing";
- omission of the 'd' in “and”;
- omission of the 'h' in words beginning with an 'h';
- pronunciation of 'th' as 'f(f)' or 'v';
- contraction of word combinations like "'Choo" vs. "What did you" and "dincha" vs. "didn't you".
B. Obvious utterances of Rubeus Hagrid's dialect:
- gerunds and words like “something” ending in "-in'" vs. "-ing";
- omission of the 'd' in words like “and”;
- "ter" vs. "to";
- “yeh” vs. “you”;
- omission of the 't' in words ending with a 't'.
For statistical reasons, we chose the utterances with the highest frequency. These are:
– -in';
– an';
– ter;
– yeh.
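A minimal sketch of how these four forms and their standard counterparts could be counted with grep. This is not our original code, and the function names are hypothetical. It assumes a plain-text file with straight apostrophes ('); a text that uses curly apostrophes (’) would need the patterns adjusted. Note that the "in'" pattern cannot tell a dropped 'g' from a word ending in "in" that is followed by a closing single quotation mark.

```shell
# count_matches FILE REGEX (hypothetical helper)
# Prints the number of case-insensitive, whole-word matches of REGEX.
count_matches() {
  grep -oiwE -- "$2" "$1" | wc -l | tr -d ' '
}

# report FILE (hypothetical helper)
# Prints one "form<TAB>count" line per dialect form and standard form.
report() {
  printf "in'\t%s\n" "$(count_matches "$1" "[a-z]+in'")"
  printf "ing\t%s\n" "$(count_matches "$1" "[a-z]+ing")"
  printf "an'\t%s\n" "$(count_matches "$1" "an'")"
  printf "and\t%s\n" "$(count_matches "$1" "and")"
  printf "ter\t%s\n" "$(count_matches "$1" "ter")"
  printf "to\t%s\n"  "$(count_matches "$1" "to")"
  printf "yeh\t%s\n" "$(count_matches "$1" "yeh")"
  printf "you\t%s\n" "$(count_matches "$1" "you")"
}
```

Usage (the file name is a placeholder): `report book.txt`.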
"-ing" without 1-syllable words:
Our first code counted all words ending in "-ing", but in dialect only words ending in "-ing" with two or more syllables lose their 'g' (as in "somethin'" for "something"). In the small files with our first results for the dialect forms, we could quickly see that each file contained one word that should not have been counted, but the files for the standard forms were too large to check by hand. We therefore counted all 1-syllable words in the files with our first results for the standard forms in book and script:
tr -sc 'A-Za-z' '\n' < azkabanscript_standard_gerund.csv |
grep -i '^[^aeiou]*[aeiou][^aeiou]*$' > azkabanscript_standard_gerund_1syll.csv
wc -l azkabanscript_standard_gerund_1syll.csv # counts all 1-syllable words ending in "-ing" in the script
This produced files that contained not only 1-syllable words but also 2-syllable words containing a 'y' (a word like "flying" has only one of the vowels a, e, i, o, u). The solution was to add 'y' to the list of vowels:
grep -i '^[^aeiouy]*[aeiouy][^aeiouy]*$'
With this corrected code only 1-syllable words were counted. This number was then subtracted from the original count to obtain the number of words ending in "-ing" that lose their 'g' in dialect.
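The subtraction described above can be sketched as a single shell function. The names are hypothetical, and the sketch assumes each result file contains the matched "-ing" words; any other words are filtered out by the 'ing$' test. With 'y' counted as a vowel, a word like "flying" is no longer treated as a 1-syllable word.

```shell
# words FILE: split the input into one word (letters only) per line
words() { tr -sc 'A-Za-z' '\n' < "$1"; }

# one_syllable: keep words with exactly one vowel letter, 'y' included
one_syllable() { grep -i '^[^aeiouy]*[aeiouy][^aeiouy]*$'; }

# count_multisyllable_ing FILE (hypothetical helper)
# all "-ing" words minus the 1-syllable "-ing" words
count_multisyllable_ing() {
  local all one
  all=$(words "$1" | grep -ic 'ing$')
  one=$(words "$1" | grep -i 'ing$' | one_syllable | wc -l)
  echo $((all - one))
}
```

For the book counts in the results table, the same subtraction is 4356 - 124 = 4232.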
Results:

| Utterance | Character(s) | Book count | Script count | Book -> script % |
|---|---|---|---|---|
| in' | Stan and Hagrid | 68 | 7 | 10% |
| ing (all words) | | 4356 | 701 | 16% |
| ing (1-syllable words) | | 124 | 20 | |
| ing (words with 2 or more syllables) | | 4232 | 681 | 16% |
| an' | Stan and Hagrid | 52 | 11 | 21% |
| and | | 2677 | 465 | 17% |
| ter | Hagrid | 50 | 9 | 18% |
| to | | 2603 | 493 | 19% |
| yeh | Hagrid | 59 | 25 | 42% |
| you | | 1425 | 370 | 26% |
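The last column can be recomputed from the two counts. A sketch using awk that rounds to the nearest whole percent (the function name pct is hypothetical):

```shell
# pct BOOK SCRIPT: script count as a percentage of the book count
pct() { awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f%%\n", 100 * s / b }'; }

pct 68 7       # in'
pct 4232 681   # ing, 2 or more syllables
pct 59 25      # yeh
pct 1425 370   # you
```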
Conclusion:
With the command wc -w we counted the number of words in the book, in the script, and in the script without stage names (script2):

| Text | Words | % of book |
|---|---|---|
| book | 108052 | 100% |
| script (with stage names) | 25946 | 24% |
| script2 (without stage names) | 9808 | 9% |

This means that the script contains 24% of the number of words in the book (with stage names), or 9% (without stage names).
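A sketch of this word-count comparison with wc -w. The function name and file names are placeholders, and how the stage names were removed to produce script2 is not shown here.

```shell
# share BOOK [OTHER...] (hypothetical helper)
# Prints "file<TAB>N words<TAB>P%" with P relative to the first file.
share() {
  local book=$1 total f
  shift
  total=$(wc -w < "$book")
  for f in "$book" "$@"; do
    awk -v n="$(wc -w < "$f")" -v t="$total" -v f="$f" \
      'BEGIN { printf "%s\t%d words\t%.0f%%\n", f, n, 100 * n / t }'
  done
}
```

Usage: `share book.txt script.txt script2.txt`.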
In our subject 'Characters' we found that the number of occurrences of Stan and Hagrid decreased as follows:
| Character | Occurrences in book | Occurrences in script2 | Book -> script % |
|---|---|---|---|
| Stan | 52 | 7 | 13% |
| Hagrid | 265 | 53 | 20% |
All percentages in the results table for the chosen utterances fall within this range of book-to-script percentages (for the word counts and for the character occurrences), except those for "yeh" vs. "you": "yeh" is used far more often in the script than one would expect.
One possible explanation is that "yeh" is a simple word that an actor can use to suggest a dialect.
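The comparison in the last two paragraphs can be expressed as a small awk filter that flags utterances whose book-to-script ratio lies outside a reference range. This is a sketch: the bounds 9 and 24 are assumed here as the rounded ratios of the script without and with stage names to the book (9808/108052 ≈ 9.1%, 25946/108052 ≈ 24.0%), the counts are copied from the results table, and the function name is hypothetical.

```shell
# flag_outliers LOW HIGH (hypothetical helper)
# Input lines: "utterance book_count script_count".
# Output: utterance, ratio, and whether it lies inside [LOW, HIGH].
flag_outliers() {
  awk -v lo="$1" -v hi="$2" '{
    r = 100 * $3 / $2
    status = (r < lo || r > hi) ? "outside" : "inside"
    printf "%s\t%.0f%%\t%s\n", $1, r, status
  }'
}

printf '%s\n' "in' 68 7" "ing 4232 681" "an' 52 11" "and 2677 465" \
  "ter 50 9" "to 2603 493" "yeh 59 25" "you 1425 370" |
  flag_outliers 9 24
```

With these bounds only "yeh" (42%) and "you" (26%) are flagged as outside the range.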