Progress report Dialects - ShelinaGobardhan/FOTD-Portfolio- GitHub Wiki

Some characters, such as Rubeus Hagrid and Stan Shunpike (the conductor of the Knight Bus), speak in a dialect. How is this portrayed in the book and in the movie script?

Chosen utterances of dialect:

A. Obvious utterances of Stan Shunpike's dialect:

  1. gerunds and words like “something” ending in "-in'" vs. "-ing";
  2. omission of the 'd' in “and”;
  3. omission of the 'h' in words beginning with an 'h';
  4. pronunciation of 'th' as 'f(f)' or 'v';
  5. contraction of word combinations like "'Choo" vs. "What did you" and "dincha" vs. "didn't you".

B. Obvious utterances of Rubeus Hagrid's dialect:

  1. gerunds and words like “something” ending in "-in'" vs. "-ing";
  2. omission of the 'd' in words like “and”;
  3. "ter" vs. "to";
  4. “yeh” vs. “you”;
  5. omission of the 't' in words ending with a 't'.

We have chosen the utterances with the highest frequency, for statistical reasons. These are:

– -in';
– an';
– ter;
– yeh.
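
The counting step for these four forms is not shown on this page. The following is only a minimal sketch of how such counts could be made, not the code behind the results below. It assumes plain-text input with straight apostrophes ('); `count_tokens` and the sample sentence are hypothetical.

```shell
# Hypothetical helper: count the tokens on stdin that match an extended
# regular expression as a whole token, ignoring case.
count_tokens() {
    # Put one token per line; apostrophes stay inside tokens so that
    # forms like an' and meanin' are kept whole.
    tr -sc "A-Za-z'" '\n' | grep -c -i -E -x "$1"
}

# Made-up sample text, not taken from the book or the script.
sample="Yeh can come an' see me. I've bin meanin' ter tell yeh somethin'."

printf '%s\n' "$sample" | count_tokens "yeh"         # yeh
printf '%s\n' "$sample" | count_tokens "an'"         # an'
printf '%s\n' "$sample" | count_tokens "ter"         # ter
printf '%s\n' "$sample" | count_tokens "[a-z]+in'"   # words ending in -in'
```

If the source text uses curly apostrophes (’) or puts single quotation marks directly around words, the tokenising step would have to be adapted first.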

"-ing" without 1-syllable words:

Our first code counted all words ending in "-ing", but in dialect only words of two or more syllables ending in "-ing" lose their 'g'. In the small files with our first results for dialect occurrences we could quickly see that in each file one word should not have been counted, but the other files were too big to check by eye. We therefore decided to count all 1-syllable words in the files with our first results for standard occurrences in book and script:

tr -sc '[A-Z][a-z]' '[\012*]' < azkabanscript_standard_gerund.csv |
grep -i '^[^aeiou]*[aeiou][^aeiou]*$' > azkabanscript_standard_gerund_1syll.csv
wc -l azkabanscript_standard_gerund_1syll.csv #counts all occurrences of 1-syllable words ending in "-ing" in script

This resulted in files containing not only 1-syllable words but also 2-syllable words containing a 'y'. The solution was to add 'y' to the list of vowels:

grep -i '^[^aeiouy]*[aeiouy][^aeiouy]*$'

With this corrected code only 1-syllable words were counted. This number could then be subtracted from the original number to obtain the correct number of words ending in "-ing" that lose their 'g' in dialect.
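
The filtering and subtraction described above can be tried on a small word list. This is a sketch under assumptions: the word list is made up, `count_one_syllable` is a hypothetical helper, and `grep -x` (match the whole line) stands in for the `^` and `$` anchors.

```shell
# Hypothetical helper: count the words on stdin (one per line) that consist
# of consonants, exactly one vowel (y counts as a vowel), then consonants.
count_one_syllable() {
    grep -c -i -x '[^aeiouy]*[aeiouy][^aeiouy]*'
}

# Made-up word list, one word per line.
words="bring
flying
walking
thing
sing
going"

total=$(printf '%s\n' "$words" | grep -c -i 'ing$')      # all words ending in -ing
one_syll=$(printf '%s\n' "$words" | count_one_syllable)  # bring, thing, sing
multi=$((total - one_syll))                              # words of 2 or more syllables

echo "all: $total, 1-syllable: $one_syll, 2+ syllables: $multi"
# prints: all: 6, 1-syllable: 3, 2+ syllables: 3
```

Because 'y' is treated as a vowel, "flying" contains two vowels in a row and is no longer counted as a 1-syllable word.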

Results:

Utterance                             Character(s)     Book count  Script count  Book -> script %

in'                                   Stan and Hagrid          68             7               10%
ing (all words)                                              4356           701               16%
ing (1-syllable words)                                        124            20
ing (words with 2 or more syllables)                         4232           681               16%

an'                                   Stan and Hagrid          52            11               21%
and                                                          2677           465               17%

ter                                   Hagrid                   50             9               18%
to                                                           2603           493               19%

yeh                                   Hagrid                   59            25               42%
you                                                          1425           370               26%
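
The last column gives the script count as a rounded percentage of the book count. A minimal sketch of that arithmetic, using counts from the table; `pct` is a hypothetical helper, not part of the original code.

```shell
# Hypothetical helper: print the script count ($2) as a rounded
# percentage of the book count ($1).
pct() {
    awk -v book="$1" -v script="$2" 'BEGIN { printf "%.0f%%\n", 100 * script / book }'
}

pct 59 25      # yeh: prints 42%
pct 1425 370   # you: prints 26%
pct 4232 681   # ing (words with 2 or more syllables): prints 16%
```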

Conclusion:

With the command wc -w we counted the total number of words in the book, in the script, and in the script without stagenames (script2):

book    108052 words   100%
script   25946 words    24%
script2   9808 words     9%

This means that the script contains 24% of the number of words in the book when stagenames are included, or 9% when they are left out.

In our subject 'Characters' we found that the occurrences of Stan and Hagrid decreased as follows:

Character   Occurrences in book   Occurrences in script2   Book -> script %
Stan                         52                        7                13%
Hagrid                      265                       53                20%

All percentages in the results table for the chosen utterances fall within the range of these reduction percentages from book to script (for numbers of words and for occurrences of characters), except those for "yeh" vs. "you": "yeh" is used far more often in the script than one would expect.

One possible explanation is that "yeh" is an easy word for an actor to use to give the impression of speaking dialect.