Progress report Characters - ShelinaGobardhan/FOTD-Portfolio- GitHub Wiki
Book vs. Movie: Character appearances
Getting the script to work: To compare how often characters appear in the movie adaptation with how often they appear in the book, a list of characters was made. Only characters with a given name were included; characters such as “Unidentified Muggle woman mentioned in Daily Prophet” and “Sleeping wizard on the Knight Bus” were excluded. First names were separated from surnames, because some characters are at times referred to by their last name only. However, with this longer list as input for the code, the output became quite unclear:
cat azkabanscript.word_freq.csv azkabanbook.word_freq.csv | grep -E 'Harry|Potter|Bathilda|Bagshot|Wendelin|Vernon|Dursley|Petunia|Dudley|Ron|Weasley|Hermione|Granger|Hedwig|Lily|James|Errol|Arthur|Bill|Molly|Scabbers|Ginny|Percy|Fred|George|Hagrid|Minerva|McGonagall|Sirius|Black|Marge|Ripper|Fubster|Stan|Shunpike|Neville|Longbottom|Ernie|Prang|Marsh|Cornelius|Fudge|Minister|Voldemort|Tom|Florean|Fortescue|Malkin|Cassandra|Vablatsky|Seamus|Finnigan|Dean|Thomas|Crookshanks|Penelope|Clearwater|Remus|Lupin|Draco|Malfoy|Vincent|Crabbe|Gregory|Goyle|Poppy|Pomfrey|Albus|Dumbledore|Professor|Kettleburn|Pansy|Parkinson|Cadogan|Fang|Sibyll|Trelawney|Buckbeak|Parvati|Patil|Lavender|Brown|Colin|Creevey|Argus|Filch|Hannah|Abbott|Marcus|Flint|Cedric|Diggory|Davey|Gudgeon|Moony|Wormtail|Padfoot|Prongs|Rosmerta|Peter|Pettigrew|Derek|Vector|Macmillan|Cho|Chang|Davies|Warrington|Montague|Derrick|Bole|Walden|Macnair' | column -t
Black 76
Brown 2
Buckbeak 54
Cadogan 5
...
One could no longer tell which results came from the script (with name indication) and which came from the book. Looking up names to compare them also took too much time, because the two lists were printed one below the other and could not be viewed side by side. To make the script easier to use, a few additional changes were made:
cat azkabanscript.word_freq.csv azkabanbook.word_freq.csv | grep -E 'Harry|Potter|Bathilda|Bagshot|Wendelin|Vernon|Dursley|Petunia|Dudley|Ron|Weasley|Hermione|Granger|Hedwig|Lily|James|Errol|Arthur|Bill|Molly|Scabbers|Ginny|Percy|Fred|George|Hagrid|Minerva|McGonagall|Sirius|Black|Marge|Ripper|Fubster|Stan|Shunpike|Neville|Longbottom|Ernie|Prang|Marsh|Cornelius|Fudge|Minister|Voldemort|Tom|Florean|Fortescue|Malkin|Cassandra|Vablatsky|Seamus|Finnigan|Dean|Thomas|Crookshanks|Penelope|Clearwater|Remus|Lupin|Draco|Malfoy|Vincent|Crabbe|Gregory|Goyle|Poppy|Pomfrey|Albus|Dumbledore|Professor|Kettleburn|Pansy|Parkinson|Cadogan|Fang|Sibyll|Trelawney|Buckbeak|Parvati|Patil|Lavender|Brown|Colin|Creevey|Argus|Filch|Hannah|Abbott|Marcus|Flint|Cedric|Diggory|Davey|Gudgeon|Moony|Wormtail|Padfoot|Prongs|Rosmerta|Peter|Pettigrew|Derek|Vector|Macmillan|Cho|Chang|Davies|Warrington|Montague|Derrick|Bole|Walden|Macnair' | column -t
cat azkabanbook.word_freq.csv | grep -E 'Harry|Potter|Bathilda|Bagshot|Wendelin|Vernon|Dursley|Petunia|Dudley|Ron|Weasley|Hermione|Granger|Hedwig|Lily|James|Errol|Arthur|Bill|Molly|Scabbers|Ginny|Percy|Fred|George|Hagrid|Minerva|McGonagall|Sirius|Black|Marge|Ripper|Fubster|Stan|Shunpike|Neville|Longbottom|Ernie|Prang|Marsh|Cornelius|Fudge|Minister|Voldemort|Tom|Florean|Fortescue|Malkin|Cassandra|Vablatsky|Seamus|Finnigan|Dean|Thomas|Crookshanks|Penelope|Clearwater|Remus|Lupin|Draco|Malfoy|Vincent|Crabbe|Gregory|Goyle|Poppy|Pomfrey|Albus|Dumbledore|Professor|Kettleburn|Pansy|Parkinson|Cadogan|Fang|Sibyll|Trelawney|Buckbeak|Parvati|Patil|Lavender|Brown|Colin|Creevey|Argus|Filch|Hannah|Abbott|Marcus|Flint|Cedric|Diggory|Davey|Gudgeon|Moony|Wormtail|Padfoot|Prongs|Rosmerta|Peter|Pettigrew|Derek|Vector|Macmillan|Cho|Chang|Davies|Warrington|Montague|Derrick|Bole|Walden|Macnair' | column -t > bookcharacters.csv
cat azkabanscript.word_freq.csv | grep -E 'Harry|Potter|Bathilda|Bagshot|Wendelin|Vernon|Dursley|Petunia|Dudley|Ron|Weasley|Hermione|Granger|Hedwig|Lily|James|Errol|Arthur|Bill|Molly|Scabbers|Ginny|Percy|Fred|George|Hagrid|Minerva|McGonagall|Sirius|Black|Marge|Ripper|Fubster|Stan|Shunpike|Neville|Longbottom|Ernie|Prang|Marsh|Cornelius|Fudge|Minister|Voldemort|Tom|Florean|Fortescue|Malkin|Cassandra|Vablatsky|Seamus|Finnigan|Dean|Thomas|Crookshanks|Penelope|Clearwater|Remus|Lupin|Draco|Malfoy|Vincent|Crabbe|Gregory|Goyle|Poppy|Pomfrey|Albus|Dumbledore|Professor|Kettleburn|Pansy|Parkinson|Cadogan|Fang|Sibyll|Trelawney|Buckbeak|Parvati|Patil|Lavender|Brown|Colin|Creevey|Argus|Filch|Hannah|Abbott|Marcus|Flint|Cedric|Diggory|Davey|Gudgeon|Moony|Wormtail|Padfoot|Prongs|Rosmerta|Peter|Pettigrew|Derek|Vector|Macmillan|Cho|Chang|Davies|Warrington|Montague|Derrick|Bole|Walden|Macnair' | column -t > scriptcharacters.csv
join bookcharacters.csv scriptcharacters.csv| column -t
This gave a clearer, more readable output in which the frequencies from the book (second column) and the script (third column) appear next to each other:
Black 337 76
Brown 7 2
Buckbeak 93 54
Cadogan 16 5
...
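As a minimal sketch of what join does here, the snippet below rebuilds two tiny input files from counts shown on this page. The temporary directory and the file names book.txt and script.txt are placeholders, not files from the project, and the sketch assumes both files are sorted on the name column, which join requires. Names that occur in only one file are dropped from the joined output; join -v 1 lists the lines that occur only in the first file.

```shell
# Placeholder working directory and file names, for illustration only.
workdir=$(mktemp -d)

# Counts copied from the outputs shown on this page (name, frequency).
printf 'Bagshot 1\nBlack 337\nBrown 7\nBuckbeak 93\nCadogan 16\n' > "$workdir/book.txt"
printf 'Black 76\nBrown 2\nBuckbeak 54\nCadogan 5\n' > "$workdir/script.txt"

# Inner join on the first field: only names found in both files are printed.
in_both=$(LC_ALL=C join "$workdir/book.txt" "$workdir/script.txt")

# -v 1 prints the lines of the first file that have no partner in the second.
book_only=$(LC_ALL=C join -v 1 "$workdir/book.txt" "$workdir/script.txt")

printf 'In both:\n%s\n' "$in_both"
printf 'Book only:\n%s\n' "$book_only"

rm -r "$workdir"
```

LC_ALL=C is set so that join uses plain byte order when comparing names; the input files would need to be sorted under the same setting.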
To get a sorted list the following command was used:
join bookcharacters.csv scriptcharacters.csv| column -t | sort -n -k2 | uniq
join bookcharacters.csv scriptcharacters.csv| column -t | sort -n -k2 | uniq > bookscriptcharacters.csv
cat bookscriptcharacters.csv
With this, the output was finally ready for analysis.
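One caveat about the grep pattern used in this section: without further options, grep -E matches a name anywhere inside a word, so a short name such as Tom or Stan also selects longer words that contain it. The following is a minimal sketch of a stricter variant, assuming a grep that supports the -w (whole word) option, as GNU and BSD grep do. The sample words and counts are made up for illustration and are not taken from the book or the script.

```shell
# Made-up sample in the word<TAB>count layout of the word_freq files.
workdir=$(mktemp -d)
printf 'Stan\t40\nStand\t12\nTom\t20\nTomorrow\t9\n' > "$workdir/sample.word_freq.csv"

# Shortened, hypothetical name list kept in one variable so that every
# command can reuse exactly the same pattern.
names='Stan|Tom'

# Substring matching: every line containing "Stan" or "Tom" is counted.
substring_count=$(grep -cE "$names" "$workdir/sample.word_freq.csv")

# Whole-word matching: -w rejects matches that continue into more letters.
whole_words=$(grep -wE "$names" "$workdir/sample.word_freq.csv" | cut -f1 | paste -sd' ' -)

echo "lines matched as substrings: $substring_count"
echo "names matched as whole words: $whole_words"

rm -r "$workdir"
```

Whether whole-word matching changes the counts reported on this page depends on which other words in the two texts happen to contain one of the names.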
Results & Analysis: This list shows the names of the characters that appear in both the book and the script. Compared with the list of names given as input to the code, the output is considerably shorter. From this we can already conclude that not all characters that appear in the book are also mentioned or given screen time in the movie. A difficulty with this list is that first names and surnames are counted separately, so one cannot immediately conclude that Hermione is mentioned 668 times in the book and 162 times in the script; the mentions of her surname have to be added as well. Hermione is therefore actually mentioned "Hermione 668 162" + "Granger 18 3" = "Hermione Granger 686 165" times. However, this cannot be applied to characters who share a surname with other family members that are mentioned, such as Harry and his father James Potter, or Draco Malfoy and his father. It is also difficult to apply to characters whose surname is a commonly used word, such as Sirius Black, where "black" could also refer to, for example, the black arts.
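The addition of first-name and surname counts can be sketched with awk. This is only an illustration under the assumptions named above: it uses the two rows quoted in the text, a placeholder file name, and it is only meaningful for characters whose surname is not shared with another character or with an ordinary word.

```shell
# Placeholder file holding the two rows quoted in the text
# (name, book count, script count).
workdir=$(mktemp -d)
printf 'Granger 18 3\nHermione 668 162\n' > "$workdir/sample.txt"

# Sum the book and script columns over the first-name and surname rows.
combined=$(awk -v first=Hermione -v last=Granger '
    $1 == first || $1 == last { book += $2; script += $3 }
    END { print first, last, book, script }
' "$workdir/sample.txt")

echo "$combined"

rm -r "$workdir"
```

For the rows above this prints the combined line Hermione Granger 686 165, matching the manual addition.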
Buckbeak 93 54
Pettigrew 103 31
Neville 106 22
Sirius 137 45
Dumbledore 162 32
Hagrid 265 53
Lupin 408 86
Hermione 668 162
Ron 786 138
Harry 2034 543
In the list above (a sample from the output) it is also noticeable that characters with a larger part in the book sometimes have a comparatively smaller part in the movie, as is the case with Ron and Hermione. The opposite also occurs: characters that are mentioned less often in the book can be mentioned comparatively more often in the film, as is the case with Pettigrew and Buckbeak. The exact difference in appearance between the book and the movie can be seen in the list created with the following commands, which add each name's share of all matched name mentions as a percentage:
awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' bookcharacters.csv bookcharacters.csv | column -t > bookpercentage.csv
awk 'NR==FNR{a = a + $2;next} {c = ($2/a)*100;print $1,$2,c }' scriptcharacters.csv scriptcharacters.csv | column -t > scriptpercentage.csv
join bookpercentage.csv scriptpercentage.csv | column -t | sort -n -k2 | uniq > bookscriptpercentage.csv
cat bookscriptpercentage.csv
This gives the following output for the same sample; the columns are name, book count, book share in percent, script count and script share in percent:
Buckbeak 93 1.18759 54 3.29872
Pettigrew 103 1.31529 31 1.89371
Neville 106 1.35359 22 1.34392
Sirius 137 1.74946 45 2.74893
Dumbledore 162 2.0687 32 1.9548
Hagrid 265 3.38399 53 3.23763
Lupin 408 5.21006 86 5.25351
Hermione 668 8.5302 162 9.89615
Ron 786 10.037 138 8.43005
Harry 2034 25.9737 543 33.1704
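To read the shift per character directly, the script share can be subtracted from the book share. The snippet below is a rough sketch that does this for the ten sample rows above; the file name is a placeholder, and the result is expressed in percentage points of each list's matched name mentions, so it only describes this sample.

```shell
workdir=$(mktemp -d)

# Sample rows copied from the output above:
# name, book count, book share %, script count, script share %
cat > "$workdir/sample.txt" <<'EOF'
Buckbeak 93 1.18759 54 3.29872
Pettigrew 103 1.31529 31 1.89371
Neville 106 1.35359 22 1.34392
Sirius 137 1.74946 45 2.74893
Dumbledore 162 2.0687 32 1.9548
Hagrid 265 3.38399 53 3.23763
Lupin 408 5.21006 86 5.25351
Hermione 668 8.5302 162 9.89615
Ron 786 10.037 138 8.43005
Harry 2034 25.9737 543 33.1704
EOF

# Script share minus book share, largest gain first.
shift_list=$(LC_ALL=C awk '{ printf "%s %.2f\n", $1, $5 - $3 }' "$workdir/sample.txt" \
    | LC_ALL=C sort -k2,2nr)

printf '%s\n' "$shift_list"

rm -r "$workdir"
```

A positive value means the name takes up a larger share of the name mentions in the script than in the book; a negative value means a smaller share.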
However, it might also be interesting to roughly compare these statistics with the numbers obtained from a script without name indication of the speaker. Firstly, a word frequency list of the script without name indication was created:
cat azkabanscript2.txt | tr -sc '[A-Z][a-z]' '[\012*]' | sort | uniq -c | awk '{print $2 "\t" $1}' > azkabanscript2.word_freq.csv
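To see what each stage of this pipeline contributes, here is a small sketch that runs the same kind of pipeline on one made-up sentence instead of azkabanscript2.txt. It uses a simplified spelling of the tr arguments ('A-Za-z' and '\n'), which is an assumption on my part rather than the exact form used above: tr turns every run of non-letters into a single newline, sort brings identical words together, uniq -c counts them, and awk swaps the columns into word, tab, count.

```shell
# Byte-wise ordering so the result does not depend on the locale.
export LC_ALL=C

# Made-up sample sentence, not a line from the script.
sample='Harry looked at Ron. Ron looked at Harry and Hermione.'

# One word per line, sorted, counted, then printed as word<TAB>count.
freq=$(printf '%s\n' "$sample" \
    | tr -sc 'A-Za-z' '\n' \
    | sort \
    | uniq -c \
    | awk '{print $2 "\t" $1}')

printf '%s\n' "$freq"
```

Because the counting is case-sensitive, "Black" and "black" would end up as two separate entries in such a list.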
Secondly, a search for the frequency of character appearances was conducted:
cat azkabanscript2.word_freq.csv | grep -E 'Harry|Potter|Bathilda|Bagshot|Wendelin|Vernon|Dursley|Petunia|Dudley|Ron|Weasley|Hermione|Granger|Hedwig|Lily|James|Errol|Arthur|Bill|Molly|Scabbers|Ginny|Percy|Fred|George|Hagrid|Minerva|McGonagall|Sirius|Black|Marge|Ripper|Fubster|Stan|Shunpike|Neville|Longbottom|Ernie|Prang|Marsh|Cornelius|Fudge|Minister|Voldemort|Tom|Florean|Fortescue|Malkin|Cassandra|Vablatsky|Seamus|Finnigan|Dean|Thomas|Crookshanks|Penelope|Clearwater|Remus|Lupin|Draco|Malfoy|Vincent|Crabbe|Gregory|Goyle|Poppy|Pomfrey|Albus|Dumbledore|Professor|Kettleburn|Pansy|Parkinson|Cadogan|Fang|Sibyll|Trelawney|Buckbeak|Parvati|Patil|Lavender|Brown|Colin|Creevey|Argus|Filch|Hannah|Abbott|Marcus|Flint|Cedric|Diggory|Davey|Gudgeon|Moony|Wormtail|Padfoot|Prongs|Rosmerta|Peter|Pettigrew|Derek|Vector|Macmillan|Cho|Chang|Davies|Warrington|Montague|Derrick|Bole|Walden|Macnair' | column -t > script2characters.csv
Lastly, a command was used to be able to compare the lists:
paste bookcharacters.csv scriptcharacters.csv script2characters.csv | column -t
paste bookcharacters.csv scriptcharacters.csv script2characters.csv | column -t > bookscriptscript2characters.csv
cat bookscriptscript2characters.csv
This gave the following output:
Abbott 1 Black 76 Black 29
Albus 5 Brown 2 Buckbeak 19
Arthur 6 Buckbeak 54 Cornelius 1
Bagshot 1 Cadogan 5 Crookshanks 2
...
The first column shows the characters that appear in the book, the second column the characters from the script with name indication, and the third column the characters from the script without name indication. Because paste places the files side by side line by line, the names on one row do not necessarily correspond to each other. At first sight, this output shows that the script without name indication is further removed from the book than the script with name indication. The output for the script with names is slightly over half as long as the output for the book, and the output for the script without names is considerably shorter still. This may be because, once the names of the speaking characters are omitted, one does not know who is speaking unless another speaker addresses them. As a result, many characters who are certainly present are left out, because they are not present in a way that our computers can detect. In the book it is explicitly mentioned who is talking to whom, and thus it is easier to get an impression of the importance of characters from their appearances. However, this is not the case for the film scripts. One has to keep in mind that a lot is conveyed by the presence and appearance of actors in the movie, so there is less need to make certain things explicit, as the audience can see them on screen.
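To line the three lists up by name rather than by line number, join could be used a second time instead of paste. The snippet below is a sketch of that idea and not the command used for this report. It rebuilds small inputs from counts shown on this page, the file names are placeholders, and it assumes a join that supports the -a, -e and -o options: -a 1 keeps names that are missing from the third list, and -e 0 fills the missing count with 0.

```shell
workdir=$(mktemp -d)

# Counts copied from the outputs on this page (name, frequency).
printf 'Black 337\nBrown 7\nBuckbeak 93\nCadogan 16\n' > "$workdir/book.txt"
printf 'Black 76\nBrown 2\nBuckbeak 54\nCadogan 5\n' > "$workdir/script.txt"
printf 'Black 29\nBuckbeak 19\n' > "$workdir/script2.txt"

# Step 1: book and script with name indication, matched on the name.
LC_ALL=C join "$workdir/book.txt" "$workdir/script.txt" > "$workdir/bookscript.txt"

# Step 2: add the script without name indication; names it lacks get a 0.
aligned=$(LC_ALL=C join -a 1 -e 0 -o 0,1.2,1.3,2.2 \
    "$workdir/bookscript.txt" "$workdir/script2.txt")

printf '%s\n' "$aligned"

rm -r "$workdir"
```

Each output row then holds one name followed by its book count, its count in the script with names and its count in the script without names.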
Conclusion: To summarize our findings: not all characters that appear in the book are also mentioned or given screen time in the movie. For the characters that are, the importance of certain parts shifts in the movie adaptation. Thus, in conclusion, we can say that the movie differs significantly from the book when it comes to character appearances. However, when comparing the book with the scripts of its movie adaptation, one has to keep the different natures of books and films in mind. In a book, everything that happens, such as who is speaking, has to be made explicit in the author's writing so that readers can form an image of the events in their minds. This is not necessary for someone watching a movie on a big screen, where the events that take place in the book are depicted through the presence and appearance of actors and props. This lowers the need to make certain things explicit in the film script, as the audience can see them on screen.