Progress report Gender Issues - ShelinaGobardhan/FOTD-Portfolio- GitHub Wiki

One of the topics we chose to highlight is gender issues and whether and how they occur in the film script and the book. Close reading of the film script and the book made clear that male words (men, man, wizard, him, himself, his and he) and female words (woman, women, witch, her, hers and herself) occur throughout, so distant reading was applied to count them. Running scripts on these words and combining them with a word count script on the command line produced results that can be interpreted. The film script on its own consisted of a majority of male words.

Instead of scrolling through the complete frequency lists of the book and the film script, scripts were written to work faster and more efficiently.

'He' and 'She'

Step 1

cat azkabanbook.txt | tr -sc '[A-Z][a-z]' '[\012*]' | sort | uniq -c | awk '{print $2 "\t" $1}' > azkabanbook.word_freq.csv

cat azkabanscript.txt | tr -sc '[A-Z][a-z]' '[\012*]' | sort | uniq -c | awk '{print $2 "\t" $1}' > azkabanscript.word_freq.csv

cat azkabanscript.word_freq.csv azkabanbook.word_freq.csv | grep -E 'he|she' | column -t

Problem: Every word that contains 'he' (for example 'the' or 'her') was matched, not only 'he' and 'she'.

Step 2

cat azkabanscript.word_freq.csv azkabanbook.word_freq.csv | grep -w -i -E 'She|He' | column -t    

outcome:

He 54

She 10

he 61

she 9

HE 14

He 608

SHE 4

She 113

he 1474

she 251

Problem: capital letters and lower case letters are counted separately.

Step 3

Merging uppercase and lowercase for both the filmscript and the book:

tr '[a-z]' '[A-Z]' < azkabanbook.word_freq.csv |
tr -sc '[A-Z]' '[\012*]' | sort | uniq -c
tr '[a-z]' '[A-Z]' < azkabanscript.word_freq.csv |
tr -sc '[A-Z]' '[\012*]' | sort | uniq -c 

outcome: no distinction between capitals and lower case letters anymore

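problem: The merge did not succeed. The numbers that came forward by running the script had changed significantly. For example, 'he' was counted only 3 times. Probably the two did not merge and only the upper case came out. Another possible explanation is that the commands read the frequency lists rather than the original texts, so each spelling variant ('HE', 'He', 'he') was counted once instead of its occurrences being added up. After trying several other commands we did not succeed, so we stuck with what is described in step 4.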
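One way to merge upper and lower case is to fold the original text to lowercase before counting, instead of post-processing the frequency list. The following is only a minimal sketch of that idea, not the method used in this report. The file `sample.txt` and its contents are made up and merely stand in for `azkabanbook.txt`.

```shell
# Hypothetical sample standing in for azkabanbook.txt (not real book text).
workdir=$(mktemp -d)
printf 'He said HE would go. She said he would not.\n' > "$workdir/sample.txt"

# Fold everything to lowercase first, then put one word per line and count.
tr 'A-Z' 'a-z' < "$workdir/sample.txt" |
  tr -sc 'a-z' '[\012*]' |
  sort | uniq -c |
  awk '{print $2 "\t" $1}' > "$workdir/sample.word_freq.csv"

# Look up the merged counts by exact match on the first column.
he_count=$(awk -F'\t' '$1 == "he" {print $2}' "$workdir/sample.word_freq.csv")
she_count=$(awk -F'\t' '$1 == "she" {print $2}' "$workdir/sample.word_freq.csv")
echo "he: $he_count, she: $she_count"
```

With the real texts, the same pipeline would be pointed at `azkabanbook.txt` and `azkabanscript.txt` instead of the sample file.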

Step 4

Running the following code provided a good overview of how many times different female and male words occurred:
grep -w -i "he" azkabanscript.word_freq.csv azkabanbook.word_freq.csv
The same command was run with other male and female words in place of 'he'.

Step 5

Just to give an insight into how the output looks:

azkabanscript.word_freq.csv:He 54

azkabanscript.word_freq.csv:he 61

azkabanbook.word_freq.csv:HE 14

azkabanbook.word_freq.csv:He 608

azkabanbook.word_freq.csv:he 1474

The following came forward by running different gender-related words:

Filmscript: he (115), she (19), wizard (4), witch (5), man (13), men (0), woman (3), women (0), him (52), her (14)

Book: he (2096), she (368), wizard (43), witch (42), man (38), men (9), woman (8), women (1), him (726), her (403)

Adding up all the male words and all the female words gives a better overview:

Filmscript: male (184), female (41)

Book: male (2912), female (822)
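The totals can also be added up by a script instead of by hand. The sketch below is one possible way to do this with awk, assuming a tab-separated file in the same layout as the word_freq files. The file name `script_counts.tsv` is hypothetical, and its rows are the film script counts reported above ('men' and 'women' are left out because they occur 0 times).

```shell
workdir=$(mktemp -d)
# Film script counts as reported in this step, one word and count per line.
printf 'He\t54\nhe\t61\nShe\t10\nshe\t9\nwizard\t4\nwitch\t5\nman\t13\nwoman\t3\nhim\t52\nher\t14\n' \
  > "$workdir/script_counts.tsv"

totals=$(awk -F'\t' '
  BEGIN {
    # Word lists used for the totals in this report.
    n = split("he wizard man men him", m, " ")
    for (i = 1; i <= n; i++) male[m[i]] = 1
    n = split("she witch woman women her", f, " ")
    for (i = 1; i <= n; i++) female[f[i]] = 1
  }
  {
    w = tolower($1)              # merge upper and lower case spellings
    if (w in male)   msum += $2
    if (w in female) fsum += $2
  }
  END { printf "male (%d), female (%d)\n", msum, fsum }
' "$workdir/script_counts.tsv")
echo "$totals"
```

For these rows the script prints `male (184), female (41)`, which matches the film script totals above.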

Step 6

Interpretation

The results show that both the film script and the book contain a hidden gender imbalance. There are two male main characters and only one female main character, so even with equal treatment 'he' could be expected to occur about twice as often as 'she'. In fact 'he' occurs about six times as often as 'she' (115 against 19 in the film script, 2096 against 368 in the book). It is striking that one of the most popular stories worldwide has such gender issues hidden in it.
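As a quick check of the proportion, the ratio of 'he' to 'she' can be computed from the counts in step 5. This is only an illustrative calculation. The helper name `ratio` is made up for this sketch.

```shell
# ratio A B: print A divided by B, rounded to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'
}

script_ratio=$(ratio 115 19)     # he / she in the film script
book_ratio=$(ratio 2096 368)     # he / she in the book
echo "script: $script_ratio, book: $book_ratio"
```

This prints `script: 6.1, book: 5.7`.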

Step 7

Interpretation from a different angle:

Another way to look at gender issues is to study the presence of non-main characters. After steps 4 to 10 in the script description of the character comparison, the following non-main characters (after erasing the main characters) occurred in both the book and the script:

  • left=book right=script

Shunpike 2 1 (m)
Moony 8 2 (m)
Draco 9 2 (m)
Ernie 9 6 (m)
Padfoot 10 2 (m)
Cornelius 11 1 (m)
Thomas 11 1 (m)
Wormtail 11 2 (m)
Remus 15 7 (m)
Lily 16 4 (f)
Longbottom 16 1 (m)
Tom 16 2 (m)
Parvati 19 1 (f)
Filch 23 4 (m)
Rosmerta 24 2 (f)
Dursleys 26 1 (m/f)
Petunia 32 3 (f)
James 33 9 (m)
Minister 33 4 (m)
Pomfrey 38 2 (f)
Peter 47 8 (m)
Voldemort 48 5 (m)
Stan 52 1 (m)
Marge 64 6 (f)
George 66 1 (m)
Percy 68 1 (m)
Vernon 78 7 (m)
Trelawney 90 2 (f)
Pettigrew 103 13 (m)
Neville 106 7 (m)
Fudge 109 2 (m)
Weasley 111 4 (m/f)
McGonagall 113 3 (f)
Sirius 137 34 (m)
Dumbledore 162 14 (m)
Malfoy 186 10 (m)
Hagrid 265 16 (m)
Black 337 29 (m)
Lupin 408 8 (m)

  • First names and surnames are counted separately.

  • Only human characters are taken into account: some non-human characters don't have an obvious
    gender, so they were manually removed from the list.

Problems:
1a. What are non-main characters? There is no definition for this.
2a. 'Potter' refers to both Harry and Harry's father.
3a. The title 'Professor' gives no indication of gender.
4a. We could not find a code/script that recognizes the gender of the names.

Solutions:
1b. We defined non-main characters as every character except Harry, Hermione and Ron(ald).
2b. We chose to leave 'Potter' out because it is too time-consuming to discover which 'Potter' refers to Harry and which to Harry's father.
3b. We chose to leave 'Professor' out.
4b. The names were manually designated as female (f) or male (m).
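Once the genders have been assigned by hand, a small lookup file makes it possible to add up the mentions per gender automatically. The sketch below shows one possible approach with awk. The file names `gender.tsv` and `counts.tsv` are hypothetical, and only four names from the list above are used to keep the example short.

```shell
workdir=$(mktemp -d)
# Manually assigned genders (see 4b); a four-name excerpt of the list above.
printf 'Lily\tf\nRemus\tm\nMarge\tf\nFilch\tm\n' > "$workdir/gender.tsv"
# Name, book count and script count for the same four names.
printf 'Remus\t15\t7\nLily\t16\t4\nFilch\t23\t4\nMarge\t64\t6\n' > "$workdir/counts.tsv"

per_gender=$(awk -F'\t' '
  NR == FNR { gender[$1] = $2; next }   # first file: remember the gender of each name
  ($1 in gender) {                      # second file: add the counts to that gender
    book[gender[$1]]   += $2
    script[gender[$1]] += $3
  }
  END {
    printf "f book=%d script=%d\n", book["f"], script["f"]
    printf "m book=%d script=%d\n", book["m"], script["m"]
  }
' "$workdir/gender.tsv" "$workdir/counts.tsv")
echo "$per_gender"
```

For this excerpt the output is `f book=80 script=10` and `m book=38 script=11`. Names marked (m/f), such as Dursleys and Weasley, would need a separate decision before they can be added to the lookup file.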

Step 8

Conclusion related to the research question:

We can conclude that there is hardly any difference between the book and the script when it comes to gender issues. They both show a striking imbalance (see step 5 for how many times the gender words occur). The only words that show gender equality are 'wizard' and 'witch'. When looking deeper into gender issues and involving the non-main characters that occur in both the book and the script, we can see that male words occur strikingly more often because there are more male characters, which explains the large difference in gender words. This can be further interpreted to mean that Harry Potter as a whole, both the script and the book, has gender issues. Women are clearly under-represented.