Proposal
Lenka Bajcetic
-
Considering how language evolves along with the people who speak it, my hypothesis is that, in today's faster-than-ever world, people tend to use shorter words and sentences than before, and to avoid the more complicated tenses and grammatical constructions.
-
This hypothesis is based primarily on my own experience, so it is quite possible that it will turn out to be untrue - we have to check the data to find out!
-
I would analyse a number of examples from English literature (courtesy of Project Gutenberg), using the command line to parse them into words and sentences and easily find the longest word, the longest sentence, and the average length of each. Also, using regular expressions and pattern matching, I could measure how frequently the more "complicated" tenses or syntactic constructions are used - for example, the past perfect tense or some of the conditionals.
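Here is a minimal Python sketch of these measurements (the command-line pipeline described above would produce the same counts; the filename `book.txt` and the past perfect heuristic are my own assumptions for illustration):

```python
import re

# Assuming a plain-text Project Gutenberg book saved locally
# as "book.txt" (the filename is hypothetical).
with open("book.txt", encoding="utf-8") as f:
    text = f.read()

# Naive sentence split on ., !, ? -- good enough for a first pass.
sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
# Words, keeping contractions like "don't" as single tokens.
words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)

longest_word = max(words, key=len)
longest_sentence = max(sentences, key=lambda s: len(s.split()))
avg_word_len = sum(len(w) for w in words) / len(words)
avg_sent_len = len(words) / len(sentences)

# Crude past perfect heuristic: "had" followed by a word ending in -ed.
# It misses irregular participles ("had gone") and overcounts some
# phrases ("had red..."), so treat the result as a rough estimate.
past_perfect = re.findall(r"\bhad\s+\w+ed\b", text, flags=re.IGNORECASE)

print(f"Longest word: {longest_word}")
print(f"Longest sentence ({len(longest_sentence.split())} words): {longest_sentence}")
print(f"Average word length: {avg_word_len:.2f} characters")
print(f"Average sentence length: {avg_sent_len:.2f} words")
print(f"Past perfect (rough count): {len(past_perfect)}")
```

A more careful version of this study would use a part-of-speech tagger instead of the regex, so that irregular participles are counted too.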
-
Even though I believe that language has always changed rapidly, my guess is that in the last few decades, with the expanded use of the internet, it has been changing at an unprecedented pace. For this part of the research, digging into modern language, I would try to find a less 'formal' dataset. Along with the many books from Project Gutenberg, I could analyse articles from the New York Times or synopses from Rotten Tomatoes, but what I would really need is a database of users' comments - somewhere people write freely and don't think about grammar, just about getting their message across.
-
If I could find such a database, maybe I could show what I believe to be true - that people now use a kind of English-derived language we could call Internet language, which of course also includes abbreviations and emojis. My guess is that Internet language is much, much shorter and more simplified than traditional English, and I think this would be an interesting hypothesis to test.
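To make "Internet language" measurable, one simple approach is to count abbreviations against a hand-picked lexicon and match emojis by their Unicode ranges. A minimal sketch; the sample comment and the abbreviation list are invented for illustration:

```python
import re

# Hypothetical sample of "Internet language" -- a real comment dump
# would replace this string in the actual study.
comment = "idk tbh that movie was great lol 😂 brb"

# Small hand-picked abbreviation list; a real study would need a
# curated lexicon of internet abbreviations.
ABBREVIATIONS = {"idk", "tbh", "lol", "brb", "omg", "imo"}

tokens = comment.lower().split()
abbrev_hits = [t for t in tokens if t in ABBREVIATIONS]

# Emojis mostly live in these Unicode blocks (pictographs, emoticons,
# transport, supplemental symbols); this covers common cases, not all.
emoji_pattern = re.compile(
    "[\U0001F300-\U0001F5FF\U0001F600-\U0001F64F"
    "\U0001F680-\U0001F6FF\U0001F900-\U0001F9FF]"
)
emojis = emoji_pattern.findall(comment)

print(f"Abbreviations: {abbrev_hits}")   # ['idk', 'tbh', 'lol', 'brb']
print(f"Emojis: {emojis}")               # ['😂']
```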