Troll bot and Google NGrams - gt-big-data/wiki GitHub Wiki

This program would use Google NGrams and another, similarly structured dataset made up of insults, down-voted comments on Reddit, and possibly 4chan comments. We would harness the power of ngrams to generate:

  • In the case of Google Books: short, hopefully coherent stories

  • In the case of rude comments: even ruder, more offensive comments

If this were accomplished, TrollBot would be programmed to respond to poor online comments, probably just on YouTube, with its own ngram-generated sentences.
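
As a rough illustration of the generation idea above, here is a minimal Python sketch of ngram-based text generation. Everything in it is an assumption made for illustration: the function names `build_model` and `generate`, the toy trigram list, and the (words, count) tuple layout are hypothetical and are not the project's code or the real dataset format. The real Google NGram data would first need to be parsed and aggregated (for example with Hive) into counts like these.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (trigram, count) pairs standing in for real ngram counts.
TOY_TRIGRAMS = [
    (("once", "upon", "a"), 5),
    (("upon", "a", "time"), 5),
    (("a", "time", "there"), 3),
    (("time", "there", "was"), 3),
]


def build_model(ngrams):
    """Map each (n-1)-word prefix to the words seen after it, with counts."""
    model = defaultdict(lambda: defaultdict(int))
    for words, count in ngrams:
        prefix, next_word = tuple(words[:-1]), words[-1]
        model[prefix][next_word] += count
    return model


def generate(model, seed, max_words=20, rng=None):
    """Extend the seed one word at a time, sampling in proportion to counts."""
    rng = rng or random.Random()
    out = list(seed)
    k = len(seed)  # the seed length must match the prefix length (n-1)
    while len(out) < max_words:
        choices = model.get(tuple(out[-k:]))
        if not choices:
            break  # no known continuation for this prefix
        words = list(choices)
        weights = [choices[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(out)


if __name__ == "__main__":
    print(generate(build_model(TOY_TRIGRAMS), ("once", "upon")))
```

With more data, each prefix has many possible continuations, so the weighted sampling produces different sentences on each run. Swapping the Google Books counts for counts built from the rude-comment dataset would, under the same assumptions, give the TrollBot variant.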

Resources to Get Started

  1. Amazon Article demonstrating use of Hive on Google NGram dataset:

  2. Hive Homepage:

  3. Hadoop Homepage (installation required for Hive):

  4. If you're super interested:

I've read through a little of this, and it really helps in understanding what is going on. It's a relatively short book, about 300 pages.

For the next week:

We will make our way through this page.