Troll bot and Google NGrams

This program would use the Google NGrams dataset (http://aws.amazon.com/datasets/8172056142375670) together with a similarly structured dataset built from insults, down-voted comments on Reddit, and possibly 4chan comments. We would use the n-grams to generate:

  • In the case of Google Books: short, hopefully coherent stories

  • In the case of rude comments: even ruder, more offensive comments

If this were accomplished, TrollBot would be programmed to respond to poor online comments, probably just on YouTube, with its own n-gram-generated sentences.
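To make the generation idea concrete, here is a minimal sketch of how text can be produced from n-gram counts. It is only an illustration under assumptions: the toy corpus, the function names `build_model` and `generate`, and the choice of bigrams are all hypothetical, and the real project would read counts from the NGram dataset rather than from raw sentences.

```python
import random
from collections import defaultdict


def build_model(sentences, n=2):
    """Map each (n-1)-word context to the list of words seen right after it."""
    model = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            # Repeated followers stay in the list, so sampling is
            # proportional to how often each n-gram occurred.
            model[context].append(words[i + n - 1])
    return model


def generate(model, seed, max_words=20, rng=None):
    """Extend `seed` (a tuple of n-1 words) until a dead end or max_words."""
    rng = rng or random.Random()
    out = list(seed)
    k = len(seed)
    while len(out) < max_words:
        followers = model.get(tuple(out[-k:]))
        if not followers:
            break  # no n-gram starts with this context
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    # Toy corpus standing in for the real dataset (an assumption).
    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    print(generate(build_model(corpus), ("the",), max_words=8))
```

Larger values of `n` tend to give more coherent output but need far more data, which is presumably where a dataset the size of Google NGrams would help.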


Resources to Get Started

  1. Amazon article demonstrating the use of Hive on the Google NGram dataset: http://aws.amazon.com/articles/5249664154115844

  2. Hive Homepage: http://hive.apache.org

  3. Hadoop Homepage (installation required for Hive): http://hadoop.apache.org

  4. If you're super interested: http://www.amazon.com/Hadoop-Action-Chuck-Lam/dp/1935182196

I've read through a little of this, and it really helps in understanding what is going on. It's a relatively short book (~300 pages).

For the next week:

We will work through this page: http://aws.amazon.com/articles/5249664154115844