Methods - snakes-in-the-box/super-awesome-txt-classifier GitHub Wiki

If you have no idea where to begin designing a large-scale algorithm for document classification, this section provides a baseline to get you started. Note, though, that to earn a grade in the A range, you’ll need to go above and beyond what is described here.

Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem of conditional probabilities. Let’s say we’re working with data X, which is n × d (n instances, each with d dimensions), where each instance belongs to one of K possible classes. Given some instance x ∼ X, we want to classify it into one of the K classes y_k.
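To make the setup concrete, here is a minimal sketch of a Naïve Bayes text classifier in plain Python. It is not the required design, just one possible baseline. It assumes a multinomial model over word tokens and add-alpha (Laplace) smoothing, neither of which is specified above. The function names `train_naive_bayes` and `predict` and the parameter `alpha` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods.

    docs   -- list of token lists (one per instance)
    labels -- list of class labels, same length as docs
    alpha  -- smoothing constant (assumption: add-alpha smoothing)
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
        vocab.update(tokens)

    n = len(labels)
    # log P(y_k): fraction of training instances in each class
    log_prior = {y: math.log(c / n) for y, c in class_counts.items()}

    # log P(w | y_k) with add-alpha smoothing over the vocabulary
    log_likelihood = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_likelihood[y] = {
            w: math.log((word_counts[y][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(tokens, log_prior, log_likelihood):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for y in log_prior:
        score = log_prior[y]
        for w in tokens:
            # Words never seen in training are skipped (a simplifying assumption).
            if w in log_likelihood[y]:
                score += log_likelihood[y][w]
        scores[y] = score
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together, which matters once documents contain hundreds of words. A large-scale version would need to distribute the counting step, which this single-machine sketch does not attempt.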