Naïve Bayes Research - SD-Group-11/ml-frontend GitHub Wiki

Naïve Bayes is a simple learning algorithm that applies Bayes' rule together with a strong assumption that the attributes are conditionally independent, given the class.

Bayes' Rule:

Bayes' theorem gives the probability of an event based on new information that is, or may be, related to that event. The formula can also be used to see how the probability of an event occurring is affected by hypothetical new information, supposing the new information turns out to be true.

P(c|x) = ( P(x|c) P(c) ) / P(x)

Above,

P(c|x) is the posterior probability of the class (c, target) given the predictor (x, attributes).

P(c) is the prior probability of the class.

P(x|c) is the likelihood, i.e., the probability of the predictor given the class.

P(x) is the prior probability of the predictor.
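As a quick worked example of the formula above (all numbers are made up purely for illustration), a hypothetical spam-filter calculation in Python:

```python
# Worked example of Bayes' rule with illustrative (made-up) numbers:
# c = "email is spam", x = "message contains the word 'free'".
p_c = 0.3          # P(c): prior probability that an email is spam
p_x_given_c = 0.8  # P(x|c): likelihood of seeing the word given spam
p_x = 0.35         # P(x): overall probability of seeing the word

# Bayes' rule: P(c|x) = P(x|c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 4))  # -> 0.6857, the posterior probability of spam
```

The new evidence (the word appearing) raises the probability of spam from the prior 0.3 to roughly 0.69.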

Naïve Bayes assumption:

It is a classification technique based on Bayes' Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
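Under this independence assumption, the joint likelihood of all features factorises into a product of per-feature likelihoods. A small sketch (the per-feature likelihood values are made up for illustration):

```python
import math

# Naive independence assumption:
#   P(x1, x2, ..., xn | c) = P(x1|c) * P(x2|c) * ... * P(xn|c)
per_feature_likelihoods = [0.8, 0.6, 0.9]  # illustrative P(xi|c) values

joint_likelihood = math.prod(per_feature_likelihoods)
print(round(joint_likelihood, 3))  # -> 0.432
```

This factorisation is what makes the classifier "naive" and also what makes it cheap to train: each feature's conditional probability is estimated on its own.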

Laplace smoothing:

Laplace smoothing is a smoothing technique that helps tackle the problem of zero probability in the Naïve Bayes machine learning algorithm. Higher alpha values push the likelihood estimates toward a uniform distribution; with two outcomes, this means the probability of a word approaches 0.5 for both the positive and the negative reviews.
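A minimal sketch of Laplace (additive) smoothing; the function name and numbers below are illustrative, not from the source:

```python
# Laplace (additive) smoothing for a word's conditional probability.
# count: times the word appears in documents of one class;
# total: total word count in that class; vocab_size: vocabulary size.
def smoothed_prob(count, total, vocab_size, alpha=1.0):
    return (count + alpha) / (total + alpha * vocab_size)

# A word never seen in a class no longer gets probability exactly 0:
print(smoothed_prob(0, 1000, 5000, alpha=1.0))  # small but non-zero

# Very large alpha drags every estimate toward the uniform value 1/vocab_size:
print(smoothed_prob(0, 1000, 5000, alpha=1e6))
```

With alpha = 0 an unseen word would zero out the whole product of likelihoods; smoothing keeps every factor strictly positive.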

Useful link: https://www.upgrad.com/blog/naive-bayes-explained/

Advantages:

  • The algorithm is fast to train and to predict with, which can save a lot of time.
  • Naive Bayes is suitable for solving multi-class prediction problems.
  • If its assumption of feature independence holds true, it can perform better than other models while requiring much less training data.
  • It handles both categorical and continuous input variables, though it is generally better suited to categorical ones.

Disadvantages:

  • Naive Bayes assumes that all predictors (features) are independent, which rarely holds in real life. This limits the applicability of the algorithm in real-world use cases.
  • The algorithm faces the ‘zero-frequency problem’: it assigns zero probability to a categorical variable whose category appears in the test dataset but was not present in the training dataset. A smoothing technique (such as Laplace smoothing) is needed to overcome this issue.
  • Its probability estimates can be poorly calibrated in some cases, so its raw probability outputs should not be taken too seriously.

Applications:

• Real-time prediction: Naive Bayes is an eager learning classifier and is very fast, so it can be used to make predictions in real time.

• Multi-class prediction: The algorithm is also well known for its multi-class prediction capability; it can predict the probability of each class of the target variable.

• Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers are widely used in text classification (due to strong results on multi-class problems and the independence assumption) and often achieve a higher success rate than other algorithms. As a result, Naive Bayes is widely used in spam filtering (identifying spam e-mail) and sentiment analysis (e.g., in social media analysis, to identify positive and negative customer sentiment).

• Recommendation systems: A Naive Bayes classifier and collaborative filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource. (1)

Implementation:

Implementing the Naïve Bayes model can be broken down into the following steps:

  • Handling Data
  • Summarising Data
  • Making Predictions
  • Evaluating Accuracy
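Those four steps can be sketched from scratch, for example with NumPy and Gaussian likelihoods on a tiny dataset (all names and numbers here are illustrative assumptions, not from the source):

```python
import numpy as np

# 1. Handling data: a tiny illustrative dataset (2 numeric features, 2 classes).
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

# 2. Summarising data: per-class feature means, variances, and class priors.
classes = np.unique(y)
stats = {c: (X[y == c].mean(axis=0),
             X[y == c].var(axis=0) + 1e-9,  # small constant avoids div-by-zero
             (y == c).mean())
         for c in classes}

# 3. Making predictions: choose the class with the highest log-posterior,
#    modelling each feature with a Gaussian likelihood.
def predict(x):
    def log_posterior(c):
        mean, var, prior = stats[c]
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
        return np.log(prior) + log_lik
    return max(classes, key=log_posterior)

# 4. Evaluating accuracy (here, on the training points themselves).
preds = np.array([predict(x) for x in X])
print((preds == y).mean())  # training accuracy
```

Working in log-space avoids numerical underflow when many per-feature likelihoods are multiplied together.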

An example of how this is done with and without the Scikit-Learn Libraries:

https://www.edureka.co/blog/naive-bayes-tutorial/

Libraries in Python to help implement the Naïve Bayes model

Naïve Bayes using Scikit-Learn:

The SKLearn library contains a lot of efficient tools for machine learning and statistical modelling including classification, regression, clustering and dimensionality reduction.

Scikit-Learn is a higher-level library that includes implementations of several machine learning algorithms, so you can define a model object in a single line or a few lines of code, then use it to fit a set of points or predict a value. (2)
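For example, a minimal Gaussian Naive Bayes model can be defined and fitted in a few lines (the data below is made up purely for illustration):

```python
from sklearn.naive_bayes import GaussianNB

# Tiny illustrative dataset: 2 numeric features, 2 classes.
X = [[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]]
y = [0, 0, 1, 1]

# Define the model in one line, fit it, then predict new points.
model = GaussianNB()
model.fit(X, y)
print(model.predict([[1.1, 2.0], [4.0, 4.0]]))  # -> [0 1]
```

For text data with word counts, `MultinomialNB` (whose `alpha` parameter is the Laplace smoothing constant discussed above) is the usual choice.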

Naïve Bayes using TensorFlow:

TensorFlow is an open-source machine learning library that can be used to solve classification and regression problems. The TensorFlow library helps to develop and train models using Python or JavaScript, and these models can easily be deployed in the browser. (3)

https://nicolovaligi.com/naive-bayes-tensorflow.html

https://github.com/nicolov/naive_bayes_tensorflow

Naïve Bayes using NumPy:

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Moreover, NumPy forms the foundation of the Machine Learning stack. This library can be used for reading the data, manipulating and summarizing it. (4)

https://geoffruddock.com/naive-bayes-from-scra

Sources cited:

  1. https://towardsdatascience.com/all-about-naive-bayes-8e13cef044cf
  2. https://www.analyticsvidhya.com/blog/2015/01/scikit-learn-python-machine-learning-tool/
  3. https://www.tensorflow.org/learn
  4. https://medium.com/mlpoint/numpy-for-machine-learning-211a3e58b574