NLTK Naive Bayes classification using bigrams - sapnilcsecu/Nltk-sentiment-analysis GitHub Wiki
We will use Python's NLTK library to train a text classification model.
The steps required to create a text classification model in Python are:
- Import the libraries
- Import the movie_reviews dataset
- Build the training and test sets
- Train and evaluate the text classification model
Execute the following script to import the required libraries:
import nltk
from nltk.corpus import movie_reviews
from nltk.classify import NaiveBayesClassifier
import nltk.classify.util as util
from nltk.collocations import BigramCollocationFinder as BCF
from nltk.metrics import BigramAssocMeasures
import itertools
Execute the following script to load the file IDs of the positive and negative reviews in the movie_reviews dataset:
pid = movie_reviews.fileids('pos')
nid = movie_reviews.fileids('neg')
The next code segment calls features on the words of each review to get its bigram features, and pairs the result with the review's label:
prev = [(features(movie_reviews.words(fileids=fid)), 'positive') for fid in pid]
nrev = [(features(movie_reviews.words(fileids=fid)), 'negative') for fid in nid]
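The features function is called above but not defined in this section. A minimal sketch of what it could look like is given below, assuming it combines every word of a review with the review's top-scoring bigrams. The parameter names score_fn and n, the chi-square scoring, and the default of 200 bigrams are assumptions for illustration and are not taken from the original source.

```python
import itertools

from nltk.collocations import BigramCollocationFinder as BCF
from nltk.metrics import BigramAssocMeasures


def features(words, score_fn=BigramAssocMeasures.chi_sq, n=200):
    """Hypothetical feature extractor: all words plus the n best bigrams."""
    # Materialize the words once so they can be iterated twice
    # (corpus readers return lazy views).
    words = list(words)
    # Rank the adjacent word pairs by the association measure, keep the top n.
    finder = BCF.from_words(words)
    bigrams = finder.nbest(score_fn, n)
    # NLTK classifiers expect a dict mapping feature name to feature value.
    return {ngram: True for ngram in itertools.chain(words, bigrams)}


feats = features(["a", "good", "movie", "with", "a", "good", "plot"])
```

The resulting dict has one key per distinct word (a string) and one key per selected bigram (a tuple of two strings), which is the shape NaiveBayesClassifier.train expects for each feature set.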
The following script builds the training and test sets, where ncutoff and pcutoff are the indices at which the negative and positive reviews are split:
train_set = nrev[:ncutoff] + prev[:pcutoff]
test_set = nrev[ncutoff:] + prev[pcutoff:]
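The values of ncutoff and pcutoff are not given in this section. The sketch below assumes a 3/4 training and 1/4 test split per class, which is an assumption rather than the original setting, and uses small toy lists in place of the real nrev and prev so it runs without the corpus.

```python
# Toy stand-ins for the nrev and prev lists built earlier (hypothetical data).
nrev = [({"bad": True}, "negative")] * 8
prev = [({"good": True}, "positive")] * 8

# Assumed split: the first 3/4 of each class goes to training.
ncutoff = int(len(nrev) * 3 / 4)
pcutoff = int(len(prev) * 3 / 4)

train_set = nrev[:ncutoff] + prev[:pcutoff]
test_set = nrev[ncutoff:] + prev[pcutoff:]

print(len(train_set), len(test_set))  # 12 4
```

Splitting each class separately keeps the training and test sets balanced between positive and negative reviews.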
The following script trains the text classification model and evaluates it on the test set:
classifier = NaiveBayesClassifier.train(train_set)
# Accuracy on the test set, as a percentage
print("Accuracy is:", util.accuracy(classifier, test_set) * 100)
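Once trained, the classifier can also label new feature sets and report which features it relies on most. The sketch below illustrates this on a tiny hand-made training set (hypothetical data, not the movie_reviews corpus), so it runs without downloading anything.

```python
from nltk.classify import NaiveBayesClassifier
import nltk.classify.util as util

# Hypothetical toy feature sets in the same format as train_set and test_set.
toy_train = [
    ({"great": True, "fun": True}, "positive"),
    ({"great": True, "moving": True}, "positive"),
    ({"boring": True, "dull": True}, "negative"),
    ({"boring": True, "slow": True}, "negative"),
]
toy_test = [
    ({"great": True}, "positive"),
    ({"boring": True}, "negative"),
]

toy_classifier = NaiveBayesClassifier.train(toy_train)

# Fraction of test items whose predicted label matches the gold label.
toy_accuracy = util.accuracy(toy_classifier, toy_test)

# Predict the label of a single unseen feature set.
label = toy_classifier.classify({"great": True})

# Print the features whose likelihood differs most between the labels.
toy_classifier.show_most_informative_features(5)
```

With the real classifier, the feature set passed to classify should be produced by the same features function that was used for training.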
The complete source code is available at this link: movie_review_using_bigram