Using the model to predict tweet sentiment - lukegenung/twitter-sentiment-bot GitHub Wiki

Prerequisite

Build a classifier object. Refer here:

https://github.com/lukegenung/twitter-sentiment-bot/wiki/Building-a-sentiment-analysis-model-using-NLTK

Instructions

Using the classifier, provide new tweets for the model to score for sentiment. The new tweets should be provided as a list of key-value pairs (dictionaries).

Data from the Twitter API often includes status, username, location, and other fields, so this approach supports tweet data with an arbitrary number of key-value pairs.
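For illustration, the input might look like the following (field names other than `status` are hypothetical examples of Twitter API metadata):

```python
# Hypothetical input: a list of tweet dictionaries.
# Only 'status' is required by the scoring loop; extra keys pass through untouched.
tweets = [
    {'status': 'I love this product!', 'username': 'alice', 'location': 'NYC'},
    {'status': 'Worst service ever.', 'username': 'bob'},
]
```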

The scored data is returned with positive and negative sentiment probabilities, plus a label that categorizes each tweet into a segment (e.g. Positive, Neutral, or Negative).

Code

  1. Get positive and negative probabilities for each tweet
import helpers
from nltk.tokenize import casual_tokenize

for tweet in tweets:
	custom_tokens = helpers.remove_noise(casual_tokenize(tweet['status']))
	dist = classifier.prob_classify({token: True for token in custom_tokens})

	# get the probability of each sentiment class
	pos_probability = dist.prob('Positive')
	neg_probability = dist.prob('Negative')

	# add sentiment probabilities to the tweet dictionary
	tweet['positive'] = pos_probability
	tweet['negative'] = neg_probability
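The loop above assumes a trained NLTK classifier exposing `prob_classify`, which returns a probability distribution with a `prob(label)` method. A minimal stand-in (not the real model from the prerequisite page) illustrates the expected call pattern:

```python
# Stand-in objects mimicking the NLTK classifier interface used above.
# Assumption: the real classifier comes from the prerequisite wiki page.
class FakeDist:
    def __init__(self, pos):
        self._pos = pos

    def prob(self, label):
        # Two-class distribution: probabilities sum to 1.
        return self._pos if label == 'Positive' else 1 - self._pos

class FakeClassifier:
    def prob_classify(self, features):
        # Toy rule: any tweet containing 'love' is strongly positive.
        return FakeDist(0.95 if 'love' in features else 0.2)

classifier = FakeClassifier()
dist = classifier.prob_classify({'love': True, 'this': True})
```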
  2. Add sentiment labels to each tweet dictionary. Arbitrary value thresholds are used to define segments.
	if pos_probability >= 0.9:
		tweet['label'] = 'Very Positive'
	elif pos_probability >= 0.7:
		tweet['label'] = 'Positive'
	elif pos_probability > 0.3 and neg_probability > 0.3:
		tweet['label'] = 'Neutral'
	elif neg_probability >= 0.9:
		tweet['label'] = 'Very Negative'
	elif neg_probability >= 0.7:
		tweet['label'] = 'Negative'
	else:
		tweet['label'] = 'None'
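For testing, the same threshold logic can be factored into a standalone helper (a sketch; the threshold values are the arbitrary ones used above):

```python
def label_sentiment(pos_probability, neg_probability):
    """Map positive/negative probabilities to a sentiment segment."""
    if pos_probability >= 0.9:
        return 'Very Positive'
    elif pos_probability >= 0.7:
        return 'Positive'
    elif pos_probability > 0.3 and neg_probability > 0.3:
        return 'Neutral'
    elif neg_probability >= 0.9:
        return 'Very Negative'
    elif neg_probability >= 0.7:
        return 'Negative'
    return 'None'
```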
  3. Optional step: Remove the tweet status from the dictionary to reduce data size (keeping the sentiment probabilities and label)
	if not keep_status and tweet.get('status'):
		del tweet['status']
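Step 3 can also be wrapped in a small helper so the loop stays readable; using `pop` with a default keeps it safe when `'status'` is already missing (a sketch; `keep_status` is assumed to be a flag defined elsewhere in the bot):

```python
def slim_tweet(tweet, keep_status=False):
    # Drop the raw status text to reduce payload size,
    # keeping the sentiment probabilities and label.
    if not keep_status:
        tweet.pop('status', None)
    return tweet

record = slim_tweet({'status': 'hi', 'positive': 0.8, 'negative': 0.2, 'label': 'Positive'})
```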