Using the model to predict tweet sentiment - lukegenung/twitter-sentiment-bot GitHub Wiki
Prerequisite
Build a classifier object. Refer here:
Instructions
Using the classifier, provide new tweets for the model to score for sentiment. The new tweets should be provided as a list of key-value pairs (dictionaries).
Data from the Twitter API often includes status, username, location, etc., so each dictionary may carry an arbitrary number of key-value pairs. Only the status key is read for scoring; the other keys are passed through unchanged.
Return scored data with positive and negative sentiment probabilities, and a label which categorizes the tweet into a segment (e.g. Positive, Neutral, or Negative).
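As a rough illustration of the expected input, the sketch below builds a small list of tweet dictionaries and checks that each one has a usable status value before scoring. The sample tweets, the username and location values, and the missing_status helper are all hypothetical and are not part of the project.

```python
# Hypothetical sample input: only 'status' is required by the scoring loop;
# any other keys (username, location, ...) are illustrative extras.
tweets = [
    {"status": "I love this!", "username": "user_a", "location": "Austin, TX"},
    {"status": "Worst service ever.", "username": "user_b"},
]


def missing_status(records):
    """Return the indexes of records that lack a non-empty 'status' string.

    Hypothetical helper, not part of the project.
    """
    return [
        i
        for i, record in enumerate(records)
        if not isinstance(record.get("status"), str) or not record["status"].strip()
    ]


# An empty list means every record can be passed to the scoring loop.
bad_rows = missing_status(tweets)
```

Running a check like this first avoids a KeyError on tweet['status'] partway through a large batch.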
Code
- Get positive and negative probabilities for each tweet
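from nltk.tokenize import casual_tokenize  # assumed: NLTK's tweet-aware tokenizer

for tweet in tweets:
    # tokenize the tweet text, then strip noise with the project's helper
    custom_tokens = helpers.remove_noise(casual_tokenize(tweet['status']))
    # build bag-of-words features: every token maps to True
    dist = classifier.prob_classify({token: True for token in custom_tokens})
    # read the class probabilities from the distribution
    pos_probability = dist.prob('Positive')
    neg_probability = dist.prob('Negative')
    # add sentiment probabilities to tweet dictionary
    try:
        tweet['positive'] = pos_probability
        tweet['negative'] = neg_probability
    except Exception as e:
        print(e)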
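The features passed to prob_classify are a plain dictionary that maps each token to True. A minimal sketch of that step on its own is shown here; the to_features name is hypothetical and the token list is made up for illustration.

```python
def to_features(tokens):
    """Turn a list of tokens into a bag-of-words feature dictionary.

    Hypothetical helper: it produces the same mapping as
    dict([token, True] for token in tokens).
    """
    return {token: True for token in tokens}


# Repeated tokens collapse into one key, so word counts are not preserved.
features = to_features(["great", "day", "great"])
```

Because the dictionary only records presence, a word that appears five times carries the same weight as a word that appears once.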
- Add a sentiment label to each tweet dictionary. The thresholds that define the segments are arbitrary.
    # still inside the loop over tweets
    if pos_probability >= 0.9:
        tweet['label'] = 'Very Positive'
    elif pos_probability >= 0.7:
        tweet['label'] = 'Positive'
    elif pos_probability > 0.3 and neg_probability > 0.3:
        tweet['label'] = 'Neutral'
    elif neg_probability >= 0.9:
        tweet['label'] = 'Very Negative'
    elif neg_probability >= 0.7:
        tweet['label'] = 'Negative'
    else:
        tweet['label'] = 'None'
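To make the thresholds easier to test or tune, the same rules can be pulled into a small function. This is only a sketch: the sentiment_label name is hypothetical, and the cut-offs are copied from the snippet above.

```python
def sentiment_label(pos_probability, neg_probability):
    """Map a pair of class probabilities to a segment label.

    Hypothetical helper; the thresholds mirror the wiki snippet and are arbitrary.
    """
    if pos_probability >= 0.9:
        return 'Very Positive'
    elif pos_probability >= 0.7:
        return 'Positive'
    elif pos_probability > 0.3 and neg_probability > 0.3:
        return 'Neutral'
    elif neg_probability >= 0.9:
        return 'Very Negative'
    elif neg_probability >= 0.7:
        return 'Negative'
    else:
        return 'None'


# Example probabilities chosen by hand, not produced by a model.
labels = [sentiment_label(p, n) for p, n in [(0.95, 0.05), (0.5, 0.5), (0.25, 0.75)]]
```

Inside the loop this would be used as tweet['label'] = sentiment_label(pos_probability, neg_probability). When the two probabilities sum to 1, the final else branch should rarely if ever be reached.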
- Optional step: Remove the tweet status from the dictionary to reduce data size (keeping the sentiment distribution and label)
    # still inside the loop over tweets
    if tweet['status'] and not keep_status:
        del tweet['status']
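The snippets above read as parts of one loop over the tweets. The sketch below shows one way the probability and status-removal steps might fit together in a single function. It is not the project's implementation: StubDist and StubClassifier are toy stand-ins for the trained classifier, score_tweets is a hypothetical name, and str.split replaces the real tokenizing and noise-removal steps. Labeling is left out to keep it short.

```python
class StubDist:
    """Toy stand-in for the distribution returned by prob_classify."""

    def __init__(self, pos):
        self._probs = {'Positive': pos, 'Negative': 1.0 - pos}

    def prob(self, label):
        return self._probs[label]


class StubClassifier:
    """Toy classifier (not a real model): reacts only to 'good' and 'bad'."""

    def prob_classify(self, features):
        if 'good' in features:
            return StubDist(0.95)
        if 'bad' in features:
            return StubDist(0.05)
        return StubDist(0.5)


def score_tweets(tweets, classifier, tokenize, keep_status=True):
    """Add 'positive' and 'negative' keys to each tweet, in place."""
    for tweet in tweets:
        tokens = tokenize(tweet['status'])
        dist = classifier.prob_classify({token: True for token in tokens})
        tweet['positive'] = dist.prob('Positive')
        tweet['negative'] = dist.prob('Negative')
        # optionally drop the raw text once it has been scored
        if not keep_status:
            tweet.pop('status', None)
    return tweets


scored = score_tweets(
    [{'status': 'good day', 'username': 'user_a'}],
    StubClassifier(),
    str.split,  # stand-in for helpers.remove_noise(casual_tokenize(...))
    keep_status=False,
)
```

Extra keys such as username pass through untouched, which matches the note that tweets may carry an arbitrary number of key-value pairs.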