Classifier How To - cproof/tweet-analysis GitHub Wiki
Correctly Classified Instances         592               98.6667 %
Incorrectly Classified Instances         8                1.3333 %
Kappa statistic                          0.9733
Mean absolute error                      0.0137
Root mean squared error                  0.1017
Relative absolute error                  2.7313 %
Root relative squared error             20.3411 %
Total Number of Instances              600

=== Confusion Matrix ===

   a   b   <-- classified as
 295   5 |   a = negative
   3 297 |   b = positive
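The accuracy and Kappa figures in the summary follow directly from the confusion matrix. The sketch below recomputes them as a sanity check. It assumes WEKA's layout, where rows are the actual class and columns are the predicted class, and uses the standard Cohen's kappa formula. The function names are illustrative and not part of any WEKA API.

```python
# Confusion matrix copied from the WEKA output: rows = actual class, columns = predicted class.
CONFUSION = [
    [295, 5],  # actual negative: 295 predicted negative, 5 predicted positive
    [3, 297],  # actual positive: 3 predicted negative, 297 predicted positive
]


def accuracy(cm):
    """Fraction of instances on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa(cm):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    total = sum(sum(row) for row in cm)
    p_observed = accuracy(cm)
    row_totals = [sum(row) for row in cm]          # actual class frequencies
    col_totals = [sum(col) for col in zip(*cm)]    # predicted class frequencies
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)


print(f"accuracy = {accuracy(CONFUSION):.4%}")  # accuracy = 98.6667%
print(f"kappa    = {kappa(CONFUSION):.4f}")     # kappa    = 0.9733
```

Both classes have 300 instances and the predictions are split almost evenly (298 vs 302), so the chance agreement is 0.5. Kappa is therefore roughly twice the accuracy's margin above 0.5.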
Here is the detailed filter configuration from the WEKA GUI output (WEKA records the chain of applied filters in the relation name):
'processed-tweets-weka.filters.unsupervised.attribute.NominalToString-Cfirst-weka.filters.unsupervised.attribute.StringToWordVector-R1-W10000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.NGramTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\" -max 2 -min 1-weka.filters.supervised.attribute.AttributeSelection-Eweka.attributeSelection.ChiSquaredAttributeEval-Sweka.attributeSelection.Ranker -T 0.0 -N -1'
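The last filter in this chain ranks every word attribute by its chi-squared statistic with respect to the class (ChiSquaredAttributeEval). It then selects attributes with Ranker, using `-N -1` (no fixed number of attributes) and `-T 0.0` (a score threshold). The sketch below illustrates that idea on a 2x2 contingency table (word absent/present vs. negative/positive). It is a simplification under stated assumptions: WEKA may discretize numeric attributes before computing the statistic, the threshold is assumed here to be applied as a strict "greater than", and the counts in the usage example are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as rows of counts."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def rank(scores, threshold=0.0):
    """Ranker-like selection (assumption): sort by score, keep those strictly above threshold."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score) for name, score in ranked if score > threshold]


# Hypothetical counts. Rows: word absent / word present; columns: negative / positive.
scores = {
    "happi": chi_squared([[10, 2], [0, 8]]),  # occurs only in positive tweets
    "near": chi_squared([[5, 5], [5, 5]]),    # independent of the class -> score 0
}
print(rank(scores))
```

A word that is distributed identically across both classes scores 0 and is discarded by the threshold. Words that are informative about the sentiment are kept, ordered by score.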
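The order of the attributes is important. NominalToString (-C first) and StringToWordVector (-R1) both operate on the first attribute, so the tweet text must come first. The sentiment must come last, which is where the WEKA Explorer looks for the class attribute by default.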
@relation processed-tweets-weka.filters.unsupervised.attribute.NominalToString-Cfirst
@attribute Tweet string
@attribute Sentiment {negative,positive}
@data
'massiv pimpl near ear NEGATIVESMILE',negative
'happi POSITIVESMILE',positive
...
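If the data file is generated from a script, the attribute order and ARFF quoting can be handled in one place. The following is a minimal sketch, assuming the tweets are already preprocessed. It wraps string values in single quotes and escapes backslashes and single quotes with a backslash, following ARFF conventions. It declares Tweet directly as a string attribute. The original relation name indicates that the tweets were converted from a nominal attribute with NominalToString, and that step is presumably unnecessary with this header. The helper names and the relation name `processed-tweets` (taken from the prefix of the relation above) are illustrative.

```python
def arff_quote(text):
    """Quote a string value ARFF-style: single quotes, with \\ and ' escaped by a backslash."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def to_arff(rows, relation="processed-tweets"):
    """Build ARFF text with the tweet as the first attribute and the class (sentiment) last."""
    lines = [
        f"@relation {relation}",
        "@attribute Tweet string",
        "@attribute Sentiment {negative,positive}",
        "@data",
    ]
    for tweet, sentiment in rows:
        if sentiment not in ("negative", "positive"):
            raise ValueError(f"unknown sentiment: {sentiment!r}")
        lines.append(f"{arff_quote(tweet)},{sentiment}")
    return "\n".join(lines) + "\n"


print(to_arff([
    ("massiv pimpl near ear NEGATIVESMILE", "negative"),
    ("happi POSITIVESMILE", "positive"),
]))
```

Quoting each tweet exactly once avoids malformed rows such as doubled quotes or the label ending up inside the string value.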
At this point the data contains two attributes: the tweet itself (string) and the sentiment (nominal, with the values negative and positive).
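The StringToWordVector filter then transforms the data as follows. Each token produced by the tokenizer becomes an attribute; here the NGramTokenizer is configured with -min 1 -max 2, so tokens are single words and word pairs. Each instance is still one tweet, and its value for a token attribute records whether that token occurs in the tweet.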
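To make the transformation concrete, the sketch below imitates it in plain Python. It splits on the delimiter set from the tokenizer configuration above (space, `\r`, `\n`, `\t` and `. , ; : ' " ( ) ? !`), builds 1- and 2-grams from the resulting words, and produces a 0/1 presence vector per tweet. This is an approximation, not WEKA's implementation. It ignores `-W10000` (words to keep) and `-M1` (minimum term frequency), keeps case as-is (no lowercasing option appears in the configuration), and assumes bigrams are formed from consecutive words after splitting.

```python
import re

# Delimiters from the NGramTokenizer setting: space, \r, \n, \t and . , ; : ' " ( ) ? !
DELIMITERS = " \r\n\t.,;:'\"()?!"
SPLIT = re.compile("[" + re.escape(DELIMITERS) + "]+")


def tokenize(text, min_n=1, max_n=2):
    """Split on the delimiters, then emit all n-grams with min_n <= n <= max_n."""
    words = [w for w in SPLIT.split(text) if w]
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


def word_vectors(tweets):
    """One attribute per distinct token; one row per tweet with 1 = present, 0 = absent."""
    vocab = sorted({gram for tweet in tweets for gram in tokenize(tweet)})
    rows = []
    for tweet in tweets:
        present = set(tokenize(tweet))
        rows.append([1 if gram in present else 0 for gram in vocab])
    return vocab, rows


vocab, rows = word_vectors(["massiv pimpl near ear NEGATIVESMILE", "happi POSITIVESMILE"])
print(tokenize("happi POSITIVESMILE"))
print(len(vocab), "attributes")
```

The two example tweets already yield 12 word and bigram attributes. This is why the ranking by chi-squared that follows in the filter chain is used to discard tokens that carry no information about the sentiment.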