Classifier How To - cproof/tweet-analysis GitHub Wiki

Information about the configured classifier.

Statistics for the first model:

Correctly Classified Instances           592              98.6667 %
Incorrectly Classified Instances         8                1.3333 %
Kappa statistic                          0.9733
Mean absolute error                      0.0137
Root mean squared error                  0.1017
Relative absolute error                  2.7313 %
Root relative squared error             20.3411 %
Total Number of Instances              600     

=== Confusion Matrix ===
   a   b   <-- classified as
 295   5 |   a = negative
   3 297 |   b = positive
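The summary statistics above can be recomputed directly from the confusion matrix. The following sketch (illustrative, not part of the project) reproduces the accuracy and the Kappa statistic reported by WEKA:

```python
# Confusion matrix from the WEKA output: rows = actual class, cols = predicted class.
matrix = [
    [295, 5],   # a = negative
    [3, 297],   # b = positive
]

n = sum(sum(row) for row in matrix)                  # total instances (600)
observed = sum(matrix[i][i] for i in range(2)) / n   # observed accuracy

# Expected agreement by chance, computed from the row and column marginals.
row_totals = [sum(row) for row in matrix]
col_totals = [sum(matrix[i][j] for i in range(2)) for j in range(2)]
expected = sum(row_totals[i] * col_totals[i] for i in range(2)) / (n * n)

# Kappa corrects the observed accuracy for chance agreement.
kappa = (observed - expected) / (1 - expected)
print(round(observed * 100, 4))  # 98.6667 (correctly classified %)
print(round(kappa, 4))           # 0.9733  (Kappa statistic)
```

Both values match the WEKA output, which confirms how the matrix and the summary statistics relate.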

Here is the detailed configuration from the WEKA GUI output. The relation name encodes the complete filter chain:

'processed-tweets-weka.filters.unsupervised.attribute.NominalToString-Cfirst-weka.filters.unsupervised.attribute.StringToWordVector-R1-W10000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.NGramTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\" -max 2 -min 1-weka.filters.supervised.attribute.AttributeSelection-Eweka.attributeSelection.ChiSquaredAttributeEval-Sweka.attributeSelection.Ranker -T 0.0 -N -1'

Decoded, this corresponds to the following filters, applied in order:

weka.filters.unsupervised.attribute.NominalToString -C first
weka.filters.unsupervised.attribute.StringToWordVector -R 1 -W 10000 -prune-rate -1.0 -N 0 -stemmer weka.core.stemmers.NullStemmer -M 1 -tokenizer "weka.core.tokenizers.NGramTokenizer -delimiters \" \r\n\t.,;:'\"()?!\" -max 2 -min 1"
weka.filters.supervised.attribute.AttributeSelection -E weka.attributeSelection.ChiSquaredAttributeEval -S "weka.attributeSelection.Ranker -T 0.0 -N -1"

ARFF Input for the Classifier:

The order of the attributes is important.

@relation processed-tweets-weka.filters.unsupervised.attribute.NominalToString-Cfirst
@attribute Tweet string
@attribute Sentiment {negative,positive}
@data
'massiv pimpl near ear NEGATIVESMILE',negative
'happi POSITIVESMILE',positive
...
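To produce this input format programmatically, a small helper can write the preprocessed tweets into the ARFF layout shown above. The function below is a hypothetical sketch (not part of the project); the attribute order matches the example, and single quotes inside a tweet are escaped as ARFF requires:

```python
def write_arff(path, rows):
    """Write (tweet, sentiment) pairs as an ARFF file.

    rows: list of (tweet, sentiment) pairs, sentiment in {"negative", "positive"}.
    The attribute order (Tweet first, Sentiment second) matters for the classifier.
    """
    lines = [
        "@relation processed-tweets",
        "@attribute Tweet string",
        "@attribute Sentiment {negative,positive}",
        "@data",
    ]
    for tweet, sentiment in rows:
        quoted = tweet.replace("'", "\\'")  # escape quotes inside the string value
        lines.append(f"'{quoted}',{sentiment}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_arff("tweets.arff", [
    ("massiv pimpl near ear NEGATIVESMILE", "negative"),
    ("happi POSITIVESMILE", "positive"),
])
```

The resulting file can be loaded directly in the WEKA GUI or via `weka.core.converters.ArffLoader`.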

StringToWordVector:

So far the data contains two attributes: the tweet itself (string) and the sentiment (nominal, with the values positive and negative).

The StringToWordVector filter transforms the data as follows: each token produced by the tokenizer becomes an attribute, and each tweet becomes a row of values indicating whether a particular word is present in that tweet.
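The effect of this transformation can be illustrated with a minimal sketch (assumed behaviour for presence/absence vectors, not WEKA's actual implementation): every distinct word becomes an attribute, and each tweet is mapped to a 0/1 vector over that vocabulary.

```python
# Two example rows in the (tweet, sentiment) shape used above.
tweets = [
    ("massiv pimpl near ear NEGATIVESMILE", "negative"),
    ("happi POSITIVESMILE", "positive"),
]

# Every distinct word across all tweets becomes one attribute.
vocabulary = sorted({word for text, _ in tweets for word in text.split()})

def to_vector(text):
    """Map a tweet to a 0/1 vector: 1 if the attribute's word occurs in it."""
    words = set(text.split())
    return [1 if attr in words else 0 for attr in vocabulary]

# Each row is now a word vector plus the (unchanged) sentiment label.
vectors = [(to_vector(text), label) for text, label in tweets]
```

WEKA additionally supports word counts, TF-IDF weighting, and pruning of rare words (the -M and -W options in the configuration above); the sketch shows only the simplest presence/absence case.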

Tokenizer:

The StringToWordVector filter uses a tokenizer to split each tweet into tokens. Two options are relevant here:

NGramTokenizer: splits the string on the configured delimiter characters (" \r\n\t.,;:'\"()?!" in the configuration above) and emits all word n-grams with lengths between -min and -max (here 1 and 2).

WordTokenizer: simply splits the string on the delimiter characters, so each single word becomes one token.
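The behaviour of the two tokenizers can be sketched as follows (behaviour assumed from the WEKA documentation; token ordering may differ from WEKA's output):

```python
import re

# Delimiter characters from the classifier configuration above.
DELIMITERS = " \r\n\t.,;:'\"()?!"

def word_tokenize(text):
    """WordTokenizer sketch: split on any delimiter character, drop empty tokens."""
    return [t for t in re.split("[" + re.escape(DELIMITERS) + "]+", text) if t]

def ngram_tokenize(text, n_min=1, n_max=2):
    """NGramTokenizer sketch: emit all word n-grams of length n_min..n_max."""
    words = word_tokenize(text)
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

print(ngram_tokenize("happi POSITIVESMILE"))
# ['happi', 'POSITIVESMILE', 'happi POSITIVESMILE']
```

With -min 1 -max 2, the n-gram tokenizer therefore produces every single word plus every adjacent word pair, which lets the classifier pick up short phrases as well as individual words.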
