4. Classification - adriannaziel/EmojiProject GitHub Wiki

Several types of classifiers have been tested: LSTM, GRU, BiLSTM, BiGRU, LSTM with attention, GRU with attention, CNN, BERT, etc. The classifiers were trained both on our own fastText embeddings and with a trainable embedding layer. The best result was achieved by a GRU classifier with attention and masking. Classification was performed on 45 emoji classes and on the emoji groups obtained during clustering. We present our results below.
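The best-performing setup above (GRU with attention and a padding mask) can be sketched as follows. This is a minimal, hypothetical PyTorch implementation, not the project's exact model; the layer sizes and the `GRUAttentionClassifier` name are assumptions, and in practice the embedding layer would be initialized from the fastText vectors.

```python
import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    """Hypothetical sketch: GRU encoder + additive attention with a
    padding mask, classifying into the 45 emoji classes."""

    def __init__(self, vocab_size, emb_dim=100, hidden=128, n_classes=45, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        # in the real setup this layer could be loaded from fastText vectors
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.att = nn.Linear(hidden, 1)    # one attention score per time step
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, seq) of token ids
        mask = (x != self.pad_idx)         # True at real tokens, False at padding
        h, _ = self.gru(self.emb(x))       # (batch, seq, hidden)
        scores = self.att(h).squeeze(-1)   # (batch, seq)
        # the mask removes padded positions before the softmax
        scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)          # per-word attention
        context = (weights.unsqueeze(-1) * h).sum(1)    # weighted sentence vector
        return self.out(context), weights
```

Returning the attention weights alongside the logits is what makes the per-word attention analyses later in this page possible.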

Classification model

Single emoji classification

Emoji mapping

{0: 'πŸ˜‚', 1: '❀', 2: '🀣', 3: '😊', 4: 'πŸ™', 5: 'πŸ’•', 6: '😭', 7: '😘', 8: 'πŸ’œ', 9: 'πŸ˜”', 10: '😎', 11: 'πŸ˜‡', 12: '🌹', 13: '🀦', 14: 'πŸŽ‰', 15: 'πŸ’ž', 16: '✌', 17: '✨', 18: '🀷', 19: '😱', 20: '😌', 21: '🌸', 22: 'πŸ™Œ', 23: 'πŸ˜‹', 24: 'πŸ’—', 25: 'πŸ’š', 26: '😏', 27: 'πŸ’›', 28: 'πŸ™‚', 29: 'πŸ’“', 30: 'πŸ‘', 31: 'πŸ˜…', 32: 'πŸ‘', 33: '😁', 34: 'πŸ”₯', 35: 'πŸ’”', 36: 'πŸ’–', 37: '😒', 38: 'πŸ€”', 39: 'πŸ˜†', 40: 'πŸ™„', 41: 'πŸ’ͺ', 42: 'πŸ˜‰', 43: 'πŸ‘Œ', 44: 'πŸ€—'}

Labels Distribution:

Results:

f1-score (macro) - 0.126575, accuracy - 0.168087
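For reference, the macro F1 reported here averages per-class F1 over all 45 classes, so it penalizes a model that ignores rare emojis even when accuracy looks acceptable. A plain-NumPy sketch of the metric:

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro F1: per-class F1 scores averaged over all classes."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        # F1 = 2*precision*recall / (precision + recall), rewritten in counts
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))
```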

Confusion matrix:

There is a confusion matrix with 6 groups marked presented below

Attention:

Attention matrix. To address the first hypothesis, we built an attention matrix that contains, for each word (excluding stop words) and each emoji, the word's summed attention normalized by the total attention assigned to that emoji. We then summed the values assigned to the 2000 highest-ranked words for each emoji. This analysis has many drawbacks due to the number of misspelled words and other problems specific to social-media texts, but we assume it can still capture the main tendencies. The higher this sum, the more the model is biased toward a few specific words for that emoji. If the sum is low, we can assume those tweets contain many different words that spread the attention out.
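The construction above can be sketched as follows. The `(words, weights, emoji)` triple format is a hypothetical interface; in practice the weights would come from the attention layer of the trained model.

```python
from collections import defaultdict

def attention_matrix(samples, stop_words):
    """samples: iterable of (words, attention_weights, emoji_label) triples.
    Returns {emoji: {word: attention normalized by the emoji's total}}."""
    sums = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for words, weights, emoji in samples:
        for w, a in zip(words, weights):
            if w in stop_words:          # stop words are excluded entirely
                continue
            sums[emoji][w] += a
            totals[emoji] += a
    # normalize each word's summed attention by the emoji's total attention
    return {e: {w: a / totals[e] for w, a in ws.items()}
            for e, ws in sums.items()}

def bias_score(word_attn, k=2000):
    """Sum of the k highest normalized attentions for one emoji; a higher
    value suggests the model leans on a few specific words."""
    return sum(sorted(word_attn.values(), reverse=True)[:k])
```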

Wrong classification:

['i', 'love', 'america', 'and', 'all', 'of', 'our', 'protests']
{'i': 0.025778458, 'love': 0.37676516, 'america': 0.04802751, 'and': 0.08379555, 'all': 0.045900498, 'of': 0.26280254, 'our': 0.04455722, 'protests': 0.1123717}
y_true: πŸ€—, y_pred: πŸ‘

['damn', 'chief', 'master', 'flex']
{'damn': 0.010000608, 'chief': 0.0048183305, 'master': 0.017017549, 'flex': 0.9681627}
y_true: 😭, y_pred: πŸ”₯

['btw', 'michael', 'jackson', 'said', 'in', 'numerous', 'interviews', 'that', 'he', 'is', 'a', 'proud', 'black', 'american']
{'btw': 0.056591287, 'michael': 0.004642388, 'jackson': 0.0023407408, 'said': 0.0031850105, 'in': 0.0044036177, 'numerous': 0.007018037, 'interviews': 0.0017804742, 'that': 0.008941464, 'he': 0.006192718, 'is': 0.0023775315, 'a': 0.028840533, 'proud': 0.8142319, 'black': 0.031267878, 'american': 0.028185826}
y_true: 😌, y_pred: πŸ‘

['they', 'shudve', 'ask', 'u', 'to', 'give', 'them', 'a', 'lil', 'gym', 'workout']
{'they': 0.005616039, 'shudve': 0.010061448, 'ask': 0.04085927, 'u': 0.11402856, 'to': 0.12560055, 'give': 0.051928997, 'them': 0.02255508, 'a': 0.06699261, 'lil': 0.022766355, 'gym': 0.17242141, 'workout': 0.3671687}
y_true: πŸ˜‹, y_pred: πŸ’ͺ

Correct classification:

['lol', 'i', 'thought', 'you', 'were', 'through', 'with', 'the', 'hate', 'and', 'drama', 'on', 'social', 'media']
{'lol': 0.38014242, 'i': 0.076594636, 'thought': 0.4918506, 'you': 0.0036266951, 'were': 0.0012446358, 'through': 0.0029633555, 'with': 0.0021190864, 'the': 0.0009790901, 'hate': 0.011110178, 'and': 0.0014999098, 'drama': 0.018112028, 'on': 0.0018078982, 'social': 0.0004567742, 'media': 0.0074924645}
y_true: πŸ˜†, y_pred: πŸ˜†

['i', 'hope', 'i', 'win', 'your', 'giveaway', 'wish', 'me', 'luck']
{'i': 0.14888199, 'hope': 0.48214352, 'win': 0.05654105, 'your': 0.005445058, 'giveaway': 0.010920844, 'wish': 0.19810373, 'me': 0.00898792, 'luck': 0.07059709}
y_true: πŸ™, y_pred: πŸ™

['i', 'had', 'a', 'dream', 'i', 'was', 'getting', 'my', 'nails', 'done', 'last', 'night', 'this', 'is', 'soo', 'sad']
{'i': 0.00033217165, 'had': 0.00019247414, 'a': 3.9869414e-05, 'dream': 0.0034662462, 'was': 0.0004677434, 'getting': 0.00042943488, 'my': 0.0007447625, 'nails': 5.9523438e-05, 'done': 7.189297e-05, 'last': 0.0002751638, 'night': 5.112299e-05, 'this': 0.004354716, 'is': 0.028186372, 'soo': 0.0015113916, 'sad': 0.9597168}
y_true: 😒, y_pred: 😒

['thank', 'you', 'for', 'your', 'hardwork', 'lil', 'meow']
{'thank': 0.26464748, 'you': 0.025772238, 'for': 0.009349593, 'your': 0.0040656915, 'hardwork': 0.4456423, 'lil': 0.11511448, 'meow': 0.13540736}
y_true: 😘, y_pred: 😘

Group classification

We can therefore assume that grouping emojis and classifying the groups (instead of single emojis) could improve model performance.

There are drawbacks worth mentioning: the dataset is unbalanced (especially for high k), and some emojis are not strongly coupled: we usually see a catch-all "other" cluster (k-means) or many outliers (DBSCAN).
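Given a cluster assignment, retargeting the single-emoji labels to group labels can be sketched as below (using an excerpt of the 7-group clustering listed later; `y_single` and the id-to-emoji mapping are assumed from the single-emoji task):

```python
# minimal sketch: relabel single-emoji targets with their cluster id
id_to_emoji = {0: 'πŸ˜‚', 1: '❀', 6: '😭', 14: 'πŸŽ‰'}   # excerpt of the emoji mapping
groups = {                                          # excerpt of the 7-group clustering
    1: ['❀', '😊'],
    2: ['πŸ˜‚', '🀣'],
    3: ['😭', 'πŸ˜”'],
    5: ['πŸŽ‰'],
}
# invert the clustering: emoji -> group id
emoji_to_group = {e: g for g, emojis in groups.items() for e in emojis}
y_single = [0, 6, 14, 1]
y_group = [emoji_to_group[id_to_emoji[y]] for y in y_single]
# y_group == [2, 3, 5, 1]
```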

Tested for 7-10 groups

Labels distribution:

7 groups

0: ['πŸ’•', 'πŸ’ž', '🌸', 'πŸ’—', 'πŸ’“', 'πŸ’–']
1: ['❀', '😊', 'πŸ™', '😘', 'πŸ˜‡', '🌹', '✨', '😌', 'πŸ™‚', 'πŸ€—']
2: ['πŸ˜‚', '🀣', '🀦', '🀷', '😱', 'πŸ˜‹', '😏', 'πŸ˜…', '😁', 'πŸ€”', 'πŸ˜†', 'πŸ™„', 'πŸ˜‰']
3: ['😭', 'πŸ˜”', 'πŸ’”', '😒']
4: ['😎', '✌', 'πŸ™Œ', 'πŸ‘', 'πŸ‘', 'πŸ”₯', 'πŸ’ͺ', 'πŸ‘Œ']
5: ['πŸŽ‰']
6: ['πŸ’œ', 'πŸ’š', 'πŸ’›']

8 groups

0: ['😊', '😘', '😎', 'πŸ˜‡', '✌', '😌', 'πŸ˜‹', 'πŸ™‚', 'πŸ‘', '😁', 'πŸ˜‰', 'πŸ‘Œ', 'πŸ€—']
1: ['❀', 'πŸ’•', 'πŸ’œ', 'πŸ’ž', '✨', 'πŸ’—', 'πŸ’š', 'πŸ’›', 'πŸ’“', 'πŸ’–']
2: ['πŸ˜‚', '🀣']
3: ['πŸ™', 'πŸŽ‰', 'πŸ™Œ', 'πŸ‘', 'πŸ”₯', 'πŸ’ͺ']
4: ['🀦', '🀷', '😏', 'πŸ˜…', 'πŸ€”', 'πŸ˜†', 'πŸ™„']
5: ['😭', 'πŸ˜”', 'πŸ’”', '😒']
6: ['🌹', '🌸']
7: ['😱']

9 groups

0: ['😌', 'πŸ™‚', '😁', 'πŸ˜‰']
1: ['πŸ’œ', 'πŸ’š', 'πŸ’›']
2: ['😭', 'πŸ˜”', 'πŸ’”', '😒']
3: ['😎', '✌', 'πŸ‘', 'πŸ”₯', 'πŸ’ͺ', 'πŸ‘Œ']
4: ['πŸ˜‚', '🀣', '🀦', '🀷', '😏', 'πŸ˜…', 'πŸ€”', 'πŸ˜†', 'πŸ™„']
5: ['πŸ’•', 'πŸ’ž', '✨', '🌸', 'πŸ’—', 'πŸ’“', 'πŸ’–']
6: ['😱']
7: ['❀', '😊', 'πŸ™', '😘', 'πŸ˜‡', '🌹', 'πŸ˜‹', 'πŸ€—']
8: ['πŸŽ‰', 'πŸ™Œ', 'πŸ‘']\

10 groups

0: ['πŸŽ‰']
1: ['😊', '😘', '😎', 'πŸ˜‡', '✌', '😌', 'πŸ˜‹', 'πŸ™‚', 'πŸ‘', '😁', 'πŸ˜‰', 'πŸ‘Œ', 'πŸ€—']
2: ['πŸ’œ', 'πŸ’š', 'πŸ’›']
3: ['😭', 'πŸ˜”', 'πŸ’”', '😒']
4: ['🌹', '🌸']
5: ['❀', 'πŸ’•', 'πŸ’ž', 'πŸ’—', 'πŸ’“', 'πŸ’–']
6: ['😱']
7: ['πŸ™', '✨', 'πŸ™Œ', 'πŸ‘', 'πŸ”₯', 'πŸ’ͺ']
8: ['🀦', '🀷', '😏', 'πŸ˜…', 'πŸ€”', 'πŸ™„']
9: ['πŸ˜‚', '🀣', 'πŸ˜†']

Results:

| groups no. | f1 | accuracy |
|---|---|---|
| 7 | 0.399312 | 0.491820 |
| 8 | 0.332877 | 0.444278 |
| 9 | 0.309019 | 0.410448 |
| 10 | 0.319187 | 0.414462 |

If we consider only the top 10 words for each emoji, the words love, thank, thanks, good, happy, morning, beautiful, birthday, miss, please, cute, and well receive the most attention for more than 10 emojis.

Those words are not distinctive and in many cases will not help classify an emoji correctly, unlike words assigned to only one emoji, such as hot, yummy, fighting, or lit.

For example, the top words for 3 🀦 ['smh', 'oh', 'lol', 'god', 'dumb', 'sorry', 'omg', 'think', 'like', 'bad', 'thought', 'embarrassing', 'wrong', 'wonder'] are more unique than those for 5 πŸ’• ['love', 'thank', 'happy', 'beautiful', 'thanks', 'birthday', 'miss', 'morning'] or 24 πŸ’— ['love', 'thank', 'thanks', 'happy', 'birthday', 'miss', 'cute', 'beautiful'].
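Finding the words shared across many emojis' top-10 lists can be sketched as below. The `word_scores` format (emoji to normalized attention per word) is a hypothetical interface matching the attention-matrix analysis above.

```python
from collections import Counter

def shared_top_words(word_scores, k=10, min_emojis=10):
    """word_scores: {emoji: {word: normalized attention}}.
    Returns words that appear in the top-k list of more than
    min_emojis emojis, i.e. the non-distinctive words."""
    counts = Counter()
    for scores in word_scores.values():
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        counts.update(top)
    return [w for w, c in counts.items() if c > min_emojis]
```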

Confusion matrices:

7 groups

8 groups

9 groups

10 groups

For 14 emojis, some other emoji was (incorrectly) predicted in their place more often than the true emoji was predicted correctly. Some of these pairs can be seen as synonymous, like πŸ’— and πŸ’–, but 🀣 was incorrectly predicted in place of over 20 other emojis. We can assume those tweets contain many words that are more specific to other emojis.

Soft evaluation

To check how far the emoji-predicting models are from the true results, soft evaluation was used. In this method, a prediction is counted as correct if the true class label occurs within the first n places of the ranked output vector. The chart below shows the f1-score as a function of n.
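The soft-evaluation rule can be sketched as below. The helper name and the trick of substituting the true label whenever it lands in the top n (so standard metrics can be reused unchanged) are assumptions, not the project's exact code.

```python
import numpy as np

def soft_predictions(probs, y_true, n):
    """Soft evaluation: a sample counts as correct if its true label is
    among the n highest-scoring classes. Returns adjusted predictions:
    y_true where the top-n contains it, else the ordinary argmax."""
    topn = np.argsort(probs, axis=1)[:, -n:]          # indices of the n best scores
    hit = np.array([y in row for y, row in zip(y_true, topn)])
    return np.where(hit, y_true, probs.argmax(axis=1))
```

Feeding these adjusted predictions into the usual f1-score computation for increasing n produces the curve shown in the chart.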

Other classifiers

The classifier discussed here and other RNN classifiers are described in this work (in Polish). Some of the other classifiers are described here (in Polish).