# Transcript Model

## 📌 Model Description
We use an SBERT-powered One-vs-Rest sentiment classification model for sentence-level emotion recognition from dialogue transcripts. The approach combines semantic embeddings, SMOTE for class balancing, XGBoost classifiers, and logit-based class bias adjustments to handle extreme class imbalance and to maximize both overall and per-class performance. A minimal sketch of the pipeline follows the feature list below.
Key Features:
- Sentence-BERT (SBERT) for semantic embeddings
- SMOTE oversampling for underrepresented emotion classes (1, 2, 3, 5, 6)
- One-vs-Rest strategy using XGBoost with calibrated probabilities
- Logit scaling to adjust confidence in rare classes at prediction time
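The sketch below illustrates how these pieces fit together under stated assumptions: the SBERT checkpoint (`all-MiniLM-L6-v2`), the XGBoost hyperparameters, and the rare-class boost factors are illustrative placeholders, not the tuned values used in training.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from imblearn.over_sampling import SMOTE
from sklearn.calibration import CalibratedClassifierCV
from sklearn.multiclass import OneVsRestClassifier
from xgboost import XGBClassifier

def train_pipeline(sentences, labels):
    # 1) Sentence-BERT semantic embeddings (checkpoint is an assumption)
    sbert = SentenceTransformer("all-MiniLM-L6-v2")
    X = sbert.encode(sentences)

    # 2) SMOTE oversampling of minority emotion classes
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, labels)

    # 3) One-vs-Rest XGBoost with calibrated probabilities
    base = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss")
    clf = OneVsRestClassifier(CalibratedClassifierCV(base, cv=3))
    clf.fit(X_bal, y_bal)
    return sbert, clf

# 4) Logit-space bias boost for the rare classes (factors are illustrative)
RARE_CLASS_BOOST = {1: 1.5, 2: 1.5, 3: 1.5, 5: 1.5, 6: 1.5}

def predict_with_bias(sbert, clf, sentences):
    proba = clf.predict_proba(sbert.encode(sentences))
    logits = np.log(proba + 1e-9)
    for cls, factor in RARE_CLASS_BOOST.items():
        logits[:, cls] += np.log(factor)  # raise rare-class confidence
    return logits.argmax(axis=1)
```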
## ⚙️ How This Model Was Trained

### Input Data
- `dd_train_best.csv` and `dd_test_best.csv`, extracted from the [DailyDialog](https://www.kaggle.com/datasets/thedevastator/dailydialog-unlock-the-conversation-potential-in) dataset.
- The dataset contains 43,000 sentences for training and 4,300 for testing.
- Dialogues whose emotion labels were all "0" were removed from the dataset to reduce class imbalance.
- Columns used:
  - `"dialog"` – list of dialogue turns per scene (stored as a stringified list); parsed as shown below
  - `"emotion"` – list of emotion labels (0–6) matching the dialogue turns
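Because both columns hold stringified lists, they need to be parsed back into Python lists after loading. A sketch, assuming Python-literal serialization:

```python
import ast
import pandas as pd

train = pd.read_csv("dd_train_best.csv")

# "dialog" and "emotion" are stored as stringified lists; parse them back
train["dialog"] = train["dialog"].apply(ast.literal_eval)
train["emotion"] = train["emotion"].apply(ast.literal_eval)

print(train.loc[0, "dialog"][0])   # first turn of the first dialogue
print(train.loc[0, "emotion"][0])  # its emotion label (0-6)
```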
### Preprocessing Steps
- Removed rows with all-zero emotion labels
- Cleaned punctuation and whitespace
- Annotated sentence changes within each dialogue
- Flattened each dialogue into sentence-level data (sketched after this list)
- Applied SMOTE on minority classes before training
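A minimal sketch of the flattening step; the exact cleaning rules are assumptions (simple whitespace and punctuation normalization):

```python
import re
import pandas as pd

def flatten_dialogues(df):
    """Expand each dialogue into one (sentence, emotion) row per turn."""
    rows = []
    for dialog, emotions in zip(df["dialog"], df["emotion"]):
        if all(e == 0 for e in emotions):
            continue  # drop dialogues whose labels are all 0 (Neutral)
        for turn, emotion in zip(dialog, emotions):
            turn = re.sub(r"\s+", " ", turn).strip()     # collapse whitespace
            turn = re.sub(r"\s+([?.!,'])", r"\1", turn)  # " ?" -> "?"
            rows.append({"sentence": turn, "emotion": emotion})
    return pd.DataFrame(rows)

train_sentences = flatten_dialogues(train)  # `train` parsed as shown earlier
```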
### Output Data

- Emotion prediction for each sentence:
  - `0`: Neutral
  - `1`: Joy
  - `2`: Sadness
  - `3`: Anger
  - `4`: Fear
  - `5`: Disgust
  - `6`: Surprise
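At inference time the integer predictions map back to emotion names; a short usage example reusing the hypothetical `predict_with_bias` helper sketched earlier:

```python
EMOTION_NAMES = {0: "Neutral", 1: "Joy", 2: "Sadness", 3: "Anger",
                 4: "Fear", 5: "Disgust", 6: "Surprise"}

preds = predict_with_bias(sbert, clf, ["I can't believe you did that!"])
print([EMOTION_NAMES[p] for p in preds])
```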
## 📈 Performance

Final evaluation on the test set (4,300 samples):
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.79 | 0.94 | 0.86 | 2981 |
| 1 | 0.69 | 0.11 | 0.18 | 104 |
| 2 | 0.78 | 0.16 | 0.26 | 45 |
| 3 | 1.00 | 0.38 | 0.56 | 13 |
| 4 | 0.74 | 0.53 | 0.62 | 956 |
| 5 | 0.75 | 0.06 | 0.11 | 99 |
| 6 | 0.71 | 0.12 | 0.20 | 102 |
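Per-class numbers like those above can be reproduced with `sklearn.metrics.classification_report`; a sketch reusing the hypothetical helpers from earlier:

```python
from sklearn.metrics import classification_report

test_sentences = flatten_dialogues(test)  # `test` parsed like the training set
y_pred = predict_with_bias(sbert, clf, test_sentences["sentence"].tolist())
print(classification_report(test_sentences["emotion"], y_pred, digits=2))
```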
Summary:

- 🎯 Overall accuracy: 78% over ~4,300 sentences
- 📊 Macro F1-score: 0.40
- 📊 Weighted F1-score: 0.75
The model significantly improves minority-class performance while preserving high accuracy on the majority class (0: Neutral). We acknowledge the still-low F1-scores on the minority classes and are experimenting with other sampling techniques to fine-tune the model further. We are also developing movie transcripts to feed into the model so we can manually test its accuracy.