Module 4.5: Evaluation Metrics – Precision, Recall & F1-Score
After training a classifier, it is crucial to evaluate its performance. For binary or multiclass classification, the standard metrics are built from four counts:
- True Positives (TP): correctly predicted positive examples
- False Positives (FP): negative examples incorrectly predicted positive
- False Negatives (FN): positive examples incorrectly predicted negative
- True Negatives (TN): correctly predicted negative examples
From these we derive:
- Precision = TP / (TP + FP): the fraction of predicted positives that are truly positive
- Recall = TP / (TP + FN): the fraction of true positives that the model finds
- F1 score = 2 · Precision · Recall / (Precision + Recall): the harmonic mean of precision and recall
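As a quick sanity check on these formulas, here is a minimal sketch that plugs in hypothetical counts (TP=8, FP=2, FN=4 are made-up numbers chosen purely for illustration):

```python
# Apply the three formulas to hypothetical counts
TP, FP, FN = 8, 2, 4

precision = TP / (TP + FP)            # 8 / 10 = 0.80
recall    = TP / (TP + FN)            # 8 / 12 ≈ 0.67
f1        = 2 * precision * recall / (precision + recall)

print(f"Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")
# Precision=0.80, Recall=0.67, F1=0.73
```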
1. Evaluation with scikit-learn
Using the spam/ham example from Module 4.2 or 4.3, suppose:
```python
tests  = ["limited time offer", "lunch with project team",
          "win cheap money", "report meeting tomorrow"]
y_true = ['spam', 'ham', 'spam', 'ham']
y_pred = ['spam', 'ham', 'spam', 'ham']   # replace with actual model predictions
```
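In practice, `y_pred` comes from a trained model. Here is a minimal sketch of how it might be produced, assuming a CountVectorizer + MultinomialNB pipeline in the spirit of Module 4.3 (the training sentences below are illustrative placeholders, not the handbook's actual corpus):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (placeholder; substitute the corpus from Module 4.2/4.3)
train_texts  = ["win money now", "cheap offer limited",
                "team lunch today", "project report due"]
train_labels = ['spam', 'spam', 'ham', 'ham']

# Fit a simple bag-of-words Naive Bayes pipeline
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Real predictions for the four test sentences
y_pred = list(model.predict(tests))
```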
Compute metrics and show a confusion matrix and classification report:
```python
from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    precision_score,
    recall_score,
    f1_score,
)

# 1. Confusion matrix
cm = confusion_matrix(y_true, y_pred, labels=['spam', 'ham'])
print("Confusion Matrix:\n", cm)
# Rows = true classes, columns = predicted classes:
#               Pred: spam   Pred: ham
# True: spam  [[    TP,         FN   ],
# True: ham    [    FP,         TN   ]]

# 2. Classification report
# Pass labels explicitly so target_names line up with the classes
# (without it, classes are sorted alphabetically: 'ham' before 'spam')
print("\nClassification Report:\n",
      classification_report(y_true, y_pred,
                            labels=['spam', 'ham'],
                            target_names=['spam', 'ham']))

# 3. Precision, Recall, F1 (binary, treating 'spam' as the positive class)
prec = precision_score(y_true, y_pred, pos_label='spam')
rec  = recall_score(y_true, y_pred, pos_label='spam')
f1   = f1_score(y_true, y_pred, pos_label='spam')
print(f"Precision (spam) = {prec:.2f}")
print(f"Recall    (spam) = {rec:.2f}")
print(f"F1 Score  (spam) = {f1:.2f}")

# 4. Macro- and micro-averaged scores (useful for multiclass problems)
print("Macro-avg F1:", f1_score(y_true, y_pred, average='macro'))
print("Micro-avg F1:", f1_score(y_true, y_pred, average='micro'))
```
Output (with the placeholder predictions above, every prediction is correct, so all metrics come out to 1.00):
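```
Confusion Matrix:
 [[2 0]
 [0 2]]

Classification Report:
               precision    recall  f1-score   support

        spam       1.00      1.00      1.00         2
         ham       1.00      1.00      1.00         2

    accuracy                           1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4

Precision (spam) = 1.00
Recall    (spam) = 1.00
F1 Score  (spam) = 1.00
Macro-avg F1: 1.0
Micro-avg F1: 1.0
```

The per-class numbers in the report can also be pulled out programmatically in a single call. A small sketch using scikit-learn's `precision_recall_fscore_support` on the same `y_true`/`y_pred`:

```python
from sklearn.metrics import precision_recall_fscore_support

# Returns per-class precision, recall, F1 and support,
# in the order given by `labels`
p, r, f, support = precision_recall_fscore_support(
    y_true, y_pred, labels=['spam', 'ham'])

for label, pi, ri, fi, si in zip(['spam', 'ham'], p, r, f, support):
    print(f"{label}: P={pi:.2f}  R={ri:.2f}  F1={fi:.2f}  (n={si})")
```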
2. Manual Computation Example
For deeper understanding, metrics can be computed by hand:
```python
# Example predictions
y_true = ['spam', 'spam', 'ham', 'ham']
y_pred = ['spam', 'ham', 'ham', 'spam']

# 1. Count TP, FP, FN, TN for the positive class 'spam'
TP = sum(1 for t, p in zip(y_true, y_pred) if t == 'spam' and p == 'spam')
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 'ham'  and p == 'spam')
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 'spam' and p == 'ham')
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 'ham'  and p == 'ham')

# 2. Apply the formulas, guarding against division by zero
precision = TP / (TP + FP) if TP + FP > 0 else 0
recall    = TP / (TP + FN) if TP + FN > 0 else 0
f1        = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0

print(f"TP={TP}, FP={FP}, FN={FN}, TN={TN}")
print(f"Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")
```
Output:

```
TP=1, FP=1, FN=1, TN=1
Precision=0.50, Recall=0.50, F1=0.50
```
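To connect this back to the averaged scores from section 1, here is a minimal sketch that computes macro-F1 (the unweighted mean of per-class F1 values) and micro-F1 (from globally pooled counts) by hand on the same toy labels; `f1_for` is a helper introduced here for illustration. Both results should match sklearn's `average='macro'` and `average='micro'` outputs:

```python
def f1_for(label, y_true, y_pred):
    """Per-class F1, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    prec = tp / (tp + fp) if tp + fp else 0
    rec  = tp / (tp + fn) if tp + fn else 0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0

y_true = ['spam', 'spam', 'ham', 'ham']
y_pred = ['spam', 'ham', 'ham', 'spam']
labels = ['spam', 'ham']

# Macro-F1: average the per-class F1 scores
macro_f1 = sum(f1_for(l, y_true, y_pred) for l in labels) / len(labels)

# Micro-F1: pool TP/FP/FN over all classes first, then compute F1.
# For single-label classification this equals plain accuracy.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
micro_f1 = correct / len(y_true)

print(f"Macro-F1={macro_f1:.2f}, Micro-F1={micro_f1:.2f}")   # both 0.50 here
```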
Continue to [Module 5: Neural Network Fundamentals](https://github.com/iffatAGheyas/NLP-handbook/wiki/Module-5-Neural-Network-Fundamentals)