Evaluation Results - HeidelTime/heideltime GitHub Wiki
Introduction
This page contains the evaluation results of version 2.2 of HeidelTime.
Operating system: Debian Linux
Java version: 1.8.0_101
Locale: en_GB (unless given in the workflow description in ReproduceEvaluationResults)
Tokenization and POS-Tagging: TreeTaggerWrapper, JVnTextProWrapper (Vietnamese corpora: JVnTextPro 2.0, Maxent model), StanfordPOSTaggerWrapper (Arabic corpora: Stanford POS Tagger 3.3.1, arabic.tagger model), HunPosTaggerWrapper (Croatian WikiWarsHR: HunPos 1.0, Croatian model from 09.05.2013)
ACE Tern 2004 Training Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 95.8% | 79.0% | 86.6% |
| Extraction (strict) | 87.3% | 72.0% | 78.9% |
| Normalization (value) | 86.8% | 87.3% | 87.1% |
| Extraction & Normalization (lenient + VAL) | 83.1% | 68.6% | 75.1% |
| Extraction & Normalization (strict + VAL) | 78.2% | 64.6% | 70.7% |
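The F-Score columns on this page appear to be the usual balanced F1, that is, the harmonic mean of precision and recall. The following is a minimal sketch for sanity-checking a row; the function name `f_score` is hypothetical and not part of HeidelTime or any of the evaluation scripts. Because the tables report rounded percentages, recomputing from them can differ from the published F-Score in the last digit for some rows.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the ACE Tern 2004 table above.
rows = {
    "Extraction (lenient)": (95.8, 79.0),
    "Extraction (strict)": (87.3, 72.0),
}

for name, (p, r) in rows.items():
    # Prints 86.6% and 78.9%, matching the F-Score column for these two rows.
    print(f"{name}: {f_score(p, r):.1f}%")
```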
AncientTimes Arabic
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 83.33% | 74.26% | 78.53% |
| Extraction (relaxed) | 93.33% | 83.17% | 87.96% |
- Attribute value F1: 83.77%
- Attribute type F1: 87.96%
AncientTimes German
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 86.75% | 71.98% | 78.68% |
| Extraction (relaxed) | 95.36% | 79.12% | 86.49% |
- Attribute value F1: 81.08%
- Attribute type F1: 85.89%
AncientTimes English
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 88.85% | 78.88% | 83.57% |
| Extraction (relaxed) | 97.03% | 86.14% | 91.26% |
- Attribute value F1: 84.97%
- Attribute type F1: 90.56%
AncientTimes Spanish
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 80.85% | 72.04% | 76.19% |
| Extraction (relaxed) | 96.28% | 85.78% | 90.73% |
- Attribute value F1: 85.71%
- Attribute type F1: 88.22%
AncientTimes French
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 89.07% | 77.19% | 82.71% |
| Extraction (relaxed) | 98.38% | 85.26% | 91.35% |
- Attribute value F1: 90.23%
- Attribute type F1: 91.35%
AncientTimes Italian
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 79.63% | 75.11% | 77.3% |
| Extraction (relaxed) | 91.2% | 86.03% | 88.54% |
- Attribute value F1: 79.55%
- Attribute type F1: 85.84%
AncientTimes Dutch
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 81.67% | 78.4% | 80.0% |
| Extraction (relaxed) | 94.17% | 90.4% | 92.24% |
- Attribute value F1: 88.16%
- Attribute type F1: 88.16%
AncientTimes Vietnamese
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 87.27% | 82.76% | 84.96% |
| Extraction (relaxed) | 97.27% | 92.24% | 94.69% |
- Attribute value F1: 92.04%
- Attribute type F1: 93.81%
ACE Tern 2005 Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 89.3% | 75.5% | 81.8% |
| Extraction (strict) | 77.3% | 65.3% | 70.8% |
| Normalization (value) | 74.8% | 77.3% | 76% |
| Extraction & Normalization (lenient + VAL) | 66.8% | 56.4% | 61.2% |
| Extraction & Normalization (strict + VAL) | 62.8% | 53.1% | 57.5% |
Arabic test-150 Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 80.1% | 90.9% | 85.2% |
| Extraction (strict) | 64.9% | 73.7% | 69.0% |
Arabic test-50 Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 79.7% | 90.4% | 84.7% |
| Extraction (strict) | 62.8% | 71.3% | 66.8% |
Arabic test-50-star Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 91.9% | 91.3% | 91.6% |
| Extraction (strict) | 84.8% | 84.2% | 84.5% |
| Normalization (value) | 91.9% | 91.9% | 91.9% |
| Extraction & Normalization (lenient + VAL) | 84.5% | 83.9% | 84.2% |
| Extraction & Normalization (strict + VAL) | 80.1% | 79.5% | 79.8% |
Arabic test-50-star Corpus evaluated with TE3-Tools
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 80.99% | 80.99% | 80.99% |
| Extraction (relaxed) | 90.91% | 90.91% | 90.91% |
- Attribute value F1: 82.23%
- Attribute type F1: 84.3%
I-CAB Test Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 92.7% | 81.5% | 86.8% |
| Extraction (strict) | 64.1% | 56.4% | 60.0% |
| Normalization (value) | 75.6% | 78.3% | 76.9% |
| Extraction & Normalization (lenient + VAL) | 70.1% | 61.7% | 65.6% |
| Extraction & Normalization (strict + VAL) | 51.4% | 45.2% | 48.1% |
TempEval2 Evaluation Corpus
| Precision | Recall | F-Score |
|---|---|---|
| 88.0% | 86.0% | 87.0% |
- Attribute type: 96.0 %
- Attribute value: 86.0 %
TempEval2 Spanish Evaluation Corpus
The Spanish TempEval2 Evaluation Corpus is essentially the same as the TempEval 3 version listed further down in this document, which contains some improvements. Please refer to that section, as it also uses our preferred evaluation method.
TempEval2 Italian Evaluation Corpus
| Precision | Recall | F-Score |
|---|---|---|
| 93.1% | 89.6% | 91.3% |
- Attribute type: 98.0 %
- Attribute value: 94.0 %
TempEval 2 Italian Training Corpus evaluated with TE3-Tools
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 73.3% | 88.72% | 80.28% |
| Extraction (relaxed) | 77.41% | 93.69% | 84.78% |
- Attribute value F1: 76.47%
- Attribute type F1: 82.18%
TempEval 2 Italian Test Corpus evaluated with TE3-Tools
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 77.93% | 89.68% | 83.39% |
| Extraction (relaxed) | 83.45% | 96.03% | 89.3% |
- Attribute value F1: 81.18%
- Attribute type F1: 85.61%
TempEval 2 Chinese Original Training Corpora
| Precision | Recall | F-Score |
|---|---|---|
| 96.0% | 93.9% | 94.9% |
- Attribute type: 92.0 %
- Attribute value: 79.0 %
TempEval 2 Chinese CLEAN Training Corpora
| Precision | Recall | F-Score |
|---|---|---|
| 80.1% | 95.7% | 87.2% |
- Attribute type: 94.0 %
- Attribute value: 90.0 %
TempEval 2 Chinese IMPROVED Training Corpora
| Precision | Recall | F-Score |
|---|---|---|
| 97.4% | 95.6% | 96.5% |
- Attribute type: 94.0 %
- Attribute value: 91.0 %
TempEval 2 Chinese Original Evaluation Corpora
| Precision | Recall | F-Score |
|---|---|---|
| 93.8% | 87.5% | 90.5% |
- Attribute type: 93.0 %
- Attribute value: 70.0 %
TempEval 2 Chinese CLEAN Evaluation Corpora
| Precision | Recall | F-Score |
|---|---|---|
| 62.4% | 91.8% | 74.3% |
- Attribute type: 96.0 %
- Attribute value: 89.0 %
TempEval 2 Chinese IMPROVED Evaluation Corpora
| Precision | Recall | F-Score |
|---|---|---|
| 95.8% | 89.3% | 92.4% |
- Attribute type: 96.0 %
- Attribute value: 86.0 %
TimeBank 1.2 Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 92.6% | 91.5% | 92.0% |
| Extraction (strict) | 86.6% | 85.6% | 86.1% |
| Normalization (value) | 87.6% | 87.6% | 87.6% |
| Extraction & Normalization (lenient + VAL) | 81.0% | 80.1% | 80.6% |
| Extraction & Normalization (strict + VAL) | 77.0% | 76.2% | 76.6% |
WikiWars Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 98.3% | 86.1% | 91.8% |
| Extraction (strict) | 93.3% | 81.8% | 87.2% |
| Normalization (value) | 90.5% | 91.1% | 90.8% |
| Extraction & Normalization (lenient + VAL) | 89.0% | 78.0% | 83.1% |
| Extraction & Normalization (strict + VAL) | 85.9% | 75.3% | 80.2% |
WikiWarsDE Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 98.7% | 89.3% | 93.8% |
| Extraction (strict) | 92.6% | 83.8% | 88.0% |
| Normalization (value) | 88.5% | 88.5% | 88.5% |
| Extraction & Normalization (lenient + VAL) | 87.4% | 79.1% | 83.0% |
| Extraction & Normalization (strict + VAL) | 83.2% | 75.3% | 79.1% |
WikiWarsVN Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 92.1% | 97.8% | 94.8% |
| Extraction (strict) | 72.9% | 77.4% | 75.1% |
| Normalization (value) | 95% | 95% | 95% |
| Extraction & Normalization (lenient + VAL) | 87.5% | 92.9% | 90.1% |
| Extraction & Normalization (strict + VAL) | 69.2% | 73.5% | 71.2% |
WikiWarsVN Corpus evaluated with TE3-Tools
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 94.09% | 94.09% | 94.09% |
| Extraction (relaxed) | 98.18% | 98.18% | 98.18% |
- Attribute value F1: 91.36%
- Attribute type F1: 93.64%
WikiWarsHR Corpus evaluated with TE3-Tools
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 88.93% | 86.86% | 87.88% |
| Extraction (relaxed) | 92.62% | 90.46% | 91.53% |
- Attribute value F1: 80.8%
- Attribute type F1: 89.74%
Time4SCI Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 96.2% | 70.6% | 81.4% |
| Extraction (strict) | 88.9% | 65.3% | 75.3% |
| Normalization (value) | 88.9% | 88.9% | 88.9% |
| Extraction & Normalization (lenient + VAL) | 85.5% | 62.8% | 72.4% |
| Extraction & Normalization (strict + VAL) | 80.0% | 58.8% | 67.7% |
Time4SMS Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (lenient) | 99.4% | 91.3% | 95.2% |
| Extraction (strict) | 98.2% | 90.2% | 94.1% |
| Normalization (value) | 97.1% | 97.1% | 97.1% |
| Extraction & Normalization (lenient + VAL) | 96.5% | 88.7% | 92.4% |
| Extraction & Normalization (strict + VAL) | 96.1% | 88.3% | 92.1% |
TempEval 3 AQUAINT Training Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 80.99% | 81.69% | 81.34% |
| Extraction (relaxed) | 92.12% | 92.92% | 92.52% |
- Attribute value F1: 73.09%
- Attribute type F1: 84.44%
TempEval 3 TimeBank Training Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 86.4% | 84.31% | 85.34% |
| Extraction (relaxed) | 93.08% | 90.83% | 91.94% |
- Attribute value F1: 79.56%
- Attribute type F1: 89.66%
TempEval 3 trainT3 Spanish Training Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 90.83% | 81.44% | 85.88% |
| Extraction (relaxed) | 96.33% | 86.38% | 91.08% |
- Attribute value F1: 84.14%
- Attribute type F1: 89.54%
TempEval 3 Platinum English Evaluation Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 83.97% | 79.71% | 81.78% |
| Extraction (relaxed) | 93.13% | 88.41% | 90.71% |
- Attribute value F1: 78.07%
- Attribute type F1: 83.27%
TempEval 3 Spanish Evaluation Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 91.48% | 80.9% | 85.87% |
| Extraction (relaxed) | 96.02% | 84.92% | 90.13% |
- Attribute value F1: 85.33%
- Attribute type F1: 87.47%
French TimeBank 1.1 Corpus
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 86.81% | 85.18% | 85.99% |
| Extraction (relaxed) | 91.85% | 90.12% | 90.97% |
- Attribute value F1: 73.63%
- Attribute type F1: 82.66%
EVALITA 2014 Test Corpus
| | Precision | Recall | F-Score | Type F1 | Value F1 |
|---|---|---|---|---|---|
| Strict extraction/normalization | 85.1% | 79.% | 82.% | 78.5% | 71.% |
| Relaxed extraction/normalization | 92.7% | 86.1% | 89.3% | 84.% | 75.% |
Portuguese TimeBank 1.0 Corpus (test subset)
| | Precision | Recall | F-Score |
|---|---|---|---|
| Extraction (strict) | 76.98% | 66.9% | 71.59% |
| Extraction (relaxed) | 87.3% | 75.86% | 81.18% |
- Attribute value F1: 63.47%
- Attribute type F1: 76.75%