Baseline system scores - rbawden/discomt17-pronouns GitHub Wiki
Baseline systems (KenLM)
en-de
Confusion matrix:
          er   sie    es   man OTHER <-- classified as
      +------------------------------+ -SUM-
er    |    4     1     3     1     6 |    15
sie   |   22    23    18     3    64 |   130
es    |    4     6    60     0    52 |   122
man   |    2     0     2     1     3 |     8
OTHER |    8     7    14     2   149 |   180
      +------------------------------+
-SUM-     40    37    97     7   274
Accuracy (calculated for the above confusion matrix) = 237/455 = 52.09%
Results for the individual labels:
er : P = 4/ 40 = 10.00% R = 4/ 15 = 26.67% F1 = 14.55%
sie : P = 23/ 37 = 62.16% R = 23/ 130 = 17.69% F1 = 27.54%
es : P = 60/ 97 = 61.86% R = 60/ 122 = 49.18% F1 = 54.79%
man : P = 1/ 7 = 14.29% R = 1/ 8 = 12.50% F1 = 13.33%
OTHER : P = 149/ 274 = 54.38% R = 149/ 180 = 82.78% F1 = 65.64%
Micro-averaged result: P = 237/ 455 = 52.09% R = 237/ 455 = 52.09% F1 = 52.09%
MACRO-averaged result: P = 40.54% R = 37.76% F1 = 35.17%
<<< II. OFFICIAL SCORE >>>
MACRO-averaged R: 37.76%
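The figures above follow directly from the confusion matrix, in which rows are gold labels and columns are predicted labels. As a minimal sketch (this is not the official scorer; the function name `scores` and the data layout are assumptions made for illustration), the accuracy, per-label recall and macro-averaged recall for en-de can be recomputed like this:

```python
# Illustrative sketch only: recompute the en-de scores from the confusion matrix.
# Rows are gold labels, columns are predicted labels ("classified as").
LABELS = ["er", "sie", "es", "man", "OTHER"]
MATRIX = [
    [4, 1, 3, 1, 6],
    [22, 23, 18, 3, 64],
    [4, 6, 60, 0, 52],
    [2, 0, 2, 1, 3],
    [8, 7, 14, 2, 149],
]


def scores(matrix):
    """Return (accuracy, per-label recalls, macro-averaged recall), all in percent."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]          # gold counts per label
    correct = sum(matrix[i][i] for i in range(n))      # diagonal = correct predictions
    recalls = [
        100.0 * matrix[i][i] / row_totals[i] if row_totals[i] else 0.0
        for i in range(n)
    ]
    accuracy = 100.0 * correct / sum(row_totals)
    macro_r = sum(recalls) / n                         # unweighted mean over labels
    return accuracy, recalls, macro_r


accuracy, recalls, macro_r = scores(MATRIX)
print(f"Accuracy = {accuracy:.2f}%")
for label, r in zip(LABELS, recalls):
    print(f"{label:<5} R = {r:.2f}%")
print(f"MACRO-averaged R = {macro_r:.2f}%")
```

Because the macro average weights every label equally, the 8 gold instances of `man` influence the official score as much as the 180 instances of `OTHER`.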
de-en
Confusion matrix:
          he   she    it  they   you  this these there OTHER <-- classified as
      +------------------------------------------------------+ -SUM-
he    |   12     2     5     3     3     0     0     0     6 |    31
she   |   11     0     0     1     3     0     0     1     6 |    22
it    |   12     2    99    10     7     7     0     1    38 |   176
they  |   13     2     9    22    15     0     0     0    21 |    82
you   |   16     1     4    10    61     0     0     0    25 |   117
this  |    2     0     9     2     1     1     0     0     2 |    17
these |    0     0     0     0     0     0     1     0     1 |     2
there |    0     0     1     1     0     1     0    32     2 |    37
OTHER |   21     0     3     1     5     4     2     1   143 |   180
      +------------------------------------------------------+
-SUM-     87     7   130    50    95    13     3    35   244
Accuracy (calculated for the above confusion matrix) = 371/664 = 55.87%
Results for the individual labels:
he : P = 12/ 87 = 13.79% R = 12/ 31 = 38.71% F1 = 20.34%
she : P = 0/ 7 = 0.00% R = 0/ 22 = 0.00% F1 = 0.00%
it : P = 99/ 130 = 76.15% R = 99/ 176 = 56.25% F1 = 64.71%
they : P = 22/ 50 = 44.00% R = 22/ 82 = 26.83% F1 = 33.33%
you : P = 61/ 95 = 64.21% R = 61/ 117 = 52.14% F1 = 57.55%
this : P = 1/ 13 = 7.69% R = 1/ 17 = 5.88% F1 = 6.67%
these : P = 1/ 3 = 33.33% R = 1/ 2 = 50.00% F1 = 40.00%
there : P = 32/ 35 = 91.43% R = 32/ 37 = 86.49% F1 = 88.89%
OTHER : P = 143/ 244 = 58.61% R = 143/ 180 = 79.44% F1 = 67.45%
Micro-averaged result: P = 371/ 664 = 55.87% R = 371/ 664 = 55.87% F1 = 55.87%
MACRO-averaged result: P = 43.25% R = 43.97% F1 = 42.10%
<<< II. OFFICIAL SCORE >>>
MACRO-averaged R: 43.97%
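The micro-averaged P, R and F1 all equal the accuracy because each instance has exactly one gold label and one predicted label: every error counts once as a false positive (for the predicted label) and once as a false negative (for the gold label). A small sketch using the de-en row sums, column sums and diagonal from the table above (variable names are illustrative, not taken from the scorer):

```python
# Illustrative sketch: micro-averaged P/R/F1 from the de-en totals above.
gold_totals = [31, 22, 176, 82, 117, 17, 2, 37, 180]  # row sums (-SUM- column)
pred_totals = [87, 7, 130, 50, 95, 13, 3, 35, 244]    # column sums (-SUM- row)
true_pos = [12, 0, 99, 22, 61, 1, 1, 32, 143]         # diagonal of the matrix

tp = sum(true_pos)
fp = sum(p - t for p, t in zip(pred_totals, true_pos))  # predicted but wrong
fn = sum(g - t for g, t in zip(gold_totals, true_pos))  # gold but missed

micro_p = 100.0 * tp / (tp + fp)
micro_r = 100.0 * tp / (tp + fn)
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
print(f"P = {micro_p:.2f}%  R = {micro_r:.2f}%  F1 = {micro_f1:.2f}%")
```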
en-fr
Confusion matrix:
          ce  elle elles    il   ils  cela    on OTHER <-- classified as
      +------------------------------------------------+ -SUM-
ce    |   57     0     0     7     0     1     0     7 |    72
elle  |    3    12     0     6     0     2     0     2 |    25
elles |    5     3     0     8     0     2     1     6 |    25
il    |    6     2     0    43     0     3     5     3 |    62
ils   |    9     6     0    27     0     1    10    19 |    72
cela  |    2     2     0     6     0    13     1    11 |    35
on    |    0     0     0     2     0     0     5     2 |     9
OTHER |   18     3     0     4     0     0     1   119 |   145
      +------------------------------------------------+
-SUM-    100    28     0   103     0    22    23   169
Accuracy (calculated for the above confusion matrix) = 249/445 = 55.96%
Results for the individual labels:
ce : P = 57/ 100 = 57.00% R = 57/ 72 = 79.17% F1 = 66.28%
elle : P = 12/ 28 = 42.86% R = 12/ 25 = 48.00% F1 = 45.28%
elles : P = 0/ 0 = 0.00% R = 0/ 25 = 0.00% F1 = 0.00%
il : P = 43/ 103 = 41.75% R = 43/ 62 = 69.35% F1 = 52.12%
ils : P = 0/ 0 = 0.00% R = 0/ 72 = 0.00% F1 = 0.00%
cela : P = 13/ 22 = 59.09% R = 13/ 35 = 37.14% F1 = 45.61%
on : P = 5/ 23 = 21.74% R = 5/ 9 = 55.56% F1 = 31.25%
OTHER : P = 119/ 169 = 70.41% R = 119/ 145 = 82.07% F1 = 75.80%
Micro-averaged result: P = 249/ 445 = 55.96% R = 249/ 445 = 55.96% F1 = 55.96%
MACRO-averaged result: P = 36.61% R = 46.41% F1 = 39.54%
<<< II. OFFICIAL SCORE >>>
MACRO-averaged R: 46.41%
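Note that `elles` and `ils` are never predicted, so their precision is the undefined ratio 0/0, which the output above reports as 0.00%. Both labels still count in the macro averages. The sketch below reproduces the en-fr per-label precision under that convention (the helper `safe_ratio` is a hypothetical name, not part of the scorer):

```python
# Illustrative sketch: precision with the 0/0 -> 0.00% convention seen above.
LABELS = ["ce", "elle", "elles", "il", "ils", "cela", "on", "OTHER"]
true_pos = [57, 12, 0, 43, 0, 13, 5, 119]        # diagonal of the en-fr matrix
pred_totals = [100, 28, 0, 103, 0, 22, 23, 169]  # column sums (-SUM- row)


def safe_ratio(numerator, denominator):
    """Percentage numerator/denominator, treating an empty denominator as 0."""
    return 100.0 * numerator / denominator if denominator else 0.0


precisions = [safe_ratio(tp, pred) for tp, pred in zip(true_pos, pred_totals)]
macro_p = sum(precisions) / len(precisions)      # all 8 labels, including the 0/0 ones
for label, p in zip(LABELS, precisions):
    print(f"{label:<5} P = {p:.2f}%")
print(f"MACRO-averaged P = {macro_p:.2f}%")
```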
es-en
dev1
<<< I. EVALUATION >>>
Confusion matrix:
          he   she    it  they   you there OTHER <-- classified as
      +------------------------------------------+ -SUM-
he    |   19     3     4     5     5     2    10 |    48
she   |    1     0     0     0     0     0     0 |     1
it    |   12     0    98     5     0     6    23 |   144
they  |    6     2    11    14     8     1    20 |    62
you   |    7     2     3     6    29     0     5 |    52
there |    3     0    14     1     0    35     3 |    56
OTHER |   25     1    24     9    19     7   190 |   275
      +------------------------------------------+
-SUM-     73     8   154    40    61    51   251
Accuracy (calculated for the above confusion matrix) = 385/638 = 60.34%
Results for the individual labels:
he : P = 19/ 73 = 26.03% R = 19/ 48 = 39.58% F1 = 31.40%
she : P = 0/ 8 = 0.00% R = 0/ 1 = 0.00% F1 = 0.00%
it : P = 98/ 154 = 63.64% R = 98/ 144 = 68.06% F1 = 65.77%
they : P = 14/ 40 = 35.00% R = 14/ 62 = 22.58% F1 = 27.45%
you : P = 29/ 61 = 47.54% R = 29/ 52 = 55.77% F1 = 51.33%
there : P = 35/ 51 = 68.63% R = 35/ 56 = 62.50% F1 = 65.42%
OTHER : P = 190/ 251 = 75.70% R = 190/ 275 = 69.09% F1 = 72.24%
Micro-averaged result: P = 385/ 638 = 60.34% R = 385/ 638 = 60.34% F1 = 60.34%
MACRO-averaged result: P = 45.22% R = 45.37% F1 = 44.80%
<<< II. OFFICIAL SCORE >>>
MACRO-averaged R: 45.37%
dev2
Confusion matrix:
          he   she    it  they   you there OTHER <-- classified as
      +------------------------------------------+ -SUM-
he    |    1     0     0     0     0     0     2 |     3
she   |    0     0     2     0     2     0     3 |     7
it    |    2     1    43     2     1     8    20 |    77
they  |    7     0     8     8     3     1    19 |    46
you   |    3     0     2     1    11     0     4 |    21
there |    1     1     8     0     0    17     1 |    28
OTHER |    5     0    11     1     5     1    33 |    56
      +------------------------------------------+
-SUM-     19     2    74    12    22    27    82
Accuracy (calculated for the above confusion matrix) = 113/238 = 47.48%
Results for the individual labels:
he : P = 1/ 19 = 5.26% R = 1/ 3 = 33.33% F1 = 9.09%
she : P = 0/ 2 = 0.00% R = 0/ 7 = 0.00% F1 = 0.00%
it : P = 43/ 74 = 58.11% R = 43/ 77 = 55.84% F1 = 56.95%
they : P = 8/ 12 = 66.67% R = 8/ 46 = 17.39% F1 = 27.59%
you : P = 11/ 22 = 50.00% R = 11/ 21 = 52.38% F1 = 51.16%
there : P = 17/ 27 = 62.96% R = 17/ 28 = 60.71% F1 = 61.82%
OTHER : P = 33/ 82 = 40.24% R = 33/ 56 = 58.93% F1 = 47.83%
Micro-averaged result: P = 113/ 238 = 47.48% R = 113/ 238 = 47.48% F1 = 47.48%
MACRO-averaged result: P = 40.46% R = 39.80% F1 = 36.35%
<<< II. OFFICIAL SCORE >>>
MACRO-averaged R: 39.80%
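The macro-averaged F1 (36.35%) is consistent with the unweighted mean of the per-label F1 scores, not with the harmonic mean of the macro-averaged P and R (which would be about 40.13%). A sketch that checks this against the dev2 counts above, with illustrative variable names:

```python
# Illustrative sketch: macro F1 as the mean of per-label F1 (es-en dev2 counts).
true_pos = [1, 0, 43, 8, 11, 17, 33]       # diagonal of the dev2 matrix
pred_totals = [19, 2, 74, 12, 22, 27, 82]  # column sums (-SUM- row)
gold_totals = [3, 7, 77, 46, 21, 28, 56]   # row sums (-SUM- column)


def f1(tp, pred, gold):
    """Per-label F1 in percent, using F1 = 2*TP / (predicted + gold); 0 if TP is 0."""
    return 100.0 * 2 * tp / (pred + gold) if tp else 0.0


per_label_f1 = [f1(t, p, g) for t, p, g in zip(true_pos, pred_totals, gold_totals)]
mean_of_f1 = sum(per_label_f1) / len(per_label_f1)

# For comparison: harmonic mean of the macro-averaged P and R.
macro_p = sum(100.0 * t / p for t, p in zip(true_pos, pred_totals)) / len(true_pos)
macro_r = sum(100.0 * t / g for t, g in zip(true_pos, gold_totals)) / len(true_pos)
f1_of_means = 2 * macro_p * macro_r / (macro_p + macro_r)

print(f"mean of per-label F1 = {mean_of_f1:.2f}%")
print(f"F1 of macro P and R  = {f1_of_means:.2f}%")
```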