Baseline system scores (rbawden/discomt17-pronouns wiki)

Baseline systems (KenLM)

en-de

Confusion matrix:

         er   sie    es   man  OTHER <-- classified as
      +------------------------------+ -SUM-
  er  |    4     1     3     1     6 |    15
 sie  |   22    23    18     3    64 |   130
  es  |    4     6    60     0    52 |   122
 man  |    2     0     2     1     3 |     8
OTHER |    8     7    14     2   149 |   180
      +------------------------------+
  -SUM-    40    37    97     7   274 

Accuracy (calculated for the above confusion matrix) = 237/455 = 52.09%

Results for the individual labels:
     er  :    P =     4/   40 =  10.00%     R =     4/   15 =  26.67%     F1 =  14.55%
    sie  :    P =    23/   37 =  62.16%     R =    23/  130 =  17.69%     F1 =  27.54%
     es  :    P =    60/   97 =  61.86%     R =    60/  122 =  49.18%     F1 =  54.79%
    man  :    P =     1/    7 =  14.29%     R =     1/    8 =  12.50%     F1 =  13.33%
   OTHER :    P =   149/  274 =  54.38%     R =   149/  180 =  82.78%     F1 =  65.64%

Micro-averaged result: P = 237/ 455 = 52.09% R = 237/ 455 = 52.09% F1 = 52.09%

MACRO-averaged result: P = 40.54% R = 37.76% F1 = 35.17%

<<< II. OFFICIAL SCORE >>>

MACRO-averaged R: 37.76%
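
The figures above can be re-derived from the confusion matrix alone. The following is a minimal sketch, not the official scorer: the names `ratio` and `score` are hypothetical, and the conventions (rows are gold labels, columns are predictions, a zero denominator yields 0) are assumptions inferred from the output shown on this page.

```python
# Illustrative re-computation of the en-de scores; not the official scorer.
LABELS = ["er", "sie", "es", "man", "OTHER"]

# Rows are gold labels, columns are predicted labels (en-de matrix above).
CONFUSION = [
    [4, 1, 3, 1, 6],
    [22, 23, 18, 3, 64],
    [4, 6, 60, 0, 52],
    [2, 0, 2, 1, 3],
    [8, 7, 14, 2, 149],
]


def ratio(num, den):
    """Return num/den as a percentage, or 0.0 when the denominator is 0."""
    return 100.0 * num / den if den else 0.0


def score(matrix):
    """Compute accuracy and macro-averaged P, R and F1 (all in percent)."""
    n = len(matrix)
    gold_totals = [sum(row) for row in matrix]
    pred_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    correct = sum(matrix[i][i] for i in range(n))

    precision = [ratio(matrix[i][i], pred_totals[i]) for i in range(n)]
    recall = [ratio(matrix[i][i], gold_totals[i]) for i in range(n)]
    # Per-label F1 is the harmonic mean of that label's P and R.
    f1 = [2 * p * r / (p + r) if p + r else 0.0
          for p, r in zip(precision, recall)]

    return {
        "accuracy": ratio(correct, sum(gold_totals)),
        "macro_p": sum(precision) / n,
        "macro_r": sum(recall) / n,
        # Macro F1 is taken as the mean of the per-label F1 values.
        "macro_f1": sum(f1) / n,
    }


scores = score(CONFUSION)
for name, value in scores.items():
    print(f"{name}: {value:.2f}%")
```

Under these assumptions the macro-averaged F1 is the mean of the per-label F1 values, which matches the reported 35.17%. The harmonic mean of macro P and macro R would give a different number.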

de-en

Confusion matrix:

          he   she    it  they   you  this  these there OTHER <-- classified as
       +------------------------------------------------------+ -SUM-
   he  |   12     2     5     3     3     0     0     0     6 |    31
  she  |   11     0     0     1     3     0     0     1     6 |    22
   it  |   12     2    99    10     7     7     0     1    38 |   176
 they  |   13     2     9    22    15     0     0     0    21 |    82
  you  |   16     1     4    10    61     0     0     0    25 |   117
 this  |    2     0     9     2     1     1     0     0     2 |    17
 these |    0     0     0     0     0     0     1     0     1 |     2
 there |    0     0     1     1     0     1     0    32     2 |    37
 OTHER |   21     0     3     1     5     4     2     1   143 |   180
       +------------------------------------------------------+
 -SUM-    87     7   130    50    95    13     3    35   244 

Accuracy (calculated for the above confusion matrix) = 371/664 = 55.87%

Results for the individual labels:
       he  :    P =    12/   87 =  13.79%     R =    12/   31 =  38.71%     F1 =  20.34%
      she  :    P =     0/    7 =   0.00%     R =     0/   22 =   0.00%     F1 =   0.00%
       it  :    P =    99/  130 =  76.15%     R =    99/  176 =  56.25%     F1 =  64.71%
     they  :    P =    22/   50 =  44.00%     R =    22/   82 =  26.83%     F1 =  33.33%
      you  :    P =    61/   95 =  64.21%     R =    61/  117 =  52.14%     F1 =  57.55%
     this  :    P =     1/   13 =   7.69%     R =     1/   17 =   5.88%     F1 =   6.67%
     these :    P =     1/    3 =  33.33%     R =     1/    2 =  50.00%     F1 =  40.00%
     there :    P =    32/   35 =  91.43%     R =    32/   37 =  86.49%     F1 =  88.89%
     OTHER :    P =   143/  244 =  58.61%     R =   143/  180 =  79.44%     F1 =  67.45%

Micro-averaged result: P = 371/ 664 = 55.87% R = 371/ 664 = 55.87% F1 = 55.87%

MACRO-averaged result: P = 43.25% R = 43.97% F1 = 42.10%

<<< II. OFFICIAL SCORE >>>

MACRO-averaged R: 43.97%
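
The micro-averaged P, R and F1 above all equal the accuracy. A short derivation shows why, assuming each instance receives exactly one gold label and one predicted label; the notation $c_{ij}$ (number of instances with gold label $i$ classified as $j$) and $N$ (total number of instances) is introduced here for illustration.

```latex
\[
P_{\text{micro}}
  = \frac{\sum_i c_{ii}}{\sum_j \sum_i c_{ij}}
  = \frac{\sum_i c_{ii}}{N},
\qquad
R_{\text{micro}}
  = \frac{\sum_i c_{ii}}{\sum_i \sum_j c_{ij}}
  = \frac{\sum_i c_{ii}}{N}
\]
\[
F_{1,\text{micro}}
  = \frac{2\, P_{\text{micro}}\, R_{\text{micro}}}{P_{\text{micro}} + R_{\text{micro}}}
  = \frac{\sum_i c_{ii}}{N}
  = \text{Accuracy}
\]
```

For de-en this gives 371/664 = 55.87% for all three, as reported.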

en-fr

Confusion matrix:

          ce  elle  elles   il   ils  cela    on  OTHER <-- classified as
      +------------------------------------------------+ -SUM-
  ce  |   57     0     0     7     0     1     0     7 |    72
elle  |    3    12     0     6     0     2     0     2 |    25
elles |    5     3     0     8     0     2     1     6 |    25
  il  |    6     2     0    43     0     3     5     3 |    62
 ils  |    9     6     0    27     0     1    10    19 |    72
cela  |    2     2     0     6     0    13     1    11 |    35
  on  |    0     0     0     2     0     0     5     2 |     9
OTHER |   18     3     0     4     0     0     1   119 |   145
      +------------------------------------------------+
  -SUM-   100    28     0   103     0    22    23   169 

Accuracy (calculated for the above confusion matrix) = 249/445 = 55.96%

Results for the individual labels:
     ce  :    P =    57/  100 =  57.00%     R =    57/   72 =  79.17%     F1 =  66.28%
   elle  :    P =    12/   28 =  42.86%     R =    12/   25 =  48.00%     F1 =  45.28%
   elles :    P =     0/    0 =   0.00%     R =     0/   25 =   0.00%     F1 =   0.00%
     il  :    P =    43/  103 =  41.75%     R =    43/   62 =  69.35%     F1 =  52.12%
    ils  :    P =     0/    0 =   0.00%     R =     0/   72 =   0.00%     F1 =   0.00%
   cela  :    P =    13/   22 =  59.09%     R =    13/   35 =  37.14%     F1 =  45.61%
     on  :    P =     5/   23 =  21.74%     R =     5/    9 =  55.56%     F1 =  31.25%
   OTHER :    P =   119/  169 =  70.41%     R =   119/  145 =  82.07%     F1 =  75.80%

Micro-averaged result: P = 249/ 445 = 55.96% R = 249/ 445 = 55.96% F1 = 55.96%

MACRO-averaged result: P = 36.61% R = 46.41% F1 = 39.54%

<<< II. OFFICIAL SCORE >>>

MACRO-averaged R: 46.41%
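
In the en-fr results, `elles` and `ils` are never predicted, so their precision is reported as 0/0 = 0.00%. The sketch below checks how that case appears to enter the macro average. It is an illustration only: `precision_per_label` is a hypothetical helper, and treating 0/0 as 0 is an assumption read off the output above.

```python
# Illustrative check of the 0/0 precision convention; not the official scorer.
LABELS = ["ce", "elle", "elles", "il", "ils", "cela", "on", "OTHER"]

# Rows are gold labels, columns are predicted labels (en-fr matrix above).
CONFUSION = [
    [57, 0, 0, 7, 0, 1, 0, 7],
    [3, 12, 0, 6, 0, 2, 0, 2],
    [5, 3, 0, 8, 0, 2, 1, 6],
    [6, 2, 0, 43, 0, 3, 5, 3],
    [9, 6, 0, 27, 0, 1, 10, 19],
    [2, 2, 0, 6, 0, 13, 1, 11],
    [0, 0, 0, 2, 0, 0, 5, 2],
    [18, 3, 0, 4, 0, 0, 1, 119],
]


def precision_per_label(matrix):
    """Per-label precision in percent; a label never predicted scores 0.0."""
    n = len(matrix)
    result = []
    for j in range(n):
        predicted = sum(matrix[i][j] for i in range(n))
        result.append(100.0 * matrix[j][j] / predicted if predicted else 0.0)
    return result


precisions = precision_per_label(CONFUSION)

# Labels whose column sums to zero, i.e. the system never output them.
never_predicted = [label for label, column in zip(LABELS, zip(*CONFUSION))
                   if sum(column) == 0]

# Convention A: never-predicted labels count as 0 in the average.
macro_p = sum(precisions) / len(precisions)

# Convention B (for contrast): never-predicted labels are left out.
kept = [p for label, p in zip(LABELS, precisions)
        if label not in never_predicted]
macro_p_skipping = sum(kept) / len(kept)

print("never predicted:", never_predicted)
print(f"macro P, zeros included: {macro_p:.2f}%")
print(f"macro P, zeros skipped:  {macro_p_skipping:.2f}%")
```

Only the zeros-included average reproduces the reported macro P of 36.61%, which suggests the unpredicted labels are counted as 0 and not dropped.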

es-en

dev1

<<< I. EVALUATION >>>

Confusion matrix:

          he   she    it  they   you  there OTHER <-- classified as
       +------------------------------------------+ -SUM-
   he  |   19     3     4     5     5     2    10 |    48
  she  |    1     0     0     0     0     0     0 |     1
   it  |   12     0    98     5     0     6    23 |   144
 they  |    6     2    11    14     8     1    20 |    62
  you  |    7     2     3     6    29     0     5 |    52
 there |    3     0    14     1     0    35     3 |    56
 OTHER |   25     1    24     9    19     7   190 |   275
       +------------------------------------------+
  -SUM-    73     8   154    40    61    51   251 

Accuracy (calculated for the above confusion matrix) = 385/638 = 60.34%

Results for the individual labels:
       he  :    P =    19/   73 =  26.03%     R =    19/   48 =  39.58%     F1 =  31.40%
      she  :    P =     0/    8 =   0.00%     R =     0/    1 =   0.00%     F1 =   0.00%
       it  :    P =    98/  154 =  63.64%     R =    98/  144 =  68.06%     F1 =  65.77%
     they  :    P =    14/   40 =  35.00%     R =    14/   62 =  22.58%     F1 =  27.45%
      you  :    P =    29/   61 =  47.54%     R =    29/   52 =  55.77%     F1 =  51.33%
     there :    P =    35/   51 =  68.63%     R =    35/   56 =  62.50%     F1 =  65.42%
     OTHER :    P =   190/  251 =  75.70%     R =   190/  275 =  69.09%     F1 =  72.24%

Micro-averaged result: P = 385/ 638 = 60.34% R = 385/ 638 = 60.34% F1 = 60.34%

MACRO-averaged result: P = 45.22% R = 45.37% F1 = 44.80%

<<< II. OFFICIAL SCORE >>>

MACRO-averaged R: 45.37%

dev2

Confusion matrix:

          he   she    it  they   you  there OTHER <-- classified as
       +------------------------------------------+ -SUM-
   he  |    1     0     0     0     0     0     2 |     3
  she  |    0     0     2     0     2     0     3 |     7
   it  |    2     1    43     2     1     8    20 |    77
 they  |    7     0     8     8     3     1    19 |    46
  you  |    3     0     2     1    11     0     4 |    21
 there |    1     1     8     0     0    17     1 |    28
 OTHER |    5     0    11     1     5     1    33 |    56
       +------------------------------------------+
  -SUM-    19     2    74    12    22    27    82 

Accuracy (calculated for the above confusion matrix) = 113/238 = 47.48%

Results for the individual labels:
       he  :    P =     1/   19 =   5.26%     R =     1/    3 =  33.33%     F1 =   9.09%
      she  :    P =     0/    2 =   0.00%     R =     0/    7 =   0.00%     F1 =   0.00%
       it  :    P =    43/   74 =  58.11%     R =    43/   77 =  55.84%     F1 =  56.95%
     they  :    P =     8/   12 =  66.67%     R =     8/   46 =  17.39%     F1 =  27.59%
      you  :    P =    11/   22 =  50.00%     R =    11/   21 =  52.38%     F1 =  51.16%
     there :    P =    17/   27 =  62.96%     R =    17/   28 =  60.71%     F1 =  61.82%
     OTHER :    P =    33/   82 =  40.24%     R =    33/   56 =  58.93%     F1 =  47.83%

Micro-averaged result: P = 113/ 238 = 47.48% R = 113/ 238 = 47.48% F1 = 47.48%

MACRO-averaged result: P = 40.46% R = 39.80% F1 = 36.35%

<<< II. OFFICIAL SCORE >>>

MACRO-averaged R: 39.80%