Evaluation

To evaluate the success of our search engine, we formulated five information needs, each with an (admittedly rather crude) query. For every topic, two judges marked each of the top-10 returned results as relevant (R) or non-relevant (N). From these judgments we compute Cohen's kappa as a measure of inter-judge agreement, and the average precision (Pave) for each judge.
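For reference, the two measures follow their standard definitions (nothing project-specific); R below denotes the set of ranks a judge marked relevant:

```latex
% Cohen's kappa: observed agreement P(A) corrected for chance agreement P(E)
\kappa = \frac{P(A) - P(E)}{1 - P(E)}

% Average precision for one judge over the top-10 results:
% the mean of precision@k over the ranks k that the judge marked relevant
P_{ave} = \frac{1}{|R|} \sum_{k \in R} \frac{|\{ r \in R : r \le k \}|}{k}
```

A short script that recomputes the topic 1 figures from these definitions follows the first topic below.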

<topic number="1">
  <query>vluchtelingen vietnam</query>
  <description>Information about Vietnamese refugees in Thailand</description>
</topic>

Judge 1: N R N R R N N N N N
Judge 2: N R N N R N N N N R
Kappa: 0.52
Pave(J1): 0.53
Pave(J2): 0.40
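
As a sanity check, here is a minimal Python sketch (not part of the project code; the judgment strings are copied from topic 1 above) that recomputes these three figures:

```python
def kappa(j1, j2):
    """Cohen's kappa for two parallel R/N judgment sequences."""
    n = len(j1)
    p_agree = sum(a == b for a, b in zip(j1, j2)) / n
    p1 = j1.count("R") / n  # fraction judge 1 marked relevant
    p2 = j2.count("R") / n  # fraction judge 2 marked relevant
    p_chance = p1 * p2 + (1 - p1) * (1 - p2)
    if p_chance == 1:
        return float("nan")  # undefined when chance agreement is total (cf. topic 4)
    return (p_agree - p_chance) / (1 - p_chance)

def average_precision(judgments):
    """Mean of precision@k over the ranks k judged relevant; 0 if none."""
    hits, precisions = 0, []
    for k, label in enumerate(judgments, start=1):
        if label == "R":
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

j1 = "N R N R R N N N N N".split()  # judge 1, topic 1
j2 = "N R N N R N N N N R".split()  # judge 2, topic 1
print(f"Kappa: {kappa(j1, j2):.2f}")             # Kappa: 0.52
print(f"Pave(J1): {average_precision(j1):.2f}")  # Pave(J1): 0.53
print(f"Pave(J2): {average_precision(j2):.2f}")  # Pave(J2): 0.40
```

Substituting the judgment strings of the other topics reproduces their figures as well; for topic 4 the guard in kappa() returns NaN, matching the value reported there.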

<topic number="2">
  <query>tweede wereld oorlog</query>
  <description>Information about the Second World War</description>
</topic>

Judge 1: R R R R R N N N N N
Judge 2: N N N R R N N N N N
Kappa: 0.4
Pave(J1): 1.0
Pave(J2): 0.325

<topic number="3">
  <query>vogeltrek</query>
  <description>Information about bird migration in the Netherlands</description>
</topic>

Judge 1: N R R R R R R N R R
Judge 2: N N R R N N R N R R
Kappa: 0.4
Pave(J1): 0.75
Pave(J2): 0.44

<topic number="4">
  <query>griep</query>
  <description>Information about influenza in the Netherlands</description>
</topic>

Judge 1: N N N N N N N N N N
Judge 2: N N N N N N N N N N
Kappa: NaN (both judges marked all ten results non-relevant, so chance agreement P(E) = 1 and kappa is undefined)
Pave(J1): 0
Pave(J2): 0

<topic number="5">
  <query>terrorisme bestrijding</query>
  <description>Information about counter-terrorism in the Netherlands</description>
</topic>

Judge 1: R R N R N R R R R R
Judge 2: N R N R N R N R R R
Kappa: 0.55
Pave(J1): 0.81
Pave(J2): 0.53

Precision@10
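
Precision@10 is simply the fraction of the top-10 results a judge marked relevant. Derived directly from the judgments listed above:

Topic 1: P@10(J1) = 0.3, P@10(J2) = 0.3
Topic 2: P@10(J1) = 0.5, P@10(J2) = 0.2
Topic 3: P@10(J1) = 0.8, P@10(J2) = 0.5
Topic 4: P@10(J1) = 0.0, P@10(J2) = 0.0
Topic 5: P@10(J1) = 0.8, P@10(J2) = 0.6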
