Evaluation - ojuba-org/arabic-ml-data GitHub Wiki
Testing is performed like this: <T1> to <T2> is like <T3> to <?>.
For example, Athens to Greece is like Baghdad to ?; the answer should be Iraq.
In symbols it's written as
T4 = T3 + (T2-T1)
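The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-D vectors are invented toy values (not taken from any trained model), `solve_analogy` is a hypothetical helper name, and cosine similarity is assumed as the closeness measure.

```python
import numpy as np

# Toy 2-D vectors, invented for illustration only (not from a trained model).
vectors = {
    "Athens": np.array([1.0, 0.0]),
    "Greece": np.array([1.0, 1.0]),
    "Baghdad": np.array([3.0, 0.0]),
    "Iraq": np.array([3.0, 1.0]),
    "Cairo": np.array([5.0, 0.0]),
    "Egypt": np.array([5.0, 1.0]),
}


def solve_analogy(t1, t2, t3, vectors):
    """Return the term whose vector is closest to T3 + (T2 - T1)."""
    target = vectors[t3] + (vectors[t2] - vectors[t1])
    best_term, best_score = None, -np.inf
    for term, vec in vectors.items():
        if term in (t1, t2, t3):  # the question terms are not valid answers
            continue
        # Cosine similarity between the target and the candidate vector.
        score = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if score > best_score:
            best_term, best_score = term, score
    return best_term


answer = solve_analogy("Athens", "Greece", "Baghdad", vectors)
print(answer)
```

A test row counts as correct when the returned term equals the expected T4.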
The relations are listed below:
- capital/country - like Amman/Jordan.
- country/currency - like Jordan/Dinar.
- family members male/female - like brother/sister.
- city/state (USA) - like Houston/Texas.
- adjective/adverb - like apparent/apparently.
- opposite - like aware/unaware.
- comparative - like bad/worse and big/bigger.
- superlative - like bad/worst and big/biggest.
- present-participle - like code/coding.
- nationality/adjective - like Albania/Albanian.
- past-tense - like decreasing/decreased.
- plural - like bird/birds.
- plural-verbs - like eat/eats.
A simple CSV file with the columns t1,t2,t3,t4 would do. Make sure to cycle the pivot pair (t1 and t2), so that every pair takes a turn as the pivot:
# Amman as pivot
Amman,Jordan,Baghdad,Iraq
Amman,Jordan,Cairo,Egypt
# Baghdad as pivot
Baghdad,Iraq,Amman,Jordan
Baghdad,Iraq,Cairo,Egypt
# Cairo as pivot
Cairo,Egypt,Amman,Jordan
Cairo,Egypt,Baghdad,Iraq
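Rows like the ones above do not have to be written by hand. The following is a minimal sketch, assuming the pairs of one relation are held in a Python list; `cycle_pivots` is a hypothetical helper name, and the header row follows the t1,t2,t3,t4 column layout.

```python
import csv
import io
from itertools import permutations

# Pairs for one relation (capital/country), taken from the rows above.
pairs = [("Amman", "Jordan"), ("Baghdad", "Iraq"), ("Cairo", "Egypt")]


def cycle_pivots(pairs):
    """Yield (t1, t2, t3, t4) rows, using each pair in turn as the pivot."""
    # permutations(pairs, 2) gives every ordered combination of two distinct pairs.
    for (t1, t2), (t3, t4) in permutations(pairs, 2):
        yield (t1, t2, t3, t4)


rows = list(cycle_pivots(pairs))

# Write the rows as CSV text; an in-memory buffer stands in for a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["t1", "t2", "t3", "t4"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With n pairs in a relation this produces n * (n - 1) rows, so three pairs give the six rows listed above.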
The evaluation should use terms that occur in the corpus. For example, if the corpus is the Quran, Hadith, or any traditional textbook, we should not expect it to hold semantics that reflect modern political geography, such as city/country relations.
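One way to apply this is to drop any test row that contains a term missing from the corpus vocabulary. The sketch below assumes the vocabulary is available as a Python set; `filter_by_vocabulary`, the sample vocabulary, and the sample rows are all hypothetical and serve only as illustration.

```python
def filter_by_vocabulary(rows, vocabulary):
    """Keep only the rows whose four terms all occur in the corpus vocabulary."""
    return [row for row in rows if all(term in vocabulary for term in row)]


# Hypothetical vocabulary and test rows, for illustration only.
vocabulary = {"brother", "sister", "father", "mother", "bird", "birds"}
rows = [
    ("brother", "sister", "father", "mother"),
    ("Amman", "Jordan", "Baghdad", "Iraq"),
]

kept = filter_by_vocabulary(rows, vocabulary)
print(kept)
```

Here the capital/country row is dropped because none of its terms appear in the sample vocabulary, while the family-members row is kept.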