Evaluation

How is a news-like corpus evaluated?

Each test item is an analogy question of this form:

<T1> to <T2> is like <T3> to <?>

For example:

Athens to Greece is like Baghdad to ?

The expected answer is Iraq.

In symbols, with T4 standing for the missing term <?>, this is written as:

T4 = T3 + (T2 - T1)
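
As a rough illustration of the formula above, the following sketch solves one analogy by computing T3 + (T2 - T1) and picking the closest remaining term by cosine similarity. The 2-D vectors are invented toy values, not trained embeddings, and the names `vectors`, `cosine` and `solve_analogy` are hypothetical helpers introduced only for this example.

```python
import math

# Toy 2-D vectors, invented for illustration only (not trained embeddings).
vectors = {
    "Athens": [1.0, 0.0],
    "Greece": [1.0, 1.0],
    "Baghdad": [3.0, 0.0],
    "Iraq": [3.0, 1.0],
    "Cairo": [5.0, 0.0],
    "Egypt": [5.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def solve_analogy(t1, t2, t3, vectors):
    """Return the term nearest to T3 + (T2 - T1), skipping the query terms."""
    target = [c + (b - a) for a, b, c in zip(vectors[t1], vectors[t2], vectors[t3])]
    candidates = (t for t in vectors if t not in (t1, t2, t3))
    return max(candidates, key=lambda t: cosine(vectors[t], target))


answer = solve_analogy("Athens", "Greece", "Baghdad", vectors)
print(answer)  # prints "Iraq" for these toy vectors
```

Excluding the three query terms from the candidates is a common convention in analogy tests; whether this project does the same is an assumption here.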

The relations covered are listed below.

Relations

Semantic relations:

capital/country - like Amman/Jordan.
country/currency - like Jordan/Dinar.
family members male/female - like brother/sister.
city/state (USA) - like Houston/Texas.

Grammatical relations:

adjective/adverb - like apparent/apparently.
opposite - like aware/unaware.
comparative - like bad/worse and big/bigger.
superlative - like bad/worst and big/biggest.
present-participle - like code/coding.
nationality/adjective - like Albania/Albanian.
past-tense - like decreasing/decreased.
plural - like bird/birds.
plural-verbs - like eat/eats.

Format

A simple CSV file is enough, with one analogy per line and the columns in this order:

t1,t2,t3,t4

Make sure to cycle the pivot pair (t1, t2), so that every pair takes a turn as the pivot against every other pair:

# Amman as pivot
Amman,Jordan,Baghdad,Iraq
Amman,Jordan,Cairo,Egypt
# Baghdad as pivot
Baghdad,Iraq,Amman,Jordan
Baghdad,Iraq,Cairo,Egypt
# Cairo as pivot
Cairo,Egypt,Amman,Jordan
Cairo,Egypt,Baghdad,Iraq
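
The cycled rows above do not have to be written by hand. A minimal sketch, assuming the pairs of one relation are kept in a plain Python list (the names `pairs` and `cycle_pivots` are hypothetical):

```python
import csv
import io
from itertools import permutations

# One relation (capital/country), listed as (term, related term) pairs.
pairs = [("Amman", "Jordan"), ("Baghdad", "Iraq"), ("Cairo", "Egypt")]


def cycle_pivots(pairs):
    """Build t1,t2,t3,t4 rows so each pair is the pivot against every other pair."""
    rows = []
    for (t1, t2), (t3, t4) in permutations(pairs, 2):
        rows.append((t1, t2, t3, t4))
    return rows


rows = cycle_pivots(pairs)

# Write the rows as CSV text; a real script would write to a file instead.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
print(buffer.getvalue(), end="")
```

For n pairs this yields n * (n - 1) rows, so the three pairs here give the same six rows as the listing above, in the same order.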

Relations and Corpus

The evaluation should use terms that occur in the corpus. For example, if the corpus is the Quran, Hadith, or any traditional textbook, we should not expect it to hold semantics that reflect modern political geography, such as city/country relations.
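
One way to act on this is to drop evaluation rows whose terms never occur in the corpus before scoring. The sketch below assumes the corpus vocabulary is available as a set of terms; `split_by_coverage` and the sample `vocabulary` are hypothetical and use made-up contents.

```python
def split_by_coverage(rows, vocabulary):
    """Split rows into those fully covered by the vocabulary and the rest."""
    covered, skipped = [], []
    for row in rows:
        if all(term in vocabulary for term in row):
            covered.append(row)
        else:
            skipped.append(row)
    return covered, skipped


# Hypothetical vocabulary of a traditional text: family terms occur,
# modern country names do not.
vocabulary = {"brother", "sister", "father", "mother", "Amman"}

rows = [
    ("brother", "sister", "father", "mother"),
    ("Amman", "Jordan", "Baghdad", "Iraq"),
]

covered, skipped = split_by_coverage(rows, vocabulary)
print(len(covered), "rows usable,", len(skipped), "rows skipped")
```

Reporting the skipped count next to the accuracy makes it clear how much of the test set the corpus could actually answer.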

Evaluation - ojuba-org/arabic-ml-data GitHub Wiki
