Turkish WordNet KeNet - StarlangSoftware/TurkishWordNet GitHub Wiki

A WordNet is a graph data structure in which the nodes are word senses with their associated lemmas (and collocations, in the case of multiword expressions (MWEs)) and the edges are semantic relations between pairs of senses. Usually, the multiple senses corresponding to a single lemma are enumerated and referenced by number. For example, the triple

$(w_5^2, w_7^3, r_1)$

represents an edge in the WordNet graph and corresponds to a semantic relation $r_1$ between the second sense of the lemma $w_5$ and the third sense of the lemma $w_7$. The direction of the relation is usually implicit in the ordering of the elements of the triple. Synonymy is symmetric, so the ordering does not matter. For hypernymy, by convention, the first sense is a hyponym of the second.
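The triple representation can be sketched in a few lines of Python. This is only an illustration of the idea, not the data structure used by the KeNet library: the names `Sense`, `SenseGraph`, and the relation labels `"hypernym"` and `"synonym"` are assumptions made for this example, and the lemma `w9` is a hypothetical extra node.

```python
from collections import defaultdict
from typing import NamedTuple


class Sense(NamedTuple):
    """A node: one enumerated sense of a lemma (hypothetical type)."""
    lemma: str
    number: int


class SenseGraph:
    """Stores (first sense, second sense, relation) triples as directed edges."""

    # Relations stored in both directions (assumed label for this sketch).
    SYMMETRIC = {"synonym"}

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, first, second, relation):
        # The order of the triple carries the direction: for "hypernym",
        # `first` is a hyponym of `second`.
        self._edges[first].add((relation, second))
        if relation in self.SYMMETRIC:
            self._edges[second].add((relation, first))

    def related(self, sense, relation):
        # Senses reachable from `sense` through one edge labelled `relation`.
        return sorted(t for r, t in self._edges.get(sense, ()) if r == relation)


graph = SenseGraph()
# The triple from the text: second sense of w5, third sense of w7.
graph.add(Sense("w5", 2), Sense("w7", 3), "hypernym")
# A hypothetical synonymy edge, stored in both directions.
graph.add(Sense("w5", 2), Sense("w9", 1), "synonym")
```

Here `graph.related(Sense("w5", 2), "hypernym")` returns the third sense of `w7`, while the same query starting from `w7` returns nothing, because the hypernymy edge is directed. The synonymy edge can be followed from either end.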

The main lexical source for KeNet is the Contemporary Dictionary of Turkish (CDT) (Güncel Türkçe Sözlük), published online and in print by the Turkish Language Institute (TLI) (Türk Dil Kurumu), a government organization. Among other literary and academic works, the TLI publishes specialized and comprehensive dictionaries, which other dictionaries often treat as an authoritative reference. The online version of the CDT contains 65,944 lemmas. Although the TLI publishes a separate dictionary of idioms and proverbs, the CDT still contains some MWE entries that have idiomatic senses.

Data Format

The structure of a sample synset is as follows:

<SYNSET>
	<ID>TUR10-0038510</ID>
	<LITERAL>anne<SENSE>2</SENSE>
	</LITERAL>
	<POS>n</POS>
	<DEF>...</DEF>
	<EXAMPLE>...</EXAMPLE>
</SYNSET>

Each entry in the dictionary is enclosed by `<SYNSET>` and `</SYNSET>` tags. Synset members are represented as literals (`<LITERAL>`) with their sense numbers (`<SENSE>`). `<ID>` shows the unique identifier given to the synset. The `<POS>` and `<DEF>` tags denote part of speech and definition, respectively. The `<EXAMPLE>` tag gives a sample sentence for the synset.
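As a rough illustration of this layout, the sample entry can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not the KeNet library's own loader: the function name `parse_synset` and the keys of the returned dictionary are assumptions made for this example, and the elided `<DEF>` and `<EXAMPLE>` contents are kept as `...`, exactly as in the sample.

```python
import xml.etree.ElementTree as ET

# The sample synset from above, with elided fields left as "...".
SAMPLE = """<SYNSET>
	<ID>TUR10-0038510</ID>
	<LITERAL>anne<SENSE>2</SENSE>
	</LITERAL>
	<POS>n</POS>
	<DEF>...</DEF>
	<EXAMPLE>...</EXAMPLE>
</SYNSET>"""


def parse_synset(xml_text):
    """Parse one <SYNSET> element into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    literals = []
    for literal in root.findall("LITERAL"):
        # The lemma is the text that precedes the nested <SENSE> element.
        lemma = (literal.text or "").strip()
        sense = literal.findtext("SENSE")
        literals.append((lemma, int(sense) if sense is not None else None))
    return {
        "id": root.findtext("ID"),
        "literals": literals,
        "pos": root.findtext("POS"),
        "definition": root.findtext("DEF"),
        "example": root.findtext("EXAMPLE"),  # None if the tag is absent
    }


synset = parse_synset(SAMPLE)
```

For the sample entry, `synset["literals"]` holds the single pair `("anne", 2)`. A synset with several members would simply contain several `<LITERAL>` elements, each of which becomes one pair in the list.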

Cite

If you use this resource in your research, please cite the following papers:

@article{ehsani18,
  title={Constructing a WordNet for {T}urkish Using Manual and Automatic Annotation},
  author={R. Ehsani and E. Solak and O.T. Yildiz},
  journal={ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)},
  volume={17},
  number={3},
  pages={24},
  year={2018},
  publisher={ACM}
}

@inproceedings{bakay19b,
  title={Integrating {T}urkish {W}ord{N}et {K}e{N}et to {P}rinceton {W}ord{N}et: The Case of One-to-Many Correspondences},
  author={Ozge Bakay and Ozlem Ergelen and Olcay Taner Yildiz},
  booktitle={Innovations in Intelligent Systems and Applications},
  year={2019}
}

@inproceedings{bakay19a,
  title={Problems Caused by Semantic Drift in WordNet SynSet Construction},
  author={Ozge Bakay and Ozlem Ergelen and Olcay Taner Yildiz},
  booktitle={International Conference on Computer Science and Engineering},
  year={2019}
}

@inproceedings{ozcelik19,
  title={User Interface for {T}urkish Word Network {K}e{N}et},
  author={Riza Ozcelik and Selen Parlar and Ozge Bakay and Ozlem Ergelen and Olcay Taner Yildiz},
  booktitle={Signal Processing and Communication Applications Conference},
  year={2019}
}