
Asian Language Resources

List of Language Resources

Multilingual

Simultaneous Interpretation/Translation

News, Web, and General Text

Science, Patent, and Technical Documents

Speech and Dialogue

User-generated Text

  • μtopia - Microblog Translated Posts Parallel Corpus
    • Weibo Corpus: Chinese --> English, Arabic, Russian, Korean, German, French, Spanish, Portuguese, Czech
    • Twitter Corpus: English <--> Chinese, Arabic, Russian, Korean, Japanese
    • Twitter Gold Corpus: English <--> Spanish, French, Russian, Korean, Japanese
    • http://www.cs.cmu.edu/~lingwang/microtopia/
  • MTNT: Machine Translation of Noisy Text
  • PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on User-Generated Contents

Other

Monolingual

Chinese

Korean

Vietnamese

Burmese (Myanmar)

Thai

Khmer