Participle - Potato-W/Poetic_Language GitHub Wiki
Word segmentation is the foundation of NLP, whether the later task is sentiment analysis, content understanding, or anything else. So a good beginning is half the battle.
Modern Chinese word segmentation is already a huge challenge, not to mention Middle Chinese. The meaning of a character (字) inside a word (词) may differ from its meaning when it stands alone. My strategy is as follows:
- first, segment the text with jieba to get most of the words.
- second, use information entropy to find unregistered (out-of-vocabulary) words, which are more common in Tang poems than in modern Chinese.
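
The second step can be sketched roughly as below. This is only a minimal illustration, not the exact code used in this project: it assumes "information entropy" here means the entropy of the characters that appear to the left and right of a candidate string (often called branch entropy), and the names `entropy`, `candidate_words`, `max_len`, `min_count` and `min_entropy` are hypothetical. The first step would simply be a call such as `jieba.lcut(text)`; candidates found below that are missing from that output could then be treated as unregistered words.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a Counter of neighbouring characters."""
    total = sum(counter.values())
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def candidate_words(text, max_len=2, min_count=2, min_entropy=0.5):
    """Return {candidate: score} for character n-grams with varied neighbours.

    The score is the smaller of the left and right neighbour entropies.
    Assumption: `text` has already been stripped of punctuation and line breaks.
    """
    counts = Counter()
    left = defaultdict(Counter)   # characters seen just before each n-gram
    right = defaultdict(Counter)  # characters seen just after each n-gram
    n = len(text)
    for size in range(2, max_len + 1):
        for i in range(n - size + 1):
            gram = text[i:i + size]
            counts[gram] += 1
            if i > 0:
                left[gram][text[i - 1]] += 1
            if i + size < n:
                right[gram][text[i + size]] += 1

    result = {}
    for gram, count in counts.items():
        if count < min_count:
            continue  # too rare to judge
        score = min(entropy(left[gram]), entropy(right[gram]))
        if score >= min_entropy:
            result[gram] = score
    return result


# Toy example: "明月" occurs three times, each time with different neighbours.
sample = "甲明月乙丙明月丁戊明月己"
found = candidate_words(sample)
```

A string whose neighbours vary a lot behaves like an independent unit, so a high score suggests a word. The thresholds are arbitrary here and would need tuning on the actual Tang poem corpus.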