Questions - Corpus Based Classification of Text in Australian Contracts

  1. Is there something suspicious from the methodological point of view in Section 4.2?

  2. Table 1 shows that the accuracy of the hand-coded tagger on Set B is 86.27%.
    2a) The accuracy on Set C is not reported. Could it be higher than 86%? Why?
    2b) Why is the contract-by-contract evaluation (82.84%) worse than the accuracy on Set B (86.27%)? Try to describe at least two possible reasons. (A small numerical sketch of pooled vs. per-contract accuracy follows the question list.)

  3. A common approach to text classification is the so-called bag-of-words model. Guess why it resulted in low accuracy as reported in Section 4.3 (name possible reasons; a minimal bag-of-words sketch follows the question list).

  4. What are the prerequisites for supervised machine learning? In your opinion, when should one use machine learning and when a rule-based (hand-coded) approach? (A toy contrast between the two is sketched after the question list.)
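
Sketch for question 2b — pooled vs. per-contract accuracy. The per-contract counts below are invented for illustration and are not figures from the paper; the point is only that the mean of per-contract accuracies can sit well below the accuracy pooled over all clauses when, for example, a short contract is tagged badly.

```python
# Hypothetical per-contract results (invented, not from the paper):
# each tuple is (correctly tagged clauses, total clauses) for one contract.
contracts = [(95, 100), (90, 100), (6, 10)]

# Pooled accuracy: all clauses from all contracts are counted together.
pooled = sum(correct for correct, _ in contracts) / sum(total for _, total in contracts)

# Contract-by-contract accuracy: score each contract separately, then average.
per_contract = sum(correct / total for correct, total in contracts) / len(contracts)

print(f"pooled accuracy:           {pooled:.2%}")        # 90.95%
print(f"contract-by-contract mean: {per_contract:.2%}")  # 81.67%
```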
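
Sketch for question 3 — what a bag-of-words classifier actually sees. This is a toy scikit-learn example with invented clauses and labels, not the paper's corpus, tag set, or learner; it only illustrates that the representation keeps word counts and throws away order, numbering, and layout.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy clauses and labels for illustration only.
clauses = [
    "The Contractor shall indemnify the Principal against all claims.",
    "This Agreement is governed by the laws of New South Wales.",
    "1.1 'Services' means the services described in Schedule A.",
    "Either party may terminate this Agreement by written notice.",
]
labels = ["obligation", "governing-law", "definition", "termination"]

# CountVectorizer reduces each clause to a vector of word counts: word order,
# clause numbering, indentation and other layout cues are all discarded, which
# is one reason a plain bag-of-words model can struggle on contract text.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(clauses, labels)

print(model.predict(["The Supplier shall indemnify the Customer."]))
```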
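
Sketch for question 4 — a toy contrast between a hand-coded rule and a supervised learner. Both the rule and the labels are invented; the point is the prerequisite: the learner needs labeled training examples (ideally a representative sample of them), while the rule needs hand-crafted expert knowledge instead.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def rule_based_tag(line: str) -> str:
    # Hand-coded rule: a line starting with clause numbering is a "heading".
    return "heading" if re.match(r"^\d+(\.\d+)*\.?\s", line) else "text"

# The supervised route cannot start without (line, label) pairs.
train_lines = ["1. Definitions", "The parties agree as follows.", "2.1 Payment terms"]
train_labels = ["heading", "text", "heading"]

learner = make_pipeline(CountVectorizer(), LogisticRegression())
learner.fit(train_lines, train_labels)

line = "3. Termination"
# The two taggers need not agree on unseen input.
print(rule_based_tag(line), learner.predict([line])[0])
```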