Brainstorm annotations (rbawden/discomt17-pronouns GitHub wiki)
Matters to address: is there a placeholder for zero pronouns in Es?
Syntax ideas:
A GRU that takes a sequence of POS tags, where the i-th label is the POS of the head of token i. E.g. if token i is a DET, its head will typically be a NOUN.
Lexical information about the governor, starting with the word itself.
The dependency label of the word (especially if we can have a dedicated feature for the pronoun itself).
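The three syntax ideas above can all be read off a dependency parse. The sketch below assumes the parse is available as CoNLL-U-style rows (form, lemma, UPOS, head index, dependency label). The `Token` class, the `ROOT` placeholder for the root's missing governor, and the function names are hypothetical. It only builds the per-token features, that is, the head-POS sequence that would be fed to the GRU, the governor's word and lemma, and the token's own label. The GRU itself (an embedding layer over the tags followed by a recurrent layer) is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    """One row of a dependency-parsed sentence (a subset of CoNLL-U columns)."""
    form: str
    lemma: str
    upos: str
    head: int    # 1-based index of the governor; 0 means the token is the root
    deprel: str


def syntax_features(sentence: List[Token]) -> List[Dict[str, str]]:
    """For each token, collect head POS, governor form/lemma and dependency label."""
    feats = []
    for tok in sentence:
        if tok.head == 0:
            # The root has no governor; use hypothetical placeholder symbols.
            head_pos, gov_form, gov_lemma = "ROOT", "<root>", "<root>"
        else:
            gov = sentence[tok.head - 1]
            head_pos, gov_form, gov_lemma = gov.upos, gov.form, gov.lemma
        feats.append({
            "head_pos": head_pos,    # i-th label of the GRU input sequence
            "gov_form": gov_form,    # lexical info of the governor: the word itself
            "gov_lemma": gov_lemma,
            "deprel": tok.deprel,    # dependency label of the token itself
        })
    return feats


def head_pos_sequence(sentence: List[Token]) -> List[str]:
    """The sequence of head POS tags that would be fed to the GRU."""
    return [f["head_pos"] for f in syntax_features(sentence)]


# "The dog saw it": DET -> NOUN, NOUN -> VERB, VERB is the root, PRON -> VERB.
sent = [
    Token("The", "the", "DET", 2, "det"),
    Token("dog", "dog", "NOUN", 3, "nsubj"),
    Token("saw", "see", "VERB", 0, "root"),
    Token("it", "it", "PRON", 3, "obj"),
]
print(head_pos_sequence(sent))  # ['NOUN', 'VERB', 'ROOT', 'VERB']
```

For the pronoun "it", the dedicated feature would be `syntax_features(sent)[3]`, which pairs its own label (`obj`) with its governor's lemma (`see`).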
Impersonals:
Detecting impersonals. For En there is a tool called NADA, which detects non-referential "it". There is also a tool for Fr.
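As a point of comparison for those tools, a rule-based baseline for English impersonal "it" can be sketched as below. This is a toy heuristic and not how NADA works. The cue lists (weather verbs, raising verbs such as "seems", and the "it is ADJ that/to" extraposition pattern) are hypothetical and hand-picked, and the check looks only at the next three tokens without POS tags.

```python
from typing import List

# Hypothetical, hand-picked cue lists; a real detector would learn such cues.
WEATHER_VERBS = {"rain", "rains", "raining", "rained",
                 "snow", "snows", "snowing", "snowed"}
RAISING_VERBS = {"seems", "seemed", "appears", "appeared", "happens", "happened"}
COPULAS = {"is", "was", "'s"}


def is_impersonal_it(tokens: List[str], i: int) -> bool:
    """Heuristically decide whether tokens[i] is a non-referential 'it'."""
    if tokens[i].lower() != "it":
        return False
    nxt = [t.lower() for t in tokens[i + 1:i + 4]]  # up to three following tokens
    if not nxt:
        return False
    # Weather constructions: "it rains", "it is raining".
    if nxt[0] in WEATHER_VERBS:
        return True
    if nxt[0] in COPULAS and len(nxt) > 1 and nxt[1] in WEATHER_VERBS:
        return True
    # Raising verbs: "it seems that ...", "it appears ...".
    if nxt[0] in RAISING_VERBS:
        return True
    # Extraposition approximated without POS tags: "it is ADJ that/to ...".
    if nxt[0] in COPULAS and len(nxt) == 3 and nxt[2] in {"that", "to"}:
        return True
    return False


print(is_impersonal_it("It seems that the report is late".split(), 0))  # True
print(is_impersonal_it("I bought it yesterday".split(), 2))  # False
```

The extraposition rule will misfire on sentences such as "it was easy to read" when "it" refers to a book, which is the kind of ambiguity a trained detector is meant to resolve.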
Animacy:
Could be fished out from WordNet(s).
Some verbs (almost) only take animate subjects, so nouns that appear as their subjects can be marked as animate.
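Both animacy sources above can be sketched in a few lines. The first function walks a hypernym graph and returns animate if it reaches an animate root. Here the graph is a hypothetical toy dictionary standing in for WordNet; with NLTK's WordNet interface, `Synset.hypernyms()` could supply the edges instead. The second function collects nouns that occur as subjects of a hypothetical seed list of animate-selecting verbs, given (subject lemma, verb lemma) pairs such as those from `nsubj` arcs in a parse.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy hypernym graph standing in for WordNet.
HYPERNYMS: Dict[str, List[str]] = {
    "teacher": ["educator"],
    "educator": ["person"],
    "dog": ["canine"],
    "canine": ["animal"],
    "table": ["furniture"],
    "furniture": ["artifact"],
}
ANIMATE_ROOTS = {"person", "animal"}

# Hypothetical seed list of verbs that select animate subjects.
ANIMATE_SUBJECT_VERBS = {"say", "think", "believe", "decide"}


def animate_by_hypernyms(noun: str) -> bool:
    """True if some hypernym path from `noun` reaches an animate root."""
    stack, seen = [noun], set()
    while stack:
        node = stack.pop()
        if node in ANIMATE_ROOTS:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(HYPERNYMS.get(node, []))
    return False


def animate_by_verbs(pairs: Iterable[Tuple[str, str]], min_count: int = 1) -> Set[str]:
    """Nouns seen at least `min_count` times as subjects of animate-selecting verbs."""
    counts: Dict[str, int] = {}
    for subj, verb in pairs:
        if verb in ANIMATE_SUBJECT_VERBS:
            counts[subj] = counts.get(subj, 0) + 1
    return {noun for noun, c in counts.items() if c >= min_count}


print(animate_by_hypernyms("teacher"))  # True
print(animate_by_hypernyms("table"))  # False
pairs = [("committee", "decide"), ("committee", "say"), ("table", "fall")]
print(animate_by_verbs(pairs, min_count=2))  # {'committee'}
```

The verb-based evidence is corpus-driven, so it can flag nouns such as "committee" that behave as animate even if their hypernym path does not reach an animate root; `min_count` is a simple guard against parser noise.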
Gender:
Hypothesis: If we profile the gender of nouns, we can improve the translation of the pronouns that refer to them.
Es, Fr and De have grammatical gender (2, 2 and 3 genders respectively). English has gendered pronouns that refer to people and boats. We can retrieve the gender from a finer-grained POS tag, a lexicon, or a suffix model.
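The three gender sources can be combined with the lexicon first and the suffix model as fallback. The sketch below assumes that fine-grained tags come as Universal Dependencies FEATS strings (e.g. `Gender=Fem|Number=Sing`). The suffix model simply backs off from the longest suffix seen in a training lexicon to shorter ones. The Spanish toy lexicon, the `max_suffix=4` limit and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def gender_from_feats(feats: str) -> Optional[str]:
    """Read Gender from a UD FEATS string such as 'Gender=Fem|Number=Sing'."""
    for item in feats.split("|"):
        key, _, value = item.partition("=")
        if key == "Gender":
            return value
    return None


class SuffixGenderModel:
    """Predict noun gender from the longest suffix seen in a training lexicon."""

    def __init__(self, max_suffix: int = 4):
        self.max_suffix = max_suffix
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, lexicon: Iterable[Tuple[str, str]]) -> "SuffixGenderModel":
        for noun, gender in lexicon:
            noun = noun.lower()
            for k in range(1, min(self.max_suffix, len(noun)) + 1):
                self.counts[noun[-k:]][gender] += 1
        return self

    def predict(self, noun: str) -> Optional[str]:
        noun = noun.lower()
        # Back off from the longest suffix to the shortest; majority gender wins.
        for k in range(min(self.max_suffix, len(noun)), 0, -1):
            dist = self.counts.get(noun[-k:])
            if dist:
                return dist.most_common(1)[0][0]
        return None


def predict_gender(noun: str, lexicon: Dict[str, str],
                   model: SuffixGenderModel) -> Optional[str]:
    """Lexicon lookup first, suffix model as fallback."""
    return lexicon.get(noun.lower()) or model.predict(noun)


train = [("casa", "Fem"), ("mesa", "Fem"), ("libro", "Masc"),
         ("perro", "Masc"), ("canción", "Fem"), ("nación", "Fem")]
model = SuffixGenderModel().fit(train)
print(model.predict("ventana"), model.predict("gato"))  # Fem Masc
print(model.predict("camión"))  # Fem (wrong: camión is masculine)
print(predict_gender("camión", {"camión": "Masc"}, model))  # Masc
```

The `camión` case shows why the lexicon should take precedence: suffix regularities such as "-ión" being feminine have exceptions that only a lexicon or a fine-grained tag will catch.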
TODO: Dependency parsing (HM) for En, De, Fr, Es
Modify treebanks so that they only use lemmas and not token-lemma pairs.
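One way to read the treebank TODO above is to overwrite the FORM column with the LEMMA column, so that a parser trained on the result only ever sees lemmas. The sketch below assumes the treebanks are in CoNLL-U format (10 tab-separated columns). Comments, blank sentence separators, multiword-token ranges such as `1-2` and empty nodes such as `8.1` pass through unchanged. It also keeps the original form wherever the lemma is unspecified (`_`). The function name is hypothetical.

```python
from typing import Iterable, Iterator


def lemmatize_conllu(lines: Iterable[str]) -> Iterator[str]:
    """Yield CoNLL-U lines with FORM (column 2) replaced by LEMMA (column 3)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line  # sentence separators and comments pass through
            continue
        cols = line.split("\t")
        # Keep multiword-token ranges, empty nodes and malformed rows as-is.
        if "-" in cols[0] or "." in cols[0] or len(cols) != 10:
            yield line
            continue
        if cols[2] != "_":  # only replace when a lemma is actually given
            cols[1] = cols[2]
        yield "\t".join(cols)


sample = [
    "# text = Los perros ladran",
    "1\tLos\tel\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tperros\tperro\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tladran\tladrar\tVERB\t_\t_\t0\troot\t_\t_",
    "",
]
for out_line in lemmatize_conllu(sample):
    print(out_line)
```

Applied to a file, `lemmatize_conllu(open(path, encoding="utf-8"))` streams the converted lines, which can be written back out with a trailing newline each.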