Brainstorm annotations - rbawden/discomt17-pronouns GitHub Wiki

Brainstorm annotations

  1. Matter to address: is there a placeholder for zero pronouns in Es?

Syntax ideas:

  1. A GRU that takes a sequence of POS tags, where the i-th label is the POS of the head of token i. E.g., if token i is a DET, its head will usually be a NOUN.
  2. Lexical info about the governor, starting with the word itself.
  3. The dependency label of the word (especially if we can have a dedicated feature for the pronoun itself).
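The three syntax features above could be read off a dependency parse roughly as follows. This is a minimal sketch, assuming each token carries a 1-based head index (0 for the root), a coarse POS tag and a dependency label, as in CoNLL-style output. The `Token` type, the function names and the example sentence are hypothetical, not part of any existing code in this project.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token record, loosely following CoNLL-style columns."""
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    upos: str     # coarse POS tag
    head: int     # 1-based index of the head, 0 for the root
    deprel: str   # dependency label


def head_pos_sequence(sent: List[Token], root_label: str = "ROOT") -> List[str]:
    """Idea 1: for each token i, the POS of the head of i (GRU input sequence)."""
    return [root_label if t.head == 0 else sent[t.head - 1].upos for t in sent]


def governor_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Ideas 2 and 3: governor word, governor POS and dependency label of token i."""
    tok = sent[i - 1]
    gov = None if tok.head == 0 else sent[tok.head - 1]
    return {
        "deprel": tok.deprel,
        "gov_form": gov.form if gov else "<root>",
        "gov_pos": gov.upos if gov else "ROOT",
    }


# Toy, hand-annotated example (an assumption, not real parser output).
sent = [
    Token(1, "It", "PRON", 2, "nsubj"),
    Token(2, "broke", "VERB", 0, "root"),
    Token(3, "the", "DET", 4, "det"),
    Token(4, "window", "NOUN", 2, "obj"),
]

print(head_pos_sequence(sent))     # ['VERB', 'ROOT', 'NOUN', 'VERB']
print(governor_features(sent, 1))  # features for the pronoun "It"
```

The head-POS sequence would still need to be mapped to integer ids before being fed to a GRU; that step depends on the toolkit and is left out here.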

Impersonals:

  1. Detecting impersonals. There is a tool called Nada, which works for En. There is also a tool for Fr.

Animacy:

  1. Could be fished out from WordNet(s).
  2. Certain verbs take animate subjects, so the governing verb can be a cue for animacy.

Gender:

  1. Hypothesis: If we profile the gender of nouns, we can improve the translation of the pronouns that refer to them.
  2. Es, Fr and De have grammatical gender (2, 2 and 3 genders respectively). English has gendered pronouns that refer to people and boats. We can retrieve the gender from a finer-grained POS tag, a lexicon, or a suffix model.
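One way the lexicon and suffix-model options could be combined is sketched below: look the noun up in a gender lexicon first, and fall back to the majority gender of its longest known suffix. This is only an illustration under stated assumptions. The function names are hypothetical, and the six-word French lexicon is toy data made up for the example, not a real resource.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_suffix_model(lexicon: Dict[str, str], max_len: int = 3) -> Dict[str, Counter]:
    """Count how often each word-final character n-gram occurs with each gender."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for noun, gender in lexicon.items():
        for n in range(1, min(max_len, len(noun)) + 1):
            counts[noun[-n:]][gender] += 1
    return counts


def guess_gender(noun: str, lexicon: Dict[str, str],
                 counts: Dict[str, Counter], max_len: int = 3) -> Optional[str]:
    """Lexicon lookup first, then back off from the longest to the shortest suffix."""
    if noun in lexicon:
        return lexicon[noun]
    for n in range(min(max_len, len(noun)), 0, -1):
        seen = counts.get(noun[-n:])
        if seen:
            return seen.most_common(1)[0][0]
    return None  # no evidence at all


# Toy lexicon (assumption): a real one would come from a finer POS tagset or a dictionary.
toy_lexicon = {
    "maison": "f", "nation": "f", "station": "f",
    "garage": "m", "fromage": "m", "village": "m",
}
suffix_counts = train_suffix_model(toy_lexicon)

print(guess_gender("question", toy_lexicon, suffix_counts))  # f
print(guess_gender("voyage", toy_lexicon, suffix_counts))    # m
```

Ties between genders for a suffix are broken arbitrarily here; a real model would want smoothing or a confidence threshold.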

TODO: Dependency parsing (HM) for En, De, Fr, Es

  1. Modify treebanks so that they only use lemmas and not token-lemma pairs.
  2. Train dep parsers for all languages.
  3. Train POS taggers for all languages.
  4. Tag all languages.
  5. Parse all languages.
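Step 1 could look something like the sketch below. It assumes the treebanks are in a CoNLL-style tab-separated format where column 2 is the word form and column 3 is the lemma, and it reads "only use lemmas" as overwriting the form with the lemma. Both points are assumptions, and the function name and sample lines are hypothetical.

```python
from typing import Iterable, List


def forms_to_lemmas(lines: Iterable[str]) -> List[str]:
    """Overwrite the FORM column (2nd) with the LEMMA column (3rd) on token lines."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Keep sentence breaks and comment lines untouched.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        # Only rewrite when a lemma is actually present.
        if len(cols) >= 3 and cols[2] != "_":
            cols[1] = cols[2]
        out.append("\t".join(cols))
    return out


# Toy input (assumption): hand-written lines, not taken from a real treebank.
sample = [
    "# text = Les chats dorment",
    "1\tLes\tle\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tchats\tchat\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tdorment\tdormir\tVERB\t_\t_\t0\troot\t_\t_",
    "",
]

for row in forms_to_lemmas(sample):
    print(row)
```

Tokens without a lemma (`_`) keep their original form, so the parser never sees an empty word column.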