Dependency parse derivation from treebank data - UCDenver-ccp/CRAFT GitHub Wiki

The dependency parses distributed as part of the CRAFT corpus are automatically derived from the treebank data. This wiki page documents the derivation process so that it can be understood and reproduced.

The methodology used for deriving dependency parses from treebank data is described in:
Choi, J. D., & Palmer, M. (2012). Guidelines for the clear style constituent to dependency conversion. Technical Report 01–12. [link]

A Java implementation of the conversion methodology is available here, and code that uses that Java implementation to do the CRAFT treebank-to-dependency conversion is available here.

The Clojure Boot script included in the CRAFT distribution can run the treebank-to-dependency conversion. Instructions for running the conversion are below.

System requirements (please install before proceeding)

  • Java 8 (or later)
  • Clojure Boot (see installation instructions here)
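Before proceeding, it can help to confirm that both tools are on your PATH. The sketch below is a hypothetical helper, not part of the CRAFT distribution: the function name missing_tools is an assumption made for illustration. It only checks that each named command can be found; it does not verify that the installed Java is version 8 or later.

```shell
# Hypothetical helper (not part of the CRAFT distribution).
# Prints the name of every command given as an argument that cannot be
# found, and returns non-zero if at least one is missing.
missing_tools() {
  missing_count=0
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing_count=$((missing_count + 1))
    fi
  done
  [ "$missing_count" -eq 0 ]
}
```

For example, `missing_tools java boot` prints nothing and succeeds when both commands are installed; otherwise it lists the ones that still need to be installed.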

Important: run boot from the base directory of the distribution

Clojure Boot is a command-line utility that reads the build.boot file in the base directory of the CRAFT distribution. Run every boot command from that directory (the one that contains build.boot). If you see an error message containing java.lang.IllegalArgumentException: No such task ([TASK_NAME]), you may be in the wrong directory, or [TASK_NAME] may be mistyped.
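One way to guard against running boot from the wrong directory is a small wrapper that checks for build.boot first. This is a minimal sketch under stated assumptions: the function name craft_boot is hypothetical and is not provided by the CRAFT distribution or by Boot itself.

```shell
# Hypothetical wrapper (not part of the CRAFT distribution).
# Refuses to call boot unless build.boot exists in the current directory,
# then passes all arguments through to boot unchanged.
craft_boot() {
  if [ ! -f build.boot ]; then
    echo "build.boot not found in $(pwd); change to the CRAFT base directory first" >&2
    return 1
  fi
  boot "$@"
}
```

With this function defined in your shell session, `craft_boot treebank-to-dependency` behaves like the plain boot command when run from the base directory, and fails early with a clearer message anywhere else.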

Example: convert the CRAFT treebank files to CoNLL-U dependency parse files

boot treebank-to-dependency

Note that running this command will overwrite the dependency files that are part of the CRAFT distribution in structural-annotation/dependency/conllu/.
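Because the conversion overwrites the distributed files, you may want to copy them aside first. The sketch below is a hypothetical helper, not part of the CRAFT distribution: the function name backup_conllu and the default backup location (the conllu directory name with .orig appended) are assumptions made for illustration. It must be run from the base directory of the distribution.

```shell
# Hypothetical helper (not part of the CRAFT distribution).
# Copies structural-annotation/dependency/conllu to a backup directory.
# Optional first argument: backup destination (default: <source>.orig).
backup_conllu() {
  conllu_src="structural-annotation/dependency/conllu"
  conllu_dest="${1:-${conllu_src}.orig}"
  if [ ! -d "$conllu_src" ]; then
    echo "$conllu_src not found; run this from the CRAFT base directory" >&2
    return 1
  fi
  # Never clobber an earlier backup
  if [ -e "$conllu_dest" ]; then
    echo "$conllu_dest already exists; refusing to overwrite it" >&2
    return 1
  fi
  cp -R "$conllu_src" "$conllu_dest"
}
```

After calling backup_conllu and then running boot treebank-to-dependency, a recursive comparison such as `diff -r structural-annotation/dependency/conllu.orig structural-annotation/dependency/conllu` shows whether the regenerated files differ from the distributed ones.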
