Dependency parse derivation from treebank data - UCDenver-ccp/CRAFT GitHub Wiki
The dependency parses distributed as part of the CRAFT corpus are automatically derived from the treebank data. For reproducibility and ease of understanding, this wiki page details the derivation process.
The methodology used for deriving dependency parses from treebank data is described in:
Choi, J. D., & Palmer, M. (2012). Guidelines for the Clear style constituent to dependency conversion. Technical Report 01-12.
A Java implementation of the conversion methodology is available here, and code that uses that Java implementation to do the CRAFT treebank-to-dependency conversion is available here.
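Conceptually, constituent-to-dependency conversion selects a head child for each phrase using head-finding rules. It then makes the head word of every other child a dependent of that phrase's head word. The Python sketch below illustrates only this core idea. It assumes a hypothetical Tree class and a toy HEAD_RULES table. It is not the Clear-style procedure, whose rules (described in Choi & Palmer, 2012) are far more detailed and also assign dependency labels.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Tree:
    label: str                          # phrase label (e.g. "NP") or POS tag on leaves
    children: List["Tree"] = field(default_factory=list)
    word: Optional[str] = None          # set only on leaves
    index: int = 0                      # 1-based token position, set only on leaves


# Hypothetical, heavily simplified head rules: for each phrase label, the search
# direction and a priority list of child labels. NOT the Clear-style rules.
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VBP", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def find_head_child(node: Tree) -> Tree:
    direction, priorities = HEAD_RULES.get(node.label, ("left", []))
    kids = node.children if direction == "left" else node.children[::-1]
    for wanted in priorities:
        for kid in kids:
            if kid.label == wanted:
                return kid
    return kids[0]  # fallback: first child in the search direction


def convert(node: Tree, arcs: Dict[int, int]) -> int:
    """Return the head token index of `node`, recording dependent -> head arcs."""
    if node.word is not None:
        return node.index
    head_child = find_head_child(node)
    head = convert(head_child, arcs)
    for kid in node.children:
        if kid is not head_child:
            dependent = convert(kid, arcs)  # head word of the non-head child
            arcs[dependent] = head          # ...attaches to the phrase's head word
    return head


def to_dependencies(root: Tree) -> Dict[int, int]:
    arcs: Dict[int, int] = {}
    sentence_head = convert(root, arcs)
    arcs[sentence_head] = 0  # the sentence head attaches to ROOT (index 0)
    return arcs


def leaf(tag: str, word: str, index: int) -> Tree:
    return Tree(tag, word=word, index=index)


# (S (NP (NNS Cells)) (VP (VBP divide) (PP (IN in) (NP (NN culture)))))
tree = Tree("S", [
    Tree("NP", [leaf("NNS", "Cells", 1)]),
    Tree("VP", [leaf("VBP", "divide", 2),
                Tree("PP", [leaf("IN", "in", 3),
                            Tree("NP", [leaf("NN", "culture", 4)])])]),
])
print(sorted(to_dependencies(tree).items()))  # [(1, 2), (2, 0), (3, 2), (4, 3)]
```

For the toy sentence "Cells divide in culture", the verb divide becomes the root. The preposition in heads culture because the toy PP rule picks IN as the head. A different rule table would yield a different analysis.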
The Clojure Boot script included in the CRAFT distribution can perform the treebank-to-dependency conversion. Instructions for running the conversion are below. Running the conversion requires:
- Java 8 (or better)
- Clojure Boot (see installation instructions here)
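Java 8 reports its version in the legacy 1.x form (e.g. 1.8.0_292), while Java 9 and later report the major version first (e.g. 11.0.2). This can make the Java 8+ requirement easy to misread. The following is a minimal Python sketch for checking it. It assumes that java is on the PATH and that java -version prints a quoted version string, as Oracle and OpenJDK builds commonly do. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Optional


def java_major_version(version: str) -> int:
    """Map a Java version string to its major version number.

    Java 8 and earlier use the legacy "1.x" scheme (e.g. "1.8.0_292" -> 8);
    Java 9 and later report the major version first (e.g. "11.0.2" -> 11).
    """
    parts = re.split(r"[._+-]", version)
    major = int(parts[0])
    if major == 1 and len(parts) > 1:
        major = int(parts[1])
    return major


def installed_java_version() -> Optional[str]:
    """Return the quoted version string from `java -version`, or None if unavailable."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # `java -version` conventionally writes to stderr, e.g. openjdk version "11.0.2" 2019-01-15
    match = re.search(r'version "([^"]+)"', result.stderr + result.stdout)
    return match.group(1) if match else None


if __name__ == "__main__":
    found = installed_java_version()
    if found is None:
        print("Java not found on PATH")
    elif java_major_version(found) >= 8:
        print(f"Java {found} satisfies the Java 8+ requirement")
    else:
        print(f"Java {found} is older than Java 8")
```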
Clojure Boot is a command-line utility that relies on the build.boot file in the base directory of the CRAFT distribution. Make sure you are in that base directory (where build.boot is located) whenever you run a boot command. If you see an error message containing java.lang.IllegalArgumentException: No such task ([TASK_NAME]), you may not be in the correct directory, or [TASK_NAME] may contain a typo.
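If you launch boot from a script, one way to avoid the "No such task" error is to locate the directory containing build.boot first and run boot from there. The following is a minimal Python sketch. It assumes that boot is on the PATH and that no other build.boot file sits between the working directory and the distribution's base directory. find_craft_root and run_boot_task are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional


def find_craft_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains build.boot."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "build.boot").is_file():
            return candidate
    return None


def run_boot_task(task: str, start: Optional[Path] = None) -> None:
    """Run `boot <task>` from the CRAFT base directory (assumed to hold build.boot)."""
    root = find_craft_root(start or Path.cwd())
    if root is None:
        sys.exit("build.boot not found; run this from inside the CRAFT distribution")
    # cwd=root mirrors the advice to stay in the base directory when running boot.
    subprocess.run(["boot", task], cwd=root, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    run_boot_task(sys.argv[1])
```

Saved under a hypothetical name such as run_boot.py, it would be invoked as python run_boot.py treebank-to-dependency from anywhere inside the distribution.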
To run the conversion, enter the following command from the base directory of the distribution:
boot treebank-to-dependency
Note that running this command will overwrite the dependency files that are part of the CRAFT distribution in structural-annotation/dependency/conllu/.
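Because the command overwrites the distributed files, you may want to copy structural-annotation/dependency/conllu/ elsewhere beforehand and sanity-check the regenerated output afterwards. The sketch below reads CoNLL-U text and checks that every HEAD value is 0 or an existing token ID. In that format, each token occupies one line with ten tab-separated columns, comment lines begin with #, and a blank line follows each sentence. The Token class, the function names, and the assumption that the files end in .conllu are illustrative rather than taken from the CRAFT code.

```python
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, List


@dataclass
class Token:
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def read_conllu(text: str) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence from CoNLL-U text."""
    sentence: List[Token] = []
    for line in text.splitlines():
        if not line.strip():            # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):        # sentence-level comments
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue                    # skip multiword-token ranges and empty nodes
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        sentence.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def heads_are_valid(sentence: List[Token]) -> bool:
    """True if every HEAD is 0 (root) or the ID of a token in the same sentence."""
    ids = {t.id for t in sentence}
    return all(t.head == 0 or t.head in ids for t in sentence)


if __name__ == "__main__":
    conllu_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "structural-annotation/dependency/conllu")
    if not conllu_dir.is_dir():
        print(f"{conllu_dir} not found")
    else:
        for path in sorted(conllu_dir.glob("*.conllu")):
            sentences = list(read_conllu(path.read_text(encoding="utf-8")))
            bad = sum(not heads_are_valid(s) for s in sentences)
            print(f"{path.name}: {len(sentences)} sentences, {bad} with dangling heads")
```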