PORTAGE_sharedTranslatingDecoding - SamuelLarkin/LizzyConversion GitHub Wiki
Up PortageII / Translating Next: RescoringNbestLists
The command canoe -h describes how to run the decoder. Here we assume you have already trained and tuned your models, and that the results of tune.py are available in the model file canoe.ini.cow.
If you are translating French text that is already preprocessed, sentence-split and tokenized, say text_fr.tok, into English, these two commands do the decoding step:
canoe-escapes.pl -add text_fr.tok text_fr.rule
canoe -f canoe.ini.cow < text_fr.rule > text_en.out
See UsingPhrasetablesInCanoe for more information on using multiple phrasetables in canoe.
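The two decoding commands above can be wrapped in a small shell function so that a failed escaping step stops the run before canoe is called. This is only a sketch: decode_tokenized is a hypothetical helper name, not part of PortageII, and it assumes the source file name ends in .tok and that canoe-escapes.pl and canoe are on your PATH.

```shell
# Sketch: run the escaping and decoding steps on one tokenized file.
# decode_tokenized is a hypothetical helper name, not a PortageII tool.
decode_tokenized() {
    src=$1    # tokenized source text, e.g. text_fr.tok
    ini=$2    # tuned decoder configuration, e.g. canoe.ini.cow
    out=$3    # file that receives the translation, e.g. text_en.out
    rule="${src%.tok}.rule"    # text_fr.tok -> text_fr.rule

    # Escape the characters canoe would otherwise read as markup;
    # stop here if escaping fails.
    canoe-escapes.pl -add "$src" "$rule" || return 1

    # Decode: source sentences on stdin, translations on stdout.
    canoe -f "$ini" < "$rule" > "$out"
}
```

With the file names used on this page, the call would be: decode_tokenized text_fr.tok canoe.ini.cow text_en.out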
If the source file contains marked-up text (see MarkedUpText), then any translations specified in the markup are used instead of those provided by the translation model, unless the -bypass-marked switch is given to canoe, in which case the two are combined. (If you use rules, don't use canoe-escapes.pl; your rule-creating software should escape any <, > and \ meant to be interpreted literally.)
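If your rule-creating software has no escaping step of its own, something like the following filter could be applied to the literal text segments before the markup is added. It is a minimal sketch under two assumptions: that escaping means prefixing the character with a backslash, and that the filter only ever sees plain text (running it over the markup itself would escape the markup too). escape_canoe_literals is a hypothetical name.

```shell
# Hypothetical helper: prefix every literal \, < and > with a backslash.
# Assumes backslash-prefix escaping; apply it to plain text only,
# never to lines that already contain rule markup.
escape_canoe_literals() {
    sed -e 's/[\\<>]/\\&/g'
}

# Example:
#   printf '%s\n' 'a < b > c \ d' | escape_canoe_literals
# prints:
#   a \< b \> c \\ d
```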
Running canoe on an untuned model yields very poor results, so don't do it unless you're working on a new optimization technique. The tuning procedure described in OptimizingWeights sets optimized values for all the model weights. The remaining parameters, such as beam-threshold and ttable-limit, generally control the trade-off between accuracy and speed: set them to values that give a good compromise between translation quality and decoding speed for your application.