Documentation - mtuoc/mtuoc.github.io GitHub Wiki

Each component has its own documentation in its respective wiki. The links to the documentation for each component are listed below:

  • MTUOC-any2text: Scripts for converting files in several formats to text.
  • MTUOC-TMX2tabtxt: Scripts and programs for the conversion of TMX files into tabbed text files (tsv).
  • MTUOC-SDLTM2tabtxt: Scripts and programs for the conversion of Trados translation memory files (SDLTM) into tabbed text files (tsv).
  • MTUOC-tokenizers: Tokenizers for several languages.
  • MTUOC-segmenter: Scripts and programs to segment text files and corpora.
  • MTUOC-aligner: Scripts and programs to automatically align text files using Hunalign or SBERT.
  • MTUOC-web-downloader: A set of scripts to download a whole website and store it locally.
  • MTUOC-clean-parallel-corpus: A Python script for cleaning parallel corpora.
  • MTUOC-PCorpus-rescorer: A set of programs to rescore parallel corpora.
  • MTUOC-corpus-combination: A Python program to select similar segments from a very large parallel corpus.
  • MTUOC-corpus-preprocessing: MTUOC script for preprocessing corpora to train machine translation systems.
  • MTUOC-training-scripts: Scripts and configuration files for training machine translation systems (Moses, Marian, OpenNMT...).
  • MTUOC-server: A server to start machine translation systems.
  • MTUOC-eval: An easy-to-use program for machine translation evaluation using automatic metrics.
  • PosEdiOn: A set of programs to perform translation and post-editing experiments and calculate post-editing effort indicators.
  • MTUOC-translator: A set of programs to translate files with the MTUOC server.
  • MTUOC-editor: TO DO
  • MTUOC-novaIEC: A simple script and data file to change Catalan texts from the old orthography to the new orthography. All changes have been taken from ORTOGRAFIA Modificacions entrades DIEC2.
  • MTUOC-Trados-plugin: A plug-in for Trados to use MTUOC machine translation servers.
  • MTUOC: General files and scripts for the MTUOC project.