Documentation - mtuoc/mtuoc.github.io GitHub Wiki
Each component is documented in its own wiki. Here you can find the links to the documentation for each component:
- MTUOC-any2text: Scripts for converting files in several formats to text.
- MTUOC-TMX2tabtxt: Scripts and programs for the conversion of TMX files into tabbed text files (tsv).
- MTUOC-SDLTM2tabtxt: Scripts and programs for the conversion of Trados translation memory files (SDLTM) into tabbed text files (tsv).
- MTUOC-tokenizers: Tokenizers for several languages.
- MTUOC-segmenter: Scripts and programs to segment text files and corpora.
- MTUOC-aligner: Scripts and programs to automatically align text files using Hunalign or SBERT.
- MTUOC-web-downloader: A set of scripts to download a whole website and store it locally.
- MTUOC-clean-parallel-corpus: A Python script for cleaning parallel corpora.
- MTUOC-PCorpus-rescorer: A set of programs to rescore parallel corpora.
- MTUOC-corpus-combination: A Python program to select similar segments from a very large parallel corpus.
- MTUOC-corpus-preprocessing: MTUOC script for preprocessing corpora to train machine translation systems.
- MTUOC-training-scripts: Scripts and configuration files for training machine translation systems (Moses, Marian, OpenNMT...).
- MTUOC-server: A server to start machine translation systems.
- MTUOC-eval: An easy-to-use program for machine translation evaluation using automatic metrics.
- PosEdiOn: A set of programs to perform translation and post-editing experiments and calculate post-editing effort indicators.
- MTUOC-translator: A set of programs to translate files with the MTUOC server.
- MTUOC-editor: TO DO
- MTUOC-novaIEC: A simple script and data file to change Catalan texts from the old orthography to the new orthography. All changes have been taken from ORTOGRAFIA Modificacions entrades DIEC2.
- MTUOC-Trados-plugin: A plug-in for Trados to use MTUOC machine translation servers.
- MTUOC: General files and scripts for the project MTUOC.