Command Line Tool - TrnsltLife/HunspellXML GitHub Wiki

If you download version 1.8 or above from the releases, you will be able to use the command line utility to convert HunspellXML to Hunspell or vice versa.

The utility is located in the bin/ folder as a Windows .bat file and as a Unix shell script, so you can use it on Windows (.bat) or on Linux or Mac (shell script).
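
On Linux or Mac, one way to make the bin/ folder reachable from any directory is to append it to PATH in your shell session. The sketch below assumes the release was unpacked to $HOME/HunspellXML; HUNSPELLXML_HOME is a hypothetical variable name used only for this illustration, not something the tool itself reads.

```shell
# Assumption: the release was unpacked here; adjust to your own location.
HUNSPELLXML_HOME="${HUNSPELLXML_HOME:-$HOME/HunspellXML}"

# Append bin/ to PATH only if it is not already listed.
case ":$PATH:" in
  *":$HUNSPELLXML_HOME/bin:"*) ;;
  *) PATH="$PATH:$HUNSPELLXML_HOME/bin"; export PATH ;;
esac

# Confirm that the shell can now find the launcher script.
if command -v HunspellXML >/dev/null 2>&1; then
  echo "HunspellXML found at: $(command -v HunspellXML)"
else
  echo "HunspellXML not found; check that $HUNSPELLXML_HOME/bin exists" >&2
fi
```

To keep the setting across sessions, the same lines could go in your shell's startup file.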

Once you've added the bin/ directory to your system's PATH variable, you'll be able to use the command from your command line like this:

HunspellXML -h

That lists a summary of the command line options, which currently looks like this:

Usage:
*************************************
*Print this HunspellXML help message*
*************************************
HunspellXML -h
*or*
HunspellXML -?

*************************************
*Convert XML file to Hunspell format*
*************************************
HunspellXML [Options] [Output Suppression] [Export Options] hunspellXML_input_file.xml

     [Optional Flags]
-o=file_base       Base filename (no extension) for creating Hunspell dictionary, e.g. path/to/en_US
-l=level           Log level: none, error, warning, info, debug
-rng               Create RelaxNG schema for HunspellXML
-rt                Run dictionary tests

     [Output Suppression]
-s                 Suppress all extra output
-sa                Suppress automatic comments and blank lines
-sab               Suppress automatic blank lines
-sac               Suppress automatic comments
-sd                Suppress metadata output
-sm                Suppress my comments and blank lines
-smb               Suppress my blank lines
-smc               Suppress my comments

     [Export Options: If none are specified, all will be created.]
-hs                Create Hunspell dictionary files
-ts                Create Hunspell test files
-th                Create MyThes thesaurus files
-rm                Create Readme file
-lc                Create License file
-ff                Create Firefox dictionary plugin
-lo                Create LibreOffice dictionary plugin
-op                Create Opera dictionary plugin

*************************************
*Convert Hunspell format to XML file*
*************************************
HunspellXML [Optional Flags] [Output Suppression] [Processing Options] hunspell_input_file.aff
*or*
HunspellXML [Optional Flags] [Output Suppression] [Processing Options] hunspell_input_file.dic

     [Optional Flags]
-o=filename.xml    Filename for exporting HunspellXML file, e.g. path/to/en_US.xml
-oc=output-charset Convert output to this Hunspell character set
-dc=charset        Default charset for reading and exporting if none is 
                      specified in the .aff file. Valid charsets are:
                      [UTF-8, ISO8859-1 to ISO8859-10, ISO8859-13 to ISO8859-15, 
                      KOI8-R, KOI8-U, microsoft-cp1251, ISCII-DEVANAGARI]
-df=flag-type      Default flag type if none is specified in the .aff file.
                      [short, long, UTF-8, num]
-dl=lang_code      Default language code if none is specified in the .aff file, e.g. en_US, fr_FR
-l=level           Log level: none, error, warning, info, debug

     [Output Suppression]
-s                 Suppress all extra output
-sm                Suppress my comments and blank lines
-smb               Suppress my blank lines
-smc               Suppress my comments

     [Processing Options: If none are specified, all will be processed.]
-aff               Process the Hunspell .aff file
-dic               Process the Hunspell .dic file
-tst               Process the Hunspell .good and .wrong files
-dat               Process the MyThes .dat file
-dat=thes_file.dat Process the MyThes .dat file named thes_file.dat. This allows the
                      .dat file to start with a different name than the .aff/.dic files.

Convert from Hunspell format to HunspellXML

Let's say you have a Hunspell dictionary with files named my_dic.aff and my_dic.dic. To convert this into HunspellXML format, the simplest way would be to run this command:

HunspellXML my_dic.dic

This will load the files my_dic.aff and my_dic.dic and convert them to a file named my_dic.xml stored in the same directory as my_dic.aff and my_dic.dic. This .xml file will contain all the affix and dictionary data from the .aff and .dic files.

If there is a MyThes thesaurus file named thes_my_dic.dat in the same directory, the above command will also add a thesaurus section containing the thesaurus data to the .xml file.

If there are files named my_dic.good and my_dic.wrong which specify tests for matching correctly spelled words (.good) and incorrectly spelled words (.wrong), these files will also be converted and included in the <tests>...</tests> section of the HunspellXML file.
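
Before converting, it can help to see which of these companion files are actually present. The following is a minimal sketch that follows the naming conventions described above; list_companions is a hypothetical helper written for this page, not part of HunspellXML.

```shell
# List which companion files exist for a dictionary base path (e.g. path/to/my_dic).
# list_companions is a hypothetical helper, not part of HunspellXML.
list_companions() {
  base=$1
  dir=$(dirname "$base")
  name=$(basename "$base")
  # The thesaurus uses the thes_ prefix; the other files share the base name.
  for f in "$base.aff" "$base.dic" "$dir/thes_$name.dat" "$base.good" "$base.wrong"; do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
    fi
  done
}

list_companions ./my_dic
```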

Specify what data to convert

To create a .xml file with only some of the data (only affix data, dictionary data, or thesaurus data), use the -aff, -dic, -dat, and -tst flags to specify what data you want to include. -tst is for .good and .wrong files, and -dat is for thesaurus data.

For example: HunspellXML -aff my_dic.dic would only export data from the .aff file into the .xml file.

HunspellXML -dic -dat my_dic.dic would export dictionary and thesaurus data into the .xml file but not the affix rule data.

Specify a different thesaurus file

Thesaurus files are usually named thes_[name].dat. HunspellXML looks for this file automatically if your .aff and .dic files are named [name].aff and [name].dic. If your thesaurus is named differently, pass its name with the -dat= option:

HunspellXML -aff -dic -dat=thes_data.dat my_dic.dic

This loads my_dic.aff, my_dic.dic, and thes_data.dat, and writes the affix, dictionary, and thesaurus data to my_dic.xml.
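
In a script, the choice between the plain -dat flag and the named form can be made by checking whether the conventionally named file exists. This is only a sketch: pick_dat_flag is a hypothetical helper, and whether the option accepts a path containing directories is an assumption, since the usage text only shows a bare file name.

```shell
# pick_dat_flag is a hypothetical helper: it prints the -dat option to use.
# Usage: pick_dat_flag path/to/name.dic [custom_thesaurus.dat]
pick_dat_flag() {
  dic=$1
  custom=${2:-}
  dir=$(dirname "$dic")
  name=$(basename "$dic" .dic)
  if [ -f "$dir/thes_$name.dat" ]; then
    echo "-dat"              # conventionally named thesaurus is present
  elif [ -n "$custom" ] && [ -f "$custom" ]; then
    echo "-dat=$custom"      # fall back to the explicitly named file
  else
    echo ""                  # no thesaurus file to process
  fi
}

pick_dat_flag my_dic.dic thes_data.dat
```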

Specify a different output file

You can change the name of the .xml file that will be created using the -o= flag.

HunspellXML -o=MyDic.xml my_dic.dic

Instead of creating the default .xml file based on the name of the input file (my_dic.dic -> my_dic.xml), with the above command the converted HunspellXML file would be named MyDic.xml
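
If a script needs to know the default output name in advance (for example, to avoid overwriting an existing file), the naming rule can be mirrored with shell parameter expansion. default_xml_name is a hypothetical helper; the .aff case assumes the rule is the same as for .dic input.

```shell
# default_xml_name is a hypothetical helper mirroring the default naming rule:
# the input extension is replaced with .xml, in the same directory.
default_xml_name() {
  case $1 in
    *.dic) echo "${1%.dic}.xml" ;;
    *.aff) echo "${1%.aff}.xml" ;;   # assumed to follow the same rule as .dic
    *)     echo "unsupported input: $1" >&2; return 1 ;;
  esac
}

default_xml_name my_dic.dic          # prints my_dic.xml
default_xml_name path/to/en_US.aff   # prints path/to/en_US.xml
```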

Convert character encoding

If your Hunspell files are in one character encoding and you want your HunspellXML file to be in another character encoding (e.g. UTF-8), use the -oc= option to specify that character encoding.

HunspellXML -oc=UTF-8 my_dic.dic

This command would convert from whatever character encoding the dictionary is currently in to UTF-8, resulting in a my_dic.xml file that is encoded in UTF-8.

Default charset, flag type, and language code

In some instances you may need to give the converter a hint about what character encoding the Hunspell files are in, what the flag type is for affixation flags (short, long, num, or UTF-8), or what the language code is. Hunspell has some logic to detect what character set is used, or this may be specified in the .aff file. The same goes for the flag type. But if you choose not to process the affix data (by omitting the -aff option while including one of the other processing options -dic, -dat, or -tst), or if there is some other problem converting the dictionary correctly, then specifying these defaults may help.

Default charset

HunspellXML -dc=KOI8-U -oc=UTF-8 my_dic.dic

This would signal that the current dictionary should be read in as encoding KOI8-U (unless something different is specified in the .aff file) and that it should be output as UTF-8 in the .xml file.

Default language code

HunspellXML -dl=uk_UA my_dic.dic

This command would indicate a default language code of uk_UA for Ukrainian. This will only be overridden if the .aff file contains a LANG command that is different.

Default flag type

HunspellXML -dic -dat -tst -df=short my_dic.dic

This sets the default flag type to short. This could be overridden by a FLAG command in the .aff file. This should really only be needed if you don't export the .aff file data as in the above command (note that -aff is not specified) or if an .aff file doesn't specify the flag type like it should.
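
To decide which of these defaults are worth passing, you can check whether the .aff file already contains SET, FLAG, and LANG lines. The sketch below is one way to do that with awk; aff_setting and suggest_defaults are hypothetical helpers, and the check is deliberately simple (it will not recognize a command hidden behind a byte order mark, for instance).

```shell
# aff_setting is a hypothetical helper: prints the value of a top-level
# .aff command (SET, FLAG, or LANG), or nothing if the command is absent.
# $1 = .aff file, $2 = command name
aff_setting() {
  # LC_ALL=C so that non-UTF-8 dictionaries are read as plain bytes.
  LC_ALL=C awk -v key="$2" '$1 == key { sub(/\r$/, "", $2); print $2; exit }' "$1"
}

# suggest_defaults prints a hint for each setting the .aff file does not declare.
suggest_defaults() {
  aff=$1
  [ -n "$(aff_setting "$aff" SET)" ]  || echo "no SET line: consider -dc=<charset>"
  [ -n "$(aff_setting "$aff" FLAG)" ] || echo "no FLAG line: consider -df=<flag-type>"
  [ -n "$(aff_setting "$aff" LANG)" ] || echo "no LANG line: consider -dl=<lang_code>"
}

if [ -f my_dic.aff ]; then
  suggest_defaults my_dic.aff
fi
```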

Log level

You can specify how much information you want to see output from the conversion process by specifying the log level. There are five log levels: none, error, warning, info, and debug. By default the log output is set to warning which means you will see warnings and errors. To see all the information output by the conversion process:

HunspellXML -l=debug my_dic.dic
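
When the log level comes from a script parameter, it may be worth validating it against the five names above before calling the converter. In this sketch, is_valid_log_level, the LOG_LEVEL variable, and the convert.log file name are all hypothetical; both output streams are captured because this page does not say which one carries the log.

```shell
# is_valid_log_level is a hypothetical helper: succeeds only for the five
# documented level names.
is_valid_log_level() {
  case $1 in
    none|error|warning|info|debug) return 0 ;;
    *) return 1 ;;
  esac
}

level=${LOG_LEVEL:-warning}   # LOG_LEVEL is a hypothetical environment variable
if ! is_valid_log_level "$level"; then
  echo "unknown log level: $level" >&2
elif command -v HunspellXML >/dev/null 2>&1 && [ -f my_dic.dic ]; then
  # Capture stdout and stderr together in a log file.
  HunspellXML -l="$level" my_dic.dic > convert.log 2>&1
fi
```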

Comment suppression

You can suppress comments and blank lines from the input files by using output suppression options:

-s   (suppress all my comments and blank lines from the files)
-sm  (same as -s)
-smb (suppress my blank lines from the files)
-smc (suppress my comments from the files)

Convert from HunspellXML format to Hunspell
