Command Line Tool - TrnsltLife/HunspellXML GitHub Wiki
If you download version 1.8 or above from the releases, you will be able to use the command line utility to convert HunspellXML to Hunspell or vice versa.
The utility is located in the bin/ folder as a Windows .bat file and as a Unix shell script, so you can use it on Windows (.bat) or on Linux or Mac (shell script).
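On Linux or Mac, putting the bin/ folder on your PATH for the current shell session might look like the sketch below. The install location `$HOME/HunspellXML` is only an assumption for illustration; substitute the directory where you unpacked the release.

```shell
# Assumed install location (hypothetical); change this to where you unpacked the release.
HUNSPELLXML_HOME="$HOME/HunspellXML"

# Append the bin/ folder so the shell can find the HunspellXML script.
export PATH="$PATH:$HUNSPELLXML_HOME/bin"
```

To keep the setting across sessions, the same two lines can go in a shell startup file such as ~/.profile.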
Once you've added the bin/ directory to your system's PATH variable, you'll be able to use the command from your command line like this:
HunspellXML -h
That will list a summary of the command line options which currently look like this:
Usage:
*************************************
*Print this HunspellXML help message*
*************************************
HunspellXML -h
*or*
HunspellXML -?
*************************************
*Convert XML file to Hunspell format*
*************************************
HunspellXML [Options] [Output Suppression] [Export Options] hunspellXML_input_file.xml
[Optional Flags]
-o=file_base Base filename (no extension) for creating Hunspell dictionary, e.g. path/to/en_US
-l=level Log level: none, error, warning, info, debug
-rng Create RelaxNG schema for HunspellXML
-rt Run dictionary tests
[Output Suppression]
-s Suppress all extra output
-sa Suppress automatic comments and blank lines
-sab Suppress automatic blank lines
-sac Suppress automatic comments
-sd Suppress metadata output
-sm Suppress my comments and blank lines
-smb Suppress my blank lines
-smc Suppress my comments
[Export Options: If none are specified, all will be created.]
-hs Create Hunspell dictionary files
-ts Create Hunspell test files
-th Create MyThes thesaurus files
-rm Create Readme file
-lc Create License file
-ff Create Firefox dictionary plugin
-lo Create LibreOffice dictionary plugin
-op Create Opera dictionary plugin
*************************************
*Convert Hunspell format to XML file*
*************************************
HunspellXML [Optional Flags] [Output Suppression] [Processing Options] hunspell_input_file.aff
*or*
HunspellXML [Optional Flags] [Output Suppression] [Processing Options] hunspell_input_file.dic
[Optional Flags]
-o=filename.xml Filename for exporting HunspellXML file, e.g. path/to/en_US.xml
-oc=output-charset Convert output to this Hunspell character set
-dc=charset Default charset for reading and exporting if none is
specified in the .aff file. Valid charsets are:
[UTF-8, ISO8859-1 to ISO8859-10, ISO8859-13 to ISO8859-15,
KOI8-R, KOI8-U, microsoft-cp1251, ISCII-DEVANAGARI]
-df=flag-type Default flag type if none is specified in the .aff file.
[short, long, UTF-8, num]
-dl=lang_code Default language code if none is specified in the .aff file, e.g. en_US, fr_FR
-l=level Log level: none, error, warning, info, debug
[Output Suppression]
-s Suppress all extra output
-sm Suppress my comments and blank lines
-smb Suppress my blank lines
-smc Suppress my comments
[Processing Options: If none are specified, all will be processed.]
-aff Process the Hunspell .aff file
-dic Process the Hunspell .dic file
-tst Process the Hunspell .good and .wrong files
-dat Process the MyThes .dat file
-dat=thes_file.dat Process the MyThes .dat file named thes_file.dat. This allows the
.dat file to start with a different name than the .aff/.dic files.
Let's say you have a Hunspell dictionary with files named my_dic.aff and my_dic.dic. To convert this into HunspellXML format, the simplest way would be to run this command:
HunspellXML my_dic.dic
This will load the files my_dic.aff and my_dic.dic and convert them to a file named my_dic.xml stored in the same directory as my_dic.aff and my_dic.dic. This .xml file will contain all the affix and dictionary data from the .aff and .dic files.
If you have a MyThes thesaurus file named thes_my_dic.dat in the same directory, the above command also adds a thesaurus section to the .xml file with the thesaurus data included.
If there are files named my_dic.good and my_dic.wrong which specify tests for matching correctly spelled words (.good) and incorrectly spelled words (.wrong), these files will also be converted and included in the <tests>...</tests> section of the HunspellXML file.
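Before converting, it can help to see which of these companion files are actually present. The following is a minimal sketch that only checks the naming conventions described above; the function name `list_companions` is made up for this example and is not part of HunspellXML.

```shell
# List which companion files exist for a dictionary base name,
# following the naming conventions described above.
# list_companions is a hypothetical helper, not part of HunspellXML.
list_companions() {
    base=$1                       # e.g. path/to/my_dic (no extension)
    dir=$(dirname "$base")
    name=$(basename "$base")
    for f in "$base.aff" "$base.dic" "$dir/thes_$name.dat" "$base.good" "$base.wrong"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
        fi
    done
}
```

Running `list_companions ./my_dic` in the dictionary's directory prints one line per expected file.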
To create a .xml file with only some of the data (only affix data, dictionary data, or thesaurus data), use the -aff, -dic, -dat, and -tst flags to specify what data you want to include. -tst is for .good and .wrong files, and -dat is for thesaurus data.
For example:
HunspellXML -aff my_dic.dic
would only export data from the .aff file into the .xml file.
HunspellXML -dic -dat my_dic.dic
would export dictionary and thesaurus data into the .xml file but not the affix rule data.
Thesaurus files are usually named thes_[name].dat. HunspellXML knows how to look for this if your .aff and .dic files are named [name].aff and [name].dic. But if your thesaurus is named differently, you can pass this name like this:
HunspellXML -aff -dic -dat=thes_data.dat my_dic.dic
This will load the my_dic.aff and my_dic.dic files and the thes_data.dat file to output affix, dictionary, and thesaurus data to the my_dic.xml file.
You can change the name of the .xml file that will be created using the -o= flag.
HunspellXML -o=MyDic.xml my_dic.dic
Instead of creating the default .xml file based on the name of the input file (my_dic.dic -> my_dic.xml), with the above command the converted HunspellXML file would be named MyDic.xml.
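If you have several dictionaries to convert, the -o= flag can be combined with a small loop. The sketch below is a dry run: it only prints the commands it would run, one per .dic file. The function name `print_batch_commands` and the idea of a separate output directory are assumptions made for this example.

```shell
# Dry run: print one HunspellXML command per .dic file in a directory,
# sending each result to a separate output directory.
# print_batch_commands and the directory layout are hypothetical.
print_batch_commands() {
    src_dir=$1
    out_dir=$2
    for dic in "$src_dir"/*.dic; do
        [ -e "$dic" ] || continue          # no .dic files: skip the unexpanded glob
        name=$(basename "$dic" .dic)       # my_dic.dic -> my_dic
        echo "HunspellXML -o=$out_dir/$name.xml $dic"
    done
}
```

For example, `print_batch_commands dictionaries xml_out` prints a command for every .dic file under dictionaries/. Once the printed commands look right, they can be run by hand; paths containing spaces would need quoting first.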
If your Hunspell files are in one character encoding and you want your HunspellXML file to be in another character encoding (e.g. UTF-8), use the -oc= option to specify that character encoding.
HunspellXML -oc=UTF-8 my_dic.dic
This command would convert from whatever character encoding the dictionary is currently in to UTF-8, resulting in a my_dic.xml file that is encoded in UTF-8.
In some instances you may need to give the converter a hint about what character encoding the Hunspell file is in, what the flag type is for affixation flags (short, long, num, or UTF-8), or what the language code is. Hunspell has some logic to detect what character set is used, or this may be specified in the .aff file. The same goes for the flag type. But if you choose not to process the affix data (by omitting the -aff option while including one of the other processing options -dic, -dat, or -tst), or if there is some other problem in converting the dictionary correctly, then specifying these defaults may help.
HunspellXML -dc=KOI8-U -oc=UTF-8 my_dic.dic
This would signal that the current dictionary should be read in as encoding KOI8-U (unless something different is specified in the .aff file) and that it should be output as UTF-8 in the .xml file.
HunspellXML -dl=uk_UA my_dic.dic
This command would indicate a default language code of uk_UA for Ukrainian. This will only be overridden if the .aff file contains a LANG command that is different.
HunspellXML -dic -dat -tst -df=short my_dic.dic
This sets the default flag type to short. This could be overridden by a FLAG command in the .aff file. This should really only be needed if you don't export the .aff file data as in the above command (note that -aff is not specified) or if an .aff file doesn't specify the flag type like it should.
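To judge whether any of these defaults are worth passing, you can look at what the .aff file itself declares. The sketch below searches for the standard Hunspell SET, FLAG, and LANG lines and suggests the matching option when one is absent. The function name `check_aff_declarations` is invented for this example, and the check is a plain text search, not the logic HunspellXML itself uses.

```shell
# Report whether an .aff file declares its charset (SET), flag type (FLAG),
# and language (LANG), and name the HunspellXML default option for each gap.
# check_aff_declarations is a hypothetical helper, not part of HunspellXML.
check_aff_declarations() {
    aff=$1
    for pair in SET:-dc FLAG:-df LANG:-dl; do
        directive=${pair%%:*}     # text before the colon, e.g. SET
        option=${pair#*:}         # text after the colon, e.g. -dc
        # LC_ALL=C and -a keep grep from treating non-UTF-8 files as binary.
        line=$(LC_ALL=C grep -a "^$directive[[:space:]]" "$aff" | head -n 1)
        if [ -n "$line" ]; then
            echo "$directive declared: $line"
        else
            echo "$directive not declared; consider $option="
        fi
    done
}
```

Calling `check_aff_declarations my_dic.aff` prints three lines, one per directive.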
You can specify how much information you want to see output from the conversion process by specifying the log level. There are five log levels: none, error, warning, info, and debug. By default the log output is set to warning, which means you will see warnings and errors. To see all the information output by the conversion process:
HunspellXML -l=debug my_dic.dic
You can suppress comments and blank lines from the input files by using output suppression options:
-s (suppress all my comments and blank lines from the files)
-sm (same as -s)
-smb (suppress my blank lines from the files)
-smc (suppress my comments from the files)