HunspellXML Format (ThesaurusFile) - TrnsltLife/HunspellXML GitHub Wiki



UNDER CONSTRUCTION

<thesaurusFile>...</thesaurusFile>

The <thesaurusFile>...</thesaurusFile> section of the HunspellXML file can be used to build a MyThes thesaurus file, which OpenOffice and LibreOffice use to provide synonym suggestions. HunspellXML gives you three ways to add synonyms to the MyThes file:

  1. <include .../> - Put the list of synonyms in a separate file in MyThes format.
  2. <entries>...</entries> - Put a list of synonyms in MyThes format directly inside the <entries>...</entries> block(s).
  3. <entry><synonyms>[list of synonyms]</synonyms></entry> - Create the list of synonyms in an XML format instead of a MyThes format.

<thesaurusFile>
<include file="my-thesaurus-file.txt"/>
<include file="my-thesaurus-file2.txt"/>
<entries>
[MyThes data...]
</entries>
<entry word="...">
	<synonyms info="...">
		word
		word
		...
	</synonyms>
	<synonyms info="...">
		<s>word</s>
		<s>word</s>
		...
	</synonyms>
</entry>
<entry word="...">
...
</entry>
</thesaurusFile>
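The two <synonyms> forms in the skeleton above can be read with a short Python sketch like the one below. It is only an illustration, not HunspellXML's own code. The function name read_xml_entries is hypothetical, the bare-text form is assumed to hold one synonym per line, and the info attribute is assumed to carry what MyThes calls the part of speech (such as "(adj)").

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


# Hypothetical helper for illustration; not part of HunspellXML.
def read_xml_entries(xml_text: str) -> Dict[str, List[Tuple[str, List[str]]]]:
    """Map each <entry word="..."> to a list of (info, [synonyms]) pairs."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[Tuple[str, List[str]]]] = {}
    for entry in root.iter("entry"):
        meanings = []
        for syn in entry.findall("synonyms"):
            s_elements = syn.findall("s")
            if s_elements:
                # <s>word</s> form: one synonym per <s> element
                words = [(s.text or "").strip() for s in s_elements]
            else:
                # Bare-text form (assumption): one synonym per non-blank line
                words = [line.strip() for line in (syn.text or "").splitlines()]
            # Assumption: info plays the role of the MyThes pos field
            meanings.append((syn.get("info", ""), [w for w in words if w]))
        result[entry.get("word", "")] = meanings
    return result


SAMPLE = """<thesaurusFile>
<entry word="simple">
    <synonyms info="(adj)">
        elementary
        uncomplicated
    </synonyms>
    <synonyms info="(noun)">
        <s>herb</s>
        <s>herbaceous plant</s>
    </synonyms>
</entry>
</thesaurusFile>"""

if __name__ == "__main__":
    print(read_xml_entries(SAMPLE))
```

Both forms end up as the same kind of (info, synonyms) pair, which is what makes them interchangeable in the sketch.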

MyThes File Format

Before we look more in depth at the three options for listing synonyms, we need to understand the MyThes data format. The first line of a MyThes .dat file names the file's character encoding (for example, UTF-8). Here's what MyThes's data_layout.txt file has to say about the rest of the format:

- All of the remaining lines of the file follow this structure

entry|num_mean
pos|syn1_mean|syn2|...
.
.
.
pos|syn1_mean|syn2|...


where:

   entry      - all lowercase version of the word or phrase being described
   num_mean   - number of meanings for this entry

   There is one meaning per line and each meaning is comprised of

   pos        - part of speech or other meaning specific description
   syn1_mean  - synonym 1 also used to describe the meaning itself 
   syn2       - synonym 2 for that meaning etc.
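To make the structure concrete, here is a minimal Python sketch that reads entry blocks in this layout. It assumes the encoding line has already been removed, skips blank lines, and trims stray spaces around synonyms (such as the one in "simple |" in the data for "simple"). parse_mythes is a hypothetical name, not part of MyThes.

```python
from typing import Dict, List, Tuple

Meaning = Tuple[str, List[str]]  # (pos, [syn1_mean, syn2, ...])


def parse_mythes(text: str) -> Dict[str, List[Meaning]]:
    """Parse 'entry|num_mean' blocks, each followed by num_mean meaning lines.

    Assumes the encoding line has already been removed; blank lines are skipped.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    entries: Dict[str, List[Meaning]] = {}
    i = 0
    while i < len(lines):
        word, count = lines[i].split("|", 1)
        num_mean = int(count)
        meanings: List[Meaning] = []
        for line in lines[i + 1 : i + 1 + num_mean]:
            pos, *synonyms = line.split("|")
            # Trim stray spaces such as the one in "simple |elemental"
            meanings.append((pos, [s.strip() for s in synonyms]))
        if len(meanings) != num_mean:
            raise ValueError(f"{word!r} declares {num_mean} meanings, found {len(meanings)}")
        entries[word] = meanings
        i += 1 + num_mean
    return entries


if __name__ == "__main__":
    sample = "simple|2\n(adj)|simple |elemental|ultimate\n(noun)|herb|herbaceous plant\n"
    print(parse_mythes(sample)["simple"])
```

The num_mean value drives how many of the following lines belong to the entry, so a wrong count shifts every later entry; the sketch only catches the case where the file runs out of lines.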


To make this even more clear, here is actual data for the entry "simple".

simple|9
(adj)|simple |elemental|ultimate|oversimplified|simplistic|simplex|simplified|unanalyzable|undecomposable|uncomplicated|unsophisticated|easy|plain|unsubdivided
(adj)|elementary|uncomplicated|unproblematic|easy
(adj)|bare|mere|plain
(adj)|childlike|wide-eyed|dewy-eyed|naive |naif
(adj)|dim-witted|half-witted|simple-minded|retarded
(adj)|simple |unsubdivided|unlobed|smooth
(adj)|plain
(noun)|herb|herbaceous plant
(noun)|simpleton|person|individual|someone|somebody|mortal|human|soul


It says that "simple" has 9 different meanings, and each meaning has its part of speech and at least 1 synonym, with any others following on the same line.
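Because num_mean has to match the number of meaning lines, it is safer to compute it than to type it by hand. A hypothetical Python helper that writes one entry in this layout, lowercasing the headword as the format requires, might look like this (format_mythes_entry is an illustrative name, not a MyThes or HunspellXML function):

```python
from typing import List, Tuple


def format_mythes_entry(word: str, meanings: List[Tuple[str, List[str]]]) -> str:
    """Render one MyThes entry: an 'entry|num_mean' header plus one line per meaning."""
    if not meanings or any(not synonyms for _, synonyms in meanings):
        raise ValueError("need at least one meaning, each with at least one synonym")
    # num_mean is derived from the list, so it always matches the meaning lines
    header = f"{word.lower()}|{len(meanings)}"
    body = ["|".join([pos, *synonyms]) for pos, synonyms in meanings]
    return "\n".join([header, *body])


if __name__ == "__main__":
    print(format_mythes_entry("Simple", [("(adj)", ["bare", "mere", "plain"]), ("(adj)", ["plain"])]))
```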

<entries>...</entries>

The <entries>...</entries> tags should contain multiple lines of text, formatted according to the MyThes format.

<entries>
...
simple|9
(adj)|simple |elemental|ultimate|oversimplified|simplistic|simplex|simplified|unanalyzable|undecomposable|uncomplicated|unsophisticated|easy|plain|unsubdivided
(adj)|elementary|uncomplicated|unproblematic|easy
(adj)|bare|mere|plain
(adj)|childlike|wide-eyed|dewy-eyed|naive |naif
(adj)|dim-witted|half-witted|simple-minded|retarded
(adj)|simple |unsubdivided|unlobed|smooth
(adj)|plain
(noun)|herb|herbaceous plant
(noun)|simpleton|person|individual|someone|somebody|mortal|human|soul
...
</entries>

There can be multiple <entries>...</entries> sections, and they will all be stitched together to form the final thesaurus data.
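Assuming the blocks are simply concatenated in document order, the stitching step could be sketched in Python as follows. How HunspellXML itself treats whitespace inside <entries> is not documented on this page; this sketch drops blank lines and leading/trailing whitespace on each line, and stitch_entries is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def stitch_entries(xml_text: str) -> str:
    """Concatenate the MyThes text of every <entries> block, in document order.

    Assumption: blank lines and leading/trailing whitespace on each line are dropped.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for block in root.findall("entries"):
        for line in (block.text or "").splitlines():
            if line.strip():
                lines.append(line.strip())
    return "\n".join(lines)


DOC = """<thesaurusFile>
<entries>
simple|1
(adj)|bare|mere|plain
</entries>
<entries>
herb|1
(noun)|herbaceous plant
</entries>
</thesaurusFile>"""

if __name__ == "__main__":
    print(stitch_entries(DOC))
```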

<entry>...</entry>

<include .../>

The <include .../> element instructs HunspellXML to open an external file and load all of its MyThes entries (in the MyThes format described above) into the synonym list that will be used to create the MyThes .dat file. Anything that can go in an <entries>...</entries> block can go in the external file.

<include file="lin_synonyms.txt"/>
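A rough Python sketch of what loading an include involves is shown below. It assumes the file path is resolved relative to the directory of the HunspellXML file and that the included file is UTF-8 encoded; both are assumptions rather than documented HunspellXML behavior, and load_includes is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_includes(xml_text: str, base_dir: str) -> str:
    """Read every <include file="..."/> and return their non-blank lines, in order.

    Assumptions: paths are relative to base_dir (e.g. the HunspellXML file's
    directory) and the included files are UTF-8 encoded.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for include in root.findall("include"):
        name = include.get("file")
        if not name:
            raise ValueError("<include> is missing its file attribute")
        text = (Path(base_dir) / name).read_text(encoding="utf-8")
        lines.extend(line.strip() for line in text.splitlines() if line.strip())
    return "\n".join(lines)
```

For example, with lin_synonyms.txt stored next to the HunspellXML file, calling load_includes on the <thesaurusFile> markup with that directory as base_dir returns the file's MyThes lines, ready to be merged with any inline <entries> data.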