Dictionary support - koreader/koreader GitHub Wiki
KOReader supports dictionary lookup in EPUB and PDF/DJVU documents. To select a phrase for the dictionary or Wikipedia, simply hold on a word, or hold and drag to select multiple words for other functions.
To use the dictionary lookup function, you first need to install one or more dictionaries in the StarDict format.
The StarDict-format dictionary files have suffixes `*.idx`, `*.ifo` or `*.ifo.gz`, `*.dict` or `*.dict.dz`.
The dictionaries need to be installed into one of these directories:
- `/sdcard/koreader/data/dict` directory for Android
- `/mnt/private/koreader/data/dict` for Cervantes
- `koreader/data/dict` directory for Kindle
- `.adds/koreader/data/dict/` directory for Kobo
- `applications/koreader/data/dict` directory for Pocketbook
- `$HOME/.config/koreader/data/dict` directory for Linux
- `$HOME/Library/Application Support/koreader/data/dict` directory for macOS
Since v2020.04 you can override the directory where dictionaries are installed. To avoid duplicates, this is useful if your device has more than one application that can read StarDict dictionaries. To do so, you'll need to add the full path to `defaults.custom.lua`. For example: `STARDICT_DATA_DIR = "/mnt/onboard/.adds/vlasovsoft/dictionary"`.
- The reader.dict (formerly "BoboTiG/ebook-reader-dict") project provides StarDict versions of daily dumps of Wiktionary monolingual dictionaries for a variety of languages. It also provides non-free multilingual and universal dictionaries.
- The WikDict project provides bilingual StarDict dictionaries (download link) based on Wiktionary for a lot of language pairs.
- This GitHub repository contains dictionaries based on Wiktionary from many languages to English, including English-English.
- The DictInfo website provides outdated monolingual dictionaries based on Wiktionary.
- The Firedict site contains a list of freely available dictionaries.
- One can convert between different dictionary formats using PyGlossary.
- Some freely available dictionaries can be converted to the StarDict format with stardicter. See also wiktionary-to-stardict.
- It is also possible to convert dict.cc dictionaries to the StarDict format with dictcc-stardict.
- You may also be able to use `DICT` files used by the standard dictd daemon and the related dict packages that contain `.dict` files. Those files can be converted to `stardict` format using the `/usr/lib/stardict-tools/dictd2dic` command provided in the `stardict-tools` package, although it seems to fail to create the necessary metadata files like the `.ifo` file.
- You can download dictionaries from the internet within KOReader as shown here.
- Fictionaries provides dictionaries for various speculative fiction books and series.
You can use HTML encoded dictionaries, as described here.
Also, dictionaries can be tweaked with a custom CSS file, as described here and here. You can find sample files showing how to tweak them here. And some more discussion can be found here.
MuPDF is used to render the HTML dictionary results. If KOReader notices MuPDF didn't like the HTML, it falls back to stripping tags, keeping line feeds, and gives it back to MuPDF.
We can't easily fix up HTML, but one can add a `.lua` file in the `dict` directory with code to tweak the output before feeding it to MuPDF.
You need to be at ease with Lua, or just hack the samples @poire-z created for some French dicts. More details in #3585 (and #3606, #3611).
You can strip the inline CSS (or, more simply, make it not interpreted by MuPDF) with something like the following in the `<dictfilename>.lua`:
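return function(html)
    -- Simpler variant that only matches lowercase attributes:
    -- html = html:gsub(' style=', ' zzztyle=')
    -- Rename every style attribute (any letter case) so MuPDF ignores it.
    html = html:gsub(' [Ss][Tt][Yy][Ll][Ee]=', ' zzztyle=')
    return html
end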
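To see what this substitution does, the same `gsub` call can be run on a made-up fragment with a standalone Lua interpreter. This is only a sketch: the function name `neutralise_style` and the sample HTML are hypothetical and not taken from any real dictionary.

```lua
-- Same substitution as in the filter above, wrapped in a named function
-- (hypothetical name) so it can be called directly.
local function neutralise_style(html)
    -- gsub returns two values; the assignment keeps only the new string.
    html = html:gsub(' [Ss][Tt][Yy][Ll][Ee]=', ' zzztyle=')
    return html
end

-- Made-up sample with the attribute written in two different letter cases.
local sample = '<p STYLE="color:red">quaint</p><span style="font-size:2em">adj.</span>'
local result = neutralise_style(sample)
print(result)
-- prints: <p zzztyle="color:red">quaint</p><span zzztyle="font-size:2em">adj.</span>
```

The attribute values are left in place; only the attribute name changes, so MuPDF no longer treats them as inline CSS.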
- Edit an `.ifo` file in the dictionary folder. There should be a parameter `sametypesequence`. To make CSS stripping work it should be `sametypesequence=h`.
- Keep in mind that CSS stripping is a very powerful tool which can lead to enormous substitutions. To play it safe, check the output of the StarDict binary to find out what tags are used in the HTML layout. For example, from SSH or a terminal on a device, go to the `koreader/` directory and call `sdcv -02 data/dict quaint`, where `data/dict` is the dictionary folder and `quaint` is the search query. The output should look like this:
[root@kindle koreader]# ./sdcv -02 data/dict/ quaint
Found 2 items, similar to quaint.
-->Longman Dictionary of Contemporary English 5th Ed. (En-En)
-->quaint
<k>quaint</k>
<c c="blue"><b>quaint</b></c> /kweɪnt/ <abr>BrE</abr> <rref>bre_quaint0205.wav</rref> <abr>AmE</abr> <rref>ame_quaint.wav</rref><i><c> adjective</c></i>
<blockquote><blockquote>[<c c="lightcoral">Date: </c><c c="darkgray">1100-1200</c>; <c c="lightcoral">Language: </c><c c="darkgray">Old French</c>; <c c="lightcoral">Origin: </c><c c="darkgray">cointe</c><c c="darkgray"> </c><i><c c="lightseagreen">'clever'</c></i><c c="darkgray">, from </c><c c="darkgray">Latin</c><c c="darkgray"> </c><c c="darkgray">cognitus</c><c c="darkgray"> </c><i><c c="lightseagreen">'known'</c></i>]</blockquote></blockquote>
<blockquote><blockquote> unusual and attractive, especially in an old-fashioned way: </blockquote></blockquote>
<blockquote><blockquote><blockquote><blockquote> <rref>exa_p008-000464505.wav</rref> <ex>a quaint little village in Yorkshire</ex></blockquote></blockquote></blockquote></blockquote>
From the output, several things can be extracted. One - the main tag for paragraphs is `<blockquote>`. Two - the main tag for colored text is `<c c="color">`, which is not a classical CSS-coloring scheme. Moreover, colors themselves are written out as text instead of HTML-RGB references, so they might be completely ignored by KOReader. Three - there are references to `.wav` sound files, which are redundant for KOReader. In dictionary applications that support such references, these are essentially small speaker icons that act as a button to trigger the sound. However, in KOReader they will be rendered plainly as in the HTML source, e.g. `bre_quaint0205.wav`. Four - there is an extra word of the query in the `<k>` tag.
- After you figure out what you would like to replace, create a `.lua` file with exactly the same name as the `.ifo` file (before the file extension). Here is an example of such a file, which replaces color schemes and definitions with classical ones. In it, we replaced `.wav` references with a Unicode icon of a speaker (to distinguish sound examples from the word explanation), removed any `<k>` tag words, and made sure the images point to the right path, relative to the `...koreader/data/dict/DICTNAME/res/` directory.
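return function(html)
    -- Replace references to .wav sound files with a speaker icon.
    html = html:gsub('<rref[^>]*>[^<]*%.wav</rref>', '🔊')
    -- Drop the repeated query word held in <k> tags.
    html = html:gsub('<k[^>]*>[^<]*</k>', '')
    -- Turn the dictionary's <c> color tags into <span> elements.
    html = html:gsub('<c>', '<span>')
    html = html:gsub('</c>', '</span>')
    html = html:gsub('<c c="', '<span style="color:')
    -- Map color names to hex values.
    html = html:gsub('"color:indigo"', '"color:#4B0082"')
    html = html:gsub('"color:darkgray"', '"color:#A9A9A9"')
    html = html:gsub('"color:lightcoral"', '"color:#F08080"')
    html = html:gsub('"color:lightseagreen"', '"color:#20B2AA"')
    html = html:gsub('"color:darkgoldenrod"', '"color:#B8860B"')
    -- Turn the remaining <rref> references (images) into <img> tags.
    html = html:gsub('<rref[^>]*>', '<img src="/')
    -- The dot is escaped so that it matches a literal "." only.
    html = html:gsub('%.jpg</rref>', '.jpg">')
    return html
end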
- If you want to tweak the text output with CSS, create a `.css` file with the same name as the `.ifo` and `.lua` files (before the file extension). For this particular example, the CSS file looks like:
blockquote {
    margin-left: 1.0rem;
    margin-right: 0.5rem;
    text-align: justify;
}
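Before copying a filter like the one in the steps above to the device, it can help to try it on a line of `sdcv` output with a standalone Lua interpreter. The sketch below inlines a reduced copy of that filter (only the substitutions needed for the sample); the name `filter` and the shortened sample string are assumptions made for illustration.

```lua
-- Reduced copy of the filter from the steps above, inlined so that the
-- snippet runs on its own (the real file would `return function(html) ... end`).
local function filter(html)
    html = html:gsub('<rref[^>]*>[^<]*%.wav</rref>', '🔊')    -- sound refs -> icon
    html = html:gsub('<k[^>]*>[^<]*</k>', '')                  -- drop repeated headword
    html = html:gsub('</c>', '</span>')                        -- close color tags
    html = html:gsub('<c c="', '<span style="color:')          -- open color tags
    html = html:gsub('"color:darkgray"', '"color:#A9A9A9"')    -- name -> hex value
    return html
end

-- Shortened sample assembled from tags seen in the sdcv output above.
local sample = '<k>quaint</k><c c="darkgray">1100-1200</c> <rref>bre_quaint0205.wav</rref>'
local result = filter(sample)
print(result)
-- prints: <span style="color:#A9A9A9">1100-1200</span> 🔊
```

The order of the substitutions matters: the `.wav` references are replaced first so that a later, broader `<rref>` rule (as in the full filter) only sees image references.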
Here is a screenshot of how it was before with `sametypesequence=x` by default, and after making it `sametypesequence=h` and adding `.lua` and `.css`:


KOReader has a built-in OCR engine for recognizing words in scanned PDF/DJVU pages. To use OCR on scanned pages, you need to install the appropriate Tesseract trained data set and add new document languages to `koreader/defaults.lua` (if your language is other than English or Chinese).
- Download language data files for Tesseract 4.00+ and copy the appropriate language data file (e.g. `eng.traineddata` in the tesseract-fast repository for English and `spa.traineddata` for Spanish) into `koreader/data/tessdata`.
- To add new languages, open `koreader/defaults.custom.lua` and add languages via their ISO 3-letter code (important, this needs to match the training data filename!) to the `DKOPTREADER_CONFIG_DOC_LANGS_CODE` array:
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "chi_sim"} -- language code, make sure you have corresponding training data
For example, for Kazakh this would be `kaz`; for Russian, `rus`; etc. If you are unsure of the code for your language, look at the tessdata filenames first.
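As a worked example, adding Spanish and Russian could look like the fragment below. It is a sketch in the same assignment style as the line above, and it assumes that `spa.traineddata` and `rus.traineddata` have already been copied into `koreader/data/tessdata`; the choice of languages is only for illustration.

```lua
-- defaults.custom.lua (fragment)
-- Each code must match the name of a <code>.traineddata file in koreader/data/tessdata.
DKOPTREADER_CONFIG_DOC_LANGS_CODE = {"eng", "chi_sim", "spa", "rus"}
```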
If you've never customized any advanced settings before, the file will not exist. In that case, just follow the directions in the next sentence: any modified entries will appear in bold and will automatically be added to the file on exit (this also helps make sure that the file is syntactically sound).
If you don't need to add new entries, and simply want to modify the existing ones, you can also go to `Tools` > `More tools` > `Advanced settings` in the file-manager's top menu, and find the `DKOPTREADER_CONFIG_DOC_LANGS_CODE` entry there.
The `Forced OCR` option makes KOReader ignore any built-in text layers that come with PDF/DJVU files and use only OCR with the tessdata instead.
You can configure the order of dictionaries in the interface below.
Tap the name of a dictionary (not the checkbox) to select it; you can then move it up or down using the buttons at the bottom of the screen.
More info can be found here.
To look up a word in the dictionary, press and hold on the word. If you press and hold for more than 3 seconds, it will open a menu with more options, as described here.
The dictionary supports a history of searched words, accessible through the menu. More info can be found here (with images).
You can cancel any search with a tap. More on this here.