DBF encodings in GeoDa

Xun Li edited this page May 8, 2019 · 7 revisions

When GeoDa saves a DBF file, it also creates a .CPG file if a specific encoding (e.g. GB2312 for Chinese characters) is set for the GeoDa Table. The encoding is either loaded from the original dataset (see the discussion of the LDID and the .CPG code page below) or specified manually by the user through the Table->Encode menu.

1. .CPG file

A .CPG file is an optional file that specifies the code page, i.e. the character set used to encode the string fields. From OGR's Shapefile/DBF driver page (https://www.gdal.org/drv_shapefile.html):

An attempt is made to read the code page setting in the .cpg file, or as a fallback in the LDID/codepage setting from the .dbf file, and use it to translate string fields to UTF-8 on read, and back when writing.

Valid Language Driver ID (LDID) codes are listed at http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM, which also shows which code page each LDID value maps to.

Please note: a Shapefile's DBF table stores a Language Driver ID (LDID) value in its header. However, the .CPG file, when present, has the highest priority.
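The lookup order described above can be sketched in Python as follows. This is a minimal illustration, not GeoDa's or OGR's actual code: the function name `read_encoding_hint` is hypothetical, the `LDID/<n>` return form is an assumed convention for telling the two sources apart, and the sketch relies on the common dBASE layout in which byte 29 of the 32-byte header holds the language driver ID. It also only looks for a lowercase `.cpg` extension.

```python
from pathlib import Path
from typing import Optional

# Assumption: byte 29 of the 32-byte DBF header is the Language Driver ID.
LDID_OFFSET = 29


def read_encoding_hint(dbf_path: str) -> Optional[str]:
    """Return the raw encoding hint for a DBF file, or None if there is none.

    A non-empty .cpg file next to the DBF wins; otherwise the LDID byte in
    the DBF header is used as a fallback (hypothetical helper).
    """
    dbf = Path(dbf_path)

    # 1. The .cpg sidecar has the highest priority.
    cpg = dbf.with_suffix(".cpg")
    if cpg.is_file():
        text = cpg.read_text(encoding="ascii", errors="ignore").strip()
        if text:
            return text

    # 2. Fall back to the LDID stored in the DBF header.
    with open(dbf, "rb") as f:
        header = f.read(32)
    if len(header) > LDID_OFFSET and header[LDID_OFFSET] != 0:
        return f"LDID/{header[LDID_OFFSET]}"

    # 3. No encoding information at all.
    return None
```

The returned string is still a raw hint; turning it into an encoding name is a separate step.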

2. How to deal with a .CPG file

From the GDAL/OGR source code, it appears that OGR translates an LDID to a code page internally. A code page value (instead of an LDID value) can also be written directly in a .CPG file (e.g. "Big5" for Traditional Chinese). See the code here:

https://github.com/lixun910/gdal/blob/abcc93ccb4a712aed843f13f794a439982310f22/gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp#L233

Valid code page values include:

Windows code page: CPxxxx
ISO code page: ISO-8859-x
Others: e.g. UTF-8, Big5, etc.
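To make the three families of values concrete, here is a rough Python sketch of how a raw hint might be normalized into one of them. It is loosely modeled on the behaviour described above, not a port of the linked OGR function: the name `normalize_code_page` is hypothetical, the input form `LDID/<n>` is an assumed way of passing an LDID value, and the LDID table is a small illustrative subset of the full list on the page linked earlier.

```python
# Illustrative subset of the LDID -> Windows/DOS code page table.
LDID_TO_CODE_PAGE = {
    1: 437,    # US MS-DOS
    2: 850,    # International MS-DOS
    3: 1252,   # Windows ANSI
    77: 936,   # Chinese GBK (Simplified)
    78: 949,   # Korean
    79: 950,   # Chinese Big5 (Traditional)
}


def normalize_code_page(hint: str) -> str:
    """Map a raw hint ("LDID/77", "936", "8859-1", "Big5") to an encoding name.

    Returns an empty string when an LDID is not in the (partial) table.
    """
    hint = hint.strip()
    upper = hint.upper()

    if upper.startswith("LDID/"):
        try:
            ldid = int(hint[5:])
        except ValueError:
            return ""
        code_page = LDID_TO_CODE_PAGE.get(ldid)
        return f"CP{code_page}" if code_page else ""

    # ISO code pages may be written as "8859-1" or "88591";
    # this must be checked before the plain-number case below.
    if upper.startswith("8859"):
        return "ISO-8859-" + hint[4:].lstrip("-")

    # A bare number is treated as a Windows/DOS code page.
    if hint.isdigit():
        return f"CP{hint}"

    if upper.startswith("UTF-8"):
        return "UTF-8"

    # Anything else (e.g. "Big5") is passed through unchanged.
    return hint
```

For example, `normalize_code_page("LDID/77")` and `normalize_code_page("936")` both yield `"CP936"`, while `"Big5"` is returned as-is.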

For development in GeoDa, the logic for handling additional files when saving a DBF file is:

  1. don't create a .prj file
  2. don't create a .cpg file if no specific encoding is used in GeoDa
  3. only create a .cpg file when a specific encoding is used in GeoDa
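The three rules above amount to a single conditional. The following Python sketch shows one way to express it; GeoDa itself is written in C++, and the function name `write_sidecars` and its signature are hypothetical, chosen only for illustration.

```python
from pathlib import Path
from typing import List, Optional


def write_sidecars(dbf_path: str, encoding: Optional[str]) -> List[Path]:
    """Create the files that accompany an exported DBF; return what was written.

    Hypothetical helper: `encoding` is the code page set in the table
    (e.g. "CP936"), or None/"" when no specific encoding is in use.
    """
    created: List[Path] = []
    dbf = Path(dbf_path)

    # Rules 2 and 3: a .cpg file is written only when an encoding is set.
    if encoding:
        cpg = dbf.with_suffix(".cpg")
        cpg.write_text(encoding, encoding="ascii")
        created.append(cpg)

    # Rule 1: a .prj file is never written for a table-only export,
    # so there is deliberately no code for it here.
    return created
```

Calling `write_sidecars("out.dbf", "CP936")` would write `out.cpg` containing `CP936`, while `write_sidecars("out.dbf", None)` writes nothing.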

3. Test cases

  1. Open a dbf file in GeoDa (no .cpg file). When exporting to a new dbf file, no .cpg file should be created.

  2. Open a dbf file in GeoDa (with a .cpg file). When exporting to a new dbf file, a .cpg file with the same content should be created alongside the new dbf file.

  3. Open a dbf file in GeoDa (no .cpg file). Manually specify an encoding (e.g. Chinese Simplified). When exporting to a new dbf file, a .cpg file (with content CP936) should be created alongside the new dbf file.

  4. Open a dataset with geometries (e.g. Guerry). When exporting to a new dbf file (table only), no .shp file should be created.