File format - wattzhikang/terrainHydrology GitHub Wiki

As of PR #56, this project uses a new save file format. Note that this change will likely lead to significant architectural changes, so the format itself is expected to change. In addition, Issue #57 tracks adding support for PostGIS.

This document will not describe the data model for this program; a separate document should be written for that. This document will merely describe how it is encoded.

Legacy format

The original save file format was developed in April and May 2021. The first logic that serialized parts of the data model into a sequence of bytes was written to enable communication between and the native module. (Some of the earliest logic to do this was committed in 224fe7c on 13 April 2021.) This logic was soon adapted to serialize the entire data model, and the architecture was largely finalized with commit b9adaff on 26 May 2021. Because breaking changes were often introduced to the serialization logic, every save file had to be prefixed with a version number identifying which revision of the format it used.

The file was written by painstakingly encoding every value into binary one at a time, and it was not done very well. Interested parties can examine the details of this format by inspecting the git history of

Motivation to develop a new file format

I always knew that this was a terrible scheme. For a while I had looked into the possibility of writing the data model as a set of ESRI Shapefiles (see Issue #9), but I was not satisfied with this solution. Finally, it occurred to me that SpatiaLite was the obvious solution. SpatiaLite would:

  • Keep all of the data in a single file
  • Perfectly preserve the relationships among the data
  • Be easily usable by end users
  • Allow the basic task of serialization to be done by developers who are smarter than me
  • Make schema changes less likely to make older files entirely unusable
  • Speed up development by not forcing me to write a lot of verbose logic for every minor change to the data model

New schema

The new file is a SQLite database that uses SpatiaLite extensions.
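Because the save file is an ordinary SQLite database, it can be opened with any SQLite client. The following is a minimal sketch using Python's standard sqlite3 module. The helper names open_save_file and list_tables are hypothetical, and the demonstration runs against an in-memory stand-in rather than a real save file. Loading SpatiaLite is optional here and assumes a Python build with extension loading enabled and the mod_spatialite library installed.

```python
import sqlite3


def open_save_file(path, load_spatialite=False):
    """Open a save file as a plain SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(path)
    if load_spatialite:
        # Only needed for spatial functions. Requires extension loading
        # support in this Python build and mod_spatialite on the system.
        conn.enable_load_extension(True)
        conn.load_extension("mod_spatialite")
    return conn


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# Demonstration on an in-memory stand-in, not a real save file.
conn = open_save_file(":memory:")
conn.execute("CREATE TABLE RiverNodes (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE Parameters (key TEXT PRIMARY KEY, value TEXT)")
tables = list_tables(conn)
```

Plain attribute queries work without SpatiaLite loaded; only queries that call spatial functions on the geometry columns need the extension.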

A diagram of the database schema. Note that the relationship between RiverNodes and RiverPaths is incorrectly indicated as being 1:many. It is actually 1:1.

This diagram can be recreated using the following markup:

Table Shoreline {
  id int [pk]
  loc geom
}

Table RiverNodes {
  id int [pk]
  parent int [ref: >]
  elevation float
  localwatershed float
  inheritedwatershed float
  flow float
  contourIndex int [ref: -]
  loc geom
}

Table Qs {
  id int [pk]
  elevation float
  loc geom
}

Table Cells {
  rivernode int [ref: >]
  polygonOrder int
  q int [ref: >]
}

Table Ts {
  id int [pk]
  rivercell int [ref: >]
  elevation float
  loc geom
}

Table Edges {
  id int [pk]
  q0 int [ref: >]
  q1 int [ref: >]
  hasRiver bool
  isShore bool
  shore0 int [ref: >]
  shore1 int [ref: >]
}

Table RiverPaths {
  id int [pk]
  rivernode int [ref: >]
  path geom
}

Table Parameters {
  key text [pk]
  value text
}

All technical details of the schema can be found in src/db-init.sql, which creates all the tables and sets up their relationships.

The following sections will review each table.


RiverNodes

Each record in this table represents a single river node in the Hydrology.

This is one of the most important tables in the schema. A river node doesn't just have a river. It also has a cell: an area enclosed by a polygon that separates it from its neighbors. That cell is bounded by ridge primitives, and it also has a set of terrain primitives.

Some of the data in these records is redundant and will probably be removed in a future release:

  • localwatershed and inheritedwatershed are just the areas of cell polygons
  • flow is computed from the areas of cell polygons
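To illustrate why these columns are redundant, here is a sketch of how an inherited watershed could be derived with a recursive query. It rests on assumptions that this page does not confirm: that parent points to the downstream node, and that a node's inherited watershed is its own local watershed plus that of every node upstream of it. The table is a simplified stand-in for RiverNodes, and the areas are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE RiverNodes (
           id INTEGER PRIMARY KEY,
           parent INTEGER REFERENCES RiverNodes(id),
           localwatershed REAL
       )"""
)
# Node 0 is the mouth; 1 and 2 drain into 0; 3 drains into 1.
conn.executemany(
    "INSERT INTO RiverNodes VALUES (?, ?, ?)",
    [(0, None, 10.0), (1, 0, 4.0), (2, 0, 6.0), (3, 1, 2.0)],
)

# For every node (root), collect itself and all nodes upstream of it,
# then sum their local watersheds.
INHERITED = """
WITH RECURSIVE upstream(root, id) AS (
    SELECT id, id FROM RiverNodes
    UNION ALL
    SELECT upstream.root, child.id
    FROM upstream
    JOIN RiverNodes AS child ON child.parent = upstream.id
)
SELECT upstream.root, SUM(n.localwatershed)
FROM upstream
JOIN RiverNodes AS n ON n.id = upstream.id
GROUP BY upstream.root
ORDER BY upstream.root
"""
inherited = dict(conn.execute(INHERITED).fetchall())
```

If the assumption holds, a query like this could replace the stored inheritedwatershed column, at the cost of recomputing it on every read.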


RiverPaths

Each record represents the path of the river that flows through a river node.

In this table, there is a 1:1 relationship between river nodes and river paths. In the data model, however, every leaf node has a unique river, and the remaining river nodes just reference one of those rivers. This discrepancy will be reconciled eventually.


Qs

Each record represents a ridge primitive.


Edges

Each record represents an edge that either divides two river node cells, or demarcates the shore between a river node and the ocean.

Each edge lies between 2 ridge primitives.

There is some redundant data that will probably be removed in a future release:

  • hasRiver just means that the 2 cells that this edge divides have a parent-child relationship. This could be easily determined by a query.
  • isShore is more dubious, because it could just mean that shore0 and shore1 are NULL.
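As a sketch of the kind of query that could stand in for hasRiver, the example below finds the cells on either side of each edge through the Cells table and checks whether one is the parent of the other. The tables are simplified stand-ins (geometry and unrelated columns are omitted), the data is made up, and the query assumes that a cell lists both q0 and q1 only when the edge lies on its boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE RiverNodes (id INTEGER PRIMARY KEY, parent INTEGER);
    CREATE TABLE Cells (rivernode INTEGER, polygonOrder INTEGER, q INTEGER);
    CREATE TABLE Edges (id INTEGER PRIMARY KEY, q0 INTEGER, q1 INTEGER);

    -- Node 0 is the mouth; nodes 1 and 2 are both children of node 0.
    INSERT INTO RiverNodes VALUES (0, NULL), (1, 0), (2, 0);

    -- Three triangular cells meeting at ridge primitive 0.
    INSERT INTO Cells VALUES (0, 0, 0), (0, 1, 1), (0, 2, 2);
    INSERT INTO Cells VALUES (1, 0, 0), (1, 1, 2), (1, 2, 3);
    INSERT INTO Cells VALUES (2, 0, 0), (2, 1, 3), (2, 2, 1);

    -- Edge 0 divides cells 0 and 2, edge 1 divides 0 and 1, edge 2 divides 1 and 2.
    INSERT INTO Edges VALUES (0, 0, 1), (1, 0, 2), (2, 0, 3);
    """
)

HAS_RIVER = """
WITH EdgeCells AS (
    -- River nodes whose cell contains both endpoints of the edge.
    SELECT e.id AS edge, c0.rivernode AS rivernode
    FROM Edges AS e
    JOIN Cells AS c0 ON c0.q = e.q0
    JOIN Cells AS c1 ON c1.q = e.q1 AND c1.rivernode = c0.rivernode
)
SELECT e.id,
       EXISTS (
           SELECT 1
           FROM EdgeCells AS a
           JOIN EdgeCells AS b ON b.edge = a.edge
           JOIN RiverNodes AS n ON n.id = a.rivernode AND n.parent = b.rivernode
           WHERE a.edge = e.id
       ) AS hasRiver
FROM Edges AS e
ORDER BY e.id
"""
has_river = conn.execute(HAS_RIVER).fetchall()
```

Edges 0 and 1 separate the mouth's cell from one of its children, so the derived flag is set for them. Edge 2 separates two siblings, so it is not.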


Shoreline

Each record represents a point on the shoreline.

This table is probably redundant. In reality, each point on the shore is a ridge primitive, but there is no relationship between Qs and Shoreline. This doesn't make any sense, and will be changed in a future release.


Cells

Each record indicates a relationship between a ridge primitive and a river node.

This is a through table. Every river node will have a number of ridge primitives, and many ridge primitives will border 3 or more river nodes.

The column polygonOrder indicates the sort order for the Qs relative to a particular river node. This can sort the Qs in counterclockwise order for a particular river node. As suggested by the name, this column is important for deriving the geometry of cells.
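The role of polygonOrder can be sketched as follows. To keep the example runnable without SpatiaLite, the Qs table here stores hypothetical x and y columns in place of the real loc geometry column, and the helper cell_polygon_wkt is not part of the project. It sorts a cell's ridge primitives by polygonOrder and closes the ring to form a polygon in WKT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Qs (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE Cells (rivernode INTEGER, polygonOrder INTEGER, q INTEGER);

    INSERT INTO Qs VALUES (10, 0, 0), (11, 4, 0), (12, 4, 3), (13, 0, 3);
    -- Rows are deliberately inserted out of order.
    INSERT INTO Cells VALUES (7, 2, 12), (7, 0, 10), (7, 3, 13), (7, 1, 11);
    """
)


def cell_polygon_wkt(conn, rivernode):
    """Build a WKT polygon for one cell from its ordered ridge primitives."""
    rows = conn.execute(
        "SELECT Qs.x, Qs.y FROM Cells "
        "JOIN Qs ON Qs.id = Cells.q "
        "WHERE Cells.rivernode = ? "
        "ORDER BY Cells.polygonOrder",
        (rivernode,),
    ).fetchall()
    ring = rows + rows[:1]  # repeat the first vertex to close the ring
    coords = ", ".join(f"{x:g} {y:g}" for x, y in ring)
    return f"POLYGON(({coords}))"


wkt = cell_polygon_wkt(conn, 7)
```

Without the ORDER BY clause the vertices would come back in an unspecified order, and the resulting ring could cross itself.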


Ts

Each record represents a terrain primitive.


Parameters

Each record represents a parameter that was used to generate the terrain.

The parameters currently stored in the table, edgeLength and resolution, will probably be rendered irrelevant by future architecture changes. However, other planned changes, such as the effort to make this program more deterministic, will probably keep this table relevant.
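Since the table is a simple key/value store with text values, reading it back is short. The sketch below uses an in-memory stand-in with made-up values, and it assumes that both parameters hold numbers encoded as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parameters (key TEXT PRIMARY KEY, value TEXT)")
# Made-up values, for illustration only.
conn.executemany(
    "INSERT INTO Parameters VALUES (?, ?)",
    [("edgeLength", "2320.5"), ("resolution", "93.6")],
)

# Each row is a (key, value) pair, so the cursor converts directly to a dict.
params = dict(conn.execute("SELECT key, value FROM Parameters"))

# Values are stored as text; convert them where a number is expected.
edge_length = float(params["edgeLength"])
resolution = float(params["resolution"])
```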


Because this schema preserves the data model's structure, not only is it possible to reconstruct the data model, it is possible to derive many different kinds of information with SQL queries. This reduces the need to write more and more scripts to derive information.

Edge lines
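One way to derive a line for each edge is to join Edges to Qs once per endpoint. This is only a sketch: Qs here has hypothetical x and y columns standing in for the loc geometry column, so that the example runs with plain SQLite. With SpatiaLite loaded, a function such as MakeLine could presumably build the geometry from the two loc values inside the query instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Qs (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE Edges (id INTEGER PRIMARY KEY, q0 INTEGER, q1 INTEGER);

    INSERT INTO Qs VALUES (0, 0, 0), (1, 5, 0), (2, 5, 2.5);
    INSERT INTO Edges VALUES (0, 0, 1), (1, 1, 2);
    """
)

# Look up both endpoints of every edge.
EDGE_LINES = """
SELECT e.id, a.x, a.y, b.x, b.y
FROM Edges AS e
JOIN Qs AS a ON a.id = e.q0
JOIN Qs AS b ON b.id = e.q1
ORDER BY e.id
"""

# Map each edge id to a WKT line string.
edge_lines = {
    edge_id: f"LINESTRING({x0:g} {y0:g}, {x1:g} {y1:g})"
    for edge_id, x0, y0, x1, y1 in conn.execute(EDGE_LINES)
}
```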

Cell polygons

Watershed polygons