File format - wattzhikang/terrainHydrology GitHub Wiki
With PR #56, this project uses a new save file format. This change will likely lead to significant architectural changes, so the format itself is expected to change. Moreover, with Issue #57, support for PostGIS will be added.
This document will not describe the data model for this program; a separate document should be written for that. This document will merely describe how it is encoded.
Legacy format
The original save file format was developed in April and May 2021. The first logic that serialized parts of the data model into a sequence of bytes was written to enable communication between hydrology.py and the native module. (Some of the earliest logic to do this was committed in 224fe7c on 13 April 2021.) This logic was soon adapted to serialize the entire data model. This architecture was largely finalized with commit b9adaff on 26 May. Breaking changes were often introduced to the serialization logic, necessitating that every save file be prepended with a version number to indicate these breaking changes.
The file was written by painstakingly encoding every value into binary one at a time, and it was not done very well. Interested parties can examine the details of this format by inspecting the git history of SaveFile.py.
Motivation to develop a new file
I always knew that this was a terrible scheme. For a while I had looked into the possibility of writing the data model as a set of ESRI Shapefiles (see Issue #9), but I was not satisfied with this solution. Finally, it occurred to me that SpatiaLite was the obvious solution. SpatiaLite would:
- Keep all of the data in a single file
- Perfectly preserve the relationships among the data
- Be easily usable by end users
- Allow the basic task of serialization to be done by developers who are smarter than me
- Make schema changes less likely to make older files entirely unusable
- Speed up development by not forcing me to write a lot of verbose logic for every minor change to the data model
New schema
The new file is a SQLite database that uses SpatiaLite extensions.
This diagram can be recreated with dbdiagram.io using the following markup:
Table Shoreline {
id int [pk]
loc geom
}
Table RiverNodes {
id int [pk]
parent int [ref: > RiverNodes.id]
elevation float
localwatershed float
inheritedwatershed float
flow float
contourIndex int [ref: - Shoreline.id]
loc geom
}
Table Qs {
id int [pk]
elevation float
loc geom
}
Table Cells {
rivernode int [ref: > RiverNodes.id]
polygonOrder int
q int [ref: > Qs.id]
}
Table Ts {
id int [pk]
rivercell int [ref: > RiverNodes.id]
elevation float
loc geom
}
Table Edges {
id int [pk]
q0 int [ref: > Qs.id]
q1 int [ref: > Qs.id]
hasRiver bool
isShore bool
shore0 int [ref: > Shoreline.id]
shore1 int [ref: > Shoreline.id]
}
Table RiverPaths {
id int [pk]
rivernode int [ref: > RiverNodes.id]
path geom
}
Table Parameters {
key text [pk]
value text
}
All technical details of the schema can be found in src/db-init.sql, which creates all the tables and sets up their relationships.
The following sections will review each table.
RiverNodes
Each record in this table represents a single river node in the Hydrology.
This is one of the most important tables in the schema. A river node is more than a point on a river. Each river node has a cell: an area enclosed by a polygon that separates it from its neighbors. That polygon is made up of ridge primitives. The cell also has a set of terrain primitives.
There is data in these records that is redundant, and will probably be removed in a future release:
- localwatershed and inheritedwatershed are just the areas of cell polygons
- flow is computed from the areas of cell polygons
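To illustrate why a column like inheritedwatershed is derivable, here is a minimal sketch using Python's built-in sqlite3 module. It is not the project's code. It assumes that inheritedwatershed means the area of a node's own cell plus the cells of every node upstream of it (an interpretation that this page does not state), and it uses a trimmed-down RiverNodes table with hypothetical rows in which the river mouth has a NULL parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for RiverNodes: only the columns this sketch needs.
conn.execute(
    "CREATE TABLE RiverNodes ("
    " id INTEGER PRIMARY KEY,"
    " parent INTEGER REFERENCES RiverNodes(id),"
    " localwatershed REAL)"
)
# Hypothetical tree: node 0 is the mouth; 1 and 2 drain into 0; 3 drains into 1.
conn.executemany(
    "INSERT INTO RiverNodes VALUES (?, ?, ?)",
    [(0, None, 10.0), (1, 0, 4.0), (2, 0, 6.0), (3, 1, 2.5)],
)


def upstream_watershed(conn, node_id):
    """Sum localwatershed over node_id and every node that drains into it."""
    row = conn.execute(
        """
        WITH RECURSIVE upstream(id) AS (
            SELECT ?                      -- start at the requested node
            UNION ALL
            SELECT n.id                   -- then add each node whose parent
            FROM RiverNodes n             -- is already in the set
            JOIN upstream u ON n.parent = u.id
        )
        SELECT SUM(n.localwatershed)
        FROM RiverNodes n
        JOIN upstream u ON n.id = u.id
        """,
        (node_id,),
    ).fetchone()
    return row[0]


print(upstream_watershed(conn, 0))  # whole tree
print(upstream_watershed(conn, 1))  # node 1 plus node 3
```

The recursive common table expression follows the parent column in reverse, so no extra bookkeeping column is needed to find the upstream nodes.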
RiverPaths
Each record represents the path of the river that flows through a river node.
In this table, there is a 1:1 relationship between river nodes and river paths. In the data model, every leaf node has a unique river, and the rest of the river nodes just reference one of those. This will be reconciled eventually.
Qs
Each record represents a ridge primitive.
Edges
Each record represents an edge that either divides 2 river node cells, or demarcates the shore between a river node and the ocean.
Each edge lies between 2 ridge primitives.
There is some redundant data that will probably be removed in a future release:
- hasRiver just means that the 2 cells that this edge divides have a parent-child relationship. This could be easily determined by a query.
- isShore is more dubious, because it could just mean that shore0 and shore1 are NULL.
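The remark that hasRiver could be determined by a query can be sketched as follows. This is a hedged illustration rather than the project's code: it uses Python's built-in sqlite3 module, trimmed-down copies of the tables (geometry columns omitted), and hypothetical sample rows. It also assumes that a cell borders an edge exactly when both q0 and q1 appear among that cell's rows in Cells.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins for the real tables (no geometry columns).
    CREATE TABLE RiverNodes (id INTEGER PRIMARY KEY, parent INTEGER);
    CREATE TABLE Cells (rivernode INTEGER, polygonOrder INTEGER, q INTEGER);
    CREATE TABLE Edges (id INTEGER PRIMARY KEY, q0 INTEGER, q1 INTEGER,
                        hasRiver INTEGER);
    """
)
# Hypothetical data: node 1 drains into node 0; node 2 is unrelated to both.
conn.executemany("INSERT INTO RiverNodes VALUES (?, ?)",
                 [(0, None), (1, 0), (2, None)])
conn.executemany(
    "INSERT INTO Cells VALUES (?, ?, ?)",
    [(0, 0, 0), (0, 1, 1), (0, 2, 2), (0, 3, 3),   # cell of node 0
     (1, 0, 1), (1, 1, 4), (1, 2, 5), (1, 3, 2),   # cell of node 1
     (2, 0, 3), (2, 1, 2), (2, 2, 5), (2, 3, 6)],  # cell of node 2
)
conn.executemany(
    "INSERT INTO Edges VALUES (?, ?, ?, ?)",
    [(0, 1, 2, 1),   # between cells 0 and 1 (child and parent)
     (1, 2, 3, 0),   # between cells 0 and 2
     (2, 2, 5, 0)],  # between cells 1 and 2
)

QUERY = """
WITH adjacent AS (
    -- A cell touches an edge if both endpoints are in its polygon.
    SELECT e.id AS edge, c0.rivernode AS node
    FROM Edges e
    JOIN Cells c0 ON c0.q = e.q0
    JOIN Cells c1 ON c1.q = e.q1 AND c1.rivernode = c0.rivernode
)
SELECT e.id,
       EXISTS (
           -- Is one adjacent cell the parent of the other?
           SELECT 1
           FROM adjacent a
           JOIN adjacent b ON b.edge = a.edge
           JOIN RiverNodes n ON n.id = a.node AND n.parent = b.node
           WHERE a.edge = e.id
       ) AS derived
FROM Edges e
ORDER BY e.id
"""


def derived_has_river(conn):
    """Map each edge id to whether its two cells are parent and child."""
    return {edge_id: bool(flag) for edge_id, flag in conn.execute(QUERY)}


stored = {eid: bool(flag)
          for eid, flag in conn.execute("SELECT id, hasRiver FROM Edges")}
print(derived_has_river(conn) == stored)
```

Comparing the derived values against the stored column, as the last lines do, is one way to check a save file for consistency before the column is dropped.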
Shoreline
Each record represents a point on the shoreline.
This table is probably redundant. In reality, each point on the shore is a ridge primitive, but there is no relationship between Qs and Shoreline. This doesn't make any sense, and will be changed in a future release.
Cells
Each record indicates a relationship between a ridge primitive and a river node.
This is a through table. Every river node will have a number of ridge primitives, and many ridge primitives will border 3 or more river nodes.
The column polygonOrder indicates the sort order for the Qs relative to a particular river node. This can sort the Qs in counterclockwise order for a particular river node. As suggested by the name, this column is important for deriving the geometry of cells.
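As a rough sketch of how polygonOrder can be used to rebuild a cell's polygon, the following uses Python's built-in sqlite3 module. So that it runs without SpatiaLite, it assumes a simplified Qs table that stores coordinates in plain x and y columns instead of the loc geometry column; the rows and the function names cell_polygon and signed_area are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified stand-ins: x and y replace the real loc geometry column.
    CREATE TABLE Qs (id INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE Cells (rivernode INTEGER, polygonOrder INTEGER, q INTEGER);
    """
)
# Hypothetical 2 x 2 square cell belonging to river node 7.
conn.executemany(
    "INSERT INTO Qs VALUES (?, ?, ?)",
    [(10, 0.0, 0.0), (11, 2.0, 0.0), (12, 2.0, 2.0), (13, 0.0, 2.0)],
)
# Rows are inserted out of order on purpose; polygonOrder restores the ring.
conn.executemany(
    "INSERT INTO Cells VALUES (?, ?, ?)",
    [(7, 2, 12), (7, 0, 10), (7, 3, 13), (7, 1, 11)],
)


def cell_polygon(conn, rivernode):
    """Return the cell's vertices as (x, y) tuples, sorted by polygonOrder."""
    return conn.execute(
        "SELECT Qs.x, Qs.y FROM Cells JOIN Qs ON Qs.id = Cells.q "
        "WHERE Cells.rivernode = ? ORDER BY Cells.polygonOrder",
        (rivernode,),
    ).fetchall()


def signed_area(points):
    """Shoelace formula; the result is positive for counterclockwise rings."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


ring = cell_polygon(conn, 7)
print(ring, signed_area(ring))
```

A positive signed area confirms that the ring comes back in counterclockwise order. Against a real save file, the coordinates would come from the loc column through SpatiaLite's ST_X and ST_Y functions rather than from x and y columns.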
Ts
Each record represents a terrain primitive.
Parameters
Each record represents a parameter that was used to generate the terrain.
The parameters currently stored in the table, edgeLength and resolution, will probably be rendered irrelevant by future architecture changes. However, other planned changes, such as the effort to make this program more deterministic, will probably keep this table relevant.
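Because both columns are text, a reader has to convert the values itself. Here is a minimal sketch with Python's built-in sqlite3 module, assuming that every stored value parses as a number; the sample values and the function name load_parameters are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Parameters (key TEXT PRIMARY KEY, value TEXT)")
# Hypothetical values; only the key names come from this page.
conn.executemany(
    "INSERT INTO Parameters VALUES (?, ?)",
    [("edgeLength", "2320.5"), ("resolution", "93.6")],
)


def load_parameters(conn):
    """Read the key/value table into a dict, converting each value to float."""
    return {
        key: float(value)
        for key, value in conn.execute("SELECT key, value FROM Parameters")
    }


params = load_parameters(conn)
print(params["edgeLength"], params["resolution"])
```

A key/value layout like this lets new parameters be added without a schema change, at the cost of leaving type conversion to the reader.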
Usage
Because this schema preserves the data model's structure, not only is it possible to reconstruct the data model, it is possible to derive many different kinds of information with SQL queries. This reduces the need to write more and more scripts to derive information.
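As a starting point for such queries, here is a hedged sketch of opening a save file from Python with the built-in sqlite3 module. It tries to load SpatiaLite under the module name mod_spatialite, which is the usual name but depends on how SpatiaLite is installed, and falls back to plain SQLite if loading fails. The helper names are hypothetical, and the demonstration uses an in-memory database with a stand-in table rather than a real save file.

```python
import sqlite3


def open_save_file(path):
    """Open a database and try to load SpatiaLite; return (conn, loaded)."""
    conn = sqlite3.connect(path)
    try:
        conn.enable_load_extension(True)
        try:
            conn.load_extension("mod_spatialite")
            loaded = True
        finally:
            # Turn extension loading back off whether or not it worked.
            conn.enable_load_extension(False)
    except (AttributeError, sqlite3.Error):
        # Either this Python build lacks extension support or the
        # SpatiaLite library could not be found.
        loaded = False
    return conn, loaded


def table_row_counts(conn):
    """Return {table name: row count} for every user table in the file."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demonstration on an in-memory database with a stand-in table.
conn, loaded = open_save_file(":memory:")
conn.execute("CREATE TABLE Qs (id INTEGER PRIMARY KEY, elevation REAL)")
conn.executemany("INSERT INTO Qs VALUES (?, ?)", [(0, 120.0), (1, 95.5)])
counts = table_row_counts(conn)
print(loaded, counts["Qs"])
```

Non-spatial queries, such as the row counts here, work even when SpatiaLite is unavailable. Anything that touches a geometry column needs the extension to be loaded.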