Digiroad to OSM transformation - HY-OHTUPROJ-OSRM/osrm-project GitHub Wiki

What you need

Getting iceroad data:

If you want iceroads (optional), you will need to download the Maastotietokanta GeoPackage and extract the iceroads from the road layer.

This can be done in QGIS as follows:

Load the Maastotietokanta GeoPackage file in QGIS.
Add the tieviiva table as a layer from the GeoPackage.
With the tieviiva layer selected, click Select Features Using an Expression (Ctrl+F3).
Write kohdeluokka=12312 in the expression editor and click Select Features.
Right click on the tieviiva layer and click Export->Save Selected Features As....
In the dialog that opens, select where to save your output GeoPackage file with the ... button.
Enter iceroads as the layer name in the dialog; otherwise dr2osm will not find the layer.
Click OK.

With the iceroads extracted as a GeoPackage file, you can now combine them with the Digiroad data using the --mml-iceroads <ice-road-path> switch, as described in the dr2osm README.

Instructions

The dependencies for building the tool are libsqlite3 and libproj, both of which are available in the Cubbli package repository in the packages libsqlite3-dev and libproj-dev respectively. On Windows the script install_dependencies.bat can be used to automatically install the dependencies via OSGeo4W.

The tool is built using the build.sh script (build.bat on Windows). Unless you are debugging, pass it the argument release to enable compiler optimizations and disable assertions, i.e. ./build.sh release. The resulting binary dr2osm (dr2osm.exe on Windows) takes two arguments: the path to the source GeoPackage file, followed by the path to the destination OSM file.

At the moment, the output is always in the OSM XML format, which results in a very large file when converting the data for all of Finland. To get a more manageably sized file, the XML can be converted to the OSM PBF format, for example with the osmconvert tool, available on Cubbli in the package osmctools. If - is given to dr2osm as the second argument, its output can be piped directly into osmconvert as follows:

dr2osm <input> - | osmconvert --out-pbf -o=<output> -

See the README file for further information.

Technical overview

The entry point and the bulk of the program live in the file dr2osm.c. After initialization, the program is split roughly into two steps:

The first step is to query the way and speed limit data from the GeoPackage file using SQLite. The geometry data is parsed, snapped to an integer grid and inserted one node at a time into an associative array (implemented as a quadrant tree). If a node does not have the same location as a previously seen node, it is assigned a unique ID and written out. Otherwise it reuses the ID of the previously seen node and is not output again. In either case the node ID is added to the list of node IDs for the current way. After the nodes for a particular way have been processed, the way itself is assigned an ID and its other metadata is processed. The way ID, followed by the list of node IDs and the other data, is then buffered to be written out later.
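
The node deduplication described above can be illustrated with a minimal sketch. It is not the code from dr2osm.c: the names QuadNode, snap and intern_node, the grid scale and the 32-bit coordinate width are all assumptions made for illustration, and the tree layout (one level per coordinate bit, four children per level) is only one possible way to build a quadrant tree keyed on integer coordinates.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical grid resolution: 100 grid units per coordinate unit. */
#define GRID_SCALE 100.0
#define COORD_BITS 32

/* One tree level per coordinate bit; each node has one child per quadrant. */
typedef struct QuadNode {
    struct QuadNode *child[4];
    int64_t node_id; /* 0 means "no node stored at this location yet" */
} QuadNode;

static int64_t next_node_id = 1;

/* Snap a non-negative coordinate to the integer grid (round to nearest). */
static uint32_t snap(double coordinate)
{
    assert(coordinate >= 0.0);
    return (uint32_t)(coordinate * GRID_SCALE + 0.5);
}

/* Look up the ID for grid point (x, y), assigning a fresh ID if the point
   has not been seen before. *is_new tells the caller whether the node
   still needs to be written out. */
static int64_t intern_node(QuadNode *root, uint32_t x, uint32_t y, int *is_new)
{
    QuadNode *n = root;
    for (int bit = COORD_BITS - 1; bit >= 0; bit--) {
        /* Choose the quadrant from the current bit of x and y. */
        unsigned q = (((x >> bit) & 1u) << 1) | ((y >> bit) & 1u);
        if (n->child[q] == NULL) {
            n->child[q] = calloc(1, sizeof(QuadNode));
            if (n->child[q] == NULL) abort();
        }
        n = n->child[q];
    }
    *is_new = (n->node_id == 0);
    if (*is_new) n->node_id = next_node_id++;
    return n->node_id;
}
```

A caller would snap each vertex, call intern_node, emit the node only when *is_new is set, and append the returned ID to the current way's node list in either case. Two vertices that snap to the same grid point therefore share one ID, which is what joins adjacent links into a connected graph.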

The second step is simply to write out the buffered way data for all the links as XML. The up-to-date format in which ways are buffered can be seen in the part of the code where they are output. At the time of writing, the format is roughly as follows:

int way_id
int... node_ids
int highway
int route
int oneway
int maxspeed
string name
int... additional_tags

node_ids and additional_tags are both sequences of ints terminated by a zero. way_id, maxspeed and name are output as-is, while highway, route, oneway and each element of additional_tags are all indices into the global array osm_strings.
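
As a rough illustration of how a record in this format could be walked, here is a minimal sketch. It assumes the buffer is a flat array of ints, and it leaves out the name field, because the layout of the string inside the buffer is not specified here. WayRecord, read_int_list, decode_way and osm_string are hypothetical names, and the contents of the osm_strings table below are placeholders rather than the real array.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder stand-in for the global osm_strings array (values invented). */
static const char *const osm_strings[] = {
    "", "residential", "yes", "ferry", "ice_road"
};

/* Bounds-checked lookup of an index stored in the way buffer. */
static const char *osm_string(int index)
{
    assert(index >= 0 &&
           (size_t)index < sizeof osm_strings / sizeof osm_strings[0]);
    return osm_strings[index];
}

typedef struct {
    int way_id;
    const int *node_ids;        size_t node_count;
    int highway, route, oneway; /* indices into osm_strings */
    int maxspeed;               /* output as-is */
    const int *additional_tags; size_t tag_count;
} WayRecord;

/* Read a zero-terminated int sequence: stores its start in *first, moves
   *cursor past the terminator and returns the number of elements. */
static size_t read_int_list(const int **cursor, const int **first)
{
    const int *p = *cursor;
    *first = p;
    while (*p != 0) p++;
    *cursor = p + 1;
    return (size_t)(p - *first);
}

/* Decode one record and return the position of the next one. */
static const int *decode_way(const int *cursor, WayRecord *out)
{
    out->way_id = *cursor++;
    out->node_count = read_int_list(&cursor, &out->node_ids);
    out->highway = *cursor++;
    out->route = *cursor++;
    out->oneway = *cursor++;
    out->maxspeed = *cursor++;
    /* The name field would be read here; omitted in this sketch. */
    out->tag_count = read_int_list(&cursor, &out->additional_tags);
    return cursor;
}
```

Using zero as the terminator only works because valid node IDs and tag indices are never zero, which is why the placeholder table keeps index 0 unused.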

Memory allocation and buffering functionality is in the separate file buffer.c. The program's memory management consists of reserving two large address ranges, one each for the node and link buffers, and committing memory to them as it is needed. The lifetime of both buffers is the entire runtime of the program, so no memory can be leaked and separate free calls are not necessary.
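
The reserve-then-commit pattern can be sketched as follows for POSIX systems. This is an illustration, not the contents of buffer.c: the Buffer struct and the functions buffer_reserve and buffer_push are hypothetical names, and the sketch assumes mmap with PROT_NONE for reserving and mprotect for committing. On Windows the same idea is usually expressed with VirtualAlloc using MEM_RESERVE and MEM_COMMIT.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    unsigned char *base; /* start of the reserved address range */
    size_t reserved;     /* bytes of address space reserved */
    size_t committed;    /* bytes made readable and writable so far */
    size_t used;         /* bytes handed out to callers */
} Buffer;

/* Reserve address space without making it accessible yet. */
static int buffer_reserve(Buffer *b, size_t reserve_bytes)
{
    void *p = mmap(NULL, reserve_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) return -1;
    b->base = p;
    b->reserved = reserve_bytes;
    b->committed = 0;
    b->used = 0;
    return 0;
}

/* Hand out size bytes from the end of the buffer, committing more pages
   first if needed. Returns NULL when the reservation is exhausted. */
static void *buffer_push(Buffer *b, size_t size)
{
    assert(b->base != NULL);
    if (size > b->reserved - b->used) return NULL;
    size_t needed = b->used + size;
    if (needed > b->committed) {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t target = (needed + page - 1) / page * page;
        if (target > b->reserved) target = b->reserved;
        if (mprotect(b->base + b->committed, target - b->committed,
                     PROT_READ | PROT_WRITE) != 0) return NULL;
        b->committed = target;
    }
    void *result = b->base + b->used;
    b->used = needed;
    return result;
}
```

Because the base address never changes, pointers into the buffer stay valid as it grows, and since the buffer lives until the process exits there is no corresponding release function.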

Potential improvements

Support directly outputting a PBF file to avoid passing several gigabytes of XML data between programs.
Multithreading of node buffering. Node buffering accounts for around half of the program's runtime, but it is unclear exactly how much it can be sped up this way and whether the gain is worth the added complexity. The only way to find out is to try it.
Apart from the IDs, the values buffered in the way buffer require nowhere near the entire int number range. There is potential to reduce buffer usage by, for example, packing multiple values into a single int. The additional_tags field could be stored as either a bit field or a string.
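
To make the packing idea concrete, here is a minimal sketch that stores highway, route, oneway and maxspeed in one 32-bit value. The field widths are assumptions chosen for illustration (they would have to be checked against the actual size of osm_strings and the range of speed limits), and pack_way and unpack_field are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths; 26 of the 32 bits are used in total. */
enum {
    HIGHWAY_BITS = 8, ROUTE_BITS = 8, ONEWAY_BITS = 2, MAXSPEED_BITS = 8,
    HIGHWAY_SHIFT = 0,
    ROUTE_SHIFT = HIGHWAY_SHIFT + HIGHWAY_BITS,
    ONEWAY_SHIFT = ROUTE_SHIFT + ROUTE_BITS,
    MAXSPEED_SHIFT = ONEWAY_SHIFT + ONEWAY_BITS
};

/* Mask with the lowest `bits` bits set. */
static uint32_t field_mask(int bits)
{
    return (UINT32_C(1) << bits) - 1;
}

/* Add one field to a packed value; the value must fit in its width. */
static uint32_t pack_field(uint32_t packed, uint32_t value, int shift, int bits)
{
    assert(value <= field_mask(bits));
    return packed | (value << shift);
}

/* Extract one field from a packed value. */
static uint32_t unpack_field(uint32_t packed, int shift, int bits)
{
    return (packed >> shift) & field_mask(bits);
}

/* Pack the four small per-way values into a single 32-bit word. */
static uint32_t pack_way(uint32_t highway, uint32_t route,
                         uint32_t oneway, uint32_t maxspeed)
{
    uint32_t p = 0;
    p = pack_field(p, highway, HIGHWAY_SHIFT, HIGHWAY_BITS);
    p = pack_field(p, route, ROUTE_SHIFT, ROUTE_BITS);
    p = pack_field(p, oneway, ONEWAY_SHIFT, ONEWAY_BITS);
    p = pack_field(p, maxspeed, MAXSPEED_SHIFT, MAXSPEED_BITS);
    return p;
}
```

Under these assumed widths, four ints per way shrink to one. Whether that is worthwhile depends on how much of the way buffer these fields occupy compared with the node ID lists.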