GeoDMS Default Tiling - ObjectVision/GeoDMS GitHub Wiki

Default tiling

By default, GeoDMS tiles data into segments of 256x256 blocks (for 2D raster/grid data) or 65536 rows (for 1D data).

Tile pipelining

Most GeoDMS operators, even when reading data using for instance gdal.grid, can make use of tile pipelines, meaning data can be processed on a per-tile basis by multiple threads. This can in theory reduce memory consumption significantly, as there is no need to keep the full domain or compacted domain in memory.

LazyCalculated

The property LazyCalculated = "True" can be set on a unit or attribute to indicate that its data should only be calculated when actually requested. This is particularly useful for source data units containing multiple derived attributes: only the attributes that are eventually consumed are processed, and the underlying source data is only read for the tiles needed by those attributes.

For example, when defining a unit that reads from a gdal.grid source and exposes several derived calculations (resampling, modus, etc.), setting LazyCalculated = "True" ensures that requesting only one of these derivations does not trigger reading or computing the others:

unit<ipoint> ahn
:  StorageName    = "='%AHN_Dir%/AHN2/DTM/' + naam + '.tif'"
,  StorageType    = "gdal.grid"
,  DialogData     = "rdc_meter"
,  LazyCalculated = "True"
{
    attribute<Float32> GridData (rdc_05m);
    attribute<float32> resample (rdc_50m) := mean(GridData, rdc_50m_rel);
    attribute<float32> modus    (rdc_50m) := modus(GridData, rdc_50m_rel);
}

Optimal tiling of raster/grid SourceData

To optimally make use of the GeoDMS internal default tiling data model, it is best to provide SourceData in 256x256 tiled blocks. The GeoTiff format supports tiling. Tiling can be specified in most major GIS packages such as QGIS (which uses GDAL under the hood), and also via common GDAL tools, for instance gdal_translate:

gdal_translate -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 -co COMPRESS=DEFLATE -co PREDICTOR=3 in.tif out.tif

The PREDICTOR option improves compression by storing differences between adjacent values rather than the values themselves:

  • PREDICTOR=1 (default): no predictor
  • PREDICTOR=2: horizontal difference, used for integer data
  • PREDICTOR=3: floating point predictor, specific for float data

For float32 elevation data, DEFLATE with PREDICTOR=3 typically achieves 50 to 60 percent compression, fully lossless.

Alternatively, slightly faster reads with larger files and LZW compression:

gdal_translate -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 -co COMPRESS=LZW in.tif out.tif

Batch retiling a folder with PowerShell

For a whole set of files, the following PowerShell 7 script processes all .tif files in a folder in parallel.

  1. Check your PowerShell version: run pwsh --version in a cmd. If this returns an error, install PowerShell 7 via winget install Microsoft.PowerShell.
  2. Create a file retile.ps1 with the script below, adjusting the input and output paths.
  3. Open PowerShell 7 from the start menu.
  4. Add the GDAL binaries to PATH for the current session, for instance:
    $env:PATH = "C:\Program Files\QGIS 3.40.7\bin;" + $env:PATH
    
    (adjust the path to your QGIS or OSGeo4W installation).
  5. Navigate to the folder containing retile.ps1 (e.g. cd C:\scripts) and run it with .\retile.ps1.
$inputDir = "H:\AHN2\DTM"
$outputDir = "H:\AHN2\DTM_tiled"
New-Item -ItemType Directory -Force -Path $outputDir | Out-Null

Get-ChildItem -Path $inputDir -Filter *.tif | Where-Object {
    -not (Test-Path (Join-Path $outputDir $_.Name))
} | ForEach-Object -Parallel {
    $output = Join-Path $using:outputDir $_.Name
    & gdal_translate -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 -co COMPRESS=DEFLATE -co PREDICTOR=3 -q $_.FullName $output
} -ThrottleLimit 12

The Where-Object filter skips files that already exist in the output folder, so the script can be safely restarted if interrupted.

Tuning ThrottleLimit

Adjust -ThrottleLimit to match your storage:

  • HDD: use 1 (parallel I/O slows HDDs down due to head seeks)
  • SSD: start with 8 to 12, monitor disk usage in Task Manager and increase if disk is below 70 percent utilized

Using suboptimal tiling

Significant performance loss can occur when using GeoDMS default tiling on a dataset whose tiling diverts significantly from the default. For instance, reading a GeoTiff with a striped tiling of 1xN (1 row, N columns), filling each internal 256x256 GeoDMS tile requires reading 256 stripes. This causes the entire dataset to be read (and possibly decompressed) multiple times, especially when neighboring data from multiple source files is combined as in raster_merge.

Recognizing tilesize mismatch

GeoDMS reports a tilesize mismatch in the event log when source data tiling diverts from the GeoDMS internal default. The warning looks like this:

gdal Warning(3): GridStorageManager : Tilesize mismatch between data source 
E:/SourceData/AHN/AHN2/DTM/n69GN2.tif of item /Kaartblad/n69GN2/ahn/GridData: 
10000,1 and GeoDMS tiling: 256,256

The first dimensions (10000,1) describe the source file tiling, the second (256,256) the GeoDMS internal tiling. When the values differ substantially, consider retiling the source data as described above.

Performance example: AHN raster_merge

A concrete measurement from issue #1098, processing AHN elevation tiles (originally stored as 10000x1 stripes):

Step Original tiling (10000x1 stripes) Retiled (256x256)
raster_merge over 14 tiles 4 minutes 48 seconds 4 seconds

This is roughly a 67x speed-up on the merge step, achieved purely by aligning the source storage layout with the GeoDMS internal tiling. For a production run over 1373 AHN tiles, the original raster_merge step took approximately 12 hours; after retiling it is expected to complete in minutes. The eventual cost is a one-time retiling pass and additional disk space for the retiled set (the latter mitigated by DEFLATE compression).

Note that increasing GDAL_CACHEMAX did not measurably help in this scenario: the bottleneck is the access pattern mismatch between strip-organized source files and tile-organized GeoDMS reads, not GDAL cache pressure.

⚠️ **GitHub.com Fallback** ⚠️