GeoDMS Default Tiling - ObjectVision/GeoDMS GitHub Wiki

Default tiling

By default, GeoDMS tiles data into segments of 256x256 blocks (for 2D raster/grid data) or 65536 rows (for 1D data).

Tile pipelining

Most GeoDMS operators, even when reading data using for instance gdal.grid, can make use of tile pipelines, meaning data can be processed on a per-tile basis by multiple threads. This can in theory reduce memory consumption significantly, as there is no need to keep the full domain or compacted domain in memory.

LazyCalculated

The property LazyCalculated = "True" can be set on a unit or attribute to indicate that its data should only be calculated when actually requested. This is particularly useful for source data units containing multiple derived attributes: only the attributes that are eventually consumed are processed, and the underlying source data is only read for the tiles needed by those attributes.

For example, when defining a unit that reads from a gdal.grid source and exposes several derived calculations (resampling, modus, etc.), setting LazyCalculated = "True" ensures that requesting only one of these derivations does not trigger reading or computing the others:

unit<ipoint> ahn
:  StorageName    = "='%AHN_Dir%/AHN2/DTM/' + naam + '.tif'"
,  StorageType    = "gdal.grid"
,  DialogData     = "rdc_meter"
,  LazyCalculated = "True"
{
    attribute<Float32> GridData (rdc_05m);
    attribute<float32> resample (rdc_50m) := mean(GridData, rdc_50m_rel);
    attribute<float32> modus    (rdc_50m) := modus(GridData, rdc_50m_rel);
}

Automatic alignment to the source file's native tiling (GeoDMS 20.1.0)

Since GeoDMS 20.1.0 (issue #1098) you can read and aggregate large GeoTiff files fast, regardless of the file's block size, without retiling the source first.

The mechanism is: when a grid domain unit is left unbound (no := expression) and is read from a gdal.grid or tif storage, GeoDMS no longer forces its own 256x256 default tiling onto that domain. Instead it adopts the native block/strip layout of the file (subdividing only where a file block exceeds the internal tile-size limits). As a result GeoDMS reads each on-disk block or strip exactly once:

  • a 256x256 tiled GeoTiff is read as 256x256 internal tiles;
  • a striped GeoTiff (e.g. AHN's 10000x1 strips, or world data with 42000 or 65000 columns per strip) is read strip-by-strip.

Either way the I/O amplification that occurs when 256x256 reads have to be assembled from many full strips disappears. The tilesize mismatch warning is not raised for such storage-determined domains.

StorageTileSizeX / StorageTileSizeY

Two read-only properties expose the native block size that GeoDMS detected for a storage holder, so you can verify the layout of a file from within the GeoDMS:

  • StorageTileSizeX — native block width (columns), e.g. 10000 for an AHN strip, 256 for a tiled GeoTiff;
  • StorageTileSizeY — native block height (rows), e.g. 1 for an AHN strip, 256 for a tiled GeoTiff.

They work for StorageType = "gdal.grid" and StorageType = "tif".

Example: read + resample an AHN tile, no retiling

This reads a 0.5 m AHN elevation tile and aggregates it to a 50 m grid by averaging (100x100 cells per output cell). The fine grid-domain blad_05m is determined by the TIFF (it inherits range, projection and the file's strip tiling); the coarse grid-domain blad_50m uses the default tiling.

container ahn_data
:  StorageName    = "='%AHN_Dir%/'+ AHN_versie +'/DTM/' + naam + '.tif'"
,  StorageType    = "gdal.grid"
,  DialogData     = "rdc_meter"
,  LazyCalculated = "True"
{
    // Fine 0.5 m grid-domain, DETERMINED BY THE TIFF:
    // range, projection and native (strip/block) tiling are all read from the file.
    unit<ipoint> blad_05m;

    // World origin (from the projection) and pixel dimensions of the tile:
    parameter<int32> x     := int32(Get_X(lowerbound(blad_05m))); // origin x in meters
    parameter<int32> y     := int32(Get_Y(lowerbound(blad_05m))); // origin y in meters
    parameter<int32> nrrow := pointrow(upperbound(blad_05m));     // number of pixel rows
    parameter<int32> nrcol := pointcol(upperbound(blad_05m));     // number of pixel columns

    // Coarse 50 m grid-domain, DEFAULT (256x256) tiling, aligned to blad_05m:
    unit<ipoint> blad_50m :=
        range(
             gridset(
                 rdc_meter
                ,point_yx(-50.0[Meter], 50.0[Meter], rdc_meter)
                ,point_yx(y[Meter], x[Meter], rdc_meter)
                ,ipoint
            )
            ,point_yx(0i, 0i)
            ,point_yx(nrrow / 100i, nrcol / 100i)
        ),  DialogData = "rdc_meter";

    // Relation fine -> coarse: 100x100 fine cells per coarse cell.
    attribute<blad_50m> blad_50m_rel (blad_05m) := mapping(blad_05m, blad_50m);

    attribute<meter> GridData       (blad_05m); // read strip-by-strip, each strip once
    attribute<meter> GridData_0_max (blad_05m) := GridData < -3.4028235e+37[meter] || GridData > 10000[meter] ? null_f : GridData; // strip nodata / outliers
    attribute<meter> Calc_resample_m       (blad_50m) := mean(GridData_0_max, blad_50m_rel);
    attribute<int16> Calc_resample_int16cm (blad_50m) := value(Calc_resample_m[cm], int16);
}

In a measurement on tile n69CN2 (10000 x 12500 pixels, stored as Block=10000x1 strips, ~500 MB), computing Calc_resample_int16cm reads the strip file once and produces the aligned 100 x 125 cell 50 m result in about half a second, with no retiling step and no tilesize-mismatch warning.

Key points that make this work:

  • blad_05m is declared but not assigned (unit<ipoint> blad_05m;); the storage fills in its range, projection and tiling.
  • lowerbound(blad_05m) returns the world origin (from the projection); upperbound(blad_05m) returns the pixel count. Together they parameterise the aligned coarse gridset.
  • mapping relates each fine cell to its coarse cell; mean does the averaging.

When you only need to read and aggregate such files, this is the simplest and most efficient route. Retiling the source (next section) remains useful when you also want optimally tiled storage for repeated use, smaller files, or fast neighbour access across many files as in raster_merge.

Optimal tiling of raster/grid SourceData

To optimally make use of the GeoDMS internal default tiling data model, it is best to provide SourceData in 256x256 tiled blocks. The GeoTiff format supports tiling. Tiling can be specified in most major GIS packages such as QGIS (which uses GDAL under the hood), and also via common GDAL tools, for instance gdal_translate:

gdal_translate -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 -co COMPRESS=DEFLATE -co PREDICTOR=3 in.tif out.tif

The PREDICTOR option improves compression by storing differences between adjacent values rather than the values themselves:

  • PREDICTOR=1 (default): no predictor
  • PREDICTOR=2: horizontal difference, used for integer data
  • PREDICTOR=3: floating point predictor, specific for float data

For float32 elevation data, DEFLATE with PREDICTOR=3 typically achieves 50 to 60 percent compression, fully lossless.

Alternatively, slightly faster reads with larger files and LZW compression:

gdal_translate -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 -co COMPRESS=LZW in.tif out.tif

Batch retiling a folder with PowerShell

For a whole set of files, the following PowerShell 7 script processes all .tif files in a folder in parallel.

  1. Check your PowerShell version: run pwsh --version in a cmd. If this returns an error, install PowerShell 7 via winget install Microsoft.PowerShell.
  2. Create a file retile.ps1 with the script below, adjusting the input and output paths.
  3. Open PowerShell 7 from the start menu.
  4. Add the GDAL binaries to PATH for the current session, for instance:
    $env:PATH = "C:\Program Files\QGIS 3.40.7\bin;" + $env:PATH
    
    (adjust the path to your QGIS or OSGeo4W installation).
  5. Navigate to the folder containing retile.ps1 (e.g. cd C:\scripts) and run it with .\retile.ps1.
$inputDir = "H:\AHN2\DTM"
$outputDir = "H:\AHN2\DTM_tiled"
New-Item -ItemType Directory -Force -Path $outputDir | Out-Null

Get-ChildItem -Path $inputDir -Filter *.tif | Where-Object {
    -not (Test-Path (Join-Path $outputDir $_.Name))
} | ForEach-Object -Parallel {
    $output = Join-Path $using:outputDir $_.Name
    & gdal_translate -co TILED=YES -co BLOCKXSIZE=256 -co BLOCKYSIZE=256 -co COMPRESS=DEFLATE -co PREDICTOR=3 -q $_.FullName $output
} -ThrottleLimit 12

The Where-Object filter skips files that already exist in the output folder, so the script can be safely restarted if interrupted.

Tuning ThrottleLimit

Adjust -ThrottleLimit to match your storage:

  • HDD: use 1 (parallel I/O slows HDDs down due to head seeks)
  • SSD: start with 8 to 12, monitor disk usage in Task Manager and increase if disk is below 70 percent utilized

Using suboptimal tiling

Significant performance loss can occur when using GeoDMS default tiling on a dataset whose tiling diverts significantly from the default. For instance, reading a GeoTiff with a striped tiling of 1xN (1 row, N columns), filling each internal 256x256 GeoDMS tile requires reading 256 stripes. This causes the entire dataset to be read (and possibly decompressed) multiple times, especially when neighboring data from multiple source files is combined as in raster_merge.

Recognizing tilesize mismatch

GeoDMS reports a tilesize mismatch in the event log when source data tiling diverts from the GeoDMS internal default. The warning looks like this:

gdal Warning(3): GridStorageManager : Tilesize mismatch between data source 
E:/SourceData/AHN/AHN2/DTM/n69GN2.tif of item /Kaartblad/n69GN2/ahn/GridData: 
10000,1 and GeoDMS tiling: 256,256

The first dimensions (10000,1) describe the source file tiling, the second (256,256) the GeoDMS internal tiling. When the values differ substantially, either let the grid-domain be determined by the file (GeoDMS 20.1+, no retiling), or retile the source data as described above.

Performance example: AHN raster_merge

A concrete measurement from issue #1098, processing AHN elevation tiles (originally stored as 10000x1 stripes):

Step Original tiling (10000x1 stripes) Retiled (256x256)
raster_merge over 14 tiles 4 minutes 48 seconds 4 seconds

This is roughly a 67x speed-up on the merge step, achieved purely by aligning the source storage layout with the GeoDMS internal tiling. For a production run over 1373 AHN tiles, the original raster_merge step took approximately 12 hours; after retiling it is expected to complete in minutes. The eventual cost is a one-time retiling pass and additional disk space for the retiled set (the latter mitigated by DEFLATE compression).

Note that increasing GDAL_CACHEMAX did not measurably help in this scenario: the bottleneck is the access pattern mismatch between strip-organized source files and tile-organized GeoDMS reads, not GDAL cache pressure.

⚠️ **GitHub.com Fallback** ⚠️