Instruction file contents - rosepearson/GeoFabrics Wiki

The processor module classes accept the contents of a JSON instruction file as a dictionary. This page briefly details the expected instruction file structure and supported keywords for each class before providing links to different example instruction files.

Basic structure

The instruction file has the following basic structure, where output, processing, data_paths, apis and general are the only top-level keywords expected by the RawLidarDemGenerator, HydrologicDemGenerator and RoughnessLengthGenerator classes. The RiverBathymetryGenerator and DrainBathymetryGenerator classes each expect one additional top-level keyword: rivers and drains respectively.

{
    "output": {
        "crs": { "horizontal": 2193, "vertical": 7839 },
        "grid_params": { "resolution": 10 }
    },
    "processing": { "chunk_size": 100, "number_of_cores": 1 },
    "data_paths": { "local_cache": "path/to/cache", "subfolder": "run_name", "result_dem": "results_dem.nc", "catchment_boundary": "catchment.geojson" },
    "apis": { "open_topography": { "NZ18_Banks": true },
              "linz": { "land": { "layers": [51153], "type": "GEOMETRY" } }
    },
    "general": { ... }
}
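The processor classes accept the instruction file contents as a dictionary, so loading an instruction file is just a JSON parse. The sketch below uses only the Python standard library. The helper names (load_instructions, unexpected_keywords) and the file name instruction.json are hypothetical. The commented-out processor call is an assumed usage pattern, not a verified signature.

```python
import json
import tempfile
from pathlib import Path

# Top-level keywords used by the DEM and roughness classes (see above).
DEM_KEYWORDS = {"output", "processing", "data_paths", "apis", "general"}


def load_instructions(path):
    """Parse a JSON instruction file into a plain dictionary."""
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


def unexpected_keywords(instructions, allowed=DEM_KEYWORDS):
    """Return top-level keywords outside the allowed set, sorted for stable output."""
    return sorted(set(instructions) - set(allowed))


# Usage: write a small instruction file, then read it back.
example = {
    "output": {"crs": {"horizontal": 2193, "vertical": 7839},
               "grid_params": {"resolution": 10}},
    "data_paths": {"local_cache": "path/to/cache",
                   "catchment_boundary": "catchment.geojson"},
    "general": {},
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "instruction.json"  # hypothetical file name
    path.write_text(json.dumps(example, indent=2), encoding="utf-8")
    instructions = load_instructions(path)

print(unexpected_keywords(instructions))  # prints []

# Assumed usage, not verified here:
# from geofabrics import processor
# processor.RawLidarDemGenerator(instructions).run()
```

The "..." placeholders in the structure above are not valid JSON. Replace them with real sections (or empty objects) before parsing.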

Instruction files and main.py

The main.py entry point script decides which processor classes to run based on the top-level keyword(s) of the instruction file. The keyword options are rivers, drains, dem and roughness. The rivers keyword calls the RiverBathymetryGenerator, drains calls the DrainBathymetryGenerator, dem calls the RawLidarDemGenerator and HydrologicDemGenerator, and roughness calls the RoughnessLengthGenerator. If multiple keywords are included in the instruction file, the related processor classes are called in the order rivers, drains, dem, roughness.

Below is an example of a single instruction file that triggers the RiverBathymetryGenerator, DrainBathymetryGenerator, RawLidarDemGenerator, HydrologicDemGenerator and RoughnessLengthGenerator classes to run in turn. It produces a DEM with ocean, river and drain bathymetry included, followed by a roughness length layer.

{
    "rivers": {
        "output": { ... },
        "data_paths": { ... },
        "apis": { ... },
        "general": { ... },
        "rivers": { ... }
    },
    "drains": {
        "output": { ... },
        "data_paths": { ... },
        "apis": { ... },
        "general": { ... },
        "drains": { ... }
    },
    "dem": {
        "output": { ... },
        "data_paths": { ... },
        "apis": { ... },
        "general": { ... }
    },
    "roughness": {
        "output": { ... },
        "data_paths": { ... },
        "apis": { ... },
        "general": { ... }
    }
}
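The dispatch rule above can be sketched as a fixed ordering over the top-level keywords. This is not the actual main.py code. The names PROCESSING_ORDER, KEYWORD_TO_CLASSES and classes_to_run are hypothetical. The sketch assumes that the dem keyword runs RawLidarDemGenerator and then HydrologicDemGenerator, and that each class receives the nested dictionary under its keyword.

```python
# Order in which main.py handles the top-level keywords (from the text above).
PROCESSING_ORDER = ["rivers", "drains", "dem", "roughness"]

# Keyword -> processor class name(s); "dem" is assumed to run two classes in turn.
KEYWORD_TO_CLASSES = {
    "rivers": ["RiverBathymetryGenerator"],
    "drains": ["DrainBathymetryGenerator"],
    "dem": ["RawLidarDemGenerator", "HydrologicDemGenerator"],
    "roughness": ["RoughnessLengthGenerator"],
}


def classes_to_run(instructions):
    """Return (class name, section dict) pairs in the order they would be run."""
    plan = []
    for keyword in PROCESSING_ORDER:
        if keyword not in instructions:
            continue
        for class_name in KEYWORD_TO_CLASSES[keyword]:
            # Each class gets the nested dictionary under its keyword.
            plan.append((class_name, instructions[keyword]))
    return plan


combined = {"dem": {"output": {}}, "rivers": {"rivers": {}}}
print([name for name, _ in classes_to_run(combined)])
# prints ['RiverBathymetryGenerator', 'RawLidarDemGenerator', 'HydrologicDemGenerator']
```

The key order in the file does not matter: the fixed PROCESSING_ORDER list decides the run order. This is why rivers runs before dem here, even though dem appears first.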

Core Instructions

The following core instructions are generally expected by all processor classes.

Output [Required]

The output section defines the CRS and resolution of the generated DEM. All output keywords are mandatory unless specified otherwise. Accepted keywords are:

  • crs [Optional] - Defaults to horizontal=2193 (NZTM2000, EPSG:2193) and vertical=7839 (NZVD2016, EPSG:7839).
  • grid_params [Required] - Must contain resolution, which defines the DEM grid spacing in metres. Grid cells are square.
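The sketch below shows one way the output keywords might be read. It assumes the crs defaults above are applied per key, and that the grid is sized by rounding the extent up to whole cells. The helpers and the rounding rule are illustrative assumptions, not GeoFabrics' own logic.

```python
import math

# Defaults documented for the optional crs keyword.
DEFAULT_CRS = {"horizontal": 2193, "vertical": 7839}


def output_crs(output):
    """Merge any user-specified crs values over the documented defaults."""
    crs = dict(DEFAULT_CRS)
    crs.update(output.get("crs", {}))
    return crs


def output_resolution(output):
    """grid_params.resolution is required; a KeyError signals it is missing."""
    return output["grid_params"]["resolution"]


def grid_shape(bounds, resolution):
    """Rows and columns of square cells covering (xmin, ymin, xmax, ymax) in metres.

    Rounding the extent up to whole cells is an assumption for illustration.
    """
    xmin, ymin, xmax, ymax = bounds
    n_cols = math.ceil((xmax - xmin) / resolution)
    n_rows = math.ceil((ymax - ymin) / resolution)
    return n_rows, n_cols


output = {"grid_params": {"resolution": 10}}
print(output_crs(output))  # prints {'horizontal': 2193, 'vertical': 7839}
print(grid_shape((0, 0, 1000, 500), output_resolution(output)))  # prints (50, 100)
```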

Processing [Required]

The processing section contains information used by Dask to allocate CPU cores and to chunk up the DEM into separate processing tasks.

  • chunk_size [Default is None] - The number of DEM pixels along each side of a chunk of the DEM that is processed separately. Each chunk is therefore a square with sides of resolution x chunk_size metres. Reduce the chunk size if you see memory errors in the log file. A good initial value is 1 to 1.5 times the width of a single LiDAR tile (e.g. a 1 km x 1 km LiDAR tile at a resolution of 10 m gives a chunk_size of 100). The default of None only works if there is only one LiDAR file being processed.
  • number_of_cores [Default is 1] - The number of separate CPU cores or processes to run at the same time. This should not exceed the number of cores on your device. If running on your own device (i.e. not NeSI), it is good practice to leave 1-2 cores free for other background tasks.
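The chunk_size guidance above is simple arithmetic: chunk_size = tile width / resolution, scaled by 1 to 1.5. The hypothetical helpers below encode that rule of thumb, plus a conservative core count that leaves some cores free. They are a starting point for choosing values, not part of GeoFabrics.

```python
import math
import os


def suggest_chunk_size(tile_width_m, resolution_m, scale=1.0):
    """Pixels per chunk side covering `scale` LiDAR tile widths (1 to 1.5 suggested)."""
    return math.ceil(tile_width_m * scale / resolution_m)


def chunk_side_length(chunk_size, resolution_m):
    """Side length in metres of the square area one chunk covers."""
    return chunk_size * resolution_m


def suggest_number_of_cores(reserve=1):
    """All detected cores minus `reserve` for background tasks, but at least one."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, available - reserve)


processing = {
    "chunk_size": suggest_chunk_size(1000, 10),  # 1 km tile at 10 m -> 100
    "number_of_cores": suggest_number_of_cores(reserve=2),
}
print(processing["chunk_size"], chunk_side_length(processing["chunk_size"], 10))
# prints 100 1000
```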

Data Paths [Required]

The data_paths section specifies the local locations that data is read from or written to. All paths should be forward-slash separated. All data_paths other than local_cache should be either relative to the local_cache or absolute. Accepted keywords are:

  • local_cache [Required str] - The location to download a copy of remotely sourced data to, and to write the generated 'geofabrics.log' file. This is required if the apis keyword is specified. If result_dem or dense_dem_extents are not specified, they will be written here with default names.
  • subfolder [Optional with a default of results] - The folder to store generated results in. All downloaded data is stored in the local_cache folder.
  • catchment_boundary [Required str] - The location of the catchment boundary polygon within which to generate a DEM.
  • result_dem [Optional with default of generated_dem.nc in the local_cache. str] - The location to save the hydrologically conditioned DEM to.
  • result_geofabric [Optional with default of generated_geofabric.nc in the local_cache. str] - The location to save the geofabric with hydrologically conditioned DEM and roughness layers to.
  • raw_dem [Optional with default of dem.nc in the local_cache. str] - The DEM generated from just LiDAR and any specified reference DEM.
  • raw_dem_extents [Optional with default of raw_extents.geojson in the local_cache. str] - The extents of pixels with values in the raw DEM.
  • lidar [Optional list, only used if open_topography is not specified in apis] - The location of a local copy of one or more LiDAR tiles. Preferably defined in the apis section instead.
  • land [Optional only if land is specified in apis, str] - The location of a local copy of a polygon defining the land (e.g. the NZ Coastline). Preferably defined in the apis section instead.
  • bathymetry_contours [Optional list] - The location of any local vector data defining the bathymetry contours (e.g. NZ Depth Contours).
  • river_bathymetry [Optional list, but required if river_polygons is specified] - The location of any local vector data defining the river bathymetry point measurements.
  • river_polygons [Optional list, but required if river_bathymetry is specified] - The location of any local vector data defining the extents of the associated river_bathymetry point measurements.
  • benchmark_dem [Optional str, required for unit tests] - The location of a DEM file to compare your generated DEM against. This is used by the unit tests, and can optionally be used for testing during development in main.py.
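To illustrate the path rules, the sketch below resolves relative data_paths against local_cache and fills in the documented default output names. It assumes defaults are placed directly in local_cache, as the list states. It ignores subfolder, which the real code may also apply. The names resolve_path and resolve_data_paths are hypothetical.

```python
from pathlib import PurePosixPath

# Default output file names documented above, assumed to sit in local_cache.
DEFAULT_OUTPUTS = {
    "result_dem": "generated_dem.nc",
    "result_geofabric": "generated_geofabric.nc",
    "raw_dem": "dem.nc",
    "raw_dem_extents": "raw_extents.geojson",
}


def resolve_path(local_cache, path):
    """Keep absolute paths; treat relative ones as relative to local_cache."""
    path = PurePosixPath(path)
    return path if path.is_absolute() else PurePosixPath(local_cache) / path


def resolve_data_paths(data_paths):
    """Resolve every path entry (str or list) and fill in default output names."""
    local_cache = data_paths["local_cache"]
    resolved = {}
    for key, value in data_paths.items():
        if key in ("local_cache", "subfolder"):
            continue  # subfolder handling is not modelled in this sketch
        if isinstance(value, list):
            resolved[key] = [resolve_path(local_cache, item) for item in value]
        else:
            resolved[key] = resolve_path(local_cache, value)
    for key, name in DEFAULT_OUTPUTS.items():
        resolved.setdefault(key, PurePosixPath(local_cache) / name)
    return resolved


paths = resolve_data_paths({
    "local_cache": "path/to/cache",
    "catchment_boundary": "catchment.geojson",
    "lidar": ["tile_1.laz", "/data/tile_2.laz"],  # hypothetical tile names
})
print(paths["result_dem"])  # prints path/to/cache/generated_dem.nc
```

PurePosixPath is used so that the forward-slash convention holds regardless of the operating system running the sketch.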

APIs [Optional]

The apis keyword is optional and only needs to be specified if data is pulled from remote locations. Its contents specify the data services that data is read from. Accepted apis keywords are:

  1. open_topography - Use this keyword if LiDAR data is to be pulled from OpenTopography. It must be followed by a dictionary containing a single LiDAR dataset to download as a key, e.g. "open_topography": { "NZ18_Banks": true } (support for multiple datasets will be added in time; please create an issue if you need this now). If the .LAZ files for a dataset do not contain full datum information (e.g. Wellington_2013), you can specify the .LAZ CRS information using "open_topography": { "Wellington_2013": { "crs": { "horizontal": 2193, "vertical": 7839 } } }.
  2. linz or lris - Specify these if you want vector data to be downloaded from the LINZ or LRIS Data Service. Both require the same keywords:
       i. key - Mandatory for both. Should contain YOUR_API_KEY for that data service as a string.
       ii. land or bathymetry_contours [Optional] - The accepted vector layers. Both are optional, although there is no point specifying your API key unless you also specify at least one of these layers to download. For each, specify a layer and optionally the geometry_name of that layer (see geoapis: Basic Usage for more details on the geometry_name). The layer is the unique identifier in the URL when you view the dataset on the data service (e.g. 51153 for a land vector on the LINZ LDS, or 50448 for a bathymetry_contours vector).
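The apis section can be assembled programmatically, which keeps the API key out of the instruction file. This sketch follows the keyword descriptions above, with two assumptions. First, the exact key used for a layer ID ("layer" here) is assumed, since the basic structure example earlier uses a different form. Second, the environment variable name LINZ_API_KEY is hypothetical.

```python
import json
import os

VECTOR_LAYERS = ("land", "bathymetry_contours")


def build_apis(dataset, lidar_crs=None, linz_layers=None, key_env="LINZ_API_KEY"):
    """Build an apis section for one OpenTopography dataset and optional LINZ layers."""
    # A single dataset key; a crs dict covers incomplete .LAZ datum information.
    apis = {"open_topography": {dataset: {"crs": lidar_crs} if lidar_crs else True}}
    if linz_layers:
        # Read the key from the environment (hypothetical variable name).
        linz = {"key": os.environ.get(key_env, "YOUR_API_KEY")}
        for name, layer_id in linz_layers.items():
            if name not in VECTOR_LAYERS:
                raise ValueError(f"unsupported vector layer: {name}")
            linz[name] = {"layer": layer_id}  # "layer" key name is an assumption
        apis["linz"] = linz
    return apis


apis = build_apis(
    "Wellington_2013",
    lidar_crs={"horizontal": 2193, "vertical": 7839},
    linz_layers={"land": 51153, "bathymetry_contours": 50448},
)
print(json.dumps(apis["open_topography"]))
# prints {"Wellington_2013": {"crs": {"horizontal": 2193, "vertical": 7839}}}
```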

General

The general section controls the general code flow of the RawLidarDemGenerator, HydrologicDemGenerator and RoughnessLengthGenerator classes. Defaults exist for all values. The accepted keywords are:

  • set_dem_shoreline [Optional - Default is True, bool] - Whether the background DEM (if one is added) is set to zero in the foreshore.
  • bathymetry_contours_z_label [Optional - Default is None (null in JSON, str)] - String defining the column label of vector data containing the contour depth information. If not specified it is assumed that the depth information is included in the polyline geometry.
  • bathymetry_points_type [Optional - Default is None (null in JSON, list)] - A list of strings defining the type of water body represented by each of the corresponding estimated bathymetry points/polygon files. Options include rivers and drains.
  • bathymetry_points_z_labels [Optional - Default is None (null in JSON, list)] - A list of strings defining the z labels used in the corresponding estimated bathymetry points/polygon files. If None is specified the z labels are all assumed to be depths.
  • drop_offshore_lidar [Optional - Default is True, bool] - Whether offshore LiDAR is discarded or kept. Set to True if there is offshore LiDAR that reflects the ocean surface.
  • lidar_classifications_to_keep [Optional - Default is [2], list] - A list of LAS classification values. Only LiDAR points with these classifications are retained. The standard LAS/LAZ classification values can be found in the LAS 1.4 specification. The subset of classifications used by a dataset can also be found in its OpenTopography survey summary (e.g. Wellington_2013 or NZ20_Westport).
  • interpolation_method [Optional - Default is None (null in JSON), str] - The interpolation method, if any, to apply to missing values in the raster (where there is no LiDAR on land, or offshore if no bathymetry has been supplied). The options are None for no interpolation, or linear, nearest or cubic.
  • lidar_interpolation_method [Optional - Default is idw, str] - The method used to calculate elevations where there is LiDAR or reference DEM data. The options are: idw (inverse distance weighted), mean (arithmetic mean), median (arithmetic median), linear (linear interpolation as calculated by scipy.interpolate.griddata), min (minimum value), max (maximum value), std (standard deviation of the elevations) and count (number of points in the grid cell).
  • elevation_range [Optional - Default is None, list] - A list of the form [minimum_elevation, maximum_elevation], where the minimum_elevation and maximum_elevation values define the range of allowable elevations. If this is not defined then all elevations are kept.
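Several general keywords describe per-point filtering and per-cell aggregation. The sketch below does three things:

  • merges the documented defaults with user values;
  • filters (x, y, z, classification) points by lidar_classifications_to_keep and elevation_range;
  • computes the simpler per-cell lidar_interpolation_method options with Python's statistics module.

It is an illustration only. The idw and linear options need neighbouring points and are omitted, and std is assumed to be the population standard deviation.

```python
import statistics

# Documented defaults for the general keywords (None is null in JSON).
GENERAL_DEFAULTS = {
    "set_dem_shoreline": True,
    "bathymetry_contours_z_label": None,
    "bathymetry_points_type": None,
    "bathymetry_points_z_labels": None,
    "drop_offshore_lidar": True,
    "lidar_classifications_to_keep": [2],
    "interpolation_method": None,
    "lidar_interpolation_method": "idw",
    "elevation_range": None,
}


def with_defaults(general):
    """Overlay user-specified general values on the documented defaults."""
    merged = dict(GENERAL_DEFAULTS)
    merged.update(general)
    return merged


def filter_points(points, general):
    """Keep (x, y, z, classification) points with an allowed class and in-range z."""
    keep = set(general["lidar_classifications_to_keep"])
    z_range = general["elevation_range"]
    kept = []
    for x, y, z, classification in points:
        if classification not in keep:
            continue
        if z_range is not None and not z_range[0] <= z <= z_range[1]:
            continue
        kept.append((x, y, z, classification))
    return kept


# Per-cell statistics for the simpler methods; idw and linear are omitted.
CELL_STATISTICS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "std": statistics.pstdev,  # population std is an assumption
    "count": len,
}


def cell_value(elevations, method):
    """Aggregate the elevations falling in one grid cell."""
    if method not in CELL_STATISTICS:
        raise ValueError(f"{method!r} is not covered by this sketch")
    return CELL_STATISTICS[method](elevations)


general = with_defaults({"elevation_range": [-10, 100]})
points = [(0, 0, 5.0, 2), (1, 1, 3.0, 2), (2, 2, 200.0, 2), (3, 3, 4.0, 5)]
elevations = [z for _, _, z, _ in filter_points(points, general)]
print(elevations, cell_value(elevations, "mean"))  # prints [5.0, 3.0] 4.0
```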

Specialist instructions

The following specialist instructions are only used by some of the processor classes.

rivers [required by RiverBathymetryGenerator]

The rivers section is required if generating river bathymetry estimates. Its presence will trigger main.py to create and run an instance of the RiverBathymetryGenerator class. These values define the river channel of interest and some of its geometric properties.

  • channel_rec_id [Required int] - The REC nzsegment reach ID of the river channel's most downstream reach segment.

  • osm_id [Optional str] - If specified this is used to auto-align the coarse REC channel to the river.

  • channel_area_threshold [Required float] - Used to determine how far upstream to trace the channel. Defines the minimum reach CUM_AREA to include.

  • rec_file [Required str] - The path to the REC file defining river reaches to use.

  • flow_file [Required str] - The path to the flow file defining flow and friction along the REC defined reaches.

  • min_bank_height [Required float] - The minimum bank height to try to detect. This should be larger than the vertical precision of the LiDAR survey.

  • max_bank_height [Required float] - The maximum height to detect banks up to before considering them cliffs.

  • veg_lidar_classifications_to_keep [Required list] - A list of integers defining the ground and vegetation classifications, e.g. [2, 3, 4, 5, 9].

  • max_channel_width [Required float] - The maximum expected width of the channel in metres.

  • min_channel_width [Required float] - The minimum expected width of the channel in metres.

  • rec_alignment_tolerance [Required float] - The maximum misalignment between the REC-defined channel and the actual channel.

  • cross_section_spacing [Required float] - The along-channel spacing at which river bathymetry is estimated.

  • width_centre_smoothing [Required float] - The amount to smooth the estimated aligned channel by.
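Before a long river run, a quick consistency check on the rivers section can catch swapped or missing values. The required key list below is taken from the bullets above. The specific checks (min below max, positive spacing) are hypothetical, and all example values are placeholders, not values from a real catchment.

```python
# Required rivers keywords listed above (osm_id is optional).
REQUIRED_RIVER_KEYS = [
    "channel_rec_id", "channel_area_threshold", "rec_file", "flow_file",
    "min_bank_height", "max_bank_height", "veg_lidar_classifications_to_keep",
    "max_channel_width", "min_channel_width", "rec_alignment_tolerance",
    "cross_section_spacing", "width_centre_smoothing",
]


def check_rivers(rivers):
    """Return a list of human-readable problems; an empty list means none found."""
    missing = [key for key in REQUIRED_RIVER_KEYS if key not in rivers]
    if missing:
        return ["missing: " + ", ".join(missing)]
    problems = []
    if rivers["min_bank_height"] >= rivers["max_bank_height"]:
        problems.append("min_bank_height should be below max_bank_height")
    if rivers["min_channel_width"] >= rivers["max_channel_width"]:
        problems.append("min_channel_width should be below max_channel_width")
    if rivers["cross_section_spacing"] <= 0:
        problems.append("cross_section_spacing should be positive")
    return problems


# Placeholder values only, not taken from a real catchment.
rivers = {
    "channel_rec_id": 123456, "channel_area_threshold": 1e8,
    "rec_file": "path/to/rec_file", "flow_file": "path/to/flow_file",
    "min_bank_height": 0.75, "max_bank_height": 2.0,
    "veg_lidar_classifications_to_keep": [2, 3, 4, 5, 9],
    "max_channel_width": 120.0, "min_channel_width": 10.0,
    "rec_alignment_tolerance": 65.0, "cross_section_spacing": 10.0,
    "width_centre_smoothing": 10.0,
}
print(check_rivers(rivers))  # prints []
```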

waterways [required by WaterwayBedElevationEstimator]

The drains section is required if incorporating drains from OpenStreetMap (OSM). Its presence will trigger main.py to create and run an instance of the WaterwayBedElevationEstimator class.

  • widths [Required dict] - The widths assumed for OSM drains, streams and rivers respectively, e.g. {"drain": 5, "stream": 7.5, "river": 10}. The river entry is optional; if it is omitted, all rivers are ignored.
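The widths rule, including the optional river entry, can be sketched as a lookup over OSM features tagged with their waterway type. The feature dictionaries and the assign_widths helper are hypothetical. They only show how omitting "river" from widths causes rivers to be ignored.

```python
def assign_widths(features, widths):
    """Attach an assumed width (m) to each OSM waterway; drop types without a width."""
    kept = []
    for feature in features:
        width = widths.get(feature.get("waterway"))
        if width is None:
            continue  # e.g. rivers, when "river" is omitted from widths
        kept.append({**feature, "width": width})
    return kept


# Hypothetical OSM features tagged with their waterway type.
features = [
    {"id": 1, "waterway": "drain"},
    {"id": 2, "waterway": "river"},
    {"id": 3, "waterway": "stream"},
]
widths = {"drain": 5, "stream": 7.5}  # no "river" entry, so rivers are ignored
print([(f["id"], f["width"]) for f in assign_widths(features, widths)])
# prints [(1, 5), (3, 7.5)]
```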

Benchmarking [only used by benchmarking.py]

The benchmarking section is required if running benchmarking.py. It defines the numbers_of_cores and chunk_sizes to iterate over when producing a plot of execution times for various processing settings.

  • numbers_of_cores [Required list] - A list of integers that are cycled through, each defining the number_of_cores applied. These values should not exceed the number of cores on the machine, or the number allocated in a distributed system.
  • chunk_sizes [Required list] - A list of integers that are cycled through for each number_of_cores, each defining the chunk_size applied.
  • title [Required str] - Defines the title of the plot of execution time against processing settings.
  • delete_dems [Required bool] - If True the DEMs generated in each run are deleted.
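benchmarking.py iterates over every combination of numbers_of_cores and chunk_sizes. Below is a minimal sketch of that run grid. It assumes combinations exceeding the available cores are skipped, although the real script may raise an error instead. The benchmark_runs helper is hypothetical.

```python
import itertools
import os


def benchmark_runs(benchmarking, max_cores=None):
    """List (number_of_cores, chunk_size) settings, skipping core counts above max_cores."""
    max_cores = max_cores or os.cpu_count() or 1
    runs = []
    for cores, chunk_size in itertools.product(
        benchmarking["numbers_of_cores"], benchmarking["chunk_sizes"]
    ):
        if cores > max_cores:
            continue  # more cores than available; the real script may error instead
        runs.append({"number_of_cores": cores, "chunk_size": chunk_size})
    return runs


benchmarking = {
    "numbers_of_cores": [1, 2, 16],
    "chunk_sizes": [100, 200],
    "title": "Execution time by processing settings",  # placeholder title
    "delete_dems": True,
}
for run in benchmark_runs(benchmarking, max_cores=4):
    print(run)
```

Each combination implies a separate DEM run, so the grid size grows as len(numbers_of_cores) x len(chunk_sizes). Setting delete_dems to True keeps disk use bounded.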

Examples

See the instruction files used by each unit test for examples of the instruction keywords.

Example - Create a DEM from remote LiDAR and vector data

Example - Create a DEM from local LiDAR and vector data

Example - Estimate river bathymetry values

Example - Estimate drain bed elevations

Example - Create a DEM from remote LiDAR and vector data and estimated river and drain bed elevations