General, Output and Processing Instructions - rosepearson/GeoFabrics GitHub Wiki

These options control behaviour associated with the DEM generation in each stage and are all grouped under the general, output and processing key-values.

General

The general section controls the general code flow associated with each stage in the framework for behaviour related to generating and manipulating the DEM. Defaults exist for all values. The accepted options are listed below.

z_labels [Optional dict - Default is {"waterways": "elevations", "rivers": "bed_elevation_Rupp_and_Smart", "ocean": None} (None is null in JSON)] - A dict specifying the column name that holds z values not included within the geometry, for each category of bathymetry/elevation information that can be included for hydrological conditioning. If None is specified for one of the categories (ocean, rivers and waterways), it is assumed that the depth information is included in the polyline geometry. If using the GeoFabric estimated river data, valid values include bed_elevation_Neal_et_al and bed_elevation_Rupp_and_Smart; if using the GeoFabric measured river data, the river z_label will be z.
drop_offshore_lidar [Optional - Default is True, bool or dict] - Bool defining whether offshore LiDAR is discarded or kept. Set to True if there is offshore LiDAR that reflects the ocean surface. The same bool also specifies whether the background DEM (if one is added) is set to zero in the foreshore. If there are multiple LiDAR datasets, this can be defined as a dict of bools, with each dataset identified by its name, e.g. "drop_offshore_lidar": {"dataset_1": true, "dataset_2": false}
lidar_classifications_to_keep [Optional - Default is [2], list] - A list of classification values. LiDAR points are filtered to retain only those points with the specified classification values. The standard LAS/LAZ classification values can be found in the LAS 1.4 specifications. The subset of classifications used can also be found in the survey summary for OpenTopography datasets (e.g. Wellington_2013 or NZ20_Westport).
interpolation [Optional dict - Default is "interpolation": {"rivers": "rbf", "waterways": "cubic", "ocean": "linear", "lidar": "idw", "no_data": None} (None is null in JSON)] - Defines which interpolation method to use for each data source category. The no_data option is applied at the end to any values still missing in the final raster. The no_data options are: None, linear, nearest and cubic. The rivers, waterways and ocean options are: rbf, cubic and linear. The lidar options are: idw, mean, median, linear, min, max, std and count. Here idw is inverse distance weighting, mean is the arithmetic mean, median is the median, linear is linear interpolation as calculated by scipy.interpolate.griddata, min and max are the minimum and maximum values, std is the standard deviation of the elevations, and count is the number of points in each grid cell.
elevation_range [Optional - Default is None, list] - A list of the form [minimum_elevation, maximum_elevation], where the minimum_elevation and maximum_elevation values define the range of allowable elevations. If this is not defined then all elevations are kept.
download_limit_gbytes [Optional - default 100, float] - The maximum amount of LiDAR data to download locally. If the specified region covers more LiDAR than this limit, the process will not run.
lidar_buffer [Optional - default 0, float] - The number of cells around LiDAR data over which to interpolate to any added coarse DEM values. A default of 0 means coarse DEM values will be added directly next to LiDAR values.
filter_waterways_by_osm_ids [Optional - default [], list] - A list of any OSM IDs to filter or effectively ignore when incorporating the OSM waterways features.
ignore_clipping [Optional - default False, bool] - If True the LiDAR DEM is not clipped in the Raw LIDAR generation stage. This will cause changes if drop_offshore_lidar is also set.
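
As a rough illustration of how these keys fit together, the sketch below builds a general section as a Python dict and serialises it to JSON. Most values are copied from the defaults listed above. The dataset names and the elevation range are hypothetical, and the sketch assumes the section sits under a "general" key; where that key belongs within a complete instruction file is not shown here.

```python
import json

# Values mirror the documented defaults unless a comment says otherwise.
general = {
    "z_labels": {
        "waterways": "elevations",
        "rivers": "bed_elevation_Rupp_and_Smart",
        "ocean": None,  # Python None is written as null in JSON
    },
    # Hypothetical dataset names, using the per-dataset dict form.
    "drop_offshore_lidar": {"dataset_1": True, "dataset_2": False},
    "lidar_classifications_to_keep": [2],
    "interpolation": {
        "rivers": "rbf",
        "waterways": "cubic",
        "ocean": "linear",
        "lidar": "idw",
        "no_data": None,
    },
    # Hypothetical range in the form [minimum_elevation, maximum_elevation].
    "elevation_range": [-10, 4000],
    "download_limit_gbytes": 100,
    "lidar_buffer": 0,
    "filter_waterways_by_osm_ids": [],
    "ignore_clipping": False,
}

# Serialise so the None values and Python bools appear in their JSON form.
general_json = json.dumps({"general": general}, indent=2)
print(general_json)
```

Because defaults exist for every option, a real instruction file would normally only need to list the keys that differ from those defaults.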

Output [Required]

The output section contains information about the resolution and CRS of the DEM generated by the GeoFabricsGenerator class. All output keywords are mandatory unless specified otherwise. Accepted keywords are:

crs [Optional] - The CRS is optional with default values of horizontal=2193 (NZTM2000 - EPSG:2193) and vertical=7839 (NZVD2016 - EPSG:7839)
grid_params - The resolution is not optional and must be specified. This defines the DEM grid geometry in metres, where the grid is square.
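
A minimal sketch of an output section follows, written as a Python dict and dumped to JSON. The nested key names "horizontal", "vertical" and "resolution" are assumptions inferred from the wording above rather than a confirmed schema, and the 10 m resolution is a hypothetical value.

```python
import json

output = {
    # Optional: these are the documented default EPSG codes.
    "crs": {"horizontal": 2193, "vertical": 7839},
    # Required: the side length of each square grid cell in metres
    # (10 is a hypothetical value).
    "grid_params": {"resolution": 10},
}

output_json = json.dumps({"output": output}, indent=2)
print(output_json)
```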

Processing [Required]

The processing section contains information used by Dask to allocate CPU cores and to chunk up the DEM into separate processing tasks.

chunk_size [Default is None] - This is the number of DEM pixels to have along each side of a chunk of the DEM that is processed separately. This will equate to a square area with sides of resolution x chunk_size. Reduce the chunk size if you are getting memory errors in the log file. A good initial value is 1 to 1.5x a single LiDAR tile (e.g. a 1 km x 1 km LiDAR tile with a resolution of 10 m will equate to a chunk_size of 100). The default is None, which will only work if there is only one LiDAR file being processed.
number_of_cores [Default is 1] - The number of separate CPU cores or processes to run at the same time. This should not exceed the number of cores on your device. If running on your own device (i.e. not NeSI), it can be good to leave 1-2 cores unused by geofabrics for other background tasks.
memory_limit [Default is 10GiB] - The maximum memory to be used by a single Dask task.
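
The chunk_size rule of thumb can be turned into a small calculation. The sketch below is illustrative only: the helper suggest_chunk_size and its parameters are hypothetical names and are not part of GeoFabrics. It divides the LiDAR tile side length (optionally scaled by 1 to 1.5) by the DEM resolution, and picks a core count that leaves up to two cores free, in line with the guidance for number_of_cores.

```python
import json
import math
import os


def suggest_chunk_size(tile_side_m, resolution_m, tile_multiple=1.0):
    """Return a starting chunk_size in pixels for a square LiDAR tile.

    Hypothetical helper: tile_multiple scales the tile side (1 to 1.5 is the
    suggested range) before dividing by the DEM resolution.
    """
    if tile_side_m <= 0 or resolution_m <= 0 or tile_multiple <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(tile_side_m * tile_multiple / resolution_m)


# A 1 km x 1 km tile at 10 m resolution gives a chunk_size of 100.
chunk_size = suggest_chunk_size(tile_side_m=1000, resolution_m=10)

# Leave up to two cores free for background tasks, but always use at least one.
number_of_cores = max(1, (os.cpu_count() or 1) - 2)

processing = {"chunk_size": chunk_size, "number_of_cores": number_of_cores}
print(json.dumps({"processing": processing}, indent=2))
```

If memory errors still appear in the log file, lowering tile_multiple or passing a smaller tile side is one way to step the chunk size down.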