ML GeoDatasets - jejjohnson/ml4eo GitHub Wiki

Concepts

  • NonGeoDatasets
  • GeoDatasets
  • Transforms
  • DataLoaders
  • DataModules

Datasets

  • Numpy Arrays
  • GeoTiffs

GeoDatasets

These are geoscience-oriented datasets. In many cases, this means datasets that keep track of both the data and the metadata: alongside the values, they carry important metadata about the coordinates and geospatial positioning (e.g. latitude/longitude grids or a coordinate reference system). Libraries that work with such datasets include xarray, rasterio and geopandas.
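To make "values plus metadata" concrete, here is a minimal xarray sketch. The field, its coordinates and the `units` attribute are made up for illustration. Because the coordinates travel with the values, you can select by location instead of by array index.

```python
import numpy as np
import xarray as xr

# A toy geospatial field (values are illustrative): the array plus the
# coordinates and attributes that describe where and what it is.
temperature = xr.DataArray(
    np.array([[279.0, 280.5, 282.0], [280.0, 281.5, 283.0]], dtype=np.float32),
    dims=("lat", "lon"),
    coords={"lat": [44.0, 45.0], "lon": [5.0, 6.0, 7.0]},
    attrs={"units": "K", "long_name": "air temperature"},
    name="temperature",
)

# Select by geographic position; "nearest" snaps to the closest grid point.
point = temperature.sel(lat=44.2, lon=6.1, method="nearest")
print(float(point), float(point["lat"]), float(point["lon"]))
```

The same lookup on a bare NumPy array would need you to track which row and column correspond to which latitude and longitude yourself.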

GeoTiff

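# the .rio accessor is provided by rioxarray, so it must be imported
import rioxarray

# save dataset (the variable needs x/y spatial dims and, ideally, a CRS)
xds["name_of_variable"].rio.to_raster("path/to/file/file.tif")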

NetCDF

# save dataset
xds["variable"].to_netcdf("path/to/file/file.nc")
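NetCDF keeps coordinates and attributes in the same file as the values. Below is a minimal round-trip sketch. It assumes xarray plus a netCDF backend (netCDF4 or scipy) is installed. The array, its coordinates and the `units` attribute are made up for illustration.

```python
import os
import tempfile

import numpy as np
import xarray as xr

# Illustrative field with coordinates and an attribute.
da = xr.DataArray(
    np.random.default_rng(0).random((2, 3)).astype(np.float32),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [0.0, 5.0, 10.0]},
    name="variable",
    attrs={"units": "K"},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.nc")
    da.to_netcdf(path)
    # Load into memory and close the file before the directory is removed.
    with xr.open_dataarray(path) as src:
        reloaded = src.load()

print(reloaded.name, reloaded.attrs, reloaded["lat"].values)
```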

Non-GeoDatasets

These are the data types we typically use in standard ML scenarios, e.g. images for discrete data and NumPy arrays for continuous data. These datasets hold the values but do not keep the metadata (coordinates, projection, attributes) within the same object.
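Once the values leave an xarray object they are just an array, so any coordinates have to be carried separately. Below is a minimal NumPy-only sketch. Bundling the coordinates into one `.npz` archive is one possible workaround, assumed here for illustration. It is not a convention of this wiki, and the arrays are made up.

```python
import io

import numpy as np

# Hypothetical gridded field and the coordinates that locate it.
values = np.arange(6, dtype=np.float32).reshape(2, 3)
lat = np.array([45.0, 44.0])
lon = np.array([5.0, 6.0, 7.0])

# A bare array knows only its shape and dtype, so store lat/lon next to it.
buffer = io.BytesIO()
np.savez(buffer, values=values, lat=lat, lon=lon)
buffer.seek(0)

with np.load(buffer) as archive:
    restored_values = archive["values"]
    restored_lat = archive["lat"]
    restored_lon = archive["lon"]

print(restored_values.shape, restored_lat, restored_lon)
```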

Numpy Array

import numpy as np

# convert to array (this drops the coordinates and attributes)
np_data: np.ndarray = xds["variable"].values.astype(np.float32)

# save dataset to numpy array
np.save("path/to/file/file.npy", np_data)
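A quick round trip shows what a `.npy` file keeps: the shape and dtype survive, but nothing else. This sketch writes to a temporary directory with made-up data. It also shows that `np.save` appends `.npy` when the filename lacks that extension.

```python
import os
import tempfile

import numpy as np

# Illustrative float32 data standing in for xds["variable"].values.
np_data = np.random.default_rng(0).random((4, 5)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    # No extension given: np.save writes "file.npy".
    np.save(os.path.join(tmp, "file"), np_data)
    saved_files = sorted(os.listdir(tmp))
    reloaded = np.load(os.path.join(tmp, "file.npy"))

print(saved_files, reloaded.shape, reloaded.dtype)
```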

Generic Image

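import numpy as np
from imageio import imwrite

save_path = "path/to/file/file.png"

# convert to image (assumes values are already in the 0-255 range;
# astype casts, it does not rescale)
image: np.ndarray = xds["variable"].values.astype(np.uint8)

# save as image
imwrite(save_path, image)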
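`astype(np.uint8)` casts rather than rescales. Fractional parts are dropped, and values outside 0 to 255 do not map to anything meaningful. Continuous fields (e.g. temperatures in kelvin) therefore usually need to be stretched into that range first. Below is a minimal min-max scaling sketch. The helper name `to_uint8` is hypothetical, and mapping NaNs to 0 is an assumption.

```python
import numpy as np


def to_uint8(field: np.ndarray) -> np.ndarray:
    """Min-max rescale a continuous field into 0..255 and cast to uint8."""
    field = np.asarray(field, dtype=np.float64)
    lo, hi = np.nanmin(field), np.nanmax(field)
    if hi == lo:
        # A constant field has no range to stretch; map it to zeros.
        return np.zeros(field.shape, dtype=np.uint8)
    scaled = (field - lo) / (hi - lo) * 255.0
    # Assumption: missing values (NaN) become 0 in the image.
    scaled = np.nan_to_num(scaled, nan=0.0)
    return np.round(scaled).astype(np.uint8)


image = to_uint8(np.array([[270.0, 280.0], [290.0, 300.0]]))
print(image)
```

Min-max scaling discards the physical units, so keep `lo` and `hi` if you need to map pixel values back to the original field.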

Transforms

These are transformations that happen on-the-fly as we load each item of our datasets. There is a lot