esdc_toolz - jejjohnson/ml4eo GitHub Wiki

Nomenclature

Recipe -

Pipelines - a sequence

Benchmark - A fixed


Steps

How to effectively use the xarray.open_mfdataset function with a custom preprocess function.

  • Test Function on one file
  • Fix Coordinates (Space, Time, Sorted)
  • Fix Units
  • Do Reductions
  • Make Preprocessing Function
  • Apply Preprocessing Function to multiple files
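
The steps above can be sketched as follows. This is a minimal illustration, not the project's own code: the variable name `sst`, the raw coordinate names `latitude`/`longitude`, the Kelvin units, and the helper `make_file` are all assumptions made up for the example. In-memory datasets stand in for files so the sketch runs without any data on disk.

```python
import numpy as np
import pandas as pd
import xarray as xr


def preprocess(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical per-file cleanup, applied before files are combined."""
    # Fix coordinates: harmonise names, wrap longitude to [-180, 180), sort.
    ds = ds.rename({"latitude": "lat", "longitude": "lon"})
    ds = ds.assign_coords(lon=((ds.lon + 180) % 360) - 180)
    ds = ds.sortby(["lat", "lon", "time"])
    # Fix units: Kelvin -> degrees Celsius.
    ds["sst"] = ds["sst"] - 273.15
    # Do reductions: daily mean of the sub-daily samples.
    out = ds.resample(time="1D").mean()
    # Record the new units last, since reductions may drop attributes.
    out["sst"].attrs["units"] = "degC"
    return out


def make_file(start: str) -> xr.Dataset:
    """Synthetic stand-in for one raw file (names, units, order are made up)."""
    time = pd.date_range(start, periods=8, freq=pd.Timedelta(hours=6))
    lat = np.array([10.0, 0.0, -10.0])         # descending on purpose
    lon = np.array([0.0, 90.0, 180.0, 270.0])  # 0..360 convention
    data = np.full((time.size, lat.size, lon.size), 290.0)
    return xr.Dataset(
        {"sst": (("time", "latitude", "longitude"), data, {"units": "K"})},
        coords={"time": time, "latitude": lat, "longitude": lon},
    )


files = [make_file("2020-01-01"), make_file("2020-01-03")]

# Test the function on one file first.
single = preprocess(files[0])

# Then apply it to every file and combine along time.
combined = xr.concat([preprocess(f) for f in files], dim="time")
```

With real files on disk, the same function could be passed directly, for example `xr.open_mfdataset(paths, preprocess=preprocess, combine="by_coords")`, where `paths` is a hypothetical list or glob of file paths.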

Core Operations

These are mainly native xarray functions that already exist and cover the basic operations.

  • Validate Coordinates - Lat, Lon, Time (Names, Attributes, Ranges, Bounds)
  • Selection/Subset/Slice - Region, Period
  • Coordinate Reference System
  • Resample - Frequency
  • Coarsen Reductions - Spatial Scale, construct, reduce
  • Rolling Transformations - construct, reduce
  • Groupby Reductions
  • Weighted Reductions
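
As a rough illustration of several of the operations listed above, here is a sketch on a synthetic array. The grid, the variable name `var`, the window sizes, and the cos(latitude) weighting are assumptions chosen for the example. Coordinate reference systems are not covered here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily field on a regular 1-degree grid (illustrative only).
time = pd.date_range("2020-01-01", periods=60, freq="D")
lat = np.arange(-4.5, 5.0, 1.0)  # 10 points
lon = np.arange(0.5, 20.0, 1.0)  # 20 points
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.normal(size=(time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="var",
)

# Validate coordinates: value ranges and monotonic ordering.
assert float(da.lat.min()) >= -90 and float(da.lat.max()) <= 90
assert da.indexes["time"].is_monotonic_increasing

# Selection: region and period via label-based slices.
subset = da.sel(
    lat=slice(-2, 2),
    lon=slice(5, 15),
    time=slice("2020-01-01", "2020-01-31"),
)

# Resample: daily -> monthly means.
monthly = da.resample(time="MS").mean()

# Coarsen: 2x2 spatial block means.
coarse = da.coarsen(lat=2, lon=2).mean()

# Rolling: 7-day centred moving average, and the explicit window view.
smooth = da.rolling(time=7, center=True).mean()
windows = da.rolling(time=7, center=True).construct("window")

# Groupby: mean per calendar month.
by_month = da.groupby("time.month").mean()

# Weighted: area-weighted spatial mean with cos(latitude) weights.
weights = np.cos(np.deg2rad(da.lat))
area_mean = da.weighted(weights).mean(dim=("lat", "lon"))
```

The `construct` call exposes each rolling window as an extra dimension, which is useful when the reduction is not one of the built-in ones.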

Higher Level Tasks

  • Calculate Physical Quantities - Radiance, Reflectance, Kinetic Energy
  • Discretization - Histogram (Counts, Max, Mean)
  • Climatology - Frequency
  • Anomalies - Filtering + Climatology
  • Coordinate Encoders - Time, Space, Wavelength
  • Reprojection
  • Interpolate - Unstructured, Curvilinear, Rectilinear, Regular, Target (Regrid), Lower Res (Resample/Coarsen)
  • Interpolate NaNs - Astrophysics (Conv + NaNs), pyinterp (LOESS, Gauss-Seidel), SciPy (Unstructured, Rectilinear)
  • Filtering - Channel, Space, Time
  • Masking - RegionMask
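
To make the climatology and anomaly items above concrete, here is a minimal sketch using a monthly groupby. The synthetic data, the variable name `var`, and the choice of a monthly (rather than daily or smoothed) climatology are assumptions for illustration. Any filtering step applied before or after the climatology is omitted.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of synthetic daily data: a seasonal cycle plus noise.
time = pd.date_range("2001-01-01", "2002-12-31", freq="D")
rng = np.random.default_rng(0)
seasonal = 10.0 * np.sin(2 * np.pi * time.dayofyear.values / 365.0)
data = seasonal[:, None, None] + rng.normal(scale=0.5, size=(time.size, 2, 3))
da = xr.DataArray(
    data,
    coords={"time": time, "lat": [0.0, 1.0], "lon": [0.0, 1.0, 2.0]},
    dims=("time", "lat", "lon"),
    name="var",
)

# Climatology: mean over all years for each calendar month.
climatology = da.groupby("time.month").mean("time")

# Anomalies: subtract the matching month's climatology from every time step.
anomalies = da.groupby("time.month") - climatology
```

By construction, the anomalies average to zero within each calendar month, which is a quick sanity check when adapting this to real data.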

Machine Learning Pre-Processing

  • Running Standardization - Channel, Space, Time
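
One possible reading of running standardization is a rolling z-score: subtract a moving mean and divide by a moving standard deviation along a chosen dimension. The sketch below takes that interpretation. The function name `rolling_standardize`, the window length, and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def rolling_standardize(da, dim="time", window=30, min_periods=1):
    """Hypothetical helper: rolling z-score along one dimension."""
    roll = da.rolling({dim: window}, min_periods=min_periods, center=True)
    mean = roll.mean()
    std = roll.std()
    # Mask zero spread so constant stretches become NaN rather than inf.
    return (da - mean) / std.where(std > 0)


time = pd.date_range("2020-01-01", periods=100, freq="D")
rng = np.random.default_rng(0)
da = xr.DataArray(
    rng.normal(loc=5.0, scale=2.0, size=(time.size, 3)),
    coords={"time": time, "channel": [0, 1, 2]},
    dims=("time", "channel"),
)

z = rolling_standardize(da, dim="time", window=30)
```

Passing a spatial dimension as `dim` gives the spatial variant. Standardizing per channel falls out naturally here because the rolling window only runs along `dim`.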

Statistics

  • Power Spectrum Stats
  • Pixel-Based Stats
  • MultiScale Pixel-Based Stats
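
As a sketch of the pixel-based and multiscale items, assuming the statistics compare a prediction against a reference field, RMSE can be computed per pixel and then again after coarsening both fields to larger spatial scales. The helper `rmse`, the noise level, and the coarsening factors are assumptions for illustration. Power spectrum statistics are not shown.

```python
import numpy as np
import xarray as xr


def rmse(pred, ref, dim):
    """Root-mean-square error reduced over the given dimension(s)."""
    return np.sqrt(((pred - ref) ** 2).mean(dim=dim))


dims = ("time", "lat", "lon")
coords = {"lat": np.arange(8.0), "lon": np.arange(8.0)}
rng = np.random.default_rng(0)
ref = xr.DataArray(rng.normal(size=(20, 8, 8)), dims=dims, coords=coords)
# Synthetic prediction: the reference plus independent pixel noise.
pred = ref + rng.normal(scale=0.5, size=ref.shape)

# Pixel-based: one RMSE value per (lat, lon) location.
pixel_rmse = rmse(pred, ref, dim="time")

# Multiscale: block-average both fields, then score at each scale.
multiscale = {}
for factor in (1, 2, 4):
    pred_c = pred.coarsen(lat=factor, lon=factor).mean()
    ref_c = ref.coarsen(lat=factor, lon=factor).mean()
    multiscale[factor] = float(rmse(pred_c, ref_c, dim=dims))
```

Because the synthetic error is uncorrelated between pixels, block averaging reduces it, so the RMSE shrinks as the coarsening factor grows. Spatially correlated errors in real data would decay more slowly.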