Good practices for managing dependent objects and features (import, typing) - GlacioHack/xdem GitHub Wiki
There are several good practices to keep in mind when creating inter-dependent objects and features, both to keep a logical package structure and to avoid circular-import issues; we use them throughout GeoUtils and xDEM. (Original posts here and here)
Problem
Let's take the example of an Object that implements feature() (in our case Object could be Raster and feature() could be reproject()).
We separate the object and its features into separate Python modules, so that Object depends on the feature by importing a _feature() function from a separate feature.py module (the natural way of doing it, and good practice).
However, we might also need Object to appear within feature.py on several occasions:
- To declare Object as a possible input type of feature(),
- To check user input at runtime with isinstance(user_input, Object),
- Or to create a new Object by calling a class method of Object.
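Each of these requires importing Object into feature.py, while Object already imports from feature.py: a circular import. This problem scales badly when multiple objects depend on each other through multiple features.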
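To make the failure concrete, here is a minimal, self-contained sketch (not GeoUtils code) that writes a throwaway package to a temporary directory: raster.py imports _feature() from feature.py at the top, and feature.py imports Raster back at the top. The package name circ_demo and the temporary_package helper are hypothetical. Importing raster then fails with Python's circular-import error, because Raster does not exist yet when feature.py asks for it.

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical two-module package mirroring Object (raster.py) and feature (feature.py)
SOURCES = {
    "raster.py": (
        "from circ_demo.feature import _feature  # Object imports its feature\n"
        "\n"
        "class Raster:\n"
        "    def feature(self):\n"
        "        return _feature(self)\n"
    ),
    "feature.py": (
        "from circ_demo.raster import Raster  # feature imports Object back at top level\n"
        "\n"
        "def _feature(obj):\n"
        "    return isinstance(obj, Raster)\n"
    ),
}


@contextlib.contextmanager
def temporary_package(package: str, sources: dict[str, str]):
    """Write a throwaway package to a temp dir and make it importable inside the block."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        for filename, source in sources.items():
            (pkg_dir / filename).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            yield
        finally:
            sys.path.remove(tmp)
            # Forget the throwaway modules so the demo can be re-run in the same process
            for name in [m for m in sys.modules if m.split(".")[0] == package]:
                del sys.modules[name]


with temporary_package("circ_demo", SOURCES):
    try:
        importlib.import_module("circ_demo.raster")
        error = None
    except ImportError as exc:
        error = exc

# Reports that 'Raster' cannot be imported from a partially initialized module
print(f"{type(error).__name__}: {error}")
```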
Additionally, within a single feature, the layering can also be complex in the case of multiple backends (chunked operations, Dask/Multiprocessing support), for instance:
- Our object method Object.feature() calls a main function feature(),
- The parent function feature() dispatches to either _feature() (core function, unchunked), _feature_dask() (Dask backend) or _feature_multiproc() (Multiprocessing backend),
- _feature_dask() and _feature_multiproc() rely on similar block/chunk logic, so both call a _feature_chunk(), which itself calls _feature() underneath.
So how do we scale this reliably?
Solutions for structure
In terms of layers, the mental model we have is this (example for GeoUtils):
```text
Domain objects (RasterBase / VectorBase / PointCloudBase; inherited by accessors and main objects)
        ↓
Feature API (parent functions for reproject, proximity, interpolate, grid, for any backend)
        ↓
Execution backends (Dask / Multiprocessing / Direct core)
        ↓
Chunked function logic (only for Dask/MP)
        ↓
Core functions
```
So we need a structure like this, either for each feature or for a group of features:
```text
geoutils/
├── feature/              # feature subsystem
│   ├── __init__.py
│   ├── api.py
│   ├── core.py
│   ├── chunked.py
│   ├── backends/
│   │   ├── __init__.py
│   │   ├── dask.py
│   │   └── multiprocessing.py
├── object/
```
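The layering can be sketched in a single file, with comments marking which module of the tree each piece would live in. This is a hypothetical illustration, not GeoUtils code: the feature itself (scaling values), the helpers _split_blocks and _feature_threads, and the use of concurrent.futures.ThreadPoolExecutor as a stand-in for the Dask and multiprocessing backends are all assumptions. What it shows is the direction of calls: Object.feature() calls the API function feature(), which calls a backend, which calls the chunk logic, which calls the core. In the split layout, each module would then only import from the layers below it.

```python
from concurrent.futures import ThreadPoolExecutor


# --- core.py: core function, unchunked (hypothetical feature: scale every value) ---
def _feature(values: list[float], factor: float) -> list[float]:
    return [v * factor for v in values]


# --- chunked.py: block logic shared by the parallel backends, calls the core underneath ---
def _split_blocks(values: list[float], block_size: int) -> list[list[float]]:
    return [values[i : i + block_size] for i in range(0, len(values), block_size)]


def _feature_chunk(block: list[float], factor: float) -> list[float]:
    return _feature(block, factor)


# --- backends/: a thread pool stands in here for the Dask and multiprocessing backends ---
def _feature_threads(values: list[float], factor: float, block_size: int) -> list[float]:
    blocks = _split_blocks(values, block_size)
    with ThreadPoolExecutor(max_workers=2) as executor:
        # executor.map returns results in input order, so blocks can be concatenated directly
        results = list(executor.map(_feature_chunk, blocks, [factor] * len(blocks)))
    return [v for block in results for v in block]


# --- api.py: parent function, dispatches to a backend ---
def feature(values: list[float], factor: float = 2.0, backend: str = "direct", block_size: int = 2) -> list[float]:
    if backend == "direct":
        return _feature(values, factor)
    if backend == "threads":
        return _feature_threads(values, factor, block_size)
    raise ValueError(f"Unknown backend: {backend!r}")


# --- object/: the object method only calls the API layer ---
class Object:
    def __init__(self, values: list[float]) -> None:
        self.values = list(values)

    def feature(self, **kwargs) -> "Object":
        # Inside the method, `feature` resolves to the module-level API function
        return Object(feature(self.values, **kwargs))


print(Object([1, 2, 3, 4, 5]).feature(backend="threads").values)  # [2.0, 4.0, 6.0, 8.0, 10.0]
```

Both backends produce the same result; only the execution strategy changes, which is what lets the parent function hide the backend choice from the object.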
Solution for imports/typing
There are several aspects to consider to avoid issues with imports and facilitate typing:
- Non-runtime type checking (in function declaration)
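For static (non-runtime) type checking, a good practice is to guard typing-only imports with typing's TYPE_CHECKING at the top of the file. TYPE_CHECKING is False at runtime and assumed True by type checkers such as mypy, so the guarded import never executes when the package is imported. Because the name is then undefined at runtime, the annotation must not be evaluated when the function is defined: add from __future__ import annotations at the top of the module (or quote the annotation, as in "Raster").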
For example, instead of doing:
```python
# In feature.py
from geoutils import Raster  # Creates a circular dependency

def myfeature(input: Raster, **kwargs):
    ...
```
One should do:
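```python
# In feature.py
from __future__ import annotations  # Annotations are kept as strings, never evaluated at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from geoutils import Raster  # Only runs during type checking, no circularity issue

def myfeature(input: Raster, **kwargs):
    ...
```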
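To check that the guarded import really breaks the cycle, here is a self-contained sketch that writes a throwaway two-module package to a temporary directory and imports it. The package name tc_demo and the temporary_package helper are hypothetical, not GeoUtils code. raster.py imports _feature() at the top as usual, while feature.py only imports Raster under TYPE_CHECKING and uses from __future__ import annotations, so importing raster.py succeeds and the annotation is kept as the plain string 'Raster'.

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: feature.py only needs Raster for its type annotation
SOURCES = {
    "raster.py": (
        "from tc_demo.feature import _feature\n"
        "\n"
        "class Raster:\n"
        "    def feature(self):\n"
        "        return _feature(self)\n"
    ),
    "feature.py": (
        "from __future__ import annotations  # annotations stay unevaluated strings\n"
        "\n"
        "from typing import TYPE_CHECKING\n"
        "\n"
        "if TYPE_CHECKING:  # False at runtime, so this import never executes\n"
        "    from tc_demo.raster import Raster\n"
        "\n"
        "def _feature(obj: Raster) -> str:\n"
        "    return type(obj).__name__\n"
    ),
}


@contextlib.contextmanager
def temporary_package(package: str, sources: dict[str, str]):
    """Write a throwaway package to a temp dir and make it importable inside the block."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        for filename, source in sources.items():
            (pkg_dir / filename).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            yield
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == package]:
                del sys.modules[name]


with temporary_package("tc_demo", SOURCES):
    raster_mod = importlib.import_module("tc_demo.raster")  # succeeds: no runtime cycle
    result = raster_mod.Raster().feature()
    hints = raster_mod._feature.__annotations__

print(result, hints)  # Raster {'obj': 'Raster', 'return': 'str'}
```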
- Duck typing (for runtime user-input checks)
To check object types at runtime, a good practice is to use duck typing (or typing.Protocol in certain cases, though protocols fit less well with our objects).
This means doing hasattr(obj, "obj_attr") instead of isinstance(obj, Object), thereby removing the need to import Object during runtime.
For example, instead of doing:
```python
# In feature.py
from geoutils import Raster  # Creates a circular dependency

def myfeature(input, **kwargs):
    # Check user input is correct
    if not isinstance(input, Raster):
        raise ValueError("Wrong input")
    # Feature uses some specific attributes...
    return func(input.crs)  # func: placeholder for the feature's core logic
```
One should do:
```python
# In feature.py
def myfeature(input, **kwargs):
    if not hasattr(input, "crs"):
        raise ValueError("Wrong input, did not implement 'crs'")
    # Feature uses specific attributes safely
    return func(input.crs)  # func: placeholder for the feature's core logic
```
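A minimal runnable sketch of this check, with a hypothetical stand-in Raster class (not the GeoUtils one) and a hypothetical class lacking a crs attribute. It also shows the typing.Protocol alternative mentioned above: a @runtime_checkable protocol (HasCRS, also a hypothetical name) lets isinstance() test for the attribute without importing any concrete class. Note that such runtime checks only test that the attribute exists, not its type.

```python
from typing import Protocol, runtime_checkable


class Raster:
    """Hypothetical stand-in for a geospatial object exposing a CRS."""

    def __init__(self, crs: str) -> None:
        self.crs = crs


class NotGeospatial:
    """Hypothetical object without a CRS."""


def myfeature(obj) -> str:
    # Duck typing: any object with a "crs" attribute is accepted, no import of Raster needed
    if not hasattr(obj, "crs"):
        raise ValueError("Wrong input, did not implement 'crs'")
    return f"CRS is {obj.crs}"


@runtime_checkable
class HasCRS(Protocol):
    """Structural type: any object with a 'crs' attribute matches at runtime."""

    crs: str


print(myfeature(Raster("EPSG:4326")))  # CRS is EPSG:4326
print(isinstance(Raster("EPSG:4326"), HasCRS), isinstance(NotGeospatial(), HasCRS))  # True False
```

The protocol has the advantage of also serving as a static type for annotations, but it has to be kept in sync with the attributes the feature actually uses.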
- Lazy/In-method imports (for runtime object instantiation)
Sometimes a cross-import is absolutely necessary, for instance to create a new instance of Object in feature(). For the same Object, this can usually be circumvented by calling a class method through the input itself (such as our from_array), but the issue can be unavoidable when working with different objects, like creating an Object1 from Object2.feature().
In this case, we can simply import "lazily", i.e. from within the function body: the import only runs when the function is called at runtime, by which point both modules are fully loaded, so there is no circularity issue.
For example, instead of doing:
```python
# In feature.py called by Raster module
# Creates a circular dependency if a feature of PointCloud needs to do the same for Raster
from geoutils import PointCloud

def myfeature(input, **kwargs):
    pc = func(input)  # func: placeholder for the feature's core logic
    return PointCloud(pc)
```
One should do:
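```python
# In feature.py called by Raster module
def myfeature(input, **kwargs):
    from geoutils import PointCloud  # Only happens at runtime, when myfeature() is called

    pc = func(input)  # func: placeholder for the feature's core logic
    return PointCloud(pc)
```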
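A self-contained sketch of this pattern, using a throwaway package written to a temporary directory. The package lazy_demo, its four modules, the functions to_points() and to_grid() and the temporary_package helper are all hypothetical, not GeoUtils code. Each object imports its own feature module at the top, as usual, while each feature module imports the other object lazily. Both objects import cleanly and can be converted into each other; moving both lazy imports to module level would recreate the circular ImportError.

```python
import contextlib
import importlib
import sys
import tempfile
from pathlib import Path

SOURCES = {
    "raster.py": (
        "from lazy_demo.raster_features import to_points  # object imports its feature\n"
        "\n"
        "class Raster:\n"
        "    def __init__(self, values):\n"
        "        self.values = tuple(values)\n"
        "\n"
        "    def to_pointcloud(self):\n"
        "        return to_points(self)\n"
    ),
    "pointcloud.py": (
        "from lazy_demo.pointcloud_features import to_grid\n"
        "\n"
        "class PointCloud:\n"
        "    def __init__(self, points):\n"
        "        self.points = list(points)\n"
        "\n"
        "    def to_raster(self):\n"
        "        return to_grid(self)\n"
    ),
    "raster_features.py": (
        "def to_points(raster):\n"
        "    from lazy_demo.pointcloud import PointCloud  # lazy: only runs when called\n"
        "    return PointCloud(raster.values)\n"
    ),
    "pointcloud_features.py": (
        "def to_grid(pc):\n"
        "    from lazy_demo.raster import Raster  # lazy: only runs when called\n"
        "    return Raster(pc.points)\n"
    ),
}


@contextlib.contextmanager
def temporary_package(package: str, sources: dict[str, str]):
    """Write a throwaway package to a temp dir and make it importable inside the block."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        for filename, source in sources.items():
            (pkg_dir / filename).write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            yield
        finally:
            sys.path.remove(tmp)
            for name in [m for m in sys.modules if m.split(".")[0] == package]:
                del sys.modules[name]


# The lazy imports run inside the calls, so the calls must happen while the package is importable
with temporary_package("lazy_demo", SOURCES):
    raster_mod = importlib.import_module("lazy_demo.raster")  # no circular ImportError
    importlib.import_module("lazy_demo.pointcloud")
    pc = raster_mod.Raster([1, 2, 3]).to_pointcloud()  # Raster -> PointCloud
    back = pc.to_raster()  # PointCloud -> Raster

print(type(pc).__name__, pc.points, type(back).__name__, back.values)  # PointCloud [1, 2, 3] Raster (1, 2, 3)
```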