db_connection_decorate_operation and db_connection_recover_operation hooks - vmware/versatile-data-kit GitHub Wiki
How the hooks are called
The hooks are defined in the ConnectionHookSpec class and used via the ConnectionHookSpecFactory class. The factory is used in the ManagedCursor class.
Decorate operation
The registered hook that decorates operations is called on every cursor execution.
Recover operation
The registered hook is called if an operation fails.
Known problems
From the pluggy documentation
By default hooks are called in LIFO registered order, however, a hookimpl can influence its call-time invocation position using special attributes. If marked with a "tryfirst" or "trylast" option it will be executed first or last respectively in the hook call loop
If you have more than one plugin, which has registered one of the hooks, all hooks will be called regardless of the code in the previous section. This issue was reproduced here https://github.com/vmware/versatile-data-kit/issues/2973
Note: The same should be valid for the db_connection_execute_operation hook, but it's not implemented very often, so plugins just fall back to the default implementation, preventing problems. This should also be tested.
Proposed solutions
Option 1
Move the functions to ManagedConnectionBase, make them abstract and remove anything pluggy-related. Functions are passed into the ManagedCursor constructor and used the same way as before. Every time ManagedConnectionBase is extended for a database plugin, the functions will be overridden. They will still get called, but only if the plugin's connection is being used. They won't end up in pluggy's LIFO buffer.
class ManagedConnectionBase(PEP249Connection, IManagedConnection):
@abstractmethod
def _connect(self) -> PEP249Connection:
"""
this method MUST be implemented by inheritors and should return PEP249 Connection object (unmanaged)
"""
raise NotImplementedError
@abstractmethod
def _db_connection_validate_operation(
self, operation: str, parameters: Optional[Container]
) -> None:
pass
@abstractmethod
def _db_connection_decorate_operation(
self, decoration_cursor: DecorationCursor
) -> None:
pass
# Note: maybe this can have a default implementation
@abstractmethod
def _db_connection_execute_operation(
self, execution_cursor: ExecutionCursor
) -> Optional[ExecuteOperationResult]:
pass
@abstractmethod
def _db_connection_recover_operation(self, recovery_cursor: RecoveryCursor) -> None:
pass
def cursor(self, *args, **kwargs):
if hasattr(self._db_con, "cursor"):
return ManagedCursor(
self._db_con.cursor(*args, **kwargs),
self._log,
self._db_connection_validate_operation,
self._db_connection_decorate_operation,
self._db_connection_execute_operation,
self._db_connection_recover_operation,
)
return super().cursor()
There are plugins that don't actually use the DecorationCursor, but still decorate operations, e.g. vdk-lineage. Those plugins should still be allowed to have a decorator hook, provided they don't have access to the actual cursor and run SQL.
@hookspec
def db_connection_decorate_operation(
self, operation: ManagedOperation
) -> None:
pass
PRO: Ties the db-operations that run queries directly to the cursor implementation
PRO: Separates logic that does SQL queries from logic that just analyzes SQL operations
CON: Code duplication, i.e. you have two types of methods, hooks and non-hooks, that decorate the same SQL operations
CON: Disruptive approach, you have to refactor all the plugins that use the hooks and ensure backwards compatibility
Option 2
Pass the db_type (if any) to these kinds of hooks
@hookimpl
db_connection_xxx_operation(dbType, ...)
if (dbType == "impala"):
# impala logic
PRO: Simpler and less disruptive approach
CON: Relies on the plugin developer to "do the right thing" and check the db_type
CON: Potential for abuse, e.g. you can pass the wrong db_type to hack something together