db_connection_decorate_operation and db_connection_recover_operation hooks - vmware/versatile-data-kit GitHub Wiki

How the hooks are called

The hooks are defined in the ConnectionHookSpec class and used via the ConnectionHookSpecFactory class. The factory is used in the ManagedCursor class.

Decorate operation

The registered hook that decorates operations is called on every cursor execution.

https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-core/src/vdk/internal/builtin_plugins/connection/managed_cursor.py#L92

https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-core/src/vdk/internal/builtin_plugins/connection/managed_cursor.py#L135

Recover operation

The registered hook is called if an operation fails.

https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-core/src/vdk/internal/builtin_plugins/connection/managed_cursor.py#L103

https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-core/src/vdk/internal/builtin_plugins/connection/managed_cursor.py#L221

Known problems

From the pluggy documentation

By default hooks are called in LIFO registered order, however, a hookimpl can influence its call-time invocation position using special attributes. If marked with a "tryfirst" or "trylast" option it will be executed first or last respectively in the hook call loop

If you have more than one plugin, which has registered one of the hooks, all hooks will be called regardless of the code in the previous section. This issue was reproduced here https://github.com/vmware/versatile-data-kit/issues/2973

Note: The same should be valid for the db_connection_execute_operation hook, but it's not implemented very often, so plugins just fall back to the default implementation, preventing problems. This should also be tested.

Proposed solutions

Option 1

Move the functions to ManagedConnectionBase, make them abstract and remove anything pluggy-related. Functions are passed into the ManagedCursor constructor and used the same way as before. Every time ManagedConnectionBase is extended for a database plugin, the functions will be overridden. They will still get called, but only if the plugin's connection is being used. They won't end up in pluggy's LIFO buffer.

class ManagedConnectionBase(PEP249Connection, IManagedConnection):

    @abstractmethod
    def _connect(self) -> PEP249Connection:
        """
        this method MUST be implemented by inheritors and should return PEP249 Connection object (unmanaged)
        """
        raise NotImplementedError

    @abstractmethod
    def _db_connection_validate_operation(
        self, operation: str, parameters: Optional[Container]
    ) -> None:
        pass

    @abstractmethod
    def _db_connection_decorate_operation(
        self, decoration_cursor: DecorationCursor
    ) -> None:
        pass

    # Note: maybe this can have a default implementation
    @abstractmethod
    def _db_connection_execute_operation(
        self, execution_cursor: ExecutionCursor
    ) -> Optional[ExecuteOperationResult]:
        pass

    @abstractmethod
    def _db_connection_recover_operation(self, recovery_cursor: RecoveryCursor) -> None:
        pass

    def cursor(self, *args, **kwargs):
        if hasattr(self._db_con, "cursor"):
            return ManagedCursor(
                self._db_con.cursor(*args, **kwargs),
                self._log,
                self._db_connection_validate_operation,
                self._db_connection_decorate_operation,
                self._db_connection_execute_operation,
                self._db_connection_recover_operation,
            )
        return super().cursor()

There are plugins that don't actually use the DecorationCursor, but still decorate operations, e.g. vdk-lineage. Those plugins should still be allowed to have a decorator hook, provided they don't have access to the actual cursor and run SQL.

https://github.com/search?q=repo%3Avmware%2Fversatile-data-kit+db_connection_decorate_operation+path%3A%2F%5Eprojects%5C%2Fvdk-plugins%5C%2F%2F+language%3APython&type=code&l=Python

https://github.com/vmware/versatile-data-kit/blob/6e4e0a0638128f78e177f68bcfcfb65af526b405/projects/vdk-core/src/vdk/internal/builtin_plugins/connection/connection_plugin.py#L26

@hookspec
def db_connection_decorate_operation(
    self, operation: ManagedOperation
) -> None:
    pass

PRO: Ties the db-operations that run queries directly to the cursor implementation

PRO: Separates logic that does SQL queries from logic that just analyzes SQL operations

CON: Code duplication, i.e. you have two types of methods, hooks and non-hooks, that decorate the same SQL operations

CON: Disruptive approach, you have to refactor all the plugins that use the hooks and ensure backwards compatibility

Option 2

Pass the db_type (if any) to these kinds of hooks

@hookimpl
db_connection_xxx_operation(dbType, ...)
  if (dbType == "impala"):  
     # impala logic

PRO: Simpler and less disruptive approach

CON: Relies on the plugin developer to "do the right thing" and check the db_type

CON: Potential for abuse, e.g. you can pass the wrong db_type to hack something together