ODI Blueprint: FileNet to Data Vault (DV2.0)

Overview

This blueprint describes an ODI-based ETL pattern to load FileNet metadata (Oracle source) into a Data Vault.

Layers:

  • SOURCE: FileNet views (FILENET_DOC_VW)
  • STAGING: STG_FILENET_DOCS (raw)
  • HUBS/LINKS/SATELLITES: Data Vault schema (HUB_*, LINK_*, SAT_*)

Source View (example)

CREATE OR REPLACE VIEW FILENET_DOC_VW AS
SELECT
    DOC.DOCUMENT_ID    AS DOCUMENT_NATURAL_ID,
    DOC.TITLE          AS DOCUMENT_TITLE,
    DOC.MIME_TYPE,
    DOC.VERSION_NUMBER,
    META.ZAAKNUMMER,
    META.AFDELING,
    DOC.CREATED_BY,
    DOC.DATE_CREATED
FROM DOCVERSION DOC
JOIN CUSTOM_METADATA META
    ON DOC.DOCUMENT_ID = META.DOCUMENT_ID
WHERE DOC.ISCURRENTVERSION = 1;

ODI Mappings (high-level)

  1. Extract to STG (STG_FILENET_DOCS)

    • Reverse-engineer FILENET_DOC_VW as a datastore.
    • Mapping: FILENET_DOC_VW -> STG_FILENET_DOCS
    • Incremental: filter on a LAST_MODIFIED timestamp in the WHERE clause, or use Oracle CDC/GoldenGate (see "Incremental strategy" below).
    • Compute hash keys and hashdiffs in staging (see the staging sketch after this list):
      • HUB_DOCUMENT_HASHKEY = STANDARD_HASH(DOCUMENT_NATURAL_ID,'SHA1')
      • SAT_DOCUMENT_HASHDIFF = STANDARD_HASH(DOCUMENT_TITLE||'|'||NVL(MIME_TYPE,''),'SHA1')
  2. Load HUB_DOCUMENT

    • Look up by HUB_DOCUMENT_HASHKEY; if no row exists, INSERT (see the example MERGE at the end of this page).
    • Use MERGE or the IKM Oracle Incremental Update.
  3. Load HUB_ZAAK (and other hubs)

    • Similar pattern: compute HUB_ZAAK_HASHKEY from ZAAKNUMMER.
  4. Load LINK_DOCUMENT_ZAAK

    • Compute the link hash: STANDARD_HASH(HUB_DOCUMENT_HASHKEY||HUB_ZAAK_HASHKEY,'SHA1')
    • Insert if it does not exist (see the link sketch after this list).
  5. Load SATELLITES (e.g., SAT_DOCUMENT_DETAILS)

    • For each staging row, compare SAT_DOCUMENT_HASHDIFF with the latest satellite row's HASHDIFF.
    • If the hashdiff changed, or no satellite row exists yet, insert a new satellite row with LOAD_DATE = SYSTIMESTAMP (see the satellite sketch after this list).
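
Example staging load (pseudo-SQL)

A minimal sketch of step 1. The exact STG_FILENET_DOCS column list is an assumption, and HUB_ZAAK_HASHKEY is computed here as well since step 3 derives it from ZAAKNUMMER.

INSERT /*+ APPEND */ INTO STG_FILENET_DOCS
  (DOCUMENT_NATURAL_ID, DOCUMENT_TITLE, MIME_TYPE, ZAAKNUMMER,
   HUB_DOCUMENT_HASHKEY, HUB_ZAAK_HASHKEY, SAT_DOCUMENT_HASHDIFF,
   LOAD_DATE, RECORD_SOURCE)
SELECT
  V.DOCUMENT_NATURAL_ID,
  V.DOCUMENT_TITLE,
  V.MIME_TYPE,
  V.ZAAKNUMMER,
  STANDARD_HASH(TO_CHAR(V.DOCUMENT_NATURAL_ID), 'SHA1'),  -- hub hash key; TO_CHAR keeps the hash input deterministic
  STANDARD_HASH(TO_CHAR(V.ZAAKNUMMER), 'SHA1'),           -- hub hash key for step 3
  STANDARD_HASH(NVL(V.DOCUMENT_TITLE,'') || '|' || NVL(V.MIME_TYPE,''), 'SHA1'),  -- satellite hashdiff
  SYSTIMESTAMP,
  'FILENET'
FROM FILENET_DOC_VW V;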
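
Example insert for LINK_DOCUMENT_ZAAK (pseudo-SQL)

A minimal sketch of step 4. The link column names are assumptions, and both hub hash keys are assumed to be present on the staging table as computed above.

INSERT INTO LINK_DOCUMENT_ZAAK
  (LINK_DOCUMENT_ZAAK_HASHKEY, HUB_DOCUMENT_HASHKEY, HUB_ZAAK_HASHKEY,
   LOAD_DATE, RECORD_SOURCE)
SELECT N.LINK_HK, N.HUB_DOCUMENT_HASHKEY, N.HUB_ZAAK_HASHKEY, SYSTIMESTAMP, 'FILENET'
FROM (
  SELECT DISTINCT
    STANDARD_HASH(S.HUB_DOCUMENT_HASHKEY || S.HUB_ZAAK_HASHKEY, 'SHA1') AS LINK_HK,
    S.HUB_DOCUMENT_HASHKEY,
    S.HUB_ZAAK_HASHKEY
  FROM STG_FILENET_DOCS S
) N
WHERE NOT EXISTS (
  SELECT 1 FROM LINK_DOCUMENT_ZAAK L
  WHERE L.LINK_DOCUMENT_ZAAK_HASHKEY = N.LINK_HK  -- insert only new document/zaak pairs
);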
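
Example insert for SAT_DOCUMENT_DETAILS (pseudo-SQL)

A minimal sketch of step 5: insert-only, comparing the staging hashdiff against the latest satellite row. The satellite column list is an assumption.

INSERT INTO SAT_DOCUMENT_DETAILS
  (HUB_DOCUMENT_HASHKEY, LOAD_DATE, HASHDIFF, DOCUMENT_TITLE, MIME_TYPE, RECORD_SOURCE)
SELECT S.HUB_DOCUMENT_HASHKEY, SYSTIMESTAMP, S.SAT_DOCUMENT_HASHDIFF,
       S.DOCUMENT_TITLE, S.MIME_TYPE, 'FILENET'
FROM STG_FILENET_DOCS S
LEFT JOIN (
  -- latest satellite row per hub hash key
  SELECT HUB_DOCUMENT_HASHKEY, HASHDIFF,
         ROW_NUMBER() OVER (PARTITION BY HUB_DOCUMENT_HASHKEY
                            ORDER BY LOAD_DATE DESC) AS RN
  FROM SAT_DOCUMENT_DETAILS
) CUR
  ON CUR.HUB_DOCUMENT_HASHKEY = S.HUB_DOCUMENT_HASHKEY
 AND CUR.RN = 1
WHERE CUR.HUB_DOCUMENT_HASHKEY IS NULL          -- no satellite row yet
   OR CUR.HASHDIFF <> S.SAT_DOCUMENT_HASHDIFF;  -- descriptive attributes changed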

Suggested ODI KMs

  • LKM: Generic JDBC to Oracle (or Oracle to Oracle)
  • IKM: Oracle Merge for hubs/links; Insert-only IKM for satellites
  • Use array fetch/insert and direct-path options for large loads.

Error handling & Audit

  • STG_FILENET_DOCS_ERRORS table captures bad rows (e.g. missing natural key)
  • ETL_JOB_AUDIT stores job run metadata (rows extracted, rows loaded, duration, errors)
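
A minimal sketch of the audit table; the column names are assumptions based on the bullets above, plus a LAST_EXTRACT_TS watermark used by the incremental strategy below.

CREATE TABLE ETL_JOB_AUDIT (
  JOB_RUN_ID       NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  JOB_NAME         VARCHAR2(100) NOT NULL,
  START_TIME       TIMESTAMP     NOT NULL,
  END_TIME         TIMESTAMP,
  ROWS_EXTRACTED   NUMBER,
  ROWS_LOADED      NUMBER,
  ERROR_COUNT      NUMBER,
  LAST_EXTRACT_TS  TIMESTAMP,       -- high-water mark read by the next incremental run
  STATUS           VARCHAR2(20)     -- e.g. RUNNING / SUCCESS / FAILED
);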

Incremental strategy

  • Prefer CDC (GoldenGate) for near-real-time
  • Otherwise use LAST_MODIFIED timestamp in view and track last run in ETL_JOB_AUDIT
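
A minimal sketch of the timestamp filter. It assumes the view exposes a LAST_MODIFIED column and that ETL_JOB_AUDIT carries the LAST_EXTRACT_TS watermark sketched above; the job name is hypothetical.

SELECT V.*
FROM FILENET_DOC_VW V
WHERE V.LAST_MODIFIED > (
  SELECT NVL(MAX(A.LAST_EXTRACT_TS), TIMESTAMP '1900-01-01 00:00:00')  -- full load on first run
  FROM ETL_JOB_AUDIT A
  WHERE A.JOB_NAME = 'FILENET_TO_DV'  -- hypothetical job name
    AND A.STATUS   = 'SUCCESS'
);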

Example MERGE for HUB_DOCUMENT (pseudo-SQL)

MERGE INTO HUB_DOCUMENT H
USING (SELECT :hub_hashkey AS HUB_DOCUMENT_HASHKEY,
              :doc_id      AS DOCUMENT_NATURAL_ID
       FROM dual) S
ON (H.HUB_DOCUMENT_HASHKEY = S.HUB_DOCUMENT_HASHKEY)
WHEN NOT MATCHED THEN
  INSERT (HUB_DOCUMENT_HASHKEY, DOCUMENT_NATURAL_ID, LOAD_DATE, RECORD_SOURCE)
  VALUES (S.HUB_DOCUMENT_HASHKEY, S.DOCUMENT_NATURAL_ID, SYSTIMESTAMP, 'FILENET');
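
The bind-variable form above runs one row at a time; in an ODI IKM the same merge typically runs set-based from staging. A minimal sketch, assuming the staging columns used earlier:

MERGE INTO HUB_DOCUMENT H
USING (SELECT DISTINCT HUB_DOCUMENT_HASHKEY, DOCUMENT_NATURAL_ID
       FROM STG_FILENET_DOCS) S              -- DISTINCT: one row per business key
ON (H.HUB_DOCUMENT_HASHKEY = S.HUB_DOCUMENT_HASHKEY)
WHEN NOT MATCHED THEN
  INSERT (HUB_DOCUMENT_HASHKEY, DOCUMENT_NATURAL_ID, LOAD_DATE, RECORD_SOURCE)
  VALUES (S.HUB_DOCUMENT_HASHKEY, S.DOCUMENT_NATURAL_ID, SYSTIMESTAMP, 'FILENET');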