ODI Blueprint: FileNet ‐ Data Vault (DV2.0) - HenryN-DEN21DS/CV_STUFF GitHub Wiki
Overview
This blueprint describes an ODI-based ETL pattern to load FileNet metadata (Oracle source) into a Data Vault.
Layers:
- SOURCE: FileNet views (FILENET_DOC_VIEW)
- STAGING: STG_FILENET_DOCS (raw)
- HUBS/LINKS/SATELLITES: Data Vault schema (HUB_, LINK_, SAT_*)
Source View (example)
CREATE OR REPLACE VIEW FILENET_DOC_VW AS SELECT DOC.DOCUMENT_ID AS DOCUMENT_NATURAL_ID, DOC.TITLE AS DOCUMENT_TITLE, DOC.MIME_TYPE, DOC.VERSION_NUMBER, META.ZAAKNUMMER, META.AFDELING, DOC.CREATED_BY, DOC.DATE_CREATED FROM DOCVERSION DOC JOIN CUSTOM_METADATA META ON DOC.DOCUMENT_ID = META.DOCUMENT_ID WHERE DOC.ISCURRENTVERSION = 1;
ODI Mappings (high-level)
-
Extract to STG (STG_FILENET_DOCS)
- Reverse-engineer FILENET_DOC_VW as a datastore.
- Mapping: FILENET_DOC_VW -> STG_FILENET_DOCS
- Incremental: use TIMESTAMP WHERE clause or Oracle CDC/GoldenGate.
- Compute hub/hashcols in staging:
- HUB_DOCUMENT_HASHKEY = STANDARD_HASH(DOCUMENT_NATURAL_ID,'SHA1')
- SAT_DOCUMENT_HASHDIFF = STANDARD_HASH(DOCUMENT_TITLE||'|'||NVL(MIME_TYPE,''),'SHA1')
-
Load HUB_DOCUMENT
- Lookup by HUB_DOCUMENT_HASHKEY; if not exists INSERT.
- Use MERGE or IKM Oracle Incremental Update.
-
Load HUB_ZAAK (and other hubs)
- Similar pattern: compute HUB_ZAAK_HASHKEY from ZAAK_NUMMER.
-
Load LINK_DOCUMENT_ZAAK
- Compute link hash: STANDARD_HASH(HUB_DOCUMENT_HASHKEY||HUB_ZAAK_HASHKEY,'SHA1')
- Insert if not exists.
-
Load SATELLITES (e.g., SAT_DOCUMENT_DETAILS)
- For each staging row, compare SAT_DOCUMENT_HASHDIFF with latest satellite row's HASHDIFF.
- If changed or no existing row, insert new satellite row with LOAD_DATE = SYSTIMESTAMP.
ODI KMs suggested
- LKM: Generic JDBC to Oracle (or Oracle to Oracle)
- IKM: Oracle Merge for hubs/links; Insert-only IKM for satellites
- Use array fetch/insert and direct-path options for large loads.
Error handling & Audit
- STG_FILENET_DOCS_ERRORS table captures bad rows (e.g. missing natural key)
- ETL_JOB_AUDIT stores job run metadata (rows extracted, rows loaded, duration, errors)
Incremental strategy
- Prefer CDC (GoldenGate) for near-real-time
- Otherwise use LAST_MODIFIED timestamp in view and track last run in ETL_JOB_AUDIT
Example MERGE for HUB_DOCUMENT (pseudo-SQL)
MERGE INTO HUB_DOCUMENT H USING (SELECT :hub_hashkey AS HUB_DOCUMENT_HASHKEY, :doc_id AS DOCUMENT_NATURAL_ID FROM dual) S ON (H.HUB_DOCUMENT_HASHKEY = S.HUB_DOCUMENT_HASHKEY) WHEN NOT MATCHED THEN INSERT (HUB_DOCUMENT_HASHKEY, DOCUMENT_NATURAL_ID, LOAD_DATE, RECORD_SOURCE) VALUES (S.HUB_DOCUMENT_HASHKEY, S.DOCUMENT_NATURAL_ID, SYSTIMESTAMP, 'FILENET');