Practical plan for exposing FileNet tables - HenryN-DEN21DS/CV_STUFF GitHub Wiki

Here is a practical, actionable plan:

  1. a Data Vault (DV2.0) model sketch for FileNet metadata (hubs / links / satellites), and
  2. a concrete ODI mapping / ETL pattern to load that metadata from FileNet (Oracle DB) into a Data Vault in your DWH.

I keep it directly usable: table names, columns, hashkey examples (Oracle), and a step-by-step ODI mapping pattern, including which LKM/IKM/options you would typically use.

  1. Data Vault model for FileNet metadata (DV2.0, high level)

Principles applied

Hubs: contain the natural key(s) and a hash_key (business key hash).

Links: model relationships between hubs (n-n or 1-n) and have their own hash_key plus references to the hub hash_keys.

Satellites: time-variant attributes/metadata of hubs or links; they contain history, LOAD_DATE, RECORD_SOURCE, and a hash_diff for change detection.

Hubs (for FileNet metadata)

HUB_DOCUMENT

Natural key: DOCUMENT_NATURAL_ID (e.g. DocumentId / GUID from FileNet)

Columns:

HUB_DOCUMENT_HASHKEY (e.g. SHA1 of DOCUMENT_NATURAL_ID)

DOCUMENT_NATURAL_ID

LOAD_DATE

RECORD_SOURCE

HUB_ZAAK (case / dossier)

HUB_ZAAK_HASHKEY (hash of the case number)

ZAAK_NUMMER

LOAD_DATE, RECORD_SOURCE

HUB_USER

HUB_USER_HASHKEY (hash of username / userId)

USER_ID

LOAD_DATE, RECORD_SOURCE

HUB_FOLDER

HUB_FOLDER_HASHKEY, FOLDER_ID, LOAD_DATE, RECORD_SOURCE

HUB_DOCUMENTTYPE

HUB_DOCUMENTTYPE_HASHKEY, DOCUMENTTYPE_CODE, LOAD_DATE, RECORD_SOURCE

(Additional hubs possible: ORGANISATION_UNIT, CASETYPE, WORKFLOW_DEFINITION)

Links (relationships)

LINK_DOCUMENT_ZAAK

LINK_DOC_ZAAK_HASHKEY (hash of doc_hash + zaak_hash + relation_type)

HUB_DOCUMENT_HASHKEY, HUB_ZAAK_HASHKEY

LOAD_DATE, RECORD_SOURCE

LINK_DOCUMENT_FOLDER

links documents to folders.

LINK_DOCUMENT_USER

links a document to its creator / last editor.

LINK_DOCUMENT_WORKFLOW

links a document to workflow instances.

Satellites (attributes & history)

SAT_DOCUMENT_DETAILS (satellite on HUB_DOCUMENT)

HUB_DOCUMENT_HASHKEY

DOCUMENT_TITLE, MIME_TYPE, VERSION, IS_CURRENT

LOAD_DATE, RECORD_SOURCE

SAT_DOCUMENT_HASHDIFF (SHA1 over the relevant attributes)

SAT_DOCUMENT_METADATA (custom props)

HUB_DOCUMENT_HASHKEY

ZAAK_NUMMER, AFDELING_CODE, PUBLICATIEDATUM, etc.

LOAD_DATE, RECORD_SOURCE, HASHDIFF

SAT_WORKFLOW_HISTORY (satellite on a link or hub)

LINK_DOCUMENT_WORKFLOW_HASHKEY or HUB_DOCUMENT_HASHKEY

WORKFLOW_STATUS, TASK_OWNER, START_DATE, END_DATE

LOAD_DATE, RECORD_SOURCE, HASHDIFF

SAT_ACL (access rights)

HUB_DOCUMENT_HASHKEY

ACL_JSON or decomposed fields; LOAD_DATE, RECORD_SOURCE

Example physical DDL (Oracle style, simplified):

CREATE TABLE HUB_DOCUMENT (
  HUB_DOCUMENT_HASHKEY VARCHAR2(64) PRIMARY KEY,
  DOCUMENT_NATURAL_ID  VARCHAR2(100),
  LOAD_DATE            TIMESTAMP,
  RECORD_SOURCE        VARCHAR2(50)
);

CREATE TABLE SAT_DOCUMENT_DETAILS (
  HUB_DOCUMENT_HASHKEY  VARCHAR2(64),
  LOAD_DATE             TIMESTAMP,
  RECORD_SOURCE         VARCHAR2(50),
  DOCUMENT_TITLE        VARCHAR2(4000),
  MIME_TYPE             VARCHAR2(100),
  VERSION_NUMBER        NUMBER,
  SAT_DOCUMENT_HASHDIFF VARCHAR2(64),
  CONSTRAINT fk_sat_doc FOREIGN KEY (HUB_DOCUMENT_HASHKEY)
    REFERENCES HUB_DOCUMENT (HUB_DOCUMENT_HASHKEY)
);

Hash function in Oracle (example):

-- compute hashkey and hashdiff
SELECT
  STANDARD_HASH(DOCUMENT_NATURAL_ID, 'SHA1') AS HUB_DOCUMENT_HASHKEY,
  STANDARD_HASH(NVL(DOCUMENT_TITLE, '') || '|' || NVL(MIME_TYPE, '') || '|' || TO_CHAR(VERSION_NUMBER), 'SHA1') AS SAT_DOCUMENT_HASHDIFF
FROM FILENET_DOC_VIEW;
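
For quick verification outside the database, the same digests can be reproduced in Python. This is a sketch, assuming `STANDARD_HASH(..., 'SHA1')` is rendered as uppercase hex of the SHA-1 digest; the `dv_hash` helper and its NULL-to-empty-string convention are illustrative, not part of the design:

```python
import hashlib

def dv_hash(*parts, sep="|"):
    """SHA-1 over pipe-joined parts, uppercase hex; None (SQL NULL) becomes ''.
    Mirrors STANDARD_HASH(col1 || '|' || col2, 'SHA1') on the Oracle side."""
    joined = sep.join("" if p is None else str(p) for p in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest().upper()

hub_key  = dv_hash("DOCID-1234")                       # HUB_DOCUMENT_HASHKEY
hashdiff = dv_hash("My title", "application/pdf", 3)   # SAT_DOCUMENT_HASHDIFF
```

Keeping the separator and NULL handling identical on both sides is essential, otherwise the hashdiff comparison will flag false changes.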

  2. ODI mapping / ETL pattern to the Data Vault

I describe the pattern in concrete steps (applicable to ODI 12c / ODI 11g concepts). Use ODI mappings (or the older Interfaces) with LKM/IKM where needed.

Pipeline architecture (high level)

Source: FileNet DB (Oracle/DB2); metadata views created for ETL.

Staging: STG_FILENET_DOCS (RAW); one-to-one extract of the required fields.

Hash calculations: in staging (ODI expressions) you compute the HUB hashkeys and SAT hashdiffs.

Load Hubs: insert new hub records (only when the hashkey does not exist yet).

Load Links: insert links when new combinations appear (only when the link hashkey does not exist yet).

Load Satellites: insert a new satellite row when the hashdiff has changed (compare against the latest satellite hashdiff).

CDC / Incremental: use ODI Journalizing or GoldenGate; process only new/changed rows.

Step-by-step ODI mapping (concrete)

A) Preparation in the source DB

Create materialized views or plain views that:

deliver DOCUMENT_ID, DOCUMENT_TITLE, MIME_TYPE, CREATED_BY, DATE_CREATED, ZAAKNUMMER, AFDELING, etc.

Keep data types and LOB fields accurate (CLOB → possibly truncate or store separately).

B) STG layer (ODI)

Create STG_FILENET_DOCS in your DWH schema (RAW).

Create an ODI reverse-engineered datastore on the source view.

Mapping: source view → STG_FILENET_DOCS (full load, or incremental using CDC).

Key in STG:

DOCUMENT_NATURAL_ID

HUB_DOCUMENT_HASHKEY = STANDARD_HASH(DOCUMENT_NATURAL_ID,'SHA1')

SAT_DOCUMENT_HASHDIFF = STANDARD_HASH(concat of relevant props,'SHA1')

C) Hub loading mapping (ODI)

Mapping: STG → HUB_DOCUMENT

Logic:

Lookup HUB_DOCUMENT by HUB_DOCUMENT_HASHKEY

If it does not exist → insert (populate LOAD_DATE = SYSDATE, RECORD_SOURCE = 'FILENET')

ODI components:

LKM: LKM FileNet (JDBC) to Oracle (or LKM Generic / Oracle)

IKM: IKM Oracle Incremental Update or IKM Oracle Merge (performant upsert)
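
The hub-load rule above can be sketched in memory (plain Python, not ODI; `load_hub` and the row dicts are illustrative stand-ins for the MERGE that the IKM generates):

```python
from datetime import datetime, timezone

def load_hub(hub_rows, staging_rows, record_source="FILENET"):
    """Append a hub row only when its hashkey is not present yet;
    DV hubs are insert-only, so reruns are idempotent."""
    existing = {r["HUB_DOCUMENT_HASHKEY"] for r in hub_rows}
    for row in staging_rows:
        hk = row["HUB_DOCUMENT_HASHKEY"]
        if hk not in existing:
            hub_rows.append({
                "HUB_DOCUMENT_HASHKEY": hk,
                "DOCUMENT_NATURAL_ID":  row["DOCUMENT_NATURAL_ID"],
                "LOAD_DATE":            datetime.now(timezone.utc),
                "RECORD_SOURCE":        record_source,
            })
            existing.add(hk)
    return hub_rows

hub = []
stg = [{"HUB_DOCUMENT_HASHKEY": "A1", "DOCUMENT_NATURAL_ID": "DOC-1"},
       {"HUB_DOCUMENT_HASHKEY": "A1", "DOCUMENT_NATURAL_ID": "DOC-1"}]
load_hub(hub, stg)  # the duplicate staging row is ignored
```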

D) Link loading mapping

Compute link hashkey: HASH(HUB_DOC_HASHKEY || HUB_ZAAK_HASHKEY || 'LINKTYPE')

Insert if not exists.
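
A minimal sketch of the same link pattern (illustrative Python; `link_hashkey` and `load_links` are hypothetical helpers, not ODI artifacts):

```python
import hashlib

def link_hashkey(doc_hashkey, zaak_hashkey, link_type="DOC_ZAAK"):
    """LINK_DOC_ZAAK_HASHKEY: SHA-1 over the concatenated hub hashkeys plus
    the link type, as in HASH(HUB_DOC_HASHKEY || HUB_ZAAK_HASHKEY || 'LINKTYPE')."""
    raw = (doc_hashkey + zaak_hashkey + link_type).encode("utf-8")
    return hashlib.sha1(raw).hexdigest().upper()

def load_links(link_rows, pairs):
    """Insert a link row only when its link hashkey is new (insert-only)."""
    existing = {r["LINK_DOC_ZAAK_HASHKEY"] for r in link_rows}
    for doc_hk, zaak_hk in pairs:
        lk = link_hashkey(doc_hk, zaak_hk)
        if lk not in existing:
            link_rows.append({"LINK_DOC_ZAAK_HASHKEY": lk,
                              "HUB_DOCUMENT_HASHKEY": doc_hk,
                              "HUB_ZAAK_HASHKEY": zaak_hk})
            existing.add(lk)
    return link_rows
```

Including the link type in the hash keeps two different relationship types between the same document and case distinct.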

E) Satellite loading mapping

For each sat (e.g., SAT_DOCUMENT_DETAILS):

Lookup latest sat row for HUB_DOCUMENT_HASHKEY ordered by LOAD_DATE (or use staging table with last known hashdiff).

Compare SAT_DOCUMENT_HASHDIFF (staging) vs latest.SAT_DOCUMENT_HASHDIFF.

If different β†’ Insert new satellite row with new LOAD_DATE, RECORD_SOURCE.

In ODI: use Lookup + Filter + IKM Oracle Insert pattern (or use MERGE with conditional insert when HASHDIFF != existing).
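
The hashdiff comparison can be sketched as follows (in-memory Python; `sat_rows_to_insert` is an illustrative stand-in for the Lookup + Filter + Insert pattern the mapping generates):

```python
def sat_rows_to_insert(staging_rows, sat_rows):
    """Return staging rows whose hashdiff differs from the most recent
    satellite row for the same hub hashkey, or that have no satellite row yet."""
    latest = {}
    # walk satellite rows in LOAD_DATE order so the last write wins per hub key
    for row in sorted(sat_rows, key=lambda r: r["LOAD_DATE"]):
        latest[row["HUB_DOCUMENT_HASHKEY"]] = row["SAT_DOCUMENT_HASHDIFF"]
    return [s for s in staging_rows
            if latest.get(s["HUB_DOCUMENT_HASHKEY"]) != s["SAT_DOCUMENT_HASHDIFF"]]

sat = [{"HUB_DOCUMENT_HASHKEY": "A", "LOAD_DATE": 1, "SAT_DOCUMENT_HASHDIFF": "h1"},
       {"HUB_DOCUMENT_HASHKEY": "A", "LOAD_DATE": 2, "SAT_DOCUMENT_HASHDIFF": "h2"}]
stg = [{"HUB_DOCUMENT_HASHKEY": "A", "SAT_DOCUMENT_HASHDIFF": "h2"},   # unchanged
       {"HUB_DOCUMENT_HASHKEY": "B", "SAT_DOCUMENT_HASHDIFF": "h9"}]   # new hub key
to_load = sat_rows_to_insert(stg, sat)
```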

F) CDC / Incremental pattern

Option 1: ODI Journalizing; configure journalizing on the source views/tables, and ODI captures the changed rows.

Option 2: Oracle GoldenGate; stream into a staging schema, then ODI picks it up.

Option 3: Timestamp-based incremental: WHERE LAST_UPD > :LAST_RUN_DATE
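
Option 3 can be sketched as a simple watermark filter (illustrative Python; in practice the watermark would be read from and persisted to a run-control table such as ETL_JOB_AUDIT):

```python
from datetime import datetime

def incremental_filter(rows, last_run_date):
    """Timestamp-based incremental: keep only rows modified after the last
    run, and return the new watermark to persist for the next run."""
    changed = [r for r in rows if r["LAST_UPD"] > last_run_date]
    new_watermark = max((r["LAST_UPD"] for r in changed), default=last_run_date)
    return changed, new_watermark

rows = [{"LAST_UPD": datetime(2024, 1, 1)},
        {"LAST_UPD": datetime(2024, 2, 1)}]
changed, wm = incremental_filter(rows, datetime(2024, 1, 15))
```

Note that a timestamp watermark misses physical deletes and rows updated without touching LAST_UPD, which is why CDC (Options 1 and 2) is preferred when available.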

G) Exception handling / auditing

Store rejected rows in STG_FILENET_DOCS_ERRORS.

Create an audit log table ETL_JOB_AUDIT (rows_in, rows_loaded, errors, duration).

Use REJECT handling in the IKM.

ODI technical settings (recommended)

LKM: JDBC Generic to Oracle (or LKM Oracle to Oracle)

IKM: IKM Oracle Merge / IKM Oracle Slowly Changing Dimension (but DV requires inserts for history; use INSERT-only semantics for satellites)

KM options:

Batch size: 1000–10000

Use Array Fetch / Array Insert

Use direct-path insert for large loads (Oracle Direct Load)

Example pseudo ODI SQL for the hub insert (MERGE):

MERGE INTO HUB_DOCUMENT H
USING (SELECT :hub_hashkey AS HUB_DOCUMENT_HASHKEY,
              :doc_id      AS DOCUMENT_NATURAL_ID
         FROM dual) S
ON (H.HUB_DOCUMENT_HASHKEY = S.HUB_DOCUMENT_HASHKEY)
WHEN NOT MATCHED THEN
  INSERT (HUB_DOCUMENT_HASHKEY, DOCUMENT_NATURAL_ID, LOAD_DATE, RECORD_SOURCE)
  VALUES (S.HUB_DOCUMENT_HASHKEY, S.DOCUMENT_NATURAL_ID, SYSTIMESTAMP, 'FILENET');

For satellites:

-- insert only when the hashdiff differs
INSERT INTO SAT_DOCUMENT_DETAILS
  (HUB_DOCUMENT_HASHKEY, LOAD_DATE, RECORD_SOURCE, DOCUMENT_TITLE, SAT_DOCUMENT_HASHDIFF)
SELECT s.HUB_DOCUMENT_HASHKEY, SYSTIMESTAMP, 'FILENET', s.DOCUMENT_TITLE, s.SAT_DOCUMENT_HASHDIFF
FROM STG_FILENET_DOCS s
LEFT JOIN (
  SELECT HUB_DOCUMENT_HASHKEY, MAX(LOAD_DATE) AS LD
  FROM SAT_DOCUMENT_DETAILS
  GROUP BY HUB_DOCUMENT_HASHKEY
) latest
  ON s.HUB_DOCUMENT_HASHKEY = latest.HUB_DOCUMENT_HASHKEY
LEFT JOIN SAT_DOCUMENT_DETAILS sd
  ON sd.HUB_DOCUMENT_HASHKEY = s.HUB_DOCUMENT_HASHKEY
 AND sd.LOAD_DATE = latest.LD
WHERE sd.SAT_DOCUMENT_HASHDIFF IS NULL
   OR sd.SAT_DOCUMENT_HASHDIFF <> s.SAT_DOCUMENT_HASHDIFF;

  3. Operational & design considerations

Hash collisions: extremely unlikely with SHA1, but log any collisions that occur. Consider SHA256 if desired.

Partitioning: partition satellites on LOAD_DATE and hubs on a hash prefix for performance.

LOBs: do not store large content fields in the DWH; store only a reference (content URL, content_store_id). If you want to analyse content metadata (OCR text), put that in a separate ELT process (index it in a search engine or text-analytics store).

Security: respect ACLs; filter at the source or make sure BI only sees permitted metadata; implement row-level security in the DWH if needed.

Lineage: track RECORD_SOURCE and SOURCE_LOAD_BATCH_ID and preserve the origin; use Purview/Collibra for data governance.

Latency: determine the SLA (nightly batch vs near-real-time). For near-real-time, use GoldenGate/Gardant + streaming.

  4. What I can deliver concretely now (and the next step)

I can deliver the following for you right away (no waiting time):

A. SQL DDL/DML scripts for the proposed hubs/links/satellites (Oracle dialect).

B. ODI mapping blueprint (steps and pseudo-scripts as above), ready to implement in ODI, including example IKM/LKM options.

C. A visual Data Vault diagram (PNG/PDF) showing the hubs/links/sats with example columns.

D. An example ODI package / mapping export (draft XML); if you want me to build it, I can generate it now.

Which of A–D would you like me to generate now? (I will create it right away and provide a download link.)

You said: yes please, as an example

ChatGPT said:

Creating example deliverables: Oracle DDL for Data Vault, ODI mapping blueprint (markdown), and a Data Vault diagram PNG.

from graphviz import Digraph
import textwrap, os, json

# Paths

sql_path = '/mnt/data/filenet_datavault_ddl.sql'
odi_path = '/mnt/data/filenet_odi_blueprint.md'
diagram_path = '/mnt/data/filenet_datavault_diagram'

# 1) Create Oracle DDL (simplified/example)

sql_content = textwrap.dedent("""
-- FileNet Data Vault (DV2.0) Example DDL (Oracle dialect)

-- HUBS
CREATE TABLE HUB_DOCUMENT (
  HUB_DOCUMENT_HASHKEY VARCHAR2(64) PRIMARY KEY,
  DOCUMENT_NATURAL_ID  VARCHAR2(100) NOT NULL,
  LOAD_DATE            TIMESTAMP DEFAULT SYSTIMESTAMP,
  RECORD_SOURCE        VARCHAR2(50)
);

CREATE TABLE HUB_ZAAK (
  HUB_ZAAK_HASHKEY VARCHAR2(64) PRIMARY KEY,
  ZAAK_NUMMER      VARCHAR2(100) NOT NULL,
  LOAD_DATE        TIMESTAMP DEFAULT SYSTIMESTAMP,
  RECORD_SOURCE    VARCHAR2(50)
);

CREATE TABLE HUB_USER (
  HUB_USER_HASHKEY VARCHAR2(64) PRIMARY KEY,
  USER_ID          VARCHAR2(200) NOT NULL,
  LOAD_DATE        TIMESTAMP DEFAULT SYSTIMESTAMP,
  RECORD_SOURCE    VARCHAR2(50)
);

-- LINKS
CREATE TABLE LINK_DOCUMENT_ZAAK (
  LINK_DOC_ZAAK_HASHKEY VARCHAR2(64) PRIMARY KEY,
  HUB_DOCUMENT_HASHKEY  VARCHAR2(64) NOT NULL,
  HUB_ZAAK_HASHKEY      VARCHAR2(64) NOT NULL,
  LINK_TYPE             VARCHAR2(50),
  LOAD_DATE             TIMESTAMP DEFAULT SYSTIMESTAMP,
  RECORD_SOURCE         VARCHAR2(50)
);

-- SATELLITES
CREATE TABLE SAT_DOCUMENT_DETAILS (
  HUB_DOCUMENT_HASHKEY VARCHAR2(64) NOT NULL,
  LOAD_DATE TIMESTAMP DEFAULT SYSTIMESTAMP,
  RECORD_SOURCE VARCHAR2(50),
  DOCUMENT_TITLE VARCHAR2(4000),
  MIME_TYPE VARCHAR2(100),
  VERSION_NUMBER NUMBER,
  IS_CURRENT NUMBER(1),
  SAT_DOCUMENT_HASHDIFF VARCHAR2(64)
);

CREATE TABLE SAT_DOCUMENT_METADATA (
  HUB_DOCUMENT_HASHKEY VARCHAR2(64) NOT NULL,
  LOAD_DATE TIMESTAMP DEFAULT SYSTIMESTAMP,
  RECORD_SOURCE VARCHAR2(50),
  ZAAK_NUMMER VARCHAR2(100),
  AFDELING_CODE VARCHAR2(100),
  PUBLICATIEDATUM TIMESTAMP,
  EXTRA_PROPERTIES CLOB,
  SAT_METADATA_HASHDIFF VARCHAR2(64)
);

-- Simple indexes and partitions recommended in production (not shown).
-- Example hash calculation (Oracle)
-- SELECT STANDARD_HASH('DOCID-1234', 'SHA1') FROM DUAL;
-- SELECT STANDARD_HASH(DOCUMENT_TITLE || '|' || NVL(MIME_TYPE,''), 'SHA1') FROM DUAL;
""")

with open(sql_path, 'w') as f: f.write(sql_content)

# 2) Create ODI blueprint (markdown)

odi_content = textwrap.dedent("""
# ODI Blueprint: FileNet -> Data Vault (DV2.0)

## Overview
This blueprint describes an ODI-based ETL pattern to load FileNet metadata (Oracle source) into a Data Vault.

**Layers**:
- SOURCE: FileNet views (FILENET_DOC_VIEW)
- STAGING: STG_FILENET_DOCS (raw)
- HUBS/LINKS/SATELLITES: Data Vault schema (HUB_*, LINK_*, SAT_*)

## Source View (example)
CREATE OR REPLACE VIEW FILENET_DOC_VW AS
SELECT
  DOC.DOCUMENT_ID AS DOCUMENT_NATURAL_ID,
  DOC.TITLE AS DOCUMENT_TITLE,
  DOC.MIME_TYPE,
  DOC.VERSION_NUMBER,
  META.ZAAKNUMMER,
  META.AFDELING,
  DOC.CREATED_BY,
  DOC.DATE_CREATED
FROM DOCVERSION DOC
JOIN CUSTOM_METADATA META ON DOC.DOCUMENT_ID = META.DOCUMENT_ID
WHERE DOC.ISCURRENTVERSION = 1;

## ODI Mappings (high-level)
1. **Extract to STG (STG_FILENET_DOCS)**
   - Reverse-engineer FILENET_DOC_VW as a datastore.
   - Mapping: FILENET_DOC_VW -> STG_FILENET_DOCS
   - Incremental: use TIMESTAMP WHERE clause or Oracle CDC/GoldenGate.
   - Compute hub/hashcols in staging:
     - HUB_DOCUMENT_HASHKEY = STANDARD_HASH(DOCUMENT_NATURAL_ID,'SHA1')
     - SAT_DOCUMENT_HASHDIFF = STANDARD_HASH(DOCUMENT_TITLE||'|'||NVL(MIME_TYPE,''),'SHA1')

2. **Load HUB_DOCUMENT**
   - Lookup by HUB_DOCUMENT_HASHKEY; if not exists INSERT.
   - Use MERGE or IKM Oracle Incremental Update.

3. **Load HUB_ZAAK (and other hubs)**
   - Similar pattern: compute HUB_ZAAK_HASHKEY from ZAAK_NUMMER.

4. **Load LINK_DOCUMENT_ZAAK**
   - Compute link hash: STANDARD_HASH(HUB_DOCUMENT_HASHKEY||HUB_ZAAK_HASHKEY,'SHA1')
   - Insert if not exists.

5. **Load SATELLITES (e.g., SAT_DOCUMENT_DETAILS)**
   - For each staging row, compare SAT_DOCUMENT_HASHDIFF with latest satellite row's HASHDIFF.
   - If changed or no existing row, insert new satellite row with LOAD_DATE = SYSTIMESTAMP.

## ODI KMs suggested
- LKM: Generic JDBC to Oracle (or Oracle to Oracle)
- IKM: Oracle Merge for hubs/links; Insert-only IKM for satellites
- Use array fetch/insert and direct-path options for large loads.

## Error handling & Audit
- STG_FILENET_DOCS_ERRORS table captures bad rows (e.g. missing natural key)
- ETL_JOB_AUDIT stores job run metadata (rows extracted, rows loaded, duration, errors)

## Incremental strategy
- Prefer CDC (GoldenGate) for near-real-time
- Otherwise use LAST_MODIFIED timestamp in view and track last run in ETL_JOB_AUDIT

## Example MERGE for HUB_DOCUMENT (pseudo-SQL)
MERGE INTO HUB_DOCUMENT H
USING (SELECT :hub_hashkey AS HUB_DOCUMENT_HASHKEY, :doc_id AS DOCUMENT_NATURAL_ID FROM dual) S
ON (H.HUB_DOCUMENT_HASHKEY = S.HUB_DOCUMENT_HASHKEY)
WHEN NOT MATCHED THEN
INSERT (HUB_DOCUMENT_HASHKEY, DOCUMENT_NATURAL_ID, LOAD_DATE, RECORD_SOURCE)
VALUES (S.HUB_DOCUMENT_HASHKEY, S.DOCUMENT_NATURAL_ID, SYSTIMESTAMP, 'FILENET');

""")

with open(odi_path, 'w') as f: f.write(odi_content)

# 3) Create Data Vault diagram (Graphviz)

dot = Digraph(comment='FileNet Data Vault Model', format='png')
dot.attr('node', shape='record', fontsize='10')

# Hubs

dot.node('HUB_DOCUMENT', '{HUB_DOCUMENT|+ HUB_DOCUMENT_HASHKEY\l+ DOCUMENT_NATURAL_ID\l+ LOAD_DATE\l+ RECORD_SOURCE\l}')
dot.node('HUB_ZAAK', '{HUB_ZAAK|+ HUB_ZAAK_HASHKEY\l+ ZAAK_NUMMER\l+ LOAD_DATE\l+ RECORD_SOURCE\l}')
dot.node('HUB_USER', '{HUB_USER|+ HUB_USER_HASHKEY\l+ USER_ID\l+ LOAD_DATE\l+ RECORD_SOURCE\l}')

# Links

dot.node('LINK_DOC_ZAAK', '{LINK_DOCUMENT_ZAAK|+ LINK_DOC_ZAAK_HASHKEY\l+ HUB_DOCUMENT_HASHKEY\l+ HUB_ZAAK_HASHKEY\l+ LOAD_DATE\l+ RECORD_SOURCE\l}')

# Satellites

dot.node('SAT_DOC_DETAILS', '{SAT_DOCUMENT_DETAILS|+ HUB_DOCUMENT_HASHKEY\l+ LOAD_DATE\l+ DOCUMENT_TITLE\l+ MIME_TYPE\l+ VERSION_NUMBER\l+ IS_CURRENT\l+ SAT_DOCUMENT_HASHDIFF\l}')
dot.node('SAT_DOC_META', '{SAT_DOCUMENT_METADATA|+ HUB_DOCUMENT_HASHKEY\l+ LOAD_DATE\l+ ZAAK_NUMMER\l+ AFDELING_CODE\l+ PUBLICATIEDATUM\l+ EXTRA_PROPERTIES\l+ SAT_METADATA_HASHDIFF\l}')

# Layout edges

dot.edge('HUB_DOCUMENT', 'LINK_DOC_ZAAK', arrowhead='none')
dot.edge('HUB_ZAAK', 'LINK_DOC_ZAAK', arrowhead='none')
dot.edge('HUB_DOCUMENT', 'SAT_DOC_DETAILS', arrowhead='none')
dot.edge('HUB_DOCUMENT', 'SAT_DOC_META', arrowhead='none')

# Render diagram

diagram_png_path = dot.render(diagram_path, format='png')

diagram_png_path, sql_path, odi_path

Result:
('/mnt/data/filenet_datavault_diagram.png', '/mnt/data/filenet_datavault_ddl.sql', '/mnt/data/filenet_odi_blueprint.md')

Done. I have prepared three example files for you:

SQL DDL for a Data Vault (Oracle dialect): 📄 Download filenet_datavault_ddl.sql

ODI blueprint (steps, examples, MERGE queries): 📄 Download filenet_odi_blueprint.md

Visual Data Vault diagram (PNG): 🖼️ View/download filenet_datavault_diagram.png

Would you like me to expand one of these (e.g. the full DDL with indexes/partitions, a concrete ODI mapping export, or a Visio version of the diagram)? Which next step should I carry out now?