Data Vault - OptimalBI/optimal-data-engine-mssql GitHub Wiki

Dan Linstedt, its creator, describes a resulting Data Vault database as:

“A detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3NF and Star Schemas. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.” (http://en.wikipedia.org/wiki/Data_Vault_Modeling)

Focused on the business process, the Data Vault as a data integration architecture, has robust standards and definitional methods which unite information in order to make sense if it. The Data Vault model is comprised of three basic table types:

HUB (blue): containing a list of unique business keys having its own surrogate key. Metadata describing the origin of the business key, or record ‘source’ is also stored to track where and when the data originated.

LNK (red): establishing relationships between business keys (typically hubs, but links can link to other links); essentially describing a many-to-many relationship. Links are often used to deal with changes in data granularity reducing the impact of adding a new business key to a linked Hub.

SAT (yellow): holding descriptive attributes that can change over time (similar to a Kimball Type II slowly changing dimension). Where Hubs and Links form the structure of the data model, Satellites contain temporal and descriptive attributes including metadata linking them to their parent Hub or Link tables. Metadata attributes within a Satellite table containing a date the record became valid and a date it expired provide powerful historical capabilities enabling queries that can go ‘back-in-time’.

There are several key advantages to the Data Vault approach:

  • Simplifies the data ingestion process
  • Removes the cleansing requirement of a Star Schema
  • Instantly provides auditability for HIPPA and other regulations
  • Puts the focus on the real problem instead of programming around it
  • Easily allows for the addition of new data sources without disruption to existing schema

Simply put, the Data Vault is both a data modeling technique and methodology which accommodates historical data, auditing, and tracking of data.