Data Modeling for Business Users - jmadison222/knowledge GitHub Wiki
| Home |
Data modeling data is a technical domain, but business users can get better data from the technical staff if the business users know just enough data modeling. This topic provides that just-enough list with a level of accuracy that is sufficient for working discussions.
Table - Any set of rows and columns that helps a user gain insight and make decisions. The form doesn’t matter. It can be in a database, a spreadsheet, a CSV file, or similar technical format.
Grain - The level of detail or aggregation in a table. For example, geography. In geography, state is very aggregated. County is more detailed. Town even more. Street even more. Street address even more. Perhaps even lat/long which is different on every corner of a building. Another example, vehicle. VIN identifies exactly one vehicle—it is the most detailed grain. Year/Make/Model is a set of many vehicles—it is more aggregated.
Primary Key - The columns or set of columns in a table that identify exactly one row. In many simple cases, grain and business key are the same, such as VIN for vehicle. For geography, state/county is the key for the county grain. You need both since a county name like "Madison" occurs in many states.
One-to-One - When you have two tables, and a row in one table is related to exactly one row in another table. this is not very common for tables as it would typically be flattened into a single table. This can also apply to elements, such as code/description pairs.
One-to-Many - When you have two tables, and a row in one table is related to many rows in another table. This can also apply to elements, and when it does, you’re probably stepping into the notion of a hierarchy as discussed below.
Many-to-Many - When you have two tables, and many rows in one table can relate to many rows in another table. Under the covers, it’s actually two one-to-many relationships and a third table has to be put in the middle. But in design discussions, we may hide the third table.
Parent/Child - When you have a one-to-many relationship, the table on the "one" side is the parent, and the table on the "many" side is the child. You have to put the key of the parent in the row of the child.
Foreign Key - When you put the parent key in the child table, it is called a foreign key in the child. It is still the primary key of the parent.
Flattening - Computers like when data is in lots of little pieces. This is for processing efficiency. Humans like when data is all on one row. This is for ease of understanding. This tension never goes away. When you see data in lots of pieces and it gets confusing, ask to have a flattened table. Flattening is the process of taking the "to many" relationships and squishing them into one table. This comes with compromises, but the compromises are often worth the easier understanding.
Metrics - Any numerical values that can be used in a math operations.
Hierarchy - When the relationship of two or more columns is such that a given value in one column always has exactly one value in another column. State/County/Town is a hierarchy. Note that primary keys with multiple columns can have a hierarchy in them. But many hierarchies are not primary keys.
Roll Up - Aggregating metrics up a hierarchy. This is the easy case. Business intelligence tools do this automatically.
Allocation Rules - Dividing metrics down a hierarchy. This is the VERY hard case. Business rules must be used and often involve logic using a number of non-key columns.
Additive - When you can sum a metric across all levels of a hierarchy, it’s additive. If it can be summed across only some levels, it’s semi-additive. If it’s numeric but you can’t sum it, it’s non-numeric even though it looks like a number.
Attribution - Having lots of other fields not otherwise classified as something more specific like keys and metrics. Most of the time you want lots of attribution. An attribute is often a column in a table, if that helps you think more concretely. But don’t use the term "grain" to mean "attribution", which is sometimes done.
Source/Derived - Attributes generally come from some source or are derived with some algorithm. If it comes from somewhere, call it a source attribute. If you use some number of source attributes to then derive more attributes, those are derived attributes. You can use derived attributes to derive more attributes.
Business Intelligence - If you keep doing enough deriving, you have to decide whether that should be in the database or in some tool, typically a business intelligence (BI) tool. Watch for deviations that are best done in a BI tool and avoid putting them in a database unless you really need to. Remember that most modern BI tools can be published to a community. That is often the best way to create and reuse advanced derivations.