Data - barialim/architecture GitHub Wiki

Table of Contents

Data

Data Pipeline

Data pipeline, relational data pipeline

Technologies for driving Data Pipelines

Example Data pipeline solutions

Data Ingestion

Data Ingestion technologies

Data Governance

As the amount of available data has grown exponentially over the last decade, and more regulation has been put in place around data and information management, many organizations have started to think about Data Governance.

Data Governance is the:

  • Rules
  • Processes
  • Accountability

around data.

Goal of Data Governance

  • Routine data use: the organization uses its data as a matter of routine
  • Harmonized data sources: data sources are consistent with one another
  • Access for the right roles: the people who need the data have access to it
  • No extra access: the people who should not have access do not have it
  • Ownership of data: every set of data has a clear owner
  • Data accuracy: someone is responsible for the data being right
  • Regular updates: someone is responsible for the data being managed and updated correctly

Successful data governance considers the who, what, when, where, how and why of the data that is being governed.

While controlling the security of the data and ensuring compliance, among many other things, Data Governance (DG) should also be concerned with how the data can be made useful to the organization. How can we do more than just have a giant storage location for information? You have probably heard of Data Management 🤔

Data Management vs Data Governance

The main difference is that DG outlines the overall structure that exists around data: making sure it has the Rules, the Processes, and the Accountability. It's more about what should happen, and how it should happen.

Data Management is about implementing all of those Rules. It's the hands-on, everyday work that ensures governance has been put in place and is being followed. It's the IT team executing on it: the day-to-day management of information, access requirements, and whatever else people may need when it comes to data.
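One way to picture the split is in code: governance is the declared policy (rules and accountability), and management is the routine that enforces it. The following is only a minimal Python sketch; the dataset names, role names, `DatasetPolicy`, and `can_read` are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass

# Governance: a declarative policy stating who is accountable for a dataset
# and which roles may read it. All names below are hypothetical examples.
@dataclass(frozen=True)
class DatasetPolicy:
    owner: str                # accountable data owner/sponsor
    allowed_roles: frozenset  # roles permitted to read the dataset

POLICIES = {
    "finance_ledger": DatasetPolicy(
        owner="finance_owner",
        allowed_roles=frozenset({"finance_analyst", "auditor"}),
    ),
    "manufacturing_bom": DatasetPolicy(
        owner="manufacturing_owner",
        allowed_roles=frozenset({"plant_engineer"}),
    ),
}

# Management: the day-to-day enforcement of the policy above.
def can_read(role: str, dataset: str) -> bool:
    policy = POLICIES.get(dataset)
    if policy is None:
        return False  # datasets without a policy are denied by default
    return role in policy.allowed_roles
```

Changing who may read a dataset is a governance decision (an edit to `POLICIES`), while every call to `can_read` is management applying that decision.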

Why DG matters

DG is not just about the Rules. It's also about the use of the Data, and making it useful.

A really good DG implementation means that quality data is accessible to the right people, and only the right people, in an efficient way throughout the organization.

It means NOT having multiple databases that hold the same information, or people having access to systems they shouldn't. It's making sure these controls are in place so that there is a consistent understanding of who has access, why they have access, and what they're doing with that information.
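Both checks described above (no extra access, no duplicated data) can be expressed as simple audits. This is a hedged Python sketch under assumed inputs: `grants` as `(user, role, dataset)` tuples, `allowed_roles` as a mapping from dataset to approved roles, and `catalog` as `(database, dataset)` pairs are illustrative shapes, not a real system's schema.

```python
def find_excess_access(grants, allowed_roles):
    """Return (user, dataset) pairs whose role is not approved for that dataset."""
    excess = []
    for user, role, dataset in grants:
        # A dataset with no approved roles on record permits nobody.
        if role not in allowed_roles.get(dataset, set()):
            excess.append((user, dataset))
    return excess


def find_duplicate_datasets(catalog):
    """Return the names of datasets that are stored in more than one database."""
    locations = {}
    for database, dataset in catalog:
        locations.setdefault(dataset, set()).add(database)
    return sorted(name for name, dbs in locations.items() if len(dbs) > 1)


# Hypothetical example data.
allowed_roles = {"customers": {"sales", "support"}}
grants = [
    ("alice", "sales", "customers"),
    ("bob", "intern", "customers"),
]
catalog = [
    ("crm_db", "customers"),
    ("billing_db", "customers"),
    ("crm_db", "leads"),
]

print(find_excess_access(grants, allowed_roles))  # bob's grant is flagged
print(find_duplicate_datasets(catalog))           # "customers" lives in two databases
```

Running audits like these on a schedule is one way to keep the answer to "who has access, and why" consistent over time.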

What DG looks like in practice

Let's talk about getting started with DG. What does that actually look like in practice for an organization that's implementing or focusing on DG?

When it comes to DG, one of the first things to think about is who is involved. Typically, there are multiple roles in a larger, highly structured organization that is implementing DG. Let's understand these ROLES 👫

Roles

  • Data Owner/Sponsor: The first role is Data Owner/Sponsor. These are the people who have ultimate decision-making ability about the data, and ultimate accountability for that data being correct and up to date. They make sure that those working beneath them comply with what is outlined in the roles that are defined as part of DG.

    There are typically many data owners/sponsors when it comes to DG, often each overseeing a specific type of data. So in an organization you might have a manufacturing data owner/sponsor who is responsible for maintaining everything related to a product that's being manufactured. You may also have a financial data owner, who is responsible for owning, providing, and following whatever guidelines are outlined for their set of data.

  • Data Committee https://www.youtube.com/watch?v=cRmI_Kkrb8E 8mins

Terminology