AAC : Azure Data Architecture Guide - amitbhilagude/userfullinks GitHub Wiki

  1. Modern data architecture follows a polyglot persistence pattern, which means no single database type is used for everything. Each data store is chosen based on the requirements.

  2. Data transformation Process

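    1. ETL (extract, transform, load): Extract data from the sources into one place, transform it there, and then load it into the destination.
    2. ELT (extract, load, transform): Extract data and load it into the destination first, then transform it inside the destination itself.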
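The difference between the two orderings can be sketched in a few lines. This is a minimal illustration only: it assumes an in-memory SQLite database as a stand-in for the destination, and the `SOURCE_ROWS` data, table names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount_in_cents, country_code)
SOURCE_ROWS = [(1, 1250, "us"), (2, 990, "de"), (3, 4000, "us")]

QUERY = "SELECT order_id, amount, country FROM orders ORDER BY order_id"


def run_etl(rows):
    """ETL: transform outside the destination, then load the finished rows."""
    # Transform step happens in application code (the "one place").
    transformed = [(oid, cents / 100.0, country.upper()) for oid, cents, country in rows]
    dest = sqlite3.connect(":memory:")
    dest.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    dest.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
    result = dest.execute(QUERY).fetchall()
    dest.close()
    return result


def run_elt(rows):
    """ELT: load the raw rows first, then transform with the destination's SQL engine."""
    dest = sqlite3.connect(":memory:")
    dest.execute("CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, country TEXT)")
    dest.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Transform step happens inside the destination.
    dest.execute(
        "CREATE TABLE orders AS "
        "SELECT order_id, amount_cents / 100.0 AS amount, UPPER(country) AS country "
        "FROM raw_orders"
    )
    result = dest.execute(QUERY).fetchall()
    dest.close()
    return result
```

Both functions end with the same `orders` table; what changes is where the transformation work runs, which is why ELT tends to suit destinations that have plenty of processing capacity of their own.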
  3. Data Types

    1. Relational data, stored as records
    2. Documents and files, such as JSON or CSV
    3. Key-value pairs
    4. Graph data
    5. Search index databases, designed specifically for search queries
    6. Time series data, where every value is associated with a timestamp
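As a rough illustration of how these shapes differ, the snippet below models each one with plain Python structures. All names and sample values are made up for the example; real stores add indexing, persistence, and query languages on top.

```python
import json
from collections import defaultdict

# Relational: fixed columns, one record per row.
relational_row = (42, "Ada", "London")

# Document (JSON): self-describing, nested fields are allowed.
document = json.loads('{"id": 42, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}')

# Key-value: an opaque value looked up by a single key.
key_value = {"customer:42": b"serialized-profile-bytes"}

# Graph: nodes connected by labelled edges.
edges = [("Ada", "FOLLOWS", "Grace"), ("Grace", "FOLLOWS", "Ada")]

# Search index: an inverted index from each term to the matching document ids.
docs = {1: "azure data guide", 2: "data warehouse"}
inverted_index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        inverted_index[term].add(doc_id)

# Time series: every value is paired with a timestamp.
time_series = [("2024-01-01T00:00:00Z", 21.5), ("2024-01-01T00:01:00Z", 21.7)]
```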
  4. Data Warehousing

    1. A centralised repository that ingests data from all data sources.
    2. When to choose data warehousing
      1. Massive amounts of data arrive from different sources.
      2. A simplified schema is required compared with the sources.
      3. Access to the data must be restricted to specific users, who should not query the sources directly.
      4. An efficient repository is needed for heavy reads, because the source OLTP systems may be designed for heavy writes.
    3. Data warehousing offerings in Azure
      1. Warehousing options fall into two categories
        1. Symmetric Multiprocessing (SMP)
          1. Tasks run as a sequence of steps; parallel processing is not required.
          2. Azure offerings are Azure SQL Database or SQL Server on a VM.
        2. Massively Parallel Processing (MPP)
          1. Tasks need to be performed in parallel.
          2. Azure offerings are Azure Synapse Analytics (comparable to Redshift in AWS), Apache Hive on HDInsight, and Interactive Query on HDInsight.
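The sequential versus parallel distinction can be mimicked on a small scale. The sketch below is an analogy only: it assumes a thread pool inside one process as a stand-in for the separate compute nodes of a real MPP system, and the function names and partition count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def total_sequential(values):
    """SMP-like shape: one worker walks the whole data set step by step."""
    total = 0
    for value in values:
        total += value
    return total


def total_partitioned(values, partitions=4):
    """MPP-like shape: split the data, aggregate each partition, combine the results."""
    # Round-robin split so every partition gets a similar share of the rows.
    chunks = [values[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partial_totals = list(pool.map(sum, chunks))
    # Final step: merge the partial results into one answer.
    return sum(partial_totals)
```

Both functions return the same total; the partitioned version only pays off when the per-partition work is large enough to outweigh the cost of splitting and merging.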
    4. Challenges
      1. Deciding when to clean up data.
      2. Deciding how to copy data into the data warehouse.
      3. Ensuring data consistency and accuracy.
      4. Maintaining the same relationships as in the sources, if required.
  5. Semantic Model

    1. A data model transformed for reporting is a semantic model.
  6. Data solution categories

    1. Traditional RDBMS workloads use Online Transaction Processing (OLTP); analysing that data requires Online Analytical Processing (OLAP) mechanisms.

      1. Online Transaction Processing (OLTP)
        1. The database system is designed to record online transactions efficiently, which requires consistency and a rollback mechanism.
        2. This system is designed for a heavy write workload.
      2. Online Analytical Processing (OLAP)
        1. Databases used for OLTP were not designed for analysis, so retrieving answers from them is costly in time and effort. OLAP systems were designed to extract this business intelligence information from the data in a highly performant way, because OLAP databases are optimized for heavy read, low write workloads.
      3. A client application uses an OLTP system such as a SQL database to store transactions. Azure Data Factory (ADF) moves this data into the data warehouse, following the ETL process, when the data is massive and comes from multiple sources. OLAP systems such as Azure Analysis Services then perform the transformation and make the data available to reporting tools such as Power BI.
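The two workload styles can be contrasted with a small sketch. It assumes an in-memory SQLite database standing in for both systems, and the `accounts` table, its sample rows, and the function names are hypothetical. The write path shows the consistency and rollback behaviour expected of OLTP; the read path shows the kind of aggregate query an OLAP system is optimized for.

```python
import sqlite3


def make_db():
    """Create a tiny hypothetical accounts table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, "EU", 100), (2, "EU", 50), (3, "US", 70)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """OLTP-style write: both updates commit together or are rolled back together."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balance_by_region(conn):
    """OLAP-style read: scan and aggregate many rows, write nothing."""
    return conn.execute(
        "SELECT region, SUM(balance) FROM accounts GROUP BY region ORDER BY region"
    ).fetchall()
```

In a real deployment the aggregate query would run against the warehouse or analysis layer rather than the transactional database, so heavy reporting reads do not compete with transactional writes.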
    2. Big Data solutions

      1. Big data solutions are built for massive volumes of NoSQL data of any type, e.g. JSON, documents, key-value pairs, graphs, or sometimes transactions.