Fivetran - ja-guzzle/guzzle_docs GitHub Wiki

Overview

  1. It's very basic: it only supports ingestion and SQL-based transformation. Their core concepts documentation (https://fivetran.com/docs/getting-started/core-concepts) is just a few pages, yet it covers the entire core functionality of the product, including how it handles data types and schema drift.
  2. Their message of "faithfully" replicating data into the cloud warehouse sums up what they do, plus limited SQL-based transformation (I don't see the ability to parameterize those transformations and run them only on new incremental data). This simple messaging shows up throughout their docs and blogs: "We want you to have a high performant data stack, as you should. This is why our zero maintenance, zero configuration approach works. We’ve figured out the best way to build pipelines so that you don’t have to. Focus on growing your business instead of data validation and maintenance."
  3. Yet it's powerful in terms of connectivity. They support five types of sources: files, databases (via CDC), application connectors (this list is huge, and every set of release notes shows how they keep extending it: https://fivetran.com/docs/changelog/july-2019), events (webhooks and others) and functions (Azure Functions, Google Cloud Functions and AWS Lambda). As sinks they support a wide range of data warehouses: Azure SQL, Azure SQL Data Warehouse, Redshift, BigQuery, Snowflake and many others.
  4. They have kept the data flow simple: detect the schema, then replicate the tables with a few audit columns (https://fivetran.com/docs/getting-started/system-columns-and-tables).
  5. They have thought through all the basics well: timestamps, schema drift, watermarking, an initial one-time sync done gradually, etc.
  6. CDC (for all databases: https://fivetran.com/blog/approaches-to-database-replication) and schema drift handling are very powerful.
  7. It moves data through Fivetran's own EC2 instances, and it is not yet clear what compute they use. To make it PaaS, they have worked through the security standards well: https://fivetran.com/docs/security/eu-data-protection
  8. They have a clear cloud-only PaaS strategy. They support sending logs to AWS CloudWatch, Azure Log Analytics and Google Stackdriver, and I assume more native integrations will come along.
  9. These guys are real DWH practitioners, and they use simple language: https://fivetran.com/blog/fivetran-updates-warehoused-databases-via-log-based-replication; https://fivetran.com/blog/ela-o-toole
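To make items 4 and 5 concrete, here is a minimal Python sketch of the generic pattern they describe: replicate only rows changed since the last watermark, add any new source columns to the target (schema drift), and stamp each replicated row with audit columns. This is not Fivetran's implementation. The `updated_at` cursor column, the `_op` delete marker and the in-memory target are assumptions for illustration; the audit column names are modeled on Fivetran's documented `_fivetran_synced` and `_fivetran_deleted` system columns.

```python
from datetime import datetime, timezone
from typing import Any

Row = dict[str, Any]


def incremental_sync(
    changes: list[Row],
    target: dict[Any, Row],
    target_columns: list[str],
    watermark: datetime,
    key: str = "id",
    cursor: str = "updated_at",
) -> datetime:
    """Upsert changes newer than `watermark` into `target`; return the new watermark.

    Illustrative sketch only. `_op` is a hypothetical marker ("delete") on a change record.
    """
    synced_at = datetime.now(timezone.utc)
    new_watermark = watermark
    for change in changes:
        # Drop the hypothetical `_op` marker; it is not a real source column.
        row = {k: v for k, v in change.items() if k != "_op"}
        if row[cursor] <= watermark:
            continue  # already replicated by an earlier run
        # Schema drift: any source column the target has not seen is appended.
        for col in row:
            if col not in target_columns:
                target_columns.append(col)
        # Audit columns, modeled on Fivetran's system columns.
        row["_fivetran_synced"] = synced_at
        # Deletes are soft: the row is kept and flagged rather than removed.
        row["_fivetran_deleted"] = change.get("_op") == "delete"
        target[row[key]] = row
        new_watermark = max(new_watermark, row[cursor])
    return new_watermark
```

Keeping deleted rows as flagged rows, rather than removing them, is what lets a warehouse built this way stay a one-to-one copy of the source while still recording that the source row went away.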

Strengths

  1. Simple to use, and very targeted in terms of the sources, sinks and functionality it supports
  2. Resilient: handles data type mapping, CDC (for all the data sources) and security
  3. Wide support for connectors, especially application connectors
  4. Very good marketing; simple documentation

Features which are missing

  1. Data lineage: they don't care much, as they are doing one-to-one replication
  2. Not clear how much control it gives for keeping daily snapshots of the customer table and others
  3. Does not support complex ETL orchestration: the processing module documentation clearly states that it will attempt those aggregation SQL statements, but if something goes wrong, it goes wrong
  4. Recon, housekeeping and CDE monitoring: again, they don't care.
  5. Adapters like the file adapter are simple: they only support basic delimited files, selected with regex file name patterns (https://fivetran.com/docs/files/azure-blob-storage). There is no logic to transform data as it flows through, nor any custom validation (as their focus is faithful replication).
  6. They have a file upload template, but again it does not support more sophisticated features such as maker/checker and validation.
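For contrast with item 5, here is a minimal Python sketch of a regex-driven delimited file load. The in-memory `files` mapping, the use of `re.search` for `file_pattern`, and the optional `validate` hook are all hypothetical; the hook marks the kind of in-flight custom validation that, per the notes on the file adapter, Fivetran does not offer.

```python
import csv
import io
import re
from typing import Callable, Optional

Record = dict[str, str]


def load_matching_files(
    files: dict[str, str],
    file_pattern: str,
    delimiter: str = ",",
    validate: Optional[Callable[[Record], bool]] = None,
) -> tuple[list[Record], list[Record]]:
    """Parse every file whose name matches `file_pattern`; split records by `validate`.

    Illustrative sketch only: `files` maps a file name to its text content.
    """
    pattern = re.compile(file_pattern)
    loaded: list[Record] = []
    rejected: list[Record] = []
    for name in sorted(files):  # deterministic load order
        if not pattern.search(name):
            continue  # file name does not match the configured pattern
        reader = csv.DictReader(io.StringIO(files[name]), delimiter=delimiter)
        for record in reader:
            # Without a hook, every record is loaded as-is (faithful replication).
            if validate is None or validate(record):
                loaded.append(record)
            else:
                rejected.append(record)
    return loaded, rejected
```

With `validate=None` this behaves like a plain faithful load; passing a hook, for example one that rejects negative amounts, is the extra step a maker/checker or validation workflow would need.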