Awesome Catalogs backup - Evanto/qna GitHub Wiki

Awesome Data Discovery and Observability


This repository contains a curated list of data catalogs and observability platforms that help you discover, manage, and observe data in your organization.


Contents: Existing Data Discovery and Observability Solutions

| OSS | Proprietary | Monocloud | Observability |
| --- | --- | --- | --- |
| 📙 Amundsen | 📕 Collibra | 📒 Google DC | 🔍 Monte Carlo |
| 📙 DataHub | 📕 Informatica | 📒 Azure DC | 🔍 Databand |
| 📙 Marquez | 📕 Alation | | 🔍 Datafold |
| 📙 Atlas | 📕 Atlan | | 🔍 Ataccama |
| 📙 CKAN | 📕 Stemma | | |
| 📙 Magda | | | |

📙 Open-Source Data Catalogs

Amundsen

Website | GitHub

A popular open-source data catalog for metadata management and data discovery that originated at Lyft.

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: No
  • AI autowiring: No
  • Rich data profiling: No
  • Recommendations: Yes
  • Schemas, Description: Yes
  • Complex schemas: No
  • Data preview: Yes
  • Column statistics: Yes
  • Data owner: Yes
  • Top data users: Yes
  • Change notifications: No
  • Change feed: No
  • Deployment:
  • Supported data sources: Hive, Redshift, Druid, RDBMS, Presto, Snowflake

DataHub

DataHub is an open-source data catalog featuring data discovery, data governance, and metadata management.

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |

More features
  • Strategy: Push, Pull
  • UX personalization: No
  • AI autowiring: No
  • Rich data profiling: No
  • Recommendations: ?
  • Schemas, Description: Yes
  • Complex schemas: No
  • Data preview: ?
  • Column statistics: No
  • Data owner: Yes
  • Top data users: ?
  • Change notifications: No
  • Change feed: No
  • Deployment:
  • Supported data sources: Hive, Kafka, RDBMS

Marquez

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OpenLineage | ✔️ | ❌ | ✔️ | ? | ❌ | ❌ | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: No
  • AI autowiring: No
  • Rich data profiling: No
  • Recommendations: No
  • Schemas, Description: Yes
  • Complex schemas: No
  • Data preview: Yes
  • Column statistics: No
  • Data owner: Yes
  • Top data users: ?
  • Change notifications: No
  • Change feed: No
  • Deployment:
  • Supported data sources: S3, Kafka

Atlas

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: No
  • AI autowiring: No
  • Rich data profiling: No
  • Recommendations: No
  • Schemas, Description: Yes
  • Complex schemas: No
  • Data preview: No
  • Column statistics: No
  • Data owner: No
  • Top data users: ?
  • Change notifications: Yes
  • Change feed: No
  • Deployment:
  • Supported data sources: HBase, Hive, Sqoop, Kafka, Storm

CKAN

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: No
  • AI autowiring: No
  • Rich data profiling: No
  • Recommendations: ?
  • Schemas, Description: ?
  • Complex schemas: ?
  • Data preview: ?
  • Column statistics: ?
  • Data owner: ?
  • Top data users: ?
  • Change notifications: ?
  • Change feed: ?
  • Deployment:
  • Supported data sources:

Magda

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |

More features
  • Strategy: Push via UI
  • UX personalization: No
  • AI autowiring: No
  • Rich data profiling: No
  • Recommendations: No
  • Schemas, Description: Yes
  • Complex schemas: No
  • Data preview: Yes
  • Column statistics: No
  • Data owner: Yes
  • Top data users: ?
  • Change notifications: No
  • Change feed: No
  • Deployment:
  • Supported data sources: Mostly geodata
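
The open-source comparison in this section can also be treated as data when shortlisting tools. The following is a minimal Python sketch, not part of any listed project: the names `COLUMNS`, `Catalog`, `make`, and `with_features` are hypothetical helpers, and the rows are transcribed by hand from the tables in this section, with `?` kept as unknown rather than guessed.

```python
from dataclasses import dataclass

# Column order of the comparison tables in this list (hypothetical short names).
COLUMNS = [
    "open_standard", "search", "network", "lineage", "federation",
    "ml_first_citizen", "data_quality", "end_to_end_lineage", "observability",
]


@dataclass(frozen=True)
class Catalog:
    name: str
    strategy: str
    features: dict  # column name -> True, False, or None when the list says "?"


def make(name, strategy, marks):
    """Build a Catalog from a row of marks: 'y' (yes), 'n' (no), '?' (unknown)."""
    values = {"y": True, "n": False, "?": None}
    return Catalog(name, strategy, {c: values[m] for c, m in zip(COLUMNS, marks)})


# Rows transcribed from the open-source section of this list.
OSS = [
    make("Amundsen", "Push", "nyyynnnnn"),
    make("DataHub", "Push, Pull", "nyyynnnnn"),
    make("Marquez", "Push", "yyny?nnnn"),  # open standard: OpenLineage
    make("Atlas", "Push", "nynynnnnn"),
    make("CKAN", "Push", "nynnynnnn"),
    make("Magda", "Push via UI", "nynnynnnn"),
]


def with_features(catalogs, *required):
    """Return names of catalogs where every required feature is a confirmed yes."""
    return [c.name for c in catalogs if all(c.features[f] is True for f in required)]


print(with_features(OSS, "search", "lineage"))
print(with_features(OSS, "federation"))
```

Unknown entries are deliberately excluded from matches, so a `?` in the list never counts as support for a feature.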

📕 Proprietary Data Catalogs

Collibra

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: Yes
  • AI autowiring: ?
  • Network-based: No
  • Rich data profiling: ?
  • Supported data sources:

Informatica

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: ?
  • AI autowiring: ?
  • Network-based: Yes
  • Rich data profiling: Yes
  • Supported data sources:

Alation

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: Yes
  • AI autowiring: No
  • Network-based: No
  • Rich data profiling: No
  • Supported data sources:

Atlan

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |

More features
  • Strategy: Pull
  • UX personalization: ?
  • AI autowiring: ?
  • Network-based: No
  • Rich data profiling: ?
  • Supported data sources: Presto, Deequ, Atlas, Airflow, Hudi

Stemma

Website

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: No
  • AI autowiring: No
  • Network-based: No
  • Rich data profiling: No
  • Supported data sources:

Talend

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |

More features
  • Strategy: Push
  • UX personalization: Yes
  • AI autowiring: ?
  • Network-based: ?
  • Rich data profiling: Yes
  • Supported data sources:

📒 Monocloud Data Catalogs

Google Cloud Data Catalog

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |

More features
  • Strategy: Pull
  • UX personalization: ?
  • AI autowiring: ?
  • Network-based: No
  • Rich data profiling: No
  • Supported data sources:

Azure Data Catalog

Website

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |

More features
  • Strategy: Pull
  • UX personalization: ?
  • AI autowiring: ?
  • Network-based: ?
  • Rich data profiling: ?
  • Supported data sources:

πŸ” Data Observability Platforms

Monte Carlo

Website

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ |

More features
  • Strategy: Pull
  • UX personalization: ?
  • AI autowiring: ?
  • Network-based: ?
  • Rich data profiling: ?
  • Supported data sources: Snowflake, Hive, Kafka, Looker, Redshift, Tableau, BigQuery, Airflow, Fivetran, Presto, Mode, Periscope, Databricks, Glue, dbt, Chartio, Spark, AWS, S3, data.world, Google Cloud Platform

Databand

Website | GitHub

Databand is an observability platform that helps data engineers identify and troubleshoot pipeline issues and data quality problems.

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ? | ? | ? | ❌ | ? | ? | ? | ✔️ |

More features
  • Strategy: Push
  • UX personalization: ?
  • AI autowiring: ?
  • Network-based: ?
  • Rich data profiling: ?
  • Supported data sources:

Datafold

Website | GitHub

Datafold is a data monitoring and observability platform.

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ |

More features
  • Strategy: Push
  • UX personalization: ?
  • AI autowiring: ?
  • Network-based: ?
  • Rich data profiling: ?
  • Supported data sources:

Ataccama

Website | GitHub

| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |

More features
  • Strategy: Pull
  • UX personalization: Yes
  • AI autowiring: No
  • Network-based: No
  • Rich data profiling: Yes
  • Supported data sources:
