Awesome Data Discovery and Observability
This repository contains a curated list of awesome data catalogs and observability platforms that help you discover, manage, and observe data in your organization.
Contents: Existing Data Discovery and Observability Solutions
Open-Source Data Catalogs
Website | GitHub
A popular open-source data catalog for metadata management and data discovery that originated at Lyft.
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
More features
Strategy: Push
UX personalization: No
AI autowiring: No
Rich data profiling: No
Recommendations: Yes
Schemas, Description: Yes
Complex schemas: No
Data preview: Yes
Column statistics: Yes
Data owner: Yes
Top data users: Yes
Change notifications: No
Change feed: No
Deployment:
Supported data sources: Hive, Redshift, Druid, RDBMS, Presto, Snowflake
DataHub is an open-source data catalog featuring data discovery, data governance, and metadata management.
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
More features
Strategy: Push, Pull
UX personalization: No
AI autowiring: No
Rich data profiling: No
Recommendations: ?
Schemas, Description: Yes
Complex schemas: No
Data preview: ?
Column statistics: No
Data owner: Yes
Top data users: ?
Change notifications: No
Change feed: No
Deployment:
Supported data sources: Hive, Kafka, RDBMS
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| OpenLineage | ✔️ | ❌ | ✔️ | ? | ❌ | ❌ | ❌ | ❌ |
More features
Strategy: Push
UX personalization: No
AI autowiring: No
Rich data profiling: No
Recommendations: No
Schemas, Description: Yes
Complex schemas: No
Data preview: Yes
Column statistics: No
Data owner: Yes
Top data users: ?
Change notifications: No
Change feed: No
Deployment:
Supported data sources: S3, Kafka
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
More features
Strategy: Push
UX personalization: No
AI autowiring: No
Rich data profiling: No
Recommendations: No
Schemas, Description: Yes
Complex schemas: No
Data preview: No
Column statistics: No
Data owner: No
Top data users: ?
Change notifications: Yes
Change feed: No
Deployment:
Supported data sources: HBase, Hive, Sqoop, Kafka, Storm
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
More features
Strategy: Push
UX personalization: No
AI autowiring: No
Rich data profiling: No
Recommendations: ?
Schemas, Description: ?
Complex schemas: ?
Data preview: ?
Column statistics: ?
Data owner: ?
Top data users: ?
Change notifications: ?
Change feed: ?
Deployment:
Supported data sources:
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ | ❌ | ❌ |
More features
Strategy: Push via UI
UX personalization: No
AI autowiring: No
Rich data profiling: No
Recommendations: No
Schemas, Description: Yes
Complex schemas: No
Data preview: Yes
Column statistics: No
Data owner: Yes
Top data users: ?
Change notifications: No
Change feed: No
Deployment:
Supported data sources: Mostly geodata
Proprietary Data Catalogs
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |
More features
Strategy: Push
UX personalization: Yes
AI autowiring: ?
Network-based: No
Rich data profiling: ?
Supported data sources:
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |
More features
Strategy: Push
UX personalization: ?
AI autowiring: ?
Network-based: Yes
Rich data profiling: Yes
Supported data sources:
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |
More features
Strategy: Push
UX personalization: Yes
AI autowiring: No
Network-based: No
Rich data profiling: No
Supported data sources:
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |
More features
Strategy: Pull
UX personalization: ?
AI autowiring: ?
Network-based: No
Rich data profiling: ?
Supported data sources: Presto, Deequ, Atlas, Airflow, Hudi
Website
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |
More features
Strategy: Push
UX personalization: No
AI autowiring: No
Network-based: No
Rich data profiling: No
Supported data sources:
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |
More features
Strategy: Push
UX personalization: Yes
AI autowiring: ?
Network-based: ?
Rich data profiling: Yes
Supported data sources:
Monocloud Data Catalogs
Google Cloud Data Catalog
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |
More features
Strategy: Pull
UX personalization: ?
AI autowiring: ?
Network-based: No
Rich data profiling: No
Supported data sources:
Website
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ? | ✔️ | ❌ | ❌ | ? | ❌ | ❌ |
More features
Strategy: Pull
UX personalization: ?
AI autowiring: ?
Network-based: ?
Rich data profiling: ?
Supported data sources:
Data Observability Platforms
Website
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ |
More features
Strategy: Pull
UX personalization: ?
AI autowiring: ?
Network-based: ?
Rich data profiling: ?
Supported data sources: Snowflake, Hive, Kafka, Looker, Redshift, Tableau, BigQuery, Airflow, Fivetran, Presto, Mode, Periscope, Databricks, Glue, dbt, Chartio, Spark, AWS, S3, data.world, Google Cloud Platform
Website | GitHub
Databand is an observability platform that helps data engineers identify and troubleshoot pipeline issues and data quality problems.
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ? | ? | ? | ❌ | ? | ? | ? | ✔️ |
More features
Strategy: Push
UX personalization: ?
AI autowiring: ?
Network-based: ?
Rich data profiling: ?
Supported data sources:
Website | GitHub
Datafold is a data monitoring and observability platform.
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ |
More features
Strategy: Push
UX personalization: ?
AI autowiring: ?
Network-based: ?
Rich data profiling: ?
Supported data sources:
Website | GitHub
| Based on Open Standard | Search-based | Network-based | Lineage-based | Federation | ML 1st Citizen | Data Quality | End-to-end Lineage | Observability |
|---|---|---|---|---|---|---|---|---|
| ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ❌ |
More features
Strategy: Pull
UX personalization: Yes
AI autowiring: No
Network-based: No
Rich data profiling: Yes
Supported data sources:
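
If you want to shortlist tools by capability, the feature rows in this list can be transcribed into a small lookup structure. The following is a minimal sketch, not part of any listed tool: the names `FEATURES`, `MATRIX`, and `find_tools` are hypothetical, only four rows from this list are transcribed, and unknown values ("?") are treated as not matching.

```python
from typing import Dict, List, Optional

# Feature columns, in the order used by the comparison rows in this list.
FEATURES: List[str] = [
    "Based on Open Standard", "Search-based", "Network-based", "Lineage-based",
    "Federation", "ML 1st Citizen", "Data Quality", "End-to-end Lineage",
    "Observability",
]

# True = supported, False = not supported, None = unknown ("?").
# Hypothetical transcription of four rows; extend it with the remaining tools.
MATRIX: Dict[str, List[Optional[bool]]] = {
    "DataHub": [False, True, True, True, False, False, False, False, False],
    "Google Cloud Data Catalog": [False, True, False, True, False, False, None, False, False],
    "Databand": [False, None, None, None, False, None, None, None, True],
    "Datafold": [False, True, True, True, False, False, True, False, True],
}


def find_tools(required: List[str]) -> List[str]:
    """Return tools whose row marks every required feature as supported."""
    idx = [FEATURES.index(name) for name in required]
    return [
        tool for tool, row in MATRIX.items()
        if all(row[i] is True for i in idx)  # unknown (None) does not match
    ]


if __name__ == "__main__":
    print(find_tools(["Observability"]))
    print(find_tools(["Lineage-based", "Data Quality"]))
```

Treating "?" as a non-match keeps the shortlist conservative. Swap the `is True` check for `is not False` if you would rather keep tools whose support is simply unconfirmed.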