Apache Atlas - vidyasekaran/bigdata_frameworks_components GitHub Wiki

Welcome to the bigdata_frameworks_components wiki!

Source: https://community.hortonworks.com/questions/68406/what-is-the-difference-between-apache-atlas-and-ap.html

Atlas:

- Atlas is really like an 'atlas' of almost all the metadata found in HDP: the Hive metastore, the Falcon repository, Kafka topics, HBase tables, and so on. This single view of the metadata enables powerful search on top of it, including full-text search (based on Solr).

- Because Atlas has this comprehensive view of the metadata, it can also provide insight into lineage: by combining Hive DDL statements, it can tell which table was the source for another table.

- Another core feature is that you can assign tags to any metadata entity in Atlas. For example, you can mark column B in Hive table Y as holding sensitive data by assigning it a 'PII' tag. An HDFS folder or an HBase column family (CF) can be given the same 'PII' tag. From there you can create tag-based policies in Ranger to manage access to anything tagged 'PII' in Atlas.
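The ideas above (one metadata view, lineage derived from table-creation statements, and tags that drive access policies) can be sketched with a toy in-memory model. This is only an illustration and not the Atlas or Ranger API: the `Catalog` class, the `is_allowed` function, the entity names, and the policy format are all hypothetical.

```python
# Toy in-memory catalog; every name here is hypothetical, not the Atlas API.
class Catalog:
    def __init__(self):
        self.tags = {}      # entity name -> set of tags
        self.sources = {}   # target entity -> set of direct source entities

    def tag(self, entity, tag):
        """Attach a tag (e.g. 'PII') to any entity: column, HDFS path, HBase CF."""
        self.tags.setdefault(entity, set()).add(tag)

    def record_ctas(self, target, source):
        """Record that `target` was created from `source`, as a Hive CTAS would."""
        self.sources.setdefault(target, set()).add(source)

    def upstream(self, entity):
        """Return every entity that `entity` was derived from, directly or not."""
        seen, stack = set(), [entity]
        while stack:
            for src in self.sources.get(stack.pop(), ()):
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen

    def search_by_tag(self, tag):
        """Return all entities carrying `tag`, sorted for stable output."""
        return sorted(e for e, tags in self.tags.items() if tag in tags)


def is_allowed(catalog, policies, user, entity):
    """Tag-based check: deny if any tag on the entity is restricted to other users."""
    for tag in catalog.tags.get(entity, ()):
        allowed_users = policies.get(tag)
        if allowed_users is not None and user not in allowed_users:
            return False
    return True


catalog = Catalog()
catalog.record_ctas("hive:staging.customers", "hive:raw.customers")
catalog.record_ctas("hive:mart.customer_summary", "hive:staging.customers")
catalog.tag("hive:raw.customers.ssn", "PII")
catalog.tag("hdfs:/data/raw/customers", "PII")

# Hypothetical policy: only alice may read anything tagged PII.
policies = {"PII": {"alice"}}
```

Here `catalog.upstream("hive:mart.customer_summary")` walks back through both recorded statements, and the same 'PII' policy covers a Hive column and an HDFS folder without naming either resource in the policy itself.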

List of Hadoop frameworks from which we can select one for a particular use case.

Data Governance - http://searchdatamanagement.techtarget.com/feature/Data-governance-tools-for-Hadoop-infiltrate-the-enterprise

I) Apache Atlas - http://atlas.apache.org/

Features - Data Classification, Centralized Auditing, Search & Lineage (Browse), Security & Policy Engine. Atlas also has the bridges listed below to pull data from the respective systems, and hooks to add, update, or remove data in Atlas from those systems.

Hive Bridge, Sqoop Bridge, Falcon Bridge, Storm Bridge
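To make the bridge versus hook distinction concrete, here is a minimal sketch under the reading given in these notes: a bridge pulls existing metadata in bulk, while a hook applies individual add/update/remove changes. None of the names below (`MetadataStore`, `run_bridge`, `on_hook_event`, the event fields) come from the Atlas code base; they are hypothetical.

```python
# Hypothetical sketch: none of these names come from the Atlas code base.
class MetadataStore:
    """Stand-in for the central metadata repository."""

    def __init__(self):
        self.entities = {}

    def upsert(self, name, attrs):
        self.entities[name] = dict(attrs)

    def remove(self, name):
        self.entities.pop(name, None)


def run_bridge(store, source_tables):
    """Bridge: bulk import of whatever already exists in the source system."""
    for name, attrs in source_tables.items():
        store.upsert(name, attrs)
    return len(source_tables)


def on_hook_event(store, event):
    """Hook: apply one change reported by the source system."""
    op, name = event["op"], event["name"]
    if op in ("create", "update"):
        store.upsert(name, event.get("attrs", {}))
    elif op == "drop":
        store.remove(name)
    else:
        raise ValueError(f"unknown operation: {op}")


store = MetadataStore()
imported = run_bridge(store, {
    "sales.orders": {"owner": "etl"},
    "sales.items": {"owner": "etl"},
})
on_hook_event(store, {"op": "create", "name": "sales.returns", "attrs": {"owner": "etl"}})
on_hook_event(store, {"op": "drop", "name": "sales.items"})
```

In this sketch the bridge is run once to catch up on existing tables, and the hook keeps the store current afterwards.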

  1. Tools required to build and run Apache Atlas on Eclipse - http://atlas.apache.org/EclipseSetup.html

2.5 - Atlas High Level Architecture - Overview - http://atlas.apache.org/Architecture.html

  1. Security - Apache Ranger is an advanced security management solution for the Hadoop ecosystem, with wide integration across Hadoop components. By integrating with Atlas, Ranger allows security administrators to define metadata-driven security policies for effective governance. Ranger is a consumer of the metadata change events that Atlas publishes.
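The consumer relationship described above can be sketched as a function that keeps a tag-to-resource index in sync with change events. The JSON event shape, its field names, and the operation names used here are simplified assumptions for illustration; they are not the actual Atlas notification format, and `apply_event` is a hypothetical helper rather than Ranger code.

```python
import json


def apply_event(tag_index, raw_message):
    """Update a tag -> resources index from one JSON-encoded change event.

    The event fields (operation, resource, tag) are an assumed, simplified shape.
    """
    event = json.loads(raw_message)
    op, resource, tag = event["operation"], event["resource"], event["tag"]
    if op == "CLASSIFICATION_ADD":
        tag_index.setdefault(tag, set()).add(resource)
    elif op == "CLASSIFICATION_DELETE":
        tag_index.get(tag, set()).discard(resource)
    else:
        raise ValueError(f"unsupported operation: {op}")
    return tag_index


tag_index = {}
messages = [
    {"operation": "CLASSIFICATION_ADD", "resource": "hive:db.y.b", "tag": "PII"},
    {"operation": "CLASSIFICATION_ADD", "resource": "hdfs:/secure", "tag": "PII"},
    {"operation": "CLASSIFICATION_DELETE", "resource": "hdfs:/secure", "tag": "PII"},
]
for message in messages:
    apply_event(tag_index, json.dumps(message))
```

A policy engine could then evaluate a 'PII' policy against whatever resources the index currently lists for that tag, so a tag change in the metadata store takes effect without editing the policy.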

II) Commercially, Hortonworks' rival Cloudera has released Navigator, which the company calls "the only complete data governance solution for Apache Hadoop" and which features data discovery, continuous optimization, audit, lineage, metadata management and policy enforcement. The product is part of Cloudera Enterprise.