TALK TRACK
Data powers both successful clinical care and successful operations.
[NEXT SLIDE]
How fast? Seven months!
Apache Atlas is the only open source project created to solve the governance challenge in the open. The founding members of the project include all the members of the Data Governance Initiative, along with others from the Hadoop community. The core functionality defined by the project includes the following:
Data Classification – create an understanding of the data within Hadoop and provide a classification of this data to external and internal sources
Centralized Auditing – provide a framework to capture and report on access to and modifications of data within Hadoop
Search & Lineage – allow pre-defined and ad hoc exploration of data and metadata while maintaining a history of how a data source or explicit data was constructed
Security and Policy Engine – implement engines that protect data and rationalize data access according to compliance policy
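The first and third capabilities above (classification and lineage) can be illustrated with a minimal sketch. This is purely conceptual, not the Atlas API; the entity names and fields are invented for the talk track:

```python
# Hypothetical illustration (not the Atlas API) of classification and lineage:
# entities carry classification tags, and lineage records how each data set
# was constructed from upstream sources.

class Entity:
    def __init__(self, name):
        self.name = name
        self.tags = set()    # classifications, e.g. "PII"
        self.inputs = []     # lineage: entities this one was derived from

    def classify(self, tag):
        self.tags.add(tag)

def lineage(entity):
    """Walk upstream lineage, yielding every source an entity derives from."""
    for parent in entity.inputs:
        yield parent
        yield from lineage(parent)

# A derived summary table inherits a traceable history back to raw claims.
claims = Entity("raw_claims")
claims.classify("PII")

dx_summary = Entity("dx_summary")
dx_summary.inputs.append(claims)

print([e.name for e in lineage(dx_summary)])  # ['raw_claims']
```

The point for the audience: classification and lineage live in the metadata layer itself, so search and audit can answer "what is this data, and where did it come from?" without touching the data.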
Show – clearly identify customer metadata. Change.
Add a customer classification example – Aetna – to give the use case story continuity. Use DX procedures to diagnoses.
** bring metadata from external systems into Hadoop – keep it together
Which vendors would you be interested in?
The point of Atlas is to leverage metadata to drive exchange, agility and scalability in the HDP governance solution. The paradigm shift: in a true data lake — a multi-tenant environment with 10K+ objects — conventional management of entitlement and enforcement will not work, and new patterns must be used. One group cannot both understand the data and manage policy efficiently; the domain is too large. These activities must be decoupled. The data stewards curate the data, as they are the SMEs (tagging), and the policy authors create a policy once, based on tags (access rules). In our thinking, this is the ONLY scalable solution. We have it and CDH does not.
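The decoupling argument above can be sketched in a few lines. This is a hedged illustration of the tag-based pattern, not Ranger's actual policy model; all names and structures here are invented for the talk:

```python
# Illustrative sketch (not the Ranger API) of tag-based access control:
# stewards tag objects once; policy authors write ONE rule per tag,
# so 10K+ objects never need per-object entitlements.

# Stewards (the SMEs) curate: object -> classification tags
tags = {
    "claims_2015": {"PII", "PHI"},
    "weblogs": set(),
}

# Policy authors write rules once, against tags, not against objects
policies = {
    "PHI": {"clinical_analysts"},  # only clinical analysts may read PHI
}

def allowed(user_groups, obj):
    """Deny access if the object carries any tag the user is not cleared for."""
    for tag in tags.get(obj, set()):
        cleared = policies.get(tag)
        if cleared is not None and not (user_groups & cleared):
            return False
    return True

print(allowed({"clinical_analysts"}, "claims_2015"))  # True
print(allowed({"marketing"}, "claims_2015"))          # False
print(allowed({"marketing"}, "weblogs"))              # True
```

Adding object 10,001 to the lake requires only tagging it; no policy changes. That is the scalability claim in one example.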
Apache Atlas = a low-level service, like YARN. It will be common to the whole HDP platform, providing core metadata services and enriching the entire HDP stack. We start with Hive in HDP 2.3, extend to Ranger and Falcon in M10, and continue with Kafka and Storm by the end of 2015.
Yellow + Atlas = governance features.