TALK TRACK
Data is powering successful clinical care and successful operations.
[NEXT SLIDE]
We have a lot to cover, want to apologize in advance
The point of Atlas is to leverage metadata to drive exchange, agility and scalability in the HDP gov solution. The paradigm shift requires that in a true data lake with multi-tenant environment with 10K+ of objects, conventional management of entitlement and enforcement will not work and new patterns must be used. One group cannot both understand the data and manage policy efficiently — the domain is too large. These activities must be de-coupled. The data stewards curate the data as they are the SMEs (tagging), and the policy folks create a policy once based on tags (access rules). In our thinking, this the ONLY scalable solution. We have it and CDH does not.
Apache Atlas = low level service like yarn. It will be common to the whole HDP platform, providing core metadata services and enriching the whole HDP stack. We start with Hive in HDP 2.3 and will extend to Ranger and Falcon in M10 and continue with Kafka and Storm by the end of 2015.
Yellow + Atlas = governance features.
Show – clearly identify customer metadata. Change
Add customer classification example – Aetna – make the use case story have continuity. Use DX procedures to diagnosis
** bring meta from external systems into hadoop – keep it together