Security and Compliance with Atlas and Ranger
Vimal Sharma, Hortonworks
Agenda
• Apache Atlas
– Introduction
– Architecture
– Cross Component Lineage
• Apache Ranger
– Introduction
– Architecture
• Tag Based Policies
– Use cases and advantages
– Demo
Apache Atlas
• Entered Apache incubation in May 2015
• Organizations: IBM, Hortonworks, Aetna, Merck
• 3 releases in the last year
• Graduated to a Top Level Project (TLP) in June 2017
• Timeline: 0.7 (July 2016), 0.7.1 (Jan 2017), 0.8 (Mar 2017), TLP (June 2017)
Apache Atlas Introduction
• Governance and Metadata framework for Hadoop
• Model a component and capture metadata
– Data Assets: Hive Table, HBase column family
– Process: Hive CTAS, Storm Topology
– Classification: Tag metadata entities
• Built-in support for popular components
• Extensible Architecture
• Cross Component Lineage
• Export/Import of metadata
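As a rough illustration of modeling a component and a classification, the sketch below builds the JSON bodies one might send to Atlas. It assumes the v2 REST typedef layout (`entityDefs`, `classificationDefs`) introduced around Atlas 0.8. The type name `spark_dataframe`, its `source` attribute, and the helper functions are hypothetical. Check field names against the Atlas REST documentation for the version in use.

```python
import json

def classification_def(name, description=""):
    """Build a classification (tag) definition with no attributes."""
    return {"name": name, "description": description,
            "superTypes": [], "attributeDefs": []}

def entity_def(name, super_type, attributes):
    """Build an entity type definition; `attributes` maps name -> type name."""
    return {
        "name": name,
        "superTypes": [super_type],
        "attributeDefs": [
            {"name": attr, "typeName": type_name,
             "isOptional": True, "cardinality": "SINGLE"}
            for attr, type_name in attributes.items()
        ],
    }

def typedefs_payload():
    """Assemble a request body holding one tag and one custom data-asset type."""
    return {
        # Tag that data stewards can attach to any entity.
        "classificationDefs": [
            classification_def("PII", "Personally identifiable information")
        ],
        # Hypothetical custom type; DataSet is the Atlas base type for data assets.
        "entityDefs": [
            entity_def("spark_dataframe", "DataSet", {"source": "string"})
        ],
    }

if __name__ == "__main__":
    print(json.dumps(typedefs_payload(), indent=2))
```

The payload is only constructed here, not sent. Posting it to a running Atlas server would additionally need the server URL and credentials.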
Apache Atlas: Architecture
[Architecture diagram]
• Metadata Sources: Hive, Sqoop, Storm, Custom
• Integration: Messaging <Kafka>, API <HTTP/REST>
• Core: Type System, Ingest / Export, Search, Graph Abstraction/Engine <Titan/JanusGraph>
• Stores: Metadata Store <HBase>, Index Store <Solr>
• Apps: UI, Ranger Tag Based Policies
Cross Component Lineage
• Lineage: the relationship between upstream and downstream Data Assets
• Individual components: each has its own metadata store
• Cross-component events are common
• Atlas: flexibility to model arbitrary components
– Arbitrary lineage can be captured
[Diagram: HDFS Path → Spark Process → Kafka Topic]
Lineage Demo Example
• Inputs: PayrollDetails (HDFS path), VariableComponent (HDFS path)
• Process: SalaryProcessor (DataFrame)
• Output: EmployeeSalary (Kafka topic)
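The demo lineage can be sketched as a small directed graph. This is a minimal in-memory model, not Atlas's graph engine. The `LineageGraph` class is hypothetical, and the sketch assumes that both HDFS paths feed SalaryProcessor, which writes to the EmployeeSalary Kafka topic.

```python
from collections import defaultdict, deque

class LineageGraph:
    """Directed graph of data assets and processes; edges follow data flow."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_process(self, name, inputs, outputs):
        """Record a process that reads `inputs` and writes `outputs`."""
        for src in inputs:
            self._link(src, name)
        for dst in outputs:
            self._link(name, dst)

    def _link(self, src, dst):
        self._downstream[src].add(dst)
        self._upstream[dst].add(src)

    def _walk(self, start, edges):
        """Breadth-first traversal returning every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, node):
        """Everything `node` depends on (useful for failure analysis)."""
        return self._walk(node, self._upstream)

    def downstream(self, node):
        """Everything affected by `node` (useful for alerting consumers)."""
        return self._walk(node, self._downstream)

graph = LineageGraph()
graph.add_process("SalaryProcessor",
                  inputs=["PayrollDetails", "VariableComponent"],
                  outputs=["EmployeeSalary"])
print(sorted(graph.upstream("EmployeeSalary")))
print(sorted(graph.downstream("PayrollDetails")))
```

Walking upstream from EmployeeSalary reaches the processor and both HDFS paths. Walking downstream from PayrollDetails reaches the processor and the Kafka topic.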
Lineage Use Cases
• ETL Pipelines
– Upstream failure analysis
– Alerts to downstream processes
• Redundant Processing
– Can metadata classification be used to determine this?
– Avoid expensive processing
• Compliance and Security
– Impose security constraints on sensitive data
– Data can span multiple Hadoop components
– One policy to govern them all
Apache Ranger Introduction
• Framework to enforce security on Hadoop
• Support for Hive, HBase, YARN and more
• Policies for resources like table, files
• Specific policies for users/groups
• Audit and policy analytics
• Atlas Integration
• Import and export of policies
Ranger Architecture
Ranger Plugins
• Reside in component process space
• Periodically poll Ranger Policy Store
• Keep a cache of current policies
• Copy of policies on disk
• Access request evaluated against list of policies
• User request data sent to Audit store
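The plugin behaviour listed above can be sketched in a few lines. This is a toy model, not the Ranger plugin API: the `PolicyPlugin` class, the policy dictionary layout, and the default-deny rule are all assumptions made for illustration. In a real deployment `refresh` would run on a timer.

```python
import json
import os
import tempfile

class PolicyPlugin:
    """Toy in-process authorizer: cached policies, disk copy, audit log."""

    def __init__(self, fetch_policies, cache_path):
        self._fetch = fetch_policies   # callable standing in for the policy store
        self._cache_path = cache_path  # local copy used when the store is unreachable
        self._policies = []            # in-memory cache of current policies
        self.audit_log = []            # stands in for the audit store

    def refresh(self):
        """One polling cycle: pull policies, else fall back to the disk copy."""
        try:
            self._policies = self._fetch()
            with open(self._cache_path, "w") as fh:
                json.dump(self._policies, fh)
        except ConnectionError:
            # Store unreachable: reload the last saved copy if one exists,
            # otherwise keep whatever is already cached in memory.
            if os.path.exists(self._cache_path):
                with open(self._cache_path) as fh:
                    self._policies = json.load(fh)

    def is_allowed(self, user, resource, action):
        """Allow only if some cached policy matches; record the decision."""
        allowed = any(
            p["resource"] == resource
            and user in p["users"]
            and action in p["actions"]
            for p in self._policies
        )
        self.audit_log.append({"user": user, "resource": resource,
                               "action": action, "allowed": allowed})
        return allowed

if __name__ == "__main__":
    cache = os.path.join(tempfile.mkdtemp(), "policies.json")
    store = lambda: [{"resource": "hive:default.tax_2015",
                      "users": ["alice"], "actions": ["select"]}]
    plugin = PolicyPlugin(store, cache)
    plugin.refresh()
    print(plugin.is_allowed("alice", "hive:default.tax_2015", "select"))
    print(plugin.audit_log)
```

Because decisions are made against the local cache, an access check never waits on the policy store, and the disk copy lets the plugin keep enforcing the last known policies after a restart while the store is down.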
Atlas Ranger Integration
• Ranger : Listener on Tag addition/deletion
• Attribute based policies rather than asset based policies
• Advantages
– No need to create/update policies for individual resources
– Resources belonging to multiple components can be tagged
[Diagram: Atlas (tag: PII) → Ranger TagSync → Enforce policies]
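To make the contrast with asset-based policies concrete, here is a minimal sketch in which one tag policy covers resources in two different components. The resource names, the `compliance` group, and the dictionary layout are hypothetical. Untagged resources simply pass through here, whereas in practice they would still be subject to ordinary resource-based policies.

```python
# Tags attached to resources in different components
# (roughly what a tag-sync step would deliver to the authorizer).
resource_tags = {
    ("hive", "default.customers"): {"PII"},
    ("kafka", "customer_events"): {"PII"},
    ("hdfs", "/data/public/catalog"): set(),
}

# One tag policy: which groups may access anything carrying the tag.
tag_policies = {"PII": {"allowed_groups": {"compliance"}}}

def is_allowed(component, resource, user_groups):
    """Deny if any tag on the resource has a policy that excludes the user."""
    tags = resource_tags.get((component, resource), set())
    for tag in tags:
        policy = tag_policies.get(tag)
        if policy and not (policy["allowed_groups"] & set(user_groups)):
            return False
    return True

print(is_allowed("hive", "default.customers", ["analysts"]))
print(is_allowed("kafka", "customer_events", ["compliance"]))
```

Tagging a new table or topic as PII brings it under the same policy without any policy being created or edited.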
Policy Evaluation Flow
Tag Based Policy Demo
• Define tag EXPIRES_ON in Atlas with attribute expiry_date
• Attach this tag to Hive tables:
– tax_2010 with expiry_date – Dec 2016
– tax_2015 with expiry_date – Dec 2017
• Data access should be refused for the first table but allowed for the second
• Inspect Ranger Audit to verify
Why Tag based policies?
• Data Stewards
– Mine data to determine qualifying tags - PII, GeoLocation
– Attach tag to resource
– No overlap with admin’s responsibilities
• Lineage – crucial to determine candidate tags
• Tag policies are intact when resources are renamed/deleted
– Tag instances can be removed but tag definition cannot
– Resources may be volatile and so are policies on them
– Migration of tags and policies across clusters
Contribute
• Atlas Project page - http://atlas.apache.org/
• Atlas Mailing list - dev@atlas.apache.org
• Ranger Project page - https://ranger.apache.org/
• Ranger Mailing list - dev@ranger.apache.org
• Apache ID – svimal2106@apache.org
Questions
Image Source : LinkedIn

Tag based policies using Apache Atlas and Ranger


Editor's notes

  1. Atlas is a tool to model elements in the Hadoop ecosystem and create objects of those components. Data assets: e.g. Hive table. Processes: e.g. Storm topology. Store: metadata, with classifications as tags. Built-in support for popular components. Extensible architecture.