© 2016 IBM Corporation. Hadoop Summit – San Jose 2016
Apache Ranger Hive Metastore Security
Yan Zhou (zhouya@us.ibm.com),
Tanping Wang (wangta@us.ibm.com)
IBM Big Insights Product Lead Architects, Silicon Valley Lab, IBM
Apache Ranger
• Provides centralized policy definition for authorizing and auditing access to resources in a consistent manner.
[Figure: Apache Ranger architecture. Agents run inside HBase, Hive, YARN, Knox, Storm, Solr, Kafka, and HDFS. Central components: Policy Server, Audit Server, Administration Portal, REST APIs, and KMS. Backing stores and integrations: DB, Solr, HDFS, Log4j, and LDAP/AD user/group sync.]
HiveServer2 Ranger Authorization Model
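[Figure: HiveServer2 authorization workflow. Components: Ranger Policy Manager, HiveServer2 with an embedded Ranger Agent, user applications, Beeline, and the Ranger Audit Database. Steps are numbered 1 to 5 on the slide.]
• Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
• Users access Hive data through an application that connects to HiveServer2; IT/analysis users access HiveServer2 through Beeline.
• HiveServer2 uses the Agent for authorization.
• HiveServer2 provides table data access to the user/client.
• Audit logs are pushed to the Ranger Audit Database.
• Policy refreshing: the Agent refreshes its policies from the Policy Manager.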
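The workflow on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Ranger's API: `Policy`, `HiveAgent`, `refresh`, and `is_allowed` are hypothetical names. The agent caches policies pulled from the policy manager, answers authorization checks locally, and records an audit entry for every decision.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a user may perform some actions on database.table."""
    user: str
    database: str
    table: str
    actions: frozenset


@dataclass
class HiveAgent:
    """Toy stand-in for the agent embedded in HiveServer2 (not Ranger's real API)."""
    policies: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def refresh(self, policy_manager_policies):
        # Policy refreshing: replace the local cache with the manager's current set.
        self.policies = list(policy_manager_policies)

    def is_allowed(self, user, database, table, action):
        allowed = any(
            p.user == user and p.database == database
            and p.table == table and action in p.actions
            for p in self.policies
        )
        # Every decision, allowed or denied, is audited.
        self.audit_log.append((user, f"{database}.{table}", action, allowed))
        return allowed


agent = HiveAgent()
agent.refresh([Policy("hr1", "default", "employee", frozenset({"select"}))])
print(agent.is_allowed("hr1", "default", "employee", "select"))  # True
print(agent.is_allowed("hr1", "default", "employee", "drop"))    # False
```

Keeping the policy cache inside the agent means a check does not need a round trip to the policy manager; the trade-off is that a policy change only takes effect after the next refresh.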
Motivation: Gaps in the Current Hive Ranger Authorization Model
Hive CLI
• Does: (nothing listed)
• Does not: work with Ranger.
HiveServer2
• Does: provide ACLs on databases, tables, columns, and locks.
• Does: support Ranger policy creation or deletion from Hive GRANT or REVOKE statements.
• Does not: adjust Hive-created policies as a result of DDL statements:
  o Once a DB object is renamed by a DDL statement, the Hive-created policy in Ranger is out of sync.
  o Once a DB object is deleted, the Hive-created policy in Ranger becomes orphaned.
Motivation: Gaps in the Current HiveServer2 Ranger Authorization Model (cont'd)
Resource ACL sync-up
Storage-based authorization
• Good: consistent access controls across Hive and HDFS.
• Not good: cannot control SQL data access at finer granularity, such as columns.
SQL standard-based authorization
• Good: fits well with the SQL standard privilege model.
• Not good: does not provide consistent privileges across Hive and HDFS, and potentially prevents sharing Hive data with other Hadoop apps.
What is needed: a holistic view of the HDFS and Hive ACLs to provide consistent privilege control.
We Introduce: The New Hive Metastore Ranger Security Agent
Hive CLI
• Provides: ACLs for Hive CLI.
• Use case:
hive> SELECT * FROM employee;
Before: Hive decides the ACL on its own.
After: Hive invokes the Hive Metastore Ranger security agent to get the ACL from Ranger.
HiveServer2
• Provides: authorization for the Metastore objects.
• Provides: ACLs that stay in sync with the SQL objects at all times.
• Use case:
hive> GRANT SELECT ON TABLE employee TO USER hr1;
hive> ALTER TABLE employee RENAME TO employees;
Before: no change to the Ranger policy for user hr1 on table employee.
After: the Ranger policy for hr1 is changed to apply to employees.
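The GRANT and RENAME use case above can be illustrated with a minimal sketch. `PolicyStore` and its methods are hypothetical names for a toy in-memory store, not part of Ranger or Hive; the point is only that a rename handler moves the grants and a drop handler removes them, so policies neither go out of sync nor become orphaned.

```python
class PolicyStore:
    """Toy in-memory policy store keyed by table name (hypothetical, not Ranger's API)."""

    def __init__(self):
        self._grants = {}  # table name -> {user: set of privileges}

    def grant(self, table, user, privilege):
        self._grants.setdefault(table, {}).setdefault(user, set()).add(privilege)

    def privileges(self, table, user):
        return set(self._grants.get(table, {}).get(user, set()))

    def on_rename_table(self, old_name, new_name):
        # Move the policies so they keep following the renamed object.
        if old_name in self._grants:
            self._grants[new_name] = self._grants.pop(old_name)

    def on_drop_table(self, table):
        # Remove the policies so none are left orphaned.
        self._grants.pop(table, None)


store = PolicyStore()
store.grant("employee", "hr1", "select")        # GRANT SELECT ON TABLE employee TO USER hr1
store.on_rename_table("employee", "employees")  # ALTER TABLE employee RENAME TO employees
print(store.privileges("employees", "hr1"))  # {'select'}
print(store.privileges("employee", "hr1"))   # set()
```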
We Introduce: The New Hive Metastore Ranger Security Agent (cont'd)
Resource ACL sync-up
• Provides: consistent access control between Hive and HDFS for the SQL standard-based privilege model.
• Use case:
beeline> CREATE TABLE employee(name STRING); -- run by user hr1
beeline> LOAD DATA LOCAL INPATH '/data/input.txt' OVERWRITE INTO TABLE employee;
pig> A = LOAD '/user/hive/warehouse/employee' USING PigStorage() AS (name:chararray);
Before: the Pig read by user hr1 is not allowed.
After: the Pig read by user hr1 is allowed.
Ranger Hive Metastore Security Workflow – Hive CLI
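[Figure: Hive CLI workflow. Components: Ranger Policy Manager, Hive CLI with the Ranger Metastore Agents, user applications, and the Ranger Audit Database. Steps are numbered 1 to 5 on the slide.]
• Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
• Users access Hive data through an application that invokes Hive CLI; IT/analysis users access Hive data through the CLI.
• Hive CLI uses the Ranger Metastore Agents for authorization and for policy object sync from DDL.
• Hive CLI provides table data access to the user/client.
• Audit logs are pushed to the Ranger Audit Database.
• Policy refreshing: the agents refresh their policies from the Policy Manager.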
Ranger Hive Metastore Security Workflow – HiveServer2
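[Figure: HiveServer2 workflow with the Metastore agents. Components: Ranger Policy Manager, HiveServer2 with the Ranger HiveServer2 Agent and the Ranger Metastore Agents, user applications, Beeline, and the Ranger Audit Database. Steps are numbered 1 to 6 on the slide.]
• Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
• Users access Hive data through an application that connects to HiveServer2; IT/analysis users access HiveServer2 through Beeline.
• HiveServer2 uses the Ranger Agent for authorization.
• HiveServer2 uses the Ranger Metastore agent for ACL object sync on DDL.
• HiveServer2 provides table data access to the user/client.
• Audit logs are pushed to the Ranger Audit Database.
• Policy refreshing: the agents refresh their policies from the Policy Manager.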
Metastore Security Workflow – HDFS ACL Sync (Ongoing)
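[Figure: HDFS ACL sync workflow (ongoing work). Components: Ranger Policy Manager, HiveServer2, Ranger Metastore Agents, Ranger HDFS Agent in the HDFS NameNode, and Pig. Steps are numbered 1 to 6 on the slide.]
• Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
• IT/analysis user Joe creates table t1 through HiveServer2.
• HiveServer2 passes Hive metadata to the Metastore Agents.
• The Metastore Agents pass HDFS security info to the Policy Manager, setting a new HDFS policy for Joe on /user/hive/warehouse/t1.
• Policy refreshing: the Ranger HDFS Agent picks up the new policy.
• Joe uses Pig to read Hive data in /user/hive/warehouse/t1; HDFS uses the Agent for authorization.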
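As a rough sketch of the sync step on this slide, the code below derives an HDFS path policy from a table's Hive privileges and then checks a read against it. The privilege mapping in `HIVE_TO_HDFS`, the policy dictionary layout, and both function names are assumptions made for illustration; the slides do not specify how Hive privileges translate to HDFS access types.

```python
import posixpath

WAREHOUSE_ROOT = "/user/hive/warehouse"  # warehouse path shown on the slide

# Assumed mapping from Hive privileges to HDFS access types (illustrative only).
HIVE_TO_HDFS = {
    "select": {"read", "execute"},
    "insert": {"write", "execute"},
    "all": {"read", "write", "execute"},
}


def hdfs_policy_for_table(table, user, hive_privileges, location=None):
    """Build a hypothetical HDFS policy dict for a table's storage directory."""
    path = location or posixpath.join(WAREHOUSE_ROOT, table)
    access = set()
    for privilege in hive_privileges:
        access |= HIVE_TO_HDFS.get(privilege, set())
    return {"path": path, "user": user, "access": access, "recursive": True}


def hdfs_allows(policies, user, path, access_type):
    """Check a path against recursive path policies, as an HDFS-side agent might."""
    for policy in policies:
        base = policy["path"]
        under = path == base or path.startswith(base + "/")
        if policy["user"] == user and under and access_type in policy["access"]:
            return True
    return False


# Joe creates table t1, so the sync step derives an HDFS policy for him.
policies = [hdfs_policy_for_table("t1", "joe", ["all"])]
print(policies[0]["path"])  # /user/hive/warehouse/t1
print(hdfs_allows(policies, "joe", "/user/hive/warehouse/t1/part-00000", "read"))  # True
print(hdfs_allows(policies, "joe", "/user/hive/warehouse/t2", "read"))  # False
```

Matching on `base + "/"` rather than a bare prefix keeps a policy on `t1` from also covering a sibling directory such as `t10`.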
Hive Security Hooks and Their Ranger Implementation/Extensions
[Figure: class relationships between Hive hooks and their Ranger counterparts.]
• RangerHiveAuthorizer implements Hive's HiveAuthorizer.
• RangerHiveMetastoreAuthorizer extends Hive's MetaStorePreEventListener.
• RangerHiveMetastorePrivilegeHandler extends Hive's MetaStoreEventListener.
Ranger Implementation/Extensions of Hive Security Hooks
• RangerHiveAuthorizer
  o Existing Ranger Hive Agent
  o Methods: check/grant/revokePrivileges
  o Handles: HiveServer2 authorization; Grant/Revoke
• RangerHiveMetastoreAuthorizer
  o New Ranger Hive Metastore Agent
  o Methods: on(Create/Drop/Alter)(Table/Database/Index/…)
  o Handles: CLI authorization
• RangerHiveMetastorePrivilegeHandler
  o New Ranger Hive Metastore Agent
  o Methods: (create/drop/alter)(Table/Database/Index/…)
  o Handles: Sync of Hive ACL objects and Resource ACLs
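The division of labor between the two new metastore hooks can be sketched as a pre-event listener that may veto a DDL and a post-event listener that cleans up policies once the DDL has succeeded. The Python below is a language-neutral toy, not Hive's Java listener API: `ToyMetastore`, `PreEventAuthorizer`, `PostEventPrivilegeHandler`, and their method names are hypothetical stand-ins for the roles listed above.

```python
class AccessDenied(Exception):
    """Raised by the pre-event hook to veto an operation."""


class PreEventAuthorizer:
    """Runs before the metastore applies a DDL; may veto it (authorization role)."""

    def __init__(self, allowed):
        self.allowed = allowed  # set of (user, operation, table) tuples

    def on_event(self, user, operation, table):
        if (user, operation, table) not in self.allowed:
            raise AccessDenied(f"{user} may not {operation} {table}")


class PostEventPrivilegeHandler:
    """Runs after a DDL succeeds; keeps policies in sync with the objects."""

    def __init__(self, policies):
        self.policies = policies  # table name -> set of users with a policy

    def on_drop_table(self, table):
        self.policies.pop(table, None)


class ToyMetastore:
    """Calls the pre-listener, applies the DDL, then calls the post-listener."""

    def __init__(self, pre, post):
        self.tables, self.pre, self.post = set(), pre, post

    def drop_table(self, user, table):
        self.pre.on_event(user, "drop", table)  # veto point: raises before any change
        self.tables.discard(table)
        self.post.on_drop_table(table)          # sync point: runs only after success


policies = {"employee": {"hr1"}}
metastore = ToyMetastore(
    PreEventAuthorizer({("admin", "drop", "employee")}),
    PostEventPrivilegeHandler(policies),
)
metastore.tables.add("employee")
try:
    metastore.drop_table("hr1", "employee")
except AccessDenied as err:
    print(err)  # hr1 may not drop employee
metastore.drop_table("admin", "employee")
print(policies)  # {}
```

Because the veto happens before any state changes, a denied drop leaves both the table and its policies untouched.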
Status, Future Plan and References
• Patch ready:
  o CLI access control
  o Policy object sync from DDL
• Ongoing work:
  o Resource ACL sync
• References:
  o https://issues.apache.org/jira/browse/RANGER-768
  o https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization
  o https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server
Demo
• Software versions: Ranger 6.0 + Hadoop 2.7.0 + Hive 1.2.1
• Test cases:
  o With the Ranger HiveServer2 Agent but without the Ranger Hive Metastore Security Agents:
    • CLI: SQL is not subject to Ranger ACLs.
    • HiveServer2: no object sync of Ranger ACLs as a result of SQL DDL.
  o With the Ranger HiveServer2 Agent and the Ranger Hive Metastore Security Agents:
    • CLI: SQL is subject to Ranger ACLs.
    • HiveServer2: object sync of Ranger ACLs as a result of SQL DDL.
Q & A

Apache Ranger Hive Metastore Security

  • 1. © 2016 IBM CorporationHadoop Summit – San Jose 2016Hadoop Summit – San Jose 2015 Apache Ranger Hive Metastore Security Yan Zhou (zhouya@us.ibm.com), Tanping Wang(wangta@us.ibm.com) IBM Big Insights Product Lead Architects, Silicon Valley Lab, IBM
  • 2. © 2016 IBM Corporation2 Hadoop Summit – San Jose, CA – June 2015Hadoop Summit – San Jose, CA – June 2016 Apache Ranger  Provides centralized policy definition for authorizing & auditing access to resources in a consistent manner.  Agent AgentAgent AgentAgent Agent HBase Hive YARN Knox Storm Solr Kafka Agent HDFS Agent Audit Server Policy Server Administration Portal REST APIs DB SOLR HDFS KMS LDAP/AD user/group syncLog4j
  • 3. © 2016 IBM Corporation3 Hadoop Summit – San Jose, CA – June 2015Hadoop Summit – San Jose, CA – June 2016 HiveServer2 Ranger Authorization Model Ranger Policy Manager HiveServer2 Ranger Agent Admin sets policies for Hive Databases/Tables/Columns … User Application Users access Hive data through application HiveServer2 IT/Analysis users access HiveServer2 through Beeline Hiveserver2 uses Agent for Authorization Ranger Audit Database Audit logs pushed to DB HiveServer2 provides table data access to user/client 1 2 2 3 4 5 Policy Refreshing
  • 4. Motivation: Gaps in the Current Hive Ranger Authorization Model
      Hive CLI
          DO NOT: Hive CLI does not work with Ranger.
      HiveServer2
          DO:
              • Provides ACLs for databases, tables, columns and locks.
              • Supports Ranger policy creation or deletion from Hive GRANT or REVOKE statements.
          DO NOT: Does not adjust Hive-created policies as a result of DDL:
              • Once a DB object is renamed by DDL, the Hive-created policy in Ranger is out of sync.
              • Once a DB object is deleted, the Hive-created policy in Ranger becomes orphaned.
  • 5. Motivation: Gaps in the Current HiveServer2 Ranger Authorization Model (cont'd)
      Resource ACL Sync Up
      Storage-based Authorization
          GOOD: Consistent access controls across Hive and HDFS.
          NOT GOOD: Cannot control SQL data access at finer granularity, such as per COLUMN.
      SQL Standard-based Authorization
          GOOD: Fits well with the SQL standard privilege model.
          NOT GOOD: Does not provide consistent privileges across Hive and HDFS, and potentially forbids sharing Hive data with other Hadoop apps.
      Consistent privilege control needs a holistic view of the HDFS and Hive ACLs.
  • 6. We Introduce: The New Hive Metastore Ranger Security Agent
      Hive CLI
          Provides:
              • ACLs for Hive CLI.
          Use case:
              hive> SELECT * FROM employee;
              Before: Hive decides the ACL on its own.
              After: Hive invokes the Hive Metastore Ranger security agent to get the ACL from Ranger.
      HiveServer2
          Provides:
              • Authorization for the Metastore objects.
              • ACLs that stay in sync with the SQL objects at all times.
          Use case:
              hive> GRANT SELECT on table employee to user hr1;
              hive> ALTER TABLE employee RENAME TO employees;
              Before: No change to the Ranger policy for user hr1 on table employee.
              After: The Ranger policy for hr1 is changed to apply to employees.
  • 7. We Introduce: The New Hive Metastore Ranger Security Agent (cont'd)
      Resource ACL Sync Up
          Provides:
              • Consistent access control between Hive and HDFS for the SQL-standard based privilege model.
          Use case:
              beeline> CREATE TABLE employee(name STRING); // by user "hr1"
              beeline> LOAD DATA LOCAL INPATH '/data/input.txt' OVERWRITE INTO TABLE employee;
              pig> LOAD '/user/hive/warehouse/employee' USING PigStorage() AS (name:chararray)
              Before: the Pig load is not allowed for user hr1.
              After: the Pig load is allowed for user hr1.
  • 8. Ranger Hive Metastore Security Workflow – Hive CLI
      [Workflow diagram, steps numbered 1 to 5 on the slide. Labels:]
      • Ranger Policy Manager: admin sets policies for Hive databases/tables/columns.
      • Users access Hive data through an application invoking Hive CLI; IT/analysis users access Hive data through the CLI.
      • Hive CLI uses the Ranger Metastore Agents for authorization and for policy object sync from DDL.
      • Audit logs are pushed to the Ranger Audit Database.
      • Hive CLI provides table data access to the user/client.
      • Policy refreshing.
  • 9. Ranger Hive Metastore Security Workflow – HiveServer2
      [Workflow diagram, steps numbered 1 to 6 on the slide. Labels:]
      • Ranger Policy Manager: admin sets policies for Hive databases/tables/columns.
      • Users access Hive data through an application via HiveServer2; IT/analysis users access HiveServer2 through Beeline.
      • HiveServer2 uses the Ranger HiveServer2 Agent for authorization.
      • HiveServer2 uses the Ranger Metastore Agents for ACL object sync on DDL.
      • Audit logs are pushed to the Ranger Audit Database.
      • HiveServer2 provides table data access to the user/client.
      • Policy refreshing.
  • 10. Metastore Security Workflow – HDFS ACL Sync (Ongoing)
      [Workflow diagram, steps numbered 1 to 6 on the slide. Labels:]
      • Ranger Policy Manager: admin sets policies for Hive databases/tables/columns.
      • IT/analysis user Joe: create table t1 (through HiveServer2).
      • HiveServer2 passes Hive metadata to the Ranger Metastore Agents.
      • Ranger Metastore Agents: set a new HDFS policy for Joe on /user/hive/warehouse/t1 and pass the HDFS security info to the Policy Manager.
      • Policy refreshing.
      • HDFS NameNode with Ranger HDFS Agent: HDFS uses the agent for authorization.
      • Joe uses Pig to read Hive data in /user/hive/warehouse/t1.
  • 11. Hive Security Hooks and Their Ranger Implementation/Extensions
      [Class diagram. Hive hooks and their Ranger counterparts:]
      • RangerHiveAuthorizer implements HiveAuthorizer.
      • RangerHiveMetastoreAuthorizer extends MetaStorePreEventListener.
      • RangerHiveMetastorePrivilegeHandler extends MetaStoreEventListener.
  • 12. Ranger Implementation/Extensions of Hive Security Hooks
      • RangerHiveAuthorizer
          o Existing Ranger Hive Agent
          o Methods: check/grant/revokePrivileges
          o Handles: HiveServer2 authorization; Grant/Revoke
      • RangerHiveMetastoreAuthorizer
          o New Ranger Hive Metastore Agent
          o Methods: on(Create/Drop/Alter)(Table/Database/Index/…)
          o Handles: CLI authorization
      • RangerHiveMetastorePrivilegeHandler
          o New Ranger Hive Metastore Agent
          o Methods: (create/drop/alter)(Table/Database/Index/…)
          o Handles: Sync of Hive ACL objects and resource ACLs
  • 13. Status, Future Plan and References
      • Patch ready:
          o CLI access control
          o Policy object sync from DDL
      • Ongoing work:
          o Resource ACL sync
      • References:
          o https://issues.apache.org/jira/browse/RANGER-768
          o https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization
          o https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server
  • 14. Demo
      • Software versions: Ranger 6.0 + Hadoop 2.7.0 + Hive 1.2.1
      • Test cases:
          o With the Ranger HiveServer2 Agent but without the Ranger Hive Metastore Security Agents:
              - CLI: SQL is not subject to Ranger ACLs.
              - HiveServer2: no object sync of Ranger ACLs as a result of SQL DDL.
          o With the Ranger HiveServer2 Agent and the Ranger Hive Metastore Security Agents:
              - CLI: SQL is subject to Ranger ACLs.
              - HiveServer2: object sync of Ranger ACLs as a result of SQL DDL.
  • 15. Q & A
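
The policy object sync from slide 6 (GRANT, then ALTER TABLE ... RENAME) and the orphaned-policy gap from slide 4 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Ranger or Hive code: `Policy`, `PolicyStore` and `MetastorePrivilegeHandler` are hypothetical stand-ins for a Ranger policy, the Ranger Policy Manager and a metastore event listener, and the database name `default` is assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical, simplified shape of a table-level policy."""
    database: str
    table: str
    user: str
    privileges: set = field(default_factory=set)


class PolicyStore:
    """Hypothetical in-memory stand-in for the policy manager."""

    def __init__(self):
        self.policies = []

    def grant(self, database, table, user, privilege):
        # Reuse an existing policy for the same (db, table, user) if present.
        for p in self.policies:
            if (p.database, p.table, p.user) == (database, table, user):
                p.privileges.add(privilege)
                return p
        p = Policy(database, table, user, {privilege})
        self.policies.append(p)
        return p

    def is_allowed(self, database, table, user, privilege):
        return any(
            (p.database, p.table, p.user) == (database, table, user)
            and privilege in p.privileges
            for p in self.policies
        )


class MetastorePrivilegeHandler:
    """Hypothetical listener that keeps policies in sync with DDL events."""

    def __init__(self, store):
        self.store = store

    def on_alter_table_rename(self, database, old_name, new_name):
        # Retarget every policy on the old table name to the new name.
        for p in self.store.policies:
            if p.database == database and p.table == old_name:
                p.table = new_name

    def on_drop_table(self, database, table):
        # Remove policies for the dropped table so none are left orphaned.
        self.store.policies = [
            p for p in self.store.policies
            if not (p.database == database and p.table == table)
        ]


if __name__ == "__main__":
    store = PolicyStore()
    handler = MetastorePrivilegeHandler(store)

    # GRANT SELECT on table employee to user hr1;
    store.grant("default", "employee", "hr1", "select")
    # ALTER TABLE employee RENAME TO employees;
    handler.on_alter_table_rename("default", "employee", "employees")

    print(store.is_allowed("default", "employees", "hr1", "select"))  # True
    print(store.is_allowed("default", "employee", "hr1", "select"))   # False
```

Without the handler call, the policy would still name `employee` after the rename, which is the "out of sync" case from slide 4; calling `on_drop_table` covers the orphaned-policy case in the same way.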