1. © 2016 IBM Corporation, Hadoop Summit – San Jose 2016
Apache Ranger Hive Metastore Security
Yan Zhou (zhouya@us.ibm.com),
Tanping Wang (wangta@us.ibm.com)
IBM BigInsights Product Lead Architects, Silicon Valley Lab, IBM
2. © 2016 IBM Corporation, Hadoop Summit – San Jose, CA – June 2016
Apache Ranger
Provides centralized policy definition for authorizing and auditing access to resources in a consistent manner.
[Architecture diagram. Components shown:
• Ranger agents embedded in HDFS, HBase, Hive, YARN, Knox, Storm, Solr, and Kafka
• Policy Server, Audit Server, Administration Portal, REST APIs
• DB, SOLR, HDFS, KMS, Log4j
• LDAP/AD user/group sync]
3. HiveServer2 Ranger Authorization Model
[Workflow diagram. Components shown: Ranger Policy Manager, HiveServer2 with its Ranger Agent, Ranger Audit Database]
(1) Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
(2) Users access Hive data through an application that connects to HiveServer2; IT/analysis users access HiveServer2 through Beeline.
(3) HiveServer2 uses the Ranger Agent for authorization.
(4) Audit logs are pushed to the Ranger Audit Database.
(5) HiveServer2 provides table data access to the user/client.
Policy refreshing: the agent refreshes its policies from the Ranger Policy Manager.
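The authorization workflow above can be sketched in a few lines of Python. This is a minimal illustrative model, not the Ranger plugin API: the Policy and Agent classes, their fields, and the in-memory audit list are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy: one user, one table, a set of allowed actions."""
    user: str
    database: str
    table: str
    actions: frozenset


@dataclass
class Agent:
    """Stand-in for the Ranger agent inside HiveServer2 (not the real plugin API)."""
    policies: list = field(default_factory=list)   # local policy cache
    audit_log: list = field(default_factory=list)  # stand-in for the audit database

    def refresh(self, manager_policies):
        # "Policy refreshing": replace the local cache with the manager's policies.
        self.policies = list(manager_policies)

    def is_allowed(self, user, database, table, action):
        allowed = any(
            p.user == user and p.database == database
            and p.table == table and action in p.actions
            for p in self.policies
        )
        # Every decision is audited, whether access was granted or denied.
        self.audit_log.append((user, f"{database}.{table}", action, allowed))
        return allowed


# Step 1: the admin defines a policy in the policy manager.
manager_policies = [Policy("hr1", "default", "employee", frozenset({"select"}))]

agent = Agent()
agent.refresh(manager_policies)

# Steps 2-4: requests arrive, the agent decides, and each decision is audited.
granted = agent.is_allowed("hr1", "default", "employee", "select")
denied = agent.is_allowed("hr1", "default", "employee", "drop")
```

Only a request that the agent allows would go on to step 5, where HiveServer2 returns table data.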
4. Motivation: Gaps in the Current Hive Ranger Authorization Model
Hive CLI
  DOES: (nothing)
  DOES NOT: Work with Ranger.
HiveServer2
  DOES:
  • Provides ACLs for databases, tables, columns, and locks.
  • Supports Ranger policy creation or deletion from Hive GRANT or REVOKE statements.
  DOES NOT: Adjust Hive-created policies as a result of DDL statements:
  • Once a DB object is renamed by a DDL statement, the Hive-created policy in Ranger is out of sync.
  • Once a DB object is deleted, the Hive-created policy in Ranger becomes an orphan.
5. Motivation: Gaps in the Current HiveServer2 Ranger Authorization Model (cont'd)
Resource ACL Sync Up
Storage-based Authorization
  GOOD: Consistent access controls across Hive and HDFS.
  NOT GOOD: Cannot control SQL data access at finer granularity, such as the column level.
SQL Standard-based Authorization
  GOOD: Fits well with the SQL standard privilege model.
  NOT GOOD: Does not provide consistent privileges across Hive and HDFS, and potentially prevents sharing Hive data with other Hadoop applications.
Consistent privilege control needs a holistic view of the HDFS and Hive ACLs.
6. We Introduce: The New Hive Metastore Ranger Security Agent
Hive CLI
  Provides:
  • ACLs for Hive CLI
  Use case:
    hive> SELECT * FROM employee;
  Before: Hive decides the ACL on its own.
  After: Hive invokes the Hive Metastore Ranger security agent to get the ACL from Ranger.
HiveServer2
  Provides:
  • Authorization for the Metastore objects
  • ACLs that stay in sync with the SQL objects at all times
  Use case:
    hive> GRANT SELECT ON TABLE employee TO USER hr1;
    hive> ALTER TABLE employee RENAME TO employees;
  Before: The Ranger policy for user hr1 on table employee is not changed.
  After: The Ranger policy for hr1 is changed to apply to employees.
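The rename use case above, together with the orphan-policy gap from the motivation slide, can be sketched as follows. This is an illustrative model only: the dictionary layout of a policy and the two handler function names are assumptions, not Ranger's data model or the patch's code.

```python
def on_alter_table_rename(policies, database, old_name, new_name):
    """Return the policies with the renamed table's resource updated."""
    return [
        {**p, "table": new_name}
        if p["database"] == database and p["table"] == old_name
        else p
        for p in policies
    ]


def on_drop_table(policies, database, name):
    """Return the policies without those of the dropped table, so none is orphaned."""
    return [
        p for p in policies
        if not (p["database"] == database and p["table"] == name)
    ]


# GRANT SELECT ON TABLE employee TO USER hr1;
policies = [
    {"user": "hr1", "database": "default", "table": "employee", "actions": {"select"}}
]

# ALTER TABLE employee RENAME TO employees;
policies = on_alter_table_rename(policies, "default", "employee", "employees")
renamed_tables = [p["table"] for p in policies]

# DROP TABLE employees;
after_drop = on_drop_table(policies, "default", "employees")
```

Both handlers return new lists instead of mutating their input, which keeps the before and after states easy to compare.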
7. We Introduce: The New Hive Metastore Ranger Security Agent (cont'd)
Resource ACL Sync Up
  Provides:
  • Consistent access control between Hive and HDFS for the SQL standard-based privilege model
  Use case:
    beeline> CREATE TABLE employee(name STRING);  -- run by user hr1
    beeline> LOAD DATA LOCAL INPATH '/data/input.txt' OVERWRITE INTO TABLE employee;
    pig> emp = LOAD '/user/hive/warehouse/employee' USING PigStorage() AS (name:chararray);
  Before: The Pig read by user hr1 is not allowed.
  After: The Pig read by user hr1 is allowed.
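One way to picture the resource ACL sync is as a function that derives an HDFS path policy from a table-creation event. The sketch below is hypothetical: the policy fields and function names are invented for illustration, and only the warehouse path comes from the slide. It relies on Hive's default layout, in which tables of the default database sit directly under the warehouse directory and other databases use a <database>.db subdirectory.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # path taken from the slide's example


def table_location(database, table, warehouse_dir=WAREHOUSE_DIR):
    """Default location of a managed table under the Hive warehouse directory."""
    if database == "default":
        return posixpath.join(warehouse_dir, table)
    return posixpath.join(warehouse_dir, f"{database}.db", table)


def hdfs_policy_for_new_table(owner, database, table):
    """Hypothetical HDFS policy mirroring the creator's access to the table's files."""
    return {
        "service": "hdfs",
        "path": table_location(database, table),
        "recursive": True,
        "user": owner,
        "permissions": {"read", "write", "execute"},
    }


# CREATE TABLE employee(name STRING);  -- run by user hr1
policy = hdfs_policy_for_new_table("hr1", "default", "employee")
```

With such a policy in place, the HDFS agent would let hr1 read the table's files from Pig, which is the "After" behaviour in the use case.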
8. Ranger Hive Metastore Security Workflow – Hive CLI
[Workflow diagram. Components shown: Ranger Policy Manager, Hive CLI, Ranger Metastore Agents, Ranger Audit Database]
(1) Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
(2) Users access Hive data through an application invoking Hive CLI; IT/analysis users access Hive data through the CLI.
(3) Hive CLI uses the Ranger Metastore Agents for authorization and for policy object sync from DDL.
(4) Audit logs are pushed to the Ranger Audit Database.
(5) Hive CLI provides table data access to the user/client.
Policy refreshing: the agents refresh their policies from the Ranger Policy Manager.
9. Ranger Hive Metastore Security Workflow – HiveServer2
[Workflow diagram. Components shown: Ranger Policy Manager, HiveServer2, Ranger HiveServer2 Agent, Ranger Metastore Agents, Ranger Audit Database]
(1) Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
(2) Users access Hive data through an application that connects to HiveServer2; IT/analysis users access HiveServer2 through Beeline.
(3) HiveServer2 uses the Ranger HiveServer2 Agent for authorization.
(4) HiveServer2 uses the Ranger Metastore Agents for ACL object sync on DDL.
(5) Audit logs are pushed to the Ranger Audit Database.
(6) HiveServer2 provides table data access to the user/client.
Policy refreshing: the agents refresh their policies from the Ranger Policy Manager.
10. Metastore Security Workflow – HDFS ACL Sync (Ongoing)
[Workflow diagram. Components shown: Ranger Policy Manager, HiveServer2, Ranger Metastore Agents, Ranger HDFS Agent, HDFS NameNode, Pig]
• Admin sets policies for Hive databases/tables/columns … in the Ranger Policy Manager.
• IT/analysis user Joe creates table t1 through HiveServer2.
• HiveServer2 passes Hive metadata to the Ranger Metastore Agents.
• The Ranger Metastore Agents pass HDFS security info to the Policy Manager.
• The Policy Manager sets a new HDFS policy for Joe on /user/hive/warehouse/t1 (policy refreshing).
• Joe uses Pig to read Hive data in /user/hive/warehouse/t1.
• HDFS uses the Ranger HDFS Agent for authorization.
11. Hive Security Hooks and Their Ranger Implementation/Extensions
Hive hook                    Relationship   Ranger class
HiveAuthorizer               implements     RangerHiveAuthorizer
MetaStorePreEventListener    extends        RangerHiveMetastoreAuthorizer
MetaStoreEventListener       extends        RangerHiveMetastorePrivilegeHandler
12. Ranger Implementation/Extensions of Hive Security Hooks
RangerHiveAuthorizer
• Existing Ranger Hive Agent
• Methods: check/grant/revokePrivileges
• Handles: HiveServer2 authorization; Grant/Revoke
RangerHiveMetastoreAuthorizer
• New Ranger Hive Metastore Agent
• Methods: on(Create/Drop/Alter)(Table/Database/Index/…)
• Handles: CLI authorization
RangerHiveMetastorePrivilegeHandler
• New Ranger Hive Metastore Agent
• Methods: (create/drop/alter)(Table/Database/Index/…)
• Handles: Sync of Hive ACL objects and resource ACLs
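The split between the two new metastore hooks can be illustrated with a small Python model: a pre-event listener that can veto an operation before the metadata changes, and a post-event listener that syncs policies after the change succeeds. The class and method names below are stand-ins chosen for the sketch, not the Hive or Ranger Java APIs.

```python
class AccessDenied(Exception):
    """Raised when the pre-event check rejects an operation."""


class PreEventAuthorizer:
    """Plays the role the slide gives RangerHiveMetastoreAuthorizer."""

    def __init__(self, allowed):
        self.allowed = allowed  # set of (user, operation, table) triples

    def on_event(self, user, operation, table):
        if (user, operation, table) not in self.allowed:
            raise AccessDenied(f"{user} may not {operation} {table}")


class PostEventPrivilegeHandler:
    """Plays the role the slide gives RangerHiveMetastorePrivilegeHandler."""

    def __init__(self):
        self.synced = []  # record of policy sync actions taken

    def on_drop_table(self, table):
        self.synced.append(("drop", table))


class Metastore:
    """Toy metastore that calls the listeners around a DROP TABLE."""

    def __init__(self, tables, pre_listener, post_listener):
        self.tables = set(tables)
        self.pre = pre_listener
        self.post = post_listener

    def drop_table(self, user, table):
        self.pre.on_event(user, "drop", table)  # may raise; nothing changed yet
        self.tables.discard(table)              # the actual metadata change
        self.post.on_drop_table(table)          # runs only after a successful change


handler = PostEventPrivilegeHandler()
metastore = Metastore(
    {"employee"}, PreEventAuthorizer({("admin", "drop", "employee")}), handler
)

try:
    metastore.drop_table("hr1", "employee")
    denied = False
except AccessDenied:
    denied = True
tables_after_denial = set(metastore.tables)

metastore.drop_table("admin", "employee")
```

Because the authorization check runs before the change and the sync runs after it, a rejected operation leaves both the metadata and the policies untouched.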
13. Status, Future Plan and References
Patch Ready:
o CLI access control
o Policy Object Sync from DDL
Ongoing Work:
o Resource ACL Sync
References:
o https://issues.apache.org/jira/browse/RANGER-768
o https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization
o https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server
14. Demo
Software Versions: Ranger 6.0 + Hadoop 2.7.0 + Hive 1.2.1
Test Cases:
With the Ranger HiveServer2 Agent but without the Ranger Hive Metastore Security Agents
• CLI: SQL is not subject to Ranger ACLs
• HiveServer2: No object sync of Ranger ACLs as a result of SQL DDL
With the Ranger HiveServer2 Agent and the Ranger Hive Metastore Security Agents
• CLI: SQL is subject to Ranger ACLs
• HiveServer2: Object sync of Ranger ACLs as a result of SQL DDL
15. Q & A