Fine-Grained Security for Spark and Hive
- 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Fine-Grained Security
for Spark and Hive
Carter Shanklin - Director PM
Don Bosco Durai - Security Architect
June 29, 2016
- 2.
Agenda
● Current security options and challenges
● Apache Ranger Overview
● LLAP Overview
● Use Cases and Demo
● Apache Atlas Integration
- 3.
Current Options and Challenges
- 4.
Current Options and Challenges
⬢ Limited to storage-level access control for Spark, Pig and MR
⬢ Column-level access via HiveServer2
⬢ Row-level filtering needs Hive views
– Multiple Hive views need to be created and managed
– Explicit permissions need to be given for each view/user
– Users need to know which view to use
⬢ Masking needs custom UDFs
– These need to be wrapped in views
- 5.
Apache Ranger Overview
- 6.
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently
• Enforce policies within each component
Auditing
• Central audit location for all access requests
• Support for multiple destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS TDE
• Integration with HSM
- 9.
Ranger Architecture
[Architecture diagram. Components shown: Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server, and an Integration API, plus Ranger plugins embedded in HDFS, HBase, HiveServer2, Knox, NiFi, Solr, Kafka, YARN, Storm, and Atlas. Other labels: Enterprise Users, Hadoop Components, Legacy Tools and Data Governance, RDBMS.]
- 10.
Ranger Audits - Data Access
- 11.
Ranger Audits - Admin Actions
- 12.
LLAP Overview
- 13.
Hive 2.0 and LLAP
⬢ At a High Level:
– 2000+ features, improvements and bug fixes in Hive since HDP 2.4.
– 600+ of these from outside of Hortonworks.
⬢ Major Improvements:
– Preview: Hive LLAP: Persistent query servers with intelligent in-memory caching.
– ACID GA: Hardened and proven at scale.
– Expanded SQL Compliance: More capable integration with BI tools.
– Performance: Interactive query, 2x faster ETL.
– Security: Row / Column security extending to views, Column level security for Spark.
- 14.
Hive 2 with LLAP: Architecture Overview
- 15.
Hive 2 with LLAP: Open Interfaces
- 16.
Ranger Integration with Hive and LLAP
- 17.
Hive / LLAP Security Capabilities with Ranger
⬢ Ranger Hive plugin provides authorization / access controls.
⬢ Column Masking:
– Inject Hive UDFs that mask characters or hash values.
– Dynamic, per-user.
⬢ Dynamic Row Filtering:
– Query is analyzed and policies applied.
– Dynamic, per-user.
⬢ All operations run as ordinary SQL queries:
– Masking policies become expressions in the SQL SELECT clause.
– Row filters become predicates in the SQL WHERE clause.
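Here is a minimal sketch of that rewrite. It assumes a simple policy shape of our own: each masked column maps to a SQL expression template, and a user may have one row-filter predicate. `UserPolicy` and `rewrite_query` are hypothetical names and not Ranger or Hive APIs. Hive's actual rewrite may also differ in form, for example by wrapping the table reference in a subquery instead of editing the outer query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserPolicy:
    """Hypothetical per-user policy: column masks plus an optional row filter."""
    # Column name -> SQL expression template; "{col}" is replaced by the column name.
    masks: Dict[str, str] = field(default_factory=dict)
    # Boolean SQL predicate placed in the WHERE clause, or None for no filtering.
    row_filter: Optional[str] = None


def rewrite_query(table: str, columns: List[str], policy: UserPolicy) -> str:
    """Build a plain SQL query with masks in SELECT and the row filter in WHERE."""
    select_items = []
    for col in columns:
        template = policy.masks.get(col)
        if template is None:
            select_items.append(col)  # unmasked column passes through unchanged
        else:
            # Alias back to the original name so the result schema looks the same.
            select_items.append(f"{template.format(col=col)} AS {col}")
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    if policy.row_filter:
        sql += f" WHERE {policy.row_filter}"
    return sql


if __name__ == "__main__":
    example = UserPolicy(
        masks={"customer_ccn": "mask_show_last_n({col}, 4)"},
        row_filter="customer_state = 'CA'",
    )
    print(rewrite_query("users", ["customer_id", "customer_ccn"], example))
```

With the example policy, this prints `SELECT customer_id, mask_show_last_n(customer_ccn, 4) AS customer_ccn FROM users WHERE customer_state = 'CA'`. The user's query text never changes. Only the engine-side rewrite differs per user, which is why no per-user views are needed.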
- 18.
Native Hive Masking Capabilities
UDF | Purpose | Example Start | Example Result
mask | Convert letters to X/x and numbers to n. | 123 Fake St. | nnn Xxxx Xx.
mask_first_n | Mask only the first n characters. | 433-54-3937 | nnn-54-3937
mask_last_n | Mask only the last n characters. | 433-54-3937 | 433-54-nnnn
mask_show_first_n | Mask, showing only the first n characters. | 555-233-1234 | 555-nnn-nnnn
mask_show_last_n | Mask, showing only the last n characters. | 433-54-3937 | nnn-nn-3937
mask_hash | Produce a consistent hash of the field. | CA | 21f241cccaa5cfa33190f56ff1510e37
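The following pure-Python sketch re-implements the masking rules in the table to make them concrete. It is not Hive's code. It assumes the default replacement characters shown (X, x, n, with other characters unchanged) and a count of n = 4, which reproduces every example above. The real UDFs accept further optional arguments. The 32-hex-digit `mask_hash` result is consistent with an MD5 digest, so the sketch uses MD5. That is an assumption, and other Hive versions may use a different hash.

```python
import hashlib


def _mask_char(c: str) -> str:
    """Replace one ASCII character: upper -> X, lower -> x, digit -> n."""
    if "A" <= c <= "Z":
        return "X"
    if "a" <= c <= "z":
        return "x"
    if "0" <= c <= "9":
        return "n"
    return c  # spaces, punctuation and other characters are kept as-is


def mask(value: str) -> str:
    """Mask every character of the value."""
    return "".join(_mask_char(c) for c in value)


def mask_first_n(value: str, n: int = 4) -> str:
    """Mask only the first n characters."""
    n = max(n, 0)
    return mask(value[:n]) + value[n:]


def mask_last_n(value: str, n: int = 4) -> str:
    """Mask only the last n characters."""
    split = max(len(value) - max(n, 0), 0)
    return value[:split] + mask(value[split:])


def mask_show_first_n(value: str, n: int = 4) -> str:
    """Mask everything except the first n characters."""
    n = max(n, 0)
    return value[:n] + mask(value[n:])


def mask_show_last_n(value: str, n: int = 4) -> str:
    """Mask everything except the last n characters."""
    split = max(len(value) - max(n, 0), 0)
    return mask(value[:split]) + value[split:]


def mask_hash(value: str) -> str:
    # Assumption: MD5 hex digest (32 hex characters, like the table's example).
    return hashlib.md5(value.encode("utf-8")).hexdigest()
```

For example, `mask_first_n("433-54-3937")` returns `"nnn-54-3937"` and `mask_show_first_n("555-233-1234")` returns `"555-nnn-nnnn"`, matching the table. Computing split indices avoids Python's `value[:-0]` pitfall when n is 0.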
- 19.
Delivering Spark Security
- 20.
Key Features: Spark Column Security with LLAP
⬢ Fine-grained column-level access control for SparkSQL.
⬢ Fully dynamic policies per user. Doesn’t require views.
⬢ Use standard Ranger policies and tools to control access and masking policies.
Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies such as row filtering are applied.
3. Spark gets a modified query plan based on the dynamic security policy.
4. Spark reads data from LLAP. Filtering and masking are guaranteed by the LLAP server.
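The four steps can be simulated in a single process to show where enforcement happens. Everything in the sketch is hypothetical. `UserPolicy`, `Split`, `plan_splits` and `llap_read` are illustrative stand-ins, and the policy is a plain dict rather than a Ranger policy. In a real deployment, Spark, HiveServer2 and the LLAP daemons are separate processes. The sketch captures one property: the security decision is made when splits are planned. The filter and masks then travel with each split, so the reader (LLAP) applies them rather than Spark.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


def _identity(value: str) -> str:
    return value


@dataclass
class UserPolicy:
    """Hypothetical per-user policy resolved during authorization (step 2)."""
    row_filter: Callable[[Row], bool]
    masks: Dict[str, Callable[[str], str]]


@dataclass
class Split:
    """Hypothetical split: a data location plus the security-modified plan (step 3)."""
    rows: List[Row]      # stand-in for the data LLAP will read
    columns: List[str]   # projection requested by the SparkSQL query
    policy: UserPolicy   # filter and masks fixed when the split was created


def plan_splits(user: str, columns: List[str], partitions: List[List[Row]],
                policies: Dict[str, UserPolicy]) -> List[Split]:
    """Steps 1-3: authorize the user, then hand Spark splits with the modified plan."""
    policy = policies.get(user)
    if policy is None:
        raise PermissionError(f"no policy grants access to {user}")
    return [Split(part, columns, policy) for part in partitions]


def llap_read(split: Split) -> List[Row]:
    """Step 4: the reader filters and masks rows before they reach Spark."""
    result = []
    for row in split.rows:
        if not split.policy.row_filter(row):  # filter sees the full, unprojected row
            continue
        result.append({c: split.policy.masks.get(c, _identity)(row[c])
                       for c in split.columns})
    return result
```

Applying the filter before projection lets a policy filter on a column the query did not select, such as `customer_state`.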
- 21.
Example: Per-User Row Filtering by Region in SparkSQL
- 22.
Use Cases
- 23.
Demo Setup
⬢ Customer user and sales data stored in ORC (metadata in the Hive Metastore)
⬢ Data can be accessed via SparkSQL or HiveServer2
⬢ Marketing needs access to Sales and Users data for analytics
⬢ Fraud Investigation team needs access to data for fraud detection
⬢ Billing team needs access to Sales and Users data for billing
Tables
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id
Group | User
Fraud | frank
Marketing | mark
Billing | bill
- 24.
Use Case 1: Restricting Column Access
In this simple use case, certain groups or users don’t have permission to view certain columns.
⬢ Billing group has access to all columns in the Users table
⬢ Marketing group can’t access the credit card column in the Users table
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
User | customer_phone | customer_ccn
bill (Billing) | 😀 | 😀
mark (Marketing) | 😀 | 😡
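The decision in the table can be expressed as a small allow-list check. This is a sketch under assumptions. The group mapping and column sets come from the demo. `authorize_columns`, `AuditEvent`, and the rule that a query touching any disallowed column is rejected as a whole are illustrative choices, not Ranger's policy engine. The sketch records one audit event per column, in the spirit of Ranger's access audit log.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

USERS_COLUMNS: Set[str] = {
    "customer_id", "customer_name", "customer_email", "customer_phone",
    "customer_ccn", "customer_state", "customer_zip",
}
USER_GROUPS: Dict[str, str] = {"bill": "Billing", "mark": "Marketing", "frank": "Fraud"}

# Hypothetical allow-lists: table -> group -> readable columns. Anything else is denied.
ALLOWED_COLUMNS: Dict[str, Dict[str, Set[str]]] = {
    "users": {
        "Billing": USERS_COLUMNS,
        "Marketing": USERS_COLUMNS - {"customer_ccn"},
    },
}


@dataclass
class AuditEvent:
    user: str
    resource: str  # "table/column"
    allowed: bool


def authorize_columns(user: str, table: str, columns: List[str],
                      audit_log: List[AuditEvent]) -> bool:
    """Return True only if every requested column is allowed; audit each check."""
    group = USER_GROUPS.get(user, "")
    allowed_cols = ALLOWED_COLUMNS.get(table, {}).get(group, set())
    all_allowed = True
    for col in columns:
        ok = col in allowed_cols
        audit_log.append(AuditEvent(user, f"{table}/{col}", ok))
        all_allowed = all_allowed and ok
    return all_allowed
```

With this policy, `mark` selecting `customer_phone` and `customer_ccn` is refused. `mark` selecting only `customer_phone` succeeds, and `bill` may read both.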
- 25.
Ranger Policy - Restrict Columns
- 26.
Ranger Policy - Restrict Columns - Results
[Screenshots: query results for bill (Billing) and mark (Marketing)]
- 27.
Ranger Policy - Restrict Columns - Audit Screen
- 28.
Use Case 2: Column Masking
In this use case, certain groups or users can’t see the real values of certain columns.
⬢ Billing group can see the real/raw values for all columns in the Users table
⬢ Fraud group can only see masked values of PII and PCI fields in the Users table
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
User | customer_email, customer_phone, customer_ccn
bill (Billing) | 😀
frank (Fraud) | 😎
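A per-group masking policy can be sketched as a map from column to masking function, applied to each result row. The slides do not say which mask the demo applies to each column. Hashing the email and showing only the last four characters of the phone and card number are assumptions, as is MD5 as the hash. `MASKS` and `apply_masks` are hypothetical names. The sample row in the usage note is made up.

```python
import hashlib
from typing import Callable, Dict

Row = Dict[str, str]


def _mask(value: str) -> str:
    # Hive-style default mask: letters -> X/x, digits -> n, other characters kept.
    out = []
    for c in value:
        if "A" <= c <= "Z":
            out.append("X")
        elif "a" <= c <= "z":
            out.append("x")
        elif "0" <= c <= "9":
            out.append("n")
        else:
            out.append(c)
    return "".join(out)


def show_last_4(value: str) -> str:
    """Mask everything except the last four characters."""
    split = max(len(value) - 4, 0)
    return _mask(value[:split]) + value[split:]


def hash_value(value: str) -> str:
    # Assumption: MD5 hex digest as the consistent hash.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


USER_GROUPS: Dict[str, str] = {"bill": "Billing", "frank": "Fraud"}

# Hypothetical masking policy: group -> column -> masking function.
MASKS: Dict[str, Dict[str, Callable[[str], str]]] = {
    "Fraud": {
        "customer_email": hash_value,
        "customer_phone": show_last_4,
        "customer_ccn": show_last_4,
    },
}


def apply_masks(user: str, row: Row) -> Row:
    """Return the row as this user should see it (access checks assumed done)."""
    masks = MASKS.get(USER_GROUPS.get(user, ""), {})
    return {col: masks[col](val) if col in masks else val for col, val in row.items()}
```

For a row with `customer_phone = "555-233-1234"`, `frank` sees `"nnn-nnn-1234"`, while `bill` sees the raw value. Because the hash is deterministic, the Fraud team can still join or group on the masked email without seeing it.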
- 29.
Ranger Policies - Mask Fields
- 30.
Ranger Policy - Column Masking - Results
[Screenshots: query results for bill (Billing) and frank (Fraud)]
- 31.
Ranger Policy - Column Masking - Audit Screen
- 32.
Use Case 3: Row Filtering
In this use case, certain groups or users can’t see all the rows of certain tables.
⬢ Billing group can see all the rows in the Users table
⬢ Marketing can only see rows from their region in the Users table
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
User | Rows in Users table
bill (Billing) | 😀
mark (Marketing, CA) | Only CA users
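A row filter is a SQL predicate chosen per user. One way to apply it is to substitute a filtered subquery wherever the protected table is referenced. The sketch below uses Python's built-in SQLite as a stand-in for Hive. `USERS_ROW_FILTERS`, `secured_users_ref` and `customer_ids_for` are hypothetical names, the substitution strategy is an assumption rather than Ranger's implementation, and the sample rows are made up.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical row-filter policy for table users: user -> WHERE predicate (None = all rows).
USERS_ROW_FILTERS: Dict[str, Optional[str]] = {
    "bill": None,                     # Billing sees every row
    "mark": "customer_state = 'CA'",  # Marketing (CA) sees only California customers
}


def secured_users_ref(user: str) -> str:
    """Return the table reference that replaces `users` in this user's query."""
    if user not in USERS_ROW_FILTERS:
        return "(SELECT * FROM users WHERE 1 = 0) AS users"  # unknown user: no rows
    predicate = USERS_ROW_FILTERS[user]
    if predicate is None:
        return "users"
    return f"(SELECT * FROM users WHERE {predicate}) AS users"


def customer_ids_for(conn: sqlite3.Connection, user: str) -> List[Tuple[int]]:
    """Run the same logical query as `user`; only the table reference differs."""
    sql = f"SELECT customer_id FROM {secured_users_ref(user)} ORDER BY customer_id"
    return conn.execute(sql).fetchall()


def demo_db() -> sqlite3.Connection:
    """Tiny in-memory Users table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (customer_id INTEGER, customer_state TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "CA"), (2, "NY"), (3, "CA")])
    return conn
```

`customer_ids_for(demo_db(), "mark")` returns `[(1,), (3,)]`, while `bill` gets all three rows. Interpolating the predicate with an f-string is acceptable here only because it comes from an administrator-defined policy, never from the querying user.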
- 33.
Ranger Policies - Row Filtering
- 34.
Ranger Policy - Row Filtering - Results
[Screenshots: query results for bill (Billing) and mark (Marketing)]
- 35.
Use Case 4: Row Filtering - Cross Table
This is an extension of the previous use case, where the context information needed to filter rows is in another table.
⬢ Billing group can see all the rows in the Sales table
⬢ Marketing can only see rows from their region in the Sales table. However, the Sales table doesn’t have the customer’s geographic information, so it needs to be derived from the Users table
Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip
Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id
User | Rows in Sales table
bill (Billing) | 😀
mark (Marketing, CA) | Only CA users
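A cross-table filter can be written as a predicate whose condition is a subquery against the other table. The sketch again uses SQLite as a stand-in for Hive. The predicate `customer_id IN (SELECT customer_id FROM users WHERE customer_state = 'CA')` is an assumed example of such a policy, not the demo's exact expression. `SALES_ROW_FILTERS`, `secured_sales_ref` and `sales_for` are hypothetical names, and the data is made up.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical row filter on sales that derives the region from users.
SALES_ROW_FILTERS: Dict[str, Optional[str]] = {
    "bill": None,  # Billing sees every sale
    "mark": "customer_id IN "
            "(SELECT customer_id FROM users WHERE customer_state = 'CA')",
}


def secured_sales_ref(user: str) -> str:
    """Return the table reference that replaces `sales` in this user's query."""
    predicate = SALES_ROW_FILTERS.get(user, "1 = 0")  # unknown user: no rows
    if predicate is None:
        return "sales"
    return f"(SELECT * FROM sales WHERE {predicate}) AS sales"


def sales_for(conn: sqlite3.Connection, user: str) -> List[Tuple[int, str]]:
    sql = (f"SELECT customer_id, product_id FROM {secured_sales_ref(user)} "
           "ORDER BY customer_id, product_id")
    return conn.execute(sql).fetchall()


def demo_db() -> sqlite3.Connection:
    """In-memory Users and Sales tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (customer_id INTEGER, customer_state TEXT)")
    conn.execute("CREATE TABLE sales (customer_id INTEGER, product_id TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "CA"), (2, "NY"), (3, "CA")])
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [(1, "p1"), (2, "p2"), (3, "p3"), (2, "p4")])
    return conn
```

`sales_for(demo_db(), "mark")` returns `[(1, 'p1'), (3, 'p3')]`. The subquery is evaluated at query time, so if a customer's state changes in Users, the rows Marketing sees change without any policy edit.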
- 36.
Ranger Policies - Row Filtering - Cross Table
- 37.
Apache Atlas Integration
- 38.
Cross Product Symbiosis
[Diagram: Apache Atlas, Apache Ranger, and LLAP. Labels: Classification/Tagging, Governance, Lineage, Tag Based Policies, Dynamic Custom Policies, Enforcement hooks, HDFS, S3, Metastore.]
* Column masking and row filtering are not yet supported by tag-based policies.
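A tag-based policy is written against a classification rather than a specific table or column, so it covers every column carrying that tag. The sketch below illustrates only an access decision, consistent with the note that masking and row filtering are not yet tag-driven. The tag names "PII" and "PCI" are borrowed from Use Case 2's wording. `COLUMN_TAGS`, `TAG_DENY` and `denied_by_tags` are hypothetical and do not reflect how Ranger stores tags synced from Atlas.

```python
from typing import Dict, List, Set

# Hypothetical classifications, as they might be synced from Atlas into Ranger.
COLUMN_TAGS: Dict[str, Set[str]] = {
    "users.customer_email": {"PII"},
    "users.customer_phone": {"PII"},
    "users.customer_ccn": {"PCI"},
}

# Hypothetical tag-based rule: tag -> groups denied access to anything carrying it.
TAG_DENY: Dict[str, Set[str]] = {"PCI": {"Marketing"}}


def denied_by_tags(group: str, columns: List[str]) -> List[str]:
    """Return the requested columns this group may not read because of their tags."""
    blocked = []
    for col in columns:
        for tag in COLUMN_TAGS.get(col, set()):
            if group in TAG_DENY.get(tag, set()):
                blocked.append(col)
                break  # one denying tag is enough
    return blocked
```

Under this policy, tagging a new column as PCI in any table would block Marketing from it without writing a new resource policy.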
- 39.
Ranger - Tag Based Policies
- 40.
Q & A