This document summarizes a presentation about security in Hadoop systems. It discusses how the Apache Ranger project provides centralized security administration, authorization, and auditing across Hadoop, while authentication is handled by Kerberos and Apache Knox. The presentation outlines current security capabilities and future plans to enhance Ranger with dynamic policies, attribute-based access control, improved auditing, and encryption support.
Apache Ranger Secures Hadoop Ecosystem
1. Hadoop Summit, Brussels, April 2015
Security Needs in Hadoop, Current and Future: How Can Apache Ranger Help?
Balaji Ganesan
Don Bosco Durai
@Hortonworks
April 16, 2015
2. Hadoop exacerbates the security challenge
New Security Requirements
• Hadoop as a data lake: data is being centralized
• Different methods for accessing the same data
• Data security for multi-tenant use cases
• Need for a centralized and consistent approach
Figure: Hadoop as a data lake. Sources (existing systems such as ERP, CRM and SCM; clickstream; web & social; geolocation; sensor & machine; server logs; unstructured data) land in HDFS (Hadoop Distributed File System), with YARN as the data operating system running batch, interactive, real-time and partner ISV workloads. Analytics consumers include data marts, EDW/MPP systems, business analytics, visualization & dashboards, and applications.
4. Security in Hadoop today
First level of security requirements: built in
Area | Goal | Provided by
Administration | Central management & consistent security | Apache Ranger
Authentication | Authenticate users and systems | Apache Knox, native Kerberos
Authorization | Provision access to data | Apache Ranger
Audit | Maintain a record of data access | Apache Ranger, Hadoop native audit
Data Protection | Protect data at rest and in motion | HDFS transparent encryption, HBase encryption, vendor solutions
5. Central Security Administration, Authorization & Audit
Apache Ranger (formerly known as XA Secure)
• Delivers a ‘single pane of glass’ for the security administrator
• Centralizes administration of security policy
• Ensures consistent coverage across HDFS, Hive, HBase, Storm and Knox
6. Authentication: Kerberos
What does Kerberos do?
• Establishes identity for clients, hosts and services
• Prevents impersonation; passwords are never sent over the wire
• Integrates with enterprise identity management tools such as LDAP & Active Directory
• Enables more granular auditing of data access and job execution
Ambari 2.0 automates Kerberos deployment
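Kerberos identities are principals, conventionally written `primary/instance@REALM`: `alice@EXAMPLE.COM` for a user, `nn/nn1.example.com@EXAMPLE.COM` for a service running on a host. The Python sketch below, with hypothetical function names and realm, parses such a principal and maps it to a local user name in the spirit of Hadoop's `DEFAULT` mapping rule. Real clusters express these mappings in the `hadoop.security.auth_to_local` property rather than in code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KerberosPrincipal:
    """A Kerberos principal of the form primary[/instance]@REALM."""
    primary: str             # user or service name, e.g. "alice" or "nn"
    instance: Optional[str]  # usually the host for service principals
    realm: str               # Kerberos realm, e.g. "EXAMPLE.COM"


def parse_principal(name: str) -> KerberosPrincipal:
    """Split a principal string into primary, instance and realm."""
    if name.count("@") != 1:
        raise ValueError(f"expected exactly one '@' in {name!r}")
    identity, realm = name.split("@")
    primary, slash, instance = identity.partition("/")
    if not primary or not realm or (slash and not instance):
        raise ValueError(f"malformed principal {name!r}")
    return KerberosPrincipal(primary, instance or None, realm)


def to_short_name(name: str, default_realm: str) -> str:
    """Map a principal to a local user name, similar in spirit to Hadoop's
    DEFAULT auth_to_local rule: principals in the default realm map to
    their primary component. This sketch rejects every other realm."""
    principal = parse_principal(name)
    if principal.realm != default_realm:
        raise ValueError(f"no mapping rule for realm {principal.realm!r}")
    return principal.primary


if __name__ == "__main__":
    print(parse_principal("nn/nn1.example.com@EXAMPLE.COM"))
    print(to_short_name("alice@EXAMPLE.COM", "EXAMPLE.COM"))  # alice
```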
7. Authentication: API Security with Knox
• Eliminates the SSH “edge node”
• Central API management
• Central audit control
• Service-level authorization
• SSO integration: SiteMinder and OAM*
• LDAP & AD integration
Apache Knox extends the reach of Hadoop REST APIs without the complexities of Kerberos.
Integrated with existing systems to simplify identity maintenance
Single, simple point of access for a cluster
Central controls ensure consistency across one or more clusters
• Kerberos encapsulation
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
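Because Knox fronts the cluster's REST APIs, a client needs only the gateway URL and ordinary credentials. A minimal sketch, assuming the commonly documented Knox URL layout (`/gateway/<topology>/webhdfs/v1/...`), the default HTTPS port 8443 and HTTP Basic authentication; the host name, topology and credentials are placeholders, and the helper functions are hypothetical:

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def knox_webhdfs_url(host: str, path: str, op: str,
                     topology: str = "default", port: int = 8443) -> str:
    """Build a WebHDFS URL that is routed through a Knox gateway.

    Assumes the layout https://<host>:<port>/gateway/<topology>/webhdfs/v1/<path>;
    the host, topology name and port are deployment-specific placeholders.
    """
    hdfs_path = quote(path.lstrip("/"))
    query = urlencode({"op": op})
    return f"https://{host}:{port}/gateway/{topology}/webhdfs/v1/{hdfs_path}?{query}"


def knox_request(url: str, user: str, password: str) -> Request:
    """Prepare (but do not send) a request with HTTP Basic credentials.

    Knox validates the credentials (for example against LDAP/AD) and talks to
    the Kerberos-secured cluster on the caller's behalf, so the client needs
    no Kerberos ticket.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    url = knox_webhdfs_url("knox.example.com", "/tmp", "LISTSTATUS")
    print(url)
    # https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

Sending the request with `urllib.request.urlopen(req)` would additionally require the client to trust the gateway's TLS certificate.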
8. Data Protection
Hadoop lets you apply data protection policy at different layers across the Hadoop stack:
Layer | What? | How?
Storage | Encrypt data while it is at rest | HDFS file encryption, HBase encryption
Transmission | Encrypt data as it moves | Supported in Hadoop
10. Future of Hadoop Security
How can Apache Ranger help?
11. Security Requirements
Beyond basic security
Area | Goal | Future requirements
Administration | Central management & consistent security | Tag-based policies; extending beyond Hadoop
Authentication | Authenticate users and systems | Single sign-on
Authorization | Provision access to data | Dynamic, attribute-based access control (ABAC)
Audit | Maintain a record of data access | Activity monitoring, intrusion detection
Data Protection | Protect data at rest and in motion | Encryption as a first-class citizen, masking and anonymization
12. Future of Security: Data Classification with Apache Atlas
Knowledge Store
A knowledge store categorized with an appropriate business-oriented taxonomy:
• Data sets & objects
• Tables / columns
• Logical context
• Source, destination
Supports exchange of metadata between foundation components and third-party applications/governance tools.
Leverages existing Hadoop metastores.
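Figure: Apache Atlas components. A knowledge store (models, type system, taxonomies, policy rules) with an audit store and a policy engine, exposed through a REST API and services (search, lineage, exchange) for data lifecycle management and security. Example taxonomies: healthcare (HIPAA, HL7), financial (SOX, Dodd-Frank), retail (PCI, PII), custom (CWM) and other.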
13. Current Ranger Setup
Hive Policy
Resource | Group | Permission
Table1, Col A | Marketing | Select
Table 2, All | IT Admin | Create
Figure: Source data is loaded into HDFS by ETL and data-ingest tools (Sqoop, Flume). A Beeline client queries HiveServer2 (table columns A, B, C), and Ranger enforces the Hive policy above.
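The resource-based policy above can be modeled in a few lines. This is a simplified sketch, not Ranger's policy schema or evaluation engine: the `HivePolicy` fields and the `is_allowed` function are hypothetical, and anything not explicitly allowed is denied. Note that every rule names physical tables and columns directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HivePolicy:
    table: str    # e.g. "Table1"
    column: str   # a column name, or "*" for all columns
    group: str    # group being granted access, e.g. "Marketing"
    access: str   # e.g. "select" or "create"


# The two rows of the Hive policy above, in a simplified, hypothetical shape.
HIVE_POLICIES = [
    HivePolicy("Table1", "A", "Marketing", "select"),
    HivePolicy("Table2", "*", "IT Admin", "create"),
]


def is_allowed(groups: set[str], table: str, column: str, access: str) -> bool:
    """True if some policy grants `access` on table.column to one of the
    user's groups; anything not explicitly allowed is denied."""
    return any(
        p.table == table
        and p.column in ("*", column)
        and p.group in groups
        and p.access == access
        for p in HIVE_POLICIES
    )
```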
14. Future of Security: Tag-based Policies
Data Classification
Resource | Tag
Table1, Col A | “Campaign”
Table 2 | “Logs”
Tag Policy
Tag | Group | Permission
Campaign | Marketing | Select
Logs | IT Admin | Create
Figure: The same flow as the current setup (source data ingested into HDFS through ETL tools such as Flume and Sqoop; a Beeline client querying HiveServer2), with a metadata server supplying the data classification to Ranger.
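The tag-based variant separates *what the data is* from *who may access it*. In this hypothetical sketch a dictionary stands in for the metadata server's classification, and the policies reference only tags; neither structure mirrors Ranger's or Atlas's actual data model.

```python
# Data classification: (table, column) -> tags. In the diagram this comes from
# a metadata server; here it is a hard-coded, hypothetical dictionary.
CLASSIFICATION: dict[tuple[str, str], set[str]] = {
    ("Table1", "A"): {"Campaign"},
    ("Table2", "*"): {"Logs"},   # "*" = every column of the table
}

# Tag policies: (tag, group, access), mirroring the tag policy above.
TAG_POLICIES = [
    ("Campaign", "Marketing", "select"),
    ("Logs", "IT Admin", "create"),
]


def tags_for(table: str, column: str) -> set[str]:
    """Collect the tags attached to a column, including table-wide tags."""
    tags: set[str] = set()
    for (t, c), resource_tags in CLASSIFICATION.items():
        if t == table and c in ("*", column):
            tags |= resource_tags
    return tags


def is_allowed(groups: set[str], table: str, column: str, access: str) -> bool:
    """Grant access if a tag policy for one of the column's tags allows it."""
    tags = tags_for(table, column)
    return any(
        tag in tags and group in groups and allowed_access == access
        for tag, group, allowed_access in TAG_POLICIES
    )
```

The payoff of this design is that classifying a new column, e.g. `CLASSIFICATION[("Table3", "email")] = {"Campaign"}`, grants Marketing `select` on it without editing any policy.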
15. Future of Security: Administration
Centralized administration across big data applications
• Ranger provides a pluggable architecture for policy administration and enforcement
Future Needs
• Custom plugins can be created for any data store and hooked up to Ranger Admin
• Build plugins to manage ACLs for big data BI applications and EDW systems
• Provide a “single pane of glass” for end users managing security across the entire big data environment
16. Future of Security: Centralized Administration
Ranger Stacks
• Easily add a new “service” to Ranger
• Enable customers and partners to add support for new components easily
Figure: The Ranger Administration Portal, with its Ranger policy server and Ranger audit server, manages Ranger plugins in HDFS, HiveServer2, HBase and a new service (Ranger Plugin*).
17. Future of Security: Adding a new service to Ranger
Adding a new service using JSON
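The slide refers to a JSON service definition. As an illustration, the Python sketch below assembles a minimal definition for a hypothetical service. The field names (`name`, `implClass`, `resources`, `accessTypes`, `configs`) follow the service-definition files bundled with Ranger, but this is only a subset of the schema, and the service name, class name and config entry are made up.

```python
import json


def make_service_def(name: str, impl_class: str,
                     resources: list[str], access_types: list[str]) -> dict:
    """Assemble a minimal, hypothetical Ranger-style service definition.

    Only a subset of the fields found in Ranger's bundled service-definition
    JSON files is produced here; consult the Ranger documentation for the
    full schema before registering anything with Ranger Admin.
    """
    return {
        "name": name,
        "implClass": impl_class,   # Java class backing the service in Ranger Admin
        "label": name.upper(),
        "description": f"Service definition for {name}",
        # Resources form a hierarchy: each one names its parent, and
        # "level" orders them from the top of the hierarchy down.
        "resources": [
            {"itemId": i, "name": res, "type": "string", "level": 10 * i,
             "parent": resources[i - 2] if i > 1 else "", "mandatory": True}
            for i, res in enumerate(resources, start=1)
        ],
        # Permissions that policies for this service can grant.
        "accessTypes": [
            {"itemId": i, "name": acc, "label": acc.capitalize()}
            for i, acc in enumerate(access_types, start=1)
        ],
        # Connection settings the admin needs in order to reach the service.
        "configs": [
            {"itemId": 1, "name": "username", "type": "string", "mandatory": True},
        ],
    }


if __name__ == "__main__":
    service_def = make_service_def(
        "myservice", "com.example.RangerServiceMyService",  # hypothetical
        resources=["database", "table"], access_types=["read", "write"])
    print(json.dumps(service_def, indent=2))
```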
18. Future of Security: Adding new plugins
Figure: Inside a component process (e.g. HiveServer2), actions such as create/insert, edit/update, view/select and other actions call Check Permission on a permission interface. The Ranger implementation of that interface consults a Ranger policy cache, backed by the Ranger Policy Admin DB, and writes to the Ranger centralized audit store.
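The control flow in the diagram (check permission, consult a local policy cache, record an audit event) can be sketched as follows. The class name, the policy-table shape and the 30-second refresh interval are assumptions for illustration; real Ranger plugins are Java libraries with their own interfaces.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# (resource, action) -> users permitted to perform that action
PolicyTable = dict[tuple[str, str], set[str]]


@dataclass
class AuditEvent:
    user: str
    resource: str
    action: str
    allowed: bool
    timestamp: float


@dataclass
class PluginSketch:
    """Hypothetical in-process authorizer following the diagram: the host
    component calls check_permission(); decisions come from a local policy
    cache refreshed from the policy admin, and every decision is recorded
    as an audit event bound for the central audit store."""
    fetch_policies: Callable[[], PolicyTable]  # stands in for a call to the policy admin
    refresh_interval: float = 30.0             # seconds; an assumed value
    audit_log: list[AuditEvent] = field(default_factory=list, init=False)
    _cache: PolicyTable = field(default_factory=dict, init=False)
    _last_refresh: float = field(default=float("-inf"), init=False)

    def _policies(self) -> PolicyTable:
        # Re-fetch only when the cached copy is older than refresh_interval.
        now = time.monotonic()
        if now - self._last_refresh >= self.refresh_interval:
            self._cache = self.fetch_policies()
            self._last_refresh = now
        return self._cache

    def check_permission(self, user: str, resource: str, action: str) -> bool:
        allowed = user in self._policies().get((resource, action), set())
        self.audit_log.append(AuditEvent(user, resource, action, allowed, time.time()))
        return allowed
```

Keeping the cache inside the component process means authorization decisions do not require a network round trip to the policy admin on every request.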
19. Future of Security: Authorization
Dynamic, attribute-based access control (ABAC)
• Ranger currently provides hooks to embed dynamic rules in policies
Future Security Needs
• Extend Ranger to support data or user attributes in policy decisions
• Examples:
  • Use the geolocation of users to determine access
  • Allow access only between 9 a.m. and 5 p.m. local time
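Attribute-based conditions like the two examples can be expressed as predicates over a request context, evaluated in addition to the usual grants. A minimal sketch; the `RequestContext` fields, the country codes and the assumption that a client's country and local time are already known are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable, Iterable


@dataclass(frozen=True)
class RequestContext:
    user: str
    country: str          # e.g. resolved from the client IP (assumed available)
    local_time: datetime  # request time in the user's local time zone


Condition = Callable[[RequestContext], bool]


def within_business_hours(ctx: RequestContext) -> bool:
    """Allow only between 9 a.m. (inclusive) and 5 p.m. (exclusive) local time."""
    return time(9, 0) <= ctx.local_time.time() < time(17, 0)


def from_allowed_country(ctx: RequestContext, allowed=frozenset({"BE", "NL"})) -> bool:
    """Allow only requests that originate in an allowed set of countries."""
    return ctx.country in allowed


def abac_allows(ctx: RequestContext, conditions: Iterable[Condition]) -> bool:
    """Grant access only if every attribute-based condition holds."""
    return all(condition(ctx) for condition in conditions)


if __name__ == "__main__":
    ctx = RequestContext("alice", "BE", datetime(2015, 4, 16, 10, 30))
    print(abac_allows(ctx, [within_business_hours, from_allowed_country]))  # True
```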
21. Future of Security: Auditing
Monitoring and intrusion detection through audit data
• Ranger currently captures detailed audit data and stores it in HDFS or an RDBMS
Future Work
• Ranger can stream audit data through Kafka and Storm into multiple data stores
• Add support for correlation and processing in Storm
• Alerts based on rules
• Add support for feeding in audit data from external sources (network events, syslogs, etc.)
• The Ranger UI can provide a dashboard to monitor audit events
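As an example of a rule-based alert over streamed audit data, the sketch below flags a user who is denied access repeatedly within a short time window. The rule, its thresholds and the event shape are hypothetical; a Storm bolt or a similar stream processor could host logic like this. It assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    timestamp: float  # seconds since the epoch
    user: str
    resource: str
    allowed: bool


class RepeatedDenialRule:
    """Raise an alert when a user is denied access at least `threshold`
    times within `window` seconds. Events must arrive in timestamp order."""

    def __init__(self, threshold: int = 5, window: float = 60.0):
        self.threshold = threshold
        self.window = window
        self._denials: dict[str, deque] = defaultdict(deque)  # user -> recent denial times

    def process(self, event: AuditEvent) -> Optional[str]:
        """Consume one audit event; return an alert message or None."""
        if event.allowed:
            return None
        recent = self._denials[event.user]
        recent.append(event.timestamp)
        # Forget denials that have slid out of the window.
        while recent and event.timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            recent.clear()  # one alert per burst
            return (f"ALERT: {event.user} denied {self.threshold} times within "
                    f"{self.window:g}s (last resource: {event.resource})")
        return None
```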
22. Future of Security: Auditing
Figure: Audit pipeline. Ranger audit events and other audit logs (network, SNMP) flow through Kafka and Storm, where events gain context, are enriched and trigger alerts; Hive serves as the long-term store for queries, Solr provides interactive audit query, and analytical applications consume the results.
23. Future of Security: Data Protection
Encryption as a first-class citizen
• Encryption introduced in HDFS and HBase
Future Roadmap
- Build native encryption support in HDFS, Hive and HBase
- Ranger-based key management to support encryption
- Authorization policies for KMS in Ranger
- Column-level masking support in Hive and Phoenix
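Column-level masking transforms values at query time instead of denying access to the whole column. The masking functions below are hypothetical illustrations, not the masking types of any particular Ranger, Hive or Phoenix release.

```python
import hashlib
from typing import Callable, Optional

Mask = Callable[[str], Optional[str]]


def mask_show_last_4(value: str) -> str:
    """Replace every character except the last four with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def mask_hash(value: str) -> str:
    """Replace the value with its SHA-256 hex digest: joinable, not readable."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_null(value: str) -> Optional[str]:
    """Suppress the value entirely."""
    return None


def apply_column_masks(row: dict[str, str], masks: dict[str, Mask]) -> dict[str, Optional[str]]:
    """Return a masked copy of `row`; columns without a mask pass through."""
    return {col: masks[col](val) if col in masks else val for col, val in row.items()}


if __name__ == "__main__":
    row = {"name": "Alice", "card": "4111111111111111", "ssn": "123-45-6789"}
    print(apply_column_masks(row, {"card": mask_show_last_4, "ssn": mask_null}))
```

An unkeyed hash of low-entropy values such as national ID numbers can be reversed by brute force, so a keyed hash (for example Python's `hmac` with a secret key) is the safer choice in practice.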