6 ways to exploit Hive
– and what to do about it
Brock Noland | Software Engineer, Cloudera
January 23, 2013

Outline
• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with Impersonation
  • Kerberos with Sentry
• Demo
Introduction
• Tonight's focus is SQL-on-Hadoop
• Vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two-step process
  1. Automatic transformations moved to Hadoop
  2. Data analysts given query access
Data warehouse use case
[Diagram: Online Database, Hadoop, Data Warehouse]
Authentication
• Authentication is who you are
• Hadoop models
  • Default - “trusted network”
  • Strong - Kerberos
Default Authentication – trusted network
• Default security mechanism
• Hadoop client uses local username
• Used in
  • POCs
  • Startups
  • Demos
  • Pre-prod environments
Default Authentication – trusted network
[Diagram: Client Host sends "User: brock, File: a.txt, Contents: some data" to Hadoop]
$ whoami
brock
$ cat a.txt
some data
$ hadoop fs -put file .
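The weakness of the trusted-network model can be mimicked with a toy model. The sketch below is a hypothetical illustration, not Hadoop code: `client_identity` and `ToyNameNode` are invented names. The one real detail it borrows is that, without Kerberos, a Hadoop client reports the local account name and lets the `HADOOP_USER_NAME` environment variable override it.

```python
import getpass
import os


def client_identity(env=None):
    """Return the username a client would report under "trusted network" auth.

    Assumption for illustration: an explicit HADOOP_USER_NAME setting wins
    over the local OS account name, as with Hadoop's simple authentication.
    """
    env = os.environ if env is None else env
    return env.get("HADOOP_USER_NAME") or getpass.getuser()


class ToyNameNode:
    """Hypothetical stand-in for the cluster: it believes whatever it is told."""

    def __init__(self):
        self.owners = {}

    def put(self, path, claimed_user):
        # No proof of identity is requested; the claim is simply recorded.
        self.owners[path] = claimed_user
        return claimed_user
```

Calling `ToyNameNode().put("/user/brock/a.txt", client_identity({"HADOOP_USER_NAME": "hdfs"}))` records `hdfs` as the owner even though the person at the keyboard is `brock`, which is why this mode only suits networks where every client host is trusted.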
Strong Authentication – Kerberos
• Hadoop is secured with Kerberos
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos “principal”
  • Service: impala/hostname@MYCOMPANY.COM
  • User: brock@MYCOMPANY.COM
• Credentials
  • Service: keytabs
  • User: password
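The two principal shapes on this slide (service with a host part, user without) can be split apart mechanically. This is a minimal sketch; `Principal` and `parse_principal` are names made up for this example, and it handles only the simple `primary[/instance]@REALM` form shown here.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "brock" or "impala"
    instance: Optional[str]  # host part for services, None for plain users
    realm: str               # Kerberos realm, e.g. "MYCOMPANY.COM"


def parse_principal(text: str) -> Principal:
    """Split 'primary/instance@REALM' or 'primary@REALM' into its parts."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a principal with a realm: {text!r}")
    primary, _, instance = name.partition("/")
    return Principal(primary, instance or None, realm)
```

For example, `parse_principal("impala/hostname@MYCOMPANY.COM")` yields the service name, host, and realm separately, while `parse_principal("brock@MYCOMPANY.COM")` has no instance part.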
Strong Authentication – Kerberos
[Diagram: Client Host sends "User: brock, <kerberos ticket>, <encrypted data> *" to Hadoop]
$ whoami
brock
$ kinit
Password: *******
$ cat a.txt
some data
$ hadoop fs -put file .
* RPC Encryption must be enabled
Strong Authentication – Kerberos
• Keytab
  • Encrypted key for servers (similar to a “password”)
  • Generated by server such as MIT Kerberos or Active Directory
Hive Server 2 and Oozie
[Diagram. Clients: Beeline (Hive CLI), Tableau, JDBC, Oozie CLI, Control-M. Services: Hive Server 2 (HS2), Oozie. Backend: Hadoop.]
Strong Authentication – Kerberos
• Impersonation
  • Services such as Hive Server2 impersonate users
  • Data loaded by “joe” via HS2 is owned by “joe”
  • Oozie jobs submitted by “brock” are run as “brock”
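The ownership rule behind impersonation can be sketched as a toy model. Everything here is hypothetical: `ToyFileSystem` and its `proxy_rules` mapping are invented for illustration and only loosely mirror how Hadoop restricts which service accounts may act on behalf of which users.

```python
class ToyFileSystem:
    """Hypothetical in-memory stand-in for HDFS that tracks file owners."""

    def __init__(self, proxy_rules):
        # proxy_rules maps a service account to the set of users it may act for.
        self.proxy_rules = proxy_rules
        self.owners = {}

    def create(self, path, authenticated_as, do_as=None):
        """Create a file; the owner is the impersonated user when one is given."""
        effective = authenticated_as
        if do_as is not None and do_as != authenticated_as:
            allowed = self.proxy_rules.get(authenticated_as, set())
            if do_as not in allowed:
                raise PermissionError(
                    f"{authenticated_as} may not impersonate {do_as}")
            effective = do_as
        self.owners[path] = effective
        return effective
```

With `ToyFileSystem({"hive": {"joe"}})`, a call such as `create("/warehouse/t1", "hive", do_as="joe")` records `joe` as the owner, matching the slide's statement that data loaded by "joe" via HS2 is owned by "joe".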
Authorization
• HDFS permissions
  • Unix style
  • Read/Write/Execute for Owner/Group/Other
  • Coarse grained
• Other Hadoop components have authorization
  • MapReduce: who can use which job queues
  • HBase: table ACLs
HDFS Permissions
$ hadoop fs -ls file
-rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file
• Permissions
  • Unix style permissions
  • Read/Write/Execute
  • Owner/Group/Other
• Owner
  • One and only one owner
• Group
  • One and only one group
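How a mode string like the one in this listing turns into an access decision can be sketched in a few lines. This is an illustrative model only: `can_access` is a made-up helper, and it ignores the HDFS superuser and any extended ACLs. It checks the owner class first, then the group, then everyone else.

```python
def can_access(mode, owner, group, user, user_groups, want):
    """Decide access from an ls-style mode string such as '-rw-r-----'.

    want is 'r', 'w' or 'x'. Exactly one permission class applies:
    owner if the user owns the file, else group if the user belongs to
    the file's single group, else other.
    """
    if len(mode) != 10 or want not in ("r", "w", "x"):
        raise ValueError("expected a 10-character mode and want in r/w/x")
    if user == owner:
        triplet = mode[1:4]
    elif group in user_groups:
        triplet = mode[4:7]
    else:
        triplet = mode[7:10]
    if want == "x":
        # Lowercase s/t mean "execute plus a special bit"; uppercase means no execute.
        return triplet[2] in ("x", "s", "t")
    return triplet["rwx".index(want)] == want
```

For the file above, a member of `analysts` other than `analyst1` can read but not write, and a user outside that group gets nothing. Because a file has one and only one group, there is no way to express "readable by two different teams" without changing the group or opening the file to everyone.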
Back to our use case
• Scenario facts
  • ETL offload is a success
  • Data warehouse is expensive and at capacity
  • Same data is in Hadoop
• Next step
  • End users start using Hadoop to augment the DW
  • Security becomes primary concern
End users need to share data
• Unlike automated ETL jobs, end users want to share data with peers
• Must manage HDFS permissions manually
• Each file has a single group
• End result is users set permissions to world readable/writeable
Hive: Security holes
-- Load arbitrary Java code as a UDF
CREATE TEMPORARY FUNCTION custom_udf
  AS 'com.mycompany.MaliciousClass';
-- Run an arbitrary script over the data
SELECT TRANSFORM(stuff)
  USING 'malicious-script.pl'
  AS thing1, thing;
-- Point a table at any path in HDFS
CREATE EXTERNAL TABLE external_table(column1 string)
  LOCATION '/path/to/any/table';
Hive: Security holes
-- Load arbitrary Java code as a SerDe
CREATE TABLE test (c1 string)
  ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
-- Run arbitrary scripts as the map and reduce steps
FROM (
  FROM t1
  MAP t1.c1
  USING 'malicious-script1.pl'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE t2
  REDUCE t2.c1
  USING 'malicious-script2.pl'
  AS c2;
Default: Authorization
• Hive ships with an “advisory” authorization system
  • All users see all databases/tables/columns
  • Does not fix any security holes
  • Users grant themselves permissions
Kerberos with impersonation: Sharing data
The user “manager1” wants to share the table “manager1_table”
with senior analysts but not junior analysts.
# hadoop fs -ls -R /user/hive/warehouse
drwxr-x--T   - analyst1   analyst1     0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1   0 jranalyst1_table
drwxr-x--T   - manager1   manager1     0 manager1_table
Kerberos with impersonation: Sharing data
IT must create a group
# groupadd senioranalysts

Then add the appropriate members to group
# usermod -G analyst,senioranalysts analyst1
# usermod -G management,analyst,senioranalysts manager1

Kerberos with impersonation: Sharing data
Then “manager1” can manually change the file permissions
$ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1         0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1       0 jranalyst1_table
drwxr-x--T   - manager1   senioranalysts   0 manager1_table
Kerberos with impersonation: Sharing data
Now any senior-level analyst can query the data
$ whoami
analyst1
$ beeline ...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
+------------+
| count(*)   |
+------------+
| 47         |
+------------+
Kerberos with impersonation: Sharing data
Junior analysts cannot query the data:
$ whoami
jranalyst1
$ beeline ....
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
Error: java.io.IOException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
Kerberos with impersonation: Sharing data

What happens in the real world?

Kerberos with impersonation: Sharing data
Table “manager1_table” is owned by user/group “manager1”
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1     0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1   0 jranalyst1_table
drwxr-x--T   - manager1   manager1     0 manager1_table
Kerberos with impersonation: Sharing data
User “manager1” makes “manager1_table” world readable/writable
$ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T   - analyst1   analyst1     0 analyst1_table
drwxr-x--T   - jranalyst1 jranalyst1   0 jranalyst1_table
drwxrwxrwt   - manager1   manager1     0 manager1_table
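The trailing `T` and `t` in these listings follow the usual `ls` notation for the sticky bit: uppercase when "other" has no execute permission, lowercase when it does. The sketch below uses Python's standard `stat.filemode` to render octal modes the same way. The octal values 1750 and 1777 are assumptions inferred from the listings, not values stated on the slides, and `ls_mode` is a helper name made up for this example.

```python
import stat


def ls_mode(octal_perms, is_dir=True):
    """Render permission bits the way an ls-style listing prints them."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | octal_perms)


before = ls_mode(0o1750)  # sticky bit, rwx for owner, r-x for group, nothing for other
after = ls_mode(0o1777)   # sticky bit kept, rwx for everyone after "chmod 777"
print(before, after)
```

Read this way, the single `chmod -R 777` takes the table from "owner and group only" to "every user on the cluster can read and write it".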
Kerberos with impersonation: Summary
• Securing Hive with Kerberos makes Hive unusable for DW offload
  • Manual file permission management
  • End state is world writable/readable
  • No ability to restrict access to columns or rows
  • All users see all databases/tables/columns
Fine Grained Security: Apache Sentry
Authorization module for Hive, Search, & Impala
• Unlocks Key RBAC Requirements
  • Secure, fine-grained, role-based authorization
  • Multi-tenant administration
• Open Source
  • Apache Incubator project
• Ecosystem Support
  • Apache Solr, HiveServer2, & Impala 1.1+
Key Benefits of Sentry
Store Sensitive Data in Hadoop
Extend Hadoop to More Users

Comply with Regulations

Key Capabilities of Sentry
• Fine-Grained Authorization
  • Specify security for SERVERS, DATABASES, TABLES & VIEWS
• Role-Based Authorization
  • SELECT privilege on views & tables
  • INSERT privilege on tables
  • ALL privilege on the server, databases, tables & views
  • ALL privilege is needed to create/modify schema
• Multi-Tenant Administration
  • Separate policies for each database/schema
  • Can be maintained by separate admins
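The privilege model listed above can be sketched as a small role-based check. This is a toy illustration, not Sentry's API or policy file format: `Grant`, `implies`, and `is_authorized` are invented names, the group-to-role mapping is an assumption about how users reach roles, and `server1`, `senior_role`, and `admin_role` are placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    action: str          # "SELECT", "INSERT" or "ALL"
    server: str
    database: str = "*"  # "*" covers every database on the server
    table: str = "*"     # "*" covers every table or view in the database


def implies(grant, action, server, database, table):
    """True if the grant covers the requested action on the requested object."""
    scope_ok = (grant.server == server
                and grant.database in ("*", database)
                and grant.table in ("*", table))
    return scope_ok and grant.action in ("ALL", action)


def is_authorized(user_groups, group_roles, role_grants,
                  action, server, database, table):
    """Walk groups -> roles -> grants and stop at the first grant that applies."""
    for group in user_groups:
        for role in group_roles.get(group, ()):
            for grant in role_grants.get(role, ()):
                if implies(grant, action, server, database, table):
                    return True
    return False
```

Under this model, sharing `manager1_table` with senior analysts becomes a single `SELECT` grant to one role rather than a change to file ownership or mode, and junior analysts are denied simply because no role of theirs carries a matching grant.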
Sentry Architecture
[Diagram: a Binding Layer with bindings for Impala (Impala), HiveServer2 (Hive), SOLR (Search), Pig, and others; an Authorization Provider containing a Policy Engine and a Policy Provider; policies stored in a File (Local FS/HDFS) or a Database]
Query Execution Flow
• Parse: validate SQL grammar
• Build: construct statement tree
• Check (Sentry): validate statement objects
  • First check: Authorization
• Plan: forward to execution planner
[Other diagram labels: SQL, MR, Query]
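The ordering of these stages is the important point: the authorization check runs before planning, so a denied statement never reaches execution. A minimal sketch of that ordering follows. It is a toy pipeline that accepts only statements shaped like `SELECT <column> FROM <table>`; `run_query` and the `authorize` and `execute` callbacks are hypothetical and do not reflect HiveServer2 internals.

```python
def run_query(sql, user, authorize, execute):
    """Toy parse -> build -> check -> plan -> execute pipeline."""
    # Parse: validate the (deliberately tiny) grammar.
    tokens = sql.strip().rstrip(";").split()
    if (len(tokens) != 4 or tokens[0].upper() != "SELECT"
            or tokens[2].upper() != "FROM"):
        raise ValueError("unsupported statement")

    # Build: construct a statement tree.
    tree = {"action": "SELECT", "column": tokens[1], "table": tokens[3]}

    # Check: authorization comes first, before any planning work.
    if not authorize(user, tree["action"], tree["table"]):
        raise PermissionError(
            f"{user} lacks {tree['action']} on {tree['table']}")

    # Plan: hand the validated tree to the planner.
    plan = [("scan", tree["table"]), ("project", tree["column"])]

    # Execute: stands in for the MapReduce job.
    return execute(plan)
```

Passing an `authorize` callback keeps the policy decision separate from the pipeline, which loosely mirrors the slide's separation between the query engine and the authorization provider.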
Deploying Enterprise-grade Security for Hadoop

  • 1. 6 ways to exploit Hive – and what to do about it Brock Noland |Software Engineer, Cloudera January 23, 2013 1
  • 2. Outline Introduction • Hadoop security primer • • • • Security options • • • • 2 Authentication Authorization Default Kerberos with Impersonation Kerberos with Sentry Demo
  • 3. Introduction Tonight's focus is SQL-on-Hadoop • Vast majority of Hadoop users use Hive or Cloudera Impala • Data warehouse offload is the most common use case • Data warehouse offload is a two step process 1. 2. 3 Automatic transformations moved to Hadoop Data analysts given query access
  • 4. Data warehouse use case Online Database 4 Hadoop Data Warehouse
  • 5. Outline Introduction • Hadoop Security Primer • • • • Security options • • • • 5 Authentication Authorization Default Kerberos with Impersonation Kerberos with Sentry Demo
  • 6. Authentication Authentication is who you are • Hadoop models • • • 6 Default - “trusted network” Strong - Kerberos
  • 7. Default Authentication – trusted network Default security mechanism • Hadoop client uses local username • Used in • • • • • 7 POCs Startups Demos Pre-prod environments
  • 8. Default Authentication – trusted network Client Host User: brock File: a.txt Contents: some data $ whoami brock $ cat a.txt some data $ hadoop fs -put file . 8 Hadoop
  • 9. Strong Authentication – Kerberos • Hadoop is secured with Kerberos • • • Every user and service has a Kerberos “principal” • • • Service: impala/hostname@MYCOMPANY.COM User: brock@MYCOMPANY.COM Credentials • • 9 Provides mutual authentication Protects against eavesdropping and replay attacks Service: keytabs User: password
  • 10. Strong Authentication – Kerberos Client Host User: brock <kerberos ticket> <encrypted data> * $ whoami brock $ kinit Password: ******* $ cat a.txt some data $ hadoop fs -put file . 10 Hadoop * RPC Encryption must be enabled
  • 11. Strong Authentication – Kerberos • Keytab • • 11 Encrypted key for servers (similar to a “password”) Generated by server such as MIT Kerberos or Active Directory
  • 12. Hive Server 2 and Oozie. [Diagram labels: Beeline (Hive CLI), Tableau, JDBC, Hive Server 2 (HS2), Oozie CLI, Control-M, Oozie, Hadoop]
  • 13. Strong Authentication – Kerberos: Impersonation: Services such as Hive Server2 impersonate users. Data loaded by “joe” via HS2 is owned by “joe”. Oozie jobs submitted by “brock” are run as “brock”.
  • 14. Authorization: HDFS permissions: Unix style; Read/Write/Execute for Owner/Group/Other; coarse grained. Other Hadoop components have authorization: MapReduce: who can use which job queues; HBase: table ACLs.
  • 15. HDFS Permissions
      $ hadoop fs -ls file
      -rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file
      Permissions: Unix style permissions; Read/Write/Execute; Owner/Group/Other. Owner: one and only one owner. Group: one and only one group.
  • 16. Back to our use case: Scenario facts: ETL offload is a success; data warehouse is expensive and at capacity; same data is in Hadoop. Next step: End users start using Hadoop to augment the DW; security becomes primary concern.
  • 17. End users need to share data: Unlike automated ETL jobs, end users want to share data with peers. Must manage HDFS permissions manually. Each file has a single group. End result is users set permissions to world readable/writeable.
  • 19. Hive: Security holes
      CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';
      SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;
      CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';
  • 20. Hive: Security holes
      CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
      FROM (
        FROM t1 MAP t1.c1 USING 'malicious-script1.pl' CLUSTER BY key
      ) map_output
      INSERT OVERWRITE TABLE t2
        REDUCE t2.c1 USING 'malicious-script2.pl' AS c2;
  • 21. Default: Authorization: Hive ships with an “advisory” authorization system: All users see all databases/tables/columns. Does not fix any security holes. Users grant themselves permissions.
  • 23. Kerberos with impersonation: Sharing data. The user “manager1” wants to share the table “manager1_table” with senior analysts but not junior analysts.
      # hadoop fs -ls -R /user/hive/warehouse
      drwxr-x--T - analyst1 analyst1 0 analyst1_table
      drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
      drwxr-x--T - manager1 manager1 0 manager1_table
  • 24. Kerberos with impersonation: Sharing data. IT must create a group:
      # groupadd senioranalysts
      Then add the appropriate members to the group:
      # usermod -G analyst,senioranalysts analyst1
      # usermod -G management,analyst,senioranalysts manager1
  • 25. Kerberos with impersonation: Sharing data. Then “manager1” can manually change the file permissions:
      $ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
      $ hadoop fs -ls /user/hive/warehouse/
      Found 3 items
      drwxr-x--T - analyst1 analyst1 0 analyst1_table
      drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
      drwxr-x--T - manager1 senioranalysts 0 manager1_table
  • 26. Kerberos with impersonation: Sharing data. Now any senior-level analyst can query the data:
      $ whoami
      analyst1
      $ beeline
      ...
      Connected to: Hive (version 0.10.0)
      0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
      +------------+
      | count(*)   |
      +------------+
      | 47         |
      +------------+
  • 27. Kerberos with impersonation: Sharing data. Junior analysts cannot query the data:
      $ whoami
      jranalyst1
      $ beeline
      ....
      Connected to: Hive (version 0.10.0)
      0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
      Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
  • 28. Kerberos with impersonation: Sharing data. What happens in the real world?
  • 29. Kerberos with impersonation: Sharing data. Table “manager1_table” is owned by user/group “manager1”:
      $ hadoop fs -ls /user/hive/warehouse/
      Found 3 items
      drwxr-x--T - analyst1 analyst1 0 analyst1_table
      drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
      drwxr-x--T - manager1 manager1 0 manager1_table
  • 30. Kerberos with impersonation: Sharing data. User “manager1” makes “manager1_table” world readable/writable:
      $ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
      $ hadoop fs -ls /user/hive/warehouse/
      Found 3 items
      drwxr-x--T - analyst1 analyst1 0 analyst1_table
      drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
      drwxrwxrwt - manager1 manager1 0 manager1_table
  • 31. Kerberos with impersonation: Summary: Securing Hive with Kerberos makes Hive unusable for DW offload: manual file permission management; end state is world writable/readable; no ability to restrict access to columns or rows; all users see all databases/tables/columns.
  • 33. Fine Grained Security: Apache Sentry. Authorization module for Hive, Search, & Impala. Unlocks key RBAC requirements: secure, fine-grained, role-based authorization; multi-tenant administration. Open source: Apache Incubator project. Ecosystem support: Apache Solr, HiveServer2, & Impala 1.1+.
  • 34. Key Benefits of Sentry: Store sensitive data in Hadoop. Extend Hadoop to more users. Comply with regulations.
  • 35. Key Capabilities of Sentry: Fine-Grained Authorization: specify security for SERVERS, DATABASES, TABLES & VIEWS. Role-Based Authorization: SELECT privilege on views & tables; INSERT privilege on tables; ALL privilege on the server, databases, tables & views; ALL privilege is needed to create/modify schema. Multi-Tenant Administration: separate policies for each database/schema; can be maintained by separate admins.
  • 37. Query Execution Flow: SQL → Parse (validate SQL grammar) → Build (construct statement tree) → Check (validate statement objects; first check: authorization, via Sentry) → Plan (forward to execution planner) → MR / Query
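
The Unix-style HDFS permission check behind the sharing walkthrough (slides 14, 15 and 23 to 30) can be sketched roughly as follows. This is a simplified model, not HDFS code: the `Inode` class, the `is_allowed` function and the group memberships for "jranalyst1" are illustrative assumptions, and the sticky bit, the superuser, and parent-directory traversal are all ignored.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # rwx bit values, as in Unix


@dataclass(frozen=True)
class Inode:
    """Hypothetical stand-in for an HDFS file or directory entry."""
    owner: str  # one and only one owner
    group: str  # one and only one group
    mode: int   # octal permission bits, e.g. 0o750 for rwxr-x---


def is_allowed(inode: Inode, user: str, user_groups: set, wanted: int) -> bool:
    """Pick exactly one permission class (owner, group or other) and test it."""
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 0o7
    else:
        bits = inode.mode & 0o7
    return (bits & wanted) == wanted


# Group memberships after the usermod commands on slide 24
# (jranalyst1's membership is assumed from the directory listing).
groups = {
    "manager1": {"management", "analyst", "senioranalysts"},
    "analyst1": {"analyst", "senioranalysts"},
    "jranalyst1": {"jranalyst1"},
}

# Slide 25: manager1_table after chgrp to senioranalysts, mode rwxr-x---.
table = Inode(owner="manager1", group="senioranalysts", mode=0o750)
print(is_allowed(table, "analyst1", groups["analyst1"], READ | EXECUTE))      # True
print(is_allowed(table, "jranalyst1", groups["jranalyst1"], READ | EXECUTE))  # False

# Slide 30: the "real world" outcome after chmod -R 777.
world_open = Inode(owner="manager1", group="manager1", mode=0o777)
print(is_allowed(world_open, "jranalyst1", groups["jranalyst1"], READ | WRITE | EXECUTE))  # True
```

Because each file carries a single group, sharing with a second audience means either creating yet another group or opening the "other" bits, which is the path that ends at mode 777.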

Editor's notes

  1. Other aspects are: Confidentiality, Audit
  2. Many, many ways to execute arbitrary code: Hive was created originally by web companies that simply don’t care about security. In fact we often run into pushback from the community when integrating security. In my presentation at the TC HUG I will explain in detail all the ways in which Hive is insecure. The point is that by default any user can execute any code they wish. Users grant themselves permissions: Users can query any data they please by granting themselves permissions. Zero metadata security: It is not possible to stop users from modifying or viewing any metadata.
  3. Manual file permission management: When users want to share tables and data with other users, it requires modifying file permissions. Can anyone guess what happens next? End state is world writable/readable: Users end up making data world writable and readable. No ability to restrict access to columns or rows: Users cannot be restricted to a subset of the data, so tables are copied simply to restrict access to data, which results in thousands of out-of-date tables with full read and write permissions.
  4. Role-Based Access Control (RBAC): For finer-grained access to data accessible via schema (that is, data structures described by the Apache Hive Metastore and used by computing engines like Hive and Impala, as well as collections and indices within Cloudera Search), Cloudera developed Apache Sentry, which offers a highly modular, role-based privilege model for this data and its given schema. (Cloudera donated Sentry to the Apache Software Foundation in 2013.) Sentry governs access to each schema object in the Metastore via a set of privileges like SELECT and INSERT. The schema objects are common entities in data management, such as SERVER, DATABASE, TABLE, COLUMN, and URI (i.e. a file location within HDFS). Cloudera Search has its own set of privileges (e.g. QUERY) and objects (e.g. COLLECTION). As with other RBAC systems that IT teams are already familiar with, Sentry provides for: hierarchies of objects, with permissions automatically inherited by objects that exist within a larger umbrella object; rules containing a set of multiple object/permission pairs; groups that can be granted one or more roles; and users who can be assigned to one or more groups. Sentry is normally configured to deny access to services and data by default, so that users have limited rights until they are assigned to a group that has explicit access roles.
     Column-level Security, Row-level Security and Masked Access: Using the combination of Sentry-based permissions, SQL views, and User Defined Functions (UDFs), developers can gain a high degree of access control granularity for SQL computing engines through HiveServer2 and Impala, including:
     Column-level security: To limit access to only particular columns of entire tables, users can access the data through a view, which either contains a subset of the columns in the table or has certain columns masked. For example, a view can filter a column to only the last four digits of a US Social Security number.
     Row-level security: To limit access by particular values, views can employ CASE statements to control the rows to which a group of users has access. For example, a broker at a financial services firm may only be able to see data within her managed accounts.
  5. Impala metadata queries, e.g. “SHOW TABLES”, query the Hive Metastore directly and then query Sentry to filter the results before returning them.
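
The role-based model in note 4 and the metadata filtering in note 5 can be illustrated with a small toy model. This is only a sketch under stated assumptions and does not use Sentry's actual API or policy format: the `Privilege` and `Policy` classes, the role and group names, and the rule that a table is visible when the user holds any privilege on it are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    """One object/permission pair, e.g. SELECT on (server, database, table)."""
    scope: tuple  # (server,), (server, database) or (server, database, table)
    action: str   # "SELECT", "INSERT" or "ALL"


def implies(granted: Privilege, scope: tuple, action: str) -> bool:
    """A grant covers its own object and everything beneath it in the hierarchy."""
    covers = scope[:len(granted.scope)] == granted.scope
    return covers and granted.action in ("ALL", action)


class Policy:
    """Toy policy store: users -> groups -> roles -> privileges."""

    def __init__(self, user_groups, group_roles, role_privileges):
        self.user_groups = user_groups
        self.group_roles = group_roles
        self.role_privileges = role_privileges

    def is_authorized(self, user, scope, action):
        # Deny by default: access requires at least one matching grant.
        for group in self.user_groups.get(user, ()):
            for role in self.group_roles.get(group, ()):
                for granted in self.role_privileges.get(role, ()):
                    if implies(granted, scope, action):
                        return True
        return False

    def visible_tables(self, user, tables):
        """Filter a metastore listing to tables the user holds some privilege on."""
        return [t for t in tables
                if any(self.is_authorized(user, t, a) for a in ("SELECT", "INSERT"))]


policy = Policy(
    user_groups={"analyst1": ["senioranalysts"], "jranalyst1": ["junioranalysts"]},
    group_roles={"senioranalysts": ["senior_role"]},
    role_privileges={
        "senior_role": [Privilege(("server1", "default", "manager1_table"), "SELECT")],
    },
)
tables = [("server1", "default", "analyst1_table"),
          ("server1", "default", "manager1_table")]

print(policy.is_authorized("analyst1", tables[1], "SELECT"))  # True
print(policy.is_authorized("analyst1", tables[1], "INSERT"))  # False
print(policy.visible_tables("jranalyst1", tables))            # []
```

Compared with the file-permission approach, sharing a table with a second audience here means adding one more role grant rather than changing the table's single HDFS group or opening it to everyone.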