Sentry: Open Source Authorization for Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera
Wednesday, 7th November 2013

Defining Security Functions

Perimeter: Guarding access to the cluster itself
Technical Concepts: Authentication, Network isolation

Data: Protecting data in the cluster from unauthorized visibility
Technical Concepts: Encryption, Data masking

Access: Defining what users and applications can do with data
Technical Concepts: Permissions, Authorization

Visibility: Reporting on where data came from and how it's being used
Technical Concepts: Auditing, Lineage

Enabling Enterprise Security

Perimeter: Guarding access to the cluster itself
Technical Concepts: Authentication, Network isolation
Kerberos | Oozie | Knox

Data: Protecting data in the cluster from unauthorized visibility
Technical Concepts: Encryption, Data masking
Certified Partners

Access: Defining what users and applications can do with data
Technical Concepts: Permissions, Authorization
Sentry (Available 7/23)

Visibility: Reporting on where data came from and how it's being used
Technical Concepts: Auditing, Lineage
Cloudera Navigator

Hive Overview

SQL Access to Hadoop
• MapReduce: great massively scalable batch processing framework; required development for each new job
• Hive opened up Hadoop for more users with standard SQL

Key Challenges
• Batch MapReduce too slow for interactive BI/analytics
• No concurrency, no security

Options Today
• Impala designed for low-latency queries
• HiveServer2 delivers concurrency, authentication

Our Open Source Activity

CDH 4.1 (HiveServer2)
• Concurrency and Kerberos authentication for Hive
• JDBC and Beeline clients

CDH 4.2
• HDFS impersonation authorization as stop-gap
• Pluggable authentication API
• JDBC LDAP username/password

ODBC
• Supports Kerberos authentication and LDAP
• Extended partner certification

Current State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop

Insecure Advisory Authorization
• Users can grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Doesn't guard against malicious users

HDFS Impersonation
• Data is protected at the file level by HDFS permissions
• Problem: File-level not granular enough
• Problem: Not role-based

Authorization Requirements

Secure Authorization
• Ability to control access to data and/or privileges on data for authenticated users

Fine-Grained Authorization
• Ability to give users access to a subset of data (e.g. column) in a database

Role-Based Authorization
• Ability to create/apply templatized privileges based on functional roles

Multi-Tenant Administration
• Ability for central admin group to empower lower-level admins to manage security for each database/schema

The Next Step: Introducing Sentry
Authorization module for Hive & Impala

Unlocks Key RBAC Requirements
• Secure, fine-grained, role-based authorization
• Multi-tenant administration

Open Source
• Intent to donate to ASF

Available and Fully Supported
• HiveServer2 & Impala 1.1 initially

Key Benefits of Sentry
• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Enable New Use Cases
• Enable Multi-User Applications
• Comply with Regulations

Key Capabilities of Sentry

Fine-Grained Authorization
• Specify security for SERVERS, DATABASES, TABLES & VIEWS

Role-Based Authorization
• SELECT privilege on views & tables
• INSERT privilege on tables
• TRANSFORM privilege on servers
• ALL privilege on the server, databases, tables & views
• ALL privilege is needed to create/modify schema

Multi-Tenant Administration
• Separate policies for each database/schema
• Can be maintained by separate admins

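The privilege levels listed above (server, database, table or view, action) suggest a hierarchy in which a broader grant covers everything beneath it. The following is a minimal sketch of that idea, not Sentry's actual implementation: the function names `parse_privilege` and `implies` are hypothetical, and the matching rules (wildcard `*`, `all` covering every action, a shorter grant covering deeper requests) are simplifying assumptions. The privilege strings follow the `server=...->db=...->table=...->action=...` form used in the example policy later in the deck.

```python
def parse_privilege(text):
    """Split 'server=s1->db=d1->table=t->action=select' into (key, value) pairs."""
    parts = []
    for segment in text.strip().split("->"):
        key, _, value = segment.partition("=")
        parts.append((key.strip().lower(), value.strip()))
    return parts


def implies(granted, requested):
    """Return True if the granted privilege covers the requested one.

    Simplified rules (assumptions for illustration only):
    - each granted segment must match the requested segment at the same position;
    - '*' as a granted value matches any requested value;
    - a grant that stops early (no table, no action) covers everything beneath it;
    - the action 'all' covers every action.
    """
    g = parse_privilege(granted)
    r = parse_privilege(requested)
    if len(g) > len(r):
        return False  # the grant is more specific than the request
    for (g_key, g_val), (r_key, r_val) in zip(g, r):
        if g_key != r_key:
            return False
        if g_key == "action" and g_val.lower() == "all":
            continue
        if g_val != "*" and g_val != r_val:
            return False
    return True
```

Under these assumptions, a grant of `server=server1` covers any request on that server, while `table=*->action=select` on one database allows reads on every table there but no inserts.
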
Apache Ecosystem and Sentry
• Shared Hive Metastore (with HCatalog)
• Extensibility plug-in for HiveServer2
• Inline support in Impala 1.1
• Potential extension to Pig, MapReduce, REST

[Diagram: Sentry shown with the Hive Metastore and HCatalog; other components marked "Possible future development"]

Sentry Architecture

[Diagram: layered architecture, top to bottom]
• Binding Layer: Impala, Hive (HiveServer2), Future
• Authorization Provider
  • Policy Engine
  • Policy Provider: File (Local FS/HDFS), Database
• Side labels: Interface; Evaluation, Validation; Parsing; Interface

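The layering in this diagram (binding layer, policy engine, policy provider) can be sketched as three small components. This is only an illustration of the separation of concerns: all class and method names below are hypothetical and do not come from the Sentry code base, the in-memory provider stands in for the file and database providers, and the engine does an exact string match only, whereas a fuller engine would also honor broader grants.

```python
from typing import Dict, Iterable, Protocol, Set


class PolicyProvider(Protocol):
    """Supplies the privileges granted to a set of groups (file- or database-backed)."""

    def privileges_for(self, groups: Iterable[str]) -> Set[str]: ...


class InMemoryPolicyProvider:
    """Hypothetical stand-in for a file or database provider."""

    def __init__(self, grants: Dict[str, Set[str]]) -> None:
        self._grants = grants

    def privileges_for(self, groups: Iterable[str]) -> Set[str]:
        result: Set[str] = set()
        for group in groups:
            result |= self._grants.get(group, set())
        return result


class PolicyEngine:
    """Evaluates a requested privilege against what the provider returns."""

    def __init__(self, provider: PolicyProvider) -> None:
        self._provider = provider

    def is_allowed(self, groups: Iterable[str], requested: str) -> bool:
        # Exact match only, to keep the sketch short.
        return requested in self._provider.privileges_for(groups)


class HiveBinding:
    """Translates an engine-specific operation into a privilege request."""

    def __init__(self, engine: PolicyEngine, server: str) -> None:
        self._engine = engine
        self._server = server

    def can_select(self, groups: Iterable[str], db: str, table: str) -> bool:
        requested = f"server={self._server}->db={db}->table={table}->action=select"
        return self._engine.is_allowed(groups, requested)
```

Keeping the provider behind an interface is what would let a file-based policy be swapped for a database-backed one without touching the bindings.
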
Query Execution Flow

SQL → Parse → Build → Check (Sentry) → Plan → MR query
• Parse: validate SQL grammar
• Build: construct statement tree
• Check: validate statement objects
  • First check: authorization
• Plan: forward to execution planner

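The stages of this flow can be sketched as a small pipeline in which authorization is the first thing the check stage does, so an unauthorized statement never reaches the planner. Everything here is a toy under stated assumptions: the grammar accepts only `SELECT * FROM <table>`, and the names `Statement`, `AuthorizationError`, `run` and the `is_authorized` callback are hypothetical, not part of Hive, Impala or Sentry.

```python
from dataclasses import dataclass
from typing import Callable, List


class AuthorizationError(Exception):
    """Raised when the check stage rejects a statement."""


@dataclass
class Statement:
    action: str  # e.g. "select"
    table: str   # e.g. "analyst1.orders"


def parse(sql: str) -> List[str]:
    """Parse: validate a toy grammar of the form 'SELECT * FROM <table>'."""
    tokens = sql.strip().rstrip(";").split()
    if len(tokens) != 4 or tokens[0].upper() != "SELECT" or tokens[2].upper() != "FROM":
        raise ValueError(f"unsupported statement: {sql!r}")
    return tokens


def build(tokens: List[str]) -> Statement:
    """Build: construct the (trivial) statement tree."""
    return Statement(action="select", table=tokens[3])


def check(stmt: Statement, user: str,
          is_authorized: Callable[[str, Statement], bool]) -> Statement:
    """Check: authorization runs first, before any other object validation."""
    if not is_authorized(user, stmt):
        raise AuthorizationError(f"{user} may not {stmt.action} on {stmt.table}")
    return stmt


def plan(stmt: Statement) -> str:
    """Plan: hand the validated statement to the execution planner."""
    return f"scan({stmt.table})"


def run(sql: str, user: str,
        is_authorized: Callable[[str, Statement], bool]) -> str:
    """Chain the four stages in the order shown on the slide."""
    return plan(check(build(parse(sql)), user, is_authorized))
```

Passing the authorization decision in as a callback mirrors the plug-in idea: the query engine owns parsing and planning, and the policy decision comes from outside.
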
Example Security Policy

[databases]
# Defines the location of the per-DB policy file for the
# 'customers' DB (schema)
customers = hdfs://ha-nn-uri/etc/access/customers.ini

[groups]
# Assigns Hadoop groups to their respective set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role

[roles]
# Roles that can import or export data to the URIs defined,
# i.e. a landing zone. Since the server runs as the user "hive",
# files in this directory must either have the "hive" group set
# with read/write or be set world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Role controls everything for the 'customers' DB on server1.
# Privileges for 'customers' can be defined in the global policy
# file even though 'customers' has its own policy file.
# Note that the privileges from both the global policy file and
# the per-DB policy file are merged. There is no overriding.
customers_admin_role = server=server1->db=customers

# Role controls everything on server1.
admin_role = server=server1

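To make the group-to-role-to-privilege mapping in the policy above concrete, here is a minimal sketch of a reader for this kind of file. It is a hand-written illustration, not the parser Sentry uses: the function names `parse_policy` and `privileges_for_group` are hypothetical, the handling of a trailing backslash as a line continuation is an assumption, and the `SAMPLE` text is a shortened excerpt of the policy.

```python
from collections import defaultdict


def parse_policy(text):
    """Parse a Sentry-style policy file into {section: {name: [values]}}."""
    sections = defaultdict(dict)
    current = None
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith("\\"):  # assumed: backslash joins the next line
            pending += line[:-1].strip() + " "
            continue
        line, pending = pending + line, ""
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if current is None:
            raise ValueError("entry before first section header: " + line)
        name, _, value = line.partition("=")  # split on the first '=' only
        sections[current][name.strip()] = [
            v.strip() for v in value.split(",") if v.strip()
        ]
    return dict(sections)


def privileges_for_group(policy, group):
    """Resolve group -> roles -> privileges, merging the privileges of all roles."""
    privileges = []
    for role in policy.get("groups", {}).get(group, []):
        for privilege in policy.get("roles", {}).get(role, []):
            if privilege not in privileges:
                privileges.append(privilege)
    return privileges


SAMPLE = r"""
[groups]
manager = analyst_role, junior_analyst_role
jranalyst = junior_analyst_role

[roles]
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select
junior_analyst_role = server=server1->db=jranalyst1
"""

if __name__ == "__main__":
    policy = parse_policy(SAMPLE)
    for privilege in privileges_for_group(policy, "manager"):
        print(privilege)
```

Because privileges are only ever added, a group holding several roles ends up with the union of their privileges, which matches the "merged, no overriding" note in the policy comments.
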
Live Demo & Giveaways
• Closes gap between HDFS and Metastore
• Easy to implement
• RFC 2307 compliant (Kerberos)
• Enable multi-user applications in one Hive warehouse
• Enables multi-tenancy per row and column

About
dev@sentry.incubator.apache.org
alexander@cloudera.com
@mapredit
mapredit.blogspot.com
Web: http://wiki.apache.org/incubator/SentryProposal
Sentry - An Introduction

Weitere ähnliche Inhalte

Was ist angesagt?

Integration Patterns for Microservices Architectures
Integration Patterns for Microservices ArchitecturesIntegration Patterns for Microservices Architectures
Integration Patterns for Microservices ArchitecturesNATS
 
DevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to SecurityDevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to SecurityAlert Logic
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInDataWorks Summit
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analyticsXiang Fu
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container PlatformDLT Solutions
 
Modeling microservices using DDD
Modeling microservices using DDDModeling microservices using DDD
Modeling microservices using DDDMasashi Narumoto
 
Understanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootUnderstanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootKashif Ali Siddiqui
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsFrederic Descamps
 
Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition!
Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition! Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition!
Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition! Michel Schudel
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 
Vanrish Mulesoft Integration architect ppt
Vanrish Mulesoft Integration architect pptVanrish Mulesoft Integration architect ppt
Vanrish Mulesoft Integration architect pptRajnish Kumar
 
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQLMySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQLOlivier DASINI
 
The Cloud Native Journey
The Cloud Native JourneyThe Cloud Native Journey
The Cloud Native JourneyVMware Tanzu
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to MicroservicesMahmoudZidan41
 

Was ist angesagt? (20)

Integration Patterns for Microservices Architectures
Integration Patterns for Microservices ArchitecturesIntegration Patterns for Microservices Architectures
Integration Patterns for Microservices Architectures
 
DevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to SecurityDevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to Security
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Openshift Container Platform
Openshift Container PlatformOpenshift Container Platform
Openshift Container Platform
 
Modeling microservices using DDD
Modeling microservices using DDDModeling microservices using DDD
Modeling microservices using DDD
 
DevOps & SRE at Google Scale
DevOps & SRE at Google ScaleDevOps & SRE at Google Scale
DevOps & SRE at Google Scale
 
Understanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootUnderstanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring Boot
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
 
Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition!
Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition! Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition!
Battle Of The Microservice Frameworks: Micronaut versus Quarkus edition!
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
App Dynamics
App DynamicsApp Dynamics
App Dynamics
 
Kong API
Kong APIKong API
Kong API
 
Vanrish Mulesoft Integration architect ppt
Vanrish Mulesoft Integration architect pptVanrish Mulesoft Integration architect ppt
Vanrish Mulesoft Integration architect ppt
 
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQLMySQL InnoDB Cluster - A complete High Availability solution for MySQL
MySQL InnoDB Cluster - A complete High Availability solution for MySQL
 
DevOps @ OpenShift Online
DevOps @ OpenShift OnlineDevOps @ OpenShift Online
DevOps @ OpenShift Online
 
The Cloud Native Journey
The Cloud Native JourneyThe Cloud Native Journey
The Cloud Native Journey
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 

Ähnlich wie Sentry - An Introduction

Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache HadoopCombat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache HadoopCloudera, Inc.
 
Hive contributors meetup apache sentry
Hive contributors meetup   apache sentryHive contributors meetup   apache sentry
Hive contributors meetup apache sentryBrock Noland
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
OWASP zabezpieczenia aplikacji - Top 10 ASR
OWASP zabezpieczenia aplikacji - Top 10 ASROWASP zabezpieczenia aplikacji - Top 10 ASR
OWASP zabezpieczenia aplikacji - Top 10 ASRLaravel Poland MeetUp
 
C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2Bill Liu
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of ViewKaran Alang
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security Sandeep Patil
 
Securing Open Source Databases
Securing Open Source DatabasesSecuring Open Source Databases
Securing Open Source DatabasesGazzang
 
Securing Your Apache Spark Applications
Securing Your Apache Spark ApplicationsSecuring Your Apache Spark Applications
Securing Your Apache Spark ApplicationsCloudera, Inc.
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSpark Summit
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
A cloud enviroment for backup and data storage
A cloud enviroment for backup and data storageA cloud enviroment for backup and data storage
A cloud enviroment for backup and data storageIGEEKS TECHNOLOGIES
 
Encryption in the Public Cloud: 16 Bits of Advice for Security Techniques
Encryption in the Public Cloud: 16 Bits of Advice for Security TechniquesEncryption in the Public Cloud: 16 Bits of Advice for Security Techniques
Encryption in the Public Cloud: 16 Bits of Advice for Security TechniquesTrend Micro
 
DFS PPT.pptx
DFS PPT.pptxDFS PPT.pptx
DFS PPT.pptxVMahesh5
 
2016 share the three headed beast v4
2016 share the three headed beast v42016 share the three headed beast v4
2016 share the three headed beast v4bigendiansmalls
 
Low Hanging Fruit, Making Your Basic MongoDB Installation More Secure
Low Hanging Fruit, Making Your Basic MongoDB Installation More SecureLow Hanging Fruit, Making Your Basic MongoDB Installation More Secure
Low Hanging Fruit, Making Your Basic MongoDB Installation More SecureMongoDB
 

Ähnlich wie Sentry - An Introduction (20)

Combat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache HadoopCombat Cyber Threats with Cloudera Impala & Apache Hadoop
Combat Cyber Threats with Cloudera Impala & Apache Hadoop
 
Hive contributors meetup apache sentry
Hive contributors meetup   apache sentryHive contributors meetup   apache sentry
Hive contributors meetup apache sentry
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
OWASP zabezpieczenia aplikacji - Top 10 ASR
OWASP zabezpieczenia aplikacji - Top 10 ASROWASP zabezpieczenia aplikacji - Top 10 ASR
OWASP zabezpieczenia aplikacji - Top 10 ASR
 
C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2C19013010 the tutorial to build shared ai services session 2
C19013010 the tutorial to build shared ai services session 2
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
Ppt linux
Ppt linuxPpt linux
Ppt linux
 
IBM Spectrum Scale Security
IBM Spectrum Scale Security IBM Spectrum Scale Security
IBM Spectrum Scale Security
 
Securing Open Source Databases
Securing Open Source DatabasesSecuring Open Source Databases
Securing Open Source Databases
 
Securing Your Apache Spark Applications
Securing Your Apache Spark ApplicationsSecuring Your Apache Spark Applications
Securing Your Apache Spark Applications
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Sqrrl and Accumulo
Sqrrl and AccumuloSqrrl and Accumulo
Sqrrl and Accumulo
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Gradution Project
Gradution ProjectGradution Project
Gradution Project
 
A cloud enviroment for backup and data storage
A cloud enviroment for backup and data storageA cloud enviroment for backup and data storage
A cloud enviroment for backup and data storage
 
Encryption in the Public Cloud: 16 Bits of Advice for Security Techniques
Encryption in the Public Cloud: 16 Bits of Advice for Security TechniquesEncryption in the Public Cloud: 16 Bits of Advice for Security Techniques
Encryption in the Public Cloud: 16 Bits of Advice for Security Techniques
 
DFS PPT.pptx
DFS PPT.pptxDFS PPT.pptx
DFS PPT.pptx
 
2016 share the three headed beast v4
2016 share the three headed beast v42016 share the three headed beast v4
2016 share the three headed beast v4
 
Low Hanging Fruit, Making Your Basic MongoDB Installation More Secure
Low Hanging Fruit, Making Your Basic MongoDB Installation More SecureLow Hanging Fruit, Making Your Basic MongoDB Installation More Secure
Low Hanging Fruit, Making Your Basic MongoDB Installation More Secure
 

Mehr von Alexander Alten

Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoTAlexander Alten
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern EnterpriseAlexander Alten
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduceAlexander Alten
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Alexander Alten
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)Alexander Alten
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)Alexander Alten
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFSAlexander Alten
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache HadoopAlexander Alten
 

Mehr von Alexander Alten (13)

Is big data dead?
Is big data dead?Is big data dead?
Is big data dead?
 
Creating a value chain with IoT
Creating a value chain with IoTCreating a value chain with IoT
Creating a value chain with IoT
 
Big Data in an modern Enterprise
Big Data in an modern EnterpriseBig Data in an modern Enterprise
Big Data in an modern Enterprise
 
The Future of Energy
The Future of EnergyThe Future of Energy
The Future of Energy
 
Beyond Hadoop and MapReduce
Beyond Hadoop and MapReduceBeyond Hadoop and MapReduce
Beyond Hadoop and MapReduce
 
Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013Cloudera Impala - HUG Karlsruhe, July 04, 2013
Cloudera Impala - HUG Karlsruhe, July 04, 2013
 
Bi with apache hadoop(en)
Bi with apache hadoop(en)Bi with apache hadoop(en)
Bi with apache hadoop(en)
 
BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)BI mit Apache Hadoop (CDH)
BI mit Apache Hadoop (CDH)
 
Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Highlights Of Sqoop2
Highlights Of Sqoop2Highlights Of Sqoop2
Highlights Of Sqoop2
 
Apache Flume (NG)
Apache Flume (NG)Apache Flume (NG)
Apache Flume (NG)
 
Filesystems, RPC and HDFS
Filesystems, RPC and HDFSFilesystems, RPC and HDFS
Filesystems, RPC and HDFS
 
Big Data mit Apache Hadoop
Big Data mit Apache HadoopBig Data mit Apache Hadoop
Big Data mit Apache Hadoop
 

Kürzlich hochgeladen

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Kürzlich hochgeladen (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Sentry - An Introduction

  • 1. Sentry: Open Source Authorization for Hive & Impala Alexander Alten-Lorenz | Senior Field Engineer, Cloudera Wednesday, 7th November 2013
  • 2. Defining Security Functions
      - Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation.
      - Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking.
      - Access: defining what users and applications can do with data. Technical concepts: permissions, authorization.
      - Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage.
  • 3. Enabling Enterprise Security
      - Perimeter: guarding access to the cluster itself. Technical concepts: authentication, network isolation. Provided by: Kerberos | Oozie | Knox.
      - Data: protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, data masking. Provided by: certified partners.
      - Access: defining what users and applications can do with data. Technical concepts: permissions, authorization. Provided by: Sentry (available 7/23).
      - Visibility: reporting on where data came from and how it's being used. Technical concepts: auditing, lineage. Provided by: Cloudera Navigator.
  • 4. Hive Overview
      SQL Access to Hadoop
      - MapReduce: great, massively scalable batch processing framework; required development for each new job
      - Hive opened up Hadoop for more users with standard SQL
      Key Challenges
      - Batch MapReduce too slow for interactive BI/analytics
      - No concurrency, no security
      Options Today
      - Impala designed for low-latency queries
      - HiveServer2 delivers concurrency, authentication
  • 5. Our Open Source Activity
      CDH 4.1 (HiveServer2)
      - Concurrency and Kerberos authentication for Hive
      - JDBC and Beeline clients
      CDH 4.2
      - HDFS impersonation authorization as stop-gap
      - Pluggable authentication API
      - JDBC LDAP username/password
      ODBC
      - Supports Kerberos authentication and LDAP
      - Extended partner certification
  • 6. Current State of Authorization: Two Sub-Optimal Choices for SQL on Hadoop
      Insecure Advisory Authorization
      - Users can grant themselves permissions
      - Intended to prevent accidental deletion of data
      - Problem: doesn't guard against malicious users
      HDFS Impersonation
      - Data is protected at the file level by HDFS permissions
      - Problem: file-level not granular enough
      - Problem: not role-based
  • 7. Authorization Requirements
      - Secure Authorization: ability to control access to data and/or privileges on data for authenticated users
      - Fine-Grained Authorization: ability to give users access to a subset of data (e.g. column) in a database
      - Role-Based Authorization: ability to create/apply templatized privileges based on functional roles
      - Multi-Tenant Administration: ability for central admin group to empower lower-level admins to manage security for each database/schema
  • 8. The Next Step: Introducing Sentry
      Authorization module for Hive & Impala
      Unlocks Key RBAC Requirements
      - Secure, fine-grained, role-based authorization
      - Multi-tenant administration
      Open Source
      - Intent to donate to ASF
      Available and Fully Supported
      - HiveServer2 & Impala 1.1 initially
  • 9. Key Benefits of Sentry
      - Store sensitive data in Hadoop
      - Extend Hadoop to more users
      - Enable new use cases
      - Enable multi-user applications
      - Comply with regulations
  • 10. Key Capabilities of Sentry
      Fine-Grained Authorization
      - Specify security for SERVERS, DATABASES, TABLES & VIEWS
      Role-Based Authorization
      - SELECT privilege on views & tables
      - INSERT privilege on tables
      - TRANSFORM privilege on servers
      - ALL privilege on the server, databases, tables & views
      - ALL privilege is needed to create/modify schema
      Multi-Tenant Administration
      - Separate policies for each database/schema
      - Can be maintained by separate admins
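The privilege levels listed on slide 10 (server, database, table/view, action) can be illustrated with a small matching function. This is only a sketch: the privilege string format is borrowed from the policy file on slide 14, while the function names `parse_privilege` and `implies` and the matching rule itself (a grant that stops at a higher level covers everything beneath it, and `*` or `all` matches any value) are assumptions made for illustration, not Sentry's actual implementation.

```python
def parse_privilege(text):
    """Split 'server=s1->db=d1->table=t1->action=select' into (key, value) pairs."""
    return [tuple(part.split("=", 1)) for part in text.split("->")]


def implies(granted, requested):
    """Return True if the granted privilege covers the requested one.

    Assumed rule (hypothetical, for illustration): the grant must match the
    request level by level; a grant that ends early (e.g. at db) covers all
    objects and actions beneath it; '*' or 'all' matches any value.
    """
    g, r = parse_privilege(granted), parse_privilege(requested)
    if len(g) > len(r):
        return False  # the grant is narrower than the request
    for (g_key, g_val), (r_key, r_val) in zip(g, r):
        if g_key != r_key:
            return False
        if g_val not in ("*", "all") and g_val != r_val:
            return False
    return True


# A SELECT-only grant on every table of one database
grant = "server=server1->db=jranalyst1->table=*->action=select"
can_select = implies(grant, "server=server1->db=jranalyst1->table=t1->action=select")
can_insert = implies(grant, "server=server1->db=jranalyst1->table=t1->action=insert")
```

Under these assumptions `can_select` is true and `can_insert` is false, which mirrors the slide's distinction between SELECT and INSERT privileges on tables.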
  • 11. Apache Ecosystem and Sentry
      - Shared Hive Metastore (with HCatalog)
      - Extensibility plug-in for HiveServer2
      - Inline support in Impala 1.1
      - Potential extension to Pig, MapReduce, REST
      (Diagram labels: Hive Metastore, HCatalog, Sentry, possible future development)
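  • 12. Sentry Architecture
      (Diagram, top to bottom)
      - Clients: Impala, HiveServer2
      - Binding Layer: Impala, Hive, Future
      - Authorization Provider: Policy Engine, Policy Provider
      - Policy Provider back ends: File (Local FS/HDFS), Database
      (Side annotations, in diagram order: Interface; Evaluation, Validation; Parsing; Interface)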
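The layering on slide 12 can be sketched as three small classes. Everything here is hypothetical: the class and method names (`InMemoryProvider`, `PolicyEngine`, `HiveBinding`, `privileges_for`, `is_allowed`, `authorize`) are invented for illustration, the provider is an in-memory stand-in for the file or database back end, and the engine does exact-match evaluation only. The point is just the direction of the calls: a binding translates an engine-specific request, the policy engine evaluates it, and the policy provider supplies the stored grants.

```python
from typing import Protocol


class PolicyProvider(Protocol):
    """Supplies the privilege strings granted to a set of groups."""

    def privileges_for(self, groups): ...


class InMemoryProvider:
    """Hypothetical stand-in for a file- or database-backed provider."""

    def __init__(self, grants):
        self._grants = grants  # dict: group name -> set of privilege strings

    def privileges_for(self, groups):
        found = set()
        for group in groups:
            found |= self._grants.get(group, set())
        return found


class PolicyEngine:
    """Evaluates a requested privilege against what the provider returns."""

    def __init__(self, provider: PolicyProvider):
        self._provider = provider

    def is_allowed(self, groups, requested):
        # Simplification: exact string match, no hierarchy or wildcards.
        return requested in self._provider.privileges_for(groups)


class HiveBinding:
    """Translates a Hive-style request into the engine's privilege format."""

    def __init__(self, engine):
        self._engine = engine

    def authorize(self, groups, db, table, action):
        requested = f"db={db}->table={table}->action={action}"
        return self._engine.is_allowed(groups, requested)


provider = InMemoryProvider({"analyst": {"db=analyst1->table=t1->action=select"}})
binding = HiveBinding(PolicyEngine(provider))
allowed = binding.authorize({"analyst"}, "analyst1", "t1", "select")
```

Keeping the provider behind an interface is what would let a file-backed and a database-backed store be swapped without touching the bindings, which is one reading of the "File" and "Database" boxes in the diagram.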
  • 13. Query Execution Flow
      - Parse: validate SQL grammar
      - Build: construct statement tree
      - Check: validate statement objects (first check: authorization)
      - Plan: forward to execution planner
      (Diagram labels: SQL, Sentry, MR, Query)
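The ordering on slide 13, where authorization is checked before the statement reaches the planner, can be shown with a toy pipeline. This is a sketch under heavy assumptions: the regex "parser" accepts only `SELECT ... FROM db.table`, grants are modeled as a set of `(db, table, action)` tuples, and all function names are hypothetical. None of it reflects how HiveServer2 or Impala is implemented.

```python
import re


def parse(sql):
    """Parse + Build (toy): accept only 'SELECT ... FROM db.table'."""
    match = re.fullmatch(
        r"\s*select\s+.+?\s+from\s+(\w+)\.(\w+)\s*;?\s*", sql, re.IGNORECASE
    )
    if match is None:
        raise ValueError("unsupported or invalid SQL")
    return {"action": "select", "db": match.group(1), "table": match.group(2)}


def authorize(statement, grants):
    """Check: reject the statement unless a matching grant exists."""
    key = (statement["db"], statement["table"], statement["action"])
    if key not in grants:
        raise PermissionError(f"no {key[2]} privilege on {key[0]}.{key[1]}")


def plan(statement):
    """Plan (toy): return a description instead of a real execution plan."""
    return f"scan {statement['db']}.{statement['table']}"


def run_query(sql, grants):
    statement = parse(sql)         # Parse and Build
    authorize(statement, grants)   # Check: authorization runs before planning
    return plan(statement)         # Plan: only reached if the check passed


result = run_query("SELECT * FROM sales.orders", {("sales", "orders", "select")})
```

Because `authorize` raises before `plan` is called, an unauthorized query never produces a plan, which is the property the slide's "first check" ordering is meant to guarantee.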
  • 14. Example Security Policy

      [databases]
      # Defines the location of the per-DB policy file for the
      # 'customers' DB (schema)
      customers = hdfs://ha-nn-uri/etc/access/customers.ini

      [groups]
      # Assigns Hadoop groups to their respective set of roles
      manager = analyst_role, junior_analyst_role
      analyst = analyst_role
      jranalyst = junior_analyst_role
      customers_admin = customers_admin_role
      admin = admin_role

      [roles]
      # Roles that can import or export data to the URIs defined,
      # i.e. a landing zone. Since the server runs as the user "hive",
      # files in this directory must either have the "hive" group set
      # with read/write or be set world read/write.
      analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select, server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
      junior_analyst_role = server=server1->db=jranalyst1, server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

      # Role controls everything for the 'customers' DB on server1.
      # Privileges for 'customers' can be defined in the global policy
      # file even though 'customers' has its own policy file.
      # Note that the privileges from both the global policy file and
      # the per-DB policy file are merged. There is no overriding.
      customers_admin_role = server=server1->db=customers

      # Role controls everything on server1.
      admin_role = server=server1
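To make the group → role → privilege indirection in the slide 14 policy concrete, the following sketch loads a simplified subset of that file and resolves one group's privileges. Python's `configparser` is used here purely as a stand-in INI reader and is not the parser Sentry uses; the helper names `load_policy` and `privileges_for` are hypothetical, and the `[databases]` section and URI privileges are omitted to keep the example short.

```python
import configparser

# Simplified subset of the slide's policy file (URIs and [databases] omitted).
POLICY = """
[groups]
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
admin = admin_role

[roles]
analyst_role = server=server1->db=analyst1
junior_analyst_role = server=server1->db=jranalyst1
admin_role = server=server1
"""


def split_list(value):
    """Split a comma-separated value into stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_policy(text):
    """Return (group -> roles, role -> privileges) dictionaries."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    group_roles = {name: split_list(v) for name, v in parser["groups"].items()}
    role_privs = {name: split_list(v) for name, v in parser["roles"].items()}
    return group_roles, role_privs


def privileges_for(group, group_roles, role_privs):
    """Collect the privileges of every role assigned to the group."""
    result = []
    for role in group_roles.get(group, []):
        result.extend(role_privs.get(role, []))
    return result


GROUPS, ROLES = load_policy(POLICY)
manager_privs = privileges_for("manager", GROUPS, ROLES)
```

With this subset, `manager_privs` contains the privileges of both `analyst_role` and `junior_analyst_role`, since the `manager` group is assigned both roles.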
  • 15. Live Demo & Giveaways
      - Closes gap between HDFS and Metastore
      - Easy to implement
      - RFC 2307 compliant (Kerberos)
      - Enables multi-user applications in one Hive warehouse
      - Enables multi-tenancy per row and column