Sentry - An Introduction
1. Sentry: Open Source Authorization for Hive & Impala
Alexander Alten-Lorenz | Senior Field Engineer, Cloudera
Wednesday, 7th November 2013
2. Defining Security Functions
Perimeter: Guarding access to the cluster itself
• Technical concepts: authentication, network isolation
Data: Protecting data in the cluster from unauthorized visibility
• Technical concepts: encryption, data masking
Access: Defining what users and applications can do with data
• Technical concepts: permissions, authorization
Visibility: Reporting on where data came from and how it's being used
• Technical concepts: auditing, lineage
3. Enabling Enterprise Security
Perimeter: Guarding access to the cluster itself
• Technical concepts: authentication, network isolation
• Kerberos | Oozie | Knox
Data: Protecting data in the cluster from unauthorized visibility
• Technical concepts: encryption, data masking
• Certified Partners
Access: Defining what users and applications can do with data
• Technical concepts: permissions, authorization
• Sentry (available 7/23)
Visibility: Reporting on where data came from and how it's being used
• Technical concepts: auditing, lineage
• Cloudera Navigator
4. Hive Overview
SQL Access to Hadoop
• MapReduce: great massively scalable batch processing framework; required development for each new job
• Hive opened up Hadoop for more users with standard SQL
Key Challenges
• Batch MapReduce too slow for interactive BI/analytics
• No concurrency, no security
Options Today
• Impala designed for low-latency queries
• HiveServer2 delivers concurrency, authentication
5. Our Open Source Activity
CDH 4.1 (HiveServer2)
• Concurrency and Kerberos authentication for Hive
• JDBC and Beeline clients
CDH 4.2
• HDFS impersonation authorization as stop-gap
• Pluggable authentication API
• JDBC LDAP username/password
ODBC
• Supports Kerberos authentication and LDAP
• Extended partner certification
6. Current State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop
Insecure Advisory Authorization
• Users can grant themselves permissions
• Intended to prevent accidental deletion of data
• Problem: Doesn't guard against malicious users
HDFS Impersonation
• Data is protected at the file level by HDFS permissions
• Problem: File-level not granular enough
• Problem: Not role-based
7. Authorization Requirements
Secure Authorization
• Ability to control access to data and/or privileges on data for authenticated users
Fine-Grained Authorization
• Ability to give users access to a subset of data (e.g. column) in a database
Role-Based Authorization
• Ability to create/apply templatized privileges based on functional roles
Multi-Tenant Administration
• Ability for central admin group to empower lower-level admins to manage security for each database/schema
8. The Next Step: Introducing Sentry
Authorization module for Hive & Impala
Unlocks Key RBAC Requirements
• Secure, fine-grained, role-based authorization
• Multi-tenant administration
Open Source
• Intent to donate to ASF
Available and Fully Supported
• HiveServer2 & Impala 1.1 initially
9. Key Benefits of Sentry
• Store Sensitive Data in Hadoop
• Extend Hadoop to More Users
• Enable New Use Cases
• Enable Multi-User Applications
• Comply with Regulations
10. Key Capabilities of Sentry
Fine-Grained Authorization
• Specify security for SERVERS, DATABASES, TABLES & VIEWS
Role-Based Authorization
• SELECT privilege on views & tables
• INSERT privilege on tables
• TRANSFORM privilege on servers
• ALL privilege on the server, databases, tables & views
• ALL privilege is needed to create/modify schema
Multi-Tenant Administration
• Separate policies for each database/schema
• Can be maintained by separate admins
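The scoping described above (server, then database, then table or view, then action) can be sketched as a prefix check over privilege strings. This is only an illustration of the idea, not Sentry's implementation: the function names `parse_privilege` and `implies` are hypothetical, and the sketch assumes the `key=value->key=value` notation used in the policy file example, with a grant that stops at a higher level treated as ALL on everything beneath it.

```python
# Illustrative sketch only; parse_privilege and implies are hypothetical names,
# not part of Sentry.

def parse_privilege(text):
    """Split 'server=s1->db=d1->table=t->action=select' into (key, value) pairs."""
    parts = []
    for part in text.strip().split("->"):
        key, _, value = part.partition("=")
        key = key.strip().lower()
        value = value.strip()
        if key == "action":
            value = value.lower()
        parts.append((key, value))
    return parts


def implies(granted, requested):
    """Return True if the granted privilege covers the requested one."""
    g = parse_privilege(granted)
    r = parse_privilege(requested)
    if len(g) > len(r):
        # A grant that is more specific than the request cannot cover it.
        return False
    for (g_key, g_val), (r_key, r_val) in zip(g, r):
        if g_key != r_key:
            return False
        if g_val == "*" or (g_key == "action" and g_val == "all"):
            continue  # wildcard object or ALL action matches anything
        if g_val != r_val:
            return False
    # Every granted component matched; a shorter grant acts as ALL below it.
    return True
```

For example, a grant of `server=server1->db=analyst1` covers an INSERT on any table in `analyst1`, while a grant ending in `table=*->action=select` covers SELECT only.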
11. Apache Ecosystem and Sentry
• Shared Hive Metastore (with HCatalog)
• Extensibility plug-in for HiveServer2
• Inline support in Impala 1.1
• Potential extension to Pig, MapReduce, REST
[Diagram labels: Hive Metastore, HCatalog, Sentry, "Possible future development"]
13. Query Execution Flow
• Parse: Validate SQL grammar
• Build: Construct statement tree
• Check: Validate statement objects (first check: authorization)
• Plan: Forward to execution planner
[Diagram labels: SQL, Sentry, MR, Query]
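The four steps above can be sketched as a small pipeline in which authorization runs before anything reaches the planner. This is a toy model under stated assumptions, not HiveServer2 or Impala code: the grammar accepts only `SELECT ... FROM db.table`, the `allowed` set of `(db, table, action)` tuples stands in for a real policy lookup, and all function names are hypothetical.

```python
import re


class AuthorizationError(Exception):
    """Raised when the check step rejects a statement."""


def parse(sql):
    # Parse: validate the (toy) SQL grammar.
    match = re.fullmatch(
        r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\.(\w+)\s*;?\s*", sql, re.IGNORECASE
    )
    if match is None:
        raise SyntaxError("unsupported statement: " + sql)
    return match.groups()


def build(parsed):
    # Build: construct a minimal statement tree.
    columns, database, table = parsed
    return {
        "action": "select",
        "db": database,
        "table": table,
        "columns": [c.strip() for c in columns.split(",")],
    }


def check(tree, allowed):
    # Check: authorization is the first validation, before any planning.
    if (tree["db"], tree["table"], tree["action"]) not in allowed:
        raise AuthorizationError(
            "no {action} privilege on {db}.{table}".format(**tree)
        )
    return tree


def plan(tree):
    # Plan: hand the authorized statement to the execution planner (stubbed).
    return "scan {}.{} -> project {}".format(
        tree["db"], tree["table"], ", ".join(tree["columns"])
    )


def execute(sql, allowed):
    return plan(check(build(parse(sql)), allowed))
```

Because `check` raises before `plan` is called, an unauthorized statement never produces an execution plan.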
14. Example Security Policy
[databases]
# Defines the location of the per-DB policy file for the
# 'customers' DB (schema)
customers = hdfs://ha-nn-uri/etc/access/customers.ini

[groups]
# Assigns Hadoop groups to their respective set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
customers_admin = customers_admin_role
admin = admin_role

[roles]
# Roles that can import or export data to the URIs defined,
# i.e. a landing zone. Since the server runs as the user "hive",
# files in this directory must either have the "hive" group set
# with read/write or be set world read/write.
analyst_role = server=server1->db=analyst1, \
    server=server1->db=jranalyst1->table=*->action=select, \
    server=server1->uri=hdfs://ha-nn-uri/landing/analyst1
junior_analyst_role = server=server1->db=jranalyst1, \
    server=server1->uri=hdfs://ha-nn-uri/landing/jranalyst1

# Role controls everything for the 'customers' DB on server1.
# Privileges for 'customers' can be defined in the global policy
# file even though 'customers' has its own policy file.
# Note that the privileges from both the global policy file and
# the per-DB policy file are merged. There is no overriding.
customers_admin_role = server=server1->db=customers

# Role controls everything on server1.
admin_role = server=server1
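To show how the [groups] and [roles] sections fit together, here is a minimal sketch that resolves a user's Hadoop groups to a merged set of privileges. It is not Sentry's policy engine: `parse_policy` and `privileges_for_groups` are hypothetical names, the embedded policy is a shortened single-line version of the example above, and line continuations and the per-DB policy file are deliberately left out.

```python
# Shortened, single-line variant of the example policy (assumption: no
# line continuations, no [databases] section).
POLICY = """
[groups]
manager = analyst_role, junior_analyst_role
analyst = analyst_role
jranalyst = junior_analyst_role
admin = admin_role

[roles]
analyst_role = server=server1->db=analyst1, server=server1->db=jranalyst1->table=*->action=select
junior_analyst_role = server=server1->db=jranalyst1
admin_role = server=server1
"""


def parse_policy(text):
    """Parse sections into {section: {name: [comma-separated values]}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None:
            raise ValueError("entry outside of a section: " + line)
        # Split on the first '=' only; privilege strings contain more of them.
        key, _, value = line.partition("=")
        current[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return sections


def privileges_for_groups(policy, groups):
    """Union the privileges of every role assigned to the given groups."""
    granted = set()
    for group in groups:
        for role in policy.get("groups", {}).get(group, []):
            granted.update(policy.get("roles", {}).get(role, []))
    return granted
```

Using a set union mirrors the note in the policy comments that privileges are merged rather than overridden: a user in several groups receives everything any of their roles grants.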
15. Live Demo & Give Aways
• Closes gap between HDFS and Metastore
• Easy to implement
• RFC 2307 compliant (Kerberos)
• Enables multi-user applications in one Hive warehouse
• Enables multi-tenancy per row and column