© Hortonworks Inc. 2011
Improvements in Hadoop Security
Sanjay Radia
sanjay@hortonworks.com
@srr
Chris Nauroth
cnauroth@hortonworks.com
@cnauroth
Hello
Sanjay Radia
• Founder, Hortonworks
• Part of the Hadoop team at Yahoo! since 2007
– Chief Architect of Hadoop Core at Yahoo!
– Long time Apache Hadoop PMC and Committer
– Designed and developed several key Hadoop features
• Prior
– Data center automation, virtualization, Java, HA, OSs, File Systems (Startup, Sun Microsystems, …)
– Ph.D., University of Waterloo
Chris Nauroth
• Member of Technical Staff, Hortonworks
– Apache Hadoop Committer
– Major contributor to HDFS ACLs
• Hadoop user since 2010
– Prior employment experience deploying, maintaining and using Hadoop clusters
Architecting the Future of Big Data
Overview
• Models of Deployment
– Secure and insecure
• Hadoop Authentication
– The how and why
– Knox – perimeter security
• Authorization – existing and what is new
– HDFS
– Tables and Hive
– HBase and Accumulo
• Data protection and encryption
– Wire
– Data at rest
Two Reasons for Security in Hadoop
1. Hadoop contains sensitive data
– As Hadoop adoption grows, so does the variety of data that organizations store in it. Often the data is proprietary or personal, and it must be protected.
– In this context, Hadoop is governed by the same security requirements as any other data center platform.
2. Hadoop is subject to compliance requirements
– Organizations must often comply with regulations such as HIPAA, PCI DSS and FISMA that require protection of personal information.
– Adherence to other corporate security policies.
Three Models of Hadoop Deployment
• Insecure cluster
– You have protected it via the perimeter
– You trust the code that runs in the system
– Note: in a Hadoop cluster, user-submitted code runs inside the cluster
– (Not true in typical client-server applications)
– The client-side libraries pass the client’s login credential
– There is no end-to-end authentication here
– Authorization is done against this credential
• Secure cluster
– Full authentication
– Can run arbitrary code in jobs
• Perimeter security using Knox
– Internal cluster can be secure or insecure depending on your needs
Pillars of Hadoop Security
• Authentication: Who am I/prove it? Control access to cluster.
– AD/Kerberos in native Apache Hadoop
– Perimeter security with Apache Knox Gateway
• Authorization: Restrict access to explicit data
• Audit: Understand who did what
– Every service has audit logs
– Knox and XASecure provide central audit logs
• Data Protection: Encrypt data at rest & in motion
Hadoop Authentication Overview
• Kerberos/Active Directory based security
– SSO – users do not have to log in to Hadoop again
– Hadoop accounts do not have to be created
– Caveat – MR task isolation currently requires a Unix account for each user, but this is going away with Linux containers
– Hadoop tokens – supplement the Kerberos authentication
– Delegation tokens – deal with delayed job execution
– Block tokens – capabilities to deal with the distributed nature of HDFS
– Trusted proxies – support for third-party services to act as a proxy
– Oozie
– Gateways – HDFS proxy, Knox, etc.
• Knox – Perimeter Security and REST Gateway
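The trusted-proxy idea can be sketched as a small check. Hadoop configures this through the `hadoop.proxyuser.<name>.hosts` and `hadoop.proxyuser.<name>.groups` properties; the function name, host names and group names below are hypothetical, and the logic is a simplified model rather than Hadoop's implementation.

```python
def may_impersonate(conf, proxy, remote_host, target_groups):
    """Return True if service `proxy`, calling from `remote_host`, may act
    for a user who belongs to `target_groups` (simplified model)."""
    hosts = conf.get(f"hadoop.proxyuser.{proxy}.hosts", "")
    groups = conf.get(f"hadoop.proxyuser.{proxy}.groups", "")

    def matches(setting, candidates):
        # Settings are comma-separated lists; "*" means "any".
        allowed = {item.strip() for item in setting.split(",") if item.strip()}
        return "*" in allowed or bool(allowed & set(candidates))

    return matches(hosts, [remote_host]) and matches(groups, target_groups)


# Hypothetical settings: Oozie may act for members of "sales" from one host.
conf = {
    "hadoop.proxyuser.oozie.hosts": "oozie01.example.com",
    "hadoop.proxyuser.oozie.groups": "sales",
}
```

A service with no proxy-user settings at all (for example a hypothetical "knox" entry missing from `conf`) is refused, which matches the intent that only explicitly trusted services may act on behalf of users.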
Why the tokens
• Why does Hadoop have its own tokens?
– Standard client-server security model is not sufficient for Hadoop
– Works when logged-in client is directly accessing a Hadoop service
– But for a job, the execution happens much later
– The job submitter has long since logged off
• Hence we needed to add delegation tokens
• HDFS is a distributed service, so it needed capability-like tokens (block tokens) for datanode authentication
– The permissions/ACLs are in the Namenode
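A capability-like block token can be sketched as follows: the Namenode, which holds the permissions, signs a small statement with a secret it shares with the datanodes, and a datanode can then verify that statement without asking the Namenode. This is only an illustration of the idea. The function names, the payload layout and the choice of HMAC-SHA256 are assumptions and do not reflect Hadoop's actual token format.

```python
import hashlib
import hmac
import time


def issue_block_token(secret, user, block_id, modes, ttl_seconds=600, now=None):
    """Namenode side: sign (user, block, modes, expiry) with the shared secret."""
    expiry = int((time.time() if now is None else now) + ttl_seconds)
    payload = f"{user}|{block_id}|{','.join(sorted(modes))}|{expiry}"
    signature = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return payload, signature


def datanode_accepts(secret, payload, signature, block_id, mode, now=None):
    """Datanode side: verify the signature, then check block, mode and expiry."""
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return False  # forged or tampered token
    _user, token_block, modes, expiry = payload.split("|")
    current = time.time() if now is None else now
    return (
        token_block == str(block_id)
        and mode in modes.split(",")
        and current < int(expiry)
    )
```

The datanode never consults the permissions or ACLs itself; it only trusts what the Namenode signed, which is why the token acts as a capability.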
Apache Knox
Perimeter Security with Hadoop REST APIs
The Gateway or Edge Node
• Hadoop APIs can be used from any desktop after SSO login
– FileSystem and MapReduce Java APIs
– Pig, Hive and Oozie clients (that wrap the Java APIs)
• However, it is typical to use an “Edge Node” or “Gateway Node” that is “inside” the cluster
– The libraries for the APIs are generally only installed on the gateway
– Users SSH to Edge Node and execute API commands from shell
[Diagram: Hadoop user connects to the Edge Node over SSH]
Perimeter Security with Apache Knox
Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security.
Single, simple point of access for a cluster
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
• Eliminates SSH “edge node”
Central controls ensure consistency across one or more clusters
• Central API management
• Central audit control
• Simple service level authorization
Integrated with existing systems to simplify identity maintenance
• SSO integration – Siteminder, API Key*, OAuth* & SAML*
• LDAP & AD integration
Hadoop REST APIs
• Useful for connecting to Hadoop from outside the cluster
• When more client language flexibility is required
– e.g. when a Java binding is not an option
• Challenges (Knox addresses these challenges)
– Client must have knowledge of cluster topology
– Required to open ports (and in some cases, on every host) outside the cluster
Service: API
WebHDFS: Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat: Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive: Hive REST API operations
HBase: HBase REST API operations
Oozie: Job submission and management, and Oozie administration.
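The topology challenge can be seen by comparing a direct WebHDFS URL with one routed through a gateway. The sketch below only builds URLs. The host names, ports and the cluster name `cluster1` are assumptions chosen for illustration, and the `/gateway/<cluster>` prefix follows the layout Knox commonly uses, so check your deployment for the actual values.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base, path, op, **params):
    """Build a WebHDFS REST URL under `base` for an HDFS `path` and operation."""
    query = urlencode({"op": op, **params})
    return f"{base}/webhdfs/v1{quote(path)}?{query}"


# Direct access: the client must know (and be able to reach) the NameNode.
direct = webhdfs_url("http://namenode.example.com:50070", "/sales-data", "LISTSTATUS")

# Through a gateway: one externally visible endpoint per cluster.
via_gateway = webhdfs_url(
    "https://knox.example.com:8443/gateway/cluster1", "/sales-data", "LISTSTATUS"
)
```

With the gateway form, only the gateway's address has to be published and opened to clients; the hosts behind it can stay private.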
What can be done today?
Pillars: Authentication (Who am I/prove it? Control access to cluster.), Authorization (Restrict access to explicit data), Audit (Understand who did what), Data Protection (Encrypt data at rest & in motion)
Previously
• All Services: Service level ACLs
• HDFS: Permissions
• Yarn: Queue ACLs
• Hive/Pig Tables: Table level via HDFS
• Apache Accumulo: Cell level
• HBase: Namespace, Table, Column Family and Column level ACLs
Hadoop 2.x
• HDFS: ACLs
• Hive: Column level ACLs
• HBase: Cell level ACLs
• Knox:
– REST service level authorization
– Access audit with Knox
HDFS ACLs
• Existing HDFS POSIX permissions are good, but not flexible enough
– Permission requirements may differ from the natural organizational hierarchy of users and groups.
• HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX
ACL model.
– An ACL (Access Control List) provides a way to set different permissions for specific named users or named
groups, not only the file’s owner and file’s group.
HDFS File Permissions Example
• Authorization requirements:
–In a sales department, they would like a single user Maya (Department Manager) to
control all modifications to sales data
–Other members of sales department need to view the data, but can’t modify it.
–Everyone else in the company must not be allowed to view the data.
• Can be implemented via the following:
– File with sales data
– User: read/write permission for user maya
– Group: read permission for group sales
HDFS ACLs
• Problem
–No longer feasible for Maya to control all modifications to the file
– New Requirement: Maya, Diane and Clark are allowed to make modifications
– New Requirement: New group called executives should be able to read the sales data
–The current permissions model allows permissions for only one user and one group
• Solution: HDFS ACLs
–Now assign different permissions to different users and groups
[Diagram: an HDFS directory with the traditional entries Owner … rwx, Group … rwx, Others … rwx, extended by ACL entries Group D … rwx, Group F … rwx, User Y … rwx]
HDFS ACLs
New Tools for ACL Management (setfacl, getfacl)
– hdfs dfs -setfacl -m group:execs:r-- /sales-data
– hdfs dfs -getfacl /sales-data
# file: /sales-data
# owner: maya
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
– How do you know if a file or directory has ACLs set? The + after the permission bits in the listing marks an ACL.
– hdfs dfs -ls /sales-data
Found 1 items
-rw-r-----+ 3 maya sales 0 2014-03-04 16:31 /sales-data
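How such an ACL is evaluated can be sketched with the POSIX ACL access-check order: owner entry first, then named users, then the owning group and named groups (all limited by the mask), and finally other. This is a simplified model, not HDFS code. The function name and dictionary layout are hypothetical, and the users other than maya are made up for the example.

```python
def acl_allows(acl, owner, owning_group, user, user_groups, wanted):
    """Return True if `user` gets every permission character in `wanted`."""

    def grants(perms, masked):
        effective = set(perms) - {"-"}
        if masked:
            # Named users and all group entries are limited by the mask.
            effective &= set(acl["mask"])
        return set(wanted) <= effective

    if user == owner:
        return grants(acl["user"], masked=False)
    if user in acl["named_users"]:
        return grants(acl["named_users"][user], masked=True)
    group_entries = []
    if owning_group in user_groups:
        group_entries.append(acl["group"])
    group_entries += [p for g, p in acl["named_groups"].items() if g in user_groups]
    if group_entries:
        # Any matching group entry that grants the request is sufficient.
        return any(grants(p, masked=True) for p in group_entries)
    return grants(acl["other"], masked=False)


# The /sales-data ACL from the getfacl output above.
sales_data_acl = {
    "user": "rw-",
    "group": "r--",
    "named_users": {},
    "named_groups": {"execs": "r--"},
    "mask": "r--",
    "other": "---",
}
```

For example, a member of execs can read the file through the named group entry, while a user who is neither in sales nor in execs falls through to `other` and is refused.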
HDFS ACLs
Default ACLs
– hdfs dfs -setfacl -m default:group:execs:r-x /monthly-sales-data
– hdfs dfs -mkdir /monthly-sales-data/JAN
– hdfs dfs -getfacl /monthly-sales-data/JAN
# file: /monthly-sales-data/JAN
# owner: maya
# group: sales
user::rwx
group::r-x
group:execs:r-x
mask::r-x
other::---
default:user::rwx
default:group::r-x
default:group:execs:r-x
default:mask::r-x
default:other::---
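The inheritance shown in this output can be sketched in a few lines: a new subdirectory takes the parent's default ACL as its own access ACL and also keeps a copy as its default ACL, so the rule propagates further down the tree. This is a deliberately simplified model with a hypothetical function name. It leaves out the filtering by the mode requested at creation time, and it covers directories only (a new file would receive the access entries but no default entries).

```python
def acl_for_new_directory(parent_default_acl):
    """Derive the ACL of a new subdirectory from the parent's default ACL.

    Simplified: the creation mode/umask filtering is not modelled.
    """
    access = list(parent_default_acl)
    default = [f"default:{entry}" for entry in parent_default_acl]
    return access + default


# Default ACL of /monthly-sales-data, written without the "default:" prefix.
parent_default = [
    "user::rwx",
    "group::r-x",
    "group:execs:r-x",
    "mask::r-x",
    "other::---",
]

jan_acl = acl_for_new_directory(parent_default)
```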
HDFS ACLs Best Practices
• Start with traditional HDFS permissions to implement most permission requirements.
• Define a smaller number of ACLs to handle exceptional cases.
• A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that
has only traditional permissions.
Tables and Hive
Table ACLs – The Challenge and Solution
• Hive and Pig have traditionally offered full table access control via HDFS access control
• The challenge in column-level access control
– Hive and Pig queries are executed as Tez-based tasks that access the HDFS files directly
– HDFS does not have knowledge of columns (there are several file/table formats)
• Solution for Column level ACLs
– Let Hive server check and submit the query execution
– Let the table be accessible only by special user (“HiveServer”)
– But one has to restrict the UDFs and file formats
– Good news: Hive provides an authorization plugin to do this cleanly
• Use standard SQL permission constructs
– GRANT/REVOKE
• Store the ACLs in Hive Metastore instead of some external DB
• But what about Pig, there is no Pig server …
Hive ATZ-NG – Architecture
• ATZ-NG is called for O/JDBC & Beeline CLI
• Standard SQL GRANT / REVOKE for management
• Privilege to register UDF restricted to Admin user
• Policy integrated with Table/View life cycle
[Architecture diagram. Components: HDFS, Metastore, HiveServer2, O/JDBC, Beeline CLI, Hive CLI, Oozie, Hue, Pig, HCat, Ambari, UDFs, Storage Based Authorization Provider, ATZ-NG. Steps: 0. Enable Hive ATZ-NG; 1. Authentication; 2. Authorization. Annotations: Protected – column level; Protected – table level; Restrict direct access to Metastore; Protect HDFS with Kerberos & HDFS ACL.]
What about MR/Pig
• Note: there is no Pig/MR server to submit queries and check column ACLs
• Hence, in the same cluster running Hive
–You cannot give Pig similar column level access control
–If Pig/MR is important,
–Use coarse grained table level access control
–Or run Pig/MR as privileged users with full table level access
Hive ATZ-NG Example
Scenario
• Objective: Share Product Management Roadmap securely
• Actors:
–Admin Role – Specified in hive-site
– Admin role controls role memberships
–Product Management Role
– Should be able to create and read all roadmap details
– Members: Vinay Shukla, Tim Hall
–Engineering Role
– Should be able to read (see) all roadmap details
– Members: Kevin Minder, Larry McCay
Step 1: Admin role Creates Roles, Adds Users
1. CREATE ROLE PM;
2. CREATE ROLE ENG;
3. GRANT ROLE PM to user timhall with admin option;
4. GRANT ROLE PM to user vinayshukla;
5. GRANT ROLE ENG to user kevinminder with admin option;
6. GRANT ROLE ENG to user larrymccay;
Step 2: Super-user Creates Tables/Views
create table hdp_hadoop_plans (
id int,
hadoop_roadmap string,
hdp_roadmap string
);
Step 3: Users or Roles Assigned To Tables
1. GRANT ALL ON hdp_hadoop_plans TO ROLE PM;
2. GRANT SELECT ON hdp_hadoop_plans TO ROLE ENG;
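The effect of these three steps can be modelled as a simple role lookup. The sketch below is only a mental model of the resulting policy, not how Hive stores or evaluates grants; the data structures and the function name are hypothetical, while the roles, users and table come from the example.

```python
# Role memberships created in Step 1.
ROLE_MEMBERS = {
    "PM": {"timhall", "vinayshukla"},
    "ENG": {"kevinminder", "larrymccay"},
}

# Table privileges granted in Step 3.
TABLE_GRANTS = {
    ("hdp_hadoop_plans", "PM"): {"ALL"},
    ("hdp_hadoop_plans", "ENG"): {"SELECT"},
}


def is_allowed(user, table, privilege):
    """True if any role the user belongs to holds `privilege` (or ALL) on `table`."""
    for role, members in ROLE_MEMBERS.items():
        if user in members:
            granted = TABLE_GRANTS.get((table, role), set())
            if "ALL" in granted or privilege in granted:
                return True
    return False
```

Under this model an ENG member can SELECT from hdp_hadoop_plans but cannot INSERT, while a PM member can do both.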
HBase Cell Level Authorization
• The HBase permissions model already supports ACLs defined at the
namespace, table, column family and column level.
–This is sufficient to meet many requirements
–This can be insufficient if a data model requires protection on individual rows/cells.
–Example: Medical data, each row representing a patient, may require customizing who
can see an individual patient’s data, and the social security number of each row may
need further restriction.
HBase Cell Level Authorization
• Cell level authorization augments the permissions model by allowing
ACLs to be specified on individual cells.
–ACLs are now supported at the individual cell level.
–Individual operations may choose order of evaluation. Cell level ACLs may be evaluated
last or first.
–Evaluating last is useful if the common case is access granted through table or column
family ACLs, and cell level ACLs define exceptions for denial.
–Evaluating first is useful if many users are granted access through cell level ACLs.
HBase Cell Level Authorization
• Visibility labels
–Visibility expressions can be stored as metadata in a cell’s tag.
–A visibility expression consists of labels combined with boolean operators.
–E.g. (financial | strategy | research) & !newhire
–This means that a user must be labeled financial or strategy or research, and must not be labeled
newhire, in order to see the cell.
–The mapping of users to their labels is pluggable. By default, a user’s labels are specified
as authorizations in the individual operation.
–HBase visibility labels were inspired by similar features in Apache Accumulo, and the
model will look very familiar to Accumulo users.
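Evaluating a visibility expression against a user's labels can be sketched with a small recursive-descent parser. This is an illustration of the boolean semantics only, not HBase's implementation. The function names are hypothetical, and the grammar assumed here gives `!` the highest precedence, then `&`, then `|`.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[()&|!])")


def tokenize(expression):
    """Split an expression into labels and the symbols ( ) & | !."""
    tokens, pos = [], 0
    expression = expression.strip()
    while pos < len(expression):
        match = TOKEN.match(expression, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, user_labels):
    """Return True if a user holding `user_labels` satisfies `expression`."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            right = parse_and()  # always parse, then combine
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        token = take()
        if token == "!":
            return not parse_not()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return token in user_labels

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected trailing tokens")
    return result
```

With the expression from the slide, a user labeled only `financial` can see the cell, while a user labeled both `financial` and `newhire` cannot.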
What can be done today?
Pillars: Authentication (Who am I/prove it? Control access to cluster.), Authorization (Restrict access to explicit data), Audit (Understand who did what), Data Protection (Encrypt data at rest & in motion)
Wire encryption
• In native Hadoop
• With Knox
• SSL for REST (2.x)
File encryption
• Via MR file format
• 3rd party encryption tools for column level encryption
• Native HDFS support coming
Wire Encryption – for data in motion
• Hadoop client to DataNode is via Data Transfer Protocol
– HDFS client reads/writes to HDFS service over encrypted channel
– Configurable encryption strength
• ODBC/JDBC client to HiveServer2
– Encryption is via SASL Quality of Protection
• Map to Reduce via shuffle
– Shuffle is over HTTP(S)
– Supports mutual authentication via SSL
– Host name verification enabled
• REST protocols
– SSL support
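As a rough orientation, the channels above map to configuration switches along the following lines. The property names are given as I understand them from Hadoop 2.x and Hive documentation and should be verified against the versions in use. The grouping by file and the helper function are hypothetical conveniences for rendering the usual `<configuration>` XML.

```python
from xml.sax.saxutils import escape

# Assumed property names; verify against your Hadoop and Hive versions.
WIRE_ENCRYPTION = {
    "hdfs-site.xml": {"dfs.encrypt.data.transfer": "true"},
    "hive-site.xml": {"hive.server2.thrift.sasl.qop": "auth-conf"},
    "mapred-site.xml": {"mapreduce.shuffle.ssl.enabled": "true"},
}


def to_site_xml(properties):
    """Render a {name: value} mapping as a Hadoop-style configuration file."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


hdfs_site = to_site_xml(WIRE_ENCRYPTION["hdfs-site.xml"])
```

Each switch belongs to a different component's configuration file, which is why wire encryption has to be enabled channel by channel rather than in one place.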
Data at Rest
• Coming: HDFS encrypted file system currently under development in Apache
– https://issues.apache.org/jira/browse/HADOOP-10150
– https://issues.apache.org/jira/browse/HDFS-6134
XA Secure
A major step forward in Hadoop security
See Shaun Connolly’s keynote on Wednesday, June 4
Security in Hadoop with HDP + XA Secure
HDP 2.1
• Authentication (Who am I/prove it?)
– Kerberos in native Apache Hadoop
– HTTP/REST API secured with Apache Knox Gateway
• Authorization (Restrict access to explicit data)
– MapReduce Access Control Lists
– HDFS Permissions, HDFS ACL
– Hive ATZ-NG
– Cell level access control in Apache Accumulo
• Audit (Understand who did what)
– Audit logs in HDFS & MR
• Data Protection (Encrypt data at rest & in motion)
– Wire encryption in Hadoop
– Orchestrated encryption with 3rd party tools
XASecure (Centralized Security Administration)
• Authentication
– As-is, works with current authentication methods
• Authorization
– HDFS, Hive and HBase
– Fine grain access control
– RBAC
• Audit
– Centralized audit reporting
– Policy and access history
• Data Protection
– Future roadmap
– Strategy to be finalized
Open Source?
• Yes, XASecure technology will be open sourced
– Not just an Apache license where you are forced to get the latest from Hortonworks
– But a full-fledged Apache project that is truly open to the community of developers and users
• See Shaun Connolly’s keynote on Wednesday for details
Summary
• Very strong authentication via Kerberos and Active Directory
– Uses your organization's user DB and integrates with its group and role membership
– Supplemented by Hadoop tokens
– Note: these are necessary due to delayed job execution after the user logs off
• Strong fine-grained authorization with some recent improvements
– HDFS ACLs
– Hive – integrated via SQL model and Hive Metastore
– Not a hacked side add-on
– HBase Cell Level Authorization
• Strong encryption support
– Wire
– Data at rest
– Some improvements coming soon
• Every product has audit logs
• XASecure adds a major step forward
– Yes, it will be open sourced as an Apache project
Thank you, Q&A
Learn more
Resource: Location
Hortonworks Security Labs: http://hortonworks.com/labs/security/
Apache Knox Project Page: http://knox.incubator.apache.org/
HDFS ACLs Blog Post: http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
Encrypted File System Development: https://issues.apache.org/jira/browse/HADOOP-10150 and https://issues.apache.org/jira/browse/HDFS-6134
HBase Cell Level Authorization: https://blogs.apache.org/hbase/entry/hbase_cell_security

Weitere ähnliche Inhalte

Was ist angesagt?

Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Hortonworks
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Kevin Minder
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Rangertrihug
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revJason Shih
 
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...DataWorks Summit
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: OverviewCloudera, Inc.
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop EcosystemDataWorks Summit
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overviewTushar Dudhatra
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyDataWorks Summit
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetupnvvrajesh
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 

Was ist angesagt? (20)

Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
Distilling Hadoop Patterns of Use and How You Can Use Them for Your Big Data ...
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Ranger
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117rev
 
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
Extending Apache Ranger Authorization Beyond Hadoop: Review of Apache Ranger ...
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overview
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetup
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 

Andere mochten auch

Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXAbhishek Mallick
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data miningharithavijay94
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with HadoopCloudera, Inc.
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Peter Wood
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersDataWorks Summit
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authenticationleahculver
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Cours Big Data Chap1
Cours Big Data Chap1Cours Big Data Chap1
Cours Big Data Chap1Amal Abid
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystèmeKhanh Maudoux
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 

Andere mochten auch (19)

Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOX
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authentication
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Cours Big Data Chap1
Cours Big Data Chap1Cours Big Data Chap1
Cours Big Data Chap1
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hadoop et son écosystème
Hadoop et son écosystèmeHadoop et son écosystème
Hadoop et son écosystème
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
What is big data?
What is big data?What is big data?
What is big data?
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 

Hortonworks Improves Hadoop Security

  • 1. Improvements in Hadoop Security
    Sanjay Radia, sanjay@hortonworks.com, @srr
    Chris Nauroth, cnauroth@hortonworks.com, @cnauroth
    © Hortonworks Inc. 2011
  • 2. Hello
    Sanjay Radia
    • Founder, Hortonworks
    • Part of the Hadoop team at Yahoo! since 2007
      – Chief Architect of Hadoop Core at Yahoo!
      – Long-time Apache Hadoop PMC member and committer
      – Designed and developed several key Hadoop features
    • Prior
      – Data center automation, virtualization, Java, HA, OSs, file systems (startup, Sun Microsystems, …)
      – Ph.D., University of Waterloo
    Chris Nauroth
    • Member of Technical Staff, Hortonworks
      – Apache Hadoop committer
      – Major contributor to HDFS ACLs
    • Hadoop user since 2010
      – Prior employment experience deploying, maintaining and using Hadoop clusters
    Architecting the Future of Big Data
  • 3. Overview
    • Models of deployment
      – Secure and insecure
    • Hadoop authentication
      – The how and why
      – Knox – perimeter security
    • Authorization – existing and what is new
      – HDFS
      – Tables and Hive
      – HBase and Accumulo
    • Data protection and encryption
      – Wire
      – Data at rest
  • 4. Two Reasons for Security in Hadoop
    1. Hadoop contains sensitive data
      – As Hadoop adoption grows, so do the types of data organizations look to store. Often the data is proprietary or personal, and it must be protected.
      – In this context, Hadoop is governed by the same security requirements as any data center platform.
    2. Hadoop is subject to compliance requirements
      – Organizations are often required to comply with regulations such as HIPAA, PCI DSS, and FISMA that require protection of personal information.
      – Adherence to other corporate security policies.
  • 5. Three Models of Hadoop Deployment
    • Insecure cluster
      – You have protected it via the perimeter
      – You trust the code that runs in the system
      – Note: in a Hadoop cluster, user-submitted code runs inside the cluster (not true in typical client-server applications)
      – The client-side libraries pass the client's login credential
      – There is no end-to-end authentication here
      – Authorization is done against this credential
    • Secure cluster
      – Full authentication
      – Can run arbitrary code in jobs
    • Perimeter security using Knox
      – Internal cluster can be secure or insecure depending on your needs
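To make the insecure-cluster model concrete, the following is a minimal sketch of authorization without end-to-end authentication. It is a toy model, not Hadoop code: the ACL table, the paths, and the function name `authorize_read` are all hypothetical. It only illustrates that when the server trusts the credential the client library passes along, any client can claim any identity.

```python
# Toy model of an insecure cluster: the server never verifies identity,
# it simply trusts the user name that the client-side library sends.

# Hypothetical ACL table: path -> set of users allowed to read it.
READ_ACL = {"/data/payroll": {"alice"}}


def authorize_read(claimed_user: str, path: str) -> bool:
    """Authorization runs against the claimed credential, with no proof of identity."""
    return claimed_user in READ_ACL.get(path, set())


# The legitimate user is allowed, as expected.
alice_ok = authorize_read("alice", "/data/payroll")

# Another user is refused when reporting their own name honestly...
mallory_honest = authorize_read("mallory", "/data/payroll")

# ...but nothing stops the same client from claiming to be alice.
mallory_spoofing = authorize_read("alice", "/data/payroll")
```

This is why the insecure model depends on perimeter protection and on trusting all code that runs in the system.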
  • 6. Pillars of Hadoop Security
    • Authentication: Who am I / prove it? Control access to cluster.
      – AD/Kerberos in native Apache Hadoop
      – Perimeter security with Apache Knox Gateway
    • Authorization: Restrict access to explicit data
    • Audit: Understand who did what
      – Every service has audit logs
      – Knox and XASecure provide central audit logs
    • Data Protection: Encrypt data at rest & in motion
  • 7. Hadoop Authentication Overview
    • Kerberos/Active Directory based security
      – SSO – users do not have to log in again to Hadoop
      – Hadoop accounts do not have to be created
      – Caveat – MR task isolation currently requires a Unix account for each user, but this is going away with Linux containers
    • Hadoop tokens – supplement the Kerberos authentication
      – Delegation tokens – deal with the delayed job execution
      – Block tokens – capabilities to deal with the distributed nature of HDFS
    • Trusted proxies – support for third-party services to act as proxy
      – Oozie
      – Gateways – HDFS proxy, Knox, etc.
    • Knox – perimeter security and REST gateway
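The trusted-proxy idea can be sketched as a small rule check. Hadoop configures this through the `hadoop.proxyuser.<name>.hosts` and `hadoop.proxyuser.<name>.groups` properties; the Python below is only a simplified model of that kind of rule, and the host names, group memberships, and the function `may_impersonate` are hypothetical.

```python
# Simplified model of a trusted-proxy rule: a service such as Oozie may act on
# behalf of an end user only from approved hosts and only for users in approved groups.

# Hypothetical rules, loosely mirroring hadoop.proxyuser.<name>.hosts / .groups.
PROXYUSER_RULES = {
    "oozie": {"hosts": {"oozie-host.example.com"}, "groups": {"analysts"}},
}

# Hypothetical group membership of end users.
USER_GROUPS = {"alice": {"analysts"}, "bob": {"interns"}}


def may_impersonate(proxy: str, remote_host: str, target_user: str) -> bool:
    """Return True if `proxy`, connecting from `remote_host`, may act as `target_user`."""
    rule = PROXYUSER_RULES.get(proxy)
    if rule is None:
        return False  # not a configured proxy at all
    host_ok = "*" in rule["hosts"] or remote_host in rule["hosts"]
    user_groups = USER_GROUPS.get(target_user, set())
    group_ok = "*" in rule["groups"] or bool(rule["groups"] & user_groups)
    return host_ok and group_ok


allowed = may_impersonate("oozie", "oozie-host.example.com", "alice")
wrong_group = may_impersonate("oozie", "oozie-host.example.com", "bob")
wrong_host = may_impersonate("oozie", "laptop.example.com", "alice")
unknown_proxy = may_impersonate("hue", "oozie-host.example.com", "alice")
```

The proxy itself still authenticates with its own credentials; the rule only limits whom it may impersonate and from where.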
  • 8. Why the tokens
    • Why does Hadoop have its own tokens?
      – The standard client-server security model is not sufficient for Hadoop
      – It works when a logged-in client is directly accessing a Hadoop service
      – But for a job, the execution happens much later
      – The job submitter has long since logged off
    • Hence we needed to add delegation tokens
    • HDFS is a distributed service and needed capability-like tokens for datanode authentication
      – The permissions/ACLs are in the Namenode
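The delayed-execution problem can be illustrated with a small signed token that is issued while the user is logged in and verified later without the user being present. This is a sketch under stated assumptions, not Hadoop's token format: the JSON identifier, the field names, the key, and the choice of HMAC-SHA1 are illustrative.

```python
import hashlib
import hmac
import json

# Hypothetical secret held only by the issuing service.
MASTER_KEY = b"hypothetical-master-key"


def issue_token(owner: str, renewer: str, now: int, lifetime_s: int = 86400):
    """Issue a token at submit time, while the user is still authenticated."""
    ident = json.dumps(
        {"owner": owner, "renewer": renewer, "expires": now + lifetime_s},
        sort_keys=True,
    ).encode()
    # The "password" is a keyed hash of the identifier; only the service can compute it.
    password = hmac.new(MASTER_KEY, ident, hashlib.sha1).digest()
    return ident, password


def verify_token(ident: bytes, password: bytes, now: int):
    """Return the owner if the token is authentic and unexpired, else None."""
    expected = hmac.new(MASTER_KEY, ident, hashlib.sha1).digest()
    if not hmac.compare_digest(expected, password):
        return None  # identifier was tampered with or password is wrong
    fields = json.loads(ident)
    if now >= fields["expires"]:
        return None  # token lifetime has passed
    return fields["owner"]


submit_time = 1_000_000
ident, password = issue_token("alice", "yarn", submit_time)

# An hour later the job runs; the user has logged off but the token still verifies.
later_owner = verify_token(ident, password, submit_time + 3600)

# Changing the identifier invalidates the password.
forged = verify_token(ident.replace(b"alice", b"admin"), password, submit_time + 3600)

# After the lifetime has elapsed the token is rejected.
expired = verify_token(ident, password, submit_time + 2 * 86400)
```

A capability-like token for datanode access can follow the same signed shape, with the signing key shared between the Namenode and the datanodes so that a datanode can check access without holding the permissions itself.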
  • 9. Apache Knox Perimeter Security with Hadoop REST APIs
  • 10. The Gateway or Edge Node
      • Hadoop APIs can be used from any desktop after SSO login
          – FileSystem and MapReduce Java APIs
          – Pig, Hive and Oozie clients (that wrap the Java APIs)
      • However, it is typical to use an “Edge Node” or “Gateway Node” that is “inside” the cluster
          – The libraries for the APIs are generally only installed on the gateway
          – Users SSH to the Edge Node and execute API commands from the shell
      [Diagram: User, SSH, Edge Node, Hadoop]
  • 11. Perimeter Security with Apache Knox
      Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security.
      • Single, simple point of access for a cluster
          – Single Hadoop access point
          – REST API hierarchy
          – Consolidated API calls
          – Multi-cluster support
          – Eliminates SSH “edge node”
      • Central controls ensure consistency across one or more clusters
          – Central API management
          – Central audit control
          – Simple service-level authorization
      • Integrated with existing systems to simplify identity maintenance
          – SSO integration: Siteminder, API Key*, OAuth* & SAML*
          – LDAP & AD integration
  • 12. Hadoop REST APIs
      • Useful for connecting to Hadoop from outside the cluster
      • When more client language flexibility is required
          – i.e. a Java binding is not an option
      • Challenges (Knox addresses these challenges)
          – Client must have knowledge of cluster topology
          – Ports must be opened to clients outside the cluster (in some cases, on every host)
      Service: API
          – WebHDFS: Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
          – WebHCat: Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
          – Hive: Hive REST API operations
          – HBase: HBase REST API operations
          – Oozie: Job submission and management, and Oozie administration.
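To make the topology challenge concrete, here is a small sketch that builds the URL for the same WebHDFS operation twice: once directly against a NameNode and once through a Knox gateway. The host names and the topology name `default` are placeholders, and the ports shown (50070 for the NameNode web port in Hadoop 2.x, 8443 for Knox) are common defaults that a real deployment may change. No request is sent; the point is only which addresses the client has to know.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode_host: str, port: int, path: str, op: str, **params: str) -> str:
    """Direct WebHDFS URL: the client must know the NameNode address."""
    query = urlencode({"op": op, **params})
    return f"http://{namenode_host}:{port}/webhdfs/v1{quote(path)}?{query}"


def knox_webhdfs_url(gateway_host: str, topology: str, path: str, op: str,
                     port: int = 8443, gateway_path: str = "gateway",
                     **params: str) -> str:
    """Same operation through Knox: the client only knows the gateway address."""
    query = urlencode({"op": op, **params})
    return (f"https://{gateway_host}:{port}/{gateway_path}/{topology}"
            f"/webhdfs/v1{quote(path)}?{query}")


if __name__ == "__main__":
    # Placeholder hosts, used for illustration only.
    print(webhdfs_url("namenode.example.com", 50070, "/sales-data", "LISTSTATUS"))
    print(knox_webhdfs_url("knox.example.com", "default", "/sales-data", "LISTSTATUS"))
```

With direct access, a read can also be redirected to an individual DataNode, which is why ports may have to be opened on every host; behind the gateway the client only ever talks to one address.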
  • 13. What can be done today?
      Pillars: Authentication (Who am I/prove it? Control access to cluster.); Authorization (Restrict access to explicit data); Audit (Understand who did what); Data Protection (Encrypt data at rest & in motion)
      • Previously
          – All services: service-level ACLs
          – HDFS: permissions
          – YARN: queue ACLs
          – Hive/Pig tables: table level via HDFS
          – Apache Accumulo: cell level
          – HBase: namespace, table, column family and column level ACLs
      • Hadoop 2.x
          – HDFS: ACLs
          – Hive: column-level ACLs
          – HBase: cell-level ACLs
          – Knox: REST service-level authorization; access audit with Knox
  • 14. HDFS ACLs
      • The existing HDFS POSIX permissions are good, but not flexible enough
          – Permission requirements may differ from the natural organizational hierarchy of users and groups.
      • HDFS ACLs augment the existing HDFS POSIX permissions model by implementing the POSIX ACL model.
          – An ACL (Access Control List) provides a way to set different permissions for specific named users or named groups, not only the file’s owner and the file’s group.
  • 15. HDFS File Permissions Example
      • Authorization requirements:
          – In a sales department, they would like a single user, Maya (Department Manager), to control all modifications to sales data
          – Other members of the sales department need to view the data, but can’t modify it.
          – Everyone else in the company must not be allowed to view the data.
      • Can be implemented via the following permissions on the file with sales data:
          – User: read/write permission for user maya
          – Group: read permission for group sales
  • 16. HDFS ACLs
      • Problem
          – No longer feasible for Maya to control all modifications to the file
          – New requirement: Maya, Diane and Clark are allowed to make modifications
          – New requirement: a new group called executives should be able to read the sales data
          – The current permissions model only allows permissions for one group and one user
      • Solution: HDFS ACLs
          – Now assign different permissions to different users and groups
      [Diagram: an HDFS directory with rwx entries for Owner, Group and Others, plus additional rwx entries for Group D, Group F and User Y]
  • 17. HDFS ACLs: New Tools for ACL Management (setfacl, getfacl)
      – hdfs dfs -setfacl -m group:execs:r-- /sales-data
      – hdfs dfs -getfacl /sales-data
            # file: /sales-data
            # owner: maya
            # group: sales
            user::rw-
            group::r--
            group:execs:r--
            mask::r--
            other::---
      – How do you know if a directory has ACLs set? The trailing “+” in the permission string of the listing indicates an ACL.
      – hdfs dfs -ls /sales-data
            Found 1 items
            -rw-r-----+ 3 maya sales 0 2014-03-04 16:31 /sales-data
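How an ACL such as the one printed by getfacl is evaluated can be sketched as follows. This is a simplified model of the POSIX ACL check order (owner, then named users, then groups, then other), with the mask applied to named-user and group entries; the `Acl` class and `is_allowed` function are illustrative names, not part of any Hadoop API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


def perms(spec: str) -> Set[str]:
    """Turn a string such as 'r-x' into the set {'r', 'x'}."""
    return {c for c in spec if c != "-"}


@dataclass
class Acl:
    owner: str
    group: str
    owner_perms: str
    group_perms: str
    other_perms: str
    named_users: Dict[str, str] = field(default_factory=dict)
    named_groups: Dict[str, str] = field(default_factory=dict)
    mask: str = "rwx"


def is_allowed(acl: Acl, user: str, user_groups: Set[str], access: str) -> bool:
    """Return True if `user` may perform `access` (e.g. 'r' or 'rw')."""
    want = set(access)
    mask = perms(acl.mask)
    # 1. The owner entry applies to the owner and is not filtered by the mask.
    if user == acl.owner:
        return want <= perms(acl.owner_perms)
    # 2. A named-user entry applies next, filtered by the mask.
    if user in acl.named_users:
        return want <= (perms(acl.named_users[user]) & mask)
    # 3. Owning-group and named-group entries: any single matching entry may grant.
    group_entries = [(acl.group, acl.group_perms)] + list(acl.named_groups.items())
    matching = [perms(p) & mask for g, p in group_entries if g in user_groups]
    if matching:
        return any(want <= granted for granted in matching)
    # 4. Everyone else falls through to the 'other' entry.
    return want <= perms(acl.other_perms)


# The /sales-data ACL from the slide.
sales_data = Acl(owner="maya", group="sales", owner_perms="rw-", group_perms="r--",
                 other_perms="---", named_groups={"execs": "r--"}, mask="r--")
```

With this model, `is_allowed(sales_data, "diane", {"sales"}, "r")` is granted through the owning group, a member of `execs` is granted read through the named-group entry, and a user in neither group falls through to `other` and is denied.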
  • 18. HDFS ACLs: Default ACLs
      – hdfs dfs -setfacl -m default:group:execs:r-x /monthly-sales-data
      – hdfs dfs -mkdir /monthly-sales-data/JAN
      – hdfs dfs -getfacl /monthly-sales-data/JAN
            # file: /monthly-sales-data/JAN
            # owner: maya
            # group: sales
            user::rwx
            group::r-x
            group:execs:r-x
            mask::r-x
            other::---
            default:user::rwx
            default:group::r-x
            default:group:execs:r-x
            default:mask::r-x
            default:other::---
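The inheritance shown in the getfacl output can be modeled roughly as below. This is a deliberately simplified sketch: the `inherit` function and the dictionary layout are hypothetical, and it ignores the filtering by the mode requested at creation time that a real file system also applies. It captures only the basic rule that a new child takes the parent's default ACL as its access ACL, and that a new directory additionally keeps it as its own default ACL.

```python
from typing import Dict

AclEntries = Dict[str, str]


def inherit(parent_default: AclEntries, is_dir: bool) -> Dict[str, AclEntries]:
    """Derive the ACLs of a newly created child from the parent's default ACL."""
    child = {"access": dict(parent_default)}
    if is_dir:
        # Sub-directories pass the default ACL on to their own children.
        child["default"] = dict(parent_default)
    return child


# Default ACL of /monthly-sales-data, as listed on the slide.
monthly_default = {
    "user::": "rwx",
    "group::": "r-x",
    "group:execs": "r-x",
    "mask::": "r-x",
    "other::": "---",
}

jan = inherit(monthly_default, is_dir=True)
```

Here `jan["access"]` and `jan["default"]` both carry the `group:execs` entry, matching the two halves of the getfacl listing for /monthly-sales-data/JAN.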
  • 19. HDFS ACLs Best Practices
      • Start with traditional HDFS permissions to implement most permission requirements.
      • Define a smaller number of ACLs to handle exceptional cases.
      • A file with an ACL incurs an additional cost in memory in the NameNode compared to a file that has only traditional permissions.
  • 20. Tables and Hive
  • 21. Table ACLs – The Challenge and Solution
      • Hive and Pig have traditionally offered full-table access control via HDFS access control
      • The challenge in column-level access control
          – Hive and Pig queries are executed as Tez-based tasks that access the HDFS files directly
          – HDFS does not have knowledge of columns (there are several file/table formats)
      • Solution for column-level ACLs
          – Let the Hive server check and submit the query execution
          – Let the table be accessible only by a special user (“HiveServer”)
          – But one has to restrict the UDFs and file formats
          – Good news: Hive provides an authorization plugin to do this cleanly
      • Use standard SQL permission constructs
          – GRANT/REVOKE
      • Store the ACLs in the Hive Metastore instead of some external DB
      • But what about Pig? There is no Pig server …
  • 22. Hive ATZ-NG – Architecture
      • ATZ-NG is called for O/JDBC & Beeline CLI
      • Standard SQL GRANT / REVOKE for management
      • Privilege to register UDF restricted to Admin user
      • Policy integrated with Table/View life cycle
      [Diagram labels: HDFS, Metastore, HiveServer2, ATZ-NG, Storage Based Authorization Provider; clients: O/JDBC, Beeline CLI, Hive CLI, Oozie, Hue, Pig, HCat, Ambari; steps: 0. Enable Hive ATZ-NG, 1. Authentication, 2. Authorization; annotations: UDFs, Protected – column level, Protected – table level, Restrict direct access to Metastore, Protect HDFS with Kerberos & HDFS ACL]
  • 23. What about MR/Pig
      • Note there is no Pig/MR server to submit and check column ACLs
      • Hence in the same cluster running Hive
          – You cannot give Pig similar column-level access control
          – If Pig/MR is important, either use coarse-grained table-level access control, or run Pig/MR as privileged users with full table-level access
  • 24. Hive ATZ-NG Example
  • 25. Scenario
      • Objective: Share the Product Management roadmap securely
      • Actors:
          – Admin role: specified in hive-site; the Admin role controls role memberships
          – Product Management role: should be able to create and read all roadmap details. Members: Vinay Shukla, Tim Hall
          – Engineering role: should be able to read (see) all roadmap details. Members: Kevin Minder, Larry McCay
  • 26. Step 1: Admin role Creates Roles, Adds Users
      1. CREATE ROLE PM;
      2. CREATE ROLE ENG;
      3. GRANT ROLE PM to user timhall with admin option;
      4. GRANT ROLE PM to user vinayshukla;
      5. GRANT ROLE ENG to user kevinminder with admin option;
      6. GRANT ROLE ENG to user larrymccay;
  • 27. Step 2: Super-user Creates Tables/Views
      create table hdp_hadoop_plans (
        id int,
        hadoop_roadmap string,
        hdp_roadmap string
      );
  • 28. Step 3: Users or Roles Assigned To Tables
      1. GRANT ALL ON hdp_hadoop_plans TO ROLE PM;
      2. GRANT SELECT ON hdp_hadoop_plans TO ROLE ENG;
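The effect of the three steps can be modeled with a few lines of code. This is only a sketch of the role-based check that results from the GRANT statements above, not how Hive implements it; the `ROLE_MEMBERS` and `TABLE_GRANTS` structures and the `is_authorized` function are hypothetical names introduced for illustration.

```python
from typing import Dict, Set, Tuple

# Step 1: role memberships created by the admin role.
ROLE_MEMBERS: Dict[str, Set[str]] = {
    "PM": {"timhall", "vinayshukla"},
    "ENG": {"kevinminder", "larrymccay"},
}

# Step 3: privileges granted to roles on the table from step 2.
TABLE_GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("hdp_hadoop_plans", "PM"): {"ALL"},
    ("hdp_hadoop_plans", "ENG"): {"SELECT"},
}


def is_authorized(user: str, privilege: str, table: str) -> bool:
    """Check whether any of the user's roles carries the privilege on the table."""
    for role, members in ROLE_MEMBERS.items():
        if user not in members:
            continue
        granted = TABLE_GRANTS.get((table, role), set())
        if "ALL" in granted or privilege in granted:
            return True
    return False
```

In this model a PM member such as `timhall` may insert into `hdp_hadoop_plans`, an ENG member such as `kevinminder` may only select from it, and a user in neither role is denied.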
  • 29. HBase Cell Level Authorization
      • The HBase permissions model already supports ACLs defined at the namespace, table, column family and column level.
          – This is sufficient to meet many requirements
          – This can be insufficient if a data model requires protection on individual rows/cells.
          – Example: Medical data, each row representing a patient, may require customizing who can see an individual patient’s data, and the social security number of each row may need further restriction.
  • 30. HBase Cell Level Authorization
      • Cell level authorization augments the permissions model by allowing ACLs specified on individual cells.
          – ACLs are now supported at the individual cell level.
          – Individual operations may choose order of evaluation. Cell level ACLs may be evaluated last or first.
          – Evaluating last is useful if the common case is access granted through table or column family ACLs, and cell level ACLs define exceptions for denial.
          – Evaluating first is useful if many users are granted access through cell level ACLs.
  • 31. HBase Cell Level Authorization
      • Visibility labels
          – Visibility expressions can be stored as metadata in a cell’s tag.
          – A visibility expression consists of labels combined with boolean operators.
          – E.g. (financial | strategy | research) & !newhire
          – This means that a user must be labeled financial or strategy or research and not be a newhire in order to see the cell.
          – The mapping of users to their labels is pluggable. By default, a user’s labels are specified as authorizations in the individual operation.
          – HBase visibility labels were inspired by similar features in Apache Accumulo, and the model will look very familiar to Accumulo users.
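The semantics of a visibility expression can be illustrated with a small evaluator. This is a sketch for explanation only and is not HBase's parser: the `tokenize` and `is_visible` functions are hypothetical, labels are assumed to be simple identifiers, and the conventional precedence (`!` binds tighter than `&`, which binds tighter than `|`) is an assumption.

```python
import re
from typing import List, Set

_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[()&|!])")
_OPERATORS = {"(", ")", "&", "|", "!"}


def tokenize(expr: str) -> List[str]:
    """Split an expression into labels, operators and parentheses."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr: str, labels: Set[str]) -> bool:
    """Evaluate a visibility expression against the labels a user holds."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in _OPERATORS:
            raise ValueError(f"unexpected token {token!r}")
        return token in labels

    result = parse_or()
    if peek() is not None:
        raise ValueError("unexpected trailing tokens")
    return result
```

For the expression on the slide, a user holding only `financial` passes, while a user holding `research` and `newhire` does not.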
  • 32. What can be done today?
      Pillars: Authentication (Who am I/prove it? Control access to cluster.); Authorization (Restrict access to explicit data); Audit (Understand who did what); Data Protection (Encrypt data at rest & in motion)
      • Wire encryption
          – In native Hadoop
          – With Knox
          – SSL for REST (2.x)
      • File encryption
          – Via MR file format
          – 3rd-party encryption tools for column-level encryption
          – Native HDFS support coming
  • 33. Wire Encryption – for data in motion
      • Hadoop client to DataNode is via Data Transfer Protocol
          – HDFS client reads/writes to HDFS service over encrypted channel
          – Configurable encryption strength
      • ODBC/JDBC client to HiveServer2
          – Encryption is via SASL Quality of Protection
      • Map to Reduce via shuffle
          – Shuffle is over HTTP(S)
          – Supports mutual authentication via SSL
          – Host name verification enabled
      • REST protocols
          – SSL support
  • 34. Data at Rest
      • Coming: HDFS encrypted file system currently under development in Apache
          – https://issues.apache.org/jira/browse/HADOOP-10150
          – https://issues.apache.org/jira/browse/HDFS-6134
  • 35. XA Secure
      A major step forward in Hadoop security
      See Shaun Connolly’s keynote on Wednesday, June 4
  • 36. Security in Hadoop with HDP + XA Secure
      • Authentication: Who am I/prove it?
          – HDP 2.1: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox Gateway
          – XASecure: as-is, works with current authentication methods
      • Authorization: Restrict access to explicit data
          – HDP 2.1: MapReduce Access Control Lists; HDFS permissions, HDFS ACLs; Hive ATZ-NG; cell-level access control in Apache Accumulo
          – XASecure: HDFS, Hive and HBase; fine-grained access control; RBAC
      • Audit: Understand who did what
          – HDP 2.1: audit logs in HDFS & MR
          – XASecure: centralized audit reporting; policy and access history
      • Data Protection: Encrypt data at rest & in motion
          – HDP 2.1: wire encryption in Hadoop; orchestrated encryption with 3rd-party tools
          – XASecure: future roadmap; strategy to be finalized
      • XASecure: Centralized Security Administration
  • 37. Open Source?
      • Yes, XASecure technology will be open sourced
          – Not just an Apache license where you are forced to get the latest from Hortonworks
          – But a full-fledged Apache project that is truly open to the community of developers and users
      • See Shaun Connolly’s keynote on Wednesday for details
  • 38. Summary
      • Very strong authentication via Kerberos and Active Directory
          – Uses your organization's user DB and integrates with its group and role membership
          – Supplemented by Hadoop tokens
          – Note these are necessary due to delayed job execution after the user logs off
      • Strong fine-grained authorization with some recent improvements
          – HDFS ACLs
          – Hive: integrated via the SQL model and the Hive Metastore; not a hacked side add-on
          – HBase cell-level authorization
      • Strong encryption support
          – Wire
          – Data: some improvements coming soon
      • Every product has audit logs
      • XASecure adds a major step forward
          – Yes, it will be open sourced as an Apache project
  • 39. Thank you, Q&A
      Resource: Location
          – Hortonworks Security Labs: http://hortonworks.com/labs/security/
          – Apache Knox Project Page: http://knox.incubator.apache.org/
          – HDFS ACLs Blog Post: http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/
          – Encrypted File System Development: https://issues.apache.org/jira/browse/HADOOP-10150 and https://issues.apache.org/jira/browse/HDFS-6134
          – HBase Cell Level Authorization: https://blogs.apache.org/hbase/entry/hbase_cell_security

Editor's notes

  1. Note: I would have used Tom Reilly instead of Mike Olson but nobody knows who Tom is.
  2. WebHDFS and WebHCat surely support SSL – check the others?