Securing Hadoop
Keys Botzum, MapR Technologies
kbotzum@maprtech.com
Jan 2014
©MapR Technologies - Confidential

Why Secure Hadoop


• Historically security wasn’t a high priority
  – Reflection of the type of data and the type of organizations using Hadoop
• Hadoop is now being used by more traditional firms as well as organizations with high security requirements
  – Highly regulated
  – Sensitive data sets
  – People with experience with security in existing enterprise technologies (e.g., databases) are asking for the same in Hadoop

Why Secure Hadoop


• Client operating system is trusted to identify user (weak authentication)
  – If I can compromise client, I can run jobs or access HDFS as anyone
  – Think about virtual machines with root access
• Hadoop servers trust anyone that can reach them on the network
  – Could I falsify a data node, job tracker, etc.?
• Hive Server runs as ‘system’ user
  – All Hive Server submitted jobs run as that ‘system’ user
• Intruders can see and modify all network traffic

Apache Hadoop Security


• Core goals
  – Authenticate network traffic
    • Users authenticate
    • Servers authenticate to each other
  – Encrypt network traffic
• Note: Hadoop also has a lot of authorization functionality which I’m not discussing here

Apache Hadoop Security


• Kerberos as core authentication technology
  – Kerberos to access HDFS, JT, Oozie, etc.
  – Kerberos for server to server traffic
• But Kerberos doesn’t fit perfectly with Hadoop model
  – Introduce delegation tokens for carrying identity in many scenarios
• Kerberos is complicated
  – Need Kerberos identity for every server in the cluster
    • Lots to manage!
  – Every user needs a Kerberos identity to access cluster, Web UIs, etc.
  – Lots of steps
    • http://www.cloudera.com/content/cloudera-content/clouderadocs/CDH4/4.3.0/CDH4-Security-Guide/cdh4sg_topic_3.html

Ecosystem Kerberos


• Ecosystem components also generally rely on Kerberos
  – HBase, Oozie, Hive Server 2, Hive Meta Server, Flume, etc.
  – Need to create appropriate Kerberos SPNEGO identities for many services (Web UI access)
  – Need to create service Kerberos identity for cluster access for many services, often for each node
• Lots to manage

Apache Hadoop Security – Additional Items


• Kerberos only part of the puzzle
• More steps – some examples
  – Configure Web UI HTTPS
  – Configure Encrypted Shuffle
  – Configure Hive Server 2
    • Authentication using LDAP or Kerberos
    • Impersonation
      – Authenticate to HS2 (userid/password or Kerberos)
      – HS2 executes job using secure impersonation on cluster
      – Now job runs as submitting user and can see/modify only what user can
    • Encryption
      – SSL can be used to protect userid & password authentication to HS2

MapR Distribution for Apache Hadoop


• Complete Hadoop distribution
• Comprehensive management suite
• Industry-standard interfaces
• Enterprise-grade dependability
• Higher performance
• Ease of Use

The Cloud Leaders Pick MapR

• Google chose MapR to provide Hadoop on Google Compute Engine
• Amazon EMR is the largest Hadoop provider in revenue and # of clusters

MapR Security


• Build on the work of the Apache community, but with improvements
• Goals
  – Authenticate network traffic
    • Users authenticate
    • Servers authenticate to each other
  – Encrypt network traffic
  – Low performance overhead
  – Simple and easy to administer

MapR Native Security


• Hadoop security without Kerberos
  – But borrow heavily from Kerberos design
• Kerberos integration if desired

Architecture


• Shared secrets like Kerberos
  – Managed at cluster level
• Identity represented using a ticket which is issued by MapR CLDB servers (Container Location DataBase)

Tickets


• A ticket represents a valid authenticated identity
• Contains
  – An expiration time, renewal lifetime, and creation time
  – A randomly generated secret key
  – Information about the identity – userid, group ids
• A client authenticates to servers using the ticket
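
The ticket contents listed on this slide can be pictured as a small record. The sketch below is only an illustration: the field names, the `issue_ticket` helper, and the reading of "renewal lifetime" as a window measured from creation time are assumptions, not the real MapR ticket format. The 32-byte key length is chosen to match the AES-256 key size named on the Cryptography slide.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ticket:
    """Illustrative stand-in for a ticket; not the real on-disk format."""
    userid: str
    group_ids: Tuple[int, ...]
    creation_time: float      # seconds since the epoch
    expiration_time: float    # ticket is rejected from this instant on
    renewal_lifetime: float   # assumed: seconds after creation during which renewal is allowed
    # 32 random bytes = 256 bits, drawn from the OS secure random source.
    secret_key: bytes = field(default_factory=lambda: secrets.token_bytes(32), repr=False)

    def is_expired(self, now: float) -> bool:
        return now >= self.expiration_time

    def is_renewable(self, now: float) -> bool:
        return now < self.creation_time + self.renewal_lifetime


def issue_ticket(userid: str, group_ids, lifetime_s: float, renew_s: float,
                 now: Optional[float] = None) -> Ticket:
    """Hypothetical issuer: stamps creation and expiration times on a new ticket."""
    now = time.time() if now is None else now
    return Ticket(userid, tuple(group_ids), now, now + lifetime_s, renew_s)
```

Keeping the secret key out of `repr` is a small precaution so that logging a ticket object does not leak the key.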

User Experience


• User invokes maprlogin
  – maprlogin connects to CLDB (over https)
    • Provide userid & password (or Kerberos ticket) for validation by CLDB
  – Ticket is returned, saved in a file in /tmp, and accessible only by the owning user – file name is /tmp/maprticket_<uid>
• MapR PAM module
  – Optional MapR provided PAM module creates MapR tickets automatically during Unix login
• All processes automatically pick up ticket (nothing to do)
  – Java and C/C++ clients implicitly look for valid ticket and use it
  – Clients optionally use existing Kerberos identity to get MapR ticket
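
How a client might locate its ticket file can be sketched as follows. The default path `/tmp/maprticket_<uid>` comes from this slide, and the `MAPR_TICKET_LOCATION` variable is mentioned in the appendix; the precedence between the two, and both function names, are assumptions made for illustration rather than documented client behavior.

```python
import os
import stat
from typing import Mapping, Optional


def ticket_path(uid: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Return where a client would look for the ticket file."""
    env = os.environ if env is None else env
    # Assumption: an explicit MAPR_TICKET_LOCATION wins over the default path.
    override = env.get("MAPR_TICKET_LOCATION")
    return override if override else f"/tmp/maprticket_{uid}"


def is_private_to_owner(path: str, uid: int) -> bool:
    """True if the file is owned by uid and grants nothing to group or others."""
    st = os.stat(path)
    group_other_bits = stat.S_IRWXG | stat.S_IRWXO
    return st.st_uid == uid and (st.st_mode & group_other_bits) == 0
```

For example, `ticket_path(1001, {})` yields `/tmp/maprticket_1001`, the file name shown in the maprlogin example later in the deck.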

Client First Contact


• Client sends the ticket and data encrypted using secret key
• Receiving server
  – Validates ticket, including expiration
  – Extracts identity information from ticket and uses that for authorization
  – Returns encrypted response to client
• Notice that MapR user identity is independent of host or operating system identity
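
The server-side checks on this slide can be sketched as a two-step function: reject an expired ticket, then authorize using only the identity carried in the ticket. Decryption and integrity checking are deliberately left out, and `TicketIdentity`, `TicketExpired`, `authorize`, and the allow-list model are hypothetical names chosen for this sketch, not MapR APIs.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TicketIdentity:
    """Hypothetical view of the fields a server reads from a decrypted ticket."""
    userid: str
    group_ids: FrozenSet[int]
    expiration_time: float  # seconds since the epoch


class TicketExpired(Exception):
    """Raised when a ticket is presented at or after its expiration time."""


def authorize(ticket: TicketIdentity, now: float,
              allowed_users: FrozenSet[str],
              allowed_groups: FrozenSet[int]) -> bool:
    # Step 1: validate the ticket, including expiration.
    if now >= ticket.expiration_time:
        raise TicketExpired(ticket.userid)
    # Step 2: authorization uses only the identity carried in the ticket,
    # never the caller's host name or operating-system account.
    return ticket.userid in allowed_users or bool(ticket.group_ids & allowed_groups)
```

Because nothing about the calling host enters the decision, the same ticket gives the same answer from any client machine, which is the independence the last bullet points out.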

Server First Contact


• When a trusted server starts it uses a local server ticket to authenticate to the CLDB
  – CLDB verifies the ticket’s authenticity using secret key
  – CLDB returns a server key that is used to create and validate user tickets
  – The server is now a trusted member of the cluster

Maprlogin


• Primary user visible security tool
• Actions are
  – password - authenticate to a MapR cluster using a valid password
  – kerberos - authenticate to a MapR cluster using Kerberos
  – print - print information on your existing credentials
  – authtest - test authentication as a generic client
  – end / logout - logout of cluster
  – renew - renew existing ticket
• For example:
  % maprlogin password
  [Password for user 'fred' at cluster 'my.cluster.com': ]
  MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'
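
A script that wraps maprlogin may want the ticket file name from the confirmation message. The sketch below parses only the wording shown in the sample output above; other releases may phrase the message differently, and `LoginResult` and `parse_login_output` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional


class LoginResult(NamedTuple):
    user: str
    cluster: str
    ticket_file: str


# Pattern built from the sample message on the slide; \s+ tolerates the
# line wrap that may fall before the quoted file name.
_PATTERN = re.compile(
    r"MapR credentials of user '(?P<user>[^']+)' for cluster '(?P<cluster>[^']+)' "
    r"are written to\s+'(?P<path>[^']+)'"
)


def parse_login_output(text: str) -> Optional[LoginResult]:
    """Return the user, cluster, and ticket file named in the message, or None."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return LoginResult(match["user"], match["cluster"], match["path"])
```

Returning `None` instead of raising keeps the caller in charge of how to treat unrecognized output.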

Maprlogin – Under the Covers
[Diagram: maprlogin, MapR CLDB, LDAP/Kerberos/NIS, and FileServer/CLDB, connected by the numbered steps below]
1. maprlogin sends username/passwd on https to the MapR CLDB
2. CLDB uses PAM to authenticate (LDAP/Kerberos/NIS)
3. ticket + user key returned
4. ticket + key saved in file in /tmp
5. cmd (hadoop fs -ls /) picks up ticket + key from file
6. client sends RPC encrypted with user-key + ticket to FileServer/CLDB
7. server decrypts ticket to authenticate user and checks permissions on ACL

Cryptography


• Encrypted using current NIST standards
  – AES-256 in GCM mode for encryption and signing
    • http://en.wikipedia.org/wiki/Galois/Counter_Mode
    • NIST standard - http://csrc.nist.gov/publications/fips/fips1402/fips1402annexa.pdf
  – Leverage Intel hardware encryption where available, software otherwise
• Use the open source crypto++ library for our C++ cryptography – http://cryptopp.com
• Random number generation
  – Use secure random number generation as documented here: http://www.cryptopp.com/docs/ref/class_auto_seeded_random_pool.html#_details

MapR Security – More by Default


• By default, out of the box
  – HS2 supports password authentication
    • Can configure Kerberos and SSL function, same as from Apache, including secure impersonation
  – Oozie supports MapR ticket authentication
    • Can configure Kerberos and SSL function, same as from Apache, including secure impersonation
  – MapR Tables (HBase APIs) use native MapR security, no configuration needed
  – Most Web UIs enhanced to support userid & password authentication and HTTPS
    • Can configure Kerberos SPNEGO, same as from Apache

Encrypted Shuffle (?)


• No need to special case encrypting shuffle
• MapR-FS is store for Map output
  – Shuffle inherits the same encryption, authentication, and authorization functionality of the rest of MapR-FS

Let’s Build a Secure Cluster!


• Node 1
  apt-get install mapr….
  configure.sh -C … -Z … -secure -genkeys
  – Generates all needed keys for MapR-RPC as well as for HTTPS
• Node N
  apt-get install mapr….
  scp rootORmapr@node1:/opt/mapr/conf/{cldb.key,maprserverticket,ssl_keystore,ssl_truststore} /opt/mapr/conf
  configure.sh -C … -Z … -secure
• Clients
  apt-get install mapr…
  scp anyuser@nodeN:/opt/mapr/conf/ssl_truststore /opt/mapr/conf
  configure.sh … -secure

MapR Advantage


• Vastly simpler
  – Core secured by default in one step
  – No requirement for Kerberos in core and associated complexity
• Easier integration
  – Leverage existing Linux authentication (PAM and NSSwitch)
• Faster
  – Leverage Intel AES hardware cryptography

Further Reading


• MapR
  – http://mapr.com
• MapR Native Security
  – http://www.mapr.com/press-release/mapr-technologies-integratessecurity-into-hadoop
  – http://www.mapr.com/products/only-with-mapr/mapr-integrates-securityinto-hadoop
• Adding Security to Apache Hadoop
  – http://hortonworks.com/wp-content/uploads/2011/10/securitydesign_withCover-1.pdf
• The Evolution of Hadoop’s Security Model
  – http://www.infoq.com/articles/HadoopSecurityModel/

Thank You

Appendix

Key Design Elements


• User authentication and authorization information obtained using standard operating system information – PAM and nsswitch
• MapR specific shared secret keys
  – Easier to manage
  – No dependencies on complex external security systems
  – Better performance
• MapR servers (running as ‘mapr’) have access to maprserverticket and are therefore privileged processes
• MapR-RPC altered to encrypt and authenticate traffic
• Maprsasl created for Apache Java code to leverage similar security
  – Leverages same keys, authentication model, etc.
  – Reuses the C/C++ code via JNI

Persistent Keys and Tickets
[Diagram: CLDB/ZK nodes 1 … N, each labeled K; Node 1, Node 2, … Node N]

Example: Job Tracker Integration

[Diagram: JobClient → submit job (maprsasl) → JobTracker → schedule job (maprsasl) → TaskTracker; File system]
1. JC copies job conf securely to FS
2. JT creates user ticket
3. TT fetches ticket
4. TT launches job using ticket identity

JT can create user tickets. TT copies ticket to private job directory on local disk. taskcontroller copies it to user private local disk dir and tasks set MAPR_TICKET_LOCATION to that place.
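
The last hop described above (ticket copied into a user-private directory, task pointed at it through `MAPR_TICKET_LOCATION`) might look roughly like this. It is a sketch under stated assumptions: the function name and the `maprticket` file name are invented for illustration, and the ownership change that a real setuid taskcontroller would perform is omitted.

```python
import os
import shutil
from typing import Dict, Mapping


def stage_ticket_for_task(job_dir_ticket: str, user_private_dir: str,
                          base_env: Mapping[str, str]) -> Dict[str, str]:
    """Copy the job's ticket into the user's private dir and build the task environment."""
    os.makedirs(user_private_dir, mode=0o700, exist_ok=True)
    target = os.path.join(user_private_dir, "maprticket")  # file name is hypothetical
    shutil.copyfile(job_dir_ticket, target)
    os.chmod(target, 0o600)  # readable and writable by the owner only
    env = dict(base_env)     # leave the caller's environment untouched
    env["MAPR_TICKET_LOCATION"] = target
    return env
```

The returned mapping could be passed as the `env` argument when the task process is launched, so that only that task sees the per-job ticket.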

Creating a Secure Cluster


• On the first node, run configure.sh … -genkeys, which creates some keys
  – CLDB key (cldb.key)
  – Ticket for nodes (maprserverticket)
  – SSL certificates (ssl_keystore & ssl_truststore)
• Additional nodes
  – Copy all to other CLDB and ZK nodes
  – Copy all but the CLDB key to remaining nodes
  – Run configure.sh
• On a client
  – Copy SSL truststore from any server node
  – Run configure.sh
• No requirement for Kerberos configuration
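
The copy rules on this slide reduce to a lookup from node role to the set of files that node needs. The file names are the ones listed on the slide; the role labels and the `files_to_copy` helper are made up for this sketch.

```python
from typing import Dict, FrozenSet

# File names as listed on the slide; all are generated on the first node.
ALL_FILES: FrozenSet[str] = frozenset(
    {"cldb.key", "maprserverticket", "ssl_keystore", "ssl_truststore"}
)

# Role names are labels invented for this sketch.
FILES_BY_ROLE: Dict[str, FrozenSet[str]] = {
    "cldb_or_zk": ALL_FILES,                   # copy all
    "other_server": ALL_FILES - {"cldb.key"},  # all but the CLDB key
    "client": frozenset({"ssl_truststore"}),   # truststore only
}


def files_to_copy(role: str) -> FrozenSet[str]:
    """Return the files a node of the given role should receive."""
    try:
        return FILES_BY_ROLE[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
```

Encoding the rule as data makes it easy to check that the CLDB key never leaves the CLDB and ZK nodes.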


Más contenido relacionado

Was ist angesagt?

The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022Kai Wähner
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache RangerDataWorks Summit
 
Scalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API ManagerScalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API ManagerWSO2
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
Alfresco REST API of the future ... is closer than you think
Alfresco REST API of the future ... is closer than you thinkAlfresco REST API of the future ... is closer than you think
Alfresco REST API of the future ... is closer than you thinkJ V
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaDataWorks Summit
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Docker and kubernetes_introduction
Docker and kubernetes_introductionDocker and kubernetes_introduction
Docker and kubernetes_introductionJason Hu
 
Connecting Kafka Across Multiple AWS VPCs
Connecting Kafka Across Multiple AWS VPCs Connecting Kafka Across Multiple AWS VPCs
Connecting Kafka Across Multiple AWS VPCs confluent
 
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019confluent
 
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...confluent
 
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)WSO2
 
Service mesh(istio) monitoring
Service mesh(istio) monitoringService mesh(istio) monitoring
Service mesh(istio) monitoringJeong-Ho Na
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaKai Wähner
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 

Was ist angesagt? (20)

The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Apache Kafka - Overview
Apache Kafka - OverviewApache Kafka - Overview
Apache Kafka - Overview
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache Ranger
 
Scalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API ManagerScalable deployment options in WSO2 API Manager
Scalable deployment options in WSO2 API Manager
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Alfresco REST API of the future ... is closer than you think
Alfresco REST API of the future ... is closer than you thinkAlfresco REST API of the future ... is closer than you think
Alfresco REST API of the future ... is closer than you think
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Row/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache SparkRow/Column- Level Security in SQL for Apache Spark
Row/Column- Level Security in SQL for Apache Spark
 
Docker and kubernetes_introduction
Docker and kubernetes_introductionDocker and kubernetes_introduction
Docker and kubernetes_introduction
 
ABD217_From Batch to Streaming
ABD217_From Batch to StreamingABD217_From Batch to Streaming
ABD217_From Batch to Streaming
 
Connecting Kafka Across Multiple AWS VPCs
Connecting Kafka Across Multiple AWS VPCs Connecting Kafka Across Multiple AWS VPCs
Connecting Kafka Across Multiple AWS VPCs
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
 
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
 
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
Workflows in WSO2 API Manager - WSO2 API Manager Community Call (12/15/2021)
 
Service mesh(istio) monitoring
Service mesh(istio) monitoringService mesh(istio) monitoring
Service mesh(istio) monitoring
 
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache KafkaIBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
IBM Cloud Pak for Integration with Confluent Platform powered by Apache Kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 

Andere mochten auch

Securing Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys BotzumSecuring Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys BotzumMapR Technologies
 
Securing Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys BotzumSecuring Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys BotzumMapR Technologies
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionMapR Technologies
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Technologies
 
Insight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationInsight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationMapR Technologies
 
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0MapR Technologies
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureMapR Technologies
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise ArchitectureMapR Technologies
 

Andere mochten auch (12)

Securing Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys BotzumSecuring Hadoop by Sr. Principal Technologist Keys Botzum
Securing Hadoop by Sr. Principal Technologist Keys Botzum
 
Securing Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys BotzumSecuring Hadoop by MapR's Senior Principal Technologist Keys Botzum
Securing Hadoop by MapR's Senior Principal Technologist Keys Botzum
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
 
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike FergusonMapR Enterprise Data Hub Webinar w/ Mike Ferguson
MapR Enterprise Data Hub Webinar w/ Mike Ferguson
 
MapR 5.2 Product Update
MapR 5.2 Product UpdateMapR 5.2 Product Update
MapR 5.2 Product Update
 
Insight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital TransformationInsight Platforms Accelerate Digital Transformation
Insight Platforms Accelerate Digital Transformation
 
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 
Zeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data ArchitectureZeta Architecture: The Next Generation Big Data Architecture
Zeta Architecture: The Next Generation Big Data Architecture
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
 

Ähnlich wie Securing Hadoop - MapR Technologies

Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)MapR Technologies
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Building Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPABuilding Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPALDAPCon
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_securityAdam Muise
 
Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan
Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan
Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan VMware Tanzu
 
Охота на уязвимости Hadoop
Охота на уязвимости HadoopОхота на уязвимости Hadoop
Охота на уязвимости HadoopPositive Hack Days
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
Walking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOS
Walking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOSWalking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOS
Walking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOSCody Thomas
 
Kerberos case study
Kerberos case studyKerberos case study
Kerberos case studyMayuri Patil
 
网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...
网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...
网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...Xiaohui Chen
 
Secrets management vault cncf meetup
Secrets management vault cncf meetupSecrets management vault cncf meetup
Secrets management vault cncf meetupJuraj Hantak
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsOrchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsCloud Native Day Tel Aviv
 
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax
 

Ähnlich wie Securing Hadoop - MapR Technologies (20)

Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)Map r hadoop-security-mar2014 (2)
Map r hadoop-security-mar2014 (2)
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheConTechnical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
 
Building Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPABuilding Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPA
 
2014 sept 4_hadoop_security
2014 sept 4_hadoop_security2014 sept 4_hadoop_security
2014 sept 4_hadoop_security
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan
Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan
Secure Credential Management with CredHub - DaShaun Carter & Sharath Sahadevan
 
Охота на уязвимости Hadoop
Охота на уязвимости HadoopОхота на уязвимости Hadoop
Охота на уязвимости Hadoop
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Walking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOS
Walking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOSWalking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOS
Walking the Bifrost: An Operator's Guide to Heimdal & Kerberos on macOS
 
Kerberos case study
Kerberos case studyKerberos case study
Kerberos case study
 
网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...
网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...
网易云K8S应用实践 | practices for kubernetes cluster provisioning, management and ap...
 
Secrets management vault cncf meetup
Secrets management vault cncf meetupSecrets management vault cncf meetup
Secrets management vault cncf meetup
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsOrchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
 
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
DataStax | DSE: Bring Your Own Spark (with Enterprise Security) (Artem Aliev)...
 

Securing Hadoop - MapR Technologies

  • 1. Securing Hadoop
      Keys Botzum, MapR Technologies
      kbotzum@maprtech.com
      Jan 2014
      ©MapR Technologies - Confidential
  • 2. Why Secure Hadoop
      – Historically, security wasn't a high priority
          • A reflection of the type of data and the type of organizations using Hadoop
      – Hadoop is now being used by more traditional firms as well as organizations with high security requirements
          • Highly regulated
          • Sensitive data sets
          • People with experience with security in existing enterprise technologies (e.g., databases) are asking for the same in Hadoop
  • 3. Why Secure Hadoop
      – Client operating system is trusted to identify the user (weak authentication)
          • If I can compromise the client, I can run jobs or access HDFS as anyone
          • Think about virtual machines with root access
      – Hadoop servers trust anyone that can reach them on the network
          • Could I falsify a data node, job tracker, etc.?
      – Hive Server runs as 'system' user
          • All Hive Server submitted jobs run as that 'system' user
      – Intruders can see and modify all network traffic
  • 4. Apache Hadoop Security
      – Core goals
          • Authenticate network traffic
              – Users authenticate
              – Servers authenticate to each other
          • Encrypt network traffic
      – Note: Hadoop also has a lot of authorization functionality which I'm not discussing here
  • 5. Apache Hadoop Security
      – Kerberos as core authentication technology
          • Kerberos to access HDFS, JT, Oozie, etc.
          • Kerberos for server to server traffic
      – But Kerberos doesn't fit perfectly with the Hadoop model
          • Introduce delegation tokens for carrying identity in many scenarios
      – Kerberos is complicated
          • Need a Kerberos identity for every server in the cluster
              – Lots to manage!
          • Every user needs a Kerberos identity to access the cluster, Web UIs, etc.
          • Lots of steps
              – http://www.cloudera.com/content/cloudera-content/clouderadocs/CDH4/4.3.0/CDH4-Security-Guide/cdh4sg_topic_3.html
  • 6. Ecosystem Kerberos
      – Ecosystem components also generally rely on Kerberos
          • Need to create appropriate Kerberos SPNEGO identities for many services (Web UI access)
          • Need to create a service Kerberos identity for cluster access for many services, often for each node
          • Lots to manage
      – HBase, Oozie, Hive Server 2, Hive Meta Server, Flume, etc.
  • 7. Apache Hadoop Security – Additional Items
      – Kerberos is only part of the puzzle
      – More steps, some examples
          • Configure Web UI HTTPS
          • Configure Encrypted Shuffle
          • Configure Hive Server 2
              – Authentication using LDAP or Kerberos
              – Impersonation
                  • Authenticate to HS2 (userid/password or Kerberos)
                  • HS2 executes job using secure impersonation on cluster
                  • Now job runs as submitting user and can see/modify only what user can
              – Encryption
                  • SSL can be used to protect userid & password authentication to HS2
  • 8. MapR Distribution for Apache Hadoop
      – Complete Hadoop distribution
      – Comprehensive management suite
      – Industry-standard interfaces
      – Enterprise-grade dependability
      – Higher performance
      – Ease of use
  • 9. The Cloud Leaders Pick MapR
      – Google chose MapR to provide Hadoop on Google Compute Engine
      – Amazon EMR is the largest Hadoop provider in revenue and # of clusters
  • 10. MapR Security
      – Build on the work of the Apache community, but with improvements
      – Goals
          • Authenticate network traffic
              – Users authenticate
              – Servers authenticate to each other
          • Encrypt network traffic
          • Low performance overhead
          • Simple and easy to administer
  • 11. MapR Native Security
      – Hadoop security without Kerberos
          • But borrow heavily from Kerberos design
      – Kerberos integration if desired
  • 12. Architecture
      – Shared secrets like Kerberos
          • Managed at cluster level
      – Identity represented using a ticket which is issued by MapR CLDB servers (Container Location DataBase)
  • 13. Tickets
      – A ticket represents a valid authenticated identity
      – Contains
          • An expiration time, renewal lifetime, and creation time
          • A randomly generated secret key
          • Information about the identity: userid, group ids
      – A client authenticates to servers using the ticket
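
The fields listed on the Tickets slide can be pictured as a small record. The following is only a sketch: the class name, field names, and the `issue_ticket` helper are hypothetical and do not reflect MapR's actual ticket format, which the slides do not describe.

```python
from dataclasses import dataclass
import secrets
import time


@dataclass(frozen=True)
class Ticket:
    """Hypothetical in-memory view of a ticket; field names are illustrative."""
    userid: int
    group_ids: tuple
    created_at: float    # creation time, seconds since the epoch
    expires_at: float    # expiration time
    renew_until: float   # end of the renewal lifetime
    secret_key: bytes    # randomly generated per-ticket key

    def is_valid(self, now: float) -> bool:
        # A ticket is usable only between creation and expiration.
        return self.created_at <= now < self.expires_at

    def can_renew(self, now: float) -> bool:
        # Renewal is allowed only inside the renewal lifetime.
        return now < self.renew_until


def issue_ticket(userid, group_ids, lifetime_s, renew_s, now=None):
    """Build a ticket with a fresh 256-bit random key (assumed key size)."""
    now = time.time() if now is None else now
    return Ticket(userid, tuple(group_ids), now, now + lifetime_s,
                  now + renew_s, secrets.token_bytes(32))
```

Passing `now` explicitly keeps the expiry logic easy to test without waiting on the clock.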
  • 14. User Experience
      – User invokes maprlogin
          • maprlogin connects to CLDB (over https)
              – Provide userid & password (or Kerberos ticket) for validation by CLDB
          • Ticket is returned and saved in a file in /tmp that is accessible only by the owning user; the file name is /tmp/maprticket_<uid>
      – MapR PAM module
          • Optional MapR-provided PAM module creates MapR tickets automatically during Unix login
      – All processes automatically pick up the ticket (nothing to do)
          • Java and C/C++ clients implicitly look for a valid ticket and use it
          • Clients optionally use an existing Kerberos identity to get a MapR ticket
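
As a rough illustration of how a client might find its ticket file and confirm that only the owner can read it, here is a minimal sketch. The path pattern `/tmp/maprticket_<uid>` comes from the slide. The function names are hypothetical, and treating `MAPR_TICKET_LOCATION` (mentioned on the Job Tracker slide) as an override of the default path is an assumption.

```python
import os
import stat


def ticket_path(uid: int) -> str:
    # File name pattern taken from the slide: /tmp/maprticket_<uid>
    return f"/tmp/maprticket_{uid}"


def locate_ticket(uid: int, env=None) -> str:
    """Assumed lookup order: MAPR_TICKET_LOCATION first, then the default path."""
    env = os.environ if env is None else env
    return env.get("MAPR_TICKET_LOCATION") or ticket_path(uid)


def is_private_to_owner(path: str, uid: int) -> bool:
    """True if `path` is owned by `uid` and has no group/other permission bits."""
    st = os.stat(path)
    no_group_other = (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return st.st_uid == uid and no_group_other
```

A client could call `is_private_to_owner(locate_ticket(os.getuid()), os.getuid())` before trusting the file's contents.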
  • 15. Client First Contact
      – Client sends the ticket and data encrypted using the secret key
      – Receiving server
          • Validates ticket, including expiration
          • Extracts identity information from ticket and uses that for authorization
          • Returns encrypted response to client
      – Notice that MapR user identity is independent of host or operating system identity
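
The server-side steps on the Client First Contact slide (check the ticket, check expiration, extract identity) can be sketched as follows. This is not MapR's wire format: the JSON layout and function names are hypothetical, and HMAC-SHA256 from the Python standard library stands in for the AES-256-GCM authenticated encryption that the Cryptography slide names, so this sketch protects integrity only and does not hide the ticket contents.

```python
import hashlib
import hmac
import json

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal_ticket(server_key: bytes, identity: dict, expires_at: float) -> bytes:
    """Issuer side: serialize the ticket and append an HMAC-SHA256 tag."""
    body = json.dumps({"id": identity, "exp": expires_at}, sort_keys=True).encode()
    tag = hmac.new(server_key, body, hashlib.sha256).digest()
    return body + tag


def validate_ticket(server_key: bytes, blob: bytes, now: float) -> dict:
    """Receiving server: check integrity, then expiration, then return identity."""
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(server_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ticket failed authentication")
    ticket = json.loads(body)
    if now >= ticket["exp"]:
        raise ValueError("ticket expired")
    # The identity comes from the ticket, not from the client's host or OS user.
    return ticket["id"]
```

Because the identity is read from the validated ticket, the server never needs to trust what the client operating system claims about the user.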
  • 16. Server First Contact
      – When a trusted server starts, it uses a local server ticket to authenticate to the CLDB
          • CLDB verifies the ticket's authenticity using the secret key
          • CLDB returns a server key that is used to create and validate user tickets
          • The server is now a trusted member of the cluster
  • 17. Maprlogin
      – Primary user-visible security tool
      – Actions are
          • password - authenticate to a MapR cluster using a valid password
          • kerberos - authenticate to a MapR cluster using Kerberos
          • print - print information on your existing credentials
          • authtest - test authentication as a generic client
          • end / logout - logout of cluster
          • renew - renew existing ticket
      – For example:
          % maprlogin password
          [Password for user 'fred' at cluster 'my.cluster.com': ]
          MapR credentials of user 'fred' for cluster 'my.cluster.com' are written to '/tmp/maprticket_1001'
  • 18. Maprlogin – Under the Covers
      [Diagram: maprlogin, MapR CLDB, LDAP/Kerberos/NIS, and FileServer/CLDB, with the numbered flow below]
      1. maprlogin sends username/passwd on https to the MapR CLDB
      2. CLDB uses PAM to authenticate (LDAP/Kerberos/NIS)
      3. Ticket + user key returned
      4. Ticket + key saved in file in /tmp
      5. A command (hadoop fs -ls /) picks up ticket + key from file
      6. Client sends RPC encrypted with user-key + ticket to FileServer/CLDB
      7. Server decrypts ticket to authenticate user and checks permissions on ACL
  • 19. Cryptography
      – Encrypted using current NIST standards
          • AES-256 in GCM mode for encryption and signing
              – http://en.wikipedia.org/wiki/Galois/Counter_Mode
              – NIST standard - http://csrc.nist.gov/publications/fips/fips1402/fips1402annexa.pdf
          • Leverage Intel hardware encryption where available, software otherwise
      – Use the open source crypto++ library for our C++ cryptography
          • http://cryptopp.com
      – Random number generation
          • Use secure random number generation as documented here: http://www.cryptopp.com/docs/ref/class_auto_seeded_random_pool.html#_details
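
The random number generation point on the Cryptography slide matters because AES-256-GCM needs an unpredictable 256-bit key and a nonce that never repeats under the same key. A minimal sketch using Python's `secrets` module as an analogue of the crypto++ random pool the slide refers to; the constant and function names are hypothetical, and the 96-bit nonce size is the commonly recommended GCM choice rather than something the slides state.

```python
import secrets

AES256_KEY_BYTES = 32  # 256-bit key, matching AES-256
GCM_NONCE_BYTES = 12   # 96-bit nonce (assumed; the usual size for GCM)


def new_key() -> bytes:
    """Draw a fresh key from the operating system's secure random source."""
    return secrets.token_bytes(AES256_KEY_BYTES)


def new_nonce() -> bytes:
    """Draw a fresh nonce; a nonce must never be reused with the same key."""
    return secrets.token_bytes(GCM_NONCE_BYTES)
```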
  • 20. MapR Security – More by Default
      – By default, out of the box
          • HS2 supports password authentication
              – Can configure Kerberos and SSL function, same as from Apache, including secure impersonation
          • Oozie supports MapR ticket authentication
              – Can configure Kerberos and SSL function, same as from Apache, including secure impersonation
          • MapR Tables (HBase APIs) use native MapR security, no configuration needed
          • Most Web UIs enhanced to support userid & password authentication and HTTPS
              – Can configure Kerberos SPNEGO, same as from Apache
  • 21. Encrypted Shuffle (?)
      – No need to special-case encrypting shuffle
      – MapR-FS is the store for Map output
          • Shuffle inherits the same encryption, authentication, and authorization functionality of the rest of MapR-FS
  • 22. Let's Build a Secure Cluster!
      – Node 1
          apt-get install mapr….
          configure.sh -C … -Z … -secure -genkeys
          • Generates all needed keys for MapR-RPC as well as for HTTPS
      – Node N
          apt-get install mapr….
          scp rootORmapr@node1:/opt/mapr/conf/{cldb.key,maprserverticket,ssl_keystore,ssl_truststore} /opt/mapr/conf
          configure.sh -C … -Z … -secure
      – Clients
          apt-get install mapr…
          scp anyuser@nodeN:/opt/mapr/conf/ssl_truststore /opt/mapr/conf
          configure.sh … -secure
  • 23. MapR Advantage
      – Vastly simpler
          • Core secured by default in one step
          • No requirement for Kerberos in core and associated complexity
      – Easier integration
          • Leverage existing Linux authentication (PAM and NSSwitch)
      – Faster
          • Leverage Intel AES hardware cryptography
  • 24. Further Reading
      – MapR
          • http://mapr.com
      – MapR Native Security
          • http://www.mapr.com/press-release/mapr-technologies-integratessecurity-into-hadoop
          • http://www.mapr.com/products/only-with-mapr/mapr-integrates-securityinto-hadoop
      – Adding Security to Apache Hadoop
          • http://hortonworks.com/wp-content/uploads/2011/10/securitydesign_withCover-1.pdf
      – The Evolution of Hadoop's Security Model
          • http://www.infoq.com/articles/HadoopSecurityModel/
  • 25. Thank You
  • 27. Key Design Elements
      – User authentication and authorization information obtained using standard operating system information
          • PAM and nsswitch
      – MapR-specific shared secret keys
          • Easier to manage
          • No dependencies on complex external security systems
          • Better performance
      – MapR servers (running as 'mapr') have access to maprserverticket and are therefore privileged processes
      – MapR-RPC altered to encrypt and authenticate traffic
      – Maprsasl created for Apache Java code to leverage similar security
          • Leverages same keys, authentication model, etc.
          • Reuses the C/C++ code via JNI
  • 28. Persistent Keys and Tickets
      [Diagram: CLDB/ZK 1 … CLDB/ZK N, each labeled K, alongside Node 1, Node 2, … Node N]
  • 29. Example: Job Tracker Integration
      [Diagram: JobClient submits job (maprsasl) to JobTracker; JobTracker schedules job (maprsasl) on TaskTracker; all use the file system]
      1. JC copies job conf securely to FS
      2. JT creates user ticket
      3. TT fetches ticket
      4. TT launches job using ticket identity
      JT can create user tickets. TT copies the ticket to a private job directory on local disk. taskcontroller copies it to the user's private local disk dir, and tasks set MAPR_TICKET_LOCATION to that place.
  • 30. Creating a Secure Cluster
      – On the first node run configure.sh … -genkeys; it creates some keys
          • CLDB key (cldb.key)
          • Ticket for nodes (maprserverticket)
          • SSL certificates (ssl_keystore & ssl_truststore)
      – Additional nodes
          • Copy all to other CLDB and ZK nodes
          • Copy all but the CLDB key to remaining nodes
          • Run configure.sh
      – On a client
          • Copy SSL truststore from any server node
          • Run configure.sh
      – No requirement for Kerberos configuration
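
The copy rules on the Creating a Secure Cluster slide amount to a small lookup from node role to file set. A sketch of that mapping, where the file names come from the slide and the role labels (`cldb`, `zookeeper`, `server`, `client`) and function name are hypothetical:

```python
# Files generated on the first node by configure.sh … -genkeys (per the slide).
KEY_FILES = ("cldb.key", "maprserverticket", "ssl_keystore", "ssl_truststore")


def files_to_copy(role: str) -> tuple:
    """Return which generated files a machine of the given role should receive."""
    if role in ("cldb", "zookeeper"):
        return KEY_FILES  # CLDB and ZK nodes get everything
    if role == "server":
        # Remaining cluster nodes get all but the CLDB key.
        return tuple(f for f in KEY_FILES if f != "cldb.key")
    if role == "client":
        return ("ssl_truststore",)  # clients only need the truststore
    raise ValueError(f"unknown role: {role}")
```

An install script could loop over its inventory and pass `files_to_copy(role)` to whatever copy mechanism it uses, which also makes it easy to confirm that `cldb.key` never lands on a non-CLDB node.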

Editor's Notes

  1. MapR provides a complete distribution for Apache Hadoop. MapR has integrated, tested and hardened a broad array of packages as part of this distribution, including Hive, Pig, Oozie, and Sqoop, plus additional packages such as Cascading. We have spent over two years in a well-funded effort to provide deep architectural improvements and create the next-generation distribution for Hadoop. MapR has made significant updates combined with a dozen open source packages. All of the innovations MapR has delivered retain 100% compatibility with the Apache Hadoop APIs. This is in stark contrast with the alternative distributions from Cloudera, HortonWorks, and Apache, which are all equivalent.
  2. MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR's distribution. Amazon, through its Elastic MapReduce service (EMR), hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold and supported as a service by Amazon to its customers. Google, the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop, has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features, such as mirroring between sites and multi-tenancy support, that further enhance cloud deployments.
  3. In the initial release, the server key and CLDB key never change. The server ticket is also shared by all servers and does not expire.
  4. Note: this does create a “race condition” in the install process since all nodes but the first have to have configure.sh run after the first. This might be an issue with certain parallel install processes. You can work around this by simply running configure.sh (specifying the domain for the ssl certs as needed) somewhere to create the needed keys and then copying them to all nodes at once.