Private Property: No Trespassing
   Hadoop Security Explained




        Aaron T. Myers
      atm@cloudera.com
            @atm
Who am I?




• Aaron T. Myers – Software Engineer, Cloudera
• Hadoop HDFS, Common Committer
• Master's thesis on security sandboxing in the Linux kernel
• Primarily works on the Core Platform Team
Outline

• Hadoop Security Overview
 • Hadoop Security pre CDH3
 • Hadoop Security with CDH3
• Details of Deploying Secure Hadoop
• Summary
Hadoop Security: Overview
Why do we care about security?
• SecureCommerceWebSite, Inc. has a product that has both
   paid ads and search

• “Payment Fraud” team needs logs of all credit card
   payments

• “Search Quality” team needs all search logs and click
   history

• “Ads Fraud” team needs to access both search logs and
   payment info
  •   So we can't segregate these datasets to different clusters

• If they can share a cluster, we also get better utilization!
Security pre CDH3: User Authentication



• Authentication is by vigorous assertion
• Trivial to impersonate another user:
 • Just set property “hadoop.job.ugi” when
    running job or command
• Group resolution is done client side
Security pre CDH3: Server Authentication




                None
Security pre CDH3: HDFS


• Unix-like file permissions were introduced in
  Hadoop 0.16.1
• Provides standard user/group/other r/w/x
• Protects well-meaning users from accidents
• Does nothing to prevent malicious users from
  causing harm (weak authentication)
Security pre CDH3: Job Control



• ACLs per job queue for job submission / killing
• No ACLs for viewing counters / logs
• Does nothing to prevent malicious users from
  causing harm (weak authentication)
Security pre CDH3: Tasks


• Individual tasks all run as the same user
 • Whoever the TT is running as (usually 'hadoop')
• Tasks not isolated from each other
 • Tasks which read/write from local storage can
    interfere with each other
 • Malicious tasks can kill each other
• Hadoop is designed to execute arbitrary code
Security pre CDH3: Web interfaces




             None
Security with CDH3: User Authentication

• Authentication is secured by Kerberos v5
 • RPC connections secured with SASL “GSSAPI”
    mechanism
  • Provides proven, strong authentication and
    single-sign-on
• Hadoop servers can ensure that users are who
  they say they are
• Group resolution is done on the server side
Security with CDH3: Server Authentication




• Kerberos authentication is bi-directional
• Users can be sure that they are communicating
  with the Hadoop server they think they are
Security with CDH3: HDFS




• Same general permissions model
 • Added sticky bit for directories (e.g. /tmp)
• But, a user can no longer trivially impersonate
  other users (strong authentication)
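The sticky-bit rule can be sketched in a few lines of Python. This is only an illustration of the Unix-style semantics the slide refers to, not HDFS's actual code: the function name `may_delete`, the superuser name 'hdfs', and the assumption that the caller already has write permission on the directory are all hypothetical.

```python
import stat


def may_delete(user, entry_owner, dir_owner, dir_mode, superuser="hdfs"):
    """Decide whether `user` may delete an entry from a directory.

    Assumes `user` already has write permission on the directory;
    only the extra sticky-bit restriction is modelled here.
    """
    if user == superuser:
        return True
    if dir_mode & stat.S_ISVTX:
        # Sticky directory: only the entry's owner or the directory's
        # owner may remove the entry.
        return user in (entry_owner, dir_owner)
    return True
```

With a mode of 1777 on a shared directory such as /tmp, one user can no longer delete another user's files, while the same call with mode 777 would be allowed.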
Security with CDH3: Job Control



• A job now has its own ACLs, including a view ACL
• Job can now specify who can view logs, counters,
  configuration, and who can modify (kill) it
• JT enforces these ACLs (strong authentication)
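A per-job ACL check might look roughly like the sketch below. It assumes the common Hadoop ACL string layout of comma-separated users, a space, then comma-separated groups, with "*" meaning everyone. The function names are hypothetical, and treating the job owner as always allowed is an assumption of this sketch, not a statement about the JobTracker's code.

```python
def parse_acl(spec):
    """Split an ACL string into (users, groups); None means everyone."""
    if spec.strip() == "*":
        return None
    users_part, _, groups_part = spec.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.split(",") if g}
    return users, groups


def is_allowed(spec, user, user_groups, job_owner):
    """Check `user` against a view or modify ACL for one job."""
    if user == job_owner:
        # Assumption of this sketch: the owner can always act on the job.
        return True
    acl = parse_acl(spec)
    if acl is None:
        return True
    users, groups = acl
    return user in users or bool(groups & set(user_groups))
```

Because the server resolves `user` and `user_groups` from an authenticated Kerberos identity, a check like this is only meaningful once strong authentication is in place.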
Security with CDH3: Tasks

• Tasks now run as the user who launched the job
 • Probably the most complex part of Hadoop's
    security implementation
• Ensures isolation of tasks which run on the same TT
 • Local file permissions enforced
 • Local system permissions enforced (e.g. signals)
• Can take advantage of per-user system limits
 • e.g. Linux ulimits
Security with CDH3: Web Interfaces



• Out of the box Kerberized SSL support
• Pluggable servlet filters (more on this later)
Security with CDH3: Threat Model


• The Hadoop security system assumes that:
 • Users do not have root access to cluster
    machines
 • Users do not have root access to shared user
    machines (e.g. bastion box)
 • Users cannot read or inject packets on the
    network
Thanks, Yahoo!




Yahoo! did the vast majority of the
   core Hadoop security work
Hadoop Security:
Deployment Details
Requirements: Kerberos Infrastructure

• Kerberos domain (KDC)
 • e.g. MIT Krb5 in RHEL, or MS Active Directory
• Kerberos principals (SPNs) for every daemon
 • hdfs/hostname@REALM for DN, NN, 2NN
 • mapred/hostname@REALM for TT and JT
 • host/hostname@REALM for web UIs
• Keytabs for service principals distributed to
  correct hosts
Configuring daemons for security

• Most daemons have two configs:
 • Keytab location (e.g. dfs.datanode.keytab.file)
 • Kerberos principal (e.g. dfs.datanode.kerberos.principal)
• Principal can use the special token '_HOST' to substitute
  hostname of the daemon (e.g. 'hdfs/_HOST@MYREALM')

• Several other configs to enable security in the first place
 • See example-confs/conf.secure in CDH3
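The '_HOST' substitution can be illustrated with a small Python sketch. It is not Hadoop's implementation: the function name `resolve_principal` is hypothetical, and lower-casing the fully qualified hostname is an assumption made here so that the result matches typical keytab entries.

```python
import socket


def resolve_principal(pattern, hostname=None):
    """Replace a '_HOST' instance component with this host's name."""
    host = (hostname or socket.getfqdn()).lower()
    service, slash, rest = pattern.partition("/")
    if not slash:
        return pattern  # no instance component, nothing to substitute
    instance, at, realm = rest.partition("@")
    if instance != "_HOST":
        return pattern
    return f"{service}/{host}{at}{realm}"
```

The point of the token is that one configuration file can be shipped to every node: each daemon derives its own principal at startup, so only the keytab differs per host.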
Setting up users
• Each user must have a Kerberos principal
• May want some shared accounts:
 • sharedaccount/alice and sharedaccount/bob
    principals both act as sharedaccount on HDFS - you
    can use this!

  • hdfs/alice is also useful for alice to act as a superuser
• Users running MR jobs must also have Unix accounts on
  each of the slaves

• Centralized user database (e.g. LDAP) is a practical
  necessity
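The shared-account trick works because only the first component of a principal is used as the short user name. The Python sketch below shows that idea under the simplest possible mapping; the names `Principal`, `parse_principal`, and `short_name` are hypothetical, and a real deployment may apply additional mapping rules.

```python
from typing import NamedTuple


class Principal(NamedTuple):
    primary: str
    instance: str
    realm: str


def parse_principal(name):
    """Split 'primary/instance@REALM' into its three parts."""
    rest, _, realm = name.partition("@")
    primary, _, instance = rest.partition("/")
    return Principal(primary, instance, realm)


def short_name(name):
    """Simplest mapping: the primary component is the user name."""
    return parse_principal(name).primary
```

Under this mapping sharedaccount/alice@FOOCORP.COM and sharedaccount/bob@FOOCORP.COM both resolve to 'sharedaccount', while each person still authenticates with their own credentials.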
Installing Secure Hadoop

• MapReduce and HDFS services should run as
  separate users (e.g. 'hdfs' and 'mapred')
• New task-controller setuid executable allows
  tasks to run as a user
• New JNI code in libhadoop.so to plug subtle
  security holes
• Install CDH3 with hadoop-0.20-sbin and
  hadoop-0.20-native packages to get this all set up
Securing higher-level services
• Many “middle tier” applications need to act on
  behalf of their clients when interacting with
  Hadoop
  • e.g. Oozie, Hive Server, Hue/Beeswax
• “Proxy User” feature provides secure
  impersonation (think sudo).
  • hadoop.proxyuser.oozie.hosts - IPs where
    “oozie” may act as an impersonator
  • hadoop.proxyuser.oozie.groups - groups whose
    users “oozie” may impersonate
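The two proxy-user properties combine into a check along the following lines. This Python sketch is a simplification for illustration only: `may_impersonate` and `_csv` are hypothetical names, the configuration is modelled as a plain dict, hosts are compared as literal strings, and the "*" wildcard handling is an assumption of the sketch.

```python
def _csv(value):
    """Turn a comma-separated property value into a set of items."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf, proxy_user, client_ip, target_groups):
    """Return True if proxy_user, connecting from client_ip, may act
    for a user who belongs to target_groups."""
    hosts = _csv(conf.get(f"hadoop.proxyuser.{proxy_user}.hosts", ""))
    groups = _csv(conf.get(f"hadoop.proxyuser.{proxy_user}.groups", ""))
    host_ok = "*" in hosts or client_ip in hosts
    group_ok = "*" in groups or bool(groups & set(target_groups))
    return host_ok and group_ok
```

Both conditions must hold, and a proxy user with no entries is denied, which keeps impersonation limited to named services on known machines.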
Customizing Security

• Current plug-in points:
 • hadoop.http.filter.initializers - may configure a
    custom ServletFilter to integrate with existing
    enterprise web SSO
  • hadoop.security.group.mapping - map a
    Kerberos principal (alice@FOOCORP.COM) to a
    set of groups
    (users,engstaff,searchquality,adsdata)
  • hadoop.security.auth_to_local - regex
    mappings of Kerberos principals to usernames
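The idea behind regex-based principal mapping can be sketched as an ordered list of rules where the first match wins. The Python below uses plain (pattern, replacement) pairs rather than Hadoop's actual rule syntax, which is richer and not reproduced here; `RULES`, `to_local`, and the example host name are hypothetical.

```python
import re

# Ordered rules: the first pattern that matches decides the user name.
RULES = [
    # Plain user principals from the corporate realm.
    (r"^([^/@]+)@FOOCORP\.COM$", r"\1"),
    # Service principals from the cluster realm: keep the service name.
    (r"^([^/@]+)/[^@]+@CLUSTER\.FOOCORP\.COM$", r"\1"),
]


def to_local(principal, rules=RULES):
    """Map a Kerberos principal to a local user name, or raise."""
    for pattern, replacement in rules:
        if re.match(pattern, principal):
            return re.sub(pattern, replacement, principal)
    raise ValueError(f"no mapping rule matches {principal!r}")
```

Rejecting principals that match no rule matters in a cross-realm setup, since it stops identities from unexpected realms being mapped onto local accounts.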
Deployment Gotchas

• MIT Kerberos 1.8.1 (in Ubuntu, RHEL 5.6+) is
  incompatible with the Java Krb5 implementation
  • Run “kinit -R” after kinit to work around
• Enable allow_weak_crypto in /etc/krb5.conf -
  necessary for Kerberized SSL
• Must deploy “unlimited security policy JAR” in
  JAVA_HOME/jre/lib/security
• Lifesaver: HADOOP_OPTS="-Dsun.security.krb5.debug=true" hadoop ...
Best Practices for AD Integration

• MIT Kerberos realm inside cluster:
 • CLUSTER.FOOCORP.COM
• Existing Active Directory domain:
 • FOOCORP.COM or maybe AD.FOOCORP.COM
• Set up one-way cross-realm trust
 • Cluster realm must trust corporate AD realm
 • See “Step by Step Guide to Kerberos 5
    Interoperability” in Windows Server docs
Hadoop Security:
   Summary
What Hadoop Security Is


• Strong authentication
 • Malicious impersonation now impossible
• Better authorization
 • More control over who can view/control jobs
• Ensure isolation between running tasks
• An ongoing development priority
What Hadoop Security Is Not



• Encryption on the wire
• Encryption on disk
• Protection against DoS attacks
• Enabled by default
Security Beyond Core Hadoop

• Comprehensive documentation and best
  practices
  •   https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide

• All components of CDH3 are capable of
  interacting with a secure Hadoop cluster
• Hive 0.7 (included in CDH3) added a rich set of
  access controls
• Much easier deployment if you use Cloudera
  Enterprise
Security Roadmap


• Pluggable “edge authentication” (e.g. PKI, SAML)
• More authorization features across CDH
  components
 • e.g. HBase access controls
• Data encryption support
Questions?



  Aaron T. Myers
atm@cloudera.com
      @atm
