Big Data Security
    Joey Echeverria | Principal Solutions Architect
    joey@cloudera.com | @fwiffo




1                               ©2013 Cloudera, Inc.
Big Data Security




     EARLY DAYS




2
Hadoop File Permissions

    •   Added in HADOOP-1298
        •   Hadoop 0.16
        •   Early 2008
    • Authorization without authentication
    • POSIX-like RWX bits
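
A minimal sketch of what a POSIX-like RWX check looks like, assuming a simple owner/group/other model. `FileStatus` and `is_allowed` are hypothetical names for illustration, not Hadoop classes. Note that the `user` argument is taken at the caller's word, which is exactly the "authorization without authentication" situation.

```python
from __future__ import annotations

from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # POSIX-style permission bits


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical stand-in for a file's owner, group and mode bits."""
    owner: str
    group: str
    mode: int  # e.g. 0o640


def is_allowed(status: FileStatus, user: str, groups: set[str], wanted: int) -> bool:
    """Return True if `user` may perform `wanted` (a mask of READ/WRITE/EXECUTE).

    The user name is not verified here: whoever the client claims to be
    is the identity that gets checked.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7   # owner class
    elif status.group in groups:
        bits = (status.mode >> 3) & 0o7   # group class
    else:
        bits = status.mode & 0o7          # other class
    return (bits & wanted) == wanted


report = FileStatus(owner="alice", group="analysts", mode=0o640)
print(is_allowed(report, "alice", {"analysts"}, READ | WRITE))  # owner has rw-
print(is_allowed(report, "bob", {"analysts"}, WRITE))           # group has r--
```

The check itself is sound; the weakness is that nothing stops a client from claiming to be `alice`.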




3
MapReduce ACLs

    •   Added in HADOOP-3698
        •   Hadoop 0.19
        •   Late 2008
    • ACLs per job queue
    • Set a list of allowed users or groups per operation
        •   Job submission
        •   Job administration
    •   No authentication
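
A rough sketch of per-queue ACLs, assuming each queue maps an operation to a list of allowed users and groups. The queue name, operation names, and the `AccessList` and `check_queue_access` helpers are illustrative assumptions, not Hadoop's configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessList:
    """Allowed users and groups for one operation on one queue."""
    users: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)

    def permits(self, user: str, user_groups: set[str]) -> bool:
        # Allowed if the user is listed directly or shares any listed group.
        return user in self.users or bool(self.groups & user_groups)


# Hypothetical configuration: queue -> operation -> access list.
QUEUE_ACLS = {
    "production": {
        "submit-job": AccessList(users={"etl"}, groups={"data-eng"}),
        "administer-jobs": AccessList(groups={"ops"}),
    },
}


def check_queue_access(queue: str, operation: str, user: str, user_groups: set[str]) -> bool:
    """Deny by default when the queue or operation has no ACL configured."""
    acl = QUEUE_ACLS.get(queue, {}).get(operation)
    return acl is not None and acl.permits(user, user_groups)


print(check_queue_access("production", "submit-job", "etl", set()))
print(check_queue_access("production", "administer-jobs", "etl", {"data-eng"}))
```

As with file permissions, the user and group names are whatever the client reports, so the ACL only guards against mistakes.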



4
Securing a Cluster Through a Gateway

    • Hadoop cluster runs on a private network
    • Gateway server dual-homed (Hadoop network and
      public network)
    • Users SSH onto gateway
        •   Optionally, create an SSH proxy so that jobs can be
            submitted from the client machine
    •   Provides a minimal level of protection




5
Big Data Security




     WHY SECURITY MATTERS




6
Prevent Accidental Access

    • Don’t let users shoot themselves in the foot
    • Main driver for early features
    • Not security per se, but a critical first step
    • Doesn’t require strong authentication




7
Stop Malicious Users

    • Early features were necessary, but not sufficient
    • Security has to get real
    • Hadoop runs arbitrary code
    • Implicit trust doesn’t prevent the insider threat




8
Co-mingle All Your Data

    • Often overlooked
    • Big data means getting rid of stovepipes
        •   Scalability and flexibility are only 50% of the problem
        •   Trust your data in a multi-tenant environment
    •   Most critical driver




9
Big Data Security




      AN EVOLVING STORY




10
Authorization

     • Files
     • MapReduce/YARN job queues
     • Service-level authorization
         •   Whitelists and blacklists of hosts and users
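
One way to picture service-level authorization is a per-service policy holding whitelists and blacklists of users and hosts. The sketch below assumes that a blacklist entry always wins and that both the user and the host must be whitelisted; `ServicePolicy` and those rules are assumptions for illustration, not Hadoop's exact semantics.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServicePolicy:
    """Hypothetical whitelist/blacklist policy for one service."""
    allowed_users: set[str] = field(default_factory=set)
    blocked_users: set[str] = field(default_factory=set)
    allowed_hosts: set[str] = field(default_factory=set)
    blocked_hosts: set[str] = field(default_factory=set)

    def permits(self, user: str, host: str) -> bool:
        # Assumption: a blacklist match overrides any whitelist match.
        if user in self.blocked_users or host in self.blocked_hosts:
            return False
        # Assumption: both the user and the calling host must be whitelisted.
        return user in self.allowed_users and host in self.allowed_hosts


client_protocol = ServicePolicy(
    allowed_users={"alice", "bob"},
    blocked_users={"bob"},
    allowed_hosts={"gateway1.example.com"},
)

print(client_protocol.permits("alice", "gateway1.example.com"))
print(client_protocol.permits("bob", "gateway1.example.com"))
```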




11
Authentication

     •   HADOOP-4487
         •   Hadoop 0.22 and 0.20.205
         •   Late 2010
     •   Based on Kerberos and internal delegation tokens
         •   Provides strong user authentication
         •   Also used for service-to-service authentication

     [Slide image: design document excerpt, section 2.2 "High Level Use Cases":
      1. Applications accessing files on HDFS clusters. Non-MapReduce
      applications, including hadoop fs, access files stored on one or more
      HDFS clusters. The application should only be able to access files and
      services they are authorized to access. See figure 1. Variations:
      (a) Access HDFS directly using HDFS protocol.
      (b) Access HDFS indirectly through HDFS proxy servers via the HFTP
          FileSystem or HTTP get.]

     [Figure 1: HDFS High-level Dataflow. Nodes: Application, Name Node,
      Data Node, MapReduce Task. Edge labels: kerb(joe), delg(joe),
      kerb(hdfs), block token.]
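
The delegation-token idea can be sketched as follows: after a client has authenticated once (with Kerberos, not shown here), the service hands back a token whose "password" is a keyed hash of the token's contents, and later requests are verified by recomputing that hash. This is a generic HMAC illustration under assumed names (`MASTER_KEY`, `issue_token`, `verify_token`) and an assumed JSON layout; it is not Hadoop's actual token format.

```python
from __future__ import annotations

import hashlib
import hmac
import json
import time

# Hypothetical demo secret; a real service would generate and rotate its keys.
MASTER_KEY = b"demo-only-secret"


def issue_token(owner: str, lifetime_s: int, now: float | None = None) -> tuple[bytes, bytes]:
    """Return (identifier, password) for an already-authenticated owner."""
    now = time.time() if now is None else now
    identifier = json.dumps(
        {"owner": owner, "expires": now + lifetime_s}, sort_keys=True
    ).encode()
    password = hmac.new(MASTER_KEY, identifier, hashlib.sha256).digest()
    return identifier, password


def verify_token(identifier: bytes, password: bytes, now: float | None = None) -> str | None:
    """Return the owner if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    expected = hmac.new(MASTER_KEY, identifier, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, password):
        return None  # forged or tampered token
    claims = json.loads(identifier)
    if claims["expires"] < now:
        return None  # expired token
    return claims["owner"]


ident, pw = issue_token("joe", lifetime_s=60)
print(verify_token(ident, pw))
```

A task holding the token can act as `joe` without ever seeing his Kerberos credentials, and the token stops working once it expires.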
12
Encryption

     •   Over-the-wire encryption for some socket
         connections
     •   RPC encryption added soon after Kerberos
     •   Shuffle encryption (HTTPS) added in Hadoop
         2.0.2-alpha, backported to CDH4 MR1
     •   HDFS block streamer encryption added in Hadoop
         2.0.2-alpha
     •   Volume-level encryption for data at rest



13
Big Data Security




      SECURITY FOR KEY VALUE STORES




14
Apache Accumulo

     •   Robust, scalable, high performance data storage and
         retrieval system
     •   Built by NSA, now an Apache project
     •   Based on Google’s BigTable
     •   Built on top of HDFS, ZooKeeper and Thrift
     •   Iterators for server-side extensions
     •   Cell labels for flexible security models




15
Data Model

     • Multi-dimensional, persistent, sorted map
     • Key/Value store with a twist
     • A single primary key (Row ID)
     • Secondary key (Column) internal to a row
         •   Family
         •   Qualifier
     •   Per-cell timestamp
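
The data model above can be approximated with a sorted list of keys. This is a toy in-memory sketch: `Key` and `SortedTable` are hypothetical names, and sorting newer timestamps first within a cell is an assumption made here so that the latest version is read first.

```python
from __future__ import annotations

import bisect
from typing import Iterator, NamedTuple


class Key(NamedTuple):
    row: str         # primary key (Row ID)
    family: str      # column family
    qualifier: str   # column qualifier
    timestamp: int   # per-cell timestamp


class SortedTable:
    """Toy sorted map from Key to value, kept in key order at all times."""

    def __init__(self) -> None:
        self._entries: list[tuple[tuple, Key, bytes]] = []

    def put(self, key: Key, value: bytes) -> None:
        # Assumption: negate the timestamp so newer versions sort first.
        sort_key = (key.row, key.family, key.qualifier, -key.timestamp)
        bisect.insort(self._entries, (sort_key, key, value))

    def scan_row(self, row: str) -> Iterator[tuple[Key, bytes]]:
        # (row,) sorts before every full sort key that starts with `row`.
        start = bisect.bisect_left(self._entries, ((row,),))
        for _, key, value in self._entries[start:]:
            if key.row != row:
                break
            yield key, value


table = SortedTable()
table.put(Key("user1", "info", "email", 1), b"old@example.com")
table.put(Key("user1", "info", "email", 2), b"new@example.com")
table.put(Key("user0", "info", "name", 1), b"Zed")
for key, value in table.scan_row("user1"):
    print(key, value)
```

Because everything for one row is contiguous in sort order, a row lookup is a seek followed by a short sequential read.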




16
Cell-Level Security

     • Labels stored per cell
     • Labels consist of Boolean expressions
       (AND, OR, nesting)
     • Labels associated with each user
     • Cell labels checked against user’s labels with a
       built-in iterator
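
To make the label check concrete, here is a small evaluator for Boolean label expressions. It assumes `&` for AND, `|` for OR, parentheses for nesting, `&` binding tighter than `|`, and an empty label meaning "visible to everyone". The function names and these grammar choices are assumptions for illustration and may differ from Accumulo's own parser.

```python
from __future__ import annotations

import re

_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(label: str) -> list[str]:
    label = label.strip()
    tokens, pos = [], 0
    while pos < len(label):
        match = _TOKEN.match(label, pos)
        if not match:
            raise ValueError(f"bad label near {label[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(label: str, authorizations: set[str]) -> bool:
    """Evaluate a cell label against the set of labels a user holds."""
    tokens = tokenize(label)
    if not tokens:
        return True  # assumption: unlabeled cells are visible to all
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, then combine
            result = result or rhs
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return result
        if token in {"&", "|", ")"}:
            raise ValueError(f"unexpected {token!r}")
        return token in authorizations

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(is_visible("(admin&finance)|audit", {"audit"}))
print(is_visible("(admin&finance)|audit", {"admin"}))
```

A scan would apply this check to every cell and silently skip those the user's labels do not satisfy.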




17
Pluggable Authentication

     • Currently supports username/password
       authentication backed by ZooKeeper
     • ACCUMULO-259
         •   Targeted for Accumulo 1.5.0
     • Authentication info replaced with generic tokens
     • Supports multiple implementations (e.g. Kerberos)




18
Application Level

     • Accumulo often paired with application level
       authentication/authorization
     • Accumulo users created per application
     • Each application granted access level of most
       permitted user
     • Application authenticates users, grabs user
       authorizations, passes user labels with requests
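
The application-level pattern can be sketched as an intersection: the application's own account holds the widest set of labels, and each request is narrowed to the labels of the end user making it. `APP_AUTHORIZATIONS`, `USER_DIRECTORY`, and `authorizations_for_request` are hypothetical, and the end user is assumed to have been authenticated by the application already.

```python
from __future__ import annotations

# Labels granted to the application's own account (hypothetical values).
APP_AUTHORIZATIONS = frozenset({"public", "finance", "hr"})

# Hypothetical user directory the application consults after login.
USER_DIRECTORY = {
    "alice": {"public", "finance"},
    "bob": {"public", "legal"},
}


def authorizations_for_request(user: str) -> frozenset[str]:
    """Labels to pass along with a request made on behalf of `user`."""
    user_labels = USER_DIRECTORY.get(user)
    if user_labels is None:
        raise PermissionError(f"unknown user: {user}")
    # Never pass more than the application itself holds.
    return APP_AUTHORIZATIONS & frozenset(user_labels)


print(sorted(authorizations_for_request("alice")))
print(sorted(authorizations_for_request("bob")))
```

The data store trusts the application to do this narrowing correctly, so a bug in the application can expose everything its account is allowed to see.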




19
Apache HBase

     •   Also based on Google’s BigTable
     •   Started as a Hadoop contrib project
     •   Supports column-level ACLs
     •   Kerberos for authentication
     •   Discussion and early prototypes of cell-level security
         ongoing




20
Big Data Security




      FUTURE




21
Encryption for Data at Rest

     • Need multiple levels of granularity
     • Encryption keys tied to authorization labels (like
       Accumulo labels or HBase ACLs)
     • APIs for file-level, block-level, or record-level
       encryption




22
Hive Security

     • Column-level ACLs
     • Kerberos authentication
     • AccessServer




23
