
Hadoop & Security - Past, Present, Future


A comprehensive overview of the security concepts in the open-source Hadoop stack as of mid-2015, with a look back at the "old days" and an outlook on future developments.



  1. Hadoop & Security: Past, Present, Future (uweseiler)
  2. Page 2: About me. Big Data Nerd, Travelpirate, Photography Enthusiast, Hadoop Trainer, Data Architect
  3. Page 3: Agenda. Past; Present (Authentication, Authorization, Auditing, Data Protection); Future
  4. Page 4: Past
  5. Page 5: Hadoop & Security 2010. Owen O'Malley @ Hadoop Summit 2010, http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
  6. Page 6: Hadoop & Security 2010 (cont.). Owen O'Malley @ Hadoop Summit 2010, http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
  7. Page 7: Hadoop & Security (not that long ago…). Diagram: a user logs into a gateway node via SSH and runs hadoop fs -put to load data into /user/uwe/ on the Hadoop cluster.
  8. Page 8: Present
  9. Page 9: Security in Hadoop 2015.
     - Authentication (Who am I / prove it?): Kerberos in native Apache Hadoop; HTTP/REST APIs secured with the Apache Knox Gateway
     - Authorization (Restrict access to explicit data): fine-grained access control for HDFS, YARN, MapReduce, Hive & HBase; Storm & Knox
     - Audit (Understand who did what): centralized audit reporting; policy and access history
     - Data Protection (Encrypt data at rest & in motion): wire encryption in Hadoop; file encryption (built in since Hadoop 2.6); partner tools
     - Plus: centralized security administration
  10. Page 10: Typical Flow: Hive Access with Beeline CLI. Diagram: Beeline client → HiveServer2 → HDFS (nodes A, B, C).
  11. Page 11: Typical Flow: Authentication through Kerberos. Diagram: 1) the client gets a Service Ticket for Hive from the KDC, 2) the client uses Hive and submits the query, 3) Hive gets a NameNode (NN) Service Ticket, 4) Hive creates the MapReduce/Tez job using the NN.
  12. Page 12: Typical Flow: Authorization through Ranger. Diagram: the same Kerberos flow as before, with Ranger enforcing authorization policies at HiveServer2.
  13. Page 13: Typical Flow: Perimeter Security through Knox. Diagram: 1) the client sends the original request with user id/password to Knox, 2) Knox gets a Service Ticket for Hive from the KDC and runs as proxy user using Hive, 3) Hive gets a NameNode (NN) Service Ticket and creates the MapReduce/Tez job using the NN, 4) the client gets the query result.
  14. Page 14: Typical Flow: Wire & File Encryption. Diagram: the same Knox flow, with SSL protecting the HTTP hops (client ↔ Knox ↔ HiveServer2 and the REST interfaces) and SASL protecting the Hadoop RPC traffic.
  15. Page 15: Authentication: Kerberos
  16. Page 16: Kerberos Synopsis.
     - The client never sends a password; it sends a username + token instead
     - Authentication is centralized: Key Distribution Center (KDC)
     - The client receives a Ticket-Granting-Ticket (TGT), which allows the authenticated client to request access to secured services
     - Clients establish a timed session
     - Clients establish trust with services by sending KDC-stamped tickets to the service
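The synopsis above maps onto a short command-line session. This is a minimal sketch assuming a kerberized cluster and a hypothetical principal uwe@HADOOP.EXAMPLE.COM:

```shell
# Obtain a Ticket-Granting-Ticket (TGT) from the KDC; the password
# is exchanged only with the KDC, never sent to Hadoop services
kinit uwe@HADOOP.EXAMPLE.COM

# Inspect the ticket cache: shows the TGT and its timed validity
klist

# Hadoop clients now request KDC-stamped service tickets transparently
hdfs dfs -ls /user/uwe

# Discard the ticket cache when the session is done
kdestroy
```

These commands require a live KDC and cluster, so treat them as an illustration of the flow rather than a copy-paste recipe.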
  17. Page 17: Kerberos + Active Directory/LDAP. Diagram: cross-realm trust between the cluster KDC and AD/LDAP. The KDC manages host and service principals (hosts: host1@HADOOP.EXAMPLE.COM, services: hdfs/host1@HADOOP.EXAMPLE.COM) using Kerberos tools; AD/LDAP serves as the user store for authentication (users: seiler@EXAMPLE.COM), managed with existing directory tools.
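For the cross-realm setup above, Hadoop also needs to map principals from the corporate realm to local cluster accounts. A sketch of the relevant core-site.xml fragment, using the realm names from the slide:

```xml
<!-- Sketch: map users from the AD/LDAP realm (EXAMPLE.COM) to local
     short names, e.g. seiler@EXAMPLE.COM -> seiler -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```

Adjust the regex to your actual realm; the DEFAULT rule keeps the normal handling for principals in the cluster realm.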
  18. Page 18: Ambari & Kerberos.
     - Install & configure: the Kerberos server on a single node, the Kerberos client on the rest of the nodes
     - Define principals & keytabs: a keytab (key table) is a file containing a key for a principal; since there are a few dozen principals, Ambari can generate keytab data for your entire cluster as a downloadable CSV file
     - Configure user permissions
  19. Page 19: Perimeter Security: Apache Knox
  20. Page 20: Knox: Core Concept. Diagram: business users and applications reach the Hadoop cluster (HDFS, Hive, apps) through a load balancer and Knox via REST/HTTP and JDBC/ODBC; admins and data operators use data ingest/ETL tools (Falcon, Oozie, Sqoop, Flume), SSH, and RPC calls against an edge node.
  21. Page 21: Knox: Hadoop REST API.
     Service   Direct URL                               Knox URL
     WebHDFS   http://namenode-host:50070/webhdfs       https://knox-host:8443/webhdfs
     WebHCat   http://webhcat-host:50111/templeton      https://knox-host:8443/templeton
     Oozie     http://oozie-host:11000/oozie            https://knox-host:8443/oozie
     HBase     http://hbase-host:60080                  https://knox-host:8443/hbase
     Hive      http://hive-host:10001/cliservice        https://knox-host:8443/hive
     YARN      http://yarn-host:yarn-port/ws            https://knox-host:8443/resourcemanager
     Masters could be on many different hosts; with Knox: one host, one port, consistent paths, SSL config at one host.
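The contrast in the table can be seen with two equivalent WebHDFS calls. A sketch with hypothetical hosts and demo credentials; note that real Knox URLs include a topology context path (here /gateway/default), which the slide's URLs abbreviate:

```shell
# Direct WebHDFS call: you must know the NameNode host and port,
# and the endpoint is plain HTTP
curl -s "http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS"

# Same call through Knox: one host, one port, HTTPS, and
# authentication (HTTP Basic against LDAP/AD) at the gateway
curl -s -k -u guest:guest-password \
  "https://knox-host:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"
```

Both commands require a running cluster and gateway, so they are illustrative only.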
  22. Page 22: Knox: Features.
     - Simplified access: Kerberos encapsulation, single access point, multi-cluster support, single SSL certificate
     - Centralized control: central REST API auditing, service-level authorization, alternative to an SSH "edge node"
     - Enterprise integration: LDAP/AD integration, SSO integration, Apache Shiro extensibility, custom extensibility
     - Enhanced security: protect network details, SSL for non-SSL services, web-app vulnerability filter
  23. Page 23: Knox: Architecture. Diagram: REST clients cross the firewall/DMZ and a load balancer to reach Knox, which integrates with the enterprise identity provider and fronts the masters (NN, RM, WebHCat, Oozie, HS2, HBase) and slaves (DN, NM) of multiple Hadoop clusters; edge nodes / Hadoop CLIs still use RPC and HTTP directly.
  24. Page 24: Knox: What's New in Version 0.6.
     - Knox support for HDFS HA
     - Support for the YARN REST API
     - Support for SSL to Hadoop cluster services (WebHDFS, HBase, Hive & Oozie)
     - Knox management REST API
     - Integration with Ranger for Knox service-level authorization
     - Use Ambari for install/start/stop/configuration
  25. Agenda (section divider). Past; Present (Authentication, Authorization, Auditing, Data Protection); Future
  26. Page 26: The Hadoop Layers
  27. Page 27: Authorization: Overview.
     - HDFS: permissions, ACLs
     - YARN: queue ACLs
     - Pig: no server component to check/enforce ACLs
     - Hive: column-level ACLs
     - HBase: cell-level ACLs
  28. Page 28: Authorization: HDFS Permissions.
     hadoop fs -chown maya:sales /sales-data
     hadoop fs -chmod 640 /sales-data
  29. Page 29: Authorization: HDFS ACLs. New requirements: Maya, Diana and Clark are allowed to make modifications; a new group execs should be able to read the sales data.
  30. Page 30: Authorization: HDFS ACLs.
     hdfs dfs -setfacl -m group:execs:r-- /sales-data
     hdfs dfs -getfacl /sales-data
     hadoop fs -ls /sales-data
  31. Page 31: Authorization: HDFS Best Practices.
     - Start with traditional HDFS file permissions to implement most permission requirements
     - Define a small number of ACLs to handle exceptional cases
     - A file/folder with an ACL incurs additional memory cost in the NameNode compared to a file/folder with traditional permissions
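Putting the requirements from page 29 together with the commands from pages 28 and 30, a possible sequence looks like this (a sketch; the user and group names follow the slides, and it assumes maya already owns /sales-data with mode 640):

```shell
# Grant Diana and Clark modify access via named-user ACL entries
hdfs dfs -setfacl -m user:diana:rw-,user:clark:rw- /sales-data

# Grant the new execs group read-only access
hdfs dfs -setfacl -m group:execs:r-- /sales-data

# Verify: getfacl lists the owner, named users/groups, and mask;
# ls shows a trailing '+' on the permission string when ACLs exist
hdfs dfs -getfacl /sales-data
hadoop fs -ls /sales-data
```

These commands need a running HDFS, so they are shown for illustration.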
  32. Past (section divider)
  33. Page 33: Authorization: Hive.
     - Hive has traditionally offered full-table access control via HDFS access control
     - Solution for column-based control: let HiveServer2 check and submit the query execution; make the table accessible only to a special (technical) user; provide an authorization plugin to restrict UDFs and file formats
     - Use standard SQL permission constructs: GRANT / REVOKE
     - Store the ACLs in the Hive Metastore
  34. Page 34: Authorization: Hive ATZ-NG. Details: https://issues.apache.org/jira/browse/HIVE-5837
  35. Page 35: Authorization: Hive.
     CREATE ROLE sales_role;
     GRANT ALL ON DATABASE 'sales-data' TO ROLE 'sales_role';
     GRANT SELECT ON DATABASE 'marketing-data' TO ROLE 'sales_role';
     CREATE ROLE sales_column_role;
     GRANT SELECT(c1, c2, c3) ON 'secret_table' TO 'sales_column_role';
  36. Page 36: Authorization: Pig. There is no Pig (or MapReduce) server to submit and check column-based access; Pig (and MapReduce) is restricted to full data access via HDFS access control.
  37. Page 37: Authorization: HBase. The HBase permission model traditionally supported ACLs defined at the namespace, table, column-family and column level, which is sufficient to meet most requirements. Cell-based security was introduced with HBase 0.98, on par with the security model of Accumulo.
  38. Page 38: Authorization & Auditing: Apache Ranger
  39. Hadoop & Security 2010 (repeated divider). Owen O'Malley @ Hadoop Summit 2010, http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
  40. Page 40: Ranger: Authorization Policies
  41. Page 41: Ranger: Auditing
  42. Page 42: Ranger: Architecture
  43. Page 43: Ranger: What's New in Version 0.4?
     - New component coverage: Storm authorization & auditing; Knox authorization & auditing
     - Deeper integration with HDP: Windows support; integration with the Hive Auth API and support for grant/revoke commands; support for grant/revoke commands in HBase
     - Enterprise readiness: REST APIs for the policy manager; store audit logs locally in HDFS; support for Oracle DB; Ambari support as part of the Ambari 2.0 release
  44. Page 44: Data Protection: Encryption
  45. Page 45: Encryption: Data in Motion.
     - Hadoop client to DataNode via the Data Transfer Protocol: client reads/writes to HDFS over an encrypted channel; configurable encryption strength
     - ODBC/JDBC client to HiveServer2: encryption via SASL Quality of Protection
     - Mapper to reducer during the shuffle/sort phase: shuffle is over HTTP(S); supports mutual authentication via SSL; host-name verification enabled
     - REST protocols: SSL support
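The first two bullets above are driven by a handful of configuration properties. A sketch of the relevant fragments (the algorithm value is an example, not a recommendation):

```xml
<!-- hdfs-site.xml: encrypt the HDFS Data Transfer Protocol -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value>3des</value> <!-- configurable encryption strength -->
</property>

<!-- core-site.xml: SASL Quality of Protection for Hadoop RPC;
     "privacy" enables integrity plus encryption -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
```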
  46. Page 46: Encryption: Data at Rest. HDFS Transparent Data Encryption:
     - Install and run the KMS on top of HDP 2.2
     - Change the corresponding HDFS parameters (via Ambari)
     - Create an encryption key:
       hadoop key create key1 -size 256
       hadoop key list -metadata
     - Create an encryption zone using the key:
       hdfs dfs -mkdir /zone1
       hdfs crypto -createZone -keyName key1 /zone1
       hdfs crypto -listZones
     - Details: http://hortonworks.com/kb/hdfs-transparent-data-encryption/
  47. Page 47: Future
  48. Page 48: Apache Atlas: Data Classification. Currently in incubation: https://wiki.apache.org/incubator/AtlasProposal
  49. Page 49: Apache Atlas: Tag-based Policies. Diagram: data ingestion/ETL tools (Falcon, Oozie, Sqoop, Flume) load source data; a metadata server classifies data (e.g. tagging Table1 as "marketing"); an IT admin creates tag-based policies in Ranger, which enforces them for Beeline/HiveServer2 access to HDFS; audit logs are collected.
  50. Page 50: Future: More Goodies.
     - Dynamic, attribute-based access control (ABAC): extend Ranger to support data or user attributes in policy decisions; example: use the geo-location of users
     - Enhanced auditing: Ranger can stream audit data through Kafka & Storm into multiple stores; use Storm for correlation of data
     - Encryption as a first-class citizen: build native encryption support into HDFS, Hive & HBase; Ranger-based key management to support encryption
  51. Page 51: Contact Details. Twitter: @uweseiler; Mail: uwe.seiler@codecentric.de; Phone: +49 176 1076531; XING: https://www.xing.com/profile/Uwe_Seiler
