
Big Data Security

Role-based access control and unified security management for big data

  1. Data Access Security in Hadoop: Apache Sentry and RecordService
     Jianwei Li, jarred@cloudera.com
     © Cloudera, Inc. All rights reserved.
  2. Agenda
     • Data Access Security in Hadoop
     • Sentry
     • RecordService
  3. Hadoop Security Pillars
     Authentication, Authorization, Audit, and Compliance
     • Perimeter: guarding access to the cluster itself
       Technical concepts: authentication, network isolation (Cloudera Manager)
     • Access: defining what users and applications can do with data
       Technical concepts: permissions, authorization (Apache Sentry & RecordService)
     • Data: protecting data in the cluster from unauthorized visibility
       Technical concepts: encryption, tokenization, data masking (Navigator Encrypt & Key Trustee | Partners)
     • Visibility: reporting on where data came from and how it's being used
       Technical concepts: auditing, lineage (Cloudera Navigator)
  4. Sentry & RecordService in the Platform
     • Operations: Cloudera Manager, Cloudera Director
     • Data management: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
     • Integrate: structured (Sqoop), unstructured (Kafka, Flume)
     • Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), SDK (Kite)
     • Unified services: resource management (YARN), security (Sentry, RecordService)
     • Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase)
  5. Authorization Mechanisms in Hadoop
     • POSIX-style permissions on files and directories
       • Read, write, execute
       • Owner, group, other
     • Access Control Lists (ACLs) for management of services and resources
       • Set different permissions for specific named users or named groups
       • hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]
     • Role-Based Access Control (RBAC) for services that need advanced access control over data
       • Sentry
       • RecordService
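The order in which HDFS evaluates POSIX bits and ACL entries can be sketched as below. This is a simplified model that follows the check order described in the HDFS Permissions Guide (owner, then named user, then groups, then other, with the mask limiting named and group entries); superusers, default ACLs, and directory traversal are left out, and the `Inode` and `check_access` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set

READ, WRITE, EXECUTE = "r", "w", "x"


@dataclass
class Inode:
    """Permission bits plus optional extended ACL entries of one file (hypothetical model)."""
    owner: str
    group: str
    owner_perm: Set[str]
    group_perm: Set[str]
    other_perm: Set[str]
    named_users: Dict[str, Set[str]] = field(default_factory=dict)
    named_groups: Dict[str, Set[str]] = field(default_factory=dict)
    mask: Optional[Set[str]] = None  # None: no extended ACL entries, so no mask


def check_access(inode: Inode, user: str, user_groups: Iterable[str], requested: str) -> bool:
    """True if `user` (a member of `user_groups`) may perform `requested`, e.g. "rw"."""
    wanted = set(requested)
    groups = set(user_groups)
    # The mask only limits named-user and group entries when an ACL exists.
    mask = inode.mask if inode.mask is not None else {READ, WRITE, EXECUTE}

    # 1. The owner is checked against the owner bits only.
    if user == inode.owner:
        return wanted <= inode.owner_perm
    # 2. A matching named-user entry decides next, filtered through the mask.
    if user in inode.named_users:
        return wanted <= (inode.named_users[user] & mask)
    # 3. Owning group and named groups: any single matching entry may grant access.
    matching = [inode.group_perm] if inode.group in groups else []
    matching += [perms for g, perms in inode.named_groups.items() if g in groups]
    if matching:
        return any(wanted <= (perms & mask) for perms in matching)
    # 4. Everyone else gets the "other" bits.
    return wanted <= inode.other_perm
```

For example, with a named-user entry `bob: rw` and a mask of `r`, Bob can read but not write: the mask, not the entry, has the final say.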
  6. Apache Sentry
  7. Sentry Overview
     • Apache Sentry is an authorization module for Hadoop
     • Controls and enforces access to data, and privileges on data, for authenticated users
     • Apache-licensed; an ASF Incubator project
     • Eases administration through role-based access control (RBAC)
     • Currently works out of the box with:
       • Hive/HCatalog
       • Apache Solr
       • Impala
       • More to come (e.g. HBase, Kudu)
  8. Sentry and Hadoop Components
  9. Sentry Architecture
     • Sentry Server: the Sentry RPC server manages the authorization metadata. It provides interfaces to securely retrieve and manipulate that metadata.
     • Data Engine: a data processing application, such as Hive or Impala, that needs to authorize access to data or metadata resources. The data engine loads the Sentry plugin, and all client requests for accessing resources are intercepted and routed to the plugin for validation.
     • Sentry Plugin: runs inside the data engine. It offers interfaces to manipulate the authorization metadata stored in the Sentry server, and includes the authorization policy engine, which evaluates access requests using the metadata retrieved from the server.
  10. Sentry Components
      • Bindings: extract access requests from the client and pass them to the policy engine
      • Policy engine: reconciles access requests with access policies
      • Policy provider: provides a common interface to the rules database
        • File-based: deprecated except for Solr
        • Database-based: managed with the SQL syntax familiar from relational databases (GRANT/REVOKE)
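The split between bindings, policy engine, and policy provider can be illustrated with a toy model. Every class and function name here is hypothetical and this is not the Sentry API; the provider is an in-memory dict standing in for the database-backed store, and treating `ALL` as implying every action follows Sentry's privilege model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Privilege = Tuple[str, str]  # (action, resource), e.g. ("SELECT", "sales.customers")


@dataclass(frozen=True)
class AccessRequest:
    groups: FrozenSet[str]  # groups of the authenticated user
    action: str             # "SELECT", "INSERT", ...
    resource: str           # e.g. "sales.customers"


class InMemoryPolicyProvider:
    """Policy provider: common interface to the rules (a dict here, not a database)."""

    def __init__(self, group_roles: Dict[str, Set[str]], role_privileges: Dict[str, Set[Privilege]]):
        self.group_roles = group_roles
        self.role_privileges = role_privileges

    def privileges_for(self, groups: Iterable[str]) -> Set[Privilege]:
        privileges: Set[Privilege] = set()
        for group in groups:
            for role in self.group_roles.get(group, set()):
                privileges |= self.role_privileges.get(role, set())
        return privileges


class PolicyEngine:
    """Policy engine: reconciles a request with the policies returned by the provider."""

    def __init__(self, provider: InMemoryPolicyProvider):
        self.provider = provider

    def is_allowed(self, request: AccessRequest) -> bool:
        privileges = self.provider.privileges_for(request.groups)
        # ALL on a resource implies every action on it.
        return any((a, request.resource) in privileges for a in (request.action, "ALL"))


def hive_binding(user_groups: Iterable[str], operation: str, table: str) -> AccessRequest:
    """Binding: translate an engine-specific operation into a generic request."""
    action = {"query": "SELECT", "load": "INSERT"}[operation]
    return AccessRequest(frozenset(user_groups), action, table)


provider = InMemoryPolicyProvider(
    group_roles={"analysts": {"analyst"}, "admins": {"admin"}},
    role_privileges={
        "analyst": {("SELECT", "sales.customers")},
        "admin": {("ALL", "sales.customers")},
    },
)
engine = PolicyEngine(provider)
print(engine.is_allowed(hive_binding(["analysts"], "query", "sales.customers")))  # True
print(engine.is_allowed(hive_binding(["analysts"], "load", "sales.customers")))   # False
```

Keeping the binding separate means a second engine (Impala, Solr) only needs its own translation function; the engine and provider stay shared.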
  11. Sentry Policy Store & Service
      • Persists role-to-privilege and group-to-role mappings in an RDBMS
      • Provides programmatic APIs to create, query, update, and delete these mappings
      • Enables multiple Sentry clients to retrieve and modify privileges concurrently and securely
      • Supports Kerberos authentication
  12. Sentry/Hive Integration: Query Authorization
      • Done in HiveServer2 via a plug-in
      • Performed after the query is successfully compiled
      • The plug-in gets the list of objects the query is trying to access
      • Converts this list into an authorization request
      • The request is either allowed or denied
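The post-compilation check can be sketched as follows: every object the compiled query touches must be covered by some granted privilege, or the whole query is rejected. The hierarchy (a privilege on the server or a database covers everything below it, and `ALL` covers every action) follows Sentry's model; the tuple layout and function names are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

AccessedObject = Tuple[str, str, str]                   # (action, database, table)
Grant = Tuple[str, Optional[str], Optional[str]]        # None = higher level of the hierarchy


def covers(grant: Grant, action: str, db: str, table: str) -> bool:
    """True if a granted privilege covers one requested (action, db, table)."""
    g_action, g_db, g_table = grant
    if g_action not in ("ALL", action):
        return False
    # A None database or table means the grant sits higher up (server or
    # database level) and therefore applies to everything beneath it.
    if g_db is not None and g_db != db:
        return False
    if g_table is not None and g_table != table:
        return False
    return True


def authorize_query(accessed: Iterable[AccessedObject], granted: Set[Grant]) -> Tuple[bool, List[AccessedObject]]:
    """Return (allowed, denied_objects) for the objects a compiled query accesses."""
    denied = [obj for obj in accessed if not any(covers(g, *obj) for g in granted)]
    return (not denied, denied)


# INSERT INTO sales.report SELECT ... FROM sales.customers reads one table and writes another.
granted = {("SELECT", "sales", None), ("INSERT", "sales", "report")}
print(authorize_query([("SELECT", "sales", "customers"), ("INSERT", "sales", "report")], granted))
# (True, [])
```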
  13. Sentry/Hive Integration: Changing Privileges
      • Authorized the same way as queries
      • If approved:
        • Hive generates a Sentry-specific task
        • This task invokes the Sentry store client
        • The client sends an RPC request to the Sentry service to make the authorization policy changes
  14. Sentry/Impala Integration
      • Similar to Hive
      • catalogd caches Sentry policy changes and distributes them to all impalad nodes
      • Authorization is faster because checks are local to each impalad
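A toy model of the caching pattern on this slide: a central catalog keeps the current policy and pushes versioned snapshots to every daemon, so each authorization check is a local lookup with no round trip to the Sentry service. The class and method names are hypothetical, and how Impala actually transports these updates is omitted.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Snapshot = Dict[str, FrozenSet[Tuple[str, str]]]  # role -> {(action, table)}


class Impalad:
    """One query daemon holding a local copy of the policy."""

    def __init__(self, name: str):
        self.name = name
        self.version = 0
        self.privileges: Snapshot = {}

    def apply_update(self, version: int, snapshot: Snapshot) -> None:
        # Ignore stale or duplicate updates.
        if version > self.version:
            self.version = version
            self.privileges = snapshot

    def is_allowed(self, roles: Iterable[str], action: str, table: str) -> bool:
        # Purely local check against the cached snapshot.
        return any((action, table) in self.privileges.get(r, frozenset()) for r in roles)


class Catalog:
    """Caches the policy and distributes every change to all daemons."""

    def __init__(self, nodes: List[Impalad]):
        self.nodes = nodes
        self.version = 0
        self.privileges: Snapshot = {}

    def update_policy(self, role: str, privileges: Iterable[Tuple[str, str]]) -> None:
        # Build a new snapshot instead of mutating the one the daemons hold.
        self.privileges = {**self.privileges, role: frozenset(privileges)}
        self.version += 1
        for node in self.nodes:
            node.apply_update(self.version, self.privileges)
```

A usage sketch: create two `Impalad` objects, pass them to `Catalog`, call `update_policy("analyst", {("SELECT", "sales")})`, and both daemons then answer `is_allowed(["analyst"], "SELECT", "sales")` from their own copy.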
  15. Synchronizing HDFS ACLs and Sentry Permissions
      • Maps Sentry privileges to HDFS ACLs:
        • SELECT privilege -> read access on the file
        • INSERT privilege -> write access on the file
        • ALL privilege -> read and write access on the file
      • The NameNode loads a Sentry plugin that caches Sentry privileges as well as Hive metadata.
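The privilege-to-ACL mapping above can be expressed as a small translation function. This is a sketch under stated assumptions: the mapping table comes from the slide, roles reach HDFS through the groups they are granted to, and the function name and input dictionaries are hypothetical; the output uses the `type:name:perms` form that `hdfs dfs -setfacl` accepts.

```python
from typing import Dict, List, Set, Tuple

# Mapping from the slide: Sentry privilege -> HDFS permissions on the table's files.
PRIVILEGE_TO_PERMS = {"SELECT": {"r"}, "INSERT": {"w"}, "ALL": {"r", "w"}}


def acl_entries_for_table(
    table: str,
    group_roles: Dict[str, Set[str]],
    role_privileges: Dict[str, Set[Tuple[str, str]]],
) -> List[str]:
    """Return group ACL entries (e.g. 'group:analysts:r--') for the files of `table`."""
    entries = []
    for group in sorted(group_roles):
        perms: Set[str] = set()
        # Union the permissions of every privilege this group's roles hold on the table.
        for role in group_roles[group]:
            for action, obj in role_privileges.get(role, set()):
                if obj == table:
                    perms |= PRIVILEGE_TO_PERMS[action]
        if perms:
            mode = ("r" if "r" in perms else "-") + ("w" if "w" in perms else "-") + "-"
            entries.append(f"group:{group}:{mode}")
    return entries


print(acl_entries_for_table(
    "sales.customers",
    {"finance-department": {"analyst"}, "etl": {"loader"}},
    {"analyst": {("SELECT", "sales.customers")}, "loader": {("INSERT", "sales.customers")}},
))
# ['group:etl:-w-', 'group:finance-department:r--']
```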
  16. The Actors in Sentry Authorization
      • Resources: server, database, table, or URI
      • Privileges: SELECT, INSERT
      • Roles: collections of privileges
      • Users and groups
  17. User Identity and Group Mapping
      • User management: Active Directory, MIT Kerberos
      • Group mapping:
        • System Security Services Daemon (SSSD)
        • Linux OS with LDAP
        • Samba, Centrify, Winbind…
        • Active Directory/LDAP: hadoop.security.group.mapping -> org.apache.hadoop.security.LdapGroupsMapping
        • Manual configuration in the OS: useradd, newgrp
  18. Group and Role Mapping
      • Groups:
        • Alice -> finance-department
        • Bob -> finance-department, finance-manager
      • Role mapping:
        • "Analyst" role: SELECT on the "Customers" and "Sales" tables
        • Grant the "Analyst" role to "finance-department"
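The Alice/Bob example can be written down as data to show how a user's effective privileges are resolved: user to groups (from AD/LDAP or the OS), groups to roles, roles to privileges. On this slide, roles are granted to groups rather than to individual users, which is why the lookup goes through groups. The names follow the slide; the lookup functions are hypothetical.

```python
from typing import Dict, Set, Tuple

# user -> groups, as the group mapping service would report them
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"finance-department"},
    "bob": {"finance-department", "finance-manager"},
}
# group -> roles (GRANT ROLE ... TO GROUP ...)
GROUP_ROLES: Dict[str, Set[str]] = {"finance-department": {"analyst"}}
# role -> privileges (GRANT SELECT ON TABLE ... TO ROLE ...)
ROLE_PRIVILEGES: Dict[str, Set[Tuple[str, str]]] = {
    "analyst": {("SELECT", "customers"), ("SELECT", "sales")},
}


def roles_of(user: str) -> Set[str]:
    """Roles in effect for a user: the union over all of the user's groups."""
    return set().union(*(GROUP_ROLES.get(g, set()) for g in USER_GROUPS.get(user, set())))


def effective_privileges(user: str) -> Set[Tuple[str, str]]:
    """Resolve user -> groups -> roles -> privileges."""
    return set().union(*(ROLE_PRIVILEGES.get(r, set()) for r in roles_of(user)))
```

Bob gets exactly Alice's privileges until some role is granted to `finance-manager`; changing group membership in the directory changes access without touching Sentry.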
  19. Sentry Commands: Create/Drop Role
      • CREATE ROLE creates a role to which privileges can be granted; DROP ROLE removes it
      • Only Sentry admin users can use these commands
      • By default, the hive, impala, and hue users have admin privileges in Sentry
      • CREATE ROLE [role_name];
      • DROP ROLE [role_name];
  20. Sentry Commands: Grant/Revoke Privilege
      • Grant privileges on an object to a role, or revoke them
      • Only Sentry admin users can use these commands
      • GRANT <PRIVILEGE> [, <PRIVILEGE>] ON <OBJECT> <object_name> TO ROLE <roleName> [, ROLE <roleName>]
      • REVOKE <PRIVILEGE> [, <PRIVILEGE>] ON <OBJECT> <object_name> FROM ROLE <roleName> [, ROLE <roleName>]
      • GRANT <PRIVILEGE> ... WITH GRANT OPTION
      • Objects can be a server, database, table, or URI
  21. Sentry Commands: Grant/Revoke Role
      • GRANT ROLE and REVOKE ROLE assign roles to groups or remove them from groups
      • Only Sentry admin users can use these commands
      • GRANT ROLE role_name [, role_name] TO GROUP <groupName> [, GROUP <groupName>]
      • REVOKE ROLE role_name [, role_name] FROM GROUP <groupName> [, GROUP <groupName>]
  22. Sentry Commands: SHOW
      • SHOW CURRENT ROLES; lists all the roles in effect for the current user session
      • SHOW ROLES; lists all the roles in the system (Sentry admin users only)
      • SHOW ROLE GRANT GROUP <groupName>; lists all the roles assigned to <groupName> (allowed for Sentry admin users and other users who are members of the group)
      • SHOW GRANT ROLE <roleName>; lists all the grants for <roleName> (allowed for Sentry admin users and other users who have been granted the role)
      • SHOW GRANT ROLE <roleName> ON OBJECT <objectName>; lists all the grants for the role on <objectName> (allowed for Sentry admin users and other users who have been granted the role)
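As a sketch of how the commands above fit together, the helper below renders the statements for the "Analyst" example from the group and role mapping slide, for example to paste into beeline or impala-shell as a Sentry admin. The helper itself is hypothetical, and its quoting rule (backticks around names containing characters other than letters, digits, and underscores, such as `finance-department`) is an assumption; check the identifier rules of your engine.

```python
import re
from typing import List, Sequence, Tuple

_PLAIN_IDENTIFIER = re.compile(r"^[A-Za-z0-9_]+$")


def ident(name: str) -> str:
    """Backtick-quote a role or group name unless it is a plain identifier (assumption)."""
    return name if _PLAIN_IDENTIFIER.match(name) else f"`{name}`"


def role_setup_statements(
    role: str,
    grants: Sequence[Tuple[str, str, str]],  # (privilege, object type, object name)
    groups: Sequence[str],
) -> List[str]:
    """Statements that create a role, grant it privileges, and assign it to groups."""
    statements = [f"CREATE ROLE {ident(role)};"]
    for privilege, object_type, object_name in grants:
        statements.append(f"GRANT {privilege} ON {object_type} {object_name} TO ROLE {ident(role)};")
    # GRANT ROLE role TO GROUP g1 [, GROUP g2]
    group_list = ", GROUP ".join(ident(g) for g in groups)
    statements.append(f"GRANT ROLE {ident(role)} TO GROUP {group_list};")
    return statements


for stmt in role_setup_statements(
    "analyst",
    [("SELECT", "TABLE", "customers"), ("SELECT", "TABLE", "sales")],
    ["finance-department"],
):
    print(stmt)
```

Generating the statements from one definition keeps the role, its privileges, and its groups in a single reviewable place; running them still requires a Sentry admin user.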
  23. Sentry Web UI
  24. RecordService
  25. Permission Enforcement Today with Sentry
      • Admins specify permissions as Sentry rules in the Sentry Service, e.g. "Allow fraud analysts read access to the transaction table"
      • Sentry enforcement happens separately in each access path: HiveServer2, Impala, HDFS (MR, Pig, Spark, ...), Search (Solr)
      • Apps: Datameer, Platfora, Zoomdata, etc.
      • Granularity: coarse-grained (table)
  26. The Need for Fine-Grained Access Control Across All Access Paths
      • Columns: visibility of sensitive columns varies. Example: credit card numbers
        • Managers: 1234 5678 1234 5678
        • Call center: XXXX XXXX XXXX 5678
        • Analysts: XXXX XXXX XXXX XXXX
        • Others: do not see the credit card column
      • Rows: different groups of users need access to different records
        • European privacy laws
        • Government security clearance
        • Financial information restrictions
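The per-role credit card visibility on this slide can be sketched as a column policy: each role maps to a masking function, and roles without an entry do not get the column at all. The role keys, the column name `credit_card`, and the functions are hypothetical.

```python
import re
from typing import Callable, Dict


def mask_all(number: str) -> str:
    """Replace every digit with X, keeping separators."""
    return re.sub(r"\d", "X", number)


def mask_all_but_last4(number: str) -> str:
    """Mask every digit except the last four characters."""
    return mask_all(number[:-4]) + number[-4:]


# Role -> transformation of the column; roles not listed do not see the column.
CARD_POLICY: Dict[str, Callable[[str], str]] = {
    "manager": lambda number: number,
    "callcenter": mask_all_but_last4,
    "analyst": mask_all,
}


def visible_record(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Return the copy of `record` that `role` is allowed to see."""
    out = dict(record)              # never modify the caller's record
    card = out.pop("credit_card")   # hidden unless the policy re-adds it
    policy = CARD_POLICY.get(role)
    if policy is not None:
        out["credit_card"] = policy(card)
    return out


print(visible_record({"name": "A. Customer", "credit_card": "1234 5678 1234 5678"}, "callcenter"))
# {'name': 'A. Customer', 'credit_card': 'XXXX XXXX XXXX 5678'}
```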
  27. The Workaround
      Original file:
        Date/time             Accnt #     SSN          Asset  Trade  Broker
        09:33:11 16-Feb-2015  0234837823  238-23-9876  ABC    Sell   group1
        11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    group2
        14:12:34 16-Feb-2015  4848367383  123-56-2345  DEF    Sell   group3
        09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    group1
        11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    group1
        10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    group3
        13:45:24 16-Feb-2015  3456789012  412-22-8765  XYZ    Sell   group2
        09:03:44 16-Feb-2015  4857389329  123-44-5678  TMV    Buy    group1
        15:55:55 16-Feb-2015  4756983234  234-76-9274  MA     Buy    group3
      Split the original file into one file per broker group and use HDFS permissions to limit access:
        • group1 file: the ABC, INTC, F, and TMV trades
        • group2 file: the TBT and XYZ trades
        • group3 file: the DEF, UA, and MA trades
      What if only some brokers in each group are allowed to see the full SSN?
  28. The Solution
      • Apply controls to the master data file
      • Row, column, and sub-column (masking) controls
      • Ability to enforce these controls across all access paths
      Master data (annotated with "Column-Level Controls", "Row-Level Controls", and "What All Group 1 Brokers See:"):
        Date/time             Accnt #     SSN          Asset  Trade  Broker
        09:33:11 16-Feb-2015  0234837823  238-23-9876  ABC    Sell   group1
        11:33:01 16-Feb-2015  3947848494  329-44-9847  TBT    Buy    group2
        14:12:34 16-Feb-2015  4848367383  123-56-2345  HDP    Sell   group3
        09:22:03 16-Feb-2015  3485739384  585-11-2345  INTC   Buy    group1
        11:55:33 16-Feb-2015  3847598390  234-11-8765  F      Buy    group1
        10:22:55 16-Feb-2015  8765432176  344-22-9876  UA     Buy    group3
        13:45:24 16-Feb-2015  3456789012  412-22-8765  AMZN   Sell   group2
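Row, column, and sub-column controls can be applied to the master data in one pass, which removes the need to split files per group. A minimal sketch: a row filter (row-level control), a column projection (column-level control), and per-column masking functions (sub-column control) applied to a few rows from the table above. The policy for group 1 brokers who may not see full SSNs is a hypothetical example, as are the function names.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, str]

TRADES: List[Row] = [
    {"time": "09:33:11 16-Feb-2015", "account": "0234837823", "ssn": "238-23-9876", "asset": "ABC", "trade": "Sell", "broker": "group1"},
    {"time": "11:33:01 16-Feb-2015", "account": "3947848494", "ssn": "329-44-9847", "asset": "TBT", "trade": "Buy", "broker": "group2"},
    {"time": "09:22:03 16-Feb-2015", "account": "3485739384", "ssn": "585-11-2345", "asset": "INTC", "trade": "Buy", "broker": "group1"},
]


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits of an SSN."""
    return "XXX-XX-" + ssn[-4:]


def apply_policy(
    rows: Sequence[Row],
    row_filter: Callable[[Row], bool],
    columns: Sequence[str],
    masks: Optional[Dict[str, Callable[[str], str]]] = None,
) -> List[Row]:
    masks = masks or {}
    result = []
    for row in rows:
        if not row_filter(row):                  # row-level control
            continue
        projected = {c: row[c] for c in columns}  # column-level control
        for column, mask in masks.items():        # sub-column control (masking)
            if column in projected:
                projected[column] = mask(projected[column])
        result.append(projected)
    return result


# Hypothetical policy: group 1 brokers without full-SSN clearance.
for row in apply_policy(TRADES, lambda r: r["broker"] == "group1",
                        ["time", "ssn", "asset", "trade"], {"ssn": mask_ssn}):
    print(row)
```

Brokers who are cleared for the full SSN would get the same filter and projection without the `ssn` mask, so one master file serves both.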
  29. RecordService: Unified Access Control Enforcement
      • Permissions are specified by administrators (top-level and delegated) as Sentry rules, e.g. "Allow managers to see social security numbers"
      • The Sentry Service holds the rules; RecordService enforces them in one place, between storage (HDFS, HBase) and the compute engines and apps (Impala, Spark, MR, Solr, apps, …)
  30. RecordService Overview
      • Simplifies
        • Provides a higher-level, logical abstraction for data (i.e. tables or views)
        • Returns records with a schema instead of paths and bytes, so applications need not deal with storage APIs and file formats
        • HCatalog? A similar concept, but RecordService is secure and performant. Support for HCatalog as a data model on RecordService is planned.
      • Secures
        • A central location for all authorization checks, using Sentry metadata
        • A secure service that does not execute arbitrary user code
      • Accelerates
        • A unified data access path allows platform-wide performance improvements
  31. Architecture
  32. Architecture
      • Runs as a distributed service: planner servers and worker servers
      • Servers do not store any state
        • Easy HA and fault tolerance
      • Planner servers are responsible for request planning
        • Retrieve and combine metadata (NameNode, Hive Metastore, Sentry)
        • Split generation: create tasks for the workers
        • Perform authorization
      • Worker servers read from storage and construct records
        • I/O, file parsing, predicate evaluation
        • Run as the "source" of a DAG computation
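The planner/worker division can be sketched with in-memory stand-ins: dictionaries play the roles of the NameNode and Hive Metastore (which blocks make up a table, and where they live), Sentry (which columns a role may read), and storage (the rows in each block). All names and data are hypothetical, not RecordService's API. The planner authorizes and generates one task per block without keeping state; the worker reads its block, evaluates the predicate, and builds records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Set, Tuple

Row = Dict[str, object]

# Hypothetical stand-ins for the metadata and storage layers.
BLOCKS: Dict[str, List[Row]] = {  # storage: block id -> rows
    "b1": [{"id": 1, "country": "US", "ssn": "111"}, {"id": 2, "country": "EU", "ssn": "222"}],
    "b2": [{"id": 3, "country": "US", "ssn": "333"}],
}
TABLE_BLOCKS = {"sales.customers": [("b1", ("host1",)), ("b2", ("host2",))]}  # NameNode + HMS
ALLOWED_COLUMNS: Dict[Tuple[str, str], Set[str]] = {("analyst", "sales.customers"): {"id", "country"}}  # Sentry


@dataclass(frozen=True)
class Task:
    block_id: str
    hosts: Tuple[str, ...]            # preferred hosts, for locality-aware scheduling
    columns: Tuple[str, ...]          # projection the caller is allowed to read
    predicate: Callable[[Row], bool]  # evaluated by the worker


def plan(role: str, table: str, columns: List[str], predicate: Callable[[Row], bool]) -> List[Task]:
    """Planner: authorize, combine metadata, and generate one task per block.
    (A real planner would also have to authorize columns used in the predicate.)"""
    denied = set(columns) - ALLOWED_COLUMNS.get((role, table), set())
    if denied:
        raise PermissionError(f"{role} may not read {sorted(denied)} from {table}")
    return [Task(block, hosts, tuple(columns), predicate) for block, hosts in TABLE_BLOCKS[table]]


def execute(task: Task) -> Iterator[Row]:
    """Worker: read the block, evaluate the predicate, and construct records."""
    for row in BLOCKS[task.block_id]:
        if task.predicate(row):
            yield {c: row[c] for c in task.columns}


tasks = plan("analyst", "sales.customers", ["id"], lambda r: r["country"] == "US")
print([record for task in tasks for record in execute(task)])
# [{'id': 1}, {'id': 3}]
```

Because each task carries everything the worker needs, a failed task can simply be rerun elsewhere by the calling framework.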
  33. Architecture: Fault Tolerance
      • Cluster state is persisted in ZooKeeper
        • Membership, delegation tokens, secret keys
      • Servers do not communicate with each other directly, which aids scalability
      • Planner services
        • A few (e.g. 3) are expected to run, for HA
        • Fault tolerance is handled by clients getting a list of planners and failing over
        • Plan requests are short
      • Worker services
        • Expected to run on every node in the cluster that holds data
        • Fault tolerance is handled by the framework (e.g. MR) rescheduling the task
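Client-side failover across planners can be sketched as a loop over the planner list: since plan requests are short and planners keep no state, the client simply retries the same request on the next planner. The exception type, the `send_plan_request` callable, and the endpoint names are hypothetical; where the list comes from (the slide says membership is kept in ZooKeeper) is outside the sketch.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class PlannerUnavailable(Exception):
    """Raised when a planner cannot be reached (hypothetical)."""


def plan_with_failover(
    planners: Sequence[str],
    send_plan_request: Callable[[str, str], T],
    request: str,
) -> T:
    """Try each planner in turn; return the first successful plan."""
    errors = []
    for endpoint in planners:
        try:
            return send_plan_request(endpoint, request)
        except PlannerUnavailable as exc:
            errors.append(f"{endpoint}: {exc}")  # remember why, then fail over
    raise PlannerUnavailable("all planners failed: " + "; ".join(errors))


def fake_send(endpoint: str, request: str) -> str:
    """Test double: the first planner is down."""
    if endpoint == "planner-1":
        raise PlannerUnavailable("connection refused")
    return f"plan for {request} from {endpoint}"


print(plan_with_failover(["planner-1", "planner-2", "planner-3"], fake_send, "sales.customers"))
# plan for sales.customers from planner-2
```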
  34. Architecture: Security
      • Authentication using Kerberos and delegation tokens
      • The planner authorizes requests using metadata in Sentry
        • Column-level ACLs
        • Row-level ACLs: create a view with a predicate
        • Masking: create a view with the masking function in the select list
      • Workers run the generated tasks
  35. Client APIs: Integration with the Ecosystem
      • Similar APIs designed to integrate with MapReduce and Spark
      • The client APIs make adopting RecordService simpler
  36. MapReduce Example (Java)
      // Before: read Avro files directly from the HDFS path in args[0]
      //FileInputFormat.setInputPaths(job, new Path(args[0]));
      //job.setInputFormatClass(AvroKeyInputFormat.class);

      // After: read the table named in args[0] through RecordService
      RecordServiceConfig.setInputTable(configuration, null, args[0]);
      job.setInputFormatClass(
          com.cloudera.recordservice.avro.mapreduce.AvroKeyInputFormat.class);
  37. Spark Example (Scala)
      // Comment out one or the other:
      val file = sc.recordServiceTextFile(path)  // read through RecordService
      //val file = sc.textFile(path)             // read directly from storage
  38. Spark SQL Example (Scala)
      ctx.sql(s"""
        |CREATE TEMPORARY TABLE $tbl
        |USING com.cloudera.recordservice.spark.DefaultSource
        |OPTIONS (
        |  RecordServiceTable '$db.$tbl',
        |  RecordServiceTableSize '$size'
        |)
        """.stripMargin)
  39. Performance
      • Shares some core components with Impala
        • I/O management, optimized C++ code, runtime code generation, use of low-level storage APIs
      • Highly efficient implementation of the scan functionality
      • Optimized columnar on-the-wire format
        • Inspired by Apache Parquet
      • Accelerates performance for many workloads
  40. Terasort
      • Roughly a worst-case scenario: minimal schema, a single STRING column
      • Custom RecordServiceTeraInputFormat (similar to TeraInputFormat)
      • 78-node cluster (12 cores / 24 hyper-threads, 12 disks per node)
      • Ran at 1 billion, 50 billion, and 1 trillion (~100 TB) rows
      • See the GitHub repo for more details and runnable examples
  41. TeraChecksum
      [Chart: normalized job time for TeraChecksum without and with RecordService, at 1B, 50B, and 1T rows on MapReduce and on Spark. Plotted values: 1, 0.48, 0.23 (MapReduce 1B, 50B, 1T) and 1.03, 0.8, 0.85 (Spark 1B, 50B, 1T).]
  42. Spark SQL
      • Represents a more typical use case
        • Data is fully schema-based
      • TPC-DS
        • 500 GB scale factor, on Parquet
      • Cluster
        • 5 nodes
  43. Spark SQL: TPC-DS
      [Chart: TPC-DS query times, Spark SQL vs. Spark SQL with RecordService]
      ~15% improvement in query times; the queries are not scan-bound
  44. Spark SQL: Scan Workloads
      [Chart: Spark SQL vs. Spark SQL with RecordService for a 2% selective scan and a Sum(col) query. Plotted values: 29.5, 31, 14, 23.5.]
  45. Summary: Sentry and RecordService
      [Diagram: the Sentry role "U.S. Customer Transaction Analysis" holds two Sentry permissions, "Read access to Transactions.Date… where Country = US" and "Read access to Customers.CustomerID… where Country = US". The role is granted to the groups "Tier 1 Customer Support Reps" (e.g. Sam Smith) and "Tier 1 Broker Analysts" (e.g. Martha Jones). Sample Customers (Cust. ID, SSN, Phone, Country) and Transactions (Date/Time, Cust. ID, Trade, Country) tables mix US and EU rows; the role's permissions expose only the US rows.]
  46. Getting Started: Sentry
      Users:
      • Install CDH, try a VM, or try it on AWS: cloudera.com/download
      • Read the docs: www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/sg_sentry_overview.html
      • Get help: community.cloudera.com
      Developers:
      • Contribute: sentry.incubator.apache.org
      • Report issues: issues.apache.org/jira/browse/SENTRY
      • Join the dev list: dev@sentry.incubator.apache.org
      Contributions and participation are welcome and encouraged!
  47. Getting Started: RecordService
      Users:
      • Install the RecordService beta or try a VM: cloudera.github.io/RecordServiceClient
      • Get help: recordservice-user@googlegroups.com
      Developers:
      • Contribute: github.com/cloudera/RecordServiceClient
      • Join the dev list: recordservice-dev@googlegroups.com
      Contributions and participation are welcome and encouraged!
  48. Thank you
