Big Data Governance and Data Security
Jianwei Li
1.
© 2014 Cloudera, Inc. All rights reserved.
Data Governance and Protection in Hadoop
Introduction to Cloudera Navigator
Jianwei Li, jarred@cloudera.com
2.
Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
3.
Hadoop Ecosystem
• OPERATIONS: Cloudera Manager, Cloudera Director
• DATA MANAGEMENT: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
• INTEGRATE: STRUCTURED (Sqoop), UNSTRUCTURED (Kafka, Flume)
• PROCESS, ANALYZE, SERVE: BATCH (Spark, Hive, Pig, MapReduce), STREAM (Spark), SQL (Impala), SEARCH (Solr), SDK (Kite)
• UNIFIED SERVICES: RESOURCE MANAGEMENT (YARN), SECURITY (Sentry, RecordService)
• STORE: FILESYSTEM (HDFS), RELATIONAL (Kudu), NoSQL (HBase)
4.
The Benefits of Hadoop...
One place for unlimited data
• All types
• More sources
• Faster, larger ingestion
Unified, multi-framework data access
• More users
• More tools
• Faster changes
5.
...Can Create Information Security Challenges
Business Manager
• Run high-value workloads in the cluster
• Quickly adopt new innovations
Information Security
• Follow established policies and procedures
• Maintain compliance
IT/Operations
• Integrate with existing IT investments
• Minimize end-user support
• Automate configuration
6.
Hadoop Security Pillars
Authentication, Authorization, Audit, and Compliance
• Perimeter (Cloudera Manager): Guarding access to the cluster itself. Technical concepts: authentication, network isolation
• Access (Apache Sentry & RecordService): Defining what users and applications can do with data. Technical concepts: permissions, authorization
• Visibility (Cloudera Navigator): Reporting on where data came from and how it's being used. Technical concepts: auditing, lineage
• Data (Navigator Encrypt & Key Trustee | Partners): Protecting data in the cluster from unauthorized visibility. Technical concepts: encryption, tokenization, data masking
7.
Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
8.
Data Management Challenges
Compliance Officers
• Who's accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
• Can I meet compliance needs?
Data Stewards/Curators
• How can I manage data from ingest to purge?
• How do I classify data efficiently?
• How can data be made available to end users?
Business Users
• How do I find what's relevant?
• Can I trust what I find?
• How can I explore data on my own?
Database Admins
• How is data being used today?
• How can I optimize for future workloads?
• How can I take advantage of Hadoop risk-free and fast?
9.
Cloudera Navigator
The only integrated data management and governance platform for Hadoop
• Metadata Management
• Audit
• Policy Based Data Management
• Data Analytics
10.
Navigator Metadata Architecture
11.
Metadata Extraction
• HDFS - Extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint.
• Hive - Extracts database and table metadata from the Hive Metastore Server.
• Impala - Extracts database and table metadata from the Hive Metastore Server. Extracts query metadata from the Impala Daemon lineage logs.
• MapReduce - Extracts job metadata from the JobTracker.
12.
Metadata Extraction
• Oozie - Extracts Oozie workflows from the Oozie Server.
• Pig - Extracts Pig script runs from the JobTracker or Job History Server.
• Spark - Extracts Spark job metadata from YARN logs.
• Sqoop 1 - Extracts database and table metadata from the Hive Metastore Server. Extracts job runs from the JobTracker or Job History Server.
• YARN - Extracts job metadata from the ResourceManager.
13.
Metadata Indexing
• Metadata is indexed in Solr for searching
• Technical metadata key-value pairs, for example "fileSystemPath:/tmp/hbase-staging"
• Custom metadata key-value pairs, for example "description:Banking*"
• Hive extended attribute key-value pairs, set with:
    ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
• Example search query combining fields:
    (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory)
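The search syntax above is a set of field:value pairs combined with OR and AND. A minimal sketch of composing such a query string in Python follows; the helper names `any_of` and `all_of` are hypothetical and not part of any Navigator client, and the sketch only builds the string shown on the slide.

```python
def any_of(field, values):
    """Build a parenthesized OR group of field:value pairs (hypothetical helper)."""
    return "(" + " OR ".join(f"{field}:{value}" for value in values) + ")"


def all_of(*groups):
    """Join already-built groups with AND (hypothetical helper)."""
    return " AND ".join(groups)


# Reproduce the example query from the slide.
query = all_of(
    any_of("sourceType", ["hive", "hdfs"]),
    any_of("type", ["table", "directory"]),
)
print(query)
```

Building the query from lists keeps the parentheses balanced when more source types or entity types are added.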
14.
Self-Service Data Discovery & Analytics
For Business Users
Effortlessly find and trust the data that matters most
• Search across unified metadata repository
• Gain context and visibility into data sets
• Find similar, relevant data
15.
Technical & Business Metadata
16.
Modifying Metadata
• HDFS file: metadata for /user/test/file1.txt is set in the hidden file /user/test/.file1.txt.navigator, which contains:
    {
      "name" : "aName",
      "description" : "a description",
      "properties" : { "prop1" : "value1", "prop2" : "value2" },
      "tags" : [ "tag1" ]
    }
• REST:
    curl http://Navigator_Metadata_Server_host:port/api/v8/entities/ -u username:password -X POST -H "Content-Type: application/json" -d '{properties}'
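The file-based approach pairs each HDFS file with a hidden `.navigator` file in the same directory. A minimal sketch, assuming only the naming pattern and JSON fields shown on the slide; `sidecar_path` and `metadata_document` are hypothetical helper names, and the sketch neither writes to HDFS nor calls the REST API.

```python
import json
import posixpath


def sidecar_path(hdfs_path):
    """Return the hidden .navigator path for a file, following the slide's pattern."""
    directory, name = posixpath.split(hdfs_path)
    return posixpath.join(directory, f".{name}.navigator")


def metadata_document(name, description, properties, tags):
    """Serialize custom metadata using the four fields shown on the slide."""
    return json.dumps(
        {
            "name": name,
            "description": description,
            "properties": properties,
            "tags": tags,
        },
        indent=2,
    )


path = sidecar_path("/user/test/file1.txt")
body = metadata_document(
    "aName", "a description", {"prop1": "value1", "prop2": "value2"}, ["tag1"]
)
print(path)
print(body)
```

The resulting text could then be uploaded next to the data file with whatever HDFS client is in use.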
17.
Navigator Analytics
• Metadata: the number of files by creation and access times, size, block size, and replication count.
• Audit
  – Activity tab: shows, by directory, which files have been accessed using the open operation and how many times.
  – Top Users tab: shows the top-n commands, the top-n users, and the top-n commands those users performed.
18.
Navigator Audit Architecture
19.
Compliance-Ready Governance & Protection
For Compliance Officers
Track, understand, and protect access to sensitive data
• Search centralized audits for the entire ecosystem
• See how data is used and changing with intuitive lineage
• Protect all data with high-performance encryption and key management
• Integrate with leading partner tools
20.
Policy Based Data Management
• Automate data stewardship and curation activities with the policy engine
  • Data archive
  • Data delete
  • Metadata management
  • Automatic naming with timestamp:
      entity.get(FSEntityProperties.ORIGINAL_NAME, Object.class) + " - " + new SimpleDateFormat("yyyy-MM-dd").format(entity.get(FSEntityProperties.CREATED, Instant.class).toDate())
• Ensure business continuity through built-in backup & disaster recovery
• Integrate with leading partner tools
21.
Lineage
• Lineage provides provenance information to show where data came from and how it has been transformed within the EDH
• Cloudera Navigator provides column-level lineage within Cloudera EDH
• Integrates with certified third-party lineage solutions, such as Informatica, for enterprise-wide lineage information
22.
Lineage
23.
End-to-End Data Management
Cloudera Navigator + Partners
Diagram labels: Lineage, Auditing, Metadata, Augmentation, Consumption
24.
Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
25.
Background
• Customers increasingly want to use HDFS to store sensitive data
• Customers are often mandated to protect data at rest
  • National security
  • Company confidential
• Encryption of data at rest helps mitigate certain security threats
  • Rogue administrators (insider threat)
  • Lost/stolen hard drives
26.
Over the Wire Encryption
• Uses certificates and TLS to encrypt and optionally authenticate network communication
• Customers can use commercial certificate authorities, corporate CAs, or self-signed certificates
• Active Directory Certificate Services is commonly used by customers
• Secures Hadoop data processing components as well as Cloudera Manager agents and management services
27.
Data at Rest Encryption
• Protects data on disk from unauthorized exposure
• Protects the data both from online attacks while the system is running and from offline attacks such as theft of physical drives
• HDFS transparent encryption at rest is an open source technology available in Apache Hadoop
• Navigator Encrypt is a proprietary technology that protects data outside HDFS
  • Backend databases, log directories, temp directories, landing zones
• Navigator KeyTrustee Server is a proprietary key management server that can integrate with an enterprise HSM
28.
HDFS Encrypt + Navigator Encrypt + Key Trustee
31.
Navigator Key Trustee Architecture
32.
Key Management Service (KMS)
• When encrypting any data, it is important to store your encryption keys securely and away from the encrypted data
• KMS is a Key Management Service for HDFS Encryption that stores and retrieves encryption keys
• KMS is open source and provides a standard interface for pluggable key providers
• The default key provider for KMS is the Java Key Store
• The Java Key Store is not recommended for production key management; it is meant for development and testing
33.
Key Management Service (KMS)
● Encryption occurs on the requesting client.
  ○ Data is encrypted before it lands on disk.
  ○ The KMS encrypts and decrypts specific key components.
  ○ The KMS does not encrypt content.
  ○ The KMS does not store keys.
34.
KMS Proxy Deployment Considerations
35.
KMS Proxy Deployment Considerations
36.
Navigator Key Trustee
• Provides secure, centralized, and scalable key storage and administration
• Is not open source; it is licensed with Cloudera Navigator
• Is the recommended option for production deployments
• Provides the hooks to integrate with Hardware Security Modules where physical tamper-proofing is required (FIPS 140-2 Level 3)
• Also provides centralized key management for Navigator Encrypt
37.
Key HSM
• Customers may choose to use Hardware Security Modules (HSMs) to improve the security of their key store.
• Key HSM is a universal Hardware Security Module (HSM) driver.
• It acts as a translator between the target HSM platform and Key Trustee.
38.
Hardware Security Module (HSM)
• A number of vendors provide HSMs.
• They exist as appliances and as attachable physical hardware.
• If one is configured with Key Trustee, it is used as a Root of Trust.
• Data inside the Key Trustee keystore is encrypted by this Root of Trust.
• The HSM "master" keys are generated in the HSM and never leave the HSM.
39.
HDFS Encryption Workflow
40.
HDFS Encryption, Involved Parties
Diagram labels: HDFS, KMS, Key Trustee, zHSM, HSM, Client, optional, Key authorization, File authorization
41.
Keys Used in Encryption at Rest
HDFS Encryption
• Encryption Zone Key (EZKEY)
  • This key, much like a mount key, is associated with an encryption zone in HDFS.
• Encrypted Data Encryption Key (EDEK)
  • This is an encrypted copy of a Data Encryption Key.
• Data Encryption Key (DEK)
  • This is the actual key used to encrypt data stored within a file, zone, or block device. This key concept is used in both Navigator Encrypt and HDFS Transparent Data Encryption (TDE).
42.
Keys Used in Encryption at Rest
(1) When an encryption zone (EZ) is created, the administrator specifies an encryption zone key (EZ key) that is already stored in the backing keystore. The EZ key encrypts the data encryption keys (DEKs), which in turn encrypt each file. A DEK encrypted with the EZ key forms an encrypted data encryption key (EDEK), which is stored on the NameNode as an extended attribute on the file.
(2) To encrypt a file, the client retrieves a new EDEK from the NameNode and then asks the KMS to decrypt it with the corresponding EZ key. This step yields a DEK.
(3) The client uses the DEK to encrypt its data.
(4) To decrypt a file, the client again has the file's EDEK decrypted with the EZ key to obtain the DEK, as in step (2). The client then reads the encrypted data and decrypts it with the DEK.
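The EZ key, DEK, and EDEK relationship is a form of envelope encryption. A minimal sketch of that key hierarchy follows. It is an illustration only: `toy_cipher` is a hypothetical stand-in built from SHA-256 and XOR, it is not the cipher HDFS uses and is not secure, and all keys live in one process here, whereas in HDFS the EZ key stays behind the KMS.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (NOT secure, NOT what HDFS uses).

    Applying it twice with the same key restores the input, so the
    same function serves for both encryption and decryption.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# (1) The EZ key exists in the keystore; a per-file DEK is wrapped into an EDEK.
ez_key = secrets.token_bytes(32)
dek = secrets.token_bytes(32)
edek = toy_cipher(ez_key, dek)  # only this wrapped form is stored with the file

# (2)-(3) Writing: unwrap the EDEK to get the DEK, then encrypt the data.
ciphertext = toy_cipher(toy_cipher(ez_key, edek), b"account,balance\n42,100\n")

# (4) Reading: unwrap the EDEK again, then decrypt the data.
recovered_dek = toy_cipher(ez_key, edek)
plaintext = toy_cipher(recovered_dek, ciphertext)
print(plaintext)
```

Because only the EDEK is stored next to the data, someone who obtains the stored bytes but not the EZ key cannot recover the DEK.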
43.
HDFS Encryption, Writing a File
Diagram labels: HDFS, KMS, Client, To Trustee
1. create file
2. generate key
3. encrypted key
4. store encrypted key
5. file handle & encrypted key
6. decrypt encrypted key
7. decrypted key
8. encrypt & write data
44.
HDFS Encryption, Reading a File
Diagram labels: HDFS, KMS, Client, To Trustee
1. open file (passed read permission check)
2. file handle & encrypted key
3. decrypt encrypted key
4. decrypted key
5. read & decrypt data
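The write and read sequences involve two separate checks: file authorization in HDFS and key authorization in the KMS. The sketch below simulates that separation in one process. All class and function names (`ToyKMS`, `ToyHDFS`, `write_file`, `read_file`, `xor_pad`) are hypothetical, the cipher is an insecure stand-in, and which party performs each numbered step is an assumption read off the step names.

```python
import hashlib
import secrets


def xor_pad(key: bytes, data: bytes) -> bytes:
    """Insecure stand-in for a real cipher: XOR with a SHA-256-derived pad."""
    pad = b""
    block = 0
    while len(pad) < len(data):
        pad += hashlib.sha256(key + block.to_bytes(4, "big")).digest()
        block += 1
    return bytes(x ^ y for x, y in zip(data, pad))


class ToyKMS:
    """Holds EZ keys and enforces key authorization; never sees file data."""

    def __init__(self):
        self.ez_keys = {}     # key name -> EZ key bytes
        self.authorized = {}  # key name -> users allowed to decrypt EDEKs

    def create_key(self, name, users):
        self.ez_keys[name] = secrets.token_bytes(32)
        self.authorized[name] = set(users)

    def generate_edek(self, name):
        dek = secrets.token_bytes(32)
        return xor_pad(self.ez_keys[name], dek)

    def decrypt_edek(self, user, name, edek):
        if user not in self.authorized[name]:
            raise PermissionError(f"{user} may not use key {name}")
        return xor_pad(self.ez_keys[name], edek)


class ToyHDFS:
    """Stores EDEKs and ciphertext; enforces file authorization; never sees DEKs."""

    def __init__(self, kms, zone_key):
        self.kms = kms
        self.zone_key = zone_key
        self.files = {}  # path -> {"edek", "readers", "data"}

    def create(self, path, readers):
        edek = self.kms.generate_edek(self.zone_key)  # write steps 2-4
        self.files[path] = {"edek": edek, "readers": set(readers), "data": b""}
        return edek                                   # write step 5

    def open(self, user, path):
        entry = self.files[path]
        if user not in entry["readers"]:
            raise PermissionError(f"{user} may not read {path}")
        return entry["edek"], entry["data"]           # read step 2


def write_file(user, hdfs, path, plaintext, readers):
    edek = hdfs.create(path, readers)                       # write step 1
    dek = hdfs.kms.decrypt_edek(user, hdfs.zone_key, edek)  # write steps 6-7
    hdfs.files[path]["data"] = xor_pad(dek, plaintext)      # write step 8


def read_file(user, hdfs, path):
    edek, data = hdfs.open(user, path)                      # read steps 1-2
    dek = hdfs.kms.decrypt_edek(user, hdfs.zone_key, edek)  # read steps 3-4
    return xor_pad(dek, data)                               # read step 5


kms = ToyKMS()
kms.create_key("myKey", {"alice"})
hdfs = ToyHDFS(kms, "myKey")
write_file("alice", hdfs, "/zone/a.txt", b"secret", {"alice", "hdfs_admin"})
print(read_file("alice", hdfs, "/zone/a.txt"))
```

In this toy setup a user such as `hdfs_admin` passes the file permission check but is refused by the KMS, which mirrors the rogue-administrator threat that encryption at rest is meant to mitigate.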
45.
HDFS Encryption Implementation and Usage
46.
Enabling HDFS Encryption on a Cluster
• A recent version of libcrypto.so is needed on HDFS and MapReduce client hosts
• To check, use the following command:
    hadoop checknative
  Output:
    openssl: true /usr/lib64/libcrypto.so
• To install:
    yum install openssl openssl-devel
• The openssl package installs the library; openssl-devel creates the libcrypto.so symlink (you can also create this manually)
• OpenSSL provides AES-NI integration for Intel hardware
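When checking many hosts, the `openssl` line of the `hadoop checknative` output can be tested programmatically. A minimal sketch, assuming the output has the `name: true|false detail` shape shown on the slide; `openssl_available` is a hypothetical helper that parses text already captured from the command and does not run it.

```python
def openssl_available(checknative_output: str) -> bool:
    """Return True if the 'openssl' line of the captured output reports true."""
    for line in checknative_output.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == "openssl":
            fields = rest.split()
            return bool(fields) and fields[0] == "true"
    return False  # no openssl line found


sample = "openssl: true /usr/lib64/libcrypto.so"
print(openssl_available(sample))
```

A host where this returns False would need the openssl packages or the libcrypto.so symlink before encryption is enabled.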
47.
Enabling HDFS Encryption on a Cluster Using Cloudera Manager
1) Adding the KMS Service
  • Add the Java KeyStore KMS service on a host
2) Enabling Java KeyStore KMS for the HDFS Service
  • HDFS service, Configuration tab
  • Scope > HDFS (Service-Wide)
  • Category > All
  • KMS Service property: select the radio button
  • Save Changes
  • Restart the cluster
  • Deploy Client Configuration
48.
Creating Encryption Zones
• Use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
    # Create an encryption key for your zone as the application user that will be using the key
    $ hadoop key create myKey
    # Create a new empty directory and make it an encryption zone
    $ hadoop fs -mkdir /zone
    $ hdfs crypto -createZone -keyName myKey -path /zone
    # List the encryption zones
    $ hdfs crypto -listZones
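The same sequence can be scripted when several zones have to be created. A minimal sketch, assuming Python is available on a gateway host; `encryption_zone_commands` and `run_all` are hypothetical helper names, the commands are exactly those on the slide, and nothing is executed unless `dry_run` is turned off.

```python
import subprocess


def encryption_zone_commands(key_name, zone_path):
    """Return the slide's four commands, in order, as argument lists."""
    return [
        ["hadoop", "key", "create", key_name],
        ["hadoop", "fs", "-mkdir", zone_path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        ["hdfs", "crypto", "-listZones"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it and stop at the first failure."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


run_all(encryption_zone_commands("myKey", "/zone"))
```

Passing argument lists rather than a shell string avoids quoting problems if a key name or path contains spaces.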
49.
Adding Files to an Encryption Zone
• Remember that zones start empty: you cannot create a zone on a directory that already contains data
• Copy existing data into the zone:
    hadoop distcp /user/dir /user/enczone
• By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination.
• When copying between an unencrypted and an encrypted location, the filesystem checksums will not match because the underlying block data is different.
• Use the -skipcrccheck and -update flags to avoid verifying checksums.
• Also use the distcp flags to preserve all attributes (-prbugpcaxt)
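The checksum mismatch follows from checksums being computed over the stored bytes, which are ciphertext inside a zone. A minimal sketch of the effect, under loud assumptions: `toy_encrypt` is an insecure single-byte XOR standing in for real encryption, and SHA-256 stands in for the filesystem checksum. Neither is what HDFS actually uses.

```python
import hashlib


def toy_encrypt(data: bytes, key_byte: int = 0x5A) -> bytes:
    """Insecure stand-in for encryption: XOR every byte with one key byte."""
    return bytes(b ^ key_byte for b in data)


def checksum(stored_bytes: bytes) -> str:
    """Stand-in for a filesystem checksum computed over the bytes on disk."""
    return hashlib.sha256(stored_bytes).hexdigest()


content = b"customer_id,card_number\n"
source_blocks = content               # unencrypted location stores plaintext
target_blocks = toy_encrypt(content)  # encryption zone stores ciphertext

checksums_match = checksum(source_blocks) == checksum(target_blocks)
round_trip_ok = toy_encrypt(target_blocks) == content
print(checksums_match, round_trip_ok)
```

The copy is intact, since decrypting the target yields the original content, yet a comparison of on-disk checksums fails. This is why the slide recommends skipping the checksum comparison for such copies.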
50.
Cloudera Navigator
The only integrated data management and governance platform for Hadoop
• Self-Service Discovery & Analytics (Search, Define, Analyze, Profile): Effortlessly find and trust the data that matters most
• Compliance-Ready Governance & Protection (Audit, Track, Encrypt, Manage Keys): Track, understand, and protect access to sensitive data
• Active Data Optimization (Report, Optimize, Migrate, Maintain Models): Configure Hadoop to boost user productivity
• Hadoop-Scale Data Lifecycle Management (Classify, Steward, Backup, Retain): Maximize cluster performance at Hadoop scale with ease
Unified Governance Foundation: Unified Auditing, Comprehensive Lineage, Unified Metadata, Universal Policies
51.
MasterCard
Cloudera: The first PCI-Certified Hadoop Platform
Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification.
Solution: MasterCard's Cloudera environment fully conforms to the PCI-DSS v2.0 security standards, so it can host PCI datasets and potentially integrate with other internal systems.
"Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we've set. With Cloudera's commitment to the same standards, we now have additional options in how we manage our data center."
Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard
52.
jarred@cloudera.com