© 2014 Cloudera, Inc. All rights reserved.
Data Governance and Protection in Hadoop
An Introduction to Cloudera Navigator
Jianwei Li
jarred@cloudera.com
Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
Hadoop Ecosystem
• Operations: Cloudera Manager, Cloudera Director
• Data Management: Cloudera Navigator; Encrypt and Key Trustee; Optimizer
• Integrate: Sqoop (structured data); Kafka, Flume (unstructured data)
• Process, Analyze, Serve: Spark, Hive, Pig, MapReduce (batch); Spark (stream); Impala (SQL); Solr (search); Kite (SDK)
• Unified Services: YARN (resource management); Sentry, RecordService (security)
• Store: HDFS (filesystem); Kudu (relational); HBase (NoSQL)
The Benefits of Hadoop...
One place for unlimited data
• All types
• More sources
• Faster, larger ingestion
Unified, multi-framework data access
• More users
• More tools
• Faster changes
…Can Create Information Security Challenges
Business Manager
• Run high-value workloads in the cluster
• Quickly adopt new innovations
Information Security
• Follow established policies and procedures
• Maintain compliance
IT/Operations
• Integrate with existing IT investments
• Minimize end-user support
• Automate configuration
Hadoop Security Pillars
Authentication, Authorization, Audit, and Compliance
• Perimeter: guarding access to the cluster itself
  – Technical concepts: authentication, network isolation
  – Cloudera Manager
• Access: defining what users and applications can do with data
  – Technical concepts: permissions, authorization
  – Apache Sentry & RecordService
• Visibility: reporting on where data came from and how it’s being used
  – Technical concepts: auditing, lineage
  – Cloudera Navigator
• Data: protecting data in the cluster from unauthorized visibility
  – Technical concepts: encryption, tokenization, data masking
  – Navigator Encrypt & Key Trustee | Partners
Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
Data Management Challenges
Compliance Officers
• Who’s accessing what data?
• What are they doing with the data?
• Is sensitive data governed and protected?
• Can I meet compliance needs?
Data Stewards/Curators
• How can I manage data from ingest to purge?
• How do I classify data efficiently?
• How can data be made available to end users?
Business Users
• How do I find what’s relevant?
• Can I trust what I find?
• How can I explore data on my own?
Database Admins
• How is data being used today?
• How can I optimize for future workloads?
• How can I take advantage of Hadoop risk-free and fast?
Cloudera Navigator
• Metadata Management
• Audit
• Policy-Based Data Management
• Data Analytics
The only integrated data management and governance platform for Hadoop
Navigator Metadata Architecture
Metadata Extraction
• HDFS - Extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint.
• Hive - Extracts database and table metadata from the Hive Metastore Server.
• Impala - Extracts database and table metadata from the Hive Metastore Server, and query metadata from the Impala Daemon lineage logs.
• MapReduce - Extracts job metadata from the JobTracker.
Metadata Extraction (continued)
• Oozie - Extracts Oozie workflows from the Oozie Server.
• Pig - Extracts Pig script runs from the JobTracker or Job History Server.
• Spark - Extracts Spark job metadata from YARN logs.
• Sqoop 1 - Extracts database and table metadata from the Hive Metastore Server, and job runs from the JobTracker or Job History Server.
• YARN - Extracts job metadata from the ResourceManager.
Metadata Indexing
• Metadata is indexed in Solr for searching, including:
  – Technical metadata key-value pairs, for example, “fileSystemPath:/tmp/hbase-staging”
  – Custom metadata key-value pairs, for example, “description:Banking*”
  – Hive extended attribute key-value pairs, set for example with:
    ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
• Search queries combine key-value pairs with Boolean operators, for example:
    (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory)
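The Boolean query above is just field:value terms joined with OR inside parentheses and AND between groups. A minimal sketch of composing such a string programmatically follows; the class and the `anyOf`/`allOf` helpers are hypothetical, not part of Navigator's API, and values are not escaped for Solr special characters (so a wildcard such as `Banking*` passes through unchanged).

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles Navigator/Solr-style search strings. */
final class NavigatorQuery {

    private NavigatorQuery() {}

    /** Joins field:value terms with OR and wraps them in parentheses. No escaping is applied. */
    static String anyOf(String field, List<String> values) {
        return values.stream()
                .map(v -> field + ":" + v)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    /** Joins already-built clauses with AND. */
    static String allOf(String... clauses) {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        // Rebuilds the example query: Hive or HDFS entities that are tables or directories.
        String query = allOf(
                anyOf("sourceType", List.of("hive", "hdfs")),
                anyOf("type", List.of("table", "directory")));
        System.out.println(query);
    }
}
```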
Self-Service Data Discovery & Analytics
For Business Users
Effortlessly find and trust the data that matters most
• Search across a unified metadata repository
• Gain context and visibility into data sets
• Find similar, relevant data
Technical & Business Metadata
Modifying Metadata
• HDFS file: metadata for /user/test/file1.txt can be supplied in a sidecar file named /user/test/.file1.txt.navigator:
  { "name" : "aName",
    "description" : "a description",
    "properties" : { "prop1" : "value1", "prop2" : "value2" },
    "tags" : [ "tag1" ]
  }
• REST:
  curl http://Navigator_Metadata_Server_host:port/api/v8/entities/ -u username:password -X POST -H "Content-Type: application/json" -d '{properties}'
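The REST call above can also be prepared from Java. This is a minimal sketch under stated assumptions: it assumes the POST body uses the same name/description/properties/tags shape as the sidecar file (the real v8 entities endpoint may require additional fields that identify the target entity; check the Navigator API documentation), the class and method names are hypothetical, host, port, and credentials are placeholders, and it only builds the request with `java.net.http` (Java 11+) without sending it. The JSON quoting escapes only backslashes and double quotes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: build custom-metadata JSON and a POST request for it (the request is not sent). */
final class NavigatorMetadata {

    private NavigatorMetadata() {}

    /** Minimal JSON string quoting: escapes only backslashes and double quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Produces {"name":...,"description":...,"properties":{...},"tags":[...]} (assumed body shape). */
    static String toJson(String name, String description,
                         Map<String, String> properties, List<String> tags) {
        String props = properties.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
        String tagList = tags.stream()
                .map(NavigatorMetadata::quote)
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"name\":" + quote(name)
                + ",\"description\":" + quote(description)
                + ",\"properties\":" + props
                + ",\"tags\":" + tagList + "}";
    }

    /** Builds POST <baseUrl>/api/v8/entities/ with HTTP Basic authentication (like curl -u). */
    static HttpRequest buildPost(String baseUrl, String user, String password, String json) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v8/entities/"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String json = toJson("aName", "a description", Map.of("prop1", "value1"), List.of("tag1"));
        // Placeholder host and port; replace with your Navigator Metadata Server.
        HttpRequest req = buildPost("http://navigator.example.com:7187", "username", "password", json);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(json);
    }
}
```

Sending would be a call to `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`; it is left out so the sketch runs without a server.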
Navigator Analytics
• Metadata - the number of files by creation and access times, size, block size, and replication count.
• Audit
  – Activity tab - by directory, which files have been accessed using the open operation and how many times they have been accessed.
  – Top Users tab - the top-n commands, the top-n users, and the top-n commands those users performed.
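The audit views above are aggregations over audit events. A minimal sketch of those aggregations follows; the `AuditEvent` class and its three fields are a hypothetical simplification, not Navigator's audit schema, and ties in the top-n lists are broken alphabetically so results are deterministic.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Sketch of Activity/Top Users-style aggregation over audit events (hypothetical event shape). */
final class TopUsers {

    /** Hypothetical, simplified audit event: who ran which command on which path. */
    static final class AuditEvent {
        final String user;
        final String command;
        final String path;

        AuditEvent(String user, String command, String path) {
            this.user = user;
            this.command = command;
            this.path = path;
        }
    }

    private TopUsers() {}

    /** Activity-style view: how many times each path was opened, sorted by path. */
    static Map<String, Long> openCountsByPath(List<AuditEvent> events) {
        return events.stream()
                .filter(e -> e.command.equals("open"))
                .collect(Collectors.groupingBy(e -> e.path, TreeMap::new, Collectors.counting()));
    }

    /** The n users with the most audited events. */
    static List<String> topUsers(List<AuditEvent> events, int n) {
        return topKeys(events.stream()
                .collect(Collectors.groupingBy(e -> e.user, Collectors.counting())), n);
    }

    /** The n commands a given user ran most often. */
    static List<String> topCommandsFor(List<AuditEvent> events, String user, int n) {
        return topKeys(events.stream()
                .filter(e -> e.user.equals(user))
                .collect(Collectors.groupingBy(e -> e.command, Collectors.counting())), n);
    }

    /** Sorts by count (descending), breaking ties by name. */
    private static List<String> topKeys(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```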
Navigator Audit Architecture
Compliance-Ready Governance & Protection
For Compliance Officers
Track, understand, and protect access to sensitive data
• Search centralized audits for the entire ecosystem
• See how data is used and how it changes with intuitive lineage
• Protect all data with high-performance encryption and key management
• Integrate with leading partner tools
Policy-Based Data Management
• Automate data stewardship and curation activities with the policy engine
  – Data archive
  – Data delete
  – Metadata management
  – Automatic naming with a timestamp, for example:
    entity.get(FSEntityProperties.ORIGINAL_NAME, Object.class) + " - "
        + new SimpleDateFormat("yyyy-MM-dd").format(
              entity.get(FSEntityProperties.CREATED, Instant.class).toDate())
• Ensure business continuity through built-in backup & disaster recovery
• Integrate with leading partner tools
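The naming expression appends the creation date to the entity's original name. A standalone sketch of the same formatting logic follows, using `java.time` instead of Navigator's entity API; the `originalName` and `created` parameters stand in for the `FSEntityProperties.ORIGINAL_NAME` and `CREATED` values and are assumptions. Unlike `SimpleDateFormat`, which formats in the JVM's default time zone, this sketch pins UTC so the result does not vary by host.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Sketch of the policy's "<name> - <yyyy-MM-dd>" naming rule (not Navigator's policy API). */
final class PolicyNaming {

    // Same pattern as the policy expression; the UTC zone is an assumption of this sketch.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    private PolicyNaming() {}

    /** Returns "<original name> - <creation date>". */
    static String timestampedName(String originalName, Instant created) {
        return originalName + " - " + DAY.format(created);
    }

    public static void main(String[] args) {
        System.out.println(timestampedName("transactions.csv", Instant.parse("2016-03-05T10:15:30Z")));
    }
}
```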
Lineage
• Lineage provides provenance information that shows where data came from and how it has been transformed within the enterprise data hub (EDH)
• Cloudera Navigator provides column-level lineage within the Cloudera EDH
• Integrates with certified third-party lineage solutions, such as Informatica, for enterprise-wide lineage information
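A provenance question such as "where did this table come from?" amounts to walking lineage edges backwards. A minimal sketch follows: a hypothetical in-memory graph of entity names, with an edge from each input to the output derived from it, plus a breadth-first search for every upstream ancestor. Navigator's lineage model (operations, column-level links) is much richer; this only illustrates the traversal.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical table/file-level lineage graph with upstream (provenance) traversal. */
final class Lineage {

    // For each entity, the entities it was directly derived from.
    private final Map<String, Set<String>> inputsOf = new HashMap<>();

    /** Records that `output` was produced from `input` (for example, by a Hive query or Spark job). */
    void addEdge(String input, String output) {
        inputsOf.computeIfAbsent(output, k -> new LinkedHashSet<>()).add(input);
    }

    /** Breadth-first walk upstream; returns every ancestor in discovery order. */
    Set<String> upstream(String entity) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(entity));
        while (!queue.isEmpty()) {
            for (String parent : inputsOf.getOrDefault(queue.poll(), Set.of())) {
                if (seen.add(parent)) {   // add() returns false for visited nodes, so cycles terminate
                    queue.add(parent);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Lineage g = new Lineage();
        g.addEdge("/landing/raw.csv", "staging_table");
        g.addEdge("staging_table", "report_table");
        g.addEdge("customers", "report_table");
        System.out.println(g.upstream("report_table"));
    }
}
```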
Lineage
End-to-End Data Management
Cloudera Navigator + Partners: Lineage, Auditing, Metadata, Augmentation, Consumption
Agenda
• Hadoop Security Pillars
• Metadata Management and Data Audit
• Data Security at Rest and in Transit
Background
• Our customers increasingly want to use HDFS to store sensitive data
• Customers are often mandated to protect data at rest, for example:
  – National security
  – Company confidential data
• Encryption of data at rest helps mitigate certain security threats:
  – Rogue administrators (insider threat)
  – Lost or stolen hard drives
Over the Wire Encryption
• Uses certificates and TLS to encrypt and optionally authenticate network communication
• Customers can use commercial certificate authorities, corporate CAs, or self-signed certificates
• Active Directory Certificate Services is commonly used by customers
• Secures Hadoop data processing components as well as Cloudera Manager agents and management services
Data at Rest Encryption
• Protects data on disk from unauthorized exposure
• Protects data both from online attacks while the system is running and from offline attacks, such as theft of physical drives
• HDFS transparent encryption at rest is an open source technology available in Apache Hadoop
• Navigator Encrypt is a proprietary technology that protects data outside HDFS:
  – Backend databases, log directories, temp directories, landing zones
• Navigator Key Trustee Server is a proprietary key management server that can integrate with an enterprise HSM
HDFS Encrypt + Navigator Encrypt + Key Trustee
Navigator Key Trustee Architecture
32© 2014 Cloudera, Inc. All rights reserved.
Key Management Service (KMS)
• When encrypting any data, it is important to store your encryption keys securely, away from the encrypted data
• KMS is a Key Management Service for HDFS encryption to store and retrieve encryption keys
• KMS is open source and provides a standard interface for pluggable key providers
• The default key provider for KMS is the Java KeyStore
• The Java KeyStore is not recommended for production key management; it is meant for development and testing
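The default provider keeps keys in a password-protected Java keystore file. A minimal sketch of storing and reloading an AES key in a JCEKS keystore with standard `java.security`/`javax.crypto` APIs follows; it approximates what a keystore-backed provider does and is not Hadoop's KMS code. The alias and password are placeholders. It also hints at why this option is aimed at development and testing: the key material ends up in an ordinary file guarded only by a password.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Sketch: keep an AES key in a password-protected JCEKS keystore (not Hadoop's KMS code). */
final class KeystoreDemo {

    private KeystoreDemo() {}

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stores `key` under `alias` in a new JCEKS keystore and returns the serialized bytes. */
    static byte[] storeKey(SecretKey key, String alias, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");   // JCEKS can hold symmetric (secret) keys
            ks.load(null, password);                        // start from an empty keystore
            ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                    new KeyStore.PasswordProtection(password));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, password);                        // a real provider writes a file instead
            return out.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Loads the keystore (integrity-checked with `password`) and returns the key under `alias`. */
    static SecretKey loadKey(byte[] keystore, String alias, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(new ByteArrayInputStream(keystore), password);
            return (SecretKey) ks.getKey(alias, password);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```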
Key Management Service (KMS)
● Encryption occurs on the requesting client.
  ○ Data is encrypted before it lands on disk.
  ○ The KMS encrypts and decrypts specific key components.
  ○ The KMS does not encrypt content.
  ○ The KMS does not store keys.
KMS Proxy Deployment Considerations
Navigator Key Trustee
• Navigator Key Trustee provides secure, centralized, and scalable key storage and administration
• Is not open source; it is licensed with Cloudera Navigator
• Is the recommended option for production deployments
• Provides the hooks to integrate with Hardware Security Modules for physically tamper-proof requirements (FIPS 140-2 Level 3)
• Also provides centralized key management for Navigator Encrypt
Key HSM
• Customers may choose to use Hardware Security Modules (HSMs) to improve the security of their key store.
• Key HSM is a universal Hardware Security Module (HSM) driver.
• It acts as a translator between the target HSM platform and Key Trustee.
Hardware Security Module (HSM)
• A number of vendors provide HSMs.
• They exist as appliances and as attachable physical hardware.
• If an HSM is configured with Key Trustee, it is used as the root of trust.
• Data inside the Key Trustee keystore is encrypted by this root of trust.
• The HSM "master" keys are generated in the HSM and never leave the HSM.
HDFS Encryption Workflow
HDFS Encryption, Involved Parties
Diagram: the client, HDFS, the KMS, Key Trustee, and (optionally) zHSM and an HSM. HDFS handles file authorization; the KMS handles key authorization.
Keys Used in Encryption at Rest
HDFS Encryption
• Encryption Zone Key (EZKEY)
  – This key, much like a mount key, is associated with an encryption zone in HDFS.
• Encrypted Data Encryption Key (EDEK)
  – This is an encrypted copy of a Data Encryption Key.
• Data Encryption Key (DEK)
  – This is the actual key used to encrypt data stored within a file, zone, or block device. This key concept is used in both Navigator Encrypt and HDFS Transparent Data Encryption (TDE).
Keys Used in Encryption at Rest
(1) When an encryption zone (EZ) is created, the administrator specifies an encryption zone key (EZ key) that is already stored in the backing keystore. The EZ key encrypts the data encryption keys (DEKs) that are in turn used to encrypt each file. A DEK encrypted with the EZ key forms an encrypted data encryption key (EDEK), which is stored on the NameNode as an extended attribute of the file.
(2) To encrypt a file, the client retrieves a new EDEK from the NameNode and then asks the KMS to decrypt it with the corresponding EZ key. This step yields a DEK.
(3) The client uses the DEK to encrypt its data.
(4) To decrypt a file, the client again asks the KMS to decrypt the file’s EDEK with the EZ key to obtain the DEK. The client then reads the encrypted data and decrypts it with the DEK.
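The key hierarchy in steps (1) to (4) is envelope encryption. A minimal sketch with standard `javax.crypto` follows, under these assumptions: AES-128 keys; the EDEK is produced with AES key wrap (RFC 3394, the JDK's `AESWrap` cipher) as a stand-in, since Hadoop's KMS uses its own EDEK format; and file data is encrypted with `AES/CTR/NoPadding`, the cipher suite HDFS encryption uses. In a real cluster the EZ key never leaves the KMS; here everything runs in one process for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Sketch of the EZ key / DEK / EDEK hierarchy (AESWrap for EDEKs is an assumption). */
final class EnvelopeEncryption {

    private static final SecureRandom RANDOM = new SecureRandom();

    private EnvelopeEncryption() {}

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** (1) EDEK = DEK wrapped with the EZ key. */
    static byte[] wrapDek(SecretKey ezKey, SecretKey dek) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.WRAP_MODE, ezKey);
            return c.wrap(dek);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** (2)/(4) The KMS's job: unwrap an EDEK with the EZ key to recover the DEK. */
    static SecretKey unwrapEdek(SecretKey ezKey, byte[] edek) {
        try {
            Cipher c = Cipher.getInstance("AESWrap");
            c.init(Cipher.UNWRAP_MODE, ezKey);
            return (SecretKey) c.unwrap(edek, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {   // includes a failed integrity check (wrong EZ key)
            throw new IllegalStateException(e);
        }
    }

    /** (3)/(4) The client's job: AES/CTR with the DEK; mode selects encrypt or decrypt. */
    static byte[] ctr(int mode, SecretKey dek, byte[] iv, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, dek, new IvParameterSpec(iv));
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] newIv() {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return iv;
    }

    public static void main(String[] args) {
        SecretKey ezKey = newAesKey();                 // lives in the keystore behind the KMS
        byte[] edek = wrapDek(ezKey, newAesKey());     // only this is stored with the file
        SecretKey dek = unwrapEdek(ezKey, edek);       // KMS decrypts the EDEK for the client
        byte[] iv = newIv();
        byte[] plain = "sensitive record".getBytes(StandardCharsets.UTF_8);
        byte[] stored = ctr(Cipher.ENCRYPT_MODE, dek, iv, plain);
        System.out.println(Arrays.equals(plain, ctr(Cipher.DECRYPT_MODE, dek, iv, stored)));
    }
}
```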
HDFS Encryption, Writing a File
Participants: Client, HDFS, KMS (backed by Key Trustee)
1. Client → HDFS: create file
2. HDFS → KMS: generate key
3. KMS → HDFS: encrypted key
4. HDFS: store encrypted key
5. HDFS → Client: file handle & encrypted key
6. Client → KMS: decrypt encrypted key
7. KMS → Client: decrypted key
8. Client → HDFS: encrypt & write data
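HDFS Encryption, Reading a File
Participants: Client, HDFS, KMS (backed by Key Trustee)
1. Client → HDFS: open file (after passing the read permission check)
2. HDFS → Client: file handle & encrypted key
3. Client → KMS: decrypt encrypted key
4. KMS → Client: decrypted key
5. Client: read data from HDFS & decrypt it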
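The write and read flows can be condensed into a toy simulation that highlights who holds what: a hypothetical `NameNode` stores only EDEKs (as per-file attributes) and enforces file permissions, a hypothetical `Kms` holds the EZ key and enforces a key ACL, and the client only ever receives DEKs. The class names, the single per-key user set (a simplification of the KMS's per-operation key ACLs), and the use of `AESWrap` for EDEKs are illustrative assumptions, not Hadoop's implementation. Step numbers in the comments refer to the two flows; encrypting and decrypting the data itself (write step 8, read step 5) is left out, since the client would simply apply the DEK with AES/CTR.

```java
import java.security.GeneralSecurityException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Toy model of the HDFS encryption write/read flows; not Hadoop's implementation. */
final class TdeFlow {

    private TdeFlow() {}

    /** A crypto step that may throw a checked GeneralSecurityException. */
    interface CryptoStep<T> {
        T run() throws GeneralSecurityException;
    }

    static <T> T unchecked(CryptoStep<T> step) {
        try {
            return step.run();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey newAesKey() {
        return unchecked(() -> {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        });
    }

    /** Hypothetical KMS: the only party holding the EZ key; checks a key ACL (key authorization). */
    static final class Kms {
        private final SecretKey ezKey;
        private final Set<String> keyAcl;

        Kms(SecretKey ezKey, Set<String> keyAcl) {
            this.ezKey = ezKey;
            this.keyAcl = keyAcl;
        }

        /** Write steps 2-3: create a fresh DEK and release it only in wrapped form (the EDEK). */
        byte[] generateEdek() {
            return unchecked(() -> {
                Cipher c = Cipher.getInstance("AESWrap");
                c.init(Cipher.WRAP_MODE, ezKey);
                return c.wrap(newAesKey());
            });
        }

        /** Write steps 6-7 and read steps 3-4: key authorization, then unwrap the EDEK. */
        SecretKey decryptEdek(String user, byte[] edek) {
            if (!keyAcl.contains(user)) {
                throw new SecurityException("key ACL denies " + user);
            }
            return unchecked(() -> {
                Cipher c = Cipher.getInstance("AESWrap");
                c.init(Cipher.UNWRAP_MODE, ezKey);
                return (SecretKey) c.unwrap(edek, "AES", Cipher.SECRET_KEY);
            });
        }
    }

    /** Hypothetical NameNode: stores EDEKs per file and checks read permissions (file authorization). */
    static final class NameNode {
        private final Kms kms;
        private final Map<String, byte[]> edekXattr = new HashMap<>();
        private final Map<String, Set<String>> readers = new HashMap<>();

        NameNode(Kms kms) {
            this.kms = kms;
        }

        /** Write steps 1-5: create the file, store its EDEK (step 4), return the EDEK to the client. */
        byte[] create(String path, Set<String> allowedReaders) {
            byte[] edek = kms.generateEdek();
            edekXattr.put(path, edek);
            readers.put(path, allowedReaders);
            return edek.clone();
        }

        /** Read steps 1-2: file authorization, then return the file's EDEK. */
        byte[] open(String user, String path) {
            if (!readers.getOrDefault(path, Set.of()).contains(user)) {
                throw new SecurityException("permission denied: " + user + " on " + path);
            }
            return edekXattr.get(path).clone();
        }
    }
}
```

The usage pattern is: `create` on the NameNode, then `decryptEdek` on the KMS at write time, and `open` followed by `decryptEdek` at read time. Both checks must pass, so a user with file permission but no key ACL entry still cannot obtain the DEK.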
HDFS Encryption Implementation and Usage
Enabling HDFS Encryption on a Cluster
• A recent version of libcrypto.so is needed on HDFS and MapReduce client hosts
• To check, run the following command:
$ hadoop checknative
Output (excerpt):
openssl: true /usr/lib64/libcrypto.so
• To install it:
$ yum install openssl openssl-devel
• The openssl package installs the library; openssl-devel creates the libcrypto.so symlink (you can also create this symlink manually)
• OpenSSL provides AES-NI integration for Intel hardware
Enabling HDFS Encryption on a Cluster
Using Cloudera Manager
1) Add the KMS service: add the Java KeyStore KMS service on a host
2) Enable Java KeyStore KMS for the HDFS service:
• HDFS service > Configuration tab
• Scope > HDFS (Service-Wide)
• Category > All
• KMS Service property: select the radio button
• Save Changes
3) Restart the cluster
4) Deploy client configuration
Creating Encryption Zones
• Use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
# Create an encryption key for your zone as the application user that will use the key
$ hadoop key create myKey
# Create a new empty directory and make it an encryption zone (requires HDFS superuser privileges)
$ hadoop fs -mkdir /zone
$ hdfs crypto -createZone -keyName myKey -path /zone
# List the encryption zones (also requires HDFS superuser privileges)
$ hdfs crypto -listZones
Adding Files to an Encryption Zone
Remember that encryption zones start empty: you cannot create a zone on a directory that already contains data. Copy existing data into the zone instead, for example:
$ hadoop distcp /user/dir /user/enczone
• By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination.
• When copying between an unencrypted and an encrypted location, the filesystem checksums will not match because the underlying block data is different.
• Use the -skipcrccheck and -update flags to avoid verifying checksums.
• Also use the distcp flags to preserve all attributes (-prbugpcaxt).
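Why the checksums disagree: distcp compares checksums of the stored bytes, and inside an encryption zone the stored bytes are ciphertext. A minimal sketch follows, using `java.util.zip.CRC32` as a stand-in for filesystem checksums (HDFS's real file checksums are composite MD5-of-MD5-of-CRC values; this only shows the principle) over the same content stored as plaintext and as AES/CTR ciphertext under a fresh key. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Sketch: identical logical content, different stored bytes, different checksums. */
final class ChecksumMismatch {

    private ChecksumMismatch() {}

    /** CRC32 over stored bytes, standing in for a filesystem-level checksum. */
    static long crc(byte[] stored) {
        CRC32 crc = new CRC32();
        crc.update(stored);
        return crc.getValue();
    }

    /** Encrypts `plain` with a fresh AES-128 key in CTR mode, as an encryption zone would store it. */
    static byte[] storeEncrypted(byte[] plain) {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            SecretKey dek = gen.generateKey();
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            // An all-zero IV is tolerable here only because the key is fresh and used once.
            c.init(Cipher.ENCRYPT_MODE, dek, new IvParameterSpec(new byte[16]));
            return c.doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "same logical file contents".getBytes(StandardCharsets.UTF_8);
        System.out.printf("source CRC: %08x%n", crc(plain));
        System.out.printf("zone CRC:   %08x%n", crc(storeEncrypted(plain)));
    }
}
```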
Unified Governance Foundation: Unified Auditing, Comprehensive Lineage, Unified Metadata, Universal Policies
• Self-Service Discovery & Analytics (Search, Define, Analyze, Profile): effortlessly find and trust the data that matters most
• Compliance-Ready Governance & Protection (Audit, Track, Encrypt, Manage Keys): track, understand, and protect access to sensitive data
• Active Data Optimization (Report, Optimize, Migrate, Maintain Models): configure Hadoop to boost user productivity
• Hadoop-Scale Data Lifecycle Management (Classify, Steward, Backup, Retain): maximize cluster performance at Hadoop scale with ease
Cloudera Navigator
The only integrated data management and governance platform for Hadoop
MasterCard
Cloudera: The First PCI-Certified Hadoop Platform
Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification.
Solution: MasterCard’s Cloudera environment fully conforms to the PCI-DSS v2.0 security standards, so it can host PCI datasets and potentially integrate with other internal systems.
“Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same standards, we now have additional options in how we manage our data center.”
Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard
jarred@cloudera.com

Data Science and CDSW
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 

Kürzlich hochgeladen

Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Kürzlich hochgeladen (20)

Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Big Data: Data Governance and Data Security

  • 1. Data Governance and Protection in Hadoop: An Introduction to Cloudera Navigator
    Jianwei Li, jarred@cloudera.com
    © 2014 Cloudera, Inc. All rights reserved.
  • 2. Agenda
    • Hadoop Security Pillars
    • Metadata Management and Data Audit
    • Data Security at Rest and in Transit
  • 3. Hadoop Ecosystem
    • Operations: Cloudera Manager, Cloudera Director
    • Data management: Cloudera Navigator, Encrypt and KeyTrustee, Optimizer
    • Integrate: Sqoop (structured); Kafka, Flume (unstructured)
    • Process, analyze, serve: Spark, Hive, Pig, MapReduce (batch); Spark (stream); Impala (SQL); Solr (search); Kite (SDK)
    • Unified services: YARN (resource management); Sentry, RecordService (security)
    • Store: HDFS (filesystem); Kudu (relational); HBase (NoSQL)
  • 4. The Benefits of Hadoop...
    • One place for unlimited data
      – All types
      – More sources
      – Faster, larger ingestion
    • Unified, multi-framework data access
      – More users
      – More tools
      – Faster changes
  • 5. …Can Create Information Security Challenges
    • Business Manager
      – Run high-value workloads in the cluster
      – Quickly adopt new innovations
    • Information Security
      – Follow established policies and procedures
      – Maintain compliance
    • IT/Operations
      – Integrate with existing IT investments
      – Minimize end-user support
      – Automate configuration
  • 6. Hadoop Security Pillars: Authentication, Authorization, Audit, and Compliance
    • Perimeter: guarding access to the cluster itself
      – Technical concepts: authentication, network isolation
      – Cloudera Manager
    • Access: defining what users and applications can do with data
      – Technical concepts: permissions, authorization
      – Apache Sentry & RecordService
    • Visibility: reporting on where data came from and how it’s being used
      – Technical concepts: auditing, lineage
      – Cloudera Navigator
    • Data: protecting data in the cluster from unauthorized visibility
      – Technical concepts: encryption, tokenization, data masking
      – Navigator Encrypt & Key Trustee | Partners
  • 7. Agenda
    • Hadoop Security Pillars
    • Metadata Management and Data Audit
    • Data Security at Rest and in Transit
  • 8. Data Management Challenges
    • Compliance Officers
      – Who’s accessing what data?
      – What are they doing with the data?
      – Is sensitive data governed and protected?
      – Can I meet compliance needs?
    • Data Stewards/Curators
      – How can I manage data from ingest to purge?
      – How do I classify data efficiently?
      – How can data be made available to end-users?
    • Business Users
      – How do I find what’s relevant?
      – Can I trust what I find?
      – How can I explore data on my own?
    • Database Admins
      – How is data being used today?
      – How can I optimize for future workloads?
      – How can I take advantage of Hadoop risk-free and fast?
  • 9. Cloudera Navigator: The only integrated data management and governance platform for Hadoop
    • Metadata Management
    • Audit
    • Policy Based Data Management
    • Data Analytics
  • 10. Navigator Metadata Architecture
  • 11. Metadata Extraction
    • HDFS: extracts HDFS metadata at the next scheduled extraction run after an HDFS checkpoint.
    • Hive: extracts database and table metadata from the Hive Metastore Server.
    • Impala: extracts database and table metadata from the Hive Metastore Server, and query metadata from the Impala Daemon lineage logs.
    • MapReduce: extracts job metadata from the JobTracker.
  • 12. Metadata Extraction (continued)
    • Oozie: extracts Oozie workflows from the Oozie Server.
    • Pig: extracts Pig script runs from the JobTracker or Job History Server.
    • Spark: extracts Spark job metadata from YARN logs.
    • Sqoop 1: extracts database and table metadata from the Hive Metastore Server, and job runs from the JobTracker or Job History Server.
    • YARN: extracts job metadata from the ResourceManager.
  • 13. Metadata Indexing
    • Metadata is indexed in Solr for searching:
      – Technical metadata key-value pairs, for example, "fileSystemPath:/tmp/hbase-staging"
      – Custom metadata key-value pairs, for example, "description:Banking*"
      – Hive extended attribute key-value pairs, set with: ALTER TABLE table_name SET TBLPROPERTIES ('key1'='value1');
    • Example search query: (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory)
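The example query ANDs together parenthesised OR groups of field:value terms. Below is a minimal Python sketch of composing such a query string. It assumes plain values that need no Solr escaping. `any_of` and `navigator_query` are hypothetical helpers, not part of Navigator.

```python
def any_of(field: str, values: list[str]) -> str:
    """Build a parenthesised OR group, e.g. (type:table OR type:directory)."""
    return "(" + " OR ".join(f"{field}:{value}" for value in values) + ")"


def navigator_query(filters: dict[str, list[str]]) -> str:
    """AND together one OR group per field, in the dict's insertion order.

    Values are inserted verbatim: spaces, colons, or other Solr special
    characters would need escaping or quoting first.
    """
    return " AND ".join(any_of(field, values) for field, values in filters.items())


query = navigator_query({
    "sourceType": ["hive", "hdfs"],
    "type": ["table", "directory"],
})
print(query)  # (sourceType:hive OR sourceType:hdfs) AND (type:table OR type:directory)
```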
  • 14. Self-Service Data Discovery & Analytics for Business Users: Effortlessly find and trust the data that matters most
    • Search across unified metadata repository
    • Gain context and visibility into data sets
    • Find similar, relevant data
  • 15. Technical & Business Metadata
  • 16. Modifying Metadata
    • Through a metadata file stored next to the HDFS file:
      – HDFS file: /user/test/file1.txt
      – Metadata file: /user/test/.file1.txt.navigator
      – Contents: { "name" : "aName", "description" : "a description", "properties" : { "prop1" : "value1", "prop2" : "value2" }, "tags" : [ "tag1" ] }
    • Through the REST API: curl http://Navigator_Metadata_Server_host:port/api/v8/entities/ -u username:password -X POST -H "Content-Type: application/json" -d '{properties}'
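Both ways of attaching metadata can be sketched with the Python standard library. This is an illustration under assumptions:

- The helper names are hypothetical.
- The host `nav.example.com:7187` is hypothetical.
- Reusing the metadata-file document as the POST body is an assumption.
- The request is only built, never sent.

Check the Navigator API documentation for the entity schema your version expects.

```python
import base64
import json
import posixpath
import urllib.request


def sidecar_path(hdfs_path: str) -> str:
    """Return the hidden metadata file next to an HDFS file.

    /user/test/file1.txt -> /user/test/.file1.txt.navigator
    """
    directory, name = posixpath.split(hdfs_path)
    return posixpath.join(directory, f".{name}.navigator")


def sidecar_json(name: str, description: str, properties: dict, tags: list) -> str:
    """Serialise the name/description/properties/tags document from the slide."""
    return json.dumps(
        {"name": name, "description": description,
         "properties": properties, "tags": tags},
        indent=2,
    )


def build_entities_post(base_url: str, body: str, username: str, password: str):
    """Prepare (but do not send) the POST to the entities endpoint.

    The JSON schema the endpoint expects is an assumption in this sketch.
    """
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/api/v8/entities/",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",  # what curl -u sends
        },
    )


doc = sidecar_json("aName", "a description",
                   {"prop1": "value1", "prop2": "value2"}, ["tag1"])
print(sidecar_path("/user/test/file1.txt"))  # /user/test/.file1.txt.navigator
req = build_entities_post("http://nav.example.com:7187", doc, "username", "password")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```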
  • 17. Navigator Analytics
    • Metadata: the number of files by creation and access times, size, block size, and replication count
    • Audit
      – Activity tab: by directory, which files have been accessed using the open operation and how many times
      – Top Users tab: the top-n commands, the top-n users, and the top-n commands those users performed
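The Top Users and Activity views are essentially counts over audit events. Here is a rough sketch. It assumes a hypothetical, heavily simplified `AuditEvent` record; real Navigator audit events carry many more fields.

```python
import posixpath
from collections import Counter
from typing import NamedTuple


class AuditEvent(NamedTuple):
    """Hypothetical, simplified audit record."""
    user: str
    command: str
    path: str


def top_users(events, n):
    """The n users with the most audited operations."""
    return Counter(e.user for e in events).most_common(n)


def top_commands(events, user, n):
    """The n commands a given user performed most often."""
    return Counter(e.command for e in events if e.user == user).most_common(n)


def opens_by_directory(events):
    """Activity-tab style view: number of `open` operations per parent directory."""
    return Counter(posixpath.dirname(e.path) for e in events if e.command == "open")


events = [
    AuditEvent("alice", "open", "/data/sales/q1.csv"),
    AuditEvent("alice", "open", "/data/sales/q2.csv"),
    AuditEvent("alice", "listStatus", "/data/sales"),
    AuditEvent("bob", "open", "/data/sales/q1.csv"),
    AuditEvent("bob", "delete", "/tmp/scratch"),
]
print(top_users(events, 1))              # [('alice', 3)]
print(top_commands(events, "alice", 1))  # [('open', 2)]
print(opens_by_directory(events))        # Counter({'/data/sales': 3})
```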
  • 18. Navigator Audit Architecture
  • 19. Compliance-Ready Governance & Protection for Compliance Officers: Track, understand, and protect access to sensitive data
    • Search centralized audits for the entire ecosystem
    • See how data is used and changing with intuitive lineage
    • Protect all data with high-performance encryption and key management
    • Integrate with leading partner tools
  • 20. Policy-Based Data Management
    • Automate data stewardship and curation activities with the policy engine:
      – Data archive
      – Data delete
      – Metadata management, for example automatic naming with a timestamp:
        entity.get(FSEntityProperties.ORIGINAL_NAME, Object.class) + " - " + new SimpleDateFormat("yyyy-MM-dd").format(entity.get(FSEntityProperties.CREATED, Instant.class).toDate())
    • Ensure business continuity through built-in backup & disaster recovery
    • Integrate with leading partner tools
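To make the naming expression concrete, here is a Python equivalent of what it produces: `<original name> - <yyyy-MM-dd>`. The function name and sample values are hypothetical. The Java version formats the date in the JVM's default time zone, whereas this sketch uses the time zone attached to `created`.

```python
from datetime import datetime, timezone


def timestamped_name(original_name: str, created: datetime) -> str:
    """Python equivalent of the policy expression: "<original name> - <yyyy-MM-dd>"."""
    return f"{original_name} - {created:%Y-%m-%d}"


created = datetime(2016, 3, 14, 9, 30, tzinfo=timezone.utc)
print(timestamped_name("transactions.csv", created))  # transactions.csv - 2016-03-14
```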
  • 21. Lineage
    • Lineage provides provenance information to show where data came from and how it has been transformed within the EDH (enterprise data hub)
    • Cloudera Navigator provides column-level lineage within Cloudera EDH
    • Integrates with certified third-party lineage solutions, such as Informatica, for enterprise-wide lineage information
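Answering "where did this column come from?" amounts to walking a lineage graph upstream. Below is a minimal breadth-first sketch. It assumes lineage is available as a plain mapping from each entity to its direct inputs. The entity names are hypothetical, and this is not Navigator's data model.

```python
from collections import deque


def upstream(lineage: dict[str, list[str]], target: str) -> list[str]:
    """Every entity `target` was derived from, nearest first (breadth-first).

    `lineage` maps an entity (table, column, file, ...) to the entities it was
    directly produced from. The `seen` set keeps shared ancestors from being
    reported twice.
    """
    seen = {target}
    order = []
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in lineage.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Hypothetical column-level lineage: report column <- cleaned columns <- raw columns.
lineage = {
    "report.total_spend": ["orders.amount", "orders.fx_rate"],
    "orders.amount": ["raw_orders.amount_str"],
    "orders.fx_rate": ["fx_feed.rate"],
}
print(upstream(lineage, "report.total_spend"))
# ['orders.amount', 'orders.fx_rate', 'raw_orders.amount_str', 'fx_feed.rate']
```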
  • 22. Lineage
  • 23. End-to-End Data Management: Cloudera Navigator + Partners
    • Lineage, Auditing, Metadata Augmentation, Consumption
  • 24. Agenda
    • Hadoop Security Pillars
    • Metadata Management and Data Audit
    • Data Security at Rest and in Transit
  • 25. Background
    • Our customers increasingly want to use HDFS to store sensitive data
    • Customers are often mandated to protect data at rest, for example for:
      – National security
      – Company confidentiality
    • Encryption of data at rest helps mitigate certain security threats:
      – Rogue administrators (insider threat)
      – Lost or stolen hard drives
  • 26. Over-the-Wire Encryption
    • Uses certificates and TLS to encrypt and optionally authenticate network communication
    • Customers can use commercial certificate authorities, corporate CAs, or self-signed certificates
      – Active Directory Certificate Services is commonly used by customers
    • Secures Hadoop data processing components as well as Cloudera Manager agents and management services
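On the client side, trusting a corporate or self-signed CA comes down to pointing the TLS library at that CA's certificate. Below is a hedged Python sketch for a client calling a TLS-enabled endpoint, such as an HTTPS web UI or REST API. The function name and file parameters are assumptions. The optional client certificate covers the "optionally authenticate" case.

```python
import ssl
from typing import Optional


def tls_client_context(ca_file: Optional[str] = None,
                       client_cert: Optional[str] = None,
                       client_key: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS settings for a TLS-enabled cluster endpoint.

    ca_file: PEM file of the CA that signed the server certificates (a
        commercial CA, a corporate CA, or the self-signed certificate itself).
        None means the system trust store.
    client_cert/client_key: optional, for endpoints that also authenticate
        the client by certificate (mutual TLS).
    """
    ctx = ssl.create_default_context(cafile=ca_file)  # verifies certificate and hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if client_cert:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


ctx = tls_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

To connect, wrap a socket with `ctx.wrap_socket(sock, server_hostname=host)`, or pass `context=ctx` to `urllib.request.urlopen`.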
  • 27. Data at Rest Encryption
    • Protects data on disk from unauthorized exposure
    • Protects data both from online attacks while the system is running and from offline attacks such as theft of physical drives
    • HDFS transparent encryption at rest is an open source technology available in Apache Hadoop
    • Navigator Encrypt is a proprietary technology that protects data outside HDFS
      – Backend databases, log directories, temp directories, landing zones
    • Navigator KeyTrustee Server is a proprietary key management server that can integrate with an enterprise HSM
  • 28. HDFS Encrypt + Navigator Encrypt + Key Trustee
  • 31. Navigator Key Trustee Architecture
  • 32. Key Management Service (KMS)
    • When encrypting any data, it is important to store your encryption keys securely, away from the encrypted data
    • KMS is a key management service that HDFS encryption uses to store and retrieve encryption keys
    • KMS is open source and provides a standard interface for pluggable key providers
    • The default key provider for KMS is the Java KeyStore
    • The Java KeyStore is not recommended for production key management; it is meant for development and testing
  • 33. Key Management Service (KMS)
    • Encryption occurs on the requesting client.
      – Data is encrypted before it lands on disk.
      – The KMS encrypts and decrypts specific key components.
      – The KMS does not encrypt content.
      – The KMS does not store keys.
  • 34. KMS Proxy Deployment Considerations
  • 35. KMS Proxy Deployment Considerations
  • 36. Navigator Key Trustee
    • Navigator Key Trustee provides secure, centralized, and scalable key storage and administration
    • It is not open source and is licensed with Cloudera Navigator
    • It is the recommended option for production deployments
    • It provides the hooks to integrate with Hardware Security Modules for physical tamper-proofing requirements (FIPS 140-2 Level 3)
    • It also provides centralized key management for Navigator Encrypt
  • 37. Key HSM
    • Customers may choose to use Hardware Security Modules (HSMs) to improve the security of their key store.
    • Key HSM is a universal Hardware Security Module (HSM) driver.
    • It acts as a translator between the target HSM platform and Key Trustee.
  • 38. Hardware Security Module (HSM)
    • A number of vendors provide HSMs.
    • They exist as appliances and as attachable physical hardware.
    • If one is configured with Key Trustee, it is used as a root of trust.
    • Data inside the Key Trustee keystore is encrypted by this root of trust.
    • The HSM "master" keys are generated in the HSM and never leave the HSM.
  • 39. HDFS Encryption Workflow
  • 40. HDFS Encryption, Involved Parties
    • Diagram labels: Client, HDFS, KMS, Key Trustee, zHSM, HSM (optional); key authorization; file authorization
  • 41. Keys Used in Encryption at Rest (HDFS Encryption)
    • Encryption Zone Key (EZKEY)
      – Much like a mount key, this key is associated with an encryption zone in HDFS.
    • Encrypted Data Encryption Key (EDEK)
      – An encrypted copy of a data encryption key.
    • Data Encryption Key (DEK)
      – The actual data encryption key used to encrypt data stored within a file, zone, or block device. This key concept is used in both Navigator Encrypt and HDFS Transparent Data Encryption (TDE).
  • 42. Keys Used in Encryption at Rest
    (1) When an encryption zone (EZ) is created, the administrator specifies an encryption zone key (EZ key) that is already stored in the backing keystore. The EZ key encrypts the data encryption keys (DEKs) that are in turn used to encrypt each file. A DEK encrypted with the EZ key forms an encrypted data encryption key (EDEK), which is stored on the NameNode as an extended attribute of the file.
    (2) To encrypt a file, the client retrieves a new EDEK from the NameNode and then asks the KMS to decrypt it with the corresponding EZ key. This yields a DEK.
    (3) The client uses the DEK to encrypt its data.
    (4) To decrypt a file, the client again asks the KMS to decrypt the file's EDEK with the EZ key to obtain the DEK. The client then reads the encrypted data and decrypts it with the DEK.
  • 43. HDFS Encryption, Writing a File (parties: Client, HDFS, KMS, with the KMS connecting to Key Trustee)
      1. Create file
      2. Generate key
      3. Encrypted key
      4. Store encrypted key
      5. File handle & encrypted key
      6. Decrypt encrypted key
      7. Decrypted key
      8. Encrypt & write data
  • 44. HDFS Encryption, Reading a File (parties: Client, HDFS, KMS, with the KMS connecting to Key Trustee)
      1. Open file (after passing the read permission check)
      2. File handle & encrypted key
      3. Decrypt encrypted key
      4. Decrypted key
      5. Read & decrypt data
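The key hierarchy and the write/read steps above can be simulated end to end. This is a toy model, not Hadoop code. A SHA-256 counter-mode keystream stands in for AES-CTR, which HDFS transparent encryption actually uses, so that the sketch runs on the standard library alone. `ToyKMS` and `ToyHDFS` are hypothetical classes. The sketch shows the essential property: file content is encrypted on the client with a DEK, the NameNode only ever holds the EDEK, and the KMS handles key material but never file content.

```python
import hashlib
import os


def toy_ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """TOY stream cipher: XOR data with SHA-256(key || iv || block counter).

    Stand-in for AES-CTR; not secure. Being an XOR keystream, the same call
    both encrypts and decrypts.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


class ToyKMS:
    """Wraps/unwraps DEKs with EZ keys; never sees file content.

    `_ez_keys` stands in for the backing keystore (Key Trustee); the real KMS
    fetches EZ keys from there rather than storing them itself.
    """

    def __init__(self):
        self._ez_keys = {}

    def create_key(self, name):  # like `hadoop key create myKey`
        self._ez_keys[name] = os.urandom(32)

    def generate_encrypted_key(self, ez_key_name):
        dek, iv = os.urandom(32), os.urandom(16)
        # The sketch reuses this one IV per file under two different keys
        # (EZ key here, DEK for content), which is fine for a keystream cipher.
        edek = toy_ctr(self._ez_keys[ez_key_name], iv, dek)
        return iv, edek  # the plaintext DEK is not kept

    def decrypt_encrypted_key(self, ez_key_name, iv, edek):
        return toy_ctr(self._ez_keys[ez_key_name], iv, edek)


class ToyHDFS:
    """NameNode keeps (IV, EDEK) per file like an xattr; DataNodes keep ciphertext."""

    def __init__(self, kms, zone_key):
        self.kms, self.zone_key = kms, zone_key
        self.xattrs, self.blocks = {}, {}

    def create(self, path):  # write steps 1-5
        self.xattrs[path] = self.kms.generate_encrypted_key(self.zone_key)
        return self.xattrs[path]

    def open(self, path):  # read steps 1-2
        return self.xattrs[path]


def write_file(hdfs, kms, path, plaintext):
    iv, edek = hdfs.create(path)
    dek = kms.decrypt_encrypted_key(hdfs.zone_key, iv, edek)  # steps 6-7
    hdfs.blocks[path] = toy_ctr(dek, iv, plaintext)            # step 8, on the client


def read_file(hdfs, kms, path):
    iv, edek = hdfs.open(path)
    dek = kms.decrypt_encrypted_key(hdfs.zone_key, iv, edek)  # steps 3-4
    return toy_ctr(dek, iv, hdfs.blocks[path])                 # step 5, on the client


kms = ToyKMS()
kms.create_key("myKey")
hdfs = ToyHDFS(kms, "myKey")
write_file(hdfs, kms, "/zone/cards.csv", b"4111111111111111")
print(read_file(hdfs, kms, "/zone/cards.csv"))  # b'4111111111111111'
```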
  • 45. HDFS Encryption Implementation and Usage
  • 46. Enabling HDFS Encryption on a Cluster
    • A recent version of libcrypto.so is needed on HDFS and MapReduce client hosts
    • To check, run: hadoop checknative
      – Expected output includes: openssl: true /usr/lib64/libcrypto.so
    • To install: yum install openssl openssl-devel
      – The openssl package installs the library; openssl-devel creates the libcrypto.so symlink (you can also create it manually)
    • OpenSSL provides AES-NI integration for Intel hardware
  • 47. Enabling HDFS Encryption on a Cluster Using Cloudera Manager
    1) Add the KMS service: add the Java KeyStore KMS service on a host
    2) Enable Java KeyStore KMS for the HDFS service:
      • HDFS service > Configuration tab
      • Scope > HDFS (Service-Wide)
      • Category > All
      • KMS Service property: select the Java KeyStore KMS radio button
      • Save Changes
    3) Restart the cluster
    4) Deploy Client Configuration
  • 48. Creating Encryption Zones
    • Use the hadoop key and hdfs crypto command-line tools to create encryption keys and set up new encryption zones.
        # Create an encryption key for your zone as the application user that will be using the key
        $ hadoop key create myKey
        # Create a new empty directory and make it an encryption zone
        $ hadoop fs -mkdir /zone
        $ hdfs crypto -createZone -keyName myKey -path /zone
        # List the encryption zones
        $ hdfs crypto -listZones
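For repeatable setup, the same commands can be driven from a script. Below is a hedged sketch with a dry-run default that only prints the commands. The helper names are hypothetical. In practice the steps may need to run as different users: the key is created as the application user, while creating and listing zones requires HDFS superuser privileges.

```python
import shlex
import subprocess


def zone_setup_commands(key_name: str, zone_path: str) -> list:
    """The zone-creation steps, in order, as argument lists."""
    return [
        ["hadoop", "key", "create", key_name],
        ["hadoop", "fs", "-mkdir", zone_path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        ["hdfs", "crypto", "-listZones"],
    ]


def run_steps(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    check=True stops at the first failing step, e.g. if the directory is
    not empty or the key does not exist.
    """
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run_steps(zone_setup_commands("myKey", "/zone"))
```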
  • 49. Adding Files to an Encryption Zone
    • Remember that zones start empty: you cannot create a zone on a directory that already contains data. Copy the data in instead:
        $ hadoop distcp /user/dir /user/enczone
    • By default, distcp compares checksums provided by the filesystem to verify that data was successfully copied to the destination.
    • When copying between an unencrypted and an encrypted location, the filesystem checksums will not match, because the underlying block data is different.
    • Use the -skipcrccheck and -update flags to avoid verifying checksums.
    • Also use the distcp flags to preserve all attributes (-prbugpcaxt).
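The reason the checksums differ can be shown in a few lines. The same logical content produces different bytes once encrypted, so any checksum computed over the stored bytes changes. In this sketch:

- CRC32 stands in for the HDFS file checksum.
- A toy XOR keystream stands in for AES-CTR; it is not secure.
- `distcp_into_zone` is a hypothetical helper that assembles the flags listed for this step.

```python
import hashlib
import zlib


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """TOY stand-in for AES-CTR: XOR with a SHA-256-derived keystream (not secure).

    Applying it twice with the same key returns the original bytes.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hashlib.sha256(key + start.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


def distcp_into_zone(src: str, dst: str) -> list:
    """distcp arguments for copying into an encryption zone."""
    return ["hadoop", "distcp", "-prbugpcaxt", "-skipcrccheck", "-update", src, dst]


plaintext = b"id,card_number\n1,4111111111111111\n2,5500000000000004\n"
stored_bytes = toy_encrypt(b"\x01" * 32, plaintext)  # what the zone's blocks would hold

# Same logical file, different bytes on disk, so checksums over the bytes disagree.
print(zlib.crc32(plaintext), zlib.crc32(stored_bytes))
print(" ".join(distcp_into_zone("/user/dir", "/user/enczone")))
# hadoop distcp -prbugpcaxt -skipcrccheck -update /user/dir /user/enczone
```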
  • 50. Cloudera Navigator: The only integrated data management and governance platform for Hadoop
    • Unified governance foundation: unified auditing, comprehensive lineage, unified metadata, universal policies
    • Self-Service Discovery & Analytics (search, define, analyze, profile): effortlessly find and trust the data that matters most
    • Compliance-Ready Governance & Protection (audit, track, encrypt, manage keys): track, understand, and protect access to sensitive data
    • Active Data Optimization (report, optimize, migrate, maintain models): configure Hadoop to boost user productivity
    • Hadoop-Scale Data Lifecycle Management (classify, steward, backup, retain): maximize cluster performance at Hadoop scale with ease
  • 51. MasterCard | Cloudera: The First PCI-Certified Hadoop Platform
    • Challenge: All applications, databases, or file systems that have the potential to handle personal account-related data must undergo full PCI certification.
    • Solution: MasterCard’s Cloudera environment fully conforms to the PCI DSS v2.0 security standards, so it can host PCI datasets and potentially integrate with other internal systems.
    • “Data privacy and protection is a top priority for MasterCard. As we maximize the most advanced technologies from partners and vendors, they must meet the rigorous security standards we’ve set. With Cloudera’s commitment to the same standards, we now have additional options in how we manage our data center.” (Gary VonderHaar, Chief Technology Officer, Architecture, MasterCard)