© 2015 MapR Technologies
Deploying a Governed Data Lake
Welcome
• Event will be recorded
• Ask your questions in the Q&A Panel in the lower right-hand corner of your screen
• Tweet us @mapr during the event
Key Points
• The data lake is becoming a “real-time” shared service that provides data to the business to support data science and big data analytics needs
• As the data lake becomes a trusted source of data to drive big data analytics, security and data governance have to be addressed
• Security and data governance policies need to be implemented in a way that still enables self-service and quick time to value, rather than creating 3-6 month delays
Deliver Data Discovery Agility with a Governed “Data Layer”
• Adhere to security, compliance and data governance policies
• Catalog data assets at scale, with secure provisioning to the business
• Find and understand best-suited and most trusted data
The danger of the data lake becoming a flea market
Botond Horvath / Shutterstock.com
• Big Data Architect: can’t create and maintain an inventory fast enough
• Data Engineer/Data Scientist/Business Analyst: can’t explore everything to find the best item
• CDO/Data Steward: can’t tell what’s what and what can be trusted
Imagine shopping on Amazon.com
[Diagram labels: Governance; Inventory; Find and Understand; Provision]
Governed data lake is like Amazon.com for data in Hadoop
[Diagram labels: Governance; Inventory; Find and Understand; Provision]
The Governed Data Lake on Apache Hadoop
• Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data
• Analytics: search; schema-less data exploration; BI, reporting; ad-hoc integrated analytics
• Operational Apps: recommendation; fraud detection; logistics
• Platform: MapR-DB, MapR-FS; MapR Data Platform; Distribution including Apache Hadoop
• Data Inventory: find, understand and govern
The Governed Data Lake
Define, Ingest, Inventory, Explore, Provision, Wrangle/Model/Visualize
• Critical data elements
• Sensitive data elements
• Security and data governance policies
• Load
• Profile
• Automatic tagging
• Discover metadata and generate tags
• Discover data lineage
• Manage tags
• Browse/search inventory
• Inspect data quality
• Tag and annotate
• Bookmark
• Copy
• Authorized view
Governed data lake as a shared service
Data protection, authentication, authorization, auditing
Data Governance and Data Discovery Agility: can you achieve both?
Find, understand and govern data in Hadoop
Waterline Data is like Amazon.com for data in Hadoop
[Diagram labels: Governance; Inventory; Find and Understand; Provision]
Inventory
Find and Understand
Provision
Future: Generate Drill Views
Governance
The Governed Data Lake on Apache Hadoop with MapR
• Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data
• Analytics: search; schema-less data exploration; BI, reporting; ad-hoc integrated analytics
• Operational Apps: recommendation; fraud detection; logistics
• Platform: MapR-DB, MapR-FS; MapR Data Platform; Distribution including Apache Hadoop
• Data Inventory: find, understand and govern
Separate Distinct Data Sets via MapR Volumes
Volumes dramatically simplify management:
• Replication factor
• Scheduled mirroring
• Scheduled snapshots
• Data placement control
• User access and tracking
• Administrative permissions
/projects
    /tahoe
    /yosemite
/user
    /msmith
    /bjohnson
MapR Trust Model (Product Security)
1. Flexible Authentication
• Wire-level authentication for all services in the cluster
• NSA-level cryptographic algorithms
• Integration with LDAP, Active Directory and other third-party directory services
• Kerberos or username/password authentication
2. Granular Authorization
• Access Control Expressions
• Protect files, tables, column families, columns, and management objects
• Extend to role-based access control (RBAC) with custom role functions
• Drill Views
3. Robust Auditing
• All events recorded immediately in JSON log files
• Includes data access and administrative actions
• Ad-hoc queries and custom reports on audit logs via SQL and standard BI tools
4. Ubiquitous Data Protection
• Encryption for data in motion: within a cluster, between clusters, between client and cluster
• Encryption for data at rest: LUKS, self-encrypting disk, partners
MapR Comprehensive Auditing
Serving Security Analysts…
Security: Monitoring, Incident Response
• Who touched customer records outside of business hours?
• What actions did users take in the days before leaving the company?
• What operations were performed without following change control?
• Are users accessing sensitive files from protected/secured source IPs?
• Why do my reports look different, despite sourcing from the same underlying data?
MapR Comprehensive Auditing (cont.)
…And Data Scientists Too
Predictive Analytics
• Which data is used most frequently? Implication: High Value; Share More Broadly
• Which data is least commonly used? Implication: Low Value; Candidate for Purge
• Which data should be used more? Implication: Underutilized; Increase Awareness
• What administrative actions are most commonly performed? Implication: Candidate for automation
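The usage questions above come down to counting audit events per path. The following is a minimal Python sketch, assuming the audit records have already been parsed into dictionaries and that each one carries a `srcPath` field, as in the sample record on the "MapR Audits – Key Features" slide. The records, the `known_paths` list and the function name `rank_usage` are hypothetical and only serve as illustration.

```python
from collections import Counter

# Hypothetical parsed audit records. Only the field names follow the sample
# record in this deck; the paths reuse the layout from the Volumes slide.
records = [
    {"operation": "GETATTR", "user": "msmith", "srcPath": "/projects/tahoe"},
    {"operation": "GETATTR", "user": "bjohnson", "srcPath": "/projects/tahoe"},
    {"operation": "GETATTR", "user": "msmith", "srcPath": "/projects/tahoe"},
    {"operation": "GETATTR", "user": "msmith", "srcPath": "/projects/yosemite"},
]

# Hypothetical inventory of paths, so that data nobody touched still shows up.
known_paths = ["/projects/tahoe", "/projects/yosemite", "/user/msmith"]


def rank_usage(records, known_paths):
    """Return (path, event_count) pairs, most frequently used first."""
    counts = Counter(r["srcPath"] for r in records if "srcPath" in r)
    # Sort by descending count, then by path name to keep the order stable.
    ranked = sorted(known_paths, key=lambda path: (-counts[path], path))
    return [(path, counts[path]) for path in ranked]


for path, hits in rank_usage(records, known_paths):
    print(f"{hits:3d}  {path}")
```

Paths at the top of the ranking are candidates for wider sharing, and paths with zero events are candidates for a purge or an awareness review. Which threshold counts as "low value" is a judgment call that the sketch leaves open.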
MapR Audits – Key Features
Data Access
• Files
• MapR-DB Tables
Cluster Operations
• Administrative Operations
• Maprcli commands
Authentication Requests
Secure
High Performance
Flexible
• Retention Period
• Maxsize
• Coalesce Interval
JSON Format
{"timestamp":"{$date=2015-06-01T05:24:58.231Z}","operation":"GETATTR",
 "user":"root","uid":"0","ipAddress":"10.10.x.x","nfsServer":"10.10.x.x",
 "srcPath":"/dbtest.0/","srcFid":"2147.16.2","VolumeName":"mktg_files",
 "volumeId":"mktg_files","status":"0"}
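A record in this shape can be read with a standard JSON parser. The sketch below, in Python, parses the sample record as printed on the slide and answers the earlier question about access outside business hours. It assumes the timestamp is UTC (the trailing "Z") and compares it against a 09:00 to 17:00 window without any time zone conversion; the function names and the window are hypothetical choices, not part of the MapR tooling.

```python
import json
import re
from datetime import datetime, time

# The sample record from the slide, with the slide's line breaks removed.
SAMPLE = (
    '{"timestamp":"{$date=2015-06-01T05:24:58.231Z}","operation":"GETATTR",'
    '"user":"root","uid":"0","ipAddress":"10.10.x.x","nfsServer":"10.10.x.x",'
    '"srcPath":"/dbtest.0/","srcFid":"2147.16.2","VolumeName":"mktg_files",'
    '"volumeId":"mktg_files","status":"0"}'
)

# Matches the ISO-style date and time embedded in the timestamp field.
ISO_STAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?")


def parse_record(line):
    """Parse one audit line and add a datetime under the key 'when'."""
    record = json.loads(line)
    match = ISO_STAMP.search(str(record["timestamp"]))
    if match is None:
        raise ValueError("no timestamp found in audit record")
    stamp = match.group(0)
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in stamp else "%Y-%m-%dT%H:%M:%S"
    record["when"] = datetime.strptime(stamp, fmt)
    return record


def outside_business_hours(record, start=time(9, 0), end=time(17, 0)):
    """True if the event time falls outside the [start, end) window."""
    return not (start <= record["when"].time() < end)


record = parse_record(SAMPLE)
print(record["user"], record["operation"], record["when"],
      outside_business_hours(record))
```

Extracting the timestamp with a regular expression is a deliberate shortcut: it works for the string form shown on the slide and would also tolerate a nested form of the same field, at the cost of ignoring the time zone marker.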
Access Control that Scales
PAM Authentication + User Impersonation
Fine-grained row- and column-level access control with Drill Views; no centralized security repository required
[Diagram: Users; Drill View 1 and Drill View 2; data sources Files, HBase and Hive]
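The idea of layering views to restrict columns can be illustrated without Drill. The Python sketch below models each view as a function over rows, using the sample data from the "Ownership Chaining" slide: one view masks the card number, a second view built on the first drops the column entirely. This is only an analogy for how a view narrows what a reader sees. It is not Drill syntax, and the function names are hypothetical.

```python
# Sample rows from the "Ownership Chaining" slide (raw file /raw/cards.csv).
RAW_ROWS = [
    {"name": "Dave", "city": "San Jose", "state": "CA", "card": "1374-7914-3865-4817"},
    {"name": "John", "city": "Boulder", "state": "CO", "card": "1374-9735-1794-9711"},
]


def mask_card(card):
    """Keep the first group of digits and replace the other groups with 1111."""
    first, *rest = card.split("-")
    return "-".join([first] + ["1111"] * len(rest))


def v_scientist(rows):
    """Column-level masking: same columns as the raw file, card number masked."""
    return [{**row, "card": mask_card(row["card"])} for row in rows]


def v_analyst(rows):
    """Built on v_scientist: the card column is removed altogether."""
    return [
        {key: value for key, value in row.items() if key != "card"}
        for row in v_scientist(rows)
    ]


def rows_for_state(rows, state):
    """Row-level restriction: only rows for one state (hypothetical example)."""
    return [row for row in rows if row["state"] == state]


print(v_scientist(RAW_ROWS))
print(v_analyst(RAW_ROWS))
print(rows_for_state(v_analyst(RAW_ROWS), "CA"))
```

Because each view is defined in terms of the one beneath it, a reader of the outer view never needs direct access to the raw rows.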
Ownership Chaining
Combine Self Service Exploration with Data Governance

Raw File (/raw/cards.csv): John (Owner)
Name  City      State  Credit Card #
Dave  San Jose  CA     1374-7914-3865-4817
John  Boulder   CO     1374-9735-1794-9711

Data Scientist (/views/V_Scientist): John (Owner), Jane (Read)
Name  City      State  Credit Card #
Dave  San Jose  CA     1374-1111-1111-1111
John  Boulder   CO     1374-1111-1111-1111

Analyst (/views/V_Analyst): Jane (Owner), Jack (Read)
Name  City      State
Dave  San Jose  CA
John  Boulder   CO

Ownership chain: raw file, V_Scientist, V_Analyst
Access path when Jack queries the view V_Analyst:
• Does Jack have access to V_Analyst? -> YES
• Who is the owner of V_Analyst? -> Jane
• Drill accesses V_Analyst as Jane (impersonation hop 1)
• Does Jane have access to V_Scientist? -> YES
• Who is the owner of V_Scientist? -> John
• Drill accesses V_Scientist as John (impersonation hop 2)
• Does John have permissions on the raw file? -> YES
• Who is the owner of the raw file? -> John
• Drill accesses the source file as John (no impersonation here)
*Ownership chain length (# hops) is configurable
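The access path above can be traced step by step in code. The Python sketch below is a simplified model of that walk, not Drill's implementation: each resource has an owner, a set of readers and an optional underlying source, and every change of acting user counts as one impersonation hop. The `Resource` class, the `resolve` function and the default of three hops are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    name: str
    owner: str
    readers: set = field(default_factory=set)
    source: Optional["Resource"] = None  # what this view is defined over


def can_read(user, resource):
    """Owners and explicitly listed readers may read a resource."""
    return user == resource.owner or user in resource.readers


def resolve(user, resource, max_hops=3):
    """Walk from the queried view down to the raw file.

    Returns the list of (resource name, acting user) pairs and the number
    of impersonation hops. Raises PermissionError on a failed check or
    when the chain needs more hops than max_hops (an assumed default).
    """
    path, hops, acting = [], 0, user
    while resource is not None:
        if not can_read(acting, resource):
            raise PermissionError(f"{acting} cannot read {resource.name}")
        if resource.owner != acting:
            hops += 1  # switch to the owner's identity for the next level
            if hops > max_hops:
                raise PermissionError(f"ownership chain exceeds {max_hops} hops")
            acting = resource.owner
        path.append((resource.name, acting))
        resource = resource.source
    return path, hops


# The three objects from the slide.
raw = Resource("/raw/cards.csv", owner="John")
v_scientist = Resource("/views/V_Scientist", owner="John", readers={"Jane"}, source=raw)
v_analyst = Resource("/views/V_Analyst", owner="Jane", readers={"Jack"}, source=v_scientist)

path, hops = resolve("Jack", v_analyst)
print(path, hops)
```

With the slide's setup, the walk needs two hops (Jack to Jane, Jane to John) and none for the raw file, since John already owns it. Lowering `max_hops` to 1 makes the same query fail, which mirrors the note that the chain length is configurable.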
Find, Understand and Govern Data in Hadoop At Scale and in Real-Time
• CDO/Data Steward: Discover and protect sensitive data, audit and authorize access to the data lake, discover data lineage, and provide data stewardship
• Big Data Architect: Automate cataloging of data assets at scale, with secure provisioning to business users
• Data Engineer/Data Scientist/Business Analyst: Find and understand best-suited and most trusted data without having to explore every file manually
Learn More
www.waterlinedata.com
• Watch the solution video
• Read analyst papers
• Download the free Waterline Data / MapR sandbox
• Request a demo
• Download and evaluate the product
www.mapr.com
• Get free On-Demand Training for Hadoop
• Download the free Waterline Data / MapR sandbox
