Operationalizing your Data Lake: Get Ready for Advanced Analytics
1. Operationalizing your Data Lake: Get Ready for Advanced Analytics
October 22nd, 2017
Parth Patel | Big Data Solutions Engineer
ppatel1@zaloni.com
2.
• Industry-leading enterprise data lake management, governance and self-service platform
• Expert data lake professional services (Design, Implementation, Workshops, Training)
• Solutions to simplify implementation and reduce business risk
Enabling the data-powered enterprise
Zaloni Confidential and Proprietary
3.
Data lakes are central to the modern data architecture
• Increased agility
• New insights
• Improved scalability
4.
Data architecture modernization
Traditional: Sources → ETL → EDW → Data Discovery, Analytics/BI
Modern: Various sources (including streaming and unstructured data) → Data Lake (Derived (Transformed) data, Discovery Sandbox) → EDW, Data Science, Data Discovery, Analytics/BI
5.
Zaloni Big Data Maturity Model
Value realized increases from the first stage (Ignore) to the last (Optimize). Each stage lists its descriptor, its share of the market today, and its characteristics and business impact.
Ignore: Data Warehouse (66% of market today)
• Emphasis on structured data
• Limited ability to leverage data at scale
• Business emphasis on retrospective reporting and analysis
• Strong governance and security policies
• Slow to accommodate business changes
Store: Data Swamp (22% of market today)
• Hadoop on premises or in the cloud
• Limited visibility and usability of data
• Limited corporate oversight and governance
• Sandbox or dev environments
• Ad hoc and incremental growth of big data applications
• Ad hoc and exploratory insights for individual use cases
Manage: Managed Data Lake (10% of market today)
• Acquire useful data from across the enterprise
• Improved visibility and understanding via managed ingestion of data and metadata
• Ensure security and privacy of sensitive data
• Operationalize data at scale
• Leverage enterprise governance and security policies
• Scalable production data lake for new and improved business insights
Automate: Responsive Data Lake (2% of market today)
• Self-service ingestion and provisioning
• 360-degree view of customer, product, etc.
• Enterprise data discovery
• Operationalize analytical models into the business fabric
• Enables immediate data impact on business operations
Optimize: Self-Organizing Data Lake (0% of market today)
• Self-improving data lake via machine learning algorithms
• True democratization of big data and analytics
• Intelligent data remediation and curation
• Recommended data security and governance policies
• Lights-out business operations optimized for business success
6.
Managing the Data Supply Chain from Source to Consumer
• Data Lake Management Platform: a software solution for data lake management that enables enterprise-wide scalability and provides end-to-end capabilities
PRODUCERS: file data, streaming, relational, on-premises
Data Lake Management Platform (Enable, Govern, Engage):
• Batch ingestion
• Streaming ingestion
• Auto discovery
• Data quality
• Data privacy and security
• Data lifecycle management
• Catalog
• Metadata management
• Operationalize transformations
• Self-service data preparation
CONSUMERS (via self-service data): business analysts, researchers, data scientists, applications
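The data quality capability on this slide can be pictured as rule checks applied to records at ingestion, with failing records set aside for review rather than loaded. The following is a minimal sketch under that assumption; the `Rule` type, the rule helpers, and the field names are hypothetical illustrations, not the platform's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named data quality rule: a predicate every valid record must satisfy."""
    name: str
    check: Callable[[Record], bool]


def not_null(field: str) -> Rule:
    # Hypothetical rule: the field must be present and non-empty.
    return Rule(f"{field}_not_null", lambda r: r.get(field) not in (None, ""))


def in_range(field: str, low: float, high: float) -> Rule:
    # Hypothetical rule: the field must be numeric and within [low, high].
    return Rule(
        f"{field}_in_range",
        lambda r: isinstance(r.get(field), (int, float)) and low <= r[field] <= high,
    )


def apply_rules(
    records: List[Record], rules: List[Rule]
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into those passing every rule and those failing any rule.

    Failing records keep the names of the rules they broke, so they can be
    quarantined for a data steward instead of being loaded downstream.
    """
    passed: List[Record] = []
    failed: List[Tuple[Record, List[str]]] = []
    for record in records:
        broken = [rule.name for rule in rules if not rule.check(record)]
        if broken:
            failed.append((record, broken))
        else:
            passed.append(record)
    return passed, failed


# Example: two hypothetical rules applied to three incoming records.
rules = [not_null("customer_id"), in_range("age", 0, 120)]
passed, failed = apply_rules(
    [
        {"customer_id": "C1", "age": 34},
        {"customer_id": "", "age": 40},
        {"customer_id": "C3", "age": 150},
    ],
    rules,
)
```

Keeping the broken rule names alongside each rejected record is a design choice that makes the quarantine list actionable for data stewards.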
7.
Data Lake Reference Architecture
Sources: sensors (or other time series data); relational data stores (OLTP/ODS/DW); logs (or other unstructured data); social and shared data
Data Lake zones:
• Transient Landing Zone: temporary store of source data. Consumers are IT and data stewards. Implemented in highly regulated industries.
• Raw Zone: original source data ready for consumption. Consumers are ETL developers, data stewards, and some data scientists. Single source of truth with history.
• Refined Zone: data required for line-of-business (LOB) specific views, transformed from existing certified data. Consumers are anyone with appropriate role-based access.
• Trusted Zone: standardized on corporate governance/quality policies. Consumers are anyone with appropriate role-based access. Single version of truth.
• Sandbox
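Each zone in this architecture names its intended consumers, which maps naturally onto a role-based read check. The sketch below assumes hypothetical role names taken loosely from the zone descriptions, and models "anyone with appropriate role-based access" as per-dataset role grants. The Sandbox is omitted because no consumers are listed for it. It is an illustration, not a product configuration.

```python
from enum import Enum
from typing import AbstractSet, Dict, FrozenSet, Optional


class Zone(Enum):
    TRANSIENT_LANDING = "transient_landing"
    RAW = "raw"
    REFINED = "refined"
    TRUSTED = "trusted"


# Hypothetical role names based on the consumers listed for each zone.
# None means the zone has no fixed reader list: access depends on the
# roles granted on the individual dataset.
ZONE_READERS: Dict[Zone, Optional[FrozenSet[str]]] = {
    Zone.TRANSIENT_LANDING: frozenset({"it", "data_steward"}),
    # The slide says "some" data scientists; a real policy would narrow this.
    Zone.RAW: frozenset({"etl_developer", "data_steward", "data_scientist"}),
    Zone.REFINED: None,
    Zone.TRUSTED: None,
}


def can_read(role: str, zone: Zone, dataset_grants: AbstractSet[str] = frozenset()) -> bool:
    """Return True if `role` may read data stored in `zone`."""
    readers = ZONE_READERS[zone]
    if readers is None:
        # Trusted and refined zones: defer to the roles granted on the dataset.
        return role in dataset_grants
    return role in readers
```

For example, `can_read("business_analyst", Zone.REFINED, {"business_analyst"})` is allowed, while the same role is denied in the raw zone. In a Hadoop deployment this policy would normally live in the cluster's access-control tooling rather than in application code.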
8.
Machine learning for data lake implementations
Integrate Data Silos
Sources: Loyalty, Customer Service, Transactions, Marketing, 3rd Party
• Easily integrate data silos
• Probabilistic data matching and record linkage
• Automatically classify, encrypt/mask PII/sensitive data for regulatory compliance
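The slide lists probabilistic data matching and record linkage without naming a method. One widely used approach is Fellegi-Sunter scoring, sketched below as an assumption rather than as the method behind this slide. Each field comparison contributes a log-likelihood weight, and a pair of records from two silos is treated as a match when the total passes a threshold. The m/u probabilities, field names, and threshold are illustrative guesses, not values fitted to real data.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldModel:
    m: float  # P(field agrees | records are the same entity)
    u: float  # P(field agrees | records are different entities)

    def weight(self, agrees: bool) -> float:
        """Fellegi-Sunter log2 likelihood-ratio weight for one field comparison."""
        if agrees:
            return math.log2(self.m / self.u)
        return math.log2((1 - self.m) / (1 - self.u))


# Illustrative m/u values only; in practice they are estimated from the data.
MODEL: Dict[str, FieldModel] = {
    "last_name": FieldModel(m=0.95, u=0.01),
    "birth_date": FieldModel(m=0.97, u=0.003),
    "zip": FieldModel(m=0.90, u=0.05),
}


def normalize(value: str) -> str:
    # Lowercase and collapse whitespace so formatting noise does not break agreement.
    return " ".join(value.lower().split())


def match_score(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Sum field weights; a field missing on either side contributes nothing."""
    total = 0.0
    for field, model in MODEL.items():
        left, right = normalize(a.get(field, "")), normalize(b.get(field, ""))
        if not left or not right:
            continue
        total += model.weight(left == right)
    return total


def link(
    left: List[Dict[str, str]], right: List[Dict[str, str]], threshold: float = 8.0
) -> List[Tuple[int, int, float]]:
    """Return (left index, right index, score) for cross-silo pairs at or above threshold."""
    pairs = []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            score = match_score(a, b)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

Comparing every pair is quadratic in the number of records; larger implementations usually add a blocking step (for example, only comparing records that share a ZIP code prefix) before scoring.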
9.
Increasing data lake adoption through self-service
Self-Service Data Preparation
• Extend data lake beyond Hadoop
• Catalog traditional sources
• Ingest datasets without IT
• Prepare & provision data to your tool of choice
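Self-service data preparation is often modeled as a "recipe": an ordered list of steps a user records once and replays before provisioning the result to their tool of choice. The sketch below assumes rows arrive as dictionaries and that the target tool accepts CSV. The step helpers (`rename`, `keep_if`, `mask`) and column names are hypothetical, and the masking step is a simple character mask, not encryption.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def rename(old: str, new: str) -> Step:
    """Step: rename a column in every row (returns new row dicts)."""
    def step(rows: List[Row]) -> List[Row]:
        return [{(new if k == old else k): v for k, v in row.items()} for row in rows]
    return step


def keep_if(predicate: Callable[[Row], bool]) -> Step:
    """Step: keep only rows matching the predicate."""
    def step(rows: List[Row]) -> List[Row]:
        return [row for row in rows if predicate(row)]
    return step


def mask(column: str, visible: int = 4) -> Step:
    """Step: replace all but the last `visible` characters of a column with '*'."""
    def step(rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            row = dict(row)  # copy so the caller's data is not modified
            if column in row:
                value = str(row[column])
                shown = value[-visible:] if visible > 0 else ""
                row[column] = "*" * (len(value) - len(shown)) + shown
            out.append(row)
        return out
    return step


def run_recipe(rows: Iterable[Row], steps: List[Step]) -> List[Row]:
    """Apply each recorded step in order."""
    result = list(rows)
    for step in steps:
        result = step(result)
    return result


def to_csv(rows: List[Row]) -> str:
    """Provision prepared rows as CSV text for a downstream tool."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because each step returns new rows instead of mutating its input, the same recipe can be replayed safely against fresh extracts from the lake.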