Pradeep Varadan, Verizon's Wireline OSS Data Science Lead, and Scott Gidley, Zaloni's VP of Product Management, discuss the benefits of augmenting your data warehouse with a data lake in this webinar presentation.
2. • Award-winning provider of enterprise data lake management solutions:
Integrated data lake management platform
Self-service catalog and data preparation
• Data Lake Design and Implementation Services:
POC, Pilot, Production, Operations, Training
• Data Science Professional Services
3. 3 Zaloni Proprietary
About our speakers
Pradeep Varadan, Verizon Wireline, OSS Data Science Leader
Varadan is a data scientist and enterprise architect who specializes in data challenges within
telecommunications. He is tasked with providing a competitive edge focused on utilizing data
analytics to drive effective decision-making. He is skilled in creating systems that can be used to
understand and make better decisions involving rapid technology shifts, customer lifestyle and
behavior trends and relevant changes that impact the Verizon Network.
Scott Gidley, Zaloni, VP Product Management
Gidley is responsible for the strategy and roadmap of existing and future products within the Zaloni
portfolio. He is a nearly 20-year veteran of the data management software and services market.
Prior to joining Zaloni, he served as senior director of product management at SAS and was
previously CTO and cofounder of DataFlux Corporation.
4. Zaloni Confidential and Proprietary - Provided under NDA
Current state of a corporate data flow architecture
[Diagram labels: Data Generators, Machines, Data Channels, Data Stores, Repositories, Warehouses, Marts, BI/Reporting]
5. Key Challenges
Business Challenges:
• Increased processing time/reduced response
• Lack of data lineage/lack of visibility
• Constant CapEx for hardware upgrade
• Lack of access to history
IT Challenges:
• Multiple data transfers
• Multiple technology platforms with data copies
• Constant performance tuning for CPU
• Manual data offload for space management
6. [Diagram: resource consumption along the warehouse pipeline. Recoverable labels: Sources, ETL, Staging, Warehouse, Report Mart, ELT/Reporting/Mining, Data Discovery, Analytics, BI]
7. Typical utilization of RDBMS resources
We expend almost all CPU for low business value ETL
[Chart: workloads plotted by business value against CPU; size indicates frequency of use]
Workloads shown:
• ETL to Stage
• ETL to Warehouse
• ETL to Reporting
• Auditing (Landing tables query)
• Data Mining (Staging query)
• Ad-hoc Analysis (Warehouse query)
• Reporting (Presentation table query)
8. ~80% of system capacity used for batch processing (ELT)
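To show how a capacity figure of this kind could be derived, here is a minimal sketch that computes the batch share of CPU from a workload log. The category names follow the workloads on the utilization slide, but the log itself and every CPU-hour value are invented placeholders, not measurements from the presentation.

```python
from collections import defaultdict

# Hypothetical workload log: (workload category, CPU-hours consumed).
# The numbers are made up purely to exercise the calculation.
WORKLOAD_LOG = [
    ("ETL to Stage", 30.0),
    ("ETL to Warehouse", 25.0),
    ("ETL to Reporting", 20.0),
    ("Reporting", 10.0),
    ("Ad-hoc Analysis", 10.0),
    ("Data Mining", 5.0),
]


def cpu_share_by_category(log):
    """Return each category's fraction of total CPU-hours."""
    totals = defaultdict(float)
    for category, cpu_hours in log:
        totals[category] += cpu_hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: hours / grand_total for category, hours in totals.items()}


def batch_share(log, batch_prefix="ETL"):
    """Sum the shares of all categories whose name starts with batch_prefix."""
    shares = cpu_share_by_category(log)
    return sum(share for category, share in shares.items()
               if category.startswith(batch_prefix))


if __name__ == "__main__":
    print(f"Batch (ETL) share of CPU: {batch_share(WORKLOAD_LOG):.0%}")
```

Running the same calculation against real query logs from the warehouse would show which workloads are candidates for offloading.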
9. Reduce cost of ELT/ETL by offloading to Hadoop
10. The future of enterprise data flow
[Diagram: Legacy, Modern and Future data flows compared. Recoverable labels: Transactional Systems (T-Systems), Machines, Structured Data, Machine logs/IoT (structured/unstructured), ETL, Data Lake, EDW, Sandbox, Data Marts, BI/Reporting/Analytics, Operational Dashboards/EDA/Mining/Reporting/Analytics]
12. Data lake challenges
Managing:
• Ingestion
• Visibility and Quality
• Privacy and Compliance
Delivering:
• Timeliness
• Reliance on IT
• Reusability
Building:
• Rate of Change
• Skills Gap
• Complexity
13. Data Lake 360°: A holistic approach to actionable big data
1. Enable the lake
2. Govern the data
3. Engage the business
• Foster a data-driven business through self-service data discovery and preparation
• Safeguard sensitive data and enable regulatory compliance
• Improve data visibility, reliability and quality to reduce time-to-insight
• Leverage the full power of a scale-out architecture with an actionable, scalable data lake
14. Enable the lake
• Managed Ingestion
Ability to ingest vast amounts of data
Ability to handle a wide variety of formats (streaming, files, custom) and sources
Build in repeatability through automation to pick up incoming data and apply pre-defined processing
• Metadata Management
Capture and manage operational, technical and business metadata
Provides visibility and reliability – key to finding data in the lake
Reduced time to insight for analytics
File and record level watermarking provides data lineage, enables audit and traceability
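As a rough illustration of managed ingestion with metadata capture, the sketch below copies an incoming file into a landing directory and writes a metadata sidecar containing operational and technical metadata plus a file-level watermark. The function name `ingest_file`, the metadata fields, the sidecar file and the use of a SHA-256 digest as the watermark are all assumptions made for this example; the slides do not describe how the Zaloni platform implements these features.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest_file(source: Path, landing_dir: Path, source_system: str) -> dict:
    """Copy one file into the landing directory and write a metadata sidecar."""
    landing_dir.mkdir(parents=True, exist_ok=True)
    target = landing_dir / source.name
    shutil.copy2(source, target)

    data = target.read_bytes()
    metadata = {
        # Operational metadata: when and from where the file arrived.
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source_system": source_system,
        # Technical metadata: basic physical properties of the file.
        "file_name": target.name,
        "size_bytes": len(data),
        # File-level watermark (assumed here to be a content hash) that
        # later processing steps can carry along to trace lineage.
        "watermark": hashlib.sha256(data).hexdigest(),
    }
    sidecar = landing_dir / (target.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return metadata
```

Calling `ingest_file` from a scheduler or a directory watcher is one way to get the repeatability the slide mentions, since every incoming file then receives the same pre-defined treatment.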
15. Govern the data
• Data Lineage
See how data moves and how it is consumed in the data lake.
Safeguard data and reduce risk, always knowing where data has come from, where it is, and how it is being used.
• Data Quality
Rules-based data validation
Integration with the Managed Data Pipeline
Stats and metrics for reporting and actions
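Rules-based validation with stats for reporting can be sketched in a few lines. The example below is a generic illustration, not the platform's rule engine: the `Rule` class, the two sample rules and the customer record layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes


# Hypothetical rules for a hypothetical customer record layout.
RULES = [
    Rule("customer_id_present", lambda r: bool(r.get("customer_id"))),
    Rule("age_in_range",
         lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
]


def validate(records, rules):
    """Split records into passed/failed and count failures per rule."""
    passed, failed = [], []
    failures_per_rule = {rule.name: 0 for rule in rules}
    for record in records:
        broken = [rule.name for rule in rules if not rule.check(record)]
        for name in broken:
            failures_per_rule[name] += 1
        (failed if broken else passed).append(record)
    stats = {
        "total": len(passed) + len(failed),
        "passed": len(passed),
        "failed": len(failed),
        "failures_per_rule": failures_per_rule,
    }
    return passed, failed, stats
```

The `stats` dictionary is the part that feeds reporting and actions: a pipeline step could, for instance, refuse to promote a batch when the failed count crosses a threshold.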
16. Govern the data
• Data Security and Privacy
Differing permissions require enhanced data security
Mask or tokenize data before it is published in the lake for consumption
Policy-based security
• Data lifecycle management across tiered storage environments
Hot -> Warm -> Cold on an entity level based on policies/SLAs
Across on-premises and cloud environments
Provide data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments
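The Hot -> Warm -> Cold lifecycle can be pictured as a per-entity policy that maps data age to a storage tier. The sketch below assumes that age since last access is the policy input; the entity names, thresholds and the `plan_moves` helper are hypothetical and only illustrate the idea of policy-driven movement.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TierPolicy:
    """Age thresholds, in days since last access, for one entity."""
    warm_after_days: int
    cold_after_days: int


def assign_tier(last_accessed: date, today: date, policy: TierPolicy) -> str:
    """Return the tier an entity belongs in under the given policy."""
    age_days = (today - last_accessed).days
    if age_days >= policy.cold_after_days:
        return "cold"
    if age_days >= policy.warm_after_days:
        return "warm"
    return "hot"


# Hypothetical per-entity policies; a real deployment would derive these
# from the SLAs agreed for each entity.
POLICIES = {
    "call_detail_records": TierPolicy(warm_after_days=30, cold_after_days=365),
    "network_alarms": TierPolicy(warm_after_days=7, cold_after_days=90),
}


def plan_moves(catalog: dict, today: date) -> list:
    """List (entity, current_tier, target_tier) for entities that should move.

    catalog maps entity name -> (current_tier, last_accessed).
    """
    moves = []
    for entity, (current_tier, last_accessed) in catalog.items():
        target = assign_tier(last_accessed, today, POLICIES[entity])
        if target != current_tier:
            moves.append((entity, current_tier, target))
    return moves
```

A scheduler could run `plan_moves` periodically and hand the resulting list to whatever tool actually moves data between storage environments.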
17. Engage the business
• Data Catalog
See what data is available across your enterprise
Contribute valuable business information to improve search and usage
Use a shopping cart experience to create a sandbox for ad-hoc and exploratory analytics
• Self-service Data Preparation
Blend data in the lake without a costly IT project
Perform interactive data-driven transformations
Collaborate and share data assets and transformations with peers
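A toy model may help make the catalog workflow concrete: users search the catalog, enrich entries with business tags, then check out a cart of datasets as a sandbox. The `DataCatalog` class, its methods and the dataset names below are hypothetical and do not represent the product's API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: set = field(default_factory=set)  # business terms added by users


class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def add_tag(self, name: str, tag: str) -> None:
        """Let a business user contribute a search term for a dataset."""
        self._entries[name].tags.add(tag.lower())

    def search(self, term: str) -> list:
        """Match the term against names, descriptions and exact tags."""
        term = term.lower()
        return sorted(
            entry.name
            for entry in self._entries.values()
            if term in entry.name.lower()
            or term in entry.description.lower()
            or term in entry.tags
        )

    def checkout(self, cart: list) -> dict:
        """Turn a cart of dataset names into a sandbox manifest."""
        missing = [name for name in cart if name not in self._entries]
        if missing:
            raise KeyError(f"not in catalog: {missing}")
        return {"datasets": list(cart)}
```

In this sketch a tag contributed by one user immediately improves search for everyone, which is the point of letting the business add information to the catalog.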
18. Data lake reference architecture
Sources:
• Sensors (or other time series data)
• Relational Data Stores (OLTP/ODS/DW)
• Logs (or other unstructured data)
• Social and shared data
Data Lake zones:
Transient Landing Zone
• Temporary store of source data
• Consumers are IT, Data Stewards
• Implemented in highly regulated industries
Raw Zone
• Original source data ready for consumption
• Consumers are ETL developers, data stewards, some data scientists
• Single source of truth with history
Refined Zone
• Data required for LOB specific views - transformed from existing certified data
• Consumers are anyone with appropriate role-based access
Trusted Zone
• Standardized on corporate governance/quality policies
• Consumers are anyone with appropriate role-based access
• Single version of truth
Sandbox
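The consumer descriptions for each zone amount to an access policy, which can be sketched as a small lookup. The role identifiers, the `grants` argument and the `can_read` function are hypothetical; the slide says only which kinds of consumers use each zone, not how access is enforced.

```python
# Consumer roles per zone, based on the zone descriptions on the slide.
# "role_based" marks zones open to anyone holding an appropriate grant.
# The slide says "some data scientists" use the Raw Zone; this sketch
# simplifies that to the whole role. The Sandbox is omitted because the
# slide lists no consumers for it.
ZONE_CONSUMERS = {
    "transient_landing": {"it", "data_steward"},
    "raw": {"etl_developer", "data_steward", "data_scientist"},
    "refined": "role_based",
    "trusted": "role_based",
}


def can_read(zone: str, role: str, grants: frozenset = frozenset()) -> bool:
    """Decide whether a user may read from a zone.

    role   -- the user's job role (hypothetical identifiers)
    grants -- zones the user has been explicitly granted (hypothetical
              stand-in for "appropriate role-based access")
    """
    allowed = ZONE_CONSUMERS[zone]
    if allowed == "role_based":
        return zone in grants
    return role in allowed
```

For example, `can_read("trusted", "business_analyst", frozenset({"trusted"}))` returns `True`, while the same analyst is refused in the Transient Landing Zone.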
19. Data lake reference architecture with Zaloni
[Diagram: source systems feed the data lake zones, which are overseen by a data lake management and governance platform and feed a consumption zone.]
Sources: Sensors (or other time series data); Relational Data Stores (OLTP/ODS/DW); Logs (or other unstructured data); Social and shared data
Source System: File Data, DB Data, ETL Extracts, Streaming
Data Lake: Transient Landing Zone, Raw Zone, Refined Zone, Trusted Zone, Sandbox
Data Lake Management & Governance Platform: Metadata Management, Data Quality, Data Catalog, Security
Consumption Zone: EDW, Data Marts, Business Analysts, Researchers, Data Scientists
Other labels: APIs
20. Four great reasons to augment with a data lake
• Save millions in storage costs
• Significantly speed up processing
• Maximize the data warehouse for BI
• Extract more value from all of your data
21. Centralized data, decentralized access
[Diagram: business users and the IT team both work from a central data lake.]
Business Users: Business Analyst, Business Manager, Data Scientist, Business SME, Operations Manager
Questions: What happened? What is happening? What will happen? What can we control? Can I see the data?
IT Team: IT Analyst, Programmer, DBA/Modeler, Data Scientist, Data Engineer
IT activities: Code Analysis, App Implementation, App Prototype, Data Model, Code Development