Many organizations focus on Hadoop licensing costs when considering a migration to a cloud platform. But other costs matter too, and the biggest impact comes from having a modern analytics platform that can handle all of your use cases. This session covers lessons learned from helping hundreds of companies migrate from Hadoop to Databricks.
The Hidden Value of Hadoop Migration
1. Modernize Your Data Analytics Architecture with a Unified Approach to Data + AI
Anand Venugopal
Global Leader - Industry Solutions (Migrations)
Databricks
2. Topics
Why migrate from Hadoop to Databricks?
Success stories, technical and business benefits
How can you migrate fast, at low cost and with low risk?
3. Legacy On-Prem Analytics Architectures Are Not Keeping Up
Hadoop costs are rising when costs need to be cut
Innovation hinges on ML and predictive insights
Business agility requires real-time data
This is preventing teams from driving high-impact business outcomes
4. Why Migrate to Databricks?
Forrester study finds 417% ROI for companies switching to Databricks
• 47%: cost savings from retiring legacy infrastructure
• 5%: increase in revenue
• 25%: increase in data team productivity
5. Hadoop is Costly, Complex and Ineffective
DEVOPS INTENSIVE: The Hadoop ecosystem is complex, hard to manage and prone to failures → Low Productivity
RIGID AND INELASTIC: 24/7 HDFS clusters must be built for peak use and are costly to upgrade → Cost Prohibitive
LACKS AI CAPABILITIES: No out-of-the-box Hadoop support for ML/AI, and separate environments for data and AI → Slow Innovation
6. Enterprises Need a Modern Data Analytics Architecture
CRITICAL REQUIREMENTS
Cost-effective scale and performance in the cloud
Easy to manage and highly reliable for diverse data
Predictive and real-time insights to drive innovation
7. Building a Modern Cloud Analytics Architecture with Databricks
Data Science Workspace
EASY TO MANAGE: Managed cloud platform that can reliably handle all types of data → Enhanced Productivity
MASSIVE SCALE: On-demand, elastic autoscaling clusters with optimized Apache Spark → Lower Cost at Scale
AI-ENABLED INNOVATION: Unified and collaborative notebooks with built-in ML capabilities → New Insights Faster
8. Databricks Unified Data Analytics
DELTA ENGINE: High-performance query engine
One platform for every use case: Streaming Analytics, BI, Data Science, Machine Learning
Data Lake for all your data: Structured, Semi-Structured and Unstructured Data
Structured transactional layer
10. Business value: What did they do with us?
“The Un-carrier strategy is an approach that seeks to listen to the customer, address their pain points, bring innovation to the industry and improve the wireless experience for all.”
Situation
○ Every network interaction (call, website load, text, app) is logged in a 1,600-node HDP data lake (30 PB).
○ 4-5 “large-scale” pipelines, with hundreds of downstream pipelines feeding the business:
   ● PCMD (network measurement data), CDR (call records), EDR (DNS/website), LSR (location)
○ Call data is processed to get critical network insights: call-failure reasons and network outages.
○ PCMD (Per Call Measurement Data):
   ● Provides insights on call failures at a granular level
   ● Best source to determine outage cause and effect
   ● Provides rich information about Sprint customers roaming in the T-Mobile network
11. Solution: Holistic transformation instead of ‘lift & shift’
Overview
● Migration and transformation of streaming data analytics from Apache Storm and Hive on Hortonworks to Azure Databricks
● The data was streaming in at an average of 2M records per second, 375 GB per batch, 23 TB per day (uncompressed)
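As a rough sanity check, the short Python sketch below derives what these ingest figures imply when taken together. It assumes decimal units (1 TB = 10^12 bytes) and uniform arrival over the day; neither is stated on the slide, and the batch interval it prints is an inference, not a reported number.

```python
# Back-of-envelope check of the ingest figures quoted for this migration.
# Assumptions (not stated on the slide): decimal units and uniform arrival.

SECONDS_PER_DAY = 24 * 60 * 60

records_per_second = 2_000_000      # average ingest rate
bytes_per_day = 23 * 10**12         # 23 TB per day, uncompressed
bytes_per_batch = 375 * 10**9       # 375 GB per batch

records_per_day = records_per_second * SECONDS_PER_DAY
avg_record_bytes = bytes_per_day / records_per_day
batches_per_day = bytes_per_day / bytes_per_batch
minutes_between_batches = SECONDS_PER_DAY / 60 / batches_per_day

print(f"records per day:         {records_per_day:,}")
print(f"average record size:     {avg_record_bytes:.0f} bytes")
print(f"batches per day:         {batches_per_day:.1f}")
print(f"minutes between batches: {minutes_between_batches:.1f}")
```

Under those assumptions the figures hang together: records of roughly 133 bytes each, and about 61 batches a day, i.e. one every 23-24 minutes.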
Results
Accelerated key insights, e.g. hourly dashboards, protecting revenue and reducing customer churn.
78.5x performance gain versus on-prem operation
● BEFORE (Hive on Tez): 47 mins for 15k cores to do the job
● AFTER: 35 mins for 256 cores to do the same job
KPI computations took one quarter of the time, enabling new hourly dashboards (without optimizations such as warm pools, which were still in progress)
40% reduction in use of the 1,600-node on-prem cluster
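The slide does not say how the 78.5x figure is computed. One reading consistent with the before/after numbers is total compute consumed (cores × minutes) rather than wall-clock time; the sketch below makes that comparison explicit. The helper name `core_minutes` is hypothetical, and the small gap to 78.5x is presumably because "15k cores" is a rounded figure.

```python
# Compare the before/after runs by total compute consumed, not wall-clock time.

def core_minutes(cores: int, minutes: float) -> float:
    """Hypothetical helper: compute consumed by a job (cores held x minutes held)."""
    return cores * minutes

before = core_minutes(cores=15_000, minutes=47)  # Hive on Tez, on-prem
after = core_minutes(cores=256, minutes=35)      # same job after migration

efficiency_gain = before / after   # core-minutes before / core-minutes after
wall_clock_speedup = 47 / 35       # elapsed-time ratio alone

print(f"before: {before:,.0f} core-minutes")
print(f"after:  {after:,.0f} core-minutes")
print(f"compute-efficiency gain: {efficiency_gain:.1f}x")
print(f"wall-clock speedup:      {wall_clock_speedup:.2f}x")
```

Read this way, the gain comes mostly from running the same job on far fewer cores; wall-clock time improved by only about 1.3x.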
12. Supply Chain decisions
Apply ML to data from 5000+ stores
Impact
• 70% reduction in operational costs
• Accelerated business growth
Demand Forecasting
500K stores, 2 TB, 250 pipelines
Impact
• 10X more capacity
• 2X faster data pipelines
Predict bakery food spoilage
10+ large Hadoop clusters
Impact
• $100M in fresh food spoilage saved
• $900K cost reduction; time: 7 hr → 40 min
Optimize programming
• Could not process 90 days of data with a large Hadoop cluster
Impact
• 26% team productivity increase
• More data, lower costs, low devops
13. Databricks Drives New Business Value at 3 Levels
Databricks Value Framework: from the data platform (less value) to business outcome (more value)
Level 1, INFRASTRUCTURE ($): Reduced infrastructure spend with the performance of the Databricks runtime
Level 2, PRODUCTIVITY ($$): Higher productivity among data scientists and data engineers by eliminating manual tasks
Level 3, BUSINESS-IMPACTING USE CASES ($$$): Databricks accelerates and expands the realization of value from business-oriented use cases that use net-new capabilities vs. Hadoop
14. $12.8M in value delivered with Databricks
Value of Databricks
■ Removed Cloudera licensing
■ No need to add expensive new hardware for additional capacity
■ Avoided data center costs
■ Avoided Hadoop administration costs
Cloudera costs vs. Databricks value & investment
Units: $, cumulative present value (PV) over 3 years
Chart categories: Potential value with Databricks; Cloudera cost of inaction; Investment (Databricks, migration & cloud); Net impact (includes cost of both solutions during migration)
Chart values: $13.8M, -$18.7M, -$4.9M, $12.8M
Cloudera costs
■ Data center, Hadoop administration, new hardware, licensing
Databricks investment
■ Databricks usage & support
■ Migration
■ Cloud compute
Databricks customer example: Large U.S. Telco, 156-node cluster
Source: Databricks value model
15. Work with us for a Tailored Value Case for Your Migration
Tailored Financial Analysis
A tailored business case is produced by answering 4 core questions:
1. How many nodes are in your Hadoop environment?
2. How many people support your Hadoop environment?
3. When is your Cloudera renewal?
4. How do you expect your data needs to grow over time?
16. Proven Migration Strategy: Reduce Risk and Costs
Databricks Expert Team, System Integrators, Tools & ISV Partners, Cloud Partners
AUTOMATION, TOOLS AND PROVEN METHODOLOGIES
COMPONENTS TO MIGRATE → SUCCESSFUL MIGRATION
• Data + Metadata
• Workloads/Jobs
• Security & governance
• Other tools, integrations
Strategy options: Lift & shift (faster, automatable) or Transformation (higher impact)
18. Typically, customers save 55-66% in costs and see a 2-3x reduction in timelines by using automation tools
Manual migration (17-20 weeks): Assessment & Design → Data Migration → Workloads Migration, Validation → Cutover → Operations
Using automation (8 weeks): Accelerated Assessment & Design → Accelerated Data & Workloads Migration, Validation → Cutover → Operations
* Typical implementation scenario: ~4 PB of data and 3000 jobs with mixed workloads
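The week counts for this scenario can be checked against the headline "2-3x" claim with a quick calculation. This sketch covers only the timeline; the 55-66% cost figure cannot be derived from the numbers on the slide.

```python
# Timeline reduction implied by the slide for the footnoted scenario
# (~4 PB of data, 3000 jobs): manual 17-20 weeks vs. 8 weeks with automation.

MANUAL_WEEKS = (17, 20)   # low and high end of the manual estimate
AUTOMATED_WEEKS = 8

reductions = [weeks / AUTOMATED_WEEKS for weeks in MANUAL_WEEKS]
weeks_saved = [weeks - AUTOMATED_WEEKS for weeks in MANUAL_WEEKS]

for manual, factor, saved in zip(MANUAL_WEEKS, reductions, weeks_saved):
    print(f"{manual} -> {AUTOMATED_WEEKS} weeks: {factor:.1f}x shorter, {saved} weeks saved")
```

Both ends of the 17-20 week range fall inside the claimed 2-3x band (about 2.1x and 2.5x).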
19. Our Partner Ecosystem will Accelerate Migrations
ISV Partners and Migration Tools: Security, Governance
Consulting & SI Partners
Databricks Migration SWAT team + CS Packaged Services for Migration
Cloud
20. Customized Hadoop Migration Success Plan with a Free Expert-led Assessment
1. Pre-questionnaire + discovery, education workshops led by experts
▪ Learn how Databricks works and how your current workloads, tools and processes map to and transform into the future state in the cloud
2. Technical, use-case and business value analysis
▪ High-level current and future state architecture; discuss and prioritize use cases; understand how $$ value is driven by the migration
3. Proposal and recommendations for the path forward
▪ The expert team summarizes all findings and walks through the proposed costs, business value summary and recommended migration plan
21. Databricks Experts Know Hadoop
▪ More than 100 years of combined experience in Hadoop
▪ Practitioners, Architects, Engineers, Consultants, Open Source Contributors and Committers
▪ Expertise with all Hadoop ecosystem components and distributions
22. Hadoop migration to Databricks - recap
Why - Costs, Productivity, Innovation → Business Impact
Your competitors and market leaders are doing it NOW
Databricks experts and automation strategy can help you migrate faster, with much lower cost and risk