1. Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Optimizing the Modern Data Architecture
with Attunity, Hortonworks and RCG Global Services
We do Hadoop.
2.
Speakers
Hortonworks
◦ Adis Cesir, Big Data Solution Engineer
RCG Global Services
◦ Ramu Kalvakuntla, Principal, Big Data Practice
Attunity
◦ Santosh Chitakki, Director of Product Management
3.
Partnership
Strategy and Solution Delivery
Hadoop Distribution, Support and Training
Any Data, Anywhere, Anytime
RCG Global Services, Hortonworks and Attunity are partnering to provide an enterprise data warehouse (EDW) optimization solution that delivers real financial benefits by effectively implementing Apache Hadoop to augment current EDW platforms.
4.
Traditional systems under pressure
Challenges
• Can’t manage new data
• Constrains data to app
• Costly to scale
[Figure: business value from traditional data (ERP, CRM, SCM) and new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs), separating laggards from industry leaders. Data volume: 2.8 zettabytes in 2012, 40 zettabytes projected for 2020.]
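The two data-volume figures imply a compound annual growth rate. A minimal sketch of that arithmetic, assuming the 2012 and 2020 endpoints are compared directly over eight years; the function name is illustrative only.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value in `years` years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Data volume endpoints quoted on the slide, in zettabytes.
rate = implied_cagr(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

With these endpoints the result is roughly 39% growth per year.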
5.
A Typical EDW Faces Three Challenges
1. Data Storage: storing cold
data or throwing data away
2. Processing Capacity:
wasting processing cycles
on low value workloads
3. New Data Sources: unable
to capture and use new data
[Diagram: data systems (systems of record: RDBMS, ERP, CRM, other) and new sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) feed the EDW, which serves analytics (data marts, business analytics, visualization & dashboards); markers 1, 2 and 3 locate the three challenges.]
6.
Most EDWs Are Used Inefficiently
In a typical EDW*:
1. Data Storage:
– More than 50% of data is unused
2. Processing Capacity:
– 55% of CPU capacity is consumed by ETL
– 35% of the CPU consumed by ETL goes to loading unused data
– 30-40% of CPU is consumed by only 5% of ETL workloads
[Diagram: EDW data divided into hot, warm and cold tiers, between data systems (systems of record: RDBMS, ERP, CRM, other) and analytics (data marts, business analytics, visualization & dashboards).]
Why pay a first-class price for economy data?
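The processing-capacity figures can be combined into a single share of total EDW CPU. A worked example, assuming that "35% of CPU consumed by ETL" means 35% of ETL's share rather than 35% of all CPU; the constant names are illustrative.

```python
# Figures from the slide, expressed as fractions.
ETL_SHARE_OF_TOTAL_CPU = 0.55    # ETL's share of total EDW CPU
UNUSED_LOAD_SHARE_OF_ETL = 0.35  # assumed: share of ETL CPU spent loading unused data


def share_of_total(part_of_parent: float, parent_of_total: float) -> float:
    """Convert a share of a sub-workload into a share of the whole system."""
    return part_of_parent * parent_of_total


wasted = share_of_total(UNUSED_LOAD_SHARE_OF_ETL, ETL_SHARE_OF_TOTAL_CPU)
print(f"CPU spent loading unused data: {wasted:.2%} of total")
```

Under that reading, about one fifth (19.25%) of all EDW CPU goes to loading data that is never used.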
7.
Optimization: Realize Cost Savings with HDP
Archive data away from the EDW
• Move cold or rarely used data to Hadoop
as active archive
• Store more data longer
Offload costly ETL processes
• Free your EDW to perform high-value functions like
analytics & operations, not ETL
• Use Hadoop for advanced ELT
Enrich the value of your EDW
• Use Hadoop to refine new data sources, such as
web and machine data, for new analytical context
HDP helps you reduce costs and optimize the value associated with your EDW
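Archiving cold or rarely used data starts with deciding which tables are cold. A minimal sketch, assuming usage is summarized as each table's last query date; the thresholds, the `TableUsage` record and the table names are all hypothetical, not part of any product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical thresholds; real cut-offs would come from a data usage analysis.
HOT_DAYS = 30
WARM_DAYS = 365


@dataclass
class TableUsage:
    name: str
    size_gb: float
    last_queried: Optional[date]  # None = no queries seen in the observed window


def temperature(table: TableUsage, today: date) -> str:
    """Classify a table as hot, warm or cold by days since its last query."""
    if table.last_queried is None:
        return "cold"
    age_days = (today - table.last_queried).days
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= WARM_DAYS:
        return "warm"
    return "cold"


def archive_candidates(tables: List[TableUsage], today: date) -> List[str]:
    """Names of cold tables, largest first: candidates for a Hadoop active archive."""
    cold = [t for t in tables if temperature(t, today) == "cold"]
    return [t.name for t in sorted(cold, key=lambda t: t.size_gb, reverse=True)]


sample = [
    TableUsage("sales_current", 500.0, date(2015, 5, 30)),
    TableUsage("sales_2013", 1200.0, date(2014, 1, 15)),
    TableUsage("staging_tmp", 300.0, None),
    TableUsage("customers", 80.0, date(2015, 2, 1)),
]
print(archive_candidates(sample, date(2015, 6, 1)))  # ['sales_2013', 'staging_tmp']
```

Sorting by size puts the biggest storage savings first, which matches the goal of freeing expensive EDW capacity.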
[Diagram: new sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) and existing systems (ERP, CRM, SCM) feed both an HDP cluster (HDFS, the Hadoop Distributed File System, with YARN as the data operating system running batch, interactive, real-time and partner ISV workloads) and the MPP EDW; each serves an analytics layer (data marts or applications, business analytics, visualization & dashboards).]
8.
Challenges with Traditional Architecture
• Time spent understanding source data and defining destination structure
• High latency between data generation and availability
[Diagram: structured data from source-layer databases goes through ETL/ELT (data collection & processing) into the EDW (integration, storage & business view), which feeds business- or department-specific data marts.]
• Incapable of handling loosely structured data, or highly complex when doing so
• No linear scale
• High license cost
• Large code footprint
• Data discarded due to cost or performance
• Low or no visibility into transactional data
• EDW used as an ETL tool with 100s of staging tables
9.
Offload/Archive/Process – Hadoop-Based Platform
[Diagram: structured data from source-layer databases and new sources (clickstream, social, geo, sensor, server logs, unstructured) land in a Hadoop cluster for data collection, integration, storage and processing (integrate, transform, archive, enrich), which feeds the EDW and its data marts.]
• Store transactional data
• Retain 7+ years of data (hot archive)
• Data lineage – ability to store intermediate data sets
• Becomes an analytics platform for data scientists
• Linearly scalable commodity hardware
• Massively parallel compute and storage
• Support for any type of data, structured or unstructured, at any volume and velocity
• The data warehouse can now focus less on storage and transformation and more on presentation
10.
Optimization Customer Stories
Archive
TrueCar stores data on
millions of car purchases at
$0.12 per GB with HDP, well
below the $19 per GB
possible with other solutions.
Offload
Luminar cut its ETL
processing times from 3 days
to 3 hours with HDP, quickly
refreshing its models with new
customer transaction data.
Enrich
ZirMed enriches its EDW with
new data, including pharmacy
receipts, text messages, and
patient web searches.
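The Archive and Offload stories reduce to simple ratios. A sketch using only the figures quoted above; the 10,000 GB volume is a hypothetical illustration, not a customer number.

```python
# Figures quoted in the customer stories.
HDP_COST_PER_GB = 0.12     # TrueCar, USD per GB with HDP
OTHER_COST_PER_GB = 19.0   # TrueCar, USD per GB with other solutions
ETL_HOURS_BEFORE = 3 * 24  # Luminar, 3 days
ETL_HOURS_AFTER = 3        # Luminar, 3 hours


def storage_cost(gigabytes: float, cost_per_gb: float) -> float:
    """Total storage cost at a flat per-GB rate."""
    return gigabytes * cost_per_gb


print(f"Storage cost ratio: {OTHER_COST_PER_GB / HDP_COST_PER_GB:.0f}x")  # 158x
print(f"ETL speed-up: {ETL_HOURS_BEFORE / ETL_HOURS_AFTER:.0f}x")         # 24x

# Hypothetical volume, for illustration only.
volume_gb = 10_000
print(storage_cost(volume_gb, HDP_COST_PER_GB), storage_cost(volume_gb, OTHER_COST_PER_GB))
```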
11.
Hadoop Driver: Enabling the Data Lake
Data Lake Definition
• Centralized Architecture
Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data
Multiple applications accessing all data, affording new insights and opportunities
• Unlocks ‘Systems of Insight’
Advanced algorithms and applications used to derive new value and optimize existing value
Drivers:
1. Cost Optimization
2. Advanced Analytic Apps
Goal:
• Centralized architecture
• Data-driven business
[Figure: the journey to the data lake with Hadoop, growing in scale and scope toward systems of insight.]
12.
Modern Data Architecture
• Reduce cost and improve performance by
off-loading EDW data and processing to the
Hortonworks Data Platform (HDP)
• Implement a platform that scales
incrementally using low cost hardware and
software
• Support unstructured, semi-structured and
structured data in a single analytics
platform
• Enable superior analytic capabilities,
providing insight that is not possible to
achieve in the current environment
• Provide seamless access to data for
analysis and business applications
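Scaling incrementally on commodity hardware turns cluster sizing into simple arithmetic. A rough sketch, assuming HDFS's default block replication factor of 3 (`dfs.replication`) and a hypothetical 25% free-space headroom; the usable disk per node is an input, not a recommendation.

```python
import math


def nodes_needed(data_tb: float, usable_tb_per_node: float,
                 replication: int = 3, headroom: float = 0.25) -> int:
    """Estimate how many data nodes are needed to hold data_tb of logical data in HDFS.

    replication: HDFS block replication factor (dfs.replication defaults to 3).
    headroom: fraction of raw capacity kept free for temporary data (an assumption).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    raw_tb = data_tb * replication  # every block is stored `replication` times
    effective_per_node = usable_tb_per_node * (1 - headroom)
    return math.ceil(raw_tb / effective_per_node)


print(nodes_needed(60, usable_tb_per_node=24))  # 10
```

Because capacity grows linearly with node count, adding data later means adding nodes rather than replacing the platform.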
13.
Solution Model - Modern Data Architecture
EDW Optimization Roadmap
Identify offload candidates,
create architectural blueprint,
implementation roadmap,
business case and ROI
EDW Optimization
Implementation
Execute data and ETL/ELT off-load
and active archiving; implement
data ingestion and data services
Data Value Realization
Provide insight, data in
motion, advanced
analytics, information
value creation, and
visualization
Enterprise Enablement
Enterprise access,
enriched data sources,
service orchestration and
data virtualization
14.
EDW Optimization – Roadmap and Analysis
Current State Analysis
• Assess current reporting, ELT/ETL, and analytical processes
• Review logical and physical data models
• Assess current technical architecture
Data Usage Analysis
• Analyze data usage:
  – Identify under-utilized schemas, tables / columns, and data
• Identify off-load opportunities
Workload Analysis
• Analyze EDW workload:
  – Reads vs. writes
  – ETL vs. ELT
  – Analytical vs. batch SQL
  – CPU consumption
  – CPU utilization
Blueprint & Roadmap
• Prioritize opportunities
• Define future Hadoop architecture and capacity needs
• Develop implementation plan
• Create business case / ROI
• Create and review executive summary with clients
[Timeline: Current State Analysis, Data Usage Analysis, Workload Analysis, and Blueprint & Roadmap scheduled across Weeks 1 to 4.]
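The workload analysis boils down to aggregating a query log. A minimal sketch, assuming a hypothetical log already reduced to (workload category, statement type, CPU seconds) tuples; real EDW logs and tools such as Attunity Visibility use their own formats.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical query-log record: (workload_category, statement_type, cpu_seconds)
QueryRecord = Tuple[str, str, float]


def cpu_share_by_category(log: Iterable[QueryRecord]) -> Dict[str, float]:
    """Fraction of total CPU consumed by each workload category."""
    totals: Dict[str, float] = defaultdict(float)
    for category, _stmt, cpu in log:
        totals[category] += cpu
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: cpu / grand_total for cat, cpu in totals.items()}


def read_write_counts(log: Iterable[QueryRecord]) -> Dict[str, int]:
    """Count reads (SELECT) versus writes (every other statement type)."""
    counts = {"read": 0, "write": 0}
    for _category, stmt, _cpu in log:
        counts["read" if stmt.upper() == "SELECT" else "write"] += 1
    return counts


log = [
    ("ETL", "INSERT", 120.0),
    ("ETL", "UPDATE", 60.0),
    ("Analytical", "SELECT", 90.0),
    ("Batch", "SELECT", 30.0),
]
print(cpu_share_by_category(log))
print(read_write_counts(log))
```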
15.
EDW Optimization – Implementation
Data Off-load
• POC / reference implementation (if needed)
• Install / expand HDP cluster
• Analyze off-load data sets
• Automate data ingestion
• Implement active archiving
Analysis & Reporting
• Provide schema-on-read for direct business analysis
• Migrate resource-intensive analysis to Hadoop
• Connect analysis and visualization tools to Hadoop
Process Off-load
• Migrate EDW ETL/ELT workload to Hadoop
• De-normalize data to optimize performance
• Load Hadoop ETL/ELT output data back into the EDW
Data Services
• Provide data virtualization for data transparency across Hadoop and MPP databases
• Build business services for reporting and enterprise applications
[Timeline: Data Off-load, Process Off-load, Data Services, and Analysis & Reporting scheduled across Month 1, Month 2, Month 3, Month …]
16.
Data Warehouse Optimization - An Iterative Process
• Identify low-hanging fruit
• Get buy-in from stakeholders
• Plan and implement in increments
• Continuously assess and iterate
17.
Attunity Visibility Data Usage Analysis (Sample)
• Unused data (e.g., tables
with no ‘SELECT’
statements)
70 terabytes in
unused databases
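Finding unused data as defined here (tables with no SELECT statements) is a set difference between the catalog and the query log. A sketch, assuming the log has already been parsed into the table names read by SELECTs; table names and sizes are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def unused_tables(table_sizes_tb: Dict[str, float],
                  select_log: Iterable[str]) -> Tuple[List[str], float]:
    """Return tables never read by a SELECT in the log, and their total size in TB."""
    queried = set(select_log)
    unused = sorted(name for name in table_sizes_tb if name not in queried)
    return unused, sum(table_sizes_tb[name] for name in unused)


# Hypothetical catalog (sizes in TB) and parsed SELECT log.
table_sizes = {"sales": 40.0, "sales_archive": 20.0, "tmp_load": 5.0, "customers": 2.0}
select_log = ["sales", "customers", "sales"]

print(unused_tables(table_sizes, select_log))  # (['sales_archive', 'tmp_load'], 25.0)
```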
18.
Attunity Visibility Data Usage Analysis (Sample)
• History of data used in
large “Fact” table
• Queries go back only 2
years
• Maintains 8 years of data
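If queries reach back only two of the eight retained years, the older history is an active-archive candidate. A sketch, assuming the fact table is partitioned by year (a hypothetical layout) and that the current year counts toward the lookback window.

```python
from typing import List, Tuple


def split_partitions(partition_years: List[int], current_year: int,
                     lookback_years: int) -> Tuple[List[int], List[int]]:
    """Split yearly partitions into those queries still reach and those to archive.

    A partition stays in the EDW if its year is within the last `lookback_years`
    years, counting the current year; older partitions move to the archive.
    """
    cutoff = current_year - lookback_years + 1
    keep = [y for y in partition_years if y >= cutoff]
    archive = [y for y in partition_years if y < cutoff]
    return keep, archive


years = list(range(2008, 2016))  # 8 years of history, 2008 through 2015
keep, archive = split_partitions(years, current_year=2015, lookback_years=2)
print(keep)     # [2014, 2015]
print(archive)  # [2008, 2009, 2010, 2011, 2012, 2013]
print(f"{len(archive) / len(years):.0%} of partitions can move to the archive")  # 75%
```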
19.
Attunity Visibility Workload Analysis (Sample)
Almost 60% of CPU is used
to load and ingest
data
• Intensive ETL
workloads
20.
Attunity Visibility Workload Analysis (Sample)
The top 100 repetitive SQL statements, out of
101,000 ETL SQL statements, account for 30+%
of the CPU consumed by ETL.
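This is a Pareto-style ranking: 100 of 101,000 statements is roughly 0.1% of the ETL SQL. A sketch of the calculation, assuming per-statement CPU has already been aggregated (for example by normalized SQL text); the input format and statement IDs are hypothetical.

```python
from typing import Dict


def top_k_cpu_share(cpu_by_statement: Dict[str, float], k: int) -> float:
    """Fraction of total CPU consumed by the k most expensive statements."""
    costs = sorted(cpu_by_statement.values(), reverse=True)
    total = sum(costs)
    return sum(costs[:k]) / total if total else 0.0


# Hypothetical aggregated CPU seconds per repetitive statement.
cpu = {"q1": 50.0, "q2": 30.0, "q3": 10.0, "q4": 5.0, "q5": 5.0}
print(f"Top 2 statements: {top_k_cpu_share(cpu, 2):.0%} of ETL CPU")  # 80%
```

Ranking by CPU first means tuning or offloading a handful of statements can recover a disproportionate share of capacity.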
21.
Attunity Visibility – The Data Dashboard
Completely Analyze Workloads And Data Usage
Reduce Cost | Optimize Performance | Justify Investments
User Activity Data Usage Workload Performance
22.
RCG Success Stories
Top Retailers
• Completed EDW optimization projects for two large retailers
• Offloading cold data and ELT to Hadoop
• Cost savings projected between $6M and $10M
Top Financial Services
• Currently working with two large Fortune 100 financial companies
• Offloading 40TB to 60TB of raw data from EDW platforms to Hadoop
• Re-architecting their batch decision processing, with savings between $10M and $15M
23.
Next Steps…
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop
Learn more about our partnerships
http://hortonworks.com/partner/rcg-global-services/
http://hortonworks.com/partner/attunity/
24.
The Largest Hadoop Community Events in Europe and North America
BRUSSELS, April 15-16
• Deep-dive technical content
• 65+ sessions and 5 tracks
• 1,000 attendees
• Sponsorships available
• Including pre- and post-event community meetups and BOFs
• Hadoop training available
SAN JOSE, June 9-11
• 100+ sessions and 7 tracks
• Deep-dive technical content
• 5,000 attendees
• Sponsorships available
• Including pre- and post-event community meetups and BOFs
• Hadoop training available
www.hadoopsummit.org