Offload the Data Warehouse in the Age of Hadoop
Santosh Chitakki, Vice President, Products at Appfluent
schitakki@appfluent.com
Steve Totman, Director of Strategy at Syncsort
@steventotman, stotman@syncsort.com

Presentation + Demo
The Data Warehouse Vision: A Single Version of The Truth

[Diagram: sources – Oracle, files, XML, ERP, mainframe, real-time feeds – flow through ETL into the Enterprise Data Warehouse, which feeds multiple data marts via ETL]
2
The Data Warehouse Reality:
• A small sample of structured data, only somewhat available
• Takes months to make any changes/additions
• Costs millions every year

[Diagram: the same architecture, now tangled with ELT, dead data, missed SLAs, requests for new reports and new columns, and lost granular history]
3
ELT Processing Is Driving Exponential Database Costs
The True Cost of ELT
[Chart: queries (analytics) and transformations (ELT) competing for database capacity as costs ($$$) climb]
• Manual coding/scripting costs
• Ongoing manual tuning costs
• Higher storage costs
• Hurts query performance
• Hinders business agility

And What if…?
• The batch window is delayed or needs to be re-run?
• Demand increases, causing more overlap between queries and the batch window?
• A critical business requirement results in longer/heavier queries?
4
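To make this concrete, here is a hedged sketch – table and column names are invented for illustration, not taken from the deck – of the kind of hand-coded ELT statement that gets pushed into the warehouse engine. It is the "big input, big input, even bigger output" pattern the speaker notes describe, and it competes directly with end-user queries for CPU and I/O:

```sql
-- Hypothetical ELT push-down: join two large staging tables and
-- aggregate into a reporting table, entirely inside the warehouse.
-- Every join, sort, and aggregate here burns warehouse CPU and I/O
-- that end-user queries also need.
INSERT INTO dw.daily_account_summary
SELECT t.account_id,
       CAST(t.txn_ts AS DATE) AS txn_date,
       a.region_code,
       COUNT(*)               AS txn_count,
       SUM(t.amount)          AS total_amount
FROM   staging.transactions t                 -- big input
JOIN   staging.accounts     a                 -- big input
       ON a.account_id = t.account_id
GROUP  BY t.account_id,
          CAST(t.txn_ts AS DATE),
          a.region_code;
```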
Dormant Data Makes the Problem Even Worse
[Diagram: hot, warm, and cold data tiers – ELT transformations run on unused data, and storage capacity is consumed by dormant data]

 The majority of data in the data warehouse is unused/dormant
 ETL/ELT processes for unused data unnecessarily consume CPU capacity
 Dormant data consumes unnecessary storage capacity
 Eliminate batch loads that are not needed
 Load and store unused data in Hadoop, as an active archive
5
The Impact of ELT & Dormant Data

Missing SLAs
• Slow response times
• With 40-60% of capacity used for ELT, fewer resources and less storage are available for end-user reports

Data Retention Windows
• Only the freshest data is stored "on-line"
• Historical data is archived (retention as low as 3 months)
• Granularity is lost: Hot / Warm / Cold / Dead

Lack of Agility
• 6 months (on average) to add a new data source or column and generate a new report
• Best resources spent on SQL tuning, not new SQL creation

Constant Upgrades
• Data volume growth absorbs all resources just to keep existing analysis running and perform upgrades
• Exploration of data becomes a wish-list item
6
Offloading The Data Warehouse to Hadoop

[Diagram – Before: data sources feed the data warehouse via ETL, with heavy ETL/ELT running inside the warehouse, which serves analytic query & reporting. After: data sources feed ETL in Hadoop, which feeds the data warehouse, which serves business intelligence and analytic query & reporting]

7
20% of ETL Jobs Can Consume up to 80% of Resources
ETL is "T" intensive
– Sort, Join, Merge, Aggregate, Partition
Mappings start simple
– Performance demands add complexity
– Business logic gets "distributed"
"Spaghetti" architecture
– Impossible to govern
– Prohibitively expensive to maintain
For high impact, start with the greatest pain – focus on the 20% (see the sketch after this slide)

8
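A minimal sketch of how finding that 20% might look if the warehouse keeps a DBQL-style query log. The table and column names are hypothetical stand-ins, not Appfluent's schema or any specific vendor's:

```sql
-- Sketch: rank transformation statements by total CPU over 90 days
-- to surface the 20% of jobs consuming most of the resources.
SELECT query_text,
       COUNT(*)         AS executions,
       SUM(cpu_seconds) AS total_cpu,
       SUM(io_count)    AS total_io
FROM   dw_admin.query_log                 -- hypothetical DBQL-style log
WHERE  statement_type = 'Insert'          -- ELT-style INSERT...SELECT work
  AND  log_date >= CURRENT_DATE - 90
GROUP  BY query_text
ORDER  BY total_cpu DESC;                 -- the head of this list is your 20%
```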
The Opportunity

Transform the economics of data.

Cost of managing 1TB of data:
• EDW: $15,000 – $80,000
• Hadoop: $2,000 – $6,000

But there's more…
• Scalability for longer data retention
• Performance SLAs
• Business agility
Why Appfluent?
Appfluent transforms the economics of Big Data and Hadoop.
We are the only company that can completely analyze how
data is used to reduce costs and optimize performance.
Appfluent Visibility

Uncover the Who, What, When, Where, and How of your data
Why Syncsort?
For 40 years we have been helping companies solve their big data
issues…even before they knew the name Big Data!

• Speed leader in Big Data processing
• Fastest sort technology in the market
• Powering sort on 50% of the world's mainframes
• First-to-market, fully integrated approach to Hadoop ETL
• A history of innovation
• 25+ issued & pending patents
• Large global customer base
• 15,000+ deployments in 68 countries

Our customers are achieving the impossible, every day!

Key Partners

12
Syncsort DMX-h – Enabling the Enterprise Data Hub
Blazing Performance. Iron Security. Disruptive Economics.
• Access – One tool to access all your data, even mainframe
• Offload – Migrate complex ELT workloads to Hadoop without coding
• Accelerate – Seamlessly optimize new & existing batch workloads in Hadoop

PLUS…
• Smarter Architecture – ETL engine runs natively within MapReduce
• Smarter Productivity – Use Case Accelerators for common ETL tasks
• Smarter Security – Enterprise-grade security

13
How to Offload Workload & Data

1. Identify costly transformations and identify dormant data
2. Rewrite transformations in DMX-h, identify performance opportunities, and move dormant-data ELT to Hadoop
3. Run the costliest transformations, and store and manage dormant data, in Hadoop
4. Repeat regularly for maximum results
1. Identify

Four targets: expensive transformations, unused data, cold historical data, and costly end-user activity.
• Identify expensive transformations, such as ELT, to offload to Hadoop.
• Identify unused tables to find the useless transformations loading them; move them to Hadoop or purge them.
• Identify unused historical data (by the date functions used) and move the loading & data to Hadoop.
• Discover costly end-user activity and redirect those workloads to Hadoop.
Costly End-User Activity
Find relevant, resource-consuming end-user workloads and offload the data sets and activity to Hadoop.
Example: identify SAS data extracts (i.e., SAS queries with no WHERE clause – a sketch follows below).

SAS data extracts identified, consuming 300 hours of server time.

Identify the data sets associated with the extracts, replicate the identified data in Hadoop, and offload the associated SAS workload.
16
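To illustrate the no-WHERE-clause heuristic, a rough sketch against the same hypothetical query log (Appfluent performs this analysis in its product; this only conveys the flavor of the check):

```sql
-- Sketch: surface full-table extracts issued by SAS clients.
-- application_name and query_text are hypothetical log columns;
-- the NOT LIKE test is deliberately crude.
SELECT username,
       query_text,
       SUM(cpu_seconds) / 3600.0 AS server_hours
FROM   dw_admin.query_log
WHERE  application_name LIKE '%SAS%'
  AND  UPPER(query_text) NOT LIKE '%WHERE%'   -- no constraints at all
GROUP  BY username, query_text
ORDER  BY server_hours DESC;
```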
Expensive Transformations
Identify expensive transformations such as ELT to offload to Hadoop.

In this example, an ELT process was consuming 65% of CPU time and 66% of I/O. Drill into the process to identify the expensive transformations to offload.
Unused Data
Identify unused tables to move to Hadoop, and offload the batch loads for that unused data into Hadoop. (A sketch follows below.)

In this example: 87% of tables were unused; the largest unused table held 2 billion records; and individual tables contained unused columns.
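The dormant-table check can be sketched the same way, assuming a hypothetical catalog view plus a usage log that records which tables each query touched:

```sql
-- Sketch: tables with no read activity in the last 12 months,
-- biggest first. Both tables here are hypothetical stand-ins.
SELECT c.schema_name,
       c.table_name,
       c.row_count
FROM   dw_admin.tables_catalog c
LEFT   JOIN dw_admin.object_usage_log u
       ON  u.schema_name = c.schema_name
       AND u.table_name  = c.table_name
       AND u.access_date >= CURRENT_DATE - 365
WHERE  u.table_name IS NULL                -- never read in the window
ORDER  BY c.row_count DESC;                -- biggest dormant tables first
```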
2. Access & Move Virtually Any Data
One Tool to Quickly and Securely Move All Your Data, Big or Small. No Coding, No Scripting.

Connect to Any Source & Target
• RDBMS, Mainframe, Files
• Cloud, Appliances, XML

Extract & Load to/from Hadoop
• Extract data & load into the cluster natively from Hadoop, or execute "off-cluster" on an ETL server
• Load data warehouses directly from Hadoop – no need for temporary landing areas

PLUS… Mainframe Connectivity
• Directly read mainframe data
• Parse & translate
• Load into HDFS

Pre-process & Compress
• Cleanse, validate, and partition for parallel loading
• Compress for storage savings (see the sketch below)
19
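The compression point pairs well with sorting first: the speaker notes mention comScore saving terabytes per month by sorting data prior to compression. A HiveQL sketch of that idea, with hypothetical table names and the classic MapReduce-era Hive settings:

```sql
-- Sort similar rows together before writing compressed output, so the
-- codec finds longer runs of repeated values. Settings and tables are
-- illustrative, not a prescribed configuration.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

INSERT OVERWRITE TABLE staging.events_compressed
SELECT *
FROM   staging.events_raw
SORT BY user_id, event_ts;   -- per-reducer sort improves compression ratio
```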
3. Offload Heavy Transformations to Hadoop
Easily Replicate & Optimize Existing Workloads in Hadoop. No Coding. No Scripting.

 Develop MapReduce ETL processes without writing code
 Leverage existing ETL skills
 Develop and test locally in Windows; deploy in Hadoop
 Use Case Accelerators to fast-track development
 File-based metadata: create once, reuse many times!

[Diagram: built-in transformations – Sort, Join, Aggregate, Copy, Merge – plus development accelerators for CDC and other common data flows; see the CDC sketch below]

20
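For flavor, here is the classic snapshot-comparison CDC pattern that such accelerators package up, written as a HiveQL sketch with hypothetical tables and columns (DMX-h itself builds this graphically rather than in SQL):

```sql
-- Compare today's snapshot to yesterday's and tag each key as an
-- insert, delete, update, or unchanged row.
SELECT COALESCE(n.cust_id, o.cust_id) AS cust_id,
       CASE
         WHEN o.cust_id IS NULL THEN 'INSERT'
         WHEN n.cust_id IS NULL THEN 'DELETE'
         WHEN n.balance <> o.balance
           OR n.status  <> o.status  THEN 'UPDATE'
         ELSE 'UNCHANGED'
       END AS change_type
FROM            snapshot_today     n
FULL OUTER JOIN snapshot_yesterday o
       ON o.cust_id = n.cust_id;
```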
Demo
Appfluent Offload Success

Large Financial Organization

Situation

• IBM DB2 Enterprise Data Warehouse (EDW) growing too quickly
• DB2 EDW upgrade/expansion too expensive
• Found cost per terabyte of Hadoop is 5x less than DB2 (fully burdened)

Solution

• Created business program called ‘Data Warehouse Modernization’
• Deployed Cloudera to extend EDW capacity
• Used Appfluent to find migration candidates to move to Hadoop

Benefits

• Capped the DB2 EDW at 200TB capacity and have not expanded it since
• Saved $MM that would have been spent on additional DB2
• Positioned to handle faster rates of data growth in the future
Offloading the EDW at Leading Financial Organization

• Offload ELT processing from Teradata into CDH using DMX-h
• Implement a flexible architecture for staging and change data capture
• Ability to pull data directly from the mainframe
• No coding; easier to maintain & reuse
• Enable developers with a broader set of skills to build complex ETL workflows

[Chart: elapsed time – 360 min (HiveQL) vs. 15 min (DMX-h); development effort – 12 man-weeks (HiveQL) vs. 4 man-weeks (DMX-h)]

Impact on Loans Application Project:
 Cut development time to 1/3 (12 man-weeks down to 4)
 Reduced complexity: from 140 HiveQL scripts to 12 DMX-h graphical jobs
 Eliminated the need for Java user-defined functions
 24x faster (360 minutes down to 15)!
23
Three Quick Takeaways

1. ELT and dormant data are driving data warehouse cost and capacity constraints
2. Offloading heavy transformations and "cold" data to Hadoop provides fast savings at minimum risk
3. Follow these 3 steps:
   a. Identify dormant data and pinpoint heavy ELT workloads; focus on the top 20%
   b. Access and move data to Hadoop
   c. Deploy new workloads in Hadoop

24
The Data Warehouse Vision: A Single Version of The Truth

[Diagram: the original vision again – sources (Oracle, files, XML, ERP, mainframe, real-time) flow through ETL into the Enterprise Data Warehouse, which feeds data marts]
25
Next Steps

Sign up for a Data Warehouse Offload assessment!
http://bit.ly/DW-assessment
Our experts will help you:
 Collect critical information about your EDW environment
 Identify migration candidates & determine feasibility
 Develop an offload plan & establish business case

26

Q&A

27
Thanks for Attending!

And the winner is…..


Editor's Notes

  1. Jennifer to address this slide – announce the session and introduce the speakers. Instruct on Q&A format.
  2. Back when I started my career in data warehousing in the 90s, this is what the business was promised. An enterprise data warehouse would bring together data from every different source system across an organization to create a single, trusted source of information. Data would be extracted, transformed and loaded into the warehouse using ETL tools – these would be used instead of hand-coding SQL or COBOL or other scripts because they would provide a graphical user interface that allowed anyone to develop flows (no need for rocket scientists), scalability to handle the growing data volumes, metadata to enable re-use and sharing, and connectivity to the different sources and targets. ETL would then be used to move data from the EDW to marts and deliver it to reporting tools.
  3. So here's the reality of data warehouses today – as one customer recently described it to me, their data warehouse has become like a huge oil tanker: slow moving and incredibly difficult to change direction. Because of data volume growth, the majority of ETL tools, commercial and open source, were unable to handle the processing within the batch windows – as a result, the only engines capable of handling the data volumes were the database engines, thanks to their optimizers. So transformation was pushed into the source, target and especially the enterprise data warehouse databases as hand-coded SQL or BTEQ – this so-called ELT meant that many ETL tools became little more than expensive schedulers. The use of ELT resulted in a spaghetti-like architecture, clearly visible to end users in the fact that requests for new reports or the addition of a new column involve, on average, a 6-month delay from the warehouse team. With so much hand-coded SQL, adding a new column becomes incredibly complex – it requires adding to the enterprise data model, updating the warehouse schema, and modifying all the existing ELT scripts – and SLAs get abandoned.
  4. As you can see in the chart, as ELT has grown, end-user reporting and analytics have had to compete for the database storage and capacity. Databases are great when you have the classic use for SQL – big data input, big data input, small result set – exactly what you want to create an aggregated view in a reporting tool. But SQL is not ideal for ETL, where it's typically big input, big input, even bigger output. At first there was less contention, as analysts and warehouse business users ran queries during the day and ELT could be run at night during the overnight batch window. But as data volumes increased, the batch runs started running into the day, and then during the day – today many companies have to do more ELT than can fit into their overnight batch window, so they are always trying to catch up, and if a load fails it can literally be months before they can recover. It also creates a death spiral, because you move your best resources to tuning the ELT SQL to improve performance, so your less skilled resources hand-code new ELT, which then needs to be tuned by your best resources. Every step of which hinders agility and increases cost...
  5. Steve, you certainly bring up excellent points on how ELT processes are driving up data warehousing costs. Our experience in analyzing data usage at large organizations shows that a significant amount of data is not being used – but is continuously loaded on a daily basis. Dormant data not only takes up storage capacity; the bigger impact is the processing capacity, in terms of CPU and I/O, that is wasted on running ELT on the data warehouse to load data that the business does not actively use. Admittedly, in many situations organizations are required for regulatory reasons to maintain a history of data – even if it is not being used. So the best approach here to significantly cut data warehousing costs is to: eliminate batch loads for data that is not used and not needed; and, more importantly, offload the ELT processes for unused data that needs to be maintained – do it all on Hadoop and actively archive that unused data on Hadoop. This way you can recover all the wasted capacity from your expensive data warehouse systems.
  6. Thanks, Santosh. So just to summarize, there are 4 dimensions to this problem. First, you'll see that you're missing SLAs, as ELT competes with end-user queries and analytics in the warehouse. Next, you'll see that the warehouse team implements a data retention window – this is because there's not enough space and it's not cost effective to store all the data people want, so instead of the entire history you keep a rolling retention window, sometimes as small as a few days or weeks. On average today it takes 6 months to add a new report or a column to the warehouse – customers describe this as the onion effect, because each layer gets added since nobody wants to change the layer beneath, but when you have to, it makes everyone involved cry. Then finally you have the constant upgrade cycle – because of data growth, the second you've completed an upgrade you're already planning for your next one. But the tough thing is selling this to your CFO – if you have to explain that you need to spend another $3 million on the warehouse, and he asks why, and you have to explain it's so the same report that ran yesterday will still run tomorrow – that's not a good business case.
  7. So as we've discussed, there's the reality of what happens today in most data warehouses – the "before" seen here, where ETL and ELT in the database are the norm. But as Teradata's CEO Mike Koehler remarked on a recent earnings call, they have found that ETL consumes about 20 to 40% of the workload of their Teradata data warehouses, with some outliers below and above that range. Teradata thinks that 20% of that 20-40% of ETL workload being done on Teradata is a good candidate for moving to Hadoop. Now, I personally have been involved with ETL my entire career, over 15 years now, and in my experience the ELT workload of most data warehouse databases is at least double that – between 40 and 60% – and many of the customers we're working with aren't looking to move 20% but rather 100% of that ELT into Hadoop. But even if you could free just 20% of your capacity, that still means you could postpone any major multi-million-dollar upgrades of Teradata, DB2, Oracle etc. for a long time. So we're seeing more and more customers adopt an architecture where the staging area – the dirty secret of every data warehouse, which is where the drops from data sources get stored and a lot of the heavy lifting as ELT occurs – gets migrated to an enterprise data hub in Hadoop, and the result is then moved to the existing data warehouse, now with more capacity, or direct to reporting tools.
  8. Now, what's really interesting about ETL and ELT is that the workload tends to be very transformation intensive – sorts, joins, merges, aggregations, partitioning, compression etc. But the 80/20 rule applies – 20% of your ETL and ELT is what consumes 80% of your batch window, resources, tuning and so on. The screenshot on the right is actually from a real customer, and this diagram (which they called the "Battlestar Galactica Cylon mothership" diagram because of the way it looks from a distance) is their nightly batch run sequence – every box on the diagram is a Teradata ELT SQL script of several thousand lines of code. They actually found that 10% of their flows consumed 90% of their batch window. So it's not that you have to migrate everything – you just start with the 20% and you'll see a huge amount of benefit immediately.
  9. At the end of the day, time and resources consumed by inefficient processes have significant tangible costs. But Hadoop is quickly becoming a disruptive technology that presents a tremendous opportunity for enterprises. The economics of Hadoop compared to the enterprise data warehouse are quite remarkable. Today the cost of a terabyte on the data warehouse can vary from $15k on the low end to more than $80k of fully burdened cost per terabyte. Enterprises are finding that the cost on Hadoop can be 10 times LESS expensive than the data warehouse. So the question is: how can you take advantage of this opportunity, and where and how do you begin this process? Appfluent and Syncsort together have the complete solution you need.
  10. Before we discuss and demonstrate the solution, let us briefly introduce Appfluent and Syncsort. Appfluent is a software company whose mission is to transform the economics of Big Data and Hadoop. Appfluent is the only company that can completely analyze how data is used, and it enables large enterprises across various vertical industries to reduce costs and optimize performance.
  11. The Appfluent Visibility product gives you the ability to assess and analyze expensive transformations and workloads, as well as identify unused data – which can serve as the blueprint to begin the process of offloading your data warehouse to Hadoop. The product non-intrusively monitors and correlates users' application activity and ELT processes with data usage and the associated resource consumption. The solution provides this visibility across multiple platforms including Teradata, Oracle/Exadata, DB2/Netezza and Hadoop.
  12. So by now some of you may be wondering: who is Syncsort? We are a leading Big Data company, dedicated to helping our customers collect, process and distribute extreme data volumes. We provide the fastest sort technology and the fastest data processing engine in the market, and most recently we released the first truly integrated approach to extract, transform and load data with Hadoop, and even in the cloud. Now, if you have a mainframe in your organization, then you probably know Syncsort, because we run on nearly 50% of the world's mainframes – we're the most trusted third-party software for mainframes. Our customers have been using us for over 10 years to accelerate ETL and ELT processing – our product has a unique optimizer (similar to a database SQL optimizer) designed specifically to accelerate ETL and ELT processing. Our customers are companies who deal with some of the largest and most sophisticated data volumes – that's the reason they've come to us, because we solve data problems that no one else can.
  13. Every organization is trying, technically and yet economically, to build infrastructure that keeps up with modern data by storing and managing it in a single system, regardless of its format. The name people are giving to this is an Enterprise Data Hub, and in most cases it's based on Hadoop. But to deliver on the business requirements for data, an Enterprise Data Hub requires components to access, offload and accelerate data, while also providing Extract Transform and Load (ETL) functionality, user productivity that doesn't require a rocket scientist for simple tasks, and complete enterprise-level security. Syncsort enables all of this whether you're running on Hadoop, cloud, mainframes, Unix, Windows or Linux, and thanks to its unique transformation optimizer it can scale with no manual tuning.
  14. Now that you know a little about Appfluent and Syncsort, let's look at the process for offloading the data warehouse. You begin by using Appfluent to identify expensive transformations as well as dormant data that is loaded unnecessarily into the warehouse. Once you have identified what can be offloaded – keeping in mind the 80-20 rule, where you focus your efforts on the 20% of processing/data that is impacting 80% of your capacity constraints – you can use Syncsort to rewrite the expensive transformations in DMX-h on Hadoop before loading the data into the data warehouse. You can also move the dormant data to Hadoop and use DMX-h for transforming and loading this data, if you need to keep updating the unused data. This way you can eliminate all of the ELT related to unused data from the data warehouse, run it on Hadoop, and store that data on Hadoop. Finally, this is typically not a one-time event. You can view Hadoop as an extension of your data warehouse – they will co-exist for the foreseeable future – and you can repeat this process continually to maximize performance and minimize the costs of your overall infrastructure.
  15. Before we go into a demonstration of the solution, let's take a look at some of the features that Appfluent provides to get started. Appfluent's software parses all the activity on your data warehouse at very granular levels of detail. This enables you to obtain actionable information using the Appfluent Visibility web application. You can identify all of the ELT processes that are most expensive on your system and can be offloaded. Second, since all the SQL activity is parsed, you can identify unused data at a table and column level of granularity over specified time periods. Appfluent also parses the date functions being used to query data, so you can assess the amount of history being queried by users – to guide your data retention policies. And finally, in addition to expensive ELT transformations, you can also identify end-user workloads and associated data sets that can run just as well on Hadoop – freeing up capacity on your data warehouse.
  16. Let's take a look at some real-world examples. In this example, Appfluent was used to identify expensive data extracts being performed by users running SAS on a high-end data warehouse system. As you can see, the Appfluent Visibility web app was used to select applications with the name "sas" and focus on workloads that had no constraints – meaning pure data extracts. What we found was that "SAS" activity was generated from 5 servers, and just 42 unique SQL statements were consuming over 300 hours of server time. You can then use Appfluent to easily drill down on this information and find details such as which data sets were involved and which users were associated with this activity. What we found was that this activity was related to just 7 tables and accessed by a handful of SAS users, which Appfluent identified. In this way you can identify data sets to offload to Hadoop and redirect the application activity to Hadoop – enabling you to recover wasted data warehouse capacity.
  17. The next example shows expensive ELT transformations. In this case, the ELT processes constituted less than 2% of the query workload – but were consuming over 60% of CPU and I/O capacity. Think about this skew for a moment! Appfluent can identify the most expensive ELT by both resource consumption and complexity – for example by number of joins, subqueries and other inefficiencies – and provide details about the ELT to enable you to begin the offloading process.
  18. Finally, here is an example of identifying unused or dormant data. You can identify unused databases, schemas, tables and even specific fields within tables – over specified time periods that are relevant for you. In this case, large tables were not only unused, but more data was continuing to be loaded into them on a daily basis – taking up wasted ELT processing capacity and unnecessary storage capacity. These 3 examples hopefully gave you a brief glimpse of how Appfluent provides the first step: exposing relevant information that can be used as a blueprint to begin offloading your data warehouse. Syncsort will now discuss the next two steps in this process.
  19. Thanks, Santosh. The second stage in the framework for offloading data and workloads into Hadoop is Access & Move. Once you've identified the data, you then have to move it. While Hadoop provides a number of different utilities to move data, the reality is that you will need to use multiple different tools, they don't have a graphical user interface (so you'll end up manually coding all the scripts), and for many critical sources, e.g. mainframe, Hadoop offers no connectivity. Syncsort provides one solution that can access data regardless of where it resides – for example, we have native high-performance connectors to Teradata, DB2, Oracle, IBM mainframes, cloud, Salesforce etc. These connectors allow you to extract data and load it natively into the Hadoop cluster on each node – or load the data warehouse or marts directly, in parallel, from Hadoop. We also see a lot of customers pre-processing and compressing the data before loading into Hadoop – one customer, comScore, loads 1.5 trillion events (about 40% of all internet page views) through our product DMX-h into Hadoop and Greenplum, and literally saves terabytes of storage every month just by sorting the data prior to compression.
  20. Once the data is in Hadoop, you will need a way to easily replicate the same workloads that previously ran in the DWH – typically sorts, joins, CDC, aggregations – but now in Hadoop. Now, sure, you can manually write tons of scripts with HiveQL, Pig and Java, but that means you will have to re-train a lot of your staff to scale the development process. A steep learning curve awaits you, so getting productive will take some time. Besides, why reinvent the wheel when you can easily leverage your existing staff and skills? Syncsort helps you get results quickly and with minimum effort, with an intuitive graphical user interface where you can create sophisticated data flows without writing a single line of code. You can even develop and test locally in Windows before deploying into Hadoop. In addition, we provide a set of Use Case Accelerators for common ETL use cases such as CDC, connectivity, aggregations and more. Finally, once you offload from expensive legacy systems and data warehouses, you need enterprise-grade tools to manage, secure and operationalize the enterprise data hub. With Syncsort you have file-based metadata, which means you can build once and reuse many times. We also provide full integration with management tools such as Cloudera Manager and the Hadoop JobTracker – to easily deploy, monitor and administer your Hadoop cluster. And of course, iron security with leading support for Kerberos. When you put all these pieces together, that is what really makes this solution enterprise-ready!
  21. Now Santosh and Jeff from Syncsort will do a quick demo of the combined solution
  22. Now that you have seen a brief demo of how you can use Appfluent and Syncsort to offload your data warehouse, let's talk about some customers who have done this successfully in production systems. A large financial organization we worked with found that their data growth and business needs had begun to grow at a rate that made it economically unsustainable to continue adding more capacity to their enterprise data warehouse. Once they determined that managing data on Hadoop would be more than 5 times cheaper than what it cost them on their data warehouse, they decided to cap the existing capacity on the data warehouse and implemented a strategy to deploy Hadoop to extend it. They started a data warehouse modernization project and systematically began analyzing and identifying data sets and expensive transformations – using Appfluent – and offloading them to Cloudera. The result was that they successfully capped the existing capacity on the data warehouse. They estimated that if they had not done so, they would have had to spend in excess of $15 million on additional capacity over an 18-month period. Instead, the Hadoop environment, which is now an extension of their data warehouse, costs 6-8 times less in total cost of ownership per terabyte.
  23. This is another financial institution, one of the largest in the world. The bank had a significant amount of data hosted and batch processed on Teradata. But for them, like many Teradata customers, the cost was becoming unsustainable and they were faced with yet another multi-million-dollar upgrade. So, having heard about Hadoop and its significantly lower cost per GB of data, they decided to migrate a loan marketing application to Cloudera's distribution of Hadoop. While it proved the viability and massive cost savings of the Hadoop platform, they have hundreds more applications that need to be migrated. The loan application they moved across was initially using Hive and HiveQL; it met the SLA but had much slower performance than Teradata and many maintainability concerns. The bank sought tools that could leverage existing staff skills (ETL) to facilitate the migration of the remaining applications and avoid the need to add significant staff with new skills (MapReduce). The results were striking: significantly less development time was required for the DMX-h implementation of the loan project – 12 man-weeks for the HiveQL implementation versus 4 for DMX-h. The process was simplified, with over 140 HiveQL scripts replaced by twelve graphical DMX-h jobs. Most importantly, they reduced the processing time from 6 hours to 15 minutes.
  24. So there are three key takeaways. You should be aware of the warehouse cost and capacity impacts of ELT and dormant data, and the way they affect your end users. Offloading ELT and unused data from your EDW to Hadoop has been proven as the lowest-risk, highest-return first project for your new Hadoop cluster, and the cost savings can justify further Hadoop investment and more moon-shot-like projects. It's 3 simple steps: Identify, Access and Deploy.
  25. By following these simple steps you can use an Enterprise Data Hub based on Hadoop and your enterprise data warehouse together, with Syncsort and Appfluent, to deliver something even better than the original vision of the enterprise data warehouse, today.