HADOOP IN A RELATIONAL DATA WAREHOUSE
Data and Analytics/Enterprise DW, Expedia
June 2013
Arek Kaczmarek
Background
- Expedia
- Site
- Competitors
- DW
- Legacy
- EDW
- DNA
- Hadoop at Expedia
- Original Purpose
- Early expectations
A case study
- Project objective
- Datasets
- Competitive shopping comparisons
- Properties
- Bookings
- Clickstream demand
- Forecast
DW architecture – what’s different?
- Normalized vs denormalized tables
- Does it matter?
- Performance
- Ingestion speed
- Analytical flexibility
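The normalized vs denormalized question above can be made concrete with a small sketch. The table and column names here are hypothetical (the slides do not show any schema), and SQLite is used only as a stand-in so the example runs anywhere; it is not Hive or the EDW. The point it illustrates is that the normalized layout pays for a join on every analytical query, while the denormalized layout pays for it once at load time.

```python
import sqlite3

# Hypothetical schema for illustration only; the slides do not show table layouts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (property_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE booking  (booking_id INTEGER PRIMARY KEY,
                           property_id INTEGER REFERENCES property(property_id),
                           amount REAL);
    INSERT INTO property VALUES (1, 'Seattle'), (2, 'Chicago');
    INSERT INTO booking  VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 200.0);
""")

# Normalized layout: every analytical query joins the two tables.
normalized = conn.execute("""
    SELECT p.city, SUM(b.amount)
    FROM booking b JOIN property p ON p.property_id = b.property_id
    GROUP BY p.city ORDER BY p.city
""").fetchall()

# Denormalized layout: join once at load time, then query the wide table.
conn.execute("""
    CREATE TABLE booking_wide AS
    SELECT b.booking_id, b.amount, p.property_id, p.city
    FROM booking b JOIN property p ON p.property_id = b.property_id
""")
denormalized = conn.execute("""
    SELECT city, SUM(amount) FROM booking_wide GROUP BY city ORDER BY city
""").fetchall()

# Both layouts answer the question identically; they differ in where the join cost is paid.
print(normalized == denormalized)
```

The wide table duplicates the city on every booking row, which is the usual price of denormalizing: more storage and a heavier load step in exchange for simpler queries.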
DEV work – do you need different skills?
- Data files: csv, tsv, txt or xml – which work best?
- Hive: HQL UDFs for analytic functions – do you need them?
- Optimization – reuse your knowledge?
- Architecture (temp tables, partitions)
- HQL (set parameters)
- Load_tags: partitioning, appending, syncing
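The partitioning, appending and set-parameter items above could look roughly like the following. This is a sketch under stated assumptions, not the presenter's code: it assumes a load tag is a batch identifier used as a Hive partition column, and the helper `build_partition_load` and the table names are hypothetical. The HQL it emits uses standard Hive syntax (`SET`, `INSERT OVERWRITE TABLE ... PARTITION`, `INSERT INTO TABLE ... PARTITION`), and `hive.exec.parallel` is a real Hive session parameter chosen only as an example.

```python
from typing import Sequence


def build_partition_load(table: str, staging: str, columns: Sequence[str],
                         load_tag: str, append: bool = False) -> str:
    """Return HQL that loads one load_tag partition from a staging table.

    Hypothetical helper: names are not validated or escaped, so it is
    only suitable for trusted, hard-coded identifiers.
    """
    # INSERT INTO appends to the partition; INSERT OVERWRITE replaces it.
    verb = "INSERT INTO" if append else "INSERT OVERWRITE"
    return "\n".join([
        # Session-level tuning via SET; the value is illustrative.
        "SET hive.exec.parallel=true;",
        f"{verb} TABLE {table} PARTITION (load_tag='{load_tag}')",
        f"SELECT {', '.join(columns)} FROM {staging};",
    ])


hql = build_partition_load("bookings", "bookings_stage",
                           ["booking_id", "amount"], "2013-06-01")
print(hql)
```

Overwriting a single partition makes a reload of one batch idempotent, which is one reason a per-load partition key is a common choice; appending is cheaper when a batch only ever adds rows.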
RDBMSes and Hadoop – what’s their relationship?
- Syncing from DB2
- Exporting into HBase
- Importing from SQLServer
- Exporting into SQLServer
- Exporting into DB2
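Syncing from a relational source, as listed above, is often done incrementally: only rows newer than the last synced key are extracted and written to a delimited file for loading into Hadoop. The slides do not say how the transfers were implemented, so the following is only a sketch of that general watermark pattern. SQLite stands in for DB2 or SQL Server, and the function `export_new_rows` and the `booking` table are hypothetical. Tools such as Apache Sqoop provide a comparable incremental import mode.

```python
import csv
import io
import sqlite3


def export_new_rows(conn, table, key_column, last_value, out):
    """Write rows with key_column > last_value as TSV; return the new watermark.

    Hypothetical helper: table and column names are interpolated directly,
    so they must come from trusted configuration.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {key_column} > ? ORDER BY {key_column}",
        (last_value,),
    )
    key_index = [d[0] for d in cur.description].index(key_column)
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    watermark = last_value
    for row in cur:
        writer.writerow(row)
        watermark = row[key_index]  # rows are ordered, so the last one is the max
    return watermark


# SQLite stands in for the relational source in this sketch.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE booking (booking_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO booking VALUES (1, 120.0), (2, 80.0), (3, 200.0);
""")

buffer = io.StringIO()
watermark = export_new_rows(source, "booking", "booking_id", 1, buffer)
print(buffer.getvalue(), end="")
```

Persisting the returned watermark between runs is what turns repeated exports into a sync: a second call with the stored value extracts nothing until new rows arrive.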
Place of Hadoop in a Relational Data Warehouse?
- Conflicting
- Mutually exclusive
- Coexisting
- Complementing
What’s the new Data Warehouse for data and analytics?
- Complementing: Polyglot Persistence
Questions?
akaczmarek@expedia.com
