SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Architecting for Big Data  Integrating Hadoop into an Enterprise Data Infrastructure Raghu Kashyap and Jonathan Seidman Gartner Peer Forum September 14 | 2011
Who We Are ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],page
page  Launched in  2001, Chicago, IL  Over 160 million bookings
What is Hadoop? ,[object Object],[object Object],[object Object],[object Object],page
Why Hadoop? ,[object Object],page  $ per TB
Why We Started Using Hadoop page  Optimizing hotel search…
Why We Started Using Hadoop ,[object Object],[object Object],page
The Problem… ,[object Object],page  Transactional Data (e.g. bookings) Data Warehouse Non-transactional Data (e.g. searches)
Hadoop Was Selected as a Solution… page  Transactional Data (e.g. bookings) Data Warehouse Non-Transactional Data (e.g. searches) Hadoop
Unfortunately… ,[object Object],[object Object],[object Object],page
Current Big Data Infrastructure Hadoop page  MapReduce HDFS MapReduce Jobs (Java, Python,  R/RHIPE) Analytic Tools (Hive, Pig) Data Warehouse (Greenplum) psql, gpload, Sqoop External Analytical Jobs (Java, R, etc.) Aggregated Data Aggregated Data
Hadoop Architecture Details ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],page
Deploying Hadoop Enabled Multiple Applications… page
But Brought New Challenges… ,[object Object],[object Object],page
In Early 2011… ,[object Object],[object Object],[object Object],[object Object],page
Karmasphere Analyst page
Karmasphere Analyst page
Datameer Analytics Solution page
Datameer Analytics Solution page
Not to Mention Other BI Vendors… page
One More Use Case – Click Data Processing ,[object Object],page
Click Data Processing – Current Data Warehouse Processing  page  Web Server Logs ETL DW Data Cleansing (Stored  procedure) DW Web Server Web Servers 3 hours 2 hours ~20% original data size
Click Data Processing – Proposed Hadoop Processing  page  Web Server Logs HDFS Data Cleansing (MapReduce) DW Web Server Web Servers
Lessons Learned ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],page
Lessons Learned ,[object Object],[object Object],[object Object],page
Lessons Learned ,[object Object],[object Object],[object Object],page
In the Near Future… ,[object Object],[object Object],[object Object],[object Object],page
[object Object],page
What is  Web  Analytics? ,[object Object],[object Object],[object Object],[object Object]
Challenges ,[object Object],[object Object],[object Object],[object Object],[object Object]
continued…. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Categories ,[object Object],[object Object],[object Object],[object Object],[object Object]
Web Analytics & Big Data ,[object Object],[object Object],[object Object]
Processing of Web Analytics Data
Aggregating data into Data Warehouse
Data Analysis Jobs ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Business Insights page
Centralized Decentralization Web Analytics team + SEO team + Hotel optimization team
Model for success ,[object Object],[object Object],[object Object]
Should everyone do this? ,[object Object],[object Object],[object Object],[object Object]
Questions? ,[object Object]

Weitere ähnliche Inhalte

Was ist angesagt?

How to implement hadoop successfuly
How to implement hadoop successfulyHow to implement hadoop successfuly
How to implement hadoop successfuly
Adir Sharabi
 
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
Spark Summit
 
Latest corp big data and acme
Latest corp   big data and acmeLatest corp   big data and acme
Latest corp big data and acme
hooduku
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
Harvinder Atwal
 

Was ist angesagt? (20)

How to implement Hadoop successfully
How to implement Hadoop successfullyHow to implement Hadoop successfully
How to implement Hadoop successfully
 
Rethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data HubRethink Analytics with an Enterprise Data Hub
Rethink Analytics with an Enterprise Data Hub
 
How to implement hadoop successfuly
How to implement hadoop successfulyHow to implement hadoop successfuly
How to implement hadoop successfuly
 
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 
How big data is transforming BI
How big data is transforming BIHow big data is transforming BI
How big data is transforming BI
 
Latest corp big data and acme
Latest corp   big data and acmeLatest corp   big data and acme
Latest corp big data and acme
 
Hooduku - Big data analytics - case study
Hooduku - Big data analytics - case studyHooduku - Big data analytics - case study
Hooduku - Big data analytics - case study
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Recipes for Unlocking Value from Big Data
Recipes for Unlocking Value from Big DataRecipes for Unlocking Value from Big Data
Recipes for Unlocking Value from Big Data
 
BigData Analytics
BigData AnalyticsBigData Analytics
BigData Analytics
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Analytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual WorkshopAnalytics in a Day Ft. Synapse Virtual Workshop
Analytics in a Day Ft. Synapse Virtual Workshop
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Reinventing the Modern Information Pipeline: Paxata and MapR
Reinventing the Modern Information Pipeline: Paxata and MapRReinventing the Modern Information Pipeline: Paxata and MapR
Reinventing the Modern Information Pipeline: Paxata and MapR
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages Jaunes
BreizhJUG - Janvier 2014 - Big Data -  Dataiku - Pages JaunesBreizhJUG - Janvier 2014 - Big Data -  Dataiku - Pages Jaunes
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages Jaunes
 
Unlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and ClouderaUnlocking data science in the enterprise - with Oracle and Cloudera
Unlocking data science in the enterprise - with Oracle and Cloudera
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 

Ähnlich wie Gartner peer forum sept 2011 orbitz

Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Cloudera, Inc.
 
Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi Brochure
Rajesh Angadi Brochure
Rajesh Angadi
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
Jane Roberts
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
email2jl
 

Ähnlich wie Gartner peer forum sept 2011 orbitz (20)

Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
 
Hadoop Demo eConvergence
Hadoop Demo eConvergenceHadoop Demo eConvergence
Hadoop Demo eConvergence
 
TSE_Pres12.pptx
TSE_Pres12.pptxTSE_Pres12.pptx
TSE_Pres12.pptx
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
Building a Big Data Solution
Building a Big Data SolutionBuilding a Big Data Solution
Building a Big Data Solution
 
Rajesh Angadi Brochure
Rajesh Angadi Brochure Rajesh Angadi Brochure
Rajesh Angadi Brochure
 
Exploring the Wider World of Big Data
Exploring the Wider World of Big DataExploring the Wider World of Big Data
Exploring the Wider World of Big Data
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
View on big data technologies
View on big data technologiesView on big data technologies
View on big data technologies
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Big data an elephant business opportunities
Big data an elephant   business opportunitiesBig data an elephant   business opportunities
Big data an elephant business opportunities
 
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRobertsWP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
WP_Impetus_2016_Guide_to_Modernize_Your_Enterprise_Data_Warehouse_JRoberts
 
Creating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data ArchitectureCreating a Next-Generation Big Data Architecture
Creating a Next-Generation Big Data Architecture
 
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
Creatinganext generationbigdataarchitecture-141204150317-conversion-gate02
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenariosThe Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
The Practice of Big Data - The Hadoop ecosystem explained with usage scenarios
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Learn About Big Data and Hadoop The Most Significant Resource
Learn About Big Data and Hadoop The Most Significant ResourceLearn About Big Data and Hadoop The Most Significant Resource
Learn About Big Data and Hadoop The Most Significant Resource
 

Mehr von Raghu Kashyap

Big Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewBig Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners View
Raghu Kashyap
 

Mehr von Raghu Kashyap (8)

Agile 2017 Lean Product Development
Agile 2017 Lean Product DevelopmentAgile 2017 Lean Product Development
Agile 2017 Lean Product Development
 
Is BI/Analytics and Agile an Oxymoron?
Is BI/Analytics and Agile an Oxymoron?Is BI/Analytics and Agile an Oxymoron?
Is BI/Analytics and Agile an Oxymoron?
 
Traditional BI or Disruptive BI?
Traditional BI or Disruptive BI?Traditional BI or Disruptive BI?
Traditional BI or Disruptive BI?
 
Idiots guide to stocks
Idiots guide to stocksIdiots guide to stocks
Idiots guide to stocks
 
Orbitz fifth elephant_2015_conference_orbitz_presentation
Orbitz fifth elephant_2015_conference_orbitz_presentationOrbitz fifth elephant_2015_conference_orbitz_presentation
Orbitz fifth elephant_2015_conference_orbitz_presentation
 
Big Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners ViewBig Data Analytics from a Practitioners View
Big Data Analytics from a Practitioners View
 
Big Data redefines Enterprise Data Warehouse @Bangalore
Big Data redefines Enterprise Data Warehouse @BangaloreBig Data redefines Enterprise Data Warehouse @Bangalore
Big Data redefines Enterprise Data Warehouse @Bangalore
 
Accelerate 2012 chicago - orbitz
Accelerate   2012 chicago - orbitzAccelerate   2012 chicago - orbitz
Accelerate 2012 chicago - orbitz
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Gartner peer forum sept 2011 orbitz

Hinweis der Redaktion

  1. Welcome everyone. I will be presenting on how we are shaping up web analytics and big data to optimize the data driven decisions at Orbitz World wide 2.. I will also be talking about the process model on how we are effectively utilizing the brains and man power across the organization towards a common goal 4. Between me and Jonathan we promise to give you some thought provoking details about analytics and Big data :-)
  2. Most people think of orbitz.com, but Orbitz Worldwide is really a global portfolio of leading online travel consumer brands including Orbitz, Cheaptickets, The Away Network, ebookers and HotelClub. Orbitz also provides business to business services - Orbitz Worldwide Distribution provides hotel booking capabilities to a number of leading carriers such as Amtrak, Delta, LAN, KLM, Air France and Orbitz for Business provides corporate travel services to a number of Fortune 100 clients Orbitz started in 1999, orbitz site launched in 2001.
  3. A couple of years ago when I mentioned Hadoop I’d often get blank stares, even from developers. I think most folks now are at least aware of what Hadoop is.
  4. This chart isn’t exactly an apples-to-apples comparison, but provides some idea of the difference in cost per TB for the DW vs. Hadoop Hadoop doesn’t provide the same functionality as a data warehouse, but it does allow us to store and process data that wasn’t practical before for economic and technical reasons. Putting data into a DB or DWH requires having knowledge or making assumptions about how the data will be used. Either way you’re putting constraints around how the data is accessed and processed. With Hadoop each application can process the raw data in whatever way is required. If you decide you need to analyze different attributes you just run a new query.
  5. The initial motivation was to solve a particular business problem. Orbitz wanted to be able to use intelligent algorithms to optimize various site functions, for example optimizing hotel search by showing consumers hotels that more closely match their preferences, leading to more bookings.
  6. Improving hotel search requires access to such data as which hotels users saw in search results, which hotels they clicked on, and which hotels were actually booked. Much of this data was available in web analytics logs.
  7. Management was supportive of anything that facilitated ML team efforts. But when we presented a hardware spec for servers with local non-raided storage, etc. syseng offered us blades with attached storage.
  8. Hadoop is used to crunch data for input to a system to recommend products to users. Although we use third-party sites to monitor site performance, Hadoop allows the front end team to provide detailed reports on page download performance, providing valuable trending data not available from other sources. Data is used for analysis of user segments, which can drive personalization. This chart shows that Safari users click on hotels with higher mean and median prices as opposed to other users. This is just a handful of examples of how Hadoop is driving business value.
  9. Recently received an email from a user seeking access to Hive. Sent him a detailed email with info on accessing Hive, etc. Received an email back basically saying “you lost me at ssh”.
  10. Previous to 2011 Hadoop responsibilities were split across technology teams. Moving under a single team centralized responsibility and resources for Hadoop.
  11. Processing of click data gathered by web servers. This click data contains marketing info. data cleansing step is done inside data warehouse using a stored procedure further downstream processing is done to generate final data sets for reporting Although this processing generates the required user reports, this process consumes considerable time and resources on the data warehouse, consuming resources that could be used for reports, queries, etc.
  12. ETL step is eliminated, instead raw logs will be uploaded to HDFS which is a much faster process Moving the data cleansing to MapReduce will allow us to take advantage of Hadoop’s efficiencies and greatly speed up the processing. Moves the “heavy lifting” of processing the relatively large data sets to Hadoop, and takes advantage of Hadoop’s efficiencies.
  13. Bad news is we need to significantly increase the number of servers in our cluster, the good news is that this is because teams are using Hadoop, and new projects are coming online.
  14. I met someone at the train station who asked me what I do? I said I work in the web analytics field and I help shape up the strategy and vision at Orbitz worldwide and enable our business teams to get insights on the performance of our site and act upon it. So he said, Ah you do reporting :-) 2. I started thinking why web analytics is hard for people to get and started evangelizing both within and outside Orbitz 3. I manage the webanalytics team at Orbitz worldwide I also try to help out non-profit organizations while I am not busy with my wife and 2 sons
  15. 1 So what is web analytics? 2 Read the definition. It tells you exactly why someone came to your site and what kind of impact they had on the bottom line of your revenue 3. Read the definition. You need to immerse yourself in data to understand the story it's telling 4. Read the definition. Focus on Customer. Customer is the king. You need to listen and act upon their feedback 5. Read the definition. Test Test and Test. If you want to prove or disprove a HIPPO's opinion you need to perform tests on your site 6. btw HIPPO is a common terminology in the industry. It stands for Highest Income Paid person's opinion :-)
  16. S o with so many brands and so much data we had quite a few challenges? For starters we couldn't easily do multi dimensional analysis with the tools. With data spread across in multiple tools it was hard to picture the whole 9 yards obviously tools cost money Harder for people to understand where to look at for data With Analytics you need direction rather than precision to take action and get insights
  17. In the Big Data front we didn't have a good infrastructure where we could house all this data in a cost effective way. 2. Data extraction was NOT an easy task 3. Focusing on the key differences on when you need testing v/s when you need reporting. 4. Earlier I mentioned that you need to do rigorous outcome analysis. However, with all the challenges we faced it was not an easy task.
  18. So how do we fit the puzzle? By learning the behavior of the customer and focusing on key attributes Know the travel details such as how many travelers, what kind of travelers, any preferred carrier or hotels? 4. Understand the shopping patterns. Does he want to shop only on weekends or else only on Thursdays. 5. Focus on Visit Patterns. How many times does he come to the site before he buys anything 6. Learn the page navigation. I.e does he see 100 pages every time he comes or does he know exactly what to look at 7. Master the Demand source. Anyone who's worked in the marketing side knows that attribution is a holy war. Deciding which demand source gets the credit for conversion is something people will argue to death Just like the IDE war between VIM, EMACS, Intellij and Eclipse :-)
  19. We realized that with all the challenges we had, we had to innovate and experiment new ways to enable successful web analytics at OWW 2. We generate hundreds of GB of log data per day. How can we effectively store this massive data and how can we mine this data and make sense out of it? 3. Our existing DW was not intended to support such large sets of data and more importantly process this data We also needed to make sure that we don't spend huge money to store this data set. 4. Big data infrastructure with Hadoop has been a huge success at Orbitz and at other organizations So what does this buy us? We can now store data for a long period of time without worrying too much about the space Analysts and developers have access to this data set Developers can run adhoc queries to support our business needs. While the core web analytics team focuses on the company standards and metrics
  20. Here is an example of how we process our site analytics data today. We FTP the log files into our Hadoop infrastructure daily. The files are LZO compressed for better storage utilization. Developers then write Map reduce jobs against these raw log files to output data into HIVE tables. HIVE is a DW equivalent of Hadoop Most of the MR jobs are written using Java and scripting languages such as Python, Ruby, BASH. Business teams however, have skillset to run queries against HIVE tables.
  21. Since the market on Big Data is not that mature there are no good ways to build visualization on top of HIVE 2. Due to this and for other reasons we need to bring a subset of this data into our warehouse. 3. So in essence the data that are in HIVE will make it into the warehouse. 4. There are companies such as Karmaspehe, Datamere who are in the initial stages of bridging the gap between business needs and Hadoop access. 5. However, its too early to say if this will be the norm
  22. We focused on some key areas of our business such as demand source and campaigns as our pilot and worked with our business partners to enable the analytics on Big Data 2. We have developers writing Map Reduce jobs which run every day and populate HIVE tables We generate more than 25 million records for a month for the pilot use case that we worked on This only show cases the sheer magnitude and power of analytics within the Big Data framework
  23. So if you have read Avinash Kaushik’s book and his follow his blog Occoms razor” then you know what he always mentions 2 words Data puke Gold (Insights) Here we have a nice depiction of all kinds of insights provided in a nice dashbaord format to our business users. These insights were only made possible due to the data that we housed and extracted from Hadoop. Obviously I couldn’t share what these graphs meant without giving more details 
  24. So how do you organizationally structure yourself and Big Data so that you can be effective both in terms of resource utilization and setting the platform for success 2. This is what we call the Centralized Decentralization. 3. With this approach the core web analytics team controls and supports the individual teams when it comes to data extraction and modeling. 4. This prevents one team from being the bottle neck with data extraction and analytics 5. If you have ever worked in the Data Warehouse side of the world you will know the challenges and delays in getting the data
  25. With the core process of centralized decentralization and being agile how do you succeed? You can't manage if you can't measure. But once you measure make sure you fail fast Every team needs to be thinking of analytics with every feature they work on Dimensional modeling is great but like someone wise said 'All models are wrong but some are useful" :-) My point here is data without analysis is like a Ferrari without gas. If you Make it a point to extract smaller chunks of data and tie this effort to your business objectives. You are sure to succeed
  26. Here are some key learning's from our experience and some thoughts for you to consider If you have the strength of technology go for it. This needs heavy investment from time and resource perspective Like I mentioned many times data without analysis is worthless
  27. Thanks again for listening to our story and we would be available for any further questions you may have. Also if you are know anyone who is interested in working at Orbitz please check out the career site