Lars George
Cloudera
‘Unaccept the Status Quo’
Evolution of Analytic Data
Management
Lars George, Director EMEA Services
@larsgeorge
Growth of Data
Growth of Cost
Cost with Hadoop
Hadoop’s Genesis …
The Origins of Hadoop
Source: Credit Suisse
100% Open Source
10% to 1% the cost
of traditional alternatives
What If You Can …
Any Data Amount
Ask Any Question
At Any Speed
For Any Usage
Any Type or Form
Macys.com optimizes the online experience,
reducing email subscription churn by 20%.
Ask Bigger Questions:
How can we maintain high value
customer service in the online channel?
Macys.com benefits from math on Hadoop
The Challenge:
• Extending brand differentiators – customer service, the right product selection – to
online channel
• Incumbent IT struggles to capture & analyze unstructured clickstream/ weblog data at
necessary speed & scale
Macys.com improves loyalty & reduces
email subscription churn by 20%.
The Solution
• Cloudera Enterprise + SAS Enterprise Miner
• Customer insights across channels in near real time
• Advanced segmentation – fewer, more specific emails
Chevron reduces the cost of sending
deepwater drillships into the ocean by more
precisely identifying oil reservoirs.
Ask Bigger Questions:
Where should we look for oil?
Chevron cuts operating costs
The Challenge:
• Very complex to process, store and analyze massive volumes of 5D seismic data
collected by instruments in the ocean
• Drillships cost roughly $1 million per day to operate
Chevron can reduce the cost of sending
deepwater drillships into the ocean by
more precisely identifying oil reservoirs.
The Solution
• CDH platform manages ½ PB seismic data
• More elegant, simple approach to large-scale
data processing at lower cost
Skybox Imaging is indexing the earth through a
high-performance constellation of imaging microsatellites.
Ask Bigger Questions:
How can we help companies physically
view their business landscape?
©2013 Cloudera, Inc. All rights reserved.
A global financial services company can more quickly and
accurately find fraud while saving $30 million in IT costs.
Ask Bigger Questions:
How can we prevent fraud?
Ask Bigger Questions:
How can we conserve energy?
Opower provides 360-degree views into energy usage
patterns and household comparisons.
“Somewhere, something
incredible is waiting
to be known.”
Carl Sagan, Astrophysicist
The Large Hadron Collider
at CERN generates 27
terabytes of data per day.
Ask Bigger Questions
about the Universe
You're not one in a trillion,
you're one in 100 trillion.
Ask Bigger Questions
about the Genome
Environment of Change
Where does your path lead?
Lars George
Twitter: @larsgeorge


Editor's Notes

  1. We manage data using EDW, storage and databases, systems from the 80s. Dedicated, single-purpose systems. 80s volumes, 80s questions. Good at problems they were built to handle, but not for today’s problems. Can’t get big, not flexible enough for 21st century questions. I would like the picture of servers to build, but not to dim, and not to have the first data curve come in, until I click right here. Then I will say: Data growth is relentless. PAUSE. CLICK.
  2. EDW siloes, secularizes. Rigid. Sharing impossible. Worst: Fantastically expensive containers. Crazy: Value is in the data! Want to keep all the data. Bigger questions. Better answers. Need new systems. PAUSE. CLICK.
  3. To set the stage, we’d like to share a few thoughts with you on the origins of Hadoop and some of its guiding principles …
  4. The world is changing (energy, agriculture, life sciences, astronomy, banking, health care...). More data, from more places, arriving faster. Need systems to SEE data, UNDERSTAND, ACT ON. Prisoners of our own success. PAUSE, ADVANCE.
  5. Link to SFDC account record (only valid for Cloudera employees): https://na6.salesforce.com/0018000000fHy7U?srPos=0&srKp=001
     Macys.com improves customer loyalty and reduces email subscription churn by 20%.
     Background: Macy’s has built its business over many decades by delivering superior customer service and “the right” product selection to its loyal customer base. In recent years, consumers have started shopping more and more online instead of going to physical stores to buy the products they’re looking for. The retailer realized they’d need to not only offer products for sale online, but if they wanted to keep their loyal customer base and recognized brand, they would need to offer a similar high value customer experience online as they do in stores. They ultimately set out to gain a better understanding of each customer’s lifetime value through a holistic view of their relationship with the retail company.
     Challenge: In order to offer customers a personalized, high-value experience in the online channel, Macy’s needed to be able to see and understand customers’ behavior holistically – across online and offline channels. This alone would be challenging, as the retailer’s incumbent IT environment captured online and offline activity in disparate systems. The relational databases they had in house struggled to capture unstructured clickstream and weblog data at the speed and scale that would be necessary to keep pace with growing online activity. Also, their RDBMS offered limited analytical flexibility, since data had to be structured for analysis as it was brought into the data warehouse. They had to know what kinds of questions they wanted to ask of their data before loading it into the system.
     Furthermore, they wanted to capture and analyze more about their customers’ activity online – rather than just looking at what they buy online and seeing how that compares to what they buy in stores, the retailer wanted to be able to look at which web pages each customer is visiting, what pages they enter and exit the website from, what they look at online and how that corresponds to what they actually buy either in stores or on the web. Collecting, storing and analyzing all of this web data would result in huge data volume growth. Integrating online and offline data into a single IT system, and capturing more detail and volume of online activity would prove both cumbersome and very expensive with the retailer’s existing IT infrastructure.
     Solution: Macys.com deployed Cloudera Enterprise in conjunction with SAS Enterprise Miner. Cloudera Enterprise offers a scalable, flexible environment for capturing massive and growing volumes of both online and offline data, and SAS enables advanced analytics to make sense of all that data. For example, they’re running analytics to segment their customers so they can send less frequent but more thoughtful email communications to their subscribed base. They’ve also built algorithms to guide cross-sell and up-sell offers on their website.
     Results: Macys.com’s new environment is delivering customer insight across channels in near real time. The retailer can send fewer but more specific emails to customers, and they can more effectively measure and understand the impact of their online marketing initiatives. They finally have a 360-degree view of their customers, spanning both online and offline activity. And it’s working – they’ve been able to reduce email subscription churn by 20%. The company is seeing increased customer loyalty and overall profitability as a result of their new analytics infrastructure.
  6. Background: To find oil in places like the Gulf of Mexico, Chevron sends out ships that survey areas suspected to contain oil or gas deposits. The ships generate waves creating seismic data that is then converted into a picture that Chevron engineers can use to determine where oil is located. It’s important that this picture is as clear as possible because drilling in deepwater is expensive. Drillships cost roughly $1 million per day to operate, according to a September 2011 presentation by Peter Breunig, general manager of IT technology and architecture at Chevron.
     The process of collecting seismic data works by sending seismic waves into the ocean floor, which are then reflected back to the surface. An array of receivers records the amplitude and arrival times of the reflected waves as a time series that can later be reconstructed to get a picture of what the terrain looks like under the ocean floor, Josh Wills, director of data science at Cloudera, wrote in a January paper.
     Challenge: Chevron wanted to reduce the cost of sending deepwater drillships out into the ocean by doing a better job of processing the vast amounts of data that can help identify reservoirs of oil. (Volumes currently amount to ½ PB.)
     Solution: Today, Chevron gathers information that includes five dimensions – the x and y coordinates of both the wave’s source and target – along with the time it was collected. The company uses Hadoop software to sort that data. It’s one step in more than 25 steps Chevron takes with the data to create a picture for engineers to use to locate oil reservoirs. Chevron uses a supercomputer to create models and simulations of the underground environment.
     Benefits: The more data Chevron can collect, the better it can find pockets of oil and natural gas underground. Hadoop can do some of the seismic data processing in a less expensive way – 10x less than traditional technologies on average.
  8. A Fortune 500 company specializing in agriculture and genomics can automate data-driven R&D decisions to reduce time to market from years to months.
Background: A major agricultural company sells seeds and genetic traits developed through biotechnology, as well as crop protection chemicals. Its mission is to attack hunger while the world population grows from 7 billion to 9 billion people, helping farmers produce as much food in the next few decades as they have in the last 10,000 years combined.
Challenge: It takes 5-10 years to bring one new product to market because of the intensive research, testing and evaluation required during the R&D process. Meanwhile, the company’s data from labs, the field, literature and so on were all stored separately, and it seemed impossible to combine those data sources. Researchers were working in special-purpose analytical systems that made it difficult to share results and combine information.
Solution: The biotech company has deployed Cloudera Enterprise + RTQ (Impala) to knock down data silos and help researchers share their data. They get the analytical power of MapReduce and can explore results and design hypotheses at the speed of thought using Impala. They have now started using Cloudera Search to index images of plants at various stages in their lifecycles to optimize the production process further. The company is also an early adopter of Cloudera Navigator. Their Cloudera system is integrated with the Oracle Exadata data warehouse.
Results: Cloudera Enterprise with Impala helps researchers work together so they can automate many data-driven decisions in the R&D pipeline, answering questions like: Which traits do we want to integrate into this germ plasm? Which germ plasms do we integrate -- which male and female plants should be brought together to create a child plant? Once that child plant is created, where should it be tested -- in the northern or southern part of the country?
This ultimately helps them reduce the time to market of new products. The company is giving scientists direct access to Hadoop so everyone has a single view of their R&D data. Cloudera Navigator will help them increase user adoption of the Cloudera platform even further by offering auditing and access control.
A major agricultural company sells seeds, genetic traits developed through biotechnology, and crop protection chemicals.
Their mission: Attack hunger. Keep pace with a world population growing from 7 billion to 9 billion people. Allow farmers to produce as much food in the next few decades as we have produced in the last 10,000 years combined.
Challenge: It takes as long as a decade to bring a single new product to market.
Data is siloed: Data from laboratories, the field, literature and so on is all stored separately, and it is impossible to combine.
Researchers are siloed: Special-purpose analytical and data systems make it difficult to collaborate and share results.
They need to work together to answer questions like: Which traits do we want to integrate into this germ plasm? Which germ plasms do we integrate -- which male and female plants should be brought together to create a child plant? Once that child plant is created, where should it be tested -- in the northern or southern part of the country?
Cloudera, Hadoop and Impala knock down the silos. Researchers can share all their data. They get the analytical power of MapReduce, and can explore results and design hypotheses at the speed of thought using Impala.
The platform supports better analysis, automates more of the research and discovery process, and lets humans focus on what they do best: thinking of great ideas and testing them.
  9. Skybox Imaging delivers the highest resolution imagery of any spot on earth multiple times per day.
Background: Skybox Imaging is developing the world's highest performance constellation of imaging micro-satellites. Those satellites deliver high resolution imagery of any spot on earth multiple times every day. Skybox’s goal is to extract information from those images and make it available to companies. Skybox’s clients will be able to communicate directly with the satellites. The satellite offering will launch this fall (2013), and Skybox already has several customers lined up to take advantage of it. MIT's Technology Review nominated Skybox Imaging as one of the Top 50 Most Innovative Companies in 2012.
Challenge: Since there isn’t much light in space, capturing high resolution images of the globe requires a lot of image processing of binary data. Skybox also wanted the capability to store unlimited historical images for continuous, temporal analysis. And they wanted to be able to store completely unstructured data, then add more structure and supporting applications over time, to offer organizations unprecedented geospatial and visual insight into their businesses. Using a traditional RDBMS in conjunction with a geospatial tool (e.g. Oracle Spatial) and networking equipment would be very expensive and would not provide geospatial imaging at scale (i.e. covering the entire globe).
Solution: Skybox Imaging has deployed CDH at the core of their IT infrastructure for all data processing and discovery. They are ingesting about a terabyte of raw satellite data into CDH every day, and they never throw any data away. Skybox is using Cloudera Search to index all of the images they collect, making it easier for users to find value in that information quickly. Other Hadoop components in use at Skybox are HBase, Hive, Flume and Oozie.
Skybox doesn’t have an enterprise data warehouse in place, though they do have a Postgres database that is used to process web application transactions.
Results: Skybox can process and deliver high-res images of any spot on earth multiple times per day. Skybox can leave images in Hadoop for unlimited lengths of time in order to perform continuous analysis, allowing clients to ask questions like, "How good is this year's orange crop versus 10 years ago?" or "How many cars are in X store's parking lot and how does that compare to other stores, other times,...?"
  10. Patterns and Predictions is analyzing veterans’ mobile data and social networking text to help identify and prevent suicide risk among veterans.
Background: Patterns and Predictions is a predictive analytics company that has partnered with Dartmouth College, with funding from the US Defense Advanced Research Projects Agency (DARPA), to initiate “The Durkheim Project.” Their goal: to identify risk factors for suicide that can be applied to the national veteran suicide crisis.
Challenge: Described in a 2012 TIME Magazine cover story (“One A Day”) as a problem of epidemic proportions, suicide rates among veterans are roughly double those of adults in the general US population. TIME also noted that military efforts have struggled to address the suicide crisis due, in part, to a lack of understanding of why nothing has worked to stop it. The TIME article stated, “…whatever one imagines might be driving the military suicide rate, it defies easy explanation.”
Solution: Patterns and Predictions’ founder, Chris Poulin, has been working with Dartmouth researchers to address the problem since 2010. Big data innovations like Hadoop have made possible new predictive analytics capabilities based on massive, unstructured data sets. Patterns and Predictions engaged Cloudera and Attivio to help build this predictive analytics solution, funded by DARPA.
A study to test their machine learning data fabric concluded in February 2013. In it, staff from Patterns and Predictions, Dartmouth Medical Center and the US Veterans Administration determined that the data model’s predictive accuracy was statistically significant (at least 65% accurate).
Patterns and Predictions is now providing technology that allows opt-in participation from more than 100,000 US veterans to build a ‘big’ medical database -- based on data ingestion, processing and analysis of veterans’ mobile and social activity -- that will help military mental health experts fight the alarming incidence of suicide among veterans.
Patterns and Predictions is now integrating Cloudera Search and Cloudera Impala into their machine learning framework to simplify the environment and reduce data movement.
Results: Having confirmed the viability of their solution, Patterns and Predictions is leading “Phase 2” technology for The Durkheim Project, a tightly integrated Big Data initiative with one objective: suicidality prediction at scale. The promise of Durkheim lies in its ability to collect and monitor a diverse repository of complex data, with the hope of eventually providing real-time triage of interventional actions upon detection of a critical event.
  11. Link to SFDC account record (valid only for Cloudera employees): https://na6.salesforce.com/0018000000TMQ8y?srPos=0&srKp=001
A global financial services company augments their DW with Cloudera, ultimately generating revenue, finding fraud, and saving $30 million.
Background: With the movement from in-person to online financial transaction processing, the number of transactions processed daily has ballooned. This has caused skyrocketing data volumes and increased susceptibility to fraud.
Phase 1 Challenge -- Data Warehouse Augmentation leading to Analytic Innovation: This financial services organization spends about $1 billion on their EDW environment annually, yet statisticians were limited to fairly simple queries on no more than a year’s worth of data because anything more extensive would consume too many compute resources. The statisticians within the Global Information Security group in particular wanted faster query response and unconstrained access to the data in the warehouse so they could mine it more effectively.
Phase 1 Solution -- Data Warehouse Augmentation leading to Analytic Innovation: The Global Information Security group within this financial services firm spun up a Hadoop cluster on Cloudera Enterprise to support exploratory analytics that couldn’t be run on the EDW. The Hadoop cluster stores longer historical data in greater detail, and enables them to join large data sets from disparate sources, which improves the flexibility and scale of analysis. Today they have about 300TB on their Cloudera cluster, spread across 52 nodes. They are ingesting 2TB into Cloudera every day, and this will soon double to 4TB per day.
Phase 1 Results -- Data Warehouse Augmentation leading to Analytic Innovation: By augmenting their data warehouse with Hadoop for exploratory analysis, this financial services firm is better equipped to identify and prevent fraud.
In one particular case, a third party notified the financial services firm that they had found an incidence of fraud but had caught it early, and said the fraud had only been happening for 2 weeks. The Global Information Security group decided to check by looking at the long-term detailed data in their Hadoop cluster. By searching through the broader data set, they discovered that the fraudulent activity had been going on for months. It turned out to be the largest incidence of fraud ever caught at the financial services firm.
In addition, the company is using the data from their Hadoop cluster to generate revenue-driving reports for merchants. They combine transaction data with purchase data from banks in the same Hadoop cluster, then sell the resulting reports to merchants. Those monthly reports used to take 2 days to complete, and drove $200 million in revenues for the company. They can now run those same reports in just hours, and with the scalability to join bigger data sets they expect to grow this business to $1 billion. The analytics from these reports help merchants with customer segmentation, cross-sell analytics, and more.
Phase 2 Challenge -- ETL: In the financial services firm’s incumbent environment, they were running Ab Initio for ETL, which has had a strong foothold in the financial services industry because historically it was more scalable than other ETL tools. But it is also very expensive, and Ab Initio charges by compute cycle -- so the more ETL processes you have running, the more you pay.
Phase 2 Solution -- ETL: The financial services organization wrote a connector that lets them design their ETL process in Ab Initio but run all their aggregations in Hadoop. They have also brought Talend in to provide leverage against Ab Initio, prompting Ab Initio to help optimize their ETL environment so that fewer jobs run in the Ab Initio tool (even though this doesn’t benefit Ab Initio). This is still in POC.
Phase 2 Results -- ETL: The firm expects to reduce their ETL processes by 10-15%, which corresponds to a 10-15% cost savings on Ab Initio.
Phase 3 Challenge -- Queryable Storage: The company’s data management environment consists of an IBM DB2 enterprise data warehouse which serves SQL queries to end users, sitting on top of numerous servers which are connected to the storage area network (SAN). Because of security and regulatory compliance requirements, the organization must replicate all of their production data in a disaster recovery environment, so for every million dollars that they spend on their production EDW, they have to spend another million on the replicated DR environment. The company spends an estimated $1 billion annually with IBM to maintain this production and DR environment.
Phase 3 Solution -- Queryable Storage: The company is deploying Cloudera Enterprise with Impala to meet their DR needs through a long-term, queryable archive solution at a significantly lower cost.
Phase 3 Results -- Queryable Storage: By replicating their production data in Hadoop with Impala instead of purchasing a duplicate DB2 environment for DR, this firm expects to save $30 million. Their SAN costs alone will be reduced from $7,000 per TB to $500 per TB -- that’s 1/14 of the original cost.
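The per-TB storage figures in the Phase 3 results can be checked with a few lines of arithmetic. This is only a sketch: the two prices come from the notes, while the 300 TB volume is borrowed from the Phase 1 cluster size purely as an illustrative assumption, since the notes do not state the size of the DR footprint.

```python
SAN_COST_PER_TB = 7_000    # USD per TB, from the notes
HADOOP_COST_PER_TB = 500   # USD per TB, from the notes


def cost_ratio(old_per_tb: float, new_per_tb: float) -> float:
    """How many times cheaper the new storage is per TB."""
    return old_per_tb / new_per_tb


def storage_savings(volume_tb: float, old_per_tb: float, new_per_tb: float) -> float:
    """Total savings in USD for a given volume."""
    return volume_tb * (old_per_tb - new_per_tb)


if __name__ == "__main__":
    ratio = cost_ratio(SAN_COST_PER_TB, HADOOP_COST_PER_TB)
    print(f"New cost is 1/{ratio:.0f} of the original per TB")

    # 300 TB is an assumed volume for illustration, not the DR data size.
    saved = storage_savings(300, SAN_COST_PER_TB, HADOOP_COST_PER_TB)
    print(f"Savings on 300 TB: ${saved:,.0f}")
```

Going from $7,000 to $500 per TB is a factor of 14, and at the assumed 300 TB the storage line alone would differ by $1.95 million.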
  12. Opower provides 360-degree views into energy usage patterns and similar-household comparisons to help 4+ million homes save hundreds of millions of dollars on energy bills.
Background: Opower is a technology company that partners with utility providers to help consumers better understand their energy use. Its platform offers behaviorally fine-tuned, decision-tree-optimized home energy reports that are mailed, emailed, sent by SMS and web-hosted for about 75 utilities and roughly 15 million residential utility customers across the globe. Opower also provides a Social Energy application which encourages utility providers and customers to compare and compete with each other on their energy use.
Challenge: With today’s ever-growing utilities-related data streams, Opower recognized the need to capture and analyze this data in order to help conserve energy. Examples of these data streams are: advanced metering infrastructure (AMI), smart appliances, interactive user applications, sensors, and social media.
Solution: Opower deployed Cloudera Enterprise Core + RTD to expand their big data capabilities. The data sources they draw on include those noted above, in addition to existing home energy data, weather data, consumer behavior data and other disparate pieces of information.
Benefits: Today, Opower is equipped to scale with the AMI data from tens of millions of homes. They aggregate data from the many sources noted above, process and analyze it, and then simplify the information to present homeowners with efficiency tips, utility rebate offers, and other suggestions. Cloudera’s software and support “allows Opower to do all kinds of analytics it might not otherwise be able to handle at a reasonable cost or in a reasonable time,” said Opower’s VP of marketing and strategy, Ogi Kavazovic (http://www.greentechmedia.com/articles/read/opower-takes-on-big-data-for-home-energy). For example, Opower can now compare thousands of different homes’ smart meter reads to find tiny fluctuations that indicate certain homes are over-heating or over-cooling at certain thermostat set-points.
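The kind of cross-home comparison mentioned in the Opower notes might look like the sketch below. The notes do not describe Opower's actual method, so the z-score approach, the function name `flag_outlier_homes`, the threshold and the sample readings are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_outlier_homes(reads: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Return homes whose usage deviates strongly from their peers.

    `reads` maps a home id to its energy use (e.g. kWh) while holding the
    same thermostat set-point. A home is flagged when its z-score, measured
    against all homes in `reads`, exceeds `threshold` in absolute value.
    """
    values = list(reads.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every home behaves identically; nothing to flag
    return sorted(
        home for home, value in reads.items()
        if abs(value - mu) / sigma > threshold
    )


if __name__ == "__main__":
    # Made-up readings for six homes at one set-point.
    sample = {"a": 10.0, "b": 10.0, "c": 10.0, "d": 10.0, "e": 10.0, "f": 30.0}
    print(flag_outlier_homes(sample))
```

At the scale described in the notes, the same per-set-point grouping and comparison would run as a distributed job rather than over an in-memory dictionary.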
  13. Carl Sagan was a master of synthesizing huge amounts of complex information from many sources, and explaining it clearly. I love that skill, and I love this quote. It’s a perpetual truth: No matter how much we learn, whatever frontiers we cross, there is always challenging new territory and fascinating insight still in front of us. It’s that spirit that makes me so glad to be working in the industry today. The opportunity we have is to Ask Bigger Questions and to change the world with the answers.
  14. The Large Hadron Collider is literally pumping out the answers to fundamental questions about the universe.
Discovery of the Higgs boson this year helps us understand why particles have mass, and was the last missing piece of the Standard Model of particle physics.
The LHC produces 27 terabytes of data for analysis every day. Actually, it produces much more, but after filtering and processing, scientists get 27 TB/day. Their analysis generates another 10 terabytes of derived results -- that’s 37 TB per day!
Those results are distributed worldwide. At UNL, the data is captured in Hadoop and delivered to researchers for analysis and use.
  15. Healthcare is rich with big data problems.
The Human Genome Project produced a map of 21,000 genes. Bowtie, Crossbow and similar tools do sequence alignment and homology search using Hadoop.
You’re made of 1 trillion cells, all with the identical genome. But you are home to 100 trillion cells – bacteria and other organisms that live in you and on you. They have more than 2 million genes. If you really want to understand disease, you need to sequence that genome. That’s the Human Microbiome Project. And the same tools, running on Hadoop, will help.
National Cancer Institute, winner of 2012 Government Big Data award, uses Hadoop to explore the genetic and environmental causes of cancer. The picture is complicated: several genes are involved in suppressing or promoting cell replication, affected by environment and changing over time. It’s a big data problem and NCI uses Hadoop to attack it.
Global Viral Forecasting Initiative collects data from many sources to predict and detect dangerous viral outbreaks, in order to respond to and prevent pandemics.
Cloudera’s Chief Scientist Jeff Hammerbacher is leading a revolutionary project at Mt Sinai School of Medicine to apply the power of Cloudera’s Big Data platform to critical problems in predicting and understanding the progress and treatment of disease.
Other examples: Can we comprehend the practical implications of a tax increase, applying predictive modeling to a policy decision? Can we understand the best way to educate millions of children in a way that also satisfies individual learning needs? Can we optimize the energy grid based on very specific usage patterns?
It’s a remarkable list. How does Hadoop do this?
  16. When you look out over today’s business landscape, you see a growing array of systems and tools to tackle Big Data. But there is also a lot of confusion about the path forward, where to look for answers, and where to start within your organization. You might question whether these tools and applications are appropriate for your needs. And even more so, are your business objectives even related to this thing called Big Data?
I invite you to join us in asking bigger questions: Where’s the fraud, and how can we stop it? Who are the terrorists, and where will they act next? How do we produce clean energy, and where do we need to deliver it? How do we eliminate disease and improve quality of life globally? How can we produce and distribute clean water? How do we grow enough for the world to eat?
We are the first generation that actually has the data. We have the tools. We need big answers. We have to ask bigger questions.