Project Cassini: eBay’s New Search Engine

Vice President of Search, Experience, and Platforms
eBay Marketplaces
$2.63 million for a lunch with Warren Buffett
$40,668 for Justin Bieber’s just-cut hair
$130K for Princess Beatrice’s hat
$62 billion in merchandise sold in 2010
97 million active buyers and sellers worldwide
250 million queries each day to our search engine
200+ million items live in more than 50,000 categories
9 petabytes of data in our Hadoop and Teradata clusters
2 billion page views each day
75 billion database calls each day
Huge Opportunity: Taking the “e” out of ecommerce

 ►   Yesterday: Online 4%, Offline 96% (2008 = $325B; source: Forrester, Euromonitor and Economist Intelligence Unit)
 ►   Today: Online 6%, Web-influenced offline 37%, Offline (source: Forrester)
 ►   Tomorrow: Online + Offline (2013 = $10T; source: Economist Intelligence Unit)
Voyager: our current search engine
 ►   Reliable, critical, proven workhorse
 ►   Circa-2002 textbook design
     ►   Basic ranking functionality
     ►   Title-only match by default
     ►   Very literal search
 ►   Inflexible & Manual
 ►   The next wave of innovation requires a new
     search platform…
Project Cassini at eBay
Our new search engine
 Our most ambitious core engineering
 project
   ►   Entirely new codebase
   ►   World-class, from a world-class team
   ►   Platform for ranking innovation
   ►   Uses all data by default
   ►   Flexible
   ►   Automated
   ►   Four major tracks, 100+ engineers
   ►   Complete in less than 18 months
 Beginning tests, likely launch in 2012
A Short Primer on Indexing
   When a user types a query, it isn’t practical to
    exhaustively scan 200+ million items
   Instead, we create an inverted index, and use it
    to rank the items and find the best matches
   An inverted index is similar to the index in the
    back of a book:
       A set of searchable terms
       For each term, a list of locations
An Inverted Index

     cat         3: 1, 2, 7

 [Figure: items numbered 1 to 8; the only item text that survives extraction is “cat on the mat fat cat” (row 1) and “wild cat” (row 4)]
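The primer above can be sketched in a few lines of Python. This is only an illustrative toy, not Cassini's implementation: the item IDs and titles are made up so that the term "cat" ends up with the same entry as on the slide (three items: 1, 2, 7), and the helper names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical item titles keyed by item ID (illustrative only, not eBay data).
ITEMS = {
    1: "fat cat on the mat",
    2: "cat scratching post",
    3: "dog leash",
    7: "vintage cat poster",
}


def build_inverted_index(items):
    """Map each term to the sorted list of item IDs whose title contains it."""
    postings = defaultdict(set)
    for item_id, title in items.items():
        for term in title.lower().split():
            postings[term].add(item_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return the IDs of items that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


index = build_inverted_index(ITEMS)
cat_postings = index["cat"]
# Same shape as the slide entry: term, number of items, list of item IDs.
print("cat", f"{len(cat_postings)}:", cat_postings)
print(search(index, "cat poster"))
```

A lookup touches only the posting lists for the query terms, which is why it avoids scanning every item. Ranking the matches would be a further step that this sketch leaves out.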
Distributed Index Construction
   Larger index than Voyager
       Descriptions, Seller data, other metadata, …
       Much more history in our indexes
   More computationally expensive work at index-time (and less at query-time)
   Ability to rescore or reclassify entire site
    inventory
   Hadoop:
       Distributed indexing – platform for hourly index
        refreshes
       Fault tolerance through HDFS replication
       Better utilization of hardware – can generate
        different index types with one cluster
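To make the distributed indexing idea concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases that a Hadoop job would spread across a cluster. It assumes nothing about eBay's actual jobs: the function names and the sample items are hypothetical, and a real job would read input splits from HDFS rather than a Python dict.

```python
from collections import defaultdict
from itertools import chain


def map_item(item_id, text):
    """Map phase: emit one (term, item_id) pair per distinct term in an item."""
    for term in set(text.lower().split()):
        yield term, item_id


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for term, item_id in pairs:
        grouped[term].append(item_id)
    return grouped


def reduce_term(term, item_ids):
    """Reduce phase: produce the sorted posting list for one term."""
    return term, sorted(item_ids)


def build_index(items):
    """Run the three phases in sequence on one machine."""
    pairs = chain.from_iterable(map_item(i, t) for i, t in items.items())
    return dict(reduce_term(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical items; in a real job each mapper would process one input split.
items = {10: "red bike helmet", 11: "bike lock", 12: "red scarf"}
index = build_index(items)
print(index["bike"])
print(index["red"])
```

Because each mapper sees only its own items and each reducer only its own terms, the work divides across machines, and a different reduce function over the same map output could produce a different index type on the same cluster.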
   HBase:
       Column-oriented data store on top of HDFS
       Used to store eBay’s items
       Bulk and incremental item writes
       Fast item reads for index construction
       Fast item reads and writes for item annotation
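The access patterns listed above (bulk and incremental writes, reads for indexing, reads and writes for annotation) can be illustrated with a small in-memory stand-in. This is not an HBase client and does not use any HBase API. The class `ToyItemTable`, the row keys, and the column families `listing` and `annotation` are all assumptions made for the example, and real HBase additionally keeps timestamped versions of each cell.

```python
from collections import defaultdict


class ToyItemTable:
    """In-memory stand-in for a column-family table (not an HBase client)."""

    def __init__(self):
        # row key -> {"family:qualifier": value}
        self._rows = defaultdict(dict)

    def put(self, row_key, columns):
        """Incremental write: set or overwrite some columns of one row."""
        self._rows[row_key].update(columns)

    def bulk_load(self, rows):
        """Bulk write: apply many rows in one call."""
        for row_key, columns in rows.items():
            self.put(row_key, columns)

    def get(self, row_key, family=None):
        """Read one row, optionally restricted to a single column family."""
        row = self._rows.get(row_key, {})
        if family is None:
            return dict(row)
        prefix = family + ":"
        return {col: val for col, val in row.items() if col.startswith(prefix)}

    def scan(self):
        """Yield rows in sorted key order, as an indexing job might read them."""
        for row_key in sorted(self._rows):
            yield row_key, dict(self._rows[row_key])


table = ToyItemTable()
table.bulk_load({
    "item-002": {"listing:title": "bike lock"},
    "item-001": {"listing:title": "red bike helmet"},
})
# Annotation adds a column to an existing row without rewriting the listing.
table.put("item-001", {"annotation:category": "cycling"})
print(table.get("item-001", family="annotation"))
print([row_key for row_key, _ in table.scan()])
```

Keeping annotations in their own column family means an annotation pass can read and write only those columns, while index construction scans the listing columns.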
   Everyone is still learning
   Some issues only appear at scale
   Production cluster configuration is challenging
       Hardware issues
       Tuning cluster configuration to our workloads
   HBase stability
   Monitoring health of HBase
   Managing workflows – multi-step MapReduce jobs
Hadoop World 2011 Keynote: Ebay - Hugh Williams

Weitere ähnliche Inhalte

Andere mochten auch

Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.Jack Levin
 
2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & AcronymsBrian Johnson
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and RetrievalOptum
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...Randy Shoup
 
ebay social measurement by Sudha Jamthe at Social data panel at Adobe
ebay social measurement by Sudha Jamthe at Social data panel at Adobe   ebay social measurement by Sudha Jamthe at Social data panel at Adobe
ebay social measurement by Sudha Jamthe at Social data panel at Adobe Sudha Jamthe
 
SMX Social Media eBay
SMX Social Media eBaySMX Social Media eBay
SMX Social Media eBayjtkoene
 
Converting rice residues into energy - opportunities and challenges
Converting rice residues into energy - opportunities and challengesConverting rice residues into energy - opportunities and challenges
Converting rice residues into energy - opportunities and challengesTuong Do
 
GIZ support mechanism for RE development in Vietnam
GIZ support mechanism for RE development in VietnamGIZ support mechanism for RE development in Vietnam
GIZ support mechanism for RE development in VietnamTuong Do
 
Renewable energy models for rice residues - SNV Vietnam
Renewable energy models for rice residues - SNV VietnamRenewable energy models for rice residues - SNV Vietnam
Renewable energy models for rice residues - SNV VietnamTuong Do
 
2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)
2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)
2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)Tuong Do
 
Leads United aka LEWIS pr brings an eBay.be case at #SMF10
Leads United aka LEWIS pr brings an eBay.be case at #SMF10Leads United aka LEWIS pr brings an eBay.be case at #SMF10
Leads United aka LEWIS pr brings an eBay.be case at #SMF10Pieter De Wit
 
eBay Business Efficiency Optimization: Tools, Tips & Tricks
eBay Business Efficiency Optimization: Tools, Tips & TrickseBay Business Efficiency Optimization: Tools, Tips & Tricks
eBay Business Efficiency Optimization: Tools, Tips & TricksSandi Garcia
 
Expand Your Business With Social Media - ColderICE at eBay On Location
Expand Your Business With Social Media - ColderICE at eBay On LocationExpand Your Business With Social Media - ColderICE at eBay On Location
Expand Your Business With Social Media - ColderICE at eBay On LocationJohn Lawson
 
Social business for corp social summit sf 2012
Social business for corp social summit sf 2012Social business for corp social summit sf 2012
Social business for corp social summit sf 2012Sudha Jamthe
 
Giz2013 Policies and regulatory framework promoting the application of biomas...
Giz2013 Policies and regulatory framework promoting the application of biomas...Giz2013 Policies and regulatory framework promoting the application of biomas...
Giz2013 Policies and regulatory framework promoting the application of biomas...Tuong Do
 
Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...
Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...
Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...Heather Drake
 
GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...
GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...
GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...Tuong Do
 
2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for Agriculture2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for AgricultureJazz Yao-Tsung Wang
 

Andere mochten auch (20)

Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms
 
Ebay search
Ebay searchEbay search
Ebay search
 
Structured Document Search and Retrieval
Structured Document Search and RetrievalStructured Document Search and Retrieval
Structured Document Search and Retrieval
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...The eBay Architecture:  Striking a Balance between Site Stability, Feature Ve...
The eBay Architecture: Striking a Balance between Site Stability, Feature Ve...
 
ebay social measurement by Sudha Jamthe at Social data panel at Adobe
ebay social measurement by Sudha Jamthe at Social data panel at Adobe   ebay social measurement by Sudha Jamthe at Social data panel at Adobe
ebay social measurement by Sudha Jamthe at Social data panel at Adobe
 
SMX Social Media eBay
SMX Social Media eBaySMX Social Media eBay
SMX Social Media eBay
 
Converting rice residues into energy - opportunities and challenges
Converting rice residues into energy - opportunities and challengesConverting rice residues into energy - opportunities and challenges
Converting rice residues into energy - opportunities and challenges
 
GIZ support mechanism for RE development in Vietnam
GIZ support mechanism for RE development in VietnamGIZ support mechanism for RE development in Vietnam
GIZ support mechanism for RE development in Vietnam
 
Renewable energy models for rice residues - SNV Vietnam
Renewable energy models for rice residues - SNV VietnamRenewable energy models for rice residues - SNV Vietnam
Renewable energy models for rice residues - SNV Vietnam
 
2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)
2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)
2013 EVN Smart Grid Plan, Nguyen Hai Ha (EN)
 
Leads United aka LEWIS pr brings an eBay.be case at #SMF10
Leads United aka LEWIS pr brings an eBay.be case at #SMF10Leads United aka LEWIS pr brings an eBay.be case at #SMF10
Leads United aka LEWIS pr brings an eBay.be case at #SMF10
 
eBay Business Efficiency Optimization: Tools, Tips & Tricks
eBay Business Efficiency Optimization: Tools, Tips & TrickseBay Business Efficiency Optimization: Tools, Tips & Tricks
eBay Business Efficiency Optimization: Tools, Tips & Tricks
 
Expand Your Business With Social Media - ColderICE at eBay On Location
Expand Your Business With Social Media - ColderICE at eBay On LocationExpand Your Business With Social Media - ColderICE at eBay On Location
Expand Your Business With Social Media - ColderICE at eBay On Location
 
Social business for corp social summit sf 2012
Social business for corp social summit sf 2012Social business for corp social summit sf 2012
Social business for corp social summit sf 2012
 
Giz2013 Policies and regulatory framework promoting the application of biomas...
Giz2013 Policies and regulatory framework promoting the application of biomas...Giz2013 Policies and regulatory framework promoting the application of biomas...
Giz2013 Policies and regulatory framework promoting the application of biomas...
 
Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...
Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...
Social Commerce and Local: The New Retail Environment: Jody Ford, VP Marketin...
 
GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...
GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...
GIZ2013-The Potential of Biogas and Biomass from Agriculture and Agro-Industr...
 
2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for Agriculture2006-11-16 RFID and OSS for Agriculture
2006-11-16 RFID and OSS for Agriculture
 

Ähnlich wie Hadoop World 2011 Keynote: Ebay - Hugh Williams

Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014ALTER WAY
 
Big Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingBig Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingAbzetdin Adamov
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014ALTER WAY
 
Class 39: ...and the World Wide Web
Class 39: ...and the World Wide WebClass 39: ...and the World Wide Web
Class 39: ...and the World Wide WebDavid Evans
 
Netflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyNetflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyKetan Patil
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupSri Ambati
 
Contextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of EntitiesContextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of EntitiesRichard Wallis
 
Contextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationContextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationRichard Wallis
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentationTao Feng
 
Keynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified searchKeynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified searchElasticsearch
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopPeter Skomoroch
 
Big data - Apache Hadoop for Beginner's
Big data - Apache Hadoop for Beginner'sBig data - Apache Hadoop for Beginner's
Big data - Apache Hadoop for Beginner'ssenthil0809
 
Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...
Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...
Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...Codiax
 
From Ambition to Go Live SWIB.pdf
From Ambition to Go Live SWIB.pdfFrom Ambition to Go Live SWIB.pdf
From Ambition to Go Live SWIB.pdfRichardWallis3
 
From Ambition to Go Live
From Ambition to Go LiveFrom Ambition to Go Live
From Ambition to Go LiveRichard Wallis
 
Webinar: NoSQL as the New Normal
Webinar: NoSQL as the New NormalWebinar: NoSQL as the New Normal
Webinar: NoSQL as the New NormalMongoDB
 

Ähnlich wie Hadoop World 2011 Keynote: Ebay - Hugh Williams (20)

Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014Elasticsearch : petit déjeuner du 13 mars 2014
Elasticsearch : petit déjeuner du 13 mars 2014
 
Big Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingBig Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision Making
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
 
Class 39: ...and the World Wide Web
Class 39: ...and the World Wide WebClass 39: ...and the World Wide Web
Class 39: ...and the World Wide Web
 
Netflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case StudyNetflix Recommender System : Big Data Case Study
Netflix Recommender System : Big Data Case Study
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
 
Contextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of EntitiesContextual Computing - Knowledge Graphs & Web of Entities
Contextual Computing - Knowledge Graphs & Web of Entities
 
Semantic Web For Dummies
Semantic Web For DummiesSemantic Web For Dummies
Semantic Web For Dummies
 
Contextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data FoundationContextual Computing: Laying a Global Data Foundation
Contextual Computing: Laying a Global Data Foundation
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Keynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified searchKeynote: Harnessing the power of Elasticsearch for simplified search
Keynote: Harnessing the power of Elasticsearch for simplified search
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With Hadoop
 
Organisational Wiki Adoption
Organisational Wiki AdoptionOrganisational Wiki Adoption
Organisational Wiki Adoption
 
Big data - Apache Hadoop for Beginner's
Big data - Apache Hadoop for Beginner'sBig data - Apache Hadoop for Beginner's
Big data - Apache Hadoop for Beginner's
 
Xinet17 new features
Xinet17 new featuresXinet17 new features
Xinet17 new features
 
Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...
Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...
Luka Postružin (Superbet) – ‘From zero to hero’ in early life customer segmen...
 
From Ambition to Go Live SWIB.pdf
From Ambition to Go Live SWIB.pdfFrom Ambition to Go Live SWIB.pdf
From Ambition to Go Live SWIB.pdf
 
From Ambition to Go Live
From Ambition to Go LiveFrom Ambition to Go Live
From Ambition to Go Live
 
Webinar: NoSQL as the New Normal
Webinar: NoSQL as the New NormalWebinar: NoSQL as the New Normal
Webinar: NoSQL as the New Normal
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 


Hadoop World 2011 Keynote: eBay - Hugh Williams

  • 1. Project Cassini: eBay’s New Search Engine. Vice President of Search, Experience, and Platforms, eBay Marketplaces
  • 2.
  • 3. $2.63 million for a lunch with Warren Buffett
  • 7. 97 million active buyers and sellers worldwide 250 million queries each day to our search engine 200+ million items live in more than 50,000 categories
  • 8. 9 petabytes of data in our Hadoop and Teradata clusters 2 billion page views each day 75 billion database calls each day
  • 9. Huge Opportunity: Taking the “e” out of ecommerce ► Yesterday (2008 = $325B): Online 4%, Offline 96% (Source: Forrester, Euromonitor and Economist Intelligence Unit) ► Today: Online 6%, Web-influenced offline 37%, Offline (Source: Forrester) ► Tomorrow (2013 = $10T): Online + Offline (Source: Economist Intelligence Unit)
  • 10.
  • 11. Voyager: our current search engine
  • 12. Voyager: our current search engine ► Reliable, critical, proven workhorse
  • 13. Voyager: our current search engine ► Circa-2002 textbook design ► Basic ranking functionality ► Title-only match by default ► Very literal search
  • 14. Voyager: our current search engine ► Inflexible & Manual ► The next wave of innovation requires a new search platform…
  • 15. Project Cassini at eBay Our new search engine
  • 16. Project Cassini at eBay Our most ambitious core engineering project
  • 17. Project Cassini at eBay Our most ambitious core engineering project ► Entirely new codebase ► World-class, from a world-class team ► Platform for ranking innovation ► Uses all data by default ► Flexible ► Automated ► Four major tracks, 100+ engineers ► Complete in less than 18 months
  • 18. Project Cassini at eBay Beginning tests, likely launch in 2012
  • 19. A Short Primer on Indexing ► When a user types a query, it isn’t practical to exhaustively scan 200+ million items ► Instead, we create an inverted index, and use it to rank the items and find the best matches ► An inverted index is similar to the index in the back of a book: a set of searchable terms and, for each term, a list of locations
  • 20. An Inverted Index [Figure: eight numbered items, including “cat on the mat”, “fat cat”, and “wild cat”, alongside the index entry “cat 3: 1, 2, 7”]
  • 22. Larger index than Voyager ► Descriptions, seller data, other metadata, … ► Much more history in our indexes ► More computationally expensive work at index-time (and less at query-time) ► Ability to rescore or reclassify entire site inventory
  • 23. Hadoop: ► Distributed indexing – platform for hourly index refreshes ► Fault tolerance through HDFS replication ► Better utilization of hardware – can generate different index types with one cluster
  • 24. HBase: ► Column-oriented data store on top of HDFS ► Used to store eBay’s items ► Bulk and incremental item writes ► Fast item reads for index construction ► Fast item reads and writes for item annotation
  • 25. Everyone is still learning ► Some issues only appear at scale ► Production cluster configuration is challenging ► Hardware issues ► Tuning cluster configuration to our workloads ► HBase stability ► Monitoring health of HBase ► Managing workflows – many-step map/reduce jobs
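
The inverted index from slides 19 and 20 can be sketched in a few lines of Python. This is a minimal illustration, not Cassini's implementation: the function names are hypothetical, the three titles come from the slide, and attaching them to item ids 1, 2 and 7 is an assumption based on reading the slide's entry “cat 3: 1, 2, 7” as “three matching items”.

```python
from collections import defaultdict


def build_inverted_index(items):
    """Map each term to a sorted list of the item ids whose text contains it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for term in text.lower().split():
            index[term].add(item_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of items that contain every query term (no ranking)."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Assumed item ids; the titles are the ones shown on the slide.
items = {1: "cat on the mat", 2: "fat cat", 7: "wild cat"}
index = build_inverted_index(items)

print(index["cat"])              # [1, 2, 7]
print(search(index, "fat cat"))  # [2]
```

A lookup touches only the postings lists of the query terms, which is why it avoids scanning every item. A production engine would also store positions and ranking signals in each posting.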

Editor's notes

  1. Great to be here – privilege to speak to you all. Today, going to talk to you about eBay, our new search engine Cassini, and how Hadoop and HBase are used in search. Highlight title – and mention that I work on Marketplaces (ebay.com, and its sister sites all over the world). Let me begin by giving you a brief overview of eBay…
  2. We’re 16 years old. Here is a shot of the original site – called AuctionWeb – that eBay’s founder, Pierre Omidyar, launched over Labor Day weekend in 1995 … as an “experiment.” I’ve circled some text on this page, not sure if you can read it … but it says “There are always SEVERAL HUNDRED auctions underway, so you’re bound to find something interesting.” “Several hundred” … those were our humble beginnings, though pretty impressive at the time. The only thing that’s remained the same since 1995 is that eBay has always connected buyers and sellers.
  3. In 2010, we sold $62 billion in merchandise.
  4. We’re one of the Web’s largest properties… and the pace of change is being driven largely by our customers and their new and increasingly sophisticated shopping expectations … <read slide>
  5. We are fast becoming a data company, where our engineers use data every day to inform what they do. And we have a lot of data, as you can imagine from our 97 million users, 200+ million listings, 250 million search queries, and 2 billion page views each day.
  6. Before I move on to talk about Search, I want to let you know that it’s becoming more interesting at eBay: customers are changing how they shop, and we’re at the center of this revolution. Nearly half of all offline purchases have an online component. The offline and online worlds are merging … and this is THE NEW RETAIL landscape. And it’s being driven by consumers who are using their smartphones and mobile devices to change the way they shop. eBay and mobile commerce are at the center of this shift – more change is going to happen in commerce in the next year or two than in the past ten.
  7. I’ve set the context on eBay. Now, I want to introduce you to project Cassini, our most ambitious engineering project at eBay. We are completely rewriting our search engine, and Hadoop and HBase are key to this rewrite. But, first, let me tell you something about our current search engine, Voyager.
  8. Voyager is named after a 1976 satellite that <fix>.
  9. It’s been driving the search experience on eBay since the early 2000s. Improvements to Voyager have been critical to improving the buyer experience and driving our sellers’ businesses.
  10. However, Voyager is behind the times: a lot has happened in search since 2002. Our best match ranking function uses only tens of factors. It only allows search of item titles by default -- we don’t rank using the great information that’s in the descriptions and elsewhere. Search is very literal – it finds almost exactly what you type, it doesn’t always understand what you mean.
  11. Voyager is a challenge to manage and run as an engineering team. It’s very manual, so deployments of software and data take time. Troubleshooting is slow. We decided in late 2010 that Voyager needed to be replaced, and that began project Cassini.
  12. Cassini is named after a 1996 satellite, a nod to it being many years ahead of Voyager
  13. <read and click>
  14. We’re probably the only major web property that’s completely rewriting its search engine from scratch. You can see many of the features of Cassini, and I’ll just talk about a couple briefly. First, it will use all data by default – all that great data in descriptions, information in images, data about our buyers and sellers, and the signals that come from 2 billion page views each day will be used in Cassini to compute its best match. Our users are going to see world-class results, and it’ll be a much more powerful tool to connect buyers and sellers. Second, automation is key. There’ll be no more manual operation of the search engine – rolling out code and data, monitoring, alerting, remediation, and more are fully automated. Third, it’s a major engineering undertaking: we’ve over 100 engineers working across four parallel tracks to deliver Cassini in less than 18 months from start to finish.
  15. We’ve hit a few major internal milestones, and internal users can already use Cassini if they’d like. <read slide>
  16. To understand how Hadoop and HBase play a role in Cassini, let me explain some of the fundamentals of building a search engine. <first point> 200 million items would take about 30 minutes, if we could do 1 document every 10 milliseconds and we had 1000 machines working concurrently (200,000 documents per machine × 10 ms = 2,000 seconds). <second point> An inverted index is an auxiliary data structure that allows fast calculation of the best matching search results. A typical query takes ten milliseconds using the same 1000 machines, and an inverted index. <third point> Walk through using the index in the back of a book…
  17. It isn’t possible to create an index for over 200 million items on a single machine – we can’t keep in memory the terms and all of their positions in the documents. What we do at scale is distributed index construction; it is classic map-reduce (and has been so from well before the phrase was coined). We build an inverted index for a small part of the document collection on one machine, and do the same on hundreds of other machines. We merge the small inverted indexes into larger inverted indexes that are distributed to our query serving grid. This is a technical graphic from our team; it shows the seven high-level stages to creating all the index pieces we need in Cassini.
  18. Let’s talk about why Cassini indexing is more challenging than in Voyager, and why we changed the architecture dramatically to include Hadoop and HBase. First reason: Voyager completed pool = 14 days. Cassini = 90 days. Second reason: we refresh indexes on an hourly basis – helps improve ranking, for example by updating item and seller information. Third reason: full power to our ranking team to make fast-twitch changes.
  19. Hadoop is the platform for our index construction and index maintenance in Cassini. It’s ideal because it gives us fault tolerance, and smart utilization of our hardware – without Hadoop, we’d probably have small pools of machines that run custom code for different stages of our index construction. Our Hadoop clusters for analytics are much larger, but this is our major use of Hadoop in driving a customer experience. It’s pretty large scale too: while we have over 200 million active items at any time, we also maintain a “completed index” that is over 1 billion items.
  20. We use HBase to store eBay’s items for index construction and maintenance. HBase, as you know, is a column-oriented data store built on top of HDFS that is tightly integrated with the Hadoop Map/Reduce framework. It has no schema, which is great for us – it means what we store can evolve. HBase supports fast item lookups and scans, both of which are necessary for index construction. Incremental writes are what we normally do: about 10 million items enter eBay each day, and we need them in the searchable index within a couple of minutes. Bulk writes are necessary when our ranking team wants to rescore all our items.
  21. We’ve got running Hadoop at scale mostly down, but we have challenges with HBase. First issue: Ops and Dev are both new to HBase. Lots of learning through failures. Second issue: testing uses a mini Hadoop cluster + local HBase. Third issue: getting the hardware tuned just right. Fourth issue: HBase stability – unstable Region Servers & HBase master, regions stuck in transition, etc. Fifth issue: monitoring – a lot of times we don’t recognize there are issues until jobs begin to fail. Sixth issue: workflow – our index chains have around 20 stages. But it’s not all doom and gloom: we’ve recently had a couple of weeks of stability, and we’re getting more confident each week… Before I finish today, I want to show you a couple of pictures of our data center that houses Cassini…
  22. This is our new data center that we opened in Salt Lake City, Utah in May last year. One of the most efficient data centers ever built; it makes clever use of power and cooling technologies.
  23. And here are the machines inside the data center that run Cassini.
  24. Before I conclude, I want to let you know that we’re hiring in the search team, and right across all the teams that use and maintain Hadoop and HBase. If you’re a Hadoop or HBase committer, I’d especially love to talk to you… And with that, I want to thank you all for listening, and I hope you enjoy a great conference.
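
The distributed index construction described in note 17 (build a small inverted index per machine, then merge) can be sketched as below. This is a single-process stand-in under stated assumptions: it uses no Hadoop or HBase APIs, the function names and shard layout are hypothetical, and the sample items are illustrative rather than eBay data.

```python
from collections import defaultdict


def build_partial_index(shard):
    """'Map' step: index one shard of (item_id, text) pairs, as one worker would."""
    index = defaultdict(list)
    for item_id, text in shard:
        for position, term in enumerate(text.lower().split()):
            index[term].append((item_id, position))
    return dict(index)


def merge_indexes(partials):
    """'Reduce' step: combine per-shard postings into one sorted list per term."""
    merged = defaultdict(list)
    for partial in partials:
        for term, postings in partial.items():
            merged[term].extend(postings)
    return {term: sorted(postings) for term, postings in merged.items()}


# Two hypothetical shards; on a cluster each would be indexed by a separate task.
shards = [
    [(1, "cat on the mat"), (2, "fat cat")],
    [(7, "wild cat")],
]
partials = [build_partial_index(shard) for shard in shards]
full_index = merge_indexes(partials)

print(full_index["cat"])  # [(1, 0), (2, 1), (7, 1)]
```

Each posting is an (item id, word position) pair. Positions are the part the notes say will not fit in memory on one machine at 200+ million items, which is the reason for splitting the build across shards.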