How we learned to rank
search results
Mouloud Lounaci & Andres Pipicello
Argentina Big Data Meetup
Meetup #6: Long time, no see OLX
October, 2018
mouloud.lounaci@olx.com
@mlounaci
https://www.linkedin.com/in/mlounaci
andres.pipicello@olx.com
https://www.linkedin.com/in/andrespipicello
Plan
OLX Group
Personalization and Relevance (PnR)
Learning To Rank (LTR)
The ranking journey
Building Dataset
Modeling
Serving the model
Results
Preface: OLX Group
What do you need to know about OLX Group?
Scale of Data at OLX Group
● 35B monthly page views
● 350M monthly users
● 60M monthly listings
● 4B daily events
Every minute...
2.5M events captured
500 houses listed
500 cars listed
1000 phones listed
Classifieds?
Two-sided Marketplace
● Buyers looking for goods or services
● Sellers offering goods or services
OLX's Mission
● Match buyers with sellers using
○ large scale data
○ state-of-the-art technology
Today it's ALL about "search"
● Retrieval of relevant listings
○ Query understanding
○ Query-listing matching
● Ranking of relevant listings
○ Learning to rank (LTR) using query, user and listing features
Query → Ranked relevant items
Actually it's about "Ranking"
● Ranking of relevant listings
○ Learning to rank (LTR) using query, user and listing features
Query → Ranked items
Chapter 1: Personalization and Relevance
Who are "we"?
What do we do?
How do we do it?
PnR - The Team
PnR - Architecture
Data Sources, Indexing, Retrieval + Ranking
Ad Retrieval: Phoenix
● Constructs the feed to send to the user
● Manages all the different spells (algorithms) used in the feed
● Splitter for A|B testing
Ad Retrieval: Loki
● Executes the spell (algorithm) from Phoenix
● Interacts with all the different data sources
● Caches items for fast page 2 retrieval
Chapter 2: Learning To Rank
Ragnarok?
Why LTR?
What is LTR for us?
What do we want to do?
"Learn from the data how to rank a result set for a search query"
a.k.a. RAGNAROK AD RERANKER
Why Learning To Rank?
1. Manual models become hard to tune with a very large number of features.
2. LTR leverages large volumes of user behaviour data (clicks/replies) in an automated way.
3. LTR creates a personalized ranking by including user features (social search).
Overview: RAGNAROK AD RERANKER (diagram)
User query → Spell → Returned documents → RAGNAROK AD RERANKER → (Re)ranked results
Top retrieved documents → Top reranked documents
User behaviour is the relevance signal: "If I click/reply, then it's relevant for me"
Chapter 3: The ranking journey
● Before we start, we need tools
● The search for gold
● Mining the gold (Spark), funnel?
● Modeling, or transforming the gold
● Serving the model
Step 1: Building Infra
● Access large historical data (Reservoir)
● Build infra to process it (EMR)
Step 2: Building Dataset
● Process the label (judgement score proxied with clicks/replies)
● Process the features
Step 3: Analysing Dataset
● Analyse click and reply behaviour
● Build the "gold standard" dataset for ranking
Step 4: Building Model
● Iterate on models
● Evaluate models
● Select a model
Step 5: Serving Model
● Design the service architecture
● Define the service requirements
● Create the ranking endpoint
Step 6: Integration with PnR Architecture
● Integrate the ranking in the ad retrieval flow
● Define the interaction with the PnR components
Step 1: Building Infra
RELEVANCE RESERVOIR (diagram)
● Storage: User browsing (parquet, 1h delay); Ads (json, 5 min delay)
● Processing: Labeled Dataset; Features
Big Data? Scalability is key
One year of Android history for South Africa:
● 5B user events
● 800M search impressions
● 40M individual searches
Step 2: Building Dataset
The search for the "gold" standard dataset
Gold looks like this for us...
query_id | query features | item position | item_id | item features | label (relevance judgement)
1 | ... | 1 | item1 | ... | 0
1 | ... | 2 | item2 | ... | 3
1 | ... | 3 | item3 | ... | 1
1 | ... | 4 | item4 | ... | 2
Let's "spark" it off
We used Spark (EMR) to build the dataset from user browsing data.
Hydra (trackings) → Labeling (apply funnel) → Labeled searches (funnel)
Proxy label: the "funnel"
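The slides do not spell out the funnel, so the following is only a minimal sketch of how a click/reply proxy label could be derived. The label values, the `Impression` record and the helper names are all assumptions made for illustration, not the team's actual rules.

```python
from dataclasses import dataclass

# Hypothetical funnel: the deepest interaction reached on an impression
# decides its label. These values are assumptions, not taken from the talk.
LABEL_IMPRESSION_ONLY = 0
LABEL_CLICK = 1
LABEL_REPLY = 2


@dataclass
class Impression:
    """One item shown for one search (hypothetical record layout)."""
    query_id: int
    item_id: str
    position: int
    clicked: bool = False
    replied: bool = False


def funnel_label(imp: Impression) -> int:
    """Map a single search impression to a graded relevance label."""
    if imp.replied:
        return LABEL_REPLY
    if imp.clicked:
        return LABEL_CLICK
    return LABEL_IMPRESSION_ONLY


def label_search(impressions):
    """Return (query_id, position, item_id, label) rows sorted by position."""
    rows = [(i.query_id, i.position, i.item_id, funnel_label(i)) for i in impressions]
    return sorted(rows, key=lambda row: row[1])


search = [
    Impression(1, "item1", 1),
    Impression(1, "item2", 2, clicked=True, replied=True),
    Impression(1, "item3", 3, clicked=True),
]
labeled = label_search(search)
```

Each row of `labeled` has the same shape as a row of the "gold" dataset, minus the query and item feature columns.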
Step 3: Analysing Dataset
● Train only on searches with at least one reply (to improve label quality)
● Include only searches with more than 3 (or 4) impressions, since user behaviour is affected by smaller result sets
● Within each search, consider impressions only up to a cutoff position (30th, 50th or 60th)
● Use a metric that gives more importance to top positions (NDCG with customized decay)
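The filters and the metric above can be sketched as follows. This is an illustrative sketch only: the exact thresholds, the reply label value and the shape of the "customized decay" are not given in the slides, so `min_impressions`, `reply_label`, `cutoff` and the `steep` discount are assumptions.

```python
import math


def keep_search(labels, min_impressions=4, reply_label=2):
    """Keep a search only if it has enough impressions and at least one reply."""
    return len(labels) >= min_impressions and reply_label in labels


def dcg(labels, discount, cutoff):
    """Discounted cumulative gain over the first `cutoff` positions."""
    return sum((2 ** rel - 1) * discount(pos)
               for pos, rel in enumerate(labels[:cutoff], start=1))


def ndcg(labels, discount=lambda pos: 1.0 / math.log2(pos + 1), cutoff=30):
    """NDCG with a pluggable position discount (the "decay")."""
    ideal = dcg(sorted(labels, reverse=True), discount, cutoff)
    return dcg(labels, discount, cutoff) / ideal if ideal > 0 else 0.0


def steep(pos):
    """Hypothetical steeper decay that rewards top positions more strongly."""
    return 1.0 / pos


labels_in_shown_order = [0, 1, 2, 0]   # labels by displayed position
standard_score = ndcg(labels_in_shown_order)
steep_score = ndcg(labels_in_shown_order, discount=steep)
```

With the steeper decay, the same ranking scores lower because the replied item sits in third position; that is the kind of sensitivity a customized decay is meant to provide.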
Step 4: Building Model
Start with a simple approach
For queries Q1 ... Qn, each with candidate documents Di,1 ... Di,m, there are three families of LTR models:
● Pointwise (baseline): $f(Q_i, D_{i,j}) = \text{score}$
  Example: (Q1,D1) → 0.85, (Q1,D3) → 0.65, (Q1,D2) → 0.30
● Pairwise: $f(Q_i, D_{i,j} > D_{i,k}) = \text{score} \in \{0, 1\}$
  Example: (Q1, D1>D2) → 1, (Q1, D2>D3) → 0, (Q1, D3>D4) → 1
● Listwise: $f(Q_i, \{D_{i,1}, \dots, D_{i,m}\}) = \text{ranked } \{D_{i,1}, \dots, D_{i,m}\}$
  Example: (Q1; D1, D2, D3) → D1, D3, D2
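To make the difference between the pointwise and pairwise views concrete, here is a small sketch. The scores reuse the pointwise example from the slide; the graded labels and both helper functions are hypothetical.

```python
from itertools import combinations


def pointwise_rank(scores):
    """Pointwise view: score each document alone, then sort by score."""
    return sorted(scores, key=scores.get, reverse=True)


def pairwise_examples(labels):
    """Pairwise view: build ((doc_a, doc_b), target) training pairs.

    target is 1 if doc_a is more relevant than doc_b, else 0.
    Pairs with equal labels are skipped because they carry no preference.
    """
    pairs = []
    for a, b in combinations(labels, 2):
        if labels[a] != labels[b]:
            pairs.append(((a, b), 1 if labels[a] > labels[b] else 0))
    return pairs


scores = {"D1": 0.85, "D2": 0.30, "D3": 0.65}   # pointwise scores from the slide
ranking = pointwise_rank(scores)

graded_labels = {"D1": 2, "D2": 0, "D3": 1}     # hypothetical relevance labels
pairs = pairwise_examples(graded_labels)
```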
McRank: from classification to ranking
● Pointwise approach
● Train a classifier to predict the relevance judgement $k \in \{0, 1, 2\}$
● Use the class probabilities $P(Y=k)$
Ranking score $= \sum_{k} P(Y=k)\, T(k)$, with $T(k) = k$
Inspired by:
https://papers.nips.cc/paper/3270-mcrank-learning-to-rank-using-multiple-classification-and-gradient-boosting.pdf
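The scoring rule can be sketched in a few lines. The class probabilities below are made up for illustration; in practice they would come from the trained classifier, which is not shown here.

```python
def expected_relevance(class_probs, T=lambda k: k):
    """Ranking score = sum over k of P(Y=k) * T(k); T(k) = k by default."""
    return sum(p * T(k) for k, p in enumerate(class_probs))


def rank_items(probs_by_item):
    """Order item ids by descending expected relevance."""
    return sorted(probs_by_item,
                  key=lambda item: expected_relevance(probs_by_item[item]),
                  reverse=True)


# Hypothetical classifier outputs: [P(Y=0), P(Y=1), P(Y=2)] per item.
probs = {
    "item1": [0.7, 0.2, 0.1],
    "item2": [0.1, 0.3, 0.6],
    "item3": [0.3, 0.5, 0.2],
}
ordered = rank_items(probs)
```

Turning the class distribution into a single expected value is what lets a plain multi-class classifier be used as a ranker.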
Three Classes of Features
Combined Model
● Classes: item features, buyer features, seller features
● Types: static features, interaction features (browsing)
Static Item/query features
Raw attributes:
● Search String
● Search Location
● Search Time
● Ad Title
● Ad Description
● Ad Location
● Ad Creation Time
● Ad Price
● Ad Private or Business
● Ad Image Count
● Ad Category
Features:
● Textual Similarity (BM25)
● Length of the Title
● Length of the Description
● Freshness
● Proximity
● Price
● Is the Seller Private or a Business
● Image Count
● Category
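As a rough sketch of how a few raw attributes could turn into static features: the field names, the units (hours, kilometres) and the use of the haversine distance for proximity are assumptions, and BM25 is left out because it needs corpus statistics.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def static_features(search, ad):
    """Derive static features from raw search and ad attributes (hypothetical schema)."""
    age = search["time"] - ad["created"]
    return {
        "title_length": len(ad["title"]),
        "description_length": len(ad["description"]),
        "freshness_hours": age.total_seconds() / 3600.0,
        "proximity_km": haversine_km(*search["location"], *ad["location"]),
        "price": ad["price"],
        "is_business": int(ad["is_business"]),
        "image_count": ad["image_count"],
    }


search = {"time": datetime(2018, 10, 3, 12, 0), "location": (-33.9249, 18.4241)}
ad = {
    "title": "Toyota Corolla 2012",
    "description": "Well maintained, one owner.",
    "created": datetime(2018, 10, 1, 12, 0),
    "location": (-26.2041, 28.0473),
    "price": 95000,
    "is_business": False,
    "image_count": 5,
}
feats = static_features(search, ad)
```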
Item Interaction Features - Example
Data source: ods.fact_listing_activity
● Interactions: impressions, ad views (clicks), replies
● Time intervals: 30 days, 7 days, last day
Resulting features:
num_impressions_30days
num_adviews_30days
num_replies_30days
num_impressions_7days
num_adviews_7days
num_replies_7days
num_impressions_lastday
num_adviews_lastday
num_replies_lastday
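The nine features are the cross product of three interaction types and three time windows. A minimal in-memory sketch follows; the event tuple format and the convention that every window ends the day before `today` are assumptions, and the real pipeline presumably aggregates from ods.fact_listing_activity instead.

```python
from datetime import date, timedelta

WINDOWS = {"30days": 30, "7days": 7, "lastday": 1}
EVENTS = {"impression": "impressions", "adview": "adviews", "reply": "replies"}


def interaction_features(events, today):
    """Count events per type and window for one item.

    events: iterable of (event_date, event_type) tuples (hypothetical format).
    Windows are assumed to end the day before `today`.
    """
    feats = {f"num_{name}_{window}": 0
             for name in EVENTS.values() for window in WINDOWS}
    for day, kind in events:
        age_days = (today - day).days   # 1 means yesterday
        for window, length in WINDOWS.items():
            if 1 <= age_days <= length:
                feats[f"num_{EVENTS[kind]}_{window}"] += 1
    return feats


today = date(2018, 10, 1)
events = [
    (today - timedelta(days=1), "impression"),
    (today - timedelta(days=1), "reply"),
    (today - timedelta(days=5), "adview"),
    (today - timedelta(days=20), "impression"),
    (today - timedelta(days=40), "impression"),   # outside every window
]
feats = interaction_features(events, today)
```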
Step 5: Serving Model
We met "MLeap" on the way
The Service
● Training: AWS Data Pipeline, retraining every 7 days
● Serving: Scala Akka HTTP service with MLeap on OpenShift for prediction
● RAGNAROK AD RERANKER: Ranked Items → ReRanked Items
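The reranking step itself, independent of the serving stack, might look like the sketch below. The actual service is Scala with MLeap; this Python version only illustrates the "ranked items in, reranked items out" contract, and `top_n` and `score_fn` are hypothetical.

```python
def rerank(ranked_items, score_fn, top_n=50):
    """Rerank the first `top_n` items by model score and keep the tail as is.

    ranked_items: item ids in the order returned by retrieval.
    score_fn: callable mapping an item id to a model score (hypothetical).
    """
    head, tail = ranked_items[:top_n], ranked_items[top_n:]
    # sorted() is stable, so items with equal scores keep their retrieval order.
    return sorted(head, key=score_fn, reverse=True) + tail


model_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
reranked = rerank(["a", "b", "c", "d"], model_scores.get, top_n=3)
```

Limiting the rerank to the head of the list keeps the per-request scoring cost bounded, which matters for an online endpoint.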
Chapter 4: Results
Does this work? Offline? Online?
Preliminary results
Feature | Weight
Proximity | 13
bm25 | 8.7
Freshness | 4.6
Price | 0
Title Length | -4.3
Description Length | -7.3
Offline: +14% nDCG
Online: +8% Replies/DAU
Final results
Feature | Weight
Item Replies Received - 30 days | 21.6
Preference for Cars - 30 days | 15
Proximity | 9.1
bm25 | 8.2
Preference for Car Parts - 30 days | 8.2
Freshness | 5.7
Feature groups (legend): Item Performance, Buyer preference, Basic features
Offline: +71% nDCG
Online: coming soon...
The end. Thank you!
Any questions?
