September 23, 2016
Query Understanding
In Amazon Search
Tanvi Motwani
Data Scientist
Amazon Search
PRODUCT SEARCH
MOTIVATION
[Figure: a fashion product query annotated with Brand, Product Type, Price under, Gender, Is Prime]
CHALLENGES
• Power Law Distribution of Queries: large population of long-tail queries
• Speedy Search Response: fast models that respond in milliseconds
• Dynamic Search Trends: adaptive to new trends
• Global Search Reach: deal with 10 different languages
QUERY TAGGERS / QUERY CLASSIFIERS
"label": "_color", "id": C232, "name": "Black"
"label": "_brand", "id": B1402, "name": "The North Face"
"label": "_product", "id": P232, "name": "Jacket"
"category": "Fashion > Clothing > Jackets & Coats"
"class": "product_query", "score": 0.9
"name": "query_specificity", "score": 0.7
"class": "fashion", "score": 0.8
QUERY CLASSIFIERS
QUERY CATEGORY CLASSIFIER
“A multiclass classifier that classifies the input user query into Amazon categories.”
QUERY CATEGORY CLASSIFIER
• Automatically generates a large training dataset
• Frequent refresh of training data is possible
• Trigram language model generalizes well for tail queries
A large percentage of query searches happen within a category.
Example queries: tv, ipod, projector, speakers, headphones, pillow, curtains, pet bells, mattress, shower curtain, suits, mr robot, star trek, downton abbey, game of thrones
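As a rough illustration of how a trigram language model can classify queries into categories, the sketch below trains one character-trigram model per category with add-one smoothing and picks the category whose model gives the query the highest log-likelihood. The slides do not say whether the trigrams are over characters or words, nor how the training labels are obtained; the character-level choice, the category names and the tiny training set are assumptions made for this example only.

```python
import math
from collections import Counter, defaultdict


def char_trigrams(text):
    """Character trigrams of a query, padded with start (^) and end ($) markers."""
    padded = "^^" + text.lower() + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramCategoryClassifier:
    """One add-one-smoothed character-trigram language model per category."""

    def __init__(self):
        self.trigram_counts = defaultdict(Counter)  # category -> trigram -> count
        self.context_counts = defaultdict(Counter)  # category -> 2-char context -> count
        self.alphabet = set()                       # characters seen as third symbol

    def fit(self, labelled_queries):
        for query, category in labelled_queries:
            for gram in char_trigrams(query):
                self.trigram_counts[category][gram] += 1
                self.context_counts[category][gram[:2]] += 1
                self.alphabet.add(gram[2])
        return self

    def log_likelihood(self, query, category):
        v = len(self.alphabet) + 1  # +1 reserves probability mass for unseen characters
        tri = self.trigram_counts[category]
        ctx = self.context_counts[category]
        # Sum of log P(third char | previous two chars) over the query.
        return sum(math.log((tri[g] + 1) / (ctx[g[:2]] + v))
                   for g in char_trigrams(query))

    def predict(self, query):
        return max(self.trigram_counts,
                   key=lambda c: self.log_likelihood(query, c))


# Hypothetical labelled queries; the category names are invented for illustration.
TRAINING = [
    ("tv", "electronics"), ("ipod", "electronics"), ("projector", "electronics"),
    ("speakers", "electronics"), ("headphones", "electronics"),
    ("pillow", "home"), ("curtains", "home"), ("mattress", "home"),
    ("shower curtain", "home"),
]

model = TrigramCategoryClassifier().fit(TRAINING)
```

Because the model scores sub-word pieces, an unseen query such as "curtain" still shares most of its trigrams with training queries, which is one plausible reading of why such a model copes with tail queries.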
CUSTOMER SERVICE QUERY CLASSIFIER
“Classifies a query as either a customer service query or a product query.”
contact amazon
amazon phone number
how do I cancel my order?
where is my order history?
where is my order?
how can I see videos?
amazon prime video help
QUERY TAGGERS
QUERY TAGGING: FILTERING
[Figure: filters Brand, Product Type, Price under, Gender, Is Prime]
QUERY TAGGING
IMPROVED UI
QUERY TAGGING (BRAND)
What? Identify the BRAND span in each query:
• adidas shoes
• jansport backpack
• ray ban sunglasses
• ralph lauren men
How? TRAIN a Conditional Random Field on queries labelled with B (begin), I (inside) and O (outside) tags:
• adidas shoes → B O
• jansport backpack → B O
• ray ban sunglasses → B I O
• ralph lauren men → B I O
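To make the B/I/O labelling concrete, here is a minimal sketch of how a predicted tag sequence can be turned back into brand spans. The function name `brand_spans` is hypothetical and not taken from the slides; it only illustrates the tagging scheme.

```python
def brand_spans(tokens, tags):
    """Collect spans that start with a B tag and continue through I tags."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:                 # close a span that was still open
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:                           # "O", or a stray "I" with no preceding "B"
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

For example, `brand_spans(["ray", "ban", "sunglasses"], ["B", "I", "O"])` yields the single span "ray ban".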
CONDITIONAL RANDOM FIELD
• Discriminative model: models the conditional probability P(Y|X). We do not care to model P(X)
• Features: word capitalized, word in atlas or name list, previous word is “Mrs”, next word is “Times”, …
Recommended tutorial on CRF: An Introduction to Conditional Random Fields for Relational Learning
https://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
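The kind of per-token features listed on this slide can be sketched for product queries as follows. The feature names, the `brand_lexicon` argument and the boundary markers are assumptions for illustration; the slides do not list the features actually used in the tagger.

```python
def token_features(tokens, i, brand_lexicon=frozenset()):
    """Features describing the token at position i and its neighbours."""
    word = tokens[i]
    return {
        "word": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "in_lexicon": word.lower() in brand_lexicon,   # analogous to "word in name list"
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


def query_features(query, brand_lexicon=frozenset()):
    """One feature dict per token of the query."""
    tokens = query.split()
    return [token_features(tokens, i, brand_lexicon) for i in range(len(tokens))]
```

A list of feature dicts per sequence is a shape that several CRF toolkits accept as input, which is why the sketch returns one.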
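QUERY LOGS and PRODUCT CATALOGUE
[Figure: example queries from the query logs (adidas shoes, jansport backpack, ray ban sunglasses, north face black jacket, polo ralph lauren men), the catalogue entries "white shoes" and "ralph lauren", and the behavioral signals click, add, purchase. The next slide singles out the query "north face black jacket".]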
$$\arg\max_{b} \sum_{i \in P(b)} f(c_i, a_i, p_i)$$
where b is a brand and P(b) is the set of products corresponding to brand b.
Example: north face black jacket (BRAND scores shown on the slide: 0.8 and 0.2)
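The arg max above can be sketched in a few lines. The slides do not define f or its arguments; here c, a and p are assumed to be the click, add and purchase counts from the preceding slide, f is a hypothetical weighted sum, and the normalization into a share is an added assumption. The brand "Columbia" and all counts are made up.

```python
from collections import defaultdict


def f(clicks, adds, purchases, weights=(1.0, 2.0, 4.0)):
    """Hypothetical engagement score; the slides do not define f."""
    w_c, w_a, w_p = weights
    return w_c * clicks + w_a * adds + w_p * purchases


def best_brand(product_signals):
    """product_signals: non-empty iterable of (brand, clicks, adds, purchases).

    Returns the brand maximizing the sum of f over its products, together
    with that brand's share of the total score.
    """
    totals = defaultdict(float)
    for brand, clicks, adds, purchases in product_signals:
        totals[brand] += f(clicks, adds, purchases)
    grand_total = sum(totals.values())
    brand = max(totals, key=totals.get)
    share = totals[brand] / grand_total if grand_total else 0.0
    return brand, share


# Made-up signals for products reached from the query "north face black jacket".
signals = [
    ("The North Face", 30, 5, 2),   # f = 30 + 10 + 8 = 48
    ("The North Face", 10, 1, 0),   # f = 10 + 2 + 0 = 12
    ("Columbia", 9, 3, 0),          # f = 9 + 6 + 0 = 15
]
```

With these invented counts, "The North Face" collects 60 of the 75 total points, so it is selected as the brand label for the query.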
Matching Strategies:
• Attribute completely contained in query
• Match after removing stop words, prepositions etc.
• Partial query-attribute match
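The three matching strategies can be tried from strictest to loosest, as in the sketch below. The stop-word list, the function names and the returned labels are assumptions for illustration; the slides do not specify how each strategy is implemented or ordered.

```python
STOP_WORDS = frozenset({"the", "by", "for", "of", "and", "in", "with"})  # illustrative list


def contains_sequence(haystack, needle):
    """True if needle occurs as a contiguous token sequence inside haystack."""
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle
                         for i in range(len(haystack) - n + 1))


def match_attribute(query, attribute):
    """Return the first strategy under which the attribute matches the query, or None."""
    q, a = query.lower().split(), attribute.lower().split()
    if contains_sequence(q, a):
        return "full"                       # attribute completely contained in query
    q_content = [t for t in q if t not in STOP_WORDS]
    a_content = [t for t in a if t not in STOP_WORDS]
    if contains_sequence(q_content, a_content):
        return "stop_words_removed"         # match after removing stop words
    if set(q_content) & set(a_content):
        return "partial"                    # some tokens shared
    return None
```

For the query "north face black jacket", the catalogue attribute "The North Face" only matches once the stop word "the" is dropped, which is the second strategy.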
QUERY TAGGING - SUMMARY
TRAIN: Query Logs + Product Catalogue → Training Data Generator → Conditional Random Field
DEPLOY: Conditional Random Field + Manual Overrides → Search Engine
• Context aware
“Philosophy books” v/s “Philosophy face wash”
• Different formulations of same entity
“Marc by Marc Jacobs” v/s “Marc Jacobs”
Query Understanding Team
• Palo Alto, California
• Munich, Germany
• Tokyo, Japan
• Beijing, China
Acknowledgements
Mukund Seshadri, Tracy King, Will Headden, Louka Dlagnekov, Tianyu Cao, Rahul Goutam, Huascar Fiorletta,
Alexander Zeyliger, Smruthi Mukund, Konstantin Stulov, Himanshu Gahlot, Yosi Shturm, Taro Kawagishi, Ravi
Jammalamadaka, Anand Lakshminath, Hernan, Greg Miller, Heran, Nick Trown
ACCESSORY QUERY CLASSIFIER
Base Product DB → Base Query Corpus: macbook pro, macs, apple laptop, laptop
Accessory Product DB → Accessory Query Corpus: mac ram, apple sleeve, laptop cover, apple skin
Base Query Corpus + Accessory Query Corpus → Binary Classifier (Class A / Class B) → Search Engine
ACCESSORY QUERY CLASSIFIER
Base Product DB → Base Query Corpus: paperwhite, kindle, amazon tablet, book reader
Accessory Product DB → Accessory Query Corpus: Kindle case, Kindle cover, Amazon cover, case for kindle
Base Query Corpus + Accessory Query Corpus → Binary Classifier (Class A / Class B) → Search Engine
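The slides do not name the binary classifier, so the sketch below swaps in a multinomial Naive Bayes over query words purely as an illustration of training on a base query corpus versus an accessory query corpus. The labels "base" and "accessory" stand in for Class A and Class B, and equal class priors are assumed.

```python
import math
from collections import Counter

# Example queries taken from the two slides above.
BASE_QUERIES = ["macbook pro", "macs", "apple laptop", "laptop",
                "paperwhite", "kindle", "amazon tablet", "book reader"]
ACCESSORY_QUERIES = ["mac ram", "apple sleeve", "laptop cover", "apple skin",
                     "kindle case", "kindle cover", "amazon cover", "case for kindle"]


def train(corpora):
    """corpora: dict label -> list of queries. Returns per-label word counts."""
    return {label: Counter(w for q in queries for w in q.lower().split())
            for label, queries in corpora.items()}


def classify(query, counts):
    """Naive Bayes with add-one smoothing and equal class priors (an assumption)."""
    vocab = set().union(*counts.values())

    def score(label):
        total = sum(counts[label].values())
        return sum(math.log((counts[label][w] + 1) / (total + len(vocab)))
                   for w in query.lower().split())

    return max(counts, key=score)


model = train({"base": BASE_QUERIES, "accessory": ACCESSORY_QUERIES})
```

On this toy data, `classify("kindle cover", model)` is routed to the accessory class while `classify("macbook pro", model)` stays in the base class; a production system would need far more data and likely richer features.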
QUERY TAGGING - SUMMARY
TRAIN: Query Logs + Product Catalogue → Training Data Generator → Conditional Random Field
DEPLOY: Conditional Random Field + Manual Overrides → Search Engine
Validation Techniques:
• Offline validation
  ▪ Cross validation 80/20 split
  ▪ Manual Gold Standard evaluation
• A/B test
  ▪ Control: before the model was deployed
  ▪ Treatment: after the model is deployed
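A single 80/20 split for offline validation can be sketched as below. The function name and the fixed seed are assumptions; the slides do not say how the split is drawn or whether it is repeated over several folds.

```python
import random


def split_80_20(examples, seed=0):
    """Shuffle a copy of the examples and return (train, held_out) at an 80/20 ratio."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducible splits
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]
```

The tagger would be trained on the first part and scored on the second, keeping the manually labelled gold standard as a separate check.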
COMPARISON TO DICTIONARY LOOKUP METHODS
• Dictionary methods are not context aware
  ▪ Example: for “philosophy books”, a dictionary method will tag “philosophy” as a brand.
• Dictionary methods fail to detect different formulations of the same entity
  ▪ Example: “mk” vs. “michael kors”
Our system improved precision over the baseline by 10% and approximately doubled the recall.
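Precision and recall for brand tagging can be computed against a gold standard as in this sketch. The span-level exact-match definition, the function name and the example queries are assumptions for illustration, not the evaluation protocol used in the talk.

```python
def precision_recall(predicted, gold):
    """predicted, gold: dict query -> brand span, or None when no brand is tagged."""
    true_pos = sum(1 for q, span in predicted.items()
                   if span is not None and gold.get(q) == span)
    n_predicted = sum(1 for span in predicted.values() if span is not None)
    n_gold = sum(1 for span in gold.values() if span is not None)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Made-up example mirroring the two failure modes of a dictionary lookup.
gold = {"philosophy books": None, "mk bag": "mk", "michael kors bag": "michael kors"}
dictionary_output = {"philosophy books": "philosophy",   # false positive: not context aware
                     "mk bag": None,                      # miss: unknown formulation
                     "michael kors bag": "michael kors"}
```

On this toy data the dictionary output has one correct span out of two predicted and two expected, so both precision and recall come out at 0.5.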
GLOBAL REACH
GENERATIVE v/s DISCRIMINATIVE MODELS
Generative: $P(Y, X)$
Discriminative: $P(Y \mid X)$
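The two quantities are linked by the definition of conditional probability, written out here as a reminder rather than as something stated on the slide:

```latex
P(Y \mid X) = \frac{P(Y, X)}{P(X)} = \frac{P(Y, X)}{\sum_{y'} P(y', X)}
```

A generative model estimates the joint distribution and obtains the conditional through this ratio, while a discriminative model such as a CRF parameterizes the left-hand side directly and never models P(X).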

Tanvi Motwani, Lead Data Scientist, Guided Search at A9.com at MLconf ATL 2016

  • 1. September 23, 2016 Query Understanding In Amazon Search Tanvi Motwani Data Scientist Amazon Search
  • 4. Power Law Distribution of Queries Large population of long tail queries Speedy Search Response Fast Models that respond in milliseconds Dynamic Search Trends Adaptive to new trends Global Search Reach Deal with 10 different languages CHALLENGES 4
  • 5. “label”: “_color”, “id” : C232, “name” : “Black” “label”: “_brand”, “id” : B1402, “name” : “The North Face” “label” : “_product”, “id” : P232, “name” : “Jacket” “category” : “Fashion > Clothing > Jackets & Coats” “class”: “product_query”, “score”: 0.9 “name”: “query_specificity”, “score”: 0.7 “class”: “fashion”, “score”: 0.8 QUERY TAGGERS QUERY CLASSIFIERS 5
  • 7. QUERY CATEGORY CLASSIFIER: “A Multiclass Classifier which classifies input user query into Amazon Categories.”
  • 8. QUERY CATEGORY CLASSIFIER (Trigram Language Model): A large percentage of query searches happen within a category. • Automatically generates a large training dataset • Frequent refresh of training data is possible • Trigram model generalizes well for tail queries. Example queries: tv, ipod, projector, speakers, headphones, pillow, curtains, pet bells, mattress, shower curtain, suits, mr robot, star trek, downton abbey, game of thrones
  • 9. CUSTOMER SERVICE QUERY CLASSIFIER: “Classifies query into customer service queries versus product query.” Example queries: contact amazon; amazon phone number; how do I cancel my order?; where is my order history?; where is my order?; how can I see videos?; amazon prime video help
  • 11. QUERY TAGGING, FILTERING: Brand, Product Type, Price under, Gender, Is Prime
  • 13. QUERY TAGGING (BRAND). What? Mark the BRAND span in each query: [adidas] shoes; [jansport] backpack; [ray ban] sunglasses; [ralph lauren] men. How? TRAIN a Conditional Random Field on B/I/O token labels: adidas/B shoes/O; jansport/B backpack/O; ray/B ban/I sunglasses/O; ralph/B lauren/I men/O
  • 14. CONDITIONAL RANDOM FIELD • Discriminative model: models the conditional probability P(Y|X); we do not care to model P(X) • Features: word capitalized, word in atlas or name list, previous word is “Mrs”, next word is “Times”, … Recommended tutorial on CRFs: An Introduction to Conditional Random Fields for Relational Learning, https://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
  • 15. QUERY LOGS (click, add, purchase): adidas shoes; jansport backpack; ray ban sunglasses; north face black jacket; polo ralph lauren men; white shoes ralph lauren. PRODUCT CATALOGUE
  • 16. QUERY LOGS and PRODUCT CATALOGUE, example query: north face black jacket
  • 17. $\arg\max_{b} \sum_{i \in P(b)} f(c_i, a_i, p_i)$, where $b$ is a brand and $P(b)$ is the set of products corresponding to brand $b$. Example: north face black jacket (candidate BRAND scores 0.8 and 0.2). Matching strategies: • Attribute completely contained in query • Match after removing stop words, prepositions etc. • Partial query-attribute match
  • 18. QUERY TAGGING - SUMMARY: Query Logs and Product Catalogue feed the Training Data Generator, which is used to TRAIN the Conditional Random Field; the model is then DEPLOYed to the Search Engine, with Manual Overrides. • Context aware: “Philosophy books” v/s “Philosophy face wash” • Different formulations of same entity: “Marc by Marc Jacobs” v/s “Marc Jacobs”
  • 19. Query Understanding Team • Palo Alto, California • Munich, Germany • Tokyo, Japan • Beijing, China Acknowledgements Mukund Seshadri, Tracy King, Will Headden, Louka Dlagnekov, Tianyu Cao, Rahul Goutam, Huascar Fiorletta, Alexander Zeyliger, Smruthi Mukund, Konstantin Stulov, Himanshu Gahlot, Yosi Shturm, Taro Kawagishi, Ravi Jammalamadaka, Anand Lakshminath, Hernan, Greg Miller, Heran, Nick Trown
  • 21. ACCESSORY QUERY CLASSIFIER: Base Product DB gives the Base Query Corpus (macbook pro, macs, apple laptop, laptop); Accessory Product DB gives the Accessory Query Corpus (mac ram, apple sleeve, laptop cover, apple skin); both train a Binary Classifier (Class A, Class B) used by the Search Engine
  • 23. ACCESSORY QUERY CLASSIFIER: Base Product DB gives the Base Query Corpus (paperwhite, kindle, amazon tablet, book reader); Accessory Product DB gives the Accessory Query Corpus (Kindle case, Kindle cover, Amazon cover, case for kindle); both train a Binary Classifier (Class A, Class B) used by the Search Engine
  • 24. QUERY TAGGING - SUMMARY: Query Logs and Product Catalogue, Training Data Generator, TRAIN, Conditional Random Field, DEPLOY, Search Engine, Manual Overrides. Validation Techniques: • Offline validation: cross validation with an 80/20 split; manual gold standard evaluation • A/B test: Control is before the model was deployed; Treatment is after the model is deployed
  • 25. COMPARISON TO DICTIONARY LOOKUP METHODS • Dictionary methods are not context aware. Example: for “philosophy books”, a dictionary method will tag “philosophy” as a brand. • Dictionary methods fail to detect different formulations of the same entity. Example: “mk” vs. “michael kors”. Our system improved precision over the baseline by 10% and approximately doubled the recall.
  • 27. GENERATIVE v/s DISCRIMINATIVE MODELS: $P(Y, X)$ v/s $P(Y \mid X)$
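
The B/I/O labels on slide 13 and the first matching strategy on slide 17 ("attribute completely contained in query") can be illustrated with a small sketch. This is not the speaker's implementation: the function name `bio_tags_for_brand` and the whitespace tokenization are assumptions made for illustration, and the other two matching strategies (stop-word removal, partial match) are deliberately left out.

```python
from typing import List, Optional


def bio_tags_for_brand(query: str, brand: str) -> Optional[List[str]]:
    """Return one B/I/O tag per query token, or None if the brand is absent.

    Hypothetical helper: only the "attribute completely contained in
    query" strategy is covered, using lowercase whitespace tokens.
    """
    tokens = query.lower().split()
    brand_tokens = brand.lower().split()
    n = len(brand_tokens)
    if n == 0:
        return None
    # Slide a window over the query and look for the full brand phrase.
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == brand_tokens:
            tags = ["O"] * len(tokens)
            tags[start] = "B"                  # first token of the brand
            for i in range(start + 1, start + n):
                tags[i] = "I"                  # remaining brand tokens
            return tags
    return None


if __name__ == "__main__":
    print(bio_tags_for_brand("ralph lauren men", "Ralph Lauren"))
    print(bio_tags_for_brand("north face black jacket", "The North Face"))
```

The second call returns `None` because the catalogue brand "The North Face" is not literally contained in the query, which is the kind of case the slide's "match after removing stop words" strategy is meant to handle.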

Editor's notes

  1. 00:30 Hi, I am Tanvi Motwani from the Query Understanding team at A9, and today we will look into how we make this happen.
  2. 01:00 Typical product search page. The user types in a query, which is free text. The search box is the most frequent method a customer uses to find products at Amazon. Let's zoom into the first result here. The query then hits the QU module, which analyzes what the user has typed and computes query features. These features then go to the ranking module, which finds the most “relevant” products for our customers. Today we are going to look into how this QU module functions.
  3. 03:00 What we see on the product page is detailed information about a product. As in web search, we have unstructured text, e.g. the product description. Along with this we have lots of structured information like …. We also have Add to cart, wish list, buy buttons and more that provide user behavioral data. We need to make use of this structured information to help us provide precise results. The challenge is that the query the user types in is unstructured text. Extracting structured information from the query and matching it with the appropriate product fields enables us to surface relevant products for the user. We label parts of the query like… we call these query annotations. We also perform query classification, which is at the level of the whole query, e.g. the category class for this query is “Clothing”. High-level motivation.
  4. 5:30
  5. 8:00 Add query taggers. Gray out black text.
  6. 9:00 Understanding the category enables new features; for example, we can ask you a specific question. Screenshots: zoom in.
  7. 11:00 This is a standard NLP approach. Benefits on a separate slide.
  8. 12:00 Change this slide with training data picture
  9. 05:30 See if you can show Nike on left nav
  10. Split into slide
  11. Kate spade new york example
  12. 06:30 1. The user types “macbook pro”. 2. If we use query words as features, we get products whose text contains “macbook pro”, like the 2nd and 4th results. 3. If we tag “macbook pro” as a product type and use that tag as a feature, the rank function will rank products that have the same product type higher, so we see actual MacBook Pros bubbling up.
  14. 6:00
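
The “macbook pro” note above can be illustrated with a toy ranking function. This is only a sketch under stated assumptions: the `score` and `rank` functions, the two-item `CATALOGUE`, the `product_type` values, and the fixed bonus of 1.0 are all hypothetical and are not taken from the talk or from Amazon's ranker.

```python
from typing import Dict, List, Optional

# Hypothetical two-product catalogue; titles and types are made up.
CATALOGUE: List[Dict[str, str]] = [
    {"title": "Sleeve for MacBook Pro 13 inch", "product_type": "laptop_sleeve"},
    {"title": "Apple MacBook Pro 13 inch laptop", "product_type": "laptop"},
]


def score(query: str, product: Dict[str, str],
          tagged_type: Optional[str] = None) -> float:
    """Word-overlap score plus a bonus when the tagged product type matches."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    title_words = set(product["title"].lower().split())
    overlap = len(query_words & title_words) / len(query_words)
    # The tag acts as an extra feature: reward products of the same type.
    matches_tag = tagged_type is not None and product["product_type"] == tagged_type
    return overlap + (1.0 if matches_tag else 0.0)


def rank(query: str, products: List[Dict[str, str]],
         tagged_type: Optional[str] = None) -> List[Dict[str, str]]:
    """Sort products by descending score; ties keep catalogue order."""
    return sorted(products, key=lambda p: score(query, p, tagged_type), reverse=True)


if __name__ == "__main__":
    print(rank("macbook pro", CATALOGUE)[0]["title"])
    print(rank("macbook pro", CATALOGUE, tagged_type="laptop")[0]["title"])
```

With query words alone, the sleeve and the laptop tie because both titles contain “macbook” and “pro”; once the query is tagged with a product type, the actual laptop is ranked first.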