Tanvi Motwani, Lead Data Scientist, Guided Search at A9.com at MLconf ATL 2016

September 23, 2016
Query Understanding in Amazon Search
Tanvi Motwani
Data Scientist
Amazon Search

MOTIVATION
[Figure: a fashion product query annotated with the attributes Brand, Product Type, Price under, Gender, and Is Prime]

CHALLENGES
• Power-law distribution of queries: a large population of long-tail queries
• Speedy search response: fast models that respond in milliseconds
• Dynamic search trends: models must adapt to new trends
• Global search reach: must deal with 10 different languages

QUERY TAGGERS
“label”: “_color”, “id”: C232, “name”: “Black”
“label”: “_brand”, “id”: B1402, “name”: “The North Face”
“label”: “_product”, “id”: P232, “name”: “Jacket”
QUERY CLASSIFIERS
“category”: “Fashion > Clothing > Jackets & Coats”
“class”: “product_query”, “score”: 0.9
“name”: “query_specificity”, “score”: 0.7
“class”: “fashion”, “score”: 0.8

QUERY CATEGORY CLASSIFIER
“A multiclass classifier that classifies the input user query into Amazon categories.”

QUERY CATEGORY CLASSIFIER
• Automatically generates a large training dataset
• Frequent refresh of training data is possible
• Trigram language model generalizes well for tail queries
• A large percentage of query searches happen within a category
Example queries, grouped as on the slide:
• tv, ipod, projector, speakers, headphones
• pillow, curtains, pet bells, mattress, shower curtain
• suits, mr robot, star trek, downton abbey, game of thrones
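The slide names a trigram language model but gives no detail, so the following is only a minimal sketch of one way such a classifier could work. It assumes character trigrams, one add-one-smoothed model per category, and hypothetical category names ("electronics", "home", "video") attached to the example queries; none of these choices is confirmed by the talk.

```python
import math
from collections import Counter, defaultdict


def char_trigrams(query):
    """Return the character trigrams of a query, padded at both ends."""
    padded = "^^" + query.lower() + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramCategoryClassifier:
    """One add-one-smoothed character trigram model per category (illustrative)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # category -> trigram counts
        self.totals = Counter()             # category -> number of trigrams seen
        self.vocab = set()                  # all trigrams seen in training

    def fit(self, labelled_queries):
        for query, category in labelled_queries:
            for gram in char_trigrams(query):
                self.counts[category][gram] += 1
                self.totals[category] += 1
                self.vocab.add(gram)
        return self

    def log_likelihood(self, query, category):
        # The extra +1 in the denominator reserves mass for unseen trigrams.
        v = len(self.vocab) + 1
        return sum(
            math.log((self.counts[category][g] + 1) / (self.totals[category] + v))
            for g in char_trigrams(query)
        )

    def predict(self, query):
        return max(self.totals, key=lambda c: self.log_likelihood(query, c))


# Category names are assumptions; the queries come from the slide.
TRAINING = (
    [(q, "electronics") for q in ["tv", "ipod", "projector", "speakers", "headphones"]]
    + [(q, "home") for q in ["pillow", "curtains", "pet bells", "mattress", "shower curtain"]]
    + [(q, "video") for q in ["suits", "mr robot", "star trek", "downton abbey", "game of thrones"]]
)
clf = TrigramCategoryClassifier().fit(TRAINING)
print(clf.predict("shower curtains"))
```

Because the model scores sub-word trigrams, an unseen tail query such as "shower curtains" can still share most of its trigrams with training queries in one category, which is one plausible reading of "generalizes well for tail queries".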

CUSTOMER SERVICE QUERY CLASSIFIER
“Classifies a query as either a customer service query or a product query.”
contact amazon
amazon phone number
how do I cancel my order?
where is my order history?
where is my order?
how can I see videos?
amazon prime video help

QUERY TAGGING AND FILTERING
Filters: Brand, Product Type, Price under, Gender, Is Prime

QUERY TAGGING (BRAND)
What? Tag the BRAND in each query:
• adidas shoes
• jansport backpack
• ray ban sunglasses
• ralph lauren men
How? Train a Conditional Random Field on queries labelled with B, I and O tags:
• adidas shoes: B O
• jansport backpack: B O
• ray ban sunglasses: B I O
• ralph lauren men: B I O
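The B, I and O labels on this slide can be produced mechanically once the brand span of a query is known. The sketch below assumes whitespace tokenization and that the brand string is given per query; the helper name `bio_labels` is hypothetical.

```python
def bio_labels(query_tokens, brand_tokens):
    """Label the first occurrence of brand_tokens with B/I and every other token with O."""
    labels = ["O"] * len(query_tokens)
    n = len(brand_tokens)
    if n == 0:
        return labels
    for start in range(len(query_tokens) - n + 1):
        if query_tokens[start:start + n] == brand_tokens:
            labels[start] = "B"                      # first brand token
            for k in range(start + 1, start + n):
                labels[k] = "I"                      # remaining brand tokens
            break
    return labels


# Queries from the slide paired with their brand spans.
EXAMPLES = [
    ("adidas shoes", "adidas"),
    ("jansport backpack", "jansport"),
    ("ray ban sunglasses", "ray ban"),
    ("ralph lauren men", "ralph lauren"),
]
for query, brand in EXAMPLES:
    print(query, bio_labels(query.split(), brand.split()))
```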

CONDITIONAL RANDOM FIELD
• Discriminative model: models the conditional probability P(Y|X). We do not need to model P(X).
• Features: word is capitalized, word is in an atlas or name list, previous word is “Mrs”, next word is “Times”, …
Recommended tutorial on CRFs:
An Introduction to Conditional Random Fields for Relational Learning
https://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
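The feature examples on this slide (capitalization, list membership, neighbouring words) can be written as a per-token feature function. This is a sketch only: the feature names and the tiny `NAME_LIST` are hypothetical, and the output is a list of plain dictionaries of the kind that several CRF toolkits accept as input, not the feature set used in the talk.

```python
# Hypothetical stand-in for the "atlas or name list" mentioned on the slide.
NAME_LIST = {"smith", "london", "york"}


def token_features(tokens, i):
    """Features for token i, looking one word to the left and right."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.is_capitalized": word[:1].isupper(),
        "word.in_name_list": word.lower() in NAME_LIST,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


def sentence_features(tokens):
    """One feature dictionary per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


feats = sentence_features(["Mrs", "Smith", "reads", "The", "Times"])
print(feats[1])
```

For "Smith", the features record that the previous word is "mrs" and that the word is in the name list, which is exactly the kind of evidence a CRF weighs when choosing a label sequence.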

QUERY LOGS AND PRODUCT CATALOGUE
Example entries: adidas shoes, jansport backpack, ray ban sunglasses, north face black jacket, polo ralph lauren men, white shoes, ralph lauren
Customer actions: click, add, purchase


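$$\hat{b} = \arg\max_{b} \sum_{i \in P(b)} f(c_i, a_i, p_i)$$
where b is a brand and P(b) is the set of products corresponding to brand b. The arguments c_i, a_i and p_i appear to correspond to the click, add and purchase signals for product i.
Example BRAND scores: 0.8, 0.2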
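The brand selection rule above can be sketched as follows. The slide does not define f, so the weighted sum used here is purely an assumption, as are the product ids, the engagement counts, and the normalization that turns totals into scores such as 0.8 and 0.2.

```python
from collections import defaultdict


def f(clicks, adds, purchases, w_click=1.0, w_add=2.0, w_purchase=3.0):
    """Hypothetical engagement score; the talk does not specify f."""
    return w_click * clicks + w_add * adds + w_purchase * purchases


def brand_scores(engagement, product_brand):
    """Sum f over each brand's products, then normalize so scores sum to 1.

    engagement:    product id -> (clicks, adds, purchases) for one query
    product_brand: product id -> brand, as found in the catalogue
    """
    totals = defaultdict(float)
    for pid, (c, a, p) in engagement.items():
        totals[product_brand[pid]] += f(c, a, p)
    norm = sum(totals.values())
    return {b: s / norm for b, s in totals.items()} if norm else {}


def best_brand(engagement, product_brand):
    """arg max over brands of the summed score, or None without engagement."""
    scores = brand_scores(engagement, product_brand)
    return max(scores, key=scores.get) if scores else None


# Made-up numbers for a query such as "ralph lauren".
PRODUCT_BRAND = {"P1": "Ralph Lauren", "P2": "Ralph Lauren", "P3": "Other Brand"}
ENGAGEMENT = {"P1": (10, 2, 2), "P2": (4, 1, 2), "P3": (8, 0, 0)}
print(brand_scores(ENGAGEMENT, PRODUCT_BRAND))
print(best_brand(ENGAGEMENT, PRODUCT_BRAND))
```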
Matching Strategies:
• Attribute completely contained in query
• Match after removing stop words, prepositions etc.
• Partial query-attribute match
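The three matching strategies can be tried in order of strictness. The sketch below is one possible reading: the stop-word list is hypothetical, tokens are split on whitespace, and "partial" is taken to mean at least one shared non-stop-word token, which the slide does not spell out.

```python
# Hypothetical stop-word and preposition list.
STOP_WORDS = {"by", "for", "the", "of", "and", "in", "with"}


def tokens(text):
    return text.lower().split()


def contains_sequence(haystack, needle):
    """True if needle occurs as a contiguous run of tokens inside haystack."""
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def match_type(query, attribute):
    """Return the strictest strategy under which the attribute matches the query."""
    q, a = tokens(query), tokens(attribute)
    if contains_sequence(q, a):
        return "full"              # attribute completely contained in query
    q_ns = [t for t in q if t not in STOP_WORDS]
    a_ns = [t for t in a if t not in STOP_WORDS]
    if contains_sequence(q_ns, a_ns):
        return "stopword_free"     # match after removing stop words
    if set(q_ns) & set(a_ns):
        return "partial"           # some tokens shared
    return None


print(match_type("north face black jacket", "The North Face"))
print(match_type("marc jacobs bag", "Marc by Marc Jacobs"))
```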

QUERY TAGGING - SUMMARY
TRAIN: Query Logs + Product Catalogue → Training Data Generator → Conditional Random Field
DEPLOY: Conditional Random Field + Manual Overrides → Search Engine
• Context aware: “Philosophy books” v/s “Philosophy face wash”
• Handles different formulations of the same entity: “Marc by Marc Jacobs” v/s “Marc Jacobs”

Query Understanding Team
• Palo Alto, California
• Munich, Germany
• Tokyo, Japan
• Beijing, China
Acknowledgements
Mukund Seshadri, Tracy King, Will Headden, Louka Dlagnekov, Tianyu Cao, Rahul Goutam, Huascar Fiorletta,
Alexander Zeyliger, Smruthi Mukund, Konstantin Stulov, Himanshu Gahlot, Yosi Shturm, Taro Kawagishi, Ravi
Jammalamadaka, Anand Lakshminath, Hernan, Greg Miller, Heran, Nick Trown

ACCESSORY QUERY CLASSIFIER
• Base Product DB → Base Query Corpus: macbook pro, macs, apple laptop, laptop
• Accessory Product DB → Accessory Query Corpus: mac ram, apple sleeve, laptop cover, apple skin
• Both corpora → Binary Classifier (Class A / Class B) → Search Engine

ACCESSORY QUERY CLASSIFIER
• Base Product DB → Base Query Corpus: paperwhite, kindle, amazon tablet, book reader
• Accessory Product DB → Accessory Query Corpus: Kindle case, Kindle cover, Amazon cover, case for kindle
• Both corpora → Binary Classifier (Class A / Class B) → Search Engine
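The slides only say "Binary Classifier", so the following uses a word-level naive Bayes model as a stand-in to show how the two query corpora could train a base-versus-accessory classifier. The labels "base" and "accessory" and the add-one smoothing are assumptions; the queries are the ones shown on the two slides.

```python
import math
from collections import Counter

BASE_QUERIES = ["macbook pro", "macs", "apple laptop", "laptop",
                "paperwhite", "kindle", "amazon tablet", "book reader"]
ACCESSORY_QUERIES = ["mac ram", "apple sleeve", "laptop cover", "apple skin",
                     "Kindle case", "Kindle cover", "Amazon cover", "case for kindle"]


def train(corpora):
    """corpora: label -> list of queries. Returns word counts, query counts, vocabulary."""
    counts = {label: Counter(w for q in qs for w in q.lower().split())
              for label, qs in corpora.items()}
    priors = {label: len(qs) for label, qs in corpora.items()}
    vocab = set().union(*counts.values())
    return counts, priors, vocab


def classify(query, model):
    """Pick the label with the highest naive Bayes log score."""
    counts, priors, vocab = model
    total_queries = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        score = math.log(priors[label] / total_queries)
        for w in query.lower().split():
            # Add-one smoothing with one extra slot for unseen words.
            score += math.log((word_counts[w] + 1) / (total + len(vocab) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train({"base": BASE_QUERIES, "accessory": ACCESSORY_QUERIES})
print(classify("macbook cover", model))
```

A query like "macbook cover" mixes a base-product word with an accessory word; in this toy model the accessory evidence from "cover" outweighs "macbook".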

QUERY TAGGING - SUMMARY
TRAIN: Query Logs + Product Catalogue → Training Data Generator → Conditional Random Field
DEPLOY: Conditional Random Field + Manual Overrides → Search Engine
Validation Techniques:
• Offline validation
  • Cross validation with an 80/20 split
  • Manual gold standard evaluation
• A/B test
  • Control: before the model was deployed
  • Treatment: after the model is deployed
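The offline validation step can be sketched as an 80/20 split plus precision and recall against gold labels. The per-query representation (one brand string or None) and both helper names are assumptions made for illustration; the example labels are invented.

```python
import random


def split_80_20(examples, seed=0):
    """Shuffle a copy of the examples and split it 80/20 into train and test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def precision_recall(gold, predicted):
    """gold, predicted: one brand string (or None for no brand) per query."""
    tp = sum(1 for g, p in zip(gold, predicted) if p is not None and p == g)
    n_pred = sum(1 for p in predicted if p is not None)
    n_gold = sum(1 for g in gold if g is not None)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall


# Invented labels: the second prediction is a false positive, the third a miss.
GOLD = ["adidas", None, "ray ban", "ralph lauren"]
PREDICTED = ["adidas", "philosophy", None, "ralph lauren"]
print(precision_recall(GOLD, PREDICTED))
```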

COMPARISON TO DICTIONARY LOOKUP METHODS
• Dictionary methods are not context aware.
  • Example: for “philosophy books”, a dictionary method will tag “philosophy” as a brand.
• Dictionary methods fail to detect different formulations of the same entity.
  • Example: “mk” vs. “michael kors”
Our system improved precision over the baseline by 10% and approximately doubled the recall.

GENERATIVE v/s DISCRIMINATIVE MODELS
• Generative: models the joint probability P(Y, X)
• Discriminative: models the conditional probability P(Y|X)
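The two quantities on this slide are linked by the product rule. The relation below is standard probability rather than anything specific to the talk: a model of the joint distribution can recover the conditional by normalizing over labels, whereas a discriminative model such as a CRF learns the conditional directly and never needs P(X).

```latex
P(Y, X) = P(Y \mid X)\, P(X)
\qquad\Longrightarrow\qquad
P(Y \mid X) = \frac{P(Y, X)}{\sum_{Y'} P(Y', X)}
```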
