E-commerce Query Tagging System Using Unsupervised Training Methods: Amazon is one of the world’s largest e-commerce sites, and Amazon Search powers the majority of Amazon’s sales. A key component of Amazon Search is the query understanding pipeline, which extracts the semantic information used to display precisely the right products for billions of queries every day. In this talk, we will go through the primary building blocks of the query understanding pipeline.
Amazon Search lets users search over structured product data, so information must be extracted from queries in a format consistent with the structured information about the products. Query tagging is the task of semantically annotating query terms with pre-defined labels (such as brand, product type, and color). We propose a scalable system for training large-scale machine learning models to solve this problem. Compared with the baseline, a dictionary-lookup tagger, our system improved precision by 10% and approximately doubled recall.
4. CHALLENGES
• Power law distribution of queries: large population of long-tail queries
• Speedy search response: fast models that respond in milliseconds
• Dynamic search trends: adaptive to new trends
• Global search reach: deal with 10 different languages
8. QUERY CATEGORY CLASSIFIER
• Automatically generates a large training dataset
• Training data can be refreshed frequently
• Trigram model generalizes well to tail queries
Figure: Trigram Language Model, illustrated with example queries: tv, ipod, projector, speakers, headphones, pillow, curtains, pet bells, mattress, shower curtain, suits, mr robot, star trek, downton abbey, game of thrones
A large percentage of query searches happen within a category.
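The slide does not say how the trigram model is built. A minimal sketch, assuming one character-trigram language model per category with add-one smoothing, trained on queries already labeled with a category (in production these labels would come from the automatically generated training data). The category names Electronics, Home, and Video, and the assignment of the slide's example queries to them, are hypothetical; ambiguous examples such as "suits" and "pet bells" are left out.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled data built from the slide's example queries.
TRAINING_QUERIES = [
    ("tv", "Electronics"), ("ipod", "Electronics"), ("projector", "Electronics"),
    ("speakers", "Electronics"), ("headphones", "Electronics"),
    ("pillow", "Home"), ("curtains", "Home"), ("mattress", "Home"),
    ("shower curtain", "Home"),
    ("mr robot", "Video"), ("star trek", "Video"),
    ("downton abbey", "Video"), ("game of thrones", "Video"),
]


def char_trigrams(text):
    """Character trigrams with '#' padding marking the query boundaries."""
    padded = "##" + text.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramCategoryClassifier:
    """One add-one-smoothed character trigram LM per category."""

    def __init__(self):
        self.trigram_counts = defaultdict(Counter)  # category -> trigram -> count
        self.context_counts = defaultdict(Counter)  # category -> 2-char context -> count
        self.vocab = set()                          # characters seen in 3rd position
        self.category_counts = Counter()            # category -> number of queries

    def fit(self, labeled_queries):
        for query, category in labeled_queries:
            self.category_counts[category] += 1
            for tri in char_trigrams(query):
                self.trigram_counts[category][tri] += 1
                self.context_counts[category][tri[:2]] += 1
                self.vocab.add(tri[2])
        return self

    def log_likelihood(self, query, category):
        # P(c3 | c1 c2) = (count(c1 c2 c3) + 1) / (count(c1 c2) + V)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        score = 0.0
        for tri in char_trigrams(query):
            num = self.trigram_counts[category][tri] + 1
            den = self.context_counts[category][tri[:2]] + v
            score += math.log(num / den)
        return score

    def predict(self, query):
        total = sum(self.category_counts.values())

        def log_posterior(category):
            prior = math.log(self.category_counts[category] / total)
            return prior + self.log_likelihood(query, category)

        return max(self.category_counts, key=log_posterior)
```

Because the model scores overlapping character trigrams rather than whole words, a tail query such as "wireless headphones" can still be placed in a category even though it never appears in the training data.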
9. CUSTOMER SERVICE QUERY CLASSIFIER
“Classifies a query as either a customer service query or a product query.”
contact amazon
amazon phone number
how do I cancel my order?
where is my order history?
where is my order?
how can I see videos?
amazon prime video help
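The slide names the task but not the model. As an illustration only, here is a sketch using a swapped-in technique: multinomial Naive Bayes over word tokens with add-one smoothing. The labels customer_service and product are hypothetical, and the product-side examples are borrowed from other slides in this deck.

```python
import math
import re
from collections import Counter

# Hypothetical training data: customer service queries from the slide,
# product queries borrowed from other slides.
LABELED_QUERIES = [
    ("contact amazon", "customer_service"),
    ("amazon phone number", "customer_service"),
    ("how do I cancel my order?", "customer_service"),
    ("where is my order history?", "customer_service"),
    ("where is my order?", "customer_service"),
    ("how can I see videos?", "customer_service"),
    ("amazon prime video help", "customer_service"),
    ("adidas shoes", "product"),
    ("jansport backpack", "product"),
    ("ray ban sunglasses", "product"),
    ("north face black jacket", "product"),
    ("tv", "product"),
    ("headphones", "product"),
    ("mattress", "product"),
]


def tokenize(text):
    """Lowercase word tokens; punctuation such as '?' is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesQueryClassifier:
    def fit(self, labeled):
        self.word_counts = {}        # label -> Counter of words
        self.doc_counts = Counter()  # label -> number of queries
        vocab = set()
        for text, label in labeled:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            vocab.update(tokens)
        self.vocab_size = len(vocab)
        self.total_words = {l: sum(c.values()) for l, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())

        def log_posterior(label):
            score = math.log(self.doc_counts[label] / n_docs)
            denom = self.total_words[label] + self.vocab_size
            for tok in tokenize(text):
                # Add-one smoothing so unseen words do not zero out a class.
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            return score

        return max(self.doc_counts, key=log_posterior)
```

Words such as "my" and "order" recur across the customer service examples, so they pull a new query like "track my order" toward that class even though "track" was never seen.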
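13. QUERY TAGGING (BRAND)
What? Tag the brand in each query:
adidas shoes → BRAND: adidas
jansport backpack → BRAND: jansport
ray ban sunglasses → BRAND: ray ban
ralph lauren men → BRAND: ralph lauren
How? TRAIN a Conditional Random Field on queries labeled with B (beginning of brand), I (inside brand), and O (outside) tags:
adidas shoes → B O
ray ban sunglasses → B I O
ralph lauren men → B I O
jansport backpack → B O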
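The B/I/O scheme above can be sketched as two small helpers: one turns a known brand span into tags (for building training data) and one turns predicted tags back into brand strings (for downstream use). This is an illustrative sketch; the helper names encode_bio and decode_bio are hypothetical.

```python
def encode_bio(tokens, start, length):
    """Tag tokens[start:start + length] as a brand span, everything else as O."""
    tags = ["O"] * len(tokens)
    tags[start] = "B"
    for k in range(start + 1, start + length):
        tags[k] = "I"
    return tags


def decode_bio(tokens, tags):
    """Collect brand strings from a B/I/O tag sequence."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            # "O", or a stray "I" with no preceding "B", closes any open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

For example, encode_bio(["ralph", "lauren", "men"], 0, 2) gives ["B", "I", "O"], and decode_bio on "ray ban sunglasses" with tags B I O recovers ["ray ban"].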
14. CONDITIONAL RANDOM FIELD
• Discriminative model: models the conditional probability P(Y|X) directly, so we do not need to model P(X)
• Features: word is capitalized, word is in an atlas or name list, previous word is “Mrs”, next word is “Times”, …
Recommended tutorial on CRFs: An Introduction to Conditional Random Fields for Relational Learning
https://people.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
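A minimal sketch of per-token feature extraction for brand tagging, modeled on the feature types listed above (capitalization, membership in a name list, neighboring words). The brand list is hypothetical. The output, one dict of features per token, is the kind of input accepted by CRF toolkits such as sklearn-crfsuite; training and decoding are left to such a toolkit.

```python
def token_features(tokens, i, brand_words):
    """Features for tokens[i]; brand_words is a hypothetical name list."""
    word = tokens[i]
    return {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.is_capitalized": word[:1].isupper(),
        "word.is_digit": word.isdigit(),
        "word.in_brand_list": word.lower() in brand_words,
        # Context features, analogous to "previous word is Mrs".
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<START>",
        "next.lower": tokens[i + 1].lower() if i + 1 < len(tokens) else "<END>",
    }


def query_features(query, brand_names):
    """One feature dict per whitespace token of the query."""
    tokens = query.split()
    brand_words = {w for name in brand_names for w in name.lower().split()}
    return [token_features(tokens, i, brand_words) for i in range(len(tokens))]
```

Capitalization is likely a weak signal in search queries, which users often type in lowercase, so the name-list and neighboring-word features would probably carry more of the weight here than in newswire text.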
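15. QUERY LOGS AND PRODUCT CATALOGUE
Queries in the query logs are linked to products in the product catalogue through customer click, add-to-cart, and purchase actions.
Example queries: adidas shoes, jansport backpack, ray ban sunglasses, north face black jacket, polo ralph lauren men, white shoes, ralph lauren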
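17. Each query is assigned the brand

$$\hat{b} = \arg\max_{b} \sum_{i \in P(b)} f(c_i, a_i, p_i)$$

where $b$ is a brand, $P(b)$ is the set of products corresponding to brand $b$, and $c_i$, $a_i$, $p_i$ are the click, add-to-cart, and purchase signals for product $i$.
Example: north face black jacket, with candidate scores 0.8 and 0.2 for the BRAND label.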
Matching Strategies:
• Attribute completely contained in query
• Match after removing stop words, prepositions etc.
• Partial query-attribute match
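The slides give the arg max rule and three matching strategies, but not $f$ or the matching details. A minimal sketch of a training data generator under those assumptions: $f$ is a hypothetical weighted sum of clicks, add-to-carts, and purchases (weights 1, 5, 10 chosen for illustration only), the stop-word list is hypothetical, and the partial match takes the longest contiguous piece of the brand name found in the query.

```python
from collections import defaultdict

# Hypothetical stop-word list for matching strategy 2.
STOP_WORDS = {"a", "an", "the", "by", "for", "of", "with"}


def engagement(clicks, adds, purchases, weights=(1.0, 5.0, 10.0)):
    """Hypothetical choice of f(c, a, p): a weighted sum of engagement counts."""
    wc, wa, wp = weights
    return wc * clicks + wa * adds + wp * purchases


def best_brand(engaged_products):
    """arg max over brands b of the summed f(c_i, a_i, p_i) for products i in P(b)."""
    totals = defaultdict(float)
    for brand, clicks, adds, purchases in engaged_products:
        totals[brand] += engagement(clicks, adds, purchases)
    return max(totals, key=totals.get)


def find_span(tokens, target):
    """Start index of target as a contiguous run inside tokens, or -1."""
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            return i
    return -1


def match_brand(query_tokens, brand_tokens):
    """Return (start, length) of the brand mention in the query, or None."""
    # Strategy 1: attribute completely contained in the query.
    start = find_span(query_tokens, brand_tokens)
    if start >= 0:
        return start, len(brand_tokens)
    # Strategy 2: match after removing stop words from the attribute.
    content = [t for t in brand_tokens if t not in STOP_WORDS]
    if content:
        start = find_span(query_tokens, content)
        if start >= 0:
            return start, len(content)
    # Strategy 3: partial match on the longest contiguous piece of the attribute.
    for n in range(len(brand_tokens) - 1, 0, -1):
        for j in range(len(brand_tokens) - n + 1):
            piece = brand_tokens[j:j + n]
            if all(t in STOP_WORDS for t in piece):
                continue
            start = find_span(query_tokens, piece)
            if start >= 0:
                return start, n
    return None


def generate_example(query, engaged_products):
    """Turn one query-log record into a BIO-labeled CRF training example."""
    tokens = query.lower().split()
    brand = best_brand(engaged_products)
    match = match_brand(tokens, brand.lower().split())
    if match is None:
        return None  # brand not mentioned in the query: skip this record
    start, length = match
    tags = ["O"] * len(tokens)
    tags[start] = "B"
    for k in range(start + 1, start + length):
        tags[k] = "I"
    return list(zip(tokens, tags))
```

Records where the winning brand does not appear in the query (for example "white shoes") produce no training example, which keeps the generated labels conservative.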
18. QUERY TAGGING - SUMMARY
TRAIN: Query Logs + Product Catalogue → Training Data Generator → Conditional Random Field
DEPLOY: Conditional Random Field → Search Engine (with Manual Overrides)
• Context aware: “Philosophy books” vs. “Philosophy face wash”
• Handles different formulations of the same entity: “Marc by Marc Jacobs” vs. “Marc Jacobs”
19. Query Understanding Team
• Palo Alto, California
• Munich, Germany
• Tokyo, Japan
• Beijing, China
Acknowledgements
Mukund Seshadri, Tracy King, Will Headden, Louka Dlagnekov, Tianyu Cao, Rahul Goutam, Huascar Fiorletta,
Alexander Zeyliger, Smruthi Mukund, Konstantin Stulov, Himanshu Gahlot, Yosi Shturm, Taro Kawagishi, Ravi
Jammalamadaka, Anand Lakshminath, Hernan, Greg Miller, Heran, Nick Trown
21. ACCESSORY QUERY CLASSIFIER
Base Product DB → Base Query Corpus: macbook pro, macs, apple laptop, laptop
Accessory Product DB → Accessory Query Corpus: mac ram, apple sleeve, laptop cover, apple skin
Binary Classifier (Class A vs. Class B) → Search Engine
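The slides show the data flow but not the classifier. A minimal sketch, assuming that queries leading to Base Product DB items form the base corpus and queries leading to Accessory Product DB items form the accessory corpus. The binary classifier here is a swapped-in choice, a plain perceptron over bag-of-words features trained on the slide's example queries; the labels base and accessory stand in for Class A and Class B, and which is which is an assumption.

```python
BASE_QUERIES = ["macbook pro", "macs", "apple laptop", "laptop"]
ACCESSORY_QUERIES = ["mac ram", "apple sleeve", "laptop cover", "apple skin"]


def features(query):
    """Bag-of-words features: the lowercase whitespace tokens."""
    return query.lower().split()


class Perceptron:
    """Binary perceptron: +1 = base product query, -1 = accessory query."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, query):
        return self.bias + sum(self.weights.get(t, 0.0) for t in features(query))

    def fit(self, examples, max_epochs=20):
        for _ in range(max_epochs):
            mistakes = 0
            for query, y in examples:
                if y * self.score(query) <= 0:  # misclassified or on the boundary
                    for t in features(query):
                        self.weights[t] = self.weights.get(t, 0.0) + y
                    self.bias += y
                    mistakes += 1
            if mistakes == 0:  # training data separated; stop early
                break
        return self

    def predict(self, query):
        return "base" if self.score(query) > 0 else "accessory"


EXAMPLES = [(q, 1) for q in BASE_QUERIES] + [(q, -1) for q in ACCESSORY_QUERIES]
```

Words that only ever occur with accessories, such as "sleeve" and "cover", end up with negative weights, so a new query like "laptop sleeve" is routed to the accessory class even though "laptop" alone leans toward base.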
23. ACCESSORY QUERY CLASSIFIER
Base Product DB → Base Query Corpus: paperwhite, kindle, amazon tablet, book reader
Accessory Product DB → Accessory Query Corpus: Kindle case, Kindle cover, Amazon cover, case for kindle
Binary Classifier (Class A vs. Class B) → Search Engine
25. COMPARISON TO DICTIONARY LOOKUP METHODS
• Dictionary methods are not context aware. Example: for “philosophy books”, a dictionary method will tag “philosophy” as a brand.
• Dictionary methods fail to detect different formulations of the same entity. Example: “mk” vs. “michael kors”
Our system improved precision over the baseline by 10% and approximately doubled the recall.
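To make the two failure modes concrete, here is a hypothetical greedy longest-match dictionary tagger of the general kind used as a baseline. The dictionary contents and max_len are assumptions; this is not the actual baseline.

```python
def dictionary_tag(query, brand_dictionary, max_len=3):
    """Greedy longest-match lookup: tag any n-gram found in the dictionary."""
    tokens = query.lower().split()
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]) in brand_dictionary:
                tags[i] = "B"
                for k in range(i + 1, i + n):
                    tags[k] = "I"
                i += n
                break
        else:  # no dictionary entry starts here
            i += 1
    return tags


BRANDS = {"philosophy", "michael kors", "ray ban"}  # hypothetical dictionary
```

The lookup ignores context, so "philosophy books" gets "philosophy" tagged as a brand, and it cannot connect "mk" to "michael kors" unless every variant is listed by hand.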
00:30
Hi, I am Tanvi Motwani from the Query Understanding team at A9, and today we will look at how we make this happen.
01:00
Typical product search page
The user types in a query, which is free text.
The search box is the most frequent method customers use to find products at Amazon.
Let's zoom into the first result here.
The query then hits the query understanding (QU) module, which analyzes what the user has typed and computes query features. These features then go to the ranking module, which finds the most “relevant” products for our customers. Today we are going to look into how this QU module functions.
03:00
What we see on the product page is detailed information about a product.
Like web search, we have unstructured text, e.g. the product description.
Along with this, we have lots of structured information like ….
We also have Add to Cart, Wish List, Buy buttons and more, which provide user behavioral data.
We need to make use of this structured information to help us provide precise results.
The challenge is that the query the user types in is unstructured text.
Extracting structured information from the query and matching it with the appropriate product fields enables us to surface relevant products for the user.
We label parts of the query like… we call these query annotations.
We also perform query classification, which works at the level of the whole query, e.g. the category class for this query is “Clothing”.
High Level motivation
5:30
8:00
Add query taggers. Gray out black text.
9:00
Understanding the category enables new features; for example, we can ask you a specific question…
Screenshots zoom in
11:00
This is a standard NLP approach
Benefits on separate slide.
12:00
Change this slide with training data picture
05:30
See if you can show Nike on left nav
Split into slide
Kate Spade New York example
06:30
1. User types “macbook pro”.
2. If we use the query words as features, we get products that contain the words macbook pro, like the 2nd and 4th results.
3. If we tag “macbook pro” as a product type and use that as a feature, the ranking function ranks products with the same product type higher, so actual MacBook Pros bubble up.
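A toy sketch of the effect described in step 3, assuming a ranking score made of query-title term overlap plus a bonus when a product's type equals the product type tagged on the query. The Product class, the product type values (such as laptop), the product titles, and type_weight are all hypothetical; the real ranking function is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    title: str
    product_type: str  # hypothetical structured catalogue field


def term_overlap(query: str, title: str) -> float:
    """Fraction of query words that appear in the product title."""
    q = set(query.lower().split())
    t = set(title.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rank(query: str, query_product_type: Optional[str],
         products: List[Product], type_weight: float = 1.0) -> List[Product]:
    def score(p: Product) -> float:
        s = term_overlap(query, p.title)
        if query_product_type is not None and p.product_type == query_product_type:
            s += type_weight  # boost products matching the tagged product type
        return s

    # sorted() is stable, so ties keep their input order.
    return sorted(products, key=score, reverse=True)


CATALOGUE = [
    Product("Case for MacBook Pro 13-inch", "laptop case"),
    Product("MacBook Pro 13-inch", "laptop"),
    Product("Screen protector for MacBook Pro", "screen protector"),
]
```

With words alone, all three titles contain "macbook" and "pro" and tie, so an accessory can come first; passing the tagged type laptop lifts the actual MacBook Pro to the top.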