SlideShare ist ein Scribd-Unternehmen logo
1 von 40
Downloaden Sie, um offline zu lesen
Searching for Similar Items in Diverse
Universes
Large-scale machine learning at Google
Corinna Cortes
Google Research
corinna@google.com
1
pageMachine Learning at Google 2014
Browse for Fashion
2
pageMachine Learning at Google 2014
Browse forVideos
3
pageMachine Learning at Google 2014
Outline
Metric setting
• Using similarities (efficiency)
• Learning similarities (quality)
Graph-based setting
• Generating similarities
4
pageMachine Learning at Google 2014
Image Browsing
Image browsing relies on some measure of similarity.
5
⎛
⎜
⎜
⎜
⎝
x11
x12
...
x1N
⎞
⎟
⎟
⎟
⎠
⎛
⎜
⎜
⎜
⎝
x21
x22
...
x2N
⎞
⎟
⎟
⎟
⎠
Sim(x1, x2) =
⎛
⎜
⎜
⎜
⎝
x11
x12
...
x1N
⎞
⎟
⎟
⎟
⎠
·
⎛
⎜
⎜
⎜
⎝
x21
x22
...
x2N
⎞
⎟
⎟
⎟
⎠
pageMachine Learning at Google 2014
Clustering of Images
Compute all the pair-wise distances between related
images and form clusters:
6
pageMachine Learning at Google 2014
Similar Images - version 0.0
7
Demo
pageMachine Learning at Google 2014
Helsinki, some cluster
8
pageMachine Learning at Google 2014
“Machine Learning” comes up empty
9
pageMachine Learning at Google 2014
The Web is Huge
Image Swirl was precomputed and restricted to top
20K queries.
People search on billions of different queries.
Cannot precompute billions of distances
we need to do something smarter.
10
pageMachine Learning at Google 2014
Approximate Nearest Neighbors
Preprocessing:
• Represent images by short vectors, kernel-PCA;
• Grow tree top-down based on ‘random’
projections. Spill a bit for robustness;
• Save the node ID(s) with each image.
At query time:
• propagate down tree to find node ID;
• retrieve other image with same ID;
• rank according to similarity with the short vector
and other meta data.
11
pageMachine Learning at Google 2014
Similar Images,Version 1.0
Live Search, Similar Images
Inspect the single clusters
• cluster 700
• cluster 700 + machine learning
• cluster 1000
• cluster 400
12
pageMachine Learning at Google 2014
Cluster 700
13
pageMachine Learning at Google 2014
Cluster 700 + Machine Learning
14
pageMachine Learning at Google 2014
Cluster 1000
15
pageMachine Learning at Google 2014
Cluster 1000 + Machine Learning
16
pageMachine Learning at Google 2014
Cluster 400
17
pageMachine Learning at Google 2014
Cluster 400 + Machine Learning
18
pageMachine Learning at Google 2014
Finding Similar Time Series
19
pageMachine Learning at Google 2014
Flu Trends
20
http://www.google.org/flutrends/us/#US
pageMachine Learning at Google 2014
Flu Trends
21
pageMachine Learning at Google 2014
Split time series into N/k chunks:
Represent each chunk with a set of cluster centers
(256) using k-means. Save the coordinates of the
centers, (ID, coordinates).
Save each series as a set of closest IDs, hashcode.
Asymmetric Hashing, Indexing
22
pageMachine Learning at Google 2014
For given input u, divide it into its N/k chunks, uj:
• Compute the N/k * 256 distances to all centers.
• Compute the distances to all hash codes:
MN/k additions needed.
The “Asymmetric” in “Asymmetric Hashing” refers
to the fact that we hash the database vectors but
not the search vector.
Asymmetric Hashing, Searching
23
d2
(u, vi
) =
N/k
j=1
d2
(uj, c(vi
j))
pageMachine Learning at Google 2014
Google Correlate
http://www.google.com/trends/correlate/
24
pageMachine Learning at Google 2014
Outline
Metric setting
• Using similarities (efficiency)
• Learning similarities (quality)
Graph-based setting
• Generating similarities
25
pageMachine Learning at Google 2014Machine Learning at Google 2014
Table Search
26
http://webdatacommons.org/webtables/
March 5, 2014: 11 billion HTML tables reduced to 147 million quasi-relational Web tables.
Standard Learning with Kernels
The user is burdened with choosing an appropriate kernel.
User
Algorithm
m points
(x, y)
27
Learning Kernels
Demands less commitment from the user: instead of a
specific kernel, only requires the definition of a family of
kernels.
User
Algorithm
m points
28
pageMachine Learning at Google 2014pageMachine Learning at Google
Two stages:
Outperforms uniform baseline and previous algorithms.
Centered alignment is key: different from notion used by
(Cristiannini et al., 2001).
2014
Centered Alignment-Based LK
K h
29
[C. Cortes, M. Mohri, and A. Rostamizadeh:Two-stage learning kernel methods, ICML 2010]
pageMachine Learning at Google 2014Machine Learning at Google 2014
Centered Alignment
Centered kernels:
Centered alignment:
Choose kernel to maximize alignment with the
target kernel:
where expectation is over pairs of points.
KY (xi, xj) = yiyj.
pageMachine Learning at Google 2014pageMachine Learning at Google 2014
Alignment-Based Kernel Learning
Theoretical Results:
• Concentration bound.
• Existence of good predictors.
Alignment algorithms:
• Simple, highly scalable algorithm
• Quadratic Program based algorithm.
31
pageMachine Learning at Google 2014Machine Learning at Google 2014
Table Search
http://research.google.com/tables
• Machine Learning Conferences
• Marathons USA
Research Tool in Docs
32
pageMachine Learning at Google 2014
Outline
Metric setting
• Using similarities (efficiency)
• Learning similarities (quality)
Graph-based setting
• Generating similarities
33
pageMachine Learning at Google 2014
Graph-based setting
Personalized Page Rank
• PPR used to densify the graph
• Single-Linkage Clustering used to find clusters in
the graph
34
pageMachine Learning at Google 2014
Example
35
pageMachine Learning at Google 2014
PPR Clustering
Personalized PageRank (PPR) of :
• Probability of visiting in a random walk starting
at :
• With probability , go to a neighbor
uniformly at random.
• With probability , go back to ;
Resulting graph: single hops enriched with multiple
hops, resulting in a denser graph.Threshold graph at
appropriate level.
36
pageMachine Learning at Google 2014
2 Pass Map-Reduce
• Each push operation takes a single vertex u, moves
an fraction of the probability from r(u) onto
p(u), and then spreads the remaining (1 )
fraction within r, as if a single step of the lazy
random walk were applied only to the vertex u.
• Initialization: p = ~0 and r = indicator function at u
Algorithm
37
pageMachine Learning at Google 2014
“Single-Linkage” Clustering
Examples
Can be parallelized;
Can be repeated hierarchically.
38
pageMachine Learning at Google 2014
Demo, Phileas, Landmark
The goal of this project is to develop a system to
automatically recognize touristic landmarks and
popular location in images and videos. Applications
include automatic geo-tagging and annotation of
photos or videos. Current clients of the service
include Personal Photos Search, and Google Goggles,
Map Explorer
39
pageMachine Learning at Google 2014pageMachine Learning at Google 2014
Summary
Challenging very large-scale problems:
• efficient and effective similarity measures
algorithms.
Still more to do:
• learning similarities for graph based algorithms;
• similarity measures vs discrepancy measures:
• match time series on spikes
• match shoes with shoes, highlighting differences.
40

Weitere ähnliche Inhalte

Ähnlich wie Machine Learning Techniques for Finding Similar Items

RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15MLconf
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotialVijaya Prabhu
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotialVijaya Prabhu
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Tutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksTutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksMLReview
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotialVijaya Prabhu
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotialVijaya Prabhu
 
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Databricks
 
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...gdgsurrey
 
Google refine from a business perspective
Google refine   from a business perspectiveGoogle refine   from a business perspective
Google refine from a business perspectiveVijaya Prabhu
 
Google refine from a business perspective
Google refine   from a business perspectiveGoogle refine   from a business perspective
Google refine from a business perspectiveVijaya Prabhu
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotialVijaya Prabhu
 
MapInfo Professional 12.5 and Discover3D 2014 - A brief overview
MapInfo Professional 12.5 and Discover3D 2014 - A brief overviewMapInfo Professional 12.5 and Discover3D 2014 - A brief overview
MapInfo Professional 12.5 and Discover3D 2014 - A brief overviewPrakher Hajela Saxena
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalBhaskar Mitra
 
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...Louis Dorard
 
Silicon valleycodecamp2013
Silicon valleycodecamp2013Silicon valleycodecamp2013
Silicon valleycodecamp2013Sanjeev Mishra
 

Ähnlich wie Machine Learning Techniques for Finding Similar Items (20)

RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
 
Session 6.pdf
Session 6.pdfSession 6.pdf
Session 6.pdf
 
Session 6.pdf
Session 6.pdfSession 6.pdf
Session 6.pdf
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotial
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotial
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Tutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial NetworksTutorial on Theory and Application of Generative Adversarial Networks
Tutorial on Theory and Application of Generative Adversarial Networks
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotial
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotial
 
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
Retrieving Visually-Similar Products for Shopping Recommendations using Spark...
 
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
Certification Study Group - Professional ML Engineer Session 3 (Machine Learn...
 
Google refine from a business perspective
Google refine   from a business perspectiveGoogle refine   from a business perspective
Google refine from a business perspective
 
Google refine from a business perspective
Google refine   from a business perspectiveGoogle refine   from a business perspective
Google refine from a business perspective
 
Google refine tutotial
Google refine tutotialGoogle refine tutotial
Google refine tutotial
 
Clustering.pptx
Clustering.pptxClustering.pptx
Clustering.pptx
 
MapInfo Professional 12.5 and Discover3D 2014 - A brief overview
MapInfo Professional 12.5 and Discover3D 2014 - A brief overviewMapInfo Professional 12.5 and Discover3D 2014 - A brief overview
MapInfo Professional 12.5 and Discover3D 2014 - A brief overview
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrieval
 
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
 
Silicon valleycodecamp2013
Silicon valleycodecamp2013Silicon valleycodecamp2013
Silicon valleycodecamp2013
 

Mehr von MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Mehr von MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Kürzlich hochgeladen

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Kürzlich hochgeladen (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

Machine Learning Techniques for Finding Similar Items

  • 1. Searching for Similar Items in Diverse Universes Large-scale machine learning at Google Corinna Cortes Google Research corinna@google.com 1
  • 2. pageMachine Learning at Google 2014 Browse for Fashion 2
  • 3. pageMachine Learning at Google 2014 Browse forVideos 3
  • 4. pageMachine Learning at Google 2014 Outline Metric setting • Using similarities (efficiency) • Learning similarities (quality) Graph-based setting • Generating similarities 4
  • 5. pageMachine Learning at Google 2014 Image Browsing Image browsing relies on some measure of similarity. 5 ⎛ ⎜ ⎜ ⎜ ⎝ x11 x12 ... x1N ⎞ ⎟ ⎟ ⎟ ⎠ ⎛ ⎜ ⎜ ⎜ ⎝ x21 x22 ... x2N ⎞ ⎟ ⎟ ⎟ ⎠ Sim(x1, x2) = ⎛ ⎜ ⎜ ⎜ ⎝ x11 x12 ... x1N ⎞ ⎟ ⎟ ⎟ ⎠ · ⎛ ⎜ ⎜ ⎜ ⎝ x21 x22 ... x2N ⎞ ⎟ ⎟ ⎟ ⎠
  • 6. pageMachine Learning at Google 2014 Clustering of Images Compute all the pair-wise distances between related images and form clusters: 6
  • 7. pageMachine Learning at Google 2014 Similar Images - version 0.0 7 Demo
  • 8. pageMachine Learning at Google 2014 Helsinki, some cluster 8
  • 9. pageMachine Learning at Google 2014 “Machine Learning” comes up empty 9
  • 10. pageMachine Learning at Google 2014 The Web is Huge Image Swirl was precomputed and restricted to top 20K queries. People search on billions of different queries. Cannot precompute billions of distances we need to do something smarter. 10
  • 11. pageMachine Learning at Google 2014 Approximate Nearest Neighbors Preprocessing: • Represent images by short vectors, kernel-PCA; • Grow tree top-down based on ‘random’ projections. Spill a bit for robustness; • Save the node ID(s) with each image. At query time: • propagate down tree to find node ID; • retrieve other image with same ID; • rank according to similarity with the short vector and other meta data. 11
  • 12. pageMachine Learning at Google 2014 Similar Images,Version 1.0 Live Search, Similar Images Inspect the single clusters • cluster 700 • cluster 700 + machine learning • cluster 1000 • cluster 400 12
  • 13. pageMachine Learning at Google 2014 Cluster 700 13
  • 14. pageMachine Learning at Google 2014 Cluster 700 + Machine Learning 14
  • 15. pageMachine Learning at Google 2014 Cluster 1000 15
  • 16. pageMachine Learning at Google 2014 Cluster 1000 + Machine Learning 16
  • 17. pageMachine Learning at Google 2014 Cluster 400 17
  • 18. pageMachine Learning at Google 2014 Cluster 400 + Machine Learning 18
  • 19. pageMachine Learning at Google 2014 Finding Similar Time Series 19
  • 20. pageMachine Learning at Google 2014 Flu Trends 20 http://www.google.org/flutrends/us/#US
  • 21. pageMachine Learning at Google 2014 Flu Trends 21
  • 22. pageMachine Learning at Google 2014 Split time series into N/k chunks: Represent each chunk with a set of cluster centers (256) using k-means. Save the coordinates of the centers, (ID, coordinates). Save each series as a set of closest IDs, hashcode. Asymmetric Hashing, Indexing 22
  • 23. pageMachine Learning at Google 2014 For given input u, divide it into its N/k chunks, uj: • Compute the N/k * 256 distances to all centers. • Compute the distances to all hash codes: MN/k additions needed. The “Asymmetric” in “Asymmetric Hashing” refers to the fact that we hash the database vectors but not the search vector. Asymmetric Hashing, Searching 23 d2 (u, vi ) = N/k j=1 d2 (uj, c(vi j))
  • 24. pageMachine Learning at Google 2014 Google Correlate http://www.google.com/trends/correlate/ 24
  • 25. pageMachine Learning at Google 2014 Outline Metric setting • Using similarities (efficiency) • Learning similarities (quality) Graph-based setting • Generating similarities 25
  • 26. pageMachine Learning at Google 2014Machine Learning at Google 2014 Table Search 26 http://webdatacommons.org/webtables/ March 5, 2014: 11 billion HTML tables reduced to 147 million quasi-relational Web tables.
  • 27. Standard Learning with Kernels The user is burdened with choosing an appropriate kernel. User Algorithm m points (x, y) 27
  • 28. Learning Kernels Demands less commitment from the user: instead of a specific kernel, only requires the definition of a family of kernels. User Algorithm m points 28
  • 29. pageMachine Learning at Google 2014pageMachine Learning at Google Two stages: Outperforms uniform baseline and previous algorithms. Centered alignment is key: different from notion used by (Cristiannini et al., 2001). 2014 Centered Alignment-Based LK K h 29 [C. Cortes, M. Mohri, and A. Rostamizadeh:Two-stage learning kernel methods, ICML 2010]
  • 30. pageMachine Learning at Google 2014Machine Learning at Google 2014 Centered Alignment Centered kernels: Centered alignment: Choose kernel to maximize alignment with the target kernel: where expectation is over pairs of points. KY (xi, xj) = yiyj.
  • 31. pageMachine Learning at Google 2014pageMachine Learning at Google 2014 Alignment-Based Kernel Learning Theoretical Results: • Concentration bound. • Existence of good predictors. Alignment algorithms: • Simple, highly scalable algorithm • Quadratic Program based algorithm. 31
  • 32. pageMachine Learning at Google 2014Machine Learning at Google 2014 Table Search http://research.google.com/tables • Machine Learning Conferences • Marathons USA Research Tool in Docs 32
  • 33. pageMachine Learning at Google 2014 Outline Metric setting • Using similarities (efficiency) • Learning similarities (quality) Graph-based setting • Generating similarities 33
  • 34. pageMachine Learning at Google 2014 Graph-based setting Personalized Page Rank • PPR used to densify the graph • Single-Linkage Clustering used to find clusters in the graph 34
  • 35. pageMachine Learning at Google 2014 Example 35
  • 36. pageMachine Learning at Google 2014 PPR Clustering Personalized PageRank (PPR) of : • Probability of visiting in a random walk starting at : • With probability , go to a neighbor uniformly at random. • With probability , go back to ; Resulting graph: single hops enriched with multiple hops, resulting in a denser graph.Threshold graph at appropriate level. 36
  • 37. pageMachine Learning at Google 2014 2 Pass Map-Reduce • Each push operation takes a single vertex u, moves an fraction of the probability from r(u) onto p(u), and then spreads the remaining (1 ) fraction within r, as if a single step of the lazy random walk were applied only to the vertex u. • Initialization: p = ~0 and r = indicator function at u Algorithm 37
  • 38. pageMachine Learning at Google 2014 “Single-Linkage” Clustering Examples Can be parallelized; Can be repeated hierarchically. 38
  • 39. pageMachine Learning at Google 2014 Demo, Phileas, Landmark The goal of this project is to develop a system to automatically recognize touristic landmarks and popular location in images and videos. Applications include automatic geo-tagging and annotation of photos or videos. Current clients of the service include Personal Photos Search, and Google Goggles, Map Explorer 39
  • 40. pageMachine Learning at Google 2014pageMachine Learning at Google 2014 Summary Challenging very large-scale problems: • efficient and effective similarity measures algorithms. Still more to do: • learning similarities for graph based algorithms; • similarity measures vs discrepancy measures: • match time series on spikes • match shoes with shoes, highlighting differences. 40