2. Problem statement
Given a query, find the closest match(es) in a large database of “entities”
Example entities: image, video, text, post, user, ad, …
Example applications:
• video copy detection (query = video, database = video)
• blog recommendation (query = user, database = blogs)
• ad placement (query = user, database = ads)
→ very large-scale problems
8. 1.1 Binary codes
[Figure: 3-bit Hamming cube with binary codes 000–111 at its vertices]
Idea: design/learn a function mapping the original space into a
compact Hamming space
Neighbors w.r.t. the Hamming space should reflect neighbors in the original space
Advantages: compact descriptor, fast distance computation
LSH example: random projection + thresholding
[Charikar’02] shows that the Hamming distance between such codes yields an estimator of the cosine similarity
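The LSH example above (random projection + thresholding) can be sketched in plain Python. All names, dimensions, and data below are illustrative toy choices, not the talk's actual setup:

```python
import math
import random

def lsh_encode(x, planes):
    """One bit per random hyperplane: sign of the projection of x."""
    return [1 if sum(r_i * x_i for r_i, x_i in zip(r, x)) >= 0 else 0
            for r in planes]

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

random.seed(0)
d, nbits = 16, 1024
planes = [[random.gauss(0, 1) for _ in range(d)] for _ in range(nbits)]

x = [random.gauss(0, 1) for _ in range(d)]
y = [xi + 0.3 * random.gauss(0, 1) for xi in x]   # a noisy neighbor of x

bx, by = lsh_encode(x, planes), lsh_encode(y, planes)
# Charikar'02: P[bits differ] = angle(x, y) / pi, so the fraction of
# differing bits gives an estimate of the cosine similarity.
est_cos = math.cos(math.pi * hamming(bx, by) / nbits)
true_cos = (sum(a * b for a, b in zip(x, y))
            / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y)))
```

With more bits the estimate concentrates around the true cosine.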
9. 1.2 Product quantization (PQ)
y = [ y1 | y2 | y3 | y4 ]   (the vector is split into subvectors)
Decompose the feature space as a product space
• use a distinct quantizer in each subspace, typically k-means
• estimate distances by using look-ups and additions only
[Jégou’11]
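A minimal PQ sketch of the two bullets above. The codebooks here are random stand-ins (real PQ learns them with k-means), and all sizes are toy values:

```python
import random

random.seed(0)
d, M = 8, 4            # dimension, number of subspaces (d // M dims each)
k = 4                  # centroids per subquantizer (256 in practice)
dsub = d // M

# Stand-in codebooks: random centroids instead of learned k-means ones.
codebooks = [[[random.gauss(0, 1) for _ in range(dsub)] for _ in range(k)]
             for _ in range(M)]

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def pq_encode(y):
    """One code per subvector: index of its nearest centroid."""
    return [min(range(k), key=lambda j: sqdist(y[m*dsub:(m+1)*dsub],
                                               codebooks[m][j]))
            for m in range(M)]

def adc_tables(x):
    """Per-subspace tables of squared distances from the query."""
    return [[sqdist(x[m*dsub:(m+1)*dsub], codebooks[m][j]) for j in range(k)]
            for m in range(M)]

def adc_distance(tables, code):
    """Distance estimate: M table look-ups and additions only."""
    return sum(tables[m][code[m]] for m in range(M))

x = [random.gauss(0, 1) for _ in range(d)]
db = [[random.gauss(0, 1) for _ in range(d)] for _ in range(100)]
codes = [pq_encode(y) for y in db]
tables = adc_tables(x)
nn = min(range(len(db)), key=lambda i: adc_distance(tables, codes[i]))
```

The point of the look-up tables is that the per-item cost is independent of the original dimension d.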
10. 1.3 Binary codes vs PQ
                 Binary codes (ITQ) [Gong’11]   Product quantization
example code     [01000110…01]                  [2 63 27 227]
comparison       context-free                   needs quantizer centroids
speed            1,190M comparisons / sec       222M comparisons / sec
precision        0.143                          0.442
Seen as competing methods in the literature
→ how to get the best of both worlds?
12. 2.1 A naïve approach
Encode all DB items with both binary and PQ codes:

q_bin = binary_encode(x)              # compute query binary code
d_min = ∞
for i = 1..n                          # loop over database items
    db_bin = db_bin_codes[i]          # get binary code for item i
    if hamming(q_bin, db_bin) < threshold   # cheap Hamming filter
        db_pq = db_pq_codes[i]        # get PQ code for item i
        d = PQ_distance(x, db_pq)     # refine with the PQ estimate
        if d < d_min
            nearest_neighbor, d_min = i, d
13. 2.2 A naïve approach
Encode all DB items with binary and PQ codes
→ memory increase (×2)
Could we use the same codes for both the Hamming and
PQ distances?
→ polysemous codes
14. 2.3 Channel optimized vector quantization
Channel-optimized vector quantizers: “pseudo-Gray coding”
Minimize the overall expected distortion (both from source and channel)
Optimize the index assignment → neighboring codes encode similar info
Example: transmitted code enc = 01001100, received code dec = 01011100 (one bit flipped by the channel)
15. 2.4 Index assignment optimization
Given a k-means quantizer, learn a permutation of the codes such that the
binary comparison reflects centroid distances
16. 2.5 The polysemous approach
Interpret PQ codes as binary codes:

q = encode(x)                         # compute query code
d_min = ∞
for i = 1..n                          # loop over database items
    db = db_codes[i]                  # get code for item i
    if hamming(q, db) < threshold     # same code used as a binary code
        d = PQ_distance(x, db)        # same code used as a PQ code
        if d < d_min
            nearest_neighbor, d_min = i, d
→ no memory increase
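The loop above can be sketched in runnable Python. The codes and look-up tables below are random stand-ins (in the real method the codes come from a PQ with a learned index permutation, which is what makes the Hamming filter informative); the threshold value is an illustrative tuning parameter:

```python
import random

random.seed(0)
M, k = 4, 256                        # 4 sub-indices of 8 bits each
n = 1000

# Stand-in database: one M-byte PQ code per item.
db_codes = [[random.randrange(k) for _ in range(M)] for _ in range(n)]

# Stand-in ADC look-up tables for the query.
tables = [[random.random() for _ in range(k)] for _ in range(M)]

def pq_distance(tables, code):
    return sum(tables[m][code[m]] for m in range(M))

def code_hamming(a, b):
    """The same bytes, read as a 32-bit string, give a Hamming distance."""
    return sum(bin(ai ^ bi).count("1") for ai, bi in zip(a, b))

q_code = [random.randrange(k) for _ in range(M)]   # query's own PQ code
threshold = 14                                     # bits (tuning parameter)

d_min, nearest, checked = float("inf"), None, 0
for i, code in enumerate(db_codes):
    if code_hamming(q_code, code) < threshold:     # cheap Hamming filter
        checked += 1
        d = pq_distance(tables, code)              # PQ refinement
        if d < d_min:
            d_min, nearest = d, i
```

Only the items passing the Hamming test pay for the PQ distance, and a single set of codes serves both roles.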
18. 2.6 Objective function
Find a permutation π() such that the Hamming distance between permuted indices matches the distance between centroids. Schematically, optimizing over all pairs of centroids (c_i, c_j):

    π* = argmin_π  Σ_{i,j}  w(d(c_i, c_j)) · ( f( h(π(i), π(j)) ) − d(c_i, c_j) )²

where
• w(·) is a weighting that favors nearby centroids,
• h(·,·) is the Hamming distance between the permuted indices,
• f(·) is a monotonous (linear) function that corrects the scale.
22. 2.7 Optimization
Simulated annealing:
• initialization: random permutation
• swap two entries in the permutation
• converges in approx. 200k iterations (<10s)
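The annealing steps above can be sketched as follows. This is a toy simplification: the centroid-distance matrix is a random stand-in, the objective is unweighted with a fixed scale instead of the weighted objective with a learned linear scale, and all sizes are illustrative:

```python
import math
import random

random.seed(0)
k = 16                     # toy codebook size (256 per subquantizer in practice)
nbits = 4                  # bits per index = log2(k)

# Stand-in centroid distances: a random symmetric matrix
# (the real method uses distances between learned k-means centroids).
cdist = [[0.0] * k for _ in range(k)]
for i in range(k):
    for j in range(i + 1, k):
        cdist[i][j] = cdist[j][i] = random.random()

def hamming(a, b):
    return bin(a ^ b).count("1")

scale = max(max(row) for row in cdist) / nbits   # crude stand-in for f()

def loss(perm):
    """Squared gap between scaled Hamming and centroid distances."""
    return sum((scale * hamming(perm[i], perm[j]) - cdist[i][j]) ** 2
               for i in range(k) for j in range(i + 1, k))

perm = list(range(k))
random.shuffle(perm)                             # init: random permutation
init_loss = cur_loss = best_loss = loss(perm)
temp = 1.0
for _ in range(20000):
    i, j = random.sample(range(k), 2)            # swap two entries
    perm[i], perm[j] = perm[j], perm[i]
    cand = loss(perm)
    if cand < cur_loss or random.random() < math.exp((cur_loss - cand) / temp):
        cur_loss = cand                          # accept the swap
        best_loss = min(best_loss, cur_loss)
    else:
        perm[i], perm[j] = perm[j], perm[i]      # undo the swap
    temp *= 0.9995                               # cooling schedule
```

A real implementation recomputes only the loss terms touched by the swap, which is what makes 200k iterations run in seconds.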
27. 3.1 Building a graph on images
Testbed: Flickr100M
• public dataset of CC images
• described with AlexNet FC7 features
  (normalized, PCA to 256 D, encoded as 32 bytes, coarse quantizer of size 4096)
Each image in turn is a query
• compute 100-NN
• build index = 14h, search = 7h
• storage for the graph = 2 × 40 GB RAM
28. 3.2 Graph modes
Graph seen as a Markov model
→ compute stationary distribution [Cho’12]
Sparse matrix–vector multiplication
• 200 iterations (30s / iter)
• mode = local maximum over nodes
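A minimal power-iteration sketch of the steps above, on a hypothetical toy graph (the real graph has ~100M nodes and is handled as a sparse matrix):

```python
import random

random.seed(0)
n = 6
# Toy k-NN graph as adjacency lists, i.e. a row-stochastic Markov chain:
# each node spreads its mass uniformly over its out-neighbors.
neighbors = {i: random.sample([j for j in range(n) if j != i], 3)
             for i in range(n)}

p = [1.0 / n] * n                      # start from the uniform distribution
for _ in range(200):                   # 200 power-iteration steps
    nxt = [0.0] * n
    for i, nbrs in neighbors.items():  # sparse matrix-vector product
        share = p[i] / len(nbrs)
        for j in nbrs:
            nxt[j] += share
    p = nxt

# Modes: nodes whose stationary mass is a local maximum over their neighbors.
modes = [i for i in range(n) if all(p[i] >= p[j] for j in neighbors[i])]
```

Each iteration costs one pass over the edges, which matches the 30 s/iteration figure at the 100M-node scale.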
29. 3.3 Paths in the graph
Almost all images are connected: find a path between pairs of images
→ morphing from one image to another
Which paths?
• shortest path
• minimize sum of distances
• minimize max of distances
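The "minimize max of distances" criterion is a minimax (bottleneck) path, computable with a Dijkstra-style search that propagates the largest edge weight seen so far instead of the sum. A sketch on a hypothetical toy graph (in the real setting, nodes are images and weights are descriptor distances):

```python
import heapq

def minimax_path(graph, src, dst):
    """Path from src to dst minimizing the maximum edge weight.
    graph: {node: [(neighbor, weight), ...]}"""
    best = {src: 0.0}                  # smallest bottleneck known per node
    prev = {}
    heap = [(0.0, src)]
    while heap:
        bottleneck, u = heapq.heappop(heap)
        if u == dst:
            break
        if bottleneck > best.get(u, float("inf")):
            continue                   # stale heap entry
        for v, w in graph.get(u, []):
            cand = max(bottleneck, w)  # bottleneck of the extended path
            if cand < best.get(v, float("inf")):
                best[v], prev[v] = cand, u
                heapq.heappush(heap, (cand, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1], best[dst]

g = {"a": [("b", 1.0), ("c", 5.0)],
     "b": [("d", 2.0)],
     "c": [("d", 1.0)],
     "d": []}
path, bottleneck = minimax_path(g, "a", "d")
# a->b->d has max edge 2.0; a->c->d has max edge 5.0
```

Minimizing the sum of distances is the same loop with `bottleneck + w` in place of `max(bottleneck, w)`, i.e. plain Dijkstra.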