Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition
Federico Magliani, Tomaso Fontanini, and Andrea Prati
IMP Lab - University of Parma
22/10/2018

Agenda
• Motivations
• Related works
• Proposed approach (Bag of Indexes)
• Experimental results
• Conclusions

Motivations
• Approximate Nearest Neighbor (ANN) search problem
• find relevant results among a huge quantity of data
• trade-off between computational time and memory occupancy
• applied to image, text and information retrieval

Related works
• Permutation Pivots represents image descriptors through
permutations of a set of randomly selected reference
objects;
• Locality Sensitive Hashing (LSH) projects points that are close to
each other into the same bucket with high probability;
• Product Quantization (PQ) decomposes the space into a Cartesian
product of low dimensional subspaces and quantizes each subspace
separately;
• FLANN: an open source library for ANN and one of the most
popular for nearest neighbor matching.

Proposed approach: Bag of Indexes (BoI)
BoI is a multi-index hashing algorithm for the ANN search problem.
• The database descriptors are projected through an LSH function and the index of
each descriptor is saved in the hash tables;
• For each query, the following process is repeated for every projection:
1. Project the query descriptor.
2. The indexes found in the buckets closest to the query bucket are added to a ranking
list (the BoI), each with a weight that decreases with the Hamming distance between the
query bucket and the analysed bucket.
3. At the end, once all projections have been processed, the top-N elements of the BoI are re-ranked according to their Euclidean distance from the query.
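
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes random-hyperplane (sign) projections as the LSH function, probes only the query bucket itself with a weight of 1, and uses hypothetical names (`build_tables`, `boi_query`) chosen for this example.

```python
import numpy as np

def build_tables(db, n_tables, n_bits, rng):
    """Hash every database descriptor into one bucket per hash table.

    Assumption: each LSH function is a set of n_bits random hyperplanes,
    and the bucket id is the integer formed by the sign bits.
    """
    dim = db.shape[1]
    projections = rng.standard_normal((n_tables, n_bits, dim))
    powers = 1 << np.arange(n_bits)          # bit weights 1, 2, 4, ...
    tables = []
    for proj in projections:
        codes = ((db @ proj.T) > 0).astype(np.int64) @ powers
        buckets = {}
        for idx, code in enumerate(codes):
            buckets.setdefault(int(code), []).append(idx)
        tables.append(buckets)
    return projections, powers, tables

def boi_query(query, db, projections, powers, tables, top_n):
    """Accumulate votes over all hash tables, then re-rank the top_n candidates."""
    votes = np.zeros(len(db))                # the "bag of indexes"
    for proj, buckets in zip(projections, tables):
        code = int(((proj @ query) > 0).astype(np.int64) @ powers)
        if code in buckets:
            votes[buckets[code]] += 1.0      # weight 1 for the query bucket
    candidates = np.argsort(-votes, kind="stable")[:top_n]
    dists = np.linalg.norm(db[candidates] - query, axis=1)
    return candidates[np.argsort(dists, kind="stable")]

rng = np.random.default_rng(0)
db = rng.standard_normal((200, 16))          # toy database, not real descriptors
projections, powers, tables = build_tables(db, n_tables=20, n_bits=8, rng=rng)
result = boi_query(db[17], db, projections, powers, tables, top_n=10)
```

Only the `top_n` candidates are compared with the Euclidean distance, so the exact distance is computed on a short list rather than on the whole database.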

[Figure: BoI example with L = 3 hash tables (Hash Table 1, Hash Table 2, Hash Table 3). The index of the query image is located in each hash table, and the image indexes found in the probed buckets are accumulated into a histogram of Weight (0 to 3) against Image Index (1 to 7).]

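• Weighting strategy (multi-probe approach):
$$w(i, q, l) = \begin{cases} \dfrac{1}{2^{H(i,q)}}, & \text{if } H(i,q) \le l \\ 0, & \text{otherwise} \end{cases}$$
where i is a generic bucket, q is the query bucket, H(i,q) is the
Hamming distance between i and q, and l is the maximum Hamming distance at which a bucket is still considered.
• Adaptive version: after a predefined number of hash tables, the gap γ
(the number of neighboring buckets) is reduced in order to lower the computational time.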
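
As an illustration of the weighting rule, the sketch below enumerates the buckets within Hamming distance l of a query bucket and assigns each its weight. It assumes that buckets are identified by integer bit codes; the function names `weight` and `neighbor_buckets` are hypothetical and chosen for this example only.

```python
from itertools import combinations

def weight(hamming, l):
    """w(i, q, l) = 1 / 2**H(i, q) if H(i, q) <= l, otherwise 0."""
    return 1.0 / (2 ** hamming) if hamming <= l else 0.0

def neighbor_buckets(query_code, n_bits, l):
    """Yield (bucket_code, weight) for all buckets within Hamming distance l.

    Buckets are produced nearest first: distance 0 (the query bucket),
    then every code with 1 flipped bit, then 2 flipped bits, and so on.
    """
    for h in range(l + 1):
        for flipped in combinations(range(n_bits), h):
            code = query_code
            for bit in flipped:
                code ^= 1 << bit             # flip one bit of the query code
            yield code, weight(h, l)

probes = list(neighbor_buckets(0b101, n_bits=3, l=1))
```

With 8-bit codes and l = 3 this enumeration covers 1 + 8 + 28 + 56 = 93 buckets, which is why limiting the number of probed buckets matters for the retrieval time.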

Linear vs Sublinear reduction
• linear: the number of neighboring buckets γ is reduced by 2 every 40 hash tables:
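$$\gamma_i = \begin{cases} \gamma_{i-1} - 2, & \text{if } i \in \{\Delta_1, \dots, k_1\Delta_1\} \\ \gamma_{i-1}, & \text{otherwise} \end{cases}$$
with $i \in \{1, \dots, L\}$, $\Delta_1 = 40$ and $k_1$ such that $k_1\Delta_1 \le L$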
• sublinear: the number of neighboring buckets γ is reduced by 2 every 25 hash tables, but only after the first half of hash tables:
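$$\gamma_i = \begin{cases} \gamma_{i-1}, & \text{if } i \le \frac{L}{2} \\ \gamma_{i-1} - 2, & \text{if } i \in \left\{\frac{L}{2}, \frac{L}{2} + \Delta_2, \dots, \frac{L}{2} + k_2\Delta_2\right\} \\ \gamma_{i-1}, & \text{otherwise} \end{cases}$$
with $i \in \{1, \dots, L\}$, $\Delta_2 = 25$ and $k_2$ such that $L/2 + k_2\Delta_2 \le L$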
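
The two reduction schedules can be compared with a short sketch. It rests on stated assumptions: the cases of the sublinear rule are read top to bottom, so the first reduction happens at i = L/2 + Δ2 rather than at i = L/2; L/2 is taken as integer division; and γ is clamped at 0. The function name `gap_schedule` is hypothetical.

```python
def gap_schedule(n_tables, gamma0, mode, delta=None):
    """Return the list [gamma_1, ..., gamma_L] for the chosen reduction mode.

    Assumptions (not stated in the slides): the first matching case of the
    sublinear rule wins at i = L/2, L/2 uses integer division, and gamma
    never drops below 0.
    """
    if mode not in ("linear", "sublinear"):
        raise ValueError("mode must be 'linear' or 'sublinear'")
    if delta is None:
        delta = 40 if mode == "linear" else 25   # values given on the slide
    half = n_tables // 2
    gammas = []
    gamma = gamma0
    for i in range(1, n_tables + 1):
        if mode == "linear":
            reduce_now = i % delta == 0
        else:
            reduce_now = i > half and (i - half) % delta == 0
        if reduce_now:
            gamma = max(gamma - 2, 0)
        gammas.append(gamma)
    return gammas

linear = gap_schedule(100, 68, "linear")
sublinear = gap_schedule(100, 68, "sublinear")
```

With L = 100 and an initial gap of 68, the linear schedule reduces the gap at tables 40 and 80, while the sublinear one (under the reading above) reduces it at tables 75 and 100, so the full gap is kept for longer.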

BoI - Parameters config
| Symbol | Name | Value |
|---|---|---|
| δ | hash dimension | 2^8 = 256 |
| L | hash tables | 100 |
| γ0 | initial gap | 68 |
| l | neighbors used | 3-neighbors |
| - | reduction | sublinear |
| ε | re-ranking | top 250 elements |

Datasets
• Holidays+Flickr1M (1M distractor images + 1491 images: 500 classes,
500 queries);
• Oxford105k (100k distractor images + 5062 images: 11 classes, 55
queries);
• Paris106k (100k distractor images + 6412 images: 11 classes, 55
queries);
• SIFT1M (1M 128D SIFT descriptors, 10k query images; only the top
100 images in the final ranking for each query are evaluated);
• GIST1M (1M 960D GIST descriptors, 1k query images; only the top
100 images in the final ranking for each query are evaluated).

Evaluation Metrics
• Different evaluation metrics are used to compare with the
state-of-the-art approaches:
• Recall@R, with R = 1, 10, 100 → the fraction of queries for which
the 1-nearest neighbor is ranked in the top R positions.
• mAP (mean Average Precision) → the mean over all queries of the
Average Precision score, which depends on the positions of the
correct results in the ranking.
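
Both metrics can be written in a few lines. The sketch below is an assumption-laden illustration: it uses the plain, non-interpolated Average Precision, whereas the official evaluation protocols of the individual datasets may differ in details, and the function names are hypothetical.

```python
def recall_at_r(rankings, true_nn, r):
    """Fraction of queries whose true 1-nearest neighbor is in the top r results."""
    hits = sum(1 for ranking, nn in zip(rankings, true_nn) if nn in ranking[:r])
    return hits / len(rankings)

def average_precision(ranking, relevant):
    """Non-interpolated AP: mean of precision@k over the ranks k of correct results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(rankings, relevants):
    """Mean of the per-query AP scores."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)

# Toy rankings for two queries; the ids are made up for the example.
rankings = [[3, 1, 2], [0, 5, 4]]
r1 = recall_at_r(rankings, true_nn=[1, 4], r=1)
r2 = recall_at_r(rankings, true_nn=[1, 4], r=2)
ap = average_precision([1, 9, 2], relevant={1, 2})
```

In the toy data the first query finds its nearest neighbor at rank 2 and the second at rank 3, so recall grows as R increases.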

Results on Holidays+Flickr1M
| Method | ε | mAP | Avg retrieval time (ms) |
|---|---|---|---|
| LSH | 250 | 86.03% | 3103 |
| Multi-probe LSH | 250 | 86.10% | 16706 |
| Permutations | 250 | 82.70% | 2844 |
| LOPQ | 250 | 36.37% | 4 |
| FLANN | 250 | 83.97% | 995 |
| BoI LSH | 250 | 78.10% | 5 |
| BoI multi-probe LSH | 250 | 85.16% | 12 |
| BoI adaptive multi-probe LSH | 250 | 85.35% | 8 |

Results on Holidays+Flickr1M
| Method | ε | mAP | Avg retrieval time (ms) |
|---|---|---|---|
| Permutations | 10k | 85.51% | 15640 |
| LOPQ | 10k | 67.22% | 72 |
| FLANN | 10k | 85.66% | 1004 |
| BoI adaptive multi-probe LSH | 10k | 86.09% | 16 |

Results on Oxford105k and Paris106k
| Method | ε | Oxford105k mAP | Oxford105k avg retrieval time (ms) | Paris106k mAP | Paris106k avg retrieval time (ms) |
|---|---|---|---|---|---|
| LSH | 2500 | 80.83% | 610 | 86.50% | 607 |
| Permutations | 2500 | 81.89% | 240 | 88.14% | 140 |
| LOPQ | 2500 | 71.70% | 346 | 87.47% | 295 |
| FLANN | 2500 | 70.33% | 2118 | 68.93% | 2132 |
| BoI adaptive multi-probe LSH | 2500 | 81.44% | 12 | 87.90% | 13 |
| Permutations | 10k | 82.82% | 250 | 89.04% | 164 |
| LOPQ | 10k | 69.94% | 1153 | 88.00% | 841 |
| FLANN | 10k | 69.37% | 2135 | 70.73% | 2156 |
| BoI adaptive multi-probe LSH | 10k | 84.38% | 25 | 92.31% | 26 |

Results on SIFT1M
| Method | ε | R=1 | R=10 | R=100 | Avg retrieval time (ms) |
|---|---|---|---|---|---|
| Permutations | 500 | 94.32% | 94.98% | 94.98% | 16999 |
| LOPQ | 500 | 19.93% | 44.80% | 52.92% | 3 |
| FLANN | 500 | 54.47% | 54.83% | 54.83% | 16 |
| BoI adaptive multi-probe LSH | 500 | 93.72% | 94.34% | 94.34% | 22 |
| LOPQ | 10k | 36.34% | 80.11% | 96.18% | 104 |
| FLANN | 10k | 95.06% | 95.86% | 95.86% | 31 |
| BoI adaptive multi-probe LSH | 10k | 99.17% | 99.85% | 99.85% | 30 |

Results on GIST1M
| Method | ε | R=1 | R=10 | R=100 | Avg retrieval time (ms) |
|---|---|---|---|---|---|
| Permutations | 500 | 54.80% | 55.30% | 55.30% | 17909 |
| FLANN | 500 | 28.30% | 28.60% | 28.60% | 1262 |
| BoI adaptive multi-probe LSH | 500 | 57.70% | 58.20% | 58.20% | 69 |
| LOPQ | 10k | 75.90% | 76.50% | 76.50% | 1352 |
| BoI adaptive multi-probe LSH | 10k | 92.40% | 93.40% | 93.40% | 108 |

Conclusions
• The proposed Bag of Indexes (BoI) adaptive multi-probe LSH is a
simple technique for efficiently solving the
ANN search problem.
• BoI can be combined with different hashing/projection
functions.
• Experiments on five public datasets, namely
Holidays+Flickr1M, Oxford105k, Paris106k, SIFT1M and GIST1M,
show recognition accuracy comparable or superior to the state of the art at low retrieval times.

Thanks for your attention!
• Questions?
• Contacts: tomaso.fontanini@studenti.unipr.it
• Website: implab.ce.unipr.it/?page_id=122
• GitHub: github.com/fmaglia/BoI
