Landmark recognition has achieved excellent results on small-scale datasets. In large-scale retrieval, issues that are irrelevant with a small amount of data quickly become fundamental for an efficient retrieval phase. In particular, computational time needs to be kept as low as possible while retrieval accuracy is preserved as much as possible. In this paper we propose a novel multi-index hashing method called Bag of Indexes (BoI) for Approximate Nearest Neighbors (ANN) search. It drastically reduces the query time and outperforms state-of-the-art methods in accuracy for large-scale landmark recognition. We show that this family of algorithms can be applied to different embedding techniques, such as VLAD and R-MAC, obtaining excellent results in very short times on different public datasets: Holidays+Flickr1M, Oxford105k and Paris106k.
Efficient nearest neighbors search for large scale
1. Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition
Federico Magliani, Tomaso Fontanini, and Andrea Prati
IMP Lab - University of Parma
22/10/2018 IMP Lab - University of Parma 1
2. Agenda
• Motivations
• Related works
• Proposed approach (Bag of Indexes)
• Experimental results
• Conclusions
3. Motivations
• Approximate Nearest Neighbor (ANN) search problem
• find relevant results among a huge quantity of data
• trade-off between computational time and memory occupancy
• applied on image, text and information retrieval
5. Related works
• Permutation Pivots represents image descriptors through
permutations of a set of randomly selected reference objects;
• Locality Sensitive Hashing (LSH) projects points that are close to
each other into the same bucket with high probability;
• Product Quantization (PQ) decomposes the space into a Cartesian
product of low-dimensional subspaces and quantizes each subspace
separately;
• FLANN is an open-source library for ANN search and one of the most
popular for nearest neighbor matching.
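As a rough illustration of the LSH idea listed above, the following minimal sketch hashes a descriptor with random hyperplanes, so that descriptors pointing in similar directions tend to receive the same bucket id. The hyperplane scheme and the names `make_hyperplanes` and `lsh_signature` are assumptions chosen for illustration; the slides do not state which LSH function is used.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """Return the bucket id: one bit per hyperplane, set when the projection is positive."""
    bucket = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bucket = (bucket << 1) | (1 if dot > 0 else 0)
    return bucket


planes = make_hyperplanes(num_bits=8, dim=4, seed=1)
descriptor = [0.5, -1.0, 2.0, 0.1]
print(lsh_signature(descriptor, planes))  # an integer in [0, 256)
```

With 8 bits there are 2^8 = 256 possible buckets, and two buckets can be compared through the Hamming distance of their ids.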
7. Proposed approach: Bag of Indexes (BoI)
BoI is a multi-index hashing algorithm for the ANN search problem.
• The database descriptors are projected through an LSH function, and the index of
each descriptor is saved in the hash table bucket selected by its signature;
• For each query, the following process is repeated for every projection:
1. Project the query descriptor to obtain the query bucket.
2. The indexes found in the buckets closest to the query bucket are added to a ranking
list (the BoI) with a weight that decreases with the Hamming distance between the
query bucket and the analysed bucket.
3. At the end, once all projections have been processed, the top-N elements are re-ranked according to the Euclidean distance.
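The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the random-hyperplane hash, the function names (`build_index`, `neighbor_buckets`, `boi_query`) and the default values are hypothetical, and the weight 1/2^H is taken from the weighing strategy described later in the slides.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def signature(vec, planes):
    """Hypothetical LSH: one bit per random hyperplane (sign of the projection)."""
    bits = 0
    for plane in planes:
        bits = (bits << 1) | int(sum(p * x for p, x in zip(plane, vec)) > 0)
    return bits


def build_index(database, num_tables, num_bits, seed=0):
    """Store each descriptor index in the bucket given by its signature, per table."""
    rng = random.Random(seed)
    dim = len(database[0])
    tables = []
    for _ in range(num_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        buckets = defaultdict(list)
        for idx, vec in enumerate(database):
            buckets[signature(vec, planes)].append(idx)
        tables.append((planes, buckets))
    return tables


def neighbor_buckets(bucket, num_bits, max_dist):
    """Yield (bucket_id, hamming_distance) for every bucket within max_dist bit flips."""
    for dist in range(max_dist + 1):
        for flips in combinations(range(num_bits), dist):
            other = bucket
            for bit in flips:
                other ^= 1 << bit
            yield other, dist


def boi_query(query, database, tables, num_bits, max_dist=1, top_n=10):
    """Accumulate weighted votes over all tables, then re-rank the top_n by Euclidean distance."""
    scores = defaultdict(float)
    for planes, buckets in tables:
        q_bucket = signature(query, planes)
        for other, dist in neighbor_buckets(q_bucket, num_bits, max_dist):
            for idx in buckets.get(other, ()):
                scores[idx] += 1.0 / 2 ** dist  # closer buckets vote more
    candidates = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return sorted(candidates, key=lambda i: math.dist(query, database[i]))


rng = random.Random(42)
db = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(60)]
tables = build_index(db, num_tables=8, num_bits=6, seed=1)
print(boi_query(db[17], db, tables, num_bits=6))
```

Only the short candidate list is compared with the exact Euclidean distance, which is where the saving over a linear scan comes from.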
8. Proposed approach: Bag of Indexes (BoI)
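[Figure: example of BoI with L = 3 hash tables. Each hash table is drawn as a column of buckets holding image indexes, with the query's bucket marked in each table ("Index of query image for each Hash Table"). Hash Table 1 shows the buckets {4,6}, 5, {2,3}; Hash Table 2 shows 7, {5,3}, 1; Hash Table 3 shows 5, 3, {1,4}. A bar chart plots the weight (y-axis "Weight", 0 to 3) accumulated by each image (x-axis "Image Index", 1 to 7), with contributions from Hash Table 1, Hash Table 2 and Hash Table 3.]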
9. Proposed approach: Bag of Indexes (BoI)
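• Weighing strategy (multi-probe approach):
$$w(i, q, l) = \begin{cases} \dfrac{1}{2^{H(i,q)}}, & \text{if } H(i,q) \le l \\ 0, & \text{otherwise} \end{cases}$$
where i is a generic bucket, q is the query bucket, H(i,q) is the
Hamming distance between i and q, and l is the maximum Hamming distance of the neighboring buckets considered.
• Adaptive version: after a predefined number of hash tables, the gap
(the number of neighboring buckets γ) is reduced in order to reduce the computational time.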
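To make the weighing formula concrete, here is a small sketch that evaluates w(i, q, l) on integer bucket ids. The helper names `hamming` and `weight` are illustrative and do not come from the slides.

```python
def hamming(a, b):
    """Hamming distance between two bucket ids, seen as bit strings."""
    return bin(a ^ b).count("1")


def weight(i, q, l):
    """w(i, q, l) = 1 / 2^H(i, q) if H(i, q) <= l, else 0."""
    h = hamming(i, q)
    return 1.0 / 2 ** h if h <= l else 0.0


query_bucket = 0b1010
for bucket in (0b1010, 0b1000, 0b0000, 0b0001, 0b0101):
    print(f"{bucket:04b}", weight(bucket, query_bucket, l=3))
```

With l = 3, a bucket at Hamming distance 0, 1, 2 or 3 from the query bucket contributes 1, 1/2, 1/4 or 1/8 respectively, and anything farther contributes nothing.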
10. Linear vs Sublinear reduction
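• linear: the number of neighboring buckets γ is reduced by 2 every 40 hash tables:
$$\gamma_i = \begin{cases} \gamma_{i-1} - 2, & \text{if } i \in \{\Delta_1, \dots, k_1 \Delta_1\} \\ \gamma_{i-1}, & \text{otherwise} \end{cases}$$
with i = {1, . . . , L}, ∆1 = 40 and k1 : k1 ∆1 ≤ L
• sublinear: the number of neighboring buckets γ is reduced by 2 every 25 hash tables, but only after the first half of hash tables:
$$\gamma_i = \begin{cases} \gamma_{i-1}, & \text{if } i \le \frac{L}{2} \\ \gamma_{i-1} - 2, & \text{if } i \in \left\{\frac{L}{2}, \frac{L}{2} + \Delta_2, \dots, \frac{L}{2} + k_2 \Delta_2\right\} \\ \gamma_{i-1}, & \text{otherwise} \end{cases}$$
with i = {1, . . . , L}, ∆2 = 25 and k2 : L/2 + k2∆2 ≤ L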
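The two reduction schedules can be compared with a short sketch. It rests on two assumptions not stated in the slides: the cases of the sublinear rule are read top-down, so the gap is left unchanged at i = L/2 and the first reduction happens at i = L/2 + ∆2, and the gap is clamped at zero as a safeguard. The function names are hypothetical.

```python
def linear_gaps(gamma0, num_tables, delta=40, step=2):
    """Gap used at each hash table i = 1..L, reduced by `step` every `delta` tables."""
    gaps, gamma = [], gamma0
    for i in range(1, num_tables + 1):
        if i % delta == 0:
            gamma = max(gamma - step, 0)  # clamp is an added safeguard
        gaps.append(gamma)
    return gaps


def sublinear_gaps(gamma0, num_tables, delta=25, step=2):
    """Same reduction, but only after the first half of the hash tables."""
    half = num_tables // 2
    gaps, gamma = [], gamma0
    for i in range(1, num_tables + 1):
        # Assumption: i = L/2 falls under the first case, so nothing changes there.
        if i > half and (i - half) % delta == 0:
            gamma = max(gamma - step, 0)
        gaps.append(gamma)
    return gaps


lin = linear_gaps(68, 100)
sub = sublinear_gaps(68, 100)
print(lin[39], lin[79], sub[49], sub[74], sub[99])
print(sum(lin), sum(sub))  # total number of neighboring buckets visited per query
```

Under these assumptions the sublinear schedule keeps the full gap for longer, so it probes slightly more buckets in total than the linear one.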
11. BoI - Parameters config
Symbol  Name             Value
δ       hash dimension   2^8 = 256
L       hash tables      100
γ0      initial gap      68
l       neighbors used   3-neighbors
-       reduction        sublinear
ε       re-ranking       top 250 elements
13. Datasets
• Holidays+Flickr1M (1M distractor images + 1491 images: 500 classes,
500 queries);
• Oxford105k (100k distractor images + 5062 images: 11 classes, 55
queries);
• Paris106k (100k distractor images + 6412 images: 11 classes, 55
queries);
• SIFT1M (1M 128D SIFT descriptors, 10k queries; only the top
100 images in the final ranking for each query are evaluated);
• GIST1M (1M 960D GIST descriptors, 1k queries; only the top
100 images in the final ranking for each query are evaluated).
14. Evaluation Metrics
• Different evaluation metrics are used to compare with the
state-of-the-art approaches:
• Recall@R, with R = 1, 10, 100 → the fraction of queries for which
the 1-nearest neighbor is ranked in the top R positions.
• mAP (mean Average Precision) → the mean over all queries of the
Average Precision score, which rewards correct results according to
their position in the ranking.
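Both metrics can be computed from the ranked lists returned for each query. The sketch below uses the textbook definitions; the function names are illustrative, and dataset-specific conventions (for example how individual benchmarks treat ignored images) are deliberately left out, so it is not the official evaluation protocol.

```python
def recall_at_r(rankings, true_nn, r):
    """Fraction of queries whose true nearest neighbor appears in the top r results."""
    hits = sum(1 for ranking, nn in zip(rankings, true_nn) if nn in ranking[:r])
    return hits / len(rankings)


def average_precision(ranking, relevant):
    """Mean of the precision values measured at the rank of each correct result."""
    relevant = set(relevant)
    hits, precision_sum = 0, 0.0
    for position, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevants):
    """Average of the per-query Average Precision scores."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)


rankings = [[5, 7, 9], [1, 2, 3]]
print(recall_at_r(rankings, true_nn=[7, 3], r=2))
print(mean_average_precision(rankings, relevants=[{5, 9}, {1}]))
```

Recall@R only checks whether the single true nearest neighbor is retrieved, while mAP also depends on where every correct result lands in the ranking.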
21. Conclusions
• The proposed Bag of Indexes (BoI), an adaptive multi-probe LSH
method, is a simple technique for efficiently solving the
ANN search problem.
• BoI can work in combination with different hashing/projection
functions.
• Experiments performed on five public datasets, namely
Holidays+Flickr1M, Oxford105k, Paris106k, SIFT1M and GIST1M,
demonstrate superior recognition accuracy w.r.t. the state of the art.
22. Thanks for your attention!
• Questions?
• Contacts: tomaso.fontanini@studenti.unipr.it
• Website: implab.ce.unipr.it/?page_id=122
• GitHub: github.com/fmaglia/BoI