The landmark recognition problem is far from being solved, but features extracted from intermediate layers of Convolutional Neural Networks (CNNs) have yielded excellent results. In this work, we propose some improvements to the creation of R-MAC descriptors that make the newly proposed R-MAC+ descriptors more representative than the previous ones. The main contribution of this paper, however, is a novel retrieval technique that exploits the fine representativeness of the MAC descriptors of the database images. Using these descriptors, called "db regions", during the retrieval stage greatly improves performance. The proposed method is tested on three public datasets: Oxford5k, Paris6k and Holidays. It outperforms the state-of-the-art results on Holidays and reaches excellent results on Oxford5k and Paris6k, surpassed only by approaches based on fine-tuning strategies.
An accurate retrieval through R-MAC+ descriptors for landmark recognition
1. An accurate retrieval through R-MAC+ descriptors for landmark recognition
Federico Magliani, Andrea Prati
ICDSC 2018 – Eindhoven, Netherlands – 3-4 September 2018
2. Agenda
➢ Motivations
➢ Summary of contributions
➢ Related works
➢ Introduction to R-MAC descriptors
➢ Proposed approach (R-MAC+)
➢ Experimental results
➢ Conclusions
3. Motivations
Landmark Recognition problem
➢ Try to understand what is in front
of you and retrieve similar images.
➢ Semantic gap: for a human, this task
is pretty simple thanks to personal
experience, but a computer can use
only the info available in the images.
➢ It is far from being solved
(viewpoint, illumination conditions,
image resolution, ...).
4. Motivations
➢ Challenges
○ High accuracy retrieval (precision)
○ Fast search (quick response to a query)
○ Reduced memory footprint (mobile friendly)
○ Scalability to big data (>1M images)
➢ Possible applications
○ Augmented reality (tourism)
○ Person Re-ID (video-surveillance)
○ Online clothes search (fashion)
6. Summary of contributions
➢ A new region detector for CNN feature maps, implemented through grids that respect the aspect ratio of the images.
➢ An improvement in the effectiveness of the multi-resolution approach for R-MAC descriptors.
➢ A novel retrieval method that checks the similarity between query descriptors and regions of database R-MAC descriptors. It outperforms the results of R-MAC descriptors on Oxford5k and Paris6k by +7% and +3%.
8. Related works
➢ Bag of Words (BoW): the first method proposed for this problem (different techniques: vocabulary tree, …).
➢ VLAD: similar to BoW, but it encodes the residuals of the descriptors (residual = feature descriptor - closest center in the vocabulary).
➢ CNN-based: extract features from intermediate layers of CNN architectures and then apply the previous embedding techniques (BLCF, ...).
➢ MAC: max pooling applied to CNN features.
➢ R-MAC: regional MAC descriptors created by applying a rigid-grid mechanism.
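The VLAD residual mentioned above can be illustrated with a minimal NumPy sketch. The function name `vlad`, the brute-force nearest-center search and the final l2-normalization are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def vlad(descriptors: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Aggregate local descriptors (N x D) against a vocabulary (K x D).

    Hypothetical helper: sums, per center, the residuals
    (descriptor - closest center) and l2-normalizes the result.
    """
    # Squared Euclidean distance from every descriptor to every center.
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)

    out = np.zeros_like(centers, dtype=np.float64)
    for k in range(centers.shape[0]):
        assigned = descriptors[nearest == k]
        if len(assigned):
            # Accumulate residuals of the descriptors assigned to center k.
            out[k] = (assigned - centers[k]).sum(axis=0)

    out = out.ravel()
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out
```

The output has K x D entries, one residual block per vocabulary center.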
10. R-MAC (Regional MAC) descriptors
Consider a rectangular region $R \subseteq \Omega = (1,W) \times (1,H)$ and define the regional feature vector

$$f_R = (f_{R,1} \dots f_{R,i} \dots f_{R,K})^T,$$

where $f_{R,i} = \max_{p \in R} X_i(p)$ is the maximum activation of the $i$-th channel on the considered region.
Then we calculate the feature vector associated with each region, and post-process it with $\ell_2$-normalization, PCA-whitening and $\ell_2$-normalization. We combine the collection of regional feature vectors into a single image vector by summing them and $\ell_2$-normalizing in the end.
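The pipeline just described (regional max pooling, normalization, whitening, sum, final normalization) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the function names, the `(x, y, w, h)` region format and the optional `whiten=(mean, P)` argument are hypothetical, and the PCA-whitening parameters are assumed to have been learned elsewhere.

```python
import numpy as np

def l2n(v, eps=1e-12):
    """l2-normalize a vector (eps avoids division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def regional_mac(feature_map, region):
    """Max activation per channel inside a region.

    feature_map: array of shape (K, H, W).
    region: (x, y, w, h) in feature-map coordinates (assumed format).
    """
    x, y, w, h = region
    return feature_map[:, y:y + h, x:x + w].max(axis=(1, 2))

def rmac(feature_map, regions, whiten=None):
    """Sum post-processed regional vectors into one image descriptor.

    whiten: optional (mean, P) pair for PCA-whitening; hypothetical
    interface, skipped when None.
    """
    vectors = []
    for region in regions:
        f = l2n(regional_mac(feature_map, region))
        if whiten is not None:
            mean, P = whiten
            f = P @ (f - mean)
        vectors.append(l2n(f))
    # Sum the regional vectors, then l2-normalize the result.
    return l2n(np.sum(vectors, axis=0))
```

Because every regional vector is normalized before the sum, each region contributes with equal weight to the final descriptor.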
We define the regions on the response maps and sample square regions at L different scales:
➢ at the largest scale (l=1), the region size is as large as possible (height = width = min(W,H));
➢ at every other scale l, we uniformly sample l x (l+m-1) regions of width 2min(W,H)/(l+1), with m=2.
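As a quick check of the sampling rule, the sketch below tabulates the number of regions and their width per scale. The function name and the choice L=3 are assumptions; with m=2 the formula also covers l=1, where it gives 2 regions of width min(W,H).

```python
def rmac_scale_layout(W, H, L=3, m=2):
    """Return (scale, region_count, region_width) for each scale l = 1..L.

    Hypothetical helper: applies count = l * (l + m - 1) and
    width = 2 * min(W, H) / (l + 1) at every scale.
    """
    layout = []
    for l in range(1, L + 1):
        count = l * (l + m - 1)
        width = 2 * min(W, H) / (l + 1)
        layout.append((l, count, width))
    return layout

# Example on a 40 x 30 feature map.
layout = rmac_scale_layout(40, 30)
total_regions = sum(count for _, count, _ in layout)
```

With L=3 and m=2 the counts are 2, 6 and 12, for a total of 20 regions, which matches the 20 regions that the R-MAC+ slides attribute to the original detector.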
11. R-MAC (Regional MAC) descriptors
Settings:
➢ Fully convolutional off-the-shelf VGG16
➢ Pool5
➢ Spatial Max pooling
➢ High Resolution images
➢ Global descriptor based on aggregating region vectors
➢ Sliding window approach
Tolias et al. Particular object retrieval with integral max-pooling of CNN activations. arXiv 2015.
13. Proposed approach: R-MAC+
New multi-resolution approach: the images are resized by +25%, 0% and -25% on the largest side, preserving the aspect ratio of the image.
➢ This strategy is an alternative to the first multi-resolution approach, which resized the image to a fixed size (550px, 800px and 1050px on the largest side) while retaining the aspect ratio of the image.
➢ This strategy should enlarge the feature maps, yielding more features and therefore more local maxima than the previous multi-resolution R-MAC. This approach is connected to the new region detector, which detects a reduced number of regions (15) instead of the 20 of the original one.
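The difference between the two resizing strategies can be sketched as follows. Both helper names are hypothetical, and rounding to the nearest pixel is an assumption; the slides do not say how fractional sizes are handled.

```python
def relative_sizes(width, height, deltas=(0.25, 0.0, -0.25)):
    """Relative strategy: scale both sides by 1 + delta.

    Scaling both sides by the same factor changes the largest side
    by the requested percentage and keeps the aspect ratio.
    """
    return [(round(width * (1 + d)), round(height * (1 + d))) for d in deltas]

def fixed_sizes(width, height, targets=(550, 800, 1050)):
    """Fixed strategy: bring the largest side to each target size."""
    sizes = []
    for target in targets:
        scale = target / max(width, height)
        sizes.append((round(width * scale), round(height * scale)))
    return sizes
```

For a 1024 x 768 image the relative strategy gives 1280 x 960, 1024 x 768 and 768 x 576, so the output sizes follow the input resolution instead of being fixed in advance.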
14. Proposed approach: R-MAC+
A new mechanism for region detection in the CNN feature maps (15 regions):
● l=0 → 1 region covering the entire image;
● l=1 → 2 square regions (widthRegion = heightRegion = min(H,W));
● l=2 → 6 rectangular regions (widthRegion = heightRegion = ⌈2*min(W,H)/(l+1)⌉), arranged along the horizontal axis (the width and height of the regions are adapted to cover the whole image);
● l=3 → 6 rectangular regions (widthRegion = heightRegion = ⌈2*min(W,H)/(l+2)⌉), arranged along the vertical axis (the width and height of the regions are adapted to cover the whole image).
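The region counts and nominal sizes listed above can be tabulated with a short sketch. The function name is hypothetical, and only the nominal size before adaptation is computed; the exact placement of the regions and the way their width and height are adapted to cover the image are not specified on the slide, so they are left out.

```python
import math

def rmac_plus_layout(W, H):
    """Return (level, region_count, nominal_size) for the four levels.

    nominal_size is (width, height) before the adaptation step
    mentioned on the slide, which is not modeled here.
    """
    s = min(W, H)
    side_l2 = math.ceil(2 * s / (2 + 1))  # l=2 uses (l+1) in the denominator
    side_l3 = math.ceil(2 * s / (3 + 2))  # l=3 uses (l+2) in the denominator
    return [
        (0, 1, (W, H)),              # one region covering the whole map
        (1, 2, (s, s)),              # two square regions
        (2, 6, (side_l2, side_l2)),  # six regions along the horizontal axis
        (3, 6, (side_l3, side_l3)),  # six regions along the vertical axis
    ]

# Example on a 40 x 30 feature map.
layout = rmac_plus_layout(40, 30)
total_regions = sum(count for _, count, _ in layout)
```

The counts add up to 1 + 2 + 6 + 6 = 15 regions, as stated in the slide title.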
15. Proposed approach: R-MAC+
A new retrieval method based on db regions (the MAC descriptors of the database images) and the R-MAC descriptors of the query images (+7% on Oxford5k and +4% on Paris6k compared with previous results).
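One possible way to compare a query descriptor with db regions is sketched below. The slide does not say how per-region similarities are combined into an image score, so scoring each database image by its best-matching region is an assumption of this sketch, not necessarily the authors' method. The function name and the dictionary layout are also hypothetical.

```python
import numpy as np

def rank_by_db_regions(query, db_regions):
    """Rank database images by their best-matching region.

    query: (D,) l2-normalized query descriptor.
    db_regions: dict mapping image id -> (R, D) array of l2-normalized
                regional MAC descriptors (assumed storage layout).
    Assumption: image score = max cosine similarity over its regions.
    """
    scores = {
        image_id: float((regions @ query).max())
        for image_id, regions in db_regions.items()
    }
    # Highest score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Storing one descriptor per region multiplies the memory used per database image by the number of regions, which is the trade-off for the finer matching.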
17. Datasets and evaluation metric
Datasets:
➢ Holidays (1491 images: 500 classes, 500 queries).
➢ Oxford5k (5063 images, 11 classes, 55 queries).
➢ Paris6k (6412 images, 11 classes, 55 queries).
Evaluation metric:
➢ mAP (mean Average Precision) → the mean, over all queries, of the Average Precision score, which rewards correct results according to their position in the ranking.
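The metric can be sketched in plain Python as follows. This is the basic, non-interpolated form of Average Precision; the official evaluation scripts of the benchmarks may differ in details (for example in how ignored images are handled), so the numbers it produces are only indicative. Function names are hypothetical.

```python
def average_precision(ranked_relevance, num_relevant):
    """AP for one query.

    ranked_relevance: sequence of 0/1 flags, one per returned image,
                      in ranking order (1 = correct result).
    num_relevant: number of correct images that exist for the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at the rank of each correct result.
            precision_sum += hits / rank
    return precision_sum / num_relevant if num_relevant else 0.0

def mean_average_precision(queries):
    """queries: list of (ranked_relevance, num_relevant) pairs."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)
```

For a ranking `[1, 0, 1]` with two relevant images, the precisions at the two hits are 1/1 and 2/3, so the AP is their mean.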
19. Results after QE application
| Method | Network | Holidays (original / rotated) | Oxf5k | Paris6k |
|---|---|---|---|---|
| M-R R-MAC+ | ResNet50 | 94.97 % / 95.97 % | 86.45 % | 92.01 % |
| M-R R-MAC+ with retrieval based on db regions | ResNet50 | 94.42 % / 96.05 % | 87.92 % | 93.64 % |
| M-R R-MAC+ with retrieval based on db regions and query expansion based on db regions | ResNet50 | 94.28 % / 95.91 % | 88.78 % | 92.30 % |
22. Conclusions
➢ We propose several improvements to R-MAC descriptors in order to make retrieval more accurate:
○ A multi-resolution approach that uses bigger feature maps than the previous one.
○ A new region detector that uses adaptable grids to capture more local maxima.
○ A novel retrieval method based on db regions that substantially boosts performance on Oxford5k and Paris6k.
➢ The proposed method outperforms the state of the art on Holidays, on both the original and rotated versions. It also outperforms the state-of-the-art results on some other public benchmarks among methods that do not apply fine-tuning.
23. Thank you for your attention!
questions?
http://implab.ce.unipr.it