Liangliang Cao
http://www.llcao.net
UMass (now at Google AI*)
* The research in this talk was done before joining Google/Facebook
Visual Search and Question Answering
Lu Jiang
http://www.lujiang.info/
Google AI
Yannis Kalantidis
http://www.skamalas.com/
Facebook AI*
ICME 2019 Tutorial
July 8th 13:30--17:00
Outline
I. Overview of Visual Search and Understanding (Liangliang)
II. Visual Representations and Indexing (Yannis)
III. MemexQA (Lu)
Section II: Visual Representations and Indexing
Visual Search: We want to see more of the “same”
Color Similarity
*slide credit: Clayton Mellina, Huy Nguyen
Compositional Similarity
*slide credit: Clayton Mellina, Huy Nguyen
Identity Similarity
*slide credit: Clayton Mellina, Huy Nguyen
Semantic Similarity
*slide credit: Clayton Mellina, Huy Nguyen
Visual Search Applications
Similarity search:
● Given an image as query, show me visually similar images
● Useful tool for commercial photo search & licensing
● Visually congruent native ads
Clustering and deduplication:
● Cluster images of a large collection for browsing
● Personal photo album summarization
● Deduplicate or diversify image search results
Batch search and recommendations:
● Use all photos from a group to recommend photos to the group admin
● Use all photos favorited by a user to get recommendations
● Visual recommendations can be combined with social metadata
Basic Ingredients for large-scale search
Representation Learning
Documents/images/videos are represented as vectors
Quantization and Indexing
● Storing high dimensional features could be prohibitive
○ Hashing (bad performance, reconstruction not possible)
○ Quantization (better performance, allows approx. reconstruction)
● Search is feasible only if a very small percentage of the collection is checked → Indexing
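As a baseline for the ingredients above, here is a minimal sketch of exhaustive similarity search over vector representations. The random descriptors, the choice of cosine similarity and the function name `brute_force_search` are illustrative assumptions and do not come from the tutorial. This linear scan is the cost that quantization and indexing are meant to avoid at scale.

```python
import numpy as np

def brute_force_search(database, query, top_k=5):
    """Return indices of the top_k database rows most similar to query (cosine)."""
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = db @ q                      # one dot product per database vector
    return np.argsort(-scores)[:top_k]   # exhaustive: cost grows linearly with the collection

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 128))              # 1000 hypothetical 128-d image descriptors
query = database[42] + 0.01 * rng.normal(size=128)   # a slightly perturbed copy of item 42
print(brute_force_search(database, query))
```

Every query touches every stored vector, and every vector is stored in full precision, which is why both compression and a way to skip most of the collection are needed.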
Visual Representations
Some Recent Visual Representations
A (highly biased) set of recent CNN architectures that aim at:
● Reducing network parameters
○ Multi-Fiber Networks [ECCV 2018]
● Reducing memory for attention mechanisms
○ A²-Nets: Double Attention Networks [NeurIPS 2018]
● Reasoning with global context
○ Global Reasoning Networks [CVPR 2019]
● Reducing spatial redundancy
○ Octave Convolutions [arXiv 2019]
The Multi-fiber Unit
Idea: slice the complex residual unit into N parallel and separated units (called fibers), each of which is isolated from the others
● One fiber cannot access or utilize the features learned by the others
● Transistor component: facilitates information flow across these fibers
● The number of first-layer output channels is set to be 4 times smaller (cost would be reduced by a factor of 2)
[Chen, Kalantidis, et al. Multi-Fiber Networks. ECCV 2018]
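To make the effect of slicing concrete, the sketch below counts convolution weights before and after a layer is split into isolated fibers. It assumes that each fiber sees only 1/N of the input channels and produces 1/N of the output channels, with a 3x3 kernel and no bias. The function names and channel counts are hypothetical, chosen only for illustration.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weights of one convolution layer (bias ignored)."""
    return c_in * c_out * kernel * kernel

def multi_fiber_params(c_in, c_out, num_fibers, kernel=3):
    """Weights when the layer is sliced into num_fibers isolated fibers."""
    assert c_in % num_fibers == 0 and c_out % num_fibers == 0
    per_fiber = conv_params(c_in // num_fibers, c_out // num_fibers, kernel)
    return num_fibers * per_fiber

dense = conv_params(256, 256)
sliced = multi_fiber_params(256, 256, num_fibers=16)
print(dense, sliced, dense // sliced)
```

Under these assumptions the weight count drops by a factor equal to the number of fibers. The price is that fibers no longer share information, which is the gap the cross-fiber component on the slide is meant to fill.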
Results on ImageNet
[Chen, Kalantidis, et al. Multi-Fiber Networks. ECCV 2018]
Reducing computations for attention mechanisms
Incorporating global context
● e.g. the attention mechanisms [Vaswani et al. 2017, Wang et al. 2018]
● Enables interactions between locations over the full coordinate space
● Requires computing and storing a (quadratic) matrix of all input location pairs
Convolutional Neural Networks model local relations
● Operate on the (spatio-temporal) coordinate space grid
● Require stacking multiple layers to capture relations
between distant locations
[Vaswani et al. Attention is all you need. NIPS 2017]
[Wang et al. Non-local Neural Networks. CVPR, 2018]
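The quadratic cost mentioned above can be made tangible with a little arithmetic. The sketch below assumes one dense float32 matrix holding a value for every pair of input locations. The function name and the feature-map sizes are illustrative assumptions.

```python
def attention_matrix_bytes(height, width, frames=1, bytes_per_value=4):
    """Memory for one dense matrix of all input location pairs."""
    locations = frames * height * width
    return locations * locations * bytes_per_value

for side in (14, 28, 56):
    mib = attention_matrix_bytes(side, side) / 2**20
    print(f"{side}x{side} feature map: {mib:.1f} MiB per attention matrix")
```

Doubling the side of the feature map multiplies the number of locations by 4 and the size of the pairwise matrix by 16, which is what makes full attention expensive on early, high-resolution layers and on video.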
A²-Nets: Double Attention Networks
Decomposed attention mechanism
Aggregate and propagate features from the entire (spatio-temporal) input space efficiently
● First attention: Gather features from the entire space into a compact set through second-order attention pooling
● Second attention: Adaptively select and distribute features to each location
[Chen, Kalantidis, et al. A²-Nets: Double Attention Networks. NeurIPS 2018]
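The two attention steps can be sketched in a few lines of NumPy. This is a simplified reading under stated assumptions: random matrices (`W_feat`, `W_gather`, `W_distr`, all hypothetical names) stand in for learned 1x1 convolutions, the input is flattened to (channels, locations), and the final projection back to the input width and the residual connection are omitted.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def double_attention(X, W_feat, W_gather, W_distr):
    """X: (channels, locations). Returns an (m, locations) array."""
    A = W_feat @ X                        # (m, L) features to be gathered
    B = softmax(W_gather @ X, axis=1)     # (n, L) each row attends over all locations
    V = softmax(W_distr @ X, axis=0)      # (n, L) each column weights the n descriptors
    G = A @ B.T                           # (m, n) first attention: gather global descriptors
    return G @ V                          # (m, L) second attention: distribute to locations

rng = np.random.default_rng(0)
c, L, m, n = 8, 7 * 7, 4, 3
X = rng.normal(size=(c, L))
Z = double_attention(X, rng.normal(size=(m, c)), rng.normal(size=(n, c)), rng.normal(size=(n, c)))
print(Z.shape)
```

The only intermediate that mixes locations is the small (m, n) matrix `G`, so no locations-by-locations matrix is ever formed in this sketch.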
Accuracy on ImageNet
A²-Nets: Double Attention Networks
[Chen, Kalantidis, et al. A²-Nets: Double Attention Networks. NeurIPS 2018]
Global context modeling is highly important
● Attention-like mechanisms are becoming standard across ML
A limitation of current global context modeling approaches
● Follow the Gather → Distribute model
● Only focus on delivering information
● Rely on convolutional layers for reasoning
Can we capture and reason on global region interactions efficiently?
Beyond the simple attention mechanism
Gather → Reason → Distribute
Can we construct a (latent) space where relations over sets of features scattered over the coordinate space translate to simple feature interactions?
[Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
Global Reasoning Networks
Global Reasoning in Three Steps
1) From Coordinate Space to Interaction Space → Weighted projections
2) Reasoning in Interaction Space → Graph convolutions
3) From Interaction Space (back) to Coordinate Space → Weighted broadcasting
[Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
From Coordinate Space to Interaction Space
● We want to learn a set of projections for (arbitrary) region features
● Given a set of input features, compute a projection function with learnable projection weights
[Figure: input features of size C × H × W in Coordinate Space; each learnable projection weight map b_i covers the H × W grid; N such maps give N feature vectors in Interaction Space]
● After projection → N feature vectors
● Relations between arbitrary regions → interactions between features
Reasoning in Interaction Space
● What is an efficient way of reasoning over feature interactions?
How to model interactions?
● Treat each feature as a node in a fully-connected graph
● Learn the edge weights that correspond to interactions of features
● Graph convolution formulation by [Kipf & Welling], with an N x N (learnt) adjacency matrix and a state update
[Kipf & Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2017]
[Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
From Interaction Space to Coordinate Space
● Reverse projection: Distribute the updated states back
● Reuse projection weights
[Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
Global Reasoning (GloRe) Unit
● Projection: Weighted global pooling
● Reasoning: Graph Convolution
● Reverse projection: Weighted broadcasting
What do the learnt projection weights look like?
[Figure: visualization of projection weights]
[Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
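The three steps of the unit map directly onto three matrix products. The sketch below is a simplified illustration, not the authors' implementation: the features are flattened to (locations, channels), the matrices `B`, `A` and `W` are random or identity placeholders for learned weights, and the graph convolution is written in the plain adjacency-times-states-times-weights form, which may differ in detail from the exact form used in the paper.

```python
import numpy as np

def glore_unit(X, B, A, W):
    """X: (L, C) features at L locations; B: (N, L) projection weights;
    A: (N, N) adjacency matrix; W: (C, C) state-update weights."""
    V = B @ X            # projection: weighted global pooling -> (N, C) node features
    Z = A @ V @ W        # reasoning: one graph convolution over the N nodes
    Y = B.T @ Z          # reverse projection: weighted broadcasting, reusing B -> (L, C)
    return X + Y         # residual connection keeps the unit plug-and-play

rng = np.random.default_rng(0)
L, C, N = 14 * 14, 16, 4
X = rng.normal(size=(L, C))
B = rng.normal(size=(N, L)) / L          # hypothetical projection weights
A = np.eye(N)                            # placeholder adjacency
W = 0.1 * rng.normal(size=(C, C))        # placeholder state-update weights
out = glore_unit(X, B, A, W)
print(out.shape)
```

Reasoning happens on only N nodes, so the pairwise term is N x N rather than L x L. Because of the residual form, a unit whose update weights are zero leaves the input unchanged, which is convenient when inserting it into a pretrained network.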
Global Reasoning Networks
The Global Reasoning (GloRe) unit
● Is highly efficient (smaller computational cost than self-attention)
● Is a plug-and-play residual unit that can be inserted in CNNs for different tasks
Image Classification & Action Recognition backbone CNNs
● Insert one or more units at different positions
Semantic segmentation
● Insert before bottleneck
[Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
Figure from [Noh et al. ICCV 2015]
Ablations on ImageNet
● How many blocks to add and where?
● How many graph convolutions?
Experiments on ImageNet
[Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
Reducing Spatial Redundancy
Many approaches exploit multi-scale inputs
• Recent examples
• Multi-scale DenseNets [Huang et al.]: Multi-resolution paths over a DenseNet
• Big-Little Nets [Chen et al.]: Multi-resolution paths, synchronizing at every block
• Network architecture is altered
Spatial redundancy in feature maps
• ConvNet kernels are highly local
• Some feature maps must contain low-frequency information (smooth and slowly varying)
[Huang et al. Multi-Scale Dense Networks for Resource Efficient Image Classification, ICLR 2018]
[Chen et al. Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition, ICLR 2019]
Octave Convolution
Advantages
• Multi-scale processing with effective communication between the low- and high-frequency maps
• Gains in terms of FLOPs
• Gains in terms of memory
• Larger receptive field for low-frequency feature maps
[Figure: the Octave Convolution kernel]
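The FLOPs gain can be estimated with a short calculation. The sketch below assumes that a fraction `alpha` of the channels is stored as low-frequency maps at half the spatial resolution (a quarter of the locations), and that both cross-frequency paths are computed at the low resolution. These assumptions and the function name are illustrative; they are not spelled out on the slide.

```python
def octconv_relative_flops(alpha):
    """FLOPs of an octave convolution relative to a vanilla convolution.

    alpha is the fraction of channels stored as low-frequency maps, assumed
    here to be kept at half the spatial resolution (a quarter of the locations).
    """
    high, low = 1.0 - alpha, alpha
    high_to_high = high * high            # computed at full resolution
    cross = 2 * high * low * 0.25         # high->low and low->high, computed at low resolution
    low_to_low = low * low * 0.25         # computed at low resolution
    return high_to_high + cross + low_to_low

for alpha in (0.0, 0.25, 0.5, 0.75):
    print(alpha, round(octconv_relative_flops(alpha), 4))
```

With `alpha = 0` the layer reduces to a vanilla convolution, and with half of the channels at low frequency the estimated cost is a bit under half of the original. Whether that theoretical gain turns into wall-clock speedup depends on the implementation.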
import OctConv as conv
Ablation study on ImageNet for varying models and ratios
ImageNet Classification
Is the speedup real?
• On CPU (i.e. FB production): Reaching (almost) theoretical gains!
• On GPU: An optimized CUDA-level implementation is required
Results for ResNet-50
Recent Visual Representations
Code online:
● Multi-Fiber Networks [ECCV 2018]
○ https://github.com/cypw/PyTorch-MFNet
● Global Reasoning Networks [CVPR 2019]
○ https://github.com/facebookresearch/GloRe (coming soon)
● Octave Convolutions [arXiv 2019]
○ https://github.com/facebookresearch/OctConv
Indexing
Quantization: k-means
Idea: Create a “vocabulary” in high-dimensional space through clustering
Represent each vector with the index of its closest “word”
Pros:
● Very high compression
Cons:
● Hard to train for large k
● Performance is good only for large k
[MacQueen 1967]
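A minimal sketch of the vocabulary idea, using plain Lloyd iterations in NumPy. The data is random, and the sizes, seeds and function names (`train_kmeans`, `encode`) are assumptions made for illustration only.

```python
import numpy as np

def train_kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns a (k, d) vocabulary."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        codes = encode(X, centroids)
        for j in range(k):
            members = X[codes == j]
            if len(members):               # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def encode(X, centroids):
    """Index of the closest word for every row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8)).astype(np.float32)
vocab = train_kmeans(X, k=64)
codes = encode(X, vocab)                   # one integer per vector
approx = vocab[codes]                      # approximate reconstruction
print(codes[:5], float(((X - approx) ** 2).mean()))
```

Each vector is replaced by a single index that needs log2(k) bits, which explains the very high compression. It also shows the drawback: a finer approximation needs a much larger k, and both training and encoding cost grow with k.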
Quantization: product quantization
Idea: Split the vector into multiple sub-vectors, create a vocabulary for each sub-vector
Represent each feature with the list of indices for its closest words
Pros:
● Tunable compression & better reconstruction
● Easy & fast to train: a vocabulary of size k gives you k^m effective “cells” for m sub-vectors
Cons:
● Independence assumption (“fix”: PCA)
● Unbalanced partitioning (fix: OPQ)
[Gray, ASSP 1984]
[Jegou, Douze & Schmid, PAMI 2011]
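The split-and-quantize idea can be sketched as follows. This is a toy version under stated assumptions: random data, a tiny built-in k-means, and hypothetical function names (`pq_train`, `pq_encode`, `pq_decode`). It does not reproduce any particular library's API.

```python
import numpy as np

def kmeans(X, k, iters=15, seed=0):
    """Small Lloyd k-means used to build each sub-vocabulary."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(assign == j):
                C[j] = X[assign == j].mean(0)
    return C

def pq_train(X, m, k):
    """One vocabulary of k words per sub-vector: a list of m (k, d/m) arrays."""
    return [kmeans(sub, k, seed=i) for i, sub in enumerate(np.split(X, m, axis=1))]

def pq_encode(X, codebooks):
    """Each vector becomes m indices, one per sub-vector."""
    subs = np.split(X, len(codebooks), axis=1)
    return np.stack([((s[:, None] - C[None]) ** 2).sum(-1).argmin(1)
                     for s, C in zip(subs, codebooks)], axis=1)

def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the selected words."""
    return np.hstack([C[codes[:, i]] for i, C in enumerate(codebooks)])

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 16))
codebooks = pq_train(X, m=4, k=16)
codes = pq_encode(X, codebooks)
recon = pq_decode(codes, codebooks)
print(codes.shape, 16 ** 4, float(((X - recon) ** 2).mean()))
```

Only 4 vocabularies of 16 words are trained here, yet their combinations address 16^4 = 65536 distinct cells. Each sub-vector is quantized on its own, which is where the independence assumption listed under the cons enters.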
Optimized product quantization
[Ge et al, CVPR 2013, PAMI 2014]
Locally Optimized Product Quantization
Idea: Locally optimize residuals, balance variance across subspaces, use multi-index
● Balance variance across subspaces
● Local optimization using OPQ
● 20% improvement in precision over state-of-the-art
● Overhead independent of database size
Stats for multi-LOPQ:
● 1 Billion 128-dimensional vectors
● ~22GB memory
● less than 55ms search time
[Kalantidis & Avrithis, CVPR 2014]
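To put the reported stats in perspective, the arithmetic below compares them with uncompressed storage. The float32 baseline and the reading of "~22GB" as 22 × 10^9 bytes are assumptions; the vector count and dimensionality are the ones on the slide.

```python
num_vectors = 1_000_000_000
dims = 128

raw_bytes = num_vectors * dims * 4          # uncompressed float32 storage (assumed baseline)
reported_bytes = 22 * 10**9                 # "~22GB" from the slide, read as decimal gigabytes

print(raw_bytes / 10**9, "GB uncompressed")
print(reported_bytes / num_vectors, "bytes per vector in the index")
print(raw_bytes / reported_bytes, "x smaller")
```

Under these assumptions the index spends roughly 22 bytes per vector instead of 512, which is what lets a billion-scale collection fit in the memory of a single machine.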
Indexing
[Figure: inverted-file indexing. A feature vector with id 123984 is quantized and its id is appended to the posting list of its cell: cell 11 when a single vocabulary is used, cell (5,4) when the vector is split into two sub-vectors.]
Indexing: multi-index
Idea: Use product quantization for indexing: Split into 2 sub-vectors
Pros:
● 2-step quantization: in the second stage one can quantize residuals
● Finer partitioning / smaller residuals
● Need to search many cells/posting lists → multi-sequence: fast algorithm for traversing neighboring cells
[Babenko & Lempitsky, CVPR 2012]
Multi-LOPQ: Searching in a multi-index
● split query vector
● sort PQ centroids by ascending distance for each subvector
● start at the cell (Q1[0], Q2[0]), the first clusters in each posting list
● for the current cell (Q1[a], Q2[b]), insert both its bottom and right neighbors into a priority queue with priority: dist(xL, Q1[a]) + dist(xR, Q2[b])
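The traversal above can be sketched with a binary heap. The sketch assumes the per-subvector distances have already been computed and sorted in ascending order, so `d1[a]` plays the role of dist(xL, Q1[a]) and `d2[b]` the role of dist(xR, Q2[b]). The function name, the `seen` set used to avoid pushing a cell twice, and the example distances are illustrative assumptions.

```python
import heapq

def multi_sequence(d1, d2, max_cells):
    """Visit cells (a, b) in order of increasing d1[a] + d2[b].

    d1 and d2 hold the query-to-centroid distances for the left and right
    sub-vectors, each already sorted in ascending order.
    """
    heap = [(d1[0] + d2[0], 0, 0)]        # start at the closest cell in both halves
    seen = {(0, 0)}
    visited = []
    while heap and len(visited) < max_cells:
        dist, a, b = heapq.heappop(heap)
        visited.append((a, b, dist))
        # Push the bottom and right neighbors of the current cell.
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(d1) and nb < len(d2) and (na, nb) not in seen:
                seen.add((na, nb))
                heapq.heappush(heap, (d1[na] + d2[nb], na, nb))
    return visited

d1 = [0.1, 0.5, 0.9]   # sorted distances of the left half of the query to its centroids
d2 = [0.2, 0.3, 1.0]   # sorted distances of the right half of the query to its centroids
cells = multi_sequence(d1, d2, max_cells=4)
print([(a, b) for a, b, _ in cells])
```

Only the cells that are actually visited, plus their immediate neighbors, are ever scored, so the search can stop as soon as enough posting lists have been read.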
Thank you!
Yannis Kalantidis
ykalant@image.ntua.gr
http://www.skamalas.com
Locally Optimized Product Quantization
https://github.com/yahoo/lopq
[Kalantidis & Avrithis, CVPR 2014]
[Kalantidis et al, ECCV-W 2016]

Weitere ähnliche Inhalte

Was ist angesagt?

Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Sangmin Woo
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overviewLEE HOSEONG
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Universitat Politècnica de Catalunya
 
DeepFix: a fully convolutional neural network for predicting human fixations...
DeepFix:  a fully convolutional neural network for predicting human fixations...DeepFix:  a fully convolutional neural network for predicting human fixations...
DeepFix: a fully convolutional neural network for predicting human fixations...Universitat Politècnica de Catalunya
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...NAVER Engineering
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchBill Liu
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkNader Karimi
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition Intel Nervana
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Universitat Politècnica de Catalunya
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in VisionSangmin Woo
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamWithTheBest
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep LearningSungjoon Choi
 

Was ist angesagt? (20)

Region-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object RetrievalRegion-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object Retrieval
 
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the...
 
2019 cvpr paper_overview
2019 cvpr paper_overview2019 cvpr paper_overview
2019 cvpr paper_overview
 
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
Deep Visual Saliency - Kevin McGuinness - UPC Barcelona 2017
 
[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp[DL輪読会]ClearGrasp
[DL輪読会]ClearGrasp
 
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
Visual Saliency Prediction with Deep Learning - Kevin McGuinness - UPC Barcel...
 
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
Content-based Image Retrieval - Eva Mohedano - UPC Barcelona 2018
 
DeepFix: a fully convolutional neural network for predicting human fixations...
DeepFix:  a fully convolutional neural network for predicting human fixations...DeepFix:  a fully convolutional neural network for predicting human fixations...
DeepFix: a fully convolutional neural network for predicting human fixations...
 
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
 
Convolutional Features for Instance Search
Convolutional Features for Instance SearchConvolutional Features for Instance Search
Convolutional Features for Instance Search
 
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
CNN vs SIFT-based Visual Localization - Laura Leal-Taixé - UPC Barcelona 2018
 
Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...Backbone can not be trained at once rolling back to pre trained network for p...
Backbone can not be trained at once rolling back to pre trained network for p...
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
Interpretability of Convolutional Neural Networks - Eva Mohedano - UPC Barcel...
 
Transformer in Vision
Transformer in VisionTransformer in Vision
Transformer in Vision
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Object Detection Methods using Deep Learning
Object Detection Methods using Deep LearningObject Detection Methods using Deep Learning
Object Detection Methods using Deep Learning
 

Ähnlich wie Visual Search and Question Answering II

On Integrating Information Visualization Techniques into Data Mining: A Revie...
On Integrating Information Visualization Techniques into Data Mining: A Revie...On Integrating Information Visualization Techniques into Data Mining: A Revie...
On Integrating Information Visualization Techniques into Data Mining: A Revie...Sushant Gautam
 
Attentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene GraphsAttentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene GraphsSangmin Woo
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...miyurud
 
On Execution Platforms for Large-Scale Aggregate Computing
On Execution Platforms for Large-Scale Aggregate ComputingOn Execution Platforms for Large-Scale Aggregate Computing
On Execution Platforms for Large-Scale Aggregate ComputingRoberto Casadei
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
Parn pyramidal+affine+regression+networks+for+dense+semantic+correspondence
Parn pyramidal+affine+regression+networks+for+dense+semantic+correspondenceParn pyramidal+affine+regression+networks+for+dense+semantic+correspondence
Parn pyramidal+affine+regression+networks+for+dense+semantic+correspondenceNAVER Engineering
 
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...thanhdowork
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
Comparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from ImageComparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from ImageIRJET Journal
 
Memory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link PredictionMemory Efficient Graph Convolutional Network based Distributed Link Prediction
Memory Efficient Graph Convolutional Network based Distributed Link Predictionmiyurud
 
CrowdMap: Accurate Reconstruction of Indoor Floor Plan from Crowdsourced Sens...
CrowdMap: Accurate Reconstruction of Indoor Floor Plan from Crowdsourced Sens...CrowdMap: Accurate Reconstruction of Indoor Floor Plan from Crowdsourced Sens...
CrowdMap: Accurate Reconstruction of Indoor Floor Plan from Crowdsourced Sens...Si Chen
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
 
Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)Content Based Image Retrieval (CBIR)
Content Based Image Retrieval (CBIR)Behzad Shomali
 
JOSA TechTalks - Machine Learning on Graph-Structured Data
JOSA TechTalks - Machine Learning on Graph-Structured DataJOSA TechTalks - Machine Learning on Graph-Structured Data
JOSA TechTalks - Machine Learning on Graph-Structured DataJordan Open Source Association
 
NasAk.pptx
NasAk.pptxNasAk.pptx
NasAk.pptxHoneyViz
 

Ähnlich wie Visual Search and Question Answering II (20)

On Integrating Information Visualization Techniques into Data Mining: A Revie...
On Integrating Information Visualization Techniques into Data Mining: A Revie...On Integrating Information Visualization Techniques into Data Mining: A Revie...
On Integrating Information Visualization Techniques into Data Mining: A Revie...
 
Attentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene GraphsAttentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene Graphs
 
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
Scalable Graph Convolutional Network Based Link Prediction on a Distributed G...
 
On Execution Platforms for Large-Scale Aggregate Computing
On Execution Platforms for Large-Scale Aggregate ComputingOn Execution Platforms for Large-Scale Aggregate Computing
On Execution Platforms for Large-Scale Aggregate Computing
 
Semantic Segmentation on Satellite Imagery
Visual Search and Question Answering II

  • 1. Visual Search and Question Answering. ICME 2019 Tutorial, July 8th, 13:30-17:00. Liangliang Cao, http://www.llcao.net, UMass (now at Google AI*); Lu Jiang, http://www.lujiang.info/, Google AI; Yannis Kalantidis, http://www.skamalas.com/, Facebook AI*. * The research in this talk was done before joining Google/Facebook.
  • 2. Outline: I. Overview of Visual Search and Understanding (Liangliang). II. Visual Representations and Indexing (Yannis). III. MemexQA (Lu).
  • 4. Visual Search: We want to see more of the “same”
  • 5. Color Similarity *slide credit: Clayton Mellina, Huy Nguyen
  • 6. Compositional Similarity *slide credit: Clayton Mellina, Huy Nguyen
  • 7. Identity Similarity *slide credit: Clayton Mellina, Huy Nguyen
  • 8. Semantic Similarity *slide credit: Clayton Mellina, Huy Nguyen
  • 9. Visual Search Applications. Similarity search: ● Given an image as query, show me visually similar images ● Useful tool for commercial photo search & licensing ● Visually congruent native ads. Clustering and deduplication: ● Cluster images of a large collection for browsing ● Personal photo album summarization ● Deduplicate or diversify image search results. Batch search and recommendations: ● Use all photos from a group to recommend photos to the group admin ● Use all photos favorited by a user to get recommendations ● Visual recommendations can be combined with social metadata
  • 10. Basic Ingredients for large-scale search. Representation Learning: documents/images/videos are represented as vectors. Quantization and Indexing: ● Storing high-dimensional features can be prohibitive ○ Hashing (poor performance, reconstruction not possible) ○ Quantization (better performance, allows approximate reconstruction) ● Searching them is feasible only if a very small percentage of the collection is checked → Indexing
  • 12. Some Recent Visual Representations. A (highly biased) set of recent CNN architectures that aim at: ● Reducing network parameters ○ Multi-Fiber Networks [ECCV 2018] ● Reducing memory for attention mechanisms ○ A²-Nets: Double Attention Networks [NeurIPS 2018] ● Reasoning with global context ○ Global Reasoning Networks [CVPR 2019] ● Reducing spatial redundancy ○ Octave Convolutions [arXiv 2019]
  • 14. The Multi-fiber Unit. Idea: slice the complex residual unit into N parallel and separated units (called fibers), each of which is isolated from the others
  • 15. The Multi-fiber Unit ● One fiber cannot access or use the features learned by the others ● Transistor component: facilitates information flow across the fibers ● The number of first-layer output channels is set to be 4 times smaller (cost would be reduced by a factor of 2) [Chen, Kalantidis, et al. Multi-Fiber Networks. ECCV 2018]
  • 16. Results on ImageNet [Chen, Kalantidis, et al. Multi-Fiber Networks. ECCV 2018]
  • 17. Results on ImageNet [Chen, Kalantidis, et al. Multi-Fiber Networks. ECCV 2018]
  • 19. Reducing computations for attention mechanisms. Incorporating global context: ● e.g. the attention mechanisms [Vaswani et al. 2017, Wang et al. 2018] ● Enables interactions between locations over the full coordinate space ● Requires computing and storing a (quadratic) matrix of all input location pairs. Convolutional Neural Networks model local relations: ● Operate on the (spatio-temporal) coordinate space grid ● Require stacking multiple layers to capture relations between distant locations [Vaswani et al. Attention is all you need. NIPS 2017] [Wang et al. Non-local Neural Networks. CVPR 2018]
  • 20. A²-Nets: Double Attention Networks. Decomposed attention mechanism: aggregate and propagate features from the entire (spatio-temporal) input space efficiently ● First attention: gather features from the entire space into a compact set through second-order attention pooling ● Second attention: adaptively select and distribute features to each location [Chen, Kalantidis, et al. A²-Nets: Double Attention Networks. NeurIPS 2018]
  • 21. A²-Nets: Double Attention Networks. Accuracy on ImageNet [Chen, Kalantidis, et al. A²-Nets: Double Attention Networks. NeurIPS 2018]
  • 23. Beyond the simple attention mechanism. Global context modeling is highly important: ● Attention-like mechanisms are becoming standard across ML. A limitation of current global context modeling approaches: ● They follow the Gather → Distribute model ● They focus only on delivering information ● They rely on convolutional layers for reasoning. Can we capture and reason on global region interactions efficiently?
  • 24. Global Reasoning Networks: Gather → Reason → Distribute. Can we construct a (latent) space where relations over sets of features scattered over the coordinate space translate to simple feature interactions? (Figure labels: Coordinate Space, Interaction Space.) [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 25. Global Reasoning in Three Steps: 1) From Coordinate Space to Interaction Space → weighted projections; 2) Reasoning in Interaction Space → graph convolutions; 3) From Interaction Space (back) to Coordinate Space → weighted broadcasting. [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 26. From Coordinate Space to Interaction Space ● We want to learn a set of projections for (arbitrary) region features [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 27. From Coordinate Space to Interaction Space: given a set of input features, compute a projection function with learnable projection weights [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 28. From Coordinate Space to Interaction Space: given a set of input features, compute a projection function (figure labels: C, H, W, b_i) [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 29. From Coordinate Space to Interaction Space: given a set of input features, compute a projection function (figure labels: H, W, N, C) [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 30. From Coordinate Space to Interaction Space ● After projection → N feature vectors [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 31. From Coordinate Space to Interaction Space ● After projection → N feature vectors ● Relations between arbitrary regions → interactions between features ● What is an efficient way of reasoning over feature interactions? [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 32. Reasoning in Interaction Space. How to model interactions? ● Treat each feature as a node in a fully-connected graph ● Learn the edge weights that correspond to interactions of features ● Graph convolution formulation by [Kipf & Welling], with an N × N (learnt) adjacency matrix and a state update [Kipf & Welling. Semi-supervised classification with graph convolutional networks. ICLR 2017] [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 33. From Interaction Space to Coordinate Space ● Reverse projection: distribute the updated states back ● Reuse the projection weights [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 34. Global Reasoning (GloRe) Unit ● Projection: weighted global pooling [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 35. Global Reasoning (GloRe) Unit ● Projection: weighted global pooling ● Reasoning: graph convolution [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 36. Global Reasoning (GloRe) Unit ● Projection: weighted global pooling ● Reasoning: graph convolution ● Reverse projection: weighted broadcasting [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 38. Global Reasoning (GloRe) Unit ● Projection: weighted global pooling ● Reasoning: graph convolution ● Reverse projection: weighted broadcasting. What do the learnt projection weights look like? [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
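
The three GloRe steps listed on the slides above (projection, graph-convolution reasoning, reverse projection) can be sketched on a tiny input as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the function names are hypothetical, the graph convolution is assumed to take the form ((I − A) V) W, all weights are passed in as fixed matrices rather than learnt, and the channel-reduction layers a real unit would likely use are omitted.

```python
# Hypothetical sketch of a GloRe-style unit on plain Python lists.
# X: L x C input features (L = H*W locations, C channels)
# B: N x L projection weights (learnt in a real unit, fixed here)
# A: N x N adjacency matrix, W: C x C state update (also learnt in practice)

def matmul(P, Q):
    """Plain matrix product of two lists-of-rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

def transpose(M):
    return [list(col) for col in zip(*M)]

def glore_unit(X, B, A, W):
    # 1) Projection (weighted global pooling): N node features of size C.
    V = matmul(B, X)
    # 2) Reasoning: graph convolution, assumed here to be ((I - A) V) W.
    n = len(A)
    i_minus_a = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)]
                 for i in range(n)]
    Z = matmul(matmul(i_minus_a, V), W)
    # 3) Reverse projection (weighted broadcasting), reusing B.
    Y = matmul(transpose(B), Z)
    # Residual connection, so the unit can be dropped into an existing CNN.
    return [[x + y for x, y in zip(x_row, y_row)]
            for x_row, y_row in zip(X, Y)]
```

The output has the same L x C shape as the input, which is what allows the unit to be inserted between existing layers. Because the reasoning step works on N nodes rather than on all L locations, it avoids the L x L matrix that self-attention needs.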
  • 39. Global Reasoning (GloRe) Unit: visualization of projection weights. What do the learnt projections look like? [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 40. Global Reasoning Networks. The Global Reasoning (GloRe) unit: ● Is highly efficient (smaller computational cost than self-attention) ● Is a plug-and-play residual unit that can be inserted in CNNs for different tasks. Image classification & action recognition backbone CNNs: ● Insert one or more units at different positions. Semantic segmentation: ● Insert before bottleneck [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019] Figure from [Noa et al ICCV 2015]
  • 41. Ablations on ImageNet: How many blocks to add and where? How many graph convolutions? [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 42. Experiments on ImageNet [Chen, Rohrbach, Yan, Shuicheng, Feng, Kalantidis. Graph-Based Global Reasoning Networks. CVPR 2019]
  • 44. Reducing Spatial Redundancy. Many approaches exploit multi-scale inputs. Recent examples: • Multi-scale DenseNets [Huang et al.]: multi-resolution paths over a DenseNet • Big-Little Nets [Chen et al.]: multi-resolution paths, synchronizing at every block • Network architecture is altered. Spatial redundancy in feature maps: • ConvNet kernels are highly local • Some feature maps must contain low-frequency information (smooth and slowly varying) [Huang et al. Multi-Scale Dense Networks for Resource Efficient Image Classification, ICLR 2018] [Chen et al. Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition, ICLR 2019]
  • 46. Octave Convolution: the Octave Convolution kernel. Advantages: • Multi-scale processing with effective communication between the low- and high-frequency maps • Gains in terms of FLOPs • Gains in terms of memory • Larger receptive field for low-frequency feature maps
  • 47. import OctConv as conv: ablation study on ImageNet for varying models and ratios
  • 49. Is the speedup real? Results for ResNet-50: • On CPU (i.e. FB production): reaching (almost) theoretical gains! • On GPU: an optimized CUDA-level implementation is required
  • 50. Recent Visual Representations. Code online: ● Multi-Fiber Networks [ECCV 2018] ○ https://github.com/cypw/PyTorch-MFNet ● Global Reasoning Networks [CVPR 2019] ○ https://github.com/facebookresearch/GloRe (coming soon) ● Octave Convolutions [arXiv 2019] ○ https://github.com/facebookresearch/OctConv
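
To see where the FLOPs and memory gains can come from, here is a back-of-the-envelope sketch. It is not taken from the slides: it assumes a fraction `alpha` of both input and output channels is low-frequency, that low-frequency maps are stored at half resolution in each spatial dimension (one octave lower, so a quarter of the locations), and that the two cross-frequency paths are computed at the low resolution. The function name is hypothetical.

```python
def octconv_relative_cost(alpha):
    """Relative (flops, memory) of an octave convolution vs. a vanilla one.

    alpha is the assumed fraction of low-frequency channels, 0 <= alpha <= 1.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    high, low = 1.0 - alpha, alpha
    # high->high runs at full resolution; the other three paths
    # (high->low, low->high, low->low) are assumed to run on 1/4 of the locations.
    flops = high * high + (high * low + low * high + low * low) / 4.0
    # High-frequency maps keep full size; low-frequency maps take 1/4 of the space.
    memory = high + low / 4.0
    return flops, memory
```

Under these assumptions the FLOPs ratio simplifies to 1 − (3/4)·alpha·(2 − alpha) and the memory ratio to 1 − (3/4)·alpha, so both shrink as more channels are assigned to the low-frequency group.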
  • 53. Quantization: k-means. Idea: create a “vocabulary” in high-dimensional space through clustering; represent each vector with the index of its closest “word”. Pros: ● Very high compression. Cons: ● Hard to train for large k ● Performance is good only for large k [MacQueen 1967]
  • 54. Quantization: product quantization. Idea: split the vector into multiple sub-vectors and create a vocabulary for each sub-vector; represent each feature with the list of indices of its closest words [Gray, ASSP 1984] [Jegou, Douze & Schmid, PAMI 2011]
  • 55. Quantization: product quantization. Pros: ● Tunable compression & better reconstruction ● Easy & fast to train; a vocabulary of size k gives you k^m effective “cells” for m sub-vectors. Cons: ● Independence assumption (“fix”: PCA) ● Unbalanced partitioning (fix: OPQ) [Gray, ASSP 1984] [Jegou, Douze & Schmid, PAMI 2011]
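
As a minimal sketch of the encoding step described on this slide, assume the "vocabulary" (codebook) has already been trained with k-means. The function names and the toy codebook below are illustrative and not part of the tutorial.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_encode(vector, codebook):
    """Return the index of the closest codeword ("word") in the codebook."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

def vq_decode(code, codebook):
    """Approximate reconstruction: the codeword itself."""
    return codebook[code]

# Toy codebook with k = 3 words in 2 dimensions (made-up values).
codebook = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
code = vq_encode([9.0, 8.0], codebook)
approx = vq_decode(code, codebook)
```

Each vector is stored as a single integer, which is the source of the very high compression. Reconstruction quality depends on how finely the k words cover the space, which is why large k is needed.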
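
Product quantization can be sketched in the same style: the vector is split into m equal sub-vectors and each one is encoded against its own small vocabulary. The helper names and toy codebooks are hypothetical, and training the per-subspace vocabularies (e.g. with k-means) is assumed to have happened elsewhere.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vector, codebooks):
    """Encode a vector as one codeword index per sub-vector."""
    m = len(codebooks)
    if len(vector) % m != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    d = len(vector) // m  # dimensionality of each sub-vector
    codes = []
    for j, codebook in enumerate(codebooks):
        sub = vector[j * d:(j + 1) * d]
        codes.append(min(range(len(codebook)),
                         key=lambda i: squared_distance(sub, codebook[i])))
    return codes

def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the selected codewords."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out

# Toy example with m = 2 sub-vectors of dimension 2 (made-up codebooks).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [5.0, 5.0], [9.0, 9.0]],
]
codes = pq_encode([0.9, 1.2, 4.0, 6.0], codebooks)
approx = pq_decode(codes, codebooks)
```

With k words per sub-vector and m sub-vectors, the number of distinct reconstructions is k to the power m. For example, k = 256 and m = 8 address 256**8 == 2**64 cells while storing only 8 bytes per vector.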
  • 56. Optimized product quantization [Ge et al, CVPR 2013, PAMI 2014]
  • 57. Locally Optimized Product Quantization. Idea: locally optimize residuals, balance variance across subspaces, use multi-index [Kalantidis & Avrithis, CVPR 2014]
  • 58. Locally Optimized Product Quantization [Kalantidis & Avrithis, CVPR 2014]
  • 59. Locally Optimized Product Quantization. Idea: locally optimize residuals, balance variance across subspaces, use multi-index ● Balance variance across subspaces ● Local optimization using OPQ ● 20% improvement in precision over state-of-the-art ● Overhead independent of database size. Stats for multi-LOPQ: ● 1 billion 128-dimensional vectors ● ~22GB memory ● less than 55ms search time [Kalantidis & Avrithis, CVPR 2014]
  • 60. Indexing (figure: a feature vector with id 123984 is quantized and its id is stored in the posting list of the matching cell; shown once with a single cell index, 11, and once with a pair of indices, (5, 4))
  • 61. Indexing: multi-index. Idea: use product quantization for indexing: split into 2 sub-vectors. Pros: ● 2-step quantization: in the second stage one can quantize residuals ● Finer partitioning / smaller residuals ● Many cells/posting lists need to be searched; multi-sequence is a fast algorithm for traversing neighboring cells [Babenko & Lempitsky, CVPR 2012]
  • 62. Multi-LOPQ: Searching in a multi-index ● split the query vector ● sort PQ centroids by ascending distance for each sub-vector ● start at the cell (Q1[0], Q2[0]), the first clusters in each posting list ● for the current cell (Q1[a], Q2[b]), insert both its bottom and right neighbors into a priority queue with priority: dist(xL, Q1[a]) + dist(xR, Q2[b])
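
The traversal described on this slide can be sketched with a priority queue. This is a simplified illustration, not the exact multi-sequence algorithm of Babenko & Lempitsky: it uses a visited set to avoid pushing a cell twice, and the function name and inputs (precomputed distances from each query half to every centroid of the corresponding sub-quantizer) are assumptions.

```python
import heapq

def multi_sequence(dists_left, dists_right, num_cells):
    """Return up to num_cells (cost, i, j) cells in ascending order of cost.

    dists_left[i]  is the distance from the left query half to centroid i,
    dists_right[j] is the distance from the right query half to centroid j.
    """
    # Sort centroid indices by ascending distance for each sub-vector.
    order_l = sorted(range(len(dists_left)), key=dists_left.__getitem__)
    order_r = sorted(range(len(dists_right)), key=dists_right.__getitem__)

    # Start at the cell formed by the closest centroid on each side.
    heap = [(dists_left[order_l[0]] + dists_right[order_r[0]], 0, 0)]
    seen = {(0, 0)}
    cells = []
    while heap and len(cells) < num_cells:
        cost, a, b = heapq.heappop(heap)
        cells.append((cost, order_l[a], order_r[b]))
        # Push the "bottom" and "right" neighbours of the current cell.
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(order_l) and nb < len(order_r) and (na, nb) not in seen:
                seen.add((na, nb))
                cost_n = dists_left[order_l[na]] + dists_right[order_r[nb]]
                heapq.heappush(heap, (cost_n, na, nb))
    return cells
```

Because both sorted lists are ascending, a cell's cost is never lower than that of its upper or left neighbour, so popping from the heap visits cells in order of increasing summed distance. The search can stop as soon as enough candidates have been collected from the visited posting lists.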
  • 63. Locally Optimized Product Quantization [Kalantidis & Avrithis, CVPR 2014]
  • 64. Thank you! Yannis Kalantidis ykalant@image.ntua.gr http://www.skamalas.com
  • 65. Locally Optimized Product Quantization https://github.com/yahoo/lopq [Kalantidis & Avrithis, CVPR 2014] [Kalantidis et al, ECCV-W 2016]