Learning From Multiple Views of Data
PhD Defense talk of Abhishek Sharma
Collaborators
David W. Jacobs, Larry S. Davis, Hal Daume III, Oncel Tuzel, Ming-Yu Liu,
Abhishek Kumar, Jonghyun Choi, Murad Al Haj, Sanja Fidler and Angjoo
Kanazawa
Overview
1. Introduction
PART - I
1. Content Extraction
1. Semantic Segmentation as visual feature
2. Contextual information
3. Neural Network model
PART - II
1. Cross-modal content matching
1. Challenges
2. PLS based common representation
3. Generalized Multi-view Analysis
2. Future Directions
Match image and sentence
Image courtesy – UIUC sentence-Image dataset: http://vision.cs.uiuc.edu/pascal-sentences/
Text view: "Two parked jet airplanes facing opposite directions"
Image view
Canonical/Common view
Find the image based on a sentence
Two parked jet airplanes facing opposite directions
A simple computer-based matching of sentence and image
1. Task understanding
2. Content from text and image
1. jet airplanes
2. Two
3. Parked
4. facing opposite direction
3. Content Matching
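As a toy illustration of steps 2 and 3 (not the pipeline developed in this thesis), one can extract one feature vector per modality, project both into a learned common space, and rank images by similarity to the sentence. The helper names and random projections below are placeholders.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve(text_vec, image_vecs, W_text, W_img):
    """Project a text query and candidate images into a common space
    (learned elsewhere, e.g. by PLS/CCA) and rank images by similarity."""
    q = W_text.T @ text_vec                       # text -> common space
    scores = [cosine(q, W_img.T @ v) for v in image_vecs]
    return np.argsort(scores)[::-1]               # best match first

# Toy usage with random features and random (untrained) projections
rng = np.random.default_rng(0)
text_vec = rng.random(10000)                      # e.g. a bag-of-words vector
image_vecs = [rng.random(768) for _ in range(5)]  # e.g. image descriptors
W_text, W_img = rng.random((10000, 50)), rng.random((768, 50))
print(retrieve(text_vec, image_vecs, W_text, W_img))
```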
Cross-view content matching challenges
Text – “Two parked jet airplanes facing opposite directions on a grassy land”
Bag-of-Words (text) vs. SIFT BoW (image)
(Illustration: a word-count histogram indexed 1…10000, with unit counts at the entries for "jet", "direction" and "facing".)
Dimension mismatch
Semantic mismatch
Insufficient content
Deep ?
Cross-view content matching challenge
Lack of correspondence
Same region vs. missing region across views
(Illustration: column-wise vectorization of two image grids, cells indexed 1–8 in each view.)
Deep ?
Other useful problems
Task – Face recognition
… Face DB
Content Extraction
Pixel, Attribute, SIFT, LBP, HOG, Gabor
Content Matching
CCA, PLS, Metric Learning, SVMs
Other useful problems
Task – Forensic sketch photo matching
Suspect image database; forensic sketch query
Image courtesy – Lois Gibson, "Forensic Art Essentials: A Manual for Law Enforcement Artists"
Content Extraction
SIFT, HOG, Gabor
Content Matching
Local LDA, PLS, CCA
This Dissertation
We are interested in extracting and matching task-dependent content across multiple modalities.
(Figure: the dissertation in one diagram — each task is addressed through a content-extraction stage and a content-matching stage.)
Tasks: pose-invariant face recognition; pose-lighting invariant face recognition; text-image matching; forensic sketch-photo matching
Content extraction: Semantic Segmentation
Content matching: Partial Least Squares; pose-error robust matching; Generalized Multi-view Analysis
Part - I
Semantic Segmentation
Semantic Segmentation: Task
Input Image Segmentation Mask
Image courtesy – http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/baseFinal.html
Label each pixel
Semantic Segmentation: Overview
1. Scene understanding, robotics, medical image analysis etc.
2. Related work
3. Problem formulation
4. Role of context
5. Intuitive picture
6. Mathematical picture
7. Complete Pipeline
8. Back-propagation and issues
9. Pure-node RCPN
10. Experiments
Related Work
1. Multi-scale CNN (Farabet, Pinheiro)
2. Deep CNN (DeepSeg)
3. Non-parametric template matching (Tighe_1, Tighe_2, Eigen, Yang)
4. CRF models (Gould, Munoz, Lempitsky, Kumar, Mottaghi, Yuille)
Semantic Segmentation: Problem formulation
Label each super-pixel
Super-segmentation
Road
Car
Ground
Image courtesy – http://www.cs.unc.edu/~jtighe/Papers/ECCV10/siftflow/baseFinal.html
Input image Super-segment overlaid image
Semantic Segmentation: Context
• Labeling super-pixel in isolation is difficult
• Without context, machines outperform humans: 77.4% vs. 72.2% (Mottaghi et al.)
(Example labels in the figure: building, train, aeroplane.)
Image courtesy – Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun and Devi Parikh, “Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs”, IEEE CVPR 2013
Semantic Segmentation: Context importance
Image courtesy – Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun and Devi Parikh, “Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs”, IEEE CVPR 2013
Semantic Segmentation: Context
• Labeling super-pixel in isolation is difficult
• Without context, machines outperform humans: 77.4% vs. 72.2% (Mottaghi et al.)
• Use context
• MRFs and CRFs
• Typically MRFs and CRFs use human designed potential functions and features
• Complex human visual system – LEARN IT FROM DATA
Roozbeh Mottaghi, Sanja Fidler, Jian Yao, Raquel Urtasun and Devi Parikh, “Analyzing Semantic Segmentation Using Hybrid Human-Machine CRFs”, IEEE CVPR 2013
Recursive Context Propagation Network or RCPN
1. Label each super-pixel using entire image
2. Fast feed-forward computations for real-time labeling
3. End-to-end learning
4. Modular to the segmentation pipeline
(Example scene labels: tree, building, sky, water, boat.)
Semantic Segmentation - Pipeline
• 𝐹𝐶𝑁𝑁 = Multi-scale CNN at scales 1, 2 and 4
• 8×8×16 → 2×2 maxpool → 7×7×64 → 2×2 maxpool → 7×7×256
• 256×3 = 768 dimensional pixel feature
• Field of View (FOV) for every pixel = 47×47, 94×94 and 188×188 at the three scales
• Super-pixels by LiuSeg
• ~100 super-pixels per image
• 𝑣𝑖 = average of the pixel features in each super-pixel
• Data augmentation by 5 random average sets
1. Super-pixel feature
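A minimal NumPy sketch of this super-pixel pooling step; the function names and the 50% sampling fraction used in the augmentation variant are assumptions, not the values used in the thesis.

```python
import numpy as np

def superpixel_features(pixel_feats, sp_labels, n_sp):
    """Average d-dimensional per-pixel CNN features (H x W x d) inside each
    super-pixel to get one feature vector v_i per super-pixel.
    Assumes every label 0..n_sp-1 occurs in sp_labels."""
    d = pixel_feats.shape[-1]
    feats = pixel_feats.reshape(-1, d)
    labels = sp_labels.reshape(-1)
    v = np.zeros((n_sp, d))
    for i in range(n_sp):
        v[i] = feats[labels == i].mean(axis=0)
    return v

def random_average_sets(pixel_feats, sp_labels, n_sp, frac=0.5, seed=0):
    """Data-augmentation variant: average a random subset of each
    super-pixel's pixels (the fraction is an assumption)."""
    rng = np.random.default_rng(seed)
    d = pixel_feats.shape[-1]
    feats = pixel_feats.reshape(-1, d)
    labels = sp_labels.reshape(-1)
    v = np.zeros((n_sp, d))
    for i in range(n_sp):
        idx = np.flatnonzero(labels == i)
        keep = rng.choice(idx, size=max(1, int(frac * len(idx))), replace=False)
        v[i] = feats[keep].mean(axis=0)
    return v

# Toy usage: a 4x4 "image" with 2 super-pixels and 3-dimensional pixel features
rng = np.random.default_rng(0)
feats = rng.random((4, 4, 3))
labels = np.repeat([[0, 0, 1, 1]], 4, axis=0)
print(superpixel_features(feats, labels, 2).shape)   # (2, 3)
```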
Semantic Segmentation - Pipeline
1. Super-pixel feature
2. Context via Recursive Context Propagation Network
RCPN forward computation
(Figure: cartoon example with 5 super-pixels — leaf features v_i are mapped to semantic vectors, merged bottom-up through a parse sub-tree, and the context-enhanced parent features are propagated back down to every leaf.)
• Semantic mapper F_sem: R^(d_v) → R^(d_s); semantic vector x_i = F_sem(v_i)
• Combiner F_com: R^(2·d_s) → R^(d_s); parent feature x_ij = F_com([x_i; x_j])
• Decombiner F_dec: R^(2·d_s) → R^(d_s); enhanced feature x̃_i = F_dec([x_i; x̃_ij])
• Labeler F_lab: R^(d_s) → R^c; label y_i = F_lab(x̃_i)
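A minimal NumPy sketch of this forward pass, assuming each module is a single tanh layer and the parse tree is given as a list of (left, right, parent) merges; the names (mlp, rcpn_forward, the parameter dictionary P) are illustrative, not the thesis code.

```python
import numpy as np

def mlp(W, b, x):
    """One-layer tanh network standing in for each RCPN module (an
    assumption; the actual modules may be deeper)."""
    return np.tanh(W @ x + b)

def rcpn_forward(v, tree, P):
    """v: (N, d_v) super-pixel features; tree: bottom-up list of
    (left, right, parent) merges over node ids 0..2N-2; P: module weights.
    Returns (N, C) class scores, one row per super-pixel."""
    N = v.shape[0]
    x = {i: mlp(P['W_sem'], P['b_sem'], v[i]) for i in range(N)}   # semantic mapper
    for l, r, p in tree:                                           # bottom-up: combiner
        x[p] = mlp(P['W_com'], P['b_com'], np.concatenate([x[l], x[r]]))
    root = tree[-1][2]
    xt = {root: x[root]}        # root's enhanced feature = its semantic feature (assumption)
    for l, r, p in reversed(tree):                                 # top-down: decombiner
        for c in (l, r):
            xt[c] = mlp(P['W_dec'], P['b_dec'], np.concatenate([x[c], xt[p]]))
    return np.stack([P['W_lab'] @ xt[i] + P['b_lab'] for i in range(N)])  # labeler

# Toy usage: 3 super-pixels, parse tree (0,1)->3, (3,2)->4
rng = np.random.default_rng(0)
d_v, d_s, c = 768, 60, 8
P = {'W_sem': rng.standard_normal((d_s, d_v)) * 0.01, 'b_sem': np.zeros(d_s),
     'W_com': rng.standard_normal((d_s, 2 * d_s)) * 0.1, 'b_com': np.zeros(d_s),
     'W_dec': rng.standard_normal((d_s, 2 * d_s)) * 0.1, 'b_dec': np.zeros(d_s),
     'W_lab': rng.standard_normal((c, d_s)) * 0.1, 'b_lab': np.zeros(c)}
print(rcpn_forward(rng.standard_normal((3, d_v)), [(0, 1, 3), (3, 2, 4)], P).shape)
```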
RCPN characteristics
• N super-pixels → 2N − 1 parse-tree nodes
• Leaf nodes = super-pixels
• Internal nodes = merged regions
• Pure merged-regions: all constituent super-pixels share one ground-truth label
• Pure nodes = pure merged-regions
• Every super-pixel affects every other super-pixel
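For illustration, one random binary parse tree over adjacent super-pixels can be built as below; it produces exactly 2N − 1 nodes for N super-pixels. The merging rule (a uniformly random adjacent pair at each step) is an assumption.

```python
import numpy as np

def random_parse_tree(adjacency, seed=0):
    """Build one random binary parse tree over N super-pixels by repeatedly
    merging a random pair of adjacent regions. N leaves plus N-1 merges give
    exactly 2N-1 nodes. `adjacency` holds frozenset({i, j}) pairs over leaf
    ids 0..N-1 and is assumed to describe a connected region graph."""
    rng = np.random.default_rng(seed)
    n_leaves = 1 + max(max(pair) for pair in adjacency)
    adj = {frozenset(pair) for pair in adjacency}
    alive, next_id, merges = set(range(n_leaves)), n_leaves, []
    while len(alive) > 1:
        candidates = sorted(tuple(sorted(p)) for p in adj)
        i, j = candidates[rng.integers(len(candidates))]
        parent = next_id
        next_id += 1
        merges.append((i, j, parent))
        alive -= {i, j}
        alive.add(parent)
        new_adj = set()
        for p in adj:                      # re-wire neighbours of i and j to the parent
            a, b = tuple(p)
            a = parent if a in (i, j) else a
            b = parent if b in (i, j) else b
            if a != b:
                new_adj.add(frozenset((a, b)))
        adj = new_adj
    return merges

# Toy usage: 4 super-pixels on a 2x2 grid -> 3 merges, 2*4-1 = 7 nodes in total
print(random_parse_tree({frozenset(p) for p in [(0, 1), (0, 2), (1, 3), (2, 3)]}))
```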
Semantic Segmentation: Learning
• 𝐹𝐶𝑁𝑁 trained using CAFFE on an Nvidia GTX 780
• Stochastic gradient descent
• Learning rate = 0.1
• Momentum = 0.9
• Batch size = 12 images
• Data augmentation – horizontal flip
• 2000 iterations in 7 hours
1. Multi-scale CNN
Semantic Segmentation: Learning
𝐹𝑅𝐶𝑃𝑁 = {𝐹𝑠𝑒𝑚; 𝐹𝑐𝑜𝑚; 𝐹𝑑𝑒𝑐; 𝐹𝑙𝑎𝑏} was trained using L-BFGS. Typically, 800-1000
iterations were required for complete training.
1. Multi-scale CNN
2. RCPN
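As a hedged illustration of this setup, the RCPN parameters can be flattened into one vector and handed to an off-the-shelf L-BFGS optimizer; the quadratic loss below is only a stand-in for the real RCPN objective.

```python
import numpy as np
from scipy.optimize import minimize

def train_rcpn(theta0, loss_and_grad, max_iter=1000):
    """Sketch of L-BFGS training: `loss_and_grad` must return
    (scalar loss, flat gradient) for a flat parameter vector.
    (The thesis reports ~800-1000 iterations; the exact setup may differ.)"""
    res = minimize(loss_and_grad, theta0, jac=True, method='L-BFGS-B',
                   options={'maxiter': max_iter})
    return res.x

# Toy usage with a quadratic stand-in for the real RCPN loss
loss = lambda th: (float(np.sum((th - 1.0) ** 2)), 2.0 * (th - 1.0))
print(train_rcpn(np.zeros(5), loss)[:3])   # converges towards ones
```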
RCPN Back-propagation
(Figure: the 5 super-pixel cartoon example — the labeling error at each leaf flows back through the decombiner, combiner and semantic mapper of its sub-tree.)
• Diminishing gradient with depth
• Error flows everywhere
RCPN Back-propagation and Bypass Error
(Figure: back-propagation when the labeling error reaches the semantic mapper directly through the decombiner's skip input, bypassing the combiner.)
• Combiner is bypassed → context lost → poor local minimum
(Plot: gradient magnitudes of the Sem, Com, Dec and Lab modules.)
Empirical: g_com ≪ g_sem ≈ g_dec ≪ g_lab
Ideal: g_sem < g_com < g_dec < g_lab
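One way to observe this empirically (a diagnostic sketch, not the thesis tooling) is to average gradient magnitudes per module during training; `grads` is hypothetical bookkeeping of collected per-module gradients.

```python
import numpy as np

def grad_norms(grads):
    """Average gradient magnitude per RCPN module, used to compare the
    empirical ordering (g_com << g_sem ~ g_dec << g_lab) with the ideal one.
    `grads` maps module name -> list of gradient arrays collected during
    back-propagation (hypothetical bookkeeping)."""
    return {m: float(np.mean([np.linalg.norm(g) for g in gs]))
            for m, gs in grads.items()}

# Toy usage with fake gradients of different scales
rng = np.random.default_rng(0)
fake = {m: [rng.standard_normal(100) * s for _ in range(10)]
        for m, s in [('sem', 1.0), ('com', 0.01), ('dec', 1.0), ('lab', 5.0)]}
print(grad_norms(fake))   # 'com' is orders of magnitude smaller than 'lab'
```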
Pure-node RCPN or PN-RCPN
• RCPN + pure-node classification loss
• Benefits
  • Roughly 65% more training data
  • Meaningful combination by the combiner
  • Deeper and stronger gradients
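A minimal sketch of the added loss term, assuming a helper that records, for each internal node, the single ground-truth label when all of its super-pixels agree; `pn_rcpn_loss` and `node_purity` are illustrative names, not the thesis code.

```python
import numpy as np

def softmax_xent(score, label):
    """Cross-entropy of one class-score vector against an integer label."""
    z = score - score.max()
    return float(np.log(np.exp(z).sum()) - z[label])

def pn_rcpn_loss(leaf_scores, leaf_labels, node_scores, node_purity):
    """RCPN loss on super-pixels plus an extra term on pure internal nodes.
    `node_purity[p]` is the single ground-truth label of node p if all of its
    super-pixels agree, else None (a sketch of the PN-RCPN idea)."""
    loss = sum(softmax_xent(s, y) for s, y in zip(leaf_scores, leaf_labels))
    for p, y in node_purity.items():
        if y is not None:                      # only pure merged-regions
            loss += softmax_xent(node_scores[p], y)
    return loss

# Toy usage: 2 super-pixels and one pure internal node (id 2) of class 1
s = np.array([[2.0, 0.1, 0.0], [0.2, 1.5, 0.0]])
print(pn_rcpn_loss(s, [0, 1], {2: np.array([0.0, 1.0, 0.3])}, {2: 1}))
```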
PN-RCPN Back-propagation
(Figure: PN-RCPN back-propagation on the cartoon example — pure internal nodes (e.g. node 6 with its own label) receive their own classification error, so deep, strong gradients reach the combiner and semantic mapper.)
• Deep, strong gradients
Grad Strength: RCPN vs. PN-RCPN
(Plot: per-module gradient magnitudes — Sem, Com, Dec, Lab — for RCPN vs. PN-RCPN.)
RCPN: g_com ≪ g_sem ≈ g_dec ≪ g_lab
PN-RCPN: g_sem < g_com ≈ g_dec < g_lab
Experiments: Datasets
We conduct semantic segmentation experiments on three datasets
Stanford Background
Color images with 8 semantic classes
Train/Test – 572/143 images
SIFT Flow
Color images with 33 semantic classes
Train/Test – 2488/200
Daimler Urban Dataset
Gray-scale images with 6 semantic classes
Train/Test – 500/200
Experiments: Details
• Per-pixel subtraction of 0.5
• 100 super-pixels/image for Stanford and SIFT Flow
• 800 for Daimler due to its larger images
• 10 random parse trees with 5 random feature sets per image during training, to avoid over-fitting
• 20 random parse trees with max-voting at test time
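A sketch of the test-time combination, assuming the class scores from T random parse trees are stacked into a single (T, N, C) array.

```python
import numpy as np

def max_vote(scores_per_tree):
    """Combine predictions from several random parse trees by a per-super-pixel
    majority vote. `scores_per_tree` is a (T, N, C) array of class scores
    (T trees, N super-pixels, C classes)."""
    votes = np.argmax(scores_per_tree, axis=2)             # (T, N) hard labels
    n_classes = scores_per_tree.shape[2]
    counts = np.stack([np.bincount(votes[:, i], minlength=n_classes)
                       for i in range(votes.shape[1])])    # (N, C) vote counts
    return counts.argmax(axis=1)                           # winning label per super-pixel

# Toy usage: 20 trees, 5 super-pixels, 8 classes
print(max_vote(np.random.default_rng(0).random((20, 5, 8))))
```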
Experiments: Performance metric
1. Per-pixel accuracy (PPA)
2. Mean-class accuracy (MCA)
3. Intersection over Union (IoU) – Penalize under- & over-segmentation
4. Dynamic IoU (Dyn IoU) – IoU for dynamic objects
5. Time Per Image (TPI) – Both CPU and GPU
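For reference, the first three metrics can be computed from a class confusion matrix as below; this is generic code, and Dyn IoU and TPI depend on the dataset and hardware.

```python
import numpy as np

def segmentation_metrics(conf):
    """Per-pixel accuracy, mean-class accuracy and mean IoU from a C x C
    confusion matrix (rows = ground truth, columns = prediction)."""
    conf = conf.astype(float)
    tp = np.diag(conf)
    ppa = tp.sum() / conf.sum()
    mca = np.mean(tp / np.maximum(conf.sum(axis=1), 1e-12))
    iou = np.mean(tp / np.maximum(conf.sum(axis=1) + conf.sum(axis=0) - tp, 1e-12))
    return ppa, mca, iou

# Toy usage with a 3-class confusion matrix
print(segmentation_metrics(np.array([[50, 2, 3], [4, 40, 1], [2, 2, 30]])))
```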
Stanford Results
Method PPA MCA IoU TPI (CPU/GPU)
Gould 76.4 NA NA 30 – 600 / NA
Munoz 76.9 NA NA 12 / NA
Tighe_1 77.5 NA NA 4 / NA
Kumar 79.4 NA NA < 600 / NA
Socher 78.1 NA NA NA / NA
Lempitsky 81.9 72.4 NA > 60 / NA
Singh 74.1 62.2 NA 20 / NA
Farabet 81.4 76.0 NA 60.5 / NA
Eigen 75.3 66.5 NA 16.6 / NA
Pinheiro 80.2 69.6 NA 10 / NA
Plain-NN 80.1 69.7 56.4 1.1 / 0.4
RCPN 81.8 73.9 61.3 1.1 / 0.4
PN-RCPN 82.1 79.0 64.0 1.1 / 0.4
TM-RCPN 82.3 79.1 64.5 1.6-6.1 / 0.9-5.9
SIFT Flow results
Method PPA MCA IoU TPI (CPU/GPU)
Tighe 77.0 30.1 NA 8.4 / NA
Liu 76.7 NA NA 31 / NA
Singh 79.2 33.8 NA 20 / NA
Eigen 77.1 32.5 NA 16.6 / NA
Farabet 78.5 29.6 NA NA / NA
Bal. Farabet 72.3 50.8 NA NA / NA
Tighe, 24 78.6 39.2 NA 8.4 / NA
Pinheiro 77.7 29.8 NA NA / NA
Yang 79.8 48.7 NA < 12 / NA
Plain-NN 76.3 32.1 24.7 1.1 / 0.36
RCPN 79.6 33.6 26.9 1.1 / 0.4
Bal. RCPN 75.5 48.0 28.6 1.1 / 0.4
PN-RCPN 80.9 39.1 30.8 1.1 / 0.4
Bal. PN-RCPN 75.5 52.8 30.2 1.1 / 0.4
TM-RCPN 80.8 38.4 30.7 1.6-6.1 / 0.9-5.4
Bal. TM-RCPN 76.4 52.6 31.4 1.6-6.1 / 0.9-5.4
DeepSeg 85.2 51.7 39.1 NA / 0.2
Daimler Urban results
Method PPA MCA IoU Dyn IoU TPI (CPU/GPU)
Joint 94.5 91.0 86.0 74.5 111 / NA
Stixmantics 92.8 87.5 80.6 72.3 0.05 / NA
Bal. Plain-NN 91.4 83.2 75.8 56.2 5.9 / 2.8
Bal. RCPN 93.3 87.6 80.9 66.0 6.0 / 2.8
Bal. PN-RCPN 94.5 90.2 84.5 73.8 6.0 / 2.8
Bal. TM-RCPN 94.5 90.1 84.5 73.8 12 / 8.8
Some visual results
Part - II
Cross-Modal Content Matching
Common space representation
(Figure: feature vectors from View 1 … View v each decompose into common content, view-specific content and noise; the shared common content is embedded in a common space.)
Cross-view Content Matching: A simple picture
(Figure panels: IDEAL vs. RELAXED)
Shape – classes
Solid/hollow shapes – views
Dashed shapes – unseen classes
PAIRED DATA
VIEW 1
VIEW 2
PLS based multi-modal face recognition
(Figure: view X and view Y (pose, resolution or sketch variants) are projected by WX and WY into a common subspace; PLS acts as the bridge, and shape denotes identity.)
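A minimal sketch of this idea with scikit-learn's PLSCanonical standing in for the PLS variant used in the thesis: learn the projections from paired training data, map both views into the common subspace, and match by nearest neighbour.

```python
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

# Toy paired data: view X and view Y with different feature dimensions,
# generated from a shared 5-dimensional latent "identity" signal.
rng = np.random.default_rng(0)
latent = rng.standard_normal((100, 5))
X = latent @ rng.standard_normal((5, 200)) + 0.1 * rng.standard_normal((100, 200))
Y = latent @ rng.standard_normal((5, 150)) + 0.1 * rng.standard_normal((100, 150))

# Learn the two projections and map both views into the common subspace.
pls = PLSCanonical(n_components=5)
pls.fit(X, Y)
Xc, Yc = pls.transform(X, Y)

# Cross-view matching: each X sample's nearest neighbour among the Y projections.
dists = np.linalg.norm(Xc[:, None, :] - Yc[None, :, :], axis=2)
print(np.mean(dists.argmin(axis=1) == np.arange(100)))  # rank-1 matching rate
```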
PLS based pose-invariant face recognition
(Bar chart: recognition accuracy (0.75–1.0 scale) of the proposed approach vs. PGFR, TFA, LLR and ELF — a partial comparison, since the testing scenarios differ.)
• CMU PIE face data set for experiments
• 34 training and 34 testing subjects, intensity features
Continued …
PLS based sketch-face recognition
Method Gal. Size Type Accuracy
Wang 100 Holistic 81
Liu 300 Patch 87.67
Klare 300 Pixel 99.47
PLS 100 Holistic 93.6
CCA 100 Holistic 94.6
Bilinear 100 Holistic 94.2
Cross-view Content Matching: A more complete picture
Multiple samples per class
Shape – Classes
Solid/Hollow shapes - Views
Color – Same-class samples
Dashed shape – Unseen classes
PAIRED DATA
VIEW 1
VIEW 2
(Figure panels: IDEAL vs. RELAXED)
What can CCA/PLS/BLM do?
VIEW 1
VIEW 2
CCA/PLS/BLM
?
Match paired samples
Desired
A better picture — Generalized Multi-view Analysis (GMA) vs. CCA/PLS/BLM
Two-view GMA solution: stacking the per-view projection directions, the objective reduces to a single generalized eigenvalue problem, where D_i and S_i are the numerator and denominator matrices of the underlying single-view method (see the next slide) and C_12 = X_1 X_2^T is the cross-view covariance of the paired data:

$$
\begin{bmatrix} D_1 & \lambda C_{12} \\ \lambda C_{12}^{\top} & \mu D_2 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \eta
\begin{bmatrix} S_1 & 0 \\ 0 & S_2 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
$$

(λ and μ weight the cross-view and second-view terms.)
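Assuming symmetric D_i and positive-definite S_i (in practice a small ridge may be added to S_i), the problem can be handed to a standard generalized eigensolver; this is an illustrative sketch, not the thesis implementation.

```python
import numpy as np
from scipy.linalg import eigh

def gma_projections(D1, D2, S1, S2, C12, lam=1.0, mu=1.0, k=10):
    """Sketch of the two-view GMA solution: block-stack the per-view
    numerator matrices D_i and the cross-view covariance C12 = X1 @ X2.T
    (column-wise data matrices assumed), and solve the generalized
    eigenvalue problem against the block-diagonal denominator matrices S_i.
    D_i must be symmetric and S_i positive definite (regularize if needed)."""
    d1, d2 = D1.shape[0], D2.shape[0]
    A = np.block([[D1,           lam * C12],
                  [lam * C12.T,  mu * D2  ]])
    B = np.block([[S1,                 np.zeros((d1, d2))],
                  [np.zeros((d2, d1)), S2                ]])
    w, V = eigh(A, B)                       # generalized eigenvectors, ascending eigenvalues
    V = V[:, np.argsort(w)[::-1][:k]]       # keep the top-k directions
    return V[:d1], V[d1:]                   # W1 (d1 x k), W2 (d2 x k)

# Toy usage with random symmetric positive-definite matrices
rng = np.random.default_rng(0)
def spd(d):
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)
W1, W2 = gma_projections(spd(6), spd(4), spd(6), spd(4), rng.standard_normal((6, 4)), k=3)
print(W1.shape, W2.shape)   # (6, 3) (4, 3)
```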
GMA cont..
Nice closed-form eigen-value problem
GMA cont..
• Multi-view extension of any generalized eigenvalue feature extraction method
• GMA + LDA = GMLDA: D = between-class scatter matrix; S = within-class scatter matrix
• GMA + MFA = GMMFA: D = penalty graph; S = intrinsic graph
• GMA + LPP = GMLPP: D = identity; S = graph Laplacian of the similarity matrix
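For concreteness, the GMLDA choice of D and S for one view is the standard pair of LDA scatter matrices, sketched below.

```python
import numpy as np

def lda_scatters(X, y):
    """Between-class (D) and within-class (S) scatter matrices for one view,
    as plugged into GMA to obtain GMLDA. X is (n, d); y holds class labels."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    D = np.zeros((d, d))
    S = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        D += len(Xc) * np.outer(mc - mu, mc - mu)   # between-class scatter
        S += (Xc - mc).T @ (Xc - mc)                # within-class scatter
    return D, S

# Toy usage: two well-separated 3-D classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, (20, 3)) for m in (0.0, 3.0)])
D, S = lda_scatters(X, np.repeat([0, 1], 20))
print(D.shape, S.shape)   # (3, 3) (3, 3)
```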
Pros and Cons
✓ Cross-view classification and retrieval
✓ Kernelizable
✓ Closed-form optimal solution
✓ Supervised
✓ Generalizes to unseen classes
✓ Domain agnostic
Pros and Cons
✗ Still not ideal
✗ Non-probabilistic
✗ Shallow
✗ Requires similar views across test and train
Final Picture
(Figure: summary of cross-view methods on paired data from VIEW 1 and VIEW 2 — CCA/PLS/BLM, GMA, SVM-2K/HMFDA and the ideal case — spanning the original space and different latent spaces.)
Experiments
Pose and Lighting Invariant Face Recognition
• 129 training subjects in 5 illuminations
• 129 test subjects (same identities, different session) in 18 illuminations
• 120 subjects in 5 illuminations
• 129 test subjects (different identities, different session) in 18 illuminations
Text-Image Retrieval
• Wiki pages (2173 train + 693 test)
• 10 different classes
• Latent Dirichlet Allocation based text features
• SIFT histogram based image features
• Precision-recall based Mean Average Precision (MAP) score
• SM – Semantic Matching (domain-dependent approach)
• SCM – Semantic matching in the CCA latent space (two-stage domain-dependent approach)
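A sketch of the retrieval metric: mean average precision over queries, where gallery items sharing the query's class count as relevant. This is generic code, not the evaluation script used in the thesis.

```python
import numpy as np

def mean_average_precision(sim, q_labels, g_labels):
    """Mean Average Precision for cross-modal retrieval: `sim[i, j]` scores
    query i (e.g. text) against gallery item j (e.g. image)."""
    aps = []
    for i in range(sim.shape[0]):
        order = np.argsort(sim[i])[::-1]                      # best score first
        rel = (g_labels[order] == q_labels[i]).astype(float)  # 1 if same class
        if rel.sum() == 0:
            continue
        prec_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
        aps.append((prec_at_k * rel).sum() / rel.sum())
    return float(np.mean(aps))

# Toy usage: 4 queries, 6 gallery items, 2 classes
rng = np.random.default_rng(0)
print(mean_average_precision(rng.random((4, 6)),
                             np.array([0, 1, 0, 1]), np.array([0, 0, 1, 1, 0, 1])))
```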
Future Directions
• Deep learning based feature extraction
• Large-scale Data collection
• Deep Multi-view algorithms Vs. Common Deep Network
• Unsupervised training
Thank you
Questions ????
Reference
Tighe_1: J. Tighe and S. Lazebnik. Superparsing. Int. J. Comput. Vision, 101(2):329–349, 2013
Tighe_2: J. Tighe and S. Lazebnik. Finding things: Image parsing with regions and per-exemplar detectors. IEEE CVPR, 2013
Gould: S. Gould, R. Fulton, and D. Koller. Decomposing a scene into geometric and semantically consistent regions. IEEE ICCV, 2009
Munoz: D. Munoz, J. A. Bagnell, and M. Hebert. Stacked hierarchical labeling. ECCV, 2010
Kumar: M. P. Kumar and D. Koller. Efficiently selecting regions for scene understanding. IEEE CVPR, 2010
Lempitsky: V. Lempitsky, A. Vedaldi, and A. Zisserman. A pylon model for semantic segmentation. NIPS, 2011
Farabet: C. Farabet, C. Couprie, L. Najman, and Y. LeCun. Learning hierarchical features for scene labeling. IEEE TPAMI, August 2013
Eigen: D. Eigen and R. Fergus. Nonparametric image parsing using adaptive neighbor sets. IEEE CVPR, 2012
Joint: L. Ladick, P. Sturgess, C. Russell, S. Sengupta, Y. Bastanlar, W. Clocksin, and P. Torr. Joint optimization for object class
segmentation and dense stereo reconstruction. International Journal of Computer Vision, 100(2):122–133, 2012
Liu: C. Liu, J. Yuen, and A. Torralba. Nonparametric scene parsing via label transfer. IEEE TPAMI, 33(12), Dec 2011
LiuSeg: M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa. Entropy rate superpixel segmentation. IEEE CVPR, 2011
Pinheiro: P. H. O. Pinheiro and R. Collobert. Recurrent convolutional neural networks for scene parsing. ICML, 2014
Stixmantics: T. Scharwächter, M. Enzweiler, U. Franke, and S. Roth. Stixmantics: A medium-level model for real-time semantic scene understanding. ECCV, 2014
Yang: J. Yang, B. Price, S. Cohen, and M.-H. Yang. Context driven scene parsing with attention to rare classes. CVPR, pages 3294–3301,
2014