SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Modularity Matters:
Learning Invariant Relational ReasoningTasks
1st July, 2018
PR12 Paper Review
Jinwon Lee
Samsung Electronics
Jason Jo, et al., “Modularity Matters: Learning Invariant Relational
Reasoning Tasks”, arXiv:1806.06765
Related Papers in PR12
• Adam Santoro, et al., ”A Simple Neural Network Module for
Relational Reasoning”
 PR-018 : https://youtu.be/Lb1PVpFp9F8
• Sara Sabour, et al., “Dynamic Routing Between Capsules”
 PR-056 : https://youtu.be/_YT_8CT2w_Q
Introduction
• The human visual system is able to learn discriminative
representations for high level abstractions in the data that are also
invariant to an incredibly large and varied collection of
transformations
• The current de-facto standard visual learning models are deep
convolutional neural networks
• While various CNN models are able to exhibit record breaking, it
should be noted that this test generalization is in the identically and
independently distributed (i.i.d) setting. So-called adversarial noise
has been shown to break various models on some tasks
Introduction
• The majority of CNNs can be interpreted as learning deep hierarchies
of fully distributed features
 For features fl
1, fl
2 at level l of the hierarchy, these features get applied to the
same input yl-1
• In this paper, they explore the efficacy of the fully distributed
representation prior for learning invariant relational rules focused on
tow relational tasks
 MNIST ParityTask
 ColorizedVariant of the PentominoTask
• These two tasks are supervised visual reasoning tasks whose labels
encode a semantic (high-level) relational rule between two or more
objects in an image
MNIST Parity Dataset
• 30K training, 5K validation and
5K test images
• Each image is of size 64x64 and
is divided into a 2x2 grid of 32x32 blocks
• Each image has 2 MNIST digits placed in 2 randomly chosen blocks
• Randomly colored (10 randomly chosen colors)
• Randomly scaled to size(20x20, 22x22, … , 28x28)
• Randomly rotated by angle(0, 5, 10, …, 30)
• Placed at a random location with in block
• The task is to predict whether both the digits in an images are of the
same parity, both even or both odd(label 1) or not(label 0)
Pentomino
Colorized Pentomino Dataset
• 20K train, 5K validation and 5K test images
• Each image is of size 64x64 which is divided into a grid of 8x8 blocks
• Each image has 3 Pentomino sprites placed in 3 randomly chosen
unique blocks
• Scaling factor is {1, 2}
• Randomly rotated by a multiple of 90 degrees
• Randomly colored by one out of 10 colors
• The maximum size of sprites is 4x8
Colorized Pentomino Dataset
• The task is to learn whether all the Pentomino sprites in an image
belong to same class (label 0) or not (label 1).
Relational Object ReasoningTasks
• Two key defining characteristics
 Object distribution
 Relational rule
• The MNIST Parity task consists of curvilinear digit strokes while the
colorized Pentomino task consists of rigid polygonal shapes
• With respect to the relational rule, the MNIST Parity task is an AND
operation on the parity of the digits while the colorized Pentomino task is
a XOR like operation on the sprite types.
• Colorized Pentomino has more sparsity in the images and the objects in
the image have more freedom for translation as compared to MNIST
Parity.
• Arguably, MNIST Parity dataset’s curves assist more than the straight
edges of Colorized Pentomino dataset in learning discriminative features
for the desired task
Relational Reasoning
• This paper’s interest is invariant relational learning
• In this setting, a machine learning model must be able to recognize
that simply translating, rotating, scaling or changing the color of any
of the objects in the image does not change the label of the image
• Therefore a machine learning model will be tasked with learning
simultaneously discriminative and invariant representations
Interference Problem
• Many of deepCNNs may be classified as learning a deep hierarchy of fully
distributed features
• Overall, distributed representations have been an extremely powerful
architectural prior for AI.
• However, when the number of invariances in the dataset is very large
(and/or the dataset size is sufficiently small), one may encounter the
interference problem for architectures that learn fully distributed
representations
• In the case of supervised learning from image labels, there is one global
teaching signal, and this would entangle all the neural network’s
parameters, which would cause the features to interfere with one another
and result in a slow down in learning
 Take for example the MNIST Parity task: a machine learning model must learn
associate the digit pairing [1, 4] with [2, 7] as they have the same label of 0, but the
digit pairings have different geometric properties.
Modularity Matters
• One natural way to combat the interference problem is to allow for
specialized sub-modules in our architecture
• Once we modularize, we reduce the amount of interference that can
occur between features in our model
• These specialized modules can now learn highly discriminative yet
invariant representations while not interfering with each other
Residual Mixture Network(ResMixNet)
• Mixture of Experts architecture
 Individual expert networks {E1, …., En} (which here map their input to their
output)
 A Gater network G that weights the output from each of the individual
experts, in a way that is context-dependent
ResMixNetArchitecture
Experimental Results – MNIST Parity
• VGG19-BN network soundly outperforms the ResNet models
• This is the first time such a performance gap has been exhibited
between a residual network and non-residual network
• ResMixNet(2,2) model actually attains slightly better test
performance while having over 70x
fewer parameters
Experimental Results – Colorized Pentomino
• VGG19-BN and the various
ResNet models generalize
poorly
• Stellar optimization and
generalization performance of
the ResMixNet(4,1) model
• A nearly 30x reduction in test
error from the non-
modularized CNNs to the
ResMixNet(4,1) model.
Experimental Results – Classical Object
Recognition
• The performance on CIFAR-10 is quite close, merely a 0.74% gap in
test error and that for SVHN that the performance of the two models
is even closer, a mere difference of 0.13%
• The gap is 5.46% for the CIFAR-100. Note that for the CIFAR-100, the
data by design has multiple class labels that are semantically similar,
and thus many of the images may share features.
RelatedWork
• The ResNeXt model uses multi-branches (e.g. experts) and pools the
experts together via summation, but they do not employ a gater-
type network to weight the sum.
• The Inception architectures also uses multi-branch modules and
concatenates all them together, thus they similarly lack a gater
network.

Weitere ähnliche Inhalte

Was ist angesagt?

PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)Jinwon Lee
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide威智 黃
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionJinwon Lee
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsJinwon Lee
 
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...Tahmid Abtahi
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksHannes Hapke
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054Jinwon Lee
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsKasun Chinthaka Piyarathna
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_papershanullah3
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksJinwon Lee
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architecturesananth
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...Jinwon Lee
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkMojammilHusain
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
 

Was ist angesagt? (20)

PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)PR-284: End-to-End Object Detection with Transformers(DETR)
PR-284: End-to-End Object Detection with Transformers(DETR)
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide
 
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image RecognitionPR-155: Exploring Randomly Wired Neural Networks for Image Recognition
PR-155: Exploring Randomly Wired Neural Networks for Image Recognition
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
ShuffleNet - PR054
ShuffleNet - PR054ShuffleNet - PR054
ShuffleNet - PR054
 
Convolutional Neural Network and Its Applications
Convolutional Neural Network and Its ApplicationsConvolutional Neural Network and Its Applications
Convolutional Neural Network and Its Applications
 
Deep learning
Deep learningDeep learning
Deep learning
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Efficient de cvpr_2020_paper
Efficient de cvpr_2020_paperEfficient de cvpr_2020_paper
Efficient de cvpr_2020_paper
 
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural NetworksPR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
PR-169: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
PR-120: ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture De...
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 

Ähnlich wie PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks

Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Joe Xing
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonAditya Bhattacharya
 
Hand Written Digit Classification
Hand Written Digit ClassificationHand Written Digit Classification
Hand Written Digit Classificationijtsrd
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspectiveAnirban Santara
 
Graph Attention Networks.pptx
Graph Attention Networks.pptxGraph Attention Networks.pptx
Graph Attention Networks.pptxssuser2624f71
 
Wits presentation 6_28072015
Wits presentation 6_28072015Wits presentation 6_28072015
Wits presentation 6_28072015Beatrice van Eden
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognitionvatsal199567
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningPramit Choudhary
 
Facial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceFacial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceTakrim Ul Islam Laskar
 
Multi-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video SurveillanceMulti-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video SurveillanceDiego Tosato
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesIRJET Journal
 
[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of FunctionsJaeJun Yoo
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Dongmin Choi
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxPoonam60376
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsColleen Farrelly
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesValue Amplify Consulting
 

Ähnlich wie PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks (20)

Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathon
 
Hand Written Digit Classification
Hand Written Digit ClassificationHand Written Digit Classification
Hand Written Digit Classification
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
 
lec6a.ppt
lec6a.pptlec6a.ppt
lec6a.ppt
 
Mnist soln
Mnist solnMnist soln
Mnist soln
 
Graph Attention Networks.pptx
Graph Attention Networks.pptxGraph Attention Networks.pptx
Graph Attention Networks.pptx
 
Image captioning
Image captioningImage captioning
Image captioning
 
Wits presentation 6_28072015
Wits presentation 6_28072015Wits presentation 6_28072015
Wits presentation 6_28072015
 
Automatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face RecognitionAutomatic Attendace using convolutional neural network Face Recognition
Automatic Attendace using convolutional neural network Face Recognition
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep Learning
 
Facial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional FaceFacial Emotion Detection on Children's Emotional Face
Facial Emotion Detection on Children's Emotional Face
 
PhD Defense
PhD DefensePhD Defense
PhD Defense
 
Multi-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video SurveillanceMulti-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video Surveillance
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning Techniques
 
[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions[PR12] Generative Models as Distributions of Functions
[PR12] Generative Models as Distributions of Functions
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]
 
Introduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptxIntroduction to Perceptron and Neural Network.pptx
Introduction to Perceptron and Neural Network.pptx
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problems
 
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI TechnologiesAI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
AI Class Topic 6: Easy Way to Learn Deep Learning AI Technologies
 

Mehr von Jinwon Lee

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sJinwon Lee
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...Jinwon Lee
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...Jinwon Lee
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionJinwon Lee
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033Jinwon Lee
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012Jinwon Lee
 
Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Jinwon Lee
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023Jinwon Lee
 
인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝Jinwon Lee
 

Mehr von Jinwon Lee (14)

PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...PR-297: Training data-efficient image transformers & distillation through att...
PR-297: Training data-efficient image transformers & distillation through att...
 
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
PR-258: From ImageNet to Image Classification: Contextualizing Progress on Be...
 
PR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object DetectionPR-217: EfficientDet: Scalable and Efficient Object Detection
PR-217: EfficientDet: Scalable and Efficient Object Detection
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
PVANet - PR033
PVANet - PR033PVANet - PR033
PVANet - PR033
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
Deep learning seminar_snu_161031
Deep learning seminar_snu_161031Deep learning seminar_snu_161031
Deep learning seminar_snu_161031
 
YOLO9000 - PR023
YOLO9000 - PR023YOLO9000 - PR023
YOLO9000 - PR023
 
인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝인공지능, 기계학습 그리고 딥러닝
인공지능, 기계학습 그리고 딥러닝
 

Kürzlich hochgeladen

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

PR095: Modularity Matters: Learning Invariant Relational Reasoning Tasks

  • 1. Modularity Matters: Learning Invariant Relational ReasoningTasks 1st July, 2018 PR12 Paper Review Jinwon Lee Samsung Electronics Jason Jo, et al., “Modularity Matters: Learning Invariant Relational Reasoning Tasks”, arXiv:1806.06765
  • 2. Related Papers in PR12 • Adam Santoro, et al., ”A Simple Neural Network Module for Relational Reasoning”  PR-018 : https://youtu.be/Lb1PVpFp9F8 • Sara Sabour, et al., “Dynamic Routing Between Capsules”  PR-056 : https://youtu.be/_YT_8CT2w_Q
  • 3. Introduction • The human visual system is able to learn discriminative representations for high level abstractions in the data that are also invariant to an incredibly large and varied collection of transformations • The current de-facto standard visual learning models are deep convolutional neural networks • While various CNN models are able to exhibit record breaking, it should be noted that this test generalization is in the identically and independently distributed (i.i.d) setting. So-called adversarial noise has been shown to break various models on some tasks
  • 4. Introduction • The majority of CNNs can be interpreted as learning deep hierarchies of fully distributed features  For features fl 1, fl 2 at level l of the hierarchy, these features get applied to the same input yl-1 • In this paper, they explore the efficacy of the fully distributed representation prior for learning invariant relational rules focused on tow relational tasks  MNIST ParityTask  ColorizedVariant of the PentominoTask • These two tasks are supervised visual reasoning tasks whose labels encode a semantic (high-level) relational rule between two or more objects in an image
  • 5. MNIST Parity Dataset • 30K training, 5K validation and 5K test images • Each image is of size 64x64 and is divided into a 2x2 grid of 32x32 blocks • Each image has 2 MNIST digits placed in 2 randomly chosen blocks • Randomly colored (10 randomly chosen colors) • Randomly scaled to size(20x20, 22x22, … , 28x28) • Randomly rotated by angle(0, 5, 10, …, 30) • Placed at a random location with in block • The task is to predict whether both the digits in an images are of the same parity, both even or both odd(label 1) or not(label 0)
  • 7. Colorized Pentomino Dataset • 20K train, 5K validation and 5K test images • Each image is of size 64x64 which is divided into a grid of 8x8 blocks • Each image has 3 Pentomino sprites placed in 3 randomly chosen unique blocks • Scaling factor is {1, 2} • Randomly rotated by a multiple of 90 degrees • Randomly colored by one out of 10 colors • The maximum size of sprites is 4x8
  • 8. Colorized Pentomino Dataset • The task is to learn whether all the Pentomino sprites in an image belong to same class (label 0) or not (label 1).
  • 9. Relational Object ReasoningTasks • Two key defining characteristics  Object distribution  Relational rule • The MNIST Parity task consists of curvilinear digit strokes while the colorized Pentomino task consists of rigid polygonal shapes • With respect to the relational rule, the MNIST Parity task is an AND operation on the parity of the digits while the colorized Pentomino task is a XOR like operation on the sprite types. • Colorized Pentomino has more sparsity in the images and the objects in the image have more freedom for translation as compared to MNIST Parity. • Arguably, MNIST Parity dataset’s curves assist more than the straight edges of Colorized Pentomino dataset in learning discriminative features for the desired task
  • 10. Relational Reasoning • This paper’s interest is invariant relational learning • In this setting, a machine learning model must be able to recognize that simply translating, rotating, scaling or changing the color of any of the objects in the image does not change the label of the image • Therefore a machine learning model will be tasked with learning simultaneously discriminative and invariant representations
  • 11. Interference Problem • Many of deepCNNs may be classified as learning a deep hierarchy of fully distributed features • Overall, distributed representations have been an extremely powerful architectural prior for AI. • However, when the number of invariances in the dataset is very large (and/or the dataset size is sufficiently small), one may encounter the interference problem for architectures that learn fully distributed representations • In the case of supervised learning from image labels, there is one global teaching signal, and this would entangle all the neural network’s parameters, which would cause the features to interfere with one another and result in a slow down in learning  Take for example the MNIST Parity task: a machine learning model must learn associate the digit pairing [1, 4] with [2, 7] as they have the same label of 0, but the digit pairings have different geometric properties.
  • 12. Modularity Matters • One natural way to combat the interference problem is to allow for specialized sub-modules in our architecture • Once we modularize, we reduce the amount of interference that can occur between features in our model • These specialized modules can now learn highly discriminative yet invariant representations while not interfering with each other
  • 13. Residual Mixture Network(ResMixNet) • Mixture of Experts architecture  Individual expert networks {E1, …., En} (which here map their input to their output)  A Gater network G that weights the output from each of the individual experts, in a way that is context-dependent
  • 15. Experimental Results – MNIST Parity • VGG19-BN network soundly outperforms the ResNet models • This is the first time such a performance gap has been exhibited between a residual network and non-residual network • ResMixNet(2,2) model actually attains slightly better test performance while having over 70x fewer parameters
  • 16. Experimental Results – Colorized Pentomino • VGG19-BN and the various ResNet models generalize poorly • Stellar optimization and generalization performance of the ResMixNet(4,1) model • A nearly 30x reduction in test error from the non- modularized CNNs to the ResMixNet(4,1) model.
  • 17. Experimental Results – Classical Object Recognition • The performance on CIFAR-10 is quite close, merely a 0.74% gap in test error and that for SVHN that the performance of the two models is even closer, a mere difference of 0.13% • The gap is 5.46% for the CIFAR-100. Note that for the CIFAR-100, the data by design has multiple class labels that are semantically similar, and thus many of the images may share features.
  • 18. RelatedWork • The ResNeXt model uses multi-branches (e.g. experts) and pools the experts together via summation, but they do not employ a gater- type network to weight the sum. • The Inception architectures also uses multi-branch modules and concatenates all them together, thus they similarly lack a gater network.