Using Feature Grouping as a
Stochastic Regularizer for
High Dimensional Noisy Data
Sergül Aydöre
Assistant Professor
Electrical and Computer Engineering
Stevens Institute of Technology
Landscape of Machine Learning Applications
https://research.hubspot.com/charts/simplified-ai-landscape
• Data is high-dimensional and noisy, and the sample size is small, as in neuroimaging
But what if
[Figures: acquisition setups for neuroimaging modalities]
• PET acquisition process (Wikipedia)
• Implantation of intracranial electrodes (Cleveland Epilepsy Clinic)
• An elastic EEG cap with 60 electrodes [Bai 2012]
• A typical MEG equipment [BML 2001]
• MRI scanner and rs-fMRI time-series acquisition [NVIDIA]
Other High-Dimensional, Noisy Data and Small Sample Size Situations
• Genomics (Integrative Genomics Viewer, 2012)
• Seismology (https://www.mapnagroup.com)
• Astronomy (Astronomy Magazine, 2015)
Challenges
1. High dimensionality of the data due to rich temporal and spatial structure
2. Noise in the data due to mechanical or physical artifacts
3. Difficulty and cost of data collection
Overfitting
• ML models with a large number of parameters require a large amount of data; otherwise, overfitting can occur!
http://scott.fortmann-roe.com/docs/MeasuringError.html
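A toy illustration of the bullet above (an assumed setup, not from the talk): fitting a 10-parameter polynomial to 10 noisy samples drives training error to nearly zero while test error stays large.

```python
# High-capacity model + few samples = overfitting: the degree-9
# polynomial (10 parameters) interpolates the 10 noisy points.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=10)
y = x**2 + 0.3 * rng.normal(size=10)            # noisy quadratic, 10 samples

coef = np.polyfit(x, y, deg=9)                  # 10 parameters: interpolates
x_test = rng.uniform(-1, 1, size=200)
y_test = x_test**2 + 0.3 * rng.normal(size=200)

train_err = np.mean((np.polyval(coef, x) - y) ** 2)     # ~0
test_err = np.mean((np.polyval(coef, x_test) - y_test) ** 2)  # large
```

Regularization, discussed next, is the standard way out of this trap.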
Regularization Methods to overcome Overfitting
• Early Stopping [Yao 2007]
• Ridge Regression (ℓ2 regularization) [Hoerl & Kennard 1970]
• Least Absolute Shrinkage and Selection Operator (LASSO, ℓ1 regularization) [Tibshirani 1996]: SPARSITY
• Dropout [Srivastava 2014]: STOCHASTICITY
• Group Lasso [Yuan & Lin 2006]: STRUCTURE & SPARSITY
• PROPOSED: STRUCTURE & STOCHASTICITY
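In standard notation (symbols assumed; the slides' own formulas were rendered as images), the penalties behind the shrinkage methods above are:

$$
\text{Ridge: } \lambda\|\theta\|_2^2,
\qquad
\text{LASSO: } \lambda\|\theta\|_1,
\qquad
\text{Group Lasso: } \lambda\sum_{g}\|\theta_g\|_2,
$$

where $\theta_g$ is the block of coefficients in group $g$; the group lasso zeroes out whole groups at once, which is what makes it both structured and sparse.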
Problem Setting: Supervised Learning
• Training samples are drawn i.i.d. from an unknown distribution.
• Parameters of the model are estimated by minimizing the average loss per sample.
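The estimator on this slide (the original equation was an image) takes the standard empirical-risk-minimization form, with notation assumed:

$$
\hat{\theta} \;=\; \arg\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i;\theta),\, y_i\big),
\qquad \{(x_i, y_i)\}_{i=1}^{n} \sim \mathcal{D},
$$

where $\ell$ is the loss per sample.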
Multinomial Logistic Regression
• The class-label probability of a given input is the softmax of a linear score.
• Hence, the parameter space consists of a weight matrix and a bias vector.
• The loss per sample is the negative log-likelihood (cross-entropy).
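In standard notation (assumed; the slide's equations were images), for $K$ classes and input $x \in \mathbb{R}^{d}$:

$$
P(y=k \mid x) \;=\; \frac{\exp(w_k^{\top} x + b_k)}{\sum_{j=1}^{K}\exp(w_j^{\top} x + b_j)},
\qquad \theta = (W, b) \in \mathbb{R}^{d \times K} \times \mathbb{R}^{K},
$$

with per-sample loss $\ell(\theta; x_i, y_i) = -\log P(y_i \mid x_i)$.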
Dropout [Srivastava 2014]
• Randomly removes units in the network during training.
• Idea: prevents units from co-adapting too much.
• Attractive property: can be used inside stochastic gradient descent without additional computational cost.
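A minimal sketch of the (inverted) dropout mechanism described above; the helper is hypothetical, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p=0.5, train=True):
    """Inverted dropout: zero each unit with probability p and rescale
    survivors by 1/(1-p), so the expectation matches the test-time pass."""
    if not train:
        return x, None
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask, mask

x = np.ones((2, 4))
y, mask = dropout_forward(x, p=0.5)   # entries are 0 (dropped) or 2 (kept)
```

Because the mask is resampled per mini-batch, dropout slots directly into SGD with negligible overhead, which is the "attractive property" above.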
Dropout for Multinomial Logistic Regression
[Figure: feature-dropout matrices. At each training step a randomly picked mask matrix is applied to the input; forward propagation computes the class probabilities (Person A, B, …, X, Y, Z), and back propagation updates the weights through the same mask.]
Replace Masking with Structured Matrices
• Replace the randomly picked binary dropout masks with randomly picked structured projection matrices.
• Each projection matrix is generated from random samples (size r) drawn with replacement from the training data set (size n).
• Forward propagation: we project the training samples onto a lower-dimensional space; hence, the weight matrix acts on the reduced features (the projection approximates x).
• Back propagation: to update the weights, we project the gradients back to the original space.
• No projection is necessary for the bias term.
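The forward/backward steps above can be sketched as follows. This is a minimal illustration with assumed notation (the `grouping_matrix` helper and the fixed cluster labels are stand-ins for the real clustering step), not the authors' implementation:

```python
# One SGD step where the dropout mask is replaced by a feature-grouping
# projection Phi: samples are projected to r grouped features, and the
# gradient is projected back to the original p-dimensional space.
import numpy as np

rng = np.random.default_rng(0)

def grouping_matrix(labels, p):
    """p x r matrix whose columns average the features in each group."""
    r = int(labels.max()) + 1
    phi = np.zeros((p, r))
    phi[np.arange(p), labels] = 1.0
    return phi / phi.sum(axis=0, keepdims=True)

p, r, k, n = 8, 3, 2, 5
X = rng.normal(size=(n, p))
y = rng.integers(0, k, size=n)
W, b = np.zeros((p, k)), np.zeros(k)

labels = np.array([0, 0, 1, 1, 1, 2, 2, 2])   # stand-in for cluster labels
phi = grouping_matrix(labels, p)

Z = X @ phi                                   # project samples: n x r
logits = Z @ (phi.T @ W) + b                  # reduced weights: r x k
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

grad = probs.copy()
grad[np.arange(n), y] -= 1.0                  # softmax cross-entropy gradient
W -= 0.1 * (phi @ (Z.T @ grad / n))           # project gradient back: p x k
b -= 0.1 * grad.mean(axis=0)                  # no projection for the bias
```

Note how `W` keeps its full p x k shape throughout; only the forward pass and the gradient travel through the reduced r-dimensional space.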
Dimensionality Reduction Method by Feature Grouping
[Hoyos-Idrobo 2016]
Recursive Nearest Agglomeration Clustering (ReNA)
[Hoyos-Idrobo 2016]
• Agglomerative clustering schemes start off by placing every data element in its own cluster.
• They proceed by repeatedly merging the closest pair of connected clusters until the desired number of clusters is reached.
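Agglomerative feature grouping of this kind can be tried with scikit-learn's `FeatureAgglomeration` (a related agglomerative scheme; ReNA itself adds graph-connectivity constraints and a recursive merge, and is not bundled with scikit-learn):

```python
# Group 64 features into 8 clusters and reduce each sample to its
# per-group means; inverse_transform gives the piecewise-constant
# approximation of the original features.
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 64))           # 40 samples, 64 features

agglo = FeatureAgglomeration(n_clusters=8)
Xr = agglo.fit_transform(X)             # 40 x 8: per-group feature means
X_back = agglo.inverse_transform(Xr)    # 40 x 64: grouped approximation
```

The `labels_` attribute of the fitted object plays the role of the cluster assignment used to build the projection matrices above.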
Insights: Random Reductions While Fitting
• Let x = x̄ + ε, where x̄ is the deterministic term and ε is the zero-mean noise term.
• The expected loss then decomposes into the loss on the smoothed input plus a regularization cost, which itself splits into the variance of the model given the smoothed input features and the variance of the estimated target due to the randomization.
Insights: Random Reductions While Fitting
• Regularization cost: for dropout, the projection is a random diagonal mask, so the induced penalty matrix is diagonal (and the remaining term is constant for linear regression).
• This is equivalent to ridge regression after “orthogonalizing” the features.
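The equations on this slide were images in the original; a standard result of this type (in the spirit of the dropout-as-adaptive-regularization analysis of Wager et al. 2013, with notation assumed) is that for linear regression with dropout rate $p$,

$$
R(\theta) \;=\; \frac{p}{2(1-p)}\; \theta^{\top} \operatorname{diag}\!\big(X^{\top}X\big)\, \theta,
$$

i.e. a ridge penalty on features rescaled by their second moments, which is ridge regression after “orthogonalizing” the features.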
Computational Complexity
[Table omitted in this export: per-method training cost as a function of the total number of epochs.]
Experimental Results: Olivetti Faces
• High-dimensional data with a small sample size
• Consists of grayscale 64 × 64 face images from 40 subjects
• For each subject, there are 10 different images with varying lighting
• Goal: identification of the individual whose picture was taken
Experimental Results: Olivetti Faces
• Visualization of the learned logistic-regression weights for a single Olivetti face with high noise, using different regularizers.
56
Experimental Results: Olivetti Faces
• Performance in terms of loss as a function of computation time for
MLP with a single layer using feature grouping and best parameters
for other regularizers, for Olivetti face data with high noise.
Experimental Results: Neuroimaging Data Set
• Openly accessible fMRI data set from the Human Connectome Project
• 500 subjects; 8 cognitive tasks to classify
• Feature dimension: 33,854; training set: 3,052 samples; test set: 791 samples
Summary – Stochastic Regularizer
• We introduced a stochastic regularizer based on feature averaging that captures the structure of the data.
• Our approach achieves higher accuracy in high-noise settings without additional computation time.
• The learned weights have more structure in high-noise settings.
Collaborators and References
• S. Aydore, B. Thirion, O. Grisel, G. Varoquaux. “Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data”, Women in Machine Learning Workshop, NeurIPS 2018, Montreal, Canada. arXiv preprint: 1807.11718.
• S. Aydore, L. Dicker, D. Foster. “A Local Regret in Nonconvex Online Learning”, Continual Learning Workshop, NeurIPS 2018, Montreal, Canada. arXiv preprint: 1811.05095.
Bertrand Thirion (INRIA, France)
Olivier Grisel (INRIA, France)
Gaël Varoquaux (INRIA, France)
Dean Foster (Amazon & University of Pennsylvania)
Lee Dicker (Amazon & Rutgers University)
Thank You
More on my website…
http://www.sergulaydore.com

 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Kürzlich hochgeladen (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy Data, by Sergül Aydöre, Assistant Professor at Stevens Institute of Technology

  • 1. Using Feature Grouping as a Stochastic Regularizer for High Dimensional Noisy Data Sergül Aydöre Assistant Professor Electrical and Computer Engineering Stevens Institute of Technology
  • 2. 2 Landscape of Machine Learning Applications https://research.hubspot.com/charts/simplified-ai-landscape​
  • 3. • Data is High Dimensional, Noisy and Sample Size is small as in NeuroImaging 3 But what if PET acquisition process wikipedia Implantation of intracranial electrodes. Cleveland Epilepsy Clinic An elastic EEG cap with 60 electrodes [Bai2012] A typical MEG equipment [BML2001] MRI Scanner and rs-fMRI time series acquisition [NVIDIA]
  • 4. 4 Other High Dimensional, Noisy Data and Small Sample Size Situations Genomics Integrative Genomics Viewer, 2012 Seismology https://www.mapnagroup.com Astronomy Astronomy Magazine, 2015
  • 5. 5 Challenges 1. High Dimensionality of the data due to rich temporal and spatial structure
  • 6. 6 Challenges 1. High Dimensionality of the data due to rich temporal and spatial structure 2. Noise in the data due to mechanical or physical artifacts.
  • 7. 7 Challenges 1. High Dimensionality of the data due to rich temporal and spatial structure 2. Noise in the data due to mechanical or physical artifacts. 3. Difficulty and cost of data collection
  • 8. 8 Overfitting • ML models with large number of parameters require large amount of data. Otherwise, overfitting can occur! http://scott.fortmann-roe.com/docs/MeasuringError.html
  • 9. 9 Regularization Methods to overcome Overfitting • Early Stopping [Yao, 2007] • Ridge Regression (ℓ2 regularization) [Hoerl 1970] • Least Absolute Shrinkage and Selection Operator (LASSO or ℓ1 regularization) [Tibshirani 1996] • Dropout [Srivastava 2014] • Group Lasso [Yuan 2006]
  • 10. Regularization Methods to overcome Overfitting • Early Stopping • Ridge Regression (ℓ2 regularization) • Least Absolute Shrinkage and Selection Operator (LASSO or ℓ1 regularization) • Dropout • Group Lasso SPARSITY
  • 11. Regularization Methods to overcome Overfitting • Early Stopping • Ridge Regression (ℓ2 regularization) • Least Absolute Shrinkage and Selection Operator (LASSO or ℓ1 regularization) • Dropout • Group Lasso STOCHASTICITY SPARSITY
  • 12. Regularization Methods to overcome Overfitting • Early Stopping • Ridge Regression (ℓ2 regularization) • Least Absolute Shrinkage and Selection Operator (LASSO or ℓ1 regularization) • Dropout • Group Lasso STOCHASTICITY STRUCTURE & SPARSITY 12 SPARSITY
  • 13. Regularization Methods to overcome Overfitting • Early Stopping • Ridge Regression (ℓ2 regularization) • Least Absolute Shrinkage and Selection Operator (LASSO or ℓ1 regularization) • Dropout • Group Lasso • PROPOSED: STRUCTURE & STOCHASTICITY STOCHASTICITY STRUCTURE & SPARSITY 13 SPARSITY
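The SPARSITY annotation on these slides can be made concrete: ℓ1 (LASSO) regularization drives some coefficients exactly to zero, while ℓ2 (ridge) only shrinks them toward zero. A minimal scikit-learn sketch, with illustrative data and penalty strengths:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 of 20 features are active
y = X @ w_true + 0.1 * rng.normal(size=50)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# LASSO sets many inactive coefficients exactly to zero; ridge does not.
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
```

Dropout and Group Lasso, annotated STOCHASTICITY and STRUCTURE on the slides, add randomness and group structure on top of this shrinkage behavior.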
  • 14. 14 Problem Setting: Supervised Learning • Training samples: drawn from • Parameters of the model are estimated by: Loss per sample
  • 15. 15 Multinomial Logistic Regression • The class label probability of a given input is: • Hence, the parameter space is • The loss per sample is:
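The class-label probability and per-sample loss on this slide (whose equations appear as images in the deck) can be sketched in NumPy: the softmax of a linear score gives the class probabilities, and the loss per sample is the negative log-likelihood of the true class. Dimensions here are illustrative:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll_per_sample(W, b, x, y):
    """Multinomial logistic regression loss -log p(y | x) for one sample."""
    p = softmax(W @ x + b)
    return -np.log(p[y])

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # 3 classes, 5 features (illustrative sizes)
b = np.zeros(3)
x = rng.normal(size=5)
loss = nll_per_sample(W, b, x, y=1)
```

The parameter space is thus the weight matrix W plus the bias vector b, exactly as on the slide.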
  • 16. 16 Dropout • Randomly removes units in the network during training. • Idea: Prevents units from co-adapting too much. • Attractive property: Can be used inside stochastic gradient descent without an additional computation cost. [Srivastava 2014]
  • 17. 17 Dropout • Randomly removes units in the network during training. • Idea: Prevents units from co-adapting too much. • Attractive property: Can be used inside stochastic gradient descent without an additional computation cost. [Srivastava 2014]
  • 18. 18 Dropout • Randomly removes units in the network during training. • Idea: Prevents units from co-adapting too much. • Attractive property: Can be used inside stochastic gradient descent without an additional computation cost. [Srivastava 2014]
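Dropout on the input features, as used on the following slides, amounts to multiplying each sample by a random binary mask during training. A minimal NumPy sketch with the common "inverted" scaling (keep probability and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
p_keep = 0.8
x = rng.normal(size=(4, 10))           # batch of 4 samples, 10 features

# training: sample a Bernoulli mask per entry; rescale kept entries by
# 1 / p_keep so that E[masked x] equals x
mask = rng.random(x.shape) < p_keep
x_dropped = np.where(mask, x / p_keep, 0.0)

# at test time, no mask is applied and the full x is used
```

Because the mask is sampled fresh at every step, this drops straight into stochastic gradient descent with no extra optimization cost, which is the attractive property noted on the slide.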
  • 19. 19 Feature Dropout Matrices Randomly picked matrix Dropout for Multinomial Logistic Regression
  • 20. 20 Feature Dropout Matrices Randomly picked matrix PERSON A PERSON B PERSON X PERSON Y PERSON Z Dropout for Multinomial Logistic Regression
  • 21. 21 Feature Dropout Matrices Randomly picked matrix PERSON A PERSON B PERSON X PERSON Y PERSON Z Forward Propagation Dropout for Multinomial Logistic Regression
  • 22. 22 Feature Dropout Matrices Randomly picked matrix PERSON A PERSON B PERSON X PERSON Y PERSON Z Forward Propagation Back Propagation Dropout for Multinomial Logistic Regression
  • 23. 23 Structured Projection Matrices PERSON A PERSON B PERSON X PERSON Y PERSON Z Forward Propagation Back Propagation Replace Masking with Structured Matrices Randomly picked matrix
  • 24. 24 Replace Masking with Structured Matrices
  • 25. 25 Replace Masking with Structured Matrices Each is generated from random samples (size r) with replacement from the training data set (size n).
  • 26. 26 Replace Masking with Structured Matrices
  • 27. 27 Replace Masking with Structured Matrices
  • 28. 28 Replace Masking with Structured Matrices
  • 29. 29 Replace Masking with Structured Matrices
  • 30. 30 Replace Masking with Structured Matrices We project the training samples onto a lower dimensional space by . Hence, weight matrix becomes: approximate x
  • 31. 31 Replace Masking with Structured Matrices To update , we project the gradients back to the original space
  • 32. 32 Replace Masking with Structured Matrices No projection is necessary for the bias term.
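The training scheme on slides 30–32 can be sketched as a single SGD step. This is a hedged reconstruction with illustrative dimensions: a randomly chosen grouping matrix Phi reduces the minibatch, the softmax loss is computed with the reduced weights W Φᵀ, the weight gradient is projected back to the original space, and the bias is updated without any projection:

```python
import numpy as np

def sgd_step_with_projection(W, b, X, y, Phi, lr=0.1):
    """One SGD step: reduce the batch with Phi, compute the softmax
    cross-entropy gradient in the reduced space, project it back."""
    Xr = X @ Phi.T                          # reduce features: (n, p) -> (n, k)
    Wr = W @ Phi.T                          # reduced weights: (c, p) -> (c, k)
    logits = Xr @ Wr.T + b
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    P[np.arange(len(y)), y] -= 1.0          # dLoss/dlogits for cross-entropy
    grad_Wr = P.T @ Xr / len(y)             # gradient in the reduced space
    W = W - lr * (grad_Wr @ Phi)            # project back: (c, k) @ (k, p)
    b = b - lr * P.mean(axis=0)             # bias needs no projection
    return W, b

rng = np.random.default_rng(0)
n, p, k, c = 32, 20, 5, 3                   # samples, features, clusters, classes
X = rng.normal(size=(n, p))
y = rng.integers(0, c, size=n)
# toy grouping matrix: 4 consecutive features per cluster, alpha = 1/sqrt(4)
Phi = np.kron(np.eye(k), np.ones((1, p // k))) / np.sqrt(p // k)
W, b = np.zeros((c, p)), np.zeros(c)
W, b = sgd_step_with_projection(W, b, X, y, Phi)
```

In the full method, a new Phi would be drawn at each iteration (as on slide 25), which is what makes the regularizer stochastic.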
  • 33. 33 Dimensionality Reduction Method by Feature Grouping Hoyos-Idrobo 2016
  • 34. 34 Dimensionality Reduction Method by Feature Grouping Hoyos-Idrobo 2016
  • 35. 35 Dimensionality Reduction Method by Feature Grouping Hoyos-Idrobo 2016
  • 36. 36 Recursive Nearest Agglomeration Clustering (ReNA) Hoyos-Idrobo 2016 • Agglomerative clustering schemes start off by placing every data element in its own cluster. • They proceed by merging repeatedly the closest pair of connected clusters until finding the desired number of clusters.
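ReNA itself is not in scikit-learn, but the agglomerative feature-grouping idea on this slide can be tried with scikit-learn's generic `FeatureAgglomeration`, which merges features into clusters and yields the same reduce/approximate interface; the cluster count and data here are illustrative:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))          # 40 samples, 100 features

# group the 100 features into 10 clusters by agglomerative clustering
agglo = FeatureAgglomeration(n_clusters=10)
Xr = agglo.fit_transform(X)             # reduced version: (40, 100) -> (40, 10)
X_approx = agglo.inverse_transform(Xr)  # piecewise-constant approximation
```

ReNA restricts merges to a sparsity graph (e.g., the spatial neighborhood of voxels), which is what makes it fast enough to re-cluster inside a training loop.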
  • 37. 37 Insights: Random Reductions While Fitting • Let where is the deterministic term and is the zero-mean noise term. Loss on the smoothed input Regularization Cost variance of the model given the smooth input features variance of the estimated target due to the randomization
  • 38. Insights: Random Reductions While Fitting • Regularization Cost: • For dropout, we have and is diagonal matrix where for . • This is equivalent to ridge regression after “orthogonalizing” the features. Constant for linear regression
  • 40. 40 Experimental Results: Olivetti Faces • High Dimensional Data and the sample size is small • Consists of grayscale 64 x 64 face images from 40 subjects • For each subject , there are 10 different images with varying light. • Goal: Identification of the individual whose picture was taken
  • 41. 41 Experimental Results: Olivetti Faces • High Dimensional Data and the sample size is small • Consists of grayscale 64 x 64 face images from 40 subjects • For each subject , there are 10 different images with varying light. • Goal: Identification of the individual whose picture was taken
  • 42. 42 Experimental Results: Olivetti Faces • High Dimensional Data and the sample size is small • Consists of grayscale 64 x 64 face images from 40 subjects • For each subject , there are 10 different images with varying light. • Goal: Identification of the individual whose picture was taken
  • 43. 43 Experimental Results: Olivetti Faces • High Dimensional Data and the sample size is small • Consists of grayscale 64 x 64 face images from 40 subjects • For each subject , there are 10 different images with varying light. • Goal: Identification of the individual whose picture was taken
  • 44. 44 Experimental Results: Olivetti Faces • High Dimensional Data and the sample size is small • Consists of grayscale 64 x 64 face images from 40 subjects • For each subject , there are 10 different images with varying light. • Goal: Identification of the individual whose picture was taken
  • 45. 45 Experimental Results: Olivetti Faces • High Dimensional Data and the sample size is small • Consists of grayscale 64 x 64 face images from 40 subjects • For each subject , there are 10 different images with varying light. • Goal: Identification of the individual whose picture was taken
  • 55. 55 Experimental Results: Olivetti Faces • Visualization of the learned weights for logistic regression for a single Olivetti face with high noise using different regularizers.
  • 56. 56 Experimental Results: Olivetti Faces • Performance in terms of loss as a function of computation time for MLP with a single layer using feature grouping and best parameters for other regularizers, for Olivetti face data with high noise.
  • 57. 57 Experimental Results: Neuroimaging Data Set • Openly accessible fMRI data set from Human Connectome Project • 500 subjects, 8 cognitive tasks to classify • Feature dimension: 33854, training set: 3052 samples, test set: 791 samples
  • 60. 60 Summary – Stochastic Regularizer • We introduced a stochastic regularizer based on feature averaging that captures the structure of data. • Our approach leads to higher accuracy at high noise settings without additional computation time. • Learned weights have more structure at high noise settings.
  • 61. 61 Collaborators and References • S. Aydore, B. Thirion, O. Grisel, G. Varoquaux. “Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data”, Women in Machine Learning Workshop, NeurIPS 2018, Montreal, Canada, 2018, accessible at arXiv preprint: 1807.11718. • S. Aydore, L. Dicker, D. Foster. “A Local Regret in Nonconvex Online Learning”, Continual Learning Workshop, NeurIPS 2018, Montreal, Canada, 2018, accessible at arXiv preprint: 1811.05095. Bertrand Thirion (INRIA, France) Olivier Grisel (INRIA, France) Gaël Varoquaux (INRIA, France) Dean Foster (Amazon & University of Pennsylvania) Lee Dicker (Amazon & Rutgers University)
  • 62. Thank You More on my website… http://www.sergulaydore.com

Editor's notes

  1. In the graphic below, the x-axis reflects the level of technical sophistication the AI tool has. The y-axis represents the mass appeal of the tool. Here is a landscape of popular machine learning applications. It is of course very exciting to see such progress in AI. But all these applications require massive amounts of data to train machine learning models.
  2. Some fields, such as brain imaging, often do not have such massive numbers of samples, whereas the dimension of the features is large due to the rich spatial and temporal information.
  3. This problem is not limited to brain imaging. There are other fields that also suffer from small-sample data situations.
  4. The performance of machine learning models is often evaluated by their prediction ability on unseen data. While each iteration of model training decreases the training risk, fitting the training data too well can lead to failure in generalization on future predictions. This phenomenon is often called “overfitting” in the field of machine learning. The risk of overfitting is more severe in high-dimensional, data-scarce situations. Such situations are common when data collection is expensive, as in neuroscience, biology, or geology.
  5. Feature grouping defines a matrix Φ that extracts piecewise-constant approximations of the data. Let Φ_FG be a matrix composed of constant-amplitude groups (clusters). Formally, the set of k clusters is given by P = {C_1, C_2, …, C_k}, where each cluster C_q ⊂ [p] contains a set of indexes that does not overlap the other clusters: C_q ∩ C_l = ∅ for all q ≠ l. Thus, (Φ_FG x)_q = α_q Σ_{j∈C_q} x_j yields a reduction of a data sample x on the q-th cluster, where α_q is a constant for each cluster. With an appropriate permutation of the indexes of the data x, the matrix Φ_FG can be written in block form. We call Φ_FG x ∈ ℝ^k the reduced version of x and Φ_FGᵀ Φ_FG x ∈ ℝ^p the approximation of x.
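The feature-grouping matrix described in this note can be built in a few lines of NumPy. The cluster layout is illustrative, and the choice α_q = 1/√|C_q| is one common convention (the note leaves α_q as a generic constant); it makes the rows of Φ_FG orthonormal:

```python
import numpy as np

p = 6
clusters = [[0, 1, 2], [3, 4], [5]]     # illustrative non-overlapping clusters

Phi = np.zeros((len(clusters), p))
for q, Cq in enumerate(clusters):
    # alpha_q = 1/sqrt(|C_q|) makes Phi @ Phi.T the identity
    Phi[q, Cq] = 1.0 / np.sqrt(len(Cq))

x = np.arange(p, dtype=float)
x_reduced = Phi @ x                     # reduced version in R^k
x_approx = Phi.T @ (Phi @ x)            # piecewise-constant approximation in R^p
```

The approximation is constant within each cluster, which is exactly the piecewise-constant structure the note describes.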