DEEP LEARNING &
APPLICATIONS IN
NON-COGNITIVE DOMAINS
5/12/16 1
Truyen Tran
Deakin University
truyen.tran@deakin.edu.au
prada-research.net/~truyen
Source: rdn consulting
AusDM’16, Canberra, Dec 7th 2016
PRADA @ DEAKIN, GEELONG CAMPUS
RESOURCES
• Tutorial page: http://prada-research.net/~truyen/AusDM16-tute.html
• Thanks to the many people who have created beautiful graphics & open-source programming frameworks.
GOOGLE TRENDS
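[Figure: Google Trends for “deep learning” + data and “deep learning” + intelligence, up to Dec 2016]
[Figure: timeline with the labels 1958 (Rosenblatt’s perceptron), 1986, 1988, 2006, 2012 and 2016, and the names Yann LeCun and Geoff Hinton. Image source: http://blog.refu.co/wp-content/uploads/2009/05/mlp.png]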
DEEP LEARNING IN COGNITIVE DOMAINS
http://media.npr.org/
http://cdn.cultofmac.com/
Where humans can recognise, act or answer accurately within seconds
blogs.technet.microsoft.com
A → B
(ANDREW NG, BAIDU)
dbta.com
DEEP LEARNING IN NON-COGNITIVE DOMAINS
• Where humans need extensive training to do well
• Domains that demand transparency & interpretability.
  - … healthcare
  - … security
  - … genetics, foods, water …
TEKsystems
AGENDA
• Part I: Theory
  - Introduction to (mostly supervised) deep learning
• Part II: Practice
  - Applying deep learning to non-cognitive domains
• Part III: Advanced topics
  - Position yourself
PART I: THEORY
INTRODUCTION TO (MOSTLY SUPERVISED) DEEP LEARNING
• Neural net as function approximation & feature detector
• Three architectures: FFN → RNN → CNN
• Bag of tricks: dropout → piece-wise linear units → skip-connections → adaptive stochastic gradient
PART II: PRACTICE
APPLYING DEEP LEARNING TO NON-COGNITIVE DOMAINS
• Hands-on:
  - Introducing programming frameworks (Theano, TensorFlow, Keras, MXNet)
• Domains how-to:
  - Healthcare
  - Software engineering
  - Anomaly detection
PART III: ADVANCED TOPICS
POSITION YOURSELF
• Unsupervised learning
• Relational data & structured outputs
• Memory & attention
• How to position ourselves
PART I: THEORY
andreykurenkov.com
http://blog.refu.co/wp-content/uploads/2009/05/mlp.png
WHAT IS DEEP LEARNING?
• Fast answer: multilayer perceptrons (aka deep neural networks) of the 1980s, rebranded in 2006.
  - But early nets got stuck at 1-2 hidden layers.
• Slow answer: distributed representation, multiple steps of computation, modelling the compositionality of the world, a better prior, advances in compute, data & optimization, neural architectures, etc.
POWERFUL ARCHITECTURE: RESNET
http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html
http://blog.refu.co/wp-content/uploads/2009/05/mlp.png
[Figure: ResNet (2015) compared with a multilayer perceptron (1986)]
WHY DEEP LEARNING?
• Because it works!
  - Mostly performance-driven
• But why does it work?
  - Theory under-developed
• Let’s examine learning principles first.
https://www.quora.com/What-is-the-state-of-the-art-today-on-ImageNet-classification-In-other-words-has-anybody-beaten-Deep-Residual-Learning
RECAP: THE BEST OF MACHINE LEARNING
• Strong/flexible priors:
  - Good features → Feature engineering
  - Data structure → HMM, CRF, MRF, Bayesian nets
  - Model structure, VC-dimension, regularisation, sparsity → SVM, compressed sensing
  - Manifold assumption, class/region separation → Metric + semi-supervised learning
  - Factors of variation → PCA, ICA, FA
• Uncertainty quantification: Bayesian, ensemble → RF, GBM
• Sharing statistical strength: model reuse → transfer learning, domain adaptation, multitask learning, lifelong learning
http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html
PRACTICAL REALISATION
• More data
• More GPUs
• Bigger models
• Better models
• Faster iterations
• Pushing the limit of patience (best models take 2-3 weeks to run)
• A LOT OF NEW TRICKS
Data as new fuel
STARTING POINT: FEATURE LEARNING
• In typical machine learning projects, 80-90% of the effort goes into feature engineering
  - With the right feature representation, little further work is needed: simple linear methods often work well.
• Text: BOW, n-gram, POS, topics, stemming, tf-idf, etc.
• SW: token, LOC, API calls, #loops, developer reputation, team complexity, report readability, discussion length, etc.
• Try it yourself on Kaggle.com!
THE BUILDING BLOCKS:
FEATURE DETECTOR
[Figure: integrate-and-fire neuron with signals and feedbacks (source: andreykurenkov.com), drawn as a feature detector and as a block representation]
THREE MAIN ARCHITECTURES: FFN, RNN & CNN
• Feed-forward (FFN): Function approximator (vector to vector)
  - Most existing ML/statistics methods fall into this category
• Recurrent (RNN): Sequence to sequence
  - Temporal, sequential. E.g., sentences, actions, DNA, EMR
  - Program evaluation/execution. E.g., sorting, travelling salesman problem
• Convolutional (CNN): Image to vector/sequence/image
  - In time: speech, DNA, sentences
  - In space: images, video, relations
FEED-FORWARD NET (FFN)
VEC2VEC MAPPING
Forward pass: f(x), which carries the units’ contributions to the outcome
Backward pass: f’(x), which assigns credit to the units
• Problems with depth:
  - Multiple chained non-linearity steps
  - → neural saturation (top units receive little signal from the bottom)
  - → vanishing gradient (bottom layers receive much less training information from the label).
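The vanishing-gradient problem can be illustrated with a minimal sketch. It assumes a chain of scalar logistic units with every weight set to 1.0 (an illustrative choice, not taken from the slides) and multiplies the local derivatives the way the backward pass would.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def chain_gradient(x, weights):
    """d(output)/d(input) for a chain of scalar units h <- sigmoid(w * h)."""
    h = x
    grad = 1.0
    for w in weights:
        h = sigmoid(w * h)
        # Chain rule: derivative of sigmoid(w * h_prev) w.r.t. h_prev.
        grad *= w * h * (1.0 - h)
    return grad


shallow = chain_gradient(0.5, [1.0] * 2)   # 2 chained non-linearities
deep = chain_gradient(0.5, [1.0] * 20)     # 20 chained non-linearities
print(shallow, deep)
```

Each logistic unit contributes a factor of at most 0.25 here, so the gradient reaching the bottom of the chain shrinks geometrically with depth.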
TWO VIEWS OF FFN
• The functional view: FFN as nested composition of functions
• The representation view: FFN as staged abstraction of data
(Bengio, DLSS 2015)
SOLVING PROBLEM OF VANISHING GRADIENTS
• Principle: Enlarging the channel through which features and gradients pass
• Method 1: Removing saturation with piecewise linear units
  - Rectified linear unit (ReLU)
• Method 2: Explicit copying through gating & skip-connection
  - Highway networks
  - Skip-connection: ResNet
METHOD 1: RECTIFIER LINEAR TRANSFORMATION
• The usual logistic and tanh transforms saturate for inputs of large magnitude
• There the gradient approaches zero, so learning stalls
• The rectified linear function has a constant gradient for positive inputs, which lets information flow much better
Source: https://imiloainf.wordpress.com/2013/11/06/rectifier-nonlinearities/
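The saturation argument can be checked numerically. The sketch below compares the derivatives of the logistic, tanh and rectifier functions at a few inputs; the input values are arbitrary choices for illustration.

```python
import math


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2


def relu_grad(z):
    # Constant gradient of 1 for positive inputs, 0 otherwise.
    return 1.0 if z > 0 else 0.0


for z in (0.5, 5.0, 20.0):
    print(z, sigmoid_grad(z), tanh_grad(z), relu_grad(z))
```

Note that the rectifier passes no gradient at all for negative inputs, so the constant-gradient property holds only on the active side.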
METHOD 2: SKIP-CONNECTION
WITH RESIDUAL NET
http://qiita.com/supersaiakujin/items/935bbc9610d0f87607e8
Theory
http://torch.ch/blog/2016/02/04/resnets.html
Practice
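A minimal sketch of why a skip-connection helps, assuming scalar tanh layers with small weights of 0.1 (both are illustrative assumptions, not taken from the slides). The residual form h + F(h) adds an identity path, so each layer contributes a factor of 1 + F'(h) to the gradient instead of F'(h) alone.

```python
import math


def plain_chain(x, weights):
    """Forward value and d(output)/d(input) for h <- tanh(w * h)."""
    h, grad = x, 1.0
    for w in weights:
        t = math.tanh(w * h)
        grad *= w * (1.0 - t * t)
        h = t
    return h, grad


def residual_chain(x, weights):
    """Same layers with an identity shortcut: h <- h + tanh(w * h)."""
    h, grad = x, 1.0
    for w in weights:
        t = math.tanh(w * h)
        grad *= 1.0 + w * (1.0 - t * t)  # identity path contributes the 1.0
        h = h + t
    return h, grad


weights = [0.1] * 30
_, plain_grad = plain_chain(0.5, weights)
_, residual_grad = residual_chain(0.5, weights)
print(plain_grad, residual_grad)
```

With these positive weights the plain chain's gradient collapses towards zero after 30 layers, while the residual chain's gradient never drops below 1.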
RECURRENT NET (RNN)
Example application: a language model, which essentially predicts the next word given the previous words in the sentence.
An embedding is often used to convert a word into a vector, which can be initialised from word2vec.
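The language-model setup can be sketched as a forward pass of a tiny recurrent net. Everything concrete here is a hypothetical choice for illustration: the four-word vocabulary, the layer sizes, the random (untrained) weights, and the names `E`, `W_xh`, `W_hh` and `W_hy`.

```python
import math
import random

random.seed(0)
vocab = ["<s>", "deep", "learning", "works"]   # toy vocabulary (assumption)
V, D, H = len(vocab), 3, 5                     # vocabulary, embedding, hidden sizes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


E = rand_matrix(V, D)       # embedding table; could be initialised from word2vec
W_xh = rand_matrix(H, D)    # input-to-hidden weights
W_hh = rand_matrix(H, H)    # hidden-to-hidden (recurrent) weights
W_hy = rand_matrix(V, H)    # hidden-to-output weights


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    top = max(z)
    exps = [math.exp(a - top) for a in z]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probs(words):
    """Read the words one by one and return a distribution over the next word."""
    h = [0.0] * H
    for word in words:
        x = E[vocab.index(word)]
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
        h = [math.tanh(a) for a in pre]
    return softmax(matvec(W_hy, h))


probs = next_word_probs(["<s>", "deep", "learning"])
print(dict(zip(vocab, probs)))
```

Because the weights are untrained, the predicted distribution is close to uniform; training would adjust all four matrices by backpropagation through time.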
RNN IS POWERFUL BUT …
• RNN is Turing-complete (Hava T. Siegelmann and Eduardo D. Sontag, 1991).
• The brain can be thought of as a giant RNN over discrete time steps.
• But training is very difficult for long sequences (more than 10 steps), due to:
  - Vanishing gradient
  - Exploding gradient
http://www.wildml.com/
SOLVING PROBLEM OF VANISHING/EXPLODING GRADIENTS
• Trick 1: modifying the gradient, e.g., truncation for exploding gradients (does not always work)
• Trick 2: changing the learning dynamics, e.g., adaptive gradient descent partly solves the problem (covered later)
• Trick 3: modifying the information flow:
  - Explicit copying through gating: Gated Recurrent Unit (GRU, 2014)
  - Explicit memory: Long Short-Term Memory (LSTM, 1997)
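Trick 1 can be sketched as gradient clipping by norm, which is one common form of truncation; whether the slides mean exactly this variant is an assumption. The function name `clip_by_norm` and the threshold are hypothetical.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


exploding = clip_by_norm([3.0, 4.0], max_norm=1.0)   # norm 5.0, gets rescaled
harmless = clip_by_norm([0.3, 0.4], max_norm=1.0)    # norm 0.5, left unchanged
print(exploding, harmless)
```

Clipping keeps the direction of the gradient and only limits the step length, which is why it addresses exploding gradients but does nothing for vanishing ones.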
LONG SHORT-TERM MEMORY (LSTM)
[Figure: LSTM cell showing the memory cell c_t, input gate i_t, forget gate f_t and the input]
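A minimal sketch of one LSTM step, assuming the commonly used variant with a forget gate and scalar state for readability. The parameter names (`w_i`, `u_i`, `b_i`, and so on) and the extreme gate biases are hypothetical values chosen to show the memory being held.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM; p maps parameter names to floats."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate i_t
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate f_t
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate input
    c = f * c_prev + i * g        # memory cell c_t: keep old content, add new
    h = o * math.tanh(c)
    return h, c


names = ("w_i", "u_i", "b_i", "w_f", "u_f", "b_f",
         "w_o", "u_o", "b_o", "w_g", "u_g", "b_g")
params = {name: 0.0 for name in names}
params["w_g"] = 1.0
params["b_f"] = 50.0    # forget gate saturated open: keep the memory
params["b_i"] = -50.0   # input gate closed: ignore new inputs

h, c = 0.0, 1.0
for x in [0.3, -1.2, 0.8] * 30:   # 90 time steps
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

With the forget gate open and the input gate closed, the cell copies its content unchanged across all 90 steps, which is the explicit-memory behaviour that a plain RNN struggles to learn.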
RNN: WHERE IT WORKS
Classification
Image captioning
Sentence classification & embedding
Neural machine translation
Sequence labelling
Source: http://karpathy.github.io/assets/rnn/diags.jpeg
LEARNABLE CONVOLUTION AS FEATURE DETECTOR
(TRANSLATION INVARIANCE)
http://colah.github.io/posts/2015-09-NN-Types-FP/
Learnable kernels
andreykurenkov.com
Feature detector,
often many
CNN IS (CONVOLUTION → POOLING) REPEATED
F(x) = NeuralNet(Pooling(Rectifier(Conv(x))))
where NeuralNet is the classifier, Pooling is max/mean, Rectifier is the nonlinearity and Conv is the feature detector.
Pooling(Rectifier(Conv(·))) can be repeated N times (depth).
adeshpande3.github.io
Design parameters:
•  Padding
•  Stride
•  #Filters (maps)
•  Filter size
•  Pooling size
•  #Layers
•  Activation function
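The Conv → Rectifier → Pooling pipeline and its design parameters can be sketched in one dimension. The kernel here is fixed by hand for illustration (in a CNN it would be learned), and the operation is the sliding dot product that deep learning frameworks call convolution.

```python
def conv1d(x, kernel, stride=1, padding=0):
    """Slide the kernel over x (with zero padding) and take dot products."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    k = len(kernel)
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(padded) - k + 1, stride)]


def relu(v):
    return [max(0.0, a) for a in v]


def max_pool(v, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]


signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
rising_edge = [-1.0, 1.0]            # hand-set filter of size 2 (assumption)

features = conv1d(signal, rising_edge, stride=1, padding=0)
pooled = max_pool(relu(features), size=2)
print(features, pooled)

# Output length: (n + 2 * padding - filter_size) // stride + 1
expected_len = (len(signal) + 2 * 0 - len(rising_edge)) // 1 + 1
```

The same filter fires at both rising edges in the signal, which is the translation-invariance property: one detector is reused at every position.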
CNN: WHERE IT WORKS
• Translation invariance: images, video, repeated motifs
• Examples:
  - Sentence – a sequence of words (used in conjunction with word embedding)
  - Sentence – a sequence of characters
  - DNA sequence over {A,C,T,G}
  - Relations (e.g., right-of, left-of, father-of, etc.)
http://www.nature.com/nbt/journal/v33/n8/full/nbt.3300.html
DeepBind, Nature 2015
CURRENT WORK: COMBINATIONS OF {FFN,RNN,CNN}
• Image classification: CNN + FFN
• Video modelling: CNN + RNN
• Image caption generation: CNN + RNN
• Sentence classification: CNN + FFN
• Sentence classification: RNN + FFN
• Regular shapes (chain, tree, grid): CNN | RNN
https://vincentweisen.wordpress.com/2016/05/30/ammai-lecture-14-deep-learning-methods-for-image-captioning/
PRACTICAL REALISATION
• More data
• More GPUs
• Bigger models
• Better models
• Faster iterations
• Pushing the limit of priors
• Pushing the limit of patience (best models take 2-3 weeks to run)
• A LOT OF NEW TRICKS
http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html
Data as new fuel
TWO ISSUES IN LEARNING
1. Slow learning and local traps
  - Very deep nets make gradients uninformative
  - Model uncertainty leads to rugged objective functions with exponentially many local minima
2. Data/model uncertainty and overfitting
  - Many models are possible
  - Models are currently very big, with hundreds of millions of parameters
  - Deeper is more powerful, but needs more parameters.
SOLVING ISSUE OF SLOW LEARNING AND LOCAL TRAPS
• Redesign the model, e.g., using skip-connections
• Sensible initialisation
• Adaptive stochastic gradient descent
SENSIBLE INITIALISATION
• Random Gaussian initialisation often works well
  - TIP: Control the fan-in/fan-out norms
• If not, use pre-training
  - When no existing models are available: unsupervised learning (e.g., word2vec, language models, autoencoders)
  - Transfer from other models, e.g., popular in vision with AlexNet, Inception, ResNet, etc.
Source: (Erhan et al, JMLR'10, Fig 6)
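One common way to control the fan-in/fan-out scale of a Gaussian initialisation is the Glorot (Xavier) scheme, which sets the variance to 2 / (fan_in + fan_out). Whether the tip above refers to this particular scheme is an assumption; the function name and layer sizes are hypothetical.

```python
import math
import random


def glorot_gaussian(fan_in, fan_out, rng):
    """Weight matrix of shape (fan_out, fan_in) with variance 2 / (fan_in + fan_out)."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


rng = random.Random(0)
W = glorot_gaussian(fan_in=300, fan_out=200, rng=rng)

flat = [w for row in W for w in row]
empirical_std = math.sqrt(sum(w * w for w in flat) / len(flat))
target_std = math.sqrt(2.0 / (300 + 200))
print(empirical_std, target_std)
```

Scaling by the layer widths keeps the magnitude of activations and gradients roughly stable from layer to layer, rather than growing or shrinking with width.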
STOCHASTIC GRADIENT DESCENT (SGD)
• Use mini-batches to smooth out the gradient
• Use a large enough learning rate to get over poor local minima
• Periodically reduce the learning rate to settle into a good local minimum
• It sounds like simulated annealing, but without a proof of reaching the global minimum
• Works well in practice since the energy landscape is a funnel
Source: https://charlesmartin14.wordpress.com/2015/03/25/why-does-deep-learning-work/
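The recipe above (mini-batches plus a periodically reduced learning rate) can be sketched on a toy problem. The linear model, the noise-free data and every hyperparameter value here are illustrative assumptions.

```python
import random


def sgd_linear_fit(data, epochs=150, batch_size=8, lr=0.1,
                   decay=0.5, decay_every=50, seed=0):
    """Fit y = w * x + b by mini-batch SGD on squared error with step decay."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for epoch in range(epochs):
        if epoch > 0 and epoch % decay_every == 0:
            lr *= decay                      # periodically reduce the learning rate
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradients averaged over the mini-batch smooth out per-example noise.
            grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy data from y = 3x + 1 on x in [-1, 1].
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_fit(data)
print(w, b)
```

This convex toy problem has a single minimum, so it only demonstrates the mechanics; the point about escaping poor local minima applies to the non-convex objectives of real networks.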
ADAPTIVE SGDS
• Problems with SGD
  - Poor gradient information, ill-conditioning, slow convergence rate
  - Scheduling the learning rate is an art
  - Pathological curvature
• Speeding up the search: exploiting the local search direction with momentum
• Rescaling the gradient with Adagrad
• A smoother version of Adagrad: RMSProp (usually good for RNNs)
• All tricks combined: Adam (usually good for most jobs)
Source: Alec Radford
[Figure: Adagrad update rule (Duchi et al, 2011), annotated with the initial learning rate, the gradient, the smoothing factor and the previous gradients]
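The three adaptive rules can be sketched as single-parameter update steps. These follow the standard textbook forms of Adagrad, RMSProp and Adam; the function names, the `state` dictionary and the default hyperparameters are illustrative assumptions rather than the slides' notation.

```python
import math


def adagrad_step(theta, grad, state, lr=0.1, eps=1e-8):
    # Accumulate all previous squared gradients; eps is the smoothing factor.
    state["sum_sq"] = state.get("sum_sq", 0.0) + grad * grad
    return theta - lr * grad / (math.sqrt(state["sum_sq"]) + eps)


def rmsprop_step(theta, grad, state, lr=0.01, rho=0.9, eps=1e-8):
    # Exponential moving average instead of a sum, so old gradients fade out.
    state["avg_sq"] = rho * state.get("avg_sq", 0.0) + (1.0 - rho) * grad * grad
    return theta - lr * grad / (math.sqrt(state["avg_sq"]) + eps)


def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum (first moment) plus RMSProp-style rescaling (second moment).
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = beta1 * state.get("m", 0.0) + (1.0 - beta1) * grad
    v = state["v"] = beta2 * state.get("v", 0.0) + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)   # bias correction for the zero initialisation
    v_hat = v / (1.0 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


# One step of each rule on f(theta) = theta^2 at theta = 1.0 (gradient 2.0).
print(adagrad_step(1.0, 2.0, {}), rmsprop_step(1.0, 2.0, {}), adam_step(1.0, 2.0, {}))
```

On the first step both Adagrad and Adam move by roughly the learning rate regardless of the gradient's magnitude, which shows the rescaling at work.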
SOLVING ISSUE OF DATA/MODEL UNCERTAINTY AND OVERFITTING
• Dropout as a fast ensemble/Bayesian method
• Max-norm: hidden units, channel and path
Dropout is known as the best trick of the past 10 years
DROPOUTS
• A method to build an ensemble of neural nets at zero cost
  - For each parameter update and each data point, randomly remove part of the hidden units
Source: (Srivastava et al, JMLR 2014)
DROPOUTS AS ENSEMBLE
• A method to build an ensemble of neural nets at zero cost
  - There are 2^K such options for K units
  - At the end, rescale the parameters by the same proportion
• Only one model is reported, but with the effect of n models, where n is the number of data points.
• Can be extended easily to:
  - Drop features
  - Drop connections
  - Drop any components
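A minimal sketch of dropout on one layer of hidden activations. It assumes each unit is kept independently with probability `keep_prob`, and that rescaling at the end is done by multiplying the activations by `keep_prob`, which has the same effect on the next layer as rescaling its incoming weights. The function names and values are hypothetical.

```python
import random


def dropout_train(h, keep_prob, rng):
    """Training time: randomly remove hidden units (set them to zero)."""
    return [a if rng.random() < keep_prob else 0.0 for a in h]


def dropout_test(h, keep_prob):
    """Test time: keep all units but rescale by the keep proportion."""
    return [a * keep_prob for a in h]


rng = random.Random(0)
h = [1.0, 2.0, -3.0, 0.5]
keep_prob = 0.5
n_masks = 2 ** len(h)     # number of distinct thinned networks for K = 4 units

# Averaging over many random masks approximates the rescaled test-time output.
samples = [dropout_train(h, keep_prob, rng) for _ in range(20000)]
mean_train = [sum(s[k] for s in samples) / len(samples) for k in range(len(h))]
expected = dropout_test(h, keep_prob)
print(n_masks, mean_train, expected)
```

The match between the sampled average and the rescaled output is what lets a single network stand in for the ensemble of thinned networks at prediction time.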
PART I: WRAPPING UP
andreykurenkov.com
TWO MAJOR VIEWS OF “DEPTH” IN DEEP LEARNING
• [2006-2012] Learning layered representations, from raw data to abstracted goal (DBN, DBM, SDAE, GSN).
  - Typically 2-3 layers.
  - High hopes for unsupervised learning. A conference was set up for this: ICLR, starting in 2013.
  - We will return to this in Part III.
• [1991-1997] & [2012-2016] Learning using multiple steps, from data to goal (LSTM/GRU, NTM/DNC, N2N Mem, HWN, CLN).
  - Reaches hundreds if not thousands of layers.
  - Learning as credit-assignment.
  - Supervised learning won.
  - Unsupervised learning took a detour (VAE, GAN, NADE/MADE).
WHEN DOES DEEP LEARNING WORK?
• Lots of data (e.g., millions)
• Strong, clean training signals (e.g., when humans can provide correct labels: cognitive domains).
  - Andrew Ng of Baidu: when humans do well in under a second.
• Data structures are well-defined (e.g., image, speech, NLP, video)
• Data is compositional (luckily, most data is like this)
• The more primitive (raw) the data, the greater the benefit of using deep learning.
IN CASE YOU FORGOT, DEEP LEARNING MODELS ARE COMBINATIONS OF
• Three models:
  - FFN (layered vectors)
  - RNN (recurrence for sequences)
  - CNN (translation invariance + pooling)
→ Architecture engineering!
• A bag of tricks:
  - dropout
  - piece-wise linear units (i.e., ReLU)
  - adaptive stochastic gradient descent
  - data augmentation
  - skip-connections
END OF PART I
5/12/16 52

Weitere ähnliche Inhalte

Was ist angesagt?

Empirical AI Research
Empirical AI Research Empirical AI Research
Empirical AI Research Deakin University
 
Machine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityMachine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityDeakin University
 
Deep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeakin University
 
Machine Learning and Reasoning for Drug Discovery
Machine Learning and Reasoning for Drug DiscoveryMachine Learning and Reasoning for Drug Discovery
Machine Learning and Reasoning for Drug DiscoveryDeakin University
 
From deep learning to deep reasoning
From deep learning to deep reasoningFrom deep learning to deep reasoning
From deep learning to deep reasoningDeakin University
 
AI/ML as an empirical science
AI/ML as an empirical scienceAI/ML as an empirical science
AI/ML as an empirical scienceDeakin University
 
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal clusterVenkat Projects
 
CI image processing
CI image processing CI image processing
CI image processing Meenakshi Sood
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning ExplainedMelanie Swan
 
Test PDF
Test PDFTest PDF
Test PDFAlgnuD
 
Deep learning ppt
Deep learning pptDeep learning ppt
Deep learning pptBalneSridevi
 
Bhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueBhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueVijayananda Mohire
 
Cs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative ModelCs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative ModelYanbin Kong
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingYanbin Kong
 
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATIONEXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATIONijaia
 
COM623M1.doc.doc
COM623M1.doc.docCOM623M1.doc.doc
COM623M1.doc.docbutest
 

Was ist angesagt? (20)

Empirical AI Research
Empirical AI Research Empirical AI Research
Empirical AI Research
 
Machine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityMachine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin University
 
Visual reasoning
Visual reasoningVisual reasoning
Visual reasoning
 
Deep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining I
 
Deep Learning 2.0
Deep Learning 2.0Deep Learning 2.0
Deep Learning 2.0
 
Machine Learning and Reasoning for Drug Discovery
Machine Learning and Reasoning for Drug DiscoveryMachine Learning and Reasoning for Drug Discovery
Machine Learning and Reasoning for Drug Discovery
 
From deep learning to deep reasoning
From deep learning to deep reasoningFrom deep learning to deep reasoning
From deep learning to deep reasoning
 
AI/ML as an empirical science
AI/ML as an empirical scienceAI/ML as an empirical science
AI/ML as an empirical science
 
5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster5.local community detection algorithm based on minimal cluster
5.local community detection algorithm based on minimal cluster
 
CI image processing
CI image processing CI image processing
CI image processing
 
Deep Learning Explained
Deep Learning ExplainedDeep Learning Explained
Deep Learning Explained
 
Test PDF
Test PDFTest PDF
Test PDF
 
Deep learning ppt
Deep learning pptDeep learning ppt
Deep learning ppt
 
Bhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueBhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogue
 
PointNet
PointNetPointNet
PointNet
 
Cs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative ModelCs231n 2017 lecture13 Generative Model
Cs231n 2017 lecture13 Generative Model
 
Cs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and UnderstandingCs231n 2017 lecture12 Visualizing and Understanding
Cs231n 2017 lecture12 Visualizing and Understanding
 
Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)
 
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATIONEXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATION
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATION
 
COM623M1.doc.doc
COM623M1.doc.docCOM623M1.doc.doc
COM623M1.doc.doc
 

Ähnlich wie Deep learning and applications in non-cognitive domains I

DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningBrodmann17
 
BRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning TalkBRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning TalkDoug Chang
 
PRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardPRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardLarry Smarr
 
Deep Learning and Automatic Differentiation from Theano to PyTorch
Deep Learning and Automatic Differentiation from Theano to PyTorchDeep Learning and Automatic Differentiation from Theano to PyTorch
Deep Learning and Automatic Differentiation from Theano to PyTorchinside-BigData.com
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep LearningBrahim HAMADICHAREF
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningRafael Ferreira da Silva
 
The Seven Main Challenges of an Early Warning System Architecture
The Seven Main Challenges of an Early Warning System ArchitectureThe Seven Main Challenges of an Early Warning System Architecture
The Seven Main Challenges of an Early Warning System Architecturestreamspotter
 
Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...jaliyae
 
The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?inside-BigData.com
 
Ph. D. Final Dissertation SLides
Ph. D. Final Dissertation SLidesPh. D. Final Dissertation SLides
Ph. D. Final Dissertation SLidesEmanuele Panigati
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningPramit Choudhary
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Big data classification based on improved parallel k-nearest neighbor
Big data classification based on improved parallel k-nearest neighborBig data classification based on improved parallel k-nearest neighbor
Big data classification based on improved parallel k-nearest neighborTELKOMNIKA JOURNAL
 
Effiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen DatenmengenEffiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen DatenmengenFlorian Stegmaier
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RPoo Kuan Hoong
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017SERC at Carleton College
 
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...IRJET Journal
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Universitat Politècnica de Catalunya
 

Ähnlich wie Deep learning and applications in non-cognitive domains I (20)

DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
BRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning TalkBRV CTO Summit Deep Learning Talk
BRV CTO Summit Deep Learning Talk
 
PRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path ForwardPRP, NRP, GRP & the Path Forward
PRP, NRP, GRP & the Path Forward
 
Deep Learning and Automatic Differentiation from Theano to PyTorch
Deep Learning and Automatic Differentiation from Theano to PyTorchDeep Learning and Automatic Differentiation from Theano to PyTorch
Deep Learning and Automatic Differentiation from Theano to PyTorch
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
The Seven Main Challenges of an Early Warning System Architecture
The Seven Main Challenges of an Early Warning System ArchitectureThe Seven Main Challenges of an Early Warning System Architecture
The Seven Main Challenges of an Early Warning System Architecture
 
Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...Architecture and Performance of Runtime Environments for Data Intensive Scala...
Architecture and Performance of Runtime Environments for Data Intensive Scala...
 
The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?The convergence of HPC and BigData: What does it mean for HPC sysadmins?
The convergence of HPC and BigData: What does it mean for HPC sysadmins?
 
Ph. D. Final Dissertation SLides
Ph. D. Final Dissertation SLidesPh. D. Final Dissertation SLides
Ph. D. Final Dissertation SLides
 
Model Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep LearningModel Evaluation in the land of Deep Learning
Model Evaluation in the land of Deep Learning
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
 
Big data classification based on improved parallel k-nearest neighbor
Big data classification based on improved parallel k-nearest neighborBig data classification based on improved parallel k-nearest neighbor
Big data classification based on improved parallel k-nearest neighbor
 
Effiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen DatenmengenEffiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen Datenmengen
 
Handwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with RHandwritten Recognition using Deep Learning with R
Handwritten Recognition using Deep Learning with R
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017Developing Computational Skills in the Sciences with Matlab Webinar 2017
Developing Computational Skills in the Sciences with Matlab Webinar 2017
 
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
IRJET - Factors Affecting Deployment of Deep Learning based Face Recognition ...
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
 

Mehr von Deakin University

Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Deep learning and reasoning: Recent advances
Deep learning and reasoning: Recent advancesDeep learning and reasoning: Recent advances
Deep learning and reasoning: Recent advancesDeakin University
 
AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...Deakin University
 
Deep analytics via learning to reason
Deep analytics via learning to reasonDeep analytics via learning to reason
Deep analytics via learning to reasonDeakin University
 
Generative AI to Accelerate Discovery of Materials
Generative AI to Accelerate Discovery of MaterialsGenerative AI to Accelerate Discovery of Materials
Generative AI to Accelerate Discovery of MaterialsDeakin University
 
Generative AI: Shifting the AI Landscape
Generative AI: Shifting the AI LandscapeGenerative AI: Shifting the AI Landscape
Generative AI: Shifting the AI LandscapeDeakin University
 
AI in the Covid-19 pandemic
AI in the Covid-19 pandemicAI in the Covid-19 pandemic
AI in the Covid-19 pandemicDeakin University
 
AI for tackling climate change
AI for tackling climate changeAI for tackling climate change
AI for tackling climate changeDeakin University
 
Deep learning for episodic interventional data
Deep learning for episodic interventional dataDeep learning for episodic interventional data
Deep learning for episodic interventional dataDeakin University
 
Deep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining IIDeep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining IIDeakin University
 
Deep learning for genomics: Present and future
Deep learning for genomics: Present and futureDeep learning for genomics: Present and future
Deep learning for genomics: Present and futureDeakin University
 
Deep learning for biomedicine
Deep learning for biomedicineDeep learning for biomedicine
Deep learning for biomedicineDeakin University
 


Deep learning and applications in non-cognitive domains I

  • 1. DEEP LEARNING & APPLICATIONS IN NON-COGNITIVE DOMAINS 5/12/16 1 Truyen Tran Deakin University truyen.tran@deakin.edu.au prada-research.net/~truyen Source: rdn consulting AusDM’16, Canberra, Dec 7th 2016
  • 2. PRADA @ DEAKIN, GEELONG CAMPUS
  • 3. RESOURCES: Tutorial page: http://prada-research.net/~truyen/AusDM16-tute.html. Thanks to the many people who have created beautiful graphics & open-source programming frameworks.
  • 4. GOOGLE TRENDS: “deep learning” + data vs. “deep learning” + intelligence, as of Dec 2016.
  • 6. DEEP LEARNING IN COGNITIVE DOMAINS: Where humans can recognise, act or answer accurately within seconds. (Images: http://media.npr.org/, http://cdn.cultofmac.com/, blogs.technet.microsoft.com)
  • 7. A → B (ANDREW NG, BAIDU)
  • 8. DEEP LEARNING IN NON-COGNITIVE DOMAINS: Where humans need extensive training to do well. Domains that demand transparency & interpretability: healthcare; security; genetics, foods, water … (Images: dbta.com, TEKsystems)
  • 9. AGENDA: Part I: Theory, an introduction to (mostly supervised) deep learning. Part II: Practice, applying deep learning to non-cognitive domains. Part III: Advanced topics, position yourself.
  • 10. PART I: THEORY. INTRODUCTION TO (MOSTLY SUPERVISED) DEEP LEARNING. Neural net as function approximation & feature detector. Three architectures: FFN → RNN → CNN. Bag of tricks: dropout → piece-wise linear units → skip-connections → adaptive stochastic gradient.
  • 11. PART II: PRACTICE. APPLYING DEEP LEARNING TO NON-COGNITIVE DOMAINS. Hands-on: introducing programming frameworks (Theano, TensorFlow, Keras, Mxnet). Domain how-tos: healthcare; software engineering; anomaly detection.
  • 12. PART III: ADVANCED TOPICS. POSITION YOURSELF. Unsupervised learning. Relational data & structured outputs. Memory & attention. How to position ourselves.
  • 13. PART I: THEORY (Image: andreykurenkov.com)
  • 14. WHAT IS DEEP LEARNING? Fast answer: multilayer perceptrons (aka deep neural networks) of the 1980s, rebranded in 2006. But early nets got stuck at 1-2 hidden layers. Slow answer: distributed representation, multiple steps of computation, modelling the compositionality of the world, a better prior, advances in compute, data & optimization, neural architectures, etc. (Image: http://blog.refu.co/wp-content/uploads/2009/05/mlp.png)
  • 15. POWERFUL ARCHITECTURE: RESNET. 2015: http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html; 1986: http://blog.refu.co/wp-content/uploads/2009/05/mlp.png
  • 16. WHY DEEP LEARNING? Because it works! Mostly performance-driven. But why does it work? The theory is under-developed. Let’s examine learning principles first. (Source: https://www.quora.com/What-is-the-state-of-the-art-today-on-ImageNet-classification-In-other-words-has-anybody-beaten-Deep-Residual-Learning)
  • 17. RECAP: THE BEST OF MACHINE LEARNING. Strong/flexible priors: good features → feature engineering; data structure → HMM, CRF, MRF, Bayesian nets; model structure, VC-dimension, regularisation, sparsity → SVM, compressed sensing; manifold assumption, class/region separation → metric + semi-supervised learning; factors of variation → PCA, ICA, FA. Uncertainty quantification: Bayesian, ensemble → RF, GBM. Sharing statistical strength: model reuse → transfer learning, domain adaptation, multitask learning, lifelong learning.
  • 18. PRACTICAL REALISATION: Data as the new fuel. More data; more GPUs; bigger models; better models; faster iterations; pushing the limit of patience (best models take 2-3 weeks to run); A LOT OF NEW TRICKS. (Image: http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html)
  • 19. STARTING POINT: FEATURE LEARNING. In typical machine learning projects, 80-90% of the effort goes into feature engineering. With the right feature representation, little further work is needed: simple linear methods often work well. Text features: BOW, n-gram, POS, topics, stemming, tf-idf, etc. Software (SW) features: token, LOC, API calls, #loops, developer reputation, team complexity, report readability, discussion length, etc. Try it yourself on Kaggle.com!
  • 20. THE BUILDING BLOCKS: FEATURE DETECTOR. Figure labels: integrate-and-fire neuron (signals, feedbacks); feature detector; block representation. (Image: andreykurenkov.com)
  • 21. THREE MAIN ARCHITECTURES: FFN, RNN & CNN. Feed-forward (FFN): function approximator (vector to vector); most existing ML/statistics falls into this category. Recurrent (RNN): sequence to sequence; temporal, sequential data, e.g., sentence, actions, DNA, EMR; program evaluation/execution, e.g., sort, traveling salesman problem. Convolutional (CNN): image to vector/sequence/image; in time: speech, DNA, sentences; in space: image, video, relations.
  • 22. FEED-FORWARD NET (FFN): VEC2VEC MAPPING. Forward pass: f(x), which carries the units’ contributions to the outcome. Backward pass: f’(x), which assigns credit to units. Problems with depth: multiple chained non-linearity steps → neural saturation (top units receive little signal from the bottom) → vanishing gradient (bottom layers receive much less training information from the label).
  • 23. TWO VIEWS OF FFN. The functional view: FFN as a nested composition of functions. The representation view: FFN as staged abstraction of data. (Bengio, DLSS 2015)
  • 24. SOLVING THE PROBLEM OF VANISHING GRADIENTS. Principle: enlarge the channel that passes features and gradients. Method 1: removing saturation with piecewise linear units, e.g., the rectified linear unit (ReLU). Method 2: explicit copying through gating & skip-connections, e.g., Highway networks and ResNet.
  • 25. METHOD 1: RECTIFIER LINEAR TRANSFORMATION. The usual logistic and tanh transforms saturate for large-magnitude inputs. There the gradient approaches zero, which stalls learning. The rectified linear function has a constant gradient for positive inputs, so information flows much better. Source: https://imiloainf.wordpress.com/2013/11/06/rectifier-nonlinearities/
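The saturation argument on the slide above can be checked numerically. The following is a minimal sketch, not taken from the deck: the function names are illustrative, and `chained_grad` ignores weights entirely, treating a deep net as a product of identical local derivatives.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of the logistic function; its maximum is 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_grad(z):
    # Derivative of tanh; its maximum is 1.0 at z = 0.
    return 1.0 - math.tanh(z) ** 2

def relu_grad(z):
    # Derivative of max(0, z): constant 1 for positive inputs, 0 otherwise.
    return 1.0 if z > 0 else 0.0

def chained_grad(grad_fn, z, depth):
    # Simplifying assumption: every layer sees the same pre-activation z
    # and weights are ignored, so the backward signal is a plain product.
    return grad_fn(z) ** depth

for z in (0.5, 2.0, 5.0):
    print(z, sigmoid_grad(z), tanh_grad(z), relu_grad(z))

print(chained_grad(sigmoid_grad, 0.0, 10))  # logistic: shrinks geometrically
print(chained_grad(relu_grad, 2.0, 10))     # ReLU on active units: unchanged
```

Even in the best case for the logistic unit (z = 0), ten chained layers scale the gradient by 0.25 to the power 10, while an active ReLU path passes it through unchanged.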
  • 26. METHOD 2: SKIP-CONNECTION WITH RESIDUAL NET. Theory: http://qiita.com/supersaiakujin/items/935bbc9610d0f87607e8. Practice: http://torch.ch/blog/2016/02/04/resnets.html
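As a rough illustration of why a skip-connection helps, here is a scalar residual block of the form y = x + F(x). This is a toy sketch under stated assumptions: the two-weight form of F and all names are hypothetical, and real residual blocks use convolutions and normalisation layers.

```python
def relu(z):
    return max(0.0, z)

def residual_block(x, w1, w2):
    # y = x + F(x), with a toy residual branch F(x) = w2 * relu(w1 * x).
    return x + w2 * relu(w1 * x)

def residual_block_grad(x, w1, w2):
    # dy/dx = 1 + dF/dx: the skip path always contributes the constant 1.
    active = 1.0 if w1 * x > 0 else 0.0
    return 1.0 + w2 * w1 * active

def stack(x, weights):
    # Apply a sequence of residual blocks, one (w1, w2) pair per block.
    for w1, w2 in weights:
        x = residual_block(x, w1, w2)
    return x

# With a zero residual branch, 50 stacked blocks are exactly the identity.
print(stack(2.0, [(0.0, 0.0)] * 50))
print(residual_block_grad(1.0, 0.5, 0.1))
```

Because the derivative of each block is 1 plus the derivative of the residual branch, the backward signal is not forced through a chain of small factors.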
  • 27. THREE MAIN ARCHITECTURES: FFN, RNN & CNN. Feed-forward (FFN): function approximator (vector to vector); most existing ML/statistics falls into this category. Recurrent (RNN): sequence to sequence; temporal, sequential data, e.g., sentence, actions, DNA, EMR; program evaluation/execution, e.g., sort, traveling salesman problem. Convolutional (CNN): image to vector/sequence/image; in time: speech, DNA, sentences; in space: image, video, relations.
  • 28. RECURRENT NET (RNN). Example application: a language model, which essentially predicts the next word given the previous words in the sentence. An embedding is often used to convert a word into a vector, and it can be initialised from word2vec.
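The language-model setup described above can be sketched as a single recurrent step: embed the current word, update the hidden state, and emit a distribution over the next word. This is an untrained toy with random weights; the vocabulary, the sizes and every name below are assumptions made for illustration, and in practice the embedding table `E` could be initialised from word2vec.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

vocab = ["<s>", "deep", "learning", "works"]  # hypothetical toy vocabulary
rng = random.Random(0)
E = rand_matrix(len(vocab), 3, rng)     # embedding: one 3-d vector per word
W_xh = rand_matrix(5, 3, rng)           # input-to-hidden weights
W_hh = rand_matrix(5, 5, rng)           # hidden-to-hidden (recurrent) weights
W_hy = rand_matrix(len(vocab), 5, rng)  # hidden-to-output weights

def rnn_step(h_prev, word_id):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1}); p_t = softmax(W_hy h_t)
    x = E[word_id]
    a = [p + q for p, q in zip(matvec(W_xh, x), matvec(W_hh, h_prev))]
    h = [math.tanh(v) for v in a]
    return h, softmax(matvec(W_hy, h))

def next_word_distributions(word_ids):
    h = [0.0] * 5
    out = []
    for w in word_ids:
        h, probs = rnn_step(h, w)
        out.append(probs)
    return out

dists = next_word_distributions([0, 1, 2])  # "<s> deep learning"
print(dists[-1])  # predicted distribution over the word after "learning"
```

Training would adjust the four weight matrices so that each distribution puts high probability on the word that actually follows.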
  • 29. RNN IS POWERFUL BUT … RNN is Turing-complete (Hava T. Siegelmann and Eduardo D. Sontag, 1991). The brain can be thought of as a giant RNN over discrete time steps. But training is very difficult for long sequences (more than 10 steps), due to vanishing gradients and exploding gradients. (Image: http://www.wildml.com/)
  • 30. SOLVING THE PROBLEM OF VANISHING/EXPLODING GRADIENTS. Trick 1: modifying the gradient, e.g., truncation for exploding gradients (does not always work). Trick 2: changing the learning dynamics, e.g., adaptive gradient descent partly solves the problem (covered later). Trick 3: modifying the information flow: explicit copying through gating, as in the Gated Recurrent Unit (GRU, 2014); explicit memory, as in Long Short-Term Memory (LSTM, 1997).
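Trick 1 above is often implemented as clipping by norm. The sketch below assumes that form of truncation, which the slide does not spell out; `clip_by_norm` and `max_norm` are illustrative names.

```python
import math

def clip_by_norm(grad, max_norm):
    # Rescale the gradient vector if its L2 norm exceeds max_norm.
    # The direction is preserved; only the length is truncated.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

print(clip_by_norm([3.0, 4.0], 1.0))  # norm 5 is rescaled down to norm 1
print(clip_by_norm([0.3, 0.4], 1.0))  # already small enough: unchanged
```

Clipping bounds the size of an update but does nothing for vanishing gradients, which is one reason the slide notes that it does not always work.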
  • 31. LONG SHORT-TERM MEMORY (LSTM). Figure labels: c_t, i_t, f_t, Input.
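For readers without the figure, the following scalar sketch shows one step of the commonly used LSTM formulation with input, forget and output gates. It assumes that the figure labels c_t, i_t and f_t denote the cell state, input gate and forget gate; the parameter dictionary and its key names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    # Gates take values in (0, 1) and decide what is written, kept and exposed.
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g  # additive memory update
    h = o * math.tanh(c)    # exposed hidden state
    return h, c

# Hypothetical parameters: forget gate held open, input gate held shut.
params = {k: 0.0 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                           "wo", "uo", "bo", "wg", "ug", "bg")}
params["bf"] = 20.0
params["bi"] = -20.0

h, c = 0.0, 1.0
for _ in range(100):
    h, c = lstm_step(0.5, h, c, params)
print(c)  # the cell state is carried almost unchanged across 100 steps
```

The additive cell update is what lets information, and its gradient, survive many time steps when the forget gate stays close to 1.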
  • 32. RNN: WHERE IT WORKS. Classification; image captioning; sentence classification & embedding; neural machine translation; sequence labelling. Source: http://karpathy.github.io/assets/rnn/diags.jpeg
  • 33. THREE MAIN ARCHITECTURES: FFN, RNN & CNN. Feed-forward (FFN): function approximator (vector to vector); most existing ML/statistics falls into this category. Recurrent (RNN): sequence to sequence; temporal, sequential data, e.g., sentence, actions, DNA, EMR; program evaluation/execution, e.g., sort, traveling salesman problem. Convolutional (CNN): image to vector/sequence/image; in time: speech, DNA, sentences; in space: image, video, relations.
  • 34. LEARNABLE CONVOLUTION AS FEATURE DETECTOR (TRANSLATION INVARIANCE). Learnable kernels (http://colah.github.io/posts/2015-09-NN-Types-FP/). Feature detectors, often many (andreykurenkov.com).
  • 35. CNN IS (CONVOLUTION → POOLING) REPEATED. F(x) = NeuralNet(Pooling(Rectifier(Conv(x)))), where NeuralNet is the classifier, Pooling is max/mean, Rectifier is the nonlinearity and Conv is the feature detector; the Conv → Rectifier → Pooling stage can be repeated N times (depth). Design parameters: padding, stride, #filters (maps), filter size, pooling size, #layers, activation function. (Image: adeshpande3.github.io)
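The Conv → Rectifier → Pooling stage above can be sketched in one dimension. This is a minimal illustration with a fixed, hand-picked kernel rather than a learned one; the function names and the toy signals are assumptions. As in most deep learning libraries, "convolution" here is really cross-correlation.

```python
def conv1d(x, kernel):
    # Valid cross-correlation, stride 1, no padding.
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

def max_pool1d(v, size):
    # Non-overlapping windows; a trailing partial window is dropped.
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, size)]

def feature_map(x, kernel, pool_size):
    # One stage of Pooling(Rectifier(Conv(x))).
    return max_pool1d(relu(conv1d(x, kernel)), pool_size)

edge = [-1.0, 1.0]  # hand-picked kernel that fires on a rising edge
x = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]

# Pooling over the whole map gives the same response wherever the edge occurs.
print(feature_map(x, edge, 7))
print(feature_map(shifted, edge, 7))
```

Sharing one kernel across all positions and then pooling is what yields the translation invariance mentioned on the previous slide.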
  • 36. CNN: WHERE IT WORKS. Translation invariance: image, video, repeated motifs. Examples: a sentence as a sequence of words (used in conjunction with word embedding); a sentence as a sequence of characters; a DNA sequence over {A,C,T,G}; relations (e.g., right-of, left-of, father-of, etc.). DeepBind, Nature 2015: http://www.nature.com/nbt/journal/v33/n8/full/nbt.3300.html
  • 37. CURRENT WORK: COMBINATIONS OF {FFN,RNN,CNN}. Image classification: CNN + FFN. Video modelling: CNN + RNN. Image caption generation: CNN + RNN. Sentence classification: CNN + FFN. Sentence classification: RNN + FFN. Regular shapes (chain, tree, grid): CNN | RNN. (Image: https://vincentweisen.wordpress.com/2016/05/30/ammai-lecture-14-deep-learning-methods-for-image-captioning/)
  • 38. PRACTICAL REALISATION: Data as the new fuel. More data; more GPUs; bigger models; better models; faster iterations; pushing the limit of priors; pushing the limit of patience (best models take 2-3 weeks to run); A LOT OF NEW TRICKS. (Image: http://felixlaumon.github.io/2015/01/08/kaggle-right-whale.html)
  • 39. TWO ISSUES IN LEARNING. 1. Slow learning and local traps: very deep nets make gradients uninformative; model uncertainty leads to rugged objective functions with exponentially many local minima. 2. Data/model uncertainty and overfitting: many models are possible; models are currently very big, with hundreds of millions of parameters; deeper is more powerful, but means more parameters.
  • 40. SOLVING THE ISSUE OF SLOW LEARNING AND LOCAL TRAPS. Redesign the model, e.g., using skip-connections. Sensible initialisation. Adaptive stochastic gradient descent.
  • 41. SENSIBLE INITIALISATION. Random Gaussian initialisation often works well. TIP: control the fan-in/fan-out norms. If not, use pre-training. When no existing models are available: unsupervised learning (e.g., word2vec, language models, autoencoders). Otherwise, transfer from other models, e.g., popular in vision with AlexNet, Inception, ResNet, etc. Source: (Erhan et al., JMLR'10, Fig 6)
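One widely used way to control fan-in/fan-out is Glorot (Xavier) initialisation, which draws weights from a zero-mean Gaussian with variance 2 / (fan_in + fan_out). Whether the tip above refers to exactly this scheme is an assumption; the function name is illustrative.

```python
import math
import random

def glorot_normal(fan_in, fan_out, rng):
    # Gaussian weights scaled so that activations and gradients keep a
    # comparable magnitude from layer to layer.
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

W = glorot_normal(100, 100, random.Random(0))  # a 100-by-100 layer
flat = [v for row in W for v in row]
mean = sum(flat) / len(flat)
std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
print(std)  # close to sqrt(2 / 200) = 0.1
```

Wider layers get proportionally smaller initial weights, which keeps the sum over many inputs from saturating the units at the start of training.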
  • 42. STOCHASTIC GRADIENT DESCENT (SGD). Use mini-batches to smooth out the gradient. Use a large enough learning rate to get over poor local minima. Periodically reduce the learning rate to land in a good local minimum. It sounds like simulated annealing, but without a proven guarantee of reaching the global minimum. Works well in practice since the energy landscape is a funnel. Source: https://charlesmartin14.wordpress.com/2015/03/25/why-does-deep-learning-work/
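The recipe above (mini-batches plus a learning rate that is reduced periodically) can be sketched on a toy problem. The example fits a noise-free line, which is convex, so it only illustrates the mechanics and not the escape from poor local minima; the function name, the step-decay schedule and all hyperparameter values are assumptions.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.2, batch_size=20, epochs=30,
                   decay_every=10, decay=0.5, seed=0):
    # Fit y = w * x + b by mini-batch SGD on the mean squared error.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for epoch in range(epochs):
        if epoch > 0 and epoch % decay_every == 0:
            lr *= decay  # periodically reduce the learning rate
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            gw = gb = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                gw += 2.0 * err * xs[i]
                gb += 2.0 * err
            # Averaging over the mini-batch smooths the gradient estimate.
            w -= lr * gw / len(batch)
            b -= lr * gb / len(batch)
    return w, b

data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + 1.0 for x in xs]  # noise-free targets, true w = 2 and b = 1
w, b = sgd_linear_fit(xs, ys)
print(w, b)
```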
  • 43. ADAPTIVE SGDS. Problems with SGD: poor gradient information, ill-conditioning, slow convergence rate; scheduling the learning rate is an art; pathological curvature. Speeding up the search: exploiting the local search direction with momentum. Rescaling the gradient with Adagrad. A smoother version of Adagrad: RMSProp (usually good for RNNs). All tricks combined: Adam (usually good for most jobs).
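As a sketch of how Adam combines a momentum term with RMSProp-style rescaling, here is one scalar update in the standard form with bias correction. The default hyperparameters below are the commonly quoted ones, and the `state` dictionary layout is an assumption made for illustration.

```python
import math

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    state["t"] += 1
    # Momentum: exponential moving average of the gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # RMSProp-style scale: exponential moving average of the squared gradient.
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for starting both averages at zero.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0}
x = 1.0
x = adam_step(x, 2.0 * x, state)  # gradient of f(x) = x**2 at x = 1
print(x)  # the first step has size close to lr, whatever the gradient scale
```

Because the step is the ratio of the two moving averages, its size depends little on the raw gradient magnitude, which reduces the need for hand-tuned learning-rate schedules.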
  • 44. Adagrad (Duchi et al., 2011). Equation labels: initial learning rate, gradient, smoothing factor, previous gradients. Source: Alec Radford
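The four labels on the slide above map onto the usual Adagrad update: each parameter's step is the initial learning rate times the gradient, divided by the square root of the accumulated squared previous gradients plus a small smoothing factor. The sketch below assumes this standard form (the exact placement of the smoothing factor varies between implementations), and the test function is hypothetical.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    # accum holds the running sum of squared gradients per parameter.
    new_accum = [a + g * g for a, g in zip(accum, grads)]
    new_params = [p - lr * g / (math.sqrt(a) + eps)
                  for p, g, a in zip(params, grads, new_accum)]
    return new_params, new_accum

def grad_f(p):
    # Gradient of the ill-conditioned bowl f(x, y) = x**2 + 100 * y**2.
    return [2.0 * p[0], 200.0 * p[1]]

params, accum = [1.0, 1.0], [0.0, 0.0]
params, accum = adagrad_step(params, grad_f(params), accum)
print(params)  # both coordinates move by about lr despite a 100x gradient gap
```

Per-parameter rescaling equalises progress along steep and shallow directions, which addresses the ill-conditioning problem listed for plain SGD.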
  • 45. SOLVING THE ISSUE OF DATA/MODEL UNCERTAINTY AND OVERFITTING. Dropout as a fast ensemble/Bayesian method. Max-norm: hidden units, channel and path. Dropout is known as the best trick of the past 10 years.
  • 46. DROPOUTS. A method to build an ensemble of neural nets at zero cost. For each parameter update and each data point, randomly remove a subset of the hidden units. Source: (Srivastava et al., JMLR 2014)
  • 47. DROPOUTS AS ENSEMBLE. A method to build an ensemble of neural nets at zero cost. There are 2^K such options for K units. At the end, scale the parameters by the same proportion. Only one model is reported, but with the effect of n models, where n is the number of data points. Can be extended easily to: dropping features; dropping connections; dropping any components.
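The ensemble view above can be sketched directly: sample a random mask during training, and at test time scale by the keep probability so that the single reported model matches the average over masks. The function names and the keep probability of 0.5 are assumptions; many libraries instead rescale during training ("inverted dropout"), which is equivalent in expectation.

```python
import itertools
import random

def dropout_train(h, keep_prob, rng):
    # Training: each hidden unit is kept with probability keep_prob.
    return [v if rng.random() < keep_prob else 0.0 for v in h]

def dropout_test(h, keep_prob):
    # Test: keep every unit and scale by the same proportion.
    return [v * keep_prob for v in h]

K = 3
masks = list(itertools.product([0, 1], repeat=K))
print(len(masks))  # 2**K possible thinned networks for K units

rng = random.Random(0)
n = 20000
avg = sum(dropout_train([2.0], 0.5, rng)[0] for _ in range(n)) / n
print(avg, dropout_test([2.0], 0.5)[0])  # mask average vs. scaled output
```

For a single linear unit the scaled output equals the average over all masks exactly; for deep nonlinear nets it is only an approximation of the ensemble average.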
  • 48. PART I: WRAPPING UP (Image: andreykurenkov.com)
  • 49. TWO MAJOR VIEWS OF “DEPTH” IN DEEP LEARNING. (1) [2006-2012] Learning layered representations, from raw data to abstracted goal (DBN, DBM, SDAE, GSN): typically 2-3 layers; high hopes for unsupervised learning, with a conference set up for this (ICLR, starting in 2013); we will return to this in Part III. (2) [1991-1997] & [2012-2016] Learning using multiple steps, from data to goal (LSTM/GRU, NTM/DNC, N2N Mem, HWN, CLN): reaches hundreds if not thousands of layers; learning as credit assignment; supervised learning won; unsupervised learning took a detour (VAE, GAN, NADE/MADE).
  • 50. WHEN DOES DEEP LEARNING WORK? Lots of data (e.g., millions of examples). Strong, clean training signals (e.g., when humans can provide correct labels, as in cognitive domains). Andrew Ng of Baidu: when humans can do the task well in under a second. Data structures are well-defined (e.g., image, speech, NLP, video). Data is compositional (luckily, most data is like this). The more primitive (raw) the data, the greater the benefit of using deep learning.
  • 51. IN CASE YOU FORGOT, DEEP LEARNING MODELS ARE COMBINATIONS OF: Three models: FFN (layered vectors); RNN (recurrence for sequences); CNN (translation invariance + pooling) → architecture engineering! A bag of tricks: dropout; piece-wise linear units (i.e., ReLU); adaptive stochastic gradient descent; data augmentation; skip-connections.
  • 52. END OF PART I