Strictly Confidential
15/11/2016
Applications of Deep Neural Networks
Dr. Z. Xing
Lead of Deep Learning Taskforce, Data Science & Analytics @ NIO USA, Inc
3200 N 1st St, San Jose, CA 95134
September 13th, 2017 @ Tsinghua University, Beijing, China
2
• The volume and heterogeneity of the data that we deal with nowadays have reached an unprecedented level of complexity and subtlety
Data “science”
• At the end of the day, data “science” all comes down to understanding an intricate representation of the data, so-called representation learning
A typical example of industrial-scale data
• ~ 40 M clicks per second
• ~ 2.5 M servers
• ~ 5.7 terawatt-hours annually ≈ 68 M AC units
• ~ 10-15 exabytes (10^18 bytes) ≈ 30 M personal laptops
• ...
3
• Computers/machines can help with the representation-learning task, but only to a certain extent
• “Conventional” machine learning approaches are limited in various ways; they struggle when data is fed to the system in its raw format, which is exactly how the human learning system processes data
• Feature extraction is a hand-crafted process that needs careful engineering, domain expertise, etc. It is difficult to generalize and to scale up with increasing data/model size
“Conventional” machine learning
• hand-crafted features
• manually created representations of the data
(classification figure, for illustration purposes only)
4
• Artificial intelligence (AI): “The science and engineering of making intelligent machines” (John McCarthy, 1955)
Artificial intelligence
5
• The infant concept of neurons/circuits originates from neuroscience, biophysics, and computational physics
Concept of neural network
• The electrical signals (voltage spikes) that the brain processes do NOT represent the external world at all; how neurons decode such signals is a complicated process, in two respects: (a) time dependencies (transient neuron functions), (b) the electrical functionality of each cell, i.e. activations
• ~10^11 neurons in the human brain, ~10^15 connections (hence “connectionist”)
soma/body
synapses
6
Concept of “deep” learning
• Representation learning is the key advantage: it allows processing data in its raw format, averting the need for hand-crafted features
• Multiple levels of representation of the data, multiple levels of abstraction. Accommodates a flexible rank of the latent space, which locally resembles Euclidean space
• Each level is often a non-linear module; aggregating multiple levels allows the system to learn complicated physics
• Higher/deeper levels amplify the components of the input that are relevant/crucial to the optimization goal while suppressing the less relevant parts
optimization goal
7
“Deep” against “shallow”
• We want our system to be selective for things that are relevant or important, while being invariant to things that are not, for example the orientation of the object, background color, and so on
• A “shallow” or even linear classifier can only carve the input space into over-simplified regions/hyper-planes
Wolf Samoyed
8
• Unsupervised learning, transfer learning (domain adaptation)
• Auto-encoder
• Variational Auto-encoder (VAE)
• Restricted Boltzmann machine
Different learning mechanisms
• Supervised learning
• The objective function measures an error (𝛿) between the system output and the desired target; the internal weights keep getting tuned to minimize this 𝛿, guided by gradients (see the sketch below)
• However, optimization happens at the level of the expected value over many training instances
• Also, the optimization goal is to match two patterns, not taking into account an overall strategic goal (winning a chess game, etc.)
• Stochastic gradient descent (SGD): L. Bottou, Stochastic Gradient Descent Tricks
• Diederik P. Kingma, Auto-Encoding Variational Bayes, arXiv:1312.6114
encoder
decoder
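As a minimal illustration of the supervised-learning loop described above, the following NumPy sketch fits a small linear model by stochastic gradient descent: the objective measures the error 𝛿 between output and target, and the internal weights are tuned along the negative gradient. All data and names here are hypothetical toy values, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: targets generated from a known linear rule plus noise.
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=256)

w = np.zeros(3)          # internal weights to be tuned
lr = 0.05                # learning rate

for epoch in range(20):
    for i in rng.permutation(len(X)):      # stochastic: one example at a time
        delta = X[i] @ w - y[i]            # error between output and desired target
        grad = delta * X[i]                # gradient of 0.5 * delta**2 w.r.t. w
        w -= lr * grad                     # follow the negative gradient

print(w)  # approaches true_w in expectation over many training instances
```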
9
Selectivity–invariance dilemma
• Symmetries in the data: many tasks are invariant to transformations of the data; for example, the recognition task is invariant to changes in pose, light, location, etc. (symmetries)
• The human brain can learn to recognize objects after seeing only a few examples (unsupervised), while most machine learning systems need huge amounts of labelled data (supervised)
• Factoring out the symmetries from the data, while retaining selectivity, is the key to building artificial intelligence that can compete with human intelligence
Classical learning theory focuses on supervised learning and
postulates that a suitable hypothesis space is given. In other
words, data representation and how to select and learn it, is
classically not considered to be part of the learning problem,
but rather as a prior information.
(figure: visual cortex)
• Fabio Anselmi, On Invariance and Selectivity in Representation Learning, arxiv 1503.05938
• Attempts at utilizing group theory and group averages have been made, on the theory side, to derive invariant representation learning
10
Local minima for large networks
• Numerical analyses in statistical physics, random matrix theory, and neural network theory show that local minima are rarely an issue for large networks
• The key difference is the dimensionality of the space; the proliferation of saddle points, rather than local minima, becomes the more relevant issue in solving high-dimensional problems
• Yann N. Dauphin, Yoshua Bengio et al. (2014), Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, arXiv:1406.2572
• Anna Choromanska et al., The Loss Surfaces of Multilayer Networks, arXiv:1412.0233
• Similar objective function values at the various saddle points
• Statistics of Critical Points of Gaussian Fields on Large-Dimensional Spaces, Bray and Dean (2007), Phys. Rev. Lett. 98, 150201
• Replica Symmetry Breaking Condition Exposed by Random Matrix Calculation of Landscape Complexity, Fyodorov and Williams (2007), etc.
Fully connected layer
• Makes no assumptions at all about the data features
• Does not preserve any invariance of the input feature map
• Expensive in terms of computation and memory consumption
• Multi-layer perceptron (MLP)
11
nonlinearities (figure label)
12
Backward propagation (BP)
• BP guides the computer to update its internal parameters by using the chain rule of derivatives (see the sketch below)
• The central problem that BP solves is to evaluate the influence of a parameter on a function whose computation involves multiple elementary steps (Lagrangian formalism)
(figure: Lagrange function = objective function + constraints (network dynamics); the Lagrange multiplier takes into account the backward dynamics)
Z. Xing, Measurement of the semileptonic CP violating asymmetry a_sl in B_s decays and the D_s - D_s production asymmetry in 7 TeV pp collisions, CERN-THESIS-2013-078, https://inspirehep.net/record/1296591?ln=en
• Y. LeCun, A Theoretical Framework for Back-Propagation, Proceedings of the 1988 Connectionist Models Summer School, p. 21-28, 1988
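A minimal sketch of the chain rule at work for a hypothetical two-layer network (NumPy only, arbitrary shapes): the influence of each weight matrix on the objective is evaluated by propagating the error backward through the elementary steps of the forward computation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 2))            # batch of inputs
t = rng.normal(size=(4, 1))            # desired targets

W1, W2 = rng.normal(size=(2, 3)), rng.normal(size=(3, 1))

# Forward pass: two elementary steps plus the objective function.
h = np.tanh(x @ W1)                    # hidden layer with nonlinearity
y = h @ W2                             # output layer
loss = 0.5 * np.mean((y - t) ** 2)     # objective function

# Backward pass: chain rule applied step by step.
dy = (y - t) / len(x)                  # dL/dy
dW2 = h.T @ dy                         # dL/dW2
dh = dy @ W2.T                         # dL/dh
dW1 = x.T @ (dh * (1 - h ** 2))        # dL/dW1, using tanh'(a) = 1 - tanh(a)^2

# dW1 and dW2 are the gradients a BP-based optimizer would use to update the weights.
```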
Convolutional layer
• Convolutional neural network
• 3-dimensional neurons
• local connectivity at each filter/kernel (local features of the data)
• weight-sharing between all neurons in the same layer, usually organized as a unit of kernel/filter to be convolved with the input volume (invariance of the data)
• input 3D neurons (depth D_i), output 3D neurons (depth of filter bank D_o)
• number of learnable weights = D_o × D_i × F × F (F: kernel size)
• output spatial size: N_o = (N_i − F + 2p) / s + 1 (p: padding, s: stride); see the numeric check below
• the kernels/filters essentially pick up latent features such as image brightness, contrast, RGB color, edges, etc.
13
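A quick numeric check of the two formulas above. The layer dimensions are illustrative values chosen to resemble an AlexNet-like first layer, not figures taken from the slides.

```python
def conv_layer_stats(N_i, D_i, D_o, F, p, s):
    """Spatial output size and learnable-weight count of a convolutional layer."""
    N_o = (N_i - F + 2 * p) // s + 1          # output width/height
    n_weights = D_o * D_i * F * F             # excluding biases
    return N_o, n_weights

# Hypothetical layer: 227x227x3 input, 96 filters of 11x11, no padding, stride 4.
print(conv_layer_stats(N_i=227, D_i=3, D_o=96, F=11, p=0, s=4))   # -> (55, 34848)
```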
Connection to neuro-science
• One route to developing your deep neural net architecture is inspiration from neuroscience, such as the human visual cortex
• Cross-channel information learning (cascaded 1x1 convolutions) is biologically inspired, because the human visual cortex has receptive fields (kernels) tuned to different orientations
- local groups of values are often highly correlated
- invariance to location, weight sharing
14
Charles F. Cadieu et al., Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition, PLOS Computational Biology, December 2014, Volume 10, Issue 12
Translational invariance
• Convolutional layers rely on translational invariance (convolution commutes with translation); see the sketch below
• local input regions
• only relative locations are taken into account
• f_ks (k: kernel size, s: stride) determines the layer type: convolutional, max pooling, or activation function
(figure annotation: translation operator)
15
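A small sketch of the commutation property in one dimension, with an arbitrary kernel and signal: shifting the input simply shifts the (valid-region) convolution output, which is the invariance the convolutional layer relies on.

```python
import numpy as np

kernel = np.array([1.0, -1.0, 2.0])
x = np.array([0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 0.0])

shifted_x = np.roll(x, 2)                       # translate the input by 2 samples

y = np.convolve(x, kernel, mode="valid")
y_shifted = np.convolve(shifted_x, kernel, mode="valid")

# Away from the boundaries, convolving the shifted input equals shifting the output.
print(np.allclose(y_shifted[2:], y[:-2]))       # True
```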
Recurrent architecture
• Recurrent network structures can be used to learn potential temporal correlations/structures in the data
• Once “unrolled” or “unfolded”, all layers share the same weights and the network can be viewed as a feedforward network, so it can be optimized using BP (BPTT, back-propagation through time); see the sketch below
• However, there is an exploding or vanishing gradient problem along the temporal axis
• Different formalisms and implementations of recurrent activations have been proposed (LSTM, fixed unit recurrent weights, GRU, etc.) to alleviate the issue, along with the gradient-clipping approach
(figure: the network unrolled over time steps t = 0, 1, 2 with inputs x_t, hidden states h_t, and outputs y_t; all recurrent edges share the same synaptic weights, and repeated multiplication by a recurrent weight w > 1 or w < 1 causes exploding or vanishing gradients)
16
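A minimal NumPy sketch of the unrolled recurrence, with hypothetical dimensions: every time step reuses the same weight matrices (the shared synaptic weights of the diagram), which is why the unfolded network can be trained with ordinary backpropagation (BPTT).

```python
import numpy as np

rng = np.random.default_rng(2)
W_xh = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden, shared across time
W_hh = rng.normal(scale=0.5, size=(8, 8))   # hidden -> hidden (the recurrent edge)
W_hy = rng.normal(scale=0.5, size=(8, 2))   # hidden -> output

xs = rng.normal(size=(5, 4))                # a sequence of 5 input vectors
h = np.zeros(8)                             # initial hidden state

ys = []
for x_t in xs:                              # the "unrolled" feedforward view
    h = np.tanh(x_t @ W_xh + h @ W_hh)      # same W_xh, W_hh at every time step
    ys.append(h @ W_hy)
```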
LSTM – long short-term memory
• Special treatment: memory cells
• Novel inclusion of multiplicative nodes; all edges into or out of these nodes have fixed unit weight, which used to be called the “constant error carousel” (see the sketch below)
• A. Graves, Generating Sequences With Recurrent Neural Networks, arXiv:1308.0850
• A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Studies in Computational Intelligence, Springer, 2012
(figure annotations: recurrence components, error flow, memory flushing; the fixed unit weight alleviates vanishing gradient problems)
17
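A single LSTM step written out explicitly as a sketch (hypothetical shapes, biases omitted for brevity): the cell state c is updated additively along the fixed-unit-weight path, which is the "constant error carousel" that alleviates vanishing gradients.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, U):
    """One LSTM time step. W: input weights, U: recurrent weights, one matrix per gate."""
    i = sigmoid(x @ W["i"] + h @ U["i"])        # input gate
    f = sigmoid(x @ W["f"] + h @ U["f"])        # forget gate (memory flushing)
    o = sigmoid(x @ W["o"] + h @ U["o"])        # output gate
    g = np.tanh(x @ W["g"] + h @ U["g"])        # candidate memory
    c = f * c + i * g                           # additive update along the carousel
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(3)
W = {k: rng.normal(scale=0.3, size=(4, 8)) for k in "ifog"}
U = {k: rng.normal(scale=0.3, size=(8, 8)) for k in "ifog"}
h, c = np.zeros(8), np.zeros(8)
for x_t in rng.normal(size=(6, 4)):             # run over a short toy sequence
    h, c = lstm_step(x_t, h, c, W, U)
```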
Sequence generation
• Recurrent networks can be used for sequence generation
• a man riding a wave on top of a surfboard . (p=0.040413)
• a person riding a surf board on a wave (p=0.017452)
• a man riding a wave on a surfboard in the ocean (p=0.005743)
(figure labels: training vs. testing/inference)
18
Gated Recurrent Unit (GRU)
• The GRU also utilizes gating units to regulate the temporal flow, but with a simple linear interpolation instead of a memory cell (see the sketch below)
• Kyunghyun Cho, On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, arXiv:1409.1259
(figure: LSTM vs. GRU cells, with reset and update gates acting on the previous states)
• Both LSTM and GRU utilize an additive component when updating the states, which keeps partial influence from the previous timestep
19
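For comparison, a single GRU step as a self-contained sketch (same hypothetical shapes as the LSTM sketch): the update gate z linearly interpolates between the previous state and the candidate state, which is the additive component mentioned above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, W, U):
    """One GRU time step: reset gate r, update gate z, no separate memory cell."""
    z = sigmoid(x @ W["z"] + h @ U["z"])               # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"])               # reset gate
    h_tilde = np.tanh(x @ W["h"] + (r * h) @ U["h"])   # candidate state
    return (1.0 - z) * h + z * h_tilde                 # linear interpolation of states

rng = np.random.default_rng(4)
W = {k: rng.normal(scale=0.3, size=(4, 8)) for k in "zrh"}
U = {k: rng.normal(scale=0.3, size=(8, 8)) for k in "zrh"}
h = np.zeros(8)
for x_t in rng.normal(size=(6, 4)):
    h = gru_step(x_t, h, W, U)
```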
Activation of hidden layers
• A neural network without any activation would simply be a linear regression model. Activation functions accommodate the sophisticated nonlinearities needed for data such as images, videos, speech, etc. (see the sketch below)
o Sigmoid function: saturation causes vanishing gradients, slow convergence, not zero-centered
o Tanh: vanishing gradient problem
o ReLU: avoids and rectifies the vanishing gradient, no need for input normalization, but can result in “dead” neurons
o “Leaky” ReLU, PReLU (parameterized ReLU)
o Human neuron activations can actually be a stochastic process
20
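The common hidden-layer activations listed above, written out in NumPy; the leaky-ReLU slope is an arbitrary illustrative value.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # saturates for large |x| -> vanishing gradients

def tanh(x):
    return np.tanh(x)                      # zero-centered, but still saturates

def relu(x):
    return np.maximum(0.0, x)              # no saturation for x > 0, but "dead" for x < 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # keeps a small gradient for x < 0

x = np.linspace(-5, 5, 11)
print(relu(x), leaky_relu(x), sep="\n")
```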
Normalization
• Local response normalization: normalize across neighboring kernels; lateral inhibition, competition for large activations across neurons computed by different kernels/filters
• Batch normalization: reduces internal covariate shift (ICS); see the sketch below
• “Whitening” the input feature map accelerates training speed and convergence
• But a simple normalization procedure may violate the identity transform, depending on the non-linear activation form
21
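A minimal sketch of batch normalization over a mini-batch (training-time statistics only; running averages for inference are omitted). The learnable scale gamma and shift beta are what let the layer recover the identity transform that plain whitening might otherwise violate.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then rescale and shift."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # "whitened" feature map
    return gamma * x_hat + beta             # learnable parameters keep expressiveness

rng = np.random.default_rng(5)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 8))        # mini-batch of 32, 8 features
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 per feature
```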
Pooling
• Summarizes across neighboring groups of neurons in the same kernel map to reduce computation and feature-map size
• Less over-fitting
• Aggregates localized spatial information (see the sketch below)
Alternatives:
• Maximum
• Sum
• Average
• Weighted average with distance from the center pixel
• Overlapped, non-overlapped
• …......
22
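A minimal non-overlapping 2x2 max-pooling sketch over a single feature map, illustrating how pooling summarizes neighboring activations and shrinks the map; the input values are arbitrary.

```python
import numpy as np

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling over one (H, W) feature map."""
    h, w = fmap.shape
    blocks = fmap[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))   # 2x2 output: [[5, 7], [13, 15]]
```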
Output layer
• Training a deep neural network is a highly non-convex optimization problem that we usually solve using convex methods
• “Softmax” function: the original motivation was to treat the outputs of the NN as probabilities conditioned on the inputs, normalized to unity (see the sketch below)
p(y = j | z^(i)) = exp(z_j^(i)) / Σ_k exp(z_k^(i))
Anders Øland, Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting, arXiv:1707.04199
• What the output layer generates is actually not a probability distribution, as commonly conjectured
• gradient boosting method: exponentiating the errors from the output layer, non-normalized
23
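A numerically stable version of the softmax formula above; subtracting the maximum logit does not change the result because the expression is shift-invariant.

```python
import numpy as np

def softmax(z):
    """p(y = j | z) = exp(z_j) / sum_k exp(z_k), computed stably."""
    z = z - z.max()               # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # components sum to 1.0
```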
Reduce overfitting
• Data augmentation, ...
• “Drop-out” treatment: randomly drops neurons to prevent overfitting; “re-scaling” is needed when making inference (see the sketch below)
• Nitish Srivastava et. al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (2014) 1929-1958
24
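A small dropout sketch. The slide describes the original convention (re-scale at inference); this sketch uses the equivalent "inverted" variant that re-scales by 1/(1 - p) during training instead, so inference is a no-op. The drop probability and shapes are illustrative.

```python
import numpy as np

def dropout(x, p_drop, training, rng):
    """Inverted dropout: scale at training time so inference needs no re-scaling."""
    if not training:
        return x                                    # identity at inference time
    mask = rng.random(x.shape) >= p_drop            # keep each neuron with prob 1 - p
    return x * mask / (1.0 - p_drop)                # re-scale the surviving activations

rng = np.random.default_rng(6)
h = np.ones(10)
print(dropout(h, p_drop=0.5, training=True, rng=rng))
```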
Image classification
• AlexNet, architecture contributions:
• ReLU: ~6 times faster than saturating approaches
• Local response normalization: ~2% increase in precision
• Reducing overfitting:
• data augmentation
• drop-out: a neuron cannot rely on the presence of particular other neurons and is thus forced to learn in a more robust manner
• A. Krizhevsky, ImageNet Classification with
Deep Convolutional Neural Networks, NIPS
2012
(figure labels: output layer, a 1000-way softmax; embedding vector)
25
Network in network
• GoogLeNet, Inception Net
• Key idea: how to go from dense to sparse to improve computational efficiency
• Local sparsity by using network-in-network
• “1x1 convolution”: dimensionality reduction in the rank of the latent feature manifold (cross-channel pooling layer); see the parameter count below
• Hebbian principle: neurons that fire together, wire together
• Christian Szegedy et al., Going deeper with convolutions, https://arxiv.org/pdf/1409.4842.pdf
• https://arxiv.org/pdf/1512.00567.pdf
• arXiv:1312.4400v3
(figure: NIN / Inception module enhances representational power; parallel branches of 1x1, 3x3, and 5x5 convolutions plus average pooling, with 1x1 reductions between D_i and D_o channels; output size N_o = (N_i − F + 2p) / s + 1)
• replacing one large kernel with several small ones saves weights: 9 × 9 = 81 > 5 × 5 + 3 × 3 + 1 × 1 = 35
26
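A quick parameter count showing why a 1x1 "bottleneck" before a 5x5 convolution reduces cost, in the spirit of the Inception module; the channel sizes are arbitrary illustrative values, not figures from the paper.

```python
def conv_params(d_in, d_out, f):
    """Learnable weights of a conv layer with f x f kernels, biases ignored."""
    return d_out * d_in * f * f

d_in, d_out, d_bottleneck = 256, 128, 32

naive = conv_params(d_in, d_out, 5)                                   # 5x5 applied directly
reduced = conv_params(d_in, d_bottleneck, 1) + conv_params(d_bottleneck, d_out, 5)

print(naive, reduced)   # 819200 vs. 110592: the 1x1 layer reduces the rank first
```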
Benchmark results
9/13/2017 Invited talk at Tsinghua University
27
Object detection
• The object detection task imposes an additional requirement on a classifier: localization of one or multiple objects
• The sliding-window approach (DPM, deformable part model) is computationally too expensive
• Region proposal methods add prior hypotheses on regions that are promising; however, they may involve multiple steps pipelined together (an RPN for objectness scores, a detection network, a classification network)
(figure label: re-purposed classifier)
28
You Only Look Once (YOLO)
• Labeling images for detection is far more expensive than labeling for classification or tagging
• Leveraging classification data expands the scope of current detection systems (transfer learning)
• YOLO is NOT a repurposed classifier
29
YOLO v2 improvements
• Batch norm: removes the need for other regularization such as drop-out
• Anchor-box concept; removes the fully connected layers
• Higher resolution for the classifier part, to better adapt to detection
anchor boxes increase recall, with only a small change in mAP
30
YOLO v2 results
• Results from the most recent YOLO paper
31
Semantic segmentation
• Approaches such as dilated convolutions are utilized to take into account the context in the picture (multi-scale receptive fields)
• ENet https://arxiv.org/pdf/1606.02147.pdf
• SegNet https://arxiv.org/pdf/1511.00561.pdf
• A “dense” prediction problem where per-pixel precision is required
• The context model is crucial in this application
• Typical “encoder-decoder” architecture: the network extracts deeper features while the feature map narrows down
32
Segmentation performances
• Metrics such as intersection over union (IoU) are used to measure segmentation performance (see the sketch below)
• Image quality tends to influence the results significantly
33
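Intersection over union for two binary segmentation masks, as a minimal sketch of the metric mentioned above; the masks are arbitrary toy examples.

```python
import numpy as np

def iou(pred, target):
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0      # define IoU = 1 when both masks are empty

pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True                           # predicted region
target = np.zeros((4, 4), dtype=bool)
target[1:3, 0:3] = True                         # ground-truth region
print(iou(pred, target))                        # 0.5: overlap of 4 pixels, union of 8
```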
Three-dimensional data
• 3D segmentation: no “voxelization” or cross-sectional rendering needed, even on unstructured data
• Permutation invariance; learning a transformation matrix of the point cloud
https://arxiv.org/pdf/1704.03847.pdf
https://arxiv.org/pdf/1612.00593.pdf
combining the global and the local per-point embeddings
34
Audio and natural language processing (NLP)
• Audio signals can be represented in a localized format, either in the temporal or the frequency/spectral domain
• Z. Xing et al., Big Data (Big Data), 2016 IEEE International Conference
• Z. Xing et al., https://arxiv.org/pdf/1705.05229.pdf
• Text/words can also be embedded, the so-called “word vectors”
35
music embedding
Generative adversarial network (GAN)
• While discriminative models have enjoyed great success, generative models have had less impact due to difficulties with intractable probabilistic computations (MLE). The “two-player” min-max approach sidesteps this problem (see the sketch below).
• Ian J. Goodfellow et al., Generative Adversarial Nets, arXiv:1406.2661v1
• Phillip Isola, Image-to-Image Translation with Conditional Adversarial Networks, arXiv:1611.07004v1
(figure: discriminator D vs. generator G playing a min-max game; here z is some random noise)
36
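A sketch of the two-player value function evaluated on one batch. The discriminator outputs here are hypothetical numbers standing in for D(x) and D(G(z)); in the min-max game, D ascends this objective while G descends it.

```python
import numpy as np

def gan_value(d_real, d_fake, eps=1e-8):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] estimated over a batch."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# Hypothetical discriminator outputs on a batch of real and generated samples.
d_real = np.array([0.9, 0.8, 0.95])     # D(x): the discriminator pushes these toward 1
d_fake = np.array([0.2, 0.1, 0.3])      # D(G(z)): D pushes toward 0, G pushes toward 1
print(gan_value(d_real, d_fake))
```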
Reinforcement learning concept
• The environment-representation learning framework naturally follows human/animal learning processes (“agent”-“environment” nomenclature). The agent’s actions depend on the state, and may or may not change the future environment
• Deep neural networks re-enable R.L. by learning complex data representations, without any hand-crafted feature extraction
• The agent’s state-to-action mapping is depicted by a policy function, which can be stochastic as well
• Volodymyr Mnih et. al. Human-level control through deep reinforcement learning, nature14236, 2015
37
Formalisms
• Value-based approaches such as the Deep Q-Network (DQN); see the tabular sketch below:
• Learn a value function; implicit policy function (ε-greedy)
• “Experience replay” is utilized to remove the correlations that cause divergence problems in R.L.
• Solely in the context of the MDP assumption
• Policy-based approaches such as the policy gradient method:
• No value function; learn the policy directly
• High-variance issue
• MDP not necessarily assumed
• Actor-Critic
(figure: agent-environment loop with state, action, rewards, policy)
38
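A tabular sketch of the value-based idea behind DQN (ε-greedy action selection and a bootstrapped target), omitting the neural network and experience replay for brevity. The toy environment, its states, and all constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.3

def step(s, a):
    """Hypothetical chain environment: moving right (a=1) toward the last state pays off."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if (a == 1 and s_next == n_states - 1) else 0.0
    return s_next, reward

s = 0
for _ in range(10000):
    # ε-greedy: mostly exploit the current value estimates, sometimes explore.
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    target = r + gamma * Q[s_next].max()            # bootstrapped TD target
    Q[s, a] += alpha * (target - Q[s, a])           # move the value toward the target
    s = s_next

print(Q)   # action 1 (move right) should end up with the larger value in every state
```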
Applications
• Navigating through an intersection under complicated environments
• David Isele, Navigating Intersections with Autonomous Vehicles using Deep Reinforcement Learning, arXiv:1705.01196
• Motion negotiations between “agents” under a dynamically changing environment
• Shai Shalev-Shwartz et. al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, arXiv:1610.03295v1
39
40
Summary and future research
• Unsupervised learning
• humans learn about the world more naturally by discovering, rather than through supervision
• Convolutional networks combined with recurrence to take temporal correlations into account, and thus make predictions in a dynamic fashion
• Reinforcement learning to pre-guide the learning toward the “ROI” (region of interest)
(figure: data representation learning enabling complex reasoning)
Backup slides
41
Thank You