4. ILSVRC2012
Challenge: identify main objects present in images (from 1000 object
categories)
Training data: 1.2 million labelled images
October 13, 2012: results released
Winner: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton
(University of Toronto)
Score: top-5 test error rate of 15.3%, compared to 26.2% achieved
by second-best entry
6. ILSVRC2012
AlexNet:
deep ConvNet trained on raw RGB pixel values
60 million parameters and 650,000 neurons
5 convolutional layers (some followed by max-pooling layers) and
3 globally-connected layers
a final 1000-way softmax
trained on two NVIDIA GPUs for about a week
use of dropout in the globally-connected layers
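As a rough illustration only (PyTorch/torchvision are not among the libraries discussed later in these slides), the torchvision re-implementation of this architecture, a close single-GPU descendant of the 2012 model with slightly different layer sizes, lands in the same parameter range:

import torch
from torchvision import models

# torchvision's AlexNet: 5 conv layers (some followed by max-pooling),
# 3 fully-connected layers with dropout, and a 1000-way output
net = models.alexnet(num_classes=1000)
n_params = sum(p.numel() for p in net.parameters())
print(f"{n_params / 1e6:.1f}M parameters")   # roughly the 60 million quoted above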
8. Radical Change
Letter from Yann LeCun to editor of CVPR 2012:
Getting papers about feature learning accepted at vision
conferences has always been a struggle, and I’ve had more than
my share of bad reviews over the years. I was very sure that
this paper was going to get good reviews because:
it uses no hand-crafted features (it’s all learned all the way
through. Incredibly, this was seen as a negative point by
the reviewers!);
it beats all published results on 3 standard datasets for
scene parsing;
it’s an order of magnitude faster than the competing
methods.
If that is not enough to get good reviews, I just don’t know
what is. So, I’m giving up on submitting to computer vision
conferences altogether. (...) Submitting our papers is just a
waste of everyone’s time (and incredibly demoralizing to my lab
members).
9. Revolution?
History:
1980: introduction of ConvNets by Fukushima
late 1980s: further development by LeCun and
collaborators @ Bell Labs
late 1990s: LeNet-5 was reading about 20% of written checks in the U.S.
Breakthrough due to:
persistence of academic researchers
improved algorithms
increase in computing power
increase in amount of data
dissemination of knowledge
http://www.elitestreetsmagazine.com/magazine/2008/jan-mar/art.php
10. Neural Networks
1943 McCulloch and Pitts proposed first artificial neuron:
computes weighted sum of its binary input signals, $x_i \in \{0, 1\}$:
$y = \theta\left(\sum_{i=1}^{n} w_i x_i - u\right)$
1957 Rosenblatt developed a learning algorithm: the perceptron
(for linearly separable data only)
K Jain, J Mao, KM Mohiuddin - IEEE Computer, 1996
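A minimal NumPy sketch of the perceptron learning rule (toy data, labels taken as ±1, and learning rate chosen only for illustration):

import numpy as np

def perceptron(X, y, epochs=100, eta=1.0):
    """Rosenblatt's rule: whenever a pattern is misclassified, move the
    decision boundary towards it. Converges only for linearly separable data."""
    w = np.zeros(X.shape[1])
    u = 0.0                                 # threshold term, as in the neuron above
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w - u) <= 0:      # misclassified pattern
                w += eta * yi * xi
                u -= eta * yi
    return w, u

# toy usage: learn a logical AND of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, u = perceptron(X, y)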
12. Feed-Forward Neural Networks
neurons arranged in layers
neurons propagate signals only forward
input of the $j$th neuron in layer $l$:
$x^{l}_{j} = \sum_{i} w^{l}_{ji}\, y^{l-1}_{i}$
output:
$y^{l}_{j} = h\left(x^{l}_{j}\right)$
K Jain, J Mao, KM Mohiuddin - IEEE Computer, 1996; commons.wikimedia.org
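The same two equations in NumPy, for a hypothetical 3-4-2 network with a sigmoid activation $h$ (sizes and inputs are placeholders):

import numpy as np

def h(x):                                   # sigmoid activation
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 3))     # w^1_ji: 3 inputs -> 4 hidden units
W2 = rng.normal(scale=0.1, size=(2, 4))     # w^2_ji: 4 hidden -> 2 output units

y0 = np.array([0.2, -0.5, 1.0])             # activations of the input layer, y^0
x1 = W1 @ y0                                # x^1_j = sum_i w^1_ji y^0_i
y1 = h(x1)                                  # y^1_j = h(x^1_j)
x2 = W2 @ y1                                # signals propagate only forward
y2 = h(x2)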
13. Backpropagation
Paul Werbos (1974):
1. initialize weights to small random values
2. choose input pattern
3. propagate signal forward through network
4. determine error (E) and propagate it backwards through network to
assign credit/blame to each unit
5. update weights by means of gradient descent:
$\Delta w_{ji} = -\eta\, \frac{\partial E}{\partial w_{ji}}$
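A minimal NumPy sketch of one pass through steps 1-5 for a small sigmoid network with squared error (layer sizes, input, target, and learning rate are arbitrary placeholders):

import numpy as np

def h(x):                                    # sigmoid, with h'(x) = h(x)(1 - h(x))
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 3))      # 1. small random initial weights
W2 = rng.normal(scale=0.1, size=(2, 4))
eta = 0.5

y0 = np.array([0.2, -0.5, 1.0])              # 2. input pattern
t = np.array([1.0, 0.0])                     #    target output

y1 = h(W1 @ y0)                              # 3. forward pass
y2 = h(W2 @ y1)

E = 0.5 * np.sum((y2 - t) ** 2)              # 4. error, then backward pass
delta2 = (y2 - t) * y2 * (1 - y2)            #    dE/dx^2_j (credit/blame, output units)
delta1 = (W2.T @ delta2) * y1 * (1 - y1)     #    dE/dx^1_j (credit/blame, hidden units)

W2 -= eta * np.outer(delta2, y1)             # 5. gradient descent: Δw_ji = -η ∂E/∂w_ji
W1 -= eta * np.outer(delta1, y0)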
14. ConvNets
Feed-forward nets w/:
local receptive field
shared weights
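Both ideas in one NumPy sketch: each output unit only sees a small patch of the input (local receptive field), and every position reuses the same 3x3 kernel (shared weights). The input and kernel here are random placeholders:

import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (strictly, cross-correlation) with a single kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # local receptive field: only a kh x kw patch feeds this output unit
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))            # toy grey-scale input
kernel = rng.normal(size=(3, 3))      # one set of shared weights
feature_map = conv2d(image, kernel)   # shape (6, 6)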
Applications:
character recognition
face recognition
medical diagnosis
self-driving cars
object recognition (e.g. birds)
15. Race to bring Deep Learning to the Masses
Major players:
Google
Facebook
Baidu
Microsoft
Nvidia
Apple
Amazon
LeCun @ Facebook
http://www.popsci.com/facebook-ai
16. Fooling ConvNets
fix trained network
carry out backprop using wrong class label
update input pixels by a small step along the input gradient (sketched below)
Goodfellow, Shlens, and Szegedy, ICLR 2015
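A hypothetical PyTorch sketch of this procedure; the pretrained network, the "image", the wrong class index, and the step size are all stand-ins, and the sign-of-gradient step follows Goodfellow et al.:

import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()   # any fixed, trained classifier

x = torch.rand(1, 3, 224, 224, requires_grad=True)   # stand-in for a correctly classified image
wrong_label = torch.tensor([42])                      # a class the image does NOT belong to

loss = F.cross_entropy(model(x), wrong_label)         # backprop w.r.t. the *wrong* label
loss.backward()

eps = 0.01                                            # small pixel-level step
x_fooling = (x - eps * x.grad.sign()).clamp(0, 1).detach()
# x_fooling looks (almost) unchanged to a human but is pushed towards class 42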
17. Dreaming ConvNets
fix trained network
initialize input with the average image of some class
carry out backprop using that class’ label
update input pixels (sketched below)
Simonyan, Vedaldi, and Zisserman, arXiv:1312.6034
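A hypothetical PyTorch sketch of the same idea; the network, class index, step count, and learning rate are placeholders, and a grey canvas stands in for the class-average image used on the slide:

import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()   # fixed, trained classifier
target = 130                                              # placeholder class index

x = torch.full((1, 3, 224, 224), 0.5, requires_grad=True)  # stand-in for the average image
opt = torch.optim.SGD([x], lr=1.0)

for _ in range(200):
    opt.zero_grad()
    score = model(x)[0, target]      # unnormalised score of the target class
    (-score).backward()              # backprop using that class' label ...
    opt.step()                       # ... and update only the input pixels
# x now shows what the network "dreams" the target class looks like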
18. ConvNets for NLP Tasks
2008:
Case study: sentiment analysis (classification)
Rationale: key phrases that are indicative of class membership can
appear anywhere in a document
19. Applications
almost every image of her in meetings and summits posted by
Mrs Merkel’s office has attracted comments in Russian
criticising her and her policies.
Staff in Mrs Merkel’s office have
been deleting comments but some
remain despite the purge.
FAZ 07.06.2015:
Merkel’s social-media team, whose number of staff members has not
been disclosed, was hopelessly overwhelmed.
20. Pre-trained Word Vectors
Word embeddings:
dense vectors (w/ dimension d of order 100)
derived from word co-occurrences: a word is characterized by the
company it keeps (Firth, 1957)
GloVe [Pennington, Socher, and Manning (2014)]:
log-bilinear regression model
learns word vectors, such that:
$\log(X_{ij}) = w_i^{T}\tilde{w}_j + b_i + \tilde{b}_j$
$X_{ij}$: the number of times word $j$ occurs in the context of word $i$
$w_i \in \mathbb{R}^d$: word vectors
$\tilde{w}_j \in \mathbb{R}^d$: context word vectors
Word2vec (skip-gram algorithm) [Mikolov et al. (2013)]:
shallow feed-forward neural network
learns word vectors, such that:
$\frac{X_{ij}}{\sum_j X_{ij}} = \frac{e^{w_i^{T}\tilde{w}_j}}{\sum_j e^{w_i^{T}\tilde{w}_j}}$
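A NumPy sketch of the right-hand side above, with toy dimensions; the word and context matrices are random placeholders for trained embeddings:

import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 100                               # toy vocabulary size and dimension d
W = rng.normal(scale=0.1, size=(V, d))         # word vectors w_i
W_ctx = rng.normal(scale=0.1, size=(V, d))     # context word vectors w~_j

i = 42                                         # index of some centre word
scores = W_ctx @ W[i]                          # w_i^T w~_j for every context word j
p = np.exp(scores) / np.exp(scores).sum()      # softmax model of X_ij / sum_j X_ij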
21. Pre-trained Word Vectors
Kim (2014): Sentence classification
Hyperparameters:
filters of width (region size) 3, 4, and 5
100 feature maps each
max-pooling layer
penultimate layer: 300 units
Datasets (average sentence length ∼ 20):
movie reviews w/ one sentence per review (pos/neg?)
electronic product reviews (pos/neg?)
TREC question dataset: is the question about a person, a location, numeric information, etc.? (6 categories)
arXiv:1408.5882
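A PyTorch sketch of this architecture; vocabulary size, embedding dimension, and number of classes are placeholders, and Kim's pre-trained word2vec embeddings and dropout are omitted for brevity:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    def __init__(self, vocab_size=20000, d=300, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        # one convolution per region size (3, 4, 5), 100 feature maps each
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, 100, kernel_size=k) for k in (3, 4, 5)]
        )
        self.out = nn.Linear(3 * 100, n_classes)    # penultimate layer: 300 units

    def forward(self, tokens):                      # tokens: (batch, seq_len) word ids
        x = self.emb(tokens).transpose(1, 2)        # (batch, d, seq_len)
        pooled = [F.relu(c(x)).max(dim=2).values    # max-over-time pooling
                  for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))   # class scores (softmax via the loss)

logits = SentenceCNN()(torch.randint(0, 20000, (8, 20)))   # 8 sentences of length ~20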
22. One-Hot Vectors
Johnson and Zhang (2015): Classification of larger pieces of text
(average size ∼ 200)
aardvark: $(1, 0, \ldots, 0)^{T}$,   zwieback: $(0, \ldots, 0, 1)^{T}$
Hyperparameters:
filter width (region size) 3
stack words in region
1000 feature maps
max-pooling
penultimate layer: 1000 units
Performance:
IMDB (|V| = 30k): error rate 8.74%
Amazon Elec: error rate 7.74%
arXiv:1412.1058
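A NumPy sketch of how such region vectors can be built by stacking the one-hot vectors of the words in each region (toy vocabulary and sentence; region size 3 as above):

import numpy as np

vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}   # toy vocabulary
V = len(vocab)
tokens = ["the", "movie", "was", "great"]
region = 3                                             # filter width (region size)

regions = []
for start in range(len(tokens) - region + 1):
    vec = np.zeros(region * V)                # one-hot vectors of 3 words, stacked
    for offset in range(region):
        vec[offset * V + vocab[tokens[start + offset]]] = 1.0
    regions.append(vec)

X = np.stack(regions)   # one row per region; this is what the convolutional layer sees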
23. Character Input
Zhang, Zhao, and LeCun (2015): Large datasets
Hyperparameters:
alphabet of size 70
6 convolutional layers (some followed by max-pooling layers) and
3 fully-connected layers
filter width (region size) 7 or 3
1024 feature maps
Performance:
Model          AG     Sogou  DBP.   Yelp P.  Yelp F.  Yah. A.  Amz. F.  Amz. P.
BoW            11.19   7.15  3.39    7.76    42.01    31.11    45.36     9.60
BoW TFIDF      10.36   6.55  2.63    6.34    40.14    28.96    44.74     9.00
ngrams          7.96   2.92  1.37    4.36    43.74    31.53    45.73     7.98
ngrams TFIDF    7.64   2.81  1.31    4.56    45.20    31.49    47.56     8.46
ConvNet        12.82   4.88  1.73    5.89    39.62    29.55    41.31     5.51
arXiv:1509.01626
24. Outlook
Convenient and powerful libraries:
Theano (Lasagne, Keras) developed at LISA, University of Montreal
Torch primarily developed by Ronan Collobert (now @ Facebook),
used within Facebook, Google, Twitter, and NYU
TensorFlow by Google
The new iPhone 6S shows great GPU performance, so expect (more)
deep learning to come to your phone.
Embedded devices like Nvidia’s TX1, a tiny
supercomputer w/ 256 CUDA cores and 4GB
memory, for driver-assistance systems and the like.
http://technonewschannel.com/tips-trick/5-hidden-features-of-android-camera-which-you-should-know/