Talk at nucl.ai 2016 in Vienna
Can neural networks sing, dance, remix and rhyme? And most importantly, can they talk back? This talk will introduce deep neural nets with textual and auditory understanding and some of the recent breakthroughs made in these fields. It will then show some of the exciting possibilities these technologies hold for “creative” use and for explorations of human-machine interaction, where the main theme is “augmentation, not automation”.
http://events.nucl.ai/track/cognitive/#deep-neural-networks-that-talk-back-with-style
11. NEURAL NETWORKS CAN SAMPLE (x_t → x_t+1)
RECURRENT NEURAL NETWORK (RNN/LSTM) SAMPLING
▸ (Naive) Sampling
▸ Scheduled Sampling (ML) Bengio et al. 2015
▸ Sequence Level (RL) Ranzato et al. 2016
▸ Reward Augmented Maximum Likelihood (ML+RL) Norouzi et al., forthcoming
12. LSTM SAMPLING (GRAVES 2013)
▸ Approach used for most recent “creative” generations
▸ (char-rnn, torch-rnn, etc.)
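A minimal sketch of this naive sampling step, assuming a trained network that outputs a softmax distribution over the vocabulary at each step; the temperature-scaled draw is the usual knob for trading conservativeness against diversity, and the `model.step` interface in the comments is a hypothetical stand-in:

import numpy as np

def sample_next(probs, temperature=1.0):
    """Draw the next token index from one step of RNN softmax output.
    temperature < 1 sharpens the distribution, > 1 flattens it."""
    logits = np.log(probs + 1e-8) / temperature
    weights = np.exp(logits - np.max(logits))   # subtract max for stability
    weights /= weights.sum()
    return np.random.choice(len(weights), p=weights)

# generation loop (hypothetical model interface): feed each sampled
# token back in as the next input
# token, state = seed_token, model.initial_state
# for _ in range(200):
#     probs, state = model.step(token, state)
#     token = sample_next(probs, temperature=0.7)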
13. SCHEDULED SAMPLING (BENGIO ET AL 2015)
▸ Start training with the ground truth as input and slowly move towards using the model’s own predictions as next steps
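A sketch of the per-step coin flip at the core of scheduled sampling, using the inverse-sigmoid decay schedule suggested in the paper; the token variables are illustrative stand-ins for a real training loop:

import math
import random

def epsilon(step, k=1000.0):
    """Inverse-sigmoid decay from Bengio et al. 2015:
    starts near 1 (always ground truth) and decays towards 0."""
    return k / (k + math.exp(step / k))

def next_input(ground_truth_token, predicted_token, step):
    """With probability epsilon feed the ground truth, otherwise feed the
    model's own previous prediction, so training gradually matches the
    free-running regime used at inference time."""
    if random.random() < epsilon(step):
        return ground_truth_token
    return predicted_token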
14. SEQUENCE LEVEL TRAINING (RANZATO ET AL 2016)
▸ Use model predictions as next steps, with a sequence-level reward/loss (e.g. BLEU) provided through Reinforcement Learning
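A toy REINFORCE-style step in the spirit of sequence-level training: sample a complete sequence from the model, score it with a sequence reward (BLEU in the paper), and weight the log-likelihood of the sampled tokens by that reward. The `model` interface and `sequence_reward` are hypothetical stand-ins, not the paper's code:

import numpy as np

def reinforce_step(model, sequence_reward, baseline=0.0, max_len=50):
    """Sample one sequence and form the REINFORCE loss:
    -(reward - baseline) * sum of log-probs of the sampled tokens."""
    tokens, log_probs = [], []
    token, state = model.start_token, model.initial_state
    for _ in range(max_len):
        probs, state = model.step(token, state)    # hypothetical interface
        token = np.random.choice(len(probs), p=probs)
        tokens.append(token)
        log_probs.append(np.log(probs[token]))
        if token == model.end_token:
            break
    advantage = sequence_reward(tokens) - baseline  # e.g. BLEU vs. reference
    return -advantage * np.sum(log_probs), tokens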
15. REWARD AUGMENTED MAXIMUM LIKELIHOOD (NOROUZI ET AL., FORTHCOMING)
▸ Generate targets sampled around the correct solution
▸ “giving it mostly wrong examples to learn the right ones”
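A sketch of RAML's target sampling under a Hamming-distance reward: instead of training only on the ground truth y*, training targets are drawn with probability proportional to exp(r(y, y*) / τ), here realised by first sampling the number of substitutions and then placing them uniformly. The edit-based reward and all names below are illustrative assumptions:

import math
import random

def sample_raml_target(target, vocab, tau=0.8):
    """Sample a training target 'around' the ground truth (substitutions only).
    With reward = -Hamming distance, a target with e substitutions is drawn
    with probability proportional to C(n, e) * (V-1)^e * exp(-e / tau)."""
    n, V = len(target), len(vocab)
    weights = [math.comb(n, e) * (V - 1) ** e * math.exp(-e / tau)
               for e in range(n + 1)]
    e = random.choices(range(n + 1), weights=weights)[0]
    noisy = list(target)
    for i in random.sample(range(n), e):
        noisy[i] = random.choice([t for t in vocab if t != target[i]])
    return noisy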
31. EARLY LSTM MUSIC COMPOSITION (2002)
Douglas Eck and Jürgen Schmidhuber (2002) Learning the Long-Term Structure of the Blues
32. AUDIO GENERATION: MIDI
Douglas Eck and Jürgen Schmidhuber (2002) Learning the Long-Term Structure of the Blues
▸ https://soundcloud.com/graphific/pyotr-lstm-tchaikovsky
A Recurrent Latent Variable Model for Sequential Data (2015), J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, Y. Bengio
+ “modded” VRNN
33. AUDIO GENERATION: MIDI
Douglas Eck and Jürgen Schmidhuber (2002) Learning the Long-Term Structure of the Blues
▸ https://soundcloud.com/graphific/neural-remix-net
A Recurrent Latent Variable Model for Sequential Data (2015), J. Chung, K. Kastner, L. Dinh, K. Goel, A. Courville, Y. Bengio
+ “modded” VRNN
42. Deep Learning with Python
Python has a wide range of deep learning-related libraries available, from low level to high level:
deeplearning.net/software/theano
caffe.berkeleyvision.org
tensorflow.org
lasagne.readthedocs.org/en/latest
and of course:
keras.io
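For instance, a minimal character-level LSTM defined in Keras, in the spirit of the char-rnn examples above; the layer sizes and input shape are illustrative, not from the talk:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

maxlen, vocab_size = 40, 64   # illustrative: window length, alphabet size

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, vocab_size)))  # one recurrent layer
model.add(Dense(vocab_size))
model.add(Activation('softmax'))   # distribution over the next character
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')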
47. Questions?
love letters? existential dilemmas? academic questions? gifts?
find me at:
www.csc.kth.se/~roelof/
roelof@kth.se
@graphific
Consulting / Projects / Contracts / $$$ / more love letters?
http://www.graph-technologies.com/
roelof@graph-technologies.com
48. WHAT ABOUT CONVNETS?
▸ Awesome for interpreting features
▸ Recurrence can be “kind of” achieved with (see sketch below):
▸ long splicing filters
▸ pooling layers
▸ smart architectures
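A rough sketch of the Kim (2014) style sentence-classification ConvNet in Keras: parallel 1-D convolutions with several filter widths over word embeddings, each max-pooled over time and concatenated. All hyperparameters below are illustrative assumptions:

from keras.models import Model
from keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                          Concatenate, Dense)

seq_len, vocab_size, embed_dim = 50, 20000, 128   # illustrative sizes

words = Input(shape=(seq_len,))
x = Embedding(vocab_size, embed_dim)(words)

# parallel 1-D convolutions with different filter widths, max-pooled over time
pooled = [GlobalMaxPooling1D()(Conv1D(100, w, activation='relu')(x))
          for w in (3, 4, 5)]

out = Dense(2, activation='softmax')(Concatenate()(pooled))  # e.g. pos / neg
model = Model(inputs=words, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam')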
49. NLP
Yoon Kim (2014) Convolutional Neural Networks for Sentence Classification
Xiang Zhang, Junbo Zhao, Yann LeCun (2015) Character-level Convolutional Networks for Text Classification
50. AUDIO
Keunwoo Choi, Jeonghee Kim, George Fazekas, and Mark Sandler (2016) Auralisation of Deep Convolutional Neural Networks: Listening to Learned Features
51. AUDIO
Keunwoo Choi, George Fazekas, Mark Sandler (2016) Explaining Deep Convolutional Neural Networks on Music Classification
audio at:
https://keunwoochoi.wordpress.com/2016/03/23/what-cnns-see-when-cnns-see-spectrograms/