[PyCon 2015] Start Experimenting with Deep Learning Today
2015. 08. 30.
김현호
Introduction

김현호
- Computer Science major, UST
- ETRI (Electronics and Telecommunications Research Institute), Automatic Interpretation Lab
- Team Popong, in charge of mobile
- Interests: artificial intelligence, machine learning, natural language processing
- stray.leone@gmail.com
Agenda

1. Understanding Neural Networks
2. Deep Neural Networks
   a. Pretraining
   b. Rectified Linear Unit
   c. Dropout
3. The Theano library
4. Deep learning code using Theano
5. Deep learning for natural language processing
   a. The Gensim library
   b. Automatic word spacing with a Recurrent Neural Network
Recent Interest in Deep Learning

Many Deep Learning Lectures
Artificial Neural Network

A machine learning system inspired by how neurons, the building blocks of the human nervous system, operate.
Real Neurons vs. Artificial Neurons

[Figure: a biological neuron beside an artificial neuron; signals flow in one direction, with a weight on each connection]
Artificial Neural Network

[Figure: artificial neurons connected into a layered network]

Artificial Neural Network Learning

[Figure: learning adjusts the weight matrices between the layers]
Forward Propagation

[Figure: signals flow from input to output through the weights]

Backward Propagation

[Figure: the error flows back from the output to update the weights]
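As a compact illustration of these two passes (my own sketch, not code from the talk; the layer sizes and squared-error loss are assumed for the example):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy network: 3 inputs -> 4 hidden units -> 2 outputs
W1 = np.random.randn(3, 4) * 0.1
W2 = np.random.randn(4, 2) * 0.1
x = np.random.randn(1, 3)            # one training example
t = np.array([[1.0, 0.0]])           # its target

# forward propagation: signals flow input -> hidden -> output
h = sigmoid(x.dot(W1))
y = sigmoid(h.dot(W2))

# backward propagation: the squared-error gradient flows output -> input
dy = (y - t) * y * (1 - y)
dW2 = h.T.dot(dy)
dh = dy.dot(W2.T) * h * (1 - h)
dW1 = x.T.dot(dh)

# gradient descent step on the weights
W2 -= 0.1 * dW2
W1 -= 0.1 * dW1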
Deep Neural Network

What is a Deep Neural Network?
An Artificial Neural Network with three or more hidden layers.
Difficulties of Early Deep Learning

"...deeper than two or three level networks yielded poorer results"
Why Deep Learning Is Hard
- Overfitting
  - deep nets have lots of parameters
- Underfitting
  - vanishing gradients during gradient descent
Breakthroughs in Deep Learning
- Pretraining
- Dropout
- Rectified Linear Unit
Pretraining Performance

"Why Does Unsupervised Pre-training Help Deep Learning?" (Erhan et al., JMLR 2010)
- Pretrained initialization starts from a better local minimum than random initialization.

[Figure: two panels — Without Pretraining, With Pretraining]
Pretraining Methods
1) Contrastive Divergence
   a) http://www.quora.com/What-is-contrastive-divergence
   b) https://www.youtube.com/watch?v=p4Vh_zMw-HQ&index=36&list=PL6Xpj9I5qXYEcOhn7TqghAJ6NAPrNmUBH
2) AutoEncoder
Dropout
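The slide itself is a figure; as a hedged sketch of the idea (my illustration, not the talk's code), each hidden unit is zeroed at random during training using Theano's random streams:

import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

srng = RandomStreams(seed=42)

def dropout(X, p_retain=0.5):
    # keep each unit with probability p_retain, zero it otherwise
    mask = srng.binomial(size=X.shape, p=p_retain, dtype=theano.config.floatX)
    # rescale so the expected activation stays the same at test time
    return X * mask / p_retain

X = T.matrix('X')
X_dropped = dropout(X, p_retain=0.5)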
Rectified Linear Unit (ReLU)

Activation function
[Figure: the sigmoid function beside the rectified linear unit]
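In formulas (my summary, not from the slides): sigmoid(x) = 1 / (1 + e^(-x)) squashes its input into (0, 1) and saturates for large |x|, while ReLU(x) = max(0, x) passes positive inputs through with a constant gradient of 1. In Theano:

import theano.tensor as T

def sigmoid(X):
    return T.nnet.sigmoid(X)    # 1 / (1 + exp(-X)); gradient vanishes when saturated

def rectify(X):
    return T.maximum(X, 0.)     # ReLU; gradient is 1 for X > 0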
Rectified Linear Unit (ReLU) Experiment Results

Experimental setup:
- code: https://github.com/Newmu/Theano-Tutorials
- data: MNIST

Results per epoch:

Epoch   sigmoid   ReLU
1       0.7053    0.9433
2       0.8302    0.9647
3       0.8684    0.9723
3       0.8837    0.9737
4       0.89      0.9763
5       0.895     0.9792
...     ...       ...
11      0.9116    0.9829
12      0.9127    0.9838
13      0.9142    0.9821
14      0.9152    0.9838
15      0.9159    0.9832
Data Sets

- MNIST
  - The MNIST database of handwritten digits
  - 28x28 grayscale images
  - 10 classes
- CIFAR-10
  - 60000 32x32 colour images in 10 classes, with 6000 images per class
- word2vec
Starting the Deep Learning Experiments

Origin of the Name "Theano"
- a female mathematician
- the wife of Pythagoras
Deep Learning Library Comparison

Source: http://t-robotics.blogspot.kr/2015/06/hw-sw.html#.Vd59KPntlBe
Theano
- Q) Does it build a DNN for me automatically?
- A) No, you have to build the Deep Neural Network yourself...

Theano
- a library that trains DNN models for you (x)
- a library useful for matrix operations and the like (o)
Why Theano
- Definition
  - Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
    (http://deeplearning.net/software/theano/)
  - Optimizing GPU-meta-programming code generating array oriented optimizing math compiler in Python
    (https://github.com/josephmisiti/awesome-machine-learning)
Why Theano
- run GPU computations from Python code, without writing CUDA code
- grad(), updates, function()
- symbolic functions
Why Theano - grad(), updates, function()

T.grad() computes the gradient for you symbolically.

ex)
x = T.scalar()
gx = T.grad(x**2, x)  ← the gradient of x**2 with respect to x (= 2x)
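The same example as a runnable script (a minimal sketch):

import theano
import theano.tensor as T

x = T.dscalar('x')
gx = T.grad(x ** 2, x)           # symbolic derivative of x**2 w.r.t. x, i.e. 2*x
f = theano.function([x], gx)     # compile the gradient graph into a callable
print f(3.0)                     # prints 6.0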
Why Theano - grad(), updates, function()
300 updates = [
301 (param, param - learning_rate * gparam)
302 for param, gparam in zip(classifier.params, gparams)
303 ]
……..
308 train_model = theano.function(
309 inputs=[index],
310 outputs=cost,
311 updates=updates,
312 givens={
313 x: train_set_x[index * batch_size: (index + 1) * batch_size],
314 y: train_set_y[index * batch_size: (index + 1) * batch_size]
315 }
316 )
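A self-contained toy analogue of this updates mechanism (a hypothetical example, not from the tutorial): every call to the compiled function moves a shared variable one gradient step toward the minimum of (w - 2)^2.

import numpy as np
import theano
import theano.tensor as T

w = theano.shared(np.float64(5.0), name='w')      # parameter to learn
cost = (w - 2) ** 2
gw = T.grad(cost, w)

train = theano.function(inputs=[], outputs=cost,
                        updates=[(w, w - 0.1 * gw)])   # applied on every call

for _ in range(50):
    train()
print w.get_value()    # close to 2.0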
Why Theano - grad(), updates, function()

This module provides function(), commonly accessed as theano.function, the interface for compiling graphs into callable objects.

You've already seen example usage in the basic tutorial... something like this:

>>> x = theano.tensor.dscalar()
>>> f = theano.function([x], 2*x)   # [x] is the input, 2*x the output
>>> print f(4) # prints 8.0

http://deeplearning.net/software/theano/library/compile/function.html
x = dmatrix('x')
y = dmatrix('y')
z = x + y
f = theano.function([x,y], z) scalarscalar
scalar
Why Theano - grad(), updates, function()
x = dmatrix('x')
y = dmatrix('y')
z = x + y
f = theano.function([x,y], z)
Theano represents symbolic
mathematical computations
as graphs
scalarscalar
scalar
Why Theano - grad(), updates, function()
x = theano.tensor.dscalar('x')
y = theano.tensor.dscalar('y')
z = x + y
f = theano.function([x,y], z)
print f(4,3)
array(7.0)
scalarscalar
scalar
Install Theano
- Environment: Ubuntu 14.04 64-bit
- Install document:
  http://deeplearning.net/software/theano/install_ubuntu.html#install-ubuntu

$ sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libopenblas-dev git
$ sudo pip install Theano
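A quick sanity check after installation (the version printed will vary with what pip installed):

$ python -c "import theano; print theano.__version__"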
Download Tutorial code
$ git clone https://github.com/lisa-lab/DeepLearningTutorials.git
Cloning into 'DeepLearningTutorials'...
remote: Counting objects: 3652, done.
remote: Total 3652 (delta 0), reused 0 (delta 0), pack-reused 3652
Receiving objects: 100% (3652/3652), 7.79 MiB | 2.32 MiB/s, done.
Resolving deltas: 100% (2161/2161), done.
Checking connectivity... done.
$ ls
DeepLearningTutorials
Run DBN
DeepLearningTutorials$ cd code
DeepLearningTutorials/code$ python DBN.py
Using gpu device 0: GeForce GTX 770
Downloading data from
http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz
... loading data
... building the model
... getting the pretraining functions
... pre-training the model
Pre-training layer 0, epoch 0, cost -98.5296
Pre-training layer 0, epoch 1, cost -83.842
Pre-training layer 0, epoch 2, cost -80.688
Pre-training layer 0, epoch 3, cost -79.0362
Pre-training layer 0, epoch 4, cost -77.9295
DBN.py
13 from logistic_sgd import LogisticRegression, load_data
303 datasets = load_data(dataset)
304
305 train_set_x, train_set_y = datasets[0]
306 valid_set_x, valid_set_y = datasets[1]
307 test_set_x, test_set_y = datasets[2]
DBN.py
18 # start-snippet-1
19 class DBN(object):
…….
314 print '... building the model'
315 # construct the Deep Belief Network
316 dbn = DBN(numpy_rng=numpy_rng, n_ins=28 * 28,
317 hidden_layers_sizes=[1000, 1000, 1000],
318 n_outs=10)
[Figure: the DBN architecture — a 28 * 28 input layer, the hidden layers, and 10 output units]
DBN.py
325 pretraining_fns = dbn.pretraining_functions(train_set_x=train_set_x,
326 batch_size=batch_size,
327 k=k)
……………
353 print '... getting the finetuning functions'
354 train_fn, validate_model, test_model = dbn.build_finetune_functions(
355 datasets=datasets,
356 batch_size=batch_size,
357 learning_rate=finetune_lr
358 )
DBN.py
228 train_fn = theano.function(
229 inputs=[index],
230 outputs=self.finetune_cost,
231 updates=updates,
232 givens={
233 self.x: train_set_x[
234 index * batch_size: (index + 1) * batch_size
235 ],
236 self.y: train_set_y[
237 index * batch_size: (index + 1) * batch_size
238 ]
239 }
240 )
DBN.py
380 while (epoch < training_epochs) and (not done_looping):
381 epoch = epoch + 1
382 for minibatch_index in xrange(n_train_batches):
383
384 minibatch_avg_cost = train_fn(minibatch_index)
385 iter = (epoch - 1) * n_train_batches + minibatch_index
386
387 if (iter + 1) % validation_frequency == 0:
388
389 validation_losses = validate_model()
390 this_validation_loss = numpy.mean(validation_losses)
DNN using ReLU

import theano
from theano import tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
import numpy as np
from load import mnist
DNN using ReLU

def floatX(X):
    return np.asarray(X, dtype=theano.config.floatX)

def init_weights(shape):
    # small random initial weights, stored as a Theano shared variable
    return theano.shared(floatX(np.random.randn(*shape) * 0.01))

def rectify(X):
    # ReLU activation
    return T.maximum(X, 0.)

def softmax(X):
    # numerically stable softmax: subtract the row max before exponentiating
    e_x = T.exp(X - X.max(axis=1).dimshuffle(0, 'x'))
    return e_x / e_x.sum(axis=1).dimshuffle(0, 'x')
DNN using ReLU

def model(X, w_h, w_h2, w_o):
    # two ReLU hidden layers followed by a softmax output layer
    h = rectify(T.dot(X, w_h))
    h2 = rectify(T.dot(h, w_h2))
    py_x = softmax(T.dot(h2, w_o))
    return h, h2, py_x

def prop(cost, params, lr=0.001):
    # plain stochastic gradient descent updates
    grads = T.grad(cost=cost, wrt=params)
    updates = []
    for p, g in zip(params, grads):
        updates.append((p, p - lr * g))
    return updates
trX, teX, trY, teY = mnist(onehot=True)

X = T.fmatrix()
Y = T.fmatrix()

w_h = init_weights((784, 625))
w_h2 = init_weights((625, 625))
w_o = init_weights((625, 10))

h, h2, py_x = model(X, w_h, w_h2, w_o)
y_x = T.argmax(py_x, axis=1)          # predicted class

cost = T.mean(T.nnet.categorical_crossentropy(py_x, Y))
params = [w_h, w_h2, w_o]
updates = prop(cost, params, lr=0.001)

train = theano.function(inputs=[X, Y], outputs=cost, updates=updates)
predict = theano.function(inputs=[X], outputs=y_x)
for i in range(100):
    # iterate over mini-batches of 128 examples
    for start, end in zip(range(0, len(trX), 128), range(128, len(trX), 128)):
        cost = train(trX[start:end], trY[start:end])
    # test-set accuracy after each epoch
    print np.mean(np.argmax(teY, axis=1) == predict(teX))
Playing with the Data
load_data()
172 def load_data(dataset):
…...
193 if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
194 dataset = new_path
…...
204 print '... loading data'
205
206 # Load the dataset
207 f = gzip.open(dataset, 'rb')
208 train_set, valid_set, test_set = cPickle.load(f)
209 f.close()
Making Data (1)

train_set.x.txt
[Figure: a comma-separated text file; each row is one input vector, so the row width is the input vector length]

train_set.y.txt
[Figure: one label per row, one row per input vector]
from numpy import genfromtxt
import gzip, cPickle
…………….
train_set_x = genfromtxt(dir_path+"train_set.x.txt", delimiter=",")
…………………..
# pair each x array with its y array (the slide pairs x with x, likely a typo)
train_set = train_set_x, train_set_y
valid_set = valid_set_x, valid_set_y
test_set = test_set_x, test_set_y

print "writing to pkl.gz..."
data_set = [train_set, valid_set, test_set]
print "zip data into a file"
f = gzip.open(output_dir+str(i)+"_"+pkl_filename+".pkl.gz", 'wb')
print "zip data file name is " + str(i)+"_"+pkl_filename+".pkl.gz"
cPickle.dump(data_set, f, protocol=2)
f.close()
Making Data (2)

for n, sentence in enumerate(file_lines):
    ……………………..
    data_batch_fpath = vector_dir+"data_batch_"+str(n)+".npz"
    ……………………….
    # save vector list
    numpy.savez(data_batch_fpath,
                data=numpy.asarray(sentence_vector_list),
                labels=label_vector,
                length=max_length,
                dim=dimension)
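To verify that a generated pkl.gz file has the structure load_data() expects, it can be loaded back (a minimal sketch; the file name below is hypothetical):

import gzip, cPickle

f = gzip.open("0_mydata.pkl.gz", 'rb')    # hypothetical output of the script above
train_set, valid_set, test_set = cPickle.load(f)
f.close()

train_set_x, train_set_y = train_set
print train_set_x.shape    # (number of examples, input vector length)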
Save and Load a Model

[Figure: code screenshots for saving and loading a model]
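The slide shows the save/load code as screenshots; a common pattern (my sketch, assuming params is the parameter list used earlier, e.g. [w_h, w_h2, w_o]) is to pickle the shared variables' values:

import cPickle

# save: dump the numpy values held by the shared variables
f = open('model_params.pkl', 'wb')
cPickle.dump([p.get_value() for p in params], f, protocol=2)
f.close()

# load: push the saved values back into the shared variables
f = open('model_params.pkl', 'rb')
for p, value in zip(params, cPickle.load(f)):
    p.set_value(value)
f.close()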
Theano Modes

.bashrc
226 # Theano Settings
227 export THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32,exception_verbosity=high
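The same flags can equivalently go into ~/.theanorc, Theano's standard configuration file:

# ~/.theanorc
[global]
mode = FAST_RUN
device = gpu
floatX = float32
exception_verbosity = high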
Deep Learning For Natural Language Processing
Deep Learning For Natural Language Processing
- 나는 밥을 먹는다 ("I eat a meal")
- 나 는 밥 을 먹 는 다  ← split into morphemes
- one-hot (1 of K) representation
Deep Learning For Natural Language Processing
- 나는 밥을 먹는다
- 나 는 밥 을 먹 는 다  ← split into morphemes
- 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]  ← one-hot (1 of K) representation

Each morpheme becomes a vector over the vocabulary:

index   0(나)   1(가)   2(는)   ...   ...   ...   ...   999(.)
나       1       0       0      0     0     0     0     0
는       0       0       1      0     0     0     0     0
..       0       0       0      0     0     1     0     0
..       0       0       0      0     1     0     0     0
다       0       0       0      0     0     0     1     0
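Building such one-hot vectors in numpy (a minimal sketch with a hypothetical 1000-morpheme vocabulary):

# -*- coding: utf-8 -*-
import numpy as np

vocab = {u'나': 0, u'가': 1, u'는': 2}    # ... up to index 999 ('.')
VOCAB_SIZE = 1000

def one_hot(morpheme):
    v = np.zeros(VOCAB_SIZE)
    v[vocab[morpheme]] = 1.0
    return v

print one_hot(u'는')[:5]    # [ 0.  0.  1.  0.  0.]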
Deep Learning For Natural Language Processing
- 나는 밥을 먹는다
- 나 는 밥 을 먹 는 다  ← split into morphemes
- Word2Vec model: each morpheme becomes a dense vector
- 밥 = [0.323112, -0.021232, …….. , 0.82123123]  ← word2vec representation
Deep Learning For Natural Language Processing
- one-hot (1 of K): 밥 = [0,0,0,0,0,0,0,………,0,0,0,0,1,0,0,0,0,0,0]
- word2vec:         밥 = [0.323112, -0.021232, …….. , 0.82123123]
Gensim
- definition
  - Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora
- word2vec class
  - word vector representation
  - multi-threading
  - Skip Gram
  - Continuous Bag of Words
Gensim - import, settings
# imports
9 from gensim.models.word2vec import LineSentence
10 from gensim.models import word2vec
32 # settings
33 THEADS = 8 # progress with multi threading
34 DIMENSION = 50
35 SKIPGRAM = 1 # 1 is skip gram, 0 is cbow
36 WINDOW_SIZE = 8
37 NTimes = 10 # repeat number of sentences
38 min_count_of_word = 5
………………..
65 from gensim import utils
Gensim - training, save model
97 # load raw sentence
98 sentences = LineSentence(input_train_file_path)
99 # model settings
100 model = word2vec.Word2Vec(size=dimension, workers=THEADS,
min_count=min_count_of_word, sg=SKIPGRAM, window=WINDOW_SIZE)
101
102 # build voca and train
103 number_iter = NTimes # number of iterations (epochs) over the corpus
104 model.build_vocab(sentences)
105
106 ss = utils.RepeatCorpusNTimes(sentences, number_iter)
107 model.train(ss)
108 # save model
109 model.save(model_file_name)
110 model.save_word2vec_format(model_file_name + '.bin', binary=True)
Gensim - load model, test

83 try:
84     model = utils.SaveLoad.load(fname=model_file_name)
85 except:
86     print "failed to load. Retrying by load_word2vec_format() !!"
87     model = word2vec.load_word2vec_format(fname=model_file_name+".bin")

297 x = model[w.decode('utf-8')]

314 mw, score = model.most_similar(positive=[x])[0]
315 print "most similar : ", mw
316 print "target vector :", x
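Given a trained model, the most-similar query can also be made with the word itself rather than its vector (standard gensim API of that era; the file name is hypothetical):

# -*- coding: utf-8 -*-
from gensim.models import word2vec

model = word2vec.Word2Vec.load_word2vec_format('model.bin', binary=True)
for mw, score in model.most_similar(positive=[u'서울'], topn=10):
    print mw, score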
Most similar words to ‘서울’ (Seoul)

most similar word    similarity
대구                  0.4282917082309723
광주                  0.4046330451965332
부산                  0.40132588148117065
울산                  0.3863871693611145
수원                  0.38555505871772766
청주                  0.35919708013534546
안양                  0.35622960329055786
주왕산                0.3543151617050171
평택                  0.3505415618419647
cebu                  0.34598737955093384
Auto Word Spacing with a Recurrent Neural Network

- 나는 밥을 먹는다
- labels: 0 0 1 0 1 0 0  (one per syllable; 1 marks a syllable that starts a new word)
- input: each syllable's word2vec vector, e.g. [0.323112, -0.021232, …….. , 0.82123123]
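Turning a correctly spaced sentence into per-syllable training labels (my sketch of this framing, not the talk's code):

# -*- coding: utf-8 -*-

def spacing_labels(spaced_sentence):
    # 1 = this syllable begins a new word (a space precedes it)
    syllables, labels = [], []
    new_word = False
    for ch in spaced_sentence:
        if ch == u' ':
            new_word = True
            continue
        syllables.append(ch)
        labels.append(1 if new_word else 0)
        new_word = False
    return syllables, labels

syl, lab = spacing_labels(u'나는 밥을 먹는다')
print lab    # [0, 0, 1, 0, 1, 0, 0]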
Difficulties Encountered While Experimenting with Deep Learning
- There are many hyperparameters to choose: number of layers, nodes per layer, learning rate, number of epochs, batch size, activation function, and so on.
- Checking the result of each parameter change takes a long time.
- GPU memory problems, because the data is big.
Thank you

Backup topics:
- Setting up the GPU
- Building LMDB for Caffe
- Softmax function
- Bias
- Negative Log Likelihood

http://goo.gl/forms/IR45liXoQ3