Caffe framework tutorial2

Layer, Net, Test

Index
• Layer
– Data
– ImageData
– Convolution
– Pooling
– ReLU
– InnerProduct
– LRN
• Net
– Mnist
– CIFAR-10
– ImageNet (Ilsvrc12)
• Net change Test
– Mnist
– CIFAR-10
• 64x64x3 Image Folder
• 64x64x3 Image Resize To 32x32x3

Data Layer
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file:
"examples/cifar10/mean.binaryproto"
}
data_param {
source:
"examples/cifar10/ilsvrc12_train_lmdb"
batch_size: 100
backend: LMDB
}
}
• Required
– source: the name of the directory
containing the database
– batch_size: the number of inputs to
process at one time
• Optional
– backend [default LEVELDB]: choose
whether to use LEVELDB or LMDB
• LevelDB and LMDB are efficient key-value databases
• Example (contents of an LMDB directory)
– data.mdb (41 MB)
– lock.mdb (8.2 kB)

ImageData Layer
layer {
name: "data"
type: "ImageData"
top: "data"
top: "label"
transform_param {
mirror: false
crop_size: 227
mean_file:
"data/ilsvrc12/imagenet_mean.binaryproto"
}
image_data_param {
source: "examples/_temp/file_list.txt"
batch_size: 50
new_height: 256
new_width: 256
}
}
• Required
– source: name of a text file, with each
line giving an image filename and label
– batch_size: number of images to batch
together
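A listing file of the kind `source` points to can be generated from a folder tree. The sketch below assumes one subfolder per class and assigns labels by sorted folder name; the helper name `build_file_list` and the demo folder layout are illustrative assumptions, not part of Caffe.

```python
import os
import tempfile

def build_file_list(root):
    """Return 'relative/path label' lines, one per file under root.

    Assumes (hypothetically) one subfolder per class; the label is the
    index of the subfolder in sorted order.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    lines = []
    for label, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(root, cls))):
            lines.append("%s/%s %d" % (cls, name, label))
    return lines

# Tiny demonstration on a throwaway folder tree with empty placeholder files.
demo_root = tempfile.mkdtemp()
for cls, names in [("airplane", ["a1.jpg", "a2.jpg"]), ("car", ["c1.jpg"])]:
    os.makedirs(os.path.join(demo_root, cls))
    for n in names:
        open(os.path.join(demo_root, cls, n), "w").close()

file_list = build_file_list(demo_root)
print("\n".join(file_list))
```

Writing these lines to a text file gives the image filename and label pairs that the ImageData layer reads.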

Convolution Layer
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param { lr_mult: 1 } # learning rate multiplier for the filters
param { lr_mult: 2 } # learning rate multiplier for the biases
convolution_param {
num_output: 32 # learn 32 filters
kernel_size: 5 # each filter is 5x5
pad: 2
stride: 1 # step 1 pixel between each filter application
weight_filler {
type: "gaussian" # initialize the filters from a Gaussian
std: 0.0001 # stdev 0.0001 (default mean: 0)
}
# initialize the biases to zero (0)
bias_filler { type: "constant" }
}
}
• Required
– num_output (c_o): the number of
filters
– kernel_size (or kernel_h and
kernel_w): specifies height and width
of each filter
• Strongly Recommended
– weight_filler [default type:
'constant' value: 0]
• Optional
– pad [default 0]: specifies the
number of pixels to (implicitly)
add to each side of the input
– stride [default 1]: specifies the
intervals at which to apply the
filters to the input
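How kernel_size, pad and stride determine the output size can be checked with a few lines of arithmetic. This is a minimal sketch for one spatial dimension; the function name `conv_output_size` and the 32-pixel input side (a CIFAR-10 sized image) are assumptions made for illustration.

```python
def conv_output_size(in_size, kernel_size, pad=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad - kernel_size) // stride + 1

# Values from the conv1 example: 5x5 kernel, pad 2, stride 1.
# The input side of 32 pixels is an assumption (CIFAR-10 sized images).
same_size = conv_output_size(32, kernel_size=5, pad=2, stride=1)

# Without padding the same filter shrinks the output.
no_pad = conv_output_size(32, kernel_size=5, pad=0, stride=1)

print(same_size, no_pad)
```

With pad 2 a 5x5 filter keeps the 32x32 spatial size, which is why the example sets `pad: 2`.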

Pooling Layer
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
• Required
– kernel_size : specifies height and width of
each filter
• Optional
– pool [default MAX]: the pooling method.
Currently MAX, AVE, or STOCHASTIC
– pad [default 0]: specifies the number of
pixels to (implicitly) add to each side of
the input
– stride [default 1]: specifies the intervals at
which to apply the filters to the input
• e.g. stride 2: step two pixels (in the
bottom blob) between pooling regions
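The pooled output size follows from kernel_size, pad and stride in much the same way as for convolution. The sketch below assumes that Caffe rounds the pooled size up (ceil) and drops a last window that would start inside the padding; that rounding rule is stated from memory of the Caffe pooling layer and is worth verifying against your Caffe version. The function name and the 32-pixel input are illustrative.

```python
import math

def pool_output_size(in_size, kernel_size, pad=0, stride=1):
    """Pooled size along one dimension, assuming ceil rounding."""
    out = int(math.ceil((in_size + 2 * pad - kernel_size) / float(stride))) + 1
    # With padding, drop a last window that would start inside the padding.
    if pad > 0 and (out - 1) * stride >= in_size + pad:
        out -= 1
    return out

# pool1 from the example: kernel_size 3, stride 2, on an assumed 32x32 input.
pooled = pool_output_size(32, kernel_size=3, stride=2)
print(pooled)
```

Under that assumption a 3x3 window with stride 2 halves a 32x32 map to 16x16.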

ReLU Layer (Rectified-Linear and Leaky-ReLU)
Rectified: straightened, as in electrical rectification; Leaky: leaking, having a hole
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1" }
• Parameters optional
• negative_slope [default 0]:
– specifies whether to leak the
negative part by multiplying it
with the slope value rather than
setting it to 0.
• Input x, Compute Output y
• y = x (if x > 0)
• y = x * negative_slope (if x <= 0)
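The two cases above translate directly into code. A minimal sketch on plain Python floats; the function name `relu` and the sample inputs are illustrative, and the slope 0.1 is an arbitrary example value.

```python
def relu(x, negative_slope=0.0):
    """y = x for x > 0, otherwise y = x * negative_slope."""
    return x if x > 0 else x * negative_slope

inputs = [-2.0, -0.5, 0.0, 1.5]

# Plain ReLU: negative_slope left at its default of 0.
plain = [relu(x) for x in inputs]

# Leaky ReLU: negative inputs are scaled instead of zeroed.
leaky = [relu(x, negative_slope=0.1) for x in inputs]

print(plain)
print(leaky)
```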

Why ReLU, Drop-Out!
• Drop-Out
– 2014, University of Toronto; paper title:
• Dropout : A Simple Way to Prevent Neural Networks from Overfitting
• The key idea is to randomly drop units (along with their connections) from the
neural network during training
– A kind of regularizer
– Rather than training every hidden node, nodes are dropped out at random
• The weights attached to dropped nodes are not trained in that step
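Randomly dropping units during training can be sketched as follows. This version rescales the surviving units by 1 / (1 - drop_ratio) so that nothing changes at test time; that "inverted" scaling is one common formulation and is an assumption here, not something the slides specify. The names `dropout` and `drop_ratio` are illustrative.

```python
import random

def dropout(activations, drop_ratio, rng, train=True):
    """Zero each unit with probability drop_ratio during training.

    Surviving units are scaled by 1 / (1 - drop_ratio) (inverted dropout,
    an assumed formulation). At test time the input passes through unchanged.
    """
    if not train:
        return list(activations)
    keep_scale = 1.0 / (1.0 - drop_ratio)
    return [0.0 if rng.random() < drop_ratio else a * keep_scale
            for a in activations]

rng = random.Random(0)  # fixed seed so the demo is repeatable
hidden = [1.0] * 10

dropped = dropout(hidden, drop_ratio=0.5, rng=rng)
at_test = dropout(hidden, drop_ratio=0.5, rng=rng, train=False)

print(dropped)
print(at_test)
```

A unit whose output is zeroed contributes no gradient in that pass, which is why its weights are not updated.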

Drop-Out
• … using ReLUs trained with dropout during frame level training provide an 4.2% relative
improvement over a DNN trained with sigmoid units…

Rectified-Linear unit(ReLU)
• Dropout makes training slow.
– Weights of dropped-out units are not updated.
• Swap the non-linear activation function
– Use ReLU instead of the commonly used logistic sigmoid or tanh
• Advantages of ReLU
– reduced likelihood of the gradient to vanish
– Sparsity

InnerProduct Layer
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool3"
top: "ip1"
param { lr_mult: 1 } # learning rate multiplier for the weights
param { lr_mult: 2 } # learning rate multiplier for the biases
inner_product_param {
num_output: 64
weight_filler {
type: "gaussian"
std: 0.1
}
bias_filler { type: "constant" }
# initialize the biases to zero (0)
}
}
• Required
– num_output (c_o): the number
of filters
• Input
– n * channels * height * width
• Output
– n * c_o
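The shape change from n * channels * height * width to n * c_o comes from flattening each input and taking one weighted sum per output. A minimal sketch with nested lists; the function name `inner_product` and the tiny weights are made up for illustration.

```python
def inner_product(batch, weights, biases):
    """Fully connected layer on nested lists.

    batch:   n items, each shaped [channels][height][width]
    weights: c_o rows, each of length channels * height * width
    biases:  c_o values
    Returns n rows of c_o outputs.
    """
    outputs = []
    for item in batch:
        # Flatten channels x height x width into a single vector.
        flat = [v for channel in item for row in channel for v in row]
        outputs.append([
            sum(w * v for w, v in zip(w_row, flat)) + b
            for w_row, b in zip(weights, biases)
        ])
    return outputs

# n = 2 inputs of 1 channel x 2 x 2, and c_o = 3 outputs.
batch = [[[[1, 2], [3, 4]]], [[[0, 1], [0, 1]]]]
weights = [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
biases = [0, 0, 1]

out = inner_product(batch, weights, biases)
print(out)
```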

LRN Layer (Local Response Normalization)
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
• performs a kind of “lateral inhibition” by
normalizing over local input regions
• Each input value is divided
by $(1 + (\alpha/n) \sum_i x_i^2)^{\beta}$,
– where n is the size of each local region,
and the sum is taken over the region
centered at that value
• Optional
– local_size [default 5]: the number of
channels to sum over (for cross channel
LRN) or the side length of the square
region to sum over (for within channel
LRN)
– alpha [default 1]: the scaling parameter
– beta [default 0.75]: the exponent
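The normalization formula can be sketched for the cross-channel case at a single spatial position. The parameter values come from the norm1 example (local_size 5, alpha 0.0001, beta 0.75). The function name, the sample activations, and the clipping of the window at the first and last channels are assumptions for illustration.

```python
def lrn_across_channels(values, local_size=5, alpha=1e-4, beta=0.75):
    """Cross-channel LRN for the channel values at one spatial position.

    Each value is divided by (1 + (alpha / n) * sum of squares) ** beta,
    where n = local_size and the sum runs over the window centered on
    that channel (clipped at the channel boundaries).
    """
    half = local_size // 2
    out = []
    for c, x in enumerate(values):
        lo = max(0, c - half)
        hi = min(len(values), c + half + 1)
        sq_sum = sum(v * v for v in values[lo:hi])
        out.append(x / (1.0 + (alpha / local_size) * sq_sum) ** beta)
    return out

channels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
normalized = lrn_across_channels(channels)
print(normalized)
```

Each output is slightly smaller than its input, and more so where the neighbouring channels have large activations.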

CIFAR-10 (2010, Hinton/Krizhevsky)
• 10 classes
• 32x32 color image
• 6,000 images per class
• 5,000 training / 1,000
test images per class
• 60,000 images in total =
(5,000 + 1,000) images × 10 classes

ILSVRC12
(ImageNet Large Scale Visual Recognition Challenge 2012)
• AlexNet
• 1.3 million high-resolution images
– Resize to 224x224x3
• Class 1,000
• 150,000 test images (about 1,300 training images per class)
• Net
– five convolutional layers
– two globally connected layers
– final 1000-way softmax.

• Use the Caffe/examples/cifar10 model
• Data from Prof. Ng's MATLAB homework
– 64x64x3 images
– 4 classes (airplane, car, cat, dog)
– Train (500 per class), Test (500 per class)
• Preparation
– Resize 64x64x3 -> 32x32x3
– Compute the mean data

Mnist Model net
[Diagram of the net. Layers shown: Data, Convolution1, Pooling1, Convolution2, Pooling2, InnerProduct1, ReLU1, ReLU2, SoftmaxWithLoss, Accuracy]
CNN Exercise net in MATLAB
[Diagram of the net. Elements shown: Data, Convolution1, Pooling1, Convolution2, Pooling2, Feature Extracted Data, Softmax (1000 iterations)]

cifar10 net
[Diagram of the net. Layers shown: Data, Convolution1, Pooling1, ReLU1, Convolution2, ReLU2, Pooling2, Convolution3, ReLU3, Pooling3, InnerProduct1, InnerProduct2, SoftmaxWithLoss, Accuracy]

ILSVRC12 (ImageNet Large Scale Visual Recognition Challenge 2012)
https://github.com/dnouri/skynet/blob/master/cuda-convnet/imagenet-layers/layers-imagenet.cfg
[Diagram of the net. Layers shown: Data, Conv1 (F48), ReLU1, Pool1 (S2), LRN1, Conv2 (F32), ReLU2, Pool2 (S2), LRN2, Conv3 (F96), ReLU3, Conv4 (F32), ReLU4, Conv5 (F32), ReLU5, Pool5 (S2), InnerProduct6, ReLU6, Dropout6 (0.5), InnerProduct7, ReLU7, Dropout7 (0.5), InnerProduct8, SoftmaxWithLoss, Accuracy]

Network Visualization
http://cs.stanford.edu/people/karpathy/convnetjs/
• Input (32x32x3)
• conv (32x32x16)
– Filter (5x5x3)
– stride 1
• relu (32x32x16)
• pool (16x16x16)
– Filter (2x2)
• conv (16x16x20)
– Filter (5x5x16)
• relu (16x16x20)
• pool (8x8x20)
– Filter (2x2)
• conv (8x8x20)
– Filter (5x5x20)
• relu (8x8x20)
• pool (4x4x20)
– Filter (2x2)
• InnerProduct
– (1x1x10)
• softmax
– (1x1x10)
• Classes: 'airplane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'
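The layer sizes listed for this network can be reproduced by tracing the spatial size through the stack. The slide gives only the filter sizes and the conv stride. The padding of 2 on each conv layer and the stride of 2 on each pooling layer are assumptions chosen because they reproduce the listed shapes, and the helper names are illustrative.

```python
def conv_out(size, kernel, pad, stride=1):
    """Output side length of a convolution."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Output side length of an unpadded pooling window."""
    return (size - kernel) // stride + 1

size, depth = 32, 3
shapes = [("input", size, size, depth)]

# Three conv -> relu -> pool stages with 16, 20 and 20 filters.
for filters in (16, 20, 20):
    size = conv_out(size, kernel=5, pad=2)       # pad=2 is an assumption
    depth = filters
    shapes.append(("conv", size, size, depth))
    shapes.append(("relu", size, size, depth))   # ReLU keeps the shape
    size = pool_out(size, kernel=2, stride=2)    # stride=2 is an assumption
    shapes.append(("pool", size, size, depth))

for name, h, w, d in shapes:
    print("%-5s %dx%dx%d" % (name, h, w, d))
```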

How to Use ImageNet
• Download the auxiliary data
– $ ./data/ilsvrc12/get_ilsvrc_aux.sh
• Backup folder: caffe/examples/imagenet
• Edit create_imagenet.sh
– RESIZE=false -> RESIZE=true
• Create the LMDBs with
– $ ./examples/imagenet/create_imagenet.sh
• Create mean data
– $ ./examples/imagenet/make_imagenet_mean.sh
– Output: data/ilsvrc12/imagenet_mean.binaryproto
• Train Start
– $ ./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt

ImageNet Data Prepare
• Requirement
– Save Image Files
• /path/to/imagenet/train/n01440764/n01440764_10026.JPEG
• /path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG
– Describe the folders in listing files
• train.txt
– n01440764/n01440764_10026.JPEG 0
– n01440764/n01440764_10027.JPEG 0
• val.txt
– ILSVRC2012_val_00004614.JPEG 954
– ILSVRC2012_val_00004615.JPEG 211
• Let's make the LMDBs
– $ ./examples/imagenet/create_imagenet.sh
• ilsvrc12_train_lmdb for training
• ilsvrc12_val_lmdb for validation
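Before building the database it can help to check that every listing line has the form "relative/path label". A minimal sketch of such a check; `parse_listing` is a hypothetical helper and not a Caffe tool, and the sample text reuses the val.txt lines above.

```python
def parse_listing(text):
    """Parse 'relative/path label' lines into (path, label) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so paths may contain spaces.
        path, label = line.rsplit(None, 1)
        entries.append((path, int(label)))
    return entries

val_txt = """ILSVRC2012_val_00004614.JPEG 954
ILSVRC2012_val_00004615.JPEG 211
"""

entries = parse_listing(val_txt)
labels_ok = all(0 <= label < 1000 for _, label in entries)
print(entries, labels_ok)
```

A malformed label raises a ValueError at the offending line, which is easier to track down than a failure partway through the database build.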

MNIST Model Net Modification Experiment
• Iteration 100
• Basic model loss:0.34
• Remove convolution1, pooling1
– loss:0.411
• Remove ReLU1
– loss:0.426
• Remove InnerProduct1
– loss:0.450
• Remove pooling2
– loss:0.522
• Remove convolution2
– loss:0.753
[Diagram of the basic net. Layers shown: Data, Convolution1, Pooling1, Convolution2, Pooling2, InnerProduct1, ReLU1, InnerProduct2, SoftmaxWithLoss, Accuracy]

Test
• Classify (Airplane, car, cat, dog)
– Image size 64x64x3
• Model
– MATLAB CNN net
• Only the softmax layer is trained, iterations = 1,000
• No image resizing
– cifar10 net
• Iteration 5000
• DB built with images resized 64x64x3 -> 32x32x3
– cifar10 net
• Iteration 5000
• DB built without resizing
– Labellio
• deep learning web service

Prepare 1/2. Make .mdb
• create_imagenet.sh
RESIZE=true
if $RESIZE; then
RESIZE_HEIGHT=32
RESIZE_WIDTH=32
else
RESIZE_HEIGHT=0
RESIZE_WIDTH=0
fi
GLOG_logtostderr=1 $TOOLS/convert_imageset \
    --resize_height=$RESIZE_HEIGHT \
    --resize_width=$RESIZE_WIDTH \
    --shuffle \
    $TRAIN_DATA_ROOT \
    $DATA/train.txt \
    $EXAMPLE/ilsvrc12_train_lmdb
• 1. Save the 64x64x3 images in a folder.
• 2. List every image path and label in
train.txt.
• 3. Build the LMDB database.
• Images can be resized while the LMDB
database is built.
– To resize: RESIZE=true
– To keep the original size: RESIZE=false
– With resizing: data.mdb (8.3 MB)
– Without resizing: data.mdb (32.9 MB)
– lock.mdb is always the same size: 8.2 kB
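The effect of resizing on storage can be estimated from the raw pixel payload alone. The sketch assumes uncompressed 8-bit pixels and ignores database overhead, so it gives the ratio rather than the exact file size. The image count uses the 4 classes x 500 training images stated for this data set, and the function name is illustrative.

```python
def raw_pixel_bytes(num_images, height, width, channels=3):
    """Bytes of uncompressed 8-bit pixel data, before database overhead."""
    return num_images * height * width * channels

num_train = 4 * 500  # 4 classes x 500 training images per class

full = raw_pixel_bytes(num_train, 64, 64)   # images kept at 64x64x3
small = raw_pixel_bytes(num_train, 32, 32)  # images resized to 32x32x3
ratio = full // small

print(full, small, ratio)
```

Halving each side of the image cuts the pixel payload to a quarter.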

Prepare 2/2. Make mean data
• make_imagenet_mean.sh
EXAMPLE=examples/imagenet
DATA=data/ilsvrc12
TOOLS=build/tools
$TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb \
    $DATA/mean.binaryproto
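What the mean data represents can be sketched as a per-pixel average over the training images, which is then subtracted from each input. This is a pure-Python illustration on tiny single-channel images and not the compute_image_mean tool itself; the function names and sample values are made up.

```python
def mean_image(images):
    """Per-position mean over equally sized images given as nested lists."""
    n = float(len(images))
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(width)]
            for r in range(height)]

def subtract_mean(image, mean):
    """Center one image by subtracting the mean image element-wise."""
    return [[p - m for p, m in zip(row, mean_row)]
            for row, mean_row in zip(image, mean)]

# Two 2x2 single-channel images (a real mean file also covers each channel).
images = [[[0, 2], [4, 6]],
          [[2, 4], [6, 8]]]

mean = mean_image(images)
centered = subtract_mean(images[0], mean)
print(mean)
print(centered)
```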

Result
• MATLAB CNN net
– Only the softmax layer is trained, iterations = 1,000
– No image resizing
– Testing accuracy = 0.8
• cifar10 net
– DB built with images resized 64x64x3 -> 32x32x3
– Iteration 5,000, training loss = 0.00059
– Testing accuracy = 0.755, loss=1.58
– Overfitting!
• cifar10 net (DB built without resizing)
– Iteration 5,000, training loss = 0.00026
– Test accuracy = 0.724, loss=1.94
– Worse overfitting!
• Labellio
– Training accuracy = 0.66
– Test accuracy = ?

References
• Face Feature Recognition System with Deep Belief Networks, for Korean/KIISE Thesis
– http://www.slideshare.net/sogo1127/face-feature-recognition-system-with-deep-belief-networks-for-korean
• Labellio
– http://devblogs.nvidia.com/parallelforall/labellio-scalable-cloud-architecture-efficient-multi-gpu-deep-learning/
