SeRanet is super resolution software that uses deep learning to enhance low-resolution images. It introduces the concepts of "split" and "splice", where the input image is divided into four branches representing different pixel regions, and these branches are fused to form the output image. This approach provides flexibility in model design compared to processing the entire image at once. SeRanet also uses a technique called "fusion", which combines two different CNNs - one for the main task and one for an auxiliary task - to leverage their complementary representations and improve performance. Experimental results show SeRanet produces higher-quality super resolution than conventional methods such as bicubic resizing, as well as other deep-learning-based methods such as waifu2x.
2. Table of contents
Introduction
Machine learning
Deep learning
SRCNN
Problem
Introduction of previous works
“Image Super-Resolution Using Deep Convolutional Networks”
waifu2x
SeRanet
Splice
Fusion
CNN model
Result
Performance
Conclusion
4. What is machine learning
There are three major categories in machine learning:
・Supervised learning
A large set of input data together with "correct/labeled" output data is given during training.
Goal: train software to output the "correct/label" value for given input data.
Ex. Image recognition (input: image data, output: recognition result (human, cat, car, etc.))
Voice recognition (input: human voice data, output: text of what the human speaks)
・Unsupervised learning
Only a large amount of input data is given.
Goal: categorize data based on its statistics (find structure/deviation existing in the data).
Ex. Categorizing types of cancer
Linking users who have similar interests in a web application, for recommendation
・Reinforcement learning
Problem setting: an agent chooses an "action" inside a given "environment". Choosing an action affects the environment, and the agent receives some "reward".
Goal: find actions that maximize the reward the agent can gain.
Ex. DeepMind DQN, AlphaGo
Robot self-learning: how to control its own parts
5. What is machine learning
SeRanet uses this kind of machine learning: supervised learning.
6. Deep learning
“Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-
level abstractions in data by using a deep graph with multiple processing layers, composed of multiple
linear and non-linear transformations.”
(cited from Wikipedia, "Deep learning")
(Figure: multi-layer network from input to output)
8. Super resolution task by machine learning
Problem definition
・You are given a compressed, half-size picture. Recover the original picture and output it.
Training phase:
The goal of this machine learning is to construct a map that converts the compressed picture
into the original picture (as closely as possible).
(Figure: compressed picture (half size) → map → original picture)
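As an illustration of how such training pairs can be produced, the sketch below makes the "compressed" input by 2×2 average pooling of the original image; the actual downscaling filter used for training is an assumption here.

```python
# Sketch: make a training pair (compressed input, original target).
# The half-size "compressed" picture is produced by 2x2 average pooling;
# the real training data may use a different downscaling filter.

def downscale_half(img):
    h, w = len(img), len(img[0])
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1] +
              img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w // 2)] for i in range(h // 2)]

original = [[0, 0, 255, 255],
            [0, 0, 255, 255],
            [255, 255, 0, 0],
            [255, 255, 0, 0]]
compressed = downscale_half(original)  # the CNN learns compressed -> original
print(compressed)  # [[0.0, 255.0], [255.0, 0.0]]
```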
9. Super resolution task by machine learning
After training
・Input: an arbitrary picture → Output: a twice-size picture with super resolution
(Figure: picture to be enlarged → map obtained by machine learning → high-quality, twice-size picture)
10. Representation of the “map”
A Deep Convolutional Neural Network (CNN) is used.
- The current trend for image recognition tasks.
11. Previous work ①
“Image Super-Resolution Using Deep Convolutional Networks”
Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang
https://arxiv.org/abs/1501.00092
・The original paper which proposed "SRCNN".
It reports that superior results are obtained for super resolution
using a Convolutional Neural Network.
In the following slides, this paper will be denoted as the
"SRCNN paper". http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
12. Algorithm summary
1. Read the picture/image file.
2. Enlarge the picture to twice its size in advance as preprocessing (can be generalized to n-times size).
3. Convert the RGB format into YCbCr format, and extract the Y channel.
4. Normalization: convert the value range from 0-255 to 0-1.
5. Input the Y channel data into the CNN; as output, we obtain Y channel data with normalized values.
6. Revert the value range to 0-255.
7. Enlarge the CbCr channels by a conventional method such as Bicubic, then
compose the obtained Y channel and CbCr channels to get the final result.
※ Steps 3 and 7 can be skipped when you construct a CNN with RGB-channel input/output.
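Steps 3 through 7 can be sketched as below, shown per pixel for brevity. `apply_cnn` is a hypothetical placeholder for the trained network, the BT.601 full-range formulas are an assumed choice for the YCbCr conversion, and the enlargement steps (2 and the CbCr resize in 7) are omitted.

```python
# Sketch of the SRCNN-style pipeline (steps 3-6 above, per pixel).
# Assumptions: BT.601 full-range YCbCr conversion; `apply_cnn` is a
# hypothetical stand-in for the trained network (identity here).

def rgb_to_ycbcr(r, g, b):
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

def apply_cnn(y_norm):
    # Placeholder for the trained CNN: identity on the normalized
    # Y channel (a real model would produce a sharpened Y channel).
    return y_norm

def super_resolve_pixel(r, g, b):
    y, cb, cr = rgb_to_ycbcr(r, g, b)   # step 3: extract Y channel
    y_norm = y / 255.0                  # step 4: normalize to 0-1
    y_out = apply_cnn(y_norm) * 255.0   # steps 5-6: CNN, revert range
    # step 7: CbCr kept as-is here (conventionally bicubic-enlarged)
    return ycbcr_to_rgb(y_out, cb, cr)

print(super_resolve_pixel(200, 100, 50))  # approximately (200, 100, 50)
```

With the identity placeholder the pipeline round-trips the pixel, which confirms the color-space bookkeeping is consistent.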
13. Remark of algorithm ①
・Only the Y channel is used as input/output data of the CNN
- Humans are more sensitive to luminance (Y) than to color-difference chrominance (CbCr).
- Dealing only with the Y channel reduces the complexity of learning the CNN.
(Figure: YCbCr decomposition into Y, Cb, and Cr channels)
14. Remark of algorithm ①
・Is RGB-channel training difficult?
The SRCNN paper surveys the case where the CNN is trained on 3-channel RGB input/output data.
The result suggests that a Y-channel CNN outperforms an RGB CNN when the CNN size is not very big.
15. Remark of algorithm ②
・The picture/image data is enlarged in advance, before being input to the CNN
The SRCNN paper uses the Bicubic method, and waifu2x (explained later) uses the Nearest-neighbor method, to enlarge the picture before inputting the data to the CNN.
[Reason]
The input and output picture sizes are almost the same when you implement a convolutional neural network with a machine learning library.
(※ Strictly speaking, the output picture size is smaller by (filter size - 1).)
(Figure: enlarge picture and extract Y channel → CNN → compose. The Cr/Cb channels are only enlarged; no CNN is used for them.)
16. Previous work ① CNN model
SRCNN paper's CNN model, one example (many other parameters are tested in the paper):

CNN                Layer1    Layer2     Layer3
In channel         1         32         64
Out channel        32        64         1
Kernel size        9         5          5
# of parameters    2628      51264      1664
# of convolutions  2592×4WH  51200×4WH  1600×4WH

A relatively shallow CNN architecture with a big kernel size.
※ # of parameters = In channel × Out channel × (kernel size)^2 + Out channel
※ # of convolutions per pixel = In channel × Out channel × (kernel size)^2
Suppose the picture size before enlargement is w × h pixels; then the picture size after enlargement is 2w × 2h, so convolutions are needed for 4wh pixels in total.
total # of parameters: 55556
total # of convolutions: 55392×4WH
17. Previous work ② waifu2x
waifu2x
https://github.com/nagadomi/waifu2x
The term "waifu" comes from the Japanese pronunciation of "wife"
(in Japan the term "wife" is used for one's favorite female anime character).
Open source software, originally published to enlarge art-style (anime) images;
it now supports photographs as well.
You can test the application on a demo server:
http://waifu2x.udp.jp/
18. Previous work ② waifu2x
waifu2x is open source software, which allows other software engineers to develop related
software.
Much derivative software has been published by now.
[Related links (in Japanese)]
・"List of waifu2x and its derivative software" http://kourindrug.sakura.ne.jp/waifu2x.html
・"The waifu2x algorithm, explained so simply even a monkey can understand it"
https://drive.google.com/file/d/0B22mWPiNr-6-RVVpaGhZa1hJTnM/view
19. Previous work ② waifu2x CNN model
Because the convolution kernel size is kept small (3), a deeper neural network is constructed.
CNN                Layer1   Layer2    Layer3     Layer4     Layer5     Layer6      Layer7
In channel         1        32        32         64         64         128         128
Out channel        32       32        64         64         128        128         1
Kernel size        3        3         3          3          3          3           3
# of parameters    320      9248      18496      36928      73856      147584      1153
# of convolutions  288×4WH  9216×4WH  18432×4WH  36864×4WH  73728×4WH  147456×4WH  1152×4WH
Deep CNN architecture with small kernel size
total # of parameter :287585
total # of convolution :287136×4WH
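As a quick sanity check of the counting formulas (# of parameters = in × out × k² + out; # of convolutions per pixel = in × out × k²), the sketch below reproduces the waifu2x totals; it is illustrative code, not part of either project.

```python
# Reproduce the per-layer counts in the table above from the formulas:
#   params = in * out * k^2 + out   (weights + biases)
#   convs  = in * out * k^2         (multiply-accumulates per pixel)

def layer_counts(in_ch, out_ch, k):
    params = in_ch * out_ch * k * k + out_ch
    convs = in_ch * out_ch * k * k
    return params, convs

# waifu2x model: (in channel, out channel, kernel size) per layer
waifu2x = [(1, 32, 3), (32, 32, 3), (32, 64, 3), (64, 64, 3),
           (64, 128, 3), (128, 128, 3), (128, 1, 3)]

total_params = sum(layer_counts(i, o, k)[0] for i, o, k in waifu2x)
total_convs = sum(layer_counts(i, o, k)[1] for i, o, k in waifu2x)
print(total_params, total_convs)  # 287585 287136
```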
20. What is special about the SRCNN task
The big differences from the image recognition task:
1. Position sensitivity is required
+ Image recognition task:
- A translation-invariant property is welcome, so Max pooling or striding is often utilized.
+ SRCNN task:
- A translation-variant property is necessary for super resolution, since it requires position-dependent output.
2. The feature map size does not shrink during the CNN processing.
→ As the number of feature maps increases, the amount of calculation increases.
The speed/memory restriction is severe.
Required memory for a CNN ≒ the volume of the rectangles in the CNN model figure.
For an image recognition task, the feature map size usually becomes smaller in the deeper layers of the CNN, so the number of feature maps can be bigger because the image size is smaller.
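Point 1 can be shown in a few lines: max pooling maps two inputs that differ only by a one-pixel shift to the same output, discarding exactly the position information that super resolution needs. This is a generic illustration, not code from any of the projects discussed.

```python
# 2-wide max pooling on a 1-D signal: two inputs that differ only by a
# one-pixel shift collapse to the same output (translation invariance),
# which is desirable for recognition but harmful for super resolution.

def max_pool_2(row):
    return [max(row[i], row[i + 1]) for i in range(0, len(row), 2)]

a = [0, 9, 0, 0]  # bright pixel at position 1
b = [9, 0, 0, 0]  # bright pixel at position 0
print(max_pool_2(a) == max_pool_2(b))  # True: the shift is lost
```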
21. Table of contents
Explanation of SeRanet project starts from here.
Introduction of the idea behind SeRanet.
SeRanet
Idea 1 Splice
Idea 2 Fusion
SeRanet CNN model
22. SeRanet Idea 1 Splice
Input is the pre-scaled w × h picture image, and the output is a 2w × 2h picture.
→ Introduce the "Split" and "Splice" concepts.
Split: 4 branches of neural network (NN) with size w × h are created (LU, RU, LD, RD).
Splice: the 4 branches of the neural network are merged to obtain one 2w × 2h image.
(Figure: input size w × h → Split → 4 branches → Splice → output size 2w × 2h)
23. SeRanet Idea 1 Splice
After the split, the 4 branches of the neural network correspond to the Left-Up (LU), Right-Up (RU), Left-Down (LD), and Right-Down (RD) pixels of the enlarged picture.
At the splice phase, the 4 branches of the CNN are combined/spliced to get the twice-size image.
(Figure: each input pixel (i, j) feeds all four branches, and the splice interleaves the branch outputs so that every 2×2 block of the output image takes its left-up pixel from LU, its right-up pixel from RU, its left-down pixel from LD, and its right-down pixel from RD.)
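The splice interleaving can be sketched as a small pixel-shuffle routine; `splice` below is an illustrative standalone function, not SeRanet's actual implementation.

```python
# Minimal sketch of the "splice" step: interleave four w x h branch
# outputs (LU, RU, LD, RD) into one 2w x 2h image (a pixel shuffle).

def splice(lu, ru, ld, rd):
    h, w = len(lu), len(lu[0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            out[2 * i][2 * j] = lu[i][j]          # left-up pixel
            out[2 * i][2 * j + 1] = ru[i][j]      # right-up pixel
            out[2 * i + 1][2 * j] = ld[i][j]      # left-down pixel
            out[2 * i + 1][2 * j + 1] = rd[i][j]  # right-down pixel
    return out

# 1x1 branch outputs produce one 2x2 output block
print(splice([[1]], [[2]], [[3]], [[4]]))  # [[1, 2], [3, 4]]
```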
24. The effect of introducing Splice
→ Flexibility of neural network modelling
(Figure: input w × h → 1st Phase → Split → 2nd Phase → Splice → 3rd Phase → output 2w × 2h)
1st Phase: The image size is w × h, before enlargement, so the amount of calculation is 1/4 of the 3rd phase.
→ A larger number of feature maps and a larger kernel size are acceptable in this phase.
2nd Phase: 4-branch CNN with image size w × h.
The total calculation amount is the same as the 3rd phase, but the parameters learned at each branch
(LU, RU, LD, RD) can be different.
→ The model's representation potential grows.
Another advantage is that memory consumption is smaller than in the 3rd phase due to the image size.
3rd Phase: Image size 2w × 2h. The last phase, to get the output.
Memory consumption and the amount of calculation are 4 times larger than in the 1st phase.
26. Fusion
This method was introduced in the colorization paper:
・"Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors
for Automatic Image Colorization with Simultaneous Classification"
Satoshi Iizuka, Edgar Simo-Serra and Hiroshi Ishikawa
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
https://github.com/satoshiiizuka/siggraph2016_colorization
The research aims to convert a monotone input image into a colorful output image through CNN supervised learning.
(Figure: monotone input image → colorized output image)
27. Neural network used in the colorization paper
Upper CNN: the main CNN used for colorization.
Lower CNN: a CNN trained for image classification.
So a CNN with a different purpose is utilized to help improve the performance of the main CNN.
The lower CNN is expected to learn global features, e.g. "the image was taken outside".
The research reports that colorization accuracy increased by "fusing" the global feature into the main CNN.
Examples of how the global feature helps colorization (read the paper for details):
- It reduces mistakenly using sky colors at the top of the image when the picture was taken indoors.
- It reduces mistakenly using a brown ground color when the picture was taken on the sea.
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
28. SeRanet Idea 2 Fusion
SeRanet combines/fuses 2 types of CNN at the 1st Phase.
Purpose: combining different types of non-linear activation to get a wider variety of model representations.
The upper CNN uses Leaky ReLU for activation; the lower CNN uses sigmoid for activation.
※ I want to use a Convolutional Restricted Boltzmann Machine with pre-training for the lower CNN in future development.
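A minimal sketch of the fusion idea, assuming per-element activations and channel-wise concatenation; the convolutions and learned weights of the real branches are omitted, so this only shows how the two non-linearities contribute different representations of the same input.

```python
# Hedged sketch of "fusion": two branches process the same features
# with different non-linearities (Leaky ReLU vs sigmoid), and their
# outputs are concatenated (fused) along the channel axis.
import math

def leaky_relu(x, slope=0.1):
    return x if x > 0 else slope * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(features):
    upper = [leaky_relu(v) for v in features]  # upper CNN branch
    lower = [sigmoid(v) for v in features]     # lower CNN branch
    return upper + lower                       # channel-wise concatenation

print(fuse([-1.0, 2.0]))
```

The fused vector keeps both the unbounded, mostly-linear view (Leaky ReLU) and the squashed, saturating view (sigmoid) of the same features, which is the wider model representation the slide refers to.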
29. SeRanet CNN model
CNN model of seranet_v1

1st Phase (×2: two fused CNNs):
CNN                Layer1    Layer2     Layer3
In channel         3         64         64
Out channel        64        64         128
Kernel size        5         5          5
# of parameters    4864      102464     204928
# of convolutions  4800×WH   102400×WH  204800×WH

2nd Phase (after Split, ×4: LU/RU/LD/RD branches):
CNN                Layer4     Layer5     Layer6
In channel         256        512        256
Out channel        512        256        128
Kernel size        1          1          3
# of parameters    131584     131584     295040
# of convolutions  131072×WH  131072×WH  294912×WH

3rd Phase (after Splice):
CNN                Layer7      Layer8      Layer9      Layer10
In channel         128         128         128         128
Out channel        128         128         128         3
Kernel size        3           3           3           3
# of parameters    147584      147584      147584      3584
# of convolutions  147456×4WH  147456×4WH  147456×4WH  3456×4WH

Fusion is applied at the 1st phase.
total # of parameters: 3303680
total # of convolutions: 1159150×4WH
30. Comparison

Model               SRCNN paper  waifu2x     SeRanet_v1
Total parameters    55556        287585      3303680
Total convolutions  55392×4WH    287136×4WH  1159150×4WH

Parameters: about 10 times more than waifu2x. Convolutions: about 4 times more.
The number of parameters increases more than the number of convolutions (calculation) does, because SeRanet has position-dependent parameters (LU, RU, LD, RD).
→ Question: does the increase in parameters and calculation result in better performance?
31. Table of contents
Result
Performance comparison between various resize methods:
・Bicubic (conventional resize method)
・Lanczos (conventional resize method)
・waifu2x (resize through CNN)
・SeRanet (resize through CNN)
- Slide forward/backward to compare.
- It may be difficult to see the difference on SlideShare; see the link for comparison:
https://github.com/corochann/SeRanet
38. Result
Original data (ground-truth data, for reference)
The difference can be seen in the fine details of the 1st picture and the thin stalk in the 4th picture (high-frequency components).
39. Result
Performance comparison between various resize methods
*Based on personal impression (comparison by a specific measurement is not done yet)
・Bicubic (conventional resize method)
・Lanczos (conventional resize method)
・waifu2x (resize through CNN)
・SeRanet (resize through CNN)
・Original image
The two conventional methods look almost the same as each other, and the two CNN-based methods look almost the same as each other; the conventional and CNN-based results look different, and both still differ from the original image.
40. Summary
SeRanet
・A big CNN is used (depth: 9 layers, total parameters: 3303680)
・RGB 3 channels are used for the CNN input/output, instead of only the Y channel
・Split / Splice CNN
The Left-Up, Right-Up, Left-Down, and Right-Down branches use different parameters
・Fusion
Different non-linearities are combined for flexibility of model representation
・Convolutional RBM pre-training
The performance is not mature yet;
there is room to improve so that the output gets closer to the original image.
41. At last
+ The project is an open source project on GitHub:
https://github.com/corochann/SeRanet
+ Improvement ideas and discussion are welcome.
+ My blog: http://corochann.com/
* If there is any inappropriate citation, please let me know.
* SeRanet is a personal project, and I may be misunderstanding something.
Please let me know if there is any wrong information.