SeRanet
Super resolution software through Deep Learning
https://github.com/corochann/SeRanet
Table of contents
Introduction
  Machine learning
  Deep learning
SRCNN
  Problem
  Introduction of previous works
    “Image Super-Resolution Using Deep Convolutional Networks”
    waifu2x
SeRanet
  Splice
  Fusion
  CNN model
Result
  Performance
Conclusion
Table of contents
Introduction
Machine learning
Deep learning
What is machine learning
There are three major categories of machine learning.
・Supervised learning
A large set of input data paired with “correct” (labeled) output data is given during training.
Goal: train software to output the correct label for a given input.
Ex. Image recognition (input: image data, output: recognition result such as human, cat, car, etc.)
Voice recognition (input: human voice data, output: the text that was spoken)
・Unsupervised learning
Only a large amount of input data is given.
Goal: categorize the data based on its statistics (find structure that already exists in the data)
Ex. Categorizing types of cancer
Linking users with similar interests in a web application for recommendation
・Reinforcement learning
Problem setting: an agent chooses an “action” inside a given “environment”. The action affects the environment, and the agent receives a “reward”.
Goal: find the actions that maximize the reward the agent can gain.
Ex. DeepMind DQN, AlphaGo
Robot self-learning: how to control its own parts
SeRanet uses supervised learning.
Deep learning
“Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.”
Quoted from Wikipedia, “Deep learning”
[Figure: Input → Output]
Table of contents
SRCNN
Problem
Introduction of previous works
“Image Super-Resolution Using Deep Convolutional Networks”
waifu2x
Super resolution task by machine learning
Problem definition
・You are given a picture that has been compressed to half size.
Recover the original picture and output it.
Training phase:
The goal of this machine learning task is to construct a map that converts a compressed picture into a picture as close as possible to the original.
[Figure: Compressed picture (half size) → Map → Original picture]
Super resolution task by machine learning
After training
・Input: arbitrary picture → Output: picture of twice the size with super resolution
[Figure: Picture to be enlarged → map obtained by machine learning → Twice size picture, high quality]
Representation of the “map”
A deep Convolutional Neural Network (CNN) is used.
- CNNs are the current trend for image recognition tasks
Previous work ①
“Image Super-Resolution Using Deep Convolutional Networks”
Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang
https://arxiv.org/abs/1501.00092
・The original paper that proposes “SRCNN”.
It reports that a Convolutional Neural Network achieves superior results for super resolution.
In the rest of these slides, this paper is referred to as the “SRCNN paper”.
http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
Algorithm summary
1. Read the picture/image file
2. Enlarge the picture to twice its size in advance as preprocessing (can be generalized to n times)
3. Convert from RGB format to YCbCr format, and extract the Y channel
4. Normalization: convert the value range from 0-255 to 0-1
5. Input the Y channel data into the CNN
As output, we obtain Y channel data with normalized values
6. Revert the value range to 0-255
7. Enlarge the CbCr channels with a conventional method such as Bicubic.
Compose the obtained Y channel and the CbCr channels to get the final result.
※ Steps 3 and 7 can be skipped when the CNN is constructed with RGB channels as input/output
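As a rough illustration of steps 3, 4, and 6, the per-pixel arithmetic might look like the sketch below. It assumes full-range BT.601 (JFIF) coefficients for the YCbCr conversion, which the slides do not specify, and `identity_cnn` is a hypothetical stand-in for the trained network (a real CNN operates on the whole image, not on one pixel).

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (assumed BT.601/JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def normalize(value):
    """Step 4: map the range 0-255 to 0-1."""
    return value / 255.0


def denormalize(value01):
    """Step 6: map 0-1 back to 0-255, rounding and clipping to valid 8-bit values."""
    return max(0, min(255, round(value01 * 255.0)))


def identity_cnn(y01):
    """Hypothetical placeholder for the trained CNN (returns its input unchanged)."""
    return y01


def process_pixel(r, g, b, cnn=identity_cnn):
    """Steps 3-6 for a single pixel; Cb and Cr bypass the CNN, as in step 7."""
    y, cb, cr = rgb_to_ycbcr(r, g, b)
    y_out = denormalize(cnn(normalize(y)))
    return y_out, cb, cr
```

Only the Y value passes through `cnn`; the Cb and Cr values are returned untouched, which mirrors how the chrominance channels are handled by a conventional resize in step 7.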
Remark of algorithm ①
・Only the Y channel is used as input/output data of the CNN
- Humans are more sensitive to luminance (Y) than to chrominance, the color difference (CbCr).
- Handling only the Y channel reduces the complexity of the problem the CNN has to learn.
[Figure: YCbCr decomposition into Y channel, Cr channel, and Cb channel]
・Is training on RGB channels difficult?
The SRCNN paper also examines the case where the CNN is trained on 3-channel RGB input/output data.
The result suggests that the Y channel CNN performs better than the RGB CNN when the CNN is not very large.
Remark of algorithm ②
・The picture/image data is enlarged in advance, before it is input to the CNN
The SRCNN paper uses the Bicubic method, and waifu2x (explained later) uses the Nearest neighbor method, to enlarge the picture before feeding it to the CNN.
[Reason]
When a convolutional neural network is implemented with a machine learning library, the input and output picture sizes are almost the same.
(※ Strictly speaking, the output picture is smaller by (filter size - 1) pixels per convolution layer)
[Figure: input → enlarge picture and extract Y channel → CNN → compose → output; the Cr and Cb channels are only enlarged, without the CNN]
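The size remark can be checked with a small calculation. This is a minimal sketch that assumes stride 1 and no padding in every convolution layer; the function name is made up for illustration.

```python
def valid_conv_output_size(size, kernel_sizes):
    """Side length after a stack of unpadded, stride-1 convolutions.

    Each layer with kernel size k shrinks the side length by k - 1.
    """
    for k in kernel_sizes:
        size -= k - 1
    return size


# A 100-pixel side passed through kernels of size 9, 5, and 5 loses 8 + 4 + 4 pixels.
print(valid_conv_output_size(100, [9, 5, 5]))  # 84
```

The loss is a fixed number of border pixels, so for large pictures the output is "almost the same" size as the input, as stated above.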
Previous work ① CNN model
An example CNN model from the SRCNN paper (many other parameter settings are tested in the paper)
CNN                Layer1      Layer2       Layer3
In channel         1           32           64
Out channel        32          64           1
Kernel size        9           5            5
# of parameter     2624        51264        1601
# of convolution   2592×4WH    51200×4WH    1600×4WH
Relatively shallow CNN architecture with big kernel size
※ # of parameter = In channel * Out channel * (kernel size)^2 + Out channel
※ # of convolution per pixel = In channel * Out channel * (kernel size)^2
Suppose the picture size before enlargement is W × H pixels; then the picture size after enlargement is 2W × 2H, so convolution is needed for 4WH pixels in total.
total # of parameter: 55489
total # of convolution: 55392×4WH
Previous work ② waifu2x
waifu2x
https://github.com/nagadomi/waifu2x
The term “waifu” comes from the Japanese pronunciation of “wife”
(in Japan, fans use the term “wife” for their favorite female anime character)
Open source software, originally published to enlarge art-style images.
It now also supports photo-style images.
You can try the application on the server:
http://waifu2x.udp.jp/
Previous work ② waifu2x
Because waifu2x is open source, other software engineers have been able to develop related software.
Many derivative programs have been published.
[Related links (in Japanese)]
・List of waifu2x and its derivative software http://kourindrug.sakura.ne.jp/waifu2x.html
・The waifu2x algorithm that even a monkey can understand
https://drive.google.com/file/d/0B22mWPiNr-6-RVVpaGhZa1hJTnM/view
Previous work ② waifu2x CNN model
The convolution kernel size is kept small at 3, and in exchange a deeper neural network is constructed.
CNN                Layer1    Layer2     Layer3      Layer4      Layer5      Layer6       Layer7
In channel         1         32         32          64          64          128          128
Out channel        32        32         64          64          128         128          1
Kernel size        3         3          3           3           3           3            3
# of parameter     320       9248       18496       36928       73856       147584       1153
# of convolution   288×4WH   9216×4WH   18432×4WH   36864×4WH   73728×4WH   147456×4WH   1152×4WH
Deep CNN architecture with small kernel size
total # of parameter: 287585
total # of convolution: 287136×4WH
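The totals for this waifu2x model can be reproduced from the layer table. The sketch below applies the counting rule used in these slides (parameters = in × out × kernel² + out, where the last term is one bias per output channel; multiplications per output pixel = in × out × kernel²). The function and variable names are chosen here for illustration only.

```python
def conv_layer_counts(in_ch, out_ch, kernel):
    """Return (parameters, multiplications per output pixel) for one conv layer."""
    weights = in_ch * out_ch * kernel ** 2
    return weights + out_ch, weights  # parameters include one bias per output channel


def model_totals(layers):
    """Sum the counts over a list of (in_ch, out_ch, kernel) tuples."""
    params = mults = 0
    for in_ch, out_ch, kernel in layers:
        p, m = conv_layer_counts(in_ch, out_ch, kernel)
        params += p
        mults += m
    return params, mults


# Layer shapes copied from the waifu2x table above.
WAIFU2X_LAYERS = [
    (1, 32, 3), (32, 32, 3), (32, 64, 3), (64, 64, 3),
    (64, 128, 3), (128, 128, 3), (128, 1, 3),
]

print(model_totals(WAIFU2X_LAYERS))  # (287585, 287136)
```

The second number is per output pixel, so it still has to be multiplied by the 4WH pixels of the enlarged picture.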
What is special about the SRCNN task
Major differences from the image recognition task
1. Position sensitivity is required
+ Image recognition task:
- Translation invariance is welcome, so Max pooling or strided convolution is often used.
+ SRCNN task:
- Translation variance is necessary, since super resolution requires position-dependent output.
2. The feature map image size does not shrink during CNN processing.
→ As the number of feature maps increases, the amount of calculation increases.
Speed/memory restrictions are severe.
Required memory for the CNN ≒ the volume of the rectangular boxes in the CNN model figure
In image recognition tasks, the feature map image size usually becomes smaller in the deeper layers of the CNN.
The number of feature maps can be larger when the image size is smaller.
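The memory remark can be made concrete with a back-of-the-envelope estimate. The sketch below assumes 4-byte (float32) values, that every feature map is kept in memory at once, and that a recognition-style network halves the image size after each layer; all three are simplifying assumptions, and the helper names are hypothetical.

```python
def feature_map_bytes(layers, bytes_per_value=4):
    """Total bytes for a list of (channels, height, width) feature maps."""
    return sum(c * h * w * bytes_per_value for c, h, w in layers)


def constant_size(channels, height, width):
    """Super resolution style: every layer keeps the full image size."""
    return [(c, height, width) for c in channels]


def halving_size(channels, height, width):
    """Recognition style: height and width are halved after each layer."""
    layers = []
    for c in channels:
        layers.append((c, height, width))
        height, width = max(1, height // 2), max(1, width // 2)
    return layers


channels = [32, 64, 128]
print(feature_map_bytes(constant_size(channels, 512, 512)))
print(feature_map_bytes(halving_size(channels, 512, 512)))
```

With the same channel counts, the constant-size network needs several times more memory, which is why adding feature maps is more expensive in super resolution than in recognition.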
Table of contents
The explanation of the SeRanet project starts here,
beginning with the ideas behind SeRanet.
SeRanet
Idea 1 Splice
Idea 2 Fusion
SeRanet CNN model
SeRanet Idea 1 Splice
The input is the picture at its size before scaling, w × h, and the output is a picture of size 2w × 2h
→ Introduce the “Split” and “Splice” concepts
input size: w × h
Split: 4 neural network (NN) branches of size w × h are created
Splice: the 4 branches are merged to obtain one image of size 2w × 2h
output size: 2w × 2h
[Figure: input → Split → 4 branches (LU, RU, LD, RD) → Splice → output]
SeRanet Idea 1 Splice
After the split, the 4 neural network branches correspond to the Left-Up (LU), Right-Up (RU), Left-Down (LD), and Right-Down (RD) pixels of the enlarged picture.
[Figure: Input image → Split → four branches, each holding pixels (1, 1), (1, 2), (2, 1), (2, 2) → Splice → Output image, in which each input pixel position becomes a 2 × 2 block]
In the splice phase, the 4 CNN branches are combined (spliced) to obtain the twice-size image:
LU RU
LD RD
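The splice step can be sketched in plain Python as an interleaving of four equally sized maps. This is an illustration of the LU/RU/LD/RD layout described above, under the assumption that position (i, j) of each branch fills one corner of the 2 × 2 block at (2i, 2j); the actual SeRanet implementation may be organized differently.

```python
def splice(lu, ru, ld, rd):
    """Interleave four h x w maps (lists of rows) into one 2h x 2w map."""
    h, w = len(lu), len(lu[0])
    out = [[None] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            out[2 * i][2 * j] = lu[i][j]          # left-up pixel of the block
            out[2 * i][2 * j + 1] = ru[i][j]      # right-up pixel
            out[2 * i + 1][2 * j] = ld[i][j]      # left-down pixel
            out[2 * i + 1][2 * j + 1] = rd[i][j]  # right-down pixel
    return out


# Four 1 x 2 branch outputs become one 2 x 4 image.
print(splice([[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]))
# [[1, 3, 2, 4], [5, 7, 6, 8]]
```

Each branch therefore only ever predicts one fixed corner of every 2 × 2 output block, which is what makes its parameters position-dependent.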
The effect of introducing Splice
→ Flexibility of neural network modelling
[Figure: Input w × h → 1st Phase → Split → 2nd Phase → Splice → 3rd Phase → Output 2w × 2h]
1st Phase: The image size is w × h, before enlargement, so the amount of calculation is 1/4 of that in the 3rd Phase.
→ A larger number of feature maps and a larger kernel size are affordable in this phase.
2nd Phase: 4 CNN branches with image size w × h.
The total amount of calculation is the same as in the 3rd Phase, but the parameters learned in each branch (LU, RU, LD, RD) can be different.
→ The representation potential of the model grows.
Another advantage is that memory consumption is smaller than in the 3rd Phase because of the smaller image size.
3rd Phase: Image size 2w × 2h. The last phase, which produces the output.
Memory consumption and the amount of calculation are 4 times larger than in the 1st Phase.
Table of contents
SeRanet
Idea 1 Splice
Idea 2 Fusion
SeRanet CNN model
Fusion…
The method was introduced in the Colorization paper
・“Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification”
Satoshi Iizuka, Edgar Simo-Serra and Hiroshi Ishikawa
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
The research aims to convert a monochrome input image into a colorful output image through CNN supervised learning.
https://github.com/satoshiiizuka/siggraph2016_colorization
[Figure: monochrome Input and colorized Output]
Neural network used in the Colorization paper
Upper CNN: the main CNN, used for colorization
Lower CNN: a CNN trained for image classification
In other words, a CNN with a different purpose is used to help improve the performance of the main CNN.
The lower CNN is expected to learn global features, e.g. “The image was taken outdoors”.
The research reports that colorization accuracy increased by “fusing” the global features into the main CNN.
Examples of how global features help colorization (read the paper for details)
- It reduces mistaken use of sky color at the top of the image when the picture was taken indoors.
- It reduces mistaken use of brown ground color when the picture was taken at sea.
http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
SeRanet Idea 2 Fusion
SeRanet combines (fuses) 2 types of CNN in the 1st Phase.
Purpose: combining different types of non-linear activation to obtain a wider variety of model representations
Upper CNN uses Leaky ReLU as its activation
Lower CNN uses Sigmoid as its activation
※ In future development, I want to use a Convolutional Restricted Boltzmann Machine with pre-training for the lower CNN.
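A toy sketch of the fusion idea follows. It assumes that "fusing" means concatenating the feature maps of the two branches along the channel axis, and it uses a Leaky ReLU slope of 0.2; neither detail is stated in the slides, so both are assumptions made for illustration, and the convolutions themselves are omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    """Leaky ReLU; the slope value is an assumption, not taken from SeRanet."""
    return x if x > 0 else slope * x


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def activate(feature_maps, fn):
    """Apply fn to every value of a [channel][row][col] nested list."""
    return [[[fn(v) for v in row] for row in fmap] for fmap in feature_maps]


def fuse(upper_maps, lower_maps):
    """Assumed fusion: concatenate the two branches along the channel axis."""
    return upper_maps + lower_maps


upper = activate([[[-1.0, 2.0]]], leaky_relu)  # one channel from the upper branch
lower = activate([[[0.0, 0.0]]], sigmoid)      # one channel from the lower branch
fused = fuse(upper, lower)
print(len(fused), fused)  # 2 [[[-0.2, 2.0]], [[0.5, 0.5]]]
```

Under this assumption the channel count doubles at the fusion point, so two 128-channel branches would feed a layer with 256 input channels.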
SeRanet CNN model
CNN model of seranet_v1
1st Phase (before Split; ×2 because of the two Fusion branches)
CNN                Layer1     Layer2      Layer3
In channel         3          64          64
Out channel        64         64          128
Kernel size        5          5           5
# of parameter     4864       102464      204928
# of convolution   4800×WH    102400×WH   204800×WH
2nd Phase (between Split and Splice; ×4 because of the four branches)
CNN                Layer4      Layer5      Layer6
In channel         256         512         256
Out channel        512         256         128
Kernel size        1           1           3
# of parameter     131584      131328      295040
# of convolution   131072×WH   131072×WH   294912×WH
3rd Phase (after Splice)
CNN                Layer7       Layer8       Layer9       Layer10
In channel         128          128          128          128
Out channel        128          128          128          3
Kernel size        3            3            3            3
# of parameter     147584       147584       147584       3459
# of convolution   147456×4WH   147456×4WH   147456×4WH   3456×4WH
total # of parameter: 3302531
total # of convolution: 1158880×4WH
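The SeRanet totals depend on how many copies of each phase exist and on the image size each phase runs at. The sketch below applies the counting rule parameters = in × out × kernel² + out, and it assumes the ×2 factor belongs to the 1st Phase (two fusion branches) and the ×4 factor to the 2nd Phase (four split branches). The data layout and names are hypothetical.

```python
def conv_weights(in_ch, out_ch, kernel):
    """Multiplications per output pixel for one conv layer."""
    return in_ch * out_ch * kernel ** 2


# (copies of the phase, pixels per copy in units of WH, [(in_ch, out_ch, kernel), ...])
SERANET_V1_PHASES = [
    (2, 1, [(3, 64, 5), (64, 64, 5), (64, 128, 5)]),         # 1st Phase, w x h
    (4, 1, [(256, 512, 1), (512, 256, 1), (256, 128, 3)]),   # 2nd Phase, w x h per branch
    (1, 4, [(128, 128, 3)] * 3 + [(128, 3, 3)]),             # 3rd Phase, 2w x 2h
]


def total_parameters(phases):
    """Weights plus one bias per output channel, counted once per copy."""
    return sum(
        copies * sum(conv_weights(i, o, k) + o for i, o, k in layers)
        for copies, _, layers in phases
    )


def total_multiplications_per_wh(phases):
    """Multiplications in units of WH (divide by 4 to express them in units of 4WH)."""
    return sum(
        copies * pixels * sum(conv_weights(i, o, k) for i, o, k in layers)
        for copies, pixels, layers in phases
    )


print(total_parameters(SERANET_V1_PHASES))
print(total_multiplications_per_wh(SERANET_V1_PHASES) / 4)
```

Because the 2nd Phase keeps four separate parameter sets while running at the small image size, it adds parameters four times over without adding more calculation than a single full-size network would.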
Comparison
Parameter: more than 10 times as many as waifu2x
Convolution: about 4 times as many
The number of parameters increases more than the number of convolutions (the amount of calculation).
This is because SeRanet has position-dependent parameters (LU, RU, LD, RD).
→ Question: does the increase in parameters and calculation result in better performance?
Model               SRCNN paper   waifu2x      SeRanet_v1
Total parameter     55489         287585       3302531
Total convolution   55392×4WH     287136×4WH   1158880×4WH
Table of contents
Result
Performance comparison between various resize methods
Conventional resize methods
・ Bicubic
・ Lanczos
Resize through CNN
・ waifu2x
・ SeRanet
- Flip forward/backward through the following slides to compare the results.
- The differences may be hard to see on SlideShare; see the link below for the comparison.
https://github.com/corochann/SeRanet
Result
・ Input picture
・ Bicubic (OpenCV resize method is used)
・ Lanczos (OpenCV resize method is used)
・ waifu2x (http://waifu2x.udp.jp/, Style: photo, Noise reduction: None, Upscaling: 2x)
・ SeRanet
・ Original data (ground truth data, for reference)
The differences can be seen in the fine details of the 1st picture and in the thin stalk of the 4th picture (high-frequency components).
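Performance comparison between various resize methods
* Based on personal impression (a comparison by a specific measurement has not been done yet)
Conventional resize methods
・ Bicubic
・ Lanczos
→ Almost the same as each other
Resize through CNN
・ waifu2x
・ SeRanet
→ Almost the same as each other
・ Original image
The conventional and CNN results look different from each other, and the CNN results still look different from the original image.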
Summary
SeRanet
・ A big CNN is used (depth: 9 layers, total parameters: 3302531)
・ RGB 3 channels are used for the input/output of the CNN, instead of only the Y channel
・ Split and Splice CNN
The Left-Up, Right-Up, Left-Down, and Right-Down branches use different parameters
・ Fusion
Different non-linearities are combined for flexibility of model representation
・ Convolutional RBM pre-training
The performance is not mature yet;
there is room for improvement to bring the output closer to the original image.
At last...
+ The project is open source, on GitHub
https://github.com/corochann/SeRanet
+ Improvement ideas and discussion are welcome
+ My Blog: http://corochann.com/
* If any citation is inappropriate, please let me know.
* SeRanet is a personal project, and I may have misunderstood something.
Please let me know if any information is wrong.

More Related Content

What's hot

Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
Dmytro Mishkin
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
JaeJun Yoo
 

What's hot (20)

[PR12] PR-063: Peephole predicting network performance before training
[PR12] PR-063: Peephole predicting network performance before training[PR12] PR-063: Peephole predicting network performance before training
[PR12] PR-063: Peephole predicting network performance before training
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向Deep LearningフレームワークChainerと最近の技術動向
Deep LearningフレームワークChainerと最近の技術動向
 
Super resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun YooSuper resolution in deep learning era - Jaejun Yoo
Super resolution in deep learning era - Jaejun Yoo
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)
 
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
 
AlexNet
AlexNetAlexNet
AlexNet
 
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad...
 
Batch normalization
Batch normalizationBatch normalization
Batch normalization
 
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
Vision and Multimedia Reading Group: DeCAF: a Deep Convolutional Activation F...
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
 
Devil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesDevil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet Features
 
Image super resolution
Image super resolutionImage super resolution
Image super resolution
 
Region-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object RetrievalRegion-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object Retrieval
 

Similar to SeRanet introduction

Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Sitakanta Mishra
 

Similar to SeRanet introduction (20)

Depth estimation using deep learning
Depth estimation using deep learningDepth estimation using deep learning
Depth estimation using deep learning
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Devanagari Digit and Character Recognition Using Convolutional Neural Network
Devanagari Digit and Character Recognition Using Convolutional Neural NetworkDevanagari Digit and Character Recognition Using Convolutional Neural Network
Devanagari Digit and Character Recognition Using Convolutional Neural Network
 
Anomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NETAnomaly Detection with Azure and .NET
Anomaly Detection with Azure and .NET
 
Computer Vision Landscape : Present and Future
Computer Vision Landscape : Present and FutureComputer Vision Landscape : Present and Future
Computer Vision Landscape : Present and Future
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image Processing
 
Anomaly Detection with Azure and .net
Anomaly Detection with Azure and .netAnomaly Detection with Azure and .net
Anomaly Detection with Azure and .net
 
Dssg talk CNN intro
Dssg talk CNN introDssg talk CNN intro
Dssg talk CNN intro
 
Detection of medical instruments project- PART 1
Detection of medical instruments project- PART 1Detection of medical instruments project- PART 1
Detection of medical instruments project- PART 1
 
Report face recognition : ArganRecogn
Report face recognition :  ArganRecognReport face recognition :  ArganRecogn
Report face recognition : ArganRecogn
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GAN
 
Real Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep LearningReal Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep Learning
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
 
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...
 
Comparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from ImageComparison of Various RCNN techniques for Classification of Object from Image
Comparison of Various RCNN techniques for Classification of Object from Image
 
11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptx11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptx
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
IPT.pdf
IPT.pdfIPT.pdf
IPT.pdf
 
Scene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural NetworkScene recognition using Convolutional Neural Network
Scene recognition using Convolutional Neural Network
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Recently uploaded (20)

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 

SeRanet introduction

  • 1. SeRanet Super resolution software through Deep Learning https://github.com/corochann/SeRanet
  • 2. Table of contents Introduction Machine learning Deep learning SRCNN Problem Introduction of previous works “Image Super-Resolution Using Deep Convolutional Networks” waifu2x SeRanet Sprice Fusion CNN model Result Performance Conclusion
  • 3. Table of contents Introduction Machine learning Deep learning
  • 4. What is machine learning: There are 3 major categories of machine learning.
      ・Supervised learning: A large set of input data paired with “correct/labeled” output data is given during training. Goal: train software to output the “correct/labeled” value for a given input. Ex. Image recognition (input: image data, output: recognition result (human, cat, car, etc.)); voice recognition (input: human voice data, output: text of what the human says).
      ・Unsupervised learning: Only a large amount of input data is given. Goal: categorize the data based on its statistics (find structure that already exists in the data). Ex. Categorizing types of cancer; linking users who have similar interests in a web application for recommendation.
      ・Reinforcement learning: Problem setting: an agent chooses an “action” inside a given “environment”. Choosing an action affects the environment, and the agent gets some “reward”. Goal: find the actions that maximize the reward the agent can gain. Ex. DeepMind DQN, AlphaGo; robot self-learning (how to control its own parts).
  • 5. What is machine learning (same slide as 4, with one note added): SeRanet uses supervised learning.
  • 6. Deep learning: “Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.” (cited from Wikipedia, “Deep Learning”) [Figure: deep network, labels: Input, Output]
  • 7. Table of contents SRCNN Problem Introduction of previous works “Image Super-Resolution Using Deep Convolutional Networks” waifu2x
  • 8. Super resolution task by machine learning. Problem definition: ・You are given a picture compressed to half size. Recover the original picture and output it. Training phase: the goal of the machine learning is to construct a map that converts the compressed picture into the original picture (as close as possible). [Figure: Compressed picture (half size) → Map → Original picture]
  • 9. Super resolution task by machine learning. After training: ・Input: arbitrary picture → Output: twice-size picture with super resolution. [Figure: Picture to be enlarged → map obtained by machine learning → Twice-size picture, high quality]
  • 10. Representation of the “map”: A deep Convolutional Neural Network (CNN) is used. - CNNs are the current trend for image recognition tasks.
  • 11. Previous work ① “Image Super-Resolution Using Deep Convolutional Networks” Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang https://arxiv.org/abs/1501.00092 ・The original paper that proposed “SRCNN”. It reports that superior super-resolution results are obtained using a Convolutional Neural Network. In the following slides, this paper is referred to as the “SRCNN paper”. http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
  • 12. Algorithm summary
      1. Read the picture/image file.
      2. As preprocessing, enlarge the picture to twice its size in advance (can be generalized to n times the size).
      3. Convert the RGB format into YCbCr format and extract the Y channel.
      4. Normalization: convert the value range from 0-255 to 0-1.
      5. Input the Y channel data into the CNN. As output, we obtain Y channel data with normalized values.
      6. Revert the value range to 0-255.
      7. Enlarge the CbCr channels by a conventional method such as Bicubic. Compose the obtained Y channel and the CbCr channels to get the final result.
      ※ Steps 3 and 7 can be skipped when the CNN is constructed with RGB channels as input/output.
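The steps of the algorithm summary can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the SRCNN or waifu2x code: the function names are hypothetical, the image is passed in as an array (step 1 is left out), the whole picture is enlarged by nearest neighbor before conversion (so the CbCr channels are enlarged by nearest neighbor rather than Bicubic), the YCbCr conversion uses the full-range BT.601 (JPEG) coefficients, and `cnn` is a placeholder for a trained network that returns an array of the same shape as its input.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) float RGB array (0-255) to YCbCr, full-range BT.601."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycbcr):
    """Inverse of rgb_to_ycbcr."""
    y = ycbcr[..., 0]
    cb = ycbcr[..., 1] - 128.0
    cr = ycbcr[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.stack([r, g, b], axis=-1)

def enlarge_nearest(img, scale=2):
    """Nearest-neighbor enlargement along height and width."""
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def super_resolve(rgb_uint8, cnn, scale=2):
    """Steps 2-7 of the summary; `cnn` is a hypothetical stand-in for the trained map."""
    enlarged = enlarge_nearest(rgb_uint8.astype(np.float64), scale)  # step 2
    ycbcr = rgb_to_ycbcr(enlarged)                                   # step 3
    y = ycbcr[..., 0] / 255.0                                        # step 4
    y_out = cnn(y)                                                   # step 5
    ycbcr[..., 0] = np.clip(y_out, 0.0, 1.0) * 255.0                 # step 6
    rgb = ycbcr_to_rgb(ycbcr)                                        # step 7
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)
```

Calling `super_resolve(img, cnn=lambda y: y)` with an identity placeholder simply returns the nearest-neighbor enlargement, which is a convenient way to check the color conversion round trip before plugging in a real network.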
  • 13. Remark of algorithm ① ・Only the Y channel is used as input/output data of the CNN. - Humans are more sensitive to luminance (Y) than to color difference, i.e. chrominance (CrCb). - Dealing with only the Y channel reduces the complexity of the CNN learning problem. [Figure: YCbCr decomposition into Y channel, Cr channel, Cb channel]
  • 14. Remark of algorithm ① (continued) ・Is RGB channel training difficult? The SRCNN paper examines the case where the CNN is trained on RGB 3-channel input/output data. The result suggests that the Y channel CNN performs better than the RGB CNN when the CNN is not very large.
  • 15. Remark of algorithm ② ・Enlarge the picture/image data in advance, before inputting it to the CNN. The SRCNN paper uses the Bicubic method; waifu2x (explained later) uses the Nearest neighbor method to enlarge the picture before inputting the data to the CNN. [Reason] When a convolutional neural network is implemented with a machine learning library, the input and output picture sizes are almost the same. (※Strictly speaking, the output picture gets smaller by filter size − 1 at each convolution layer.) [Figure: input → enlarge picture and extract Y channel → CNN → compose with the enlarged Cr・Cb channels (the CNN is not used for them) → output]
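The “filter size − 1” remark can be made concrete with a small calculation. The sketch below assumes convolutions with stride 1 and no padding; the function name is hypothetical, and the kernel sizes in the usage note are the ones listed in the CNN model tables of this deck.

```python
def valid_conv_output_size(size, kernel_sizes):
    """Side length left after stacked stride-1, unpadded convolutions."""
    for k in kernel_sizes:
        size -= k - 1  # each layer trims (kernel size - 1) pixels per dimension
    return size
```

For a 100-pixel side, the three SRCNN kernels (9, 5, 5) leave `valid_conv_output_size(100, [9, 5, 5])` = 84 pixels, and the seven 3 × 3 layers of waifu2x leave `valid_conv_output_size(100, [3] * 7)` = 86 pixels.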
  • 16. Previous work ① CNN model: the SRCNN paper’s CNN model, one example (many other parameters are tested in the paper). A relatively shallow CNN architecture with a big kernel size.
                          Layer1       Layer2       Layer3
      In channel          1            32           64
      Out channel         32           64           1
      Kernel size         9            5            5
      # of parameter      2628         51264        1664
      # of convolution    2592×4WH     51200×4WH    1600×4WH
      total # of parameter: 55556
      total # of convolution: 55392×4WH
      ※ # of parameter = In channel × Out channel × (kernel size)^2 + Out channel
      ※ # of convolution per pixel = In channel × Out channel × (kernel size)^2
      Suppose the picture size before enlargement is w × h pixels; then the picture size after enlargement is 2w × 2h, so convolution is needed for 4wh pixels in total.
  • 17. Previous work ② waifu2x https://github.com/nagadomi/waifu2x The term “waifu” comes from the Japanese pronunciation of “wife” (Japanese fans use the term for their favorite female anime character). Open source software, originally published to enlarge art-style images; it now also supports photo-style images. You can test the application on the server: http://waifu2x.udp.jp/
  • 18. Previous work ② waifu2x: waifu2x is open source software, which lets other software engineers develop related software. Many derivative programs have been published. [Related links (in Japanese)] ・List of waifu2x and its derivative software (waifu2xとその派生ソフト一覧) http://kourindrug.sakura.ne.jp/waifu2x.html ・The waifu2x algorithm that even a monkey can understand (サルでも分かる waifu2x のアルゴリズム) https://drive.google.com/file/d/0B22mWPiNr-6-RVVpaGhZa1hJTnM/view
  • 19. Previous work ② waifu2x CNN model: the convolution kernel size is kept small at 3, and in exchange a deeper neural network is constructed. A deep CNN architecture with a small kernel size.
                          Layer1     Layer2      Layer3       Layer4       Layer5       Layer6        Layer7
      In channel          1          32          32           64           64           128           128
      Out channel         32         32          64           64           128          128           1
      Kernel size         3          3           3            3            3            3             3
      # of parameter      320        9248        18496        36928        73856        147584        1153
      # of convolution    288×4WH    9216×4WH    18432×4WH    36864×4WH    73728×4WH    147456×4WH    1152×4WH
      total # of parameter: 287585
      total # of convolution: 287136×4WH
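The waifu2x totals can be reproduced from the channel counts and kernel size alone. The sketch below assumes the counting rule used in this deck: # of parameter = In channel × Out channel × (kernel size)^2 + Out channel, where the last term is the bias, and # of convolution per pixel is the same product without the bias. The function and variable names are hypothetical.

```python
def count_parameters(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return in_ch * out_ch * kernel ** 2 + out_ch

def count_convolutions(in_ch, out_ch, kernel):
    """Multiply-accumulate operations per output pixel."""
    return in_ch * out_ch * kernel ** 2

# (in channel, out channel, kernel size) for the 7 waifu2x layers
WAIFU2X_LAYERS = [
    (1, 32, 3), (32, 32, 3), (32, 64, 3), (64, 64, 3),
    (64, 128, 3), (128, 128, 3), (128, 1, 3),
]

total_parameters = sum(count_parameters(*layer) for layer in WAIFU2X_LAYERS)
total_convolutions = sum(count_convolutions(*layer) for layer in WAIFU2X_LAYERS)
print(total_parameters, total_convolutions)  # 287585 287136
```

The convolution total is per output pixel, so it is multiplied by 4WH for an enlarged picture of size 2W × 2H.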
  • 20. What is special about the SRCNN task: the big differences from the image recognition task.
      1. Position sensitivity is required.
         + Image recognition task: translation invariance is welcome, and max pooling or striding is often used.
         + SRCNN task: translation variance is necessary, since super resolution requires position-dependent output.
      2. The feature map image size does not shrink during CNN processing. → As the number of feature maps increases, the amount of calculation increases, so speed/memory restrictions are severe. Required memory for the CNN ≒ the volume of the rectangular boxes in the CNN model figure. In image recognition tasks, the feature map image size usually gets smaller in deeper layers of the CNN, so the number of feature maps can be larger there.
  • 21. Table of contents: The explanation of the SeRanet project starts from here, introducing the ideas behind SeRanet. SeRanet: Idea 1 Splice, Idea 2 Fusion, SeRanet CNN model
  • 22. SeRanet Idea 1 Splice: Input the picture at its pre-scaled size w × h and get a 2w × 2h picture as output → introduce the “Split” and “Splice” concepts. Split: 4 branches of neural network (NN) with size w × h are created. Splice: the 4 branches of neural network are merged to obtain one 2w × 2h image. [Figure: input (w × h) → Split → LU, RU, LD, RD branches → Splice → output (2w × 2h)]
  • 23. SeRanet Idea 1 Splice: After the split, the 4 branches of the neural network correspond to the Left-Up (LU), Right-Up (RU), Left-Down (LD) and Right-Down (RD) pixels of the enlarged picture. At the splice phase, the 4 CNN branches are combined/spliced to get the twice-size image. [Figure: the pixels (1, 1), (1, 2), (2, 1), (2, 2) of the input image pass through Split into the LU, RU, LD, RD branches and are interleaved by Splice into the output image]
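The splice step can be illustrated with NumPy slicing. This is a minimal sketch of the interleaving idea, not the SeRanet implementation: the function name `splice` is hypothetical, and each branch is assumed to output a single-channel array of shape (h, w).

```python
import numpy as np

def splice(lu, ru, ld, rd):
    """Interleave four (h, w) branch outputs into one (2h, 2w) image."""
    h, w = lu.shape
    out = np.empty((2 * h, 2 * w), dtype=lu.dtype)
    out[0::2, 0::2] = lu  # Left-Up pixel of every 2 x 2 block
    out[0::2, 1::2] = ru  # Right-Up
    out[1::2, 0::2] = ld  # Left-Down
    out[1::2, 1::2] = rd  # Right-Down
    return out
```

Each input position (i, j) therefore becomes the 2 × 2 block at rows 2i, 2i + 1 and columns 2j, 2j + 1 of the output, with one value supplied by each branch.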
  • 24. The effect of introducing Splice → flexibility of neural network modelling. [Figure: Input w × h → 1st Phase → Split → 2nd Phase → Splice → 3rd Phase → Output 2w × 2h]
      1st Phase: the image size is w × h (before enlargement), so the amount of calculation is 1/4 of that in the 3rd phase. → More feature maps and a larger kernel size are acceptable in this phase.
      2nd Phase: 4 CNN branches with image size w × h. The total amount of calculation is the same as in the 3rd phase, but the parameters learned in each branch (LU, RU, LD, RD) can be different. → The representation potential of the model grows. Another advantage is that memory consumption is smaller than in the 3rd phase because of the image size.
      3rd Phase: image size 2w × 2h. The last phase, which produces the output. Memory consumption and the amount of calculation are 4 times larger than in the 1st phase.
  • 25. Table of contents SeRanet Idea 1 Splice Idea 2 Fusion SeRanet CNN model
  • 26. Fusion: this method was introduced in the colorization paper ・“Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification” Satoshi Iizuka, Edgar Simo-Serra and Hiroshi Ishikawa http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/ The research aims to convert a monochrome input image into a colored output image through supervised learning of a CNN. https://github.com/satoshiiizuka/siggraph2016_colorization [Figure: Input, Output]
  • 27. Neural network used in the colorization paper. Upper CNN: the main CNN used for colorization. Lower CNN: a CNN trained for image classification. So a CNN with a different purpose is used to help improve the performance of the main CNN. The lower CNN is expected to learn global features, e.g. “the image was taken outside”. The research reports that colorization accuracy increases by “fusing” the global features into the main CNN. Examples of how global features help colorization (read the paper for details): - They reduce the mistake of using sky color at the top of the image when the picture was taken indoors. - They reduce the mistake of using brown ground color when the picture was taken at sea. http://hi.cs.waseda.ac.jp/~iizuka/projects/colorization/en/
  • 28. SeRanet Idea 2 Fusion: SeRanet combines (fuses) 2 types of CNN in the 1st Phase. Purpose: combining different types of non-linear activation to get a wider variety of model representation. The upper CNN uses Leaky ReLU for activation; the lower CNN uses Sigmoid for activation. ※In future development, I want to use a Convolutional Restricted Boltzmann Machine with pre-training for the lower CNN.
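One way to picture the fusion of the two activation types is a channel-wise concatenation of the two branch outputs. The sketch below is an assumption made for illustration, not code from the SeRanet repository: the names `leaky_relu`, `sigmoid` and `fuse`, the Leaky ReLU slope of 0.2, and the use of concatenation along the channel axis are all hypothetical choices. Arrays are laid out as (channels, height, width).

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """Leaky ReLU; the slope value is an assumed example."""
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(upper_pre, lower_pre):
    """Activate each branch with its own non-linearity, then stack the channels."""
    upper = leaky_relu(upper_pre)  # upper CNN: Leaky ReLU
    lower = sigmoid(lower_pre)     # lower CNN: Sigmoid
    return np.concatenate([upper, lower], axis=0)
```

With two branches of 128 feature maps each, the fused result has 256 channels, so the following layer sees both kinds of non-linearity side by side.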
  • 29. SeRanet CNN model: CNN model of seranet_v1
      1st Phase (Fusion: these layers exist in 2 CNNs, ×2)
                          Layer1       Layer2        Layer3
      In channel          3            64            64
      Out channel         64           64            128
      Kernel size         5            5             5
      # of parameter      4864         102464        204928
      # of convolution    4800×WH      102400×WH     204800×WH
      2nd Phase (after Split: 4 branches, ×4)
                          Layer4       Layer5        Layer6
      In channel          256          512           256
      Out channel         512          256           128
      Kernel size         1            1             3
      # of parameter      131584       131584        295040
      # of convolution    131072×WH    131072×WH     294912×WH
      3rd Phase (after Splice)
                          Layer7        Layer8        Layer9        Layer10
      In channel          128           128           128           128
      Out channel         128           128           128           3
      Kernel size         3             3             3             3
      # of parameter      147584        147584        147584        3584
      # of convolution    147456×4WH    147456×4WH    147456×4WH    3456×4WH
      total # of parameter: 3303680
      total # of convolution: 1159150×4WH
  • 30. Comparison
      Model                SRCNN paper    waifu2x       SeRanet_v1
      Total parameter      55556          287585        3303680
      Total convolution    55392×4WH      287136×4WH    1159150×4WH
      Parameters: about 10 times more than waifu2x. Convolutions: about 4 times. The number of parameters increases more than the number of convolutions (amount of calculation), because SeRanet has position-dependent parameters (LU, RU, LD, RD). → Question: does the increase in parameters and calculation result in better performance?
  • 31. Table of contents Result Performance: comparison between various resize methods. Conventional resize methods: ・Bicubic ・Lanczos. Resize through CNN: ・waifu2x ・SeRanet. - Flip forward/backward through the slides to compare. - The difference may be hard to see on SlideShare; see the link for comparison: https://github.com/corochann/SeRanet
  • 33. Result Bicubic (OpenCV resize method is used)
  • 34. Result Lanczos (OpenCV resize method is used)
  • 35. Result waifu2x (http://waifu2x.udp.jp/, Style:photo, Noise reduction: None, Upscaling: 2x)
  • 37. Result Original data (ground truth data, for reference)
  • 38. Result: Original data (ground truth data, for reference). The difference can be found in the fine details of the 1st picture and the thin stalk in the 4th picture (high-frequency content).
  • 39. Performance: comparison between various resize methods. *Based on personal impression (comparison by a specific measurement is not done yet). Conventional resize methods: ・Bicubic ・Lanczos (almost the same as each other). Resize through CNN: ・waifu2x ・SeRanet (almost the same as each other). Different: the conventional methods vs. the CNN methods, and the CNN methods vs. the original image.
  • 40. Summary SeRanet ・A big CNN is used (depth 9 layers, total parameters 3303680) ・RGB 3 channels are used for the input/output of the CNN instead of only the Y channel ・Split, Splice: the Left-Up, Right-Up, Left-Down and Right-Down CNN branches use different parameters ・Fusion: different non-linearities are combined for flexibility of model representation ・Convolutional RBM pretraining. The performance is not mature yet; it may be improved further to bring the output closer to the original image.
  • 41. Finally: + The project is open source, on GitHub: https://github.com/corochann/SeRanet + Improvement ideas and discussion are welcome + My blog: http://corochann.com/ * If any citation is inappropriate, please let me know. * SeRanet is a personal project and I may have misunderstood something. Please let me know if any information is wrong.