Introduction to computer vision
with Convolutional Neural Networks
Marcin Jedyk
Disclaimer
• This is not an in-depth session on neural networks and ML – see the link at the end
• Due to the number of concepts, it's a rather fast-paced presentation and doesn't cover every topic within these subjects
• I hope this gives you some insight so you can pick topics of interest and explore them in your free time
What is it all about?
• Problems Computer Vision tries to address
• Neural Networks (in 60 seconds)
• Basic building blocks of CNNs
• History and evolution of CNNs
• How and why do CNNs work on digital images?
• Limitations of CNNs
• Object detection
Computer vision
• Make computers understand and make sense of digital images (photos, videos and data from sensors)
• Object classification
• Object detection
• Video tracking
• Object segmentation
Computer vision - applications
• Medical image classification
• Land surveying
• Surveillance systems – threat detection
• Autonomous vehicles
• Military systems – e.g. guided missiles
Computer vision - applications
Neural Networks in 60 seconds
• Collection of connected units
• Compositions of functions
• Building blocks:
– Input layer
– Hidden layers*
– Output layer
– Weights
– Activation functions**
– Biases
* 0 or more (seriously)
Neural Networks in 60 (more) seconds
How do they learn?
• Get some data you are going to use for training:
– X (input), Y (ground truth for a given X)
– Split it into training/validation, or training/validation/test sets
• Feed “train X” into the network
• Compare the result with the ground truth and adjust the weights through back-propagation until the network is optimized
• Periodically evaluate on the validation data
• After training is complete, do a final check on the test data
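A minimal sketch of the data-splitting step, assuming the whole dataset fits in NumPy arrays. The function name `split_dataset` and the 70/15/15 ratios are hypothetical choices for illustration, not something prescribed in the talk.

```python
import numpy as np

def split_dataset(X, Y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle (X, Y) pairs together, then cut them into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))              # one permutation keeps X and Y aligned
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return ((X[train_idx], Y[train_idx]),
            (X[val_idx], Y[val_idx]),
            (X[test_idx], Y[test_idx]))

# Usage: 100 dummy samples with 784 features each and labels 0-9
X = np.random.default_rng(1).random((100, 784))
Y = np.arange(100) % 10
train, val, test = split_dataset(X, Y)
print(len(train[0]), len(val[0]), len(test[0]))  # 70 15 15
```

The test set is only touched once, after training; the validation set is what you check periodically during training.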
Training: train vs. test error
Digit recognition with NN
• Digit recognition – a common test for ML algorithms (NNs, SVMs, etc.)
• MNIST – a database of handwritten digits
– Grayscale, 28x28 pixels
Preparing network layout
Preparing network layout
• MNIST – reshape each 28x28 image into 1x784 (the input size)
• 10 digits to classify == 10 output classes of the NN
• E.g. reshaping 3x3 into 1x9
Sample NN for MNIST
https://www.tensorflow.org/get_started/mnist/beginners
[Figure: example network output, one probability per class: 0.01, 0.02, 0.02, 0.06, 0.07, 0.1, 0.08, 0.12, 0.13, 0.39]
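The layout on these slides (a 28x28 image flattened to 784 inputs, connected directly to 10 output classes, with one probability per class as in the example output) can be sketched in NumPy. The weights below are random placeholders rather than trained values, so the probabilities themselves mean nothing; the point is the shapes and the softmax normalization. The helper name `softmax` is our own.

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1 (shifted for numerical stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((28, 28))               # stand-in for one grayscale MNIST digit
x = image.reshape(784)                     # flatten 28x28 -> 784 input values
W = rng.normal(0, 0.01, size=(784, 10))    # one weight per (pixel, class) pair: 7840 weights
b = np.zeros(10)                           # one bias per class
probs = softmax(x @ W + b)                 # 10 probabilities, one per digit class
print(probs.shape, round(probs.sum(), 6))  # (10,) 1.0

# The same reshaping idea on a 3x3 toy "image"
print(np.arange(9).reshape(3, 3).reshape(1, 9))  # [[0 1 2 3 4 5 6 7 8]]
```

The predicted digit is simply the class with the highest probability, i.e. `np.argmax(probs)`.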
Problems with FC NN for images?
• The number of connections (weights) grows very quickly, making the network expensive in both memory and computation
• E.g. adding 1 hidden layer with 392 units to the MNIST NN increases the number of weights from 7,840 to 311,248 (784x392 + 392x10, biases excluded)
• What if we operate on larger images with more classes, e.g. 50x50 with 20 classes and a larger hidden layer? 2.5k inputs, a 1.25k-unit hidden layer and 20 output classes give 3.15M weights
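Counting the weights of a fully connected network is just a sum of products of adjacent layer sizes. The short sketch below excludes biases; the function name `dense_weight_count` is hypothetical. Layer by layer, the MNIST net with one 392-unit hidden layer has 784×392 + 392×10 = 311,248 weights.

```python
def dense_weight_count(layer_sizes):
    """Weights (excluding biases) in a fully connected net with the given layer sizes."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

print(dense_weight_count([784, 10]))         # 7840
print(dense_weight_count([784, 392, 10]))    # 311248
print(dense_weight_count([2500, 1250, 20]))  # 3150000
```

Almost all of the cost sits in the first product (input size × hidden size). This is why the count explodes as images get larger.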
Problems with FC NN for images?
“Images are highly spatially correlated thus looking at a pixel
at time is wasteful.”
What do CNNs do differently?
• Networks designed specifically for images*
• They look at regions of images
• Input images have a 3D shape [WxHxD]
• They use convolutions to extract spatial features (e.g. edges, blobs of color, shapes)
CNNs: the genesis
• First LeNet developed in 1988
• LeNet5 – a pioneering CNN used back in 1994 for digit classification – MNIST error rate of 0.95%*
• Neural network designed specifically for digit classification
• State of the art at the time, widely used by banks, etc.
• What's so special? The way it looks at its input
LeNet5
• How are feature maps produced?
– By sliding filters over the input and convolving them with it.
– E.g. a 5x5 filter over a 28x28 digit with padding of 2 and stride 1
Convolution? WTF
• Multiply corresponding elements of the input and the flipped kernel, then add them up
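Here is a minimal NumPy sketch of that multiply-and-add, assuming a single-channel input, no padding and stride 1 (a "valid" convolution). The function name `conv2d` is our own, not a library API.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution: flip the kernel, slide it over the image, multiply and sum."""
    flipped = np.flip(kernel)             # flips both axes, i.e. rotates the kernel 180 degrees
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]      # region of the input under the kernel
            out[i, j] = np.sum(patch * flipped)    # element-wise multiply, then add up
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
kernel = np.array([[1, 0],
                   [0, -1]])
print(conv2d(image, kernel))  # every entry is 4 for this input
```

Most deep learning libraries actually compute cross-correlation (the same sliding multiply-and-add, but without flipping) and still call it convolution. Because the kernel values are learned, the flip makes no practical difference.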
CNN – building blocks
• CNN hyperparameters:
– K: number of filters
• The input is passed through the filters to produce ‘feature maps’ – more filters can learn more distinct properties of images
• Also known as kernels or weights
– F: spatial extent of a filter
• The portion of the input the filter looks at, e.g. a 2x2 or 3x3 patch of the input
– S: stride
• How far the filter slides over the input at each step; smaller strides capture more detail of an image
– P: zero padding
• Adds ‘0’s around the edge of the image – allows the spatial size to be preserved after CONV; also helps capture features at the edges of images
• ReLU – activation function following the Conv layer; it introduces non-linearity (there are activation functions other than ReLU)
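The hyperparameters combine in a standard way. A conv layer with input width W, filter size F, stride S and padding P produces outputs of width (W − F + 2P)/S + 1. With K filters over an input of depth D, it has K·(F·F·D + 1) parameters, counting one bias per filter. The sketch below uses hypothetical helper names and floor division, which assumes the hyperparameters are chosen so that the division is exact.

```python
import numpy as np

def conv_output_size(W, F, S=1, P=0):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    return (W - F + 2 * P) // S + 1

def conv_params(F, D, K):
    """Trainable parameters of a conv layer: K filters of size F x F x D, plus one bias each."""
    return K * (F * F * D + 1)

def relu(x):
    """ReLU activation: keeps positive values, sets negatives to zero."""
    return np.maximum(0, x)

print(conv_output_size(28, F=5, S=1, P=2))  # 28 -> padding of 2 preserves the 28x28 size
print(conv_output_size(28, F=5, S=1, P=0))  # 24 -> without padding the map shrinks
print(conv_params(F=5, D=1, K=6))           # 156 -> six 5x5 filters on a 1-channel image
print(relu(np.array([-2.0, 0.0, 3.0])))     # [0. 0. 3.]
```

Parameter count does not depend on image size, because every filter is shared across all positions. Compare this with the fully connected case above.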
What do trained kernels look like?
CNNs: other ops
• Max pooling: dimension reduction
Applying a 2x2 max-pool filter to a 4x4 matrix:
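A minimal NumPy sketch of 2x2 max pooling. It assumes a stride of 2 (non-overlapping windows) and even input dimensions, which is the common setup but is not stated on the slide. The function name `max_pool_2x2` and the example matrix are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the largest value of each non-overlapping 2x2 block."""
    h, w = x.shape
    # reshape to (rows of blocks, 2, cols of blocks, 2), then take the max inside each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

m = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool_2x2(m))
# [[6 8]
#  [3 4]]
```

The 4x4 input shrinks to 2x2, which quarters the number of values passed to the next layer. Max pooling has no weights to learn.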
Advancements in CNNs?
• First LeNet developed in 1988; LeNet5 in 1994 (paper published in 1998)
• After that, little progress was made until 2010, when Dan Claudiu Ciresan and Jurgen Schmidhuber published one of the first implementations of NNs on a GPU (GTX 280)
• NNs are generally quite expensive to compute, and running them on GPUs makes it possible to train more complex models
How do we compare software solutions in the image recognition space?
• ILSVRC – ImageNet Large Scale Visual Recognition Challenge
• Teams build solutions to classify objects in digital images
• Around 1.2M images to train on; 1000 classes
• Scores are based on top-5 and top-1 classification error rates
Top-5 prediction example
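Top-k error counts a prediction as correct if the true label is among the k highest-scoring classes. Top-1 is the usual "was the best guess right" error. The sketch below uses NumPy and a hypothetical function name `top_k_error`, with made-up scores for two images and four classes.

```python
import numpy as np

def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    top_k = np.argsort(scores, axis=1)[:, -k:]           # indices of the k largest scores per row
    hits = np.any(top_k == labels[:, None], axis=1)      # is the true label among them?
    return 1.0 - hits.mean()

scores = np.array([[0.1, 0.5, 0.3, 0.1],    # image 0: best guess is class 1
                   [0.7, 0.05, 0.15, 0.1]]) # image 1: best guess is class 0
labels = np.array([2, 0])                   # true classes
print(top_k_error(scores, labels, k=1))  # 0.5 -> image 0 is wrong on its first guess
print(top_k_error(scores, labels, k=2))  # 0.0 -> class 2 is image 0's second guess
```

With 1000 ImageNet classes, top-5 is forgiving of genuinely ambiguous images. This is why it was the headline ILSVRC metric.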
Advancements in CNNs?
• ILSVRC 2012 – AlexNet, a CNN, wins the competition with a top-5 error of 15.3% (runner-up: 26.2%)
– It has 650k neurons and 60M parameters
– Trained on 2x GTX 580 for 5 to 6 days
Advancements in CNNs?
• (2013) OverFeat wins the ILSVRC 2013 localization task; top-5 classification error of 13.6% on ImageNet
– Uses much smaller 3x3 kernels
– Is deeper than the previous network
Advancements in CNNs?
• (2014) VGG: top-5 error of 7.1% (2nd place)
– Also learns bounding boxes, i.e. object locations in the image
– Much deeper than previous networks; 140M parameters
We need to go deeper
Advancements in CNNs?
• (2014) GoogLeNet: top-5 error of 6.67%
– 22 layers! But about 12x fewer parameters than AlexNet
– Introduces the ‘inception’ module: applies filters of different sizes in parallel to capture features at different scales
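A heavily simplified sketch of the inception idea. Filters of several sizes run in parallel over the same input, each with "same" zero padding so the outputs line up, and the results are stacked along the depth axis. Real inception modules also use 1x1 reduction layers and a pooling branch, work on multi-channel inputs, and learn their kernels. Here the input is single-channel, the kernels are fixed, and the names `same_conv` and `inception_like` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def same_conv(image, kernel):
    """Zero-padded sliding multiply-and-sum (cross-correlation) keeping the input's size.
    Assumes a square kernel with an odd size."""
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(image, p)                        # zero padding on every side
    windows = sliding_window_view(padded, (k, k))    # shape (H, W, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def inception_like(image, kernels):
    """Apply every kernel to the same input and stack the feature maps along depth."""
    return np.stack([same_conv(image, k) for k in kernels], axis=-1)

image = np.ones((6, 6))
kernels = [np.ones((1, 1)), np.ones((3, 3)), np.ones((5, 5))]  # 1x1, 3x3 and 5x5 branches
out = inception_like(image, kernels)
print(out.shape)  # (6, 6, 3): one feature map per filter size
```

Each branch sees a different-sized neighbourhood of every pixel. The next layer therefore receives small-scale and larger-scale features side by side.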
Advancements in CNNs?
Advancements in CNNs?
• (2015) ResNet: top-5 error of 3.57%
– How many layers? 152!
– Introduces shortcut (residual) connections, which let a block's input skip past its layers; this makes very deep networks trainable
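A minimal sketch of a shortcut connection: the block outputs relu(F(x) + x), where F is the residual branch. For brevity F uses two fully connected weight matrices. Actual ResNet blocks use convolutions and batch normalization, so treat this as an illustration of the shortcut only. The names `residual_block`, `W1` and `W2` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, W1, W2):
    """Simplified residual block: relu(F(x) + x), with F made of two weight layers."""
    h = relu(x @ W1)     # first layer of the residual branch F(x)
    f = h @ W2           # second layer, no activation before the addition
    return relu(f + x)   # shortcut: the block's input is added back unchanged

x = np.array([[1.0, 2.0, 3.0]])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))  # [[1. 2. 3.]] -> with F = 0 the block passes x through
```

One intuition from the ResNet paper is visible here: if extra layers are not useful, the branch can shrink toward zero and the block approximates an identity mapping. Adding depth then does not have to make the network worse.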
Training deep CNNs
• Large nets can take weeks of training on multiple high-end GPUs to learn on ImageNet
• The more data the better. How can we expand the training set?
– Randomly flip left-right, top-bottom
– Randomly crop
– Introduce noise
– Modify colors
– Rotate
– All of the above at the same time
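The augmentations listed above can be combined into one random transform that is applied afresh each time an image is used. In this sketch, rotation is limited to multiples of 90 degrees. Arbitrary angles need interpolation, for example with `scipy.ndimage.rotate`. The function name `augment`, the crop size, the ±20% color scaling and the noise level are all hypothetical choices.

```python
import numpy as np

def augment(image, rng, crop=24, noise_std=0.05):
    """Randomly flip, rotate, crop, recolor and add noise to an (H, W, C) image in [0, 1]."""
    if rng.random() < 0.5:
        image = image[:, ::-1]                               # flip left-right
    if rng.random() < 0.5:
        image = image[::-1, :]                               # flip top-bottom
    image = np.rot90(image, k=int(rng.integers(0, 4)))       # rotate by 0/90/180/270 degrees
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    image = image[top:top + crop, left:left + crop]          # random crop
    image = image * rng.uniform(0.8, 1.2, size=(1, 1, image.shape[2]))  # per-channel color change
    image = image + rng.normal(0.0, noise_std, size=image.shape)        # Gaussian noise
    return np.clip(image, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))        # stand-in for a 32x32 RGB training image
out = augment(img, rng)
print(out.shape)  # (24, 24, 3)
```

Because each call draws new random choices, every epoch sees slightly different versions of the same images. The labels stay unchanged.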
Accuracy vs. performance
• OK, so models are getting bigger (more ops, more weights) and more accurate. How about getting faster?
• MobileNet: a recent network by Google researchers that reduces the number of connections (using depthwise separable convolutions) and outperforms other networks of comparable size (ref at the end)
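MobileNets cut connections mainly by replacing standard convolutions with depthwise separable ones. First, a single F x F filter is applied per input channel. Then a 1x1 "pointwise" convolution mixes the channels. The weight counts below exclude biases, and the layer sizes (3x3, 32 to 64 channels) and function names are hypothetical examples.

```python
def standard_conv_weights(F, c_in, c_out):
    """Weights in a standard F x F convolution from c_in to c_out channels."""
    return F * F * c_in * c_out

def depthwise_separable_weights(F, c_in, c_out):
    """One F x F filter per input channel (depthwise), then a 1x1 conv to c_out channels (pointwise)."""
    return F * F * c_in + c_in * c_out

print(standard_conv_weights(3, 32, 64))        # 18432
print(depthwise_separable_weights(3, 32, 64))  # 2336
```

The reduction factor is roughly 1/c_out + 1/F². For 3x3 kernels this means close to 8 to 9 times fewer weights, and a similar saving in multiply-adds.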
Training deep CNNs
• That’s a lot of hassle to train a net; there must be another way, right?
• Use pre-trained networks; fine-tune the last few FC layers.
• With pre-trained nets you reuse a snapshot of the kernels (weights) from the conv layers.
• Fine-tuning works because the conv layers (closer to the input) have learned reusable patterns (edges, colors, textures, etc.) that apply across many computer vision categories
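A toy NumPy sketch of the idea: keep the feature-extracting weights frozen and train only a new softmax layer on top of them. The "pre-trained" weights here are random stand-ins, and the data, learning rate and names (`fine_tune_head`, `frozen_W`) are hypothetical. With a real pre-trained CNN, the frozen part would be the conv layers.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fine_tune_head(X, y, frozen_W, n_classes, lr=0.05, steps=200):
    """Train only a new softmax layer on top of frozen feature weights (gradient descent)."""
    feats = np.maximum(0.0, X @ frozen_W)  # frozen feature extractor: frozen_W is never updated
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(steps):
        p = softmax(feats @ W + b)
        losses.append(-np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1)))  # cross-entropy
        grad = (p - onehot) / len(X)       # gradient of the mean loss w.r.t. the logits
        W -= lr * feats.T @ grad           # only the new head's weights change
        b -= lr * grad.sum(axis=0)
    return W, b, losses

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = (X[:, 0] > 0).astype(int)                        # toy 2-class labels
frozen_W = rng.normal(size=(20, 32)) / np.sqrt(20)   # stand-in for pre-trained weights
W, b, losses = fine_tune_head(X, y, frozen_W, n_classes=2)
print(losses[0] > losses[-1])  # True: the loss drops while frozen_W stays untouched
```

Training only the head is cheap because the expensive features are computed once. In practice you may later unfreeze a few of the top layers too.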
How about object detection?
There are multiple approaches
• R-CNN (R for Region-based), Fast R-CNN, Faster R-CNN
– Basically, they have two outputs – a “regression head” (bounding box) and a “classification head” (object class)
• YOLO (You Only Look Once), YOLO v2 (YOLO9000)
– Applies a single CNN to the full image, divides it into small regions and predicts bounding boxes and object class probabilities for each region; predictions are then filtered by a confidence threshold
https://www.youtube.com/watch?v=VOC3huqHrss
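Detectors such as YOLO produce many overlapping candidate boxes, so their output is usually post-processed. The first step drops boxes below a confidence threshold. Non-maximum suppression (NMS) then removes boxes that overlap a higher-scoring box too much, with overlap measured by intersection over union (IoU). The plain-Python sketch below shows that common post-processing. The thresholds, box format `(x1, y1, x2, y2)` and function names are hypothetical, and the sketch ignores class labels.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_detections(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Keep confident boxes, then greedily suppress boxes overlapping a better one (NMS)."""
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)   # best-scoring boxes first
    keep = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(filter_detections(boxes, scores))  # [0, 2]
```

Box 3 falls below the confidence threshold, and box 1 overlaps box 0 with an IoU of 0.81, so only boxes 0 and 2 survive.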
• OverFeat: https://arxiv.org/pdf/1312.6229.pdf
• VGG in a large-scale image recognition setting: https://arxiv.org/pdf/1409.1556.pdf
• CNN benchmarks: https://github.com/jcjohnson/cnn-benchmarks
• MobileNets: https://arxiv.org/pdf/1704.04861.pdf
• Convolution explained: http://www.songho.ca/dsp/convolution/convolution2d_example.html
• Pre-trained nets (scores differ from those achieved in the competition due to variations in the training process): https://github.com/tensorflow/models/tree/master/slim
• Good ML course: https://www.coursera.org/learn/machine-learning
Questions?

Speaker notes

  1. ----- Meeting Notes (28/06/17 21:53) ----- How the challenges of computer vision can be addressed with convolutional neural networks
  2. Vision – being able to see. Video tracking – how long are you waiting in a queue? Which aisles are you visiting in a shop? Apple vs. orange; where is Wally? Video tracking with YOLO. Threat detection: toy gun vs. real.
  3. Before we move on to ML and NNs, let’s touch on computer vision: what sort of problems is it trying to address?
  4. Activation functions – break the linearity of NNs, allowing them to learn more complex functions tha
  5. This network tries to learn different aspects of input images. How? It applies random filters to extract interesting features – the kernels (weights) convolve over the input and produce feature maps