SlideShare ist ein Scribd-Unternehmen logo
1 von 23
Downloaden Sie, um offline zu lesen
Handwritten Recognition using Deep Learning
with R
Poo Kuan Hoong
August 17, 2016
1
Google DeepMind Alphago
2
Introduction
In the past 10 years, machine learning and Artificial Intelligence (AI) have shown
tremendous progress
The recent success can be attributed to:
Explosion of data
Cheap computing cost - CPUs and GPUs
Improvement of machine learning models
Much of the current excitement concerns a subfield of it called “deep learning”.
3
Human Brain
4
Neural Networks
Deep Learning is primarily about neural networks, where a network is an
interconnected web of nodes and edges.
Neural nets were designed to perform complex tasks, such as the task of placing
objects into categories based on a few attributes.
Neural nets are highly structured networks, and have three kinds of layers - an input,
an output, and so called hidden layers, which refer to any layers between the input and
the output layers.
Each node (also called a neuron) in the hidden and output layers has a classifier.
5
Neural Network Layers
6
Neural Network: Forward Propagation
The input neurons first receive the data features of the object. After processing the
data, they send their output to the first hidden layer.
The hidden layer processes this output and sends the results to the next hidden layer.
This continues until the data reaches the final output layer, where the output value
determines the object’s classification.
This entire process is known as Forward Propagation, or Forward prop.
7
Neural Network: Backward Propagation
To train a neural network over a large set of labelled data, you must continuously
compute the difference between the network’s predicted output and the actual output.
This difference is called the cost, and the process for training a net is known as
backpropagation, or backprop
During backprop, weights and biases are tweaked slightly until the lowest possible cost is
achieved.
An important aspect of this process is the gradient, which is a measure of how much
the cost changes with respect to a change in a weight or bias value.
8
The 1990s view of what was wrong with
back-propagation
It required a lot of labelled training data
Almost all data is unlabeled
The learning time did not scale well
It was very slow in networks with multiple hidden layers.
It got stuck at local optima
These were often surprisingly good but there was no good theory
9
Deep Learning
Deep learning refers to artificial neural networks that are composed of many layers.
It’s a growing trend in Machine Learning due to some favorable results in applications
where the target function is very complex and the datasets are large.
10
Deep Learning: Benefits
Robust
No need to design the features ahead of time - features are automatically learned to be optimal for
the task at hand
Robustness to natural variations in the data is automatically learned
Generalizable
The same neural net approach can be used for many different applications and data types
Scalable
Performance improves with more data, method is massively parallelizable
11
Deep Learning: Weaknesses
Deep Learning requires a large dataset, hence long training period.
In term of cost, Machine Learning methods like SVMs and other tree ensembles are
very easily deployed even by relative machine learning novices and can usually get you
reasonably good results.
Deep learning methods tend to learn everything. It’s better to encode prior
knowledge about structure of images (or audio or text).
The learned features are often difficult to understand. Many vision features are also
not really human-understandable (e.g, concatenations/combinations of different
features).
Requires a good understanding of how to model multiple modalities with
traditional tools.
12
Deep Learning: Applications
13
H2O Library
H2O is an open source, distributed, Java machine learning library
Ease of Use via Web Interface
R, Python, Scala, Spark & Hadoop Interfaces
Distributed Algorithms Scale to Big Data
Package can be downloaded from http://www.h2o.ai/download/h2o/r
14
H2O R Package on CRAN
15
H2O booklets
H2O reference booklets can be downwloaded from https://github.com/h2oai/h2o-3
/tree/master/h2o-docs/src/booklets/v2_2015/PDFs/online
16
MNIST Handwritten Dataset
The MNIST database consists of handwritten digits.
The training set has 60,000 examples, and the test set has 10,000 examples.
The MNIST database is a subset of a larger set available from NIST. The digits have
been size-normalized and centered in a fixed-size image
For this demo, the Kaggle pre-processed training and testing dataset were used. The
training dataset, (train.csv), has 42000 rows and 785 columns.
17
Demo
The sourcecode can be accessed from here
https://github.com/kuanhoong/myRUG_DeepLearning
18
Create training and testing datasets
19
Start H2O Cluster from R and load data into
H2O
20
Deep Learning in R: Train & Test
21
Result
22
Lastly…
23

Weitere ähnliche Inhalte

Was ist angesagt?

Amoeba Operating System
Amoeba Operating SystemAmoeba Operating System
Amoeba Operating System
Burhan Abbasi
 

Was ist angesagt? (20)

cloud computing basics
cloud computing basicscloud computing basics
cloud computing basics
 
A LABORATORY MANUAL FOR DATABASE MANAGEMENT SYSTEM (DMS) 22319
A LABORATORY MANUAL FOR DATABASE MANAGEMENT SYSTEM (DMS) 22319A LABORATORY MANUAL FOR DATABASE MANAGEMENT SYSTEM (DMS) 22319
A LABORATORY MANUAL FOR DATABASE MANAGEMENT SYSTEM (DMS) 22319
 
Green cloud computing
Green cloud computingGreen cloud computing
Green cloud computing
 
Grid protocol architecture
Grid protocol architectureGrid protocol architecture
Grid protocol architecture
 
IaaS and PaaS
IaaS and PaaSIaaS and PaaS
IaaS and PaaS
 
Virtualization Uses - Server Consolidation
Virtualization Uses - Server Consolidation Virtualization Uses - Server Consolidation
Virtualization Uses - Server Consolidation
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Introduction to Virtualization
Introduction to VirtualizationIntroduction to Virtualization
Introduction to Virtualization
 
Top 5 IoT Use Cases
Top 5 IoT Use CasesTop 5 IoT Use Cases
Top 5 IoT Use Cases
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
IBM Cloud Introduction
IBM Cloud IntroductionIBM Cloud Introduction
IBM Cloud Introduction
 
Cloud Reference Model
Cloud Reference ModelCloud Reference Model
Cloud Reference Model
 
Cs6703 grid and cloud computing unit 1
Cs6703 grid and cloud computing unit 1Cs6703 grid and cloud computing unit 1
Cs6703 grid and cloud computing unit 1
 
Amoeba Operating System
Amoeba Operating SystemAmoeba Operating System
Amoeba Operating System
 
Cloud computing risks
Cloud computing risksCloud computing risks
Cloud computing risks
 
Huawei's AI Strategy and Full-Stack Portfolio Launch
Huawei's AI Strategy and Full-Stack Portfolio LaunchHuawei's AI Strategy and Full-Stack Portfolio Launch
Huawei's AI Strategy and Full-Stack Portfolio Launch
 
Green cloud computing
Green  cloud computingGreen  cloud computing
Green cloud computing
 
Deep learning seminar report
Deep learning seminar reportDeep learning seminar report
Deep learning seminar report
 
Fundamental of cloud computing
Fundamental of cloud computingFundamental of cloud computing
Fundamental of cloud computing
 
Unit 3
Unit   3Unit   3
Unit 3
 

Andere mochten auch

Handwritten Digit recognition with R. Classification Problem
Handwritten Digit recognition with R. Classification ProblemHandwritten Digit recognition with R. Classification Problem
Handwritten Digit recognition with R. Classification Problem
Guillermo Santos
 

Andere mochten auch (18)

An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Handwritten Digit recognition with R. Classification Problem
Handwritten Digit recognition with R. Classification ProblemHandwritten Digit recognition with R. Classification Problem
Handwritten Digit recognition with R. Classification Problem
 
Hsa10 whitepaper
Hsa10 whitepaperHsa10 whitepaper
Hsa10 whitepaper
 
HSA Overview
HSA Overview HSA Overview
HSA Overview
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012
 
Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime Deeper Look Into HSAIL And It's Runtime
Deeper Look Into HSAIL And It's Runtime
 
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIB...
 
HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013 HSA Queuing Hot Chips 2013
HSA Queuing Hot Chips 2013
 
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.” AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
AFDS 2011 Phil Rogers Keynote: “The Programmer’s Guide to the APU Galaxy.”
 
HSA HSAIL Introduction Hot Chips 2013
HSA HSAIL Introduction  Hot Chips 2013 HSA HSAIL Introduction  Hot Chips 2013
HSA HSAIL Introduction Hot Chips 2013
 
HSA Introduction Hot Chips 2013
HSA Introduction  Hot Chips 2013HSA Introduction  Hot Chips 2013
HSA Introduction Hot Chips 2013
 
HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013HSA Memory Model Hot Chips 2013
HSA Memory Model Hot Chips 2013
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
 
Deep Learning Survey
Deep Learning SurveyDeep Learning Survey
Deep Learning Survey
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
 
HSA Foundation Overview
HSA Foundation OverviewHSA Foundation Overview
HSA Foundation Overview
 
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPUKeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
KeynoteTHE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU
 
Neural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting RecognitionNeural Networks in the Wild: Handwriting Recognition
Neural Networks in the Wild: Handwriting Recognition
 

Ähnlich wie Handwritten Recognition using Deep Learning with R

Seed block algorithm
Seed block algorithmSeed block algorithm
Seed block algorithm
Dipak Badhe
 
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
APJ ABDUL KALAM TECHNICAL UNIVERSITY
 

Ähnlich wie Handwritten Recognition using Deep Learning with R (20)

Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with R
 
Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R Open
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Week3-Deep Neural Network (DNN).pptx
Week3-Deep Neural Network (DNN).pptxWeek3-Deep Neural Network (DNN).pptx
Week3-Deep Neural Network (DNN).pptx
 
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningNeural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
 
Performance Comparison between Pytorch and Mindspore
Performance Comparison between Pytorch and MindsporePerformance Comparison between Pytorch and Mindspore
Performance Comparison between Pytorch and Mindspore
 
Deep Neural Networks (DNN)
Deep Neural Networks (DNN)Deep Neural Networks (DNN)
Deep Neural Networks (DNN)
 
Deep learning health care
Deep learning health care  Deep learning health care
Deep learning health care
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
Deep Learning Demystified
Deep Learning DemystifiedDeep Learning Demystified
Deep Learning Demystified
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Seed block algorithm
Seed block algorithmSeed block algorithm
Seed block algorithm
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
 
Hadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep LearningHadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep Learning
 
Deep Learning on Hadoop
Deep Learning on HadoopDeep Learning on Hadoop
Deep Learning on Hadoop
 
Deep Learning libraries and first experiments with Theano
Deep Learning libraries and first experiments with TheanoDeep Learning libraries and first experiments with Theano
Deep Learning libraries and first experiments with Theano
 
Large Scale Distributed Deep Networks
Large Scale Distributed Deep NetworksLarge Scale Distributed Deep Networks
Large Scale Distributed Deep Networks
 
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
Industrial training (Artificial Intelligence, Machine Learning & Deep Learnin...
 

Mehr von Poo Kuan Hoong

Mehr von Poo Kuan Hoong (20)

Build an efficient Machine Learning model with LightGBM
Build an efficient Machine Learning model with LightGBMBuild an efficient Machine Learning model with LightGBM
Build an efficient Machine Learning model with LightGBM
 
Tensor flow 2.0 what's new
Tensor flow 2.0  what's newTensor flow 2.0  what's new
Tensor flow 2.0 what's new
 
The future outlook and the path to be Data Scientist
The future outlook and the path to be Data ScientistThe future outlook and the path to be Data Scientist
The future outlook and the path to be Data Scientist
 
Data Driven Organization and Data Commercialization
Data Driven Organization and Data CommercializationData Driven Organization and Data Commercialization
Data Driven Organization and Data Commercialization
 
TensorFlow and Keras: An Overview
TensorFlow and Keras: An OverviewTensorFlow and Keras: An Overview
TensorFlow and Keras: An Overview
 
Explore and Have Fun with TensorFlow: Transfer Learning
Explore and Have Fun with TensorFlow: Transfer LearningExplore and Have Fun with TensorFlow: Transfer Learning
Explore and Have Fun with TensorFlow: Transfer Learning
 
Deep Learning with R
Deep Learning with RDeep Learning with R
Deep Learning with R
 
Explore and have fun with TensorFlow: An introductory to TensorFlow
Explore and have fun with TensorFlow: An introductory	to TensorFlowExplore and have fun with TensorFlow: An introductory	to TensorFlow
Explore and have fun with TensorFlow: An introductory to TensorFlow
 
The path to be a Data Scientist
The path to be a Data ScientistThe path to be a Data Scientist
The path to be a Data Scientist
 
Microsoft APAC Machine Learning & Data Science Community Bootcamp
Microsoft APAC Machine Learning & Data Science Community BootcampMicrosoft APAC Machine Learning & Data Science Community Bootcamp
Microsoft APAC Machine Learning & Data Science Community Bootcamp
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
The path to be a data scientist
The path to be a data scientistThe path to be a data scientist
The path to be a data scientist
 
Big Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningBig Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep Learning
 
Machine learning and big data
Machine learning and big dataMachine learning and big data
Machine learning and big data
 
Context Aware Road Traffic Speech Information System from Social Media
Context Aware Road Traffic Speech Information System from Social MediaContext Aware Road Traffic Speech Information System from Social Media
Context Aware Road Traffic Speech Information System from Social Media
 
Virtual Interaction Using Myo And Google Cardboard (slides)
Virtual Interaction Using Myo And Google Cardboard (slides)Virtual Interaction Using Myo And Google Cardboard (slides)
Virtual Interaction Using Myo And Google Cardboard (slides)
 
A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis
A Comparative Study of HITS vs PageRank Algorithms for Twitter Users AnalysisA Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis
A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis
 
Towards Auto-Extracting Car Park Structures: Image Processing Approach on Low...
Towards Auto-Extracting Car Park Structures: Image Processing Approach on Low...Towards Auto-Extracting Car Park Structures: Image Processing Approach on Low...
Towards Auto-Extracting Car Park Structures: Image Processing Approach on Low...
 
Discovery of Twitter User Interestingness Based on Retweets, Reply Mentions a...
Discovery of Twitter User Interestingness Based on Retweets, Reply Mentions a...Discovery of Twitter User Interestingness Based on Retweets, Reply Mentions a...
Discovery of Twitter User Interestingness Based on Retweets, Reply Mentions a...
 
A Comparison of People Counting Techniques via Video Scene Analysis
A Comparison of People Counting Techniques viaVideo Scene AnalysisA Comparison of People Counting Techniques viaVideo Scene Analysis
A Comparison of People Counting Techniques via Video Scene Analysis
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Handwritten Recognition using Deep Learning with R

  • 1. Handwritten Recognition using Deep Learning with R Poo Kuan Hoong August 17, 2016 1
  • 3. Introduction In the past 10 years, machine learning and Artificial Intelligence (AI) have shown tremendous progress The recent success can be attributed to: Explosion of data Cheap computing cost - CPUs and GPUs Improvement of machine learning models Much of the current excitement concerns a subfield of it called “deep learning”. 3
  • 5. Neural Networks Deep Learning is primarily about neural networks, where a network is an interconnected web of nodes and edges. Neural nets were designed to perform complex tasks, such as the task of placing objects into categories based on a few attributes. Neural nets are highly structured networks, and have three kinds of layers - an input, an output, and so called hidden layers, which refer to any layers between the input and the output layers. Each node (also called a neuron) in the hidden and output layers has a classifier. 5
  • 7. Neural Network: Forward Propagation The input neurons first receive the data features of the object. After processing the data, they send their output to the first hidden layer. The hidden layer processes this output and sends the results to the next hidden layer. This continues until the data reaches the final output layer, where the output value determines the object’s classification. This entire process is known as Forward Propagation, or Forward prop. 7
  • 8. Neural Network: Backward Propagation To train a neural network over a large set of labelled data, you must continuously compute the difference between the network’s predicted output and the actual output. This difference is called the cost, and the process for training a net is known as backpropagation, or backprop During backprop, weights and biases are tweaked slightly until the lowest possible cost is achieved. An important aspect of this process is the gradient, which is a measure of how much the cost changes with respect to a change in a weight or bias value. 8
  • 9. The 1990s view of what was wrong with back-propagation It required a lot of labelled training data Almost all data is unlabeled The learning time did not scale well It was very slow in networks with multiple hidden layers. It got stuck at local optima These were often surprisingly good but there was no good theory 9
  • 10. Deep Learning Deep learning refers to artificial neural networks that are composed of many layers. It’s a growing trend in Machine Learning due to some favorable results in applications where the target function is very complex and the datasets are large. 10
  • 11. Deep Learning: Benefits Robust No need to design the features ahead of time - features are automatically learned to be optimal for the task at hand Robustness to natural variations in the data is automatically learned Generalizable The same neural net approach can be used for many different applications and data types Scalable Performance improves with more data, method is massively parallelizable 11
  • 12. Deep Learning: Weaknesses Deep Learning requires a large dataset, hence long training period. In term of cost, Machine Learning methods like SVMs and other tree ensembles are very easily deployed even by relative machine learning novices and can usually get you reasonably good results. Deep learning methods tend to learn everything. It’s better to encode prior knowledge about structure of images (or audio or text). The learned features are often difficult to understand. Many vision features are also not really human-understandable (e.g, concatenations/combinations of different features). Requires a good understanding of how to model multiple modalities with traditional tools. 12
  • 14. H2O Library H2O is an open source, distributed, Java machine learning library Ease of Use via Web Interface R, Python, Scala, Spark & Hadoop Interfaces Distributed Algorithms Scale to Big Data Package can be downloaded from http://www.h2o.ai/download/h2o/r 14
  • 15. H2O R Package on CRAN 15
  • 16. H2O booklets H2O reference booklets can be downwloaded from https://github.com/h2oai/h2o-3 /tree/master/h2o-docs/src/booklets/v2_2015/PDFs/online 16
  • 17. MNIST Handwritten Dataset The MNIST database consists of handwritten digits. The training set has 60,000 examples, and the test set has 10,000 examples. The MNIST database is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image For this demo, the Kaggle pre-processed training and testing dataset were used. The training dataset, (train.csv), has 42000 rows and 785 columns. 17
  • 18. Demo The sourcecode can be accessed from here https://github.com/kuanhoong/myRUG_DeepLearning 18
  • 19. Create training and testing datasets 19
  • 20. Start H2O Cluster from R and load data into H2O 20
  • 21. Deep Learning in R: Train & Test 21