Clear Lines Consulting · clear-lines.com 5/20/2013 · 1
F# Coding Dojo
A gentle introduction to Machine Learning with F#
The goal tonight
» Take a Kaggle data science contest
» Write some code and have fun
» Write a classifier, from scratch, using F#
» Learn some Machine Learning concepts
» Stretch goal: send results to Kaggle
What you may need to know
Kaggle Digit Recognizer contest
» Full description on Kaggle.com
» Dataset: hand-written digits (0, 1, … , 9)
» Goal = automatically recognize digits
» Training sample = 50,000 examples
» Contest: predict 20,000 “unknown” digits
The data “looks like that”
[Image: a hand-written digit, labeled 1]
Real data
» 28 x 28 pixels
» Grayscale: each pixel 0 (white) to 255 (black)
» Flattened: one record = Number + 784 Pixels
» CSV file
Illustration (simplified data)
Actual Number | Pixels (real: 784 fields, from 0 to 255)
1             | 0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0
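A record in this flattened layout could be read along the following lines. This is only a sketch: the type name `Example`, its fields `Label` and `Pixels`, and the function `parseLine` are assumptions made for illustration, and it assumes any header row in the CSV file has already been skipped.

```fsharp
// Hypothetical record type: one labeled image.
type Example = { Label: int; Pixels: int[] }

// Parse one CSV line of the form "number,pixel1,pixel2,...".
let parseLine (line: string) : Example =
    let values = line.Split(',') |> Array.map int
    { Label = values.[0]; Pixels = values.[1..] }

// The simplified record from the illustration: 1 number + 16 pixels (4 x 4).
let example = parseLine "1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0"

printfn "Number: %d, pixels: %d" example.Label example.Pixels.Length
// prints "Number: 1, pixels: 16"
```

With the real data, the same function would yield 784 pixels per record instead of 16.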
What’s a Classifier?
» “Give me an unknown data point, and I will predict what class it belongs to”
» In this case, classes = 0, 1, 2, … , 9
» Unknown data point = scanned digit, without the class it belongs to
The KNN Classifier
» KNN = K-Nearest-Neighbors algorithm
» Given an unknown subject to classify,
» Look up all the known examples,
» Find the K closest examples,
» Take a majority vote,
» Predict what the majority says
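The steps above can be sketched in a few lines of F#. This is a minimal illustration, not the guided script: the names `Example`, `distance` and `classify` are assumptions, the distance used is the sum of squared pixel differences, and ties in the vote are resolved arbitrarily.

```fsharp
// Hypothetical record type: one labeled image.
type Example = { Label: int; Pixels: int[] }

// Sum of squared pixel differences between two images of equal length.
let distance (a: int[]) (b: int[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0 a b

// Look up all known examples, keep the k closest, and return the majority label.
let classify (k: int) (examples: Example[]) (unknown: int[]) =
    examples
    |> Array.sortBy (fun e -> distance e.Pixels unknown)
    |> Array.truncate k
    |> Array.countBy (fun e -> e.Label)
    |> Array.maxBy snd
    |> fst

// Tiny made-up 2-pixel "images", just to exercise the function.
let examples =
    [| { Label = 0; Pixels = [| 0; 0 |] }
       { Label = 0; Pixels = [| 10; 0 |] }
       { Label = 1; Pixels = [| 255; 255 |] }
       { Label = 1; Pixels = [| 250; 255 |] }
       { Label = 1; Pixels = [| 255; 240 |] } |]

printfn "%d" (classify 3 examples [| 245; 250 |])  // prints 1
printfn "%d" (classify 3 examples [| 5; 5 |])      // prints 0
```

Sorting every example is the simplest approach but not the fastest; it is used here only because it mirrors the bullet points one to one.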
Illustration: 1 nearest neighbor
Suppose we have just 2 examples in the sample, and want to predict the class of Unknown
[Figure: Sample = two images, labeled 1 and 0; Unknown = one image, labeled ?]
Which item from the sample is nearest / closest to the Unknown item we want to predict?
What does “close” mean?
» To define “close” we need a distance
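» We can use the Euclidean distance between images as a measure for “close”: distance(A, B) = sqrt((a1 - b1)^2 + … + (an - bn)^2), where a1 … an and b1 … bn are the pixels of images A and B
» Other distances can be used as well
» Note: the square root is not important here, because dropping it does not change which example is closest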
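As a rough sketch of such a distance, the two functions below compute the sum of squared pixel differences with and without the square root. The function names and the three small arrays are made up for illustration; the last lines check that both versions pick the same closest image, which is why the square root can be dropped.

```fsharp
// Sum of squared pixel differences (no square root).
let squaredDistance (a: int[]) (b: int[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

// Euclidean distance: the same sum, with the square root applied.
let euclideanDistance (a: int[]) (b: int[]) =
    sqrt (float (squaredDistance a b))

// Made-up 4-pixel images.
let unknown = [| 0; 255; 0; 255 |]
let near    = [| 0; 255; 0; 0 |]     // differs from unknown in 1 pixel
let far     = [| 255; 0; 255; 0 |]   // differs from unknown in 4 pixels

let candidates = [| near; far |]
let closestSquared   = candidates |> Array.minBy (squaredDistance unknown)
let closestEuclidean = candidates |> Array.minBy (euclideanDistance unknown)

// Both distances agree on which candidate is closest.
let sameNearest = (closestSquared = closestEuclidean)
printfn "%b" sameNearest  // prints true
```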
Illustration: 1 nearest neighbor
Let’s compute the distance between Unknown and our two examples…
[Figure: Sample (images labeled 1 and 0) next to Unknown (?), with differing pixels marked X: 1 for the example labeled 1, 8 for the example labeled 0]
Illustration: 1 nearest neighbor
[Figure: Sample (images labeled 1 and 0) and Unknown (?)]
Example labeled 1: sqrt((255-0)^2) = 255, so Distance = 255
Example labeled 0: sqrt((255-0)^2 + (255-0)^2 + (0-255)^2 + Etc…) = 721, so Distance = 721
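The two distances can be checked with a little arithmetic. The sketch below assumes that the example labeled 1 differs from Unknown in one pixel and the example labeled 0 in eight pixels, each by the full 0-to-255 range; that assumption is consistent with the distances of 255 and 721 shown on the slide. The name `distanceFor` is made up for illustration.

```fsharp
// Difference between a fully white (0) and a fully black (255) pixel.
let pixelDifference = 255.0

// Euclidean distance when `differingPixels` pixels differ by the full range
// and all other pixels are identical.
let distanceFor (differingPixels: int) =
    sqrt (float differingPixels * pixelDifference * pixelDifference)

printfn "%.0f" (distanceFor 1)  // prints 255
printfn "%.0f" (distanceFor 8)  // prints 721
```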
Illustration: 1 nearest neighbor
[Figure: Sample (images labeled 1 and 0) and Unknown (?)]
The first example is closest to our Unknown candidate: we predict that Unknown has the same Number, 1
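The same prediction can be reproduced on made-up data. In this sketch, the 4 x 4 pixel arrays are hypothetical stand-ins for the images on the slide (they are not the actual slide data), and the names `Example`, `squaredDistance` and `predict` are assumptions.

```fsharp
// Hypothetical record type: one labeled image.
type Example = { Label: int; Pixels: int[] }

// Sum of squared pixel differences; the square root is skipped because it
// does not change which example is closest.
let squaredDistance (a: int[]) (b: int[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0 a b

// 1-nearest-neighbor: return the Number of the single closest example.
let predict (sample: Example[]) (unknown: int[]) =
    let nearest = sample |> Array.minBy (fun e -> squaredDistance e.Pixels unknown)
    nearest.Label

// Made-up 4 x 4 images, flattened row by row.
let sample =
    [| { Label = 1; Pixels = [| 0;0;255;0;  0;0;255;0;  0;0;255;0;  0;0;255;0 |] }
       { Label = 0; Pixels = [| 0;255;255;0;  255;0;0;255;  255;0;0;255;  0;255;255;0 |] } |]

let unknown = [| 0;0;255;0;  0;255;255;0;  0;0;255;0;  0;0;255;0 |]

printfn "Predicted: %d" (predict sample unknown)  // prints "Predicted: 1"
```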
Questions?
Let’s start coding!
» Code 1-nearest-neighbor classifier
» “Guided script” available at:
» Bit.ly/FSharp-ML-Dojo
» https://gist.github.com/mathias-brandewinder/5558573
