Machine Learning Methods for CAPTCHA Recognition
       Rachel Shadoan
       Zachery Tidwell, II
CAPTCHA
Completely Automated Public Turing Test to tell Computers and Humans Apart


Why are they interesting?
  o Harder than normal text recognition
         On par with handwriting recognition,
         reading damaged text
  o Techniques translate well to other problems
         Facial recognition (Gonzaga, 2002)
         Weed identification (Yang, 2000)
  o Near infinite data sets
         Easier to avoid over-fitting
Hypothesis

CAPTCHA recognition can be
 accomplished to a high degree
 of accuracy using machine
 learning methods with minimal
 preprocessing of inputs.
Methods
Tools
  o JCaptcha
  o Image Processing

Learning Methods
  o Feed-forward Neural Nets
  o Self-Organizing Maps
  o K-Means
  o Cluster Classification

Segmentation Methods
  o Overlapping
  o Whitespace
  o K-Means
JCaptcha

o Open-source CAPTCHA
  generation software
o Highly configurable
   Can produce CAPTCHAs of
   many levels of difficulty

o Check it out at:
  http://jcaptcha.sourceforge.net
Image Processing
Sparse Image
  Represents images as an unbounded set of pixels
  Each pixel is a value between 0 and 1 and a
    coordinate pair
  Center each image before turning it into a matrix of
    0s and 1s

[Figure: original image and image after transformation]
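
The sparse-image step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the dict-based pixel representation, the 0.5 ink threshold, and the 20 x 20 output size (chosen to match the 400 network inputs used later) are all assumptions.

```python
def to_centered_matrix(pixels, size=20, threshold=0.5):
    """Turn a sparse image into a centered size x size matrix of 0s and 1s.

    pixels: dict mapping (x, y) -> value in [0, 1]; coordinates are unbounded.
    Assumption: a value at or above `threshold` counts as ink (1).
    """
    ink = [(x, y) for (x, y), v in pixels.items() if v >= threshold]
    matrix = [[0] * size for _ in range(size)]
    if not ink:
        return matrix
    xs = [x for x, _ in ink]
    ys = [y for _, y in ink]
    # Shift so the bounding box of the ink sits in the middle of the grid.
    dx = (size - (max(xs) - min(xs) + 1)) // 2 - min(xs)
    dy = (size - (max(ys) - min(ys) + 1)) // 2 - min(ys)
    for x, y in ink:
        cx, cy = x + dx, y + dy
        if 0 <= cx < size and 0 <= cy < size:  # drop ink that falls off the grid
            matrix[cy][cx] = 1
    return matrix


# Two ink pixels far from the origin end up in the middle of the grid.
example = to_centered_matrix({(100, 50): 1.0, (101, 50): 0.9, (100, 51): 0.2})
```

Flattening the returned matrix row by row gives a 400-element vector of 0s and 1s.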
Feed-Forward Neural Nets




      As covered in class
Self-Organizing Maps
Training
    Initialize N buckets to random values
    For each input
       Find the bucket that is “closest” to the input
       Adjust the “closest” bucket to more closely
       match the input using an exponential average
Collection
    For many inputs
       Sort each input into the bucket it most
       closely matches
    For each bucket and each character
       Calculate the probability of that
       character going into that bucket.
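
The training and collection steps above can be sketched roughly as below. The function names, the squared Euclidean distance as the measure of "closest", the smoothing factor `alpha`, and the reading of "probability" as the fraction of a bucket's inputs that carry each character are assumptions made for illustration. As on the slide, only the winning bucket is adjusted.

```python
import random
from collections import Counter, defaultdict


def closest(buckets, x):
    """Index of the bucket with the smallest squared Euclidean distance to x."""
    return min(range(len(buckets)),
               key=lambda i: sum((b - v) ** 2 for b, v in zip(buckets[i], x)))


def train(inputs, n_buckets, alpha=0.1, seed=0):
    """Initialize buckets randomly, then pull the closest bucket toward each input."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    buckets = [[rng.random() for _ in range(dim)] for _ in range(n_buckets)]
    for x in inputs:
        i = closest(buckets, x)
        # Exponential average: move the winning bucket a fraction alpha toward x.
        buckets[i] = [(1 - alpha) * b + alpha * v for b, v in zip(buckets[i], x)]
    return buckets


def collect(buckets, labeled_inputs):
    """Return {bucket index: {character: probability}} from (vector, character) pairs."""
    counts = defaultdict(Counter)
    for x, char in labeled_inputs:
        counts[closest(buckets, x)][char] += 1
    return {i: {c: n / sum(ctr.values()) for c, n in ctr.items()}
            for i, ctr in counts.items()}
```

A new chunk can then be classified by finding its closest bucket and looking up that bucket's character probabilities.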
K-Means
• Very similar to Self-Organizing Maps (SOMs)
• Can use the same classifying mechanism
  as used for SOM
Overlapping Segmentation
• Divide image into
  fixed number of
  overlapping tiles of
  the same size
• In our case, 20 x 20
  pixels with a 50%
  overlap
• Discard chunks under a certain size
  and chunks that are all white

Note: This is a B with part of it cut off, not
an E. Therein lies the rub.
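
A rough sketch of the tiling described above, assuming the image is a list of rows of 0s and 1s with 1 meaning ink. The function name is hypothetical, and "chunks under a certain size" is interpreted here as truncated tiles at the right or bottom edge; the slides do not say which size criterion was used.

```python
def overlapping_tiles(image, tile=20, overlap=0.5):
    """Cut an image into tile x tile chunks that overlap by the given fraction.

    image: list of rows of 0/1 (1 = ink). Returns a list of (x, y, rows).
    """
    step = max(1, int(tile * (1 - overlap)))  # 10 pixels for 20 x 20 at 50%
    height, width = len(image), len(image[0])
    tiles = []
    for y in range(0, height, step):
        for x in range(0, width, step):
            rows = [row[x:x + tile] for row in image[y:y + tile]]
            if len(rows) < tile or len(rows[0]) < tile:
                continue  # undersized chunk at the right or bottom edge
            if not any(any(r) for r in rows):
                continue  # all white
            tiles.append((x, y, rows))
    return tiles
```

Because neighboring tiles share half their pixels, a single letter usually appears whole in at least one tile, but it also appears cut off in others, which is the B-versus-E problem noted above.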
Whitespace Segmentation
• Iterate through the
  image from left to
  right and segment
  when a full column
  of whitespace is
  encountered
• Works perfectly for
  well-spaced text
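
The left-to-right scan above can be sketched as follows, assuming the image is a list of rows of 0s and 1s with 1 meaning ink. The function name and the (start, end) return format are illustrative choices.

```python
def whitespace_segments(image):
    """Split an image at fully white columns.

    image: list of rows of 0/1 (1 = ink).
    Returns [(start_col, end_col_exclusive)] for each run of inked columns.
    """
    width = len(image[0])
    segments, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in image)
        if has_ink and start is None:
            start = x  # a new segment begins at the first inked column
        elif not has_ink and start is not None:
            segments.append((start, x))  # a white column closes the segment
            start = None
    if start is not None:
        segments.append((start, width))  # segment running to the right edge
    return segments
```

When two letters touch or overlap there is no white column between them, so they come back as one segment.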
K-Means Segmentation
• Performs better
  than heuristic
  segmentation on
  closely-packed
  inputs
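
The slides do not say which features were clustered, so the sketch below is only one plausible reading: it runs one-dimensional K-means on the x-coordinates of ink pixels, with k set to the expected number of letters. The function name, the evenly spaced initial centers, and the fixed iteration count are assumptions.

```python
def kmeans_segment(image, k, iterations=20):
    """Group ink pixels into k clusters by x-coordinate.

    image: list of rows of 0/1 (1 = ink).
    Returns a list of k lists of (x, y) pixels, ordered left to right.
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    groups = [[] for _ in range(k)]
    if not points:
        return groups
    lo = min(p[0] for p in points)
    hi = max(p[0] for p in points)
    # Assumption: start with centers spread evenly across the inked width.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p[0] - centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean x of its pixels; keep it if the group is empty.
        centers = [sum(p[0] for p in g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return groups
```

Unlike the whitespace scan, this always returns k groups, even when letters touch.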
Segmentation Comparison
[Figure: two example CAPTCHAs, each segmented three ways: Even-width, Whitespace, K-Means]
Experiment 1
Machine Learning Method:
  Self-Organizing Map
Topology
  200 buckets, initialized randomly
Inputs:
  3 letter CAPTCHAs
  Random fonts
  Letters A-G
  “Chunked” using overlapping segmentation
Experiment 1 Results
Buckets fell into three primary categories:

  Distinguishable
  letters


  Chunks with halves
  of two letters

  Indistinguishable
  noise
Experiment 1 Results
Experiment 2
ML Method:
  Neural Net
Topology:
  Fully connected
  400 inputs
  50 node hidden layer
  7 outputs
[Diagram: 400 nodes, 50 nodes, 7 nodes. Each output answers "Contains … ?" for one letter, A through G, as 0 or 1]
Inputs:
  Single letter CAPTCHAs
  Random fonts
  Letters A-G
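
A minimal forward-pass sketch of the 400-50-7 topology above, using only the standard library. The sigmoid activation, the weight initialization range, and all function names are assumptions; the network here is untrained, and training (for example by backpropagation) is omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Weights and biases for one fully connected layer, small random values."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, inputs):
    """Sigmoid of the weighted sum for every node in the layer."""
    weights, biases = layer
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
            for ws, b in zip(weights, biases)]


def predict(hidden, output, pixels, letters="ABCDEFG"):
    """Map 400 pixel values to one 'contains this letter?' score per letter."""
    scores = forward(output, forward(hidden, pixels))
    return dict(zip(letters, scores))


rng = random.Random(0)
hidden = make_layer(400, 50, rng)   # 400 inputs -> 50 hidden nodes
output = make_layer(50, 7, rng)     # 50 hidden nodes -> 7 outputs (A-G)
scores = predict(hidden, output, [0] * 400)
```

Each score lies between 0 and 1 and can be thresholded to get the 0-or-1 answer per letter shown in the diagram.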
Experiment 2 Results
[Figure: Neural Net Learning Curve]
Experiment 2 Results
Past a certain number of nodes in the hidden layer, the topology
ceases to have a huge impact on accuracy.
[Figure: Neural Net Accuracy vs. Size of Hidden Layer]
Experiment 3
ML Method:
 SOM
Topology:
 500 buckets
ML Method:
 Neural Net
Topology:
 Fully connected
 400 inputs
 1000 node hidden layer
 7 outputs
Inputs:
      4 letter CAPTCHAs
      Random fonts
      Letters A-G
Experiment 3
[Figure: Neural Net vs. SOM on CAPTCHAs Length 4, Letters A-G]
Experiment 4
ML Method:
 SOM
Topology:
 500 buckets
ML Method:
 Neural Net
Topology:
 Fully connected
 400 inputs
 1000 node hidden layer
 7 outputs
Inputs:
      4 letter CAPTCHAs
      Random fonts
      Letters A-Z
Experiment 4
[Figure: Neural Net vs. SOM on CAPTCHAs Length 4, Letters A-Z]
Experiment 5
ML Method:
 SOM
Topology:
 500 buckets
ML Method:
 Neural Net
Topology:
 Fully connected
 400 inputs
 1000 node hidden layer
 7 outputs
Inputs:
      5 letter CAPTCHAs
      Random fonts
      Letters A-Z
Experiment 5
[Figure: Neural Net vs. SOM on CAPTCHAs Length 5, Letters A-Z]
What it all means
• Increasing number of characters
  dramatically decreases total accuracy
  because segmentation quality decreases
• True positive rate goes down when
  segmentation quality decreases
• Hence, better segmentation is the key
Future Work
Improved Segmentation
   o Wirescreen segmentation
   o Ensemble techniques
Improved True Positive Rates with Current
  System
   o Ensemble techniques
New problems
   o Handwriting recognition
   o Bot net of doom
Questions?
