Performance evaluation of GANs in a semisupervised OCR use case

Even in the age of big data, labeled data is a scarce resource in many machine learning use cases. Florian Wilhelm evaluates generative adversarial networks (GANs) when used to extract information from vehicle registrations under a varying amount of labeled data, compares the performance with supervised learning techniques, and demonstrates a significant improvement when using unlabeled data.

Published in: Data & Analytics

  1. Performance Evaluation of GANs in a Semi-supervised OCR Use Case. Florian Wilhelm, London, 2018-10-11
  2. Dr. Florian Wilhelm, Principal Data Scientist @ inovex (@FlorianWilhelm · FlorianWilhelm · florianwilhelm.info). Special Interests: • Mathematical Modelling • Recommendation Systems • Data Science in Production • Python Data Stack • Maintainer of PyScaffold. Florian Tanten, Master Thesis @ inovex, October 2017 - May 2018
  3. IT project house for digital transformation: ‣ Agile Development & Management ‣ Web · UI/UX · Replatforming · Microservices ‣ Mobile · Apps · Smart Devices · Robotics ‣ Big Data & Business Intelligence Platforms ‣ Data Science · Data Products · Search · Deep Learning ‣ Data Center Automation · DevOps · Cloud · Hosting ‣ Trainings & Coachings. Using technology to inspire our clients. And ourselves. inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de
  4. Agenda: 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results
  5. Vehicle Identification Number (VIN): a unique identifier, like a fingerprint of a vehicle. Parts labeled on the slide: serial number, country, security code, model year, assembly plant, details, flexible fuel vehicles, manufacturer. Source: https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
  6. Use Case: Spotting the vehicle identification number (VIN) in images of vehicle registration documents. VIN: WF0DXXGAKDEJ37385 → VIN-Decoder → Information about the car: Manufacturer: BMW, Model: X3, Year: 2013-03-21, Engine power: 143 PS, Equipment: Xenon Lights, ...
  7. OCR Libraries: PyOCR, commercial software, open source tools
  8. OCR with Tesseract: "VSSZZZGJZHR03G533" ???
  9. Agenda: 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results
  10. Methodology in Text Spotting (Spotting = Detection + Recognition). Character detection & extraction, grouped as Sliding Window, Computer Vision Tools, and Others: connected components, stroke width transform, edge detection, SVM, learning with HOG, CNN, region proposal, hypotheses CNN pooling. Character recognition: character or word CNN, CNN + RNN, SVM, nearest neighbor. Legend: high performers in current studies. Abbreviations: CNN = Convolutional Neural Network, SVM = Support Vector Machine, HOG = Histogram of Oriented Gradients, RNN = Recurrent Neural Network, RL = Reinforcement Learning. Source: Girshick et al. (2014), "Region-Based Convolutional Networks for Accurate Object Detection and Segmentation"
  11. Convolutional Neural Network: convolution with a 3x3 kernel and stride = 1; max pooling with a 2x2 filter and stride = 2. Sources: https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
  12. Agenda: 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results
  13. Objectives: 1. Implementation of a prototype 2. Comparison of classifiers: a) supervised method, b) semi-supervised method. Dataset: ~170 images of vehicle registration documents. Text spotting example: "XLG0H200NA0A10348"
  14. End-to-End Text Spotting Pipeline. Stages: Region of Interest Extractor (image depicting only the VIN), Sliding window (all windows), Character Detector (2 classes; all windows with characters), Non Maximum Suppression (only one window per character), Character Recognizer (36 classes). Output: the recognized characters (X L G 0 H 2 0 N A ...)
  15. Small Dataset. What to do about that? 1. Data Generation 2. Data Augmentation
  16. Data Augmentation: the original image, labeled manually as "0", is augmented into two datasets. Character Detector (2 classes): labels "character" and "no character". Character Recognizer (36 classes): label "0".
  17. Datasets: 170 images of vehicle registration documents, split into a training set (85 images) and a testing set (85 images). After data augmentation, each split yields a Detector set (~42000 images, 2 classes) and a Recognizer set (~8000 images, 36 classes). These form the training sets and testing sets of the classifiers; the 85 testing images also serve as the testing set of the pipeline.
  18. Classifiers: 1. Supervised Convolutional Neural Network (input, feature extraction, classification) 2. Semi-supervised Generative Adversarial Network (generator, discriminator)
  19. Agenda: 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results
  20. Yann LeCun, Director of Facebook AI Research, Prof at NYU: "... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion." Pictured: Ian J. Goodfellow @ Google Brain
  21. Generative Adversarial Network. Generator (G), goal: generate images that seem realistic. Discriminator (D), goal: differentiate between fake and real images.
  22. Generative Adversarial Network: Generator (G) and Discriminator (D). D receives real labeled images (A, B, ..., 8, 9) and generated images (F). Example: "D classified the generated image as 10% real". Is D correct? "yes"
  23. Mathematical formulation (Goodfellow et al. (2014), Generative Adversarial Networks): the discriminator calculates a likelihood in [0,1] for an image being real. The objective function combines the discriminator output for real images with the discriminator output for fake images. Training alternates between the discriminator, which maximizes the objective, and the generator, which minimizes it.
  24. Example of generated images: training images compared with images generated during the learning process
  25. Agenda: 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results
  26. Semi-supervised Learning (shown alongside Supervised Learning and Unsupervised Learning) • Makes use of unlabeled data • Combines supervised and unsupervised learning
  27. Semi-supervised GAN for Character Detection: the discriminator receives real labeled images, real unlabeled images, and images from the generator
  28. Agenda: 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results
  29. Character Detector (2 classes). [Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000) for DCNN and DCNN pretrained.] Pretraining of DCNN: manually generated images with CAPTCHA methods, labeled "Character" or "No character".
  30. Character Detector (2 classes). [Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000) for DCNN, DCNN pretrained, and Supervised GAN.] Supervised GAN diagram: generator, discriminator, real labeled images; discriminator outputs marked C and F.
  31. Character Detector (2 classes). [Figure: accuracy (60% to 100%) vs. size of labeled training set (20 to 42000) for DCNN, DCNN pretrained, Supervised GAN, and Semi-supervised GAN.] Semi-supervised GAN diagram: generator, discriminator, real labeled images, real unlabeled images; discriminator outputs marked C and F.
  32. Character Recognizer (36 classes). [Figures: accuracy vs. size of labeled training set for DCNN, DCNN pretrained, and Supervised GAN. Character Recognizer: accuracy 0% to 100%, training set sizes 36 to 8000. Character Detector: accuracy 60% to 100%, training set sizes 20 to 42000.]
  33. End-to-End Text Spotting Pipeline (Sliding window, Character Detector (2 classes), Character Recognizer (36 classes), Non Maximum Suppression, Region of Interest Extractor), evaluated on 85 images: Accuracy = 99.94%
  34. Our Approach vs. Google Cloud Vision API on 85 images of VINs. Our approach (Sliding window, Character Detector (2 classes), Character Recognizer (36 classes), Non Maximum Suppression, Region of Interest Extractor): ∅ Levenshtein distance = 0.011. Google Cloud Vision API: ∅ Levenshtein distance = 4.49. Levenshtein distance example: classification AYZ33 vs. label XYZ321 = 3
  35. Key Learnings • Custom solutions can tremendously outperform off-the-shelf software in a specific use case • Semi-supervised GANs can be successfully applied in use cases with little data • With simple data augmentation techniques, even a small amount of data might be enough
  36. Bibliography - Krizhevsky et al. (2012), "ImageNet Classification with Deep Convolutional Neural Networks" - Girshick et al. (2014), "Region-Based Convolutional Networks for Accurate Object Detection and Segmentation" - Girshick (2015), "Fast R-CNN" - Ren et al. (2015), "Faster R-CNN" - He et al. (2017), "Mask R-CNN" - Goodfellow et al. (2014), "Generative Adversarial Networks"
  37. Thank you! Florian Wilhelm, Principal Data Scientist, inovex GmbH, Schanzenstraße 6-20, Kupferhütte 1.13, 51063 Köln. florian.wilhelm@inovex.de
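
Slide 23 names the parts of the GAN objective (discriminator output for real images, discriminator output for fake images, alternating maximization and minimization) without the formula appearing in the slide text. For reference, the standard minimax objective from Goodfellow et al. (2014) is written out below. The symbols are the usual ones from that paper and are an assumption about what the slide showed: D(x) is the discriminator's probability that x is real, G(z) is the generator's output for noise z, p_data is the distribution of real images, and p_z is the noise prior.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The first term rewards D for assigning high probability to real images, the second for assigning low probability to generated ones; G is trained to make the second term small.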

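Slide 34 compares the two systems by average Levenshtein distance between the recognized string and the label. A minimal Python sketch of that metric follows, using the textbook dynamic-programming algorithm; the function names `levenshtein` and `mean_levenshtein` are illustrative and not taken from the talk.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] is the distance between the prefix of a processed
    # so far and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def mean_levenshtein(predictions, labels):
    """Average distance over paired predictions and labels."""
    pairs = list(zip(predictions, labels))
    return sum(levenshtein(p, t) for p, t in pairs) / len(pairs)


# The example from slide 34: classification "AYZ33" vs. label "XYZ321".
print(levenshtein("AYZ33", "XYZ321"))  # prints 3
```

An average of 0.011 over 85 VINs corresponds to roughly one wrong character in the whole test set, whereas 4.49 means several wrong characters per 17-character VIN.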
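
The pipeline on slide 14 uses non-maximum suppression to reduce all windows with characters to only one window per character. The sketch below shows the common greedy variant on horizontal windows. It is an assumption about how such a step could look, not the implementation from the talk: the window format `(left, right)`, the function names, and the overlap threshold of 0.3 are all hypothetical.

```python
def interval_iou(a, b):
    """Intersection over union of two horizontal windows (left, right)."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def non_maximum_suppression(windows, scores, iou_threshold=0.3):
    """Greedy NMS: visit windows from highest to lowest detector score and
    keep a window only if it does not overlap an already kept one too much.
    Returns indices of kept windows in left-to-right reading order."""
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(interval_iou(windows[i], windows[k]) <= iou_threshold
               for k in kept):
            kept.append(i)
    return sorted(kept, key=lambda i: windows[i][0])


# Five sliding windows covering two characters (made-up coordinates/scores).
windows = [(0, 10), (2, 12), (4, 14), (20, 30), (22, 32)]
scores = [0.6, 0.9, 0.7, 0.8, 0.95]
print(non_maximum_suppression(windows, scores))  # prints [1, 4]
```

Sorting the surviving windows by their left edge gives the character order in which a recognizer would read the VIN.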