Online vehicle marketplaces are embracing artificial intelligence to ease the process of selling a vehicle on their platform. The tedious work of copying information from the vehicle registration document into some web form can be automated with the help of smart text-spotting systems, in which the seller takes a picture of the document, and the necessary information is extracted automatically.
Florian Wilhelm details the components of a text-spotting system, including the subtasks of object detection and optical character recognition (OCR). Florian elaborates on the challenges of OCR in documents with various distortions and artifacts, which rule out off-the-shelf products for this task. After offering an overview of semisupervised learning based on generative adversarial networks (GANs), Florian evaluates the performance gains of this method compared to supervised learning. More specifically, for varying amounts of labeled data, he compares the accuracy of a convolutional neural network (CNN) to that of a GAN that uses additional unlabeled data during the training phase, showing that GANs significantly outperform classical CNNs in use cases with a lack of labeled data.
What you'll learn:
Understand how semisupervised learning with GANs works
Explore beneficial semisupervised methods based on GANs for use cases with a limited amount of labeled data
Gain insight into an interesting OCR use case of an online vehicle marketplace
Event: O'Reilly Artificial Intelligence Conference, London, 11.10.2018
Speaker: Dr. Florian Wilhelm
More tech talks: www.inovex.de/vortraege
More tech articles: www.inovex.de/blog
2. Special Interests
• Mathematical Modelling
• Recommendation Systems
• Data Science in Production
• Python Data Stack
• Maintainer of PyScaffold
Dr. Florian Wilhelm
Principal Data Scientist @ inovex
@FlorianWilhelm
FlorianWilhelm
florianwilhelm.info
Florian Tanten
Master Thesis @ inovex
October 2017 - May 2018
3. IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
Using technology to inspire our
clients. And ourselves.
inovex offices in
Karlsruhe · Cologne · Munich ·
Pforzheim · Hamburg · Stuttgart.
www.inovex.de
4. Agenda
1. Use Case
2. Text Spotting
3. Data and Pipeline
4. Generative Adversarial Networks
5. Semi-supervised Learning
6. Results
6. Use Case
Spotting the vehicle identification number (VIN) in images of vehicle registration documents
VIN: WF0DXXGAKDEJ37385
VIN-Decoder
Information about the car:
Manufacturer: BMW
Model: X3
Year: 2013-03-21
Engine power: 143 PS
Equipment:
- Xenon Lights
...
13. Objectives
1. Implementation of a prototype: Text Spotting that outputs the VIN, e.g. "XLG0H200NA0A10348"
2. Comparison of classifiers
a) Supervised method
b) Semi-supervised method
Dataset:
- ~170 images of vehicle registration documents
14. End-to-End Text Spotting Pipeline
1. Region of Interest Extractor → Image depicting only VIN
2. Sliding window → All windows
3. Character Detector (2 classes) → All windows with characters
4. Non Maximum Suppression → Only one window per character
5. Character Recognizer (36 classes) → X L G 0 H 2 0 0 N A 0 A 1 0 3 4 8
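The sliding-window and non-maximum-suppression stages of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the implementation from the talk: the box format, window size, stride, IoU threshold and scores are made-up values, and `sliding_windows`, `iou` and `non_max_suppression` are hypothetical helper names. In the real pipeline the scores would come from the character detector.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); hypothetical box format


def sliding_windows(width: int, height: int, win: int, stride: int) -> List[Box]:
    """Enumerate all square windows of side `win` inside a width x height image."""
    return [
        (x, y, x + win, y + win)
        for y in range(0, height - win + 1, stride)
        for x in range(0, width - win + 1, stride)
    ]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def non_max_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.3
) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than `iou_threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Usage with made-up detector confidences, one per window.
windows = sliding_windows(width=64, height=32, win=32, stride=16)
scores = [0.9, 0.6, 0.8]
kept = non_max_suppression(windows, scores)
print([windows[i] for i in kept])
```

The greedy variant shown here is only one of several NMS flavors; which one the prototype used is not stated in the slides.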
20. Yann LeCun, Director of Facebook AI Research, Prof. at NYU:
"... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion."
Ian J. Goodfellow @ Google Brain
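22. Generative Adversarial Network
[Diagram: a Generator (G) and a Discriminator (D). D receives real images and images produced by G. Feedback "Is D correct?", with the example answers "yes" and "D classified the generated image as 10% real". Labels shown: A, B, ..., 8, 9, F. Inputs labeled "Real images" and "Real labeled images".]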
23. Mathematical formulation
Source: Goodfellow et al. (2014), Generative Adversarial Networks
- The discriminator calculates a likelihood in [0,1] for an image being real
- Objective function with two terms: the discriminator output for real images and the discriminator output for fake images
- Training (alternating): the discriminator maximizes the objective, the generator minimizes it
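The formulas on this slide did not survive in the transcript. For orientation, the minimax objective as given in the cited Goodfellow et al. (2014) paper is reproduced below; whether the slide used exactly this notation is an assumption. Here D(x) in [0,1] is the discriminator output for an image x, G(z) is the image generated from noise z, p_data is the data distribution and p_z the noise prior.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term involves the discriminator output for real images, the second the discriminator output for fake images. In alternating training, D is updated to increase V while G is updated to decrease it.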
27. Semi-supervised GAN for Character Detection
[Diagram: the Discriminator receives real labeled images, real unlabeled images and images from the Generator]
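One common way to train a discriminator on labeled, unlabeled and generated images at once follows Salimans et al. (2016), "Improved Techniques for Training GANs": the discriminator gets K real classes plus one extra "fake" class. The slides do not state which formulation was used, so the sketch below is an assumption for illustration only. The function names and the convention that the last logit is the "fake" class are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def labeled_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy over the K real classes only (the last, 'fake' logit is dropped)."""
    probs = softmax(logits[:-1])
    return -math.log(probs[label])


def unlabeled_loss(logits: Sequence[float]) -> float:
    """Real unlabeled image: penalize probability mass on the 'fake' class."""
    p_fake = softmax(logits)[-1]
    return -math.log(1.0 - p_fake)


def generated_loss(logits: Sequence[float]) -> float:
    """Generated image: the discriminator should assign it to the 'fake' class."""
    p_fake = softmax(logits)[-1]
    return -math.log(p_fake)


def discriminator_loss(
    labeled: Sequence[Tuple[Sequence[float], int]],
    unlabeled: Sequence[Sequence[float]],
    generated: Sequence[Sequence[float]],
) -> float:
    """Sum of the three batch-averaged terms (assumed equal weighting)."""
    sup = sum(labeled_loss(l, y) for l, y in labeled) / len(labeled)
    real = sum(unlabeled_loss(l) for l in unlabeled) / len(unlabeled)
    fake = sum(generated_loss(l) for l in generated) / len(generated)
    return sup + real + fake


# Usage for a 2-class detector: logits = [character, no character, fake].
loss = discriminator_loss(
    labeled=[([2.0, 0.0, -1.0], 0)],
    unlabeled=[[1.0, 1.5, -2.0]],
    generated=[[0.0, 0.0, 3.0]],
)
print(round(loss, 4))
```

The unlabeled term is what lets additional unlabeled images contribute to training: they only need to be recognized as "not fake", without requiring a class label.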
29. Character Detector (2 classes)
Classes: "Character" and "No character"
Pretraining of DCNN: manually generated images with CAPTCHA methods
[Plot: accuracy (axis from 60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000); curves: DCNN, DCNN pretrained]
30. Character Detector (2 classes): Supervised GAN
[Diagram: Supervised GAN with Generator, Discriminator and real labeled images; labels C and F]
[Plot: accuracy (axis from 60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000); curves: DCNN, DCNN pretrained, Supervised GAN]
31. Character Detector (2 classes): Semi-supervised GAN
[Diagram: Semi-supervised GAN with Generator, Discriminator, real labeled images and real unlabeled images; labels C and F]
[Plot: accuracy (axis from 60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000); curves: DCNN, DCNN pretrained, Supervised GAN, Semi-supervised GAN]
32. Character Recognizer (36 classes)
[Plot, Character Recognizer: accuracy (axis from 0% to 100%) over size of labeled training set (36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000)]
[Plot, Character Detector: accuracy (axis from 60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000)]
Legend: DCNN, DCNN pretrained, Supervised GAN
33. End-to-End Text Spotting Pipeline
Sliding window · Character Detector (2 classes) · Character Recognizer (36 classes) · Non Maximum Suppression · Region of Interest Extractor
Evaluated on 85 images
Accuracy = 99.94%
34. Our Approach vs. Google Cloud Vision API
Evaluated on 85 images of VINs
Our approach (Sliding window · Character Detector (2 classes) · Character Recognizer (36 classes) · Non Maximum Suppression · Region of Interest Extractor): ∅ Levenshtein distance = 0.011
Google Cloud Vision API: ∅ Levenshtein distance = 4.49
Levenshtein distance, example: Classification "AYZ33" vs. Label "XYZ321" = 3
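The Levenshtein distance used for this comparison counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming computation is shown below. The function names are hypothetical, and the averaging helper is an assumption about how the mean over the 85 images was formed.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def mean_levenshtein(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Average distance over paired predictions and ground-truth labels."""
    pairs = list(zip(predictions, labels))
    return sum(levenshtein(p, t) for p, t in pairs) / len(pairs)


# The example from the slide: classification "AYZ33" vs. label "XYZ321".
print(levenshtein("AYZ33", "XYZ321"))  # 3
```

For the slide's example, the three edits are: substitute A with X, substitute the second 3 with 2, and insert 1.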
35. Key Learnings
• Custom solutions can tremendously outperform off-the-shelf software in a specific use case
• Semi-supervised GANs can be successfully applied in use cases with little data
• With simple data augmentation techniques, even a small amount of data might be enough
36. Bibliography
- Krizhevsky et al. (2012), "ImageNet Classification with Deep Convolutional Neural Networks"
- Girshick et al. (2014), "Region-Based Convolutional Networks for Accurate Object Detection and Segmentation"
- Girshick (2015), "Fast R-CNN"
- Ren et al. (2015), "Faster R-CNN"
- He et al. (2017), "Mask R-CNN"
- Goodfellow et al. (2014), "Generative Adversarial Networks"