Roger Labahn (University of Rostock, DE): Handwritten Text Recognition. Key concepts
co:op-READ-Convention Marburg
Technology meets Scholarship, or how Handwritten Text Recognition will Revolutionize Access to Archival Collections.
With a special focus on biographical data in archives
Hessian State Archives Marburg Friedrichsplatz 15, D - 35037 Marburg
19-21 January 2016
1. Handwritten Text Recognition:
Key concepts
PD Dr. Roger Labahn
Computational Intelligence Technology Lab
Mathematical Optimization Group
Institute for Mathematics
University of Rostock
co:op Convention | READ Kickoff 19.01.2016
2. Handwritten Text Recognition: Key concepts
Introduction
Concepts – Problems – Tasks
Recognition & Training
Interpretation – Decoding
Epilog
co:op Convention | READ Kickoff HTR Key Concepts | Introduction Roger Labahn | C I T lab
3. Framework – Workflow
• Application: keyword search, transcription, ...
OUT textual information (words, positions, ...) with alternatives & confidences
⇑
• HTR Engine
⇑
IN writing images (lines, words, table cells, form fields, ...)
• Layout Analysis: ..., text blocks
5. Alternative recognition strategies
Topological methods
• learn & read graphical substructures of writings
• arcs, lines, curves, holes, ...
HMM based methods
• Hidden Markov Models
• learn & read states while traversing the writing
RNN based methods
• Recurrent Neural Networks
7. Recognition Engine C I T lab & MoU partner PLANET
Decoding ⇒ textual output
• textual interpretation of recognition results
• matching external requirements / knowledge (dictionaries, language model, ...)
⇑
Recognition ⇒ recognition matrix
• recognition information from image information
• processing standardized writing image
⇑
Writing preprocessing ⇒ standardized writing
• corrections & normalizations
• e.g.: baseline, slant, height, ...
8. Introduction
Concepts – Problems – Tasks
Segmentation
Context
Language
HTR
Recognition & Training
Interpretation – Decoding
Epilog
9. Segmentation ? NONE !
(classical) OCR = Optical Character Recognition
• reading single characters ⇒ sub-images per character !?
• B a d ␣ D o ??? a n
Segmentation-free reading
• processing the entire writing image: word ... line ...
• scanning ⇒ information: data sequence (signal) / character sequence
• BB.ad␣.D ...
20. Image context is essential !
Single segment without context
• u ?? OR n ??
• virtually not (sufficiently) readable
Character sequence without context
• ???
• virtually not (sufficiently) explainable
21. Language context is essential !
Free reading – no restrictions on possible reading results
• BB.ad␣DDolo.auu
• application: figures & general numbers, ...
Comparison against dictionary or keyword
• task: read a German city name from a given list !
find the name Bad Doberan !
• Bad Doberan
• goal: optimal / best possible correspondence between writing / reading result and dictionary entry / keyword
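The dictionary comparison sketched above can be illustrated with a minimal edit-distance matcher. This is an illustrative sketch, not the engine's actual decoder: the city list and the helper names `edit_distance` / `best_match` are invented for the example.

```python
# Sketch: matching a noisy free-reading result against a dictionary of
# permissible words via edit (Levenshtein) distance. Names and data are
# illustrative, not from the actual HTR engine.

def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def best_match(reading, dictionary):
    """Return the dictionary entry closest to the raw reading result."""
    return min(dictionary, key=lambda w: edit_distance(reading, w))

cities = ["Bad Doberan", "Rostock", "Marburg", "Schwerin"]
print(best_match("BB.ad DDolo.auu", cities))  # closest entry: Bad Doberan
```

Even though the free reading "BB.ad DDolo.auu" is badly garbled, it is still far closer to "Bad Doberan" than to any other permissible entry, which is exactly why restricting the reading to a dictionary works.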
26. OCR ? HTR !
new paradigm – new concepts – new term !
• HTR = Handwritten Text Recognition
• ATR = Automatic Text Recognition
• ... ???
27. Introduction
Concepts – Problems – Tasks
Recognition & Training
Feature extraction
Writing processing
Neural Network
Parameter training
Interpretation – Decoding
Epilog
28. From pixel values to features
original grey image
Filtering
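The "pixel values to features" step can be sketched with a toy example: normalize the grey values, then apply a simple filter. Real HTR pipelines use richer filter banks; the 4x4 image and the helper names below are invented for illustration.

```python
# Sketch of feature extraction: scale grey values to [0, 1], then apply a
# horizontal-difference filter that responds to vertical stroke edges.
# The tiny image and function names are illustrative only.

def normalize(image):
    """Scale grey values to the range [0, 1]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1          # avoid division by zero on flat images
    return [[(v - lo) / span for v in row] for row in image]

def horizontal_gradient(image):
    """Difference of neighbouring pixels: large where intensity jumps."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)]
            for row in image]

grey = [[10, 10, 200, 10],
        [10, 200, 200, 10],
        [10, 10, 200, 10],
        [10, 10, 200, 10]]
features = horizontal_gradient(normalize(grey))
print(features[0])  # [0.0, 1.0, -1.0]: strong response at the stroke edges
```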
34. Collect & remember context !
Writing processing
• scanning in different directions ⇒ data sequences (signals)
Information memory
• neural networks with complex neurons (cells)
• recurrent connections ⇒ memory
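The "recurrent connections ⇒ memory" idea can be seen in a single toy recurrent cell: its state mixes the current input with its own previous state, so earlier positions of the signal keep influencing later ones. The weights below are hand-picked for illustration, not trained.

```python
# Sketch of memory via recurrence: one toy recurrent cell. An input event
# at one sequence position leaves a decaying trace in all later states.
import math

def rnn_step(state, x, w_in=1.0, w_rec=0.5):
    """One recurrent update: new state mixes input with the previous state."""
    return math.tanh(w_in * x + w_rec * state)

signal = [0.0, 1.0, 0.0, 0.0]   # a single "stroke" event at position 1
state = 0.0
states = []
for x in signal:
    state = rnn_step(state, x)
    states.append(state)

# The event at position 1 still influences every later state: memory.
print(states)  # positive, slowly decaying values after position 1
```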
36. Complex cells
37. Complex cells – memory by recurrent connections
(figure: complex cell with recurrent feedback connections)
38. Hierarchical Neural Networks
39. From feature input to network output
(Figure from GRAVES, SCHMIDHUBER: Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks)
41. Parameter training: Machine Learning
Theory
• objective: optimally adapt parameters in cells along network connections
• idea: train the network with learning data samples
• optimization: minimize error (network output vs. sample target) over training data
Practice: impression of large application cases
• 10^4 network cells
• 10^6 trainable parameters
• 10^4 learning data samples (writing images)
• 150 training epochs, each processing every sample once
• 4 weeks training from scratch
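The training objective above, minimizing the error between network output and sample target over the training data, can be shown on a deliberately tiny model: one trainable parameter instead of 10^6, three samples instead of 10^4, but the same loop of epochs and gradient steps. All numbers below are invented for the toy.

```python
# Toy illustration of the training loop: gradient descent on the squared
# error between model output (w * x) and the sample targets. Real HTR
# training adjusts millions of parameters with the same principle.

samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, target), roughly y = 2x
w = 0.0          # the single trainable parameter of the model y = w * x
lr = 0.01        # learning rate

for epoch in range(150):                 # one epoch processes every sample once
    for x, target in samples:
        error = w * x - target           # network output vs. sample target
        w -= lr * 2 * error * x          # gradient step on the squared error

print(round(w, 1))  # close to 2.0, the slope underlying the samples
```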
42. Learning data ...
• ... labeled training samples ⇒ ground truth
HTR: writing images with correct text
• ... the more the better ... BUT:
start with a realistic (reasonable) number, improve while working
• ... represent all project data ... BUT:
start with HTR (networks) from similar collections / corpora
• ... contribute to general HTR engine improvement:
put into network repository for specific application cases
43. Introduction
Concepts – Problems – Tasks
Recognition & Training
Interpretation – Decoding
Network output
Decoding
Epilog
44. Channel probabilities
Pre-conditions
• (abstract) alphabet of (abstract) characters
• text composed of exactly these characters
• alphabet characters ⇔ network output neurons (channels)
• example: digits, uppercase letters, lowercase letters, special characters ␣ -
• much more general: any symbol unit learnable from training data
• current (large) application case: up to 150 character channels
• independent of (natural) language – reading/writing direction – understanding
Network output
probability of (character) channel at writing (image) position
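A minimal way to turn such per-position channel probabilities into text is greedy best-path decoding: take the most probable channel at every position, then collapse repeats and drop a special "blank" channel (CTC-style). The tiny matrix, the channel set, and the blank convention below are invented for illustration; the actual engine's decoding is more elaborate.

```python
# Sketch of best-path decoding over a confidence matrix. Rows are writing
# positions, columns are character channels; channel 0 is an assumed blank.

channels = ["-", "B", "a", "d"]          # "-" is the blank channel

conf_matrix = [
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.7, 0.1, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

def best_path(matrix, channels):
    """Greedy decoding: argmax per position, collapse repeats, drop blanks."""
    path = [max(range(len(row)), key=row.__getitem__) for row in matrix]
    out, prev = [], None
    for c in path:
        if c != prev and c != 0:         # skip repeated channels and blank
            out.append(channels[c])
        prev = c
    return "".join(out)

print(best_path(conf_matrix, channels))  # Bad
```

Note how the two consecutive "B" positions collapse to a single character: that is what lets the network emit one label over several writing positions without any prior segmentation.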
45. Confidence Matrix – recognition / perception matrix
(matrix over the channels . B D a d l o u ␣ vs. writing positions)
46. Expression matching
• restrict to
permissible words ⇒ dictionary
keyword(s) / construct(s) ⇒ regular expression
• consider
character confidences ⇒ probability measure
or their negative logarithms ⇒ distance measure
Algorithmic method
• compare confidence matrix against any permissible expression
• use extremely fast algorithm: Dynamic Programming
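The dynamic-programming comparison can be sketched as follows: score a permissible word against the confidence matrix using negative log probabilities as a distance, letting each character span one or more consecutive positions. This simplified alignment (no blank channel, no language model) is an illustrative stand-in for the full decoder; all names and data are invented.

```python
# Sketch of expression matching by dynamic programming: the cheaper the
# total negative-log-probability of aligning a word to the matrix, the
# better the match. Simplified model; illustration only.
import math

def match_cost(matrix, channels, word):
    """Smaller cost = better match between word and confidence matrix."""
    idx = [channels.index(c) for c in word]
    INF = float("inf")
    T, K = len(matrix), len(word)
    D = [[INF] * K for _ in range(T)]    # D[t][k]: position t emits word[k]
    D[0][0] = -math.log(matrix[0][idx[0]])
    for t in range(1, T):
        for k in range(K):
            stay = D[t - 1][k]                       # same character continues
            step = D[t - 1][k - 1] if k > 0 else INF  # advance to next character
            best = min(stay, step)
            if best < INF:
                D[t][k] = best - math.log(matrix[t][idx[k]])
    return D[T - 1][K - 1]

channels = ["B", "a", "d", "u"]
matrix = [[0.8, 0.1, 0.05, 0.05],
          [0.1, 0.7, 0.1, 0.1],
          [0.1, 0.1, 0.7, 0.1]]
print(match_cost(matrix, channels, "Bad") < match_cost(matrix, channels, "Bau"))  # True
```

Running this over every dictionary entry and ranking by cost yields exactly the "best alternatives ranked by measure" described on the decoding slides.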
48. Decoding
Objective – Result
• permissible expression(s) with best match to recognition output
• best match ⇔ maximal probability ⇔ minimal distance
• best alternatives ranked by measure (probability / distance)
Practice: impression of actual application cases
• decoding only on pre-processed lines
• searching 1 keyword in 10,500 lines (433 pages): 2-3 sec. average
• reading 1 page against an 11,650-word dictionary: 8-9 sec. average
52. Introduction
Concepts – Problems – Tasks
Recognition & Training
Interpretation – Decoding
Epilog
53. Results
from C I T lab's contribution to ICDAR's HTRtS-2015 contest
WER = 0% | CER = 0%
reference:  who has with or without right the temporary possession of it : and
hypothesis: who has with or without right the temporary possession of it : and
WER = 17% | CER = 4%
reference:  operation of this act is spent upon Titius only ,
hypothesis: operation of this act isspeut upon Titius only ,
WER = 67% | CER = 52%
reference:  of the said first issue : the amount of such second consequently gap/ to the
hypothesis: of the and put feet the without of such ; said uitrquunity be the
WER = 80% | CER = 17%
reference:  for a simple personal Injury the Offender ' s punish=
hypothesis: For on simple personal injury the offenders punish .
(Figure from SÁNCHEZ, TOSELLI, ROMERO, VIDAL: ICDAR2015 Competition HTRtS: Handwritten Text Recognition on the tranScriptorium Dataset – test line images of increasing difficulty; the reference transcript and the CITlab system hypothesis are shown below each image, with WER and CER per line.)
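The WER and CER figures quoted in these results are both edit-distance rates: word error rate counts word-level edits divided by the reference word count, character error rate does the same on characters. A minimal sketch (contest scoring tools may tokenize and normalize differently):

```python
# Sketch of WER / CER computation as normalized edit distances. The example
# strings are invented; real scoring pipelines may differ in tokenization.

def edit_distance(ref, hyp):
    """Dynamic-programming edit distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits per reference word."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: character-level edits per reference character."""
    return edit_distance(ref, hyp) / len(ref)

ref, hyp = "Bad Doberan", "Bad Doberau"
print(wer(ref, hyp), round(cer(ref, hyp), 2))  # 0.5 0.09
```

This also explains the pattern noted in the contest figure: a single wrong character inside a word ruins that whole word for WER while barely moving CER, so a transcript with high WER but low CER can still be quite usable.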
54. Thanks ...
C I T lab Group – URO | MoU Partner
PLANET intelligent systems GmbH
EU Funding
READ – Recognition and Enrichment of Archival Documents
... for your attention!