A View on the Past and Future of Character and Document Recognition

Hiromichi Fujisawa
Central Research Laboratory, Hitachi, Ltd.
Kokubunji, Tokyo, Japan 185-8601
hiromichi.fujisawa.sb-at-hitachi.com

Abstract

The paper first gives an overview of the technical advances in the field of character and document recognition, decade by decade. It then highlights key technical developments, especially for Kanji (Chinese character) recognition in Japan. Technical issues around postal address recognition, which have promoted advanced techniques including information integration, are then discussed. Robustness design principles are introduced. Finally, future prospects are discussed.

1. Introduction

An industrial view of character and document recognition technology is presented, looking from the past to the future. Since the birth of commercial Optical Character Readers (OCRs) in the 1950s, character and document recognition technology has advanced tremendously, always supporting industrial and commercial applications. At the same time, these business applications have always promoted investment in new technology development. We can see a virtuous cycle here: new technologies made new applications possible, and the new applications supported new technology development.

It seems, however, that a wave of IT technologies is surging over this area, which has been cultivated for more than fifty years. Most information now seems to be born digital, possibly diminishing the demand for this technology. As a matter of fact, this is the second wave of this kind. The first was the wave of Office Automation in the 1980s. There was a strong expectation that paper documents would disappear because all documents would be produced electronically. On the contrary, sales of OCRs in Japan peaked in the 1980s.

It is felt, however, that the second wave might be different. This time, it might change the perspective completely. Other views are of course possible. For instance, search technologies, which have established their ubiquity, are expanding their territory to image documents. The importance of handwriting is being given a second look. Paper does not seem to be going away.

2. Brief historical view

2.1. Overview

The first commercial OCR appeared in the 1950s in the US and, since then, each decade has seen representative developments. In the 1960s, IBM produced several models of “optical readers” for machine-printed and hand-printed numbers for business use. One of the models could read 200 fonts of printed documents. In this decade, postal automation for mechanical letter sorting adopted OCRs for the first time to read postal codes automatically and determine destinations, independently in the US, Europe, and Japan. In Japan, Toshiba and NEC developed hand-printed digit recognition apparatus in 1967, which was put into operation in 1968.

In the 1970s, commercial OCRs were becoming pervasive in Japan. Hitachi introduced the first hand-printed numeral OCR for business use in 1973, and NEC introduced the first hand-printed Katakana OCR in 1976. The Japanese Ministry of International Trade and Industry (today’s Ministry of Economy, Trade and Industry) led a ten-year national project on pattern recognition, including Kanji recognition and handwritten character recognition, from 1971. It attracted many students and researchers into pattern recognition. In the US, IBM introduced a deposit processing system (IBM 3895) in 1977, which could recognize unconstrained handwritten numbers on bank checks. The author had a chance to see it in operation at Mellon Bank in Pittsburgh in 1981 and was told that it could read about 50% of them, while the others were hand-coded.

The 1980s was a decade that benefited from technological progress in semiconductor devices such as image sensors, microprocessors, memories, and custom-designed LSIs. The hardware became smaller

than ever, fitting on a desktop, thanks to microprocessors and custom-designed LSIs. Then, larger, cheaper memories and image sensors enabled whole page images to be scanned and stored in memory for further processing, allowing more advanced recognition and wider applications. For example, a handwritten numeral OCR that could recognize touching characters was introduced for the first time in 1983. In the late 1980s, Japanese commercial OCRs introduced machine-printed and hand-printed Kanji recognition capabilities, which could recognize about 2,400 classes of Kanji.

Another important feature of this decade is that optical disks for computer use were developed and put into use in patent automation systems in the US and in Japan. These can be considered the first “digital libraries.” The Japanese patent office system currently stores approximately 50 million documents, or 200 million pages. Most of the documents are stored as scanned digital images. This was therefore the time when studies on document understanding and document layout analysis began in Japan.

The changes in the 1990s were due to performance improvements in UNIX workstations and, then, personal computers. Though scanning and image preprocessing were still realized in hardware, the major part of recognition was implemented in software. The implication was that general-purpose programming languages like C and C++ could be used for recognition algorithms, allowing engineers to develop more complicated algorithms and also extending the research community further into academia. Software OCR packages running on PCs appeared on the market as well.

Techniques for freely handwritten character recognition were extensively studied and successfully applied to bank check readers and postal address readers. Advanced layout analysis techniques enabled recognition of a wide variety of business forms. We were also involved in the development of a postal address recognition system for Japanese mail pieces.

The IAPR conferences and research communities have been contributing to technical progress. Many of the methods playing key roles in today’s systems have been studied thoroughly. Examples are artificial neural networks, Hidden Markov Models (HMMs), polynomial function classifiers, modified quadratic discriminant function (MQDF) classifiers [1], support vector machines (SVMs), classifier combination [2, 3], information integration, and lexicon-directed character string recognition [4-7], some of which have original versions dating back to the 1960s.

2.2. Character recognition algorithms

The structural analysis approach and the pattern matching (or statistical) approach were the ideas competing with each other in the 1970s in Japan. Commercial OCRs were using structural methods for handwritten alphanumerics and Katakana, and pattern matching methods for machine-printed alphanumerics. A pattern matching method had been proved experimentally to be applicable to machine-printed Kanji recognition by the late 1970s [8-10].

The problem for us in those days was how to recognize handwritten Kanji. It was like a huge, unexplored mountain standing in front of us. What was clear was that neither the structural approach nor the simple pattern matching approach could conquer it. The former was weak against the explosive topological variations caused by complex strokes, while the latter was weak against shape variations; however, the latter seemed to have a greater chance of success.

The concept of blurring as feature extraction was extended to directional features and found to be effective for handwritten Kanji recognition, though the preliminary study began with handwritten numeral recognition [11, 12]. By introducing spatially continuous feature extraction, the optimum amount of blurring turned out to be surprisingly large. Non-linear shape normalization [13, 14] and statistical classifier methods [1] boosted the recognition accuracy to a commercially viable level. We learned that blurring should be considered as a means to obtain a reduced set of prominent dimensions (a subspace) rather than to lower computational cost, though the effects seem similar. Normally, a feature vector for Kanji patterns consists of 8 x 8 x 4 elements (Figure 1), and the subspace after statistical analysis has around 100 dimensions.

Recent advances in Chinese (Kanji) recognition methods are well presented in [15].

Figure 1. Directional features

2.3. Character segmentation algorithms

In the 1960s and 1970s, a flying-spot scanner, laser scanner, or other kind of mechanical scanner was used with a photomultiplier as a sensor to obtain character images. In a sense, character segmentation was done by these kinds of scanning devices. Then, in the 1980s, semiconductor sensors and memories appeared,

allowing OCRs to scan and store an image of one character line and, later, a full page image.

This change relaxed the strict conditions on OCR form specifications, for example allowing smaller, non-separated writing boxes, which in turn required a touching-digit separation algorithm [16]. In 1983, Hitachi produced one of the first OCRs that could segment and recognize touching handwritten digits based on a multiple-hypothesis segmentation-recognition method. Contour shape analysis could identify candidate touching points (Figure 2).

This direction of change led us to “forms processing,” whose ultimate target was to read unknown forms, or at least forms that were not specifically designed for OCRs. But this meant that users became less careful in their writing styles and, therefore, OCRs had to be more accurate for freely written characters. Technically, such techniques as run-length code-based preprocessing, connected component analysis, contour shape analysis, touching character-line separation, and segmentation-recognition integration were developed.

Figure 2. Segmentation of touching digits

2.4. Linguistic information integration

Kanji OCRs, which read Kanji as one of the extended functions of commercial OCRs, were used for reading handwritten Kanji names and addresses. The first generations used OCR forms with fixed, separated boxes, causing no segmentation problem. The question was how to keep the phrase recognition accuracy high.

We could utilize a priori linguistic knowledge to pick the correct choices from the candidate lattice after character recognition. The method we developed used a finite state automaton, which was dynamically created from the lattice and was equivalent to the lattice contents [17]. A word (or a character string) from a lexicon is then fed into the automaton, and an active state makes transitions through edges whose labels coincide with the input characters. If the label of a traversed edge comes from the first place in the lattice, a penalty of zero is given; if no label coincides, the edge “Others” is selected, giving a penalty of 15. Generally, the penalty depends on the position in the lattice (Figure 3). In this way, every word in the lexicon is given a penalty value, and the word with the smallest penalty is taken as the recognized word. This was used successfully for address phrases, provided that character segmentation was reliable enough.

When Furigana (the pronunciation written in syllabic characters) was available in addition to the Kanji version, both versions could be recognized and the results merged for higher accuracy. In Japanese business forms, it is normal to be asked to fill in both Kanji and Furigana versions.

As discussed again later, when segmentation is not reliable, as for freely handwritten phrases, more complicated knowledge integration approaches are required.

Figure 3. Finite state automaton

3. Robustness against uncertainty and variability

Postal address recognition is one of the “ideal” applications that advance the technology. It is technology-rich, posing many technical problems, and it promises post office innovation, whose investments pay off. R&D projects to develop systems that could read handwritten and machine-printed full addresses were carried out in the US, Europe, and Japan, in industry and academia, in the 1990s.

The recognition engine we developed for post offices became complex, as shown in Figure 4, as a result of coping with various problems. The recognition submodules output uncertain (intermediate) decisions. Dozens of such decisions must be made in series until reaching the final address interpretation. A natural solution is to hand over multiple candidates, which we call “hypotheses,” to the following stages. We then need to introduce a mechanism to control the sequential decision process, which is, after all, a kind of optimum search.
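
As a small, concrete instance of handing hypotheses to a later stage, the lexicon matching of Section 2.4 can be sketched as follows. This is a minimal illustration and not the automaton of [17]: it assumes one character per lattice position (reliable segmentation), and it assumes that a candidate at rank r costs a penalty of r, since the text states only a penalty of zero for first place and 15 for “Others.” The function names, the sample lattice, and the lexicon are hypothetical.

```python
from typing import List, Optional, Tuple

OTHERS_PENALTY = 15  # penalty of the "Others" edge, as stated in Section 2.4


def word_penalty(word: str, lattice: List[List[str]]) -> Optional[int]:
    """Return the total penalty of `word` against a candidate lattice.

    lattice[i] lists the ranked candidates for character position i,
    best first. ASSUMPTION: a candidate at rank r costs r, so first
    place costs 0; the paper only says the penalty depends on position.
    """
    if len(word) != len(lattice):
        return None  # reliable segmentation is assumed: lengths must agree
    total = 0
    for ch, candidates in zip(word, lattice):
        if ch in candidates:
            total += candidates.index(ch)
        else:
            total += OTHERS_PENALTY  # no label coincides: take "Others"
    return total


def rank_words(lattice: List[List[str]],
               lexicon: List[str]) -> List[Tuple[str, int]]:
    """Score every lexicon word and sort by ascending penalty."""
    scored = []
    for word in lexicon:
        penalty = word_penalty(word, lattice)
        if penalty is not None:
            scored.append((word, penalty))
    return sorted(scored, key=lambda item: item[1])


# Hypothetical lattice: the top-1 string "IORY0" is not a word,
# but the deferred candidates still contain the right answer.
lattice = [["I", "T", "J"], ["O", "0", "D"], ["R", "K", "X"],
           ["Y", "V", "T"], ["0", "O", "Q"]]
lexicon = ["KYOTO", "OSAKA", "TOKYO", "NARA"]
ranking = rank_words(lattice, lexicon)
print(ranking[0])
```

In this sketch the character recognizer never commits to a single answer; the lexicon stage resolves the ambiguity, and “TOKYO” wins with a total penalty of 3 even though none of the top-1 candidates alone spells a valid word.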

Figure 4. Postal address recognition

Table 1. Design principles for robustness

Principles                                                 Expected effects
Hypothesis-Driven                                          When the type of a problem is uncertain, set up hypotheses, process, and test the results
Deferred Decision / Multiple Hypotheses                    Do not decide; leave the decision to the next experts, carrying over multiple hypotheses
Information Integration: Process Integration               Solve a problem by multiple different-field experts as a team
Information Integration: Combination-Based Integration     Decide as a team of multiple same-field experts
Information Integration: Corroboration-Based Integration   Utilize other input information; seek more evidence
Alternative Solutions                                      Solve a problem by multiple alternative approaches
Perturbation                                               Modify the problem slightly and try it again

Postal address phrases are semantically rich, and an “information integration” approach can be successfully applied. As in other intelligent handwriting recognition systems, we have developed a segmentation-recognition-interpretation integrated method [6, 7], in which segmentation produces a segmentation candidate network, from which an optimum path is selected by evaluating the likelihood. This is done by pattern matching against the language model for possible address phrases. Segmentation required geometrical alignment to be evaluated as well [18].

The issue was how to design “robustness” into the system. We formulated the design principles for robustness shown in Table 1 while carrying out the project [19, 20]. By applying them, many additional algorithms were successfully integrated into the software recognition engine. Figure 5 shows the full-address recognition rates for handwritten addresses for four versions, V1 through V4. The horizontal axis shows sample dataset numbers, which have been rearranged so that the rates are in decreasing order.

Figure 5. Improvements in handwritten address recognition

4. Future prospects

The question, perhaps, is where we should go. The anticipated future needs for character and document recognition include the following:

• Archival records and image replacement documents in e-Government
• Books and historical documents for global search
• Handwriting captured by digital pens
• Text-in-the-scene captured by cameras
• Text in video

One thing that is almost clear is that we are going into the “long tail” part of the market. The “head” part has already been computerized, either with existing OCR technology or by other, totally electronic means. The remaining part has an extremely wide variety of documents with not so many instances of each. Non-standardized business forms are an example of this kind. For instance, small and medium-sized companies in Japan are still using paper forms to make bank transactions. For each company, the number of transactions is not so large, but banks receive many different types of forms from many companies. A more intelligent, versatile form reader may solve this problem.

The importance of handwriting is being reconsidered in education and in the context of knowledge work. The act of writing helps the reading and thinking processes, and a digital pen can capture handwritten annotations and memos and store them in computers. As paper is

considered the medium for such processes [21], we can print out electronically produced documents and work on them with a digital pen, while all the information is kept in computers. Search capability on such hand-annotated documents will then be an important tool.

A mobile device with text-in-scene recognition will be a necessary gadget for travelers in foreign countries. It should be able to help with understanding second and third languages, or more. Color image processing, geometric perspective normalization, text segmentation, adaptive thresholding, and so on need to be studied.

What is common throughout these applications is that no single recognition algorithm can realize them. Higher-order image processing before recognition will be mandatory. The solution should be comprehensive.

5. Conclusions

For a bright future for this technological community, both vision and fundamental technology are in demand. Vision will show applications with new value propositions that require new technology. But technology creates demand as well.

6. References

[1] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, “Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition,” IEEE Trans. PAMI, Vol. 9, No. 1, 1987, pp. 149-153.
[2] L. Xu, A. Krzyzak, and C. Y. Suen, “Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition,” IEEE Trans. SMC, Vol. 22, No. 3, 1992, pp. 418-435.
[3] T. K. Ho, J. J. Hull, and S. N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE Trans. PAMI, Vol. 16, No. 1, 1994, pp. 66-75.
[4] F. Kimura, M. Sridhar, and Z. Chen, “Improvements of Lexicon-Directed Algorithm for Recognition of Unconstrained Hand-Written Words,” Proc. 2nd ICDAR, Tsukuba, Japan, Oct. 1993, pp. 18-22.
[5] C. H. Chen, “Lexicon-Driven Word Recognition,” Proc. 3rd ICDAR, Montreal, Canada, Aug. 1995, pp. 919-922.
[6] M. Koga, R. Mine, H. Sako, and H. Fujisawa, “Lexical Search Approach for Character-String Recognition,” Proc. 3rd DAS, Nagano, Japan, Nov. 1998, pp. 237-251.
[7] C.-L. Liu, M. Koga, and H. Fujisawa, “Lexicon-driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading,” IEEE Trans. PAMI, Vol. 24, No. 11, 2002, pp. 1425-1437.
[8] R. Casey and G. Nagy, “Recognition of printed Chinese characters,” IEEE Trans. Electronic Computers, Vol. EC-15, No. 1, 1966, pp. 91-101.
[9] S. Yamamoto, A. Nakajima, and K. Nakata, “Chinese character recognition by hierarchical pattern matching,” Proc. 1st IJCPR, Washington DC, 1973, pp. 183-194.
[10] H. Fujisawa, Y. Nakano, Y. Kitazume, and M. Yasuda, “Development of a Kanji OCR: An Optical Chinese Character Reader,” Proc. 4th IJCPR, Kyoto, Nov. 1978, pp. 815-820.
[11] M. Yasuda and H. Fujisawa, “An Improvement of Correlation Method for Character Recognition,” Systems, Computers, Controls, Scripta Publishing Co., Vol. 10, No. 2, 1979, pp. 29-38.
[12] H. Fujisawa and C.-L. Liu, “Directional Pattern Matching for Character Recognition Revisited,” Proc. 7th ICDAR, Edinburgh, Aug. 2003, pp. 794-798.
[13] J. Tsukumo and H. Tanaka, “Classification of Handprinted Chinese Characters Using Non-linear Normalization and Correlation Methods,” Proc. 9th ICPR, Rome, Italy, 1988, pp. 168-171.
[14] C.-L. Liu, “Normalization-Cooperated Gradient Feature Extraction for Handwritten Character Recognition,” IEEE Trans. PAMI, Vol. 29, No. 6, 2007, pp. 1465-1469.
[15] C.-L. Liu, “Handwritten Chinese Character Recognition: Effects of Shape Normalization and Feature Extraction,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[16] H. Fujisawa, Y. Nakano, and K. Kurino, “Segmentation Methods for Character Recognition: From Segmentation to Document Structure Analysis,” Proc. IEEE, Vol. 80, No. 7, 1992, pp. 1079-1092.
[17] K. Marukawa, M. Koga, Y. Shima, and H. Fujisawa, “An Error Correction Algorithm for Handwritten Chinese Character Address Recognition,” Proc. 1st ICDAR, Saint-Malo, Sep. 1991, pp. 916-924.
[18] T. Kagehiro, M. Koga, H. Sako, and H. Fujisawa, “Segmentation of Handwritten Kanji Numerals Integrating Peripheral Information by Bayesian Rule,” Proc. IAPR MVA’98, Chiba, Japan, Nov. 1998, pp. 439-442.
[19] H. Fujisawa, “How to Deal with Uncertainty and Variability: Experience and Solutions,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[20] H. Fujisawa, “Robustness Design of Industrial Strength Recognition Systems,” Digital Document Processing: Major Directions and Recent Advances, B.B. Chaudhuri (Ed.), Springer-Verlag, London, 2007, pp. 185-212.
[21] A.J. Sellen and R.H. Harper, “The Myth of the Paperless Office,” The MIT Press, 2001.
