A View on the Past and Future of Character and Document Recognition


                                            Hiromichi Fujisawa
                                 Central Research Laboratory, Hitachi, Ltd.
                                    Kokubunji, Tokyo, Japan 185-8601
                                   hiromichi.fujisawa.sb-at-hitachi.com


Abstract

The paper first gives an overview of the technical advances in the field of character and document recognition, decade by decade. Then, it highlights key technical developments, especially for Kanji (Chinese character) recognition in Japan. Technical issues around postal address recognition, which have promoted advanced techniques including information integration, are then discussed. Robustness design principles are introduced. Finally, future prospects are discussed.

1. Introduction

An industrial view of character and document recognition technology is presented, looking from the past to the future. Since the birth of commercial Optical Character Readers (OCRs) in the 1950s, character and document recognition technology has made tremendous advances, always supporting industrial and commercial applications. At the same time, these business applications have always promoted investment in new technology developments. We can see a virtuous cycle here. New technologies made new applications possible, and the new applications supported the new technology developments.

It seems, however, that the wave of IT technologies is surging over this area, which has been cultivated over a period of more than fifty years. Most information now seems to be born digital, possibly diminishing the demand for this technology. As a matter of fact, this is the second wave of this kind. The first was the wave of Office Automation in the 1980s. There was a strong expectation that paper documents would disappear because all documents would be produced electronically. On the contrary, OCR sales in Japan peaked in the 1980s.

It is felt, however, that the second wave might be different. This time, it might change the perspective completely. Other views are of course possible. For instance, search technologies, which have established their ubiquity, are expanding their territory to image documents. The importance of handwriting is being given a second look. Paper does not seem to be going away.

2. Brief historical view

2.1. Overview

The first commercial OCR appeared in the 1950s in the US and, since then, each decade has seen representative developments. In the 1960s, IBM produced several models of “optical readers” for machine-printed and hand-printed numbers for business use. One of the models could read 200 fonts of printed documents. In this decade, postal automation for mechanical letter sorting adopted OCRs for the first time to automatically read postal codes to determine destinations, independently in the US, Europe, and Japan. In Japan, Toshiba and NEC developed hand-printed digit recognition apparatus in 1967, which were put into operation in 1968.

In the 1970s, commercial OCRs were becoming pervasive in Japan. Hitachi introduced the first hand-printed numeral OCR for business use in 1973, and NEC introduced the first hand-printed Katakana OCR in 1976. The Japanese Ministry of International Trade and Industry (today’s Ministry of Economy, Trade and Industry) led a ten-year national project on pattern recognition, including Kanji recognition and handwritten character recognition, from 1971. It attracted many students and researchers into pattern recognition. In the US, IBM introduced a deposit processing system (IBM 3895) in 1977, which could recognize unconstrained handwritten numbers on bank checks. The author had a chance to see it in operation at Mellon Bank in Pittsburgh in 1981 and was told that it could read about 50% of them, while the others were hand-coded.

The 1980s was a decade that benefited from technological progress in semiconductor devices such as image sensors, microprocessors, memories, and custom-designed LSIs. The hardware became smaller
than ever, small enough to place on a desktop, thanks to microprocessors and custom-designed LSIs. Then, larger, cheaper memories and image sensors enabled whole page images to be scanned and stored in memory for further processing, allowing more advanced recognition and wider applications. For example, a handwritten numeral OCR that could recognize touching characters was introduced for the first time in 1983. In the late 1980s, Japanese commercial OCRs introduced machine-printed and hand-printed Kanji recognition capabilities, which could recognize about 2,400 classes of Kanji.

Another important feature of this decade is that optical disks for computer use were developed and put into use for patent automation systems in the US and in Japan. They can be considered the first “digital libraries.” The Japanese patent office system currently stores approximately 50 million documents, or 200 million pages. Most of the documents are stored as scanned digital images. This was therefore the time when studies on document understanding and document layout analysis began in Japan.

The changes in the 1990s were due to performance improvements in UNIX workstations and, then, personal computers. Though scanning and image preprocessing were still realized in hardware, the major part of recognition was implemented in software. The implication was that general-purpose programming languages like C and C++ could be used for recognition algorithms, allowing engineers to develop more complicated algorithms, and also expanding the research community further into academia. Software OCR packages running on PCs appeared in the market as well.

Freely handwritten character recognition techniques were extensively studied, and successfully applied to bank check readers and postal address readers. Advanced layout analysis techniques enabled recognition of a wide variety of business forms. We were also involved in the development of a postal address recognition system for Japanese mail pieces.

The IAPR conferences and research communities have been contributing to technical progress. Many of the methods playing key roles in today’s systems have been studied thoroughly. Examples are artificial neural networks, Hidden Markov Models (HMMs), polynomial function classifiers, modified quadratic discriminant function (MQDF) classifiers [1], support vector machines (SVMs), classifier combination [2, 3], information integration, and lexicon-directed character string recognition [4-7], some of which have original versions dating back to the 1960s.

2.2. Character recognition algorithms

The structural analysis approach and the pattern matching (or statistical) approach were the ideas competing with each other in the 1970s in Japan. Commercial OCRs were using structural methods for handwritten alphanumerics and Katakana, and pattern matching methods for machine-printed alphanumerics. A pattern matching method had been proved experimentally to be applicable to machine-printed Kanji recognition by the late 1970s [8-10].

The problem for us in those days was a method for recognizing handwritten Kanji. It was like a huge, unexplored mountain standing in front of us. What was clear was that neither the structural approach nor the simple pattern matching approach could conquer it. The former was weak against the explosive topological variations caused by complex strokes, while the latter was weak against shape variations; however, the latter seemed to have a greater chance of success.

The concept of blurring as feature extraction was extended to directional features and found to be effective for handwritten Kanji recognition, though a preliminary study began with handwritten numeral recognition [11, 12]. By introducing spatially continuous feature extraction, the optimum amount of blurring turned out to be surprisingly large. Non-linear shape normalization [13, 14] and statistical classifier methods [1] boosted the recognition accuracy to a commercially valuable level. We learned that blurring should be considered as a means to obtain reduced prominent dimensions (a subspace) rather than to lower computational cost, though the effects seem similar. Normally, a feature vector for Kanji patterns consists of 8 x 8 x 4 elements (Figure 1) and the subspace after statistical analysis has around 100 dimensions.

Recent advances in Chinese (Kanji) recognition methods are well presented in [15].

Figure 1. Directional features

2.3. Character segmentation algorithms

In the 1960s and 1970s, a flying-spot scanner, laser scanner, or other kind of mechanical scanner was used with a photo-multiplier as a sensor to obtain character images. In a sense, character segmentation was done with these kinds of scanning devices. Then, in the 1980s, semiconductor sensors and memories appeared,
allowing OCRs to scan and store an image of one character line and, later, a full-page image.

This change relaxed the strict conditions on OCR form specifications, for example, allowing smaller non-separated writing boxes, which, however, required a touching digit separation algorithm [16]. In 1983, Hitachi produced one of the first OCRs that could segment and recognize touching handwritten digits based on a multiple-hypothesis segmentation-recognition method. Contour shape analysis could identify candidate touching points (Figure 2).

This direction of change led us to “forms processing,” whose ultimate target was to read unknown forms, or at least those forms that were not specifically designed for OCRs. But this meant that users became less careful in their writing styles, and, therefore, OCRs had to be more accurate for freely written characters. Technically, such techniques as run-length code-based preprocessing, connected component analysis, contour shape analysis, touching character-line separation, and segmentation-recognition integration were developed.

Figure 2. Segmentation of touching digits

2.4. Linguistic information integration

Kanji OCRs, which read Kanji as one of the extended functions of a commercial OCR, were used for reading handwritten Kanji names and addresses. The first generations used OCR forms with fixed, separated boxes, causing no segmentation problem. The question was how to keep the phrase recognition accuracy high.

We could utilize a priori linguistic knowledge to pick correct choices from the candidate lattice after character recognition. The method we developed used a finite state automaton, which was dynamically created from the lattice and was equivalent to the lattice contents [17]. Then, a word (or a character string) from a lexicon is fed into the automaton and an active state makes transitions through edges whose labels coincide with the input characters. If the label of the edge that a state traverses comes from the first place in the lattice, a penalty of zero is given, and if no label coincides, the edge “Others” is selected, giving a penalty of 15. Generally, the penalty depends on the position in the lattice (Figure 3). In this way, every word in a lexicon is given a penalty value, and the word with the smallest penalty is determined to be the recognized word. This was used successfully for address phrases, provided that character segmentation was reliable enough.

When Furigana (pronunciation written in syllabic characters) was available in addition to the Kanji version, both versions could be recognized and the results could be merged for higher accuracy. In Japanese business forms, it is normal to be requested to fill in both Kanji and Furigana versions.

As discussed later, when segmentation is not reliable, as for freely handwritten phrases, more complicated knowledge integration approaches are required.

Figure 3. Finite state automaton

3. Robustness against uncertainty and variability

Postal address recognition is one of the “ideal” applications that advance the technology. It is technology-rich, posing a lot of technical problems, and it promises post office innovation, in which the investments pay off. R&D projects for developing a system that could read handwritten and machine-printed full addresses were conducted in the US, Europe, and Japan, in industry and academia, in the 1990s.

The recognition engine we developed for post offices became complex, as shown in Figure 4, as a result of coping with various problems. The recognition sub-modules output uncertain (intermediate) decisions. Dozens of such decisions must be made in series until the final address interpretation is reached. A natural solution is to hand over multiple candidates, which we call “hypotheses,” to the following stages. Then, we need to introduce a mechanism to control the sequential decision process, which is, after all, a kind of optimum search.
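As an illustration only, the idea of handing multiple hypotheses from stage to stage and controlling the sequence of decisions as a search can be sketched as a beam search. This is a minimal sketch under stated assumptions, not the control mechanism of the engine in Figure 4: the function name `beam_search`, the stage functions, the labels `city_A`, `city_B`, `town_X`, `town_Y`, and all penalty values are hypothetical and chosen only to make the example run.

```python
from typing import Callable, List, Tuple

# A hypothesis is (accumulated penalty, decisions made so far); lower is better.
Hypothesis = Tuple[float, Tuple[str, ...]]
# A stage looks at the earlier decisions and returns (label, penalty) candidates.
Stage = Callable[[Tuple[str, ...]], List[Tuple[str, float]]]


def beam_search(stages: List[Stage], beam_width: int = 3) -> List[Hypothesis]:
    """Carry up to `beam_width` hypotheses through a series of uncertain stages."""
    beam: List[Hypothesis] = [(0.0, ())]
    for stage in stages:
        expanded: List[Hypothesis] = []
        for cost, decisions in beam:
            for label, penalty in stage(decisions):
                expanded.append((cost + penalty, decisions + (label,)))
        # Deferred decision: keep several survivors instead of committing to one.
        beam = sorted(expanded)[:beam_width]
    return beam


def read_city(_decisions: Tuple[str, ...]) -> List[Tuple[str, float]]:
    # Hypothetical recognizer output in which the top-ranked city is wrong.
    return [("city_A", 4.0), ("city_B", 3.0)]


def read_town(decisions: Tuple[str, ...]) -> List[Tuple[str, float]]:
    # Hypothetical knowledge: town_X exists only in city_A, so it is
    # heavily penalized under any other city hypothesis.
    if decisions[0] == "city_A":
        return [("town_X", 2.0), ("town_Y", 9.0)]
    return [("town_X", 50.0), ("town_Y", 10.0)]


# Keeping two hypotheses lets the later stage correct the earlier one.
best_with_hypotheses = beam_search([read_city, read_town], beam_width=2)[0]
# Keeping one hypothesis (an immediate decision) locks in the early mistake.
best_greedy = beam_search([read_city, read_town], beam_width=1)[0]
print(best_with_hypotheses)
print(best_greedy)
```

In this toy case the two-hypothesis search ends with `city_A` and `town_X`, while the single-hypothesis search commits to `city_B` at the first stage and cannot recover. The beam width is the knob that trades computation for robustness; a real system would presumably use likelihoods from its recognizers rather than hand-set penalties.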
Figure 4. Postal address recognition

Table 1. Design principles for robustness

Principles                                Expected effects
Hypothesis-Driven                         When the type of a problem is uncertain, set up hypotheses, process, and test the results
Deferred Decision / Multiple Hypotheses   Do not decide; leave the decision to the next experts, carrying over multiple hypotheses
Information Integration:
  Process Integration                     Solve a problem by multiple different-field experts as a team
  Combination-Based Integration           Decide as a team of multiple same-field experts
  Corroboration-Based Integration         Utilize other input information; seek more evidence
Alternative Solutions                     Solve a problem by multiple alternative approaches
Perturbation                              Modify the problem slightly and try it again

Postal address phrases are semantically rich, and the “information integration” approach can be successfully applied. As in other intelligent handwriting recognition systems, we have developed a segmentation-recognition-interpretation integrated method [6, 7], in which segmentation produces a segmentation candidate network, where an optimum path is selected by evaluating the likelihood. This is done by pattern matching against the language model for possible address phrases. Segmentation required that geometrical alignment be evaluated as well [18].

The issue was how to design “robustness” into the system. We formulated the design principles for robustness shown in Table 1 while carrying out the project [19, 20]. By applying them, many additional algorithms were integrated into the software recognition engine successfully. Figure 5 shows the full-address recognition rates for handwritten addresses for four versions, V1 through V4. The horizontal axis shows sample dataset numbers, which have been rearranged so that the rates are in decreasing order.

Figure 5. Improvements in handwritten address recognition

4. Future prospects

The question, perhaps, is where we should go. The anticipated needs for character and document recognition in the future include the following:

• Archival records and image replacement documents in e-Government
• Books and historical documents for global search
• Handwriting captured by digital pens
• Text-in-the-scene captured by cameras
• Text in video

One thing that is almost clear is that we are going into the “long tail” part of the market. The “head” part has already been computerized, either with the existing OCR technology or by other totally electronic means. The remaining part has an extremely wide variety of documents with not so many instances of each. Non-standardized business forms are an example of this kind. For instance, small and medium-sized companies in Japan are still using paper forms to make bank transactions. For each company, the number of transactions is not so large, but banks receive many different types of forms from many companies. A more intelligent, versatile form reader may solve this problem.

The importance of handwriting is being reconsidered in education and in the context of knowledge work. The act of writing helps reading and thinking processes, and a digital pen can capture handwritten annotations and memos, which are then stored in computers. As paper is
considered the medium for such processes [21], we can print out electronically produced documents and work on them with a digital pen, while all information can be kept in computers. Then, search capability on such hand-annotated documents will be an important tool.

A mobile device with text-in-scene recognition will be a necessary gadget for travelers in foreign countries. It should be able to help the user understand second and third languages, or more. Color image processing, geometric perspective normalization, text segmentation, adaptive thresholding, etc. need to be studied.

What is common throughout these applications is that no single recognition algorithm may realize them. Higher-order image processing before recognition will be mandatory. The solution should be comprehensive.

5. Conclusions

For the bright future of this technological community, both vision and fundamental technology are in demand. Vision will show applications with new value propositions that require new technology. But technology creates demands as well.

6. References

[1] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, “Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition,” IEEE Trans. PAMI, Vol. 9, No. 1, 1987, pp. 149-153.
[2] L. Xu, A. Krzyzak, and C. Y. Suen, “Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition,” IEEE Trans. SMC, Vol. 22, No. 3, 1992, pp. 418-435.
[3] T. K. Ho, J. J. Hull, and S. N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE Trans. PAMI, Vol. 16, No. 1, 1994, pp. 66-75.
[4] F. Kimura, M. Sridhar, and Z. Chen, “Improvements of Lexicon-Directed Algorithm for Recognition of Unconstrained Hand-Written Words,” Proc. 2nd ICDAR, Tsukuba, Japan, Oct. 1993, pp. 18-22.
[5] C. H. Chen, “Lexicon-Driven Word Recognition,” Proc. 3rd ICDAR, Montreal, Canada, Aug. 1995, pp. 919-922.
[6] M. Koga, R. Mine, H. Sako, and H. Fujisawa, “Lexical Search Approach for Character-String Recognition,” Proc. 3rd DAS, Nagano, Japan, Nov. 1998, pp. 237-251.
[7] C.-L. Liu, M. Koga, and H. Fujisawa, “Lexicon-driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading,” IEEE Trans. PAMI, Vol. 24, No. 11, 2002, pp. 1425-1437.
[8] R. Casey and G. Nagy, “Recognition of printed Chinese characters,” IEEE Trans. Electronic Computers, Vol. EC-15, No. 1, 1966, pp. 91-101.
[9] S. Yamamoto, A. Nakajima, and K. Nakata, “Chinese character recognition by hierarchical pattern matching,” Proc. 1st IJCPR, Washington DC, 1973, pp. 183-194.
[10] H. Fujisawa, Y. Nakano, Y. Kitazume, and M. Yasuda, “Development of a Kanji OCR: An Optical Chinese Character Reader,” Proc. 4th IJCPR, Kyoto, Nov. 1978, pp. 815-820.
[11] M. Yasuda and H. Fujisawa, “An Improvement of Correlation Method for Character Recognition,” Systems, Computers, Controls, Scripta Publishing Co., Vol. 10, No. 2, 1979, pp. 29-38.
[12] H. Fujisawa and C.-L. Liu, “Directional Pattern Matching for Character Recognition Revisited,” Proc. 7th ICDAR, Edinburgh, Aug. 2003, pp. 794-798.
[13] J. Tsukumo and H. Tanaka, “Classification of Handprinted Chinese Characters Using Non-linear Normalization and Correlation Methods,” Proc. 9th ICPR, Rome, Italy, 1988, pp. 168-171.
[14] C.-L. Liu, “Normalization-Cooperated Gradient Feature Extraction for Handwritten Character Recognition,” IEEE Trans. PAMI, Vol. 29, No. 6, 2007, pp. 1465-1469.
[15] C.-L. Liu, “Handwritten Chinese Character Recognition: Effects of Shape Normalization and Feature Extraction,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[16] H. Fujisawa, Y. Nakano, and K. Kurino, “Segmentation Methods for Character Recognition: From Segmentation to Document Structure Analysis,” Proc. IEEE, Vol. 80, No. 7, 1992, pp. 1079-1092.
[17] K. Marukawa, M. Koga, Y. Shima, and H. Fujisawa, “An Error Correction Algorithm for Handwritten Chinese Character Address Recognition,” Proc. 1st ICDAR, Saint-Malo, Sep. 1991, pp. 916-924.
[18] T. Kagehiro, M. Koga, H. Sako, and H. Fujisawa, “Segmentation of Handwritten Kanji Numerals Integrating Peripheral Information by Bayesian Rule,” Proc. IAPR MVA’98, Chiba, Japan, Nov. 1998, pp. 439-442.
[19] H. Fujisawa, “How to Deal with Uncertainty and Variability: Experience and Solutions,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[20] H. Fujisawa, “Robustness Design of Industrial Strength Recognition Systems,” Digital Document Processing: Major Directions and Recent Advances, B.B. Chaudhuri (Ed.), Springer-Verlag, London, 2007, pp. 185-212.
[21] A.J. Sellen and R.H. Harper, “The Myth of the Paperless Office,” The MIT Press, 2001.

ICT Role in 21st Century Education & its Challenges.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 

icdarmhmdsdt

  • 1.
A View on the Past and Future of Character and Document Recognition

Hiromichi Fujisawa
Central Research Laboratory, Hitachi, Ltd.
Kokubunji, Tokyo, Japan 185-8601
hiromichi.fujisawa.sb-at-hitachi.com

Abstract

The paper first gives an overview of the technical advances in the field of character and document recognition, decade by decade. Then, it highlights key technical developments, especially for Kanji (Chinese character) recognition in Japan. Technical issues around postal address recognition are then discussed, which have promoted advanced techniques including information integration. Robustness design principles are introduced. Finally, future prospects are discussed.

1. Introduction

An industrial view of character and document recognition technology is presented, looking from the past to the future. Since the birth of commercial Optical Character Readers (OCRs) in the 1950s, character and document recognition technology has made tremendous advances, always supporting industrial and commercial applications. At the same time, these business applications have always promoted investments in new technology developments. We can see a virtuous cycle here. New technologies made new applications possible, and the new applications supported the new technology developments.

It seems, however, that the wave of IT technologies is surging over this area, which has been cultivated over a period of more than fifty years. Most information seems to be born digital, possibly diminishing the demand for this technology. As a matter of fact, this is the second wave of this kind. The first was the wave of Office Automation in the 1980s. A strong expectation was that paper documents would disappear because all documents would be produced electronically. The result was, on the contrary, that the peak sales of OCRs in Japan were in the 1980s.

It is felt, however, that the second wave might be different. This time, it might change the perspective completely. Other views are of course possible. For instance, search technologies, which have established their ubiquity, are expanding their territory to image documents. Handwriting is being given a second look for its importance. Paper does not seem to go away.

2. Brief historical view

2.1. Overview

The first commercial OCR appeared in the 1950s in the US and, since then, each decade has seen representative developments. In the 1960s, IBM produced several models of “optical readers” for machine-printed and hand-printed numbers for business use. One of the models could read 200 fonts of printed documents. In this decade, postal automation for mechanical letter sorting adopted OCRs for the first time to automatically read postal codes to determine destinations, in the US, Europe, and Japan, independently. In Japan, Toshiba and NEC developed hand-printed digit recognition apparatus in 1967, which were put into operation in 1968.

In the 1970s, commercial OCRs were becoming pervasive in Japan. Hitachi introduced the first hand-printed numeral OCR for business use in 1973, and NEC introduced the first hand-printed Katakana OCR in 1976. The Japanese Ministry of International Trade and Industry (today’s Ministry of Economy, Trade and Industry) led a ten-year national project on pattern recognition, including Kanji recognition and handwritten character recognition, from 1971. It attracted many students and researchers into pattern recognition. In the US, IBM introduced a deposit processing system (IBM 3895) in 1977, which could recognize unconstrained handwritten numbers on bank checks. The author had a chance to see its operation at Mellon Bank in Pittsburgh in 1981 and was told that it could read about 50% of them, while the others were hand-coded.

The 1980s was a decade that benefited from the technological progress in semiconductor devices such as image sensors, microprocessors, memories, and custom-designed LSIs. The hardware became smaller
  • 2.
than ever to place on desktop, thanks to microprocessors and custom-designed LSIs. Then, larger, cheaper memories and image sensors enabled whole page images to be scanned and stored in a memory for further processing, allowing more advanced recognition and wider applications. For example, a handwritten numeral OCR that could recognize touching characters was introduced for the first time in 1983. In the late 1980s, Japanese commercial OCRs introduced machine-printed and hand-printed Kanji recognition capabilities, which could recognize about 2,400 classes of Kanji.

Another important feature of this decade is that optical disks for computer use were developed and put into use for patent automation systems in the US and in Japan. They can be considered the first “digital libraries.” The Japanese patent office system currently stores approximately 50 million documents or 200 million pages. Most of the documents are stored as scanned digital images. So, it was the time when studies on document understanding and document layout analysis began in Japan.

The changes in the 1990s were due to performance improvements in UNIX workstations and, then, personal computers. Though scanning and image preprocessing were still realized in hardware, the major part of recognition was implemented in software. The implication was that general purpose programming languages like C and C++ could be used for recognition algorithms, allowing engineers to develop more complicated algorithms, and also expanding the research community more to academia. Software OCR packages running on PCs appeared in the market as well. Freely handwritten character recognition techniques were extensively studied, and successfully applied to bank check readers and postal address readers. Advanced layout analysis techniques enabled recognition of wide varieties of business forms. We were also involved in the development of a postal address recognition system for Japanese mail pieces.

The IAPR conferences and research communities have been contributing to technical progress. Many of the methods playing key roles in today’s systems have been studied thoroughly. Examples are artificial neural networks, Hidden Markov Models (HMMs), polynomial function classifiers, modified quadratic discriminant function (MQDF) classifiers [1], support vector machines (SVMs), classifier combination [2, 3], information integration, and lexicon-directed character string recognition [4-7], some of which have original versions dating back to the 1960s.

2.2. Character recognition algorithms

The structural analysis approach and the pattern matching (or statistical) approach were the ideas competing with each other in the 1970s in Japan. Commercial OCRs were using structural methods for handwritten alphanumerics and Katakana, and pattern matching methods for machine-printed alphanumerics. A pattern matching method had been proved experimentally to be applicable to machine-printed Kanji recognition by the late 1970s [8-10].

The problem for us in those days was a method for recognizing handwritten Kanji. It was like an unexplored, huge mountain standing in front of us. What was clear was that the structural approach and the simple pattern matching approach could not conquer it. The former had a weakness in explosive topological variations due to complex strokes, while the latter had a weakness in shape variations; however, the latter seemed to have a greater chance of success.

The concept of blurring as feature extraction was extended to directional features and found to be effective for handwritten Kanji recognition, though a preliminary study began with handwritten numeral recognition [11, 12]. By introducing spatially continuous feature extraction, the optimum amount of blurring turned out to be surprisingly large. Non-linear shape normalization [13, 14] and statistical classifier methods [1] boosted the recognition accuracy to a commercial value. We learned that blurring should be considered as a means to obtain reduced prominent dimensions (subspace) rather than to lower computational cost, though the effects seem similar. Normally, a feature vector for Kanji patterns consists of 8 x 8 x 4 elements (Figure 1) and the subspace after statistical analysis has around 100 dimensions.

Recent advancement in Chinese (Kanji) recognition methods is well presented in [15].

Figure 1. Directional features

2.3. Character segmentation algorithms

In the 1960s and 1970s, a flying-spot scanner, laser scanner or other kind of mechanical scanner was used with a photo-multiplier as a sensor to obtain character images. In a sense, character segmentation was done with these kinds of scanning devices. Then in the 1980s appeared semiconductor sensors and memories,
  • 3.
allowing OCRs to scan and store an image of one character line and, later, a full page image.

This change relaxed strict conditions on OCR form specifications, for example, allowing smaller non-separated writing boxes, which, however, required a touching digit separation algorithm [16]. In 1983, Hitachi produced one of the first OCRs that could segment and recognize touching handwritten digits based on a multiple-hypothesis segmentation-recognition method. Contour shape analysis could identify candidate touching points (Figure 2).

This direction of changes led us to “forms processing,” whose ultimate target was to read unknown forms, or at least those forms that were not specifically designed for OCRs. But this meant that users became less careful in their writing styles, and, therefore, OCRs had to be more accurate for freely written characters. Technically, such techniques as run-length code-based preprocessing, connected component analysis, contour shape analysis, touching character-line separation, segmentation-recognition integration, etc. were developed.

Figure 2. Segmentation of touching digits

2.4. Linguistic information integration

Kanji OCRs, which read Kanji as one of the extended functions of commercial OCR, were used for reading handwritten Kanji names and addresses. The first generations used OCR forms with fixed, separated boxes, causing no segmentation problem. The question was how to keep the phrase recognition accuracy high.

We could utilize a priori linguistic knowledge to pick up correct choices from the candidate lattice after character recognition. The method we developed used a finite state automaton, which was dynamically created from the lattice and was equivalent to the lattice contents [17]. Then, a word (or a character string) from a lexicon is fed into the automaton and an active state makes transitions through edges whose labels coincide with the input characters. If the label of the edge that a state traverses comes from the first place in the lattice, a penalty of zero is given, and if no label coincides, the edge “Others” is selected, giving a penalty of 15. Generally, the penalty depends on the position in the lattice (Figure 3). In this way, every word in a lexicon is given a penalty value, and the word with the smallest penalty is determined to be the recognized word. This was used successfully for address phrases, provided that character segmentation was reliable enough.

When Furigana (pronunciation in terms of syllabic characters) was available in addition to the Kanji version, both versions could be recognized and the results could be merged for a higher accuracy. In Japanese business forms, it is normal that we are requested to fill in both Kanji and Furigana versions.

As discussed again later, when segmentation is not reliable, as for freely handwritten phrases, more complicated knowledge integration approaches are required.

Figure 3. Finite state automaton

3. Robustness against uncertainty and variability

Postal address recognition is one of the “ideal” applications that advance the technology. It is technology-rich, posing a lot of technical problems, and it promises post office innovation, whose investments pay off. R&D projects for developing a system that could read handwritten and machine-printed full addresses were led in the US, Europe and Japan, in industry and academia, in the 1990s.

The recognition engine we developed for post offices became complex as shown in Figure 4, as a result of coping with various problems. The recognition sub-modules output uncertain (intermediate) decisions. Dozens of such decisions must be made in series until reaching the final address interpretation. A natural solution is to hand over multiple candidates, which we call “hypotheses,” to the following stages. Then, we need to introduce a mechanism to control the sequential decision process, which is a kind of optimum search after all.
  • 4.
Figure 4. Postal address recognition

Table 1. Design principles for robustness

Principles: Expected effects
Hypothesis-Driven: When the type of a problem is uncertain, set up hypotheses, process, and test the results
Deferred Decision / Multiple Hypotheses: Do not decide; leave the decision to the next experts, carrying over multiple hypotheses
Information Integration
    Process Integration: Solve a problem by multiple different-field experts as a team
    Combination-Based Integration: Decide as a team of multiple same-field experts
    Corroboration-Based Integration: Utilize other input information; seek more evidence
Alternative Solutions: Solve a problem by multiple alternative approaches
Perturbation: Modify the problem slightly and try it again

Postal address phrases are semantic-rich, and the “information integration” approach can be successfully applied. As other intelligent handwriting recognition systems do, we have developed a segmentation-recognition-interpretation integrated method [6, 7], in which segmentation produces a segmentation candidate network, where an optimum path is selected by evaluating the likelihood. It is done by pattern matching against the language model for possible address phrases. Segmentation required that geometrical alignment be evaluated as well [18].

The issue was how to design “robustness” in the system. We formulated the design principles for robustness as shown in Table 1 while carrying out the project [19, 20]. By applying them, many additional algorithms were integrated into the software recognition engine successfully. Figure 5 shows the full-address recognition rates for handwritten addresses for four versions, V1 through V4. The horizontal axis shows sample dataset numbers, which have been rearranged so that the rates come in decreasing order.

Figure 5. Improvements in handwritten address recognition

4. Future prospects

A question, perhaps, is where we should go. The anticipated needs for character and document recognition for the future include the following:

• Archival records and image replacement documents in e-Government
• Books and historical documents for global search
• Handwriting captured by digital pens
• Text-in-the-scene captured by cameras
• Text in video

One thing that is almost clear is that we are going into a “long tail” part of the market. The “head” part has already been computerized either with the existing OCR technology or other totally electronic means. The remaining part has an extremely wide variety of documents with not so many instances of each. Non-standardized business forms are an example of this kind. For instance, small and medium-sized companies in Japan are still using paper forms to make bank transactions. For each company, the number of transactions is not so large, but banks receive many different types of forms from many companies. A more intelligent, versatile form reader may solve this problem.

The importance of handwriting is being reconsidered in education and in a knowledge work context. The act of writing helps reading and thinking processes, and a digital pen can capture handwritten annotations and memos, which are then stored in computers. As paper is
  • 5.
considered the medium for such processes [21], we can print out electronically produced documents and work on them with a digital pen, while all information can be kept in computers. Then, search capability on such hand-annotated documents will be an important tool.

A mobile device with text-in-scene recognition will be a necessary gadget for travelers in foreign countries. It should be able to help understanding of second and third languages, or more. Color image processing, geometric perspective normalization, text segmentation, adaptive thresholding, etc. need to be studied.

What is common throughout these applications is that no single recognition algorithm may realize them. Higher-order image processing before recognition will be mandatory. The solution should be comprehensive.

5. Conclusions

For the bright future of this technological community, both vision and fundamental technology are in demand. Vision will show applications with new value propositions that require new technology. But technology creates demands as well.

6. References

[1] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, “Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition,” IEEE Trans. PAMI, Vol. 9, No. 1, 1987, pp. 149-153.
[2] L. Xu, A. Krzyzak, and C. Y. Suen, “Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition,” IEEE Trans. SMC, Vol. 22, No. 3, 1992, pp. 418-435.
[3] T. K. Ho, J. J. Hull, and S. N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE Trans. PAMI, Vol. 16, No. 1, 1994, pp. 66-75.
[4] F. Kimura, M. Sridhar, and Z. Chen, “Improvements of Lexicon-Directed Algorithm for Recognition of Unconstrained Hand-Written Words,” Proc. 2nd ICDAR, Tsukuba, Japan, Oct. 1993, pp. 18-22.
[5] C. H. Chen, “Lexicon-Driven Word Recognition,” Proc. 3rd ICDAR, Montreal, Canada, Aug. 1995, pp. 919-922.
[6] M. Koga, R. Mine, H. Sako, and H. Fujisawa, “Lexical Search Approach for Character-String Recognition,” Proc. 3rd DAS, Nagano, Japan, Nov. 1998, pp. 237-251.
[7] C.-L. Liu, M. Koga, and H. Fujisawa, “Lexicon-driven Segmentation and Recognition of Handwritten Character Strings for Japanese Address Reading,” IEEE Trans. PAMI, Vol. 24, No. 11, 2002, pp. 1425-1437.
[8] R. Casey and G. Nagy, “Recognition of printed Chinese characters,” IEEE Trans. Electronic Computers, Vol. EC-15, No. 1, 1966, pp. 91-101.
[9] S. Yamamoto, A. Nakajima, and K. Nakata, “Chinese character recognition by hierarchical pattern matching,” Proc. 1st IJCPR, Washington DC, 1973, pp. 183-194.
[10] H. Fujisawa, Y. Nakano, Y. Kitazume, and M. Yasuda, “Development of a Kanji OCR: An Optical Chinese Character Reader,” Proc. 4th IJCPR, Kyoto, Nov. 1978, pp. 815-820.
[11] M. Yasuda and H. Fujisawa, “An Improvement of Correlation Method for Character Recognition,” Systems, Computers, Controls, Scripta Publishing Co., Vol. 10, No. 2, 1979, pp. 29-38.
[12] H. Fujisawa and C.-L. Liu, “Directional Pattern Matching for Character Recognition Revisited,” Proc. 7th ICDAR, Edinburgh, Aug. 2003, pp. 794-798.
[13] J. Tsukumo and H. Tanaka, “Classification of Handprinted Chinese Characters Using Non-linear Normalization and Correlation Methods,” Proc. 9th ICPR, Rome, Italy, 1988, pp. 168-171.
[14] C.-L. Liu, “Normalization-Cooperated Gradient Feature Extraction for Handwritten Character Recognition,” IEEE Trans. PAMI, Vol. 29, No. 6, 2007, pp. 1465-1469.
[15] C.-L. Liu, “Handwritten Chinese Character Recognition: Effects of Shape Normalization and Feature Extraction,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[16] H. Fujisawa, Y. Nakano, and K. Kurino, “Segmentation Methods for Character Recognition: From Segmentation to Document Structure Analysis,” Proc. IEEE, Vol. 80, No. 7, 1992, pp. 1079-1092.
[17] K. Marukawa, M. Koga, Y. Shima, and H. Fujisawa, “An Error Correction Algorithm for Handwritten Chinese Character Address Recognition,” Proc. 1st ICDAR, Saint-Malo, Sep. 1991, pp. 916-924.
[18] T. Kagehiro, M. Koga, H. Sako, and H. Fujisawa, “Segmentation of Handwritten Kanji Numerals Integrating Peripheral Information by Bayesian Rule,” Proc. IAPR MVA’98, Chiba, Japan, Nov. 1998, pp. 439-442.
[19] H. Fujisawa, “How to Deal with Uncertainty and Variability: Experience and Solutions,” Proc. Summit on Arabic and Chinese Handwriting, College Park, Sep. 2006.
[20] H. Fujisawa, “Robustness Design of Industrial Strength Recognition Systems,” Digital Document Processing: Major Directions and Recent Advances, B.B. Chaudhuri (Ed.), Springer-Verlag, London, 2007, pp. 185-212.
[21] A.J. Sellen and R.H. Harper, “The Myth of the Paperless Office,” The MIT Press, 2001.