SlideShare ist ein Scribd-Unternehmen logo
1 von 6
Downloaden Sie, um offline zu lesen
A simple text-line segmentation method
                             for handwritten documents
                          M.Ravi Kumar1, R. Pradeep2, B.S.Puneeth Kumar3, and Prasad Babu4
                             Dept. of P.G.Studies in Computer Science, Kuvempu University
                                        Jnana Sahyadri, Shankaraghatta-577451
                 E-mail: ravi2142@yahoo.co.in, punithashp@gmail.com, kamitkar.prasad9@gmail.com


Abstract:
                   Text line segmentation is an important step       mainly due to the irregularity of lay- out and character shapes
because inaccurately segmented text lines will cause errors in       originated from the variability of writing styles. For
the recognition stage.. The nature of handwriting makes the          unconstrained handwritten documents, text line seg- mentation
process of text line segmentation very challenging. Text             and character segmentation-recognition are not solved though
characteristics can vary in font, size, orientation, alignment,      enormous efforts have been devoted to them and great
color, contrast, and background information. These variations        advances have been made. Text line segmentation of
turn the process of word detection complex and difficult.            handwritten documents is much more difficult than that of
Since handwritten text can vary greatly depending on the user        printed documents. Unlike that printed documents have
skills, disposition and cultural background. In this work we         approximately straight and parallel text lines, the lines in
have proposed the method which works on the different                handwritten documents are often un-uniformly skewed and
intensity values for extracting the text-lines.                      curved. Moreover, the spaces between handwritten text lines
                                                                     are often not obvious compared to the spaces between within-
                                                                     line characters, and some text lines may interfere with each
Keywords:                                                            other. Therefore, many text line detection techniques, such as
                                                                     projection analysis and K-nearest neighbor connected
Handwritten document, segmentation, text-lines;                      components (CCs) grouping, are not able to segment
                                                                     handwritten text lines successfully[2].
1. Introduction:                                                     On the other hand, piece-wise projections are sensitive to
                                                                     character’s size variation within text lines and significant gaps
                       Segmentation of a document image into         between successive words. These occurrences influence the
its basic entities, namely, text lines and words, is considered as   effectiveness of smearing methods too. In such cases, the
a non-trivial problem to solve in the field of handwritten           results of two adjacent zones may be ambiguous, affecting the
document recognition. The difficulties that arise in                 drawing of text-line separators along the document width. To
handwritten documents make the segmentation procedure a              deal with these problems we introduce a smooth version of the
challenging task. Different types of difficulties are                projection profiles to over segment each zone into candidate
encountered in the text line segmentation and the word               text and gap regions. Then, we reclassify these regions by
segmentation procedure. In case of text line segmentation            applying an HMM formulation that enhances statistics from
procedure, major difficulties include the difference in the skew     the whole document page. Starting from left and moving to the
angle between lines on the page or even along the same text          right we combine separators of consecutive zones considering
line, overlapping words and adjacent text lines touching.            their proximity and the local foreground density [3].
Furthermore, the frequent appearance of accents in many              These piece-wise projection based methods have a few
languages (e.g. French, Greek) makes the text line                   shortcomings: (a) they generate too many potential separating
segmentation a challenging task. In word segmentation,               lines, (b) the parameter of stripe width is predefined, (c) text-
difficulties that arise include the appearance of skew and slant     lines should not have significant skew as mentioned in and (d)
in the text line, the existence of punctuation marks along the       if there is no potential piece-wise line in the first and last
text line and the non- uniform spacing of words which is a           stripes, drawing a complete separating line will become
common residual in handwritten documents[1]. Text line               impossible in the any algorithms. These shortcomings are
segmentation from document images is one of the major                observed based on an experiment that we have conducted with
problems in document image analysis. It provides crucial             a number of text-pages. Some authors also used skew
information for the tasks of text block segmentation, character      information for text-line separation. In an unconstrained
seg-mentation and recognition, and text string recognition.          handwritten text-page, it is very difficult to detect the
Whereas the difficulty of machine-printed document analysis          orientation of each line on the basis of the skew calculated for
mainly lies in the complex layout structure and degraded             the entire page. Therefore, these methods may not work
image quality, handwritten document analysis is difficult            properly [4].



                                                                                                                                    1
The concept of the Hough transform is employed in the field       densities to assign overlapping CCs into text lines [7].
of document analysis for many purposes such as skew               Experimental results on a collection of 720 documents
detection, line detection, slant detection and text-line          (English, Arabic and children's handwriting) show that
segmentation. The Hough transform is employed for text-line       97.31% of text lines were segmented correctly. The writers
segmentation in different scripts. A block-based Hough            mention that “a more intelligent approach to cut an
transform is presented which is a modification of the             overlapping component is the goal of future work”. A recent
conventional Hough transform methodology. The algorithm           approach [8] uses block-based Hough transform to detect lines
includes partitioning of the connected component domain into      and merging methods to correct false alarms. Although the
three spatial sub-domains and applying a block-based Hough        algorithm achieves a 93.1% detection rate and a 96%
transform to detect the potential text lines [5].                 recognition rate, it is not flexible to follow variation of skew
Many efforts have been devoted to the difficult problem of        angle along the same text line and not very precise in the
hand- written text line segmentation. The methods can be          assignment of accents to text lines. Li et al. [9] discuss the
roughly categorized into three classes: top-down, bottom-up,      text-line detection task as an image segmentation problem.
and hybrid. Top-down methods partition the document image         They use a Gaussian window to convert a binary image into a
recursively into text regions, text lines, and words/characters   smooth gray-scale. Then they adopt the level set method to
with the assumption of straight lines. Bottom-up methods          evolve text-line boundaries and finally, geometrical constrains
group small units of image (pixels, CCs, characters, words,       are imposed to group CCs or segments as text lines. They
etc.) into text lines and then text regions. Bottom-up grouping   report pixel-level hit rates varying from 92% to 98% on
can be viewed as a clustering process, which aggregates image     different scripts and mention that “the major failures happen
components according to proximity and does not rely on the        because two neighboring text lines touch each other
assumption of straight lines. Hybrid methods combine bottom-      significantly”. A similar method [10] evaluates eight different
up grouping and top-down partitioning in different ways. All      spatial measures between pairs of CCs to locate words in
the three approaches have their disadvantages. Top-down           handwritten postal ad- dresses. The best metric proved to be
methods do not perform well on curved and overlapping text        the one which combines the result of the minimum run-length
lines. The performance of bottom-up grouping relies on some       method and the vertical overlapping of two successive CCs.
heuristic rules or artificial parameters, such as the between-    Additionally, this metric is adjusted by utilizing the results of
component distance metric for clustering. On the other hand,      a punctuation detection algorithm (periods and commas).
hybrid methods are complicated in computation, and the            Then, a suitable threshold is computed by an iterative
design of a robust combination scheme is non-trivial [2].         procedure. The algorithm tested on 1000 address images and
A thinning operation has also been used by other researchers      performed an error rate of about 10%.
for text-line segmentation of Japanese and Indian text            Manmatha and Rothfeder [11] propose an effective for noisy
documents. Thinning algorithms followed by post-processing        historical documents scale space approach. The line image is
operations are employed for the entire background region of       filtered with an anisotropic Laplacian at several scales in order
an input text image to detect the separating borderlines.         to produce blobs which correspond to portions of characters at
Recently, some techniques have used level set, active contour     small scales and to words at larger scales. The optimum scale
and a variational Bayes method for text-line segmentation.        is estimated by three different techniques (line height, page
Density estimation and the level set method (LSM) were            averaging and free search) from which the line height showed
utilized for text-line segmentation. A probability map is         best results. Much more challenging task is line segmentation
estimated from an input document image, where each element        in historical documents due to a great deal of noise. Feldbach
represents the probability of the original pixel belonging to a   and Tonnies [12] have proposed a bottom up method for
text line. The level set method (LSM) is utilized to determine    historical church documents that requires parameters to be set
the boundary evolution of neighboring text lines. At first, a     according to the type of handwriting. They report a 90%
matched filter bank approach is used for smoothing the input      correct segmentation rate for constant parameter values which
text image. The central line of text-line components is then      rises to 97% for adjusted ones.
computed using ridges over the smoothed image. Finally, the       Another integrated system for such documents [13] creates a
active contours (snakes) over the ridges are adapted to obtain    foreground/background transition count map to find probable
the text-line segmentation result [4].                            locations of text lines and applies min-cut/max-flow algorithm
                                                                  to separate initially connected text lines .The method performs
2. Related Work:                                                  high accuracy (over 98%) in 20 images of George
                                                                  Washington's manuscript.
In this section, we give a brief review of recent work on text
line and word segmentation in handwritten document images.
As far as we know, the following techniques either achieved       3. Segmentation challenges:
the best results in the corresponding test datasets, or are
elements of integrated systems for specific tasks. One of the     In this section we the challenges involved in the segmentation
most accurate methods uses piece-wise projection pro- files to    of the text-lines. When dealing with handwritten text, line
obtain an initial set of candidate lines and bivariate Gaussian




                                                                                                                                 2
segmentation has to solve some obstacles that are uncommon        4. Motivation:
in modern printed text [6]. Among the most predominant are:       The above challenges motivated us to take this challenging
                                                                  work.
3.1 Skewed lines: Lines of text in general are not straight.
These lines are not parallel to each other.                       5. Proposed Method:

                                                                  In the proposed method our objective is to identify the
                                                                  boundary of the text-lines, which consist two steps (i)
                                                                  Generating partial boundary line and (ii) Generating complete
                                                                  boundary line. By using these partial lines it is very difficult to
                                                                  differentiate between the two lines. Because these partial lines
                                                                  are having gaps and they are broken. Thus we need to
                                                                  construct the complete line which acts as differentiator for
                                                                  identifying the text-lines. In constructing the complete line we
                                                                  will consider the highest frequent y co-ordinate value in each
                                                                  partial boundary lines, y co-ordinate values are the complete
Figure 3.1: Skewed Lines
                                                                  boundaries for each text-lines. In last step we segment the
                                                                  each text-line by representing the each text-line with different
3.2 Fluctuating lines: Lines of text are partially or fully
                                                                  colors.
connected to other text-lines.

                                                                  5.1 Generating partial text-line boundary:

                                                                  The partial text-line boundary is generated by blocking the
                                                                  text-lines. Using morphological operations like erosion. Here
                                                                  blocking is done for filling the holes and gaps between the
                                                                  words. This helps in drawing the partial line. Here the lines
                                                                  are drawn at the edge of the every blocked text-line. However
                                                                  these partial lines are not sufficient to differentiate the text-
                                                                  lines. This is shown in figure 5.1.
Figure 3.2: Fluctuating Lines
                                                                  5.2 Generating complete text-line boundary:
3.3 Line proximity: Small gaps between neighboring text
lines will cause touching and overlapping of components,          As in the previous step we obtained the partial boundary lines,
usually words or letters, between lines and irregularity in       these broken boundary lines are not sufficient for segmenting
geometrical properties of the line, such as line width, height,   the text-lines, thus we need to generate complete line which is
distance in between words and lines, leftmost position etc.       continuous. So now by using these partial lines, we draw the
                                                                  complete boundary lines with the help of the frequent vertical
                                                                  points. These y co-ordinate values are used to differentiate
                                                                  between text-lines, and these complete boundary lines helps in
                                                                  segmentation. This is showed in figure 5.2

                                                                  5.3 Text-line segmentation:

                                                                  Once the complete boundary lines were drawn, it is easy to
                                                                  segment the text-lines by assigning different values to each
                                                                  character in between the lines. In this section, we give the
                                                                  different intensity values for the characters in a text-line
                                                                  between the two drawn lines. Through the different intensity
                                                                  values of the characters in a text-line, this method recognizes
Figure 3.3: Line proximity                                        the different lines. The segmentation for different languages
Source: [6]                                                       are showed in figure 5.3




                                                                                                                                   3
Figure 5.1: Illustration of Partial text-line boundary detection. (a) A handwritten document. (b) Blocked text-
lines. (c) Partial boundary lines




Figure 5.2: Illustration of complete text-line boundary detection. (a) Complete boundary lines with partial
boundary lines. (b) Complete boundary lines without partial boundary lines




                                                                                                                  4
Figure 5.3: Illustration of segmented text-lines. (a) Segmented English document. (b) Segmented Kannada
document (c) Segmented Hindi document (d) Segmented6. Block diagram of proposed method:
                                                         English document




                                                                                                          5
7. Results and discussion:
We conducted the experiments on the languages like Kannada,      [7] Z. Razak, K. Zulkiflee, et al., Off-line handwriting text line
English, Hindi and Arabic of 500 documents (kannada-150,         segmentation: a review, International Journal of Computer
English-200, Hindi-100 and Arabic-50) and obtained the           Science and Network Security 8 (7) (2008) 12–20.
accurate results.                                                [8] G. Louloudis, B. Gatos, C. Halatsis, Text line detection in
                                                                 unconstrained handwritten documents using a block-based
                                                                 Hough transform approach, in: Proceedings of International
8. Conclusion and future work:                                   Conference on Document Analysis and Recognition, 2007, pp.
                                                                 599–603
In this work we are mainly concentrated on extracting required    [9] Y. Li, Y. Zheng, D. Doermann, S. Jaeger, Script-
text-line from the given document and obtained the specified     independent text line segmentation in freestyle handwritten
text-line accurately. The proposed method works well for         documents, IEEE Transactions on Pattern Analysis and
segmenting the text-line of handwritten document. Our            Machine Intelligence 30 (8) (2008) 1313–1329.
method only works on the fixed word length and text-lines        [10] G. Seni, E. Cohen, External word segmentation of off-
without skew; in our future work we will improve the results     line handwritten text lines, Pattern Recognition 27 (1994) 41–
by segmenting the text-lines with above described limitations.   52.
                                                                 [11] R. Manmatha, J.L. Rothfeder, A scale space approach for
9. References:                                                   automatically segmenting words from historical handwritten
                                                                 documents, IEEE Transactions on Pattern Analysis and
[1] G. Louloudisa, B.Gatosb, I.Pratikakisb, C.HalatsisaText      Machine Intelligence 27 (8) (2005) 1212–1225.
line and word segmentation of handwritten documents. Pattern     [12] M. Feldbach, K.D. Tonnies, Line detection and
Recognition (2008) pp. 3169 – 3183                               segmentation in historical church registers, in: Proceedings of
[2] Fei Yin, Cheng-LinLiu. Handwritten Chinese text line         International Conference on Document Analysis and
segmentation by clustering with distance metric learning         Recognition, 2001, pp. 743–747.
Pattern Recognition (2009) pp. 3146 -- 3157                      [13] D.J. Kennard, W.A. Barrett, Separating lines of text in
[3] Vassilis Papavassiliou, Themos Stafylakis, et al..,          free-form handwritten historical documents, in: Proceedings of
Handwritten document image segmentation into textlines and       International Workshop on Document Image Analysis for
words. Pattern Recognition (2010) pp. 369 -- 377                 Libraries, 2006, pp. 12–23.
[4] Alireza Alaei UmapadaPal, et al..,. A new scheme for
unconstrained handwritten text-line segmentation. Pattern
Recognition (2011) pp. 917–928
[5] A Block-Based Hough Transform Mapping for Text Line
Detection in Handwritten Documents G. Louloudis1, B.
Gatos2, I. Pratikakis2, K. Halatsis1
[6] Ashu Kumar , Simpel Rani Jindal, Galaxy Singla Line
segmentation using contour tracing, 2012 Vol 3




                                                                                                                                 6

Weitere ähnliche Inhalte

Andere mochten auch

A Semi-Automatic Annotation Tool For Arabic Online Handwritten Text
A Semi-Automatic Annotation Tool For Arabic Online Handwritten TextA Semi-Automatic Annotation Tool For Arabic Online Handwritten Text
A Semi-Automatic Annotation Tool For Arabic Online Handwritten TextRanda Elanwar
 
Word segmentation method for handwritten documents based on structured learning
Word segmentation method for handwritten documents based on structured learningWord segmentation method for handwritten documents based on structured learning
Word segmentation method for handwritten documents based on structured learningI3E Technologies
 
Holistic Approach for Arabic Word Recognition
Holistic Approach for Arabic Word RecognitionHolistic Approach for Arabic Word Recognition
Holistic Approach for Arabic Word RecognitionEditor IJCATR
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)Konstantinos Zagoris
 
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...Editor IJMTER
 
Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...Konstantinos Zagoris
 
Online handwritten script recognition
Online handwritten script recognitionOnline handwritten script recognition
Online handwritten script recognitionDhiraj Singh
 
Performance of Statistics Based Line Segmentation System for Unconstrained H...
Performance of Statistics Based Line Segmentation  System for Unconstrained H...Performance of Statistics Based Line Segmentation  System for Unconstrained H...
Performance of Statistics Based Line Segmentation System for Unconstrained H...AM Publications
 
Arabic Handwritten Script Recognition Towards Generalization: A Survey
Arabic Handwritten Script Recognition Towards Generalization: A Survey Arabic Handwritten Script Recognition Towards Generalization: A Survey
Arabic Handwritten Script Recognition Towards Generalization: A Survey Randa Elanwar
 
Devanagari Character Recognition
Devanagari Character RecognitionDevanagari Character Recognition
Devanagari Character RecognitionPulkit Goyal
 
Artificial Neural Network / Hand written character Recognition
Artificial Neural Network / Hand written character RecognitionArtificial Neural Network / Hand written character Recognition
Artificial Neural Network / Hand written character RecognitionDr. Uday Saikia
 
Text extraction From Digital image
Text extraction From Digital imageText extraction From Digital image
Text extraction From Digital imageKaushik Godhani
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition systemVijay Apurva
 
Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Karan Panjwani
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Chiranjeevi Adi
 

Andere mochten auch (16)

Spotting Customers in Trouble
Spotting Customers in TroubleSpotting Customers in Trouble
Spotting Customers in Trouble
 
A Semi-Automatic Annotation Tool For Arabic Online Handwritten Text
A Semi-Automatic Annotation Tool For Arabic Online Handwritten TextA Semi-Automatic Annotation Tool For Arabic Online Handwritten Text
A Semi-Automatic Annotation Tool For Arabic Online Handwritten Text
 
Word segmentation method for handwritten documents based on structured learning
Word segmentation method for handwritten documents based on structured learningWord segmentation method for handwritten documents based on structured learning
Word segmentation method for handwritten documents based on structured learning
 
Holistic Approach for Arabic Word Recognition
Holistic Approach for Arabic Word RecognitionHolistic Approach for Arabic Word Recognition
Holistic Approach for Arabic Word Recognition
 
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)
 
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
 
Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...Segmentation - based Historical Handwritten Word Spotting using document-spec...
Segmentation - based Historical Handwritten Word Spotting using document-spec...
 
Online handwritten script recognition
Online handwritten script recognitionOnline handwritten script recognition
Online handwritten script recognition
 
Performance of Statistics Based Line Segmentation System for Unconstrained H...
Performance of Statistics Based Line Segmentation  System for Unconstrained H...Performance of Statistics Based Line Segmentation  System for Unconstrained H...
Performance of Statistics Based Line Segmentation System for Unconstrained H...
 
Arabic Handwritten Script Recognition Towards Generalization: A Survey
Arabic Handwritten Script Recognition Towards Generalization: A Survey Arabic Handwritten Script Recognition Towards Generalization: A Survey
Arabic Handwritten Script Recognition Towards Generalization: A Survey
 
Devanagari Character Recognition
Devanagari Character RecognitionDevanagari Character Recognition
Devanagari Character Recognition
 
Artificial Neural Network / Hand written character Recognition
Artificial Neural Network / Hand written character RecognitionArtificial Neural Network / Hand written character Recognition
Artificial Neural Network / Hand written character Recognition
 
Text extraction From Digital image
Text extraction From Digital imageText extraction From Digital image
Text extraction From Digital image
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition system
 
Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Optical Character Recognition( OCR )
Optical Character Recognition( OCR )
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 

Ähnlich wie A simple text ine segmentation method for handwritten documents1

Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkIJERA Editor
 
A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesiosrjce
 
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...ijdpsjournal
 
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...ijdpsjournal
 
An effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognitionAn effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognitionijaia
 
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...AIRCC Publishing Corporation
 
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...AIRCC Publishing Corporation
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character RecognitionIOSR Journals
 
Enhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical RecordsEnhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical Recordscsandit
 
Preprocessing techniques for recognition of ancient Kannada epigraphs
Preprocessing techniques for recognition of ancient Kannada epigraphsPreprocessing techniques for recognition of ancient Kannada epigraphs
Preprocessing techniques for recognition of ancient Kannada epigraphsIJECEIAES
 
Java Abs Online Handwritten Script Recognition
Java Abs   Online Handwritten Script RecognitionJava Abs   Online Handwritten Script Recognition
Java Abs Online Handwritten Script Recognitionncct
 
Scene text recognition in mobile applications by character descriptor and str...
Scene text recognition in mobile applications by character descriptor and str...Scene text recognition in mobile applications by character descriptor and str...
Scene text recognition in mobile applications by character descriptor and str...eSAT Journals
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...acijjournal
 
Customized mask region based convolutional neural networks for un-uniformed ...
Customized mask region based convolutional neural networks  for un-uniformed ...Customized mask region based convolutional neural networks  for un-uniformed ...
Customized mask region based convolutional neural networks for un-uniformed ...IJECEIAES
 
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...iosrjce
 
Detecting text from natural images with Stroke Width Transform
Detecting text from natural images with Stroke Width TransformDetecting text from natural images with Stroke Width Transform
Detecting text from natural images with Stroke Width TransformPooja G N
 
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...inventionjournals
 

Ähnlich wie A simple text ine segmentation method for handwritten documents1 (20)

Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural Network
 
A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document images
 
P01725105109
P01725105109P01725105109
P01725105109
 
50120130406021
5012013040602150120130406021
50120130406021
 
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
 
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
COHESIVE MULTI-ORIENTED TEXT DETECTION AND RECOGNITION STRUCTURE IN NATURAL S...
 
An effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognitionAn effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognition
 
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
 
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curva...
 
Online Hand Written Character Recognition
Online Hand Written Character RecognitionOnline Hand Written Character Recognition
Online Hand Written Character Recognition
 
Enhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical RecordsEnhancement and Segmentation of Historical Records
Enhancement and Segmentation of Historical Records
 
Preprocessing techniques for recognition of ancient Kannada epigraphs
Preprocessing techniques for recognition of ancient Kannada epigraphsPreprocessing techniques for recognition of ancient Kannada epigraphs
Preprocessing techniques for recognition of ancient Kannada epigraphs
 
Java Abs Online Handwritten Script Recognition
Java Abs   Online Handwritten Script RecognitionJava Abs   Online Handwritten Script Recognition
Java Abs Online Handwritten Script Recognition
 
Scene text recognition in mobile applications by character descriptor and str...
Scene text recognition in mobile applications by character descriptor and str...Scene text recognition in mobile applications by character descriptor and str...
Scene text recognition in mobile applications by character descriptor and str...
 
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE...
 
F045053236
F045053236F045053236
F045053236
 
Customized mask region based convolutional neural networks for un-uniformed ...
Customized mask region based convolutional neural networks  for un-uniformed ...Customized mask region based convolutional neural networks  for un-uniformed ...
Customized mask region based convolutional neural networks for un-uniformed ...
 
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
 
Detecting text from natural images with Stroke Width Transform
Detecting text from natural images with Stroke Width TransformDetecting text from natural images with Stroke Width Transform
Detecting text from natural images with Stroke Width Transform
 
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...
Study on a Hybrid Segmentation Approach for Handwritten Numeral Strings in Fo...
 

Kürzlich hochgeladen

Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxPurva Nikam
 
5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...CaraSkikne1
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustSavipriya Raghavendra
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxraviapr7
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceApostolos Syropoulos
 
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptxSOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptxSyedNadeemGillANi
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17Celine George
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17Celine George
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxAditiChauhan701637
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxheathfieldcps1
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...raviapr7
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17Celine George
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlDr. Bruce A. Johnson
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...Nguyen Thanh Tu Collection
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptxraviapr7
 
3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptxmary850239
 
Work Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashaWork Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashasashalaycock03
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptxmary850239
 

Kürzlich hochgeladen (20)

Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptxOptical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptx
 
5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptx
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial Intelligence
 
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptxSOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptx
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17How to Make a Field read-only in Odoo 17
How to Make a Field read-only in Odoo 17
 
EBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting BlEBUS5423 Data Analytics and Reporting Bl
EBUS5423 Data Analytics and Reporting Bl
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
 
3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx
 
Work Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sashaWork Experience for psp3 portfolio sasha
Work Experience for psp3 portfolio sasha
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptx
 

A simple text ine segmentation method for handwritten documents1

  • 1. A simple text-line segmentation method for handwritten documents M.Ravi Kumar1, R. Pradeep2, B.S.Puneeth Kumar3, and Prasad Babu4 Dept. of P.G.Studies in Computer Science, Kuvempu University Jnana Sahyadri, Shankaraghatta-577451 E-mail: ravi2142@yahoo.co.in, punithashp@gmail.com, kamitkar.prasad9@gmail.com Abstract: Text line segmentation is an important step mainly due to the irregularity of lay- out and character shapes because inaccurately segmented text lines will cause errors in originated from the variability of writing styles. For the recognition stage.. The nature of handwriting makes the unconstrained handwritten documents, text line seg- mentation process of text line segmentation very challenging. Text and character segmentation-recognition are not solved though characteristics can vary in font, size, orientation, alignment, enormous efforts have been devoted to them and great color, contrast, and background information. These variations advances have been made. Text line segmentation of turn the process of word detection complex and difficult. handwritten documents is much more difficult than that of Since handwritten text can vary greatly depending on the user printed documents. Unlike that printed documents have skills, disposition and cultural background. In this work we approximately straight and parallel text lines, the lines in have proposed the method which works on the different handwritten documents are often un-uniformly skewed and intensity values for extracting the text-lines. curved. Moreover, the spaces between handwritten text lines are often not obvious compared to the spaces between within- line characters, and some text lines may interfere with each Keywords: other. Therefore, many text line detection techniques, such as projection analysis and K-nearest neighbor connected Handwritten document, segmentation, text-lines; components (CCs) grouping, are not able to segment handwritten text lines successfully[2]. 1. Introduction: On the other hand, piece-wise projections are sensitive to character’s size variation within text lines and significant gaps Segmentation of a document image into between successive words. These occurrences influence the its basic entities, namely, text lines and words, is considered as effectiveness of smearing methods too. In such cases, the a non-trivial problem to solve in the field of handwritten results of two adjacent zones may be ambiguous, affecting the document recognition. The difficulties that arise in drawing of text-line separators along the document width. To handwritten documents make the segmentation procedure a deal with these problems we introduce a smooth version of the challenging task. Different types of difficulties are projection profiles to over segment each zone into candidate encountered in the text line segmentation and the word text and gap regions. Then, we reclassify these regions by segmentation procedure. In case of text line segmentation applying an HMM formulation that enhances statistics from procedure, major difficulties include the difference in the skew the whole document page. Starting from left and moving to the angle between lines on the page or even along the same text right we combine separators of consecutive zones considering line, overlapping words and adjacent text lines touching. their proximity and the local foreground density [3]. Furthermore, the frequent appearance of accents in many These piece-wise projection based methods have a few languages (e.g. French, Greek) makes the text line shortcomings: (a) they generate too many potential separating segmentation a challenging task. In word segmentation, lines, (b) the parameter of stripe width is predefined, (c) text- difficulties that arise include the appearance of skew and slant lines should not have significant skew as mentioned in and (d) in the text line, the existence of punctuation marks along the if there is no potential piece-wise line in the first and last text line and the non- uniform spacing of words which is a stripes, drawing a complete separating line will become common residual in handwritten documents[1]. Text line impossible in the any algorithms. These shortcomings are segmentation from document images is one of the major observed based on an experiment that we have conducted with problems in document image analysis. It provides crucial a number of text-pages. Some authors also used skew information for the tasks of text block segmentation, character information for text-line separation. In an unconstrained seg-mentation and recognition, and text string recognition. handwritten text-page, it is very difficult to detect the Whereas the difficulty of machine-printed document analysis orientation of each line on the basis of the skew calculated for mainly lies in the complex layout structure and degraded the entire page. Therefore, these methods may not work image quality, handwritten document analysis is difficult properly [4]. 1
  • 2. The concept of the Hough transform is employed in the field densities to assign overlapping CCs into text lines [7]. of document analysis for many purposes such as skew Experimental results on a collection of 720 documents detection, line detection, slant detection and text-line (English, Arabic and children's handwriting) show that segmentation. The Hough transform is employed for text-line 97.31% of text lines were segmented correctly. The writers segmentation in different scripts. A block-based Hough mention that “a more intelligent approach to cut an transform is presented which is a modification of the overlapping component is the goal of future work”. A recent conventional Hough transform methodology. The algorithm approach [8] uses block-based Hough transform to detect lines includes partitioning of the connected component domain into and merging methods to correct false alarms. Although the three spatial sub-domains and applying a block-based Hough algorithm achieves a 93.1% detection rate and a 96% transform to detect the potential text lines [5]. recognition rate, it is not flexible to follow variation of skew Many efforts have been devoted to the difficult problem of angle along the same text line and not very precise in the hand- written text line segmentation. The methods can be assignment of accents to text lines. Li et al. [9] discuss the roughly categorized into three classes: top-down, bottom-up, text-line detection task as an image segmentation problem. and hybrid. Top-down methods partition the document image They use a Gaussian window to convert a binary image into a recursively into text regions, text lines, and words/characters smooth gray-scale. Then they adopt the level set method to with the assumption of straight lines. Bottom-up methods evolve text-line boundaries and finally, geometrical constrains group small units of image (pixels, CCs, characters, words, are imposed to group CCs or segments as text lines. They etc.) into text lines and then text regions. Bottom-up grouping report pixel-level hit rates varying from 92% to 98% on can be viewed as a clustering process, which aggregates image different scripts and mention that “the major failures happen components according to proximity and does not rely on the because two neighboring text lines touch each other assumption of straight lines. Hybrid methods combine bottom- significantly”. A similar method [10] evaluates eight different up grouping and top-down partitioning in different ways. All spatial measures between pairs of CCs to locate words in the three approaches have their disadvantages. Top-down handwritten postal ad- dresses. The best metric proved to be methods do not perform well on curved and overlapping text the one which combines the result of the minimum run-length lines. The performance of bottom-up grouping relies on some method and the vertical overlapping of two successive CCs. heuristic rules or artificial parameters, such as the between- Additionally, this metric is adjusted by utilizing the results of component distance metric for clustering. On the other hand, a punctuation detection algorithm (periods and commas). hybrid methods are complicated in computation, and the Then, a suitable threshold is computed by an iterative design of a robust combination scheme is non-trivial [2]. procedure. The algorithm tested on 1000 address images and A thinning operation has also been used by other researchers performed an error rate of about 10%. for text-line segmentation of Japanese and Indian text Manmatha and Rothfeder [11] propose an effective for noisy documents. Thinning algorithms followed by post-processing historical documents scale space approach. The line image is operations are employed for the entire background region of filtered with an anisotropic Laplacian at several scales in order an input text image to detect the separating borderlines. to produce blobs which correspond to portions of characters at Recently, some techniques have used level set, active contour small scales and to words at larger scales. The optimum scale and a variational Bayes method for text-line segmentation. is estimated by three different techniques (line height, page Density estimation and the level set method (LSM) were averaging and free search) from which the line height showed utilized for text-line segmentation. A probability map is best results. Much more challenging task is line segmentation estimated from an input document image, where each element in historical documents due to a great deal of noise. Feldbach represents the probability of the original pixel belonging to a and Tonnies [12] have proposed a bottom up method for text line. The level set method (LSM) is utilized to determine historical church documents that requires parameters to be set the boundary evolution of neighboring text lines. At first, a according to the type of handwriting. They report a 90% matched filter bank approach is used for smoothing the input correct segmentation rate for constant parameter values which text image. The central line of text-line components is then rises to 97% for adjusted ones. computed using ridges over the smoothed image. Finally, the Another integrated system for such documents [13] creates a active contours (snakes) over the ridges are adapted to obtain foreground/background transition count map to find probable the text-line segmentation result [4]. locations of text lines and applies min-cut/max-flow algorithm to separate initially connected text lines .The method performs 2. Related Work: high accuracy (over 98%) in 20 images of George Washington's manuscript. In this section, we give a brief review of recent work on text line and word segmentation in handwritten document images. As far as we know, the following techniques either achieved 3. Segmentation challenges: the best results in the corresponding test datasets, or are elements of integrated systems for specific tasks. One of the In this section we the challenges involved in the segmentation most accurate methods uses piece-wise projection pro- files to of the text-lines. When dealing with handwritten text, line obtain an initial set of candidate lines and bivariate Gaussian 2
  • 3. segmentation has to solve some obstacles that are uncommon 4. Motivation: in modern printed text [6]. Among the most predominant are: The above challenges motivated us to take this challenging work. 3.1 Skewed lines: Lines of text in general are not straight. These lines are not parallel to each other. 5. Proposed Method: In the proposed method our objective is to identify the boundary of the text-lines, which consist two steps (i) Generating partial boundary line and (ii) Generating complete boundary line. By using these partial lines it is very difficult to differentiate between the two lines. Because these partial lines are having gaps and they are broken. Thus we need to construct the complete line which acts as differentiator for identifying the text-lines. In constructing the complete line we will consider the highest frequent y co-ordinate value in each partial boundary lines, y co-ordinate values are the complete Figure 3.1: Skewed Lines boundaries for each text-lines. In last step we segment the each text-line by representing the each text-line with different 3.2 Fluctuating lines: Lines of text are partially or fully colors. connected to other text-lines. 5.1 Generating partial text-line boundary: The partial text-line boundary is generated by blocking the text-lines. Using morphological operations like erosion. Here blocking is done for filling the holes and gaps between the words. This helps in drawing the partial line. Here the lines are drawn at the edge of the every blocked text-line. However these partial lines are not sufficient to differentiate the text- lines. This is shown in figure 5.1. Figure 3.2: Fluctuating Lines 5.2 Generating complete text-line boundary: 3.3 Line proximity: Small gaps between neighboring text lines will cause touching and overlapping of components, As in the previous step we obtained the partial boundary lines, usually words or letters, between lines and irregularity in these broken boundary lines are not sufficient for segmenting geometrical properties of the line, such as line width, height, the text-lines, thus we need to generate complete line which is distance in between words and lines, leftmost position etc. continuous. So now by using these partial lines, we draw the complete boundary lines with the help of the frequent vertical points. These y co-ordinate values are used to differentiate between text-lines, and these complete boundary lines helps in segmentation. This is showed in figure 5.2 5.3 Text-line segmentation: Once the complete boundary lines were drawn, it is easy to segment the text-lines by assigning different values to each character in between the lines. In this section, we give the different intensity values for the characters in a text-line between the two drawn lines. Through the different intensity values of the characters in a text-line, this method recognizes Figure 3.3: Line proximity the different lines. The segmentation for different languages Source: [6] are showed in figure 5.3 3
  • 4. Figure 5.1: Illustration of Partial text-line boundary detection. (a) A handwritten document. (b) Blocked text- lines. (c) Partial boundary lines Figure 5.2: Illustration of complete text-line boundary detection. (a) Complete boundary lines with partial boundary lines. (b) Complete boundary lines without partial boundary lines 4
  • 5. Figure 5.3: Illustration of segmented text-lines. (a) Segmented English document. (b) Segmented Kannada document (c) Segmented Hindi document (d) Segmented6. Block diagram of proposed method: English document 5
  • 6. 7. Results and discussion: We conducted the experiments on the languages like Kannada, [7] Z. Razak, K. Zulkiflee, et al., Off-line handwriting text line English, Hindi and Arabic of 500 documents (kannada-150, segmentation: a review, International Journal of Computer English-200, Hindi-100 and Arabic-50) and obtained the Science and Network Security 8 (7) (2008) 12–20. accurate results. [8] G. Louloudis, B. Gatos, C. Halatsis, Text line detection in unconstrained handwritten documents using a block-based Hough transform approach, in: Proceedings of International 8. Conclusion and future work: Conference on Document Analysis and Recognition, 2007, pp. 599–603 In this work we are mainly concentrated on extracting required [9] Y. Li, Y. Zheng, D. Doermann, S. Jaeger, Script- text-line from the given document and obtained the specified independent text line segmentation in freestyle handwritten text-line accurately. The proposed method works well for documents, IEEE Transactions on Pattern Analysis and segmenting the text-line of handwritten document. Our Machine Intelligence 30 (8) (2008) 1313–1329. method only works on the fixed word length and text-lines [10] G. Seni, E. Cohen, External word segmentation of off- without skew; in our future work we will improve the results line handwritten text lines, Pattern Recognition 27 (1994) 41– by segmenting the text-lines with above described limitations. 52. [11] R. Manmatha, J.L. Rothfeder, A scale space approach for 9. References: automatically segmenting words from historical handwritten documents, IEEE Transactions on Pattern Analysis and [1] G. Louloudisa, B.Gatosb, I.Pratikakisb, C.HalatsisaText Machine Intelligence 27 (8) (2005) 1212–1225. line and word segmentation of handwritten documents. Pattern [12] M. Feldbach, K.D. Tonnies, Line detection and Recognition (2008) pp. 3169 – 3183 segmentation in historical church registers, in: Proceedings of [2] Fei Yin, Cheng-LinLiu. Handwritten Chinese text line International Conference on Document Analysis and segmentation by clustering with distance metric learning Recognition, 2001, pp. 743–747. Pattern Recognition (2009) pp. 3146 -- 3157 [13] D.J. Kennard, W.A. Barrett, Separating lines of text in [3] Vassilis Papavassiliou, Themos Stafylakis, et al.., free-form handwritten historical documents, in: Proceedings of Handwritten document image segmentation into textlines and International Workshop on Document Image Analysis for words. Pattern Recognition (2010) pp. 369 -- 377 Libraries, 2006, pp. 12–23. [4] Alireza Alaei UmapadaPal, et al..,. A new scheme for unconstrained handwritten text-line segmentation. Pattern Recognition (2011) pp. 917–928 [5] A Block-Based Hough Transform Mapping for Text Line Detection in Handwritten Documents G. Louloudis1, B. Gatos2, I. Pratikakis2, K. Halatsis1 [6] Ashu Kumar , Simpel Rani Jindal, Galaxy Singla Line segmentation using contour tracing, 2012 Vol 3 6