The document summarizes research activities and tools developed by the National Center for Scientific Research "Demokritos" for the IMPACT project. It describes tools for border detection, page curl correction, and character segmentation, and reports evaluation results for the border detection and page curl correction tools on large datasets.
1. IMPACT Tools Developed by NCSR. IMPACT Final Conference 2011, 24-25 October 2011, London, UK. B. Gatos, Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research (NCSR) "Demokritos", GR-153 10 Agia Paraskevi, Athens, Greece
7. Recent OCR projects. Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research "Demokritos", GR-153 10 Agia Paraskevi, Athens, Greece. IMPACT Tools Developed by NCSR - IMPACT Final Conference 2011, 24-25 October, London, UK
10. [Figure: CASAM project diagram, http://www.casam-project.eu/. Labels: Image, Video, Visual Information, Non Visual Information, Text, Audio, Video OCR, Low-level analysis, Fusion, Interpretation, Information gain, web ontology language.]
11. Video Logo Detection
14. Border_Detection_v4 [0|1] [infile] [outfile1] [outfile2]
    parameter [0|1]: 0 -> only border removal, 1 -> border removal & page split
    parameter [infile]: input filename (b/w or gray scale image)
    parameters [outfile1] [outfile2]: output filenames (b/w or gray scale image)
    Also provided: web service implementation
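The usage line above can be wrapped in a small helper. This is only a sketch: it assumes the executable is named `Border_Detection_v4`, is reachable on the system path, and always receives both output filenames, as the usage line shows. The function names and file names are hypothetical.

```python
import subprocess


def border_detection_command(infile, outfile1, outfile2, page_split=False,
                             exe="Border_Detection_v4"):
    """Build the argument list following the usage line on the slide.

    Mode 0 means only border removal, mode 1 means border removal and
    page split. The executable name is an assumption.
    """
    mode = "1" if page_split else "0"
    return [exe, mode, str(infile), str(outfile1), str(outfile2)]


def run_border_detection(infile, outfile1, outfile2, page_split=False):
    """Run the tool and raise CalledProcessError if it exits with an error."""
    cmd = border_detection_command(infile, outfile1, outfile2, page_split)
    return subprocess.run(cmd, check=True)
```

For example, `border_detection_command("scan.tif", "left.tif", "right.tif", page_split=True)` yields the arguments for a border removal plus page split run without starting the tool.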
21. Border removal evaluation scale, from 1 (Bad) to 5 (Good):
    1. Final image almost destroyed!
    2. Big part of text is missing
    3. Small part of text is missing
    4. All text is there, border not completely removed.
    5. All text is there, border has been completely removed.
    Averages shown on the slide, in order: Av=4.3, Av=3.6
    Test sets, in order: 21709 images to test border removal; 3003 newspaper images to test border removal
22. Page split evaluation scale, from 1 (Bad) to 5 (Good), Av=3.3:
    1. Page split fails!
    2. Page split with problems.
    3. Page split is correct, large parts of noise remain or text is removed
    4. Page split is correct, small parts of noise remain or text is removed
    5. Page split is correct, only black noise has been removed
    3009 images to test page split (results on 50%)
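The "Av" figures on the rating slides are averages over the 1 to 5 scale. A minimal sketch of that arithmetic follows, assuming the ratings are available as a count per score. The slides do not give per-score counts, so the counts in the usage note are made up purely for illustration.

```python
def average_rating(counts):
    """Weighted mean of ratings on the 1-5 scale.

    `counts` maps a rating (1..5) to the number of images that received it.
    """
    for rating in counts:
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating out of range: {rating}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ratings given")
    return sum(rating * n for rating, n in counts.items()) / total
```

With the hypothetical counts `{3: 1, 4: 1, 5: 2}` the function returns (3 + 4 + 10) / 4 = 4.25.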
25. 3009 images to test page split
26. 458 images from BNF to test page split
27. Page_Curl_Correction_v4 [0|1] [infile] [outfile]
    parameter [0|1]: 0 -> coarse & fine correction, 1 -> only coarse correction
    parameter [infile]: input filename (b/w or gray scale image)
    parameter [outfile]: output filename (b/w or gray scale image)
    Also provided: web service implementation
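The flag polarity here is easy to misread: 0 selects the full coarse and fine correction, while 1 restricts the tool to coarse correction. The sketch below builds commands for a batch of images with that polarity made explicit. The executable name on the path, the `_dewarped` output suffix, and the function names are assumptions for illustration only.

```python
from pathlib import Path


def page_curl_command(infile, outfile, coarse_only=False,
                      exe="Page_Curl_Correction_v4"):
    """Build the argument list following the usage line on the slide.

    Mode 0 = coarse & fine correction, mode 1 = only coarse correction.
    """
    mode = "1" if coarse_only else "0"
    return [exe, mode, str(infile), str(outfile)]


def batch_commands(infiles, out_dir, coarse_only=False):
    """One command per input image; outputs get a hypothetical suffix."""
    out_dir = Path(out_dir)
    commands = []
    for infile in infiles:
        src = Path(infile)
        dst = out_dir / f"{src.stem}_dewarped{src.suffix}"
        commands.append(page_curl_command(src, dst, coarse_only))
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the tool is installed.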
34. IMPACT Page Curl Correction v.4: 87.78% (81.98% with only coarse correction)
    BookRestorer: 80.87%
    N. Stamatopoulos, B. Gatos and I. Pratikakis, "A Methodology for Document Image Dewarping Techniques Performance Evaluation", 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 956-960, Barcelona, Spain, July 2009.
35. Character_Segmentation_v3 [WordImageFilename] [XMLOutputFilename]
    parameter [WordImageFilename]: an image containing a word
    parameter [XMLOutputFilename]: several character segmentation variations, encoded following the XML schema of IBM used in TR3 (Adaptive OCR)
    [Figure values: 0.21, 0.91]
36. Merged characters, broken characters, overlapped characters, noise
38. [Figure values: 0.61, 0.79, 0.85, 0.98, 0.94]
39. [Figure values: 0.83, 0.63, 0.73, 0.89, 0.90]
40. Evaluation of the result with the highest confidence [Figure values: 0.61, 0.79, 0.94]
41. Evaluation of the best possible result [Figure values: 0.61, 0.79, 0.94]
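The two evaluation modes named on these slides differ in which segmentation variation is scored for each word. A minimal sketch of the distinction follows, assuming every variation carries a confidence value and an evaluation score against ground truth. The `Variation` fields and all numbers used with it are hypothetical and are not taken from the tool's XML output.

```python
from dataclasses import dataclass


@dataclass
class Variation:
    confidence: float  # hypothetical: confidence attached to a variation
    score: float       # hypothetical: evaluation score against ground truth


def top_confidence_score(variations):
    """Score of the variation with the highest confidence."""
    return max(variations, key=lambda v: v.confidence).score


def best_possible_score(variations):
    """Score of the best variation, whatever its confidence (upper bound)."""
    return max(v.score for v in variations)


def evaluate(words):
    """Average both measures over a list of words (lists of variations)."""
    n = len(words)
    top = sum(top_confidence_score(w) for w in words) / n
    best = sum(best_possible_score(w) for w in words) / n
    return top, best
```

The best possible result can never be lower than the highest-confidence result, so the gap between the two shows how much is lost when the confidence ranking picks the wrong variation.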
53. A. L. Kesidis, E. Galiotou, B. Gatos and I. Pratikakis, "A word spotting framework for historical machine-printed documents", International Journal on Document Analysis and Recognition, DOI: 10.1007/s10032-010-0134-4, pp. 1-14, 2010.
    A. L. Kesidis, E. Galiotou, B. Gatos, A. Lampropoulos, I. Pratikakis, I. Manolessou and A. Ralli, "Accessing the content of Greek historical documents", 3rd Workshop on Analytics for Noisy Unstructured Text Data (AND'09), pp. 55-62, Barcelona, Spain, July 2009.
57. [Diagram: word spotting workflow]
    OFFLINE PREPARATION, ADMINISTRATIVE TASKS (Admin): Page segmentation and features extraction; Keywords definition; Letter templates definition; Word Spotting by User's feedback
    ONLINE USAGE (All Users): Searching by Query by Keyword, Query by Example, Free Text
64. H-DocPro v.1
65. H-DocPro v.1 Step 1: Select the directory with your images or copy your images to directory [Install Dir]/images.
66. H-DocPro v.1 Step 2: Select the directory for saving the results after pressing the "Settings" button. (default save directory: [Install Dir]/Results)
67. H-DocPro v.1 Step 3: Select one or more document images.
69. H-DocPro v.1 Step 5: Select the method for every processing module by pressing "<" or ">" on every module at the workflow line. Right click on the module at the workflow line and deselect "Do not recalculate if result exists" if you want to recalculate an existing result.
70. H-DocPro v.1 Step 6: Execute workflow by pressing "Apply Processes"
71. H-DocPro v.1 Step 7: View results on the preview window or right click on any module at the workflow line and select "View Result". If you right click on the right-most module you will view the final result; otherwise you will view the intermediate results.
72. H-DocPro v.1 - Document Image Processing Components: Binarization
    NCSR: Based on "B. Gatos, I. Pratikakis and S. J. Perantonis, Adaptive Degraded Document Image Binarization, Pattern Recognition, Vol. 39, pp. 317-327, 2006"
    FR8.1: From FineReader Engine v. 8.1. IMPORTANT NOTICES: (a) You must have the engine already installed. (b) You must edit the file [Install Dir]/temp/Binarization/FRkey.txt and add your FineReader license key code.
73. H-DocPro v.1 - Document Image Processing Components: Border Removal
    Auto: Based on projection profiles and connected component analysis.
    Auto_Edit: Press inside the marked area and adjust it by dragging the black points.
74. H-DocPro v.1 - Document Image Processing Components: Page Split
    Auto: Based on "N. Stamatopoulos, B. Gatos, T. Georgiou, Page frame detection for double page document images, 9th IAPR International Workshop on Document Analysis Systems (DAS 2010), pp. 401-408, Cambridge, MA, USA, June 2010"
    Auto_Edit: Press inside the left or right marked area and adjust it by dragging the black points.
75. H-DocPro v.1 - Document Image Processing Components: Dewarping
    Auto: Based on "N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, Goal-oriented Rectification of Camera-Based Document Images, IEEE Transactions on Image Processing, vol. 20, no. 4, pp. 910-920, 2011."
    IMPORTANT NOTICES: (a) It needs the MATLAB Component Runtime Installer. (b) It can be applied only to single column documents.
    Auto_Edit: Manually correct the position of the two lines and the two curves that delimit the text area by dragging the corresponding black points. Press the ">" button to test the result.
    ASM 2011, 12-13 April 2011, Munich, Germany