Unlocking the Handwritten Content in  Document Images  Venu Govindaraju [email_address]
Handwritten Documents Relevance Scanner Storage OCR Noisy Text Newton Kinematics Notes Query Forms Letters Notes
Outline
Challenge of Handwriting
Input Output 20187 + 2246 Handwriting Recognition
Postal Context (138 mil records)
LDR accuracy (%) by lexicon size:
Lexicon size   Top 1   Top 2
10             96.5    98.7
100            89.2    94.1
1000           75.3    86.3
Paradigms Lexicon Driven OCR LDR Lexicon Free  OCR LFR Context Ranked Lexicon Segmentation Recognition Post-processing
Lexicon Free (LFR): Find the best path in the graph from segment 1 to 8.
[Figure: segment graph; edge labels: i[.8], l[.8] u[.5], v[.2] w[.6], m[.3] w[.7] i[.7] u[.3] m[.2] m[.1] r[.4] d[.8] o[.5]]
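The best-path search named on this slide can be sketched as a small dynamic program over a directed acyclic graph. This is only an illustrative sketch: the edge layout below is hypothetical (the slide gives the labels but not which segment boundaries each edge connects), and scoring a path by the product of its confidences is an assumption, not something the slide states.

```python
import math

def best_path(edges, start, end):
    """Return (score, text) for the highest-product path from start to end.

    edges: list of (from_node, to_node, char, confidence) with from_node < to_node.
    """
    best = {start: (0.0, "")}  # node -> (log score, decoded text so far)
    # Sorting by from_node gives a valid topological order because every
    # edge points from a lower node number to a higher one.
    for u, v, ch, conf in sorted(edges):
        if u not in best:
            continue  # node u is not reachable from start
        cand = (best[u][0] + math.log(conf), best[u][1] + ch)
        if v not in best or cand[0] > best[v][0]:
            best[v] = cand
    if end not in best:
        return None
    return math.exp(best[end][0]), best[end][1]

# Hypothetical edge placement, reusing a few confidences from the slide.
example_edges = [
    (1, 2, "i", 0.8), (2, 4, "u", 0.5), (1, 4, "w", 0.7),
    (4, 5, "o", 0.5), (5, 6, "r", 0.4), (6, 8, "d", 0.8),
]
lfr_score, lfr_text = best_path(example_edges, 1, 8)
```

With this made-up layout the single "w" edge beats the "i" + "u" split, so the decoded string is "word".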
Lexicon Driven (LDR): Find the best way of accounting for the characters ‘w’, ‘o’, ‘r’, ‘d’ by consuming all segments 1 to 8.
Distance between the first character ‘w’ of lexicon entry ‘word’ and the image between:
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
[Figure: segment graph over nodes 1 to 9; edge labels: w[7.6] w[7.2] r[3.8] w[5.0] w[8.6] o[7.6] r[6.3] d[4.9] w[5.0] o[6.6] o[6.0] o[7.2] o[10.6] d[6.5] d[4.4] r[7.5] r[6.4] o[7.8] r[8.6] o[8.7] r[7.4] r[7.6] o[8.3] o[7.7] r[5.8] o[6.1]]
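A minimal sketch of the lexicon-driven matching described on this slide, written as a dynamic program that assigns consecutive segment spans to the characters of one lexicon entry and minimizes the summed distance. The function name, the `max_span` limit, and the placement of distances other than the three ‘w’ values quoted on the slide are assumptions for illustration; nodes 1 to 9 are taken to be the cut points shown in the figure.

```python
def match_word(word, last_node, dist, max_span=4):
    """Minimum total distance for matching `word` to nodes 1..last_node.

    dist maps (char, start_node, end_node) to a distance; a missing key
    means that span was not scored for that character.
    """
    INF = float("inf")
    # cost[k][j]: best cost of matching the first k characters so that the
    # k-th character ends at node j; back[k][j] stores where it started.
    cost = [[INF] * (last_node + 1) for _ in range(len(word) + 1)]
    back = [[None] * (last_node + 1) for _ in range(len(word) + 1)]
    cost[0][1] = 0.0
    for k, ch in enumerate(word, start=1):
        for j in range(2, last_node + 1):
            for i in range(max(1, j - max_span), j):
                d = dist.get((ch, i, j))
                if d is not None and cost[k - 1][i] + d < cost[k][j]:
                    cost[k][j] = cost[k - 1][i] + d
                    back[k][j] = i
    total = cost[len(word)][last_node]
    if total == INF:
        return None
    # Walk the back-pointers to recover each character's span.
    spans, j = [], last_node
    for k in range(len(word), 0, -1):
        i = back[k][j]
        spans.append((word[k - 1], i, j))
        j = i
    return total, spans[::-1]

# The three 'w' distances come from the slide; the other placements are hypothetical.
example_dist = {
    ("w", 1, 2): 7.6, ("w", 1, 3): 7.2, ("w", 1, 4): 5.0,
    ("o", 3, 6): 7.2, ("o", 4, 5): 6.6, ("o", 4, 6): 6.0,
    ("r", 5, 7): 6.3, ("r", 6, 7): 3.8,
    ("d", 6, 9): 6.5, ("d", 7, 9): 4.4,
}
ldr_total, ldr_spans = match_word("word", 9, example_dist)
```

Running the same routine for every lexicon entry and sorting by total distance would give a ranked lexicon; a lower total means a better match.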
Grapheme Models (LFR) Writer Specific Modeling Holistic Features grapheme pos orientation angle Down cusp 3.0 -90° Up loop Down arc
Interactive Models (LDR) Words: ABLE TRIP TRAP Letters: A T N Features 1-way activation [McClelland and Rumelhart 1981] 2-way interaction
Interactive Models (LDR) Phrase Level  T-crossings, loops, ascenders, descenders, length West Central Street West Main  Street Sunset Avenue West Central Street East Central Street Sunset Avenue West Central Street West Central Avenue Sunset Avenue Lexicon 1   Lexicon 2 Lexicon 3 Interactive Model features image 2-way interaction
Interactive Models Character Recognition Gradient (4) and Moment (5) Features 0 1 0 1 1 1 0 0 1 [Park and Govindaraju, IEEE CVPR 2000]
Active Recognition
Results
10-class digit recognition: 25656 training and 12242 test samples (Postal + NIST)
               Active Model   Neural Net   KNN
Top 1 (%)      95.7           96.4         95.7
Temp           612            976          3,777
Msec           1.45           11.5         384
Training hrs   1              24           1

Lex size    LDR %   GM %
10          96.86   96.56
100         91.36   89.12
1000        79.58   75.38
(Top 50)    98.00   98.40
20000       62.43   58.14
(Top 100)   93.59   93.39
Fusion   Identification Task Verification Task LDR LFR
Fusion of Recognizers Type III LDR 5.6 7.4 … LFR .52 .81 … Identification task: Amherst Buffalo … Verification task: 5.6 .52 Amherst Question:  if we find optimal  and  , is it necessarily  ?  Accept Reject
Traditional Fusion Rules
Likelihood Ratio Verification Tasks Minimum risk criteria: optimal decision boundaries coincide with the contours of the likelihood ratio function. Metaclassification with NN, SVM, etc. is also possible [Prabhakar, Jain 02] [Nandkumar, Jain, Das 08]. [Figure: Impostor and Genuine score distributions, Recognizer score 1 vs. Recognizer score 2]
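As a rough illustration of the likelihood-ratio rule, the sketch below fits one Gaussian per recognizer score for the genuine and impostor classes and accepts a claim when p(scores | genuine) / p(scores | impostor) exceeds a threshold. The per-score Gaussian model, the treatment of the two scores as independent within a class, the training scores, and the threshold of 1.0 are all assumptions made for the example; the slide does not say how the densities are estimated.

```python
from statistics import NormalDist

def fit_class(samples):
    """Fit one NormalDist per score dimension (diagonal Gaussian assumption)."""
    return [NormalDist.from_samples(column) for column in zip(*samples)]

def likelihood_ratio(scores, genuine_model, impostor_model):
    """p(scores | genuine) / p(scores | impostor) under the fitted models."""
    numerator = denominator = 1.0
    for s, g, i in zip(scores, genuine_model, impostor_model):
        numerator *= g.pdf(s)
        denominator *= i.pdf(s)
    return numerator / denominator

def verify(scores, genuine_model, impostor_model, threshold=1.0):
    """Accept when the likelihood ratio exceeds the (assumed) threshold."""
    return likelihood_ratio(scores, genuine_model, impostor_model) > threshold

# Hypothetical training scores: (recognizer score 1, recognizer score 2).
genuine_scores = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7)]
impostor_scores = [(0.2, 0.3), (0.3, 0.1), (0.1, 0.2)]
genuine_model = fit_class(genuine_scores)
impostor_model = fit_class(impostor_scores)
```

Moving the threshold trades false accepts against false rejects, which is what traces out an ROC curve for the verification task.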
Optimal Combination functions
Identification Task Results (top choice correct rate):
- LFR is correct: 54.8%
- LDR is correct: 77.2%
- Both are correct: 48.9%
- Either is correct: 83.0%
- Likelihood Ratio: 69.8%
- Weighted Sum: 81.6%
Verification Task Results: ROC
Independence of Scores In a single trial Amherst 5.6 7.4 … Buffalo .52 .81 … LDR LFR … … . … .
Lexicon1 Lexicon  i Lexicon N Independence of Scores In a single trial Recognizer 1 Recognizer  M Tulyakov & Govindaraju, TIFS 2009 Independent? Dependent Dependent
Optimal Combination? Correlated Scores Dependent on input signal
                 Count   Rate
Set size         6147
LFR              3366    54.8%
LDR              4744    77.2%
Both correct     3005    48.9%
Either correct   5105    83.0%
LR               4293    69.8%
Weighted sum     5015    81.6%

       2nd choice   3rd choice   4th choice   Mean
LFR    .4359        .4755        .4771        .1145
LDR    .7885        .7825        .7673        .5685
Optimal Trainable Combination Function  Minimizing misclassification cost: Classify as  rather than Assume that scores assigned to different classes are independent : Tulyakov & Govindaraju IJPRAI 2009
Combination Methods Identification Tasks No! Traditional training mixes the genuine and impostor scores from different trials. [Figure: scatter plots of Recognizer score 1 vs. Recognizer score 2, Impostor and Genuine classes]
Combination Methods Identification Tasks Model training MUST process scores from one identification trial as a single training sample. [Figure: scatter plots of Recognizer score 1 vs. Recognizer score 2, Impostor and Genuine classes]
Iterative Methods
Best Impostor Function
            Likelihood Ratio   Weighted sum   Best Impostor Likelihood Ratio   Logistic Sum   Neural Network
LFR & LDR   69.84              81.58          80.07                            81.43          81.67
li & C      97.24              97.23          97.01                            97.34          97.39
li & G      95.90              95.47          95.99                            96.17          96.29
Outline
Search for Handwritten Documents
Lexicon      Good Quality 10K   Good Quality 1K   Historical 10K   Historical 1K   Medical 4K
Top 1 (%)    57                 67                12               28              20
Top 3 (%)    69                 72                22               44              27
Top 10 (%)   74                 75                32               72              42
Search Engine Handwritten Forms
Search Engine for Medical Forms
Topic Categorization  Lexicon Reduction Lex Free Large Lexicon > 5K Handwritten Medical Documents ICR Features ~33% word Recognition rate (10 points gain) Topic  Categorization Select Reduced Lexicon ~2.5K Lex Driven
ICR Features Index
Topic Features: DIGESTIVE-SYSTEM
FQ   CHSN   PHRASE
30   0.72   PAIN INCIDENT
5    0.31   PAIN TRANSPORTED
42   0.54   PAIN CHEST
52   0.81   STOMACH PAIN
9    0.25   HOME PAIN
6    0.43   VOMITING ILLNESS
(Chu-Carroll, et al., 1999) Topic Categorization
Results
C: complete lexicon; R: reduced lexicon; A: category given; S: features synthetic; T: truth present
             CLT to RLT   CL to RL   CLT to ALT   CLT to SLT
HR           7.48%        7.42%      17.58%       7.42%
Error Rate   10.78%       10.88%     24.53%       10.21%
Outline
Urgent Issue of our Times Threat: ‘If it’s not in Google, it doesn’t exist!’ Baird 2003
What is possible today?
Document Enhancement [Shi, Setlur, and Govindaraju 2008]
Transcript-Mapping 1787 Thomas Jefferson letter and its transcript  Image Transcript + +
What is not possible today?
 
Crosslingual Retrieval Multilingual Document Corpus Retrieved Documents  English Hindi Sanskrit Translations of “strength”
SEARCH Handwritten Documents Image – Based  Use Image Based Features OCR - Based Use OCR Recognition Results Query rendered
Image Based Methods (Rath 07 IJDAR)  Poor performance in multiple writer scenarios
SEARCH Handwritten Documents Image – Based  Use Image Based Features- OCR - Based Use OCR recognition results
Indexing Retrieval Handwriting  Recognition
Vector IR Model (TF-IDF) [Baeza-Yates99]
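For readers unfamiliar with the vector model, here is a minimal sketch of TF-IDF weighting with cosine-similarity ranking. It uses the textbook weight tf × log(N / df); the exact weighting variant used in the talk is not recoverable from the slide, and the tiny document collection is made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (per-document sparse TF-IDF vectors, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / df[term]) for term in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, docs):
    """Document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    return sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)

# Hypothetical tokenized documents.
example_docs = [
    ["chest", "pain", "transported"],
    ["stomach", "pain", "vomiting"],
    ["home", "illness"],
]
ranking = rank(["chest", "pain"], example_docs)
```

Here the document containing both query terms ranks first, the one sharing only "pain" second, and the unrelated one last.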
Modifications to VM
Estimation: word images 0.02 0.01 0.2 0.01 0.01 … Doc d_j [Rath 04, Howe 05]
Estimating Term Frequency
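One plausible reading of term-frequency estimation from noisy recognition, sketched under assumptions: since the recognizer does not give a definite transcript, the count of a term in a document is replaced by the sum, over the document's word images, of the probability that each image is that term. The function name and the posterior values are hypothetical.

```python
def expected_term_frequency(term, word_posteriors):
    """Sum of P(term | word image) over all word images in one document.

    word_posteriors: one dict per word image, mapping candidate terms
    to recognition probabilities.
    """
    return sum(posterior.get(term, 0.0) for posterior in word_posteriors)

# Hypothetical recognizer output for a document with three word images.
example_posteriors = [
    {"pain": 0.2, "paid": 0.1},
    {"pain": 0.02, "rain": 0.01},
    {"chest": 0.5},
]
tf_pain = expected_term_frequency("pain", example_posteriors)
```

The resulting fractional count can stand in for the integer term frequency in a TF-IDF style weight.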
Estimating Segmentation d > D 3 hypotheses
Word Recognition


Trivandrum


Editor's notes

  1. ½ min. Good afternoon. I am Venu Govindaraju, Professor at the University at Buffalo. The title of my talk today is “Paradigms in Handwriting Recognition”. This will be in the context of the English language and the Roman alphabet. The idea is to see whether some of the techniques that have proved successful in English are also applicable to Arabic or Chinese. This will be an overview-style presentation describing paradigms, applications, and accuracy figures.
  2. In the postal application, we are able to operate with an average lexicon size of 30. When we do not have collateral information, how does one reduce the lexicon size?
  3. 1 min The problem of handwriting recognition has typically been defined as follows. The inputs are a bitmap image of the word to be recognized AND a lexicon of possible choices. The lexicon usually captures the context of the application at hand. When the lexicon is not provided by the application, it assumes the size of the entire English dictionary, or at least the words in common usage. In such cases, the lexicon can contain tens of thousands of words. The output is a ranked list of the lexical choices. The choices are often associated with a confidence score. In this talk, we will make the following 2 assumptions. The first is that we are dealing with single words or short phrases of a few words. There has been a considerable body of work on the recognition of entire sentences. An early paper on the topic was published by Kim, Govindaraju and Srihari in IJDAR 1997. Since then, several papers have been published on the topic, most notably from Prof. Suen’s group at Concordia and Prof. Bunke’s group in Switzerland. The second assumption is that we are dealing with offline handwriting recognition.
  4. We are looking at the narrative text in the medical forms. We are using medical dictionaries. It can be seen that the techniques scale to other applications as well. We want to develop a search engine for such medical forms, where a health official could search the forms by querying with some medical terms. We demonstrated the method of keyword spotting at the demo session yesterday. We will now describe an alternate method of attempting full transcription, which is expected to be error-prone, and see if search engines are still viable. The handwriting is sloppy, written in ambulances and other emergency scenarios. Abbreviations are freely used. Documents are carbon copies, and binarization itself is a challenge; we presented this work at DAS 06. Lexicon-free recognition can pick up only a few characters in each word with reasonable confidence. For lexicon-driven recognition, the lexicons will be greater than 5K, for which the accuracy is in the 20s. What should we do?
  5. One problem with cohesive phrases alone is that during the recognition phase we do not know the words. Therefore, we extract terms from these cohesive phrases and use them to model the category with which each phrase is associated. This is the basis for the hypothesis. For example [read slide]
  6. The pseudo-category vector is then attached to the matrix of category column vectors.
  7. Some more detail on the impact of ruled-line removal on word recognition: we extracted all the test word images from lined pages and measured the top-choice recognition performance. The test set contains 848 word images from a total of 274 pages. Of these, 460 word images come from 146 pages with ruled lines, so the share of ruled-line material in the 34 PAW data set is 460/848 = 54.25% of words and 146/274 = 53.28% of pages. Top-1 recognition on words from lined pages was 318/460 = 69.13% earlier and is 349/460 = 75.87% now, so ruled-line removal improves top-1 word recognition by 6.74 percentage points (evaluated on words from lined pages). The overall top-1 improvement is 4.13 percentage points (evaluated on the test set including all word images from lined or non-lined pages, which we had reported earlier). The PAW recognizer is a straightforward implementation using a k-nearest neighbor classifier. The features used are the CUBS Gradient, Structure and Concavity features. The classifier is a very simple implementation that can be improved; its purpose was to test the effectiveness of our features.
  8. Digital libraries like the George Washington Papers collection at the Library of Congress consist of approximately 152,000 handwritten document images and associated transcripts. The Newton Project aims to make all of Newton's writings available online. Aligning the transcription with the handwritten text in these libraries would enable one to automatically generate an immense database of word images, which in turn can be used as truth data by word recognizers to create transcriptions for the remaining scanned documents. The tedious process of manually dragging a box around each word in an image and keying in the annotations could thus be avoided. In forensic document evaluation, capturing characteristics specific to a writer is of paramount importance in both writer identification and writer verification. Thus, if a mapping algorithm correctly maps word images to lexicon words during preprocessing, the accuracy of writer recognition would improve remarkably. For existing scanned images, the alignment enables one to build interfaces where the transcript text can be browsed alongside the manuscript.
  9. Existing keyword spotting approaches can be classified into two categories: (a) image based and (b) OCR based. (a) In image-feature-based indexing approaches, after preprocessing of document images and word segmentation, feature vectors are extracted from word images and stored in a database. When a user provides a query word, the similarity between the query and each word image in the database is computed, and word images are returned in decreasing order of similarity. (b) In OCR-based approaches, the indices are built from OCR scores such as posterior probabilities, or from feature-vector observation likelihoods (probability densities) converted from distances returned by the word recognizer.
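The image-based indexing flow in the keyword spotting notes (store one feature vector per word image, compare a query against each, return word images in decreasing order of similarity) can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional feature vectors, the image identifiers, the function names, and the choice of cosine similarity are all hypothetical, and the feature extraction step itself is not shown.

```python
import math

# Hypothetical index: word-image id -> feature vector extracted offline.
# Real features would come from a feature extractor; these values are made up.
WORD_IMAGE_INDEX = {
    "img_001": [0.90, 0.10, 0.30],
    "img_002": [0.10, 0.80, 0.50],
    "img_003": [0.85, 0.15, 0.35],
}


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vector, index):
    """Return (image id, similarity) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query_vector, features))
        for image_id, features in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The query vector stands in for features of a rendered or example query word.
for image_id, score in rank_by_similarity([0.90, 0.10, 0.30], WORD_IMAGE_INDEX):
    print(image_id, round(score, 3))
```

An OCR-based index would replace the similarity function with recognizer scores, for example posterior probabilities per lexicon word, while keeping the same ranking step.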