2nd Succeed Hackathon 
10-11 April 2014, University of Alicante 
Succeed is supported by the European Union under FP7-ICT and coordinated by Universidad de Alicante.
2nd Hackathon, 10-11 April, Alicante 
Background
• Libraries, archives and museums are digitising large quantities of (mainly historic) documents such as books, newspapers and journals.
• To make these digital documents searchable, the page images must first be converted to electronic text with the help of OCR (optical character recognition).
• Off-the-shelf OCR software is not well suited to processing historical documents: problems arise e.g. with old fonts, historic language variation and the quality of the paper originals.
• 2009-2012: EU project IMPACT (IMProving ACcess to Text) (www.impact-project.eu)
• 2011: Official launch of the IMPACT Centre of Competence (www.digitisation.eu)
• 2013-2014: EU project SUCCEED (www.succeed-project.eu)
Tools, tools, tools 
“I have this great tool, it does XYZ… 
…if you call it with the right parameters… 
…and run it in this environment… 
…with these minor tweaks…” 
Succeed maintains an extensive list of tools for digitisation: 
http://succeed-project.eu/publications/available-tools/index-succeed 
Interoperability 
[Slide images not preserved in the text: two "vs." comparisons]
Bintray/Maven integration 
All software binaries can be directly downloaded from 
https://bintray.com/impactocr/maven 
You can also integrate them into your Maven project by adding this to your pom.xml:
<repositories>
  <repository>
    <id>impactocr</id>
    <url>http://dl.bintray.com/impactocr/maven/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>eu.impact_project.iif.ws</groupId>
    <artifactId>generic-soap-client</artifactId>
    <version>0.7.0</version>
  </dependency>
</dependencies>
More and more... 
OCR evaluation tool (https://github.com/impactcentre/ocrevalUAtion) = compare ground truth with OCR result
PAGE generator (https://github.com/psnc-dl/page-generator) = generate training data for Tesseract OCR from PAGE XML
Franken+ (https://github.com/idhmc-tamu/FrankenPlus) = create new training sets for Tesseract OCR
Format converter (https://github.com/subugoe/format-converter) = convert between different text formats, e.g. ALTO, TEI, FRXML
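To make "compare ground truth with OCR result" concrete, here is a minimal sketch of one common measure, the character error rate: the edit distance between the two texts divided by the length of the ground truth. This is an illustration only and not the code of the OCR evaluation tool listed above; the function names `levenshtein` and `character_error_rate` are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca by cb, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_text: str) -> float:
    """Edit distance divided by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)


if __name__ == "__main__":
    truth = "historic"
    ocr = "hiftoric"  # a long s misread as f, a typical problem with old fonts
    print(character_error_rate(truth, ocr))  # prints 0.125
```

A real evaluation would usually also normalise whitespace and Unicode before comparing, and might report word-level errors as well; those steps are left out here to keep the sketch short.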
Last but not least... 
Be creative and have fun coding and experimenting!