Tools for text digitisation and transcription 
Tomasz Parkoła 
Poznan Supercomputing and Networking Center 
CERL annual seminar, 28.10.2014, Oslo, Norway
Agenda 
• IMPACT Center of Competence 
• Expertise, tools & events 
• PSNC example 
• Summary
IMPACT Centre of Competence in digitisation 
IMPACT CoC
Content holders
Service providers
Researchers
• Other competence centres and initiatives
• Europeana 
• Research infrastructures
IMPACT CoC members 
Premium members 
• Biblioteca Nacional de España 
• Bibliothèque nationale de France 
• British Library 
• Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung
• Fundación Biblioteca Virtual Miguel de Cervantes (Management and headquarters)
• Instituut voor Nederlandse Lexicologie
• Koninklijke Bibliotheek
• Contentra Technologies (formerly Planman Technologies)
• Poznań Supercomputing and Networking Center
• Universidad de Alicante 
Standard members 
• Biblioteka Uniwersytecka we Wrocławiu (Wroclaw University Library)
• California Digital Library
• Centro de documentación teatral
• DIGIBIS
• Elzaburu
• Göteborgs Universitet
• Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (University Library Centre of North Rhine-Westphalia)
• i2s Digibook
• KU Leuven
• Kungliga Biblioteket (National Library of Sweden)
• LIBNOVA
• Ludwig-Maximilians-Universität, Centrum für Informations- und Sprachverarbeitung
Standard members (cont.) 
• Narodna in univerzitetna knjižnica (National Library of Slovenia)
• National Library of Czech Republic
• National Library of Egypt
• National Library of Finland
• National Library of Latvia
• National Library of Serbia
• Staats- und Universitätsbibliothek Bremen
• Tecnilógica 
• Universitat de Barcelona 
• Universidad Complutense de Madrid 
• Universidad de Granada 
• Universidad de Murcia 
• Universidad de Salamanca 
• Universidad de Valladolid 
• University of Salford 
• Vinfra
IMPACT CoC members 
source: http://www.amcharts.com/visited_countries/
Main activities led by IMPACT CoC 
• Website with tools, resources and guidance for digitisation practices
• Consultation in the context of tools and resources that can be applied in the digitisation workflow
• Support for members to create new research initiatives, projects and expert groups
• Knowledge dissemination via social media, events organisation and training materials
Key benefits for members 
Cultural heritage
• Access, validate and identify best digitisation technologies
• Meet the experts and define best practices
• Share experience and guide innovation
Research centres
• Learn about research challenges
• Collaborate and provide solutions
• Find project partners, sponsors or facilitators
• Share knowledge and experience
Companies
• Showcase your tools and services
• Meet your target customers
• Introduce innovation
Example: Geometric correction in the demonstrator platform
Expertise and experience: overview 
ICT
Innovation
Digitisation
Services and support
R&D
Face challenges
Expertise and experience: examples 
• Projects
• Consultancy
• Spike-solutions
• Interoperability, standards, formats and licensing
• Blogs, events, working groups
• Public-private partnership and sponsorship
• Scanning, conversion, etc.
• OCR enhancement & adaptation
• Legal consultancy
• Tools integration
• Workflow management
• Online access
Digitisation projects
Digitisation services
Research and development
Community & cooperation
Tools: overview 
Image enhancement
IMPACT platform
Segmentation
OCR engines
Evaluation
Other
Tools: examples (http://digitisation.eu/demonstrator-platform) 
Image enhancement
• NCSR Border removal
• NCSR Geometric Correction
• NCSR Binarisation
• Abbyy FineReader 10 Binarisation
• Unpaper
Tools: examples (http://digitisation.eu/demonstrator-platform) 
Segmentation
• Abbyy FineReader 10 Segmentation
• Uni. Salford region, line, word Segmentation Service
• NCSR character segmentation
• Uni. Innsbruck
Tools: examples (http://digitisation.eu/demonstrator-platform) 
OCR engines
• Abbyy FineReader 10 OCR
• Abbyy FineReader 10 with external dictionary
• Uni. Salford Typewritten OCR
• Tesseract 3.00
• Gocr
• Ocropus
• Cuneiform
Tools: examples (http://digitisation.eu/demonstrator-platform) 
Evaluation
• NCSR OCR Evaluation service
• Uni. Salford layout evaluation service
• INL word evaluation service
Tools: examples (http://digitisation.eu/demonstrator-platform) 
Other
• ALTO and PAGE XML transformation
• Uni. Salford ground-truth normalisation
• Uni. Salford PAGE XML to SVG
• JP2 Exif
Resources 
• Linguistic data
– OCR/IR lexica: Slovene, German, Spanish, Czech, Polish, Dutch, English, French, and others coming soon.
• Images and ground truth
– Czech, Spanish, Polish, Bulgarian, Slovene, Biodiversity Heritage Library, and others coming soon.
• Annotated corpora
– Spanish.
Recent activities 
• Past events (2013-2014) 
– Developer workshop for tools integration 
– TPDL tutorial (state-of-the-art tools for text digitisation)
– Digitisation Days, DATecH and Succeed awards 
– Workshop to investigate interoperability issues in digitisation 
• Upcoming events (2014-2015) 
– November 28th, 2014: Succeed in digitisation. Spreading Excellence. http://www.succeed-project.eu/succeed-digitisation
– September 2015: TPDL 2015 organised by PSNC (premium member of IMPACT CoC) http://tpdl2015.info/
– 2016: Digitisation Days and DATecH
• Supporting take-up of tools and resources
– A dozen cultural heritage institutions validating and integrating state-of-the-art digitisation tools (Tesseract, ScanTailor, ImageMagick, JHOVE, NER, Korrektor, Gimp, Alchemy API, COBaLT, Omnipage, Abbyy FR)
• Cooperation with other initiatives and centres of competence (e.g. Open Preservation Foundation)
PSNC example: who are we? 
PSNC is an R&D centre located in Poznań, Poland, focused on ICT in the context of:
• Cloud technologies (archiving & computing) and HPC
• Network technologies (protocols, tools, management)
• Innovative applications & services
PSNC example: who are we? 
In the context of cultural heritage we provide:
• Polish National aggregator for Europeana and others
• DInGO toolset for digitisation projects with over 100 production-mode deployments
• Virtual Transcription Laboratory tool with OCR training module and OCR execution support
• Expertise (over 20 R&D projects, Digital Libraries Conference, training, workshops, consultancy)
PSNC example: Virtual Transcription Laboratory 
Web portal which provides access to:
• Cutouts tool (creation of customised recognition profiles)
• OCR engine with multiple recognition profiles
• Transcription editor with QA interface and group work support
• TXT, hOCR and ePUB output formats
http://wlt.synat.pcss.pl/
PSNC example: why? 
1. High-quality, efficient mass digitisation is currently one of the crucial challenges for cultural heritage institutions.
2. PSNC is involved in the IMPACT CoC to help these institutions overcome existing barriers and face new challenges in this context.
3. We believe that the expertise of IMPACT CoC members can significantly contribute to successful R&D activities and cutting-edge information technologies for digitisation.
Summary: Join us! 
Become a standard member
• get support in the digitisation programmes
• access part of the IMPACT CoC resources
• meet the experts for advice
Become a premium member
• steer activities led by IMPACT CoC
• get full access to IMPACT CoC resources
• actively investigate and innovate digitisation
Contact: info@digitisation.eu
Thank you! 
Tomasz Parkoła 
tparkola@man.poznan.pl 
CERL annual seminar, 28.10.2014, Oslo, Norway

More Related Content

Viewers also liked

IMPACT Final Event 26-06-2012 - Use of IMPACT tools in the Europeana Newspap...
IMPACT Final Event 26-06-2012  - Use of IMPACT tools in the Europeana Newspap...IMPACT Final Event 26-06-2012  - Use of IMPACT tools in the Europeana Newspap...
IMPACT Final Event 26-06-2012 - Use of IMPACT tools in the Europeana Newspap...IMPACT Centre of Competence
 
Biblioteca Virtual Miguel de Cervantes - Oskarbi Zubiarrain
Biblioteca Virtual Miguel de Cervantes - Oskarbi ZubiarrainBiblioteca Virtual Miguel de Cervantes - Oskarbi Zubiarrain
Biblioteca Virtual Miguel de Cervantes - Oskarbi ZubiarrainIMPACT Centre of Competence
 
2. Interoperability framework and Taverna. Enrique Molla, Succeed Project.
2. Interoperability framework and Taverna. Enrique Molla, Succeed Project. 2. Interoperability framework and Taverna. Enrique Molla, Succeed Project.
2. Interoperability framework and Taverna. Enrique Molla, Succeed Project. IMPACT Centre of Competence
 
7. Technical development at the Meertens Institute. Marc Kemps Snijders.
7. Technical development at the Meertens Institute. Marc Kemps Snijders.7. Technical development at the Meertens Institute. Marc Kemps Snijders.
7. Technical development at the Meertens Institute. Marc Kemps Snijders.IMPACT Centre of Competence
 
IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...
IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...
IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...IMPACT Centre of Competence
 
University library of KU Leuven - Sam Alloing et Demmy Verbecke
University library of KU Leuven - Sam Alloing et Demmy VerbeckeUniversity library of KU Leuven - Sam Alloing et Demmy Verbecke
University library of KU Leuven - Sam Alloing et Demmy VerbeckeIMPACT Centre of Competence
 
Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...
Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...
Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...IMPACT Centre of Competence
 
Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...IMPACT Centre of Competence
 
Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...
Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...
Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...IMPACT Centre of Competence
 
IMPACT Final Event 26-06-2012 - Library experiences in IMPACT: National and ...
IMPACT Final Event 26-06-2012  - Library experiences in IMPACT: National and ...IMPACT Final Event 26-06-2012  - Library experiences in IMPACT: National and ...
IMPACT Final Event 26-06-2012 - Library experiences in IMPACT: National and ...IMPACT Centre of Competence
 
Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]
Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]
Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]IMPACT Centre of Competence
 
Succeed Validation and Take up of Tools - Katrien Depuydt
Succeed Validation and Take up of Tools - Katrien DepuydtSucceed Validation and Take up of Tools - Katrien Depuydt
Succeed Validation and Take up of Tools - Katrien DepuydtIMPACT Centre of Competence
 
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...IMPACT Centre of Competence
 

Viewers also liked (16)

CONCERT IMPACT by Lotte Wilms
CONCERT IMPACT by Lotte WilmsCONCERT IMPACT by Lotte Wilms
CONCERT IMPACT by Lotte Wilms
 
IMPACT Final Event 26-06-2012 - Use of IMPACT tools in the Europeana Newspap...
IMPACT Final Event 26-06-2012  - Use of IMPACT tools in the Europeana Newspap...IMPACT Final Event 26-06-2012  - Use of IMPACT tools in the Europeana Newspap...
IMPACT Final Event 26-06-2012 - Use of IMPACT tools in the Europeana Newspap...
 
Biblioteca Virtual Miguel de Cervantes - Oskarbi Zubiarrain
Biblioteca Virtual Miguel de Cervantes - Oskarbi ZubiarrainBiblioteca Virtual Miguel de Cervantes - Oskarbi Zubiarrain
Biblioteca Virtual Miguel de Cervantes - Oskarbi Zubiarrain
 
2. Interoperability framework and Taverna. Enrique Molla, Succeed Project.
2. Interoperability framework and Taverna. Enrique Molla, Succeed Project. 2. Interoperability framework and Taverna. Enrique Molla, Succeed Project.
2. Interoperability framework and Taverna. Enrique Molla, Succeed Project.
 
7. Technical development at the Meertens Institute. Marc Kemps Snijders.
7. Technical development at the Meertens Institute. Marc Kemps Snijders.7. Technical development at the Meertens Institute. Marc Kemps Snijders.
7. Technical development at the Meertens Institute. Marc Kemps Snijders.
 
IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...
IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...
IMPACT Final Event 26-06-2012 - The IMPACT Centre of Competence by Rafael Car...
 
Kennisbank IMPACT by Lotte Wilms
Kennisbank IMPACT by Lotte WilmsKennisbank IMPACT by Lotte Wilms
Kennisbank IMPACT by Lotte Wilms
 
University library of KU Leuven - Sam Alloing et Demmy Verbecke
University library of KU Leuven - Sam Alloing et Demmy VerbeckeUniversity library of KU Leuven - Sam Alloing et Demmy Verbecke
University library of KU Leuven - Sam Alloing et Demmy Verbecke
 
Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...
Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...
Datech2014 - Session 2 - An approach to Unsupervised Historical Text Normalis...
 
Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...Datech2014-Session1-Document Representation Refinement for Precise Region Des...
Datech2014-Session1-Document Representation Refinement for Precise Region Des...
 
Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...
Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...
Datech2014 - Session 4 - Construction of Text Digitization System for Nôm His...
 
Image Enhancement tools by Lotte Wilms
Image Enhancement tools by Lotte WilmsImage Enhancement tools by Lotte Wilms
Image Enhancement tools by Lotte Wilms
 
IMPACT Final Event 26-06-2012 - Library experiences in IMPACT: National and ...
IMPACT Final Event 26-06-2012  - Library experiences in IMPACT: National and ...IMPACT Final Event 26-06-2012  - Library experiences in IMPACT: National and ...
IMPACT Final Event 26-06-2012 - Library experiences in IMPACT: National and ...
 
Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]
Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]
Impact centre of_competence_for_workshop_ocr_rouen_march_2011[1]
 
Succeed Validation and Take up of Tools - Katrien Depuydt
Succeed Validation and Take up of Tools - Katrien DepuydtSucceed Validation and Take up of Tools - Katrien Depuydt
Succeed Validation and Take up of Tools - Katrien Depuydt
 
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
Datech2014 - Session 5 - Wittgenstein’s Nachlass: WiTTFind and Wittgenstein A...
 

Similar to Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)

Building a digital scholarship centre on the successes of a Library Makerspace
Building a digital scholarship centre on the successes of a Library MakerspaceBuilding a digital scholarship centre on the successes of a Library Makerspace
Building a digital scholarship centre on the successes of a Library Makerspaceheila1
 
publishing production
publishing productionpublishing production
publishing productionEssam Obaid
 
Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...
Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...
Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...TISP Project
 
Curation Technologies for Multilingual Europe
Curation Technologies for Multilingual EuropeCuration Technologies for Multilingual Europe
Curation Technologies for Multilingual EuropeGeorg Rehm
 
How community software supports language documentation and data analysis
How community software supports language documentation and data analysisHow community software supports language documentation and data analysis
How community software supports language documentation and data analysisPeter Bouda
 
Language Resources for Multilingual Europe
Language Resources for Multilingual EuropeLanguage Resources for Multilingual Europe
Language Resources for Multilingual EuropeGeorg Rehm
 
Digital Excellence Program -- Success, Challenges, and The Future -- Business...
Digital Excellence Program -- Success, Challenges, and The Future -- Business...Digital Excellence Program -- Success, Challenges, and The Future -- Business...
Digital Excellence Program -- Success, Challenges, and The Future -- Business...robinphua
 
Presentation Eclic
Presentation EclicPresentation Eclic
Presentation EclicECLIC
 
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...CUBCCE Conference
 
Celtic language technologies in the digital age
Celtic language technologies in the digital ageCeltic language technologies in the digital age
Celtic language technologies in the digital agetechiaith
 
Workshop: Concluding Remarks
Workshop: Concluding RemarksWorkshop: Concluding Remarks
Workshop: Concluding Remarkslocloud
 
Naple presentation danish digital library
Naple presentation danish digital libraryNaple presentation danish digital library
Naple presentation danish digital libraryJakobheide
 

Similar to Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC) (20)

co:op-READ-Convention Marburg - Günter Mühlberger
co:op-READ-Convention Marburg - Günter Mühlbergerco:op-READ-Convention Marburg - Günter Mühlberger
co:op-READ-Convention Marburg - Günter Mühlberger
 
Building a digital scholarship centre on the successes of a Library Makerspace
Building a digital scholarship centre on the successes of a Library MakerspaceBuilding a digital scholarship centre on the successes of a Library Makerspace
Building a digital scholarship centre on the successes of a Library Makerspace
 
publishing production
publishing productionpublishing production
publishing production
 
Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...
Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...
Andrea Angiolini, Il Mulino, about Pandoracampus @ Frankfurt Book Fair 2014, ...
 
Curation Technologies for Multilingual Europe
Curation Technologies for Multilingual EuropeCuration Technologies for Multilingual Europe
Curation Technologies for Multilingual Europe
 
How community software supports language documentation and data analysis
How community software supports language documentation and data analysisHow community software supports language documentation and data analysis
How community software supports language documentation and data analysis
 
KU Leuven - Words and numbers - ICoC
KU Leuven - Words and numbers - ICoCKU Leuven - Words and numbers - ICoC
KU Leuven - Words and numbers - ICoC
 
Bne impact co_c
Bne impact co_cBne impact co_c
Bne impact co_c
 
Language Resources for Multilingual Europe
Language Resources for Multilingual EuropeLanguage Resources for Multilingual Europe
Language Resources for Multilingual Europe
 
IMPACT Final Conference - Aly Conteh
IMPACT Final Conference - Aly ContehIMPACT Final Conference - Aly Conteh
IMPACT Final Conference - Aly Conteh
 
All WP Meeting Athens - Europeana Inside - Gordon McKenna
All WP Meeting Athens - Europeana Inside - Gordon McKennaAll WP Meeting Athens - Europeana Inside - Gordon McKenna
All WP Meeting Athens - Europeana Inside - Gordon McKenna
 
Digital Excellence Program -- Success, Challenges, and The Future -- Business...
Digital Excellence Program -- Success, Challenges, and The Future -- Business...Digital Excellence Program -- Success, Challenges, and The Future -- Business...
Digital Excellence Program -- Success, Challenges, and The Future -- Business...
 
DLCS
DLCSDLCS
DLCS
 
Presentation Eclic
Presentation EclicPresentation Eclic
Presentation Eclic
 
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...
 
Celtic language technologies in the digital age
Celtic language technologies in the digital ageCeltic language technologies in the digital age
Celtic language technologies in the digital age
 
Open Access Repository Capacity Strengthening Programme for Africa (OA-RCSP)
Open Access Repository Capacity Strengthening Programme for Africa (OA-RCSP)Open Access Repository Capacity Strengthening Programme for Africa (OA-RCSP)
Open Access Repository Capacity Strengthening Programme for Africa (OA-RCSP)
 
Workshop: Concluding Remarks
Workshop: Concluding RemarksWorkshop: Concluding Remarks
Workshop: Concluding Remarks
 
Caplan and York, 'What It Takes To Make It Last: E-Resources Preservation"
Caplan and York, 'What It Takes To Make It Last:  E-Resources Preservation"Caplan and York, 'What It Takes To Make It Last:  E-Resources Preservation"
Caplan and York, 'What It Takes To Make It Last: E-Resources Preservation"
 
Naple presentation danish digital library
Naple presentation danish digital libraryNaple presentation danish digital library
Naple presentation danish digital library
 

More from IMPACT Centre of Competence

More from IMPACT Centre of Competence (20)

Session6 01.helmut schmid
Session6 01.helmut schmidSession6 01.helmut schmid
Session6 01.helmut schmid
 
Session1 03.hsian-an wang
Session1 03.hsian-an wangSession1 03.hsian-an wang
Session1 03.hsian-an wang
 
Session7 03.katrien depuydt
Session7 03.katrien depuydtSession7 03.katrien depuydt
Session7 03.katrien depuydt
 
Session7 02.peter kiraly
Session7 02.peter kiralySession7 02.peter kiraly
Session7 02.peter kiraly
 
Session6 04.giuseppe celano
Session6 04.giuseppe celanoSession6 04.giuseppe celano
Session6 04.giuseppe celano
 
Session6 03.sandra young
Session6 03.sandra youngSession6 03.sandra young
Session6 03.sandra young
 
Session6 02.jeremi ochab
Session6 02.jeremi ochabSession6 02.jeremi ochab
Session6 02.jeremi ochab
 
Session5 04.evangelos varthis
Session5 04.evangelos varthisSession5 04.evangelos varthis
Session5 04.evangelos varthis
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
 
Session5 02.tom derrick
Session5 02.tom derrickSession5 02.tom derrick
Session5 02.tom derrick
 
Session5 01.rutger vankoert
Session5 01.rutger vankoertSession5 01.rutger vankoert
Session5 01.rutger vankoert
 
Session4 04.senka drobac
Session4 04.senka drobacSession4 04.senka drobac
Session4 04.senka drobac
 
Session3 04.arnau baro
Session3 04.arnau baroSession3 04.arnau baro
Session3 04.arnau baro
 
Session3 03.christian clausner
Session3 03.christian clausnerSession3 03.christian clausner
Session3 03.christian clausner
 
Session3 02.kimmo ketunnen
Session3 02.kimmo ketunnenSession3 02.kimmo ketunnen
Session3 02.kimmo ketunnen
 
Session3 01.clemens neudecker
Session3 01.clemens neudeckerSession3 01.clemens neudecker
Session3 01.clemens neudecker
 
Session2 04.ashkan ashkpour
Session2 04.ashkan ashkpourSession2 04.ashkan ashkpour
Session2 04.ashkan ashkpour
 
Session2 03.juri opitz
Session2 03.juri opitzSession2 03.juri opitz
Session2 03.juri opitz
 
Session2 02.christian reul
Session2 02.christian reulSession2 02.christian reul
Session2 02.christian reul
 
Session2 01.emad mohamed
Session2 01.emad mohamedSession2 01.emad mohamed
Session2 01.emad mohamed
 

Recently uploaded

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)

  • 1. Tools for text digitisation and transcription. Tomasz Parkoła, Poznan Supercomputing and Networking Center. CERL annual seminar, 28.10.2014, Oslo, Norway
  • 2. Agenda • IMPACT Centre of Competence • Expertise, tools & events • PSNC example • Summary
  • 3. IMPACT Centre of Competence in digitisation. IMPACT CoC: Content holders, Service providers, Researchers • Other competence centres and initiatives • Europeana • Research infrastructures
  • 4. IMPACT CoC members
      Premium members: Biblioteca Nacional de España; Bibliothèque nationale de France; British Library; Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung; Fundación Biblioteca Virtual Miguel de Cervantes (Management and headquarters); Instituut voor Nederlandse Lexicologie; Koninklijke Bibliotheek; Contentra Technologies (formerly Planman Technologies); Poznań Supercomputing and Networking Center; Universidad de Alicante
      Standard members: Biblioteka Uniwersytecka we Wrocławiu (Wroclaw University Library); California Digital Library; Centro de documentación teatral; DIGIBIS; Elzaburu; Göteborgs Universitet; Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (University Library Centre of North Rhine-Westphalia); i2s Digibook; KU Leuven; Kungliga Biblioteket (National Library of Sweden); LIBNOVA; Ludwig-Maximilians-Universität, Centrum für Informations- und Sprachverarbeitung; Narodna in univerzitetna knjižnica (National Library of Slovenia); National Library of Czech Republic; National Library of Egypt; National Library of Finland; National Library of Latvia; National Library of Serbia; Staats- und Universitätsbibliothek Bremen; Tecnilógica; Universitat de Barcelona; Universidad Complutense de Madrid; Universidad de Granada; Universidad de Murcia; Universidad de Salamanca; Universidad de Valladolid; University of Salford; Vinfra
  • 5. IMPACT CoC members [map]. Source: http://www.amcharts.com/visited_countries/
  • 6. IMPACT CoC members [map]. Source: http://www.amcharts.com/visited_countries/
  • 7. Main activities led by IMPACT CoC • Website with tools, resources and guidance for digitisation practices • Consultation in the context of tools and resources that can be applied in the digitisation workflow • Support for members to create new research initiatives, projects and expert groups • Knowledge dissemination via social media, events organisation and training materials
  • 8. Key benefits for members: Cultural heritage, Research centres, Companies • Access, validate and identify best digitisation technologies • Meet the experts and define best practices • Share experience and guide innovation • Learn about research challenges • Collaborate and provide solution • Find project partners, sponsors or facilitators • Share knowledge and experience • Showcase your tools and services • Meet your target customers • Introduce innovation
  • 9. Example: Geometric correction in the demonstrator platform
  • 10. Expertise and experience: overview. ICT Innovation Digitisation Services and support R&D Face challenges
  • 11. Expertise and experience: examples • Projects • Consultancy • Spike-solutions • Interoperability, standards, formats and licensing • Blogs, events, working groups • Public-private partnership and sponsorship • Scanning, conversion, etc. • OCR enhancement & adaptation • Legal consultancy • Tools integration • Workflow management • Online access. Groups: Digitisation projects; Digitisation services; Research and development; Community & cooperation
  • 12. Tools: overview. IMPACT platform: Image enhancement • Segmentation • OCR engines • Evaluation • Other
  • 13. Tools: examples (http://digitisation.eu/demonstrator-platform). Image enhancement: NCSR Border removal • NCSR Geometric Correction • NCSR Binarisation • Abbyy FineReader 10 Binarisation • Unpaper
  • 14. Tools: examples (http://digitisation.eu/demonstrator-platform). Segmentation: Abbyy FineReader 10 Segmentation • Uni. Salford region, line, word Segmentation Service • NCSR character segmentation. Uni. Innsbruck
  • 15. Tools: examples (http://digitisation.eu/demonstrator-platform). OCR engines: Abbyy FineReader 10 OCR • Abbyy FineReader 10 with external dictionary • Uni. Salford Typewritten OCR • Tesseract 3.00 • Gocr • Ocropus • Cuneiform
  • 16. Tools: examples (http://digitisation.eu/demonstrator-platform). Evaluation: NCSR OCR Evaluation service • Uni. Salford layout evaluation service • INL word evaluation service
  • 17. Tools: examples (http://digitisation.eu/demonstrator-platform). Other: ALTO and PAGE XML transformation • Uni. Salford ground-truth normalisation • Uni. Salford PAGE XML to svg • JP2 Exif
  • 18. Resources • Linguistic data – OCR/IR lexica: Slovene, German, Spanish, Czech, Polish, Dutch, English, French and others coming soon. • Images and ground truth – Czech, Spanish, Polish, Bulgarian, Slovene, Biodiversity Heritage Library and others coming soon. • Annotated corpora – Spanish.
  • 19. Recent activities • Past events (2013-2014) – Developer workshop for tools integration – TPDL tutorial (state of the art tools for text digitisation) – Digitisation Days, DATecH and Succeed awards – Workshop to investigate interoperability issues in digitisation • Upcoming events (2014-2015) – November 28th, 2014: Succeed in digitisation. Spreading Excellence. http://www.succeed-project.eu/succeed-digitisation – September, 2015: TPDL 2015 organised by PSNC (premium member of IMPACT CoC) http://tpdl2015.info/ – 2016: Digitisation Days and DATecH • Supporting take-up of tools and resources – A dozen cultural heritage institutions validating and integrating state of the art digitisation tools (Tesseract, ScanTailor, ImageMagick, JHOVE, NER, Korrektor, Gimp, Alchemy API, COBaLT, Omnipage, Abbyy FR) • Cooperation with other initiatives and centres of competence (e.g. Open Preservation Foundation)
  • 20. PSNC example: who are we? PSNC is an R&D centre located in Poznan, Poland, focused on ICT in the context of: • Cloud technologies (archiving & computing) and HPC • Network technologies (protocols, tools, management) • Innovative applications & services
  • 21. PSNC example: who are we? In the context of cultural heritage we provide: • Polish National aggregator for Europeana and others • DInGO toolset for digitisation projects with over 100 production-mode deployments • Virtual Transcription Laboratory tool with OCR training module and OCR execution support • Expertise (over 20 R&D projects, Digital Libraries Conference, training, workshops, consultancy)
  • 22. PSNC example: Virtual Transcription Laboratory. Web portal which provides access to: • Cutouts tool (creation of customised recognition profiles) • OCR engine with multiple recognition profiles • Transcription editor with QA interface and group work support • TXT, hOCR and ePUB output formats. http://wlt.synat.pcss.pl/
  • 23. PSNC example: why? (1) High quality, efficient mass digitisation is currently one of the crucial challenges for cultural heritage institutions. (2) PSNC is involved in the IMPACT CoC to help these institutions to overcome existing barriers and to face new challenges in this context. (3) We believe that expertise of IMPACT CoC members can significantly contribute to successful R&D activities and cutting-edge information technologies for digitisation.
  • 24. Summary: Join us! Become standard member: get support in the digitisation programmes • access part of the IMPACT CoC resources • meet the experts advice. Become premium member: steer activities led by IMPACT CoC • get full access to IMPACT CoC resources • actively investigate and innovate digitisation. Contact: info@digitisation.eu
  • 25. Thank you! Tomasz Parkoła, tparkola@man.poznan.pl. CERL annual seminar, 28.10.2014, Oslo, Norway
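
Slide 16 lists OCR and word evaluation services without saying how OCR output is scored. As a rough illustration only, the sketch below computes a character or word error rate by comparing OCR output against ground truth with edit distance. This is a commonly used metric, not necessarily what the NCSR, Uni. Salford or INL services implement; the function names `levenshtein` and `error_rate` are hypothetical and chosen for this example.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def error_rate(ground_truth, ocr_output, unit="char"):
    """Edit distance divided by ground-truth length, per character or per word."""
    if unit == "word":
        ref, hyp = ground_truth.split(), ocr_output.split()
    else:
        ref, hyp = list(ground_truth), list(ocr_output)
    if not ref:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from any IMPACT data set.
    truth = "mass digitisation of historical text"
    ocr = "mass digitisaton of histerical text"
    print("CER:", error_rate(truth, ocr))
    print("WER:", error_rate(truth, ocr, unit="word"))
```

Dividing by the ground-truth length means the rate can exceed 1.0 when the OCR output contains many spurious insertions. A production evaluation would typically also normalise whitespace and Unicode forms before comparing.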