SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Development of Named Entities 
Recognition for French Newspapers 
Journée d’information 
Europeana Newspapers 
27/11/2014 
BnF / Paris, France 
Clemens Neudecker, State Library Berlin 
@cneudecker
What is „Named Entity Recognition“? 
• Named Entity Recognition (NER) is a sub-task of 
Information Extraction and is typically understood as being 
part of the area of Computational Linguistics / Natural 
Language Processing. 
• The main aim of NER is the automatic extraction and 
classification of knowledge or information from 
semantically unstructured text. 
• NER is still subject to academic research (cf. Google & 
MSR Competition) – practical use in the cultural heritage 
digitisation sector remains a rare case. 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 2
Asked differently: What is a „Named Entity“? 
• PERSON: 
• Names of persons and families, but also names of fictional 
persons („Albert Einstein“, „Präsident der USA“, „Micky Maus“) 
• ORGANISATION: 
• Names of companies, governemental or non-governemental 
organisations („IBM“, „The Beatles“, „Labour Party“) 
• PLACE: 
• Cities, Provinces, Counties, geographical areas, asf. 
(„Paris“, „Haute-Pyrénées“, „Alpes“) 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
3
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
NER (I) 
4 
1. Detection/Classification of person names, places and 
organisations in a running text (includes POS)
NER (II) 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
5 
2. Disambiguation of terms (Example “Jordan”) 
through contextual information
NER (III) 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
6 
3. Linking to authority files and online databases 
(Linked Data)
Supported languages in ENP 
3 Languages: 
• German 
• Dutch 
• French 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
7
Approaches 
• Machine learning vs. rule-based 
• Advantages of machine-learning systems: 
• No need for specific linguistic expertise 
• Processing of large amounts of material 
• Advantages of rule-based systems: 
• Can be tuned to very high accuracy for particular texts 
• Adaptation to local grammar and specific text style 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
8
Software 
• Open Source ML software developed by Stanford 
University, adapted and extended for Europeana 
Newspapers by the KB National Library of the Netherlands 
• Software is available as open source from Github for 
download and testing: 
https://github.com/KBNLresearch/europeananp-ner 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
9
Training 
• Training the NER systems with the help of manually 
annotated corpora („gold corpus“) and gazzetteers 
• Publication of annotated data from ENP as open data 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
10
Encoding 
• Results of NER are stored in a library specific format: 
ALTO (Analyzed Layout and Text Object) 
• Versions > 2.1 of ALTO specifically allow to use NER „Tags“ 
<String STYLEREFS="ID7" HEIGHT="132.0" WIDTH="570.0" HPOS="5937.0" 
VPOS="3279.0" CONTENT="Reynolds" WC="0.95238096" TAGREFS="Tag5"></String> 
<String STYLEREFS="ID7" HEIGHT="102.0" WIDTH="540.0" HPOS="18438.0" 
VPOS="22008.0" CONTENT="Baltimore" WC="0.82539684„ TAGREFS="Tag10"></String> 
… 
<Tags> 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
11 
<NamedEntityTag ID="Tag5" TYPE="Person" LABEL="Reynolds"/> 
<NamedEntityTag ID="Tag6" TYPE=”Place" LABEL=”Baltimore"/> 
</Tags>
Problems and challenges 
• OCR errors reduce the accuracy of the classification and 
slow down the overall processing time for recognition due to 
high noise. 
• Historical spelling variation for person names and place 
names in particular. 
• In many cases the historical spelling variants can not be 
found in online knowledge bases. 
 Specific adaptation of the software via external modules 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
12
Initial results: Dutch 
Persons Places Organisations 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
13 
Precision 0.940 0.950 0.942 
Recall 0.588 0.760 0.559 
F-measure 0.689 0.838 0.671
Why Named Entity Recognition? 
• Example: Analysis of log files from the newspaper collection of 
the National Library of Wales shows that 9 out of 10 queries 
are for a person or place name! 
(Source: Paul Gooding, Exploring Usage of Digital Newspaper Archives through Web Log 
Analysis: A Case Study of Welsh Newspapers Online, presented at DH2014, Lausanne) 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
14
Thank you for your attention! 
Merci de votre attention! 
@eurnews 
http://www.europeana-newspapers.eu 
http://www.theeuropeanlibrary.org/tel4/newspapers 
http://www.europeana.eu/

Weitere ähnliche Inhalte

Was ist angesagt?

Europeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana Newspapers
 
ENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers
 
Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Onlinecneudecker
 
Europeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introductionEuropeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introductionEuropeana Newspapers
 
The challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available onlineThe challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available onlineLIBER Europe
 
Europeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers
 
Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspapers
 
Europeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers
 
Europeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers
 
IFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaIFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaEuropeana Newspapers
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Data Driven Innovation
 
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think TankThe EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think TankViola Zazzera
 
Overview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectOverview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectEuropeana Newspapers
 

Was ist angesagt? (20)

Europeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLieder
 
ENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project Overview
 
ENP Belgrade WS Introduction
ENP Belgrade WS IntroductionENP Belgrade WS Introduction
ENP Belgrade WS Introduction
 
ENP Belgrade WS Metadata
ENP Belgrade WS MetadataENP Belgrade WS Metadata
ENP Belgrade WS Metadata
 
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday Muehlberger
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday Genereux
 
Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Online
 
Europeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introductionEuropeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introduction
 
The challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available onlineThe challenges of making Europe's newspapers available online
The challenges of making Europe's newspapers available online
 
Europeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop intro
 
Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013
 
ENP_Dutch_Infoday_LWilms
ENP_Dutch_Infoday_LWilmsENP_Dutch_Infoday_LWilms
ENP_Dutch_Infoday_LWilms
 
Europeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers Polish Information Day
Europeana Newspapers Polish Information Day
 
Europeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation Plan
 
IFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaIFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza Atanassova
 
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
Progetto EOSC-Pillar (Fulvio Galeazzi, GARR)
 
EurnewsLDN_Clemens_Neudecker
EurnewsLDN_Clemens_NeudeckerEurnewsLDN_Clemens_Neudecker
EurnewsLDN_Clemens_Neudecker
 
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think TankThe EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
The EPO and Tecnology Transfer: a brief overview 4T-Tech Transfer Think Tank
 
Overview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectOverview of the Europeana Newspapers Project
Overview of the Europeana Newspapers Project
 

Andere mochten auch

Presentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information DayPresentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information DayEuropeana Newspapers
 
Presentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information DayPresentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information DayEuropeana Newspapers
 
Presentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information DayPresentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information DayEuropeana Newspapers
 
Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in ParisPresentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in ParisEuropeana Newspapers
 
Présentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information DayPrésentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information DayEuropeana Newspapers
 

Andere mochten auch (7)

Presentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information DayPresentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information Day
 
Presentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information DayPresentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information Day
 
DocWorks Demo
DocWorks DemoDocWorks Demo
DocWorks Demo
 
Presentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information DayPresentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information Day
 
What is a named entity
What is a named entityWhat is a named entity
What is a named entity
 
Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in ParisPresentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
 
Présentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information DayPrésentation Günter Mühlberger, BnF Information Day
Présentation Günter Mühlberger, BnF Information Day
 

Ähnlich wie Presentation of Clemens Neudecker, BnF Information Day

Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshellcneudecker
 
Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013MediaMixerCommunity
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerBiblioteca Nacional de España
 
BeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptxBeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptxFIWARE
 
An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...cneudecker
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
 
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Apulian ICT Living Labs
 
04 europeana newspapers
04 europeana newspapers04 europeana newspapers
04 europeana newspapersEuropeana
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Project
 
Introducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updatedIntroducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updatedParthenos
 
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...European Data Forum
 
OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE
 
NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020NordForsk
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?Jisc
 
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015Invest Northern Ireland
 
European Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunitiesEuropean Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunitiesEOSC-hub project
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?Carole Goble
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair" OpenAIRE
 

Ähnlich wie Presentation of Clemens Neudecker, BnF Information Day (20)

Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshell
 
FP7-ICT Programme
FP7-ICT ProgrammeFP7-ICT Programme
FP7-ICT Programme
 
Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens Neudecker
 
BeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptxBeOpen_Martino Maggio.pptx
BeOpen_Martino Maggio.pptx
 
An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...
 
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...RDMkit, a Research Data Management Toolkit.  Built by the Community for the ...
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
 
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
 
04 europeana newspapers
04 europeana newspapers04 europeana newspapers
04 europeana newspapers
 
Enoll hannover-2013-anna
Enoll hannover-2013-annaEnoll hannover-2013-anna
Enoll hannover-2013-anna
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
 
Introducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updatedIntroducing parthenos powerpoint presentation december 2015 updated
Introducing parthenos powerpoint presentation december 2015 updated
 
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...EDF2012   Stefano Bertolo - Future European activities and funding perspectiv...
EDF2012 Stefano Bertolo - Future European activities and funding perspectiv...
 
OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!OpenAIRE Infrastructure & Services: we need your input!
OpenAIRE Infrastructure & Services: we need your input!
 
NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020NordForsk Open Access Reykjavik 14-15/8-2014: H2020
NordForsk Open Access Reykjavik 14-15/8-2014: H2020
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
Horizon 2020 e-infrastructures - Draft Horizon 2020 WorkProgramme 2014-2015
 
European Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunitiesEuropean Open Science Cloud: Concept, status and opportunities
European Open Science Cloud: Concept, status and opportunities
 
The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?The European Open Science Cloud: just what is it?
The European Open Science Cloud: just what is it?
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 

Mehr von Europeana Newspapers

Europeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne KoutsEuropeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne KoutsEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel VeimannEuropeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel VeimannEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista AruEuropeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista AruEuropeana Newspapers
 
Europeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred PussEuropeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred PussEuropeana Newspapers
 
Europeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday NeudeckerEuropeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday NeudeckerEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday ThompsonEuropeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday ThompsonEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday RossiEuropeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday RossiEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday MessinaEuropeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday MessinaEuropeana Newspapers
 
Europeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday MarchettiEuropeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday MarchettiEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers
 

Mehr von Europeana Newspapers (20)

Europeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne KoutsEuropeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne Kouts
 
Europeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel VeimannEuropeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel Veimann
 
Europeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista Kiisa
 
Europeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista AruEuropeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista Aru
 
Europeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred PussEuropeana Newspapers Estonian Infoday Fred Puss
Europeana Newspapers Estonian Infoday Fred Puss
 
Europeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday NeudeckerEuropeana Newpapers LFT Infoday Neudecker
Europeana Newpapers LFT Infoday Neudecker
 
Europeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday ThompsonEuropeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday Thompson
 
Europeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday RossiEuropeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday Rossi
 
Enp lft infoday_neudecker
Enp lft infoday_neudeckerEnp lft infoday_neudecker
Enp lft infoday_neudecker
 
Europeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday MessinaEuropeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday Messina
 
Europeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday MarchettiEuropeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday Marchetti
 
Europeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday Kempf
 
Europeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday Bolioli
 
ENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillemsENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillems
 
ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen
 
ENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizingaENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizinga
 
ENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijnsENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijns
 
ENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijckENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijck
 
ENP_ONB_infoday_Schaller
ENP_ONB_infoday_SchallerENP_ONB_infoday_Schaller
ENP_ONB_infoday_Schaller
 
ENP_ONB_infoday_Neudecker
ENP_ONB_infoday_NeudeckerENP_ONB_infoday_Neudecker
ENP_ONB_infoday_Neudecker
 

Kürzlich hochgeladen

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Kürzlich hochgeladen (20)

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Presentation of Clemens Neudecker, BnF Information Day

  • 1. Development of Named Entities Recognition for French Newspapers Journée d’information Europeana Newspapers 27/11/2014 BnF / Paris, France Clemens Neudecker, State Library Berlin @cneudecker
  • 2. What is „Named Entity Recognition“? • Named Entity Recognition (NER) is a sub-task of Information Extraction and is typically understood as being part of the area of Computational Linguistics / Natural Language Processing. • The main aim of NER is the automatic extraction and classification of knowledge or information from semantically unstructured text. • NER is still subject to academic research (cf. Google & MSR Competition) – practical use in the cultural heritage digitisation sector remains a rare case. This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 2
  • 3. Asked differently: What is a „Named Entity“? • PERSON: • Names of persons and families, but also names of fictional persons („Albert Einstein“, „Präsident der USA“, „Micky Maus“) • ORGANISATION: • Names of companies, governemental or non-governemental organisations („IBM“, „The Beatles“, „Labour Party“) • PLACE: • Cities, Provinces, Counties, geographical areas, asf. („Paris“, „Haute-Pyrénées“, „Alpes“) This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 3
  • 4. This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp NER (I) 4 1. Detection/Classification of person names, places and organisations in a running text (includes POS)
  • 5. NER (II) This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 5 2. Disambiguation of terms (Example “Jordan”) through contextual information
  • 6. NER (III) This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 6 3. Linking to authority files and online databases (Linked Data)
  • 7. Supported languages in ENP 3 Languages: • German • Dutch • French This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 7
  • 8. Approaches • Machine learning vs. rule-based • Advantages of machine-learning systems: • No need for specific linguistic expertise • Processing of large amounts of material • Advantages of rule-based systems: • Can be tuned to very high accuracy for particular texts • Adaptation to local grammar and specific text style This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 8
  • 9. Software • Open Source ML software developed by Stanford University, adapted and extended for Europeana Newspapers by the KB National Library of the Netherlands • Software is available as open source from Github for download and testing: https://github.com/KBNLresearch/europeananp-ner This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 9
  • 10. Training • Training the NER systems with the help of manually annotated corpora („gold corpus“) and gazzetteers • Publication of annotated data from ENP as open data This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 10
  • 11. Encoding • Results of NER are stored in a library specific format: ALTO (Analyzed Layout and Text Object) • Versions > 2.1 of ALTO specifically allow to use NER „Tags“ <String STYLEREFS="ID7" HEIGHT="132.0" WIDTH="570.0" HPOS="5937.0" VPOS="3279.0" CONTENT="Reynolds" WC="0.95238096" TAGREFS="Tag5"></String> <String STYLEREFS="ID7" HEIGHT="102.0" WIDTH="540.0" HPOS="18438.0" VPOS="22008.0" CONTENT="Baltimore" WC="0.82539684„ TAGREFS="Tag10"></String> … <Tags> This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 11 <NamedEntityTag ID="Tag5" TYPE="Person" LABEL="Reynolds"/> <NamedEntityTag ID="Tag6" TYPE=”Place" LABEL=”Baltimore"/> </Tags>
  • 12. Problems and challenges • OCR errors reduce the accuracy of the classification and slow down the overall processing time for recognition due to high noise. • Historical spelling variation for person names and place names in particular. • In many cases the historical spelling variants can not be found in online knowledge bases.  Specific adaptation of the software via external modules This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 12
  • 13. Initial results: Dutch Persons Places Organisations This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 13 Precision 0.940 0.950 0.942 Recall 0.588 0.760 0.559 F-measure 0.689 0.838 0.671
  • 14. Why Named Entity Recognition? • Example: Analysis of log files from the newspaper collection of the National Library of Wales shows that 9 out of 10 queries are for a person or place name! (Source: Paul Gooding, Exploring Usage of Digital Newspaper Archives through Web Log Analysis: A Case Study of Welsh Newspapers Online, presented at DH2014, Lausanne) This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 14
  • 15. Thank you for your attention! Merci de votre attention! @eurnews http://www.europeana-newspapers.eu http://www.theeuropeanlibrary.org/tel4/newspapers http://www.europeana.eu/