SlideShare ist ein Scribd-Unternehmen logo
1 von 23
1RANLP’2011, Hissar, Bulgaria, 12.09.2011
JRC-Names: A freely available, highly multilingual
named entity resource
Hissar, Bulgaria, 12 September 2011
Ralf Steinberger, Bruno Pouliquen, Mijail Kabadjov,
Jenya Belyaeva & Erik van der Goot
Technical details and publications: http://langtech.jrc.ec.europa.eu/
Applications: http://emm.newbrief.eu/overview.html
2RANLP’2011, Hissar, Bulgaria, 12.09.2011
Agenda
• What is JRC-Names; What can it be used for
• Related work: other named entity (NE) resources
• How JRC-Names was produced
• Recognition of named entities in news reports in 20 languages
• Introduction to EMM
• Automatic mapping of name variants to the same entity
• Enrichment with Wikipedia variants
• Partial manual moderation
• Statistics on JRC-Names
• Programming details / Usage of the tool
• Solutions to capture morphological variants
• Further multilingual linguistic resources
3RANLP’2011, Hissar, Bulgaria, 12.09.2011
What is JRC-Names?
• JRC-Names consists of:
• Lists of names and their many spelling variants,
• ~205,000 person and organisation names plus
• ~204,000 name spelling variants
• In 27 scripts and many more languages
• Software to recognise these names in multilingual text, with offset and unique name identifier
• Download from http://langtech.jrc.ec.europa.eu/
4RANLP’2011, Hissar, Bulgaria, 12.09.2011
Possible uses of JRC-Names
• Standardise name spellings in databases, text collections and the internet for
improved retrieval (Stern & Sagot 2010)
• Improve Machine Translation – names must be treated differently from other
words (Babych & Hartley 2003; Steinberger & Pouliquen 2009)
• Use as input to learn automatic transliteration rules (e.g. Pouliquen 2009)
• Use output of JRC-Names as seeds to learn NER rules (e.g. Buchholz & van
den Bosch 2000)
• Social networks are less biased by national viewpoints if based on information
extracted from multilingual texts
• NER results are useful for other text mining tasks (opinion mining; co-reference
resolution; summarisation; topic detection and tracking; cross-lingual linking of
related documents across languages; …)
5RANLP’2011, Hissar, Bulgaria, 12.09.2011
Related work – other multilingual (ml) NE resources
• Wentlant et al. (2008) – built a ml NE repository based on Wikipedia links and
case information; 2.5 Mio English names, 250K German, 3K Swahili, …
• Toral et al. (2008) – built Named Entity WordNet by searching NEs in WordNet
and Wikipedia: 310K entities, including 278K persons
• Stern & Sagot (2010) – exploit French Wikipedia and GeoNames to produce
French resource: 263K person names + 883K variants.
• Maurel (2009) –produced Prolexbase mostly manually: 75K entities of all types
•  Most resources are based on Wikipedia
• Strong at providing cross-lingual and cross-script variants
• Offers only few other spelling variants
• No morphological inflections
• JRC-Names contains mostly spelling variants from real-life text,
enriched with Wikipedia – up to 413 variants for the same NE.
6RANLP’2011, Hissar, Bulgaria, 12.09.2011
Name variants found and used in 6 hours (!) of EMM news analysis
26.08.2011, PM
7RANLP’2011, Hissar, Bulgaria, 12.09.2011
Agenda
• What is JRC-Names; What can it be used for
• Related work: other named entity (NE) resources
• How JRC-Names was produced
• Recognition of named entities in news reports in 20 languages
• Introduction to EMM
• Automatic mapping of name variants to the same entity
• Enrichment with Wikipedia variants
• Partial manual moderation
• Statistics on JRC-Names
• Programming details / Usage of the tool
• Solutions to capture morphological variants
• Further multilingual linguistic resources
8RANLP’2011, Hissar, Bulgaria, 12.09.2011
NER on news gathered by the Europe Media Monitor (EMM)
• ~ 3600 Sources (world-wide, with focus on Europe)
• ~ 3225 news sources (web portals)
• ~ 360 specialist medical sites
• ~ 20 commercial newswires
• Specialist pay-for sources (LexisMed)
• 24/7, updated every 10 minutes
• ~ 100,000 articles / day in ~ 50 languages
• Named Entity Recognition (NER) performed on 20 languages.
• Articles are fed into the various publicly accessible EMM applications:
9RANLP’2011, Hissar, Bulgaria, 12.09.2011
Multilingual NER in EMM – A brief overview
• Lookup of most frequent known names and their variants in all languages
• Database currently contains about 1,18 million names + 225.000 variants (status July 2011)
• Including morphological (and other) variants by pre-generating inflection forms
(Slovene example):
Tony(a|o|u|om|em|m|ju|jem|ja)?s+Blair(a|o|u|om|em|m|ju|jem|ja)
• Guessing new names using empirically-derived lexical patterns in 20 languages.
• President, Minister, Head of State, Sir, American
• “death of”, “[0-9]+-year-old”, …
• Known first names + uppercase words
• Identification of a current average of 1,000 unknown names per day.
• Only names found repeatedly will become known names (error reduction).
10RANLP’2011, Hissar, Bulgaria, 12.09.2011
Multilingual name recognition using lexical patterns
asesinato del exprimer ministro Rafic al-Hariri, que la oposición atribuyóes
l'assassinat de l'ex-dirigeant Rafic Hariri et le départ du chef de la diplomfr
na de moord op oud-premier Rafiq al-Hariri gingen gisteren bijna eennl
libanesischen Regierungschef Rafik Hariri vor einem Monat wichtige Bde
danjega libanonskega premiera Rafika Haririja. Libanonska opozicija sisl
möödumisele ekspeaminister Rafik al-Hariri surma põhjustanud pommiplet
death of former Prime Minister Rafik Hariri, blamed by many oppositionen
‫اغتيال‬‫السابق‬ ‫الوزراء‬ ‫رئيس‬‫الحريري‬ ‫رفيق‬‫سابقا‬ ‫حدث‬ ‫وما‬ ‫يهودية‬ ‫بأياد‬ar
Бывший премьер-министр Ливана Рафик Харири, которыйru
11RANLP’2011, Hissar, Bulgaria, 12.09.2011
Merging name variants for the same entity
• For all newly found name forms, detect whether they are a variant of an existing NE:
• Transliteration;
• Normalisation, using ~30 hand-written rules and removing vowels;
• Calculate similarity (threshold: 94%).
• Below threshold  new entity
20%
+
80%
Condition:
12RANLP’2011, Hissar, Bulgaria, 12.09.2011
Enriching the EMM data with Wikipedia name variants
• For frequent or highly visible names, manually launch a Wikipedia mining process.
• Check for each variant of a name whether there is a Wikipedia entry.
• New name variants, in all scripts, will be recognised in new EMM articles.
Хамид Карзай
Hamid Karzai
Hamid Karzaï
Hamid Karsai
‫كرزاي‬ ‫حامد‬
हामिद करजई
哈米德·卡尔扎伊
http://en.wikipedia.org/wiki/Hamid_Karzai
13RANLP’2011, Hissar, Bulgaria, 12.09.2011
Manual moderation of EMM name database
• Process is fully automatic, but it can be useful to make changes manually.
• Manual process only for frequent or important names (e.g. Nobel Prize winners):
• Name changes: (e.g. Cardinal Josef Ratzinger  Pope Benedict XVI)
• Correct NER mistakes (e.g. Genius Report, Opfer von Diskriminierung);
• Add new stop name parts (e.g. Monday, Report);
• Merge name variants with similarity below the threshold;
• Change the display name of an entity;
• Correct the entity type (PER, ORG, T, U, …);
• Launch Wikipedia mining process;
• …
• Caveat: Name database contains errors!
14RANLP’2011, Hissar, Bulgaria, 12.09.2011
Agenda
• What is JRC-Names; What can it be used for
• Related work: other named entity (NE) resources
• How JRC-Names was produced
• Recognition of named entities in news reports in 20 languages
• Introduction to EMM
• Automatic mapping of name variants to the same entity
• Enrichment with Wikipedia variants
• Partial manual moderation
• Statistics on JRC-Names
• Programming details / Usage of the tool
• Solutions to capture morphological variants
• Further multilingual linguistic resources
15RANLP’2011, Hissar, Bulgaria, 12.09.2011
Statistics on JRC-Names (1)
• JRC-Names include names from the EMM database if any of the following hold:
• Found in 5 or more news clusters;
• Manually verified;
• Retrieved from Wikipedia;
• Number of entries (status July 2011):
• 205,000 distinct names;
• 204,000 additional variants;
• ~3.2% names of organisations / events
• Number of variants:
• 413 variants for Muammar Gaddafi (entity 262)
• 256 variants for Mikhail Saakashvili (entity 472)
• 246 variants for Mahmoud Ahmadinejad (entity 101358)
• Grows by ~230 new entities and ~430 new variants per week.
Variant forms No. of entities
1 63.76%
2 22.52%
3 5.31%
10 or more 3760 entities
50 or more 242 entities
100 or more 37 entities
16RANLP’2011, Hissar, Bulgaria, 12.09.2011
Statistics on JRC-Names (2)
• Number of scripts: 27 Number of languages: ???
• News mentions names from
around the world.
• Frequency does not reflect origin
• European Union (10101) is most
frequent entity in German, and
second in English.
• It does not matter where a name
like Silvio Berlusconi comes from.
17RANLP’2011, Hissar, Bulgaria, 12.09.2011
Agenda
• What is JRC-Names; What can it be used for
• Related work: other named entity (NE) resources
• How JRC-Names was produced
• Recognition of named entities in news reports in 20 languages
• Introduction to EMM
• Automatic mapping of name variants to the same entity
• Enrichment with Wikipedia variants
• Partial manual moderation
• Statistics on JRC-Names
• Programming details / Usage of the tool
• Solutions to capture morphological variants
• Further multilingual linguistic resources
18RANLP’2011, Hissar, Bulgaria, 12.09.2011
Details about the JRC-Names software
• Java-implemented demonstrator
• Finite state automaton
• Reads the NE resource file entities.gzip (frequently updated)
• Searches for known names (and their variants) in UTF8-encoded text files. Returns:
• Numerical name identifier
• Main name for that entity
• Name string found in the text
• Position (Offset and string length)
• For any given name string, returns all variants.
• Software and NE resource file can be downloaded from
• http://langtech.jrc.ec.europe.eu/ , Section on ‘Resources’
• Free usage, according to accompanying end-user licence.
19RANLP’2011, Hissar, Bulgaria, 12.09.2011
Agenda
• What is JRC-Names; What can it be used for
• Related work: other named entity (NE) resources
• How JRC-Names was produced
• Recognition of named entities in news reports in 20 languages
• Introduction to EMM
• Automatic mapping of name variants to the same entity
• Enrichment with Wikipedia variants
• Partial manual moderation
• Statistics on JRC-Names
• Programming details / Usage of the tool
• Solutions to capture morphological variants
• Further multilingual linguistic resources
20RANLP’2011, Hissar, Bulgaria, 12.09.2011
Treatment of morphological inflections
• The recognition of morphological inflections used in EMM processing chain are
not currently part of JRC-Names.
• We are working on a solution to include morphological processing in a future
release of JRC-Names.
• Further variants will also be included more consistently:
• Hyphenation (e.g. Yves Saint-Laurent vs. Yves Saint Laurent)
• Names with and without name ‘infixes’ (e.g. Khan al Khalil vs. Khan Khalil)
• Abbreviations (e.g. Saint vs. St.)
• …
• Current solution: Add the approximately 45,000 full-forms of inflected names,
as found in EMM processing results since January 2011, to the resource file
entities.gzip
• This helps to recognise the most frequent inflection forms of the frequent names.
21RANLP’2011, Hissar, Bulgaria, 12.09.2011
Agenda
• What is JRC-Names; What can it be used for
• Related work: other named entity (NE) resources
• How JRC-Names was produced
• Recognition of named entities in news reports in 20 languages
• Introduction to EMM
• Automatic mapping of name variants to the same entity
• Enrichment with Wikipedia variants
• Partial manual moderation
• Statistics on JRC-Names
• Programming details / Usage of the tool
• Solutions to capture morphological variants
• Further multilingual linguistic resources
22RANLP’2011, Hissar, Bulgaria, 12.09.2011
Further JRC/EC-provided multilingual linguistic resources
• JRC-Acquis (2006): 1 billion word parallel corpus in 22 languages
• DGT-TM (2007): Translation Memory in 22 languages; up to 2 million segments
•  DGT-TM-2011 (forthcoming): 23 languages; 4 million segments? Yearly updates
• JEX (JRC Eurovoc Indexer) (forthcoming): software to automatically label texts
according to the thousands of categories of the Eurovoc thesaurus; 23 languages.
• Further smaller resources:
• Multilingual summary evaluation data (2010): 4 clusters for each of 7 languages
• Sentiment-annotated collection of quotations (2010): English (German forthcoming)
• Multilingual Named Entity-annotated parallel corpus (forthcoming)
• Available at http://langtech.jrc.ec.europa.eu/, section on ‘Resources’
23RANLP’2011, Hissar, Bulgaria, 12.09.2011
JRC-Names: A freely available, highly multilingual
named entity resource
Hissar, Bulgaria, 12 September 2011
Ralf Steinberger, Bruno Pouliquen, Mijail Kabadjov,
Jenya Belyaeva & Erik van der Goot
Technical details and publications: http://langtech.jrc.ec.europa.eu/
Applications: http://emm.newbrief.eu/overview.html

Weitere ähnliche Inhalte

Was ist angesagt?

Publishing Data Using Semantic Web Technologies
Publishing Data Using Semantic Web TechnologiesPublishing Data Using Semantic Web Technologies
Publishing Data Using Semantic Web TechnologiesNikolaos Konstantinou
 
Publishing and Using Linked Open Data - Day 2
Publishing and Using Linked Open Data - Day 2Publishing and Using Linked Open Data - Day 2
Publishing and Using Linked Open Data - Day 2Richard Urban
 
Federated data stores using semantic web technology
Federated data stores using semantic web technologyFederated data stores using semantic web technology
Federated data stores using semantic web technologySteve Ray
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011Peter Mika
 
Linguistic markup and transclusion processing in XML documents
Linguistic markup and transclusion processing in XML documentsLinguistic markup and transclusion processing in XML documents
Linguistic markup and transclusion processing in XML documentsSimon Dew
 
A Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the WebA Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the WebOlaf Hartig
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLPRobert Viseur
 
Building OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web toolsBuilding OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web toolsMelanie Courtot
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014eswcsummerschool
 
Using Public RDF Resources in Neo4j
Using Public RDF Resources in Neo4jUsing Public RDF Resources in Neo4j
Using Public RDF Resources in Neo4jNeo4j
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Oscar Corcho
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Gema Ramirez-Sanchez
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkGezim Sejdiu
 
Word Occurrence Based Extraction of Work Contributors from Statements of Resp...
Word Occurrence Based Extraction of Work Contributors from Statements of Resp...Word Occurrence Based Extraction of Work Contributors from Statements of Resp...
Word Occurrence Based Extraction of Work Contributors from Statements of Resp...The European Library
 

Was ist angesagt? (16)

Publishing Data Using Semantic Web Technologies
Publishing Data Using Semantic Web TechnologiesPublishing Data Using Semantic Web Technologies
Publishing Data Using Semantic Web Technologies
 
Publishing and Using Linked Open Data - Day 2
Publishing and Using Linked Open Data - Day 2Publishing and Using Linked Open Data - Day 2
Publishing and Using Linked Open Data - Day 2
 
Federated data stores using semantic web technology
Federated data stores using semantic web technologyFederated data stores using semantic web technology
Federated data stores using semantic web technology
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011
 
Linguistic markup and transclusion processing in XML documents
Linguistic markup and transclusion processing in XML documentsLinguistic markup and transclusion processing in XML documents
Linguistic markup and transclusion processing in XML documents
 
A Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the WebA Context-Based Semantics for SPARQL Property Paths over the Web
A Context-Based Semantics for SPARQL Property Paths over the Web
 
Presentation of OpenNLP
Presentation of OpenNLPPresentation of OpenNLP
Presentation of OpenNLP
 
Building OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web toolsBuilding OBO Foundry ontology using semantic web tools
Building OBO Foundry ontology using semantic web tools
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 
cldr_overview
cldr_overviewcldr_overview
cldr_overview
 
Using Public RDF Resources in Neo4j
Using Public RDF Resources in Neo4jUsing Public RDF Resources in Neo4j
Using Public RDF Resources in Neo4j
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...
 
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talkDistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
 
Word Occurrence Based Extraction of Work Contributors from Statements of Resp...
Word Occurrence Based Extraction of Work Contributors from Statements of Resp...Word Occurrence Based Extraction of Work Contributors from Statements of Resp...
Word Occurrence Based Extraction of Work Contributors from Statements of Resp...
 

Andere mochten auch

Cellar: Semantic Database of EU Law - Diplohack Brussels
Cellar: Semantic Database of EU Law - Diplohack BrusselsCellar: Semantic Database of EU Law - Diplohack Brussels
Cellar: Semantic Database of EU Law - Diplohack BrusselsOpen Knowledge Belgium
 
Open Data Governance as an Integral Part of a Smart City: How to start with?
Open Data Governance as an Integral Part of a Smart City: How to start with?Open Data Governance as an Integral Part of a Smart City: How to start with?
Open Data Governance as an Integral Part of a Smart City: How to start with?Open Knowledge Belgium
 
Estrategia smart city ajuntament bcn - la-salle_bcn
Estrategia smart city   ajuntament bcn - la-salle_bcnEstrategia smart city   ajuntament bcn - la-salle_bcn
Estrategia smart city ajuntament bcn - la-salle_bcnLa Salle BCN
 
Beyond the single act of creation - Sustainability of hackathons
Beyond the single act of creation - Sustainability of hackathonsBeyond the single act of creation - Sustainability of hackathons
Beyond the single act of creation - Sustainability of hackathonsOpen Knowledge Belgium
 
The economic value behind (Linked) Open Data - Nikolaos Loutas
The economic value behind (Linked) Open Data - Nikolaos LoutasThe economic value behind (Linked) Open Data - Nikolaos Loutas
The economic value behind (Linked) Open Data - Nikolaos LoutasOpen Knowledge Belgium
 
When you are given Open Science, what will you do with it?
When you are given Open Science, what will you do with it?When you are given Open Science, what will you do with it?
When you are given Open Science, what will you do with it?Open Knowledge Belgium
 
Smart City Analytics (español)
Smart City Analytics (español)Smart City Analytics (español)
Smart City Analytics (español)Stratebi
 
RSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PME
RSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PMERSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PME
RSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PMEAdm Medef
 
The economic value behind (Linked) Open Data - Toon Vanagt
 The economic value behind (Linked) Open Data - Toon Vanagt The economic value behind (Linked) Open Data - Toon Vanagt
The economic value behind (Linked) Open Data - Toon VanagtOpen Knowledge Belgium
 
Rapport de gestion 2015 medef
Rapport de gestion 2015 medefRapport de gestion 2015 medef
Rapport de gestion 2015 medefAdm Medef
 
Caso Práctico: Ciudad de Málaga. Smart Cities
Caso Práctico: Ciudad de Málaga. Smart CitiesCaso Práctico: Ciudad de Málaga. Smart Cities
Caso Práctico: Ciudad de Málaga. Smart CitiesDavid Bueno Vallejo
 
Supporting the Marine Strategy Framework Directive - Diplohack Brussels
Supporting the Marine Strategy Framework Directive - Diplohack BrusselsSupporting the Marine Strategy Framework Directive - Diplohack Brussels
Supporting the Marine Strategy Framework Directive - Diplohack BrusselsOpen Knowledge Belgium
 
Data Derby, Belgium versus the Netherlands
Data Derby, Belgium versus the NetherlandsData Derby, Belgium versus the Netherlands
Data Derby, Belgium versus the NetherlandsOpen Knowledge Belgium
 

Andere mochten auch (20)

Cellar: Semantic Database of EU Law - Diplohack Brussels
Cellar: Semantic Database of EU Law - Diplohack BrusselsCellar: Semantic Database of EU Law - Diplohack Brussels
Cellar: Semantic Database of EU Law - Diplohack Brussels
 
Unvote - Diplohack Brussels
 Unvote - Diplohack Brussels Unvote - Diplohack Brussels
Unvote - Diplohack Brussels
 
EU SectorLex - Diplohack Brussels
EU SectorLex - Diplohack BrusselsEU SectorLex - Diplohack Brussels
EU SectorLex - Diplohack Brussels
 
Open data De Lijn
Open data De LijnOpen data De Lijn
Open data De Lijn
 
refEUgo - Diplohack Brussels
refEUgo - Diplohack BrusselsrefEUgo - Diplohack Brussels
refEUgo - Diplohack Brussels
 
Open Data Governance as an Integral Part of a Smart City: How to start with?
Open Data Governance as an Integral Part of a Smart City: How to start with?Open Data Governance as an Integral Part of a Smart City: How to start with?
Open Data Governance as an Integral Part of a Smart City: How to start with?
 
Smartcities
SmartcitiesSmartcities
Smartcities
 
Estrategia smart city ajuntament bcn - la-salle_bcn
Estrategia smart city   ajuntament bcn - la-salle_bcnEstrategia smart city   ajuntament bcn - la-salle_bcn
Estrategia smart city ajuntament bcn - la-salle_bcn
 
Beyond the single act of creation - Sustainability of hackathons
Beyond the single act of creation - Sustainability of hackathonsBeyond the single act of creation - Sustainability of hackathons
Beyond the single act of creation - Sustainability of hackathons
 
EnergyZip - Diplohack Brussels
EnergyZip - Diplohack BrusselsEnergyZip - Diplohack Brussels
EnergyZip - Diplohack Brussels
 
The economic value behind (Linked) Open Data - Nikolaos Loutas
The economic value behind (Linked) Open Data - Nikolaos LoutasThe economic value behind (Linked) Open Data - Nikolaos Loutas
The economic value behind (Linked) Open Data - Nikolaos Loutas
 
Ciudades inteligentes
Ciudades inteligentesCiudades inteligentes
Ciudades inteligentes
 
When you are given Open Science, what will you do with it?
When you are given Open Science, what will you do with it?When you are given Open Science, what will you do with it?
When you are given Open Science, what will you do with it?
 
Smart City Analytics (español)
Smart City Analytics (español)Smart City Analytics (español)
Smart City Analytics (español)
 
RSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PME
RSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PMERSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PME
RSE : le Medef et EcoVadis publient un guide pratique pour accompagner les PME
 
The economic value behind (Linked) Open Data - Toon Vanagt
 The economic value behind (Linked) Open Data - Toon Vanagt The economic value behind (Linked) Open Data - Toon Vanagt
The economic value behind (Linked) Open Data - Toon Vanagt
 
Rapport de gestion 2015 medef
Rapport de gestion 2015 medefRapport de gestion 2015 medef
Rapport de gestion 2015 medef
 
Caso Práctico: Ciudad de Málaga. Smart Cities
Caso Práctico: Ciudad de Málaga. Smart CitiesCaso Práctico: Ciudad de Málaga. Smart Cities
Caso Práctico: Ciudad de Málaga. Smart Cities
 
Supporting the Marine Strategy Framework Directive - Diplohack Brussels
Supporting the Marine Strategy Framework Directive - Diplohack BrusselsSupporting the Marine Strategy Framework Directive - Diplohack Brussels
Supporting the Marine Strategy Framework Directive - Diplohack Brussels
 
Data Derby, Belgium versus the Netherlands
Data Derby, Belgium versus the NetherlandsData Derby, Belgium versus the Netherlands
Data Derby, Belgium versus the Netherlands
 

Ähnlich wie JRC-Names - EC - Diplohack Datamarket

Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Project
 
Importing life science at a into Neo4j
Importing life science at a into Neo4jImporting life science at a into Neo4j
Importing life science at a into Neo4jSimon Jupp
 
RDF and other linked data standards — how to make use of big localization data
RDF and other linked data standards — how to make use of big localization dataRDF and other linked data standards — how to make use of big localization data
RDF and other linked data standards — how to make use of big localization dataDave Lewis
 
Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashupsgiurca
 
Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?Federico Gobbo
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppSimon Jupp
 
Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...The European Library
 
Bringing parliamentary debates to the Semantic Web
Bringing parliamentary debates to the Semantic WebBringing parliamentary debates to the Semantic Web
Bringing parliamentary debates to the Semantic WebLaura Hollink
 
Towards a Web Search Service for Minority Language Communities
Towards a Web Search Service for Minority Language CommunitiesTowards a Web Search Service for Minority Language Communities
Towards a Web Search Service for Minority Language CommunitiesBaden Hughes
 
Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...Marton Nemeth
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportAlexandre Rademaker
 
RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)Alison Hitchens
 
Harnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performanceHarnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performanceoanainel
 
Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...BOBCATSSS 2017
 
Oc wg-nif-20130711
Oc wg-nif-20130711Oc wg-nif-20130711
Oc wg-nif-20130711STIinnsbruck
 
Promoting the Use of Basque via Language Technology
Promoting the Use of Basque via Language TechnologyPromoting the Use of Basque via Language Technology
Promoting the Use of Basque via Language Technologytechiaith
 

Ähnlich wie JRC-Names - EC - Diplohack Datamarket (20)

Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
 
LSDI.pptx
LSDI.pptxLSDI.pptx
LSDI.pptx
 
Importing life science at a into Neo4j
Importing life science at a into Neo4jImporting life science at a into Neo4j
Importing life science at a into Neo4j
 
RDF and other linked data standards — how to make use of big localization data
RDF and other linked data standards — how to make use of big localization dataRDF and other linked data standards — how to make use of big localization data
RDF and other linked data standards — how to make use of big localization data
 
Semantic Pipes and Semantic Mashups
Semantic Pipes and Semantic MashupsSemantic Pipes and Semantic Mashups
Semantic Pipes and Semantic Mashups
 
Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?
 
20110728 datalift-rpi-troy
20110728 datalift-rpi-troy20110728 datalift-rpi-troy
20110728 datalift-rpi-troy
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-jupp
 
Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...Linked Data and cultural heritage data: an overview of the approaches from Eu...
Linked Data and cultural heritage data: an overview of the approaches from Eu...
 
Bringing parliamentary debates to the Semantic Web
Bringing parliamentary debates to the Semantic WebBringing parliamentary debates to the Semantic Web
Bringing parliamentary debates to the Semantic Web
 
Towards a Web Search Service for Minority Language Communities
Towards a Web Search Service for Minority Language CommunitiesTowards a Web Search Service for Minority Language Communities
Towards a Web Search Service for Minority Language Communities
 
Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...Widening the limits of cognitive reception with online digital library graph ...
Widening the limits of cognitive reception with online digital library graph ...
 
OpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project ReportOpenWordnet-PT: A Project Report
OpenWordnet-PT: A Project Report
 
RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)RDA 101: an introduction to RDA (2012)
RDA 101: an introduction to RDA (2012)
 
Harnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performanceHarnessing diversity in crowds and machines for better ner performance
Harnessing diversity in crowds and machines for better ner performance
 
Tillett, Hillmann, and Moen, "Bibliographic Control Alphabet Soup: AACR to R...
Tillett, Hillmann, and Moen, "Bibliographic Control Alphabet Soup:  AACR to R...Tillett, Hillmann, and Moen, "Bibliographic Control Alphabet Soup:  AACR to R...
Tillett, Hillmann, and Moen, "Bibliographic Control Alphabet Soup: AACR to R...
 
Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...Nemeth Marton - Widening the limits of cognitive reception with online digita...
Nemeth Marton - Widening the limits of cognitive reception with online digita...
 
CROSER
CROSERCROSER
CROSER
 
Oc wg-nif-20130711
Oc wg-nif-20130711Oc wg-nif-20130711
Oc wg-nif-20130711
 
Promoting the Use of Basque via Language Technology
Promoting the Use of Basque via Language TechnologyPromoting the Use of Basque via Language Technology
Promoting the Use of Basque via Language Technology
 

Mehr von Open Knowledge Belgium

A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)
A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)
A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)Open Knowledge Belgium
 
Smarter by Open Data: Process and Practice in Flevoland (NL)
Smarter by Open Data: Process and Practice in Flevoland (NL)Smarter by Open Data: Process and Practice in Flevoland (NL)
Smarter by Open Data: Process and Practice in Flevoland (NL)Open Knowledge Belgium
 
Smart Flanders: Tackling urban challenges through Open Data
Smart Flanders: Tackling urban challenges through Open DataSmart Flanders: Tackling urban challenges through Open Data
Smart Flanders: Tackling urban challenges through Open DataOpen Knowledge Belgium
 
EIF and NIFO connecting public administrations, businesses, and citizens
EIF and NIFO connecting public administrations, businesses, and citizensEIF and NIFO connecting public administrations, businesses, and citizens
EIF and NIFO connecting public administrations, businesses, and citizensOpen Knowledge Belgium
 
Connecting Open data for solving the fiscal transparency puzzle in the EU
Connecting Open data for solving the fiscal transparency puzzle in the EUConnecting Open data for solving the fiscal transparency puzzle in the EU
Connecting Open data for solving the fiscal transparency puzzle in the EUOpen Knowledge Belgium
 
Open Government and Networked European Democracy
Open Government and Networked European DemocracyOpen Government and Networked European Democracy
Open Government and Networked European DemocracyOpen Knowledge Belgium
 
Mundaneum Factories for Open Tokenomics
Mundaneum Factories for Open TokenomicsMundaneum Factories for Open Tokenomics
Mundaneum Factories for Open TokenomicsOpen Knowledge Belgium
 
MIRVA: The European Open Recognition Project
MIRVA: The European Open Recognition ProjectMIRVA: The European Open Recognition Project
MIRVA: The European Open Recognition ProjectOpen Knowledge Belgium
 
Bike for Brussels - Open Summer of Code 2017
Bike for Brussels - Open Summer of Code 2017Bike for Brussels - Open Summer of Code 2017
Bike for Brussels - Open Summer of Code 2017Open Knowledge Belgium
 
Traffic safety - answering tough questions with open data
Traffic safety - answering tough questions with open dataTraffic safety - answering tough questions with open data
Traffic safety - answering tough questions with open dataOpen Knowledge Belgium
 
Eliminating data roadbloacks to get by traffic roadblocks without pain
Eliminating data roadbloacks to get by traffic roadblocks without painEliminating data roadbloacks to get by traffic roadblocks without pain
Eliminating data roadbloacks to get by traffic roadblocks without painOpen Knowledge Belgium
 
Linked Open Data in limbo: Open cultural heritage resources
Linked Open Data in limbo: Open cultural heritage resourcesLinked Open Data in limbo: Open cultural heritage resources
Linked Open Data in limbo: Open cultural heritage resourcesOpen Knowledge Belgium
 
A journey to Linked Open Touristic Data
A journey to Linked Open Touristic DataA journey to Linked Open Touristic Data
A journey to Linked Open Touristic DataOpen Knowledge Belgium
 
How we use the massive open lidar dataset for the benfit of our clients
How we use the massive open lidar dataset for the benfit of our clientsHow we use the massive open lidar dataset for the benfit of our clients
How we use the massive open lidar dataset for the benfit of our clientsOpen Knowledge Belgium
 
mu.semte.ch: A transitional architecture for Linked Data
mu.semte.ch: A transitional architecture for Linked Datamu.semte.ch: A transitional architecture for Linked Data
mu.semte.ch: A transitional architecture for Linked DataOpen Knowledge Belgium
 
The role and value of making data inventories
The role and value of making data inventoriesThe role and value of making data inventories
The role and value of making data inventoriesOpen Knowledge Belgium
 

Mehr von Open Knowledge Belgium (20)

Open Data Stories You haven't heard!
Open Data Stories You haven't heard!Open Data Stories You haven't heard!
Open Data Stories You haven't heard!
 
A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)
A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)
A​ FUNUMENTARY:​ Take what you can, give nothing back...​ ​(NOT)
 
Smarter by Open Data: Process and Practice in Flevoland (NL)
Smarter by Open Data: Process and Practice in Flevoland (NL)Smarter by Open Data: Process and Practice in Flevoland (NL)
Smarter by Open Data: Process and Practice in Flevoland (NL)
 
Open Knowledge for Social Innovation
Open Knowledge for Social InnovationOpen Knowledge for Social Innovation
Open Knowledge for Social Innovation
 
Smart Flanders: Tackling urban challenges through Open Data
Smart Flanders: Tackling urban challenges through Open DataSmart Flanders: Tackling urban challenges through Open Data
Smart Flanders: Tackling urban challenges through Open Data
 
EIF and NIFO connecting public administrations, businesses, and citizens
EIF and NIFO connecting public administrations, businesses, and citizensEIF and NIFO connecting public administrations, businesses, and citizens
EIF and NIFO connecting public administrations, businesses, and citizens
 
Connecting Open data for solving the fiscal transparency puzzle in the EU
Connecting Open data for solving the fiscal transparency puzzle in the EUConnecting Open data for solving the fiscal transparency puzzle in the EU
Connecting Open data for solving the fiscal transparency puzzle in the EU
 
Open Government and Networked European Democracy
Open Government and Networked European DemocracyOpen Government and Networked European Democracy
Open Government and Networked European Democracy
 
Mundaneum Factories for Open Tokenomics
Mundaneum Factories for Open TokenomicsMundaneum Factories for Open Tokenomics
Mundaneum Factories for Open Tokenomics
 
MIRVA: The European Open Recognition Project
MIRVA: The European Open Recognition ProjectMIRVA: The European Open Recognition Project
MIRVA: The European Open Recognition Project
 
Bike for Brussels - Open Summer of Code 2017
Bike for Brussels - Open Summer of Code 2017Bike for Brussels - Open Summer of Code 2017
Bike for Brussels - Open Summer of Code 2017
 
The story behind SNCB alerts
The story behind SNCB alertsThe story behind SNCB alerts
The story behind SNCB alerts
 
Traffic safety - answering tough questions with open data
Traffic safety - answering tough questions with open dataTraffic safety - answering tough questions with open data
Traffic safety - answering tough questions with open data
 
Eliminating data roadbloacks to get by traffic roadblocks without pain
Eliminating data roadbloacks to get by traffic roadblocks without painEliminating data roadbloacks to get by traffic roadblocks without pain
Eliminating data roadbloacks to get by traffic roadblocks without pain
 
Linked Open Data in limbo: Open cultural heritage resources
Linked Open Data in limbo: Open cultural heritage resourcesLinked Open Data in limbo: Open cultural heritage resources
Linked Open Data in limbo: Open cultural heritage resources
 
A journey to Linked Open Touristic Data
A journey to Linked Open Touristic DataA journey to Linked Open Touristic Data
A journey to Linked Open Touristic Data
 
How we use the massive open lidar dataset for the benfit of our clients
How we use the massive open lidar dataset for the benfit of our clientsHow we use the massive open lidar dataset for the benfit of our clients
How we use the massive open lidar dataset for the benfit of our clients
 
mu.semte.ch: A transitional architecture for Linked Data
mu.semte.ch: A transitional architecture for Linked Datamu.semte.ch: A transitional architecture for Linked Data
mu.semte.ch: A transitional architecture for Linked Data
 
Linked Open Chatbots
Linked Open ChatbotsLinked Open Chatbots
Linked Open Chatbots
 
The role and value of making data inventories
The role and value of making data inventoriesThe role and value of making data inventories
The role and value of making data inventories
 

Kürzlich hochgeladen

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

JRC-Names - EC - Diplohack Datamarket

  • 1. 1RANLP’2011, Hissar, Bulgaria, 12.09.2011 JRC-Names: A freely available, highly multilingual named entity resource Hissar, Bulgaria, 12 September 2011 Ralf Steinberger, Bruno Pouliquen, Mijail Kabadjov, Jenya Belyaeva & Erik van der Goot Technical details and publications: http://langtech.jrc.ec.europa.eu/ Applications: http://emm.newbrief.eu/overview.html
  • 2. 2RANLP’2011, Hissar, Bulgaria, 12.09.2011 Agenda • What is JRC-Names; What can it be used for • Related work: other named entity (NE) resources • How JRC-Names was produced • Recognition of named entities in news reports in 20 languages • Introduction to EMM • Automatic mapping of name variants to the same entity • Enrichment with Wikipedia variants • Partial manual moderation • Statistics on JRC-Names • Programming details / Usage of the tool • Solutions to capture morphological variants • Further multilingual linguistic resources
  • 3. 3RANLP’2011, Hissar, Bulgaria, 12.09.2011 What is JRC-Names? • JRC-Names consists of: • Lists of names and their many spelling variants, • ~205,000 person and organisation names plus • ~204,000 name spelling variants • In 27 scripts and many more languages • Software to recognise these names in multilingual text, with offset and unique name identifier • Download from http://langtech.jrc.ec.europa.eu/
  • 4. 4RANLP’2011, Hissar, Bulgaria, 12.09.2011 Possible uses of JRC-Names • Standardise name spellings in databases, text collections and the internet for improved retrieval (Stern & Sagot 2010) • Improve Machine Translation – names must be treated differently from other words (Babych & Hartley 2003; Steinberger & Pouliquen 2009) • Use as input to learn automatic transliteration rules (e.g. Pouliquen 2009) • Use output of JRC-Names as seeds to learn NER rules (e.g. Buchholz & van den Bosch 2000) • Social networks are less biased by national viewpoints if based on information extracted from multilingual texts • NER results are useful for other text mining tasks (opinion mining; co-reference resolution; summarisation; topic detection and tracking; cross-lingual linking of related documents across languages; …)
  • 5. 5RANLP’2011, Hissar, Bulgaria, 12.09.2011 Related work – other multilingual (ml) NE resources • Wentlant et al. (2008) – built a ml NE repository based on Wikipedia links and case information; 2.5 Mio English names, 250K German, 3K Swahili, … • Toral et al. (2008) – built Named Entity WordNet by searching NEs in WordNet and Wikipedia: 310K entities, including 278K persons • Stern & Sagot (2010) – exploit French Wikipedia and GeoNames to produce French resource: 263K person names + 883K variants. • Maurel (2009) –produced Prolexbase mostly manually: 75K entities of all types •  Most resources are based on Wikipedia • Strong at providing cross-lingual and cross-script variants • Offers only few other spelling variants • No morphological inflections • JRC-Names contains mostly spelling variants from real-life text, enriched with Wikipedia – up to 413 variants for the same NE.
  • 6. 6RANLP’2011, Hissar, Bulgaria, 12.09.2011 Name variants found and used in 6 hours (!) of EMM news analysis 26.08.2011, PM
  • 7. 7RANLP’2011, Hissar, Bulgaria, 12.09.2011 Agenda • What is JRC-Names; What can it be used for • Related work: other named entity (NE) resources • How JRC-Names was produced • Recognition of named entities in news reports in 20 languages • Introduction to EMM • Automatic mapping of name variants to the same entity • Enrichment with Wikipedia variants • Partial manual moderation • Statistics on JRC-Names • Programming details / Usage of the tool • Solutions to capture morphological variants • Further multilingual linguistic resources
  • 8. 8RANLP’2011, Hissar, Bulgaria, 12.09.2011 NER on news gathered by the Europe Media Monitor (EMM) • ~ 3600 Sources (world-wide, with focus on Europe) • ~ 3225 news sources (web portals) • ~ 360 specialist medical sites • ~ 20 commercial newswires • Specialist pay-for sources (LexisMed) • 24/7, updated every 10 minutes • ~ 100,000 articles / day in ~ 50 languages • Named Entity Recognition (NER) performed on 20 languages. • Articles are fed into the various publicly accessible EMM applications:
  • 9. 9RANLP’2011, Hissar, Bulgaria, 12.09.2011 Multilingual NER in EMM – A brief overview • Lookup of most frequent known names and their variants in all languages • Database currently contains about 1,18 million names + 225.000 variants (status July 2011) • Including morphological (and other) variants by pre-generating inflection forms (Slovene example): Tony(a|o|u|om|em|m|ju|jem|ja)?s+Blair(a|o|u|om|em|m|ju|jem|ja) • Guessing new names using empirically-derived lexical patterns in 20 languages. • President, Minister, Head of State, Sir, American • “death of”, “[0-9]+-year-old”, … • Known first names + uppercase words • Identification of a current average of 1,000 unknown names per day. • Only names found repeatedly will become known names (error reduction).
  • 10. 10RANLP’2011, Hissar, Bulgaria, 12.09.2011 Multilingual name recognition using lexical patterns asesinato del exprimer ministro Rafic al-Hariri, que la oposición atribuyóes l'assassinat de l'ex-dirigeant Rafic Hariri et le départ du chef de la diplomfr na de moord op oud-premier Rafiq al-Hariri gingen gisteren bijna eennl libanesischen Regierungschef Rafik Hariri vor einem Monat wichtige Bde danjega libanonskega premiera Rafika Haririja. Libanonska opozicija sisl möödumisele ekspeaminister Rafik al-Hariri surma põhjustanud pommiplet death of former Prime Minister Rafik Hariri, blamed by many oppositionen ‫اغتيال‬‫السابق‬ ‫الوزراء‬ ‫رئيس‬‫الحريري‬ ‫رفيق‬‫سابقا‬ ‫حدث‬ ‫وما‬ ‫يهودية‬ ‫بأياد‬ar Бывший премьер-министр Ливана Рафик Харири, которыйru
  • 11. 11RANLP’2011, Hissar, Bulgaria, 12.09.2011 Merging name variants for the same entity • For all newly found name forms, detect whether they are a variant of an existing NE: • Transliteration; • Normalisation, using ~30 hand-written rules and removing vowels; • Calculate similarity (threshold: 94%). • Below threshold  new entity 20% + 80% Condition:
  • 12. 12RANLP’2011, Hissar, Bulgaria, 12.09.2011 Enriching the EMM data with Wikipedia name variants • For frequent or highly visible names, manually launch a Wikipedia mining process. • Check for each variant of a name whether there is a Wikipedia entry. • New name variants, in all scripts, will be recognised in new EMM articles. Хамид Карзай Hamid Karzai Hamid Karzaï Hamid Karsai ‫كرزاي‬ ‫حامد‬ हामिद करजई 哈米德·卡尔扎伊 http://en.wikipedia.org/wiki/Hamid_Karzai
  • 13. 13RANLP’2011, Hissar, Bulgaria, 12.09.2011 Manual moderation of EMM name database • Process is fully automatic, but it can be useful to make changes manually. • Manual process only for frequent or important names (e.g. Nobel Prize winners): • Name changes: (e.g. Cardinal Josef Ratzinger  Pope Benedict XVI) • Correct NER mistakes (e.g. Genius Report, Opfer von Diskriminierung); • Add new stop name parts (e.g. Monday, Report); • Merge name variants with similarity below the threshold; • Change the display name of an entity; • Correct the entity type (PER, ORG, T, U, …); • Launch Wikipedia mining process; • … • Caveat: Name database contains errors!
  • 14. 14RANLP’2011, Hissar, Bulgaria, 12.09.2011 Agenda • What is JRC-Names; What can it be used for • Related work: other named entity (NE) resources • How JRC-Names was produced • Recognition of named entities in news reports in 20 languages • Introduction to EMM • Automatic mapping of name variants to the same entity • Enrichment with Wikipedia variants • Partial manual moderation • Statistics on JRC-Names • Programming details / Usage of the tool • Solutions to capture morphological variants • Further multilingual linguistic resources
  • 15. 15RANLP’2011, Hissar, Bulgaria, 12.09.2011 Statistics on JRC-Names (1) • JRC-Names include names from the EMM database if any of the following hold: • Found in 5 or more news clusters; • Manually verified; • Retrieved from Wikipedia; • Number of entries (status July 2011): • 205,000 distinct names; • 204,000 additional variants; • ~3.2% names of organisations / events • Number of variants: • 413 variants for Muammar Gaddafi (entity 262) • 256 variants for Mikhail Saakashvili (entity 472) • 246 variants for Mahmoud Ahmadinejad (entity 101358) • Grows by ~230 new entities and ~430 new variants per week. Variant forms No. of entities 1 63.76% 2 22.52% 3 5.31% 10 or more 3760 entities 50 or more 242 entities 100 or more 37 entities
  • 16. 16RANLP’2011, Hissar, Bulgaria, 12.09.2011 Statistics on JRC-Names (2) • Number of scripts: 27 Number of languages: ??? • News mentions names from around the world. • Frequency does not reflect origin • European Union (10101) is most frequent entity in German, and second in English. • It does not matter where a name like Silvio Berlusconi comes from.
  • 17. 17RANLP’2011, Hissar, Bulgaria, 12.09.2011 Agenda • What is JRC-Names; What can it be used for • Related work: other named entity (NE) resources • How JRC-Names was produced • Recognition of named entities in news reports in 20 languages • Introduction to EMM • Automatic mapping of name variants to the same entity • Enrichment with Wikipedia variants • Partial manual moderation • Statistics on JRC-Names • Programming details / Usage of the tool • Solutions to capture morphological variants • Further multilingual linguistic resources
  • 18. 18RANLP’2011, Hissar, Bulgaria, 12.09.2011 Details about the JRC-Names software • Java-implemented demonstrator • Finite state automaton • Reads the NE resource file entities.gzip (frequently updated) • Searches for known names (and their variants) in UTF8-encoded text files. Returns: • Numerical name identifier • Main name for that entity • Name string found in the text • Position (Offset and string length) • For any given name string, returns all variants. • Software and NE resource file can be downloaded from • http://langtech.jrc.ec.europe.eu/ , Section on ‘Resources’ • Free usage, according to accompanying end-user licence.
  • 19. 19RANLP’2011, Hissar, Bulgaria, 12.09.2011 Agenda • What is JRC-Names; What can it be used for • Related work: other named entity (NE) resources • How JRC-Names was produced • Recognition of named entities in news reports in 20 languages • Introduction to EMM • Automatic mapping of name variants to the same entity • Enrichment with Wikipedia variants • Partial manual moderation • Statistics on JRC-Names • Programming details / Usage of the tool • Solutions to capture morphological variants • Further multilingual linguistic resources
  • 20. 20RANLP’2011, Hissar, Bulgaria, 12.09.2011 Treatment of morphological inflections • The recognition of morphological inflections used in EMM processing chain are not currently part of JRC-Names. • We are working on a solution to include morphological processing in a future release of JRC-Names. • Further variants will also be included more consistently: • Hyphenation (e.g. Yves Saint-Laurent vs. Yves Saint Laurent) • Names with and without name ‘infixes’ (e.g. Khan al Khalil vs. Khan Khalil) • Abbreviations (e.g. Saint vs. St.) • … • Current solution: Add the approximately 45,000 full-forms of inflected names, as found in EMM processing results since January 2011, to the resource file entities.gzip • This helps to recognise the most frequent inflection forms of the frequent names.
  • 21. 21RANLP’2011, Hissar, Bulgaria, 12.09.2011 Agenda • What is JRC-Names; What can it be used for • Related work: other named entity (NE) resources • How JRC-Names was produced • Recognition of named entities in news reports in 20 languages • Introduction to EMM • Automatic mapping of name variants to the same entity • Enrichment with Wikipedia variants • Partial manual moderation • Statistics on JRC-Names • Programming details / Usage of the tool • Solutions to capture morphological variants • Further multilingual linguistic resources
  • 22. 22RANLP’2011, Hissar, Bulgaria, 12.09.2011 Further JRC/EC-provided multilingual linguistic resources • JRC-Acquis (2006): 1 billion word parallel corpus in 22 languages • DGT-TM (2007): Translation Memory in 22 languages; up to 2 million segments •  DGT-TM-2011 (forthcoming): 23 languages; 4 million segments? Yearly updates • JEX (JRC Eurovoc Indexer) (forthcoming): software to automatically label texts according to the thousands of categories of the Eurovoc thesaurus; 23 languages. • Further smaller resources: • Multilingual summary evaluation data (2010): 4 clusters for each of 7 languages • Sentiment-annotated collection of quotations (2010): English (German forthcoming) • Multilingual Named Entity-annotated parallel corpus (forthcoming) • Available at http://langtech.jrc.ec.europa.eu/, section on ‘Resources’
  • 23. 23RANLP’2011, Hissar, Bulgaria, 12.09.2011 JRC-Names: A freely available, highly multilingual named entity resource Hissar, Bulgaria, 12 September 2011 Ralf Steinberger, Bruno Pouliquen, Mijail Kabadjov, Jenya Belyaeva & Erik van der Goot Technical details and publications: http://langtech.jrc.ec.europa.eu/ Applications: http://emm.newbrief.eu/overview.html