Europeana and Enrichment 
Antoine Isaac 
Europeana PGA meeting 
Sept 25th, 2014
Semantic extraction? 
Recognizing and extracting named entities and keywords, 
analyzing the sentiment of a document, extracting facts and 
relation between those facts and named entities, categorizing 
documents, recognizing and extracting concepts and finally 
adding them as metadata or annotations. 
Market study on technical options for semantic feature extraction 
http://pro.europeana.eu/web/network/europeana-tech/-/wiki/Main/Market+study+on+technical+options+for+semantic+feature+extraction
Semantic enrichment? 
In a linked data environment, enrichment refers to the creation 
of new links between the enriched resources and another data 
resource. […] link to controlled vocabularies or authority files 
(contextualization) 
Automatic Enrichments with Controlled Vocabularies in Europeana: 
Challenges and Consequences 
Stiller, Petras, Gäde, Isaac. Euromed 2014
Semantic enrichment? 
• Analysis: the pre-enrichment phase focuses on the analysis of the 
metadata fields in the original resource descriptions, the selection of 
potential resources to be linked to and derives rules to match and link the 
original fields to the contextual resource. 
• Linking: the process of automatically matching the values of the metadata 
fields to values of the contextual resources and adding contextual links 
(whose values are most often based on equivalent relationships) to the 
dataset. 
• Augmentation: the process of selecting the values from the contextual 
resource to be added to the original object description. This might not 
only include (multilingual) synonyms of terms to be enriched but also 
further information, for example broader or narrower concepts. 
Automatic Enrichments with Controlled Vocabularies in Europeana: Challenges and 
Consequences 
Euromed 2014
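
The three phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Europeana's implementation: the vocabulary, its example.org URIs, the exact-label matching rule, and all function names are hypothetical.

```python
# Hypothetical contextual resource: a tiny controlled vocabulary keyed by URI.
VOCABULARY = {
    "http://example.org/concept/painting": {
        "labels": {"en": "painting", "fr": "peinture", "de": "Gemälde"},
        "broader": "http://example.org/concept/visual-arts",
    },
    "http://example.org/concept/visual-arts": {
        "labels": {"en": "visual arts", "fr": "arts visuels"},
        "broader": None,
    },
}


def build_label_index(vocabulary):
    """Analysis (simplified): derive a matching rule, here an exact
    case-insensitive label match, as a lookup table label -> URI."""
    index = {}
    for uri, concept in vocabulary.items():
        for label in concept["labels"].values():
            index[label.lower()] = uri
    return index


def link(record, field, label_index):
    """Linking: match the field values against vocabulary labels and
    return the URIs of the matched concepts."""
    links = []
    for value in record.get(field, []):
        uri = label_index.get(value.strip().lower())
        if uri is not None and uri not in links:
            links.append(uri)
    return links


def augment(links, vocabulary):
    """Augmentation: select values from the contextual resource
    (multilingual labels, broader concept) to add to the description."""
    return [
        {
            "uri": uri,
            "labels": dict(vocabulary[uri]["labels"]),
            "broader": vocabulary[uri]["broader"],
        }
        for uri in links
    ]


record = {"dc:subject": ["Peinture", "unknown term"]}
label_index = build_label_index(VOCABULARY)
links = link(record, "dc:subject", label_index)
enrichments = augment(links, VOCABULARY)
print(links)
print(enrichments)
```

In this sketch the French value "Peinture" is linked to one concept, and augmentation then brings in the English and German labels plus the broader concept; the unmatched value is simply left alone.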
Characteristics of enrichment 
• Adding new data on top of existing data 
– Unlike normalization, which focuses on syntactic aspects and adds no new 
semantics 
• (Semi-)automatic 
– For manual enrichment, see discussion on Annotations 
• Connecting to internal or external datasets
Where does it happen? 
• Ingestion from providers 
– Harvesting metadata and content 
• Consolidating Europeana’s "master" database 
– De-referencing 
– Enrichment 
• Leveraging data for search 
– Augmenting Solr index 
– Query enrichment and translation
Not de-referencing? 
• In provider data, it is semantically equivalent to have 
a CHO (cultural heritage object) with a link, or a CHO with a link and the 
contextual entity materialized next to it 
• Just called « richer » (more structured, « semantic ») 
metadata given by providers
Not index augmentation? 
• One semantic link can lead to different indexes 
• Enrichment shouldn’t be considered to feed directly 
into application/tool-specific databases 
NB: it should be exchangeable 
• Yet enrichment should be designed in coordination 
with what will happen later 
Augmentation is the post-prod of linking
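
To illustrate how one semantic link can lead to different indexes, here is a small Python sketch. The enrichment structure, the de-referenced entity data, and the index field names (`subject_label.*`, `subject_uri`, `subject_broader`) are all hypothetical and do not reflect Europeana's actual Solr schema.

```python
# One exchangeable enrichment: a link from an object to a contextual entity.
enrichment = {
    "object": "http://example.org/object/1",
    "field": "dc:subject",
    "target": "http://example.org/concept/painting",
}

# Hypothetical de-referenced data for the target entity.
entity = {
    "labels": {"en": "painting", "fr": "peinture"},
    "broader_labels": {"en": "visual arts"},
}


def to_search_fields(entity):
    """Index for multilingual full-text search: one field per language."""
    return {
        f"subject_label.{lang}": label
        for lang, label in entity["labels"].items()
    }


def to_facet_fields(enrichment, entity):
    """Index for faceted browsing: the entity URI plus broader labels."""
    return {
        "subject_uri": enrichment["target"],
        "subject_broader": sorted(entity["broader_labels"].values()),
    }


search_doc = to_search_fields(entity)
facet_doc = to_facet_fields(enrichment, entity)
print(search_doc)
print(facet_doc)
```

The link itself stays application-neutral and exchangeable; each application derives its own index fields from it afterwards.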
Not query enrichment/translation? 
• Tools used may be the same (NLP) 
• But the evaluation criteria change 
• These enrichments are ‘lost’, not exchangeable
Ground material for enrichment 
Metadata is the primary focus of most efforts 
Content can also be used 
• Extraction of visual features 
– Text transcription 
– Map alignment 
– Image-based similarity (Ecreative) 
• Extraction of audio features (ESounds)
Linking is king 
• Object/object 
• Cross-dataset de-duplication – equivalence/similarity links 
• Other relations – derivation, part-of, FRBRization 
• Clustering into hierarchical objects or collections 
• NB: neglected, though Europeana can contribute something 
• Object/Context 
• Agents 
• Concepts 
• Places 
• Periods and Events 
• Documentation, e.g., Wikipedia articles 
• Context/Context (vocabulary alignment) 
• Matching concepts
Europeana enrichment 
• Bringing multilingual, structured data 
• Collaborative/strategy aspect 
• Likely to interest providers (Einside)
Should we be interested in other kinds of enrichment? 
• Non-semantic tagging with simple words 
• Translation 
• Named entity recognition 
• Language detection for metadata fields 
• Group editing, when not actioned by providers
Europeana-related projects in the picture 
• Object/object 
• De-duplication – equivalence/similarity links 
• Other relations – derivation (ESounds), part-of, FRBRization (TEL) 
• Clustering (EF-OCLC) 
• Object/Context 
• Agents 
• Concepts (PATHS, EConnect, LOCloud, MIMO) 
• Places (EConnect, LOCloud) 
• Periods and Events (PATHS, ECloud) 
• Documentation, e.g., Wikipedia articles (PATHS, LOCloud) 
• Vocabulary alignment 
EConnect (Amalgame), EFG, EUScreen, ATHENAplus?, PartagePlus 
• Non-semantic tagging with simple words 
• Translation 
• Named entity recognition 
• Language detection for metadata fields 
• Group editing, when not actioned by providers (ESounds)
Other categories?
Next steps? 
• Agree on categories 
• Agree on APIs for enrichment services 
• Addressing post-processes for applications (Solr indexing) 
• Evaluation 
• Informativeness measure, completeness 
• Showing it?
APIs for enrichment services 
• Input: record, field, collection? 
– Meta-enrichers 
• Problem: API results often assume application needs and include data 
elements deemed useful beyond the URI of the entity: they 
are APIs for enrichment+de-referencing. 
• Keeping track of provenance (data field, version of 
enrichment tool…) 
• Example of Sounds music information retrieval 
• Exchanging enrichment data. Cf EDMpaths
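
As a rough sketch of what such an API result could look like, the Python below returns only the entity URI together with provenance, and serializes it for exchange. Every name here (the `Enrichment` class and its fields, `enrich_field`, `enrich_record`, the tool name and version, the example.org URIs) is an assumption made for illustration, not an agreed Europeana API.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical label -> URI lookup standing in for a real vocabulary service.
LABEL_INDEX = {"vienna": "http://example.org/place/vienna"}


@dataclass(frozen=True)
class Enrichment:
    """One exchangeable enrichment: a link plus its provenance."""
    record_uri: str     # record that was enriched
    source_field: str   # metadata field the value came from
    source_value: str   # original value that was matched
    target_uri: str     # URI of the linked entity, nothing de-referenced
    tool: str           # name of the enrichment tool
    tool_version: str   # version of the enrichment tool


def enrich_field(record_uri, field, values,
                 tool="demo-enricher", tool_version="0.1"):
    """Field-level input: return one Enrichment per matched value."""
    results = []
    for value in values:
        uri = LABEL_INDEX.get(value.strip().lower())
        if uri is not None:
            results.append(
                Enrichment(record_uri, field, value, uri, tool, tool_version)
            )
    return results


def enrich_record(record_uri, record):
    """Record-level input, built on top of the field-level function."""
    results = []
    for field, values in record.items():
        results.extend(enrich_field(record_uri, field, values))
    return results


record = {"dc:coverage": ["Vienna"], "dc:title": ["Untitled"]}
enrichments = enrich_record("http://example.org/object/1", record)
payload = json.dumps([asdict(e) for e in enrichments], indent=2)
print(payload)
```

Keeping de-referencing out of the result leaves each application free to fetch whatever entity data it needs, while the provenance fields record which field and tool version produced the link.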
Example: Europeana enrichment console prototype
Antoine Isaac 
antoine.isaac@europeana.eu 
@EuropeanaTech

Editor's notes

  1. Euromed paper
  2. http://testenv-solr.eanadev.org:9191/enrichment-framework-gui-0.1-SNAPSHOT/ http://locloudgeo.eculturelab.eu/Tester_LoGeo_1_1/
  3. http://testenv-solr.eanadev.org:9191/enrichment-framework-gui-0.1-SNAPSHOT/