Scholars, book researchers, and museum directors face many challenges when trying to find the underlying connections between resources. Scholars in particular continuously emphasize the role of digital humanities and the value of linked data in cultural heritage information systems.
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data – Sören Auer
Over the past four years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, the coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the Mappings Wiki and DBpedia Live, as well as the recently launched DBpedia benchmark.
Towards digitizing scholarly communication – Sören Auer
Slides of the VIVO 2016 Conference keynote: Despite the availability of ubiquitous connectivity and information technology, scholarly communication has not changed much in the last hundred years: research findings are still encoded in and decoded from linear, static articles, and the possibilities of digitization are rarely used. In this talk, we will discuss strategies for digitizing scholarly communication. This comprises in particular: the use of machine-readable, dynamic content; the description and interlinking of research artifacts using Linked Data; and the crowdsourcing of multilingual educational and learning content. We discuss the relation of these developments to research information systems and how they could become part of an open ecosystem for scholarly communication.
Enterprise knowledge graphs use semantic technologies like RDF, RDF Schema, and OWL to represent knowledge as a graph consisting of concepts, classes, properties, relationships, and entity descriptions. They address the "variety" aspect of big data by facilitating integration of heterogeneous data sources using a common data model. Key benefits include providing background knowledge for various applications and enabling intra-organizational data sharing through semantic integration. Challenges include ensuring data quality, coherence, and managing updates across the knowledge graph.
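The integration idea behind such knowledge graphs can be illustrated with a minimal sketch: two heterogeneous records are mapped into a common triple-based model and then queried together. Plain Python tuples stand in for RDF triples here, and all names, prefixes, and data values are invented for illustration only.

```python
# Minimal sketch: integrating two heterogeneous sources into one
# triple-based graph. Vocabulary and data are illustrative only.

# Source 1: an HR database row
hr_record = {"emp_id": "e42", "full_name": "Ada Lovelace", "dept": "R&D"}

# Source 2: a CRM export with a different schema
crm_record = {"contact": "Ada Lovelace", "account": "ACME Corp"}

def hr_to_triples(rec):
    s = f"ex:person/{rec['emp_id']}"
    return [
        (s, "foaf:name", rec["full_name"]),
        (s, "ex:memberOf", f"ex:dept/{rec['dept']}"),
    ]

def crm_to_triples(rec):
    # The person's name serves as the join key in this toy example;
    # a real system would use entity resolution / owl:sameAs links.
    s = f"ex:person/by-name/{rec['contact'].replace(' ', '_')}"
    return [
        (s, "foaf:name", rec["contact"]),
        (s, "ex:worksWith", f"ex:org/{rec['account'].replace(' ', '_')}"),
    ]

graph = set(hr_to_triples(hr_record)) | set(crm_to_triples(crm_record))

# A simple cross-source query: every fact about subjects named "Ada Lovelace"
subjects = {s for (s, p, o) in graph if p == "foaf:name" and o == "Ada Lovelace"}
facts = sorted((s, p, o) for (s, p, o) in graph if s in subjects)
for t in facts:
    print(t)
```

Because both sources end up in one common data model, a single query spans data that originally lived in two unrelated schemas.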
Linked Data for Enterprise Data Integration – Sören Auer
The Web is evolving into a Web of Data. In parallel, the intranets of large companies will evolve into data intranets based on the Linked Data principles. Linked Data has the potential to complement the SOA paradigm with a lightweight, adaptive data-integration approach.
Usage of Linked Data: Introduction and Application Scenarios – EUCLID project
This presentation introduces the main principles of Linked Data, the underlying technologies, and background standards. It provides basic knowledge of how data can be published over the Web, how it can be queried, and what the possible use cases and benefits are. As an example, we use the development of a music portal (based on the MusicBrainz dataset), which facilitates access to a wide range of information and multimedia resources relating to music.
Linked Data (1st Linked Data Meetup Malmö) – Anja Jentzsch
This document discusses Linked Data and outlines its key principles and benefits. It describes how Linked Data extends the traditional web by creating a single global data space using RDF to publish structured data on the web and by setting links between data items from different sources. The document outlines the growth of Linked Data on the web, with over 31 billion triples from 295 datasets as of 2011. It provides examples of large Linked Data sources like DBpedia and discusses best practices for publishing, consuming, and working with Linked Data.
The document discusses the benefits and challenges of transitioning library data to linked data standards to make the data more accessible and interoperable on the web. It outlines principles of linked data and how library data could be transformed by assigning URIs to concepts, linking data sources, and storing data as RDF triples. Barriers include outdated library processes and standards like MARC that inhibit innovation, but initiatives like RDA, OpenLibrary, and data projects from the German National Library are helping advance the linked library data vision.
Morning session talk at the second Keystone Training School "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
The document discusses the Semantic Web and how it provides a common framework to share and reuse data across applications and organizations. It describes Resource Description Framework (RDF) and how it represents relationships in a simple data structure using graphs. It also discusses Linked Data design principles and standards like RDFa and Microformats that embed semantics into web pages. Finally, it provides examples of how search engines like Google and Yahoo utilize structured data from RDFa and Microformats to enhance search results.
How is the Semantic Web vision unfolding, and what does it take for the Web to fully reach its potential and evolve from a Web of Documents to a Web of Data through universal data representation standards?
A Semantic Data Model for Web Applications – Armin Haller
This presentation gives a short overview of the Semantic Web, RDFa and Linked Data. The second part briefly discusses ActiveRaUL, our model and system for developing form-based Web applications using Semantic Web technologies.
Big Linked Data - Creating Training Curricula – EUCLID project
This presentation includes an overview of the basic rules to follow when developing training and education curricula for Linked Data and Big Linked Data.
This document discusses converting library data to linked data. It describes how library data such as MARC records are currently not very readable and do not follow linked data principles. The author details converting library data to RDF and linking it to external datasets using ontologies like Dublin Core and SKOS. This creates readable, sharable, linkable and distributable library data that is more integrated and queryable. A prototype of the National Technical Library's linked data uses a lightweight API and open licenses to provide open bibliographic data in a format that can exist alongside original data distribution methods.
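The conversion step described above can be sketched in a few lines: a flat, MARC-like record is mapped to Dublin Core and SKOS-style triples so the data becomes linkable and queryable. Plain tuples stand in for RDF triples, and the field tags, URIs, and record values below are invented for illustration.

```python
# Minimal sketch of mapping a flat, MARC-like record to Dublin Core /
# SKOS-style triples. URIs, tags, and values are invented examples.

marc_like = {
    "001": "bib123",                    # record identifier
    "245a": "Linked Data in Libraries", # title
    "100a": "Doe, Jane",                # main author
    "650a": "Semantic Web",             # subject heading
}

def to_triples(rec):
    s = f"http://example.org/bib/{rec['001']}"
    concept = f"http://example.org/subject/{rec['650a'].replace(' ', '_')}"
    return [
        (s, "dc:title", rec["245a"]),
        (s, "dc:creator", rec["100a"]),
        (s, "dc:subject", concept),
        # The subject heading becomes a SKOS concept that external
        # datasets can link to (e.g. via skos:exactMatch).
        (concept, "skos:prefLabel", rec["650a"]),
    ]

triples = to_triples(marc_like)
for t in triples:
    print(t)
```

The key design point is that the subject heading is promoted from an opaque string to an addressable concept, which is what makes cross-dataset linking possible.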
Development of Semantic Web based Disaster Management System – NIT Durgapur
A Semantic Web model in the field of disaster management, used to structure the data so that any information needed during an emergency is easily available.
This document discusses the potential benefits of using linked data in libraries. It explains that linked data connects related data on the web using URIs and RDF triples. This allows data to be integrated, extended and reused. The document provides examples of how linked data could unlock library data, connect different library systems, and allow complex relationships to be modeled. Overall, it argues that linked data can help libraries share and integrate their data in new ways.
This document discusses the evolution of the web from a network of documents to a network of linked data. It begins by describing the original web of documents, which organized information in silos and had implicit semantics. The document then introduces the concept of the semantic web and linked data, which structures information as interconnected data using explicit semantics. It provides examples of how linked data can be represented using RDF triples and describes the principles of linked data for publishing and connecting data on the web. Finally, it discusses characteristics and examples of linked data applications.
This presentation by Shana McDanold of Georgetown University was presented during the NISO Virtual Conference, BIBFRAME & Real World Applications of Linked Bibliographic Data, held on June 15, 2016
This document discusses the evolution of the web from a web of documents to a web of linked data. It outlines the principles of linked data, which involve using URIs to identify things and linking those URIs to other URIs so that machines can discover more data. RDF is introduced as a standard data model for publishing linked data on the web using triples. Examples of linked data applications and datasets are provided to illustrate how linked data allows the web to function as a global database.
This document summarizes a workshop on linking library data. It introduces linked data and key technologies used for linking such as URIs, RDF, and SPARQL. It discusses challenges in linking data like finding suitable datasets to link, encouraging others to link to your data, determining link quality, and maintaining links over time. Finally, it briefly introduces the Silk framework for interlinking data and having participants discuss practical linking of library data.
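The interlinking step that frameworks like Silk automate can be sketched as a label-similarity comparison that emits owl:sameAs candidates above a threshold. This is a toy sketch, not Silk's actual specification language; the datasets, URIs, and threshold are invented, and a real link specification would use richer comparators and blocking.

```python
import difflib

# Toy datasets: (URI, label) pairs from two catalogues. Invented data.
lib_a = [("a:1", "World War, 1939-1945"), ("a:2", "Semantic Web")]
lib_b = [("b:9", "World War II (1939-1945)"), ("b:7", "Cooking")]

def similarity(x, y):
    # Normalized string similarity in [0, 1].
    return difflib.SequenceMatcher(None, x.lower(), y.lower()).ratio()

def discover_links(src, dst, threshold=0.6):
    """Compare every label pair and keep candidate owl:sameAs links
    whose similarity clears the threshold."""
    links = []
    for s_uri, s_label in src:
        for d_uri, d_label in dst:
            if similarity(s_label, d_label) >= threshold:
                links.append((s_uri, "owl:sameAs", d_uri))
    return links

links = discover_links(lib_a, lib_b)
print(links)
```

The threshold directly embodies the "link quality" trade-off the workshop mentions: lowering it finds more links but admits more false matches, which is why generated links usually need review before publication.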
The document discusses Semantic Web (Web 3.0) and defines key concepts such as RDF, SPARQL, triple stores, and OWL. It notes that vendors have created platforms and tools to implement Semantic Web technologies. However, challenges remain such as dealing with vast and vague data, duplication, inconsistencies, and logical contradictions in ontologies. While consolidation to a single approach may not be necessary, machine learning and both human-designed and AI approaches could help address these challenges.
This document discusses library linked data and the future of bibliographic control. It begins by asking what library linked data means and why it is important now. To combine the best of libraries and the web, metadata must be on the web and open for others to use. The principles of linked data are described, including using URIs, HTTP URIs, providing useful information in RDF, and including links to other URIs. The building blocks of linked data like RDF and triples are explained. Examples of existing library linked data projects are provided. The BIBFRAME initiative to develop a new framework to manage library data as linked data is outlined.
Exploration, visualization and querying of linked open data sources – Laura Po
Afternoon hands-on session talk at the second Keystone Training School "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
This document provides an introduction to the Semantic Web, covering topics such as what the Semantic Web is, how semantic data is represented and stored, querying semantic data using SPARQL, and who is implementing Semantic Web technologies. The presentation includes definitions of key concepts, examples to illustrate technical aspects, and discussions of how the Semantic Web compares to other technologies. Major companies implementing aspects of the Semantic Web are highlighted.
The document discusses the FRBRoo ontology, which was created to represent bibliographic information and facilitate information sharing between libraries and museums. It harmonizes the FRBR model for libraries with the CIDOC CRM model for museums to address semantic interoperability issues between the two domains. Graphical representations of the FRBRoo version 1.0 ontology are available on the provided URLs.
Best Practices for Large Scale Text Mining Processing – Ontotext
Q&A:
NOW facilitates semantic search by having annotations attached to search strings. How complex does that get, e.g. with wildcards between annotated strings?
NOW’s searchbox is quite basic at the moment, but still supports a few scenarios.
1. Pure concept/faceted search - search for all documents containing a concept or where a set of concepts co-occur. Ranking is based on frequency of occurrence.
2. Concept/faceted + full-text search - search for both concepts and a particular textual term or phrase.
3. Full text search
With search, pretty much anything can be done to customise it. For the NOW showcase we’ve kept it fairly simple, as usually every client has a slightly different case and wants to tune search in a slightly different direction.
The search in NOW is faceted, which means that you search with concepts (facets) and retrieve all documents that contain mentions of the searched concept. If you search by more than one facet, the engine retrieves documents that contain mentions of both concepts, but there is no restriction that they occur next to each other.
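The behaviour described in this answer can be sketched as a toy faceted index: documents carry concept annotations, multi-facet queries require co-occurrence anywhere in the document, and results are ranked by mention frequency. The documents and concept names below are invented for illustration.

```python
from collections import Counter

# Toy corpus: document id -> list of concept mentions (invented data).
annotations = {
    "doc1": ["Obama", "Merkel", "Obama"],
    "doc2": ["Merkel"],
    "doc3": ["Obama", "Putin"],
}

def faceted_search(facets):
    """Return documents mentioning *all* facets, ranked by the total
    frequency of the searched concepts (no adjacency required)."""
    hits = []
    for doc, mentions in annotations.items():
        counts = Counter(mentions)
        if all(counts[f] > 0 for f in facets):
            hits.append((doc, sum(counts[f] for f in facets)))
    return sorted(hits, key=lambda h: -h[1])

print(faceted_search(["Obama"]))            # doc1 ranks above doc3
print(faceted_search(["Obama", "Merkel"]))  # only doc1 mentions both
```

Note that the multi-facet query is a pure co-occurrence test: a document qualifies if it mentions every facet somewhere, mirroring the "no restriction that they occur next to each other" behaviour.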
Is the tagging service expandable (say, with custom ontologies)? Also, is it something you offer as a service? It is unclear to me from the website.
The TAG service is used for demonstration purposes only. The models behind it are trained for annotating news articles. The pipeline is customizable for every concrete scenario, different domains, and entities of interest. You can access several of our pipelines as a service through the S4 platform, or you can have them hosted as an on-premise solution. In some cases our clients want domain adaptation, improvements in a particular area, or tagging with their internal dataset; in those cases we offer an on-premise deployment and also a managed service hosted on our hardware.
Does your system accommodate cluster analysis using unsupervised keyword/phrase annotation for knowledge discovery?
Insofar as patterns of user behaviour also count as knowledge discovery, we employ these for suggesting related reads. Beyond that, we have experience tailoring custom clustering pipelines that also rely on features such as keywords and named entities.
For topic extraction, how many topics can we extract? From a Twitter corpus, what can we infer?
For topic extraction we have determined that we obtain the best results when suggesting three categories. These are taken from IPTC, but only from the uppermost levels, of which there are fewer than 20.
The Twitter corpus example is from a project Ontotext participates in called Pheme. The goal of the project is to detect rumours and check their veracity, thus helping journalists in their hunt for attractive news.
Do you provide Processing Resources and JAPE rules for the GATE framework that can be used with GATE Embedded?
We contribute to the GATE framework, and everything that has been wrapped up as Processing Resources (PRs) has been included in the corresponding GATE distributions.
This document discusses semantic technologies for cultural heritage. It introduces Ontotext Corp, which develops semantic technology, and some of their projects involving cultural heritage data. These include the ResearchSpace project with the British Museum, projects involving Europeana like Bulgariana and Europeana Creative, and publishing Getty vocabularies as linked open data.
Mobile cultural heritage guide: location-aware semantic search (EKAW2010) – chrisvanaart
In this paper we explore the use of location-aware mobile devices for searching and browsing a large number of general and cultural heritage information repositories. Based on GPS positioning we can determine a user's location and context, composed of physically nearby locations, historic events that have taken place there, artworks that were created at or inspired by those locations, and artists that have lived or worked there. Starting from a geolocation, the user has three levels of refinement: pointing in a specific heading, and selection of facets and subfacets of cultural heritage objects. In our approach two types of knowledge are combined: general knowledge about geolocations and points of interest, and specialized knowledge about a particular domain, i.e. cultural heritage. We use a number of Linked Open Data sources and a number of general sources from the cultural heritage domain (including the Art and Architecture Thesaurus and the Union List of Artist Names) as well as data from several Dutch cultural institutions. We show three concrete scenarios where a tourist accesses localized information on his iPhone about the current environment, events, artworks, or persons, enriched by Linked Open Data sources. We show that Linked Open Data sources in isolation are currently too limited to provide interesting semantic information, but combined with each other and with a number of other sources a really informative location-based service can be created.
Using the Micropublications ontology and the Open Annotation Data Model to re... – jodischneider
This document discusses a project to construct a knowledge base linking drug interaction assertions to evidence from source documents. It will use the Micropublications Ontology to represent each assertion's support graph of claims and evidence, and the Open Annotation model to dynamically link support graph elements to quoted text excerpts from sources. The knowledge base will help answer competency questions about assertions, evidence, and their provenance. Challenges include representing both structured and unstructured text claims and efficiently querying the evidence base at scale.
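The linking pattern can be sketched as a small provenance structure: an assertion points to supporting quotes, each anchored to its source document, so that competency questions about evidence can be answered by traversal. This is a plain-Python sketch of the idea, not the actual ontology serialization, and the drug names, quotes, and URLs are invented.

```python
# Toy support graph: a drug-interaction assertion linked to quoted
# evidence with provenance. All names, quotes, and URLs are invented.

assertion = {
    "id": "assert:1",
    "claim": "DrugA increases plasma levels of DrugB",
    "supported_by": ["quote:1", "quote:2"],
}

quotes = {
    "quote:1": {
        "text": "Co-administration raised DrugB exposure by 40%.",
        "source": "http://example.org/paper/123",
    },
    "quote:2": {
        "text": "DrugA inhibits the enzyme that clears DrugB.",
        "source": "http://example.org/paper/456",
    },
}

def evidence_for(a):
    """Competency question: which quoted excerpts, from which source
    documents, support this assertion?"""
    return [(q, quotes[q]["source"], quotes[q]["text"])
            for q in a["supported_by"]]

for row in evidence_for(assertion):
    print(row)
```

Keeping the quoted text as a first-class, addressable node (rather than a citation string) is what lets the knowledge base trace each assertion back to the exact passage that supports it.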
In this talk we outline some of the key challenges in text analytics, describe some of Endeca's current research work in this area, examine the current state of the text analytics market and explore some of the prospects for the future.
Hadoop World 2011: Data Mining in Hadoop, Making Sense of it in Mahout! - Mic... (Cloudera, Inc.)
Much of Hadoop adoption thus far has been for use cases such as processing log files, text mining, and storing masses of file data -- all very necessary, but largely not exciting. In this presentation, Michael Cutler presents a selection of methodologies, primarily using Mahout, that will enable you to derive real insight into your data (mined in Hadoop) and build a recommendation engine focused on the implicit data collected from your users.
Analyzing Customer Experience Feedback Using Text Mining: A Linguistics-Based... (Mohamed Zaki)
Complexity surrounding the holistic nature of customer experience has made measuring customer perceptions of interactive service experiences challenging. At the same time, advances in technology and changes in methods for collecting explicit customer
feedback are generating increasing volumes of unstructured textual data, making it difficult for managers to analyze and interpret this information. Consequently, text mining, a method enabling automatic extraction of information from textual data, is gaining in popularity. However, this method has performed below expectations in terms of depth of analysis of customer experience feedback and accuracy. In this study, we advance linguistics-based text mining modeling to inform the process of developing an
improved framework. The proposed framework incorporates important elements of customer experience, service methodologies, and theories such as cocreation processes, interactions, and context. This more holistic approach for analyzing feedback
facilitates a deeper analysis of customer feedback experiences, by encompassing three value creation elements: activities, resources, and context (ARC). Empirical results show that the ARC framework facilitates the development of a text mining model for analysis of customer textual feedback that enables companies to assess the impact of interactive service processes on customer experiences. The proposed text mining model shows high accuracy levels and provides flexibility through training. As such, it can evolve to account for changing contexts over time and be deployed across different (service) business domains; we term it an ‘‘open learning’’ model. The ability to timely assess customer experience feedback represents a prerequisite for successful cocreation processes in a service environment.
Ariadne Booklet 2016: Building a research infrastructure for Digital Archaeol... (ariadnenetwork)
Authors:
Kate Fernie (PIN and 2Culture Associates Ltd)
Franco Niccolucci (PIN)
Julian Richards (University of York)
Contributors:
Achille Felicetti, Ilenia Galluccio and Paola Ronzino (PIN),
Bruno Fanini (ITABC CNR)
Carlo Meghini, Matteo Dellepiane and Roberto Scopigno (ISTI CNR)
Dimitris Gavrilis (Athena Research Centre)
Douglas Tudhope (University of South Wales)
Elizabeth Fentress (AIAC)
Guntram Geser (Salzburg Research)
Holly Wright (University of York)
Johan Fihn (SND)
Maria Theodoridou (ICS Forth)
The document discusses text mining tools, techniques, and applications. It provides examples of using text mining for medical research to discover relationships between migraines and biochemical levels. Another example shows using call center records to analyze customer sentiment and identify problem areas. The document also discusses challenges of text mining like ambiguity and context sensitivity in language. It outlines text processing techniques including statistical analysis, language analysis, and information extraction. Finally, it discusses interfaces and visualization challenges for presenting text mining results.
This document provides a summary of study resources for data mining and machine learning, including:
- A roadmap that classifies books by difficulty level from prerequisite to advanced PhD level.
- Recommended books for each level, including original English versions and Korean translations when available. PDF versions and lecture videos are referenced.
- Online learning platforms like Coursera and edX that offer free machine learning courses, as well as YouTube channels and individual instructors.
- Interactive learning sites for practicing R and Python, including Codecademy, Datacamp and online Python tutors.
- Additional websites providing tutorials, quick references, and packages for data mining algorithms.
Text mining seeks to extract useful information from unstructured text documents. It involves preprocessing the text, identifying features, and applying techniques from data mining, machine learning and natural language processing to discover patterns. The core operations of text mining include analyzing distributions of concepts, identifying frequent concept sets and associations between concepts. Text mining systems aim to analyze document collections over time to identify trends, ephemeral relationships and anomalous patterns.
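The core operations mentioned above (concept distributions, frequent concept sets, associations between concepts) reduce to counting over per-document concept sets. A toy sketch with invented concepts:

```python
from collections import Counter
from itertools import combinations

# Each document reduced to its set of extracted concepts (illustrative data).
docs = [
    {"migraine", "magnesium", "serotonin"},
    {"migraine", "magnesium"},
    {"migraine", "stress"},
]

# Concept distribution: how often each concept occurs across the collection.
concept_freq = Counter(c for d in docs for c in d)

# Frequent concept sets: how often two concepts co-occur in one document.
pair_freq = Counter(pair for d in docs for pair in combinations(sorted(d), 2))

def confidence(a, b):
    """A simple association measure: P(b | a) estimated from co-occurrence."""
    return pair_freq[tuple(sorted((a, b)))] / concept_freq[a]
```

Trend analysis then amounts to computing these counts per time slice and comparing the distributions.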
The common use by archaeologists of ubiquitous technologies such as computers and digital cameras means that archaeological research projects now produce huge amounts of diverse, digital documentation. However, while the technology is available to collect this documentation, we still largely lack community accepted dissemination channels appropriate for such torrents of data. Open Context (http://www.opencontext.org) aims to help fill this gap by providing open access data publication services for archaeology. Open Context has a flexible and generalized technical architecture that can accommodate most archaeological datasets, despite the lack of common recording systems or other documentation standards. Open Context includes a variety of tools to make data dissemination easier and more worthwhile. Authorship is clearly identified through citation tools, a web-based publication system enables individuals to upload their own data for review, and collaboration is facilitated through easy download and other features. While we have demonstrated a potentially valuable approach for data sharing, we face significant challenges in scaling Open Context up for serving large quantities of data from multiple projects.
The document discusses the history and future of online public access catalogs (OPACs) in libraries. It describes how early OPACs mimicked card catalogs but now "Next Generation" OPACs offer new interactive features like faceted searching, tags, and social networking tools. The future of OPACs involves new models like Blacklight that make searching more intuitive. Two ideas for the future are having no central catalog or a worldwide central catalog.
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums (Jon Voss)
This document discusses practical applications of Linked Open Data (LOD) for libraries, archives, and museums. It describes how LOD allows these institutions to publish structured data on the web in ways that are interoperable and can be connected to other open datasets. Examples are given of how LOD is being used by various institutions to share metadata, images, and other cultural heritage assets on the web in open, machine-readable formats. The presenter argues that LOD represents a new paradigm that these cultural organizations should embrace to make their collections more accessible and useful on the web.
The document discusses personal information management (PIM) tools and strategies. It describes how PIM has been an issue since information became available and outlines some common PIM tools like email, calendars, computer desktop organization, and websites. It also discusses the implications of increased digital information storage, such as challenges around saving, organizing, and retrieving personal information across multiple tools and locations.
web 2.0, library systems and the library system (lisld)
The Web 2.0 environment is characterized by concentration and diffusion. Library services are not well matched to this environment: they are fragmented and difficult to mobilize in user workflows. This presentation analyzes this situation and suggests some directions.
Digital Library Applications Of Social Networking Jeju Intl Conference (guestbba8ac)
Digital Library Applications of Social Networking discusses how social networking can be applied in libraries. It outlines how social networking sites like LibraryThing and Delicious allow users to interact and share resources. The document also discusses using linked data and semantic web standards like SKOS, RDF, and FRBR to represent controlled vocabularies and metadata in a way that is interoperable on the web. Representing this data semantically allows resources to be better discovered and connected across systems.
Semantic Libraries: the Container, the Content and the Contenders (Stefan Gradmann)
The document discusses the transition from traditional libraries to semantic libraries, where information is organized and linked semantically rather than through physical containers and linear documents. It explores how libraries can generate knowledge through automated reasoning on semantically enriched content. Several tools and projects are presented that aim to publish content as structured, interconnected data in order to realize the vision of semantic libraries.
Building Heterogeneous Networks of Digital Libraries on the Semantic Web (Sebastian Ryszard Kruk)
This document discusses building a heterogeneous network of digital libraries on the semantic web. It motivates the use of semantic digital libraries by describing how they integrate information from different metadata sources and provide interoperability. It then introduces JeromeDL, an open source semantic digital library, and describes its key components like the MarcOnt ontology and mediation services for legacy metadata. Finally, it discusses the Extensible Library Protocol for querying between semantic digital libraries and future work.
Du Literary and linguistic computing aux Digital Humanities : retour sur 40 a... (OpenEdition)
The document discusses the evolution of digital humanities from literary and linguistic computing to humanities computing to digital humanities. Key points include:
1) Digital technologies have transformed humanities scholarship by making objects of study digital and changing research methods.
2) Early work in literary and linguistic computing in the 1960s-1980s used computers to analyze texts but was only accessible to technical experts.
3) Humanities computing from the 1980s-1990s saw institutionalization and standardization through projects like the Text Encoding Initiative (TEI).
4) Digital humanities from the 1990s onward has been shaped by increased digitization, collaboration, and development of new infrastructures and approaches like linking and analyzing
Digital libraries of the future will use semantic web and social bookmarking technologies to support e-learning. Semantic digital libraries integrate information from different metadata sources to provide more robust search and browsing interfaces. They describe resources in a machine-understandable way using ontologies and expose semantics to enable interoperability between systems. This allows new search paradigms like ontology-based search and helps integrate metadata from different sources.
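Ontology-based search of this kind can be approximated by expanding a query term over a broader/narrower hierarchy before matching records. The thesaurus and records below are made up for illustration:

```python
# A toy SKOS-style thesaurus of broader -> narrower terms (illustrative only).
narrower = {
    "painting": ["watercolor", "oil painting"],
    "artwork": ["painting", "sculpture"],
}

def expand(term):
    """Expand a query term with all transitively narrower terms."""
    result, stack = {term}, [term]
    while stack:
        for n in narrower.get(stack.pop(), []):
            if n not in result:
                result.add(n)
                stack.append(n)
    return result

# Catalogue records tagged with a single subject term (illustrative only).
records = [
    ("rec1", "watercolor"),
    ("rec2", "sculpture"),
    ("rec3", "poster"),
]

def search(term):
    """Match records against the expanded term set."""
    terms = expand(term)
    return [rid for rid, subject in records if subject in terms]
```

Searching for "artwork" now also finds records tagged with narrower concepts, which keyword matching alone would miss.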
Are you talking to me? Researching a scenario for linking objects and publica... (Ellen Van Keer)
presentation of the workflow designed in the project "bridging knowledge collections", aimed at integrating an institutional repository with the online catalogs of the museum objects and library publications kept in the RMAH
The document discusses challenges and opportunities for preserving linked data on the web over time. It describes how the web has evolved from largely unstructured content on Web 1.0 to more structured and interconnected data as a valuable asset on Web 3.0. Preserving linked data presents unique issues compared to traditional digital preservation since linked data is graph-structured, distributed across sources, and dynamically changing. Effective long-term preservation requires approaches that account for the complex interdependencies and heterogeneity of linked data sources.
Discussing the Scottish Information environment and ways to open access within social networking platforms, by K. Menzies, CDLR, given at Metadata issues and Web 2.0 services CIGS seminar, Fri 30 Jan, 2009.
http://scone.strath.ac.uk/scie/index.cfm
Olaf Janssen on the principles of large-scale digital libraries and their app... (Olaf Janssen)
Europeana is a large-scale digital library that aggregates over 4 million items from over 1,000 cultural heritage institutions across Europe. It provides centralized access to digitized content from different domains including libraries, archives, museums and audiovisual collections. Europeana aggregates metadata describing objects rather than housing digital objects themselves. The European Union has supported the development of Europeana to provide a single access point for Europe's distributed cultural heritage and promote a common European identity.
The document discusses the Bodleian Library's efforts to address the challenges of preserving personal digital collections. It notes the rapid growth of personal digital media and the need to adapt archival practices. The Bodleian's project, called futureArch, aims to transform its capacity for hybrid archives over three years by establishing workflows, roles, infrastructure, and access methods for born-digital materials. FutureArch will help the Bodleian better preserve, process, catalogue, and provide access to creators' digital archives.
The document discusses how linking open data and semantics can benefit digital humanities research using Europeana. It proposes fully implementing the Europeana Data Model to represent cultural heritage objects as linked open data. This would connect objects across domains and with external datasets like DBpedia. Combining this enriched semantic data with tools like SwickyNotes could facilitate new forms of digital scholarship through semantic exploration, context discovery, and knowledge generation.
Victor de Boer discusses how linked data can be used for digital humanities research. He explains that linked data allows researchers to integrate heterogeneous datasets while retaining their original data models, enabling new types of analysis. Examples are given of projects that have applied linked data principles to cultural heritage data from museums, historical texts, biographical data, and maritime records. Linked data facilitates exploring connections between these datasets and reusing background knowledge from other sources.
Property graph vs. RDF Triplestore comparison in 2020 (Ontotext)
This presentation goes all the way from intro "what graph databases are" to table comparing the RDF vs. PG plus two different diagrams presenting the market circa 2020
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes (Ontotext)
This presentation will provide a brief introduction to logical reasoning and overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on their promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories:
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in its turn makes KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. There are many inference algorithms and implementation approaches, which work well with taxonomies and conceptual models of few thousands of concepts, but cannot cope with KG of millions of entities.
- Inappropriate data layer architecture. One such example is reasoning with virtual KG, which is often infeasible.
Knowledge graphs - it’s what all businesses now are on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. It will properly expose and enforce the semantics of the semantic data model via inference, consistency checking and validation and thus offer organizations many more opportunities to transform and interlink data into coherent knowledge.
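To make the inference step concrete, here is a deliberately naive forward-chaining reasoner for two RDFS rules (transitive `rdfs:subClassOf`, and instances inheriting superclasses). The class names are illustrative; a production triplestore implements this far more efficiently:

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

def materialize(triples):
    """Apply the two RDFS rules until no new triples are produced."""
    kb = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in kb:
            if p == SUBCLASS:
                # Rule 1: subClassOf is transitive.
                for s2, p2, o2 in kb:
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, SUBCLASS, o2))
                # Rule 2: instances of a subclass are instances of the class.
                for s2, p2, o2 in kb:
                    if p2 == TYPE and o2 == s:
                        new.add((s2, TYPE, o))
        if not new <= kb:
            kb |= new
            changed = True
    return kb

facts = {
    ("Painting", SUBCLASS, "Artwork"),
    ("Artwork", SUBCLASS, "ManMadeObject"),
    ("nightwatch", TYPE, "Painting"),
}
```

After materialization the graph can answer "give me all man-made objects" without the query author knowing the class hierarchy.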
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking (Ontotext)
A presentation of Ontotext’s CEO Atanas Kiryakov, given during Semantics 2018 - an annual conference that brings together researchers and professionals from all over the world to share knowledge and expertise on semantic computing.
It Don’t Mean a Thing If It Ain’t Got Semantics (Ontotext)
With tons of data flowing around enterprises and the challenge of turning these data into knowledge, the advantage arguably lies with whoever runs the best database.
Turning data pieces into actionable knowledge and data-driven decisions takes a good and reliable database. The RDF database is one such solution.
It captures and analyzes large volumes of diverse data while at the same time managing and retrieving each and every connection these data enter into.
In our latest slides, you will find out why we believe RDF graph databases work wonders with serving information needs and handling the growing amounts of diverse data every organization faces today.
The Bounties of Semantic Data Integration for the Enterprise (Ontotext)
Semantic data integration allows enterprises to connect heterogeneous data sources through a common language. This creates a unified 360-degree view of enterprise data and facilitates knowledge management and use. Semantic integration aims to enrich existing data with external knowledge and provide a single access point for enterprise assets. It addresses challenges of accessing and storing data from various internal resources by building a well-structured integrated whole to enhance business processes.
[Webinar] GraphDB Fundamentals: Adding Meaning to Your Data (Ontotext)
In this webinar, Desislava Hristova demonstrated how to install and set-up GraphDB™ and how one can generate RDF dataset. She also showed how one can quickly integrate complex and highly interconnected data using RDF, how to write some simple SPARQL queries and more.
In a nutshell, this webinar is suitable for those who are new to RDF databases and would like to learn how they can smartly manage their data assets with GraphDB™.
[Conference] Cognitive Graph Analytics on Company Data and News (Ontotext)
Ontotext introduced their cognitive analytics platform that performs cognitive graph analytics on company data and news. The platform builds large knowledge graphs by integrating data from multiple sources and uses text mining to link news articles to entities in the knowledge graph. It provides functionality for node ranking, similarity analysis and data cleaning to consolidate and reconcile company records across datasets. The platform was demonstrated through a knowledge graph containing over 2 billion facts built by integrating datasets like DBpedia, Geonames, and news article metadata.
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 (Ontotext)
These are slides from a live webinar taken place January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, and particularly GraphDB™. In this webinar, we demonstrated how to install and set-up GraphDB™ 8.4 and how you can generate your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL and much more.
With the help of GraphDB™, you can start smartly managing your data assets, visually represent your data model and get insights from them.
Hercule: Journalist Platform to Find Breaking News and Fight Fake Ones (Ontotext)
Hercule: a platform to help journalists detect emerging news topics, check their veracity, track an event as it unfolds and find the various angles in a story as it develops.
How to migrate to GraphDB in 10 easy to follow steps (Ontotext)
GraphDB Migration Service helps you institute Ontotext GraphDB™ as your new semantic graph database.
Designed with a view to making your transitioning to GraphDB frictionless and resource-effective, GraphDB Migration Service provides the technical support and expertise you and your team of developers need to build a highly efficient architecture for semantic annotation, indexing and retrieval of digital assets.
With GraphDB Migration Services you will:
* Optimize the cost of managing the RDF database;
* Improve the performance of your system;
* Get the maximum value from your semantic solution.
GraphDB Cloud: Enterprise Ready RDF Database on Demand (Ontotext)
GraphDB Cloud is an enterprise grade RDF graph database providing high-performance querying over large volumes of RDF data. On this webinar, Ontotext demonstrates how to instantly create and deploy a fully managed Graph Database, then import & query data with the (OpenRDF) GraphDB Workbench, and finally explore and visualize data with the build in visualization tools.
[Webinar] FactForge Debuts: Trump World Data and Instant Ranking of Industry ... (Ontotext)
This webinar continues a series demonstrating how linked open data and semantic tagging of news can be used for comprehensive media monitoring, market and business intelligence. The platform for the demonstrations is FactForge: a hub for news and data about people, organizations, and locations (POL). FactForge embodies a big knowledge graph (BKG) of more than 1 billion facts that allows various analytical queries, including tracing suspicious patterns of company control; media monitoring of people, including companies owned by them, their subsidiaries, etc.
Smarter content with a Dynamic Semantic Publishing Platform (Ontotext)
Personalized content recommendation systems enable users to overcome the information overload associated with rapidly changing deep and wide content streams such as news. This webinar discusses Ontotext’s latest improvements to its Dynamic Semantic Publishing (DSP) platform NOW (News on the Web). The Platform includes social data mining, web usage mining, behavioral and contextual semantic fingerprinting, content typing and rich relationship search.
What is GraphDB and how can it help you run a smart data-driven business?
Learn about GraphDB through the solutions it offers in a simple and easy to understand way. In the slides below we have unpacked GraphDB for you, using as little tech talk as possible.
Efficient Practices for Large Scale Text Mining Process (Ontotext)
Text mining is a need when managing large scale textual collections. It facilitates access to, otherwise, hard to organise unstructured and heterogeneous documents, allows for extraction of hidden knowledge and opens new dimensions in data exploration.
In this webinar, Ivelina Nikolova, PhD, shares best practices and text analysis examples from successful text mining process in domains like news, financial and scientific publishing, pharma industry and cultural heritage.
The Power of Semantic Technologies to Explore Linked Open Data (Ontotext)
Atanas Kiryakov's, Ontotext’s CEO, presentation at the first edition of Graphorum (http://graphorum2017.dataversity.net/) – a new forum that taps into the growing interest in Graph Databases and Technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext’s own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
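The last capability listed, converting tabular data into RDF, boils down to mapping each row to a subject URI and each column to a predicate. A sketch with an invented base namespace and table:

```python
import csv
import io

# Illustrative input table; the base URI and column mapping are assumptions.
TABLE = """id,name,city
c1,Ontotext,Sofia
c2,ACME,London
"""

def row_to_triples(row, base="http://example.org/"):
    """Map one CSV row to N-Triples: row id -> subject, columns -> predicates."""
    s = f"<{base}company/{row['id']}>"
    return [
        f'{s} <{base}name> "{row["name"]}" .',
        f'{s} <{base}city> "{row["city"]}" .',
    ]

def table_to_ntriples(text):
    reader = csv.DictReader(io.StringIO(text))
    return [t for row in reader for t in row_to_triples(row)]
```

Real pipelines add datatype handling and link the generated URIs to existing identifiers (e.g. in DBpedia or Geonames), but the row-to-graph mapping is the core step.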
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud (Ontotext)
This webinar will break the roadblocks that prevent many from reaping the benefits of heavyweight Semantic Technology in small scale projects. We will show you how to build Semantic Search & Analytics proof of concepts by using managed services in the Cloud.
Semantic Data Normalization For Efficient Clinical Trial Research (Ontotext)
This document discusses semantic data normalization of clinical trial data to make it more structured and amenable to analysis. It describes converting unstructured clinical data like conditions, interventions, adverse events and eligibility criteria into RDF triples. The goal is to extract key phrases and concepts, identify qualifiers and relationships to formally represent the data. Examples show how condition texts, drug annotations and criteria can be modeled. Current work has normalized over 215,000 clinical studies from ClinicalTrials.gov into over 80 million RDF triples. The normalized data is pre-loaded in GraphDB and Ontotext S4 Cloud and can be explored and analyzed more easily.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... (sameer shah)
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Open Source Contributions to Postgres: The Basics POSETTE 2024 (ElizabethGarrettChri)
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... (Kaxil Naik)
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Build Narratives, Connect Artifacts: Linked Open Data for Cultural Heritage
1. LINKED OPEN DATA FOR CULTURAL HERITAGE
VLADIMIR ALEXIEV
VLADIMIR.ALEXIEV@ONTOTEXT.COM
2016-09-29
2D presentation: O for overview, H for help; available in normal, continuous, and HTML views
2. TABLE OF CONTENTS
1. Intro
1.1. GLAM vs Internet
1.2. Google NGrams: Phrases in Books
1.3. Google NGrams: Two Specific Orgs
1.4. Google Trends: Search Popularity
1.5. How To Survive in the Internet Age?
1.6. Why Linked Open Data (LOD) is Important
2. GLAM Content Standards
2.1. Museum Content Standards
2.2. Archival Content Standards
2.3. Library Content Standards
3. GLAM Metadata Schemas
3.1. Seeing Standards (2)
3.2. XML Schemas
3.3. Museum Metadata: CDWA
3.4. Archive Metadata
3.5. Library Metadata: MARC
4. GLAM Ontologies
4.1. Europeana Data Model
4.2. CIDOC CRM
4.3. Web Annotation (Open Annotation, OA)
4.4. International Image Interop Framework (IIIF)
4.5. Library Ontologies
4.6. Archival Ontologies
5. GLAM LOD Datasets (LODLAM)
5.1. Wikidata
5.2. VIAF
5.3. Global Authority Control
3. 1 INTRO
A bit about me: co-founder of Sirma Group Holding, Bulgaria's largest software group
and parent company of Ontotext
30y in IT: 8 at university, 22 in industry
Did plenty of project management, business analysis and data modeling, some big
projects too
Last 8 years focused on data modeling and integration
Last 6 years in particular, focused on semantic data and semantic integration
I love to poke in other people's data and get in-depth. So there's a lot about data in
these slides
See my publications: you can sort by type and keyword; full abstracts are available.
I've provided a few references below, but if a topic interests you, please search in
the publications
The shorter version has about 110 slides, so sit back, relax, and enjoy the ride. Should
take us 1:20h
Ask questions at any time in the chat, I'll answer them all at the end
This longer version has 130 slides, including info about Library metadata and
ontologies
4. 1.1 GLAM VS INTERNET
GLAM, CH, DH?
Cultural Heritage (CH): the sum of our non-economic heritage
Obvious implications for economically significant sectors, eg tourism
Some say it's the source of all creativity, would you agree?
Includes old and new (eg digitally-born), material and immaterial, tangible and
intangible, permanent and temporal (eg interactive installations)
Galleries, Libraries, Archives, Museums (GLAM): sisterhood of institutions that care
for our CH, each with its own perspective and priorities
Digital Humanities (DH): the use of computers in the humanities.
Eg some UK universities with DH programs: @KingsDH @UCLDH @DH_OU
@CamDigHum
5. 1.2 GOOGLE NGRAMS: PHRASES IN BOOKS
Search for "library, museum" vs "Google, Facebook, Twitter" in books: the web sites are
negligible
6. 1.3 GOOGLE NGRAMS: TWO SPECIFIC ORGS
Compare two specific orgs: "Facebook" is more popular in recent books than "British
Museum" has been over time
7. 1.4 GOOGLE TRENDS: SEARCH POPULARITY
Web searches over the last 12 years: "Facebook, Google" are much more popular than
"library, museum"
8. 1.5 HOW TO SURVIVE IN THE INTERNET AGE?
Since ancient times GLAMs have been the centers of knowledge and wisdom
Aren’t Google, Wikipedia, Facebook, Twitter and smart-phone apps becoming the
new centers of research and culture (or at least popular culture)?
Will GLAMs fall victim to teenagers with smartphones browsing Facebook? If the
library's attitude is "Come search in our OPAC", then certainly yes
How to preserve the role of GLAMs into the new millennium?
To survive, GLAMs must adopt the internet as their default modus operandi
Web 1.0: presentation
Web 2.0: interaction
Web 3.0 (semantic web): data linking, enriching/disambiguating text using NLP/IE
approaches
9. 1.6 WHY LINKED OPEN DATA (LOD) IS IMPORTANT
Culture is naturally cross-institutional, cross-border, multilingual, and interlinked
LOD allows making connections between (and making sense of) the multitude of
digitized cultural artifacts available on the net
LOD enables large-scale Digital Humanities research, collaboration and aggregation;
technological renewal of CH institutions
10. 2 GLAM CONTENT STANDARDS
GLAM data is complex and varied
Exception is the rule
Many metadata format variations
Data comes from a variety of systems
Thus professional organizations have found it useful to define content standards
Describe what data to capture (and sometimes how to go about it)
Before formalizing how to express it in machine-readable form
Examples are extremely useful for data modelers to decide how to map the data
11. 2.1 MUSEUM CONTENT STANDARDS
Cataloging Cultural Objects (CCO): content standard for art, architecture, museums
14. 2.1.3 CCO EXAMPLE: CREATOR EXTENT
How to describe one aspect of the data
15. 2.1.4 SPECTRUM
UK Museum Collections Management Standard
Defines procedures for museums to follow, and the attendant data
Covers 21 procedures: Pre-entry, Object entry, Loans in, Acquisition, Inventory
control, Location and movement control, Transport, Cataloguing, Object condition
checking and technical assessment, Conservation and collections care, Risk
management, Insurance and indemnity management, Valuation control, Audit, Rights
management, Use of collections, Object exit, Loans out, Loss and damage,
Deaccession and disposal, Retrospective documentation
Addresses accreditation
17. 2.2 ARCHIVAL CONTENT STANDARDS
ISAD(G): archival materials
ISAAR(CPF): agents (corporations, people, families)
ISDF: functions (eg Secretary of some society)
ISDIAH: archival holding institutions
Image by D.Pitti, 2015
18. 2.3 LIBRARY CONTENT STANDARDS
AACR2 (Anglo-American Cataloging Rules 2)
International Standard Bibliographic Description (ISBD)
Resource Description and Access (RDA)
Extremely detailed and comprehensive (see RDA later). But they sometimes pay more
attention to where to put the commas than to:
Data sharing
Global availability of resources
Sharing the cataloging burden
19. 2.3.1 FRBR, FRSAD, FRAD
Functional Requirements for Bibliographic Records (FRBR), Subject Authority Data
(FRSAD), Authority Data (FRAD) (J.Mitchell, M.Zeng, M.Zumer, 2011)
20. 2.3.2 FRBR
Starts from user tasks (find, identify, select, obtain, explore). Introduces the important 4-
level WEMI model (relates to Uniform Titles):
Work: original or derived intellectual work (eg Don Quixote)
Expression: translation or edition (eg Don Quixote translation to English)
Manifestation: publisher's work (eg with illustrations, foreword by, compilation…).
ISBNs are here
Item: physical copy: libraries track loan/availability; famous copies (eg Lincoln's Bible);
manuscripts are singleton items
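The WEMI chain can be sketched in RDF, eg with the FRBR Core vocabulary; the ex: URIs here are invented for illustration:

```turtle
@prefix frbr: <http://purl.org/vocab/frbr/core#> .
@prefix ex:   <http://example.org/> .

ex:DonQuixote         a frbr:Work ;                 # the intellectual work
    frbr:realization  ex:DonQuixote-en .            # an English translation
ex:DonQuixote-en      a frbr:Expression ;
    frbr:embodiment   ex:DonQuixote-penguin .       # a particular publication (carries the ISBN)
ex:DonQuixote-penguin a frbr:Manifestation ;
    frbr:exemplar     ex:copy-42 .                  # one physical copy on a shelf
ex:copy-42            a frbr:Item .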
23. 3 GLAM METADATA SCHEMAS
How many of the standards listed in Seeing Standards: A Visualization of the Metadata
Universe apply to your work? (by Jenn Riley, Associate Dean for Digital Initiatives at
McGill University Library)
25. 3.2 XML SCHEMAS
Do you deal with XML? I bet you do
XML Schema (XSD): most widely used, but most unwieldy
RelaxNG (RNG): new generation schema language
RNG Compact (RNC): non-XML notation, most readable. Eg EAD3 is mastered in
RNC, then RNG and XSD produced
Schematron: express rules in XPath that can't be captured in XSD/RNG/RNC (eg
cross-field validation)
Tools:
https://github.com/EHRI/jing-trang/tree/EHRI-176: patch of the jing RNG validator to
emit errors like Schematron (SVRL with XPath error location)
https://github.com/VladimirAlexiev/rnc: RNC tools and CH schemas in RNC. Emacs
with code highlighting and syntax checking (flycheck)
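As a sketch of the kind of cross-field rule Schematron can express (the element and attribute names here are hypothetical, not from a real schema):

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="dateRange">
      <!-- XSD/RNG can validate each field's type, but cannot compare two fields -->
      <assert test="number(@end) >= number(@begin)">
        The end year must not precede the begin year.
      </assert>
    </rule>
  </pattern>
</schema>
```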
26. 3.3 MUSEUM METADATA: CDWA
Categories for the Description of Works of Art (CDWA): realization of CCO, 532
"categories" (data elements).
27. 3.3.1 CDWA LITE
XML schema implementing part of CDWA. Moderate complexity, about 300 elements.
Display vs Indexing (structured) elements, eg for Dimension.
28. 3.3.2 CONA SCHEMA
Cultural Objects Name Authority (CONA): Getty museum data aggregation. Moderate
complexity, about 280 elements:
29. 3.3.3 SPECTRUM XML
SPECTRUM Schema 4.0b has 10 entities and 592 fields, of which 490 are Object
(artwork) fields. I am not aware of any systems producing this.
30. 3.3.4 LIDO
Lightweight Information Describing Objects (LIDO). Evolved from CDWA, museumdat,
with inspiration from CIDOC CRM. (Images by R.Stein and A.Vitzthum, ATHENA
workshop, 2010)
31. 3.3.5 LIDO SCHEMA
Complex schema, eg when referring to a related object you can provide almost as
much detail as for the main object. It could do more to leverage opportunities for linking.
Display vs Indexing (structured) elements: inherited from CDWA
33. 3.4.1 ARCHIVE METADATA PROBLEMS
Pay a lot of attention to presentation, not enough to linking (difficult to "semanticize").
Emphasis on documents, not historic agents and events
EAG: So-called "controlled access points" are text, and typically not controlled at all
EAC: Many institutions don't consider EAC very valuable, and instead put person info
in EAD's bioghist element (example below from EADiva)
EAC: Related persons are names ("strings"), not links ("things")
EAC: Events include lots of info but only Date is a separate field (person names could be
tagged but often are not)
EAC: Family tree modeled as Outline, which is also used for other purposes (just
presentation)
<bioghist>
  <head>Chronological Events</head>
  <chronlist>
    <chronitem>
      <date normal="19781028">October 28, 1978</date>
      <event>
        <persname normal="Wossname, Samuel">Sam Wossname</persname> succeeds
        <persname normal="Othername, John">John Othername</persname> as department head.
      </event>
    </chronitem>
    <chronitem>
      <date normal="19790315">March 15, 1979</date>
      <event>Departmental reorganization.</event>
    </chronitem>
  </chronlist>
</bioghist>
34. 3.5 LIBRARY METADATA: MARC
MARC is 50 years old, unreadable, and doesn't accommodate new FRBR principles.
MARC-XML is not much better
35. 3.5.1 MARC MUST DIE
A whole emotional subculture, based on a slogan by Roy Tennant, 2002.
marc-must-die.info: "MARC is dead" (is it really?)
FutureLib: in-depth discussion wiki
Facebook group
Presentation by Sally Chambers, ELAG 2011
36. 4 GLAM ONTOLOGIES
Why do they call conversion to RDF "lifting" and back to some other format "lowering"?
RDF is a simple abstracted data model
Doesn't have nesting biases like XML (whether a sub-element is nested or referenced
by ID), and has fewer syntactic idiosyncrasies
(RDF/XML is awful, but there is Turtle for readability, or JSONLD for programmer
convenience)
The model is self-describing in a distributed way: when a class/property is looked up,
it should return its own description and info
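For instance, a description that is verbose in RDF/XML stays compact in Turtle, and every property URI can be dereferenced for its own definition (the ex: URIs here are invented):

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/> .

# Looking up <http://purl.org/dc/terms/creator> returns the property's own
# RDF description (label, comment, range) - this makes the model self-describing.
ex:object1
    dct:title   "Portrait of a Man"@en ;
    dct:creator ex:frans-hals ;
    dct:created "1660"^^<http://www.w3.org/2001/XMLSchema#gYear> .
```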
37. 4.1 EUROPEANA DATA MODEL
Model used by the Europeana aggregator (53M objects), and adopted by Digital Public
Library of America (DPLA) Based on:
OAI ORE (Open Archives Initiative Object Reuse & Exchange): organizing object
metadata and digital representations (WebResources)
Dublin Core: descriptive metadata
SKOS (Simple Knowledge Organization System): conceptual objects (concepts,
agents, etc)
CIDOC-CRM inspired: events, some relations between objects
39. 4.1.2 EDM ISSUES/CONSIDERATIONS
Criticized as not expressive enough, eg it can't capture the specific contribution of
an artist to an artwork
Complication: splits info about an object:
EDM External (from provider): edm:ProvidedCHO and ore:Aggregation
EDM Internal (at Europeana): edm:ProvidedCHO and 2 <ore:Aggregation,
ore:Proxy> pairs
Many providers use the minimal features and make mistakes; Europeana didn't do a
lot of validation
Old objects retro-converted from ESE are poor (only text), though some
enrichments added by Europeana
Europeana Data Quality Committee formed, to push this strategic point (2015-2020)
Evolving specification (since 2009)
Currently considering actual implementation of Events
Extensions for manuscripts, music, fashion, etc
40. 4.2 CIDOC CRM
CIDOC CRM: comprehensive reference model used for history, historic events,
archaeology, museum data, etc by CIDOC (the ICOM documentation committee).
Standardized as ISO 21127:2014, still evolving. About 85 classes; fundamental
branches: Persistent (endurant) vs Temporal (perdurant), Physical vs Conceptual
41. 4.2.1 CIDOC CRM PROPERTIES
Classes represent abstract things (eg crm:E24_Physical_Man-Made_Thing); specific
things (eg Paintings, Coins) are accommodated with crm:P2_has_type. 135 props (plus
their inverses); prop hierarchy (see "- - -" at bottom):
42. 4.2.2 CIDOC GRAPHICAL EXAMPLES
Video Tutorial (HTML version, or including Kindle)
Graphical Representation (continuous HTML version, or including Kindle): essential to
understand how to apply CRM in various situations
Typical modeling construct: short-cut (crm:P43_has_dimension) vs long-path (eg
crm:P39i_was_measured_by/crm:P40_observed_dimension), which allows more
details
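A minimal Turtle sketch of the two patterns (the ex: URIs are invented):

```turtle
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix ex:  <http://example.org/> .

# Short-cut: one triple, no room for detail
ex:painting crm:P43_has_dimension ex:height1 .

# Long path: an explicit E16 Measurement event, which can carry
# who measured, when, with what method, etc
ex:painting crm:P39i_was_measured_by ex:measurement1 .
ex:measurement1 a crm:E16_Measurement ;
    crm:P40_observed_dimension ex:height1 ;
    crm:P14_carried_out_by     ex:curator1 .
```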
43. 4.3 WEB ANNOTATION (OPEN ANNOTATION, OA)
W3C TR: mark, annotate, relate any web resources, eg: webpage and bookmark, image
and region over it, document and translation, paragraph and commentary. Diagram of
the Complete Example from the spec (using my rdfpuml)
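Eg a paragraph-and-commentary pair can be expressed with the OA vocabulary like this (a sketch; the ex: URIs are invented):

```turtle
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ex: <http://example.org/> .

ex:anno1 a oa:Annotation ;
    oa:hasBody     ex:commentary1 ;   # the commentary resource
    oa:hasTarget   ex:paragraph3 ;    # the paragraph it is about
    oa:motivatedBy oa:commenting .
```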
44. 4.4 INTERNATIONAL IMAGE INTEROP FRAMEWORK (IIIF)
Standard API for DeepZoom (hi-res) images. Supported by many servers and viewers.
http://iiif.io
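The Image API is essentially a URI template: the client asks for a region, size, rotation, quality and format, and the server computes the derivative. A sketch (the server name and identifier are invented):

```
{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}

https://example.org/iiif/manuscript-42/full/400,/0/default.jpg
```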
45. 4.4.1 IIIF PRESENTATION API
Based on OA and SharedCanvas. Strong attention to the JSONLD representation
(convenient for developers). Allows assembling manuscripts from pieces, presenting folios,
etc. See Rob Sanderson's presentations, eg IIIF and JSONLD:
46. 4.5 LIBRARY ONTOLOGIES
War of the Bibliographic Ontologies?
BIBO: used for a long time, pragmatic
FRBRer: pragmatic realization of FRBR, but little uptake (not rich enough?)
FRBRoo: based on CIDOC CRM, perhaps too complex
Fabio, Cito, Doco and friends: modern, includes new features (eg citation intent)
BibFrame: sponsored by LoC, but soundly criticized for modeling mistakes
RDAregistry.info: basic FRBR classes, numerous properties for all kinds of things.
Used for 100M records at TEL
SchemaBibEx (http://bib.schema.org): builds on a clean model sponsored by the big 4
search engines (Google, MS Bing, Yahoo, Yandex.ru). Developed by OCLC. May end up
being used for 300M records at WorldCat.
48. 4.5.2 RDAREGISTRY PROPERTIES
Many props (306 for Work alone), for specific purposes (eg "appellee" for court decisions,
"granting institution" for academic theses). Numeric prop names, but lexical (natural
language) names are also supported. Serves many semantic formats.
49. 4.5.3 A TASTE OF FRBROO
EDM–FRBRoo Application Profile Task Force: asked what to add to EDM to better fit
FRBRoo.
TF members developed a number of examples, eg on publications of "Don Quixote"
(T.Aalberg, V.Alexiev, J.Walkowska).
EDM variant:
52. 4.5.4 FRBR-INSPIRED
"FRBR, Before and After" by K.Coyle (ALA 2016) is an in-depth look at FRBR-inspired
models/realizations.
Chapter 10 describes the following ontologies: FRBRer, FRBRcore, FaBiO, <indecs>,
BIBFRAME, RDA in RDF, webFRBRer, FRBRoo
"Mistakes have been made", K.Coyle, SWIB 2015
53. 4.5.5 BRITISH LIBRARY DATA MODEL
Pragmatic data model that reuses several ontologies, and adds own props
54. 4.5.6 FIRST LIBRARY THAT RUNS ON RDF
Oslo Public Library (http://data.deichman.no, since 2014) uses Koha open source
software, RDF in the core, and marc2rdf/rdf2marc conversions. Pragmatic data model
that reuses several ontologies, and adds own props. Enables a number of agile apps, eg
search related books on Kiosk
56. 4.6 ARCHIVAL ONTOLOGIES
3 attempts to represent EAD as RDF, but IMHO none is very good.
Eg "The Semantic Mapping of Archival Metadata to the CIDOC CRM Ontology"
(Journal of Archival Organization, 9:174–207, 2011) proposes to represent the EAD
levels hierarchy (from Fonds down to Items) as five parallel CRM hierarchies
Records in Context (RiC): new upcoming semantic standard by ICA
Addresses the scope of EAD, EAC, EAG in one framework. Inspired by national
standards, FRBR (FRBR-LRM), CIDOC CRM
Progress report (2015), Conceptual Model 1.0 (Sep 2016): document key components
of archival description, properties of each, relations between them. Mlist available for
comments
Ontology: to come after finalizing the Conceptual Model. Expressed in OWL, will include
semantic mapping to similar concepts developed by related communities
58. 5 GLAM LOD DATASETS (LODLAM)
Some established thesauri and gazetteers as LOD, some are interconnected:
DBpedia; Wikidata, VIAF, FAST, ULAN; GeoNames, Pleiades, TGN; LCSH, AAT,
IconClass, Joconde, SVCN, Wordnet, etc.
Not shown: large collection LODs like: Europeana (EDM), British Museum (CIDOC
CRM), YCBA (CIDOC CRM), Rijksmuseum (EDM)
(Diagram based on work by M.Hildebrand)
59. 5.1 WIKIDATA
Tons of info on everything, including GLAMs, artists, artworks, etc. Eg Frans Hals on
Reasonator
61. 5.1.2 SUM OF ALL PAINTINGS
Wikidata Project Sum of All Paintings. Data used for:
Works by painter across collections (catalogue raisonné). Eg Frans Hals
62. 5.1.3 CROTOS
Excellent image search. Shows links to WD, Wikimedia Commons, original website. Eg
Frans Hals on Crotos
63. 5.1.4 YOU CAN HELP TOO!
Hunting for missing inventory numbers (9.9k of 140k). Important because <collection,
inventory number> is used to identify the painting. Eg US Getty Museum (1k), (2)
64. 5.1.5 LET'S FIX THE SECOND ONE
Find it on Getty's site, add the info like this:
66. 5.2 VIAF
Virtual International Authority File: 20 national libraries, 10 other contributors
including Getty ULAN and Wikidata. Eg coreferencing cluster of Spinoza:
68. 5.3 GLOBAL AUTHORITY CONTROL
201307: Authority Addicts: The New Frontier of Authority Control on Wikidata,
Wikimania 2013
201501: Wikidata Project Authority Control (initiated by Ontotext)
201503: Name Data Sources for Semantic Enrichment, a study for Europeana of
datasets including Person/Organization names. Conclusions:
The best datasets to use for name enrichment are VIAF and Wikidata
There are few name forms in common between the "library-tradition" datasets
(dominated by VIAF) and the "LOD-tradition" datasets (dominated by Wikidata)
VIAF has more name variations and permutations, Wikidata has more multilingual
names (translations)
VIAF is much bigger: 35M persons/orgs. Wikidata has 2.7M persons and maybe
1M orgs
Only 0.5M of Wikidata persons/orgs are coreferenced to VIAF, with maybe
another 0.5M coreferenced to other datasets, either VIAF-constituent (eg GND)
or non-constituent (eg RKDartists)
A lot can be gained by leveraging coreferencing across VIAF and Wikidata
Wikidata has great tools for crowd-sourced coreferencing
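A coreference count like the one above can be checked directly at the Wikidata SPARQL endpoint, eg (a sketch; the exact numbers will of course differ from the figures quoted here):

```sparql
# How many humans in Wikidata carry a VIAF ID (P214)?
SELECT (COUNT(?person) AS ?coreferenced) WHERE {
  ?person wdt:P31  wd:Q5 ;     # instance of: human
          wdt:P214 ?viaf .     # VIAF ID
}
```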
69. 5.3.1 NAMES OF LUCAS CRANACH
Analyzed records of Lucas Cranach in 7 LOD datasets (Wikidata: Freebase, DBpedia,
Yago; VIAF: ISNI, ULAN).
71. 5.3.3 MIX-N-MATCH
A global Authority on everything: a librarian's dream come true! Mix-n-Match is a
collaborative tool to create coreferences. 234 authorities, including Getty AAT, TGN,
ULAN; RKD artists, works; LoC Authorities; VIAF (not in M-n-M but on WD); BM
persons; BBC YourPaintings; Artsy, etc
72. 5.3.3.1 YOU CAN HELP WITH AUTHORITIES TOO!
Eg checking matches to Getty AAT. Single sign-on, a click per item. Easy!
73. 6 LODLAM PROJECTS
GLAM and DH projects present a bewildering variety, eg
Publishing Vocabularies/Thesauri as LOD
Publishing Museum collections and National Bibliographies as LOD
Enrichment of GLAM metadata with relevant thesauri, semantic and faceted search
Study of artistic influence over time and space
Literary traditions, parallel editions
Poetic repertories
Studying manuscripts, stemmatology (manuscript derivation)
Historiography
Studying charters, prosopography ("micro biographies"). "Prosopography is Greek for
Facebook" (SNAP:DRGN project, 2015)
Research functions, sometimes integrated into Virtual Research Environments
74. 6.1 MELLON "SPACE" PROJECTS
The Andrew Mellon Foundation funds many projects in CH and DH, and a few software
projects, including:
CollectionSpace: museum collection management
ArchivesSpace: archive management
ResearchSpace: semantic integration based on CIDOC CRM, search, data & image
annotation, data basket, etc
ConservationSpace: line of business application for conservation specialists
86. 6.3 BRITISH MUSEUM (BM) AND YCBA LOD
GraphDB runs the BM SPARQL endpoint. One of the biggest CH RDF collections
(917M triples)
As part of RS, developed mapping of BM data (2M objects) together with BM, using CIDOC
CRM
This mapping was followed by the Yale Center for British Art (YCBA)
Mapping Documentation: very comprehensive, but monolithic and with some imprecisions.
Includes the (in)famous diagram
87. 6.4 CONSERVATIONSPACE
Executed by a consortium led by US National Gallery of Art. Developed by Sirma ITT
(Ontotext sibling). Based on Ontotext GraphDB (semantic metadata), Alfresco
(document management), Smart Documents (Sirma product).
88. 6.5 EUROPEANA LOD AND OAI PMH
Ontotext created and hosted the Europeana SPARQL and OAI PMH services
89. 6.5.1 EUROPEANA STATISTICS
Eg chart of newspapers (several millions) by year: can't do this using the Europeana API,
but is easy with SPARQL
90. 6.6 EUROPEANA FOOD AND DRINK
Food & Drink content, semantically enriched (place and FD topic). EFD Semantic App:
open data, SPARQL endpoint, open source (Github). Uses GraphDB and ElasticSearch
enterprise connector
92. 6.6.2 WIDE GEOGRAPHIC COVERAGE
Objects from the Roman Empire to Antarctica (Scott's expedition to the South Pole), and
everything in-between
93. 6.6.3 EFD ENRICHMENT: FD GAZETTEER
Use Wikipedia Categories to extract a FD Gazetteer.
"Domain-specific modeling: Towards a Food and Drink Gazetteer", Tagarev, A.; Tolosi,
L.; and Alexiev, V., LNCS 9398, p182-196, January 2016 (preprint)
94. 6.6.4 EFD ENRICHMENT: PRUNING FD CATEGORY TREE
Using DBpedia in Europeana Food and Drink. Alexiev, V., DBpedia meeting, February
2016.
95. 6.6.5 EFD ENRICHMENT: FRENCH
Selected French as the second enrichment language after English, considering category
overlap (work by L.Tolosi; x-axis is cat level), available content, and NLP capabilities
96. 6.6.6 EFD PLACE ENRICHMENT
We used standard Ontotext Concept Enrichment Service, which is a mix of
DBpedia+Wikidata. But also had to add Geonames, to leverage the place hierarchy
97. 6.6.7 EFD PLACE ENRICHMENT
Hierarchical semantic facet based on Geonames
98. 6.6.8 EFD GEOGRAPHIC MAPPING: CLUSTERING
Once we have places, it's relatively easy to map them. We used the Cluster Mapper
library
99. 6.6.9 EFD GEOGRAPHIC MAPPING: JITTERING
There are 9k objects marked "Bulgaria". We don't want all flags in the center of Bulgaria,
so we jitter them apart
100. 6.6.10 GLAMS WORKING WITH WIKIDATA
Why should GLAMs bother about Wikidata? Because it gives an excellent way to
connect and expose your collection data to a multilingual audience
Europeana Wikimedia Taskforce report:
Recommendation 1: For every Europeana project, considering the possible
benefits of a Wikimedia component should be default behavior
Recommendation 7: Make Wikidata a central element of Europeana's "portal to
platform" strategy
Recommendation 8: Europeana should continue to invest in technology that
improves the interoperability between GLAMs and Wikimedia platforms
GLAMs Working with Wikidata: easily add content about a colorful tradition,
"blessing of the baskets" ("swiecenie koszyczek" or just "Święconka" in Polish). With
proper cats: when we merge them across languages (pl, en, de), we discover the
content is about Food and Drink, Easter, and a Polish tradition
101. 6.7 GETTY VOCABULARY PROGRAM LOD
GVP is well-known and respected in GLAM. Dependencies: AAT-TGN-ULAN-CONA.
Center of the LODLAM cloud? (Diagram by J.Cobb, 2014.) See GVP Training Materials
102. 6.7.1 GVP LOD RELEASES
AAT 2014-02, TGN 2014-08, ULAN 2015-03. Publicized in blog posts by J.Cuno, head of
the Getty Trust
103. 6.7.2 ONTOTEXT SCOPE OF WORK
Semantic/ontology development: http://vocab.getty.edu/ontology
Contributed to the ISO 25964 ontology (latest standard on thesauri). Provided
implementation experience, suggestions and fixes
Complete mapping specification
Help implement R2RML scripts working off Getty's Oracle database, contribution to
Perl implementation (RDB2RDF), R2RML extension (rrx:languageColumn)
Work with a wide External Reviewers group (people from OCLC, Europeana, ISO
25964 working group, etc)
GraphDB semantic repo, clustered for high-availability
Semantic application development (customized Forest user interface) and tech
consulting
SPARQL 1.1 compliant endpoint: http://vocab.getty.edu/sparql
Comprehensive documentation (100 pages): http://vocab.getty.edu/doc
Sample queries (100), including charts, geographic queries, etc
Per-entity export files, explicit/total data dumps. Many formats: RDF, Turtle, NTriples,
JSON, JSON-LD
Help desk / support on twitter and google group (see home page)
Presentations, papers: "On the composition of ISO 25964 hierarchical relations (BTG,
BTP, BTI)". Alexiev, V.; Lindenthal, J.; and Isaac, A. International Journal on Digital
Libraries, August 2015, Springer.
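A typical query against the endpoint, adapted from the style of the published sample queries (the search term is arbitrary; luc:term is GraphDB's full-text extension, and the common prefixes such as skos:, gvp:, xl:, aat: are predefined at the endpoint):

```sparql
SELECT ?concept ?label WHERE {
  ?concept skos:inScheme aat: ;                     # restrict to AAT
           luc:term "gilding" ;                     # full-text match
           gvp:prefLabelGVP/xl:literalForm ?label . # GVP-preferred label
}
```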
104. 6.7.3 COMPLETE REPRESENTATION OF ALL GVP INFO
See GVP LOD: Ontologies and Semantic Representation, V.Alexiev, CIDOC 2014.
External Ontologies:
Prefix Ontology Used for
bibo: Bibliography Ontology Sources
dc: Dublin Core Elements common
dct: Dublin Core Terms common
foaf: Friend of a Friend ontology Contributors
iso: ISO 25964 (latest on thesauri) iso:ThesaurusArray, BTG/BTP/BTI
owl: Web Ontology Language Basic RDF representation
prov: Provenance Ontology Revision history
rdf: Resource Description Framework Basic RDF representation
rdfs: RDF Schema Basic RDF representation
schema: Schema.org common, geo (TGN), bio (ULAN)
skos: Simple Knowledge Organization System Basic vocabulary representation
skosxl: SKOS Extension for Labels Rich labels
wgs: W3C World Geodetic Survey geo Geo (TGN)
xsd: XML Schema Datatypes Basic RDF representation
107. 6.7.6 KEY VALUES (FLAGS) ARE IMPORTANT
Excel-driven Ontology Generation™. Key val can be mapped to Custom sub-class,
Custom (sub-)prop, Ontology Value (eg <term/kind/Abbreviation>)
108. 6.7.7 ASSOCIATIVE RELATIONS ARE VALUABLE
More Excel-driven Ontology Generation™
Relations come in owl:inverseOf pairs (or owl:SymmetricProperty self-inverse)
110. 6.7.9 COMPREHENSIVE DOCUMENTATION
Getty Vocabularies Linked Open Data: Semantic Representation. Alexiev, V.; Cobb, J.;
Garcia, G.; Harpring, P. Getty Research Institute, 3.2 edition, March 2015.
111. 6.7.10 SAMPLE QUERIES (100), INTEGRATED UI
Some charts, eg "Year Joined UN" (TGN), "Pope Reign Durations" (ULAN)
112. 6.7.11 GVP VOCABS USAGE
Collected about 100 usages of the vocabs, many in Collection Management and Search.
Many described in Getty Vocabs: Why LOD? Why Now?, J.Cobb, 2014. Eg:
AAT used in Cataloging Calculator: finds bibliographic and authority data: language
codes, geographic area codes, publication country codes, AACR2 abbreviations, LC
main entry, Cutter numbers, AAT concepts, etc
113. 6.7.12 AAT IN EUROPEANA
Europeana uses AAT to enrich type/subject/material fields
PartagePlus matched Art Nouveau candidate concepts to AAT; enriched labels
114. 6.8 J.P.GETTY MUSEUM
Working with JPGM on publishing LOD. Considering CIDOC CRM, maybe also simpler
ontologies. Hoping to generate R2RML from instance examples like:
115. 6.8.1 J.P.GETTY MUSEUM AND WIKIDATA
Discussing making data available to Wikidata. WD has 480 Getty paintings, but the Museum has
180k artworks. WD query shown as an image grid
116. 6.9 AMERICAN ART COLLABORATIVE
American Art Collaborative: 14 US art museums committed to establishing a critical
mass of LOD on the semantic web. Consulting on CRM mapping.
Work ongoing at https://github.com/american-art, eg see NPG mapping issues
Eg a possible mapping of "(sculpture) Cast after"
117. 6.10 EUROPEAN HOLOCAUST RESEARCH INFRASTRUCTURE
EHRI is a large-scale EU project that involves 23 Holocaust archives (Europe, Israel and
the US), DH and IT organizations.
In its first phase (2011-2015) it aggregated archival descriptions and materials on a
large scale and built a Virtual Research Environment (portal) for Holocaust
researchers, based on a graph database.
In its second phase (2015-2019), EHRI2 seeks to enhance the gathered materials
using semantic approaches: enrichment, coreferencing, interlinking. Semantic
integration involves four of the 14 EHRI2 work packages and helps integrate
databases, free text, and metadata to interconnect historical entities (people,
organizations, places, historic events) and create networks.
"Semantic Archive Integration for Holocaust Research: the EHRI Research
Infrastructure", V.Alexiev, L.Brazzo, CIDOC Congress 2016.
118. 6.10.1 EHRI: PERSON NETWORKS
Research question: how person networks influenced chances of survival. Idea:
Rec 123456: firstName "John", lastName "Smith", gender Male, dateMarriage 1921-
01-05; additional names: nameSpouseMaiden "Matienzo", nameSpouse "Maria Smith",
nameChild "Mike Smith", nameSibling "Jack Jones"
We can create Person records for the people mentioned, make some likely inferences,
then try to match to other Person records in the database
119. 6.10.2 EHRI: LARGE-SCALE PLACE MATCHING
Match USHMM places to Geonames, also achieving deduplication. A Geonames
matching pipeline in free text was also developed
120. 6.10.3 EHRI: ORAL HISTORY INTERVIEWS
Analyze 2.5k OH Interviews:
ONTO: Place enrichment, Person name recognition
INRIA: word2vec experiments
guard Cosdist punishment Cosdist
guarding 0.593507 punishments 0.668144
sentry 0.512083 punish 0.601212
hlinka 0.496201 punishing 0.543213
gate 0.490032 beatings 0.527033
watching 0.484647 penalty 0.497262
rifle 0.484379 deserved 0.490157
lookout 0.482025 beaten 0.473870
patrol 0.477233 straf 0.473338
soldier 0.475982 offense 0.461230
guarded 0.474689 executing 0.459965
police 0.474291 merciless 0.455123
Semantic "differencing" (interesting): KGB - Stalin + Hitler = SS
121. 6.10.4 EHRI: DISCOVERING CAMPS, GHETTOS, STALAGS
And referencing to Geonames so we can get coordinates
122. 6.11 OTHER PROJECTS: WIKIARTHISTORY
Vienna University of Technology (site, paper)
Art History networks from Wikipedia, through VIAF id
Time and nationality from ULAN
123. 6.12 CHARTEX
NLP analysis of medieval Charters and Deeds. Funded by the Digging Into Data
cross-country SSH funding initiative. Visualized with BRAT
124. 6.13 NUMISMATICS
My good friend Ethan Gruber at the American Numismatic Society has developed a host
of amazing software that uses and produces LOD.
Numishare: Data platform for coins/medals, 100k coin types
Nomisma: Shared authorities for numismatics
Kerameikos: Pottery LOD
EADitor: EAD Editor: based on XML & XForms, uses/produces LOD
xEAC: EAC/CPF Editor: based on XML & XForms, uses/produces LOD
125. 6.13.1 COINS IN TIME AND SPACE
Spatiotemporal distribution of hoards containing a particular Roman Republican coin
type. Below: examples of this type in partner collections
129. 6.13.5 COINHOARDS
Greek coin data provided by CoinHoards.org
Geo mapping data provided by nomisma.org
Below: reference to the coin in an archival notebook (linked via OA)