From Ambition to Go Live

The National Library Board of Singapore’s Journey
to an operational Linked Data Management
and Discovery System
Richard Wallis
Evangelist and Founder
Data Liberate
richard.wallis@dataliberate.com
@dataliberate
15th Semantic Web in Libraries Conference
SWIB23 - 12th September 2023 - Berlin
Independent Consultant, Evangelist & Founder
W3C Community Groups:
• Bibframe2Schema (Chair) – Standardised conversion path(s)
• Schema Bib Extend (Chair) - Bibliographic data
• Schema Architypes (Chair) - Archives
• Financial Industry Business Ontology – Financial schema.org
• Tourism Structured Web Data (Co-Chair)
• Schema Course Extension
• Schema IoT Community
• Educational & Occupational Credentials in Schema.org
40+ Years – Computing
30+ Years – Cultural Heritage technology
20+ Years – Semantic Web & Linked Data
Worked With:
• Google – Schema.org vocabulary, site, extensions, documentation and community
• OCLC – Global library cooperative
• FIBO – Financial Industry Business Ontology Group
• Various Clients – Implementing/understanding Linked Data, Schema.org:
National Library Board Singapore
British Library — Stanford University — Europeana
National Library Board Singapore
Public Libraries
• Network of 28 Public Libraries, including 2 partner libraries*
• Reading Programmes and Initiatives
• Programmes and Exhibitions targeted at Singapore communities
*Partner libraries are libraries which are partner owned and funded but managed by NLB/NLB’s subsidiary Libraries and Archives Solutions Pte Ltd. Library@Chinatown and the Lifelong Learning Institute Library are partner libraries.
National Archives
• Transferred from NHB to NLB in Nov 2012
• Custodian of Singapore’s Collective Memory: Responsible for Collection, Preservation and Management of Singapore’s Public and Private Archival Records
• Promotes Public Interest in our Nation’s History and Heritage
National Library
• Preserving Singapore’s Print and Literary Heritage, and Intellectual memory
• Reference Collections
• Legal Deposit (including electronic)
Reference Collection
• Over 560,000 Singapore & SEA items
• Over 147,000 Chinese, Malay & Tamil Languages items
• Over 62,000 Social Sciences & Humanities items
• Over 39,000 Science & Technology items
• Over 53,000 Arts items
• Over 19,000 Rare Materials items
Archival Materials
• Over 290,000 Government files & Parliament papers
• Over 190,000 Audiovisual & sound recordings
• Over 70,000 Maps & building plans
• Over 1.14m Photographs
• Over 35,000 Oral history interviews
• Over 55,000 Speeches & press releases
• Over 7,000 Posters
Lending Collection
• Over 5m print collection
• Over 2.4m music tracks
• 78 databases
• Over 7,400 e-newspapers and e-magazines titles
• Over 8,000 e-learning courses
• Over 1.7m e-books and audio books
National Library Board Online Services
The Ambition
• To enable the discovery & display of entities from different sources
in a combined interface
• To bring together resources physical and digital
• To bring together diverse systems across the National Library,
National Archives, and Public Libraries in a Linked Data Environment
• To provide a staff interface to view and manage all entities, their
descriptions and relationships
The Journey Starts (March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
Contract Awarded
metaphactory platform
Low-code knowledge graph platform
Semantic knowledge modeling
Semantic search & discovery
AWS Partner
Public sector partner
Singapore based
Linked Data, Structured data, Semantic Web, bibliographic metadata, Schema.org and management systems consultant
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
Basic Data Model
• Linked Data
– BIBFRAME to capture detail of bibliographic records
– Schema.org to deliver structured data for search engines
– Schema.org representation of CMS, NAS, TTE data
– Schema.org enrichment of BIBFRAME
• Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph
– All entities described using Schema.org as a minimum.
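As a minimal sketch of the "Schema.org to deliver structured data for search engines" point, the Python below serialises one entity description as JSON-LD inside a script element, which is a common way of embedding Schema.org data in a web page. The example.org URI and the helper name to_jsonld_script are illustrative assumptions, not details of the NLB system.

```python
import json


def to_jsonld_script(entity_uri: str, schema_type: str, properties: dict) -> str:
    """Wrap a Schema.org description in a JSON-LD <script> element."""
    doc = {
        "@context": "https://schema.org",
        "@type": schema_type,
        "@id": entity_uri,
    }
    doc.update(properties)
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical URI; only the name is taken from the slides.
snippet = to_jsonld_script(
    "https://example.org/entity/lee-kuan-yew",
    "Person",
    {"name": "Lee Kuan Yew"},
)
print(snippet)
```

Keeping every entity describable this way is what makes Schema.org usable as the common vocabulary across the different sources.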
Data Data Data!
Data Source   Source Records   Entity Count   Update Frequency
ILS           1.4m             56.8m          Daily
CMS           82k              228k           Weekly
NAS           1.6m             6.7m           Monthly
TTE           3k               317k           Monthly
Total         3.1m             70.4m
Data Ingest Pipelines – ILS
• Source data in MARC-XML files
• Step 1: marc2bibframe2 scripts
– Open source – shared by Library of Congress
– “standard” approach
– Output BIBFRAME as individual RDF-XML files
• Step 2: bibframe2schema.org script
– Open source – Bibframe2Schema.org Community Group
– SPARQL-based script
– Output Schema.org enriched individual BIBFRAME RDF files for loading into
Knowledge graph triplestore
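The idea behind Step 2 can be sketched as follows. The real bibframe2schema.org script is SPARQL-based; to stay self-contained, this sketch swaps in plain Python over triples held as tuples. The three-entry TYPE_MAP and the example.org URIs are illustrative assumptions, not the community group's actual rules.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BF = "http://id.loc.gov/ontologies/bibframe/"
SCHEMA = "https://schema.org/"

# Illustrative subset only; the real mapping rules are far richer.
TYPE_MAP = {
    BF + "Work": SCHEMA + "CreativeWork",
    BF + "Person": SCHEMA + "Person",
    BF + "Organization": SCHEMA + "Organization",
}


def enrich(triples):
    """Return the input triples plus Schema.org types for mapped BIBFRAME types."""
    enriched = set(triples)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in TYPE_MAP:
            enriched.add((subject, RDF_TYPE, TYPE_MAP[obj]))
    return enriched


# Hypothetical BIBFRAME output from Step 1.
bibframe = {
    ("https://example.org/work/1", RDF_TYPE, BF + "Work"),
    ("https://example.org/agent/1", RDF_TYPE, BF + "Person"),
}
result = enrich(bibframe)
```

The BIBFRAME triples are kept and the Schema.org ones are added alongside them, which matches the "Schema.org enriched BIBFRAME" output described above.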
Data Ingestion Pipelines – TTE (Authorities)
• Source data in bespoke CSV format
• Exported in collection-based files
– People, Places, organisations, time periods, etc.
• Interrelated references so needed to be considered as a whole
• Bespoke Python script
• Creating Schema.org entity descriptions
• Output RDF format file for loading into Knowledge graph
triplestore
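Because the exported files reference each other, a two-pass approach is one way to treat them as a whole: load every row first, then resolve references once all identifiers are known. The sketch below assumes hypothetical column names (id, name, member_of), made-up sample rows and an example.org base URI; the actual CSV layout is not shown in the slides.

```python
import csv
import io

# Hypothetical exports: column names and values are assumptions.
PEOPLE_CSV = """id,name,member_of
p1,Example Person,o1
"""
ORGS_CSV = """id,name
o1,Example Organisation
"""

BASE = "https://example.org/tte/"


def load(text, schema_type):
    """Pass 1: one Schema.org description per row, keyed by source id."""
    entities = {}
    for row in csv.DictReader(io.StringIO(text)):
        entities[row["id"]] = {
            "@id": BASE + row["id"],
            "@type": schema_type,
            "name": row["name"],
            "_row": row,  # kept temporarily for reference resolution
        }
    return entities


def link(entities):
    """Pass 2: resolve cross-file references now that every id is known."""
    for entity in entities.values():
        row = entity.pop("_row")
        target = entities.get(row.get("member_of") or "")
        if target is not None:
            entity["memberOf"] = {"@id": target["@id"]}
    return entities


entities = load(PEOPLE_CSV, "Person")
entities.update(load(ORGS_CSV, "Organization"))
entities = link(entities)
```

The resulting dictionaries follow JSON-LD conventions, so they could be serialised to an RDF format for loading.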
Data Ingestion Pipelines – NAS & CMS
• Source data in DC-XML files
• Step 1: DC-XML to DC-Terms RDF
– Bespoke XSLT script to translate from DC record to DCT-RDF
• Step 2: TTE lookup
– Lookup identified values against TTE entities in Knowledge graph
• use URI if matched
• Step 3: Schema.org entity creation
– SPARQL script creates Schema.org representation of DCT entities
• Output individual RDF format files for loading into Knowledge graph
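Step 2 can be sketched as a lookup that returns a URI reference when a value matches a TTE entity and otherwise keeps the literal. The sketch assumes an in-memory index in place of a query against the Knowledge graph, and a simple case and whitespace normalisation; the function names and the example.org URI are hypothetical.

```python
def normalise(label: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.casefold().split())


def build_index(tte_entities: dict) -> dict:
    """Map normalised names to entity URIs."""
    return {normalise(name): uri for uri, name in tte_entities.items()}


def resolve(value: str, index: dict):
    """Use the URI if matched, otherwise keep the original literal."""
    uri = index.get(normalise(value))
    return {"@id": uri} if uri else value


# Hypothetical TTE content.
tte = {"https://example.org/tte/o1": "Singapore Art Museum"}
index = build_index(tte)
matched = resolve("singapore  art museum", index)
unmatched = resolve("Unknown Gallery", index)
```

Keeping the unmatched literal means no source information is lost when an authority entry does not exist yet.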
Technical Architecture (simplified)
Hosted on Amazon Web Services
[Diagram components: SOURCE DATA IMPORT (batch scripts, import control, etc.); pipeline processing; GraphDB clusters; DMI for CURATOR ACCESS; DI for PUBLIC ACCESS]
A need for entity reconciliation …
• Lots (and lots and lots) of source entities – 70.4 million entities
• Lots of duplication
– Lee, Kuan Yew – 1st Prime Minister of Singapore
• 160 individual entities in ILS source data
– Singapore Art Museum
• Entities from source data
• 21 CMS, 1 NAS, 66 ILS, 1 TTE
• Users only want 1 of each!
70.4 (million) into 6.1 (million) entities does go!
• We know that now - but how did we get there?
• Requirements hurdles to be cleared:
– Not a single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities
• Suppression of incorrect or ‘private’ attributes from display
• Addition of attributes not in source data
• Creating / breaking relationships between entities
• Near real-time updates
Adaptive Data Model Concepts
• Source entities
– Individual representation of source data
• Aggregation entities
– Tracking relationships between source entities for the same thing
– No copying of attributes
• Primary Entities
– Searchable by users
– Displayable to users
– Consolidation of aggregated source data & managed attributes
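The three concepts can be sketched as data structures. This is one possible reading of the slide, assuming the consolidated view is computed on demand from the aggregation's members plus curator-managed additions and suppressions. The class names, field names and prefixed identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SourceEntity:
    """Individual representation of one entity from one source system."""
    uri: str
    source: str  # e.g. "ILS", "CMS", "NAS", "TTE"
    attributes: dict


@dataclass
class Aggregation:
    """Tracks source entities for the same thing; copies no attributes."""
    uri: str
    member_uris: list = field(default_factory=list)


@dataclass
class PrimaryEntity:
    """The searchable, displayable view over one aggregation."""
    uri: str
    aggregation: Aggregation
    added: dict = field(default_factory=dict)     # attributes not in source data
    suppressed: set = field(default_factory=set)  # attribute names hidden from display

    def view(self, sources: dict) -> dict:
        merged = {}
        for member_uri in self.aggregation.member_uris:
            for key, value in sources[member_uri].attributes.items():
                merged.setdefault(key, set()).add(value)
        for key, value in self.added.items():
            merged.setdefault(key, set()).add(value)
        return {k: sorted(v) for k, v in merged.items() if k not in self.suppressed}


sources = {
    "ils:1": SourceEntity("ils:1", "ILS", {"name": "Lee, Kuan Yew"}),
    "tte:1": SourceEntity("tte:1", "TTE", {"name": "Lee Kuan Yew", "note": "internal"}),
}
agg = Aggregation("agg:1", ["ils:1", "tte:1"])
primary = PrimaryEntity(
    "primary:1", agg,
    added={"description": "Added by a curator"},
    suppressed={"note"},
)
view = primary.view(sources)
```

Because the aggregation only holds references, a later update to a source entity changes the primary view without any attributes having to be re-copied.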
Entity Matching
• Candidate matches based on schema:names
– Lucene indexes matching
– Levenshtein similarity refined
– Same entity type
– Entity-type-specific rules, e.g.:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
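The refinement steps can be sketched in Python, leaving out the Lucene candidate retrieval. The sketch assumes a normalised Levenshtein similarity, a threshold of 0.85 and a simplified Person rule that rejects only conflicting birth dates. The threshold, the dictionary keys and the sample names and dates are all illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def is_match(x: dict, y: dict, threshold: float = 0.85) -> bool:
    """Same type, similar name, and no conflicting birth date for people."""
    if x["type"] != y["type"]:
        return False
    if similarity(x["name"].casefold(), y["name"].casefold()) < threshold:
        return False
    if x["type"] == "Person":
        dates = (x.get("birthDate"), y.get("birthDate"))
        if all(dates) and dates[0] != dates[1]:
            return False
    return True


# Hypothetical candidates.
a = {"type": "Person", "name": "Tan Ah Kow", "birthDate": "1950-01-01"}
b = {"type": "Person", "name": "Tan Ah Kow"}
c = {"type": "Person", "name": "Tan Ah Kow", "birthDate": "1960-01-01"}
d = {"type": "Organization", "name": "Tan Ah Kow"}
print(is_match(a, b), is_match(a, c), is_match(a, d))
```

A missing birth date is treated as "no evidence against" rather than a mismatch, which is a design choice of this sketch and not something stated in the slides.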
The entity iceberg
[Diagram: iceberg layers from top to bottom: Primary, Aggregation, Source. Side labels: Discovery, Ingestion Pipelines, Management]
Data Management Interface
• Rapidly developed – easily modified
• Using metaphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
Discovery Interface (DI)
LDMS System Live – December 2022
• Live – data curation team using DMI
– Entity curation
– Manually merging & splitting aggregations
• Live – data ingest pipelines
– Daily / Weekly / Monthly
– Delta update exports from source systems
• Live – Discovery Interface ready for deployment
• Live – Support
– Issue resolution & updates
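How a delta export might be applied can be sketched as below. The sketch assumes each delta record carries an operation (add, update or delete), a source identifier and, for adds and updates, the new data. This record shape and the prefixed identifiers are hypothetical; the slides do not describe the export format.

```python
def apply_delta(store: dict, delta: list) -> dict:
    """Apply add / update / delete records to a store keyed by source id."""
    counts = {"add": 0, "update": 0, "delete": 0}
    for record in delta:
        op, key = record["op"], record["id"]
        if op == "delete":
            store.pop(key, None)  # deleting an unknown id is a no-op
        elif op in ("add", "update"):
            store[key] = record["data"]
        else:
            raise ValueError(f"unknown operation: {op}")
        counts[op] += 1
    return counts


store = {"ils:1": {"name": "Old title"}}
delta = [
    {"op": "update", "id": "ils:1", "data": {"name": "New title"}},
    {"op": "add", "id": "ils:2", "data": {"name": "Another title"}},
    {"op": "delete", "id": "ils:3"},
]
counts = apply_delta(store, delta)
```

In a system like the one described, each applied record would also trigger reconciliation of the affected aggregations; that step is outside this sketch.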
Discovery Interface
• Live and ready for deployment
– Performance, load and user tested
• Designed & developed by NLB Data Team to demonstrate
characteristics and benefits of a Linked Data service
• NLB services rolled out by National Library & Public Library divisions
– Recently identified a service for Linked Data integration
• Currently undergoing fine tuning to meet service’s UI requirements
Daily Delta Ingestion & Reconciliation
Some Challenges Addressed on the Way
• Inaccurate source data
– Not apparent in source systems
– Very obvious when aggregated in LDMS
• E.g. Subject and Person URIs duplicated
– Identified and analysed in DMI
– Reported to source curators and fixed
– Updated in next delta update
– Data issues identified in LDMS – source data quality enhanced
• Asian name matching
– Often consisting of several short 3-4 letter names
– Lucene matches not of much help
– Introduction of Levenshtein similarity matching helped
– Tuning is challenging – trading off European & Asian names
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational, actively delivering benefits since December 2022
138
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational actively delivering benefits from December 2022
139
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational actively delivering benefits from December 2022
  • 1. From Ambition to Go Live The National Library Board of Singapore’s Journey to an operational Linked Data Management and Discovery System Richard Wallis Evangelist and Founder Data Liberate richard.wallis@dataliberate.com @dataliberate 15th Semantic Web in Libraries Conference SWIB23 - 12th September 2023 - Berlin
  • 2. Independent Consultant, Evangelist & Founder W3C Community Groups: • Bibframe2Schema (Chair) – Standardised conversion path(s) • Schema Bib Extend (Chair) - Bibliographic data • Schema Architypes (Chair) - Archives • Financial Industry Business Ontology – Financial schema.org • Tourism Structured Web Data (Co-Chair) • Schema Course Extension • Schema IoT Community • Educational & Occupational Credentials in Schema.org richard.wallis@dataliberate.com — @dataliberate 40+ Years – Computing 30+ Years – Cultural Heritage technology 20+ Years – Semantic Web & Linked Data Worked With: • Google – Schema.org vocabulary, site, extensions. documentation and community • OCLC – Global library cooperative • FIBO – Financial Industry Business Ontology Group • Various Clients – Implementing/understanding Linked Data, Schema.org: National Library Board Singapore British Library — Stanford University — Europeana 2
  • 3. 3 National Library Board Singapore Public Libraries Network of 28 Public Libraries, including 2 partner libraries* Reading Programmes and Initiatives Programmes and Exhibitions targeted at Singapore communities *Partner libraries are libraries which are partner owned and funded but managed by NLB/NLB’s subsidiary Libraries and Archives Solutions Pte Ltd. Library@Chinatown and the Lifelong Learning Institute Library are Partner libraries. National Archives Transferred from NHB to NLB in Nov 2012 Custodian of Singapore’s Collective Memory: Responsible for Collection, Preservation and Management of Singapore’s Public and Private Archival Records Promotes Public Interest in our Nation’s History and Heritage National Library Preserving Singapore’s Print and Literary Heritage, and Intellectual memory Reference Collections Legal Deposit (including electronic)
  • 4. 4 Over 560,000 Singapore & SEA items Over 147,000 Chinese, Malay & Tamil Languages items Reference Collection Over 62,000 Social Sciences & Humanities items Over 39,000 Science & Technology items Over 53,000 Arts items Over 19,000 Rare Materials items Archival Materials Over 290,000 Government files & Parliament papers Over 190,000 Audiovisual & sound recordings Over 70,000 Maps & building plans Over 1.14m Photographs Over 35,000 Oral history interviews Over 55,000 Speeches & press releases Over 7,000 Posters National Library Board Singapore Over 5m print collection Over 2.4m music tracks 78 databases Over 7,400 e-newspapers and e-magazines titles Over 8,000 e-learning courses Over 1.7m e-books and audio books Lending Collection
  • 5. 5 National Library Board Online Services
  • 6. 7 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 7. 8 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 8. 9 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 9. 10 The Ambition • To enable the discovery & display of entitles from different sources in a combined interface • To bring together resources physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 10. 11 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 11. 12 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 12. 13 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 13. 14 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 14. 15 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 15. 16 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 16. 17 The Journey Starts(March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – Bibframe, Schema.org
  • 17. 18 Contract Awarded metaphactory platform Low-code knowledge graph platform Semantic knowledge modeling Semantic search & discovery AWS Partner Public sector partner Singapore based Linked Data, Structured data, Semantic Web, bibliographic meta data, Schema.org and management systems consultant
  • 18. 19 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 19. 20 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 20. 21 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 21. 22 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 22. 23 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 23. 24 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 24. 25 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 25. 26 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 26. 27 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 27. 28 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 28. 29 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 29. 30 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 30. 31 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
  • 31. 32 Data Data Data! Data Source Source Records Entity Count Update Frequency ILS 1.4m 56.8m Daily CMS 82k 228k Weekly NAS 1.6m 6.7m Monthly TTE 3k 317k Monthly 3.1m 70.4m
  • 32. 33 Data Ingest Pipelines – ILS • Source data in MARC-XML files • Step 1: marc2bibframe2 scripts – Open source – shared by Library of Congress – “standard” approach – Output BIBFRAME as individual RDF-XML files • Step 2: bibframe2schema.org script – Open source – Bibframe2Schema.org Community Group – SPARQL-based script – Output Schema.org enriched individual BIBFRAME RDF files for loading into Knowledge graph triplestore
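The enrichment idea in Step 2 can be sketched in a few lines of Python. This is not the Bibframe2Schema.org script, which is SPARQL-based; it is a minimal illustration that assumes triples are held as plain tuples and uses a small, hypothetical mapping table. The example URIs under example.org and the choice of mapped terms are assumptions for illustration only.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
SDO = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical, deliberately tiny mapping tables (assumptions, not the real rules).
CLASS_MAP = {BF + "Work": SDO + "CreativeWork", BF + "Person": SDO + "Person"}
PROPERTY_MAP = {RDFS_LABEL: SDO + "name", BF + "subject": SDO + "about"}


def enrich(triples):
    """Return the original triples plus derived Schema.org triples.

    The BIBFRAME description is kept intact; Schema.org statements are added
    alongside it, so the output is 'Schema.org enriched BIBFRAME'.
    """
    enriched = set(triples)
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in CLASS_MAP:
            enriched.add((subject, RDF_TYPE, CLASS_MAP[obj]))
        elif predicate in PROPERTY_MAP:
            enriched.add((subject, PROPERTY_MAP[predicate], obj))
    return enriched


# Made-up sample record.
work = "https://example.org/work/1"
source = {
    (work, RDF_TYPE, BF + "Work"),
    (work, RDFS_LABEL, "An example title"),
    (work, BF + "subject", "https://example.org/topic/1"),
}
result = enrich(source)
```

Because the original triples are never removed, the same file can serve both detailed bibliographic use (BIBFRAME) and search-engine-friendly structured data (Schema.org).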
  • 35. 36 Data Ingestion Pipelines – TTE (Authorities) • Source data in bespoke CSV format • Exported in collection-based files – People, Places, organisations, time periods, etc. • Interrelated references so needed to be considered as a whole • Bespoke python script • Creating Schema.org entity descriptions • Output RDF format file for loading into Knowledge graph triplestore
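Because the TTE files reference one another, a script of this kind has to index every entity before it can resolve links. The following is a minimal sketch of that two-pass approach, not the bespoke NLB script: the CSV column names (`id`, `name`, `birth_place_id`), the base URI and the sample rows are all hypothetical, and output is written as simple N-Triples lines.

```python
import csv
import io

SDO = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/tte/"  # hypothetical URI base

# Made-up sample exports; the real column layout is not known.
PEOPLE_CSV = "id,name,birth_place_id\np1,Example Person,pl1\n"
PLACES_CSV = "id,name\npl1,Example Place\n"


def read_rows(text, schema_type):
    """Parse one collection file and tag each row with its Schema.org type."""
    return [dict(row, schema_type=schema_type)
            for row in csv.DictReader(io.StringIO(text))]


def build_triples(rows):
    # Pass 1: index every entity across all files so references can be resolved.
    uri_by_id = {row["id"]: BASE + row["id"] for row in rows}
    triples = []
    # Pass 2: emit descriptions, linking by URI where the target is known.
    for row in rows:
        subject = uri_by_id[row["id"]]
        triples.append((subject, RDF_TYPE, SDO + row["schema_type"]))
        triples.append((subject, SDO + "name", row["name"]))
        target = uri_by_id.get(row.get("birth_place_id"))
        if target:
            triples.append((subject, SDO + "birthPlace", target))
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples; objects under BASE or SDO are URIs."""
    lines = []
    for s, p, o in triples:
        if o.startswith(("https://", "http://")):
            obj = f"<{o}>"
        else:
            escaped = (o.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n"))
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


rows = read_rows(PEOPLE_CSV, "Person") + read_rows(PLACES_CSV, "Place")
triples = build_triples(rows)
output = to_ntriples(triples)
```

Reading all collection files before emitting anything is what lets a person row point at a place defined in a different file.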
  • 38. 39 Data Ingestion Pipelines – NAS & CMS • Source data in DC-XML files • Step 1: DC-XML to DC-Terms RDF – Bespoke XSLT script to translate from DC record to DCT-RDF • Step 2: TTE lookup – Look up identified values against TTE entities in Knowledge graph • Use URI if matched • Step 3: Schema.org entity creation – SPARQL script creates Schema.org representation of DCT entities • Output individual RDF format files for loading into Knowledge graph
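Step 2 (TTE lookup) amounts to replacing a text value with an entity URI when a match exists and keeping the text otherwise. A minimal sketch, assuming an in-memory label index and exact matching after simple normalisation; the real lookup runs against the Knowledge graph and its matching rules are not described in the slides. The URI shown is hypothetical.

```python
def normalise(label):
    """Case-fold and collapse whitespace so trivial variations still match."""
    return " ".join(label.casefold().split())


def build_index(tte_entities):
    """Map normalised label -> URI. tte_entities is an iterable of (uri, label)."""
    return {normalise(label): uri for uri, label in tte_entities}


def resolve(value, index):
    """Return ('uri', uri) if the value matches a TTE entity, else ('literal', value)."""
    uri = index.get(normalise(value))
    return ("uri", uri) if uri else ("literal", value)


# Hypothetical TTE entry.
index = build_index([("https://example.org/tte/org1", "Singapore Art Museum")])
matched = resolve("  singapore ART museum ", index)
unmatched = resolve("Unknown Gallery", index)
```

Returning the original literal for unmatched values means no source information is lost when the lookup fails.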
  • 46. 47 Technical Architecture (simplified) • Hosted on Amazon Web Services • Source data import – Batch scripts, import control, etc. • Pipeline processing • GraphDB Cluster • DI – Public access • DMI – Curator access
  • 47. 48 A need for entity reconciliation ….. • Lots (and lots and lots) of source entities – 70.4 million entities • Lots of duplication – Lee, Kuan Yew – 1st Prime Minister of Singapore • 160 individual entities in ILS source data – Singapore Art Museum • Entities from source data • 21 CMS, 1 NAS, 66 ILS, 1 TTE • Users only want 1 of each!
  • 51. 52 70.4 (million) into 6.1 (million) entities does go! • We know that now - but how did we get there? • Requirements hurdles to be cleared: – Not a single source of truth – Regular automatic updates – add / update / delete – Manual management of combined entities • Suppression of incorrect or ‘private’ attributes from display • Addition of attributes not in source data • Creating / breaking relationships between entities • Near real-time updates
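The "add / update / delete" requirement can be illustrated with a small sketch of applying a delta to a store of source entities. Everything here is an assumption made for illustration: the delta format `(operation, uri, attributes)`, the dictionary store and the representation of aggregations as sets of source URIs are hypothetical and are not taken from the LDMS.

```python
def apply_delta(store, aggregations, delta):
    """Apply add/update/delete operations to source entities.

    store:        dict of source URI -> attributes dict
    aggregations: list of sets of source URIs describing the same thing
    delta:        list of (op, uri, attributes), op in {'add', 'update', 'delete'}
    """
    for op, uri, attributes in delta:
        if op in ("add", "update"):
            store[uri] = attributes
        elif op == "delete":
            store.pop(uri, None)
            # A deleted source entity must also leave any aggregation it was in.
            for members in aggregations:
                members.discard(uri)
        else:
            raise ValueError(f"unknown operation: {op}")
    # Drop aggregations that no longer have any members.
    aggregations[:] = [members for members in aggregations if members]


# Made-up sample data.
store = {"ils:1": {"name": ["Old name"]}, "ils:2": {"name": ["Duplicate"]}}
aggregations = [{"ils:1", "ils:2"}]
apply_delta(store, aggregations, [
    ("update", "ils:1", {"name": ["New name"]}),
    ("delete", "ils:2", None),
    ("add", "ils:3", {"name": ["Another entity"]}),
])
```

In this sketch a newly added entity is not yet in any aggregation; it would still need to go through matching before being reconciled.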
  • 56. 57 Adaptive Data Model Concepts • Source entities – Individual representation of source data • Aggregation entities – Tracking relationships between source entities for the same thing – No copying of attributes • Primary Entities – Searchable by users – Displayable to users – Consolidation of aggregated source data & managed attributes
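The three concepts can be sketched as simple Python data classes. This is an illustrative model only, assuming attributes are held as lists of values per property and that the consolidated view is computed on demand; the class names, field names and sample URIs are hypothetical, and the actual system stores all of this as RDF in the Knowledge graph.

```python
from dataclasses import dataclass, field


@dataclass
class SourceEntity:
    uri: str
    source: str          # e.g. "ILS", "CMS", "NAS", "TTE"
    attributes: dict     # property -> list of values


@dataclass
class Aggregation:
    # Only tracks which source entities describe the same thing;
    # it holds references, never copies of attributes.
    members: list = field(default_factory=list)


@dataclass
class PrimaryEntity:
    uri: str
    aggregation: Aggregation
    managed: dict = field(default_factory=dict)     # curator-added attributes
    suppressed: set = field(default_factory=set)    # (property, value) hidden from display

    def consolidated(self, store):
        """Merge member attributes, minus suppressed ones, plus managed ones."""
        view = {}

        def add(prop, value):
            values = view.setdefault(prop, [])
            if value not in values:
                values.append(value)

        for member_uri in self.aggregation.members:
            for prop, values in store[member_uri].attributes.items():
                for value in values:
                    if (prop, value) not in self.suppressed:
                        add(prop, value)
        for prop, values in self.managed.items():
            for value in values:
                add(prop, value)
        return view


# Made-up sample data.
store = {
    "ils:1": SourceEntity("ils:1", "ILS", {"name": ["Lee, Kuan Yew"]}),
    "tte:1": SourceEntity("tte:1", "TTE", {"name": ["Lee Kuan Yew"]}),
    "cms:1": SourceEntity("cms:1", "CMS", {"name": ["Lee Kuan Yew"],
                                            "note": ["internal note"]}),
}
primary = PrimaryEntity(
    uri="https://example.org/entity/1",
    aggregation=Aggregation(members=["ils:1", "tte:1", "cms:1"]),
    managed={"sameAs": ["https://example.org/external/1"]},
    suppressed={("note", "internal note")},
)
view = primary.consolidated(store)
```

Since the aggregation only holds references, a change to a source entity shows up in the consolidated view without any copying step, while curator decisions (managed and suppressed attributes) live on the primary entity and survive source updates.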
  • 68. 69 Entity Matching • Candidate matches based on schema:name values – Lucene index matching – Refined by Levenshtein similarity – Same entity type – Entity type specific rules, e.g.: • Work: name + creator / contributor / author / sameAs • Person: name + birthDate / deathDate / sameAs
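The refinement stages can be sketched as follows. This is a minimal illustration, not the LDMS implementation: the Lucene candidate search is omitted, the threshold value is an assumption, and the type rules are read here as "similar name plus at least one shared value among the listed properties", which is one plausible interpretation of the slide. The sample records are invented.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise edit distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


NAME_THRESHOLD = 0.85  # assumed value; the real tuning is not published

# Supporting properties per entity type, as listed on the slide.
RULES = {
    "Work": ("creator", "contributor", "author", "sameAs"),
    "Person": ("birthDate", "deathDate", "sameAs"),
}


def shares_value(a, b, keys):
    """True if the two records share at least one value for any listed property."""
    return any(set(a.get(k, [])) & set(b.get(k, [])) for k in keys)


def is_match(a, b):
    if a["type"] != b["type"]:
        return False
    if similarity(a["name"].casefold(), b["name"].casefold()) < NAME_THRESHOLD:
        return False
    keys = RULES.get(a["type"])
    # Assumption: types without a specific rule match on name alone.
    return shares_value(a, b, keys) if keys else True


p1 = {"type": "Person", "name": "Tan Mei Ling", "birthDate": ["1950-01-01"]}
p2 = {"type": "Person", "name": "Tan Mei Ling.", "birthDate": ["1950-01-01"]}
p3 = {"type": "Person", "name": "Tan Mei Ling", "birthDate": ["1972-06-30"]}
```

Here `is_match(p1, p2)` succeeds because the names are near-identical and the birth dates agree, while `is_match(p1, p3)` fails despite identical names because no supporting property is shared.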
  • 79. 80 Data Management Interface • Rapidly developed – easily modified • Using metaphactory’s semantic visual interface • A collaboration environment for data curators • Pre-publish entity management • Search & discovery – emulates DI • Dashboard view • Entity management / Creation
  • 110. 111 LDMS System Live – December 2022 • Live – data curation team using DMI – Entity curation – Manually merging & splitting aggregations • Live – data ingest pipelines – Daily / Weekly / Monthly – Delta update exports from source systems • Live – Discovery Interface ready for deployment • Live – Support – Issue resolution & updates
  • 114. 115 Discovery Interface • Live and ready for deployment – Performance, load and user tested • Designed & developed by NLB Data Team to demonstrate characteristics and benefits of a Linked Data service • NLB services rolled-out by National Library & Public Library divisions – Recently identified a service for Linked Data integration • Currently undergoing fine tuning to meet service’s UI requirements
  • 117. 118 Daily Delta Ingestion & Reconciliation
  • 118. 119 Some Challenges Addressed on the Way • Inaccurate source data – Not apparent in source systems – Very obvious when aggregated in LDMS • E.g. Subject and Person URIs duplicated – Identified and analysed in DMI – Reported to source curators and fixed – Updated in next delta update – Data issues identified in LDMS – source data quality enhanced
  • 120. 121 Some Challenges Addressed on the Way • Asian name matching – Often consisting of several short 3-4 letter names – Lucene matches not of much help – Introduction of Levenshtein similarity matching helped – Tuning is challenging – trading off European & Asian names
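A small worked example shows why the tuning trade-off arises. It assumes similarity is computed as 1 minus edit distance divided by the longer length, which is one common normalisation and not necessarily the one used in the LDMS; the names are invented. A single changed letter in a three-letter name part is a large relative change, yet it barely moves the whole-string score.

```python
def levenshtein(a, b):
    """Edit distance via the standard two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def part_similarities(a, b):
    """Score each aligned name part (assumes both names have the same number of parts)."""
    return [similarity(x, y) for x, y in zip(a.split(), b.split())]


# Two different family names that differ by one letter.
short_whole = similarity("Lim Ah Tan", "Lin Ah Tan")
short_parts = part_similarities("Lim Ah Tan", "Lin Ah Tan")

# A plausible spelling variant of one longer surname.
long_whole = similarity("Christopher Andersen", "Christopher Anderson")
long_parts = part_similarities("Christopher Andersen", "Christopher Anderson")
```

On whole strings both pairs score 0.9 or above (0.90 and 0.95), so a single threshold cannot tell a distinct family name from a spelling variant. Per part, the differing short name scores about 0.67 and the differing long surname 0.875, which is where a threshold has room to separate them, at the cost of becoming stricter on genuine typos in short names.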
  • 122. 137 From Ambition to Go Live – A Summary • Ambition based on previous years of trials, experimentation, and understanding of Linked Data potential • Ambition to deliver a production LDMS to provide Linked Data curation and management capable of delivering a public discovery service • With experienced commercial partners & tools as part of a globally distributed team – Benefiting from Open Source community developments where appropriate • Unique and challenging requirements – Distributed sources of truth in disparate data formats – Auto updating, consolidated entity view – plus individual entity management – Web friendly – providing structured data for search engines • Live and operational actively delivering benefits from December 2022
  • 127. From Ambition to Go Live The National Library Board of Singapore’s Journey to an operational Linked Data Management and Discovery System Richard Wallis Evangelist and Founder Data Liberate richard.wallis@dataliberate.com @dataliberate 15th Semantic Web in Libraries Conference SWIB23 - 12th September 2023 - Berlin