Linked Data on a Budget
So…how does anyone do this stuff, for real?
David Newbury
Assistant Director, Software and User Experience, Getty Digital
MW 2023, Washington DC
Introduction
Hi! I’m David.
I lead the software and user experience teams at Getty.
Getty is a big museum/research hub in Los Angeles. We do lots of things with data.
All of the actual work here was done by my fabulously talented team. I just talk.
Part 1: Linked Data is amazing!
● Linked Data is another name for the Semantic Web, a good idea by Tim Berners-Lee, whose previous good idea turned out to be very good.

What is Linked Data: The Standard Story
There are three main concepts in Linked Data:
1. Data is represented as a graph.
2. Meaning is determined by ontologies.
3. IDs are dereferenceable URLs.
What is Linked Data: Data as a Graph
A graph is a way to represent data.
Think of a fact. Think of another fact. And another.
David → Favorite Drink → Coffee
John → Favorite Drink → Beer
Betsy → Favorite Drink → Chai
You could imagine these as a table of data…
…and add other information about the people involved.
This does get duplicative, though, if you want to add information about a different column: the State belongs to the Hometown, so “Pittsburgh, PA” has to be repeated for every person from Pittsburgh.

Person   Fav Drink   Hometown     State
David    Coffee      Pittsburgh   PA
John     Beer        Boston       MA
Betsy    Chai        Pittsburgh   PA
You can solve this with a relational database…

Person   Fav Drink   Place ID
David    Coffee      1
John     Beer        2
Betsy    Chai        1

Place ID   Hometown     State
1          Pittsburgh   PA
2          Boston       MA
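As a minimal sketch of the relational approach, here are the Person and Place tables in an in-memory SQLite database, using Python's standard `sqlite3` module. The table and column names are illustrative assumptions. The State for Pittsburgh is stored once, and a join reassembles each person's full row.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding normalized Person and Place tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE place (
            place_id INTEGER PRIMARY KEY,
            hometown TEXT NOT NULL,
            state    TEXT NOT NULL
        );
        CREATE TABLE person (
            name      TEXT PRIMARY KEY,
            fav_drink TEXT NOT NULL,
            place_id  INTEGER NOT NULL REFERENCES place(place_id)
        );
        INSERT INTO place VALUES (1, 'Pittsburgh', 'PA'), (2, 'Boston', 'MA');
        INSERT INTO person VALUES
            ('David', 'Coffee', 1),
            ('John',  'Beer',   2),
            ('Betsy', 'Chai',   1);
        """
    )
    return conn


def person_with_place(conn: sqlite3.Connection, name: str):
    """Reassemble one person's full row by joining Person to Place on place_id."""
    return conn.execute(
        """
        SELECT p.name, p.fav_drink, pl.hometown, pl.state
        FROM person AS p JOIN place AS pl ON p.place_id = pl.place_id
        WHERE p.name = ?
        """,
        (name,),
    ).fetchone()


conn = build_db()
print(person_with_place(conn, "Betsy"))  # ('Betsy', 'Chai', 'Pittsburgh', 'PA')
```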
…or with a graph.
David → Fav Drink → Coffee
David → Hometown → Pittsburgh
Betsy → Fav Drink → Chai
Betsy → Hometown → Pittsburgh
John → Fav Drink → Beer
John → Hometown → Boston
Pittsburgh → State → PA
Boston → State → MA
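The same facts can be sketched as a graph in plain Python: a set of (subject, predicate, object) triples. The predicate names (`fav_drink`, `hometown`, `state`) are hypothetical stand-ins for the slide's labels; a real Linked Data system would use URLs and a triplestore. Answering “which state is David’s hometown in?” means following two links.

```python
from typing import Optional

Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("David", "fav_drink", "Coffee"),
    ("John", "fav_drink", "Beer"),
    ("Betsy", "fav_drink", "Chai"),
    ("David", "hometown", "Pittsburgh"),
    ("John", "hometown", "Boston"),
    ("Betsy", "hometown", "Pittsburgh"),
    # Each place's state is recorded exactly once.
    ("Pittsburgh", "state", "PA"),
    ("Boston", "state", "MA"),
}


def objects(g: set[Triple], subject: str, predicate: str) -> set[str]:
    """All objects linked from `subject` by `predicate`."""
    return {o for s, p, o in g if s == subject and p == predicate}


def home_state(g: set[Triple], person: str) -> Optional[str]:
    """Follow two hops: person -> hometown -> state."""
    for town in objects(g, person, "hometown"):
        for state in objects(g, town, "state"):
            return state
    return None


print(home_state(graph, "David"))  # PA
```

Adding a new kind of link is just adding another triple; no table definition has to change.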
Tables are great for lots of data about “a thing”, with a limited number of kinds of things and consistent links between them.
Graphs are great when the number of kinds of things, and the number of links between them, is high and inconsistent.

What is Linked Data: Data as an Ontology
Another problem is meaning: words are great, but they require a shared understanding of what’s being described.
David → State → PA
David → State → Solid
David → State → Confused
Linked Data uses ontologies to include, as data, the context and definitions of the terms used to describe how things are connected:
David → State → PA, where State is defined as “Geographical region within a country”
David → State → Solid, where State is defined as “Distinct form of matter”
David → State → Confused, where State is defined as “Emotional or mental condition”

It also assumes that each of these concepts is represented by a unique identifier, which lets people (and computers) be unambiguous:
geo_state (label “State”): Geographical region within a country
matter_state (label “State”): Distinct form of matter
mental_state (label “State”): Emotional or mental condition

By making these identifiers into URLs, they can be made globally unique, and they can also carry with them the identity of the concept’s creator:
getty.edu/geo_state
getty.edu/matter_state
getty.edu/mental_state
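A small sketch of the disambiguation idea: the three “State” facts share a label but point at three different identifiers, each carrying its own definition as data. The `https://getty.edu/...` URLs mirror the slide's illustration and are hypothetical, not published Getty identifiers.

```python
# Hypothetical vocabulary: three distinct concepts that share the label "State".
VOCAB = {
    "https://getty.edu/geo_state": {
        "label": "State",
        "definition": "Geographical region within a country",
    },
    "https://getty.edu/matter_state": {
        "label": "State",
        "definition": "Distinct form of matter",
    },
    "https://getty.edu/mental_state": {
        "label": "State",
        "definition": "Emotional or mental condition",
    },
}

# The predicate is an identifier, not a word, so each fact is unambiguous.
facts = [
    ("David", "https://getty.edu/geo_state", "PA"),
    ("David", "https://getty.edu/matter_state", "Solid"),
    ("David", "https://getty.edu/mental_state", "Confused"),
]


def explain(fact: tuple[str, str, str]) -> str:
    """Render a fact using the predicate's definition, not just its label."""
    subject, predicate, obj = fact
    term = VOCAB[predicate]
    return f"{subject} {term['label']} ({term['definition']}): {obj}"


for fact in facts:
    print(explain(fact))
```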
What is Linked Data: Dereferenceable Data
And it also means that if you dereference that URL, you can provide access to the data!
getty.edu/matter_state → defined as → “Distinct form of matter”
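Dereferencing usually means an HTTP GET on the identifier itself, often with content negotiation asking for a machine-readable format such as JSON-LD. A minimal sketch with Python's standard `urllib.request`; whether a given server honors `Accept: application/ld+json` is an assumption you would need to check, and the example URL is the slide's hypothetical one.

```python
import json
import urllib.request

JSON_LD = "application/ld+json"


def build_request(uri: str) -> urllib.request.Request:
    """Build a GET request for `uri` that asks for JSON-LD via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": JSON_LD})


def dereference(uri: str, timeout: float = 10.0) -> dict:
    """Fetch `uri` over the network and parse the response body as JSON."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


# Usage (makes a real network request; the URL is hypothetical):
# doc = dereference("https://getty.edu/matter_state")
```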
It also means that the information can come from outside of our own ecosystem:
getty.edu/matter_state → defined as → “Distinct form of matter”
getty.edu/matter_state → Spanish label → “materia”
getty.edu/matter_state → same as → wikidata.org/Q35758
Part 1: Summary
Linked Data is Amazing!
But…

Part 2: Linked Data is annoying.
Annoyances: Performance
Relational databases are optimized for performance and data locality. If you keep all the information about a person in one place, it’s very fast to pull it back.

Annoyances: Concept Boundaries
It’s also easy to understand “What is a person?” from the perspective of the application: it’s the information in the “Person” table.

Annoyances: Metadata
It also makes it easy to include metadata about the record:

Person   Fav Drink   Place ID   Updated
David    Coffee      1          2022-01-05
John     Beer        2          1970-01-01
Betsy    Chai        1          2023-04-01

Annoyances: Record Boundaries
This idea of a “record” is a construct: remember, these are just facts, organized into a table.
But we’re trained to think about data as collections of grouped facts, relevant within a specific context.
Annoyances: Graph Structure
Graphs don’t provide clear boundaries in the same way: they don’t have the concept of a record. Each triple is a stand-alone record, and collecting all the information you want often requires many hops across the graph.
David → Fav Drink → Coffee
David → Hometown → Pittsburgh
Betsy → Fav Drink → Chai
Betsy → Hometown → Pittsburgh
Pittsburgh → State → PA
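A sketch of what “many hops” means in practice: since the graph has no record boundary, an application has to choose one, here by collecting every triple reachable from a starting node within a fixed number of outgoing hops. The hop limit and predicate names are assumptions for illustration; a real store would use an index rather than scanning every triple.

```python
from collections import deque

Triple = tuple[str, str, str]

GRAPH: set[Triple] = {
    ("David", "fav_drink", "Coffee"),
    ("David", "hometown", "Pittsburgh"),
    ("Betsy", "fav_drink", "Chai"),
    ("Betsy", "hometown", "Pittsburgh"),
    ("Pittsburgh", "state", "PA"),
}


def collect_record(g: set[Triple], start: str, max_hops: int) -> set[Triple]:
    """Gather every triple reachable from `start` within `max_hops` outgoing links.

    The graph itself has no record boundary; `max_hops` is a boundary we impose.
    Only outgoing links are followed, so Betsy's facts never join David's "record".
    """
    found: set[Triple] = set()
    seen = {start}
    queue = deque([(start, 0)])  # breadth-first: (node, hops from start)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for s, p, o in g:
            if s == node:
                found.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append((o, depth + 1))
    return found


print(sorted(collect_record(GRAPH, "David", 2)))
```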
Annoyances: Queries
Graphs are optimized for querying: defining a query-specific context that includes a set of facts based on novel criteria of interest, and returning that subset of information.

“What objects does Getty have that have images larger than 1200px on the longest side, that have been exhibited in both New York and Paris, and that were created by artists who lived before 1850?”
is just as easy (or rather, just as absurdly difficult) to ask as
“What is the tombstone data about Irises?”

Doing so moves the burden of defining the relevant context to the user of the data, not the creator of the data.
This is great for research, but not so great for ease of use.

We have never asked that first question…
but we ask “What is the tombstone data about Irises?” several thousand times a day.
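To make the trade-off concrete, here is a toy basic-graph-pattern matcher, the core idea behind SPARQL `WHERE` clauses, in plain Python. It is a sketch, not a SPARQL engine: variables are strings starting with `?`, and the predicate names are hypothetical. A novel question is just a new list of patterns, but even an everyday question needs the caller to spell out every hop.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]

GRAPH: set[Triple] = {
    ("David", "fav_drink", "Coffee"),
    ("John", "fav_drink", "Beer"),
    ("Betsy", "fav_drink", "Chai"),
    ("David", "hometown", "Pittsburgh"),
    ("John", "hometown", "Boston"),
    ("Betsy", "hometown", "Pittsburgh"),
    ("Pittsburgh", "state", "PA"),
    ("Boston", "state", "MA"),
}


def is_var(term: str) -> bool:
    """Pattern terms starting with '?' are variables; others must match exactly."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # A variable already bound to a different value is a conflict.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(graph: set[Triple], patterns: list[Triple]) -> list[Binding]:
    """Return every binding that satisfies all patterns (a join across triples)."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        next_bindings: list[Binding] = []
        for binding in bindings:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    next_bindings.append(result)
        bindings = next_bindings
    return bindings


# "What do people whose hometown is in PA like to drink?"
rows = query(GRAPH, [
    ("?person", "hometown", "?town"),
    ("?town", "state", "PA"),
    ("?person", "fav_drink", "?drink"),
])
for row in sorted(rows, key=lambda r: r["?person"]):
    print(row["?person"], row["?drink"])  # Betsy Chai, then David Coffee
```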
Dereferenceability could solve this… but it requires network requests.
So many requests.
…when do you stop?
…and can you rely on other systems?

David → Fav Drink → Coffee
David → Hometown → Pittsburgh
Pittsburgh → geo_state → PA
geo_state → label → “State”
geo_state → defined as → “Geographical region within a country”
geo_state → same as → wikidata.org/Q106458883
wikidata.org/Q106458883 → Spanish label → “división administrativa de primer nivel en varios países”
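The “when do you stop?” and “can you rely on other systems?” questions can be made explicit as crawl rules. A minimal sketch, assuming a caller-supplied `fetch` function (here a fake, offline stand-in) that returns the URIs a document links to. It bounds the hop count, follows only hosts you choose to trust, fetches each URI once, and treats a failed lookup as empty rather than fatal. The URIs are hypothetical, apart from the Wikidata item shown on the slide.

```python
from typing import Callable, Iterable
from urllib.parse import urlparse

Fetch = Callable[[str], Iterable[str]]


def crawl(start: str, fetch: Fetch, max_depth: int, allowed_hosts: set[str]) -> dict[str, list[str]]:
    """Dereference `start` and what it links to, with explicit stopping rules.

    - max_depth bounds how many hops away from `start` we go.
    - allowed_hosts limits which systems we are willing to rely on.
    - the returned dict doubles as a cache, so each URI is fetched at most once.
    """
    cache: dict[str, list[str]] = {}
    frontier = [start]
    for _ in range(max_depth + 1):
        next_frontier: list[str] = []
        for uri in frontier:
            if uri in cache or urlparse(uri).hostname not in allowed_hosts:
                continue  # already fetched, or a system we chose not to rely on
            try:
                links = list(fetch(uri))
            except OSError:
                links = []  # the remote system failed; record that and move on
            cache[uri] = links
            next_frontier.extend(links)
        frontier = next_frontier
    return cache


# Offline stand-in for the network: each hypothetical URI maps to the URIs it links to.
LINKS = {
    "https://getty.edu/david": ["https://getty.edu/pittsburgh"],
    "https://getty.edu/pittsburgh": ["https://getty.edu/geo_state"],
    "https://getty.edu/geo_state": ["https://www.wikidata.org/entity/Q106458883"],
}


def fake_fetch(uri: str) -> list[str]:
    """Pretend to dereference `uri`; unknown URIs fail like a network error."""
    if uri not in LINKS:
        raise OSError(f"could not dereference {uri}")
    return LINKS[uri]


result = crawl("https://getty.edu/david", fake_fetch, max_depth=5, allowed_hosts={"getty.edu"})
print(sorted(result))
```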
Part 2: Summary
Linked Data is annoying.
None of these are theoretical concerns about Linked Data. They’re just practical concerns when you try to build something on top of it.

Part 3: Getty builds stuff on linked data.
Getty’s Linked Data: Getty Vocabularies
Getty has been doing Linked Data since 2014, starting with the Getty Vocabularies: a collection of concepts, people, and places deeply relevant to the study of art and architecture.

Getty’s Linked Data: Archival Records
Since then, we’ve moved most of our major systems to use Linked Data, including our archives… and our museum collection.

Getty’s Linked Data: APIs
We’ve also built a complex, powerful infrastructure to support doing this across our application landscape.
It’s been fun. We’ve learned a lot.
Getty’s Linked Data: What we learned
A hard-won lesson:
No application that we’ve built required Linked Data.
Which, if you think about it, makes sense. Each application has a specific, known context with clear record boundaries.

Why keep doing it?
The value is in the ecosystem: when we present information in multiple contexts.
It’s also in the community: allowing our data to be used beyond our organization’s boundaries.

Why should YOU do it?
Because what makes cultural data interesting is not contained within the walls of any one institution.
It’s shared across our entire, worldwide community. We should work together.
That’s the reason, not any particular data structure or ontology.
Part 4: So…what can YOU do?

Linking Data: The Six Levels
You don’t need to do what we’ve done.
Enabling connections across silos and organizations doesn’t mean that you need a triplestore with Linked.Art data provided via JSON-LD documents reconciled to ULAN and Wikidata, queryable via SPARQL and Elasticsearch, with cross-references via Web Annotations, associated with IIIF Manifests.
That would just be showing off.

You just need to make it easy for people to understand what you have done.
There are, in our experience, six levels of Linked Data that build on one another, but all provide value, both within an organization and across the community.
#1: Authority
Provide a consistent way to identify both entities and the institution providing information in your data.

Give everything an identifier.
Other people can’t talk about your data without a way to unambiguously refer to the record that you’re talking about.
URLs as IDs are great for this: they’re unique, and they let others know who produced the data.

Give everything a human-friendly identifier.
This is not friendly:
https://data.getty.edu/research/collections/object/97e8fd22-92a4-4831-aa63-33255c1aaefe
This is friendly:
https://data.getty.edu/archives/AK3098

Identifiers are for other PEOPLE to use.
Identifiers are most commonly used by machines, but most of the effort around identifiers is done by humans typing them.
Optimize for people, not for machines.
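One way (not necessarily Getty's) to mint short identifiers that survive being typed by people is Douglas Crockford's Base32 encoding, which leaves out the easily confused letters I, L, O, and U and forgives common typos when decoding. A minimal sketch, assuming IDs come from an integer sequence; the URL pattern in the usage comment is hypothetical.

```python
# Crockford's Base32 alphabet: digits plus letters, minus I, L, O, and U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode(n: int) -> str:
    """Encode a non-negative integer (e.g. a database sequence value) as a short ID."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, remainder = divmod(n, 32)
        digits.append(ALPHABET[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))


def decode(text: str) -> int:
    """Decode a typed ID, tolerating case, hyphens, and look-alike characters."""
    cleaned = text.strip().upper().replace("-", "")
    # People mistype O as 0 and I or L as 1; map them back.
    cleaned = cleaned.replace("O", "0").replace("I", "1").replace("L", "1")
    value = 0
    for ch in cleaned:
        index = ALPHABET.find(ch)
        if index < 0:
            raise ValueError(f"invalid character {ch!r} in {text!r}")
        value = value * 32 + index
    return value


print(encode(1234))  # 16J
# Hypothetical URL pattern: f"https://data.example.org/archives/{encode(1234)}"
```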
Identifiers Identify Documents.
You have the best sense of what “relevant context” might be. It’s wonderful to provide query capabilities, but you should determine what information is usually relevant for a given identifier.
Make easy things easy, and hard things possible.
#2: Reconciliation
Use authorities and thesauri to disambiguate between similar real-world entities.

Reference, even if you can’t link.
Give people a sense of how your data might be connected to others by adding a pointer to a shared, common point of reference.
The Getty Vocabularies are great for this. So is Wikidata. So is VIAF. It doesn’t matter; just give us a way to confirm that what we’re thinking is what you’re thinking.

Publish that reference.
It only works, though, if you let people KNOW.

If this is all you can do, you’ve done enough.
Almost all the value of Linked Data is present at this point. If you publish data, provide identifiers, and include links to others, you’ve done Linked Data.
Everything after this is extra credit.
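A naive reconciliation sketch: normalize a local label, look it up in an authority table, and publish the match alongside your own identifier. The authority table, the URIs, and the `same_as` key are hypothetical. Exact label matching cannot tell two places called Pittsburgh apart, which is why real reconciliation uses richer context (dates, places, types) and human review.

```python
import json
import unicodedata


def normalize(label: str) -> str:
    """Case-fold, strip accents, and collapse whitespace for loose matching."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


# Hypothetical authority file: normalized label -> shared authority URI.
AUTHORITY = {
    normalize("Pittsburgh"): "https://authority.example.org/place/1",
    normalize("Boston"): "https://authority.example.org/place/2",
}


def reconcile(local_id: str, label: str) -> dict:
    """Build a publishable record that points at a shared authority, if one matches."""
    record = {"id": local_id, "label": label}
    match = AUTHORITY.get(normalize(label))
    if match is not None:
        record["same_as"] = [match]
    return record


print(json.dumps(reconcile("https://museum.example.org/place/7", "  pittsburgh "), indent=2))
```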
#3: Bidirectional Linking
Establish and publish connections between systems or institutions.

Links go both ways.
From the artwork’s side, it’s valuable to know that it is mentioned in a book; from the book’s side, it’s just as valuable to know which artworks it mentions!

Sync is hard.
We’ve learned that trying to keep this in sync within systems is hard. Most of our applications are not designed to deal with information outside of their own sphere of control.
Instead, we maintain these references outside the systems of record, and look them up when needed for presentation.
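A sketch of keeping links outside the systems of record: a separate index stores each link once and answers lookups from either end, so neither the library system nor the collection system has to change. The class and URIs are hypothetical, and a production version would persist the index rather than hold it in memory.

```python
from collections import defaultdict


class LinkIndex:
    """Store links outside any system of record; look them up in both directions."""

    def __init__(self) -> None:
        self._outgoing: dict[str, set[tuple[str, str]]] = defaultdict(set)
        self._incoming: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str) -> None:
        """Record one link; both directions are updated together."""
        self._outgoing[source].add((relation, target))
        self._incoming[target].add((relation, source))

    def links_from(self, uri: str) -> set[tuple[str, str]]:
        """(relation, target) pairs for links that start at `uri`."""
        return set(self._outgoing.get(uri, ()))

    def links_to(self, uri: str) -> set[tuple[str, str]]:
        """(relation, source) pairs for links that point at `uri`."""
        return set(self._incoming.get(uri, ()))


index = LinkIndex()
book = "https://library.example.org/book/42"      # hypothetical
artwork = "https://museum.example.org/object/7"  # hypothetical
index.add(book, "mentions", artwork)
print(index.links_to(artwork))  # {('mentions', 'https://library.example.org/book/42')}
```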
Links are often surprising!
Publishing bidirectional crosswalks between linked things creates networks of information, and helps people discover unexpected relationships.

#4: Aggregation
Enhance discovery by providing search and access to information across collections.

This is where you start doing things for other people.
The previous levels are about what you do in your data, often for yourself, but now you’re doing things explicitly to help other people do things with your data.

The best place for data is where people are looking for it.
Often, that’s not with you.
Share your data with other people, and let them point back to you.

But: Change Discovery.
If other people are using your data, they’re going to cache it.
They don’t trust you.
I don’t trust you.
Please don’t trust my systems.

Cache our data: we’ll let you know if the data changes.
It doesn’t matter what the change is; just letting someone know to look for changes provides most of the value.
Recaching everything is hard, but pulling just the changed records is easy.
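A minimal sketch of the consumer side of change discovery, loosely modeled on the Activity Streams vocabulary that the IIIF Change Discovery API builds on: the publisher lists activities (`Create`, `Update`, `Delete`) with a timestamp, and a cache refetches only what changed since its last sync. The field names are simplified, this is not a conformant Change Discovery client, and the URIs are hypothetical.

```python
from datetime import datetime


def changed_since(activities: list[dict], last_sync: datetime) -> tuple[set[str], set[str]]:
    """Return (URIs to refetch, URIs to evict) for activities newer than `last_sync`.

    Activities are replayed in time order, so the latest change to a URI wins.
    Timestamps are ISO 8601 strings with explicit UTC offsets.
    """
    refetch: set[str] = set()
    evict: set[str] = set()
    ordered = sorted(activities, key=lambda a: datetime.fromisoformat(a["endTime"]))
    for activity in ordered:
        if datetime.fromisoformat(activity["endTime"]) <= last_sync:
            continue  # already reflected in our cache
        uri = activity["object"]["id"]
        if activity["type"] == "Delete":
            refetch.discard(uri)
            evict.add(uri)
        else:
            # Create or Update: we don't need to know what changed, only that it did.
            evict.discard(uri)
            refetch.add(uri)
    return refetch, evict


activities = [
    {"type": "Update", "object": {"id": "https://data.example.org/object/1"},
     "endTime": "2023-04-01T10:00:00+00:00"},
    {"type": "Create", "object": {"id": "https://data.example.org/object/2"},
     "endTime": "2023-04-02T09:30:00+00:00"},
    {"type": "Delete", "object": {"id": "https://data.example.org/object/3"},
     "endTime": "2023-03-15T12:00:00+00:00"},
]
last_sync = datetime.fromisoformat("2023-03-31T00:00:00+00:00")
refetch, evict = changed_since(activities, last_sync)
print(sorted(refetch), sorted(evict))
```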
#5: Interoperability
Develop interfaces that present information from many sources in a single way.

Data Standards matter now.
Up to this point in the process, I haven’t mentioned anything about linked.art, or CIDOC-CRM, or Schema.org, or SKOS-XL.
They don’t matter until you want to create an automatically interoperable application.

Data Standards have other value, of course.
Standards are great for consistency and ensuring quality, and for letting other people write the documentation.

Externally, they’re for robots.
The external value of standards means that I can write code that consumes your data without needing to talk to you, or even know you exist.
This is why Schema.org is so widely used: Google doesn’t know I exist, but they can still extract my event data and share it.
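As an illustration of standards “for robots”, this sketch emits Schema.org JSON-LD for an event, the kind of markup a search engine can extract without knowing who published it. `Event`, `Place`, `PostalAddress`, `startDate`, `location`, `address`, and `addressLocality` are real Schema.org terms; the event details are invented placeholders, and whether any particular search engine uses the markup is up to that engine.

```python
import json


def event_jsonld(name: str, start: str, venue: str, locality: str) -> str:
    """Serialize an event as Schema.org JSON-LD for embedding in a web page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start,  # ISO 8601 date or date-time
        "location": {
            "@type": "Place",
            "name": venue,
            "address": {
                "@type": "PostalAddress",
                "addressLocality": locality,
            },
        },
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Placeholder event details, not a real event.
snippet = event_jsonld("Example Gallery Talk", "2023-04-20T18:00", "Example Museum", "Los Angeles")
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```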
IIIF is our community’s shining example of this.
A standard widely enough used that there are multiple applications that can be used across the field to show other people’s data in yet other people’s applications.

Linked.art is just beginning to demonstrate this.
We’re on the precipice of having enough data at this level for it to be worth building applications for artwork. Stay tuned!

This is Level 5.
A reminder here: this is my penultimate level of Linked Data.
You don’t need to start here, and you don’t need to get here to provide value.

#6: Reuse
Allowing one institution to import information from another while maintaining the provenance of the data.

We haven’t gotten here.
The final goal here would be if I could use your data in my application, and have it still be your data.
This is the dream.
I’m still dreaming about this.
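The talk doesn't say how this would be built. One common technique for keeping provenance on imported data is to store quads rather than triples: each fact plus the named graph (source) it came from, as in RDF datasets and SPARQL `GRAPH` queries. A toy sketch with hypothetical URIs and placeholder data:

```python
from typing import Iterable, NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # the named graph: who asserted this fact


class QuadStore:
    """A toy store in which imported facts keep a pointer to where they came from."""

    def __init__(self) -> None:
        self.quads: set[Quad] = set()

    def import_triples(self, triples: Iterable[tuple[str, str, str]], source: str) -> None:
        """Copy another institution's facts in, tagging each one with its source."""
        for s, p, o in triples:
            self.quads.add(Quad(s, p, o, source))

    def describe(self, subject: str) -> list[Quad]:
        """Every fact about `subject`, each still attributed to its source."""
        return sorted(q for q in self.quads if q.subject == subject)

    def withdraw(self, source: str) -> None:
        """Drop everything one source told us, e.g. before re-importing it."""
        self.quads = {q for q in self.quads if q.source != source}


MUSEUM = "https://museum.example.org/"    # hypothetical
LIBRARY = "https://library.example.org/"  # hypothetical
ARTWORK = MUSEUM + "object/7"

store = QuadStore()
store.import_triples([(ARTWORK, "title", "Example title")], source=MUSEUM)
store.import_triples([(ARTWORK, "mentioned_in", LIBRARY + "book/42")], source=LIBRARY)
for quad in store.describe(ARTWORK):
    print(quad.predicate, quad.obj, "according to", quad.source)
```

Because every fact carries its source, the library's data stays the library's data: it can be displayed with attribution, refreshed, or withdrawn as a unit.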
We will get here.
An ecosystem of shared, reusable Linked Data will open potential beyond what we can do at any organization, even Getty.
But it can’t be done without others. Without you.

Start Small.
Each of these levels provides value.
Decide what you can do, and do that. It’s enough, and it helps us build the community.

Work Together, and complain!
The only way we’ll know what works (and, more importantly, what doesn’t) is if we hear from others that things don’t work!
Linked Data is not valuable outside of a community, and if it’s not working for the community, it’s not working.
We’re making mistakes. Let us know when, so we can learn, and so we can share.

Thank you! Complaints go here:
dnewbury@getty.edu

Weitere ähnliche Inhalte

Ähnlich wie Linked Data on a Budget

Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?John Girard
 
non-slides-Thatcamp
non-slides-Thatcampnon-slides-Thatcamp
non-slides-ThatcampTrevor Owens
 
Intro to Data Science
Intro to Data ScienceIntro to Data Science
Intro to Data ScienceTJ Stalcup
 
Structured Data in Web Search
Structured Data in Web SearchStructured Data in Web Search
Structured Data in Web SearcheXascale Infolab
 
Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...
Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...
Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...OCLC
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science TJ Stalcup
 
Brief Introduction to Linked Data
Brief Introduction to Linked DataBrief Introduction to Linked Data
Brief Introduction to Linked DataRobert Sanderson
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overviewChris Taggart
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data FundamentalsSmarak Das
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextMurad Daryousse
 
TED Wiley Visualizing .docx
TED  Wiley Visualizing .docxTED  Wiley Visualizing .docx
TED Wiley Visualizing .docxssuserf9c51d
 
Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)James Hendler
 
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...Rida Qayyum
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Carly Strasser
 
Big data introduction by quontra solutions
Big data introduction by quontra solutionsBig data introduction by quontra solutions
Big data introduction by quontra solutionsQUONTRASOLUTIONS
 
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...FIA2010
 

Ähnlich wie Linked Data on a Budget (20)

Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?Big Data: Friend, Phantom or Foe?
Big Data: Friend, Phantom or Foe?
 
non-slides-Thatcamp
non-slides-Thatcampnon-slides-Thatcamp
non-slides-Thatcamp
 
Intro to Data Science
Intro to Data ScienceIntro to Data Science
Intro to Data Science
 
Usabilidad y diseno
Usabilidad y disenoUsabilidad y diseno
Usabilidad y diseno
 
Structured Data in Web Search
Structured Data in Web SearchStructured Data in Web Search
Structured Data in Web Search
 
Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...
Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...
Breaking Out of the Walled Garden: Lessons Learned in Moving Library Linked D...
 
Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science Thinkful DC - Intro to Data Science
Thinkful DC - Intro to Data Science
 
data, big data, open data
data, big data, open datadata, big data, open data
data, big data, open data
 
Brief Introduction to Linked Data
Brief Introduction to Linked DataBrief Introduction to Linked Data
Brief Introduction to Linked Data
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overview
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Semantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data ContextSemantic Web Investigation within Big Data Context
Semantic Web Investigation within Big Data Context
 
TED Wiley Visualizing .docx
TED  Wiley Visualizing .docxTED  Wiley Visualizing .docx
TED Wiley Visualizing .docx
 
Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)Data Big and Broad (Oxford, 2012)
Data Big and Broad (Oxford, 2012)
 
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
A Roadmap Towards Big Data Opportunities, Emerging Issues and Hadoop as a Sol...
 
Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014Research Life Cycle for GeoData 2014
Research Life Cycle for GeoData 2014
 
Big Data
Big DataBig Data
Big Data
 
Big data introduction by quontra solutions
Big data introduction by quontra solutionsBig data introduction by quontra solutions
Big data introduction by quontra solutions
 
Intro to Data Science Concepts
Intro to Data Science ConceptsIntro to Data Science Concepts
Intro to Data Science Concepts
 
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
Linked Data and the Future Internet Architecture: A motivation: Stefan Decker...
 

Mehr von David Newbury

The LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked DataThe LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked DataDavid Newbury
 
USE ME: progressive integration of IIIF with new software services at the Getty
USE ME: progressive integration of IIIF with new software services at the GettyUSE ME: progressive integration of IIIF with new software services at the Getty
USE ME: progressive integration of IIIF with new software services at the GettyDavid Newbury
 
IIIF Across Platforms | IIIF Community Call, January 2021
IIIF Across Platforms | IIIF Community Call, January 2021IIIF Across Platforms | IIIF Community Call, January 2021
IIIF Across Platforms | IIIF Community Call, January 2021David Newbury
 
IIIF Canvases as First Class Citizens
IIIF Canvases as First Class CitizensIIIF Canvases as First Class Citizens
IIIF Canvases as First Class CitizensDavid Newbury
 
IIIF and Linked Open Data: LODLAM 2020
IIIF and Linked Open Data: LODLAM 2020IIIF and Linked Open Data: LODLAM 2020
IIIF and Linked Open Data: LODLAM 2020David Newbury
 
How to Fail Interdisciplinarily
How to Fail InterdisciplinarilyHow to Fail Interdisciplinarily
How to Fail InterdisciplinarilyDavid Newbury
 
Sharing Data Across Memory Institutions
Sharing Data Across Memory InstitutionsSharing Data Across Memory Institutions
Sharing Data Across Memory InstitutionsDavid Newbury
 
NDSR Learning Enrichment: Data Models and Linked Data
NDSR Learning Enrichment: Data Models and Linked DataNDSR Learning Enrichment: Data Models and Linked Data
NDSR Learning Enrichment: Data Models and Linked DataDavid Newbury
 
Fuzzy Dates & the Digital Humanities
Fuzzy Dates & the Digital HumanitiesFuzzy Dates & the Digital Humanities
Fuzzy Dates & the Digital HumanitiesDavid Newbury
 
Telling Stories with Data: Class Notes 2
Telling Stories with Data:  Class Notes 2Telling Stories with Data:  Class Notes 2
Telling Stories with Data: Class Notes 2David Newbury
 
Telling Stories With Data: Class 1
Telling Stories With Data: Class 1Telling Stories With Data: Class 1
Telling Stories With Data: Class 1David Newbury
 
21st Century Provenance: Lessons Learned Building Art Tracks
21st Century Provenance:  Lessons Learned Building Art Tracks21st Century Provenance:  Lessons Learned Building Art Tracks
21st Century Provenance: Lessons Learned Building Art TracksDavid Newbury
 
Art Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured DataArt Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured DataDavid Newbury
 
Linked Data: Worse is Better
Linked Data:  Worse is BetterLinked Data:  Worse is Better
Linked Data: Worse is BetterDavid Newbury
 
Art Tracks: A technical deep dive.
Art Tracks:  A technical deep dive.Art Tracks:  A technical deep dive.
Art Tracks: A technical deep dive.David Newbury
 
Using Linked Data: American Art Collaborative, Oct. 3, 2016
Using Linked Data:  American Art Collaborative, Oct. 3, 2016Using Linked Data:  American Art Collaborative, Oct. 3, 2016
Using Linked Data: American Art Collaborative, Oct. 3, 2016David Newbury
 
Data 101: Making Charts from Spreadsheets
Data 101: Making Charts from SpreadsheetsData 101: Making Charts from Spreadsheets
Data 101: Making Charts from SpreadsheetsDavid Newbury
 
IIIF For Small Projects
IIIF  For Small ProjectsIIIF  For Small Projects
IIIF For Small ProjectsDavid Newbury
 

Mehr von David Newbury (20)

The LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked DataThe LOD Gateway: Open Source Infrastructure for Linked Data
The LOD Gateway: Open Source Infrastructure for Linked Data
 
USE ME: progressive integration of IIIF with new software services at the Getty
USE ME: progressive integration of IIIF with new software services at the GettyUSE ME: progressive integration of IIIF with new software services at the Getty
USE ME: progressive integration of IIIF with new software services at the Getty
 
IIIF Across Platforms | IIIF Community Call, January 2021
IIIF Across Platforms | IIIF Community Call, January 2021IIIF Across Platforms | IIIF Community Call, January 2021
IIIF Across Platforms | IIIF Community Call, January 2021
 
IIIF Canvases as First Class Citizens
IIIF Canvases as First Class CitizensIIIF Canvases as First Class Citizens
IIIF Canvases as First Class Citizens
 
IIIF and Linked Open Data: LODLAM 2020
IIIF and Linked Open Data: LODLAM 2020IIIF and Linked Open Data: LODLAM 2020
IIIF and Linked Open Data: LODLAM 2020
 
How to Fail Interdisciplinarily
How to Fail InterdisciplinarilyHow to Fail Interdisciplinarily
How to Fail Interdisciplinarily
 
Sharing Data Across Memory Institutions
Sharing Data Across Memory InstitutionsSharing Data Across Memory Institutions
Sharing Data Across Memory Institutions
 
Extending IIIF 3.0
Extending IIIF 3.0Extending IIIF 3.0
Extending IIIF 3.0
 
NDSR Learning Enrichment: Data Models and Linked Data
NDSR Learning Enrichment: Data Models and Linked DataNDSR Learning Enrichment: Data Models and Linked Data
NDSR Learning Enrichment: Data Models and Linked Data
 
Fuzzy Dates & the Digital Humanities
Fuzzy Dates & the Digital HumanitiesFuzzy Dates & the Digital Humanities
Fuzzy Dates & the Digital Humanities
 
Telling Stories with Data: Class Notes 2
Telling Stories with Data:  Class Notes 2Telling Stories with Data:  Class Notes 2
Telling Stories with Data: Class Notes 2
 
Telling Stories With Data: Class 1
Telling Stories With Data: Class 1Telling Stories With Data: Class 1
Telling Stories With Data: Class 1
 
21st Century Provenance: Lessons Learned Building Art Tracks
21st Century Provenance:  Lessons Learned Building Art Tracks21st Century Provenance:  Lessons Learned Building Art Tracks
21st Century Provenance: Lessons Learned Building Art Tracks
 
Art Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured DataArt Tracks: From Provenance to Structured Data
Art Tracks: From Provenance to Structured Data
 
Linked Data: Worse is Better
Linked Data:  Worse is BetterLinked Data:  Worse is Better
Linked Data: Worse is Better
 
Understanding D3
Understanding D3Understanding D3
Understanding D3
 
Art Tracks: A technical deep dive.
Art Tracks:  A technical deep dive.Art Tracks:  A technical deep dive.
Art Tracks: A technical deep dive.
 
Using Linked Data: American Art Collaborative, Oct. 3, 2016
Using Linked Data:  American Art Collaborative, Oct. 3, 2016Using Linked Data:  American Art Collaborative, Oct. 3, 2016
Using Linked Data: American Art Collaborative, Oct. 3, 2016
 
Data 101: Making Charts from Spreadsheets
Data 101: Making Charts from SpreadsheetsData 101: Making Charts from Spreadsheets
Data 101: Making Charts from Spreadsheets
 
IIIF For Small Projects
IIIF  For Small ProjectsIIIF  For Small Projects
IIIF For Small Projects
 

Linked Data on a Budget

  • 1. Linked Data on a Budget: So…how does anyone do this stuff, for real? David Newbury, Assistant Director, Software and User Experience, Getty Digital. MW 2023, Washington DC.
  • 2. Introduction. Hi! I’m David. I lead the software and user experience teams at Getty. Getty is a big museum and research hub in Los Angeles. We do lots of things with data. All of the actual work here was done by my fabulously talented team. I just talk.
  • 3. Part 1: Linked Data is amazing!
  • 4. What is Linked Data: The Standard Story. Linked Data is another name for the Semantic Web, a good idea by Tim Berners-Lee, whose previous good idea turned out to be very good.
  • 5. There are three main concepts in Linked Data: 1. Data is represented as a graph. 2. Meaning is determined by ontologies. 3. IDs are dereferenceable URLs.
  • 6. What is Linked Data: Data as a Graph. A graph is a way to represent data. Think of a fact. (Diagram: David → Favorite Drink → Coffee.)
  • 7. A graph is a way to represent data. Think of a fact. Think of another fact. (Diagram: David → Favorite Drink → Coffee; John → Favorite Drink → Beer.)
  • 8. A graph is a way to represent data. Think of a fact. Think of another fact. And another. (Diagram: David → Favorite Drink → Coffee; John → Favorite Drink → Beer; Betsy → Favorite Drink → Chai.)
  • 9. You could imagine these as a table of data. (Table: David, John and Betsy, with a Fav Drink column: Coffee, Beer, Chai.)
  • 10. You could imagine these as a table of data… and add other information about the people involved. (Table adds a Hometown column: Pittsburgh, Boston, Pittsburgh.)
  • 11. This does get duplicative, though, if you want to add information about the values in another column, such as the state each hometown is in. (Table adds a State column: PA, MA, PA, so “Pittsburgh, PA” is repeated for every Pittsburgh row.)
  • 12. You can solve this with a relational database… (Tables: a people table with Fav Drink and a Place ID of 1, 2, 1; a places table where Place ID 1 is Pittsburgh, PA and Place ID 2 is Boston, MA.)
  • 13. You can solve this with a relational database… or with a graph. (Diagram: David → Hometown → Pittsburgh and Betsy → Hometown → Pittsburgh; John → Hometown → Boston; Pittsburgh → State → PA; Boston → State → MA; each person also has a Fav Drink.)
  • 14. Tables are great for lots of data about “a thing”, when there are a limited number of kinds of things with consistent links between them.
  • 15. Graphs are great when the number of kinds of things, and the number of links between them, is high and inconsistent.
  • 16. What is Linked Data: Data as an Ontology. Another problem is meaning: words are great, but they require a shared understanding of what’s being described. (Diagram: David → State → PA; David → State → Solid; David → State → Confused.)
  • 17. Linked Data uses ontologies to include, as data, the context and definitions of the terms that describe how things are connected. (Diagram: the same three facts, with “State” defined as “Geographical region within a country”, “Distinct form of matter”, and “Emotional or mental condition” respectively.)
  • 18. It also assumes that each of these concepts is represented by a unique identifier, which lets people (and computers) be unambiguous. (Diagram: the identifiers geo_state, matter_state and mental_state, each labeled “State” and carrying its own definition.)
  • 19. Turning these identifiers into URLs makes them globally unique, and lets them carry the identity of the concept’s creator. (Diagram: getty.edu/geo_state, getty.edu/matter_state and getty.edu/mental_state, each with its definition.)
  • 20. What is Linked Data: Dereferenceable Data. It also means that when someone dereferences that URL, you can give them the data! (Diagram: getty.edu/matter_state → defined as → Distinct form of matter.)
  • 21. It also means that the information can come from outside our own ecosystem. (Diagram: getty.edu/matter_state, defined as “Distinct form of matter”, with the Spanish label “materia” and “same as” wikidata.org/Q35758.)
  • 22. Part 1: Summary. Linked Data is amazing!
  • 23. Linked Data is amazing! But…
  • 24. Part 2: Linked Data is annoying.
  • 25. Annoyances: Performance. Relational databases are optimized for performance and data locality. If you keep all the information about a person in one place, it’s very fast to pull it back. (Tables: the people and places tables from slide 12.)
  • 26. Annoyances: Concept Boundaries. It’s also easy to understand “what is a person” from the perspective of the application: it’s the information in the “Person” table. (Table: the people table from slide 12.)
  • 27. Annoyances: Metadata. It also makes it easy to include metadata about the record. (Table: the people table with an Updated column: 2022-01-05, 1970-01-01, 2023-04-01.)
  • 28. Annoyances: Record Boundaries. This idea of a “record” is a construct: remember, these are just facts, organized into a table. But we’re trained to think about data as collections of grouped facts, relevant within a specific context. (Table: the same people table with its Updated column.)
  • 29. Annoyances: Graph Structure. Graphs don’t provide clear boundaries the same way; they don’t have the concept of a record. Each triple is a stand-alone record, and collecting all the information you want often requires many hops across the graph. (Diagram: the David and Betsy portion of the graph from slide 13.)
  • 30. Annoyances: Queries. Graphs are optimized for querying: defining a query-specific context that includes a set of facts based on novel criteria of interest, and returning that subset of information. (Diagram: the same portion of the graph.)
  • 31. “What objects does Getty have that have images larger than 1200px on the longest side that have been exhibited in both New York and Paris and were created by artists who lived before 1850?” is just as easy to ask as “What is the tombstone data about Irises?”
  • 32. “What objects does Getty have that have images larger than 1200px on the longest side that have been exhibited in both New York and Paris and were created by artists who lived before 1850?” is just as absurdly difficult to ask as “What is the tombstone data about Irises?”
  • 33. Doing so moves the burden of defining the relevant context to the user of the data, not the creator of the data. This is great for research, but not so great for ease of use.
  • 34. We have never asked “What objects does Getty have that have images larger than 1200px on the longest side that have been exhibited in both New York and Paris and were created by artists who lived before 1850?” …but we ask “What is the tombstone data about Irises?” several thousand times a day.
  • 35. Dereferenceability could solve this… but it requires network requests. (Diagram: David → Fav Drink → Coffee; David → Hometown → Pittsburgh.)
  • 36. Dereferenceability could solve this… but it requires network requests. (Diagram adds Pittsburgh → geo_state → PA.)
  • 37. Dereferenceability could solve this… but it requires network requests. So many requests. (Diagram adds the dereferenced geo_state: label “State”, defined as “Geographical region within a country”, same as wikidata.org/Q35758.)
  • 38. Dereferenceability could solve this… but it requires network requests. So many requests. …when do you stop? (Diagram: geo_state is now same as wikidata.org/Q106458883, which adds the Spanish label “división administrativa de primer nivel en varios países”.)
  • 39. Dereferenceability could solve this… but it requires network requests. So many requests. …when do you stop? …and can you rely on other systems? (Same diagram as slide 38.)
  • 40. Part 2: Summary. Linked Data is annoying. None of these are theoretical concerns about Linked Data. They’re just practical concerns when you try to build something on top of it.
  • 41. Part 3: Getty builds stuff on Linked Data.
  • 42. Getty’s Linked Data: Getty Vocabularies. Getty has been doing Linked Data since 2014, starting with the Getty Vocabularies, a collection of concepts, people, and places deeply relevant to the study of art and architecture.
  • 43. Getty’s Linked Data: Archival Records. Since then, we’ve moved most of our major systems to Linked Data, including our archives…
  • 44. Since then, we’ve moved most of our major systems to Linked Data, including our archives… …and our museum collection.
  • 45. Getty’s Linked Data: APIs. We’ve also built a complex, powerful infrastructure to support doing this across our application landscape. It’s been fun. We’ve learned a lot.
  • 46. Getty’s Linked Data: What we learned. A hard-won lesson: no application that we’ve built required Linked Data.
  • 47. A hard-won lesson: no application that we’ve built required Linked Data. Which, if you think about it, makes sense. Each application has a specific, known context with clear record boundaries.
  • 48. Why keep doing it? The value is in the ecosystem, when we present information in multiple contexts. It’s also in the community, allowing our data to be used beyond our organization’s boundaries.
  • 49. Why should YOU do it? Because what makes cultural data interesting is not contained within the walls of any one institution. It’s shared across our entire, worldwide community. We should work together. That’s the reason, not any particular data structure or ontology.
  • 50. Part 4: So…what can YOU do?
  • 51. Linking Data: The Six Levels. You don’t need to do what we’ve done. Enabling connections across silos and organizations doesn’t mean that you need a triplestore with Linked.Art data provided via JSON-LD documents reconciled to ULAN and Wikidata, queryable via SPARQL and ElasticSearch, with cross-references via Web Annotations, associated with IIIF Manifests.
  • 52. You don’t need to do what we’ve done. Enabling connections across silos and organizations doesn’t mean that you need a triplestore with Linked.Art data provided via JSON-LD documents reconciled to ULAN and Wikidata, queryable via SPARQL and ElasticSearch, with cross-references via Web Annotations, associated with IIIF Manifests. That would just be showing off.
  • 53. You just need to make it easy for people to understand what you have done. In our experience, there are six levels of Linked Data. They build on one another, but each one provides value, both within an organization and across the community.
  • 54. Level 1: Authority. Provide a consistent way to identify both the entities in your data and the institution providing the information.
  • 55. Give everything an identifier. Other people can’t talk about your data without a way to unambiguously refer to the record that you’re talking about. URLs as IDs are great for this: they’re unique, and they let others know who produced the data.
  • 56. Give everything a human-friendly identifier. This is not friendly: https://data.getty.edu/research/collections/object/97e8fd22-92a4-4831-aa63-33255c1aaefe
  • 57. Give everything a human-friendly identifier. This is friendly: https://data.getty.edu/archives/AK3098
  • 58. Identifiers are for other PEOPLE to use. Identifiers are most commonly used by machines, but most of the effort around identifiers is done by humans typing them. Optimize for people, not for machines.
  • 59. Identifiers identify documents. You have the best sense of what the “relevant context” might be. It’s wonderful to provide query capabilities, but you should decide what information is usually relevant for a given identifier. Make easy things easy, and hard things possible.
  • 60. Level 2: Reconciliation. Use authorities and thesauri to disambiguate between similar real-world entities.
  • 61. Reference, even if you can’t link. Give people a sense of how your data might be connected to other people’s by adding a pointer to a shared, common point of reference.
  • 62. Reference, even if you can’t link. The Getty Vocabularies are great for this. So is Wikidata. So is VIAF. It doesn’t matter which: just give us a way to confirm that what we’re thinking is what you’re thinking.
  • 63. Publish that reference. It only works, though, if you let people KNOW.
  • 64. If this is all you can do, you’ve done enough. Almost all the value of Linked Data is present at this point. If you publish data, provide identifiers, and include links to others, you’ve done Linked Data. Everything after this is extra credit.
  • 65. Level 3: Bidirectional Linking. Establish and publish connections between systems or institutions.
  • 66. Links go both ways. It’s valuable to know that a given artwork is mentioned in a book, but it’s just as valuable to know that a book mentions an artwork!
  • 67. Sync is hard. We’ve learned that trying to keep these links in sync within systems is hard. Most of our applications are not designed to deal with information outside their own sphere of control. Instead, we maintain these references outside the systems of record, and look them up when needed for presentation.
  • 68. Links are often surprising! Publishing bidirectional crosswalks between linked things creates networks of information, and helps people discover unexpected relationships.
  • 69. Level 4: Aggregation. Enhance discovery by providing search and access to information across collections.
  • 70. This is where you start doing things for other people. The previous levels are about what you do in your data, often for yourself, but now you’re doing things explicitly to help other people do things with your data.
  • 71. The best place for data is where people are looking for it. Often, that’s not with you. Share your data with other people, and let them point back to you.
  • 72. But: Change Discovery. If other people are using your data, they’re going to cache it. They don’t trust you.
  • 73. Change Discovery. If other people are using your data, they’re going to cache it. I don’t trust you.
  • 74. Change Discovery. If other people are using your data, they’re going to cache it. I don’t trust you. Please don’t trust my systems.
  • 75. Change Discovery. Cache our data: we’ll let you know if the data changes.
  • 76. Change Discovery. It doesn’t matter what the change is: just letting someone know to look for changes provides most of the value. Recaching everything is hard, but pulling just the changed records is easy.
  • 77. Level 5: Interoperability. Develop interfaces that present information from many sources in a single way.
  • 78. Data standards matter now. Up to this point in the process, I haven’t mentioned anything about linked.art, or CIDOC-CRM, or Schema.org, or SKOS-XL. They don’t matter until you want to create an automatically interoperable application.
  • 79. Data standards have other value, of course. Standards are great for consistency and ensuring quality, and for letting other people write the documentation.
  • 80. Externally, they’re for robots. The external value of standards is that I can write code that consumes your data without needing to talk to you, or even know you exist. This is why Schema.org is so widely used: Google doesn’t know I exist, but it can still extract my event data and share it.
  • 81. IIIF is our community’s shining example of this: a standard used widely enough that there are multiple applications across the field that show other people’s data in yet other people’s applications.
  • 82. Linked.art is just beginning to demonstrate this. We’re on the verge of having enough data at this level for it to be worth building applications for artwork. Stay tuned!
  • 83. This is Level 5. A reminder: this is my penultimate level of Linked Data. You don’t need to start here, and you don’t need to get here to provide value.
  • 84. Level 6: Reuse. Allow one institution to import information from another while maintaining the provenance of the data.
  • 85. We haven’t gotten here. The final goal would be if I could use your data in my application, and have it still be your data. This is the dream. I’m still dreaming about this.
  • 86. We will get here. An ecosystem of shared, reusable, linked data will open up possibilities beyond what any single organization can do, even Getty. But it can’t be done without others. Without you.
  • 87. Start small. Each of these levels provides value. Decide what you can do, and do that: it’s enough, and it helps us build the community.
  • 88. Work together, and complain! The only way we’ll know what works (and, more importantly, what doesn’t) is if we hear from others when things don’t work! Linked Data is not valuable outside of a community, and if it’s not working for the community, it’s not working. We’re making mistakes: let us know when, so we can learn, and we can share.
  • 89. Thank you! Complaints go here: dnewbury@getty.edu
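Slides 9 to 13 contrast a table with a graph. A minimal sketch in Python, assuming plain tuples as a stand-in for a real triplestore, stores the same facts as (subject, predicate, object) triples; the helper names `objects` and `home_state` are hypothetical. Answering "which state is Betsy from?" takes two hops across the graph, which is the cost slide 29 later points out.

```python
# Minimal sketch: the facts from slides 6-13 stored as (subject, predicate, object)
# triples. Plain Python tuples stand in for a real triplestore.
Triple = tuple[str, str, str]

facts: list[Triple] = [
    ("David", "Fav Drink", "Coffee"),
    ("John", "Fav Drink", "Beer"),
    ("Betsy", "Fav Drink", "Chai"),
    ("David", "Hometown", "Pittsburgh"),
    ("John", "Hometown", "Boston"),
    ("Betsy", "Hometown", "Pittsburgh"),
    # Each town's state is stated once, on the town itself, instead of being
    # repeated in every person's row (the duplication shown on slide 11).
    ("Pittsburgh", "State", "PA"),
    ("Boston", "State", "MA"),
]


def objects(subject: str, predicate: str, graph: list[Triple]) -> list[str]:
    """Return every object reachable from `subject` through `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def home_state(person: str, graph: list[Triple]) -> list[str]:
    """Two hops across the graph: person -> hometown -> state."""
    return [
        state
        for town in objects(person, "Hometown", graph)
        for state in objects(town, "State", graph)
    ]


print(home_state("Betsy", facts))  # ['PA']
```

Adding a new kind of fact is just one more tuple, with no new column to create, which is the flexibility slide 15 credits to graphs.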
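Slides 16 to 19 describe how identifiers and definitions disambiguate a shared word. The sketch below is a minimal illustration, not a real ontology. It keys each predicate by the example getty.edu URLs from the slides (which are not published vocabulary terms) and attaches each term's label and definition as data; `explain` is a hypothetical helper.

```python
# Sketch of slides 16-19: three predicates share the label "State" but are
# distinct terms, each identified by its own URL and defined as data.
# The getty.edu/... identifiers are the illustrative ones from the slides.
ontology = {
    "getty.edu/geo_state": {"label": "State", "definition": "Geographical region within a country"},
    "getty.edu/matter_state": {"label": "State", "definition": "Distinct form of matter"},
    "getty.edu/mental_state": {"label": "State", "definition": "Emotional or mental condition"},
}

facts = [
    ("David", "getty.edu/geo_state", "PA"),
    ("David", "getty.edu/matter_state", "Solid"),
    ("David", "getty.edu/mental_state", "Confused"),
]


def explain(triple: tuple[str, str, str]) -> str:
    """Render a triple using the label and definition attached to its predicate."""
    subject, predicate, obj = triple
    term = ontology[predicate]
    return f"{subject} {term['label']} ({term['definition']}): {obj}"


for fact in facts:
    print(explain(fact))

# By label alone, the three terms collapse into one word...
print(len({term["label"] for term in ontology.values()}))  # 1
# ...but their identifiers keep them distinct.
print(len(ontology))  # 3
```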
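Slides 35 to 39 raise two questions about dereferencing: when to stop, and whether to rely on other systems. One possible answer is a breadth-first crawl with a fixed hop budget and an allow-list of trusted URL prefixes. This sketch assumes an in-memory dictionary stands in for HTTP requests. The getty.edu/david and getty.edu/pittsburgh URLs and the `crawl` and `dereference` functions are hypothetical; the other identifiers come from the slides.

```python
from collections import deque

# Hypothetical stand-in for the network: each URL maps to the triples that
# dereferencing it would return. A real client would make an HTTP request in
# `dereference`. getty.edu/david and getty.edu/pittsburgh are made up.
WEB = {
    "getty.edu/david": [("getty.edu/david", "Hometown", "getty.edu/pittsburgh")],
    "getty.edu/pittsburgh": [("getty.edu/pittsburgh", "getty.edu/geo_state", "PA")],
    "getty.edu/geo_state": [
        ("getty.edu/geo_state", "label", "State"),
        ("getty.edu/geo_state", "same as", "wikidata.org/Q106458883"),
    ],
    "wikidata.org/Q106458883": [
        ("wikidata.org/Q106458883", "spanish label",
         "división administrativa de primer nivel en varios países"),
    ],
}


def dereference(url: str) -> list[tuple[str, str, str]]:
    """Simulated fetch: one call here would be one network request."""
    return WEB.get(url, [])


def crawl(start: str, max_hops: int, trusted: tuple[str, ...] = ("getty.edu/",)):
    """Breadth-first link following that stops after `max_hops` hops and never
    fetches URLs outside the `trusted` prefixes. Returns (triples, requests)."""
    seen = {start}
    queue = deque([(start, 0)])
    triples, requests = [], 0
    while queue:
        url, depth = queue.popleft()
        requests += 1
        for s, p, o in dereference(url):
            triples.append((s, p, o))
            if depth >= max_hops:
                continue  # "when do you stop?": a fixed hop budget
            for link in (p, o):
                # "can you rely on other systems?": only follow trusted prefixes
                if link not in seen and link.startswith(trusted):
                    seen.add(link)
                    queue.append((link, depth + 1))
    return triples, requests


triples, requests = crawl("getty.edu/david", max_hops=2)
print(len(triples), requests)  # 4 3
```

Raising `max_hops` to 3 still stops at three requests unless wikidata.org/ is added to `trusted`, at which point a fourth request follows the "same as" link.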
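Slides 61 to 64 argue that a published pointer to a shared authority provides most of the value. In this minimal sketch, every record and URL is a hypothetical placeholder, and authority.example stands in for something like ULAN, Wikidata or VIAF. It shows two institutions' records for one artist that a name comparison would not match, but whose shared reference does; `match_by_authority` is a hypothetical helper.

```python
# Sketch of slides 61-64: two institutions describe the same artist with
# different local URLs and name forms. Each record also publishes a pointer
# to a shared authority. Every URL and record here is a hypothetical placeholder.
museum_a = [
    {"id": "https://museum-a.example/person/17",
     "name": "Gogh, Vincent van",
     "same_as": "https://authority.example/artist/001"},
]
museum_b = [
    {"id": "https://museum-b.example/people/vvg",
     "name": "Vincent Van Gogh",
     "same_as": "https://authority.example/artist/001"},
    {"id": "https://museum-b.example/people/cm",
     "name": "Claude Monet",
     "same_as": "https://authority.example/artist/002"},
]


def match_by_authority(left: list[dict], right: list[dict]) -> list[tuple[str, str]]:
    """Pair records from two sources whose published `same_as` references agree."""
    index: dict[str, list[str]] = {}
    for record in right:
        index.setdefault(record["same_as"], []).append(record["id"])
    return [
        (record["id"], other_id)
        for record in left
        for other_id in index.get(record["same_as"], [])
    ]


# Comparing name strings finds nothing...
print([(a["id"], b["id"]) for a in museum_a for b in museum_b if a["name"] == b["name"]])  # []
# ...while the shared reference links the two records.
print(match_by_authority(museum_a, museum_b))
# [('https://museum-a.example/person/17', 'https://museum-b.example/people/vvg')]
```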
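Slide 67 describes keeping cross-references outside the systems of record and looking them up at presentation time. The hypothetical `LinkIndex` below is one minimal way to do that. Each link is written once and indexed in both directions, so what a book mentions and which books mention an artwork (slide 66) both come from the same store. The class and URLs are illustrative assumptions, not Getty's implementation.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical link store kept outside any system of record (slide 67).
    Each link is written once and can be looked up from either end."""

    def __init__(self) -> None:
        self._outgoing = defaultdict(set)  # source -> {(relation, target)}
        self._incoming = defaultdict(set)  # target -> {(relation, source)}

    def add(self, source: str, relation: str, target: str) -> None:
        self._outgoing[source].add((relation, target))
        self._incoming[target].add((relation, source))

    def links_from(self, uri: str) -> list[tuple[str, str]]:
        """Links where `uri` is the subject, e.g. what a book mentions."""
        return sorted(self._outgoing.get(uri, set()))

    def links_to(self, uri: str) -> list[tuple[str, str]]:
        """Links where `uri` is the object, e.g. which books mention an artwork."""
        return sorted(self._incoming.get(uri, set()))


book = "https://library.example/book/42"     # hypothetical
artwork = "https://museum.example/object/7"  # hypothetical

links = LinkIndex()
links.add(book, "mentions", artwork)
print(links.links_from(book))   # [('mentions', 'https://museum.example/object/7')]
print(links.links_to(artwork))  # [('mentions', 'https://library.example/book/42')]
```

Because neither the library system nor the collection system stores the link, neither has to be changed or kept in sync when a new cross-reference is published.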
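Slides 75 and 76 reduce change discovery to telling consumers which records to look at again. The sketch below assumes a simple in-memory, append-only log of changed record IDs. A consumer remembers its position in the log and re-fetches only what changed since then. The `Publisher` and `Aggregator` classes are hypothetical. Published protocols such as the IIIF Change Discovery API define a real format for this, and this sketch does not implement that API.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Hypothetical data provider that logs which records changed, in order."""
    records: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # record IDs, oldest first

    def update(self, record_id: str, data: dict) -> None:
        self.records[record_id] = data
        self.changes.append(record_id)

    def changes_since(self, position: int) -> tuple[list, int]:
        """Record IDs changed after `position`, plus the position to resume from."""
        return self.changes[position:], len(self.changes)


@dataclass
class Aggregator:
    """Hypothetical consumer that caches records and re-fetches only what changed."""
    cache: dict = field(default_factory=dict)
    position: int = 0
    fetches: int = 0

    def sync(self, publisher: Publisher) -> None:
        changed, self.position = publisher.changes_since(self.position)
        for record_id in dict.fromkeys(changed):  # drop repeats, keep order
            self.cache[record_id] = publisher.records[record_id]
            self.fetches += 1


source = Publisher()
for n in range(1000):
    source.update(f"object/{n}", {"title": f"Object {n}"})

mirror = Aggregator()
mirror.sync(source)  # first sync copies all 1000 records
source.update("object/5", {"title": "Object 5, revised"})
mirror.sync(source)  # second sync fetches only object/5
print(mirror.fetches)  # 1001
```

The log says only that a record changed, not how, yet that is enough to keep the consumer's work proportional to the number of changes rather than the size of the collection.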
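Slide 80 credits Schema.org's reach to robots being able to consume data without contacting the publisher. As an illustration only, the function below builds event markup as JSON-LD using the schema.org `Event` and `Place` types. The event details are hypothetical, `event_jsonld` is an assumed name, and this is not Getty's markup.

```python
import json


def event_jsonld(name: str, start_date: str, venue: str) -> dict:
    """Describe an event with schema.org terms (Event, Place) as JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date or date-time
        "location": {"@type": "Place", "name": venue},
    }


# Hypothetical event details.
doc = event_jsonld("Gallery talk", "2023-04-01T18:00", "Example Museum")

# Embedded in an HTML page, this is markup a crawler can read without
# ever contacting the publisher.
print('<script type="application/ld+json">')
print(json.dumps(doc, indent=2))
print("</script>")
```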