Linked Open Data and
Systematic Taxonomy
Joel Richard
Smithsonian Libraries
richardjm@si.edu
A tale of two publications
In three acts
Who are the Smithsonian Libraries?
• 20 Libraries in the U.S. and Panama
• Supports research of staff and the public
• Strong effort to digitize pre-1923 texts
• Index Animalium and Taxonomic
Literature II are two examples
Disclaimer
We are still learning.
We are still building.
Act I: The Players
(or, identifying the data with which
we are working and their meaning
and usefulness to the scientific
community.)
Taxonomic Literature II
Essential Reference
Tool for Botanists
Botanists/Authors
and Publications
from 1753–1940
Multiple indexes, “unique identifiers”
It is a “database in book form”
Index Animalium
Genus name, author
& citation for
430,000 animals
Covers Publications
from 1758–1850
Also a database, but
many challenges
still exist in the data.
Act II: The Linking
(or, identifying those data elements to
be linked, inherent challenges of
parsing OCR text, and identifying
linkable remote data sources)
Linkable Data Elements
http://library.si.edu/tl2/author/darwin
RDF Type = foaf:Person
• foaf:lastName, foaf:familyName
• foaf:firstName, foaf:givenName
• foaf:name, skos:prefLabel
• bio:birth
• bio:death
• skos:definition
• tl2:personAbbreviation
http://library.si.edu/tl2/title/origin…
RDF Type = bibo:Book
• tl2:titleNumber
• dc:title
• event:place
• dc:publisher
• dc:created
• tl2:titleAbbreviation
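As a rough illustration of how the data elements on this slide could become linked data, the sketch below writes a few of them out as N-Triples using only the Python standard library. It is a minimal sketch, not the Smithsonian Libraries' implementation. The foaf, bibo, dc and rdf namespace URIs are the published ones; the tl2 namespace, the book URI (the slide truncates the real one) and all literal values are assumptions made up for the example.

```python
# Namespace prefixes. The tl2 entry is a hypothetical placeholder:
# the slides name the prefix but never give its namespace URI.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bibo": "http://purl.org/ontology/bibo/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "tl2": "http://example.org/tl2-vocab#",  # assumption, not from the slides
}


def expand(curie):
    """Turn a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject, rdf_type, properties):
    """Serialize one resource: an rdf:type triple plus one triple per literal property."""
    lines = [f"<{subject}> <{expand('rdf:type')}> <{expand(rdf_type)}> ."]
    for curie, value in properties:
        lines.append(f'<{subject}> <{expand(curie)}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# The author URI is the one shown on the slide; the literal values are illustrative.
author = to_ntriples(
    "http://library.si.edu/tl2/author/darwin",
    "foaf:Person",
    [
        ("foaf:familyName", "Darwin"),
        ("foaf:givenName", "Charles"),
        ("tl2:personAbbreviation", "Darwin"),
    ],
)

# Placeholder book URI, because the slide's title URI is truncated.
book = to_ntriples(
    "http://example.org/tl2/title/EXAMPLE",
    "bibo:Book",
    [("dc:title", "An example title")],
)

print(author)
print(book)
```

Each printed line is one subject, predicate, object statement, which is the form in which other linked data services could consume the records.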
Challenges with Our Data
• Errors in the Corrected OCR
• Challenges in Parsing Citations
• The 80/20 rule: manually making the
connections that automated means
cannot make
• Finding suitable sources of data to
link to (DBpedia? VIAF? EOL? Others?)
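The 80/20 split between automated and manual linking can be sketched as follows. This is a hedged illustration only: it uses fuzzy string matching from Python's standard `difflib` module to tolerate OCR errors, and sends anything below a similarity threshold to a manual review queue. The function names, the sample names and the 0.85 threshold are assumptions chosen for the example, not details from the project.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_names(ocr_names, authority_names, threshold=0.85):
    """Match each OCR name to its closest authority name.

    Names whose best match scores below the threshold are not linked;
    they are returned separately for manual review.
    """
    linked = {}
    review = []
    for name in ocr_names:
        best = max(authority_names, key=lambda cand: similarity(name, cand))
        if similarity(name, best) >= threshold:
            linked[name] = best
        else:
            review.append(name)
    return linked, review


# Illustrative data: two names with OCR-style errors and one with no match.
authority = ["Darwin, Charles", "Camus, Aimée Antoinette", "Linnaeus, Carl"]
ocr = ["Darvvin, Charles", "Camus, Aimee Antoinette", "Sherborn, C. D."]

linked, review = link_names(ocr, authority)
print("linked automatically:", linked)
print("needs manual review:", review)
```

Raising the threshold reduces false links at the cost of a longer manual queue, so the value would need tuning against real data.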
Linked Data Sources
Low-Hanging Fruit:
• DBpedia
• OCLC WorldCat
• Biodiversity Heritage Library
• Virtual International Authority File
• Encyclopedia of Life
• Library of Congress Subject Headings
• GeoNames
• Open Library
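One common way to connect a local record to sources like these is an `owl:sameAs` statement. The sketch below is an assumption about how such links might be expressed, not a description of the project's code. It pairs the author URI from the earlier slide with the DBpedia resource for Charles Darwin, on the assumption that the two describe the same person.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(local_uri, remote_uris):
    """Return one N-Triples owl:sameAs statement per distinct remote URI."""
    return [
        f"<{local_uri}> <{OWL_SAME_AS}> <{remote}> ."
        for remote in sorted(set(remote_uris))
    ]


# Assumption: the tl2 'darwin' author record refers to Charles Darwin.
links = same_as_triples(
    "http://library.si.edu/tl2/author/darwin",
    ["http://dbpedia.org/resource/Charles_Darwin"],
)

for line in links:
    print(line)
```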
Act III: The Sum of the Parts
(or, our goals and desires for this
data, what it means to the linked
data world and the scientific
community in general)
What’s the point?
• This data may already exist online.
• It may also not always be as accurate
as needed for science.
• We are in a position to be the
authoritative source for this
information.
• Linked Data allows it to be easily
reused and shared.
[Diagram labels: Danaus plexippus; Index Animalium; Systema Naturae, etc.; Aimée Antoinette Camus (botanist); Your Local Library]
One Example of Reuse
Ryan Schenk
http://synynyms.com/
Thank you!
Joel Richard
RichardJM@si.edu
http://library.si.edu/staff/joel-richard
http://slideshare.net/joelrichard

Linked Open Data and Systematic Taxonomy

  • 1. Linked Open Data and Systemic Taxonomy Joel Richard Smithsonian Libraries richardjm@si.edu A tale of two publications In three acts
  • 2. Who are the Smithsonian Libraries? • 20 Libraries in the U.S. and Panama • Supports research of staff and the public • Strong effort to digitize pre-1923 texts • Index Animalium and Taxonomic Literature II are two examples
  • 3. Disclaimer We are still learning. We are still building.
  • 4. Act I: The Players (or, identifying the data with which we are working and their meaning and usefulness to the scientific community.)
  • 5. Taxonomic Literature II Essential Reference Tool for Botanists Botanists/Authors and Publications from 1753–1940 Multiple indexes, “unique identifiers” It is a “database in book form”
  • 8. Index Animalium Genus name, author & citation for 430,000 animals Covers Publications from 1758–1850 Also a database, but many challenges still exist in the data.
  • 10. Act II: The Linking (or, identifying those data elements to be linked, inherent challenges of parsing OCR text, and identifying linkable remote data sources)
  • 12. foaf:lastName, foaf:familyName foaf:firstName, foaf:givenName foaf:name, skos:prefLabel bio:birth bio:death skos:definition tl2:personAbbreviation tl2:titleNumber dc:title event:place dc:publisher dc:created tl2:titleAbbreviation http://library.si.edu/tl2/author/darwin RDF Type = foaf:Person http://library.si.edu/tl2/title/origin… RDF Type = bibo:Book
  • 13. Challenges with Our Data • Errors in the Corrected OCR • Challenges in Parsing Citations • The 80/20 rule: manually making connections unable to be made by automated means • Finding suitable sources of data to link to. (DBPedia? VIAF? EOL? Others?)
  • 14. Linked Data Sources Low-Hanging Fruit: • DBPedia • OCLC WorldCat • Biodiversity Heritage Library • Virtual International Authority File • Encyclopedia of Life • Library of Congress Subject Headings • GeoNames • Open Library
  • 15. Act III: The Sum of the Parts (or, our goals and desires for this data, what it means to the linked data world and the scientific community in general)
  • 16. What’s the point? • This data may already exist online. • It may also not always be as accurate as needed for science. • We are in a position to be the authoritative source for this information. • Linked Data allows it to be easily reused and shared.
  • 17. Danaus plexippus Index Animalium Systema Naturae, etc Aimée Antoinette Camus (botanist) Your Local Library
  • 18. One Example of Reuse Ryan Schenk http://synynyms.com/

Editor's Notes

  1. Originally this presentation was going to center around a discussion of our conversion of TL2 to linked data and what we learned, but I felt that it would be better to use it as an example of things to keep in mind when creating your own data sets.
  2. Situated at the center of the world's largest museum complex, the Smithsonian Libraries forms a vital part of the research, exhibition, and educational enterprise of the Institution. The Libraries unites 20 libraries into one system supported by central collections support services. We maintain publication exchanges with more than 4,000 institutions worldwide that supply Smithsonian scientists and curators with current periodicals, exhibition catalogs, and professional society publications. Through preservation treatments, experts work to save the Smithsonian's 1.5 million printed books and manuscripts for future generations. Our Digital Library creates electronic versions of rare books and other distinctive collections, as well as exhibitions and specialized finding aids. We can be found on the web at http://library.si.edu
  3. I dislike disclaimers, but we’re still new to linked open data and are learning as we go. The idea of LOD has been around for several years now, so we are also playing a bit of catch-up. Our first goals are to get some data online, then start linking our data out to other sources, and encourage others to link to us. We don’t yet know how our data relates to others. It’s not scientific data created as part of a research project per se, but initially we see it as valuable, useful information at least for some segments of the research world.
  4. So as an example of how to create a data set, I’ll use Taxonomic Literature II. It is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. It contains almost 10,000 authors and about 37,000 publications. The reason to focus on TL2 is that we aim to be the authority on the web for this information. We have received permission from the IAPT (International Association for Plant Taxonomy) to digitize and release this information on the web under an open license. TL-2 is used by many, perhaps most, botanists, and their work is made easier by this data being online. Prior to 2012 this information was either located in a library or locked behind a paywall of sorts.
  5. This is a page of TL-2 showing Charles Darwin and On the Origin of Species, with the immediately visible items that can be parsed and turned into Linked Data highlighted. There is other data in the page that could be turned into linked data, but at this time we have only parsed the data that is highlighted on this page. Clearly, moving from something such as a printed book to a Linked Open Data set is an arduous task. If you are working on creating your own data sets, your experiences will differ depending on the source(s) of your data. One important thing to note here is the “Darwin” in parentheses, which is a unique abbreviation for an author. Each author has one. Another important item is the “1313” identifying the title, On the Origin of Species. Each publication in TL-2 has its own number. There are about 9,900 authors and 37,000 titles in all.
  6. This is our current website, showing a sample of the search results for Charles Darwin. This is not Linked Data. You can find this page at: http://www.sil.si.edu/digitalcollections/tl-2/
  7. Index Animalium, published in the late 1800s and early 1900s, contains 430,000 species names for 7,000 scientific volumes published between 1758 and 1850. Charles Davies Sherborn dedicated much of his life to this work. The volumes consist of the index to species, with one species + citation per line, and a bibliography listing the titles that Sherborn read. Challenges in the data include inconsistent citation formats, two kinds of abbreviations (in both the index and the bibliography), and errors introduced during the printing process.
  8. This is one example of a page from Index Animalium for Papilio (Danaus) plexippus, AKA the Monarch Butterfly. The abbreviations: Linnaeus: Carl Linnaeus; Syst. Nat.: Systema Naturae; Ed 10: 10th edition; 1758: publication year; 471: page 471. Also the 12th edition, published in 1767, page 767.
  9. Identified here are the data elements that are “easy” to identify and can be brought to linked data. We still need to contend with the challenges associated with parsing these into actual citations. The TL-2 data at the top has already been parsed and loaded into a database. Index Animalium is posing a greater challenge and will take longer to complete.
  10. A further breakdown of our data for TL-2 into linked data showing the predicates we might use for each. Again, the items in orange are specific to TL2 and may not exist in other LOD data sets. For example, the FOAF vocabulary has date of birth, but can we use only a year in that field? Will that foul up other computers? FOAF also doesn’t include date of death, which we definitely have. What predicate do we use? Do we create our own ontology and publish it? (probably) Finally, we haven’t yet begun a formal analysis of which existing ontologies might fit our needs.
  11. 80/20 Rule: You spend 20% of your time on 80% of the work and 80% of your time on the remaining 20% of the work. We are at that point with Index Animalium. We would like to do further parsing of data with TL-2, but it will pose challenges similar to those of Index Animalium.
  12. Some potential sources of data that we can link to. We’d like to one day have some of these link back to us, thereby completing the circuit for a linked data web of knowledge.
  13. This is what we would like to do: A researcher enters a botanist name or a species name and is taken directly to the page in the book referenced by that entry. If the book is not known to be digitized and online, then we can redirect them to OCLC WorldCat to find a copy of that book in their local library. This is a great improvement for those who wouldn’t normally have access to these books in their local library.
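
The citation breakdown in note 8 could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the input layout ("author, title, ed. n, year, page"), the sample strings, and the two small abbreviation tables are hypothetical, and real Index Animalium lines are far less regular, which is exactly the parsing challenge described in notes 7 and 11.

```python
import re
from dataclasses import dataclass

# Hypothetical abbreviation tables. The real index and bibliography use
# many more abbreviations, and not always consistently.
AUTHORS = {"Linnaeus": "Carl Linnaeus"}
TITLES = {"Syst. Nat.": "Systema Naturae"}


@dataclass
class Citation:
    author: str
    title: str
    edition: int
    year: int
    page: int


# Assumed, simplified layout: "<author>, <title>, ed. <n>, <year>, <page>"
PATTERN = re.compile(
    r"^(?P<author>[^,]+),\s*(?P<title>.+?),\s*[Ee]d\.?\s*(?P<edition>\d+),"
    r"\s*(?P<year>\d{4}),\s*(?P<page>\d+)\.?$"
)


def parse_citation(text):
    """Return a Citation, or None when the line does not fit the layout."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None  # left for manual review (the "80/20" remainder)
    author = match.group("author").strip()
    title = match.group("title").strip()
    return Citation(
        author=AUTHORS.get(author, author),  # expand known abbreviations
        title=TITLES.get(title, title),
        edition=int(match.group("edition")),
        year=int(match.group("year")),
        page=int(match.group("page")),
    )


first = parse_citation("Linnaeus, Syst. Nat., ed. 10, 1758, 471")
second = parse_citation("Linnaeus, Syst. Nat., ed. 12, 1767, 767")
```

Lines that do not match return None rather than a guess, so they can be queued for the manual work that automated parsing cannot cover.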
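
The predicate questions in note 10 can be made concrete with a small sketch that writes a TL-2 author as N-Triples using only the Python standard library. Several things here are assumptions rather than our published model: the `http://library.si.edu/tl2/terms/` namespace and the `birthYear` and `deathYear` predicates are hypothetical stand-ins for a TL2 ontology that does not exist yet, and typing a bare year as `xsd:gYear` is one possible answer to the "only a year" question, not a settled decision. The author URI pattern comes from slide 12.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"
AUTHOR_BASE = "http://library.si.edu/tl2/author/"  # pattern shown on slide 12
TL2 = "http://library.si.edu/tl2/terms/"  # hypothetical namespace


def literal(value, datatype=None):
    """Format an N-Triples literal, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"^^<{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def author_triples(slug, name, abbreviation, birth_year=None, death_year=None):
    """Return N-Triples lines describing one TL-2 author."""
    subject = f"<{AUTHOR_BASE}{slug}>"
    pairs = [
        (RDF_TYPE, f"<{FOAF}Person>"),
        (FOAF + "name", literal(name)),
        (TL2 + "personAbbreviation", literal(abbreviation)),
    ]
    # Years are optional and typed as xsd:gYear so that consumers do not
    # mistake them for full dates (hypothetical predicates).
    if birth_year is not None:
        pairs.append((TL2 + "birthYear", literal(birth_year, XSD_GYEAR)))
    if death_year is not None:
        pairs.append((TL2 + "deathYear", literal(death_year, XSD_GYEAR)))
    return [f"{subject} <{predicate}> {obj} ." for predicate, obj in pairs]


lines = author_triples("darwin", "Charles Darwin", "Darwin", 1809, 1882)
print("\n".join(lines))
```

Keeping the year as a typed literal, instead of padding it into a full date, avoids inventing a month and day that the source does not contain.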
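
The lookup described in note 13 amounts to "use the digitized page if we know of one, otherwise send the researcher to a WorldCat search". A minimal sketch, assuming a hypothetical in-memory table of digitized titles keyed by TL-2 title number (the `example.org` URL is a placeholder) and assuming WorldCat's public search URL takes a `q` query parameter, which should be checked against OCLC's current documentation:

```python
from urllib.parse import urlencode

# Hypothetical lookup table: TL-2 title number -> URL of a digitized copy.
DIGITIZED = {1313: "https://example.org/scans/origin-of-species"}

# Assumed search endpoint; verify against OCLC's documentation before use.
WORLDCAT_SEARCH = "https://www.worldcat.org/search"


def resolve(title_number, title):
    """Return a digitized URL when known, else a WorldCat search URL."""
    url = DIGITIZED.get(title_number)
    if url is not None:
        return url
    # Fall back to a library search so the researcher can find a print copy.
    return WORLDCAT_SEARCH + "?" + urlencode({"q": title})


known = resolve(1313, "On the Origin of Species")
fallback = resolve(9999, "Systema Naturae")  # 9999 is a made-up title number
```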