Linked Open Data and Systematic Taxonomy

A short talk in which I briefly discuss the Smithsonian Libraries' plans for Linked Open Data related to our Taxonomic Literature II and Index Animalium digitization projects.

Linked Open Data and Systematic Taxonomy
Joel Richard
Smithsonian Libraries
richardjm@si.edu
A tale of two publications
In three acts

Who are the Smithsonian Libraries?
• 20 Libraries in the U.S. and Panama
• Supports research of staff and the public
• Strong effort to digitize pre-1923 texts
• Index Animalium and Taxonomic Literature II are two examples

Disclaimer
We are still learning.
We are still building.

Act I: The Players
(or, identifying the data with which we are working and their meaning and usefulness to the scientific community.)

Taxonomic Literature II
Essential Reference Tool for Botanists
Botanists/Authors and Publications from 1753–1940
Multiple indexes, “unique identifiers”
It is a “database in book form”

Index Animalium
Genus name, author & citation for 430,000 animals
Covers Publications from 1758–1850
Also a database, but many challenges still exist in the data.

Act II: The Linking
(or, identifying those data elements to be linked, inherent challenges of parsing OCR text, and identifying linkable remote data sources)

Linkable Data Elements
http://library.si.edu/tl2/author/darwin
RDF Type = foaf:Person
• foaf:lastName, foaf:familyName
• foaf:firstName, foaf:givenName
• foaf:name, skos:prefLabel
• bio:birth
• bio:death
• skos:definition
• tl2:personAbbreviation
http://library.si.edu/tl2/title/origin…
RDF Type = bibo:Book
• tl2:titleNumber
• dc:title
• event:place
• dc:publisher
• dc:created
• tl2:titleAbbreviation
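
As a rough illustration of how the author predicates above might be written out as triples, here is a minimal Python sketch that prints N-Triples lines for the Darwin author URI. It is not the Libraries' actual code. The `tl2` namespace URI is a placeholder assumption (the talk does not give one), the function names are hypothetical, and the title record is left out because its URI is truncated on the slide.

```python
# Namespace URIs for the prefixes shown on the slide.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    # Hypothetical: the talk does not give a namespace URI for tl2.
    "tl2": "http://library.si.edu/tl2/terms/",
}


def expand(curie):
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


def literal(value):
    """Quote a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def author_triples(uri, given_name, family_name, abbreviation):
    """Return N-Triples lines describing one TL-2 author."""
    full_name = given_name + " " + family_name
    pairs = [
        ("rdf:type", "<" + expand("foaf:Person") + ">"),
        ("foaf:givenName", literal(given_name)),
        ("foaf:familyName", literal(family_name)),
        ("foaf:name", literal(full_name)),
        ("skos:prefLabel", literal(full_name)),
        ("tl2:personAbbreviation", literal(abbreviation)),
    ]
    return ["<%s> <%s> %s ." % (uri, expand(p), o) for p, o in pairs]


for line in author_triples(
    "http://library.si.edu/tl2/author/darwin", "Charles", "Darwin", "Darwin"
):
    print(line)
```

Plain string concatenation keeps the sketch free of dependencies. A real pipeline would more likely use an RDF library to handle serialization and escaping.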

Challenges with Our Data
• Errors in the Corrected OCR
• Challenges in Parsing Citations
• The 80/20 rule: manually making the connections that automated means cannot
• Finding suitable sources of data to link to (DBPedia? VIAF? EOL? Others?)

Linked Data Sources
Low-Hanging Fruit:
• DBPedia
• OCLC WorldCat
• Biodiversity Heritage Library
• Virtual International Authority File
• Encyclopedia of Life
• Library of Congress Subject Headings
• GeoNames
• Open Library

Act III: The Sum of the Parts
(or, our goals and desires for this data, what it means to the linked data world and the scientific community in general)

What’s the point?
• This data may already exist online.
• It may also not always be as accurate as needed for science.
• We are in a position to be the authoritative source for this information.
• Linked Data allows it to be easily reused and shared.

Diagram labels: Danaus plexippus; Aimée Antoinette Camus (botanist); Index Animalium, Systema Naturae, etc.; Your Local Library

One Example of Reuse
Ryan Schenk
http://synynyms.com/

Thank you!
Joel Richard
RichardJM@si.edu
http://library.si.edu/staff/joel-richard
http://slideshare.net/joelrichard

Editor's Notes

Originally this presentation was going to center around a discussion of our conversion of TL2 to linked data and what we learned, but I felt that it would be better to use it as an example of things to keep in mind when creating your own data sets.
Situated at the center of the world's largest museum complex, the Smithsonian Libraries forms a vital part of the research, exhibition, and educational enterprise of the Institution. The Libraries unites 20 libraries into one system supported by central collections support services. We maintain publication exchanges with more than 4,000 institutions worldwide that supply Smithsonian scientists and curators with current periodicals, exhibition catalogs, and professional society publications. Through preservation treatments, experts work to save the Smithsonian's 1.5 million printed books and manuscripts for future generations. Our Digital Library creates electronic versions of rare books and other distinctive collections, as well as exhibitions and specialized finding aids. We can be found on the web at http://library.si.edu
I dislike disclaimers, but we’re still new to linked open data and are learning as we go. The idea of LOD has been around for several years now, so we are also playing a bit of catch-up. Our first goals are to get some data online, then start linking our data out to other sources and encourage others to link to us. We don’t yet know how our data relates to others’. It’s not scientific data created as part of a research project per se, but initially we see it as valuable, useful information, at least for some segments of the research world.
So as an example of how to create a data set, I’ll use Taxonomic Literature II (TL-2). It is a fifteen-volume guide to the literature of systematic botany published between 1753 and 1940. It contains almost 10,000 authors and about 37,000 publications. The reason to focus on TL-2 is that we aim to be the authority on the web for this information. We have received permission from the IAPT (International Association for Plant Taxonomy) to digitize and release this information on the web under an open license. TL-2 is used by many, perhaps most, botanists, and having this data online makes their work easier. Prior to 2012 this information was available only in a library or behind a paywall of sorts.
This is a page of TL-2 showing Charles Darwin and On the Origin of Species, with the items that are immediately visible and can be parsed and turned into Linked Data. There is other data on the page that could be turned into linked data, but at this time we have parsed only the data that is highlighted on this page. Clearly, moving from something such as a printed book to a Linked Open Data set is an arduous task. If you are working on creating your own data sets, your experiences will differ depending on the source(s) of your data. One important thing to note here is the “Darwin” in parentheses, which is a unique abbreviation for an author. Each author has one. Another important item is the “1313” identifying the title, On the Origin of Species. Each publication in TL-2 has its own number. There are about 9,900 authors and 37,000 titles in all.
This is our current website, showing a sample of the search results for Charles Darwin. This is not Linked Data. You can find this page at: http://www.sil.si.edu/digitalcollections/tl-2/
Index Animalium, compiled and published from the late 1800s into the early 1900s, contains 430,000 species names drawn from 7,000 scientific volumes published between 1758 and 1850. Charles Davies Sherborn dedicated much of his life to this work. The volumes consist of the index to species, with one species plus citation per line, and a bibliography listing the titles that Sherborn read. Challenges in the data include inconsistent citation formats, two kinds of abbreviations (in both the index and the bibliography), and errors introduced during the printing process.
This is one example of a page from Index Animalium for Papilio (Danaus) plexippus, a.k.a. the Monarch Butterfly. The abbreviations:
Linnaeus: Carl Linnaeus
Syst. Nat.: Systema Naturae
Ed 10: 10th edition
1758: Publication Year
471: Page 471
Also the 12th Edition, published in 1767, page 767.
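
To make the parsing problem concrete, here is a minimal Python sketch that splits one citation into the fields listed above. The input layout (`Syst. Nat. ed. 10, 1758, 471`) is an assumed, simplified format for illustration and not a verbatim line from Index Animalium. The function name and the one-entry abbreviation table are hypothetical as well. Real entries vary far more than this pattern allows.

```python
import re

# Abbreviation table; only the entry mentioned in the note is included.
ABBREVIATIONS = {"Syst. Nat.": "Systema Naturae"}

# Assumed layout: <title>, optional comma, "ed." <edition>, <year>, <page>
CITATION = re.compile(
    r"^(?P<title>.+?),?\s+[Ee]d\.?\s*(?P<edition>\d+),"
    r"\s*(?P<year>\d{4}),\s*(?P<page>\d+)\.?$"
)


def parse_citation(text):
    """Return the citation fields as a dict, or None if the layout differs."""
    match = CITATION.match(text.strip())
    if match is None:
        return None
    title = match.group("title").strip()
    return {
        "title": ABBREVIATIONS.get(title, title),
        "edition": int(match.group("edition")),
        "year": int(match.group("year")),
        "page": int(match.group("page")),
    }


print(parse_citation("Syst. Nat. ed. 10, 1758, 471"))
print(parse_citation("Syst. Nat. ed. 12, 1767, 767"))
```

Returning `None` for anything that does not match lets unparsed lines be queued for manual review, which is where most of the effort is likely to go.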
Identified here are the easy-to-identify data elements that can be brought into linked data. We still need to contend with the challenges of parsing these into actual citations. The TL-2 data at the top has already been parsed and loaded into a database. Index Animalium is posing a greater challenge and will take longer to complete.
A further breakdown of our data for TL-2 into linked data, showing the predicates we might use for each. Again, the items in orange are specific to TL-2 and may not exist in other LOD data sets. For example, the FOAF vocabulary has date of birth, but can we use only a year in that field? Will that foul up other computers? FOAF also doesn’t include date of death, which we definitely have. What predicate do we use? Do we create our own ontology and publish it? (Probably.) Finally, we haven’t yet begun a formal analysis of which existing ontologies might fit our needs.
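
One possible way to approach the year-only question, offered as a sketch and not as a decision the project has made, is to type each value according to how much of the date is known. XML Schema defines `xsd:gYear` and `xsd:gYearMonth` alongside `xsd:date`. Whether a given vocabulary or consumer accepts them for a birth or death date is a separate matter that would need checking. The function name below is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def typed_date_literal(value):
    """Build an N-Triples typed literal matching the precision of the value."""
    if re.fullmatch(r"\d{4}", value):
        datatype = "gYear"  # only the year is known
    elif re.fullmatch(r"\d{4}-\d{2}", value):
        datatype = "gYearMonth"  # year and month are known
    elif re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        datatype = "date"  # the full date is known
    else:
        raise ValueError("unrecognized date format: " + value)
    return '"%s"^^<%s%s>' % (value, XSD, datatype)


print(typed_date_literal("1809"))
print(typed_date_literal("1882-04-19"))
```

Stating the datatype explicitly lets a consumer tell a bare year from a full date without guessing.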
80/20 Rule: You spend 20% of your time on 80% of the work and 80% of your time on the remaining 20% of the work. We are at that point with Index Animalium. We would like to do further parsing of the TL-2 data, but it will pose challenges similar to those of Index Animalium.
Some potential sources of data that we can link to. We’d like to one day have some of these link back to us, thereby completing the circuit for a linked data web of knowledge.
This is what we would like to do: a researcher enters a botanist name or a species name and is taken directly to the page in the book referenced by that entry. If the book is not known to be digitized and online, we can redirect them to OCLC WorldCat to find a copy of that book in their local library. This is a great improvement for those who wouldn’t normally have access to these books in their local library.
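
The lookup-with-fallback idea described here can be sketched in a few lines of Python. Everything in the example is illustrative: the function name, the dictionary of digitized pages, and the `example.org` URL are hypothetical, and the WorldCat search URL pattern is an assumption that should be checked against OCLC's current documentation.

```python
from urllib.parse import urlencode


def resolve(title, digitized_pages):
    """Return a link to the digitized page, or a WorldCat search as fallback."""
    page_url = digitized_pages.get(title)
    if page_url is not None:
        return page_url
    # Assumed WorldCat search URL pattern; not confirmed by the talk.
    return "https://www.worldcat.org/search?" + urlencode({"q": title})


# Hypothetical index mapping cited titles to digitized page URLs.
pages = {"Systema Naturae": "https://example.org/systema-naturae/page/471"}

print(resolve("Systema Naturae", pages))
print(resolve("On the Origin of Species", pages))
```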
