Data Curation and Biodiversity Research:
Lessons from BiSciCol and a Look at the “Triplifier Simplifier”


John Deck, University of California, Berkeley
Brian Stucky, University of Colorado, Boulder
Lukasz Ziemba, University of Florida, Gainesville
Nico Cellinese, University of Florida, Gainesville
Rob Guralnick, University of Colorado, Boulder

BiSciCol Team
Reed Beaman, Nico Cellinese, Jonathan Coddington, Neil Davies, John Deck, Rob Guralnick, Bryan P. Heidorn, Chris Meyer, Tom Orrell, Rich Pyle, Kate Rachwal, Brian Stucky, Rob Whitton, Lukasz Ziemba

•   BiSciCol is funded by the National Science Foundation, 2010 – 2014
•   Infrastructure to tag & track specimens & derivatives in cyberspace
•   Relies on globally unique identifiers (GUIDs) to track objects
•   Implements a Linked Data approach
•   Provides support for the Global Names Architecture

A Biological Relationship Graph …
[Figure: relationship graph with a Taxonomic Type Filter and a Class Filter: [X] Specimens, [ ] Tissues, [X] Sequences]

Why Linked Data? Why BiSciCol?
Here is Gustav’s Problem
•   Generates lots of data…
•   (Prefers to collect stuff)

Biodiversity Data Challenges
•   Data is distributed
•   Rapidly changing technologies
•   Covers multiple domains

Solving Biodiversity Data Challenges with BiSciCol and Linked Data
•   Group data into classes (e.g. "Is a dwc:Event").
•   Assign identifiers.
•   Link identifiers.
•   Publish.
Data sources: [ ] Ocean Sampling Day   [X] Moorea Biocode   [X] SI MSNGR System   [+] Add My Data

The Triplifier (Advanced Interface)
•   Loading Data
•   Naming and Identifying Objects
•   Linking Objects
•   Publishing

Advanced Interface: Loading Data
Supported sources:
•   Darwin Core Archive
•   Spreadsheets
•   MySQL
•   KEMU

Advanced Interface: Entities
[Figure labels: 78, Tissue]

Result is identifiers assigned to Entities:
78 a door .
427 a cat .
<http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> a <dwc:Occurrence> .
<http://biocode.berkeley.edu/collectorevents/MIB_25> a <dwc:Event> .

From Gary Larson, adapted by Barry Smith in the Referent Tracking presentation at the Semantics of Biodiversity Workshop, 2012.
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.
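
As a rough illustration of the entity step above, the sketch below builds one type statement per record by joining a namespace with a local identifier. The function name `entity_triples` is hypothetical and is not part of the Triplifier. The output copies the slide's notation, which is not necessarily strict Turtle.

```python
def entity_triples(namespace, local_ids, rdf_class):
    """Return one 'a <class>' statement per local identifier.

    Hypothetical helper for illustration only; the statement layout
    mirrors the examples on the slide.
    """
    return [f"<{namespace}{local_id}> a <{rdf_class}> ." for local_id in local_ids]


# Namespaces and local identifiers are taken from the slide's examples.
specimens = entity_triples(
    "http://biocode.berkeley.edu/collectorspecimens/", ["BMOO_2665"], "dwc:Occurrence"
)
events = entity_triples(
    "http://biocode.berkeley.edu/collectorevents/", ["MIB_25"], "dwc:Event"
)

for statement in specimens + events:
    print(statement)
```
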
Advanced Interface: Entity Relations

Relations as Triples:
<http://biocode.berkeley.edu/collectorevents/MIB_25> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> .
<http://biocode.berkeley.edu/collectorevents/MIB_37> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> .
<http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M037F10> .
<http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M028G5> .
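
One way to see what these relation triples make possible is to follow `ma:isSourceOf` links from a collecting event down to everything derived from it. The sketch below is a minimal illustration and is not the Triplifier's own code. The regular expression, `parse`, and `derivatives` are assumptions written for this example. The data are the four triples above.

```python
import re

TRIPLES = """
<http://biocode.berkeley.edu/collectorevents/MIB_25> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> .
<http://biocode.berkeley.edu/collectorevents/MIB_37> <ma:isSourceOf> <http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> .
<http://biocode.berkeley.edu/collectorspecimens/BMOO_2665> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M037F10> .
<http://biocode.berkeley.edu/collectorspecimens/BMOO_2667> <ma:isSourceOf> <http://biocode.berkeley.edu/plate_well/Plate_M028G5> .
"""

# Matches "<subject> <predicate> <object> ." in the notation used on the slide.
TRIPLE_RE = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")


def parse(text):
    """Return a list of (subject, predicate, object) tuples."""
    return [match.groups() for match in TRIPLE_RE.finditer(text)]


def derivatives(triples, source, predicate="ma:isSourceOf"):
    """Follow the predicate from source and list every downstream object."""
    children = {}
    for subj, pred, obj in triples:
        if pred == predicate:
            children.setdefault(subj, []).append(obj)
    found, stack = [], [source]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in found:
                found.append(child)
                stack.append(child)
    return found


event = "http://biocode.berkeley.edu/collectorevents/MIB_25"
for uri in derivatives(parse(TRIPLES), event):
    print(uri)
```

Starting from event MIB_25, the walk reaches specimen BMOO_2665 and then plate well Plate_M037F10, which is the "linking derivatives back to their source" idea read in the forward direction.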
Triplify!: View graph-based data
[Figure: Query and Response panels]

The Triplifier (Simple Interface)
•   Publish

What challenges are we facing now?
(for BiSciCol, Linked Data, and data integration in general)

Identifier Issues

Persistence
Solutions:
• DOIs (http://doi.org/)
• EZIDs (http://ezid.net/)

Assignment at the source is difficult
Solutions:
• Calculated namespaces (e.g. geo:lat,lng) via PDAs
• UUIDs (randomly unique)
[Figure: The digestible RFID tag]

Semantic web requires URIs, but many standards (including Darwin Core) do not require URIs for identifiers
Solution:
• Promote use of URIs for identifiers in all standards.
[Figure: URI = scheme : string]
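
Two of the points above can be sketched with the Python standard library: minting a random UUID at the source, and checking whether an identifier has the "scheme : string" shape of a URI. The function names are hypothetical. The check is purely syntactic and says nothing about persistence or resolvability.

```python
import uuid
from urllib.parse import urlparse


def mint_uuid_urn():
    """Mint a random (version 4) UUID and return it in URN form."""
    return uuid.uuid4().urn


def is_uri(identifier):
    """Return True if the identifier has a scheme followed by some string.

    Syntactic check only: anything of the form scheme:string passes.
    """
    parsed = urlparse(identifier)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


print(mint_uuid_urn())
print(is_uri("http://biocode.berkeley.edu/collectorspecimens/BMOO_2665"))
print(is_uri("BMOO_2665"))
```

A bare catalogue number such as BMOO_2665 fails the check, while the same value under an HTTP namespace or a freshly minted `urn:uuid:` identifier passes.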

Classification Issues
• Inadequate representational units: “Occurrence”
• Confusion between representational units: “Sample, Specimen, Individual, Aggregation”
Solutions:
• Continue working on clarity in term definitions
• Work from upper-level ontologies (e.g. Basic Formal Ontology) to derive definitions.

Relation Issues
Nonsensical conclusions are possible!
Solution:
• Apply directional links only where appropriate.
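
One possible reading of this slide, sketched under assumptions: if a source-of link is followed in both directions, a query can connect objects that merely share an ancestor. The edge list below is hypothetical (one event, two specimens, one tissue) and `reachable` is an illustrative helper, not BiSciCol code.

```python
def reachable(edges, start, directed=True):
    """Return every node reachable from start.

    With directed=False each link is also followed backwards, which is
    the situation this sketch warns about.
    """
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, set()).add(target)
        if not directed:
            neighbours.setdefault(target, set()).add(source)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical source-of links: one event yields two specimens,
# and only specimen_A yields a tissue.
edges = [
    ("event_1", "specimen_A"),
    ("event_1", "specimen_B"),
    ("specimen_A", "tissue_A1"),
]

print(sorted(reachable(edges, "specimen_B")))
print(sorted(reachable(edges, "specimen_B", directed=False)))
```

Followed directionally, specimen_B has no derivatives. Followed in both directions, the same query returns tissue_A1, which came from a different specimen.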

Adoption Issues

Critical mass required for effective utilization
Solutions:
• Work with aggregators (GBIF, VertNet, NCBI).
• View Triples as a publishable unit

Reality is complicated
Solutions:
• Work collaboratively (e.g. BioPortal, hackathons, interdisciplinary workshops)

The BiSciCol Mission

• BiSciCol tackles biodiversity data challenges:
    •    Tracking and integration of objects across disciplines
    •    Linking derivatives back to their source
• BiSciCol is about community, collaborative practice
    •    Commitment to standards, ontologies
    •    Agreement on permanent, resolvable identifiers
    •    Triplification of data sources to enhance linked data



        http://biscicol.blogspot.com/       http://biscicol.org
