Collaborative development of cross-database Bio2RDF queries


                Peter Ansell
Microsoft Queensland University of Technology
              eResearch Centre

             p.ansell@qut.edu.au
Introduction
●   Large number of cross-disciplinary datasources in
    different locations
●   Scientists require simplified access to many of the
    datasources for complex research
●   Recently a large number of datasources have been
    published as RDF
●   Now, we need to learn how to query across them and
    be able to share that knowledge with others

     Sydney, Australia   3rd eResearch Australasia Conference   9-13 Nov 2009

Linked Data
1) Use URIs as names for things
2) Use HTTP URIs so that people can look up those
names.
3) When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL)
4) Include links to other URIs, so that they can
discover more things.
http://www.w3.org/DesignIssues/LinkedData.html

Linked Data querying
●   Strategy 1 (Naive):
           –   Retrieve resources
           –   Retrieve linked resources
           –   Cache the resources and perform queries locally
●   Strategy 2 (Search engine):
           –   Retrieve resources
           –   Retrieve directly linked resources
           –   Query a semantic search engine for related resources
           –   Cache the resources and perform queries locally
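Strategy 1 can be sketched as a small crawl over an in-memory stand-in for the web. The `WEB` dictionary, the `ex:` identifiers and the function names are all hypothetical and exist only for illustration; a real client would dereference HTTP URIs and parse the RDF that comes back.

```python
# Hypothetical stand-in for the web: URI -> triples returned when it is resolved.
WEB = {
    "ex:geneA": [("ex:geneA", "ex:encodes", "ex:proteinB")],
    "ex:proteinB": [("ex:proteinB", "ex:label", "Protein B")],
}


def retrieve(uri):
    """Resolve a URI; unknown URIs simply yield no triples."""
    return WEB.get(uri, [])


def crawl(start, max_depth=1):
    """Retrieve a resource and its linked resources into a local cache."""
    cache = set()
    seen = set()
    frontier = [start]
    for _ in range(max_depth + 1):
        next_frontier = []
        for uri in frontier:
            if uri in seen:
                continue
            seen.add(uri)
            for s, p, o in retrieve(uri):
                cache.add((s, p, o))
                # Follow objects that resolve to further resources
                # (a real client would test whether the object is a URI).
                if o in WEB:
                    next_frontier.append(o)
        frontier = next_frontier
    return cache


def query(cache, predicate):
    """Answer a query locally, against the cache only."""
    return sorted((s, o) for s, p, o in cache if p == predicate)


cache = crawl("ex:geneA")
print(query(cache, "ex:label"))
```

The cost of this approach is visible even in the sketch: everything reachable within `max_depth` links is pulled into the cache before any query can run.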
Linked Data querying
●   Strategy 3 (Distributed query):
           –   Mix SPARQL endpoint queries with URI-based
                resolution to avoid having a large local cache
           –   Normalise results from each site to form final query
                result
           –   Users can process the results, and perform one or more
                queries based on their interpretation of the results
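The normalisation step might look roughly like the sketch below, which assumes that each site names the same entity under its own URI prefix. The provider prefixes, `PREFIX_MAP` and the sample triples are hypothetical; only the `http://bio2rdf.org/` host comes from the slides.

```python
# Hypothetical mapping from provider-specific URI prefixes to one shared prefix.
PREFIX_MAP = {
    "http://provider-a.example/go/": "http://bio2rdf.org/go:",
    "http://provider-b.example/term?id=GO:": "http://bio2rdf.org/go:",
}


def normalise_term(term):
    """Rewrite a term that starts with a known provider prefix."""
    for provider_prefix, shared_prefix in PREFIX_MAP.items():
        if term.startswith(provider_prefix):
            return shared_prefix + term[len(provider_prefix):]
    return term


def normalise_results(results_per_site):
    """Normalise each site's triples and merge them, dropping duplicates."""
    merged = set()
    for triples in results_per_site:
        for triple in triples:
            merged.add(tuple(normalise_term(t) for t in triple))
    return merged


# Two sites describe the same entity under different URIs (made-up data).
site_a = [("http://provider-a.example/go/0000345", "rdfs:label", "example label")]
site_b = [("http://provider-b.example/term?id=GO:0000345", "rdfs:label", "example label")]
merged = normalise_results([site_a, site_b])
print(merged)
```

After normalisation the two statements collapse into one, which is what lets the final result be treated as a single answer instead of one answer per site.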




Bio2RDF distributed queries
●   Assign Namespaces to providers
●   Query across relevant providers given a user's query
●   Aggregate all results into a single RDF document
    and return to the user
●   It works: 700,000 queries during the last month
●   The largest dataset, the Protein Data Bank, has
    10 billion triples; the others make up about
    5 billion triples
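The first three points above can be sketched as a namespace lookup followed by a merge. This is a minimal illustration and not the Bio2RDF implementation: the provider names, the `ex:source` predicate and the canned `fetch` functions are hypothetical.

```python
# Hypothetical providers: each serves some namespaces and returns canned triples.
PROVIDERS = [
    {"name": "provider-1", "namespaces": {"go"},
     "fetch": lambda ns, ident: [(f"{ns}:{ident}", "ex:source", "provider-1")]},
    {"name": "provider-2", "namespaces": {"go", "pdb"},
     "fetch": lambda ns, ident: [(f"{ns}:{ident}", "ex:source", "provider-2")]},
    {"name": "provider-3", "namespaces": {"pdb"},
     "fetch": lambda ns, ident: [(f"{ns}:{ident}", "ex:source", "provider-3")]},
]


def relevant_providers(namespace):
    """Select only the providers that were assigned the namespace."""
    return [p for p in PROVIDERS if namespace in p["namespaces"]]


def distributed_query(namespace, identifier):
    """Query every relevant provider and aggregate into one result document."""
    document = set()
    for provider in relevant_providers(namespace):
        document.update(provider["fetch"](namespace, identifier))
    return sorted(document)


print(distributed_query("go", "0000345"))
```

Because providers are chosen by namespace, a query for a `go` identifier never touches `provider-3`, which keeps the number of remote requests proportional to the relevant sources only.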

Workflow
●   Resolved URI: http://bio2rdf.org/label/go:0000345
●   Host name: http://bio2rdf.org/
●   Query: label/go:0000345
●   Regular expression: label/([\w-]+):(.+)
●   Matching queries:
           –   http://bio2rdf.org/query:labelsearch
           –   http://bio2rdf.org/query:labelsearchforgo
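The workflow can be traced in a few lines of Python. This sketch assumes that the character class in the regular expression was originally `\w` (the backslash appears to have been lost in the slide text), that `query:labelsearch` applies to every namespace, and that `query:labelsearchforgo` applies only to `go`. The `QUERY_DEFINITIONS` registry and the `plan` function are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Pattern from the slide, with the backslash of \w assumed.
LABEL_PATTERN = re.compile(r"label/([\w-]+):(.+)")

# Hypothetical registry: query definition URI -> namespaces it applies to.
# None means the definition is assumed to apply to every namespace.
QUERY_DEFINITIONS = [
    ("http://bio2rdf.org/query:labelsearch", None),
    ("http://bio2rdf.org/query:labelsearchforgo", {"go"}),
]


def plan(resolved_uri):
    """Split a resolved URI into host and query, then pick matching queries."""
    parts = urlsplit(resolved_uri)
    host = f"{parts.scheme}://{parts.netloc}/"
    query = parts.path.lstrip("/")
    match = LABEL_PATTERN.fullmatch(query)
    if match is None:
        return {"host": host, "query": query, "queries": []}
    namespace, identifier = match.groups()
    queries = [uri for uri, namespaces in QUERY_DEFINITIONS
               if namespaces is None or namespace in namespaces]
    return {"host": host, "query": query, "namespace": namespace,
            "identifier": identifier, "queries": queries}


result = plan("http://bio2rdf.org/label/go:0000345")
print(result["namespace"], result["identifier"])
print(result["queries"])
```

The two capture groups supply the namespace and the identifier, so a single pattern can route a request to both a generic query and a namespace-specific one.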




Collaboration
●   Query and provider definitions have an RDF
    representation
●   Any other person is able to take a definition and
    change it to suit their needs and redistribute their
    definition
●   If definitions are Linked Data themselves, as the
    Bio2RDF configuration items are, the HTTP URI can
    be used by others to pull the definition into their
    own software
Provenance for queries and data
●   Provenance can be attached to each item, including
    details such as OpenID URIs and dates
●   The sources and queries that were used for a
    particular URI can be found by utilising the query
    plan option
           –   http://bio2rdf.org/queryplan/label/go:0000345
           –   Enables automatic query collaboration, as you could
                take this query plan and add or modify queries
                inside the query plan
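Judging from the example above, the query plan for a URI appears to live at the same path prefixed with `queryplan/`. The helper below generalises from that single example, which is an assumption; the function name `query_plan_uri` is hypothetical.

```python
def query_plan_uri(resolved_uri, host="http://bio2rdf.org/"):
    """Derive the query plan URI by inserting 'queryplan/' after the host.

    Assumes every resolvable URI follows the pattern of the one example
    given on the slide.
    """
    if not resolved_uri.startswith(host):
        raise ValueError(f"{resolved_uri!r} is not served by {host!r}")
    return host + "queryplan/" + resolved_uri[len(host):]


print(query_plan_uri("http://bio2rdf.org/label/go:0000345"))
```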

Conclusion
●   Many large distributed datasources
●   Single interface, RDF
●   Distribute queries efficiently across the endpoints
●   Enable people to publish definitions of what they
    did so that others can collaborate, via a single
    server or by copy and paste



