NHM Data Portal:
first steps toward the Graph-of-Life
Vince Smith, Ben Scott & Ed Baker
Informatics & Digital Collections Group, NHM London
SPNHC, Berlin, 23 June 2016
NHM Collection
| Collection area | No. of objects | No. of type specimens | Physical register | Digital data |
|---|---:|---:|---:|---:|
| Palaeontology | 6,919,207 | 43,146 | 2,364,232 | 340,636 |
| Mineralogy | 423,563 | 615 | 425,000 | 402,727 |
| Botany | 5,863,000 | 172,750 | 127,200 | 645,222 |
| Entomology | 33,753,257 | 612,796 | 57,197 | 255,000 |
| Zoology | 27,501,350 | 325,000 | 1,986,000 | 1,160,216 |
| Library & archives | 5,460,000 | - | - | - |
| TOTAL | 79,920,377 | 1,154,307 | 4,959,629 | 2,803,801 |
<3% of NHM specimens are digitised, &
even fewer are ‘computable’
Digital Science at the NHM
• Citizen science
• Big, open, linked data
• High-throughput digitisation
• Data portal and tools
• Text mining
• Robotics
NHM Digital Collections Access, pre-2015
• Developed with the best of intentions, but…
• 23 separate interfaces
• Hard to find, cite, access and integrate
• No maps, few images, slow, no statistics, no export,
few updates, no authors, no citation mechanisms,
no GBIF connection
NHM Data Portal
• Discovery of NHM collections & research data
• Easy access & reuse to promote collaboration
(website, API, R-package, RDF & direct download)
• 3.7m records, >1m images (+sound, video & 3D)
• Integrates with our collection management system (updated weekly) & our digital asset management (DAM) system (for images)
• Traffic light data quality indicators
• Stable, citable (DataCite) identifiers on datasets &
GUIDs on records to measure impact
• Technically sustainable & scalable
• Default open licensing (CC-Zero, CC-BY, CC-BY-NC)
http://data.nhm.ac.uk
CKAN – the technical foundation for the portal
• Enterprise, open source data portal platform
• Developed by Open Knowledge Foundation
• Used by 31 national governments, 74
regional authorities, academia & large
commercial organisations
• Key features
o Publish & find datasets
o Store & manage large data
o Robust API
o Customise & extend
o Sustainable
http://ckan.org/ (e.g. http://data.gov.uk/)
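CKAN's "robust API" is its Action API: each action lives under `/api/3/action/<action>` and replies with a JSON envelope carrying `success` and `result` keys. The sketch below builds a standard CKAN DataStore `datastore_search` request against the portal address from the slides and unpacks the envelope. It is a minimal sketch under assumptions: the resource ID is a placeholder, no request is sent, and the NHM deployment's own datastore extension may accept different or additional parameters.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Portal address taken from the slides; /api/3/action/ is CKAN's Action API path.
PORTAL = "http://data.nhm.ac.uk"


def datastore_search_url(resource_id: str, query: Optional[str] = None, limit: int = 10) -> str:
    """Build (but do not send) a CKAN datastore_search request for one resource."""
    params = {"resource_id": resource_id, "limit": limit}
    if query:
        params["q"] = query  # free-text search across the resource's fields
    return f"{PORTAL}/api/3/action/datastore_search?{urlencode(params)}"


def unwrap(response_body: str) -> dict:
    """Return the 'result' part of a CKAN action response, raising if success is false."""
    envelope = json.loads(response_body)
    if not envelope.get("success"):
        raise RuntimeError(f"CKAN action failed: {envelope.get('error')}")
    return envelope["result"]
```

With network access, the body returned by `urllib.request.urlopen(url).read().decode()` could be passed straight to `unwrap`; the real `resource_id` would come from the dataset's page on the portal.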
Primary views of each NHM dataset: point map, grid map, heat map, statistical overview & filterable table
Dataset & data record citation
• DataCite DOIs on every dataset
• Stable URI (UUID) on every record
• Prior identifiers aliased &
disambiguated
• Citation encouraged with clear
statements at dataset & record level
• Allows us to track cited usage
• Dynamic DOIs on subsets coming soon
[Screenshots: dataset DOI; specimen URI]
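The two identifier layers (DataCite DOIs on datasets, UUIDs on records) can be checked and turned into resolvable links mechanically. This is a minimal sketch: the `https://doi.org/` resolver is standard, the DOI check is deliberately loose, and the record URI pattern `http://data.nhm.ac.uk/object/<uuid>` is a hypothetical assumption for illustration rather than a confirmed portal convention.

```python
import re
import uuid

# Loose DOI shape check: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Hypothetical record URI pattern, for illustration only.
RECORD_BASE = "http://data.nhm.ac.uk/object/"


def doi_link(doi: str) -> str:
    """Normalise 'doi:10...' or bare '10...' into a resolvable https://doi.org/ link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


def record_uri(guid: str) -> str:
    """Build a record URI from a UUID; uuid.UUID rejects malformed input with ValueError."""
    return RECORD_BASE + str(uuid.UUID(guid))
```

`uuid.UUID` also normalises case and hyphenation, so a GUID copied in upper case still yields one canonical URI, which matters when counting citations of the same record.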
Traffic-light data quality indicators (via GBIF)
• Records checked via the GBIF API and flagged as: major errors, minor errors or no errors
• Potential errors highlighted & “corrected”
• NB: similar services are offered by CRIA for Brazilian data
Assembly Video (doi: 10.3897/zookeys.481.8788)
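GBIF's occurrence API returns an `issues` list of flag names on each interpreted record; a traffic light is then just a classification of those flags. This is a minimal sketch: the flag names below are real GBIF issue names, but which of them count as "major" is a hypothetical split chosen for illustration, not the portal's actual rule.

```python
from typing import Iterable

# Hypothetical severity split; GBIF reports issue flags, not colours.
MAJOR = {
    "ZERO_COORDINATE",
    "COORDINATE_OUT_OF_RANGE",
    "COUNTRY_COORDINATE_MISMATCH",
    "RECORDED_DATE_INVALID",
    "TAXON_MATCH_NONE",
}


def traffic_light(issues: Iterable[str]) -> str:
    """Map a record's GBIF issue flags to 'red', 'amber' or 'green'."""
    flags = set(issues)
    if flags & MAJOR:
        return "red"    # at least one major error
    if flags:
        return "amber"  # only minor issues (any flag not listed in MAJOR)
    return "green"      # no issues reported
```

Keeping the severity table as data rather than code makes it easy for curators to move a flag between categories without touching the classifier.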
Supports deposition of other research datasets
• Easy addition of new datasets (rapid & semi-automated)
• Step-by-step instructions:
1. Name the dataset
2. Upload / link the data file
3. Describe the data file
4. Theme & tag
5. Add additional resources
6. Temporal coverage
7. Geographic coverage
8. Save & finish
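On a CKAN portal, the eight steps map naturally onto a single `package_create` payload. The sketch below assembles one, assuming core CKAN fields (`name`, `title`, `notes`, `tags`, `resources`, `extras`) and CKAN's dataset-name rule (2 to 100 lowercase ASCII letters, digits, `-` or `_`). The coverage keys `temporal_coverage` and `spatial` are hypothetical; the NHM portal's own schema may use different fields, and many CKAN instances also require `owner_org`.

```python
import re
from typing import Optional, Sequence, Tuple

# CKAN dataset names: 2-100 characters, lowercase ASCII letters, digits, '-' and '_'.
NAME_RE = re.compile(r"^[a-z0-9_-]{2,100}$")


def build_package(title: str, data_url: str, description: str, tags: Sequence[str],
                  extra_urls: Sequence[str] = (),
                  temporal: Optional[Tuple[str, str]] = None,
                  spatial: Optional[str] = None) -> dict:
    """Assemble a package_create payload following the eight deposit steps."""
    # Step 1: name the dataset (derive a URL-safe slug from the title)
    name = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")[:100]
    if not NAME_RE.match(name):
        raise ValueError(f"cannot derive a valid dataset name from {title!r}")
    # Steps 2-3: link and describe the data file; step 5: additional resources
    resources = [{"url": data_url, "description": description}]
    resources += [{"url": url} for url in extra_urls]
    # Steps 6-7: coverage stored as free-form extras (hypothetical key names)
    extras = []
    if temporal:
        extras.append({"key": "temporal_coverage", "value": f"{temporal[0]}/{temporal[1]}"})
    if spatial:
        extras.append({"key": "spatial", "value": spatial})
    # Step 4: tags; step 8 would POST this dict as JSON to /api/3/action/package_create
    return {"name": name, "title": title, "notes": description,
            "tags": [{"name": tag} for tag in tags],
            "resources": resources, "extras": extras}
```

Saving (step 8) would send the returned dict as the JSON body of an authenticated POST; building the payload separately keeps it testable without a live portal.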
Data access & feedback
• Extensive API
• R integration
• Link to data curator team
• Darwin Core Archive (DwC-A) downloads
• RDF (Linked Open Data)
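A Darwin Core Archive is a zip file whose `meta.xml` descriptor names the core data file, its delimiter and header handling, and the Darwin Core term held in each column. This is a minimal standard-library sketch of reading the core rows of a downloaded archive; it decodes only the common `\t` delimiter escape and ignores extension files and `linesTerminatedBy`.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def read_core(archive):
    """Return the core rows of a Darwin Core Archive as {term name: value} dicts."""
    with zipfile.ZipFile(archive) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", NS)
        location = core.find("dwc:files/dwc:location", NS).text.strip()
        # meta.xml writes delimiters as escapes such as "\t"; decode that common case
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        quotechar = core.get("fieldsEnclosedBy", '"')
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Column index -> short term name (last path segment of the term URI)
        columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
                   for f in core.findall("dwc:field", NS) if f.get("index") is not None}
        id_field = core.find("dwc:id", NS)
        if id_field is not None:
            columns.setdefault(int(id_field.get("index")), "id")
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))
    if quotechar:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row][skip:]
    return [{name: row[i] for i, name in columns.items() if i < len(row)} for row in rows]
```

Reading column meanings from `meta.xml` rather than from the header line is the point of the format: the same reader works whatever the publisher named or ordered the columns.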
Serving external data aggregators: GBIF, iDigBio, EOL, VertNet, CRIA
Data visualisations driven by API
500,000,000 records downloaded (since Feb. 2015, excluding major aggregators)
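Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5-star deployment scheme for Open Data:
★ available on the Web under an open licence
★★ available as structured, machine-readable data
★★★ in a non-proprietary format (e.g. CSV rather than Excel)
★★★★ uses URIs to identify things, so others can point at them
★★★★★ linked to other data to provide context
What does a 5-star Data Portal mean?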
LOD gives us the means to connect our data
(i.e. graph queries across distributed datasets)
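A graph query across distributed datasets works because separately published datasets reuse the same identifiers. This is a minimal sketch with two tiny hypothetical datasets held as (subject, predicate, object) triples; the prefixes and the `ex:identifiedAs` / `ex:parasiteOf` predicates are made up for illustration, and in practice the same pattern would be a SPARQL query over full URIs.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Two separately published, hypothetical datasets sharing taxon identifiers.
SPECIMENS: Set[Triple] = {
    ("nhm:specimen/1", "ex:identifiedAs", "taxon:Pediculus humanus"),
    ("nhm:specimen/1", "dwc:countryCode", "GB"),
}
INTERACTIONS: Set[Triple] = {
    ("taxon:Pediculus humanus", "ex:parasiteOf", "taxon:Homo sapiens"),
}


def match(graph: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list:
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)]


def hosts_of_specimens_from(graph: Set[Triple], country: str) -> Set[str]:
    """Follow specimen -> taxon -> host links across whatever datasets are in `graph`."""
    hosts = set()
    for specimen, _, _ in match(graph, p="dwc:countryCode", o=country):
        for _, _, taxon in match(graph, s=specimen, p="ex:identifiedAs"):
            for _, _, host in match(graph, s=taxon, p="ex:parasiteOf"):
                hosts.add(host)
    return hosts
```

Neither dataset alone answers "which hosts are linked to our British specimens?"; merging the two graphs (`SPECIMENS | INTERACTIONS`) does, because the taxon identifier is shared.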
Top 200 collection-holding institutions contributing specimen records to GBIF
Example 1: “what data are we publishing”
• What proportion of our collections
are accessible / digitised?
• What biases exist in our digitised
collections?
• How much taxonomic redundancy
exists in our collections?
Useful for policy setting:
- Planning digitisation strategies
(why should we all digitise the same taxa first?)
- Identifying institutional collections strengths
(outside our community these are often not known)
- What is ‘unique’ in our collections
(taxonomically, geospatially, temporally)
- Disaster planning
(how many institutions hold the same material)
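Taxonomic redundancy, "what is unique" and disaster planning all reduce to one question: how many institutions hold each taxon? This is a minimal sketch over (institution, taxon) pairs, such as the Darwin Core `institutionCode` and `scientificName` fields of an aggregated download; the sample records in the usage are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def holdings_by_taxon(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each taxon to the set of institutions holding at least one specimen of it."""
    holders: Dict[str, Set[str]] = defaultdict(set)
    for institution, taxon in records:
        holders[taxon].add(institution)
    return dict(holders)


def redundancy(holders: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of institutions holding each taxon (higher = more redundant holdings)."""
    return {taxon: len(insts) for taxon, insts in holders.items()}


def unique_to(institution: str, holders: Dict[str, Set[str]]) -> List[str]:
    """Taxa held by `institution` and nobody else: candidates for 'what is unique'."""
    return sorted(taxon for taxon, insts in holders.items() if insts == {institution})
```

For example, with records `[("NHM", "Taxon A"), ("MNHN", "Taxon A"), ("NHM", "Taxon B")]`, Taxon A is held twice and Taxon B only by the NHM. The results are only as complete as the digitised records fed in.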
What collections are held globally?
Where are these specimens from?
There are huge gaps and biases in what we know about our collections & where these collections are from
[Maps: top 200 collections (scaled by size); specimen country of origin (darker = more specimens)]
Our results are very incomplete, constrained by what we’ve digitised
[Chart: size of collection vs proportion digitised for RBGE, RBGK, NHM, MNHN, RMCA & RBINS]
Very small proportions of our collections are digitally accessible
We don’t publish the overall size of our collections in a machine-readable way
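Publishing collection size in a machine-readable way can be as simple as emitting the NHM Collection table as JSON. This is a minimal sketch using the object and digital-data counts from that table; the field names (`collectionArea`, `objects`, `digitalRecords`, `proportionDigitised`) are hypothetical, not a published schema, and Library & archives are omitted because the table gives no digital figure for them.

```python
import json
from typing import Dict, List, Tuple

# (no. of objects, digital data) per collection area, copied from the NHM Collection table.
COLLECTIONS: Dict[str, Tuple[int, int]] = {
    "Palaeontology": (6_919_207, 340_636),
    "Mineralogy": (423_563, 402_727),
    "Botany": (5_863_000, 645_222),
    "Entomology": (33_753_257, 255_000),
    "Zoology": (27_501_350, 1_160_216),
}


def collection_summary(collections: Dict[str, Tuple[int, int]]) -> List[dict]:
    """One record per collection area; field names are hypothetical."""
    return [{"collectionArea": area, "objects": objects, "digitalRecords": digital,
             "proportionDigitised": round(digital / objects, 4)}
            for area, (objects, digital) in collections.items()]


print(json.dumps(collection_summary(COLLECTIONS), indent=2))
```

Digital records need not correspond one-to-one with physical objects, so the proportions are indicative only; the contrast between Mineralogy and Entomology is nonetheless stark.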
Example 2: exploring ecological interactions
• Specimen data is one dimension of our
collections
• We need to know how organisms interact
e.g. predator-prey, pollinator-pollinated, host-parasite
• Museums have lots of this data
NHM Interactions data:
• Louse-host (12,000+)
• Helminth host-parasite (250,000+)
• Also large datasets: Coleoptera feeding on
dipterocarp seeds, butterfly host plants,
British mammal-flea associations, bee-flower
pollinators, several parasitic wasp datasets,
…
Increasingly published as RDF via NHM Data Portal
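Consolidating many interaction datasets into one publishable table is mostly column renaming plus de-duplication. This is a minimal sketch: the shared column names are modelled on GloBI's tabular exports (an assumption), the interaction type string `parasiteOf` is illustrative, and the input column names in the usage are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List

# Shared layout; column names modelled on GloBI's tabular exports (an assumption).
COLUMNS = ["sourceTaxonName", "interactionTypeName", "targetTaxonName", "referenceCitation"]


def normalise(rows: Iterable[Dict[str, str]], source_col: str, target_col: str,
              interaction: str, citation: str) -> List[Dict[str, str]]:
    """Rename one dataset's columns into the shared layout with a fixed interaction type."""
    return [{"sourceTaxonName": row[source_col].strip(),
             "interactionTypeName": interaction,
             "targetTaxonName": row[target_col].strip(),
             "referenceCitation": citation}
            for row in rows]


def combine(*datasets: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Concatenate datasets, keeping the first copy of each source/type/target triple."""
    seen = set()
    combined = []
    for dataset in datasets:
        for row in dataset:
            key = (row["sourceTaxonName"], row["interactionTypeName"], row["targetTaxonName"])
            if key not in seen:
                seen.add(key)
                combined.append(row)
    return combined


def to_tsv(rows: List[Dict[str, str]]) -> str:
    """Serialise the combined rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

De-duplicating on the source/type/target triple keeps the first citation only; a fuller version might instead merge citations so that every contributing dataset keeps its credit.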
Global Biotic Interactions (GloBI) Database
• By Jorrit Poelen & colleagues
• Collates interaction datasets
• Currently >1.9M interactions
• EOL pulls these into Species Pages
• NHM Portal creates a combined
dataset to feed GloBI
• Produces Linked Open Data
– Create beautiful visualisations
http://www.globalbioticinteractions.org/
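GloBI's collated interactions can also be queried programmatically. This sketch only builds a query URL, for example for the predatory interactions of Eurythenes gryllus; the base URL `https://api.globalbioticinteractions.org`, the `/interaction` path and the `sourceTaxon` / `interactionType` / `targetTaxon` parameter names are assumptions to verify against GloBI's own API documentation.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed GloBI web API base URL and parameter names; verify against GloBI's documentation.
GLOBI_API = "https://api.globalbioticinteractions.org"


def interaction_url(source_taxon: str, interaction_type: str = "preysOn",
                    target_taxon: Optional[str] = None) -> str:
    """Build (but do not send) a GloBI interaction query for one source taxon."""
    params = {"sourceTaxon": source_taxon, "interactionType": interaction_type}
    if target_taxon:
        params["targetTaxon"] = target_taxon
    return f"{GLOBI_API}/interaction?{urlencode(params)}"
```

`urlencode` encodes the space in a binomial as `+`, so taxon names can be passed exactly as written.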
• Predatory interactions for
Eurythenes gryllus
• Visualisations highlight
number, frequency & type
of interaction
GloBI’s Interaction Browser
https://blog.globalbioticinteractions.org/2014/03/21/exploring-antarctic-interactions-using-globis-interaction-browser/
Create beautiful visualisations with custom R scripts and existing libraries (e.g. igraph, Reol, rgdal)
https://blog.globalbioticinteractions.org/2014/06/06/a-food-web-map-of-the-world/
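The "number, frequency & type of interaction" that these visualisations highlight are counts over interaction rows, which then become weighted edges for a graph library such as igraph. The slides use R; this sketch swaps in Python for the counting step only, assuming rows with GloBI-style `sourceTaxonName`, `interactionTypeName` and `targetTaxonName` fields (an assumed layout) and hypothetical taxa.

```python
from collections import Counter
from typing import Dict, Iterable


def edge_weights(rows: Iterable[Dict[str, str]], source: str) -> Counter:
    """Count interactions per (type, target) for one source taxon: edge weights for a graph."""
    return Counter((row["interactionTypeName"], row["targetTaxonName"])
                   for row in rows if row["sourceTaxonName"] == source)


def type_frequencies(rows: Iterable[Dict[str, str]]) -> Counter:
    """How often each interaction type occurs across all rows."""
    return Counter(row["interactionTypeName"] for row in rows)
```

Each `(type, target)` key of `edge_weights` is one edge from the source taxon, and its count is the edge weight or line thickness in the plotted web.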
Conclusions
• Data portals like the NHM Portal allow us to contribute and reflect
our data through the lens of specialist aggregators
• GBIF & GloBI are specialist aggregators serving LOD
• LOD allows us to combine big datasets to address new questions
– Tracking interactions & distribution of disease vectors
– Predicting crop pests via the distribution and interactions of pests of crop
wild relatives
Next Steps
• Continue Portal development & encourage institutional adoption
• Consolidate NHM ecological interaction datasets
• Publish combined dataset on the NHM Data Portal
• GloBI to harvest the dataset and publish linked open data
• Develop visualisations for key NHM datasets
Acknowledgements
Ben Scott – Portal Engineer & Architect
Ed Baker – Data Researcher
Laurence Livermore - Project Management
Matt Woodburn – Data Architect
Vince Smith – SRO / Coordinator

 
Phasmids as Pests of Agriculture and Forestry
Phasmids as Pests of Agriculture and ForestryPhasmids as Pests of Agriculture and Forestry
Phasmids as Pests of Agriculture and Forestry
 
Phasmid Study Group: Name changes talk (Summer Meeting 2014)
Phasmid Study Group: Name changes talk (Summer Meeting 2014)Phasmid Study Group: Name changes talk (Summer Meeting 2014)
Phasmid Study Group: Name changes talk (Summer Meeting 2014)
 
NHM MSc: Automated Acoustic Identification
NHM MSc: Automated Acoustic IdentificationNHM MSc: Automated Acoustic Identification
NHM MSc: Automated Acoustic Identification
 
Measuring Impact: Towards a data citation metric
Measuring Impact: Towards a data citation metricMeasuring Impact: Towards a data citation metric
Measuring Impact: Towards a data citation metric
 
New tools for monitoring biodiversity and environments
New tools for monitoring biodiversity and environmentsNew tools for monitoring biodiversity and environments
New tools for monitoring biodiversity and environments
 
Building highways in the informatics landscape
Building highways in the informatics landscapeBuilding highways in the informatics landscape
Building highways in the informatics landscape
 
What will a digitial Natural History Museum look like in 10 years time?
What will a digitial Natural History Museum look like in 10 years time?What will a digitial Natural History Museum look like in 10 years time?
What will a digitial Natural History Museum look like in 10 years time?
 
The story of a Wikipedia page
The story of a Wikipedia pageThe story of a Wikipedia page
The story of a Wikipedia page
 
ViBRANT Citizen Science: Intro
ViBRANT Citizen Science: IntroViBRANT Citizen Science: Intro
ViBRANT Citizen Science: Intro
 
European initiatives
European initiativesEuropean initiatives
European initiatives
 
Scratchpads Training Course
Scratchpads Training CourseScratchpads Training Course
Scratchpads Training Course
 
Nature Live!: Cockroaches from the beginning (May 2012)
Nature Live!: Cockroaches from the beginning (May 2012)Nature Live!: Cockroaches from the beginning (May 2012)
Nature Live!: Cockroaches from the beginning (May 2012)
 
Scratchpads Intro: Swiss Orchid Foundation
Scratchpads Intro: Swiss Orchid FoundationScratchpads Intro: Swiss Orchid Foundation
Scratchpads Intro: Swiss Orchid Foundation
 
Swiss Orchid Foundation Scratchpads and ViBRANT overview
Swiss Orchid Foundation Scratchpads and ViBRANT overviewSwiss Orchid Foundation Scratchpads and ViBRANT overview
Swiss Orchid Foundation Scratchpads and ViBRANT overview
 
Connecting the dots: Natural Science Collections and the Web
Connecting the dots: Natural Science Collections and the WebConnecting the dots: Natural Science Collections and the Web
Connecting the dots: Natural Science Collections and the Web
 
Scratchpads past,present,future
Scratchpads past,present,futureScratchpads past,present,future
Scratchpads past,present,future
 
ViBRANT Overview
ViBRANT OverviewViBRANT Overview
ViBRANT Overview
 

Kürzlich hochgeladen

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxAleenaTreesaSaji
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 

Kürzlich hochgeladen (20)

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Luciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptxLuciferase in rDNA technology (biotechnology).pptx
Luciferase in rDNA technology (biotechnology).pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 

NHM Data Portal: first steps toward the Graph-of-Life

  • 1. NHM Data Portal: first steps toward the Graph-of-Life Vince Smith, Ben Scott & Ed Baker Informatics & Digital Collections Group, NHM London SPNHC, Berlin, 23 June 2016
  • 2. NHM Collection
      Collection area    | No. of objects | No. of type specimens | Physical register | Digital data
      Palaeontology      |      6,919,207 |                43,146 |         2,364,232 |      340,636
      Mineralogy         |        423,563 |                   615 |           425,000 |      402,727
      Botany             |      5,863,000 |               172,750 |           127,200 |      645,222
      Entomology         |     33,753,257 |               612,796 |            57,197 |      255,000
      Zoology            |     27,501,350 |               325,000 |         1,986,000 |    1,160,216
      Library & archives |      5,460,000 |                     - |                 - |            -
      TOTAL              |     79,920,377 |             1,154,307 |         4,959,629 |    2,803,801
      <3% of NHM specimens are digitised, & even fewer are ‘computable’
  • 3. Digital Science at the NHM: Citizen science • Big, open, linked data • High-throughput digitisation • Data portal and tools • Text mining • Robotics
  • 4. Digital Science at the NHM: Citizen science • Big, open, linked data • High-throughput digitisation • Data portal and tools • Text mining • Robotics
  • 5. NHM Digital Collections Access, pre-2015 • Developed with the best of intentions, but… • 23 separate interfaces • Hard to find, cite, access and integrate • No maps, few images, slow, no statistics, no export, few updates, no authors, no citation mechanisms, no GBIF connection
  • 6.
  • 7. NHM Data Portal • Discovery of NHM collections & research data • Easy access & reuse to promote collaboration (website, API, R-package, RDF & direct download) • 3.7m records, >1m images (+sound, video & 3D) • Integrates with our collection management system (weekly) & DAM system (for images) • Traffic light data quality indicators • Stable, citable (DataCite) identifiers on datasets & GUIDs on records to measure impact • Technically sustainable & scalable • Default open licensing (CC-Zero, CC-BY, CC-BY-NC) http://data.nhm.ac.uk
  • 8. CKAN – the technical foundation for the portal • Enterprise, open-source data portal platform • Developed by the Open Knowledge Foundation • Used by 31 national governments, 74 regional authorities, academia & large commercial organisations • Key features: publish & find datasets; store & manage large data; robust API; customise & extend; sustainable • http://ckan.org/ (e.g. http://data.gov.uk/)
  • 9. Primary views of each NHM dataset: Point map • Grid map • Heat map • Statistical overview • Filterable table
  • 10. Dataset & data record citation • DataCite DOIs on every dataset • Stable URI (UUID) on every record • Prior identifiers aliased & disambiguated • Citation encouraged with clear statements at dataset & record level • Allows us to track cited usage • Dynamic DOIs on subsets coming soon (figure labels: Dataset DOI; Specimen URI)
  • 11. Traffic-light data quality indicators (via the GBIF API) • Major errors • Minor errors • No errors • NB: similar services are offered by CRIA for Brazilian data
  • 12. Potential errors highlighted & “corrected”
  • 13. Assembly Video (doi: 10.3897/zookeys.481.8788) • Step-by-step instructions • Supports deposition of other research datasets
  • 14. Easy addition of new datasets (rapid & semi-automated) 1. Name the dataset 2. Upload / link the data file 3. Describe the data file 4. Theme & tag 5. Add additional resources 6. Temporal coverage 7. Geographic coverage 8. Save & finish
  • 15. Data access & feedback: Extensive API • R integration • Link to data curator team • DwCA downloads • RDF (Linked Open Data)
  • 16. Serving external data aggregators: GBIF • iDigBio • EOL • VertNet • CRIA
  • 17. Data visualisations driven by the API: DEMO • DEMO • DEMO
  • 18. 500,000,000 records downloaded (since Feb. 2015, excluding major aggregators)
  • 19. Data access & feedback: Extensive API • R integration • Link to data curator team • DwCA downloads • RDF (& Linked Open Data)
  • 20. Tim Berners-Lee, the inventor of the Web and Linked Data initiator, suggested a 5-star deployment scheme for Open Data… What does a 5-star Data Portal mean?
  • 21. LOD gives us the means to connect our data (i.e. graph queries across distributed datasets)
  • 22. Example 1: “What data are we publishing?” (top 200 collection-holding institutions contributing specimen records to GBIF) • What proportion of our collections is accessible / digitised? • What biases exist in our digitised collections? • How much taxonomic redundancy exists in our collections? Useful for policy setting: - Planning digitisation strategies (why should we all be digitising the same taxa first?) - Identifying institutional collection strengths (outside our community these are often not known) - What is ‘unique’ in our collections (taxonomically, geospatially, temporally) - Disaster planning (how many institutions hold the same material?)
  • 23.
  • 24. What collections are held globally? Where are these specimens from? There are huge gaps and biases in what our collections hold & where these specimens come from. Map: top 200 collections (scaled by size); specimen country of origin (darker = more specimens)
  • 25. Our results are very incomplete, constrained by what we’ve digitised. Chart: size of collection vs. proportion digitised (RBGE, RBGK, NHM, MNHN, RMCA, RBINS) • Very small proportions of our collections are digitally accessible • We don’t publish the overall size of our collections in a machine-readable way
  • 26. Example 2: exploring ecological interactions • Specimen data is one dimension of our collections • We need to know how organisms interact, e.g. predator-prey, pollinator-pollinated, host-parasite • Museums have lots of this data. NHM interactions data: • Louse-host (12,000+) • Helminth host-parasite (250,000+) • Also large datasets: Coleoptera feeding on dipterocarp seeds, butterfly host-plants, British mammal-flea associations, bee-flower pollinators, several parasitic wasp datasets, … Increasingly published as RDF via the NHM Data Portal
  • 27. Global Biotic Interactions (GloBI) Database • By Jorrit Poelen & colleagues • Collates interaction datasets • Currently >1.9M interactions • EOL pulls these into Species Pages • NHM Portal creates a combined dataset to feed GloBI • Produces Linked Open Data – Create beautiful visualisations http://www.globalbioticinteractions.org/
  • 28. GloBI’s Interaction Browser • Predatory interactions for Eurythenes gryllus • Visualisations highlight number, frequency & type of interaction https://blog.globalbioticinteractions.org/2014/03/21/exploring-antarctic-interactions-using-globis-interaction-browser/
  • 29. Create beautiful visualisations with custom R scripts and existing libraries (e.g., igraph, Reol, rgdal) https://blog.globalbioticinteractions.org/2014/06/06/a-food-web-map-of-the-world/
  • 30. Conclusions • Data portals like the NHM Portal allow us to contribute and reflect our data through the lens of specialist aggregators • GBIF & GloBI are specialist aggregators serving LOD • LOD allows us to combine big datasets to address new questions – Tracking interactions & distribution of disease vectors – Predicting crop pests, via the distribution and interactions of pests of crop wild relatives Next Steps • Continue Portal development & encourage institutional adoption • Consolidate NHM ecological interaction datasets • Publish combined dataset on the NHM Data Portal • GloBI to harvest the dataset and publish linked open data • Develop visualisations for key NHM datasets
  • 31. Acknowledgements Ben Scott – Portal Engineer & Architect Ed Baker – Data Researcher Laurence Livermore - Project Management Matt Woodburn – Data Architect Vince Smith – SRO / Coordinator
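The portal is built on CKAN (slide 8), so its “Extensive API” (slides 7 and 15) can be approached through CKAN's Action API. The sketch below builds a query URL for CKAN's standard `datastore_search` action and unpacks the JSON envelope CKAN returns (`success`, `result`). It assumes the portal accepts the stock CKAN parameters (`resource_id`, `q`, `filters`, `limit`, `offset`) under `/api/3/action`; the NHM deployment extends CKAN's datastore, so check the portal's own API documentation before relying on this. `RESOURCE-ID` in the usage is a placeholder, not a real dataset.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: CKAN exposes its Action API under /api/3/action.
API_BASE = "https://data.nhm.ac.uk/api/3/action"


def build_datastore_search_url(resource_id, q=None, filters=None, limit=100, offset=0):
    """Build a GET URL for CKAN's datastore_search action.

    resource_id: ID of a datastore resource (placeholder values in examples).
    q: optional full-text search string.
    filters: optional dict of field -> value; CKAN expects it JSON-encoded.
    """
    params = {"resource_id": resource_id, "limit": limit, "offset": offset}
    if q:
        params["q"] = q
    if filters:
        params["filters"] = json.dumps(filters)
    return f"{API_BASE}/datastore_search?{urlencode(params)}"


def extract_records(body):
    """Unpack a CKAN Action API response body into (records, total)."""
    payload = json.loads(body)
    if not payload.get("success"):
        # CKAN signals failure with success=false plus an "error" object.
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    result = payload["result"]
    return result.get("records", []), result.get("total")


def page_offsets(total, limit):
    """Offsets needed to page through `total` records, `limit` at a time."""
    return range(0, total, limit)


def fetch_records(url, timeout=30):
    """Fetch and unpack one page over HTTP (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return extract_records(response.read().decode("utf-8"))
```

A caller might fetch the first page, read `total`, then request the remaining pages using the offsets from `page_offsets`. For complete datasets, the DwCA downloads listed on slide 15 may be a better fit than paging through the API.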

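Slide 21's point, that Linked Open Data lets us run graph queries across distributed datasets, can be illustrated without any RDF tooling. The sketch below merges two hypothetical triple sets, specimen records from one publisher and host-parasite interactions from another, into a single in-memory store, then answers a question neither dataset answers alone: which specimen records belong to hosts of a given parasite? All URIs and the predicates `ex:hasTaxon` and `ex:parasiteOf` are invented for illustration; real LOD would use shared vocabularies (such as Darwin Core terms or the relation terms GloBI uses), and a production setup would use an RDF store queried with SPARQL.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "s p o")


class TripleStore:
    """A minimal in-memory triple store (a set of subject-predicate-object triples)."""

    def __init__(self):
        self.triples = set()

    def add(self, triples):
        """Merge triples from any source; duplicates collapse because this is a set."""
        self.triples.update(Triple(*t) for t in triples)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching a pattern, where None acts as a wildcard."""
        for t in self.triples:
            if (s is None or t.s == s) and (p is None or t.p == p) and (o is None or t.o == o):
                yield t


def specimens_of_hosts(store, parasite):
    """Join across datasets: parasite -> hosts (interactions) -> specimens (collections)."""
    hosts = {t.o for t in store.match(s=parasite, p="ex:parasiteOf")}
    return {t.s for host in hosts for t in store.match(p="ex:hasTaxon", o=host)}


# Hypothetical data from two independent publishers that share taxon URIs.
specimen_triples = [
    ("ex:specimen/1", "ex:hasTaxon", "ex:taxon/host-a"),
    ("ex:specimen/2", "ex:hasTaxon", "ex:taxon/host-b"),
    ("ex:specimen/3", "ex:hasTaxon", "ex:taxon/other"),
]
interaction_triples = [
    ("ex:taxon/louse-x", "ex:parasiteOf", "ex:taxon/host-a"),
    ("ex:taxon/louse-x", "ex:parasiteOf", "ex:taxon/host-b"),
]

store = TripleStore()
store.add(specimen_triples)
store.add(interaction_triples)
```

The join works only because both datasets use the same URIs for the host taxa. If each publisher used its own local names, the records would never connect, which is the practical case for the fifth star (link your data to other data) in the scheme on slide 20.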
Speaker notes

  1. Age of Enlightenment: linking historical specimens & early scientific literature (cultural) • Crop wild relatives: ranges and ecology of crop pest relatives (metadata) • Informatics: digitisation workflows, data access & tools (digital) • Environmental change: phenology of butterflies (pinned insects) • Macroscience from micro-collections: vectors, ontology & minerals (slides) • Open herbarium: global plant diversity (herbarium sheets)
  4. Hard to track use. A few are beginning to cite in papers, but rates are low.
  5. So what does linked data mean for us, and what are the benefits? As a consumer, you can do everything you can do with ★★★★ Web data and, additionally: ✔ You can discover more (related) data while consuming the data. ✔ You can directly learn about the data schema. ⚠ You now have to deal with broken data links, just like 404 errors in web pages. ⚠ Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages. Caution, trust and common sense are all still necessary. As a publisher: ✔ You make your data discoverable. ✔ You increase the value of your data. ✔ Your own organisation will gain the same benefits from the links as the consumers. ⚠ You’ll need to invest resources to link your data to other data on the Web. ⚠ You may need to repair broken or incorrect links.
  6. A GIANT GRAPH
  7. Back in April of this year, the National Museum of Natural History in New Delhi was destroyed by fire. Large collections of mammals and birds were lost, but in truth, as a community, it is hard to assess the real impact of the loss because we don’t have a global perspective on what is in our collections. This information is only held locally.
  8. NARRATIVE: Bias in data publishers and collections. Specimen data only, for the top 200 publishing institutions (out of XXX specimen data publishers), representing a total of XXX. Dots: institutions publishing specimen data to GBIF, scaled by size. Background: countries specimens come from (darker = more).
  9. NARRATIVE: Even the data available is very incomplete. E.g. NHM London (outer London dot) and Kew (inner London dot) combined. (Other dot is RBGE). In general not much! Circle = scaled by stated collection size. Black: proportion exposed via GBIF.
  10. Developed by Jorrit Poelen (freelance software engineer)
  11. Allows us to generate visualisations that show major interaction patterns across all interactions. Here is an example: green = plants; pink = parasitic fungi. Potential uses: guide conservation. Should ecologically unique interactions be identified and prioritized for conservation?
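Slide 22 and note 7 both ask how many institutions hold the same material, which matters for disaster planning. Assuming each institution's holdings can be reduced to a set of taxon identifiers (for example, derived from its published specimen records), the sketch below counts how many holders each taxon has and lists the taxa held by only one institution, i.e. material with no copy elsewhere if that collection is lost. Institution and taxon names are hypothetical placeholders. Figures derived from GBIF would inherit the digitisation gaps shown on slide 25: a taxon that appears unique may simply be undigitised elsewhere.

```python
from collections import defaultdict


def holders_by_taxon(holdings):
    """Invert {institution: set of taxa} into {taxon: set of institutions}."""
    index = defaultdict(set)
    for institution, taxa in holdings.items():
        for taxon in taxa:
            index[taxon].add(institution)
    return dict(index)


def redundancy(holdings):
    """Number of institutions holding each taxon (1 means no copy elsewhere)."""
    return {taxon: len(insts) for taxon, insts in holders_by_taxon(holdings).items()}


def unique_holdings(holdings):
    """Taxa held by exactly one institution, grouped by that institution.

    Institutions with no unique taxa are omitted from the result.
    """
    unique = defaultdict(set)
    for taxon, insts in holders_by_taxon(holdings).items():
        if len(insts) == 1:
            (only_holder,) = insts
            unique[only_holder].add(taxon)
    return dict(unique)


# Hypothetical holdings; real input might come from published specimen records.
holdings = {
    "Institution A": {"taxon-1", "taxon-2"},
    "Institution B": {"taxon-2", "taxon-3"},
    "Institution C": {"taxon-2"},
}
```

Here `redundancy(holdings)` reports that `taxon-2` has three holders while `taxon-1` and `taxon-3` each have one, and `unique_holdings(holdings)` attributes those single-holder taxa to Institution A and Institution B respectively.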