8. Anatolia Zooarchaeology Case Study
led by Alexandria Archive Institute
Research goals and outcomes:
– Improve archaeological data
collection / documentation practices
– Better understanding of gaps (spatial
and temporal)
– Integrated biometrics show complex
patterns (introduction of domestic
and continued use of wild animals
by region)
– Aligning data to EOL taxon identifiers
helps draw out patterns in relative
proportion of taxa over time and
space across many assemblages
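A minimal sketch of the idea in the last bullet, assuming specimen records from different sites use different local labels for the same animal. The records, the label mapping, and the `taxon:` identifiers are all hypothetical placeholders (not real EOL identifiers or Open Context data); the point is only that once labels resolve to a shared identifier, relative proportions can be compared across assemblages.

```python
from collections import Counter, defaultdict

# Hypothetical specimen records: (site, period, local taxon label).
SPECIMENS = [
    ("Site A", "Period 1", "sheep"),
    ("Site A", "Period 1", "Ovis aries"),
    ("Site A", "Period 1", "wild boar"),
    ("Site B", "Period 2", "Ovis aries"),
    ("Site B", "Period 2", "cattle"),
]

# Hypothetical mapping from local labels to shared taxon identifiers.
# These identifiers are placeholders, not real EOL taxon concept IDs.
LABEL_TO_TAXON = {
    "sheep": "taxon:ovis-aries",
    "Ovis aries": "taxon:ovis-aries",
    "wild boar": "taxon:sus-scrofa",
    "cattle": "taxon:bos-taurus",
}


def relative_proportions(specimens, label_to_taxon):
    """Return {(site, period): {taxon_id: share of specimens}}."""
    counts = defaultdict(Counter)
    for site, period, label in specimens:
        taxon = label_to_taxon.get(label)
        if taxon is None:
            continue  # labels without a mapping are left out of the totals
        counts[(site, period)][taxon] += 1

    proportions = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        proportions[key] = {t: n / total for t, n in counter.items()}
    return proportions


PROPORTIONS = relative_proportions(SPECIMENS, LABEL_TO_TAXON)
```

Without the mapping, "sheep" and "Ovis aries" would be counted as two different taxa at Site A; with it, they contribute to a single share.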
9. EOL Computable Data Challenge
1. 14 different sites
2. 34+ zooarchaeologists
3. Decoding, cleanup, metadata documentation
4. 220,000+ specimens
5. 450 entities linked to 143 EOL taxon concepts
6. Anatomical entities linked to Uberon.org
7. Biometrics linked to measurement ontology
8. Collaborative analysis
http://opencontext.org/
10. [Bar chart: number of text objects (x-axis, 0 to 800,000) by subject of text object (y-axis). Subjects shown: Distribution, MolecularBiology, Multiple topics, TypeInformation, Habitat, ConservationStatus, Threats, Morphology, Conservation, Management, Trends, Size, Associations, Uses, TrophicStrategy, Cyclicity & Life Cycle, PopulationBiology, Reproduction, Migration, Taxonomy, LifeExpectancy, Identification, Behaviour, Ecology, Diseases]
11. Promote NLP text mining,
crowdsourcing, standardizing
• Species Interaction Datasets—Integration,
Visualization, and Analysis (Poelen and Mungall)
• Discovering EnvO habitat terms in EOL contents
(Pafilis)
• Altitude Specificity of Flower Coloration (Wright)
• Crowd-sourced data to examine morphological
impacts of extinction risk in ray-finned fishes
(Chang)
• Macroecological patterns in butterfly-hostplant
associations (Ferrer-Parris)
12. EOL GloBI
Global Biotic Interactions
Challenge: Species interaction datasets are mostly
buried in flat files & custom formats.
Plan: Build infrastructure for normalizing and aggregating
species interaction datasets and make them accessible
through flat files (Darwin Core Archive), web services,
and semantic web endpoints (SPARQL).
Eventually: Publish biotic interaction ontology re-using
existing ontologies, re-integrate with EOL
Enable semantic interoperability to allow for cross-functional
analysis (e.g. How does a parasite regulate gene
expression of its host?)
Poelen, Mungall, Simons, Reiz
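As a rough illustration of the "normalizing and aggregating" step in the plan above, the sketch below reads two differently shaped inputs into one common record type. It is not GloBI's implementation: the input formats, the `Interaction` record, the `parasiteOf` label, and the taxon names are hypothetical placeholders (only `preysOn` appears elsewhere in these slides).

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple


class Interaction(NamedTuple):
    source_taxon: str
    interaction_type: str
    target_taxon: str
    dataset: str


# Hypothetical flat file with predator/prey columns.
CSV_DATA = "predator,prey\nGenus alpha,Genus beta\n"

# Hypothetical custom format: semicolon-separated key=value pairs per line.
KV_DATA = "host=Genus beta;parasite=Genus gamma\n"


def from_csv(text: str, dataset: str) -> Iterator[Interaction]:
    """Normalize predator/prey rows into Interaction records."""
    for row in csv.DictReader(io.StringIO(text)):
        yield Interaction(row["predator"].strip(), "preysOn",
                          row["prey"].strip(), dataset)


def from_key_value(text: str, dataset: str) -> Iterator[Interaction]:
    """Normalize host/parasite key=value lines into Interaction records."""
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = dict(part.split("=", 1) for part in line.split(";"))
        yield Interaction(fields["parasite"].strip(), "parasiteOf",
                          fields["host"].strip(), dataset)


def aggregate(*streams: Iterable[Interaction]) -> List[Interaction]:
    """Merge normalized streams, dropping exact duplicates, keeping order."""
    seen = set()
    merged = []
    for stream in streams:
        for record in stream:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


RECORDS = aggregate(from_csv(CSV_DATA, "dataset-1"),
                    from_key_value(KV_DATA, "dataset-2"))
```

Each record keeps the name of the dataset it came from, so the aggregated list can still be traced back to its sources.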
14. Easy access to analyzable trait data
“Are blue organisms more common in high altitudes?”
“How can I predict vulnerability to climate change based
on life history characteristics?”
“What organisms should I collect to fill in gaps in genome
quality tissue collections?”
• Look for data type, download for all taxa
• Create a collection of taxa, download all data
• Use Reol: an R interface to EOL (Banbury, O'Meara)
http://barbbanbury.info/barbbanbury/Reol.html
• Find more specialized data repositories
15. Adding traits to EOL
Funded: Marine focus
<scientific name> <hasAvgBodyMass in g> <value>
<scientific name> <preysOn> <scientific name>
Harvest and display on data tab
Add high-level semantics from coarse SPM ontology
Downloads, fancy searching
Machine access
19. Thanks
Funding & other contributions
Sloan Foundation
Smithsonian Institution
David Rubenstein
Marine Biological Laboratory
Harvard University
Our content partners
Thousands of individual
contributors, and hundreds of
volunteer curators
Image credits
Jenny from Taipei
University of Birmingham
Cynthia Parr
Chief Scientist @eol
@cydparr parrc@si.edu
GloBI: Jorrit Poelen (lead/software), Chris Mungall
(ontologies), James Simons (biologist) and Robert
Reiz (software). Datasets shared by: Peter D.
Roopnarine, Rachel Hertog, Carlos García-Robledo,
James Simons, Jenny L. Wrast, C. Barnes,
International Council for the Exploration of the Sea
(ICES), Jose R. Ferrer Paris, Senol Akin, Malcolm
Storey (BioInfo.org.uk), Ivy E. Baremore, Joel Sachs
(SPIRE), Colt W. Cook, David A. Blewett
Alexandria Archive: Sarah Kansa, Eric
Kansa, 34 other zooarchaeologists
BioNames: Rod Page, Ryan Schenk
MOOC: Katy Börner, Twy Bethard,
Andrew Miles, Mattia Della Libera
Editor's notes
This is a very different kind of talk because I am not focusing on metagenomics or microbes per se. I would like to introduce what we are doing at the Encyclopedia of Life in the hope that we can soon bridge the gap between those studies and studies of macroorganism diversity.
We have a working infrastructure as well as more than 200 partners. We harvest text and multimedia, sort it by topic and by species, and put it on our pages. Curation and user-added content from the crowd are added to the mix. This is fed back to providers, giving them traffic, quality control on their own content, and new content for them to use. And we are already seeing spinoff products. We make it easy for developers, and everything is either public domain or CC-licensed so it can be reused.
As this is a meeting about standards, I thought I would mention some of the standards we are using.
We now have over a million pages with content, some of it in other languages such as Arabic, Spanish, and Chinese. We are getting traffic mostly from the general public, from all over the world.
There are strong links between taxa represented in NCBI’s databases and others. Each dot here represents a project with a database holding some sort of biological data. Chiefly, the links between these databases are based on taxonomic names, so EOL has mapped every name and its identifiers in each of these hubs to bring the data together.
One of the benefits is that we can support third-party projects where linking and visualizing via names is critical. This is BioNames, a project by Rod Page and Ryan Schenk. They have visualized the taxonomic concepts in this family of bats; where there are no images, there are obvious gaps in the Encyclopedia of Life. There is a timeline showing when species were described, a sample classification and distribution map, and links to some of the foundational literature. Essentially, they are reorganizing EOL data to suit their own use cases for taxonomists, and bringing in additional data not yet available on EOL.
Here is what archaeologists are doing with EOL
The removal of objects is now forbidden in most countries and at many sites in the US. As a result, data collection has changed from describing a physical object that remains accessible in the US to creating a full surrogate for an object that might be re-buried in the ground. Data collection has increased as the collection of objects has decreased. Still, individual systems of data collection (see examples on the right) have emerged which:
– developed over time
– are handed down from mentors
– contain some technological adoption, particularly the adoption of Excel spreadsheets over relational databases
In all of our interviews there was no reference to existing guides on archaeological documentation, such as those of the Archaeology Data Service (UK) or DANS (Netherlands).
Most of our 5.4 million content objects are text blobs and here are the subjects of that text. Most often, our text objects are about distribution. But there are many other subjects involved including essays that include multiple subjects.
In the Information Visualization MOOC (Massive Open Online Course) led by Dr. Katy Börner of Indiana University, students Twy Bethard (United States), Andrew Miles (United Kingdom), Edward Kok (Netherlands) and Mattia Della Libera (Italy) used GloBI data to create an insightful visualization of spatial marine food webs in the Gulf of Mexico.
In the next year and a half we are tackling these challenges with funding from the Sloan Foundation. We are starting with marine data. In the most simplistic view, we’ll be storing triples, each part of which can be linked to a definition so that the meaning is clearly defined. There might be five different ways to define an attribute like “body length”, and we should be able to handle them all without losing the distinction. Of course, we’ll also make sure each triple links back to a dataset and all the appropriate credits. This data will be organized on a data tab, perhaps sorted into the 35 or so “topics” that we currently have text chapters for, like size or reproduction, and we will also allow powerful downloading and searching capability. Finally, we’ll be setting up ways for other applications to grab the data and do interesting things with it. This semantic web technology isn’t new, but the way we’ll be using it with EOL is new.
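A minimal sketch of what such a triple might look like, assuming each part carries a link to its definition and each triple carries its dataset and credit. The class names, the `example.org` URIs, and the values are hypothetical placeholders, not EOL's actual data model; the two "body length" terms stand in for the idea that differently defined attributes stay distinct.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Term:
    label: str
    definition_uri: str  # where the meaning of this term is defined


@dataclass(frozen=True)
class TraitTriple:
    subject: Term               # the taxon
    predicate: Term             # the attribute, linked to its definition
    value: Union[Term, float]   # a measurement or another defined term
    dataset: str                # provenance: which dataset it came from
    credit: str                 # provenance: who gets credit


# Hypothetical terms; the URIs are placeholders.
TAXON = Term("Example species", "http://example.org/taxon/1")
TOTAL_LENGTH = Term("body length (total)",
                    "http://example.org/terms/totalLength")
STANDARD_LENGTH = Term("body length (standard)",
                       "http://example.org/terms/standardLength")

TRIPLES = [
    TraitTriple(TAXON, TOTAL_LENGTH, 31.0, "dataset-1", "Contributor A"),
    TraitTriple(TAXON, STANDARD_LENGTH, 26.5, "dataset-2", "Contributor B"),
]


def by_definition(triples: List[TraitTriple],
                  definition_uri: str) -> List[TraitTriple]:
    """Select triples whose attribute uses exactly this definition."""
    return [t for t in triples if t.predicate.definition_uri == definition_uri]


TOTAL_ONLY = by_definition(TRIPLES, TOTAL_LENGTH.definition_uri)
```

Because selection is by definition URI rather than by label, the two kinds of "body length" are never merged by accident, and each result still names its dataset and credit.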
We are serving building blocks, but not quite like LEGO, because we are not a single source that mass-produces everything.
It is more like Amazon Marketplace, because we are an infrastructure that providers (i.e. merchants) can plug into to share their data with others.
We are in the midst of a genomics revolution. The cost to generate a full genome sequence is dropping more or less daily. What is all this genetic information DOING? How does it relate to what we can see and measure about organisms, their phenotypes, or their traits? How do these genes interact with the environment to result in both normal and abnormal development of traits, not just for lab-dwelling species like rats, but across the tree of life? How do evolutionary changes in DNA make a difference in the lives of organisms? TraitBank, which is not yet funded, would enable us to scale up and manage all kinds of trait data about all organisms.