1. Simon Jupp
Samples, Phenotypes and Ontologies
EMBL-EBI
Semantic services for data
interoperability
ELIXIR All Hands meeting
Interoperability workshop
March 2017
2. Ontology services as building blocks for
FAIR
• You need standards (ontologies and controlled
vocabularies) to make data interpretable
• Interpretable data is more readily interoperable
• We can use interoperable data to build integrated
systems that make the data more findable by users
• The data become reusable when we use common
standards
• But,
• There are a lot of standards
• Doing this at scale for different domains is hard
3. Improving Findability by greater Interoperability
• Smarter searching
• Data analysis
• Data integration
• Data visualisation
4. BioSamples case study
• description of material of biological
interest
• may be linked to assay data
• sequencing, microarray, proteomics
• also imaging, etc.
• We’ve been making this data
FAIR for many years
5. The challenge - thousands of data
attributes…
• BioSamples is an example of real world experimental metadata
• We see all the variability – warts and all
• Good playground for building tooling to clean up and add value to this
data
• If we can build tooling that works for BioSamples, it’ll work anywhere!
6. What are the disease attributes?
diseaseState
hostDisease
clinicallyAffectedStatus
diagnosis
Infection
diseaseStatus
healthState
disease
clinicalInformation
hostHealthState
affectedBy
causeOfDeath
NOT:
diseaseStage: info about the stage of a disease, e.g. "48 hai", "stage", "terminal"
diseasestage
tumorStatus: "non-tumor", 120, "Tumor", 100, "CSL +/+ Xenograft Tumor 1st",
healthStatus: "normal", "Allergic", "stressed", "NA(Not immunized)"
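A minimal sketch of how the attribute names listed above could be folded into one key during cleanup. The function name, the canonical key `disease`, and the case-insensitive matching are assumptions for illustration; this is not BioSamples code.

```python
# Attribute names from the slide that all describe a disease.
DISEASE_ATTRIBUTES = {
    "diseaseState", "hostDisease", "clinicallyAffectedStatus", "diagnosis",
    "Infection", "diseaseStatus", "healthState", "disease",
    "clinicalInformation", "hostHealthState", "affectedBy", "causeOfDeath",
}
# Look-alikes from the slide that do NOT name a disease.
NOT_DISEASE = {"diseaseStage", "diseasestage", "tumorStatus", "healthStatus"}

_DISEASE_LOWER = {name.lower() for name in DISEASE_ATTRIBUTES}
_NOT_LOWER = {name.lower() for name in NOT_DISEASE}


def canonical_attribute(name: str) -> str:
    """Return 'disease' for known disease attributes, else the name unchanged."""
    key = name.strip().lower()
    if key in _NOT_LOWER:
        return name  # explicitly excluded look-alike
    if key in _DISEASE_LOWER:
        return "disease"  # hypothetical canonical key
    return name


print(canonical_attribute("hostDisease"))   # folded into the canonical key
print(canonical_attribute("diseaseStage"))  # left alone: a stage, not a disease
```

Checking the exclusion list first keeps near-identical names such as `healthState` and `healthStatus` apart.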
9. Ontology challenges
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?
10. SPOT team - Adding value with ontologies
[Diagram: data exploration and cleanup; data structuring; ontology annotation; data cleaning and mapping; ontology building; FAIRified data]
11. Data Enrichment Services
• Building an interoperability
toolkit for Europe (ELIXIR)
• Integrated (linked) APIs
• Plumbing for data curation
systems and workflows
• Lowering the barrier of entry to
ontologies for data stewards
New ontology lookup service!
13. Ontology Lookup Service
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• Powerful RESTful API
Repository of over 160 pre-selected biomedical ontologies (4.5 million terms)
http://www.ebi.ac.uk/ols
• Provides a unified mechanism to access
multiple ontologies
• Large community of users, tens of millions of
hits per month
• Open source and dockerised
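As a rough illustration of using the RESTful API, the sketch below builds a search URL and reads terms out of a response. The `/api/search` path, the `q` and `ontology` parameters, and the `response.docs` layout with `label`, `obo_id` and `iri` fields are assumptions about the OLS API and should be checked against its documentation; the sample payload is hand-written, not a real response.

```python
from urllib.parse import urlencode

OLS_BASE = "http://www.ebi.ac.uk/ols"  # URL from the slide


def build_search_url(query, ontology=None):
    """Build a search URL; the '/api/search' path and parameter names are assumed."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology
    return f"{OLS_BASE}/api/search?{urlencode(params)}"


def extract_terms(payload):
    """Pull (label, obo_id, iri) tuples out of a response (assumed layout)."""
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("label"), d.get("obo_id"), d.get("iri")) for d in docs]


# Hand-written sample payload for illustration only.
sample = {
    "response": {
        "docs": [
            {
                "label": "heart",
                "obo_id": "UBERON:0000948",
                "iri": "http://purl.obolibrary.org/obo/UBERON_0000948",
            }
        ]
    }
}

print(build_search_url("heart", "uberon"))
print(extract_terms(sample))
```

In real use the URL would be fetched (for example with `urllib.request.urlopen`) and the body decoded with `json.load` before being passed to `extract_terms`.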
14. Zooma
• Optimal mappings based on data we have seen previously
• Favours precision over recall
• Captures annotations plus context; context is very important
• Currently contains over 92,000 annotations from 7 resources
• ClinVar, Cellular Phenotype Database, ExpressionAtlas, UniProt, GWAS, EBiSC, OpenTargets
• Used to improve and share their mappings across resources
Repository of curated ontology mappings
http://www.ebi.ac.uk/spot/zooma
A Zooma mapping: “Heart” → UBERON:0000948, plus context (where, when, why?)
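The idea of a mapping plus context, with precision favoured over recall, can be sketched as follows. This is not Zooma's actual algorithm: the `Mapping` class, the `lookup` function, the source names and the `EXAMPLE:` identifiers are all hypothetical. The lookup answers only when every previously seen annotation for the same property type and value agrees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    property_type: str   # e.g. "organism part"
    property_value: str  # free text as seen in the data, e.g. "Heart"
    term: str            # ontology term, e.g. "UBERON:0000948"
    source: str          # context: which resource asserted the mapping


def lookup(mappings, property_type, property_value) -> Optional[str]:
    """Return a term only when all previously seen annotations agree."""
    key = (property_type.strip().lower(), property_value.strip().lower())
    terms = {
        m.term
        for m in mappings
        if (m.property_type.lower(), m.property_value.lower()) == key
    }
    # Precision over recall: no answer is better than an ambiguous one.
    return terms.pop() if len(terms) == 1 else None


# Illustrative records; sources and EXAMPLE identifiers are made up.
seen = [
    Mapping("organism part", "Heart", "UBERON:0000948", "resource A"),
    Mapping("organism part", "heart", "UBERON:0000948", "resource B"),
    Mapping("cell type", "T cell", "EXAMPLE:0000001", "resource A"),
    Mapping("cell type", "t cell", "EXAMPLE:0000002", "resource B"),
]

print(lookup(seen, "organism part", "HEART"))  # sources agree
print(lookup(seen, "cell type", "T cell"))     # sources disagree, so no answer
```

Keeping the property type in the key is what lets context disambiguate the same free-text value used in different fields.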
15. New for 2017 – Ontology Cross Mapping
• Cross-references are a powerful tool for integrating data
• A lot of curator effort goes into building ontology cross-references
• Currently hard to find and explore the ontology mapping space
[Diagram: Datasource 1 and Datasource 2, linked by mappings between Human Phenotype Ontology and SNOMED-CT]
16. Ontology Mapping Service (OxO)
• UI and API to expose known mappings from OBO, UMLS and
manually curated mapping sets (e.g. GWAS, OpenTargets)
• Normalised CURIE prefixes using identifiers.org
• e.g. SNOMED-CT: / SNOMEDCT: / SNOMED: / SNOMEDCT_
• Provides a “silver standard” to support predictive mapping algorithms
http://www.ebi.ac.uk/spot/oxo *
* Going live March 2017
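Prefix normalisation of the kind described above can be sketched with a small alias table. The slide says the service normalises against identifiers.org; this sketch uses a local dictionary instead, and the canonical spelling `SNOMEDCT`, the function name and the identifier `12345` are assumptions for illustration.

```python
import re

# Prefix variants from the slide, mapped to one canonical prefix.
# The canonical spelling "SNOMEDCT" is an assumption for illustration.
PREFIX_ALIASES = {
    "SNOMED-CT": "SNOMEDCT",
    "SNOMEDCT": "SNOMEDCT",
    "SNOMED": "SNOMEDCT",
}

# A prefix, then ':' or '_', then an alphanumeric local identifier.
_CURIE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*)[:_]([A-Za-z0-9]+)$")


def normalise_curie(identifier: str) -> str:
    """Rewrite 'PREFIX:ID' or 'PREFIX_ID' to 'CANONICAL:ID'."""
    match = _CURIE.match(identifier.strip())
    if not match:
        raise ValueError(f"not a recognisable CURIE: {identifier!r}")
    prefix, local_id = match.groups()
    # Unknown prefixes pass through unchanged apart from the separator.
    canonical = PREFIX_ALIASES.get(prefix.upper(), prefix)
    return f"{canonical}:{local_id}"


for variant in ("SNOMED-CT:12345", "SNOMEDCT:12345", "SNOMED:12345", "SNOMEDCT_12345"):
    print(variant, "->", normalise_curie(variant))
```

Once identifiers share one prefix, mappings from different sources can be compared and joined with plain string equality.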
17. Common questions
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?
18. Data annotation workflow
[Flowchart with Yes/No branches]
• Start: Data
• Decisions: Is the data annotated to ontologies? Is there unmapped data? Can you find terms in OLS? Is it the ontology you want?
• Lookups: Search Zooma, Search OLS, Search OxO
• New terms: Create a new term (Webulous, OBO foundry)
• Add mappings back to Zooma
• Outcomes: Get the application ontology from OLS; Building a search index with BioSolr; Publishing structured data as RDF
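One plausible reading of the annotation workflow, written as a function. The order of the steps, the function name, the route labels and the `EXAMPLE:`/`OTHER:` identifiers are assumptions; the three lookups are passed in as plain callables standing in for the real services, so the sketch runs without network access.

```python
def annotate(value, search_zooma, search_ols, search_oxo=None, wanted_prefix=None):
    """Return (term, route) for one free-text value.

    Each search_* argument is a caller-supplied function that returns a
    term identifier or None.
    """
    term = search_zooma(value)         # curated mappings first
    if term is not None:
        return term, "zooma"
    term = search_ols(value)           # then a plain ontology search
    if term is None:
        return None, "needs new term"  # nothing found: a new term is required
    if wanted_prefix and not term.startswith(wanted_prefix + ":"):
        # Found in another ontology: try to translate through cross-mappings.
        translated = search_oxo(term) if search_oxo else None
        if translated is not None:
            return translated, "oxo"
    return term, "ols"


# Stub lookups with made-up identifiers (only the UBERON term is from the slides).
zooma = {"heart": "UBERON:0000948"}
ols = {"example tissue": "OTHER:0000001"}
oxo = {"OTHER:0000001": "EXAMPLE:0000001"}

print(annotate("heart", zooma.get, ols.get))
print(annotate("example tissue", zooma.get, ols.get, oxo.get, wanted_prefix="EXAMPLE"))
print(annotate("unknown", zooma.get, ols.get))
```

Terms resolved through the OLS or OxO routes are the natural candidates to feed back into the curated mappings so the next lookup succeeds at the first step.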
19. Summary
• Part of the FAIR process will be alignment with standards
• Already many standards and ontologies in use
• We build tools and services that help get you there
• You will have to do some curation
• But our tooling can capture that so we can share the burden
• How FAIR is FAIR enough?
• We’ll never FAIRify all of BioSamples
• Decide what your application is and optimise for that
20. Ontology team
Helen Parkinson, Tony Burdett,
Sira Sarntivijai, Olga Vrousgou, Thomas Liener
Funding
• EMBL
• CORBEL: This project receives funding from the
European Union’s Horizon 2020 research and
innovation programme under grant agreement No
654248.
• EXCELERATE: ELIXIR-EXCELERATE is funded by
the European Commission within the Research
Infrastructures programme of Horizon 2020, grant
agreement number 676559.