Simon Jupp
Samples, Phenotypes and Ontologies
EMBL-EBI
Semantic services for data
interoperability
Elixir all hands meeting
Interoperability workshop
March 2017
Ontology services as building blocks for
FAIR
• You need standards (ontologies and controlled
vocabularies) to make data interpretable
• Interpretable data is more readily interoperable
• We can use interoperable data to build integrated
systems that make the data more findable by users
• The data become reusable when we use common
standards
• But,
• There are a lot of standards
• Doing this at scale for different domains is hard
Improving Findability by greater Interoperability
Smarter searching
Data analysis
Data integration
Data visualisation
BioSamples case study
• description of material of biological
interest
• may be linked to assay data
• sequencing, microarray,
• proteomics
• also imaging, etc
• We’ve been making this data
FAIR for many years
The challenge - thousands of data
attributes…
• BioSamples is an example of real world experimental metadata
• We see all the variability – warts and all
• Good playground for building tooling to clean up and add value to this
data
• If we can build tooling that works for BioSamples, it will work anywhere!
What are the disease attributes?
diseaseState
hostDisease
clinicallyAffectedStatus
diagnosis
Infection
diseaseStatus
healthState
disease
clinicalInformation
hostHealthState
affectedBy
causeOfDeath
NOT:
diseaseStage: info about the stage of a disease, e.g. "48 hai", "stage", "terminal"
diseasestage
tumorStatus: "non-tumor",120, "Tumor",100, "CSL +/+ Xenograft Tumor 1st",
healthStatus: "normal", "Allergic", "stressed", "NA(Not immunized)"
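A minimal sketch of what this variability means for a search tool, assuming sample attributes arrive as plain key/value pairs. The attribute names are the ones listed above; the `DISEASE_KEYS` set, the `disease_values` function and the sample record are hypothetical and only illustrate the idea.

```python
# Attribute names taken from the list above; matching is case-insensitive.
DISEASE_KEYS = {
    "diseasestate", "hostdisease", "clinicallyaffectedstatus", "diagnosis",
    "infection", "diseasestatus", "healthstate", "disease",
    "clinicalinformation", "hosthealthstate", "affectedby", "causeofdeath",
}


def disease_values(sample):
    """Collect the values of every attribute that describes a disease."""
    return [value for key, value in sample.items()
            if key.lower() in DISEASE_KEYS]


# Hypothetical sample record, for illustration only.
sample = {
    "hostDisease": "asthma",
    "diseaseStage": "terminal",   # in the NOT list, so it is skipped
    "organism": "Homo sapiens",
}
print(disease_values(sample))
```

Every new spelling of a disease attribute has to be added to the set by hand, which is the maintenance burden that ontology annotation is meant to remove.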
Makes finding the right data hard
Normalising sample descriptions through
annotation with ontologies
CL:CL_0000071 (blood vessel endothelial cell)
obo:CHEBI_39867 (valproic acid)
NCBITaxon:NCBITaxon_9606 (Homo sapiens)
Curation
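The identifiers above can be turned into resolvable IRIs. The sketch below assumes the standard OBO PURL pattern (`http://purl.obolibrary.org/obo/` followed by `PREFIX_NUMBER`); the `obo_iri` helper is hypothetical and only handles the identifier shapes shown on this slide.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_iri(term):
    """Turn 'CL:0000071', 'CL_0000071' or 'CL:CL_0000071' into an OBO PURL."""
    local = term.split(":")[-1]           # drop anything before the colon
    if "_" not in local:                  # 'CL:0000071' style: rebuild 'CL_0000071'
        prefix = term.split(":")[0]
        local = f"{prefix}_{local}"
    return OBO_BASE + local


for term in ("CL:CL_0000071", "obo:CHEBI_39867", "NCBITaxon:NCBITaxon_9606"):
    print(term, "->", obo_iri(term))
```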
Ontology challenges
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?
SPOT team - Adding value with ontologies
Data exploration and cleanup
Data structuring
Ontology annotation
Data cleaning and mapping
Ontology building
FAIRified data
Data Enrichment Services
• Building an interoperability
toolkit for Europe (Elixir)
• Integrated (linked) APIs
• Plumbing for data curation
systems and workflows
• Lowering the barrier of entry to
ontologies for data stewards
New ontology lookup service!
The Ontology Toolkit
Search/Visualise ontologies
Annotate data
Ontology cross mapping
Create new ontology content
Webulous
Ontology Lookup Service
• Ontology search engine
• Ontology term history tracking
• Ontology visualisation
• Powerful RESTful API
Repository of over 160 pre-selected biomedical ontologies (4.5 million terms)
http://www.ebi.ac.uk/ols
• Provides unified mechanism to access
multiple ontologies
• Large community of users, tens of millions of
hits per month
• Open source and dockerised
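As a rough illustration of the RESTful API, the sketch below builds a search request URL. It assumes the search endpoint sits at `/api/search` under the OLS base URL and accepts `q` and `ontology` query parameters; check the current OLS API documentation before relying on either. The `ols_search_url` helper is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

OLS_BASE = "http://www.ebi.ac.uk/ols"  # URL from the slide


def ols_search_url(query, ontology=None):
    """Build a search URL (assumed endpoint and parameter names)."""
    params = {"q": query}
    if ontology:
        params["ontology"] = ontology   # restrict the search to one ontology
    return f"{OLS_BASE}/api/search?{urlencode(params)}"


print(ols_search_url("blood vessel endothelial cell", ontology="cl"))
```

The resulting URL can be fetched with any HTTP client; the shape of the JSON response is not shown here because it is not described in the slides.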
Zooma
• Optimal mappings based on data we have seen previously
• Favours precision over recall
• Captures annotations + context – context is very important
• Currently contains over 92,000 annotations from 7 resources
• ClinVar, Cellular Phenotype Database, ExpressionAtlas, UniProt, GWAS, EBiSC, OpenTargets
• Used to improve and share their mappings across resources
Repository of curated ontology mappings
http://www.ebi.ac.uk/spot/zooma
“Heart”
UBERON:0000948
A Zooma Mapping
+ Context
(where, when, why?)
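One way to picture a mapping with context is as a small record, sketched below. This is not Zooma's data model or matching algorithm: the `Mapping` fields, the `KNOWN` list, the `lookup` function and the source and attribute values are all hypothetical. Only the "Heart" to UBERON:0000948 pair comes from the slide. Exact, context-matching lookups are used here to illustrate favouring precision over recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    text: str        # free text seen in the data, e.g. "Heart"
    term: str        # ontology term it was mapped to
    source: str      # which resource supplied the mapping (context)
    attribute: str   # which attribute the text appeared under (context)


# Hypothetical store of previously curated mappings.
KNOWN = [Mapping("Heart", "UBERON:0000948", "ExpressionAtlas", "organism part")]


def lookup(text, attribute):
    """Return known mappings only on an exact hit in the same context."""
    key = text.strip().lower()
    return [m for m in KNOWN
            if m.text.lower() == key and m.attribute == attribute]


print(lookup("heart", "organism part"))   # one curated mapping
print(lookup("heart", "disease"))         # different context, so no match
```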
New for 2017 – Ontology Cross Mapping
• Cross-references are a powerful tool for integrating data
• A lot of curator effort goes into building ontology cross-references
• Currently hard to find/explore the ontology mapping space
Datasource 1
Datasource 2
Human Phenotype Ontology
SNOMED-CT
Mappings
Ontology Mapping Service (OxO)
• UI and API to expose known mappings from OBO, UMLS and
manually curated mapping sets (e.g. GWAS, OpenTargets)
• Normalised CURIE prefixes using identifiers.org
• SNOMED-CT: / SNOMEDCT: / SNOMED: / SNOMEDCT_
• Provides a “silver standard” to support predictive mapping algorithms
* Going live March 2017
http://www.ebi.ac.uk/spot/oxo *
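The prefix problem can be sketched as follows. The four spellings come from the slide; the choice of `SNOMEDCT` as the canonical prefix, the `normalise_curie` helper and the local identifier `123` are assumptions for illustration and do not reflect how OxO or identifiers.org implement this.

```python
import re

# Prefix spellings listed on the slide, all referring to SNOMED CT.
VARIANTS = ("SNOMED-CT", "SNOMEDCT", "SNOMED")
CANONICAL = "SNOMEDCT"  # assumed canonical prefix, for illustration


def normalise_curie(identifier):
    """Rewrite any known SNOMED CT prefix spelling to the canonical CURIE."""
    match = re.match(r"^([A-Za-z-]+)[:_](.+)$", identifier)
    if match and match.group(1).upper() in VARIANTS:
        return f"{CANONICAL}:{match.group(2)}"
    return identifier   # unknown prefixes are left untouched


for raw in ("SNOMED-CT:123", "SNOMEDCT:123", "SNOMED:123", "SNOMEDCT_123"):
    print(raw, "->", normalise_curie(raw))
```

Once all four spellings collapse to one CURIE, mappings from different sources that point at the same concept can be joined on a plain string comparison.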
Common questions
• How do I access ontologies?
• How do I map data to ontologies?
• Which ontologies should I use?
• What about data that doesn’t map?
• How can I translate from one ontology to another?
• How can I extend an ontology?
• How do I build “ontology aware” search applications?
• How do I publish this data?
Data annotation workflow
(Flowchart starting from "Data"; decision points are answered Yes/No)
Decision points:
• Is the data annotated to ontologies?
• Is there unmapped data?
• Can you find terms in OLS?
• Is it the ontology you want?
Steps:
• Search Zooma
• Search OLS
• Search OxO
• Create a new term (Webulous, OBO Foundry)
• Add mappings back to Zooma
Outcomes:
• Get the application ontology from OLS
• Building a search index with BioSolr
• Publishing structured data as RDF
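One possible reading of the annotation workflow is a chain of fallbacks, sketched below. The order (Zooma, then OLS, then OxO to translate into the wanted ontology, then a new term) is an assumption. The three lookups are passed in as plain functions, so nothing here calls a real service; the `annotate` function, the stub dictionaries and the `A:1`/`B:1` identifiers are hypothetical placeholders.

```python
def annotate(value, wanted, zooma, ols, oxo):
    """Return (term, route) for one value, or (None, 'new term needed')."""
    term, route = zooma(value), "zooma"      # curated mappings first
    if term is None:
        term, route = ols(value), "ols"      # then a plain ontology search
    if term is None:
        return None, "new term needed"
    if not term.startswith(wanted + ":"):    # found, but in another ontology
        translated = oxo(term, wanted)       # try a cross-ontology mapping
        if translated is None:
            return None, "new term needed"
        return translated, route + "+oxo"
    return term, route


# Stub lookups with placeholder identifiers, for illustration only.
zooma_stub = {"heart": "A:1"}.get
ols_stub = {"liver": "B:1"}.get
oxo_stub = lambda term, wanted: {("B:1", "A"): "A:2"}.get((term, wanted))

for value in ("heart", "liver", "unknown thing"):
    print(value, "->", annotate(value, "A", zooma_stub, ols_stub, oxo_stub))
```

In this reading, values that end in "new term needed" are the ones that would go to curation, with the resulting mappings added back to Zooma.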
Summary
• Part of the FAIR process will be alignment with standards
• Already many standards and ontologies in use
• We build tools and services that help get you there
• You will have to do some curation
• But our tooling can capture that so we can share the burden
• How FAIR is FAIR enough?
• We’ll never FAIRify all of BioSamples
• Decide what your application is and optimise for that
Ontology team
Helen Parkinson
Tony Burdett
Sira Sarntivijai
Olga Vrousgou
Thomas Liener
Funding
• EMBL
• CORBEL This project receives funding from the
European Union’s Horizon 2020 research and
innovation programme under grant agreement No
654248.
• EXCELERATE ELIXIR-EXCELERATE is funded by
the European Commission within the Research
Infrastructures programme of Horizon 2020, grant
agreement number 676559.

Semantics as a service at EMBL-EBI
