Challenges, contributions and applications to
biomedicine & agronomy
Presentation at "Journée thématique sur les autorités de données",
8 April 2019, Toulouse
BioPortal: ontologies and integrated data resources at the click of a mouse
Ontology Repository and Ontology-based Services
1. Ontology Repository and
Ontology-based Services
Challenges, contributions and applications to
biomedicine & agronomy
Clement Jonquet
jonquet@lirmm.fr @jonquet_lirmm
Journée thématique sur les autorités de données
8 April 2019, Toulouse
2. Web portals for semantic resources: necessary tools for “data authorities.”
Presentation of solutions implemented in AgroPortal and the SIFR BioPortal
3. Credits (people & support)
• LIRMM
• Vincent Emonet
• Anne Toulet
• Andon Tchechmedjiev
• Amine Abdaoui
• Juan-Antonio Lossio
• Elcio Abrahao
• Zohra Bellahsene
• Amina Annane (ESI Algeria)
• Mathieu Roche (CIRAD)
• Sandra Bringay
• A few MSc students per year
• Collaborators
• Pierre Larmande (IRD)
• Mark Musen (Stanford)
• John Graybeal (Stanford)
• Stefan Darmoni (CISMEF)
• Maguelonne Teisseire (IRSTEA)
• Sebastien Harispe (LGI2P)
• Adrien Coulet (LORIA)
• Elizabeth Arnaud (CGIAR)
• S. Aubin, O. Hologne, E. Dzalé, P. Neveu, C. Pommier, C. Nédellec … (INRA)
• ….
4. What are we going to talk about?
1. A few elements on semantics and ontologies
2. Ontology repositories & ontology-based services
3. Two collaborative projects on ontology-based services in biomedicine and agronomy
4. Challenges in this area of research
5. Conclusions and perspectives
5. 1. A few elements on semantics and ontologies
9. Big data is also happening in agriculture
10. Wilkinson M.D., Dumontier M., et al. (2016) The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3:160018
1100 citations in 2 years
12. Setting up the Institut de Science des Données de Montpellier (Montpellier Data Science Institute)
• Structure the Montpellier data science ecosystem
• Make its potential known across all communities
• Promote and spread the use of data science
– from data collection / management to data exploitation / analysis
– respecting the specificities of each context and disciplinary field
– integrating the dimensions of openness, sharing, protection and valorization of data and software
13. The Semantic Web offers the technologies
14. Credit: F. Gandon (Inria)
19. Why are ontologies important in science?
• To provide a canonical representation of scientific knowledge
• To annotate experimental data to enable interpretation, comparison, and discovery across databases
• To facilitate knowledge-based applications for
• Decision support
• Natural language processing
• Data integration
• But ontologies are: spread out, in different formats, of different sizes, with different structures
20. Number of ontologies in the NCBO BioPortal
Other issues with ontologies:
• Overlapping ontologies
• Variety of representation languages
[Figure labels: MeSH, GO, SNOMED CT, HP, ORDO]
21. 2. Ontology repositories & ontology-based services
22. Ontology libraries, registries, repositories
• Ontology libraries defined as
• “a library system that offers various functions for managing, adapting and
standardizing groups of ontologies. It should fulfill the needs for re-use of
ontologies. In this sense, an ontology library system should be easily
accessible and offer efficient support for re-using existing relevant
ontologies and standardizing them based on upper-level ontologies and
ontology representation languages.” [Ding & Fensel, 2001]
• Ontology repositories defined as
• “a structured collection of ontologies (…) by using an Ontology Metadata
Vocabulary. References and relations between ontologies and their
modules build the semantic model of an ontology repository. Access to
resources is realized through semantically-enabled interfaces applicable
for humans and machines. Therefore a repository provides a formal query
language” [Hartmann, Palma, Gomez-Perez, 2009]
23. Still a subject of interest?
• Open Ontology Repository initiative (late 2000s)
• 2010 ORES workshop
• Ontology Repositories and Editors for the Semantic Web
• Review of ontology repositories
• [Where to publish and find ontologies? D’Aquin & Noy, 2012]
• A bunch of papers on ontology recommendation & selection
• News
• New platform in 2015: AberOWL
• OLS 3.0, AgroPortal releases
24. Why are ontology repositories important?
• You’ve built an ontology, how do you let the world know?
• You need an ontology, where do you go to get it?
• How do you know whether an ontology is any good?
• How do you find data resources that are relevant to the domain of the ontology (or to specific terms)?
• How could you leverage your ontology to enable new science?
• How could you use ontologies without managing them?
25. Like any data, vocabularies and ontologies need to be FAIR
• The FAIR principles have established the importance of using standard vocabularies or ontologies to describe FAIR data and to facilitate interoperability and reuse… and…
• I2. (meta)data use vocabularies that follow FAIR principles
• Explosion of the number of ontologies/vocabularies
• Cumbersome to identify the ontologies we need and manage their overlap.
26. Ontology repositories help to make ontologies FAIR
Findable, Accessible, Interoperable, Re-usable
27. What are the ontology libraries out there?
• Ontology repositories / portal
• NCBO BioPortal
• Ontobee
• AberOWL
• EBI Ontology Lookup Service
• OKFN Linked Open Vocabularies
• ONKI Ontology Library Service
• MMI Ontology Registry and Repository
• ESIPportal
• AgroPortal
• SIFR BioPortal
• CISMEF HeTOP
• OntoHub
• Ontoserver
• Web indexes
• Watson, Swoogle, Sindice, Falcons
• Ontology libraries / listings (more or less updated)
• OBO Foundry
• WebProtégé
• Romulus
• DAML ontology library
• Colore
• FAO VEST Registry
• FAIRsharing
• DERI Vocabularies, OntologyDesignPatterns, Semanticweb.org, W3C Good ontologies
• BARTOC
• Platform technology, Terminology Services
• Mondeca ITM, LexEVS, ANDS, SKOSMOS, NERC-VS
• Abandoned projects
• Cupboard, Knoodl, Schemapedia, SchemaWeb, OntoSelect, OntoSearch, TONES
28. Focus on NCBO BioPortal: a “one stop shop” for biomedical ontologies
• Web repository for biomedical ontologies
• Make ontologies accessible and usable: abstraction over format, location, structure, etc.
• Users can publish, download, browse, search, comment on and align ontologies, and use them for annotation, both online and via a web service API.
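A minimal sketch of what calling such a web service API could look like. It assumes the REST root http://data.bioontology.org, a `/search` endpoint taking `q`, `ontologies` and `apikey` parameters, and a JSON response whose hits sit under a `collection` key with `prefLabel` and `@id` fields. The helper names and the sample response are hypothetical and only illustrate the idea.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed REST root of the NCBO BioPortal API.
API_ROOT = "http://data.bioontology.org"


def build_search_url(query, api_key, ontologies=None):
    """Build a term-search URL (parameter names are assumptions)."""
    params = {"q": query, "apikey": api_key}
    if ontologies:
        # Restrict the search to a comma-separated list of ontology acronyms.
        params["ontologies"] = ",".join(ontologies)
    return f"{API_ROOT}/search?{urlencode(params)}"


def summarize_hits(response):
    """Reduce a search response to (preferred label, class IRI) pairs."""
    return [(hit.get("prefLabel"), hit.get("@id"))
            for hit in response.get("collection", [])]


def fetch_json(url):
    """Perform the HTTP GET; needs network access and a valid API key."""
    with urlopen(url) as handle:
        return json.load(handle)


# Hypothetical response fragment, used here instead of a live call.
sample_response = {
    "collection": [
        {"prefLabel": "Melanoma", "@id": "http://example.org/onto/MELANOMA"}
    ]
}
url = build_search_url("melanoma", "YOUR_KEY", ["MESH", "HP"])
hits = summarize_hits(sample_response)
```

With a real key, `summarize_hits(fetch_json(url))` would replace the sample response.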
29.
• Online support for ontology
• Peer review & notes
• Versioning
• Mapping
• Search
• Resources
• Annotation
• Open source technology
• Packaged in a “virtual appliance”
• Set up your own “bioportal” in a few hours
30. http://bioportal.bioontology.org
Ontology Services
• Search
• Traverse
• Comment
• Download
Widgets
• Tree-view
• Auto-complete
• Graph-view
Annotation
• Term recognition
Data Access
• Search data annotated with a given term
Mapping Services
• Create
• Upload
• Download
http://data.bioontology.org
31. Ontology alignment
• Ontologies, vocabularies, and terminologies inevitably overlap in coverage
• Mappings do not always belong to an ontology
• The community needs a place to store and retrieve them
• That’s the role of the ontology repository
• Dealing with mappings is a technical, data and scientific challenge
• Capture the whole mapping lifecycle
• Semantically described with plenty of provenance information
32. Who has been reusing NCBO
technology so far?
• NCI term browser (https://nciterms.nci.nih.gov)
• BioPortal first, then LexEVS
• Open Ontology Repository (OOR) Initiative (http://www.oor.net)
• Marine Metadata Interoperability Ontology Registry and Repository (http://mmisw.org)
• ESIPPortal (Earth Science Information Partners - http://semanticportal.esipfed.org )
• AgroPortal (http://agroportal.lirmm.fr)
• SIFR/French BioPortal (http://bioportal.lirmm.fr)
• And a few hospitals, research labs, with private data and specific needs (often in-house annotation)
• EcoPortal: new initiative in ecology/biodiversity by LifeWatch ERIC
• Stanford libraries (https://biblio.ontoportal.org)
33. NCBO BioPortal data as of 2013
http://lod-cloud.net
34. 3. Two collaborative projects on ontology-based services in biomedicine and agronomy
35. SIFR: Semantic Indexing of French Biomedical Data
Resources
http://www.lirmm.fr/sifr
• Ontology-based services to index, mine
and retrieve French biomedical data
• In France, there is already a reference
repository for medical terminologies but
nothing public for annotation
• Crucial need for tools & services for French
biomedical data
36. A dedicated version of BioPortal for French ontologies
http://bioportal.lirmm.fr
28 monolingual ontologies/terminologies
• From the UMLS or HeTOP or uploaded by users
• Cleaned and checked for annotation
C. Jonquet, A. Annane, K. Bouarech, V. Emonet & S. Melzi. SIFR BioPortal: French biomedical ontologies and terminologies available for semantic annotation, In 16th Journées Francophones d'Informatique Médicale, JFIM'16. Geneva, Switzerland, July 2016.
39. SIFR Annotator
• Detect biomedical entities in the text
• Use semantics inside ontologies
• Easy to use web service
• Free and open access
• Easy to plug into external workflows
• Annotations in several formats with concept URIs
• Multiple parameters
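A hedged sketch of how an external workflow might plug into such an annotation web service. It assumes an NCBO-style `/annotator` endpoint with `text`, `ontologies`, `longest_only` and `apikey` parameters, and a JSON response listing an `annotatedClass` with its `annotations` (character offsets and matched text). The REST root used below is an assumption, as are the helper names and the sample response.

```python
from urllib.parse import urlencode

# Assumed REST root for the SIFR BioPortal services (not given in the slides).
ASSUMED_API_ROOT = "http://data.bioportal.lirmm.fr"


def build_annotator_url(text, api_key, ontologies=None, longest_only=False,
                        api_root=ASSUMED_API_ROOT):
    """Build an annotation request URL (parameter names are assumptions)."""
    params = {
        "text": text,
        "apikey": api_key,
        "longest_only": str(longest_only).lower(),
    }
    if ontologies:
        params["ontologies"] = ",".join(ontologies)
    return f"{api_root}/annotator?{urlencode(params)}"


def extract_annotations(response):
    """Flatten a response into (concept URI, matched text, start, end) tuples."""
    rows = []
    for item in response:
        concept_uri = item["annotatedClass"]["@id"]
        for match in item.get("annotations", []):
            rows.append((concept_uri, match["text"], match["from"], match["to"]))
    return rows


# Hypothetical response fragment, used here instead of a live call.
sample_response = [
    {
        "annotatedClass": {"@id": "http://example.org/onto/FIEVRE"},
        "annotations": [
            {"from": 36, "to": 41, "matchType": "PREF", "text": "FIEVRE"}
        ],
    }
]
annotator_url = build_annotator_url("fièvre", "YOUR_KEY")
rows = extract_annotations(sample_response)
```

Returning concept URIs rather than plain labels is what lets downstream tools link the annotations back to the ontologies.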
41. AgroPortal: a vocabulary and ontology repository for agronomy
http://agroportal.lirmm.fr
• Develop and support a reference ontology repository
• Primary focus on agronomy & closely related domains (plant sciences, food and biodiversity)
• Reusing the NCBO BioPortal technology
• Avoid re-implementing what has been done, facilitate interoperability
• Reusing the scientific outcomes, experience & methods of the biomedical domain
• Enable straightforward use of agronomy-related ontologies
• Respect the requirements & specificities of the agronomic community
• Fully semantic web compliant infrastructure
• Enable new science
42. AgroPortal: an ontology repository for agronomy, food, plant sciences & biodiversity
http://agroportal.lirmm.fr
107 ontologies, 80 candidates
5 driving use cases
~90 registered users
• Publish, search, download
• Browse, visualize
• Peer review
• Versioning
• Annotation
• Recommendation
• Mapping
• Notes
• Projects
C. Jonquet, A. Toulet, (…) P. Larmande. AgroPortal: an ontology repository for agronomy, Computers and Electronics in Agriculture. Jan 2018. 144, pp.126-143. Elsevier.
44. 5 Driving Agronomic Use Cases
• IBC Rice Genomics & AgroLD project
– Data integration and knowledge management related to rice (P. Larmande)
• RDA Wheat Data Interoperability working group
– Common framework for publishing wheat data (E. Dzalé-Yeumo)
• LovInra: INRA Linked Open Vocabularies
– Vocabularies produced by INRA scientists (S. Aubin)
• Crop Ontology project
– Ontologies for describing crop germplasm & traits (E. Arnaud)
• GODAN global map of agri-food data standards
– VEST/AgroPortal MAP of standards (V. Pesce)
46. Ontology groups and categories
Category Number
Plant Phenotypes and Traits 31
Plant Anatomy and Development 4
Natural Resources, Earth and Environment 12
Animal Science and Animal Products 6
Agricultural Research, Technology and Engineering 15
Breeding and Genetic Improvement 1
Plant Science and Plant Products 7
Plant Genetic Resources 2
Food and Human Nutrition 7
Food Security 2
Taxonomic Classifications of Organisms 6
Farms and Farming Systems 5
Fisheries and Aquaculture 2
Forest Science and Forest Products 2
Biodiversity and Ecology 14
Specific “slice” displays show only the ontologies of a group
http://lovinra.agroportal.lirmm.fr/
http://semandiv.agroportal.lirmm.fr/
51. REST Service API: http://data.agroportal.lirmm.fr/documentation
SPARQL endpoint: http://sparql.agroportal.lirmm.fr
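As an illustration of querying such a SPARQL endpoint, here is a minimal sketch that follows the standard SPARQL protocol (a `query` parameter on a GET request) and the standard SPARQL JSON results format. Whether the endpoint accepts queries at exactly this URL is an assumption. The query itself, the helper names and the sample result are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint quoted on the slide; the exact query path is an assumption.
ENDPOINT = "http://sparql.agroportal.lirmm.fr"

# Generic query: list a few named graphs (no assumption on the data model).
QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Prepare a SPARQL-protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


# Hypothetical result document in the SPARQL JSON results format.
sample_results = {
    "head": {"vars": ["g"]},
    "results": {"bindings": [
        {"g": {"type": "uri", "value": "http://example.org/graph/1"}}
    ]},
}
request = build_sparql_request(ENDPOINT, QUERY)
rows = bindings_to_rows(sample_results)
```

Passing `request` to `urllib.request.urlopen` and decoding the body with `json.load` would run the query against the live endpoint.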
53. Challenges for ontology repositories
• Ontology identification & selection (via metadata)
• Multilingualism
• Ontology alignment (complete life cycle)
• Generic ontology-based services (especially for free text data)
• Annotations and linked data (keep quality while enabling horizontal studies)
• Scalability and interoperability (to multiple domains and number/variety of ontologies)
55. Generic ontology-based services (especially for free text data)
• The role of the portal is to offer services for ontologies
• Focus here is on the use of ontologies for annotation purposes
• How can a repository facilitate the use of ontologies for annotation?
• Text mining challenge (disambiguation, context, negation, modality, time)
• Electronic Health Records
56. Ontology – data cycle
• Ontologies and data change every day
• Need to be able to handle the “deltas” only
• Work on terminology and knowledge extraction from text
• BioTex (http://tubo.lirmm.fr/biotex)
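One possible reading of handling the “deltas” only, sketched under strong simplifying assumptions: each ontology version is reduced to a mapping from class IRI to preferred label, and only classes that were added, removed or relabeled would need re-processing. The function name and the toy data are hypothetical, and a real repository would compare far more than labels.

```python
def ontology_delta(old, new):
    """Compare two versions given as {class IRI: preferred label} dicts."""
    added = {iri: new[iri] for iri in new.keys() - old.keys()}
    removed = {iri: old[iri] for iri in old.keys() - new.keys()}
    relabeled = {
        iri: (old[iri], new[iri])
        for iri in old.keys() & new.keys()
        if old[iri] != new[iri]
    }
    return added, removed, relabeled


# Toy versions of one ontology (illustrative IRIs and labels).
version_1 = {
    "http://example.org/onto/A": "Wheat",
    "http://example.org/onto/B": "Rice",
}
version_2 = {
    "http://example.org/onto/A": "Common wheat",
    "http://example.org/onto/C": "Maize",
}
added, removed, relabeled = ontology_delta(version_1, version_2)
```

Only the annotations touching IRIs in these three dictionaries would then have to be recomputed, instead of re-annotating every resource.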
57. Term extraction workflow
1. Part-of-Speech tagging
2. Candidate term extraction
3. Ranking of candidate terms
4. Computing new combination measures
5. Re-ranking using a web-based measure
J.A. Lossio-Ventura, C. Jonquet, M. Roche & M. Teisseire. Biomedical term extraction: overview and a new methodology, Information Retrieval, Special issue on Medical Information Retrieval. August 2015. Vol. 19 (1), pp. 59-99. Springer.
58. Improve the workflow to handle clinical text narrative
• Project SIFR & PractiKPharma
• Detecting Negation, Temporality and Experiencer
• Implementation using NegEx/ConText
• Inclusion in the French/SIFR Annotator
• Proxy architecture to plug this into the NCBO Annotator
• Very good performance results
• e.g., negation F1 between 0.8 and 0.9
A. Abdaoui, A. Tchechmedjiev, W. Digan, S. Bringay, C. Jonquet. French ConText: a Publicly Accessible System for Detecting Negation, Temporality and Experiencer in French Clinical Notes. Biomedical Informatics. Under review, 3rd round.
59. Features for annotating clinical text
Example (French): Le patient ne montre aucun signe de fièvre. Son père a déjà eu de l’arthrose. Il a des antécédents de dépression.
(English: The patient shows no sign of fever. His father has had osteoarthritis. He has a history of depression.)
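To make the three features concrete, here is a deliberately simplified, sentence-level sketch in the spirit of trigger-based approaches such as NegEx/ConText. The trigger lists below are hypothetical and tiny. They are not the rules of French ConText, and real systems also scope each trigger to a window around the annotated concept.

```python
import re

# Hypothetical trigger lists, for illustration only.
NEGATION = re.compile(r"\b(aucun|aucune|pas de|sans)\b")
FAMILY = re.compile(r"\b(père|mère|frère|sœur)\b")
HISTORY = re.compile(r"\b(antécédents?|déjà eu)\b")


def context_features(sentence):
    """Guess negation, experiencer and temporality for one sentence."""
    lowered = sentence.lower()
    return {
        "negated": bool(NEGATION.search(lowered)),
        "experiencer": "other" if FAMILY.search(lowered) else "patient",
        "temporality": "historical" if HISTORY.search(lowered) else "recent",
    }


text = ("Le patient ne montre aucun signe de fièvre. "
        "Son père a déjà eu de l’arthrose. "
        "Il a des antécédents de dépression.")
# Naive sentence split on full stops.
sentences = [part.strip() for part in text.split(".") if part.strip()]
features = [context_features(sentence) for sentence in sentences]
```

On this example the sketch flags the fever mention as negated, attributes the osteoarthritis to someone other than the patient, and marks the depression as historical.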
60. Scoring of annotations
• Improve the NCBO Annotator results by ranking the annotations according to their relevance
• While not changing the service implementation
• Take into account their frequencies (as originally proposed in 2009 and later removed)
• Add a term extraction measure, called C-Value, used to positively discriminate annotations generated from matches with multi-word terms
• Mostly improves annotations done with multi-word terms
• Two new scoring methods to score and rank annotations by their importance in the given input data
• Interesting results validated against PubMed manual annotations
S. Melzi & C. Jonquet. Scoring semantic annotations returned by the NCBO Annotator, In 7th International Semantic Web
Applications and Tools for Life Sciences, SWAT4LS'14. Berlin, Germany, Dec. 2014.
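For readers unfamiliar with the C-Value measure mentioned above, here is a minimal sketch of its textbook definition (Frantzi et al.). A term's frequency is weighted by the log of its length in words and discounted by the average frequency of the longer candidate terms that contain it. This shows the generic measure only, not the scoring methods of the cited paper, and the toy frequencies are made up.

```python
import math


def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    size = len(short)
    return len(long) > size and any(
        long[i:i + size] == short for i in range(len(long) - size + 1)
    )


def c_value(freq):
    """Score candidate terms (tuples of words) from their frequencies."""
    scores = {}
    for term, count in freq.items():
        containers = [other for other in freq if is_nested(term, other)]
        if containers:
            # Discount occurrences explained by longer candidate terms.
            count -= sum(freq[other] for other in containers) / len(containers)
        scores[term] = math.log2(len(term)) * count
    return scores


# Toy frequencies (illustrative only).
freq = {("cell", "carcinoma"): 12, ("basal", "cell", "carcinoma"): 10}
scores = c_value(freq)
# scores[("cell", "carcinoma")] is log2(2) * (12 - 10) = 2.0
```

With this textbook form a single-word term scores 0 because log2(1) = 0, which is one reason the measure favors multi-word terms.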
61. Challenges for ontology repositories
• Ontology identification & selection (via metadata)
• Multilingualism
• Ontology alignment (complete life cycle)
• Generic ontology-based services (especially for free text data)
• Annotations and linked data (keep quality while enabling horizontal studies)
• Scalability and interoperability (to multiple domains and number/variety of ontologies)
62. Catching up with relevant data: annotations and linked data
• Data deluge
• Not necessarily connected to relevant ontologies
• Annotate data with ontology concepts
• Horizontal approach
[Figure labels: ONTOLOGIES, RESOURCES]
C. Jonquet, P. LePendu, S. Falconer, A. Coulet, N. F. Noy, M. A. Musen & N. H. Shah. NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources, Web Semantics. September 2011. Vol. 9 (3), pp. 316-324. Elsevier.
63. The role of the ontology repository is not clear here
• We built the NCBO Resource Index as a searchable database of around 50 biomedical resources, semantically indexed with annotations
• Since then, linked open data has become the approach in the semantic web
• In agronomy: build a database of resources described in RDF and annotated with ontologies: the AgroLD project
64. Agronomy Linked Data (AgroLD) project
• Semantic data integration from agronomic databases
[Figure labels: RDF Transformation Workflows, Annotation Workflows]
A. Venkatesan, G. Tagny, N. El Hassouni, I. Chentli, V. Guignon, C. Jonquet, M. Ruiz, P. Larmande, Agronomic Linked Data: a knowledge system to enable integrative biology in Agronomy, PLoS One, 13 (11), pp.e0198270, 2018.
65. AgroLD: semantic web oriented data integration platform for plant biology
www.agrold.org
66. Knowledge in AgroLD: multiple data sources annotated with reference ontologies
• Multiple APIs (REST, SPARQL)
• Multiple querying interfaces (e.g., relations network)
• Galaxy wrapper available
67. Ontologies used in AgroLD
8 databases
37M triples
9 ontologies
69. Summary
• We discussed the importance of ontology repositories and the span of
ontology-based services they can (should) offer to work with data
• Reviewed some of the challenges in that domain of research, illustrating
with our 2 projects (SIFR & AgroPortal)
• Context spanning several axes of research: metadata standards, ontology alignment, data integration, semantic annotation and more…
• Reviewed some of the results obtained & propositions made
• Some are work in progress
70. Project D2KAB (2019-2023)
• Data to Knowledge in Agronomy and Biodiversity
• 2 work-packages on ontology services and alignment
• Development of AgroPortal and extended services
• 1 work-package on building and harnessing knowledge graphs
• 2 work-packages of driving ag & biodiv projects (food packaging, agro-agri linked data, wheat phenotype, ecosystems & plant biogeography)
71. D2KAB’s Objective
Create a framework to turn agronomy and biodiversity data into semantically described, interoperable, actionable, open knowledge, along with investigating scientific methods and tools to exploit this knowledge for applications in science & agriculture
• How: Ontologies & Linked Open Data
– data integration, text mining, semantic annotation, ontology alignment & linked data exploitation
• Use case driven informatics research => 5 driving scenarios
72. D2KAB: a national component of an international ecosystem
• PIA projects IBC, ISTEX, IFB, AnaEE-France, #DigitAg, Idex Paris-Saclay, UCA
• AnaEE Research Infrastructure
• Labex NUMEV, AGRO CEMEB, I-Site MUSE and CAP2025
• GACS WG
• H2020 eROSA project
• Agrisemantics (GO-FAIR IN & RDA WG)
• Involved in several H2020 2018 & 2019 proposals
73. Take home message