Integrating eagle-i and VIVO ontologies for CTSA researcher profiling
1. Finding common ground:
integrating the eagle-i and VIVO
ontologies
Carlo Torniai, Shahim Essaid, Brian Lowe, Jon
Corson-Rikert, and Melissa Haendel
@ontowonka
haendel@ohsu.edu
ICBO 2013
2. www.ctsaconnect.org CTSAconnect
Reveal Connections. Realize Potential.
People and Resources
techniques
training
protocols
affiliation
roles
grants
credentials
genes
anatomy
manufacturer
publications
disease
3. CTSAConnect Project
Connecting people and resources
Needs:
Identify potential collaborators, relevant resources, and
expertise across scientific disciplines
Assemble translational teams of scientists to address specific
research questions
Goal is to create a semantic representation of clinician and
basic science researcher expertise to enable:
More effective linking of information about clinicians and basic
science researchers
Computation and publication of clinical expertise data as
Linked Data (LD) for use in other applications
4. Integrated Semantic Framework Ontology (ISF) suite
Merge the eagle-i and VIVO ontologies into a single ontology suite (the ISF)
Extend their coverage to include the representation of clinical encounters
Modularize the ISF so that it is available as a set of files that can be reused independently
[Diagram labels: eagle-i (Resources); VIVO (People); Coordination; Semantic; Clinical activities]
5. ISF Content and modularization
eagle-i: Research resources
VIVO: Person profiling
CTSA ShareCenter: Discussions, requests, share documents
ISF: Contact, Organizations, Affiliations, Services, Events, Clinical Expertise, Reagents, Organisms, Credentials
6. Original Ontologies
eagle-i resource ontology:
BFO as upper ontology
Has OBO Foundry principles as guiding design principles
Aimed at driving an application as well as developing an interoperable core domain ontology
Active application and ontology development and live data
VIVO ontology:
No upper ontology
Adopts ontologies already in wide use across the Linked Data community, such as FOAF and BIBO
Aimed mostly at supporting data validation and data entry through the VIVO application and at producing Linked Data
Active application and ontology development and live data
A somewhat unconventional scenario: ontologies are usually created from scratch, or built by reusing existing ontologies, without the constraints above
7. A first approach
Goals:
Identify overlapping and duplicated entities in the eagle-i and VIVO ontologies
Avoid severe disruptions in application compatibility
Make minor incremental additions to the ISF and push significant changes back to the source ontologies
Good for:
Referencing existing entities while developing new ISF-specific modules
Performing initial alignments on classes in some portion of the overlapping
hierarchies
Limits:
Lengthy process of identifying necessary alignments and implementing changes
in the source ontologies
With no disruption to the applications, development was slow and low impact
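The first goal on this slide, identifying overlapping and duplicated entities, can be illustrated with a minimal sketch. It assumes that candidate overlaps are found by comparing normalized labels, which is only one possible heuristic and is not stated in the slides as the method the team used. All identifiers and labels below are hypothetical.

```python
import re

def normalize(label: str) -> str:
    """Lowercase a label and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()

def overlap_candidates(labels_a, labels_b):
    """Return sorted (normalized label, id in A, id in B) for labels found in both."""
    index_b = {}
    for ident, label in labels_b.items():
        index_b.setdefault(normalize(label), []).append(ident)
    matches = []
    for ident_a, label in labels_a.items():
        key = normalize(label)
        for ident_b in index_b.get(key, []):
            matches.append((key, ident_a, ident_b))
    return sorted(matches)

# Hypothetical identifiers and labels, for illustration only.
eagle_i = {"ei:0001": "Organization", "ei:0002": "Core Laboratory", "ei:0003": "Person"}
vivo = {"vivo:A": "organization", "vivo:B": "Person", "vivo:C": "Faculty Member"}

candidates = overlap_candidates(eagle_i, vivo)
for key, a, b in candidates:
    print(f"{key}: {a} <-> {b}")
```

Label matches of this kind only produce candidates; each pair would still need manual review before any alignment is asserted.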
8. “Current” approach
Implement the refactoring and merging disconnected from application and
data constraints
Impact on the application and data migration assessed after refactoring
Better balance of impact on apps and data migration versus total redesign of approach
Refactoring of source files based on content coverage
9. Three examples of merging and refactoring
Merging two different design approaches (Person and Contacts) using an existing standard (vCard)
Tackling an open design/representation issue by proposing a new design pattern (position of a person over time)
Referencing or incorporating external vocabularies or taxonomies
17. Protégé refactoring plugin
Annotation view shows annotations with approved or pending-approval status.
Module view shows pending axiom changes per module, and can save the changes with a log comment and generate the spreadsheet summary.
18. So what?
Now that eagle-i and VIVO are “on the same page,” future
development can leverage better consensus and ontologically
rigorous solutions
CTSAs have a new research profiling data standard for exchange
Applications such as VIVO, eagle-i, LOKI, Profiles, SciVal, and
SciENcv are working on generating ISF-compliant data
We can profile people based on a much larger diversity of their
activities and products of research
There is still a lot of work to do: this was a short-term project, and
the ISF could be better generalized for other use cases
19. Team
CTSA 10-001: 100928SB23
PROJECT #: 00921-0001
OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Shahim Essaid, Eric Orwoll
Cornell University: Jon Corson-Rikert, Dean Krafft, Brian Lowe
University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack
Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos
Harvard University: Daniela Bourges-Waldegg, Sophia Cheng
Share Center: Chris Kelleher, Will Corbett, Ranjit Das, Ben Sharma
University at Buffalo: Barry Smith, Dagobert Soergel
Resources
CTSAconnect project: ctsaconnect.org
CTSAconnect ontology source: http://code.google.com/p/connect
Editor's notes
The process of integrating the eagle-i and VIVO ontologies, refactoring them, and modularizing the ISF posed a set of interesting challenges and constraints.
There is a trade-off between aggregating by content coverage and a pattern-driven organization (for example, certain types of axioms in one place, imports, etc.). For instance, the profile module needs to be generic.
Vocabulary as information model: consider a person and a social security number. The axiom that a person has a social security number is not an axiom that exists in a “dictionary”. This is the distinction between informational and definitional axioms. Informational axioms are about a subset of the entity; for example, people are not defined by their social security number.
Using vCard to bring together two different representations. Both eagle-i and VIVO had representations of contact information, but most of it was done with data properties and string values, some of which were not structured. The move to the new vCard/FOAF representation imposes more structure and requires much more use of classes and object properties. The data migration is not yet done for the applications.
The ISF now includes the general idea of a contact, which can take multiple forms. An agent can have a contact that is a FOAF profile (more web based), while the vCard is more standard. vCard is a well-established IETF networking standard for exchanging contact-related information. The FOAF vocabulary is a commonly used RDF vocabulary for representing contact-like information that is more focused on web presence than on physical addresses and communication, as in the vCard case. The ISF adopts both vCard and FOAF.
vCard had an existing RDF mapping at the beginning of the project, but recently the W3C published a new RDF mapping for version 4 of vCard. The RDF mapping is still in draft status, but we are moving to the new RDF mapping for the final release of the ISF.
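The shift from flat string values to a structured contact can be sketched as follows. This is a minimal illustration with triples as plain Python tuples, not the ISF migration code. The `vcard:` names are written in the style of the W3C vCard RDF mapping and should be checked against that specification; `ex:hasContact` and the record keys are hypothetical.

```python
def migrate_contact(agent, flat):
    """Turn flat string contact values into a structured contact node.

    agent: identifier of the person or organization
    flat:  old-style record of data-property values, e.g. {"email": "..."}
    """
    card = f"{agent}/vcard"
    triples = [
        (agent, "ex:hasContact", card),          # hypothetical linking property
        (card, "rdf:type", "vcard:Individual"),
    ]
    # Assumed mapping from old string-valued keys to object properties.
    structured = {"email": "vcard:hasEmail", "phone": "vcard:hasTelephone"}
    for key, prop in structured.items():
        if key in flat:
            node = f"{card}/{key}"
            triples.append((card, prop, node))                  # object property to a node
            triples.append((node, "vcard:hasValue", flat[key]))  # the value sits on the node
    return triples

triples = migrate_contact("ex:person1", {"email": "a@example.org", "phone": "+1-555-0100"})
for t in triples:
    print(t)
```

Each former string value now hangs off its own node, so further detail (for example, a work or home type) can be attached without changing the agent's own description.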
How we translated the general problem to classes: the relationship as an SDC (specifically dependent continuant).
This slide shows the classes that are instantiated to represent the situation. The top “reified” relationship class is the “Relationship” class, and it is instantiated to capture the “mentoring” relationship. The “Relationship” class is a continuant and is currently asserted to be an SDC.
The problem query: all the students at OHSU during 2011. A mentor is assigned to a person. It is not a process; it is a reified relationship.
We had many use cases where there was a need to represent static relationships between continuants (without implying ongoing processes) over time, at a specific place, asserted by a specific agent, etc. For this to be possible, we had to reify the relationships as classes and instances (as opposed to binary RDF properties). This allows us to represent the situation shown in the slide. We can capture the relationship independently of any processes based on the relationship.
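The difference between a binary property and a reified relationship can be sketched in a few lines. This is an illustration only: the `Relationship` dataclass, the `reify` helper, the role names, and all identifiers are hypothetical and do not reproduce the ISF classes.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Relationship:
    """A reified relationship: an instance that stands for a link between continuants."""
    kind: str
    participants: Dict[str, str]        # role name -> participant identifier
    assigned_by: Optional[str] = None   # agent that asserted the relationship
    place: Optional[str] = None         # where the relationship holds

def reify(subject, predicate, obj, subject_role, object_role, **context):
    """Replace a binary (subject, predicate, object) triple with a Relationship instance."""
    return Relationship(
        kind=predicate,
        participants={subject_role: subject, object_role: obj},
        **context,
    )

# A binary triple has no room for who asserted it or where it holds.
binary = ("ex:alice", "mentors", "ex:bob")

# The reified form carries that context on the relationship instance itself.
mentoring = reify(*binary, subject_role="mentor", object_role="mentee",
                  assigned_by="ex:department", place="ex:OHSU")
print(mentoring)
```

Because the relationship is an instance in its own right, context such as time, place, or asserting agent attaches to it without implying that any process is under way.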
Here we show how the relationship is used to represent a position. The position is created to the same managing agent. The student position relates the student to a time representation (VIVO time: an instance of an interval and instances of its values). Dotted relationships are a kind of shortcut relationship. The pattern can be used for credentials, positions, grants, etc.
This slide shows an extension of “Relationship” with a more specific “Position” subclass. This was needed to be more specific about organizational or social positions.
The diagram also shows that we can:
Attach time ranges and values to the position instance.
Be more specific by relating the roles (as opposed to anything in general) to a position, to be more explicit about which roles participate in the relationship. This is similar to the modeling of processes, where things participate through their roles, and it is similar to general modeling principles (such as in UML-based models), where an “association” stands for a relationship between objects in specific roles. We initially modeled this specialization of “Relationship” as a subtype “Association” but decided later to leave it undefined until clear use cases come up.
There is also a special “assigns” object property that allows us to capture whether a relationship was created or asserted by some agent, distinguishing this related thing as a “controlling” or “acting” agent that brings the relationship instance into existence. Other examples include credentials (by a credentialing agent), citations (by a law-enforcing agent), etc. Any agent-, intention-, or fact-based existence of some relationship can be modeled under this class.
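Attaching a time interval to a position instance is what makes the earlier problem query (all the students at OHSU during 2011) answerable. The following is a minimal sketch under that reading; the `Position` dataclass, its field names, and the sample data are hypothetical and stand in for the ISF classes and the VIVO time representation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Position:
    person: str
    organization: str
    role: str
    start: date
    end: Optional[date] = None   # None means the position is ongoing

def held_during(p, start, end):
    """True if the position's interval overlaps the closed range [start, end]."""
    return p.start <= end and (p.end is None or p.end >= start)

def students_at(positions, organization, year):
    """People holding a student position at the organization at any point in the year."""
    first, last = date(year, 1, 1), date(year, 12, 31)
    return sorted({p.person for p in positions
                   if p.organization == organization
                   and p.role == "student"
                   and held_during(p, first, last)})

# Hypothetical sample data.
positions = [
    Position("ex:bob", "ex:OHSU", "student", date(2009, 9, 1), date(2011, 6, 30)),
    Position("ex:carol", "ex:OHSU", "student", date(2012, 9, 1)),
    Position("ex:dave", "ex:OHSU", "student", date(2010, 9, 1)),
    Position("ex:erin", "ex:Cornell", "student", date(2010, 9, 1), date(2012, 6, 30)),
    Position("ex:alice", "ex:OHSU", "faculty", date(2005, 1, 1)),
]

result = students_at(positions, "ex:OHSU", 2011)
print(result)
```

The query never inspects a process; it only reads the interval attached to each position instance.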
This shows the use cases for URIs that don’t fall under the typical OWL class/individual modeling of data. There is a need for an agreed-upon set of codes, concepts, types, etc. of things, in addition to classes and individuals. It is also just another perspective on the domain, where there is frequently a need to talk about a whole set (an OWL class) as if it were a single primitive thing (an instance); SKOS is a formalization of this idea.
Here we have added the punning (if needed).
This diagram shows that we make a distinction between the “ontology” on the left side and the “vocabulary” on the right. This distinction doesn’t mean that the sets of URIs on the two sides are disjoint. Certain URIs might exist as classes in the ontology and as individuals in the vocabulary. This is the punning: the same URI has two different type assertions (class type vs. individual type).
The “PhD degree” is an individual that can be referenced in a “position” instance to indicate that the position is related to PhD degrees in some way, but it doesn’t imply that there is a specific instance of a PhD degree that belongs to some agent related by the position. If an agent later obtains an actual instance of a PhD degree, a new URI will be created and asserted to be an instance of the “PhD degree” class from the ontology (the punning of the “PhD degree” URI).
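The punning described here can be made concrete with a small set of triples. This is a sketch that assumes the vocabulary-side individual is typed as a `skos:Concept`; the `ex:` identifiers and the `ex:relatesToConcept` property are hypothetical.

```python
RDF_TYPE = "rdf:type"

triples = {
    # The same URI carries two type assertions: this is the punning.
    ("ex:PhDDegree", RDF_TYPE, "owl:Class"),
    ("ex:PhDDegree", RDF_TYPE, "skos:Concept"),
    # A position refers to the concept without implying that any degree instance exists.
    ("ex:position42", "ex:relatesToConcept", "ex:PhDDegree"),
    # Later, an actual degree gets its own URI and is typed with the class.
    ("ex:degree7", RDF_TYPE, "ex:PhDDegree"),
    # A class that is not punned.
    ("ex:MastersDegree", RDF_TYPE, "owl:Class"),
}

def punned(graph):
    """URIs asserted both as an OWL class and as a SKOS concept (an individual)."""
    classes = {s for s, p, o in graph if p == RDF_TYPE and o == "owl:Class"}
    concepts = {s for s, p, o in graph if p == RDF_TYPE and o == "skos:Concept"}
    return sorted(classes & concepts)

def instances_of(graph, cls):
    """Individuals typed with the given class."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)

punned_uris = punned(triples)
degree_instances = instances_of(triples, "ex:PhDDegree")
print(punned_uris, degree_instances)
```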
Here is the ICD example. The concept scheme class stands for the vocabulary (MeSH or ICD9), and the SKOS concept is an entry in it: the concept 327.3 exists in the ICD9 scheme.
Next is the notation, which is an actual datatype (such as umls-aui), and the value of that datatype. The ICD9 concept is coded with the code.
SKOS gives you object properties to relate concepts: closeMatch and exactMatch. When the same AUI or CUI exists, we have an exact match. (UMLS identifiers: LUI, SUI, CUI, AUI.)
The idea is that, using skos:exactMatch or skos:closeMatch, we can walk between ontologies and still relate back to the ISF.
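Walking between vocabularies over match links can be sketched as a graph traversal. In SKOS, exactMatch is symmetric and transitive, while closeMatch is symmetric but not transitive, so the sketch follows only exactMatch chains and reports closeMatch links one step out. Apart from the ICD9 code 327.3 mentioned above, every identifier and mapping here is a hypothetical placeholder, not a real mapping.

```python
from collections import deque

# Hypothetical match assertions between concepts in different schemes.
exact = [("icd9:327.3", "ex:vocabA/c1"), ("ex:vocabA/c1", "isf:concept/c9")]
close = [("icd9:327.3", "ex:vocabB/c5")]

def exact_match_cluster(start, exact_pairs):
    """All concepts reachable from start over exactMatch links (symmetric, transitive)."""
    neighbours = {}
    for a, b in exact_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def close_matches(concept, close_pairs):
    """Direct closeMatch partners only; closeMatch is not chained."""
    return sorted({b for a, b in close_pairs if a == concept} |
                  {a for a, b in close_pairs if b == concept})

cluster = exact_match_cluster("icd9:327.3", exact)
related_to_isf = sorted(c for c in cluster if c.startswith("isf:"))
print(sorted(cluster), related_to_isf, close_matches("icd9:327.3", close))
```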
The increasing complexity of the ontology merging process created more impetus to keep track of changes and to document and validate them. To this end, we developed a Protégé plugin that better supports this new process.
When we were at the stage of being very detailed, we wanted to mark the axioms of each class according to whether they had been migrated or not. Yellow meant reviewed; green meant axiom migration was complete.