The document discusses representing processes and their temporal sequence in ontologies. It describes how processes consist of sub-processes and provides examples like eating and transcription. It also discusses using OWL to define the composition and ordering of process parts, as well as querying ontologies about process sequences and properties.
Representing sequences of parts in processes using OWL
1. Representing the sequence of parts of processes using OWL. Janna Hastings, Samy Deghou, Christoph Steinbeck (EBI Cheminformatics and Metabolism); Stefan Schulz (Medical University of Graz, Austria). Deep Knowledge Representation Challenge Workshop, Banff, Alberta, Canada, 26 June 2011.
2. Processes happen in time. 30.06.11, Processes and their parts – DKRC, Banff, Alberta. GO ‘biological process’ ontology: > 17 000 terms. Diagram labels: Mary; birth; Mary’s life; childhood; adulthood.
3. Processes consist of sub-processes. Diagram labels: eating (biting, chewing, swallowing, digesting) as a temporal sequence; biochemical pathways.
8. Diagram labels: process; transcription; transcription initiation, transcription elongation, transcription termination.
10. Composition and ordering of process parts. Diagram: transcription initiation, transcription elongation and transcription termination are each processualPartOf transcription and are linked by precededBy; eukaryotic transcription is a kind of transcription with rateOfTranscription = 60.
23. Sequence of components of an emotion? (The Emotion Ontology, Hastings et al., ICBO 2011)
Processes are those sorts of things that necessarily involve temporal extent. They *happen*, rather than *exist*. Examples are the development of organisms over time; an organism’s life; specific phases in life such as pregnancy; and biological processes such as transcription. Processes have a special dependency relationship to their participants, but are distinct from them. The Gene Ontology ‘biological process’ ontology contains more than 17 000 terms (by far the largest portion of the GO overall).
Many or most of the interesting processes described in biology consist of sub-processes, each of which forms part of the overall process. The sub-processes are usually ordered in time, and may be repeated in the same sequence. Processes are often illustrated diagrammatically, as in the familiar biochemical pathway diagrams. For our purposes in this paper, however, we ignore the complications that cycles in such pathway illustrations pose for representation, since we are primarily interested in classification based on straightforwardly linear sequences of process parts. Temporal sequences of process parts are not well represented in bio-ontologies.
Note that developmental anatomy ontologies have a more complicated, and more essential, relationship to temporal sequences than straightforward process hierarchies do, since in developmental anatomy ontologies the physical entity being described does not exist in certain parts of the temporal hierarchy.
Initiation begins with an RNA polymerase enzyme binding to a region on a DNA double strand, which depends on certain preconditions. First, the promoter sequence of the region to be transcribed needs to be accessible. Then, relevant proteins called transcription factors need to recognise the specific promoter. Once the specific transcription factors are bound to the promoter, the RNA polymerase can bind, forming the transcription initiation complex.

Elongation can be summarized as the following series of sub-processes:
a. RNA nucleotide monomers are paired with complementary DNA bases and added to the 3' end of the new RNA macromolecule being synthesized, and a sugar-phosphate backbone forms with the assistance of RNA polymerase. As the double strand is unwound, 10 to 20 nucleotides are available to the enzyme for base pairing and proper elongation of the RNA.
b. The rate of polymerisation is about 60 nucleotides per second in eukaryotes. Furthermore, multiple RNA polymerase molecules can transcribe the same DNA strand simultaneously, following each other like a truck convoy.
c. If the cell has a nucleus, the RNA is further processed (addition of a 3' poly-A tail and a 5' cap) and exits to the cytoplasm through the nuclear pore complex.

In eukaryotic cells, when the polymerase encounters the termination signal (a specific sequence on the DNA), it continues transcribing for hundreds of nucleotides past the signal, but the mRNA is cut free from the enzyme at a point about 10 to 35 nucleotides past the signal (the AAUAAA sequence in the pre-mRNA). Subsequently, if the cell has a nucleus, the mRNA is further processed by the addition of a 3' poly-A tail and a 5' cap, and exits to the cytoplasm through the nuclear pore complex. By contrast, in prokaryotes, transcription stops right at the end of the termination signal and the RNA and DNA are released.
We give a brief sketch of the more straightforward aspects of our model before going into detail on the more problematic areas in the next section. Firstly, we model the biological entities described above (DNA, mRNA, cell, cell nucleus, etc.) as material entities, which are inherited from BioTop. The various macromolecular complexes involved in transcription, such as the TranscriptionInitiationComplex, are included as well.
The transcription process is modelled together with its parts (sub-processes), i.e. initiation, elongation and termination, using the transitive precededBy relation to indicate the temporal sequence of process parts, as follows:
Transcription subClassOf Process and (hasProcessualPart some TranscriptionInitiation) and (hasProcessualPart some TranscriptionElongation) and (hasProcessualPart some TranscriptionTermination)
TranscriptionInitiation subClassOf (Process and processualPartOf some Transcription)
TranscriptionElongation subClassOf (Process and processualPartOf some Transcription)
TranscriptionTermination subClassOf (Process and processualPartOf some Transcription)
TranscriptionElongation subClassOf precededBy some TranscriptionInitiation
TranscriptionTermination subClassOf precededBy some TranscriptionElongation
EukaryoticTranscription subClassOf Transcription
EukaryoticTranscription subClassOf rateOfTranscription value "60"^^xsd:int
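As a rough illustration of what these axioms commit us to, the sketch below encodes them as plain Python data and derives the order of part types implied by the precededBy restrictions, using the standard library's graphlib (Python 3.9+). The dictionary layout is a hypothetical encoding for illustration, not an OWL library or our actual model files.

```python
# Hypothetical plain-data encoding of the Transcription axioms (not an OWL library).
from graphlib import TopologicalSorter

# "hasProcessualPart some X": part types every Transcription must have
TRANSCRIPTION_PARTS = [
    "TranscriptionInitiation",
    "TranscriptionElongation",
    "TranscriptionTermination",
]

# "X subClassOf precededBy some Y", stored as X -> {Y}
PRECEDED_BY = {
    "TranscriptionElongation": {"TranscriptionInitiation"},
    "TranscriptionTermination": {"TranscriptionElongation"},
}

# "EukaryoticTranscription subClassOf rateOfTranscription value 60"
RATE_OF_TRANSCRIPTION = {"EukaryoticTranscription": 60}


def expected_sequence(parts, preceded_by):
    """Order part types so each one comes after the types it is precededBy."""
    # TopologicalSorter takes node -> predecessors, which matches precededBy.
    graph = {part: preceded_by.get(part, set()) for part in parts}
    return list(TopologicalSorter(graph).static_order())


print(expected_sequence(TRANSCRIPTION_PARTS, PRECEDED_BY))
# ['TranscriptionInitiation', 'TranscriptionElongation', 'TranscriptionTermination']
```

The existential restrictions only say that each elongation is preceded by *some* initiation; the sketch flattens this into an ordering of types. For a cyclic pathway, static_order() raises graphlib.CycleError, which corresponds to the case set aside earlier.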
The first question could be addressed using SPARQL-DL querying [6], in which the ontology is collapsed into a graph that can be queried in a similar fashion to RDF data with SPARQL. However, a query retrieving the sequence of sub-processes would have to assume a maximum number of possible sub-processes, which is not very intuitive.
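To make the limitation concrete, the sketch below generates a plain SPARQL query (not SPARQL-DL-specific syntax) whose shape is fixed by a chosen number n of sub-processes, and contrasts it with a procedural walk over the same direct precededBy links that needs no bound. The prefix IRI, property spellings and instance names are hypothetical.

```python
# Hypothetical names throughout: the ":" prefix IRI, property names and instances.
PREFIX = "PREFIX : <http://example.org/process#>"


def chain_query(n):
    """SPARQL SELECT matching a chain of exactly n parts of ?proc linked by precededBy."""
    variables = " ".join(f"?p{i}" for i in range(1, n + 1))
    patterns = [f"  ?p{i} :processualPartOf ?proc ." for i in range(1, n + 1)]
    # one triple pattern per link: ?p(i+1) precededBy ?p(i)
    patterns += [f"  ?p{i + 1} :precededBy ?p{i} ." for i in range(1, n)]
    return f"{PREFIX}\nSELECT ?proc {variables}\nWHERE {{\n" + "\n".join(patterns) + "\n}"


def walk_sequence(preceded_by):
    """Order sub-processes by following direct precededBy links, with no length bound.

    preceded_by maps each sub-process to the one directly before it; assumes a
    single linear chain with at least one link and no cycles.
    """
    successor = {before: after for after, before in preceded_by.items()}
    # the first sub-process precedes something but is preceded by nothing
    (first,) = set(successor) - set(preceded_by)
    sequence = [first]
    while sequence[-1] in successor:
        sequence.append(successor[sequence[-1]])
    return sequence


print(chain_query(3))
print(walk_sequence({"elongation1": "initiation1", "termination1": "elongation1"}))
# ['initiation1', 'elongation1', 'termination1']
```

A process with four parts would need chain_query(4), so the query author has to fix the maximum length in advance; walk_sequence simply follows links until none remain.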
The second question can potentially be answered if it is reformulated as "How many RNA polymerases can bind a promoter?" and the relevant cardinality restriction is captured somewhere in the ontology. Additional logic is then needed to translate the answer back to the "Can multiple ..." query, deciding yes or no according to whether the answer to the "How many" question is greater than one. The third question is beyond what OWL knowledge bases can handle: additional processing is required to formulate the relevant mathematical expression and to evaluate it using the rate of transcription modelled in the ontology.
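A minimal sketch of that extra logic, assuming the "How many" answer has already been retrieved from the ontology as an integer (or None when no cardinality restriction exists), and assuming a constant polymerisation rate. The function names and the 3000-nucleotide example length are hypothetical; the third question itself is not given here, so the arithmetic only illustrates the kind of calculation involved.

```python
# Sketch of logic that lives outside the OWL reasoner.
# Function names and the example transcript length are hypothetical.
from typing import Optional


def can_multiple(how_many: Optional[int]) -> Optional[bool]:
    """Answer a 'Can multiple ...?' question from a 'How many ...?' answer.

    None means the ontology holds no relevant cardinality restriction.
    """
    if how_many is None:
        return None
    return how_many > 1


def elongation_time_seconds(length_nt: int, rate_nt_per_s: float = 60) -> float:
    """Time to elongate a transcript of length_nt nucleotides.

    Assumes a constant rate; 60 nt/s is the eukaryotic rateOfTranscription
    value modelled in the ontology.
    """
    return length_nt / rate_nt_per_s


print(can_multiple(2))                # True
print(elongation_time_seconds(3000))  # 50.0
```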
We would also like the ontology to support correct instance classification. In particular, we can try to classify completed transcription processes in which the various sub-processes have executed in the specified sequence. This is more complex than the query answering above, and is a form of error detection as well as … To test it, we create the following instances in the ontology; all except the first are negative examples (instances we would not expect to be classified):
transcription1: contains sub-processes initiation1, elongation1 and termination1; elongation1 precededBy initiation1, and termination1 precededBy elongation1.
transcription2: contains sub-processes initiation2 and elongation2, but no other sub-processes.
transcription3: contains sub-processes initiation3, elongation3 and termination3, but in the incorrect sequence (initiation precededBy termination).
transcription4: contains sub-processes initiation4, elongation4 and termination4, but relates them to the sub-processes of the previous instance: elongation4 precededBy initiation3, and termination4 precededBy termination3.
transcription5: contains two different copies of each of the sub-processes initiation, elongation and termination.
With this definition, classification with the HermiT reasoner yields the following instance members: transcription1, -3, -4 and -5 (i.e. only transcription2 failed to be classified as an instance). Clearly we must do better. Our next attempt uses an exact cardinality constraint to strengthen the requirement.
However, reasoning with this definition classifies no instances as members, indicating that the cardinality constraint is not satisfied. This may be due to the open world assumption underlying OWL reasoning: although we have asserted only one of each relevant sub-process as a part of our instances, nothing prevents additional sub-process parts from existing in some models, so the reasoner cannot conclude that the exact cardinality holds.
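For contrast, the sketch below performs the intended check under a closed-world reading, in plain Python rather than OWL: each instance's listed parts are taken to be all of its parts. It only illustrates the outcome we are after and is not a substitute for OWL reasoning. The encoding is our reading of the instance list above; the names initiation5a/initiation5b etc., the correctly ordered links within transcription5, and the assumption that transcription3 carries only the incorrect link are all hypothetical.

```python
# Closed-world sketch: PARTS lists *every* part of each instance, which OWL's
# open world assumption does not let a reasoner take for granted.
# The encoding of transcription3-5 is an assumption (see the lead-in).
ORDER = ("initiation", "elongation", "termination")

PARTS = {
    "transcription1": {"initiation1", "elongation1", "termination1"},
    "transcription2": {"initiation2", "elongation2"},
    "transcription3": {"initiation3", "elongation3", "termination3"},
    "transcription4": {"initiation4", "elongation4", "termination4"},
    "transcription5": {"initiation5a", "elongation5a", "termination5a",
                       "initiation5b", "elongation5b", "termination5b"},
}

# asserted precededBy links as (later, earlier) pairs
PRECEDED_BY = {
    ("elongation1", "initiation1"), ("termination1", "elongation1"),
    ("initiation3", "termination3"),
    ("elongation4", "initiation3"), ("termination4", "termination3"),
    ("elongation5a", "initiation5a"), ("termination5a", "elongation5a"),
    ("elongation5b", "initiation5b"), ("termination5b", "elongation5b"),
}


def part_type(name):
    """Recover the sub-process type from an instance name such as 'elongation4'."""
    return next(t for t in ORDER if name.startswith(t))


def is_complete_transcription(proc):
    """Exactly one part of each type, linked in order within the same process."""
    by_type = {}
    for part in PARTS[proc]:
        by_type.setdefault(part_type(part), []).append(part)
    if set(by_type) != set(ORDER) or any(len(p) != 1 for p in by_type.values()):
        return False  # missing or duplicated sub-process (transcription2, -5)
    (init,), (elong,), (term,) = (by_type[t] for t in ORDER)
    # links must connect parts of this same process (rejects transcription3, -4)
    return (elong, init) in PRECEDED_BY and (term, elong) in PRECEDED_BY


complete = [proc for proc in PARTS if is_complete_transcription(proc)]
print(complete)  # ['transcription1']
```

The duplicates in transcription5 and the cross-instance links in transcription4 are rejected only because the check treats each part list as complete, which is precisely the information an open-world OWL reasoner lacks.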
In another attempt to address the classification problems, we can try to enforce that the sub-processes forming the sequence are all part of the same overall transcription process, using the special Self keyword in OWL. Sadly, although the resulting definition was syntactically accepted, actually reasoning with it did not yield the desired result (due to the reasoner implementation), and no instances were classified.
Is the precededBy relation transitive? If (elongation precededBy initiation) and (termination precededBy elongation), then termination precededBy initiation. But in one organism, two transcription processes may be temporally ordered such that transcription2 precededBy transcription1, and then we also have initiation2 precededBy termination1.
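Assuming precededBy is declared transitive, its entailments over this example can be sketched with a naive fixpoint transitive closure. The links among transcription2's own parts (elongation2, termination2) are assumptions added to mirror transcription1, and the procedure is illustrative only, not how a DL reasoner such as HermiT works internally.

```python
# Sketch: naive fixpoint transitive closure over asserted precededBy pairs.
# transcription2's internal links are assumed, mirroring transcription1.

def transitive_closure(pairs):
    """Return every (later, earlier) pair entailed if precededBy is transitive."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


asserted = {
    ("elongation1", "initiation1"), ("termination1", "elongation1"),
    ("elongation2", "initiation2"), ("termination2", "elongation2"),
    ("transcription2", "transcription1"),
    ("initiation2", "termination1"),
}

entailed = transitive_closure(asserted)
print(("termination1", "initiation1") in entailed)  # True: within transcription1
print(("termination2", "initiation1") in entailed)  # True: across the two processes
```

With these six assertions the closure contains 16 pairs: besides the intended termination1 precededBy initiation1, it relates sub-processes of the two separate transcriptions, such as termination2 precededBy initiation1.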