Handling markup overlaps using OWL
Angelo Di Iorio (diiorio@cs.unibo.it)
Silvio Peroni (speroni@cs.unibo.it)
Fabio Vitali (fabio@cs.unibo.it)




http://creativecommons.org/licenses/by-sa/3.0
Summary




• Overlapping markup in everyday life
• EARMARK: an OWL-based meta-markup language
• Conclusions and future work
Overlapping markup... wait, what?

•   A definition: overlapping markup “describes cases where some markup
    structures do not nest neatly into others”
    DeRose, S. (2004). Markup Overlap: A Review and a Horse. In Proceedings of Extreme Markup Languages 2004. Montreal,
    Canada.
    <body>
      <p>Some <em>very</p>
      <p>interesting</em> text</p>
    </body>


•   Several techniques embed overlapping structures in XML hierarchies, for instance:
    ✦   milestones – empty elements mark the boundaries of the overlapping content
        <body>
          <p>Some <em start="id1"/>very</p>
          <p>interesting<em end="id1"/> text</p>
        </body>
    ✦   fragmentation – the overlapping element is split into non-overlapping fragments linked through ID/IDREF pairs
        <body>
          <p>Some <em id="em1" next="em2">very</em></p>
          <p><em id="em2">interesting</em> text</p>
        </body>
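The definition can be made concrete by treating each markup structure as a half-open interval [begin, end) of character offsets over the text it covers: two structures nest when one interval contains the other, and overlap when they share characters without containment. This is a minimal Java 16+ sketch, not taken from the slides; the `OverlapCheck` class, the `Span` record and the `relation` method are hypothetical, and the offsets were computed by hand for "Some very interesting text", i.e. the example's text with the two paragraphs joined by a single space.

```java
import java.util.List;

public class OverlapCheck {
    /** A markup structure as a half-open interval [begin, end) of character offsets. */
    public record Span(String name, int begin, int end) {}

    public enum Relation { DISJOINT, NESTED, OVERLAPPING }

    /** Nested or disjoint structures fit in one XML tree; overlapping ones do not. */
    public static Relation relation(Span a, Span b) {
        if (a.end() <= b.begin() || b.end() <= a.begin()) {
            return Relation.DISJOINT; // no shared characters
        }
        boolean aInB = b.begin() <= a.begin() && a.end() <= b.end();
        boolean bInA = a.begin() <= b.begin() && b.end() <= a.end();
        return (aInB || bInA) ? Relation.NESTED : Relation.OVERLAPPING;
    }

    public static void main(String[] args) {
        String text = "Some very interesting text";
        Span p1 = new Span("p", 0, 9);   // "Some very"
        Span p2 = new Span("p", 10, 26); // "interesting text"
        Span em = new Span("em", 5, 21); // "very interesting"
        for (Span s : List.of(p1, p2, em)) {
            System.out.println(s.name() + " = \"" + text.substring(s.begin(), s.end()) + "\"");
        }
        System.out.println("p1 vs em: " + relation(p1, em)); // OVERLAPPING
        System.out.println("p2 vs em: " + relation(p2, em)); // OVERLAPPING
        System.out.println("p1 vs p2: " + relation(p1, p2)); // DISJOINT
    }
}
```

Any pair classified as OVERLAPPING cannot be written as nested XML elements, which is exactly what milestones and fragmentation work around.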
Overlapping everywhere

•   Where we can find it: word processor formats with change tracking (e.g., ODT)
<office:text>
   <text:changed-region text:id="S1">
       <text:insertion>
          <office:change-info>
              <dc:creator>John Smith</dc:creator>
              <dc:date>2009-10-27T18:45:00</dc:date>
          </office:change-info>
       </text:insertion>
   </text:changed-region>
   <text:p>
       The beginning and
       <text:change-start text:change-id="S1"/>
   </text:p>
   <text:p>
       also
       <text:change-end text:change-id="S1"/>
       the end.
   </text:p>
</office:text>

[Figure: "What the document is" (the ODT markup above) versus "What the document represents". Before the change, office:text contains a single text:p with "The beginning and the end."; after the change of 2009-10-27T18:45:00, it contains two text:p elements, the second one starting with the word "also", inserted by John Smith.]
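To see why this encoding is awkward to process, here is a minimal sketch, using only the JDK's DOM API, of what an application must do to recover the text of change S1: walk the tree in document order, switching on at the change-start milestone and off at the matching change-end, because the change crosses a paragraph boundary. The `MilestoneText` class and `changedText` method are hypothetical; the sample keeps only the two paragraphs of the slide and adds the namespace declarations a parser needs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MilestoneText {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Collects the non-blank text nodes between the change-start and change-end milestones of changeId. */
    public static List<String> changedText(String xml, String changeId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), changeId, new boolean[] {false}, out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse the document", e);
        }
    }

    // Depth-first traversal visits nodes in document order, so a change-start in one
    // paragraph can be paired with a change-end in the next one.
    private static void walk(Node node, String changeId, boolean[] inside, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            if (TEXT_NS.equals(e.getNamespaceURI())
                    && changeId.equals(e.getAttributeNS(TEXT_NS, "change-id"))) {
                if ("change-start".equals(e.getLocalName())) {
                    inside[0] = true;
                } else if ("change-end".equals(e.getLocalName())) {
                    inside[0] = false;
                }
            }
        } else if (node.getNodeType() == Node.TEXT_NODE && inside[0]) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, changeId, inside, out);
        }
    }

    public static void main(String[] args) {
        // The two paragraphs of the slide, with namespace declarations, without the changed-region metadata.
        String xml = "<office:text"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"" + TEXT_NS + "\">"
                + "<text:p>The beginning and<text:change-start text:change-id=\"S1\"/></text:p>"
                + "<text:p>also<text:change-end text:change-id=\"S1\"/> the end.</text:p>"
                + "</office:text>";
        System.out.println(changedText(xml, "S1")); // [also]
    }
}
```

The inserted paragraph break is not a text node, so this sketch does not report it; a real tool would also have to cope with several changes interleaving in the same paragraphs.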
EARMARK: an OWL-based meta-markup language

•   EARMARK (Extremely Annotated RDF Markup) is a vocabulary, defined as an OWL
    ontology, that specifies a meta-markup language – http://www.essepuntato.it/2008/12/earmark

•   It is more expressive than XML:

                        XML                     EARMARK
    Data structure      Tree                    DAG
    Overlapping         Only by using tricks    Of course, it is a feature here
    Semantics           What?                   Yes, it is OWL!
•   Three disjoint base classes:
    ✦   Docuverse – the textual content of a document
        Subclasses: StringDocuverse, URIDocuverse
    ✦   Range – any text lying between two locations of a docuverse
        Subclasses: PointerRange, XPathRange, XPathPointerRange
    ✦   MarkupItem – a markup construct, defined as a collection of individuals belonging
        to the classes MarkupItem and Range
        Subclasses: Element, Attribute, Comment
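The three disjoint classes can be mirrored by three separate Java type hierarchies. The sketch below is a hypothetical Java 17 rendering for illustration only, not the EARMARK ontology or its Java library; it assumes that range locations are 0-based positions between characters, which is how `String.substring` counts. It uses the docuverse and the range offsets (14, 26) from the example slides, plus one extra hypothetical range, to show two elements sharing the word "and", something a single XML tree cannot express without tricks.

```java
import java.util.List;

public class EarmarkModelSketch {
    // Separate type hierarchies mirror the disjointness of the base classes:
    // no object can be, say, both a Docuverse and a Range.
    public sealed interface Docuverse permits StringDocuverse {}

    /** Anything allowed in a markup item's content: a Range or another MarkupItem. */
    public sealed interface ItemContent permits Range, MarkupItem {}

    public sealed interface Range extends ItemContent permits PointerRange {}

    public sealed interface MarkupItem extends ItemContent permits Element {}

    public record StringDocuverse(String content) implements Docuverse {}

    /** Text between two locations of a docuverse (assumed 0-based gaps between characters). */
    public record PointerRange(StringDocuverse refersTo, int begins, int ends) implements Range {
        public String text() {
            return refersTo.content().substring(begins, ends);
        }
    }

    /** An element whose content is an ordered list of ranges and markup items. */
    public record Element(String generalIdentifier, String namespace, List<ItemContent> items)
            implements MarkupItem {
        public String text() {
            StringBuilder sb = new StringBuilder();
            for (ItemContent item : items) {
                if (item instanceof PointerRange r) {
                    sb.append(r.text());
                } else if (item instanceof Element e) {
                    sb.append(e.text());
                }
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        String ns = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
        StringDocuverse aDoc = new StringDocuverse("The beginning and the end.");
        PointerRange r2 = new PointerRange(aDoc, 14, 26); // offsets from the example slides
        PointerRange r0 = new PointerRange(aDoc, 0, 17);  // hypothetical extra range
        // Both elements point into the same docuverse; their ranges share "and".
        Element first = new Element("p", ns, List.of(r0));
        Element second = new Element("p", ns, List.of(r2));
        System.out.println(first.text());  // The beginning and
        System.out.println(second.text()); // and the end.
    }
}
```

With the offsets as given, the range from 14 to 26 covers "and the end." under this counting assumption.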
An example
[Figure: the docuverse "The beginning and the end." with the markup of the ODT example laid over it: office:text, two text:p elements, and the word "also", inserted by John Smith.]

@prefix earmark: <http://www.essepuntato.it/2008/12/earmark#> .
@prefix : <http://www.example.com/> .
@prefix c: <http://swan.mindinformatics.org/ontologies/1.2/collections/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:aDoc a earmark:StringDocuverse
  ; earmark:hasContent "The beginning and the end."^^xsd:string .

:r2 a earmark:PointerRange
  ; earmark:refersTo :aDoc
  ; earmark:begins "14"^^xsd:nonNegativeInteger
  ; earmark:ends "26"^^xsd:nonNegativeInteger .

:aMarkupItem a earmark:Element
  ; earmark:hasGeneralIdentifier "p"
  ; earmark:hasNamespace "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  ; c:firstItem :item1
  ; c:lastItem :item2 .

:item1 c:itemContent :r1
  ; c:nextItem :item2 .

:item2 c:itemContent :r2 .

:p2 a Insertion ; dc:creator "John Smith"
  ; dc:date "2009-10-27T18:45:00"^^xsd:dateTime .
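The c:firstItem, c:nextItem and c:itemContent properties encode the element's content as a linked list, so reading that content in order means starting at c:firstItem and following c:nextItem links. Below is a minimal plain-Java sketch of that traversal over the collection triples of the example, stored in a hypothetical subject to (property to object) map rather than with an RDF library; :r1 stays unresolved, as on the slide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionWalk {
    /** Returns each item's c:itemContent, following c:firstItem and then c:nextItem links. */
    public static List<String> orderedContent(Map<String, Map<String, String>> graph, String list) {
        Map<String, String> listProps = graph.getOrDefault(list, Map.of());
        List<String> contents = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String item = listProps.get("c:firstItem");
        String last = null;
        while (item != null) {
            if (!seen.add(item)) {
                throw new IllegalStateException("cycle in the item chain at " + item);
            }
            Map<String, String> props = graph.getOrDefault(item, Map.of());
            contents.add(props.get("c:itemContent"));
            last = item;
            item = props.get("c:nextItem"); // null once the last item is reached
        }
        // c:lastItem is redundant with the chain, so it doubles as a consistency check.
        String declaredLast = listProps.get("c:lastItem");
        if (declaredLast != null && !declaredLast.equals(last)) {
            throw new IllegalStateException("chain ends at " + last + ", not at " + declaredLast);
        }
        return contents;
    }

    public static void main(String[] args) {
        // Subject -> (property -> object), copied from the collection triples of the example.
        Map<String, Map<String, String>> graph = Map.of(
                ":aMarkupItem", Map.of("c:firstItem", ":item1", "c:lastItem", ":item2"),
                ":item1", Map.of("c:itemContent", ":r1", "c:nextItem", ":item2"),
                ":item2", Map.of("c:itemContent", ":r2"));
        System.out.println(orderedContent(graph, ":aMarkupItem")); // [:r1, :r2]
    }
}
```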
EARMARK Data Structure

• It is an API and a Java library for easily creating and
   modifying EARMARK documents within Java applications

• Open Source project: http://earmark.sourceforge.net
EARMARKDocument ed = new EARMARKDocument(new URI("http://www.example.com"));

// The docuverse holding the textual content
Docuverse aDoc =
  ed.createStringDocuverse("The beginning and the end.");

[...]

// A pointer range over aDoc, from location 14 to location 26
Range aRange = ed.createPointerRange(aDoc, 14, 26);

[...]

// A "p" element in the ODF text namespace whose content is a list
Element aMarkupItem =
  ed.createElement("p", "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    Collection.Type.List);
ed.appendChild(anotherMarkupItem);

[...]
Semantic Web technologies as added value

•   Because every EARMARK document is expressed as a proper ABox of an OWL ontology,
    we can use Semantic Web technologies:
    ✦   to manipulate documents
    ✦   to query them
    ✦   to infer new assertions
    ✦   to check integrity constraints on both document structure and content semantics

•   In EARMARK, these technologies help with problems that are difficult or
    impossible to solve with XML tools

•   An example: “get all the text fragments inserted by John Smith”
    ✦   XPath
        for $id in //@text:id[../text:insertion//(dc:creator[. = 'John Smith'] |
            @office:chg-author[. = 'John Smith'])]
        return //text:p//text()[
            (preceding-sibling::text:change-start[1][@text:change-id = $id] and
             following-sibling::text:change-end[1][@text:change-id = $id])
            or ancestor::text:changed-region/@text:id = $id]
    ✦   SPARQL
        SELECT ?r WHERE { ?r a earmark:Range , Insertion ; dc:creator "John Smith" . }
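The SPARQL query is one basic graph pattern: three conditions on the same ?r. The toy plain-Java evaluation below (not SPARQL, not an RDF library) shows what the pattern matches and why inference matters: if the data only states that a range is an earmark:PointerRange, then "?r a earmark:Range" matches only once the subclass axiom (PointerRange is a subclass of Range) is applied. The triples :x1 to :x3 and the class and method names are hypothetical, and the single non-transitive subclass step is a tiny stand-in for an OWL reasoner.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class InsertionQuery {
    /** An RDF triple written with prefixed names, e.g. (":x1", "a", "earmark:PointerRange"). */
    public record Triple(String s, String p, String o) {}

    /** One RDFS-style inference step: from (x a C) and C subClassOf D, add (x a D). */
    public static Set<Triple> inferTypes(List<Triple> data, Map<String, String> subClassOf) {
        Set<Triple> graph = new LinkedHashSet<>(data);
        for (Triple t : data) {
            String superClass = subClassOf.get(t.o());
            if (t.p().equals("a") && superClass != null) {
                graph.add(new Triple(t.s(), "a", superClass));
            }
        }
        return graph;
    }

    /** Subjects matching { ?r a earmark:Range , Insertion ; dc:creator creator }. */
    public static List<String> insertedBy(Set<Triple> graph, String creator) {
        return graph.stream()
                .map(Triple::s)
                .distinct()
                .filter(r -> graph.contains(new Triple(r, "a", "earmark:Range")))
                .filter(r -> graph.contains(new Triple(r, "a", "Insertion")))
                .filter(r -> graph.contains(new Triple(r, "dc:creator", creator)))
                .toList();
    }

    public static void main(String[] args) {
        // Hypothetical data: no subject is asserted to be an earmark:Range directly.
        List<Triple> data = List.of(
                new Triple(":x1", "a", "earmark:PointerRange"),
                new Triple(":x1", "a", "Insertion"),
                new Triple(":x1", "dc:creator", "John Smith"),
                new Triple(":x2", "a", "earmark:PointerRange"),
                new Triple(":x2", "dc:creator", "John Smith"),
                new Triple(":x3", "a", "earmark:PointerRange"),
                new Triple(":x3", "a", "Insertion"),
                new Triple(":x3", "dc:creator", "Silvio Peroni"));
        Map<String, String> subClassOf = Map.of("earmark:PointerRange", "earmark:Range");

        System.out.println(insertedBy(new LinkedHashSet<>(data), "John Smith"));     // []
        System.out.println(insertedBy(inferTypes(data, subClassOf), "John Smith")); // [:x1]
    }
}
```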
Conclusions and future work

• We presented EARMARK, a new meta-markup language defined by means of OWL
  ontologies, which makes it possible to build very complex markup documents,
  including documents with overlapping structures

• We applied it to a real-world scenario (the ODT format with change
  tracking), showing that it handles, manipulates and queries complex
  documents more easily than XML does

• Future work on this topic includes:
  ✦   Rocco and Fretta, two ongoing projects that transform XML documents
      (with overlapping markup encoded through tricks) into EARMARK documents,
      and vice versa
  ✦   a formalism to explicitly specify the semantics of markup and of textual content
  ✦   a word processor for creating EARMARK documents in a very simple way,
      with the possibility of adding any kind of semantic assertion to any
      entity of the document (both markup items and textual content)
Thanks for your attention
I think it’s time for questions :-)
Extra example: a more complex ODT document...
<office:text>
   <text:changed-region text:id="S2">
      <text:deletion><office:change-info>
            <dc:creator>Silvio Peroni</dc:creator>
            <dc:date>2009-10-27T18:45:00</dc:date>
         </office:change-info><text:p>.</text:p></text:deletion>
      <text:insertion>
         <office:change-info office:chg-author="Angelo Di Iorio"
            office:chg-date-time="2009-10-27T18:42:00"/>
      </text:insertion>
   </text:changed-region>
   <text:changed-region text:id="A2">
      <text:insertion><office:change-info>
            <dc:creator>Angelo Di Iorio</dc:creator>
            <dc:date>2009-10-27T18:42:00</dc:date>
         </office:change-info></text:insertion>
   </text:changed-region>
   [...]
   <text:p>This is one paragraph<text:change-start text:change-id="S1"/>;
      actually, it was!<text:change-end text:change-id="S1"/>
      <text:change text:change-id="S2"/>
      <text:change-start text:change-id="A2"/></text:p>
   <text:p><text:change-end text:change-id="A2"/>
      <text:change text:change-id="A3"/><text:change-start text:change-id="A4"/>S
      <text:change-end text:change-id="A4"/>plit in two.</text:p>
</office:text>
... and its representation in EARMARK
[Figure: the EARMARK representation over time, drawn in four layers (docuverses, ranges, markup items, assertions). The docuverses are "This is one paragraph that will be split in two.", "; actually, it was!" and ".S"; r1 to r6 are ranges over them, and p and text are the markup items. Assertions mark ranges as text:insertion or text:deletion, with dc:creator "Silvio Peroni" and dc:date "2009-10-27T18:45:00", or with dc:creator "Angelo Di Iorio" and dc:date "2009-10-27T18:42:00". Legend: each range is drawn with the string it covers, its docuverse content, and its begin and end locations.]

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Handling Markup Overlaps Using OWL

  • 1. Handling markup overlaps using OWL
    Angelo Di Iorio (diiorio@cs.unibo.it), Silvio Peroni (speroni@cs.unibo.it), Fabio Vitali (fabio@cs.unibo.it)
    http://creativecommons.org/licenses/by-sa/3.0
  • 2. Summary
    • Overlapping markup in everyday life
    • EARMARK: an OWL-based meta-markup language
    • Conclusions and future work
  • 5. Overlapping markup... wait, what?
    • A definition: overlapping markup "describes cases where some markup structures do not nest neatly into others" (DeRose, S. (2004). Markup Overlap: A Review and a Horse. In Proceedings of Extreme Markup Languages 2004. Montreal, Canada.)
        <body>
          <p>Some <em>very</p>
          <p>interesting</em> text</p>
        </body>
    • There are different techniques to embed overlap in XML hierarchies, for instance:
      ✦ milestones: empty elements that mark the boundaries of the content
        <body>
          <p>Some <em start="id1"/>very</p>
          <p>interesting<em end="id1"/> text</p>
        </body>
      ✦ fragmentation: two non-overlapping elements linked through id/idref pairs
        <body>
          <p>Some <em id="em1" next="em2">very</em></p>
          <p><em id="em2">interesting</em> text</p>
        </body>
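The two workarounds can be read back with the XML parsers that ship with the JDK. The following is a minimal sketch, not part of EARMARK: it assumes exactly the markup shown on slide 5 (the `start`, `end`, `id` and `next` attributes come from it, while the class and method names are hypothetical) and recovers the text of the overlapping `em` from each encoding. Collapsing whitespace and joining fragments with a single space are our own choices.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class OverlapTricks {

    private static ByteArrayInputStream utf8(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Milestones: stream the document with SAX and keep every character that occurs
    // after <em start="id"/> and before <em end="id"/>, across paragraph boundaries.
    static String milestoneText(String xml, String id) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("em") && id.equals(atts.getValue("start"))) inside = true;
                if (qName.equals("em") && id.equals(atts.getValue("end"))) inside = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) out.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(utf8(xml), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
        return out.toString().replaceAll("\\s+", " ").trim(); // collapse layout whitespace
    }

    // Fragmentation: load the tree with DOM, index <em> fragments by id, then follow
    // next="..." links from the first fragment, joining the pieces with a space.
    static String fragmentText(String xml, String firstId) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(utf8(xml));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse input", e);
        }
        Map<String, Element> byId = new HashMap<>();
        NodeList ems = doc.getElementsByTagName("em");
        for (int i = 0; i < ems.getLength(); i++) {
            Element em = (Element) ems.item(i);
            byId.put(em.getAttribute("id"), em);
        }
        List<String> parts = new ArrayList<>();
        String current = firstId;
        // getAttribute returns "" when next is absent; the size check stops on cycles.
        while (byId.containsKey(current) && parts.size() < byId.size()) {
            Element em = byId.get(current);
            parts.add(em.getTextContent());
            current = em.getAttribute("next");
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String milestones = "<body>\n<p>Some <em start=\"id1\"/>very</p>\n"
                + "<p>interesting<em end=\"id1\"/> text</p>\n</body>";
        String fragments = "<body>\n<p>Some <em id=\"em1\" next=\"em2\">very</em></p>\n"
                + "<p><em id=\"em2\">interesting</em> text</p>\n</body>";
        System.out.println(milestoneText(milestones, "id1"));
        System.out.println(fragmentText(fragments, "em1"));
    }
}
```

Both calls return "very interesting", but each trick needs its own ad hoc code: the XML parser itself knows nothing about the overlap.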
  • 9. Overlapping everywhere
    • Where we can find it: word processor formats with change tracking (e.g., ODT)
        <office:text>
          <text:changed-region text:id="S1">
            <text:insertion>
              <office:change-info>
                <dc:creator>John Smith</dc:creator>
                <dc:date>2009-10-27T18:45:00</dc:date>
              </office:change-info>
            </text:insertion>
          </text:changed-region>
          <text:p>
            The beginning and <text:change-start text:change-id="S1"/>
          </text:p>
          <text:p>
            also <text:change-end text:change-id="S1"/> the end.
          </text:p>
        </office:text>
    [Figure: "What the document is" (the XML above: office:text with two text:p children) vs. "What the document represents": before the change, one text:p "The beginning and the end." inside office:text; after it, two text:p elements, with "also" inserted by John Smith on 2009-10-27T18:45:00.]
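Because `text:change-start` and `text:change-end` work like the milestones of slide 5, recovering what John Smith inserted takes a streaming pass over the whole document. Below is a minimal sketch with the JDK's namespace-aware SAX parser; it is an illustration, not an ODT library. The namespace declarations are added so the fragment parses on its own, the class and method names are hypothetical, and creators given through `office:chg-author` attributes are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class OdtChanges {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // The slide's fragment, with namespace declarations added so that it parses alone.
    static final String SAMPLE = "<office:text"
            + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
            + " xmlns:text=\"" + TEXT_NS + "\" xmlns:dc=\"" + DC_NS + "\">\n"
            + "<text:changed-region text:id=\"S1\"><text:insertion><office:change-info>\n"
            + "<dc:creator>John Smith</dc:creator><dc:date>2009-10-27T18:45:00</dc:date>\n"
            + "</office:change-info></text:insertion></text:changed-region>\n"
            + "<text:p>The beginning and <text:change-start text:change-id=\"S1\"/></text:p>\n"
            + "<text:p>also <text:change-end text:change-id=\"S1\"/> the end.</text:p>\n"
            + "</office:text>";

    /** Returns {creator, inserted text} for the tracked change with the given id. */
    static String[] insertion(String xml, String changeId) {
        StringBuilder creator = new StringBuilder();
        StringBuilder inserted = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inRegion, inCreator, inChange;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (TEXT_NS.equals(uri) && local.equals("changed-region")) {
                    inRegion = changeId.equals(atts.getValue(TEXT_NS, "id"));
                } else if (DC_NS.equals(uri) && local.equals("creator")) {
                    inCreator = inRegion; // only the creator of the requested region
                } else if (TEXT_NS.equals(uri) && local.equals("change-start")
                        && changeId.equals(atts.getValue(TEXT_NS, "change-id"))) {
                    inChange = true;
                } else if (TEXT_NS.equals(uri) && local.equals("change-end")
                        && changeId.equals(atts.getValue(TEXT_NS, "change-id"))) {
                    inChange = false;
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (TEXT_NS.equals(uri) && local.equals("changed-region")) inRegion = false;
                if (DC_NS.equals(uri) && local.equals("creator")) inCreator = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inCreator) creator.append(ch, start, length);
                if (inChange) inserted.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse ODT fragment", e);
        }
        return new String[] {creator.toString().trim(),
                inserted.toString().replaceAll("\\s+", " ").trim()};
    }

    public static void main(String[] args) {
        String[] change = insertion(SAMPLE, "S1");
        System.out.println(change[1] + " (inserted by " + change[0] + ")");
    }
}
```

Running `main` prints `also (inserted by John Smith)`. The inserted text and its author live in different parts of the tree, so the program has to correlate them by `change-id` itself.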
  • 10. EARMARK is a vocabulary that defines a meta-markup language by means of OWL ontologies: http://www.essepuntato.it/2008/12/earmark
    • It is more expressive than XML:
                          XML                    EARMARK
        Data structure    Tree                   DAG
        Overlapping       Only by using tricks   Of course, it is a feature here
        Semantics         What?                  Yes, it is OWL!
    • Three disjoint base classes:
      ✦ Docuverse: it represents the textual content of a document
        Subclasses: StringDocuverse, URIDocuverse
      ✦ Range: it describes any text lying between two locations
        Subclasses: PointerRange, XPathRange, XPathPointerRange
      ✦ MarkupItem: a collection of individuals belonging to the classes MarkupItem and Range
        Subclasses: Element, Attribute, Comment
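To make the three base classes concrete, here is a hypothetical in-memory model in Java (records, so Java 16 or later). It is not the EARMARK ontology or its library: `Content`, `text()` and `parentCount` are illustrative names, and range offsets are assumed to be 0-based and end-exclusive. It encodes the overlapping `em` of slide 5 as ranges shared by two markup items, which is what turns the tree into a DAG.

```java
import java.util.List;

public class EarmarkModelSketch {
    // Hypothetical, simplified stand-ins for EARMARK's base classes; these are
    // not the classes of the EARMARK ontology or of its Java library.
    interface Content { String text(); }

    record StringDocuverse(String content) {}

    // A range over a docuverse, with 0-based, end-exclusive offsets (an assumption).
    record PointerRange(StringDocuverse docuverse, int begins, int ends) implements Content {
        public String text() { return docuverse.content().substring(begins, ends); }
    }

    // A markup item is an ordered collection of ranges and other markup items.
    record Element(String name, List<Content> items) implements Content {
        public String text() {
            StringBuilder sb = new StringBuilder();
            for (Content item : items) sb.append(item.text());
            return sb.toString();
        }
    }

    // How many of the given elements directly contain the range: more than one
    // means the structure is a DAG rather than a tree.
    static long parentCount(PointerRange range, List<Element> elements) {
        return elements.stream().filter(e -> e.items().contains(range)).count();
    }

    public static void main(String[] args) {
        StringDocuverse doc = new StringDocuverse("Some very interesting text");
        PointerRange some = new PointerRange(doc, 0, 5);         // "Some "
        PointerRange very = new PointerRange(doc, 5, 9);         // "very"
        PointerRange interesting = new PointerRange(doc, 9, 21); // " interesting"
        PointerRange rest = new PointerRange(doc, 21, 26);       // " text"
        Element p1 = new Element("p", List.of(some, very));
        Element p2 = new Element("p", List.of(interesting, rest));
        Element em = new Element("em", List.of(very, interesting)); // spans both paragraphs
        System.out.println(em.text());
        System.out.println(parentCount(very, List.of(p1, p2, em)));
    }
}
```

Here `em.text()` yields "very interesting" and the range "very" has two parents (`p1` and `em`), a configuration an XML tree can only approximate with the tricks of slide 5.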
  • 18. An example
        @prefix earmark: <http://www.essepuntato.it/2008/12/earmark#> .
        @prefix : <http://www.example.com/> .
        @prefix c: <http://swan.mindinformatics.org/ontologies/1.2/collections/> .
        @prefix dc: <http://purl.org/dc/elements/1.1/> .
        @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

        :aDoc a earmark:StringDocuverse ;
            earmark:hasContent "The beginning and the end."^^xsd:string .

        :r2 a earmark:PointerRange ;
            earmark:refersTo :aDoc ;
            earmark:begins "14"^^xsd:nonNegativeInteger ;
            earmark:ends "26"^^xsd:nonNegativeInteger .

        :aMarkupItem a earmark:Element ;
            earmark:hasGeneralIdentifier "p" ;
            earmark:hasNamespace "urn:oasis:names:tc:opendocument:xmlns:text:1.0" ;
            c:firstItem :item1 ;
            c:lastItem :item2 .

        :item1 c:itemContent :r1 ;
            c:nextItem :item2 .
        :item2 c:itemContent :r2 .

        :p2 a :Insertion ;
            dc:creator "John Smith" ;
            dc:date "2009-10-27T18:45:00"^^xsd:dateTime .
    [Figure: the ODT tree (office:text with two text:p children) drawn over the docuverse "The beginning and the end.", with "also" marked as inserted by John Smith.]
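The `c:firstItem` / `c:nextItem` / `c:lastItem` triples form a linked list, so the contents of `:aMarkupItem` are found by walking it. The sketch below uses a toy map instead of an RDF library (an assumption made to keep it self-contained; `CollectionWalk` and its methods are hypothetical). It also resolves `:r2`, reading `earmark:begins` and `earmark:ends` as 0-based, end-exclusive character offsets; that reading is our assumption, since the slide does not define the offsets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollectionWalk {
    // A toy store of single-valued properties: subject -> (predicate -> object).
    // Names are kept as the prefixed strings used in the Turtle on the slide.
    private final Map<String, Map<String, String>> graph = new HashMap<>();

    void add(String subject, String predicate, String object) {
        graph.computeIfAbsent(subject, k -> new HashMap<>()).put(predicate, object);
    }

    String get(String subject, String predicate) {
        return graph.getOrDefault(subject, Map.of()).get(predicate);
    }

    // Ordered contents of a markup item: start at c:firstItem, take each item's
    // c:itemContent, move along c:nextItem, and stop at c:lastItem.
    List<String> contents(String markupItem) {
        List<String> out = new ArrayList<>();
        String last = get(markupItem, "c:lastItem");
        String item = get(markupItem, "c:firstItem");
        while (item != null) {
            out.add(get(item, "c:itemContent"));
            if (item.equals(last)) break;
            item = get(item, "c:nextItem");
        }
        return out;
    }

    // Text selected by a PointerRange, reading earmark:begins / earmark:ends as
    // 0-based, end-exclusive offsets into earmark:hasContent (an assumption).
    String rangeText(String range) {
        String content = get(get(range, "earmark:refersTo"), "earmark:hasContent");
        int begins = Integer.parseInt(get(range, "earmark:begins"));
        int ends = Integer.parseInt(get(range, "earmark:ends"));
        return content.substring(begins, ends);
    }

    public static void main(String[] args) {
        CollectionWalk g = new CollectionWalk();
        g.add(":aDoc", "earmark:hasContent", "The beginning and the end.");
        g.add(":r2", "earmark:refersTo", ":aDoc");
        g.add(":r2", "earmark:begins", "14");
        g.add(":r2", "earmark:ends", "26");
        g.add(":aMarkupItem", "c:firstItem", ":item1");
        g.add(":aMarkupItem", "c:lastItem", ":item2");
        g.add(":item1", "c:itemContent", ":r1");
        g.add(":item1", "c:nextItem", ":item2");
        g.add(":item2", "c:itemContent", ":r2");
        System.out.println(g.contents(":aMarkupItem"));
        System.out.println(g.rangeText(":r2"));
    }
}
```

The walk returns `[:r1, :r2]`; `:r1` is not defined on the slide, so only its name is available. Under the offset reading above, `:r2` selects "and the end.".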
  • 19. EARMARK Data Structure
    • It is an API and a Java library that makes it easy to create and modify EARMARK documents within Java applications
    • Open Source project: http://earmark.sourceforge.net
        EARMARKDocument ed = new EARMARKDocument(new URI("http://www.example.com"));
        Docuverse aDoc = ed.createStringDocuverse("The beginning and the end.");
        [...]
        Range aRange = ed.createPointerRange(aDoc, 14, 26);
        [...]
        Element aMarkupItem = ed.createElement("p",
            "urn:oasis:names:tc:opendocument:xmlns:text:1.0", Collection.Type.List);
        ed.appendChild(anotherMarkupItem);
        [...]
  • 22. Semantic Web technologies as added value
    • Because every EARMARK document is expressed as a proper ABox of an ontology, we can use Semantic Web technologies:
      ✦ to manipulate documents
      ✦ to query them
      ✦ to infer new assertions
      ✦ to check integrity constraints on document structure and on content semantics
    • In EARMARK, these technologies help solve problems that are difficult or impossible to solve with XML tools
    • An example: "get all the text fragments inserted by John Smith"
      ✦ XPath
        for $id in //@text:id[../text:insertion//(dc:creator[. = 'John Smith'] | @office:chg-author[. = 'John Smith'])]
        return //text:p//text()[(preceding-sibling::text:change-start[1][@text:change-id = $id]
                                 and following-sibling::text:change-end[1][@text:change-id = $id])
                                or ancestor::text:changed-region/@text:id = $id]
      ✦ SPARQL
        PREFIX earmark: <http://www.essepuntato.it/2008/12/earmark#>
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX : <http://www.example.com/>
        SELECT ?r WHERE { ?r a earmark:Range , :Insertion ; dc:creator "John Smith" . }
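Without a reasoner, the SPARQL pattern `?r a earmark:Range` does not match a resource asserted only as an `earmark:PointerRange`, so the query relies on subclass inference (or on a store that has materialised it). The stdlib-only sketch below evaluates the same triple pattern over a hand-written list of triples and follows `rdfs:subClassOf` itself. The data, including the range `:rAlso`, is hypothetical, and IRIs and literals are both plain strings; this is an illustration of the pattern, not a SPARQL engine.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class InsertedBy {
    // One RDF statement; IRIs and literals are both plain strings in this sketch.
    record Triple(String s, String p, String o) {}

    // Asserted rdf:type values of a subject plus every superclass reachable through
    // rdfs:subClassOf: a hand-rolled stand-in for RDFS subclass reasoning.
    static Set<String> types(List<Triple> graph, String subject) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        for (Triple t : graph) {
            if (t.s().equals(subject) && t.p().equals("rdf:type")) todo.push(t.o());
        }
        while (!todo.isEmpty()) {
            String cls = todo.pop();
            if (!seen.add(cls)) continue; // already expanded
            for (Triple t : graph) {
                if (t.s().equals(cls) && t.p().equals("rdfs:subClassOf")) todo.push(t.o());
            }
        }
        return seen;
    }

    // Same pattern as the SPARQL query:
    // ?r a earmark:Range , :Insertion ; dc:creator "John Smith" .
    static Set<String> insertedBy(List<Triple> graph, String creator) {
        Set<String> result = new LinkedHashSet<>();
        for (Triple t : graph) {
            if (t.p().equals("dc:creator") && t.o().equals(creator)) {
                Set<String> classes = types(graph, t.s());
                if (classes.contains("earmark:Range") && classes.contains(":Insertion")) {
                    result.add(t.s());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> graph = List.of(
                new Triple("earmark:PointerRange", "rdfs:subClassOf", "earmark:Range"),
                new Triple(":rAlso", "rdf:type", "earmark:PointerRange"), // hypothetical range
                new Triple(":rAlso", "rdf:type", ":Insertion"),
                new Triple(":rAlso", "dc:creator", "John Smith"),
                new Triple(":r2", "rdf:type", "earmark:PointerRange"));
        System.out.println(insertedBy(graph, "John Smith"));
    }
}
```

It prints `[:rAlso]`. The query stays a single declarative pattern however the insertion is laid out in the text, which is the contrast with the XPath expression.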
  • 23. Conclusions and future work
    • We presented a new meta-markup language called EARMARK, defined by means of OWL ontologies, that makes it possible to create very complex markup documents
    • We applied it to a real-case scenario (the ODT format with change tracking), showing that it lets us handle, manipulate and query complex documents better than XML does
    • Future work on this topic includes:
      ✦ Rocco and Fretta, two ongoing projects that transform XML documents (with overlapping markup specified through tricks) into EARMARK documents, and vice versa
      ✦ a formalism to explicitly specify the semantics of markup and of textual content
      ✦ a word processor that makes it very simple to create EARMARK documents, with the possibility of adding any kind of semantic assertion to any entity of the document (both markup items and textual content)
  • 24. Thanks for your attention I think it’s time for questions :-)
  • 25. Late time example: a more complex ODT document...
        <office:text>
          <text:changed-region text:id="S2">
            <text:deletion>
              <office:change-info>
                <dc:creator>Silvio Peroni</dc:creator>
                <dc:date>2009-10-27T18:45:00</dc:date>
              </office:change-info>
              <text:p>.</text:p>
            </text:deletion>
            <text:insertion>
              <office:change-info office:chg-author="Angelo Di Iorio"
                  office:chg-date-time="2009-10-27T18:42:00"/>
            </text:insertion>
          </text:changed-region>
          <text:changed-region text:id="A2">
            <text:insertion>
              <office:change-info>
                <dc:creator>Angelo Di Iorio</dc:creator>
                <dc:date>2009-10-27T18:42:00</dc:date>
              </office:change-info>
            </text:insertion>
          </text:changed-region>
          [...]
          <text:p>This is one paragraph<text:change-start text:change-id="S1"/>;
            actually, it was!<text:change-end text:change-id="S1"/>
            <text:change text:change-id="S2"/>
            <text:change-start text:change-id="A2"/></text:p>
          <text:p><text:change-end text:change-id="A2"/>
            <text:change text:change-id="A3"/><text:change-start text:change-id="A4"/>S
            <text:change-end text:change-id="A4"/>plit in two.</text:p>
        </office:text>
  • 26. ... and its representation in EARMARK
    [Figure: the document above laid out along a time axis in four layers (docuverses, ranges, markup items, assertions). Docuverse strings include "This is one paragraph that will be split in two.", "actually, it was!" and ".S"; ranges r1 to r6 select parts of them and are grouped into p elements; assertions type ranges as text:insertion or text:deletion, with dc:creator "Silvio Peroni" (dc:date 2009-10-27T18:45:00) or "Angelo Di Iorio" (dc:date 2009-10-27T18:42:00). Legend: string in the docuverse, range, begin location, end location, content.]