Dealing with Markup Semantics

Silvio Peroni – speroni@cs.unibo.it
Aldo Gangemi – aldo.gangemi@cnr.it
Fabio Vitali – fabio@cs.unibo.it




http://creativecommons.org/licenses/by-sa/3.0
Summary




•   Semantic markup vs. markup semantics

•   Why markup semantics

•   Why XML is not enough

•   Markup semantics with EARMARK and Linguistic Act

•   Real-world scenarios

•   Conclusions
Shift of meaning


1990, Web of documents: First Era of the Web (WWW)
•   Markup, i.e. document markup: it tells us something about the text or content of a document
•   Tag, i.e. markup element: a syntactic item representing the building block of a document structure
•   Semantics and markup, i.e. markup semantics: “what is the meaning of a markup element title contained in a document d?”

Today, Web of data: Second Era of the Web (Semantic Web)
•   Markup, i.e. resource markup: it is used to identify any data added to a resource with the intention to semantically describe it
•   Tag, i.e. keyword: a non-hierarchical keyword or term assigned to a piece of information (such as an Internet bookmark, digital image or computer file)
•   Semantics and markup, i.e. semantic markup: “the resource r has the string Dealing with Markup Semantics as title”
Markup semantics today


•       Document markup is still here:
    ✦    many research issues are still open problems
    ✦    some of those partially solved issues can be addressed better with
         today’s tools and technologies

•       So, our question is:

        Why has the Semantic Web not yet properly addressed markup semantics?

        Possible answers:
    ✦    Because document markup is dead, really
    ✦    Because markup semantics is not an interesting research topic
    ✦    Because markup semantics is not a useful tool for solving valuable problems
    ✦    Actually, the Semantic Web has addressed markup semantics
The document markup is dead... wait, really?


•   Document markup does not play any important role in
    today’s research fields and company interests
                                              Are we definitely sure?




         Maybe not!
Research groups’ interest in markup semantics


 •       Does it mean that there are no research communities interested in this issue? Well,
         actually, it is an old and still-live issue:
     ✦     Renear, A., Dubin, D., Sperberg-McQueen, C. M. (2002). Towards a Semantics for XML Markup.
     ✦     Dubin, D. (2003). Object mapping for markup semantics.
     ✦     Renear, A., Dubin, D., Sperberg-McQueen, C. M., Huitfeldt, C. (2003). XML Semantics and Digital Libraries.
     ✦     Simons, G. F., Lewis, W. D., Farrar, S. O., Langendoen, D. T., Fitzsimons, B., Gonzalez, H. (2004). The semantics of
           markup: mapping legacy markup schemas to a common semantics.
     ✦     Garcia, R., Celma, O. (2005). Semantic Integration and Retrieval of Multimedia Metadata.
     ✦     Marcoux, Y. (2006). A natural-language approach to modeling: Why is some XML so difficult to write?
     ✦     Van Deursen, D., Poppe, C., Martens, G., Mannens, E., Van de Walle, R. (2008). XML to RDF Conversion: a
           Generic Approach.
     ✦     Marcoux, Y., Rizkallah, E. (2009). Intertextual semantics: A semantics for information design.
     ✦     Sperberg-McQueen, C. M., Marcoux, Y., Huitfeldt, C. (2009). Two representations of the semantics of TEI Lite.
     ✦     Nuzzolese, A., Gangemi, A., Presutti, V. (2010). Gathering Lexical Linked Data and Knowledge Patterns from
           FrameNet.

 •       “The problem addressed seems old and seems to have been solved before, but actually
         has not [sufficiently]”
         – by an anonymous reviewer
Markup semantics and real-world problems


•   Some advantages of having a formal and machine-readable
    semantics of markup:
    ✦   perform both syntactic and semantic validation
    ✦   infer facts from documents automatically
    ✦   simplify the federation, conversion and translation of documents among digital
        repositories
    ✦   query the structure of the document by considering its semantics
    ✦   create visualisations of documents considering the semantics of their
        structures rather than their markup vocabularies
    ✦   increase the accessibility of documents’ content (see the “tag abuse” issue)
    ✦   guarantee a better maintainability when a markup schema evolves

•   Fields of interest: digital libraries and digital (and semantic)
    publishing
Semantic Web approaching markup semantics

 •       RDFa may be a valid choice for associating formal semantics with arbitrary
         text fragments
     ✦    Pros: easy to use and parse, compliant with XML-like formats
     ✦    Cons: we need to modify the structure of the document (more attributes, more elements)

Original document (1 markup element only):

<?xml version="1.0" encoding="UTF-8"?>
<p>Fabio says that overlhappens</p>

RDFa-enhanced document (2 markup elements, 3 attributes):

<?xml version="1.0" encoding="UTF-8"?>
<p prefix=": http://www.example.com/
           foaf: http://xmlns.com/foaf/0.1/">
  <span about=":fv" property="foaf:firstName">Fabio</span>
  says that overlhappens
</p>


 •       There are domains (e.g., those having to deal with administrative and juridical
         documents) in which we cannot modify the structure of documents

 •       How can we say that the element p in the document means “paragraph”?
Our problems in addressing markup semantics

•       Let’s use XML for defining document markup structures
    ✦    Pros: it is today’s common format, used in lots of tools and applications
    ✦    Cons: it does not define a formal way of specifying markup semantics

•       Let’s use OWL for defining formal semantics and then associating it with
        XML markup
    ✦    Pros: OWL was created to define semantics
    ✦    Cons: we have to use XML-based approaches (RDFa, GRDDL) to link semantics to
         XML markup and this is not always possible

•       A compromise between XML and OWL is not fully satisfying

•       A solution: to elevate either the document markup formalism or the
        formal semantics model to the level of the other, which means:
    ✦    to use XML for document markup and another formalism, fully compliant with XML in
         all the possible scenarios, for defining its markup semantics (does it exist?), or
    ✦    to develop an OWL ontology for defining document markup and another OWL
         ontology for specifying its semantics

        (Try to guess what we did.)
•       The Extremely Annotational RDF Markup (EARMARK) is at the
        same time a markup meta-language and an ontology of (document) markup
    ✦    More expressive than XML – it allows markup structures to be
         organised as graphs
    ✦    It makes it easy to associate OWL semantics with document items – an
         EARMARK document is a set of OWL assertions, and all the markup items
         and text nodes are individuals of particular classes
    ✦    Many tools available: a Java API, frameworks to convert XML documents
         into EARMARK ones and to convert complex EARMARK documents (i.e.,
         having a graph structure) into XML ones, applying overlapping tricks
         to store as much information as possible in the simple XML tree
         hierarchy
              more information at http://palindrom.es/phd/research/earmark
An example: XML tricks

[Figure: p spans the whole sentence “Fabio says that overlhappens”; agent, noun and verb mark parts of it]

This is not directly representable in XML (unless using tricks): “noun” and “verb” overlap.

[Figure: to be representable in XML, noun should be split into two noun fragments, the second one nested inside verb]

XML serialisation with TEI fragmentation:

<p>
  <agent>Fabio</agent> says that
  <noun xml:id="e1" next="e2">overl</noun><verb>h<noun xml:id="e2">ap</noun>pens</verb>
</p>
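The fragmentation trick can be undone by a consumer that follows the next pointers. The following is a minimal Python sketch under stated assumptions, not part of the EARMARK tooling: the function name reassemble is hypothetical, the next attribute is read as a bare identifier exactly as written in the slide, and only the standard library is used.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The fragmented serialisation from the slide, written without layout whitespace.
SOURCE = (
    '<p><agent>Fabio</agent> says that '
    '<noun xml:id="e1" next="e2">overl</noun>'
    '<verb>h<noun xml:id="e2">ap</noun>pens</verb></p>'
)

def reassemble(root, first_id):
    """Join the text of a chain of fragments linked through 'next' (hypothetical helper)."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    parts, seen = [], set()
    current = by_id.get(first_id)
    # Stop at the end of the chain, or if a fragment is visited twice.
    while current is not None and current.get(XML_ID) not in seen:
        seen.add(current.get(XML_ID))
        parts.append("".join(current.itertext()))
        current = by_id.get(current.get("next"))
    return "".join(parts)

root = ET.fromstring(SOURCE)
noun = reassemble(root, "e1")                 # the two noun fragments, joined
verb = "".join(root.find("verb").itertext())  # verb text, including the nested fragment
full = "".join(root.itertext())               # the whole sentence
```

The sketch makes the cost of the trick visible: the logical noun only exists after an extra resolution step that plain XML tools do not perform by themselves.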
An example: EARMARK document

[Figure: p spans the whole sentence “Fabio says that overlhappens”; agent, noun and verb mark parts of it, with noun and verb overlapping]

ex:doc a :StringDocuverse;
  :hasContent "Fabio says that overlhappens".

ex:r0-5 a :PointerRange;
  :refersTo ex:doc;
  :begins "0"; :ends "5".

ex:r5-16 a :PointerRange;
  :refersTo ex:doc;
  :begins "5"; :ends "16".

ex:r16-21 a :PointerRange;
  :refersTo ex:doc;
  :begins "16"; :ends "21".

ex:r22-24 a :PointerRange;
  :refersTo ex:doc;
  :begins "22"; :ends "24".

ex:r21-28 a :PointerRange;
  :refersTo ex:doc;
  :begins "21"; :ends "28".

ex:agent a :Element;
  :hasGeneralIdentifier "agent";
  c:firstItem [c:itemContent ex:r0-5].

ex:noun a :Element;
  :hasGeneralIdentifier "noun";
  c:firstItem [c:itemContent ex:r16-21;
    c:nextItem [c:itemContent ex:r22-24]].

ex:verb a :Element;
  :hasGeneralIdentifier "verb";
  c:firstItem [c:itemContent ex:r21-28].

ex:p a :Element; :hasGeneralIdentifier "p";
  c:firstItem [c:itemContent ex:agent; c:nextItem [c:itemContent ex:r5-16;
    c:nextItem [c:itemContent ex:noun; c:nextItem [c:itemContent ex:verb]]]].
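To see why pointer ranges sidestep the overlap problem, the ranges above can be resolved against the docuverse string directly. The following is a minimal Python sketch that mirrors the example with plain dictionaries instead of OWL individuals; the names RANGES, ELEMENTS, element_text and overlaps are hypothetical and are not part of the EARMARK API.

```python
# Simplified stand-in for the EARMARK example: plain data instead of OWL assertions.
DOCUVERSE = "Fabio says that overlhappens"

# Each pointer range is a (begins, ends) pair over DOCUVERSE.
RANGES = {
    "r0-5": (0, 5),
    "r5-16": (5, 16),
    "r16-21": (16, 21),
    "r22-24": (22, 24),
    "r21-28": (21, 28),
}

# Each element lists the ranges it contains, in order.
ELEMENTS = {
    "agent": ["r0-5"],
    "noun": ["r16-21", "r22-24"],
    "verb": ["r21-28"],
}

def element_text(gi):
    """Concatenate the text selected by the ranges of one element."""
    return "".join(DOCUVERSE[b:e] for b, e in (RANGES[r] for r in ELEMENTS[gi]))

def overlaps(a, b):
    """True if some range of element a shares characters with some range of b."""
    return any(
        max(RANGES[x][0], RANGES[y][0]) < min(RANGES[x][1], RANGES[y][1])
        for x in ELEMENTS[a]
        for y in ELEMENTS[b]
    )
```

Because each element simply points into the shared string, noun and verb can select intersecting characters without either one having to be nested in, or split around, the other.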
Towards markup semantics


•   EARMARK is suitable for expressing markup semantics
    straightforwardly using OWL

•   What model can we use? It must:
    ✦   follow precise and theoretically-founded principles
    ✦   be interoperable across different markup vocabularies

•   A large number of vocabularies address the representation of
    terms vs. meanings vs. things – e.g., SKOS, FRBR, CIDOC,
    OWL-WordNet

    Problems:
    ✦   too specific for particular contexts
    ✦   they are not interoperable
Linguistic Act ontology design pattern


•   References: any individual from the
    world we are describing – e.g., Fabio

•   Meanings: any (meta-level) object
    that explains something – e.g., person

•   Information entities: any symbol
    that has a meaning or denotes one or
    more references – e.g., the string
    “Fabio”

•   Linguistic acts: any communicative
    situation including information entities,
    agents, meanings, references, and a
    possible spatio-temporal context – e.g.,
    to add markup to a document
             http://ontologydesignpatterns.org/cp/owl/semantics.owl
Example: “Results” section of a paper

2 XML excerpts of “Results” sections

Excerpt 1:
 <div class="section">
   <h1>Results</h1>
   <p>...</p>
 </div>

Excerpt 2:
 <section>
   <info>
     <title>Results</title>
   </info>
   <para>...</para>
 </section>

Related EARMARK conversions

Excerpt 1:
ex1:div a :Element;
  :hasGeneralIdentifier "div";
  c:firstItem [c:itemContent ex1:class;
    c:nextItem [c:itemContent ex1:h1;
    c:nextItem [c:itemContent ex1:p]]];
  la:expresses doco:Section, deo:Results.
...
ex1:p a :Element;
  :hasGeneralIdentifier "p";
  c:firstItem [c:itemContent ex1:someText];
  la:expresses doco:Paragraph.
...

Excerpt 2:
ex2:section a :Element;
  :hasGeneralIdentifier "section";
  c:firstItem [c:itemContent ex2:info;
    c:nextItem [c:itemContent ex2:para]];
  la:expresses doco:Section, deo:Results.
...
ex2:para a :Element;
  :hasGeneralIdentifier "para";
  c:firstItem [c:itemContent ex2:someText];
  la:expresses doco:Paragraph.
...

We are using the Document Components Ontology (http://purl.org/spar/doco) and
the Discourse Elements Ontology (http://purl.org/spar/deo) to specify the semantics of markup elements
Searches on heterogeneous repositories


 •   Problem: how to search across a large number of digital
     libraries that store documents as XML in different and
     non-interoperable formats?

 •   Query: give me all the markup elements that represent
     paragraphs of any “Results” section of any available document that
     were written by any person called Fabio
SELECT ?x WHERE {
  ?x a :Element ; la:expresses doco:Paragraph ;
    dc:creator [a foaf:Person ; foaf:name "Fabio"];
    (^c:itemContent/^c:item)+
      [a :Element; la:expresses doco:Section , deo:Results]
}

                     ex1:p and ex2:para are returned
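The logic of the query can be mimicked on a toy dataset. The following is a minimal Python sketch under stated assumptions: the dictionary layout, the parent links (standing in for the inverse c:itemContent/c:item path), the third document ex3 and the function name paragraphs_in_results are all hypothetical, and no SPARQL engine is involved.

```python
# Hypothetical in-memory stand-in for EARMARK elements from three documents.
ELEMENTS = {
    "ex1:div": {"gi": "div", "parent": None,
                "expresses": {"doco:Section", "deo:Results"}},
    "ex1:p": {"gi": "p", "parent": "ex1:div",
              "expresses": {"doco:Paragraph"}, "creator": "Fabio"},
    "ex2:section": {"gi": "section", "parent": None,
                    "expresses": {"doco:Section", "deo:Results"}},
    "ex2:para": {"gi": "para", "parent": "ex2:section",
                 "expresses": {"doco:Paragraph"}, "creator": "Fabio"},
    # A paragraph outside any "Results" section, added to show the filter.
    "ex3:sec": {"gi": "sec", "parent": None,
                "expresses": {"doco:Section"}},
    "ex3:p": {"gi": "p", "parent": "ex3:sec",
              "expresses": {"doco:Paragraph"}, "creator": "Fabio"},
}

def ancestors(name):
    """Yield every element that contains the given one, innermost first."""
    parent = ELEMENTS[name]["parent"]
    while parent is not None:
        yield parent
        parent = ELEMENTS[parent]["parent"]

def paragraphs_in_results(creator):
    """Paragraph elements by the given creator inside a Results section."""
    wanted = {"doco:Section", "deo:Results"}
    return sorted(
        name
        for name, el in ELEMENTS.items()
        if "doco:Paragraph" in el["expresses"]
        and el.get("creator") == creator
        and any(wanted <= ELEMENTS[a]["expresses"] for a in ancestors(name))
    )
```

Note that the selection never looks at the general identifiers (p, para, div, section): only the expressed meanings matter, which is what makes the search independent of the markup vocabulary.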
Semantic format conversion


 •   Problem: how to convert a document from an (unknown) format
     into a target one, without knowing the markup vocabulary of the
     former but being able to query its semantics

 •   Convert: substitute any markup element representing a section
     with a new one named “sec” that contains the same elements and
     text content of the removed one
        DELETE {?s :hasGeneralIdentifier ?gi}
        INSERT {?s :hasGeneralIdentifier "sec"}
        WHERE {
          ?s a :Element; :hasGeneralIdentifier ?gi;
          la:expresses doco:Section
        }
The previous excerpts change as follows:

<sec class="section">
  <h1>Results</h1>
  ...

<sec>
  <info>
    <title>Results</title>
    ...
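The effect of the update can be sketched on plain data. The following minimal Python example is an illustration under stated assumptions rather than the actual conversion framework: the dictionary layout and the function name rename_sections are hypothetical.

```python
# Hypothetical stand-in for the elements of the two excerpts.
ELEMENTS = {
    "ex1:div": {"gi": "div", "expresses": {"doco:Section", "deo:Results"}},
    "ex1:p": {"gi": "p", "expresses": {"doco:Paragraph"}},
    "ex2:section": {"gi": "section", "expresses": {"doco:Section", "deo:Results"}},
    "ex2:para": {"gi": "para", "expresses": {"doco:Paragraph"}},
}

def rename_sections(elements, new_gi="sec"):
    """Return a copy in which every element expressing doco:Section is renamed."""
    return {
        name: {**el, "gi": new_gi} if "doco:Section" in el["expresses"] else dict(el)
        for name, el in elements.items()
    }

converted = rename_sections(ELEMENTS)
```

As in the DELETE/INSERT form, only the general identifier changes: content, attributes and expressed meanings of each element are carried over untouched.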
Markup sensibility

•   Problem: how to assess whether a markup element that is valid at the syntactic
    and structural level is also valid at the semantic level

•   Semantic constraints can be defined as ontological axioms of the underlying
    ontology, in order to understand whether a document adheres to them or
    contradicts them
             <smith> a :Element; :hasGeneralIdentifier "TLCPerson";
               la:denotes </ontology/uk/person/JohnSmith> ...
             </ontology/uk/person/JohnSmith> a akomantoso:Person.

     <akomaNtoso> ...
       <TLCPerson id="smith" href="/ontology/uk/person/JohnSmith" /> ...
       <speech id="sp_1" by="#smith" as="#mineconomy">
         <p>Honorable Members of the Parliament...</p>
       </speech> ...
     </akomaNtoso>

     <sp_1> a :Element; :hasGeneralIdentifier "speech";
       la:expresses akomantoso:Speech; la:denotes _:aSpeechEvent; ...
     _:aSpeechEvent a akomantoso:SpeechEvent;
       akomantoso:hasSpeaker </ontology/uk/person/JohnSmith>.
     [] a la:LinguisticAct; sit:isSettingFor <sp_1>, akomantoso:Speech,
       </ontology/uk/person/JohnSmith>, _:aSpeechEvent.
Verifying semantic constraints

•   Verify: check whether the markup element “speech” denotes a particular
    speech event that involves at least one person, and only persons, as speaker,
    each introduced in the document through a markup element
(Element that hasGeneralIdentifier value "speech")
SubClassOf
(sit:hasSetting only
  (la:LinguisticAct that
    sit:isSettingFor exactly 1 (Element and la:InformationEntity)
    and
    sit:isSettingFor exactly 1 (
      (akomantoso:SpeechEvent and la:Reference)
      that
      akomantoso:hasSpeaker some (
        akomantoso:Person that la:isDenotedBy some Element
      )
    )
    and
    sit:isSettingFor value akomantoso:Speech
  )
)
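The intent of the axiom can be approximated procedurally on the XML source. The following is a minimal Python sketch and not OWL reasoning: it only checks that the by attribute of each speech points at a TLCPerson declared in the same document. The sample document is hypothetical (the second speech is added to show a violation), as is the function name check_speeches.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the Akoma Ntoso excerpt; sp_2 is invented
# here to illustrate a speech whose speaker is never introduced.
SAMPLE = """
<akomaNtoso>
  <TLCPerson id="smith" href="/ontology/uk/person/JohnSmith"/>
  <speech id="sp_1" by="#smith" as="#mineconomy">
    <p>Honorable Members of the Parliament...</p>
  </speech>
  <speech id="sp_2" by="#nobody">
    <p>...</p>
  </speech>
</akomaNtoso>
"""

def check_speeches(root):
    """Map each speech id to True if its speaker is a declared TLCPerson."""
    persons = {el.get("id") for el in root.iter("TLCPerson")}
    result = {}
    for speech in root.iter("speech"):
        by = speech.get("by", "")
        # A valid reference is a fragment identifier naming a declared person.
        result[speech.get("id")] = by.startswith("#") and by[1:] in persons
    return result

report = check_speeches(ET.fromstring(SAMPLE))
```

An OWL reasoner working on the EARMARK version would derive the same kind of verdict from the axiom itself, without a hand-written check per constraint.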
Conclusions

•       The issue of markup semantics is still an interesting research field, with a lot of
        possible applications in real-world scenarios

•       We proposed our approach for addressing markup semantics through Semantic
        Web technologies and we introduced EARMARK, as a new document markup
        meta-language, and the Linguistic Act ontology design pattern for expressing
        semantics of EARMARK document markup

•       We showed how to use these models for addressing real scenarios in which the
        use of markup semantics can help when doing particular tasks, such as querying
        on heterogeneous document repositories, converting document markup across
        different vocabularies, and verifying the validity of markup elements at a semantic
        level

•       Future development:
    ✦     a software assistant that helps users in the definition of markup semantics of a given XML schema
    ✦     two applications for the semantic validation of markup documents and for the visualisation of
          document parts according to their semantics
Thanks for your attention

A document-inspired way for tracking changes of RDF data - The case of the Op...A document-inspired way for tracking changes of RDF data - The case of the Op...
A document-inspired way for tracking changes of RDF data - The case of the Op...
 
A Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology DevelopmentA Simplified Agile Methodology for Ontology Development
A Simplified Agile Methodology for Ontology Development
 
FOOD: FOod in Open Data
FOOD: FOod in Open DataFOOD: FOod in Open Data
FOOD: FOod in Open Data
 
Freedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations ariseFreedom for bibliographic references: OpenCitations arise
Freedom for bibliographic references: OpenCitations arise
 
A pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflowsA pattern-based ontology for describing publishing workflows
A pattern-based ontology for describing publishing workflows
 
Semantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing togetherSemantic lenses to bring digital and semantic publishing together
Semantic lenses to bring digital and semantic publishing together
 
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
Zeri e LODE
: Extracting the Zeri photo archive to Linked Open Data: formaliz...
 
Characterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experimentCharacterising citations in scholarly articles: an experiment
Characterising citations in scholarly articles: an experiment
 
Bringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointersBringing semantic publishing into TEI: ideas and pointers
Bringing semantic publishing into TEI: ideas and pointers
 
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
Tracking Changes through EARMARK: a Theoretical Perspective and an Implementa...
 
Towards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citationsTowards the automatic identification of the nature of citations
Towards the automatic identification of the nature of citations
 
The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...The Live OWL Documentation Environment: a tool for the automatic generation o...
The Live OWL Documentation Environment: a tool for the automatic generation o...
 
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
Scholarly publishing and Linked Data: describing roles, statuses, temporal an...
 
Embedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approachEmbedding semantic annotations within texts: the FRETTA approach
Embedding semantic annotations within texts: the FRETTA approach
 
Handling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWLHandling Markup Overlaps Using OWL
Handling Markup Overlaps Using OWL
 

Kürzlich hochgeladen

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Kürzlich hochgeladen (20)

What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Understanding Markup Semantics

  • 1. Dealing with Markup Semantics Silvio Peroni – speroni@cs.unibo.it Aldo Gangemi – aldo.gangemi@cnr.it Fabio Vitali – fabio@cs.unibo.it http://creativecommons.org/licenses/by-sa/3.0
  • 2. Summary • Semantic markup vs. markup semantics • Why markup semantics • Why XML is not enough • Markup semantics with EARMARK and Linguistic Act • Real-world scenarios • Conclusions
  • 3. Shift of meaning
      ✦ First Era of the Web (WWW): 1990, Web of documents. Markup: document markup, it tells us something about the text or content of a document. Tag: markup element, a syntactic item representing the building block of a document structure. Semantics and markup: markup semantics, “what is the meaning of a markup element title contained in a document d?”
      ✦ Second Era of the Web (Semantic Web): today, Web of data. Markup: resource markup, it is used to identify any data added to a resource with the intention to semantically describe it. Tag: keyword, a non-hierarchical keyword or term assigned to a piece of information (such as an Internet bookmark, digital image or computer file). Semantics and markup: semantic markup, “the resource r has the string Dealing with Markup Semantics as title”
  • 4. Markup semantics today • The document markup is still here: ✦ many research issues are still open problems ✦ some of those partially solved issues can be addressed in a better way with today's tools and technologies • So, our question is: why has the Semantic Web not yet properly addressed markup semantics? Possible answers: ✦ Because the document markup is dead, really ✦ Because markup semantics is not an interesting research topic ✦ Because markup semantics is not a useful tool for solving valuable problems ✦ Actually, the Semantic Web addressed markup semantics
  • 5. The document markup is dead... wait, really? • The document markup does not play any important role in today's research fields and company interests. Are we definitely sure? Maybe not!
  • 6. Research groups’ interest in markup semantics • Does it mean that there are no research communities interested in this issue? Well, actually, it is an old and still-open issue: ✦ Renear, A., Dubin, D., Sperberg-McQueen, C. M. (2002). Towards a Semantics for XML Markup. ✦ Dubin, D. (2003). Object mapping for markup semantics. ✦ Renear, A., Dubin, D., Sperberg-McQueen, C. M., Huitfeldt, C. (2003). XML Semantics and Digital Libraries. ✦ Simons, G. F., Lewis, W. D., Farrar, S. O., Langendoen, D. T., Fitzsimons, B., Gonzalez, H. (2004). The semantics of markup: mapping legacy markup schemas to a common semantics. ✦ Garcia, R., Celma, O. (2005). Semantic Integration and Retrieval of Multimedia Metadata. ✦ Marcoux, Y. (2006). A natural-language approach to modeling: Why is some XML so difficult to write? ✦ Van Deursen, D., Poppe, C., Martens, G., Mannens, E., Van de Walle, R. (2008). XML to RDF Conversion: a Generic Approach. ✦ Marcoux, Y., Rizkallah, E. (2009). Intertextual semantics: A semantics for information design. ✦ Sperberg-McQueen, C. M., Marcoux, Y., Huitfeldt, C. (2009). Two representations of the semantics of TEI Lite. ✦ Nuzzolese, A., Gangemi, A., Presutti, V. (2010). Gathering Lexical Linked Data and Knowledge Patterns from FrameNet. • “The problem addressed seems old and seems to have been solved before, but actually has not [sufficiently]” – by an anonymous reviewer
  • 7. Markup semantics and real-world problems • Some advantages when having a formal and machine-readable semantics of markup: ✦ perform both syntactic and semantic validation ✦ infer facts from documents automatically ✦ simplify the federation, conversion and translation of documents among digital repositories ✦ query upon the structure of the document by considering its semantics ✦ create visualisations of documents considering the semantics of their structures rather than their markup vocabularies ✦ increase the accessibility of documents’ content (see the “tag abuse” issue) ✦ guarantee a better maintainability when a markup schema evolves • Fields of interest: digital libraries and digital (and semantic) publishing
  • 8. Semantic Web approaching markup semantics • RDFa may be a valid choice for associating formal semantics with arbitrary text fragments ✦ Pros: easy to use and parse, compliant with XML-like formats ✦ Cons: we need to modify the structure of the document (more attributes, more elements)
      Original document (1 markup element only):
        <?xml version="1.0" encoding="UTF-8"?>
        <p>Fabio says that overlhappens</p>
      After RDFa enhancing (2 markup elements, 3 attributes):
        <?xml version="1.0" encoding="UTF-8"?>
        <p prefix=": http://www.example.com/ foaf: http://xmlns.com/foaf/0.1/">
          <span about=":fv" property="foaf:firstName">Fabio</span>
          says that overlhappens
        </p>
      • There are domains (e.g., those having to deal with administrative and juridical documents) in which we cannot modify the structure of documents • How can we say that the element p in the document means “paragraph”?
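The structural cost of RDFa mentioned on this slide can be checked mechanically. The following is a minimal sketch, assuming the two XML excerpts from the slide; the helper names `markup_size` and `text_content` are illustrative and not part of any RDFa tooling. It counts elements and attributes before and after the enhancement and confirms that the text content itself is unchanged.

```python
import xml.etree.ElementTree as ET

# The two excerpts from the slide (XML declaration omitted for brevity).
PLAIN = "<p>Fabio says that overlhappens</p>"
RDFA = (
    '<p prefix=": http://www.example.com/ foaf: http://xmlns.com/foaf/0.1/">'
    '<span about=":fv" property="foaf:firstName">Fabio</span>'
    " says that overlhappens</p>"
)


def markup_size(xml_text):
    """Return (number of elements, number of attributes) in an XML fragment."""
    elements = list(ET.fromstring(xml_text).iter())
    attributes = sum(len(el.attrib) for el in elements)
    return len(elements), attributes


def text_content(xml_text):
    """Return the concatenated character data, ignoring all markup."""
    return "".join(ET.fromstring(xml_text).itertext())


print(markup_size(PLAIN))  # (1, 0)
print(markup_size(RDFA))   # (2, 3)
print(text_content(PLAIN) == text_content(RDFA))  # True
```

The text is identical in both versions; only the markup structure grows, which is exactly what some domains cannot accept.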
  • 9. Our problems in addressing markup semantics • Let’s use XML for defining document markup structures ✦ Pros: it is today's common format, used in many tools and applications ✦ Cons: it does not define a formal way for specifying markup semantics • Let’s use OWL for defining formal semantics and then associating it to XML markup ✦ Pros: OWL was created to define semantics ✦ Cons: we have to use XML-based approaches (RDFa, GRDDL) to link semantics to XML markup and this is not always possible • A compromise between XML and OWL is not fully satisfying • A solution: to elevate either the document markup formalism or the formal semantics model to the level of the other, that means: ✦ to use XML for document markup and another formalism, fully compliant with XML in all the possible scenarios, for defining its markup semantics (does it exist?), or ✦ to develop an OWL ontology for defining document markup and another OWL ontology for specifying its semantics (try to guess what we did)
  • 10. The Extremely Annotational RDF Markup (EARMARK) is at the same time a markup meta-language and an ontology of (document) markup ✦ More expressive than XML – it allows markup structures to be organised as graphs ✦ It makes it easy to associate OWL semantics to document items – an EARMARK document is a set of OWL assertions, all the markup items and text nodes are individuals of particular classes ✦ Many tools available: a Java API, frameworks to convert XML documents into EARMARK ones and to convert complex EARMARK documents (i.e., having a graph structure) into XML ones, applying overlapping tricks to store as much information as possible into the simple XML tree hierarchy. More information at http://palindrom.es/phd/research/earmark
  • 11. An example: XML tricks • In the text “Fabio says that overlhappens”, the element p contains the elements agent, noun and verb. This is not directly representable in XML (unless using tricks): “noun” and “verb” overlap • To be representable in XML it should be split into fragments, as in this XML serialisation with TEI fragmentation:
        <p>
          <agent>Fabio</agent> says that
          <noun xml:id="e1" next="e2">overl</noun><verb>h<noun xml:id="e2">ap</noun>pens</verb>
        </p>
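To see what the fragmentation trick costs a consumer of the document, here is a small sketch that reassembles the logical noun from its two fragments by following the `next` pointer. It assumes the serialisation shown on the slide; the function name `logical_text` is hypothetical, and real TEI processing would need to handle more linking attributes than this.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

FRAGMENTED = (
    "<p><agent>Fabio</agent> says that "
    '<noun xml:id="e1" next="e2">overl</noun>'
    '<verb>h<noun xml:id="e2">ap</noun>pens</verb></p>'
)


def logical_text(root, first_id):
    """Join the text of a fragmented element by following its next pointers."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    parts = []
    current = by_id[first_id]
    while current is not None:
        parts.append("".join(current.itertext()))
        # A fragment without a next attribute ends the chain.
        current = by_id.get(current.get("next"))
    return "".join(parts)


root = ET.fromstring(FRAGMENTED)
print(logical_text(root, "e1"))                # overlap
print("".join(root.find("verb").itertext()))   # happens
```

The single logical noun only exists after this extra linking step, which is the kind of workaround a graph-based model is meant to avoid.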
  • 12. An example: EARMARK document • The same text, “Fabio says that overlhappens”, with the element p containing the overlapping elements agent, noun and verb:
        ex:doc a :StringDocuverse;
          :hasContent "Fabio says that overlhappens".

        ex:r0-5 a :PointerRange; :refersTo ex:doc; :begins "0"; :ends "5".
        ex:r5-16 a :PointerRange; :refersTo ex:doc; :begins "5"; :ends "16".
        ex:r16-21 a :PointerRange; :refersTo ex:doc; :begins "16"; :ends "21".
        ex:r22-24 a :PointerRange; :refersTo ex:doc; :begins "22"; :ends "24".
        ex:r21-28 a :PointerRange; :refersTo ex:doc; :begins "21"; :ends "28".

        ex:agent a :Element; :hasGeneralIdentifier "agent";
          c:firstItem [c:itemContent ex:r0-5].
        ex:noun a :Element; :hasGeneralIdentifier "noun";
          c:firstItem [c:itemContent ex:r16-21;
            c:nextItem [c:itemContent ex:r22-24]].
        ex:verb a :Element; :hasGeneralIdentifier "verb";
          c:firstItem [c:itemContent ex:r21-28].
        ex:p a :Element; :hasGeneralIdentifier "p";
          c:firstItem [c:itemContent ex:agent;
            c:nextItem [c:itemContent ex:r5-16;
              c:nextItem [c:itemContent ex:noun;
                c:nextItem [c:itemContent ex:verb]]]].
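The pointer ranges on this slide can be read as plain character offsets into the docuverse string. The sketch below is an illustration of that idea only, not the EARMARK Java API: the `PointerRange` dataclass, the `ELEMENTS` dictionary and both helper functions are assumptions introduced here. It resolves each element to its text and detects that noun and verb overlap.

```python
from dataclasses import dataclass

DOCUVERSE = "Fabio says that overlhappens"


@dataclass(frozen=True)
class PointerRange:
    """A half-open character range [begins, ends) over the docuverse."""
    begins: int
    ends: int

    def text(self, docuverse):
        return docuverse[self.begins:self.ends]


# Each element lists the ranges it contains, in order (offsets from the slide).
ELEMENTS = {
    "agent": [PointerRange(0, 5)],
    "noun": [PointerRange(16, 21), PointerRange(22, 24)],
    "verb": [PointerRange(21, 28)],
}


def element_text(name):
    """Concatenate the text selected by all ranges of an element."""
    return "".join(r.text(DOCUVERSE) for r in ELEMENTS[name])


def overlap(a, b):
    """True if any range of element a shares characters with a range of b."""
    return any(
        x.begins < y.ends and y.begins < x.ends
        for x in ELEMENTS[a]
        for y in ELEMENTS[b]
    )


print(element_text("agent"))    # Fabio
print(element_text("noun"))     # overlap
print(element_text("verb"))     # happens
print(overlap("noun", "verb"))  # True
```

Because elements point at ranges instead of nesting inside each other, the overlap needs no fragmentation at all.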
  • 13. Towards markup semantics • EARMARK is suitable for expressing markup semantics straightforwardly using OWL • What model can we use? It must: ✦ follow precise and theoretically-founded principles ✦ be interoperable across different markup vocabularies • A large number of vocabularies address the representation of terms vs. meanings vs. things – e.g., SKOS, FRBR, CIDOC, OWL-WordNet • Problems: ✦ they are too specific for particular contexts ✦ they are not interoperable
  • 14. Linguistic Act ontology design pattern • References: any individual from the world we are describing – e.g., Fabio • Meanings: any (meta-level) object that explains something – e.g., person • Information entities: any symbol that has a meaning or denotes one or more references – e.g., the string “Fabio” • Linguistic acts: any communicative situation including information entities, agents, meanings, references, and a possible spatio-temporal context – e.g., to add markup to a document http://ontologydesignpatterns.org/cp/owl/semantics.owl
  • 15. Example: “Results” section of a paper • 2 XML excerpts of “Results” sections:
        <div class="section">
          <h1>Results</h1>
          <p>...</p>
        </div>

        <section>
          <info>
            <title>Results</title>
          </info>
          <para>...</para>
        </section>
      • Related EARMARK conversions:
        ex1:div a :Element;
          :hasGeneralIdentifier "div";
          c:firstItem [c:itemContent ex1:class;
            c:nextItem [c:itemContent ex1:h1;
              c:nextItem [c:itemContent ex1:p]]];
          la:expresses doco:Section, deo:Results.
        ...
        ex1:p a :Element;
          :hasGeneralIdentifier "p";
          c:firstItem [c:itemContent ex1:someText];
          la:expresses doco:Paragraph.
        ...

        ex2:section a :Element;
          :hasGeneralIdentifier "section";
          c:firstItem [c:itemContent ex2:info;
            c:nextItem [c:itemContent ex2:para]];
          la:expresses doco:Section, deo:Results.
        ...
        ex2:para a :Element;
          :hasGeneralIdentifier "para";
          c:firstItem [c:itemContent ex2:someText];
          la:expresses doco:Paragraph.
        ...
      • We are using the Document Components Ontology (http://purl.org/spar/doco) and the Discourse Elements Ontology (http://purl.org/spar/deo) to specify the semantics of markup elements
  • 16. Searches on heterogeneous repositories • Problem: how to search for something across a large number of digital libraries that store documents as XML in different and non-interoperable formats? • Query: give me all the markup elements that represent paragraphs of any “Results” section of any available document that were written by any person called Fabio
        SELECT ?x WHERE {
          ?x a :Element ;
            la:expresses doco:Paragraph ;
            dc:creator [a foaf:Person ; foaf:name "Fabio"] ;
            (^c:itemContent/^c:item)+ [a :Element ;
              la:expresses doco:Section , deo:Results]
        }
      • ex1:p and ex2:para are returned
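The idea behind the query can be sketched without a SPARQL engine: select elements by what they express, never by their tag names. The following is a simplified in-memory illustration, not the authors' implementation. The `Element` class, the `creator` field (standing in for the dc:creator statements the query assumes) and the decoy document `ex3` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Element:
    name: str                      # identifier, e.g. "ex1:p"
    gi: str                        # general identifier (the tag name)
    expresses: Set[str] = field(default_factory=set)
    creator: Optional[str] = None  # assumed stand-in for dc:creator
    children: List["Element"] = field(default_factory=list)


RESULTS_SECTION = {"doco:Section", "deo:Results"}


def paragraphs_in_results(roots, author):
    """Names of paragraph elements by `author` nested in a Results section."""
    found = []

    def walk(element, inside_results):
        inside = inside_results or RESULTS_SECTION <= element.expresses
        for child in element.children:
            if (inside and "doco:Paragraph" in child.expresses
                    and child.creator == author):
                found.append(child.name)
            walk(child, inside)

    for root in roots:
        walk(root, False)
    return found


ex1 = Element("ex1:div", "div", set(RESULTS_SECTION), children=[
    Element("ex1:h1", "h1"),
    Element("ex1:p", "p", {"doco:Paragraph"}, "Fabio"),
])
ex2 = Element("ex2:section", "section", set(RESULTS_SECTION), children=[
    Element("ex2:info", "info"),
    Element("ex2:para", "para", {"doco:Paragraph"}, "Fabio"),
])
# Decoy: a section that is not a Results section.
ex3 = Element("ex3:div", "div", {"doco:Section"}, children=[
    Element("ex3:p", "p", {"doco:Paragraph"}, "Fabio"),
])

print(paragraphs_in_results([ex1, ex2, ex3], "Fabio"))  # ['ex1:p', 'ex2:para']
```

The function never inspects `gi`, so the two vocabularies are handled uniformly, which mirrors what the query achieves through la:expresses.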
  • 17. Semantic format conversion • Problem: how to convert a document from an (unknown) format into a target one, without knowing the markup vocabulary of the former but having the possibility of querying its semantics • Convert: substitute any markup element representing a section with a new one named “sec” that contains the same elements and text content of the removed one
        DELETE {?s :hasGeneralIdentifier ?gi}
        INSERT {?s :hasGeneralIdentifier "sec"}
        WHERE {
          ?s a :Element;
            :hasGeneralIdentifier ?gi;
            la:expresses doco:Section
        }
      • The previous excerpts change to:
        <sec class="section"> <h1>Results</h1> ...
        <sec> <info> <title>Results</title> ...
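The effect of the update can be illustrated on plain dictionaries. This is a minimal sketch under the assumption that each element carries its general identifier, its attributes and the classes it expresses; the dictionary layout and the helper names `rename_sections` and `open_tag` are invented for the example and are not EARMARK API.

```python
ELEMENTS = [
    {"id": "ex1:div", "gi": "div", "attrs": {"class": "section"},
     "expresses": {"doco:Section", "deo:Results"}},
    {"id": "ex1:h1", "gi": "h1", "attrs": {}, "expresses": set()},
    {"id": "ex2:section", "gi": "section", "attrs": {},
     "expresses": {"doco:Section", "deo:Results"}},
]


def rename_sections(elements, new_gi="sec"):
    """Return copies where every element expressing doco:Section is renamed.

    Only the general identifier changes; attributes and content are kept,
    mirroring the DELETE/INSERT pair of the update.
    """
    return [
        {**el, "gi": new_gi} if "doco:Section" in el["expresses"] else el
        for el in elements
    ]


def open_tag(el):
    """Serialise the opening tag of an element."""
    attrs = "".join(f' {key}="{value}"' for key, value in el["attrs"].items())
    return f"<{el['gi']}{attrs}>"


converted = rename_sections(ELEMENTS)
print([open_tag(el) for el in converted])
# ['<sec class="section">', '<h1>', '<sec>']
```

The source vocabulary is never named in `rename_sections`: the selection depends only on the expressed semantics.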
  • 18. Markup sensibility • Problem: how to estimate whether a markup element that is valid at the syntactical and structural level is also valid at the semantic level • Semantic constraints can be defined as ontological axioms of the underlying ontology, in order to understand whether a document is adhering to or in contrast with them
      Source document:
        <akomaNtoso> ...
          <TLCPerson id="smith" href="/ontology/uk/person/JohnSmith" /> ...
          <speech id="sp_1" by="#smith" as="#mineconomy">
            <p>Honorable Members of the Parliament...</p>
          </speech> ...
        </akomaNtoso>
      Related EARMARK statements:
        <smith> a :Element;
          :hasGeneralIdentifier "TLCPerson";
          la:denotes </ontology/ul/person/JohnSmith> ...
        </ontology/ul/person/JohnSmith> a akomantoso:Person.

        <sp_1> a :Element;
          :hasGeneralIdentifier "speech";
          la:expresses akomantoso:Speech;
          la:denotes _:aSpeechEvent; ...
        _:aSpeechEvent a akomantoso:SpeechEvent;
          akomantoso:hasSpeaker </ontology/ul/person/JohnSmith>.

        [] a la:LinguisticAct;
          sit:isSettingFor <sp_1>, akomantoso:Speech,
            </ontology/ul/person/JohnSmith>, _:aSpeechEvent.
  • 19. Verifying semantic constraints • Verify: check whether the markup element “speech” denotes a particular speech event that involves one or more speakers, each of whom is a person introduced in the document through a markup element
        (Element that hasGeneralIdentifier value "speech")
        SubClassOf
        (sit:hasSetting only
          (la:LinguisticAct
            that sit:isSettingFor exactly 1 (Element and la:InformationEntity)
            and sit:isSettingFor exactly 1 (
              (akomantoso:SpeechEvent and la:Reference)
              that akomantoso:hasSpeaker some (
                akomantoso:Person that la:isDenotedBy some Element ) )
            and sit:isSettingFor value akomantoso:Speech ) )
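The constraint can be approximated procedurally over a handful of triples. Note the swap: the sketch below is a closed-world check written by hand, not OWL reasoning, and it covers only the speaker part of the axiom (it ignores the linguistic act and its settings). The triple tuples and all function names are assumptions for illustration; the identifiers follow the statements on the previous slide.

```python
JOHN = "/ontology/ul/person/JohnSmith"

TRIPLES = {
    ("smith", "rdf:type", ":Element"),
    ("smith", ":hasGeneralIdentifier", "TLCPerson"),
    ("smith", "la:denotes", JOHN),
    (JOHN, "rdf:type", "akomantoso:Person"),
    ("sp_1", "rdf:type", ":Element"),
    ("sp_1", ":hasGeneralIdentifier", "speech"),
    ("sp_1", "la:denotes", "_:aSpeechEvent"),
    ("_:aSpeechEvent", "rdf:type", "akomantoso:SpeechEvent"),
    ("_:aSpeechEvent", "akomantoso:hasSpeaker", JOHN),
}


def objects(triples, subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    return {s for s, p, o in triples if p == predicate and o == obj}


def is_introduced_person(triples, who):
    """A person that is denoted by at least one markup element."""
    is_person = "akomantoso:Person" in objects(triples, who, "rdf:type")
    denoted_by = subjects(triples, "la:denotes", who)
    return is_person and any(
        ":Element" in objects(triples, el, "rdf:type") for el in denoted_by
    )


def speech_is_sensible(triples, element):
    """True if the element denotes a speech event whose speakers are all
    persons introduced through markup, with at least one speaker."""
    for event in objects(triples, element, "la:denotes"):
        if "akomantoso:SpeechEvent" not in objects(triples, event, "rdf:type"):
            continue
        speakers = objects(triples, event, "akomantoso:hasSpeaker")
        if speakers and all(is_introduced_person(triples, s) for s in speakers):
            return True
    return False


print(speech_is_sensible(TRIPLES, "sp_1"))  # True

# Removing the TLCPerson element's denotation breaks the constraint.
broken = TRIPLES - {("smith", "la:denotes", JOHN)}
print(speech_is_sensible(broken, "sp_1"))   # False
```

An OWL reasoner works under the open-world assumption and would treat missing statements differently, so this check is only a rough analogue of validating against the axiom.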
  • 20. Conclusions • The issue of markup semantics is still an interesting research field, with a lot of possible applications in real-world scenarios • We proposed our approach for addressing markup semantics through Semantic Web technologies and we introduced EARMARK, a new document markup meta-language, and the Linguistic Act ontology design pattern for expressing the semantics of EARMARK document markup • We showed how to use these models for addressing real scenarios in which markup semantics can help with particular tasks, such as querying heterogeneous document repositories, converting document markup across different vocabularies, and verifying the validity of markup elements at a semantic level • Future development: ✦ a software assistant that helps users in the definition of the markup semantics of a given XML schema ✦ two applications for the semantic validation of markup documents and for the visualisation of document parts according to their semantics
  • 21. Thanks for your attention