Overview
What is Semantic Web
Semantic Web Vision
Semantic Web Layers
RDF, RDFS, OWL
Tools
GATE
Applications
What is Semantic Web?
Semantic means that the meaning of data
can be discovered by computers
"The Semantic Web is an extension of the
current web in which information is given
well-defined meaning, better enabling
computers and people to work in
cooperation." - Tim Berners-Lee
Definition
The Semantic Web is a project to create a
universal medium for information exchange by
putting documents with computer-processable
meaning (semantics) on the World Wide Web
The Semantic Web extends the Web through the
use of standards, markup languages and related
processing tools
The aims of the Semantic Web
Indexing and retrieving information
Annotation
The Web as an interoperable database
Machine retrieval of data
Web based services
Discovery of services
Intelligent software agents
Semantic Web Vision
Oriented toward machine-readable
resources rather than human-readable
Requires resources to be described so
that machines know what they mean
Description in terms of metadata
Use of logic interpretation for inference
Semantic Web Layers
XML (Extensible Markup Language) - The
language framework used to define
nearly all new languages that are used to
interchange data over the Web
XML Schema - A language used to define
the structure of a specific XML language
Semantic Web Layers
RDF (Resource Description Framework) -
a language used to describe all sorts of
information and metadata
RDF Schema - A framework that provides a
means to specify basic vocabularies for
specific RDF application languages to use
Semantic Web Layers
Ontology - defines vocabularies and
establishes the usage of words and terms in
the context of a specific vocabulary
Logic and Proof - used to establish the
consistency and correctness of data sets
and to infer conclusions that aren’t explicitly
stated
Semantic Web agents
Metadata will be used to identify and
extract information from Web sources.
Ontologies will be used to assist in Web
searches, to interpret retrieved
information, and to communicate with
other agents.
Logic will be used for processing retrieved
information and for drawing conclusions.
RDF
• “Resource Description Framework”
• RDF is a data model
• Originally for describing metadata for web pages
• Structured information
• Universal, machine-readable data exchange model
• Can be serialized in XML (RDF/XML syntax)
• Statements can be modeled with
• Resources: an element, a URI, a literal
• Properties: directed relation between two resources
• Statements: triples of two resources linked by property
RDF
• In general, a set of triples can be viewed as a graph
• both “object” and “subject” are the graph nodes
• “properties” are the edges
• The XML syntax is only a practical serialization of the graph
• Components
• URIs – for referencing resources
• Literals – data values
• Empty nodes (blank nodes) – talking about something which doesn’t
have a name
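As a rough illustration of the triple model above, the following Python sketch stores statements as (subject, predicate, object) tuples and reads them as a directed graph. The URIs under http://example.org/, the blank node label and the helper name as_graph are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical example resources. dc:title and dc:creator are Dublin Core terms;
# everything under example.org is invented for illustration.
BOOK = "http://example.org/book/war-and-peace"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
AUTHOR = "_:b0"  # blank node: a resource that has no name of its own

triples = [
    (BOOK, DC_TITLE, "War and Peace"),                    # object is a literal
    (BOOK, DC_CREATOR, AUTHOR),                           # object is a blank node
    (AUTHOR, "http://example.org/name", "Leo Tolstoy"),   # blank node as subject
]

def as_graph(statements):
    """Group triples by subject: subjects and objects are nodes, properties are edges."""
    graph = defaultdict(list)
    for s, p, o in statements:
        graph[s].append((p, o))
    return dict(graph)

graph = as_graph(triples)
for subject, edges in graph.items():
    for prop, obj in edges:
        print(f"{subject} --{prop}--> {obj}")
```

Each printed line corresponds to one edge of the graph, so the serialization (XML or otherwise) can be treated as a separate concern from the data model.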
RDF Example
• Subject: URIs and empty nodes
• Predicate: URIs ( also called properties)
• Object: URIs and empty nodes and literals
A simple example
“The book has the title War and Peace”
Graphical RDF Statement
[The book] --has the title--> "War and Peace"
RDF in an XML document
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://amazon.com/books">
    <dc:title>War and Peace</dc:title>
  </rdf:Description>
</rdf:RDF>
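To show how the XML form maps back to a triple, here is a minimal Python sketch that reads a document of this shape with the standard library. It is not a full RDF/XML parser: it only handles flat rdf:Description elements with literal-valued properties. The subject URI and the function name simple_triples are assumptions for this example; in practice a dedicated RDF library such as Jena would be used.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical document; the subject URI is invented for illustration.
DOC = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/books/war-and-peace">
    <dc:title>War and Peace</dc:title>
  </rdf:Description>
</rdf:RDF>"""

def simple_triples(xml_text):
    """Extract (subject, predicate, literal) triples from flat rdf:Description elements."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree writes qualified names as "{namespace}local";
            # joining the two parts gives the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            result.append((subject, predicate, (prop.text or "").strip()))
    return result

for triple in simple_triples(DOC):
    print(triple)
```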
Ontology
We can express ontology as:
Ontology = <taxonomy, inference rules>
And we can express a taxonomy as:
Taxonomy = <{classes}, {relations}>
Ontology languages (RDFS, OWL) have formal
foundations that allow us to infer additional (implicit)
statements
RDF & RDFS
RDF is a graphical formalism (+ XML syntax + semantics)
for representing metadata
for describing the semantics of information in a
machine-accessible way
RDFS extends RDF with “schema vocabulary”, e.g.:
Class, Property
type, subClassOf, subPropertyOf
range, domain
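The schema vocabulary above is what makes simple inference possible. The following Python sketch applies a few RDFS-style rules (subClassOf transitivity, type propagation, domain and range) until nothing new is derived. It is a naive fixed-point loop under the assumption that triples are plain tuples of prefixed names; the class and property names are hypothetical, and this is not a complete RDFS entailment implementation.

```python
def rdfs_closure(triples):
    """Apply a handful of RDFS-style rules until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                # subClassOf is transitive: s < o and o < o2 gives s < o2
                new |= {(s, p, o2) for s2, p2, o2 in inferred
                        if p2 == "rdfs:subClassOf" and s2 == o}
                # an instance of the subclass is an instance of the superclass
                new |= {(x, "rdf:type", o) for x, p2, c in inferred
                        if p2 == "rdf:type" and c == s}
            elif p == "rdfs:domain":
                # every subject that uses property s has type o
                new |= {(x, "rdf:type", o) for x, p2, _ in inferred if p2 == s}
            elif p == "rdfs:range":
                # every object of property s has type o
                new |= {(y, "rdf:type", o) for _, p2, y in inferred if p2 == s}
        if new <= inferred:
            return inferred
        inferred |= new

# Hypothetical schema and data for illustration.
facts = {
    (":Novel", "rdfs:subClassOf", ":Book"),
    (":Book", "rdfs:subClassOf", ":Work"),
    (":author", "rdfs:domain", ":Work"),
    (":author", "rdfs:range", ":Person"),
    (":warAndPeace", "rdf:type", ":Novel"),
    (":annaKarenina", ":author", ":tolstoy"),
}

closure = rdfs_closure(facts)
for triple in sorted(closure - facts):
    print(triple)
```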
Limitations of RDF/RDFS
No standard for expressing primitive data types such as
integer, etc. All data types in RDF/RDFS are treated as
strings.
No standard for expressing relations of properties
(unique, transitive, inverse etc.)
No standard for expressing whether enumerations are
closed.
No standard to express equivalence, disjointness, etc.
among properties
OIL and DAML
RDF/RDFS define a framework; however, they have
limitations. There is a need for new Semantic Web
languages that meet the following requirements:
They should be compatible with existing standards (XML, RDF/RDFS)
They should have enough expressive power to fill in the gaps in
RDFS
They should provide automated reasoning support
Ontology Inference Layer (OIL) and DARPA Agent Markup
Language (DAML) are two important efforts developed to
fulfill these requirements.
Their combined efforts formed the DAML+OIL declarative
semantic language.
OIL and DAML
DAML+OIL is built on top of RDFS.
It uses RDFS syntax.
It has richer ways to express primitive data types.
DAML+OIL allows other relationships (inverse and
transitivity) to be directly expressed.
DAML+OIL provides well-defined semantics. This
provides the following:
Meaning of DAML+OIL statements can be formally specified.
Machine understanding and automated reasoning can be
supported.
More expressive power can be provided.
Example
Example: T. Rex is not a herbivore and not a currently living
species.
This statement can be expressed in DAML+OIL, but not in
RDF/RDFS, since RDF/RDFS cannot express disjointness.
This expressive power is what enables automated reasoning
in DAML+OIL.
For instance, a software agent can find out the “list of all the carnivores
that won’t be any threat today” by processing the DAML+OIL data
representation of the example above.
RDF/RDFS does not express “is not” relationships and exclusions.
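The reasoning in this example can be approximated in a few lines of Python. This is only a sketch of the idea, not DAML+OIL syntax: disjointness axioms are modeled as pairs of class names, and an individual is known not to belong to a class when it belongs to a class disjoint with it. All class names, individuals and function names here are hypothetical.

```python
# Hypothetical disjointness axioms: no individual may belong to both classes of a pair.
DISJOINT = {
    frozenset({"Carnivore", "Herbivore"}),
    frozenset({"Living", "Extinct"}),
}

# Hypothetical asserted class memberships.
types = {
    "TRex": {"Carnivore", "Extinct"},
    "Lion": {"Carnivore", "Living"},
    "Cow": {"Herbivore", "Living"},
}

def is_consistent(classes):
    """A set of classes is inconsistent if it contains both members of a disjoint pair."""
    return not any(pair <= classes for pair in DISJOINT)

def known_not(individual, cls):
    """True if membership in cls is ruled out by a disjointness axiom."""
    return any(cls in pair and (pair - {cls}) <= types[individual]
               for pair in DISJOINT)

# "Carnivores that won't be any threat today": carnivores known not to be living.
harmless_carnivores = sorted(
    name for name, classes in types.items()
    if "Carnivore" in classes and known_not(name, "Living")
)
print(harmless_carnivores)
```

Without the disjointness axioms, an agent could not conclude that T. Rex is not living; it could only observe that nothing says it is.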
Web Ontology Language = OWL
OWL is an extra layer, a bit like RDFS
own namespace, own terms
it relies on RDF Schemas
It is a separate recommendation
actually… there is a 2004 version of OWL
(“OWL 1”)
and there is an update (“OWL 2”) published in
2009
OWL- Web Ontology Language
OWL is a vocabulary extension of RDF and is
derived from the DAML+OIL Web Ontology Language.
OWL
Description Logic
Class, Thing, Nothing
DatatypeProperty, ObjectProperty, AnnotationProperty,…
Class
oneOf, disjointWith, unionOf, complementOf, intersectionOf …
Restriction, onProperty, cardinality, hasValue…
Property
inverseOf , TransitiveProperty , SymmetricProperty
FunctionalProperty, InverseFunctionalProperty
Equality– equivalentClass , sameAs , differentFrom…
Ontology annotation – Ontology, imports, versionInfo
Term equivalences
For classes:
owl:equivalentClass: two classes have the
same individuals
owl:disjointWith: no individuals in common
For properties:
owl:equivalentProperty
remember the a:author vs. f:auteur?
owl:propertyDisjointWith
Term equivalences
For individuals:
owl:sameAs: two URIs refer to the same
concept (“individual”)
owl:differentFrom: negation of owl:sameAs
Example
owl:equivalentProperty
a:author f:auteur
owl:equivalentClass
a:Novel f:Roman
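One simple way to exploit such equivalences when merging data sets is to rewrite every term to a canonical form before querying. The Python sketch below does that for the two equivalences in the example. The mapping table, the sample triples and the function name normalize are assumptions for illustration; an OWL reasoner would instead derive the additional statements rather than rewrite the data.

```python
# Hypothetical mapping from each term to a canonical equivalent, standing in for
# the owl:equivalentProperty and owl:equivalentClass statements in the example.
CANONICAL = {"f:auteur": "a:author", "f:Roman": "a:Novel"}

def normalize(triples):
    """Rewrite predicates and class names to their canonical equivalents."""
    return {(s, CANONICAL.get(p, p), CANONICAL.get(o, o)) for s, p, o in triples}

# Hypothetical data from two sources that use different vocabularies.
data = {
    ("<book1>", "a:author", "<tolstoy>"),
    ("<book1>", "rdf:type", "a:Novel"),
    ("<livre2>", "f:auteur", "<hugo>"),
    ("<livre2>", "rdf:type", "f:Roman"),
}

merged = normalize(data)
authors = sorted(o for s, p, o in merged if p == "a:author")
novels = sorted(s for s, p, o in merged if p == "rdf:type" and o == "a:Novel")
print(authors)
print(novels)
```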
Property characterization
In OWL, one can characterize the
behavior of properties (symmetric,
transitive, functional, reflexive, inverse
functional…)
One property can be defined as the
“inverse” of another
What this means is…
If the following holds in our triples:
:email rdf:type owl:InverseFunctionalProperty.
<A> :email "mailto:a@b.c".
<B> :email "mailto:a@b.c".
then, processed through OWL, the following
holds, too:
<A> owl:sameAs <B>.
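The inference above can be sketched in Python as follows: subjects that share a value for an inverse functional property are grouped, and each pair within a group is reported as owl:sameAs. The function name and the third individual are assumptions added for illustration; this is not how any particular OWL reasoner is implemented.

```python
from collections import defaultdict
from itertools import combinations

def same_as_from_ifp(triples, inverse_functional):
    """Derive owl:sameAs pairs for subjects sharing a value of an inverse functional property."""
    by_value = defaultdict(set)
    for s, p, o in triples:
        if p in inverse_functional:
            by_value[(p, o)].add(s)
    pairs = set()
    for subjects in by_value.values():
        for a, b in combinations(sorted(subjects), 2):
            pairs.add((a, "owl:sameAs", b))
    return pairs

triples = [
    ("<A>", ":email", "mailto:a@b.c"),
    ("<B>", ":email", "mailto:a@b.c"),
    ("<C>", ":email", "mailto:c@b.c"),  # hypothetical extra individual with a different value
]

print(same_as_from_ifp(triples, {":email"}))
```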
Keys
“if two persons have the same emails and the same
homepages then they are identical”
Identification is based on the identical
values of two properties
The rule applies to persons only
What it means is…
If:
<A> rdf:type :Person ;
:email "mailto:a@b.c";
:homepage "http://www.ex.org".
<B> rdf:type :Person ;
:email "mailto:a@b.c";
:homepage "http://www.ex.org".
then, processed through OWL, the following holds,
too:
<A> owl:sameAs <B>.
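A key-based rule differs from a single inverse functional property in two ways: all key properties must match, and the rule applies only to members of one class. The Python sketch below shows both conditions. It assumes one value per key property, and the extra individuals <C> and <D> and the function name are hypothetical additions for illustration.

```python
from collections import defaultdict
from itertools import combinations

def same_as_from_key(triples, cls, key_props):
    """Derive owl:sameAs pairs for members of cls that agree on every key property."""
    members = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    values = {m: {} for m in members}
    for s, p, o in triples:
        if s in members and p in key_props:
            values[s][p] = o  # assumes a single value per key property
    groups = defaultdict(list)
    for member, props in values.items():
        if len(props) == len(key_props):  # skip individuals missing a key value
            groups[tuple(props[k] for k in key_props)].append(member)
    return {(a, "owl:sameAs", b)
            for group in groups.values()
            for a, b in combinations(sorted(group), 2)}

triples = [
    ("<A>", "rdf:type", ":Person"),
    ("<A>", ":email", "mailto:a@b.c"),
    ("<A>", ":homepage", "http://www.ex.org"),
    ("<B>", "rdf:type", ":Person"),
    ("<B>", ":email", "mailto:a@b.c"),
    ("<B>", ":homepage", "http://www.ex.org"),
    # Hypothetical: same email but a different homepage, so not identified with <A>.
    ("<C>", "rdf:type", ":Person"),
    ("<C>", ":email", "mailto:a@b.c"),
    ("<C>", ":homepage", "http://www.other.example"),
    # Hypothetical: same values but not a :Person, so the rule does not apply.
    ("<D>", "rdf:type", ":Organization"),
    ("<D>", ":email", "mailto:a@b.c"),
    ("<D>", ":homepage", "http://www.ex.org"),
]

print(same_as_from_key(triples, ":Person", (":email", ":homepage")))
```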
Classes in OWL
In RDFS, you can subclass existing
classes… that’s all
In OWL, you can construct classes from
existing ones:
enumerate its content
through intersection, union, complement
etc
Enumerate class content
:Currency
rdf:type owl:Class;
owl:oneOf (:€ :£ :$).
I.e., the class consists of exactly those
individuals and nothing else
Union of classes
:Novel rdf:type owl:Class.
:Short_Story rdf:type owl:Class.
:Poetry rdf:type owl:Class.
:Literature rdf:type owl:Class;
owl:unionOf (:Novel :Short_Story :Poetry).
Other possibilities: owl:complementOf,
owl:intersectionOf, …
For example…
If:
:Novel rdf:type owl:Class.
:Short_Story rdf:type owl:Class.
:Poetry rdf:type owl:Class.
:Literature rdf:type owl:Class;
owl:unionOf (:Novel :Short_Story :Poetry).
<myWork> rdf:type :Novel .
then the following holds, too:
<myWork> rdf:type :Literature .
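The same inference can be sketched in Python: any individual typed with a member of a union is also an instance of the union class. The dictionary representation of owl:unionOf and the function name are assumptions for this example, and only this one direction of the union semantics is covered.

```python
def infer_union_types(triples, unions):
    """Return the new rdf:type triples implied by union class definitions."""
    inferred = set()
    for s, p, o in triples:
        if p == "rdf:type":
            for union_class, members in unions.items():
                if o in members:
                    inferred.add((s, "rdf:type", union_class))
    return inferred - set(triples)

# Stand-in for: :Literature owl:unionOf (:Novel :Short_Story :Poetry).
unions = {":Literature": {":Novel", ":Short_Story", ":Poetry"}}

triples = [("<myWork>", "rdf:type", ":Novel")]

print(infer_union_types(triples, unions))
```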
What we have so far…
The OWL features listed so far are already
fairly powerful
E.g., various databases can be linked via
owl:sameAs, functional or inverse
functional properties, etc.
Many inferred relationships can be found
using a traditional rule engine
The most used Semantic Web
Tools
RDF Gateway - runs both a Web
application server and a database designed to
handle RDF content
Jena - Java API for RDF
Smore - Semantic Markup, Ontology and
RDF Editor
Drive - a C# API. It parses and validates
RDF documents.
What is GATE?
An architecture
A macro-level organisational picture for language engineering (LE) software systems.
A framework
For programmers, GATE is an object-oriented class library that
implements the architecture.
A development environment
For language engineers, computational linguists et al., GATE is a
graphical development environment bundled with a set of tools for tasks
such as Information Extraction.
Some free components... ...and wrappers for other
people's components
Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue;
ontologies; etc.
Where did GATE come from?
A number of researchers realised in the early to mid-1990s (e.g. in
TIPSTER):
• Increasing trend towards multi-site collaborative projects
• Role of engineering in scalable, reusable, and portable HLT solutions
• Support for large data, in multiple media, languages, formats, and
locations
• Lower the cost of creation of new language processing components
• Promote quantitative evaluation metrics via tools and a level playing field
History:
• 1996 – 2002: GATE version 1, proof of concept
• March 2002: version 2, rewritten in Java, component based, more users
• Fall 2003: new development cycle
Swoogle
• Swoogle is a crawler-based indexing and retrieval
system for the Semantic Web
• Swoogle crawls and discovers documents written in
RDF and OWL
• Swoogle classifies a Semantic Web
Document (SWD) as one of:
• Semantic Web Ontology (SWO): defines new
terms
• Semantic Web Database (SWDB): makes
assertions about individuals