Hacking with Semantic Web

Tom Praison
Developer @ Yahoo!
http://twitter.com/tompraison

What’s in here?
• Evolution of the web
• Poorly Solved Information Needs
• Semantic Web Technologies
• Linked Data
• Demo of confhopper.in, a site built using open datasets
• Some techniques for getting structured information from the Web
• Demo of Yahoo! Contextual Analysis Platform and Open Dapper

I just had to take the hypertext
idea and connect it to the
Transmission Control Protocol
and domain name system ideas
and—ta-da!—the World Wide
Web.

Tim Berners-Lee – Inventor of the WWW

WEB 1.0
Few Content Creators! Majority Consumers!

http://www.flickr.com/photos/leandrociuffo/3665883373/

WEB 2.0

Web as a platform
http://www.flickr.com/photos/lambertwm/4737580179/

WEB 1.0 vs WEB 2.0

• Ofoto → Flickr
• Personal website → Blogging
• Britannica Online → Wikipedia
• Directories (taxonomy) → Tagging (“folksonomy”)
• Content Management Systems → Wikis

WEB 3.0

Which direction will it take?

http://www.flickr.com/photos/markhillary/337685031

Possible directions for WEB 3.0:
• Semantic Web
• Virtual Web
• Pervasive Web
• Artificial Intelligence
• Personalization

Could be anything!

Today’s Web

A Web of Documents rather than Data!

Poorly Solved Information Needs
• Multiple interpretations
– Apple
• Long tail queries
– Roja (I meant the South Indian actress)
• Imprecise or overly precise searches
– jim hendler
– pictures of strong adventures people
• Searches for descriptions
– countries in africa
– 25 year old computer engineer living in Bangalore
– Reliable smart phone under 15,000 rupees

THE SOLUTION

Semantic Web

Publish data on the Web
• Linked Data: linking data the same way we link documents on the Web
• Query databases over the Web

Architectural Challenges
• A common format for sharing data
• Sharing the meaning of data
• Infrastructure

Semantic Web standards from W3C
• Data and schema languages (RDF, OWL, RIF)
• Document formats (RDF/XML, RDFa)
• Protocols (SPARQL, HTTP)

Current Research & Other Efforts
• Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics
• Community effort (Linked Data movement)

RDF (Resource Description Framework)
• The basic data model of the Semantic Web
– A universal model to capture all sorts of data: networks, relational, object-oriented…
• Basic unit of information is a triple
– A tuple of (subject, predicate, object)
– Example: (Joe, loves, Mary)
– Each triple gives the value of a property for a given resource or relates two resources to one another
• Object is either a resource or a literal
• An RDF model is a set of triples
– Ordering of statements in an RDF document is irrelevant (unlike XML)
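To make the “set of triples” idea concrete, here is a minimal Python sketch (not from the original talk) that models an RDF graph as a set of (subject, predicate, object) tuples. Plain strings stand in for URIs and literals, which is a simplification; a real application would more likely use an RDF library. Because a set has no order and no duplicates, two documents that list the same statements in a different order yield the same model.

```python
# Minimal sketch: an RDF model as an unordered set of (subject, predicate, object)
# tuples. Plain strings stand in for URIs and literals (a simplification).

def make_model(statements):
    """Return an RDF-style model: the set of distinct triples in `statements`."""
    return frozenset(statements)

def values(model, subject, predicate):
    """Return the sorted objects of all triples matching (subject, predicate, ?)."""
    return sorted(o for s, p, o in model if s == subject and p == predicate)

doc_a = [("Joe", "loves", "Mary"), ("Joe", "name", "Joe A.")]
doc_b = [("Joe", "name", "Joe A."), ("Joe", "loves", "Mary"), ("Joe", "loves", "Mary")]

model_a = make_model(doc_a)
model_b = make_model(doc_b)

print(model_a == model_b)              # True: order and repetition do not matter
print(values(model_a, "Joe", "name"))  # ['Joe A.']
```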

Graphical and textual notation
[Figure: my:Joe is linked by “type” to foaf:Person and by “name” to the literal “Joe A.”]

There are a number of ways to serialize an RDF model into an RDF document: RDF/XML, Turtle, N3, N-Triples
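N-Triples, the simplest of these serializations, writes one triple per line: full URIs in angle brackets (or quoted literals), followed by a period. The sketch below serializes the my:Joe example that way. It assumes my: expands to the hypothetical namespace http://example.org/my# and that the figure’s “type” and “name” arcs are rdf:type and foaf:name; the literal escaping covers only backslash, double quote, newline and carriage return.

```python
# Sketch: write triples as N-Triples. my: -> http://example.org/my# is a
# hypothetical namespace; the figure's "type"/"name" arcs are assumed to be
# rdf:type and foaf:name.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
MY = "http://example.org/my#"  # hypothetical namespace

class Literal(str):
    """A string that should be written as an RDF literal, not a URI."""

def term(value):
    """Format a URI as <uri>, or a Literal as a quoted, escaped string."""
    if isinstance(value, Literal):
        escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{value}>"

def to_ntriples(triples):
    """One 'subject predicate object .' line per triple, sorted for stable output."""
    return "".join(sorted(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples))

model = {
    (MY + "Joe", RDF + "type", FOAF + "Person"),
    (MY + "Joe", FOAF + "name", Literal("Joe A.")),
}
print(to_ntriples(model), end="")
```

Sorting is only there to make the output deterministic; as the RDF slide notes, statement order carries no meaning.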

RDF is designed for the Web
• URIs provide web-wide global identification across datasets
– A resource may be described by multiple documents
– URIs are intended to be reused
– Unique, but not single identifiers: two URIs may denote the same thing
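When two URIs denote the same thing, Linked Data publishers commonly say so with owl:sameAs (a standard OWL property, not named on this slide). As an illustration, the sketch below groups URIs connected by owl:sameAs with a small union-find and rewrites each group to one representative URI, so descriptions from different datasets merge. The example.org URIs are hypothetical, and treating every string equally as a URI or literal is a simplification.

```python
# Sketch: merge descriptions of URIs declared equivalent with owl:sameAs.
# The example.org URIs are hypothetical.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"

def canonicalize(triples):
    """Rewrite every URI in an owl:sameAs group to one representative URI."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            keep, drop = sorted((ra, rb))  # deterministic: smallest URI wins
            parent[drop] = keep

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    def rep(x):
        # Only terms that took part in an owl:sameAs statement are rewritten.
        return find(x) if x in parent else x

    return {(rep(s), p, rep(o)) for s, p, o in triples if p != OWL_SAME_AS}

triples = {
    ("http://example.org/a#joe", FOAF_NAME, "Joe A."),
    ("http://example.org/b#joseph", FOAF_MBOX, "mailto:joe@joe.com"),
    ("http://example.org/b#joseph", OWL_SAME_AS, "http://example.org/a#joe"),
}
for triple in sorted(canonicalize(triples)):
    print(triple)
```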

RDF is designed for the Web
• URIs can be retrieved from the Web
– A well-behaved URI returns a description of the resource
– Provides authority: the definition of foaf:Person lives at that URI
• Ontologies can be looked up as well
– Typically at the root of the URIs, also known as the namespace
– Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
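A common convention, visible in the FOAF example, is that a term’s URI is its namespace followed by a local name, split at the last “#” or “/”. The sketch below uses that heuristic to compact full URIs into prefixed names such as foaf:Person. The prefix table is an assumption for illustration, and the heuristic does not hold for every vocabulary.

```python
# Sketch: split a term URI into (namespace, local name) and compact it
# with a prefix table. Splitting at the last '#' or '/' is a heuristic.

PREFIXES = {  # assumed prefix table for this example
    "http://xmlns.com/foaf/0.1/": "foaf",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
}

def split_uri(uri):
    """Split at the last '#' or '/', keeping the separator in the namespace."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    if cut == -1:
        raise ValueError(f"no '#' or '/' in {uri!r}")
    return uri[: cut + 1], uri[cut + 1 :]

def compact(uri, prefixes=PREFIXES):
    """Return prefix:local if the namespace is known, else the full URI."""
    namespace, local = split_uri(uri)
    prefix = prefixes.get(namespace)
    return f"{prefix}:{local}" if prefix else uri

print(split_uri("http://xmlns.com/foaf/0.1/Person"))  # ('http://xmlns.com/foaf/0.1/', 'Person')
print(compact("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))  # rdf:type
print(compact("http://dbpedia.org/resource/Yahoo"))  # unchanged: no prefix registered
```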

URIs implicitly link data together
• A social networking site: (#joe, #loves, #mary)
• Joe’s homepage: (#joe, #name, “Joe A.”), (#joe, #email, mailto:joe@joe.com)
• Mary’s homepage: (#mary, #name, “Mary B.”), (#mary, #gender, “female”)
• Schema doc: (#name, #type, #Property), (#name, #domain, #Person)

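Put together, triples form a single ‘global’ graph
[Figure: one graph joining the triples from all documents]
#joe → #name → “Joe A.”
#joe → #email → “joe@joe.com”
#joe → #loves → #mary
#mary → #name → “Mary B.”
#mary → #gender → “female”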
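Merging is just set union: once triples from different documents share URIs, a program can follow a link from one source into another. The Python sketch below (illustrative, not from the talk) uses the Joe and Mary triples as plain strings; which document publishes which triple is assumed.

```python
from collections import defaultdict

# Triples published in separate documents (the split across documents is
# illustrative). Shared URIs such as "#joe" and "#mary" connect the pieces.
documents = {
    "A social networking site": [("#joe", "#loves", "#mary")],
    "Joe's homepage": [("#joe", "#name", "Joe A."),
                       ("#joe", "#email", "mailto:joe@joe.com")],
    "Mary's homepage": [("#mary", "#name", "Mary B."),
                        ("#mary", "#gender", "female")],
}

def merge(docs):
    """Union the triples of all documents into one graph."""
    graph = set()
    for triples in docs.values():
        graph.update(triples)
    return graph

def describe(graph, subject):
    """Map each predicate of `subject` to the sorted list of its objects."""
    props = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            props[p].append(o)
    return {p: sorted(objs) for p, objs in props.items()}

graph = merge(documents)
# Follow Joe's #loves link into data that was published somewhere else.
for beloved in describe(graph, "#joe")["#loves"]:
    print(beloved, describe(graph, beloved)["#name"])  # #mary ['Mary B.']
```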

Linked Data cloud: interlinked RDF datasets on the Web
http://linkeddata.org/

DBpedia
• DBpedia is a dataset that contains much of the structured data in Wikipedia
– Data from the infoboxes
– Links between Wikipedia pages
– Categories
– Disambiguation and redirect pages
• Links to other datasets

Fetching individual resources
• Use your web browser
• http://dbpedia.org/resource/Yahoo redirects to the HTML page http://dbpedia.org/page/Yahoo
• You can also plug this URI into other Linked Data browsers
• HTTP GET to fetch data
– Using curl: add the header Accept: application/rdf+xml to get RDF, and enable redirects (-L)
• curl -L -H 'Accept: application/rdf+xml' 'http://dbpedia.org/resource/Berlin'
• Data dumps
– http://wiki.dbpedia.org/Datasets
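The curl command for fetching RDF can be approximated with Python’s standard library. urllib.request follows HTTP redirects by default, which plays the role of curl’s -L, and it keeps the Accept header on the redirected request. This is a sketch: it assumes the response body is UTF-8 and does no error handling; calling fetch_rdf performs a real network request.

```python
import urllib.request

def rdf_request(resource_uri):
    """Build a GET request that asks for RDF/XML (like curl -H 'Accept: application/rdf+xml')."""
    return urllib.request.Request(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},
        method="GET",
    )

def fetch_rdf(resource_uri, timeout=30):
    """Fetch the RDF description of a resource (performs a network request).

    urlopen follows redirects by default (like curl -L). Assumes a UTF-8 body.
    Returns (final URL after redirects, response body).
    """
    with urllib.request.urlopen(rdf_request(resource_uri), timeout=timeout) as resp:
        return resp.geturl(), resp.read().decode("utf-8")

# Usage (requires network access):
# final_url, rdf_xml = fetch_rdf("http://dbpedia.org/resource/Berlin")
```

Splitting request construction from the network call keeps the content-negotiation part easy to inspect without going online.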

Querying using SPARQL
• Interactive query builders
• SPARQL Explorer: http://dbpedia.org/snorql/
• Examples at: http://wiki.dbpedia.org/OnlineAccess
• Using HTTP GET
– GET /sparql?query=EncodedQuery HTTP/1.1
– Example (films in French):
• SELECT ?film WHERE {
    ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
    ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film> .
  }
• curl 'http://dbpedia.org/sparql?query=EncodedQuery'
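The “EncodedQuery” is just the query text, percent-encoded as the value of the query parameter. The sketch below builds that URL with urllib.parse.urlencode and requests the standard SPARQL JSON results format through the Accept header application/sparql-results+json. It assumes the endpoint honours that header; the query selects only ?film, and the LIMIT clause is an addition for this example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"

# French-language films, selecting only ?film; LIMIT added for this example.
QUERY = """SELECT ?film WHERE {
  ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
  ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film> .
} LIMIT 10"""

def sparql_url(endpoint, query):
    """Percent-encode the query text as the `query` parameter of a GET URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

def run_query(endpoint, query, timeout=30):
    """Send the query over HTTP GET and decode SPARQL JSON results (network access)."""
    request = urllib.request.Request(
        sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

def bindings(results, var):
    """List the values bound to `var` in a SPARQL JSON results document."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]

# Usage (requires network access):
# films = bindings(run_query(ENDPOINT, QUERY), "film")
```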

ConfHopper.in
• Award-winning app in the WWW2012 Metadata Challenge
• ConfHopper.in is an HTML5-based desktop/mobile application designed for conference attendees
• Built with the help of open datasets from http://data.semanticweb.org/ and various other sources

Some Techniques for Getting Structured Information from the Web
• Semantic Markup
• NER (Named Entity Recognition)
• Extraction Tools (Dapper)

Semantic Markup
• Microdata (Schema.org)
• RDFa
• Open Graph Protocol (ogp.me)
• Example: http://getschema.org/microdataextractor?url=http://www.tompraison.com&out=json
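To show roughly what a microdata extractor does, here is a deliberately simplified Python sketch built on the standard library’s HTMLParser. It collects (itemprop, value) pairs, taking the value from the content attribute of meta, src of img, href of link, and the text of other elements. It flattens nested items and ignores itemscope, itemtype and itemref, so it is not a conforming implementation of the microdata specification; the sample markup is hypothetical.

```python
from html.parser import HTMLParser

# Void elements whose microdata value comes from an attribute
# (a small subset of the rules in the microdata specification).
VALUE_ATTR = {"meta": "content", "img": "src", "link": "href"}

class MicrodataProps(HTMLParser):
    """Collect (itemprop, value) pairs from HTML, flattening nested items."""

    def __init__(self):
        super().__init__()
        self.props = []
        # Open itemprop elements: [name, tag, open same-tag count, text chunks]
        self._open = []

    def handle_starttag(self, tag, attrs):
        for entry in self._open:
            if entry[1] == tag:
                entry[2] += 1  # a nested element with the same tag name opened
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag in VALUE_ATTR:
            self.props.append((name, attrs.get(VALUE_ATTR[tag]) or ""))
        else:
            self._open.append([name, tag, 1, []])

    def handle_endtag(self, tag):
        still_open = []
        for name, open_tag, depth, chunks in self._open:
            if open_tag == tag:
                depth -= 1
            if depth == 0:
                # Element closed: its value is the whitespace-normalised text.
                self.props.append((name, " ".join("".join(chunks).split())))
            else:
                still_open.append([name, open_tag, depth, chunks])
        self._open = still_open

    def handle_data(self, data):
        for entry in self._open:
            entry[3].append(data)

page = """<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Tom Praison</span>,
  <span itemprop="jobTitle">Developer</span>
  <meta itemprop="url" content="http://www.tompraison.com">
</div>"""
parser = MicrodataProps()
parser.feed(page)
parser.close()
print(parser.props)
# [('name', 'Tom Praison'), ('jobTitle', 'Developer'), ('url', 'http://www.tompraison.com')]
```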

NER – Named Entity Recognition
• Yahoo! Content Analysis API
• http://developer.yahoo.com/contentanalysis/
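Hosted services such as the Yahoo! Content Analysis API do the recognition for you. A much cruder technique, dictionary (gazetteer) lookup, illustrates the basic idea of spotting entity mentions in text and linking them to Linked Data URIs. The gazetteer below is hypothetical, matching is case-insensitive with longer names tried first, and there is no disambiguation at all, so an ambiguous name like “Apple” would map to whatever single URI the dictionary lists.

```python
import re

# Hypothetical gazetteer: surface form -> DBpedia resource URI.
GAZETTEER = {
    "Tim Berners-Lee": "http://dbpedia.org/resource/Tim_Berners-Lee",
    "Yahoo": "http://dbpedia.org/resource/Yahoo",
    "Berlin": "http://dbpedia.org/resource/Berlin",
}

def build_pattern(gazetteer):
    """Compile one case-insensitive regex that tries longer names first."""
    names = sorted(gazetteer, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    # (?<!\w) and (?!\w) act as word boundaries that also work next to punctuation.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)

def spot_entities(text, gazetteer):
    """Return (surface form, URI, start offset) for each gazetteer match in text."""
    lookup = {name.lower(): uri for name, uri in gazetteer.items()}
    pattern = build_pattern(gazetteer)
    return [(m.group(0), lookup[m.group(0).lower()], m.start())
            for m in pattern.finditer(text)]

for surface, uri, start in spot_entities(
        "Tim Berners-Lee spoke about Yahoo in Berlin.", GAZETTEER):
    print(start, surface, uri)
```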

Dapper

http://open.dapper.net

Dapper is a tool that enables users to create update feeds for
their favorite sites and website owners to optimize and
distribute their content in new ways.
