1. Hacking with Semantic Web
Tom Praison
Developer @ Yahoo!
http://twitter.com/tompraison
2. What’s in here?
• Evolution of the web
• Poorly Solved Information Needs
• Semantic Web Technologies
• Linked Data
• Demo of confhopper.in, a site built using open
datasets
• Some techniques for getting Structured
Information from the Web
• Demo of Yahoo! Contextual Analysis Platform and
Open Dapper
3. I just had to take the hypertext
idea and connect it to the
Transmission Control Protocol
and domain name system ideas
and—ta-da!—the World Wide
Web.
Tim Berners-Lee – Inventor of the WWW
4. WEB 1.0
Few Content Creators! Majority Consumers!
http://www.flickr.com/photos/leandrociuffo/3665883373/
5. WEB 2.0
Web as a platform
http://www.flickr.com/photos/lambertwm/4737580179/
6. WEB 1.0 vs WEB 2.0
WEB 1.0                       WEB 2.0
Ofoto                         Flickr
Personal Website              Blogging
Britannica Online             Wikipedia
Directories (taxonomy)        Tagging (“folksonomy”)
Content Management Systems    Wikis
7. WEB 3.0
Which direction will it take?
http://www.flickr.com/photos/markhillary/337685031
8. Semantic Web
WEB 3.0 could be anything!
• Virtual Web
• Pervasive Web
• Artificial Intelligence
• Personalization
10. Poorly Solved Information Needs
• Multiple interpretations
– Apple
• Long tail queries
– Roja (I meant a South Indian actress)
• Imprecise or overly precise searches
– jim hendler
– pictures of strong adventures people
• Searches for descriptions
– countries in africa
– 25 year old computer engineer living in Bangalore
– Reliable smart phone under 15,000 rupees
14. Semantic Web standards from W3C
• Data and schema
languages
(RDF, OWL, RIF)
• Document formats
(RDF/XML, RDFa)
• Protocols
(SPARQL, HTTP)
15. Current Research & Other Efforts
• Semantic Web research into knowledge
representation and reasoning, data
integration, data quality and many other
topics
• Community effort (Linked Data movement)
16. RDF (Resource Description
Framework)
• The basic data model of the Semantic Web
– A universal model to capture all sorts of data:
networks, relational, object-oriented…
• Basic unit of information is a triple
– A tuple of (subject, predicate, object)
– Example: (Joe, loves, Mary)
– Each triple gives the value of a property for a given
resource or relates two objects to one another
• Object is either a resource or a literal
• An RDF model is a set of triples
– Ordering of statements in an RDF document is irrelevant
(unlike XML)
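A minimal Python sketch of the triple model described above. The `Triple` class and the `objects_of` helper are illustrative names, not part of any RDF library; plain strings stand in for real URIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: (subject, predicate, object)."""
    subject: str
    predicate: str
    object: str


# An RDF model is a set of triples, so ordering and duplicates do not matter.
model_a = {
    Triple("Joe", "loves", "Mary"),
    Triple("Joe", "name", "Joe A."),
}
model_b = {
    Triple("Joe", "name", "Joe A."),
    Triple("Joe", "loves", "Mary"),
    Triple("Joe", "loves", "Mary"),  # a repeated statement collapses in a set
}


def objects_of(model, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {t.object for t in model
            if t.subject == subject and t.predicate == predicate}


print(model_a == model_b)                   # same statements, different order
print(objects_of(model_a, "Joe", "loves"))
```

Using a set rather than a list mirrors the point on the slide: two documents that state the same triples in a different order describe the same model.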
17. Graphical and textual notation
• Graph: my:Joe --type--> foaf:Person
• Graph: my:Joe --name--> “Joe A.”
• A number of ways to serialize an RDF model into an
RDF document
– RDF/XML, Turtle, N3, N-Triples
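As a rough illustration of one of these serializations, the sketch below writes the small graph from this slide as N-Triples, where each line is a subject, predicate and object followed by a dot. The `my:` namespace URI is a hypothetical placeholder, mapping the slide's "name" edge to foaf:name is an assumption, and the escaping covers only backslashes and double quotes.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MY = "http://example.org/my#"  # hypothetical namespace for the "my:" prefix


class Literal(str):
    """Marks a plain string value, as opposed to a URI."""


triples = [
    (MY + "Joe", RDF_TYPE, FOAF + "Person"),
    (MY + "Joe", FOAF + "name", Literal("Joe A.")),  # assumes name = foaf:name
]


def term(value):
    """Format one term: literals in double quotes, URIs in angle brackets."""
    if isinstance(value, Literal):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{value}>"


def to_ntriples(statements):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in statements)


print(to_ntriples(triples))
```

In practice an RDF library would handle serialization; the point here is only that the same two statements can be written down as text in a line-oriented form.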
18. RDF is designed for the Web
• URIs provide web-wide global identification across datasets
– A resource may be described by multiple
documents
– URIs are intended to be reused
– Unique, but not single identifiers: two URIs may
denote the same thing
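One way to picture "two URIs may denote the same thing" is the sketch below, which merges descriptions once an equivalence link between two URIs is known. It assumes the link is expressed with owl:sameAs, which the slide does not name; the example.org and example.com URIs and the short predicate names are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Two hypothetical sites describe Joe under different URIs.
triples = {
    ("http://example.org/people#joe", "name", "Joe A."),
    ("http://example.com/users/joe", "email", "mailto:joe@joe.com"),
    ("http://example.org/people#joe", OWL_SAME_AS, "http://example.com/users/joe"),
}


def canonical_map(statements):
    """Map each linked URI to one representative (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for s, p, o in statements:
        if p == OWL_SAME_AS:
            a, b = find(s), find(o)
            # Pick the lexicographically smaller URI as the representative.
            parent[max(a, b)] = min(a, b)
    return {x: find(x) for x in parent}


def merge_descriptions(statements):
    """Rewrite all triples onto representative URIs and drop the sameAs links."""
    canon = canonical_map(statements)
    return {(canon.get(s, s), p, canon.get(o, o))
            for s, p, o in statements if p != OWL_SAME_AS}


for triple in sorted(merge_descriptions(triples)):
    print(triple)
```

After merging, both the name and the email hang off a single URI, even though they were published under different identifiers.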
19. RDF is designed for the Web
• URIs can be retrieved from the Web
– A well-behaved URI returns a description of the
resource
– Provides authority: the definition of foaf:Person
lives at that URI
• Ontologies can be looked up as well
– Typically at the root of the URIs, also known as the
namespace
– Example: http://xmlns.com/foaf/0.1/Person
redirects to the specification
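A small sketch of how the namespace relates to a term's URI. It assumes the common convention that the namespace is everything up to and including the last "#" or "/"; `split_uri` is an illustrative helper, not a library function.

```python
def split_uri(uri):
    """Split a URI into (namespace, local_name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1], uri[cut + 1:]


namespace, local_name = split_uri("http://xmlns.com/foaf/0.1/Person")
print(namespace)   # the place to look up the FOAF vocabulary
print(local_name)  # the term defined inside that vocabulary
```

The same helper also handles hash-style namespaces such as the RDF syntax namespace, where the local name follows a "#".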
20. URIs implicitly link data together
• Triples about Joe
– (#joe, #loves, #mary)
– (#joe, #name, “Joe A.”)
– (#joe, #email, mailto:joe@joe.com)
• Triples about Mary
– (#mary, #name, “Mary B.”)
– (#mary, #gender, “female”)
• Schema triples
– (#name, #type, #Property)
– (#name, #domain, #Person)
• Documents shown in the diagram: Joe’s homepage, Mary’s homepage,
a social networking site, and a schema doc
21. Put together, triples form a single
‘global’ graph
• #joe --#name--> “Joe A.”
• #joe --#email--> “joe@joe.com”
• #joe --#loves--> #mary
• #mary --#name--> “Mary B.”
• #mary --#gender--> “female”
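The idea that triples from separate documents join into one graph can be sketched as a set union. How the triples are split across the three sets below is an assumption made for illustration; `describe` and `reachable` are hypothetical helper names.

```python
# Triples as they might be published by three separate documents.
doc_a = {
    ("#joe", "#loves", "#mary"),
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@joe.com"),
}
doc_b = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}
doc_c = {
    ("#name", "#type", "#Property"),
    ("#name", "#domain", "#Person"),
}

# Shared identifiers are all it takes to merge the documents.
global_graph = doc_a | doc_b | doc_c


def describe(graph, resource):
    """Return the (predicate, object) pairs stated about a resource."""
    return {(p, o) for s, p, o in graph if s == resource}


def reachable(graph, start):
    """Return every node reachable by following triples from subject to object."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(o for s, _, o in graph if s == node)
    return seen


print(sorted(describe(global_graph, "#mary")))
print("Mary B." in reachable(global_graph, "#joe"))
```

Starting from #joe, the traversal reaches Mary's name only because the first document's #mary and the second document's #mary are the same identifier.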
23. Linked Data cloud: interlinked RDF
datasets on the Web
http://linkeddata.org/
24. DBpedia
• DBpedia is a dataset that contains much of the
structured data in Wikipedia
– Data from the info-boxes
– Links between Wikipedia pages
– Categories
– Disambiguation and redirect pages
• Links to other datasets
25. Fetching individual resources
• Use your web browser
• http://dbpedia.org/resource/Yahoo redirects to
http://dbpedia.org/page/Yahoo
• You can plug this URI into other Linked Data browsers
• HTTP GET to fetch data
– Using curl: add the header Accept: application/rdf+xml to request RDF,
and enable redirects with -L
• curl -L -H 'Accept: application/rdf+xml' 'http://dbpedia.org/resource/Berlin'
• Data dumps
– http://wiki.dbpedia.org/Datasets
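The HTTP GET approach can also be sketched with Python's standard library, as a rough equivalent of the curl call. `build_rdf_request` and `fetch_rdf` are illustrative names, and decoding the response as UTF-8 is an assumption; only the request is built here, so nothing is sent until `fetch_rdf` is called.

```python
import urllib.request


def build_rdf_request(resource_uri):
    """Build a GET request that asks for RDF/XML via content negotiation."""
    return urllib.request.Request(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},
    )


def fetch_rdf(resource_uri, timeout=10):
    """Send the request; urlopen follows HTTP redirects by default."""
    request = build_rdf_request(resource_uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")  # assumes a UTF-8 response body


request = build_rdf_request("http://dbpedia.org/resource/Berlin")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch_rdf("http://dbpedia.org/resource/Berlin")` would perform the actual download, assuming the endpoint is reachable and still serves RDF/XML for that Accept header.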
26. Querying using SPARQL
• Interactive query builders
• SPARQL Explorer: http://dbpedia.org/snorql/
• Examples at: http://wiki.dbpedia.org/OnlineAccess
• Using HTTP GET
– GET /sparql/?query=EncodedQuery HTTP/1.1
– Example:
• SELECT ?film WHERE {
    ?film <http://dbpedia.org/ontology/language>
          <http://dbpedia.org/resource/French_language> .
    ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
          <http://dbpedia.org/ontology/Film> }
• curl 'http://dbpedia.org/sparql?query=EncodedQuery'
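The "EncodedQuery" placeholder stands for the URL-encoded query text. A minimal sketch of producing it with Python's standard library follows; the query is the French-language film example selecting only ?film, and `sparql_url` is an illustrative helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"
QUERY = """SELECT ?film WHERE {
  ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
  ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
}"""


def sparql_url(endpoint, query):
    """Append the percent-encoded query as the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_url(ENDPOINT, QUERY)
print(url)

# Decoding the parameter again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

The resulting URL can be passed to curl in place of the placeholder, or opened with any HTTP client.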
27. ConfHopper.in
• Award-winning app in the WWW2012 Metadata
Challenge.
• Confhopper.in is a desktop/mobile HTML5-based
application designed for conference attendees.
• Built with the help of open datasets from
http://data.semanticweb.org/ and various other
sources.
28. Some Techniques for getting
Structured Information from the Web
• Semantic Markup
• NER
• Extraction Tools (Dapper)
30. NER – Named Entity Recognition
• Yahoo! Content Analysis API
• http://developer.yahoo.com/contentanalysis/
31. Dapper
http://open.dapper.net
Dapper is a tool that enables users to create update feeds for
their favorite sites and website owners to optimize and
distribute their content in new ways.