3. Semantic Web Publish data on the Web Linked Data: linking data similar to how we link documents on the Web Query databases over the Web Architectural challenges A common format for sharing data Sharing the meaning of data Infrastructure Semantic Web standards from W3C Data and schema languages (RDF, OWL, RIF) Document formats (RDF/XML, RDFa) Protocols (SPARQL, HTTP) Semantic Web research into knowledge representation and reasoning, data integration, data quality and many other topics Community effort (Linked Data movement)
4. RDF (Resource Description Framework) The basic data model of the Semantic Web A universal model to capture all sorts of data: networks, relational, object-oriented… Basic unit of information is a triple A tuple of (subject, predicate, object) Example: (Joe, loves, Mary) Each triple gives the value of a property for a given resource or relates two objects to one another Object is either a resource or a literal An RDF model is a set of triples Ordering of statements in an RDF document is irrelevant (unlike XML)
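The triple model above can be sketched in a few lines of Python. This is only an illustration, assuming plain strings stand in for resources and literals; the names `doc_a` and `doc_b` are hypothetical and do not come from any RDF library.

```python
from typing import Set, Tuple

# A triple is a (subject, predicate, object) tuple.
Triple = Tuple[str, str, str]

# Two hypothetical documents listing the same statements in a different order.
doc_a = [("Joe", "loves", "Mary"), ("Joe", "name", "Joe A.")]
doc_b = [("Joe", "name", "Joe A."), ("Joe", "loves", "Mary")]

# An RDF model is a *set* of triples, so statement order is irrelevant.
model_a: Set[Triple] = set(doc_a)
model_b: Set[Triple] = set(doc_b)

same_model = model_a == model_b
print(same_model)
```

Using a set rather than a list also means that repeating a statement does not change the model.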
5. Resources vs. literals Resources are identified by a URI; otherwise they are called blank nodes URIs are a generalization of URLs Notation: <http://www.example.org/Person> or ex:Person Literals have an optional language and datatype (string, integer etc.) Literals cannot be subjects of statements Datatypes are identified by URIs, e.g. XML Schema datatypes Two literals are the same if their components are the same Notation: “Joe B.” or Joe@en^^http://…#string
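The rule that two literals are the same when their components are the same can be illustrated with a small value type. This is a sketch, not an RDF library class: the `Literal` dataclass and its field names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: a lexical value plus optional language and datatype."""
    value: str
    language: Optional[str] = None   # e.g. "en"
    datatype: Optional[str] = None   # a datatype URI, e.g. an XML Schema type

# Dataclass equality compares all components, field by field.
plain = Literal("Joe B.")
english = Literal("Joe", language="en")
english_again = Literal("Joe", language="en")
spanish = Literal("Joe", language="es")

print(english == english_again)  # same components
print(english == spanish)        # language differs
```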
6. Graphical and textual notation foaf:Person type my:Joe name “Joe A.” A number of ways to serialize an RDF model into an RDF document RDF/XML, Turtle, N3, N-Triples Example: http://www.cs.vu.nl/~pmika/foaf.rdf
7. RDF is designed for the Web URIs provide web-wide global identification across datasets A resource may be described by multiple documents We know it’s the same resource because the same URI is used or through reasoning (advanced topic…) URIs are intended to be reused Unique, but not single identifiers: two URIs may denote the same thing URIs can be retrieved from the Web A well-behaved URI returns a description of the resource Provides authority: the definition of foaf:Person lives at that URI Ontologies can be looked up as well Typically at the root of the URIs, also known as the namespace Example: http://xmlns.com/foaf/0.1/Person redirects to the specification
8. URIs implicitly link data together (#joe, #loves, #mary) (#joe, #name, “Joe A.”) (#joe, #email, mailto:joe@joe.com) A dating site (#mary, name, “Mary B.”) (#mary, gender, “female”) Joe’s homepage Mary’s homepage (#name, #type, #Property) (#name, #domain, #Person) Schema doc
9. Put together, triples form a single ‘global’ graph “Joe A.” #name #joe #email “joe@joe.com” #loves “Mary B.” #name #mary #gender “female”
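How shared URIs turn separate documents into one graph can be sketched with set union. The assignment of triples to the dating site and the two homepages is an assumption about the slide layout, and the helper `objects` is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Triples published by three independent sources (grouping assumed).
dating_site: Set[Triple] = {("#joe", "#loves", "#mary")}
joe_homepage: Set[Triple] = {
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@joe.com"),
}
mary_homepage: Set[Triple] = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}

# Because the sources reuse the same identifiers, merging is just set union.
global_graph = dating_site | joe_homepage | mary_homepage

def objects(graph: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object of the given subject/predicate pair."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

# Follow a link across sources: the names of everyone Joe loves.
loved = objects(global_graph, "#joe", "#loves")
loved_names = {n for person in loved for n in objects(global_graph, person, "#name")}
print(loved_names)
```

No single source could answer this question on its own; the answer only appears once the graphs are merged.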
10. Linked Data Open your data Publish it in RDF, the lingua franca of the data web Data first, schema second Worry about linking, data integration later… someone else can do it for you! Optionally, provide query access using the SPARQL query language and protocol Powerful, SQL-like query language HTTP or SOAP protocol to communicate with SPARQL servers
11. Linked Data cloud: interlinked RDF datasets on the Web http://linkeddata.org/
12. DBpedia DBpedia is a dataset that contains much of the structured data in Wikipedia Data from the info-boxes Links between Wikipedia pages Categories Disambiguation and redirect pages Links to other datasets
13. Fetching individual resources Use your web browser http://dbpedia.org/resource/Yahoo redirects to http://dbpedia.org/page/Yahoo You can plug this URI into other Linked Data browsers HTTP GET to fetch data Using curl: add Accept: application/rdf+xml for RDF and enable redirects curl -L -H 'Accept: application/rdf+xml' 'http://dbpedia.org/resource/Berlin' Data dumps http://wiki.dbpedia.org/Datasets
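The same content-negotiated fetch can be sketched with Python's standard library. The function names `build_rdf_request` and `fetch_rdf` are hypothetical; the example only builds the request, so it runs without network access, and whether the server still answers this way is not guaranteed.

```python
import urllib.request

RDF_XML = "application/rdf+xml"

def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for RDF/XML instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})

def fetch_rdf(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch the RDF description; urlopen follows redirects by default (like curl -L)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()

# Building the request does not touch the network.
request = build_rdf_request("http://dbpedia.org/resource/Berlin")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch_rdf("http://dbpedia.org/resource/Berlin")` would perform the actual download.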
14. Querying using SPARQL Interactive query builders SPARQL Explorer: http://dbpedia.org/snorql/ Examples at: http://wiki.dbpedia.org/OnlineAccess Using HTTP GET GET /sparql/?query=EncodedQuery HTTP/1.1 Example: SELECT ?film ?x WHERE { ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> . ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>} curl 'http://dbpedia.org/sparql?query=encodedQuery'
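Producing the "encoded query" part of the GET request is the step most likely to go wrong by hand. The sketch below URL-encodes the example query with the standard library; the helper name `sparql_get_url` is hypothetical, and the unbound `?x` variable from the slide is left out of the SELECT clause.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """SELECT ?film WHERE {
  ?film <http://dbpedia.org/ontology/language> <http://dbpedia.org/resource/French_language> .
  ?film <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film>
}"""

def sparql_get_url(endpoint: str, query: str) -> str:
    """Return the endpoint URL with the query percent-encoded in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url(ENDPOINT, QUERY)

# Decoding the URL again should give back exactly the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

The resulting URL can be passed to curl or to any HTTP client.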
15. More data New York Times http://data.nytimes.com/ Example URI: http://data.nytimes.com/60694995023816375851 Also supports JSON Append .json or set Accept:text/javascript Freebase http://freebase.com Example URI http://rdf.freebase.com/rdf/en.tron_legacy Data dump http://download.freebase.com
16. And more data… Geonames: open geo data Geonames.org http://sws.geonames.org/5130561/ Download: http://www.geonames.org/export/ Open Government data efforts Data.gov See apps e.g. http://flyontime.us Data.gov.uk http://data.gov.uk/sparql
17. Spanish open gov’t data and linked data efforts Spanish open data efforts La Asociación Española de Linked Data (AELID) http://aelid.es/ Proyecto Aporta aporta.es Regional/local efforts risp.asturias.es (RDF, SPARQL) datos.zaragoza.es (RDF, SPARQL) opendata.euskadi.net (RDF) dadesobertes.gencat.cat (RDF) Competition AbreDatos 2010 abredatos.es
18. More info Segaran et al.: Programming the Semantic Web, O’Reilly, 2010. linkeddata.org W3C Semantic Web Activity Presentations, guides etc. RDF Primer http://www.w3.org/TR/2004/REC-rdf-primer-20040210/ SPARQL query language and protocol specs http://www.w3.org/TR/rdf-sparql-protocol/ http://www.w3.org/TR/rdf-sparql-query/ Search SlideShare etc. for more intro material
19. Build your Own Search Service (BOSS) Peter Mika Yahoo! Research Barcelona pmika@yahoo-inc.com
20. Innovate with Search! It’s really simple… Example: pay $0.0008 for a query, earn $0.01 per query 100,000 users a day, each making 1 query a day Earn $920 a day!
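The arithmetic behind the $920 figure can be checked directly. This sketch uses only the numbers on the slide; the variable names are made up for the example, and `Decimal` is used so that the result is not blurred by binary floating-point rounding.

```python
from decimal import Decimal

cost_per_query = Decimal("0.0008")    # paid for each query
revenue_per_query = Decimal("0.01")   # earned on each query
users_per_day = 100_000
queries_per_user = 1

queries_per_day = users_per_day * queries_per_user
margin_per_query = revenue_per_query - cost_per_query
daily_profit = margin_per_query * queries_per_day

print(f"margin per query: ${margin_per_query}")
print(f"daily profit: ${daily_profit:.2f}")
```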
21. Reminds me of the underpants gnomes from South Park http://en.wikipedia.org/wiki/Underpants_Gnomes
22. Yahoo BOSS: Yahoo’s Search API Ability to re-order results and blend in additional content No restrictions on presentation No branding or attribution Access to multiple verticals (web search, image, news) Spelling suggestions 40+ supported language and region pairs Pricing (BOSS) 10,000 free queries a day Pay for more queries Serve any ads you want For more info, http://developer.yahoo.com/search/boss/ New in BOSS v2 Powered by Bing Retrieve ads from Yahoo! and earn money ;)
23. Using BOSS Simple HTTP GET calls, no authentication Get an Application ID at http://developer.yahoo.com/search/boss/ Example: http://boss.yahooapis.com/ysearch/web/v1/{query}?appid={appid}&format=xml http://boss.yahooapis.com/ysearch/spelling/v1/{query}?appid={appid}&format=xml Documentation http://developer.yahoo.com/search/boss/boss_guide/
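Filling in the URL templates above mostly comes down to escaping the query text. The sketch below assumes the v1 URL pattern shown on the slide; the helper `boss_url` and the placeholder `YOUR_APP_ID` are hypothetical, and the endpoint may no longer be live.

```python
from urllib.parse import quote, urlencode

BASE = "http://boss.yahooapis.com/ysearch"

def boss_url(service: str, query: str, appid: str, fmt: str = "xml") -> str:
    """Build a BOSS v1 request URL for a service such as 'web' or 'spelling'."""
    # The query is part of the URL path, so every reserved character is escaped.
    path_query = quote(query, safe="")
    params = urlencode({"appid": appid, "format": fmt})
    return f"{BASE}/{service}/v1/{path_query}?{params}"

web_url = boss_url("web", "semantic web", "YOUR_APP_ID")
spelling_url = boss_url("spelling", "semantc web", "YOUR_APP_ID")
print(web_url)
print(spelling_url)
```

Keeping the service name as a parameter lets one helper cover both the web search and spelling suggestion calls.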
24. Queries you can play with Yahoo!’s WebScope program Data sharing with universities and research institutions Some of the most exciting data that we have! Request access online http://webscope.sandbox.yahoo.com/ Requires approval by Department Chair For HackU, you can sign up here for access to a dataset containing real world user queries Yahoo! Search Tiny Sample v1.0: a set of 4,500 queries Ideal for testing and demonstrating your search-based apps Can you really show something interesting for all these users?