15. URIs implicitly link data together. Joe’s homepage: (#joe, #name, “Joe A.”), (#joe, #email, mailto:joe@joe.com). Mary’s homepage: (#mary, #name, “Mary B.”), (#mary, #gender, “female”). A dating site: (#joe, #loves, #mary). Schema doc: (#name, #type, #Property), (#name, #domain, #Person). Linked Data: following links from one document to another makes it possible to discover the entire graph (data and ontologies).
16. When put together, they form a single ‘global’ graph. [Figure: nodes #joe and #mary; #joe has #name “Joe A.” and #email “joe@joe.com”; #mary has #name “Mary B.” and #gender “female”; #joe is linked to #mary by #loves.]
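The merge described on these two slides can be sketched in a few lines of Python. This is only an illustration: the assignment of triples to the three sources is read off the slide, and the `merge` and `reachable` helpers are hypothetical names, not part of any RDF library.

```python
from collections import defaultdict

# Triples (subject, predicate, object) as published by each source on the slide.
joes_homepage = [
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@joe.com"),
]
dating_site = [("#joe", "#loves", "#mary")]
marys_homepage = [
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
]


def merge(*sources):
    """Union the triples of all sources into one subject-indexed graph."""
    graph = defaultdict(set)
    for triples in sources:
        for s, p, o in triples:
            graph[s].add((p, o))
    return graph


def reachable(graph, start):
    """Follow objects that are themselves subjects, i.e. links between nodes."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(o for _, o in graph.get(node, ()) if o in graph)
    return seen


graph = merge(joes_homepage, dating_site, marys_homepage)
```

Because both homepages use the same identifiers, the dating site's single triple is enough to connect them: starting from `#joe`, `reachable` also finds `#mary`, while no source on its own contains that path.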
39. Example: microformats
<cite class="vcard"><a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a></cite> wrote a post (<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>) about an unintentionally humorous letter he received from the <span class="vcard"><a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a></span>.
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
42. Microdata example
<div item>
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>. I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. <img itemprop="image" src="me.png" alt="me"></p>
</div>
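To make concrete what a consumer gets out of this markup, here is a minimal Python sketch that collects the `itemprop` values from the slide's example using the standard-library `html.parser`. It is not a conforming microdata processor: the `ItempropCollector` class is a hypothetical helper, it handles only a single flat item, and the rule of preferring `datetime` and `src` over text content is a simplification.

```python
from html.parser import HTMLParser

# The slide's markup with its quoting repaired; `item` is the attribute the slide uses.
MARKUP = """
<div item>
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
     I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
     <img itemprop="image" src="me.png" alt="me"></p>
</div>
"""


class ItempropCollector(HTMLParser):
    """Collects itemprop name/value pairs (hypothetical helper, flat items only)."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._open = None  # property whose text content is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag == "time" and "datetime" in attrs:
            # The machine-readable date is used instead of the displayed text.
            self.props[name] = attrs["datetime"]
        elif tag == "img":
            self.props[name] = attrs.get("src", "")
        else:
            self._open = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._open is not None:
            self.props[self._open] += data

    def handle_endtag(self, tag):
        # Simplification: any end tag closes the property being read.
        self._open = None


collector = ItempropCollector()
collector.feed(MARKUP)
collector.close()
```

After parsing, `collector.props` holds the four properties of the item, with the birthday in its precise `datetime` form and not the human-readable "May 10th 2009".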
86. SearchMonkey: (1) Site owners/publishers share structured data with Yahoo!. (2) Site owners and third-party developers build SearchMonkey apps. (3) Consumers customize their search experience with Enhanced Results or Infobars. [Figure labels: Acme.com’s database, Acme.com’s Web pages, Index, RDF/Microformat markup, DataRSS feed, Web Services, Page Extraction.]
87. Standard enhanced results: embed markup in your page and get an enhanced result without any programming.
119. Semantics at every step of the IR process. [Figure: the Web feeds document processing into the IR engine; a query q=“bla” goes through query processing; ranking computes θ(q,d); result presentation follows.]
Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns.
HTML allows us to place metadata in the head of the document. The metadata can be both properties (as a string) and relationships to other documents.
HTML also allows us to put metadata in the body of the document, using @rel and @rev on anchors.
RDFa extends the @rel/@href technique to allow licenses to be attached to images. Say we have a list of images -- perhaps from a Flickr search -- here we see that we can attach a license to each of them.
HTML allows relationships (the @rel/@href combination) to be used in both the head and the body, but text properties can only be added in the head (via @content on <meta>).
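As a rough illustration of what plain HTML already offers, the Python sketch below reads `<meta name/content>`, `@rel` and `@rev` out of a page and turns them into triples about the current document. The page content, the document URL and the `HeadAndLinkMetadata` class are all made up for this example, and relative URLs are left unresolved.

```python
from html.parser import HTMLParser

DOC = "http://example.org/page1.html"  # assumed URL of the page being parsed

# Hypothetical page: a property and a relationship in the head, links in the body.
PAGE = """
<html><head>
  <title>Example</title>
  <meta name="author" content="Jane Doe">
  <link rel="license" href="http://creativecommons.org/licenses/by/3.0/">
</head><body>
  <a rel="next" href="page2.html">Next</a>
  <a rev="made" href="mailto:jane@example.org">Jane</a>
</body></html>
"""


class HeadAndLinkMetadata(HTMLParser):
    """Turns <meta>, @rel and @rev into (subject, predicate, object) triples."""

    def __init__(self, doc):
        super().__init__()
        self.doc = doc
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            # Text property of the document (only available via <meta> in plain HTML).
            self.triples.append((self.doc, attrs["name"], attrs["content"]))
        elif tag in ("link", "a") and "href" in attrs:
            # @rel: the document points at the target.
            for rel in (attrs.get("rel") or "").split():
                self.triples.append((self.doc, rel, attrs["href"]))
            # @rev: the same relationship read in the opposite direction.
            for rev in (attrs.get("rev") or "").split():
                self.triples.append((attrs["href"], rev, self.doc))


extractor = HeadAndLinkMetadata(DOC)
extractor.feed(PAGE)
extractor.close()
```

Every triple produced this way has the current document as its subject (or, for `@rev`, as its object), and the only literal value comes from `<meta>`.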
RDFa extends the use of @content to the body. Note a small twist -- we have to use @property instead of @name, since the latter attribute is already used for other stuff. Key thing here is that we've moved the machine-readable data closer to its human-readable version, which makes it a lot easier to publish.
Why would we do this? Well, first of all, it's much easier to control the generation of the machine-readable data if it's close to the human-readable data. But second, once you put it close to the human-readable data, there are many situations where the human-readable version will also suffice for the machine-readable one, and so we can avoid duplication. Note that using @content for the date illustrates a different point: in that case we preserve the distinction between the human- and machine-readable forms, because the machine-readable version is very precise.
Actually, I cheated a little in the last slide. There is no such property as 'author' or 'created'; they just happen to have been used in <head> over the years by a sort of convention. @rel="license" does exist, however, and there are a few other relationship values ('next', 'prev', and so on). But essentially, for other relationship values, and for all property values, we need to use CURIEs. The advantage of this is that there are many pre-existing vocabularies that can be used immediately. Also, anyone can create a new vocabulary without having to ask anyone. Commontags was devised a few weeks ago, for example, and its authors didn't have to ask anyone's permission.
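A CURIE is just a prefix plus a reference that expands to a full URI. The Python sketch below shows the expansion step; the `PREFIXES` table stands in for the prefix declarations a page would carry, and `expand_curie` is a hypothetical helper, not an API of any RDFa library.

```python
# Prefix table standing in for the declarations on the page (an assumption
# for this sketch); both namespace URIs are the published ones.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://creativecommons.org/ns#",
}


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' to a full URI using the given prefix table."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no prefix): {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + reference


creator = expand_curie("dc:creator", PREFIXES)
license_uri = expand_curie("cc:license", PREFIXES)
```

Creating a new vocabulary amounts to adding one more entry to the prefix table, which is why no central permission is needed.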
Recall that we added the relationship attributes to an image, so that we can specify license information...
...we can also add properties to the image.
HTML already supported relationships and properties that apply to the document, and we've seen how RDFa adds relationships and properties for images. Now let's look at how RDFa lets us add relationships and properties for anything. Let's say we have a link to a SlideShare presentation.
We know that if we put the @rel attribute onto the <a> tag as normal, it implies that the current document has a license, and that the presentation itself is the license. So this is no good.
The answer is first to create a link to the desired license...
...and then to indicate that this license is attached to the presentation. We still use @rel, but now we're using it with the new attribute that RDFa adds -- @about.
And of course, we can also add properties.
Using @about sets the context for any further RDFa, not just on the current element.
Once you are in the new context, then everything works exactly as normal, so compare this to the previous slide; the only difference is that the previous slide uses @about to set the context, whilst this example has the 'current document' to set the context.
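The effect of @about on the subject of later statements can be sketched with a small Python extractor. This is a deliberately simplified model, not the RDFa processing rules: `MiniRDFa` is a hypothetical class, the page URL and presentation URL are invented, and @rel and @property values are kept as written and not expanded to full URIs.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "img", "br", "hr", "input"}  # elements without end tags

BASE = "http://example.org/page"                    # assumed URL of the current document
TALK = "http://www.slideshare.net/example/talk"     # hypothetical presentation URL

PAGE = f"""
<div about="{TALK}">
  <span property="dc:title">RDFa basics</span>
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY</a>
</div>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">page license</a>
"""


class MiniRDFa(HTMLParser):
    """Tracks the current subject; @about replaces it for an element and its children."""

    def __init__(self, base):
        super().__init__()
        self.subjects = [base]   # stack; the top is the current subject
        self.triples = []
        self._literal = None     # (subject, property) waiting for text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        subject = attrs.get("about") or self.subjects[-1]
        href = attrs.get("href")
        if href:
            for rel in (attrs.get("rel") or "").split():
                self.triples.append((subject, rel, href))
        prop = attrs.get("property")
        if prop:
            if "content" in attrs:
                self.triples.append((subject, prop, attrs["content"]))
            else:
                self._literal = (subject, prop)
        if tag not in VOID:
            self.subjects.append(subject)

    def handle_data(self, data):
        if self._literal is not None:
            subject, prop = self._literal
            self.triples.append((subject, prop, data.strip()))
            self._literal = None

    def handle_endtag(self, tag):
        self._literal = None
        if tag not in VOID and len(self.subjects) > 1:
            self.subjects.pop()


parser = MiniRDFa(BASE)
parser.feed(PAGE)
parser.close()
```

Inside the `<div about=...>`, both the title and the license are attached to the presentation; the final link, outside that context, falls back to the current document as its subject.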
We've gone into a lot of detail on the basics of RDFa to show how it builds upon HTML's existing semantic features, but there are many more features. The main thing to emphasise is that HTML already had some useful semantic features, but what they meant was never formalised; RDFa did that. RDFa also adds to these features, but does so by applying the same approach.
There is much more we could have said, but we suggest that interested readers look at the RDFa Primer and other tutorials and articles. In passing, we would note that RDFa supports all of RDF's more advanced features too, such as datatyped literals, rdf:type, bnodes, XML literals, and so on. Advanced RDFa also allows quite elaborate chaining of statements, allowing people to be connected to companies, reviews to businesses, and so on.
As Vish discussed, SearchMonkey is all about building richer, more useful search results. Here are a few example Enhanced Results.
And it allows the user to add the movie directly to their online movie rental queue.
[will be animated]
SW: Representing and reasoning with structured data on the Web. Both a relational and a graph view on information.
IR: Aggregating information at a document level based on ad-hoc information needs.
DB: Representing and querying information in a relational model.
NLP: From text to information.
Results are good, but consider the ads: First ad says: Virgins. Looking for virgins? Find exactly what you want today. Ebay.com Second ad: Virgins. …Find cheap tickets for Virgins. Third ad: Adspam… these people buy Yahoo! traffic and sell it to Google.