Example: microformats <cite class="vcard"><a class="fn url" rel="friend colleague met" href="http://meyerweb.com/">Eric Meyer</a></cite> wrote a post (<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax Relief</a></cite>) about an unintentionally humorous letter he received from the <span class="vcard"><a class="fn org url" href="http://irs.gov/">Internal Revenue Service</a></span>. <div class="vcard"><a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a> <div class="tel">+1-919-555-7878</div> <div class="title">Area Administrator, Assistant</div></div>
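To make the markup above concrete, here is a minimal sketch of how a consumer might pull the hCard properties out of it, using only Python's standard `html.parser`. The names `HCardParser`, `parse_hcards` and `SAMPLE` are hypothetical, the property list covers only the classes used on this slide, and nested vcards are not handled. It is an illustration under those assumptions, not a full microformats parser.

```python
from html.parser import HTMLParser

TEXT_PROPS = {"fn", "org", "tel", "title"}   # value is the element's text
LINK_PROPS = {"url", "email"}                # value is the href attribute
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Markup from the slide (hypothetical constant name).
SAMPLE = """
<cite class="vcard"><a class="fn url" rel="friend colleague met"
  href="http://meyerweb.com/">Eric Meyer</a></cite> wrote a post
(<cite><a href="http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/">Tax
Relief</a></cite>) about a letter he received from the
<span class="vcard"><a class="fn org url" href="http://irs.gov/">Internal
Revenue Service</a></span>.
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
"""


class HCardParser(HTMLParser):
    """Collect one flat dict per element carrying class "vcard"."""

    def __init__(self):
        super().__init__()
        self.cards = []     # finished cards, in document order
        self._open = []     # per open element: (is_root, text_props, chunks)
        self._card = None   # card being filled (nesting is not supported)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        is_root = "vcard" in classes and self._card is None
        if is_root:
            self._card = {}
        text_props = set()
        if self._card is not None:
            text_props = classes & TEXT_PROPS
            href = attrs.get("href")
            if href:
                if href.startswith("mailto:"):
                    href = href[len("mailto:"):]
                for prop in classes & LINK_PROPS:
                    self._card[prop] = href
        self._open.append((is_root, text_props, []))

    def handle_data(self, data):
        # Text counts toward every open element that is collecting a property.
        for _, props, chunks in self._open:
            if props:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        is_root, props, chunks = self._open.pop()
        text = " ".join("".join(chunks).split())  # collapse whitespace
        for prop in props:
            self._card[prop] = text
        if is_root:
            self.cards.append(self._card)
            self._card = None


def parse_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


if __name__ == "__main__":
    for card in parse_hcards(SAMPLE):
        print(card)
```

Note that the second `<cite>` (the post title) carries no `vcard` class, so the sketch ignores it; only the three marked-up entities produce records.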
RDFa in a slide <p xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:foaf="http://xmlns.com/foaf/0.1/" typeof="foaf:Person" about="http://example.org/staff/jo"><span property="rdfs:label foaf:name">Jo Smith</span>. <span property="foaf:title">Web hacker</span> at <a rel="vcard:org" property="foaf:name" href="http://example.org">Acme Corp</a>. You can contact me <a rel="foaf:mbox" href="mailto:jo@example.org">via email</a>.</p> ... Callouts: Assign the prefixes rdfs and foaf to the RDFS and FOAF namespaces (as in XML, RDF/XML, etc.). Create a new resource of type foaf:Person. Assign a value to a property. Give it a URI. Link to another resource and assign a name to it.
Microdata example <div item="http://www.yahoo.com/resource/person"> <p>My name is <span itemprop="name">Neil</span>.</p> <p>My band is called <span itemprop="band">Four Parts Water</span>. I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>. <img itemprop="image" src="me.png" alt="me"> </p> </div>
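As a rough illustration of what a crawler could extract from the microdata example, the sketch below reads the `itemprop` values with Python's standard `html.parser`. It follows the slide in treating the `item` attribute as the marker of an item and its type (later microdata drafts use `itemscope`/`itemtype` instead). The names `MicrodataParser`, `parse_items` and `SAMPLE` are hypothetical, and only the value rules needed here are covered: `datetime` for `<time>`, `src` for `<img>`, element text otherwise.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}

# Markup from the slide (hypothetical constant name).
SAMPLE = """
<div item="http://www.yahoo.com/resource/person">
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.
     I was born on
     <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
     <img itemprop="image" src="me.png" alt="me">
  </p>
</div>
"""


class MicrodataParser(HTMLParser):
    """Collect {"type": ..., "properties": {...}} per element with `item`."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._open = []     # per open element: (is_root, prop_names, chunks)
        self._item = None   # item being filled (nesting is not supported)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_root = "item" in attrs and self._item is None
        if is_root:
            self._item = {"type": (attrs["item"] or "").strip(),
                          "properties": {}}
        names = []
        if self._item is not None and not is_root:
            names = (attrs.get("itemprop") or "").split()
        # Some elements take their value from an attribute, not their text.
        if tag == "time" and attrs.get("datetime"):
            value = attrs["datetime"].strip()
        elif tag == "img":
            value = (attrs.get("src") or "").strip()
        else:
            value = None
        if value is not None:
            for name in names:
                self._item["properties"][name] = value
            names = []      # nothing left to collect from the text
        if tag not in VOID_TAGS:
            self._open.append((is_root, names, []))

    def handle_data(self, data):
        for _, names, chunks in self._open:
            if names:
                chunks.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        is_root, names, chunks = self._open.pop()
        text = " ".join("".join(chunks).split())  # collapse whitespace
        for name in names:
            self._item["properties"][name] = text
        if is_root:
            self.items.append(self._item)
            self._item = None


def parse_items(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    print(parse_items(SAMPLE))
```

The machine-readable date comes from the `datetime` attribute rather than the human-readable text "May 10th 2009", which is the point of using `<time>` here.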
Semantics at every step of the IR process [Diagram labels: The Web, The Semantic Web, Document processing, Query processing, Ranking, Search interface, The IR engine; query q, θ(q,d)]
SearchMonkey: 1. Site owners/publishers share structured data with Yahoo!. 2. Site owners and third-party developers build SearchMonkey apps. 3. Consumers customize their search experience with Enhanced Results or Infobars. [Diagram labels: Acme.com's database, Acme.com's Web Pages, Index, RDF/Microformat Markup, DataRSS feed, Web Services, Page Extraction]
Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns.
In fact, some of these searches are so hard that users don't even try them anymore.
Results are good, but consider the ads. The first ad says: "Virgins. Looking for virgins? Find exactly what you want today. Ebay.com". The second ad: "Virgins. …Find cheap tickets for Virgins." The third ad is ad spam: these people buy Yahoo! traffic and sell it to Google.
SW: Representing and reasoning with structured data on the Web; both a relational and a graph view on information. IR: Aggregating information at the document level based on ad-hoc information needs. DB: Representing and querying information in a relational model. NLP: From text to information. One reference to Semantic Search
Entity-independent measures. M1: probability of fix given type. M2: probability of fix given type, normalized by the probability of fix (the more uncommon the fix, the better). M3: binary entropy function.
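The three measures can be sketched from simple co-occurrence counts. The code below assumes the input is a list of (fix, type) observations and estimates probabilities as relative frequencies; the function names and the toy data are hypothetical. The slide does not say which probability M3 takes as its argument, so applying the binary entropy function to P(fix | type) is an assumption made here for illustration only.

```python
import math


def binary_entropy(p):
    """H_b(p) = -p*log2(p) - (1-p)*log2(1-p), with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fix_measures(observations, fix, entity_type):
    """Return (M1, M2, M3) for one fix and one type.

    observations: iterable of (fix, type) pairs (assumed input format).
    M1 = P(fix | type)
    M2 = P(fix | type) / P(fix)
    M3 = H_b(P(fix | type))   # argument of H_b is an assumption
    """
    observations = list(observations)
    total = len(observations)
    fix_count = sum(1 for f, _ in observations if f == fix)
    type_count = sum(1 for _, t in observations if t == entity_type)
    joint = sum(1 for f, t in observations
                if f == fix and t == entity_type)
    if fix_count == 0 or type_count == 0:
        raise ValueError("fix and type must each occur at least once")
    m1 = joint / type_count
    p_fix = fix_count / total
    m2 = m1 / p_fix
    m3 = binary_entropy(m1)
    return m1, m2, m3


# Toy data, invented purely for illustration.
OBSERVATIONS = [
    ("review", "movie"), ("review", "movie"),
    ("cast", "movie"), ("trailer", "movie"),
    ("review", "camera"), ("price", "camera"),
    ("price", "camera"), ("price", "camera"),
]

if __name__ == "__main__":
    print(fix_measures(OBSERVATIONS, "review", "movie"))
    print(fix_measures(OBSERVATIONS, "price", "camera"))
```

On the toy data, "price" occurs only with cameras, so its M2 for the camera type is higher than the M2 of "review" for movies, which matches the slide's remark that a fix that is uncommon overall scores better.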