(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
Â
Semantic Web and Schema.org
1. What a long, strange trip itâs been
R.V.Guha
Google
schema.org
2. Outline of talk
⢠The context
â How did we end up where we are
⢠Schema.org
â What it is, status of adoption
â Schema.org principles, how does it work
⢠Looking ahead
â Next Generation Applications
schema.org
3. About 18 years ago, âŚ
⢠People started thinking about structured data on the web
â A few people from Netscape, Microsoft and W3C got together @MIT
⢠Trying to make sense of a flurry of activity/proposals
â XML, MCF, CDF, Sitemaps, âŚ
⢠There were a number of problems
â PICS, Meta data, sitemaps, âŚ
⢠But one unifying idea
schema.org
4. Context: The Web for humans
Structured
Data
Web server
HTML
schema.org
5. Goal: Web for Machines & Humans
Structured
Data
Web server
Apps
schema.org
6. What does that mean?
birthplace
Chuck Norris
Ryan, Oklahama
birthdate
March 10th 1940
Actor
type
- Notable points
- Graph Data Model
- Common Vocabulary
schema.org
7. How do we get there?
⢠How does the author give us the graph
â Data Model: Graph vs tree vs âŚ
â Syntax
â Vocabulary
â Identifiers for objects
⢠Why should the author give us the graph?
schema.org
8. Going depth first
⢠Many heated battles
â Lot of proposals, standards, companies, âŚ
⢠Data model
â Trees vs DLGs vs Vertical specific vs who needs one?
⢠Syntax
â XML vs RDF vs json vs âŚ
⢠Model theory anyone
â We need one vs who cares vs whatâs that?
schema.org
9. Timeline of âstandardsâ
⢠â96: Meta Content Framework (MCF) (Apple)
⢠â97: MCF using XML (Netscape) ď RDF, CDF
⢠â99 -- : RDF, RDFS
⢠â01 -- : DAML, OWL, OWL EL, OWL QL, OWL RL
⢠â03: Microformats
⢠And many many many more ⌠SPARQL, Turtle, N3, GRDDL,
R2RML, FOAF, SIOC, SKOS, âŚ
⢠Lots of bells & whistles: model theory, inference, type systems,
âŚ
schema.org
10. But something was missing âŚ
⢠Fewer than 1000 sites were using these standards
⢠Something was clearly missing and it wasnât more language
features
⢠We had forgotten the âWhyâ part of the problem
⢠The RSS story
schema.org
11. â07 - :Rise of the consumers
⢠Yahoo! Search Monkey, Google Rich Snippets, Facebook Open
Graph
⢠Offer webmasters a simple value proposition
⢠Search engines to webmasters:
â You give us data ⌠we make your results nicer
⢠Usage begins to take off
â 1000x increase in markupâed up pages in 3 years
schema.org
12. Yahoo Search Monkey
⢠Give websites control over snippet presentation
⢠Moderate adoption
â Targeted at high end developers
â Too many choices
schema.org
15. Google Rich Snippets
⢠Multi-syntax
⢠Adhoc vocabulary for each vertical
⢠Very clear carrot
⢠Lots of experimentation on UI
⢠Moderately successful: 10ks of sites
⢠Scaling issues with vocabulary
schema.org
16. Situation in 2010
⢠Too many choices/decisions for webmasters
â Divergence in vocabularies
⢠Too much fragmentation
⢠N versions of person, address, âŚ
⢠A lot of bad/wrong markup
â ~25% for micro-formats, ~40% with RDFA
â Some spam, mostly unintended mistakes
⢠Absolute adoption numbers still rather low
â Less than 100k sites
schema.org
17. Schema.org
⢠Work started in August 2010
â Google, Yahoo!, Microsoft & then Yandex
⢠Goals:
â One vocabulary understood by all the search engines
â Make it very easy for the webmaster
⢠It is A vocabulary. Not The vocabulary.
â Webmasters can use it together other vocabs
â We might not understand the other vocabs. Others might
schema.org
19. Schema.org principles: Simplicity
⢠Simple things should be simple
â For webmasters, not necessarily for consumers of markup
â Webmasters shouldnât have to deal with N namespaces
⢠Complex things should be possible
â Advanced webmasters should be able to mix and match
vocabularies
⢠Syntax
â Microdata, usability studies
â RDFa, json-ld, âŚ
schema.org
20. Schema.org principles: Simplicity
⢠Canât expect webmasters to understand Knowledge
Representation, Semantic Web Query Languages, etc.
⢠It has to fit in with existing workflows
â A posteriori âmarkup toolsâ donât work
⢠Avoid KR system driven artifacts
â Multiple domain / range for attributes
â No classes like âAgentâ
â Categories and attributes should be concrete
schema.org
21. Schema.org principles: Simplicity
⢠Copy and edit as the default mode for authors
â It is not a linear spec, but a tree of examples
⢠Vocabularies
â Authors only need to have local view
â But schema.org tries to have a single global coherent
vocabulary
schema.org
22. Schema.org principles: Incremental
⢠Started simple
â ~ 100 categories at launch
⢠Applies to every area
â Add complexity after adoption
â now ~1200 vocab items
â Go back and fill in the blanks
⢠Move fast, accept mistakes, iterate fast
schema.org
23. Schema.org Principles: URIs
⢠~1000s of terms like Actor, birthdate
â ~10s for most sites
â Common across sites
⢠~10ks of terms like USA
â External enumerations
Chuck Norris
birthplace
⢠~1b-100b terms like Chuck Norris and Ryan, Oklahama
â Cannot expect agreement on these
â Reference by description
â Consumers can reconcile entity references
Ryan, Oklahama
March 10th 1940
Actor
type
citizenOf
USA
birthdate
schema.org
24. An Actor
named
Chuck Norris
March 10th 1940
citizenOf
USA
birthdate
A city named Ryan
In the state OK
birthplace
birthdate
March 10th 1940
An Actor
named
Chuck Norris +
spouse
A Person named
Geena OâKelley
=
Chuck Norris
USA
Ryan, Oklahama
birthplace
spouse
March 10th 1940
Actor
type
citizenOf
birthdate
Geena OâKelley
schema.org
25. Schema.org Principles: Collaborations
⢠Most discussions on public W3C lists
⢠Work closely with interest communities
⢠Work with others to incorporate their vocabularies
â We give them attribution on schema.org
â Webmasters should not have to worry about where each
piece of the vocabulary came from
â Webmasters can mix and match vocabs
schema.org
26. Schema.org Principles: Collaborations
⢠IPTC /NYTimes / Getty with rNews
⢠Martin Hepp with Good Relations
⢠US Veterans, Whitehouse, Indeed.com with Job Posting
⢠Creative Commons with LRMI
⢠NIH National Library of Medicine for Medical vocab.
⢠Bibextend, Highwire Press for Bibliographic vocabulary
⢠Benetech for Accessibility
⢠BBC, European Broadcasting Union for TV & Radio schema
⢠Stackexchange, SKOS group for message board
⢠Lots and lots and lots of individuals
schema.org
27. Schema.org Principles: Partners
⢠Partner with Authoring platforms
â Drupal, Wordpress, Blogger, YouTube
⢠Drupal 8
â Schema.org markup for many types
⢠News articles, comments, users, events, âŚ
â More schema.org types can be created by site author
â Markup in HTML5 & RDFa Lite
â Will come out early 2015
schema.org
28. Recent Additions
⢠From Nouns to Verbs: Actions
â Object ď potential actions
â Constraints on actions
â E.g., ThorMovie ď Stream, Buy, âŚ
⢠Introducing time: Roles
â E.g., Joe Montana played for the SF 49ers from 1979 to
1992 in the position QuarterBack
schema.org
30. Looking forward
⢠Schema.org is doing better than we expected
â Thanks to millions of webmasters!
⢠But this is not the final goal
â Just the means to the next generation of applications
⢠First generation of applications
â Rich presentation of search results
⢠Many new applications
â Related to search and beyond
schema.org
35. Reservations ď¨ Personal Assistant
⢠Open Table website ď confirmation email ď
Android Reminder
schema.org
36. Vertical Search
⢠Structured data in search
â Web search: annotate search results
OR
â Filtering based on structured data
⢠Only in specialized corpus
⢠Ecommerce, real estate, etc.
⢠How about filtering based on structured data across the web?
schema.org
38. Web scale vertical search
⢠Searching for Veteran friendly jobs
schema.org
39. Web Scale custom vertical search
⢠Build your own custom vertical search engine
â Google does the heavy lifting: crawling, indexing, etc.
â You specify the schema.org restricts
â APIs to help build your own UI
⢠Searches over all pages on the web with a certain
schema.org markup
⢠Demo
schema.org
40. Scientific Data Publishing
⢠US Govt alone spends over $60B/yr on scientific
research
⢠Primary output of most of this research is data
â Most of the data is thrown away
â All that is published are papers
⢠We would like the data published in a easily reusable
form
schema.org
41. Case study: Clinical Trials
⢠Clinical trials
⢠4000+ clinical trials at any time in the US alone
⢠Almost all the data âthrown awayâ
⢠All that gets published is a textual âabstractâ
⢠Many of the trials are redundant
⢠Earlier trials have the data
⢠Assumptions, etc. cannot be re-examined
⢠Longitudinal studies extremely hard, but super important
⢠Having all the clinical trial data on the web, in a
common schema will make this much easier!
schema.org
42. Case study: SkyServer
⢠Huge amount of astronomy data
⢠Jim Gray, NASA and others brought it all together,
normalized it and made it available on the web
⢠Has changed the way astronomy research takes place
⢠Students in Africa getting PhDs without leaving Africa!
⢠Radio/Ultra-violet/Visible light data easily brought together
⢠Caveats
⢠SQL biased, not distributed, not scalable
⢠All normalization done by hand, once
⢠Small number of data sources
⢠But shows that it can be done âŚ
schema.org
43. First steps for scientific data publication
⢠OPTC directive for data from federally funded research to be
freely available
⢠Formation of new âData Scienceâ institute inside NIH
⢠Seeing traction in scientific data on the web
⢠Lot of interest in creating schemas
⢠Public repositories for scientific data starting
schema.org
44. Concluding
⢠Structured data on the web is now âweb scaleâ
⢠Schema.org has got traction and is evolving
⢠The most interesting applications are yet to come
schema.org