These slides were presented as part of a W3C tutorial at the CSHALS 2010 conference (http://www.iscb.org/cshals2010). The slides are adapted from a longer introduction to the Semantic Web available at http://www.slideshare.net/LeeFeigenbaum/semantic-web-landscape-2009 .
A PDF version of the slides is available at http://thefigtrees.net/lee/sw/cshals/cshals-w3c-semantic-web-tutorial.pdf .
1. The Semantic Web Landscape: A Practical Introduction Lee Feigenbaum VP Technology & Standards, Cambridge Semantics Co-chair, W3C SPARQL Working Group For CSHALS 2010 Tutorial Attendees February 24, 2010
2. A Motivating Example: Drug Discovery The W3C HCLS interest group set out to use Semantic Web technologies to receive precise answers to a complex question: Find me genes involved in signal transduction that are related to pyramidal neurons.
8. A Semantic Web Approach (cont’d) …(trivially) spans several databases…
9. A Semantic Web Approach (cont’d) …to deliver targeted results…
10. Agreement on common terms and relationships Incremental, flexible data structure Good-enough modeling Query interface tailored to the data model What’s the trick?
13. Semantic Web Web of Data Giant Global Graph Data Web Web 3.0 Linked Data Web Semantic Data Web Branding
14. “The Semantic Web” a.k.a. “Linked Open Data” Augments the World Wide Web Represents the Web’s information in a machine-readable fashion Enables… …targeted search …data browsing …automated agents What is it & why do we care? (1) World Wide Web : Web pages :: The Semantic Web : Data
15. “Semantic Web technologies” A family of technology standards that ‘play nice together’, including: Flexible data model Expressive ontology language Distributed query language Drive Web sites, enterprise applications What is it & why do we care? (2) The technologies enable us to build applications and solutions that were not possible, practical, or feasible traditionally.
16. A common set of technologies: ...enables diverse uses ...encourages interoperability A coherent set of technologies: …encourage incremental application …provide a substantial base for innovation A standard set of technologies: ...reduces proprietary vendor lock-in ...encourages many choices for tool sets A Common & Coherent Set of Technology Standards
19. As technologies & tools have evolved, Semantic Web advocates have progressed through stages: 2010: Where we are
20. 2010: Where we’re not Image from Trey Ideker via Enoch Huang Semantic Web technologies are not a ‘magic crank’ for discovering new drugs (or solving other problems, for that matter)!
21. 2010: Where we’re not (cont’d) XML vs. RDF? “Ontology” vs. “ontology”? Data integration vs. reasoning vs. KBs vs. search vs. app. development vs. … Semantic Web vs. Linked Data? The Semantic Web still suffers from confusing and conflicting messaging, each of which asserts it’s “correct”.
22. 2010: Where we’re not (cont’d) People with appropriate skill sets for designing & building Semantic Web solutions are not widely available.
23. 2010: Where we’re not (cont’d) We don’t yet have standard solutions for privacy, trust, probability, and other elements of the Semantic Web vision.
27. RDF is… A schema-less data model that features unambiguous identifiers and named relations between pairs of resources.
28. RDF is… A labeled, directed graph of relations between resources and literal values. RDF graphs are collections of triples Triples are made up of a subject, a predicate, and an object Resources and relationships are named with URIs predicate subject object
29. “Lee Feigenbaum works for Cambridge Semantics” “Lee Feigenbaum was born in 1978” “Cambridge Semantics is headquartered in Massachusetts” Example RDF triples works for born in headquartered Lee Feigenbaum Cambridge Semantics Lee Feigenbaum Cambridge Semantics 1978 Massachusetts
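The three example statements can be sketched in code as a set of (subject, predicate, object) tuples. This is a minimal illustration in plain Python, not an RDF library; the `EX` namespace and the `objects` helper are assumptions made up for this sketch.

```python
# Hypothetical namespace, chosen only for illustration.
EX = "http://example.org/myvocab/"

# An RDF graph is a set of (subject, predicate, object) triples.
# Resources and relations are named with URIs; 1978 is a literal value.
graph = {
    (EX + "LeeFeigenbaum", EX + "employer", EX + "CambridgeSemantics"),
    (EX + "LeeFeigenbaum", EX + "birthYear", 1978),
    (EX + "CambridgeSemantics", EX + "headquarters", EX + "Massachusetts"),
}


def objects(graph, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


employers = objects(graph, EX + "LeeFeigenbaum", EX + "employer")
print(employers)
```

Because every statement has the same three-part shape, one generic helper can answer questions about any predicate without a schema being declared first.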
30. Triples connect to form graphs headquartered lives in Massachusetts born in capital works for Lee Feigenbaum Cambridge Semantics Boston 1978
31. The graph data structure makes merging data with shared identifiers trivial Triples act as a least common denominator for expressing data URIs for naming remove ambiguity …the same identifier means the same thing Why RDF? What’s different here?
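A rough sketch of why merging is trivial: if two sources are each a set of triples and use the same URIs for the same things, merging is a set union, and a question that spans both sources needs no extra mapping step. The namespaces, data values, and the `follow` helper below are illustrative assumptions, not part of any RDF toolkit.

```python
EX = "http://example.org/myvocab/"   # illustrative namespaces
GEO = "http://geonames.example/"

# Two independently produced data sets that share identifiers.
company_data = {
    (EX + "LeeFeigenbaum", EX + "employer", EX + "CambridgeSemantics"),
    (EX + "CambridgeSemantics", EX + "headquarters", GEO + "BostonMA"),
}
place_data = {
    (GEO + "BostonMA", EX + "population", 574000),
    # The same statement appearing in both sources collapses on merge.
    (EX + "CambridgeSemantics", EX + "headquarters", GEO + "BostonMA"),
}

# Merging graphs is a set union; no schema has to be adjusted first.
merged = company_data | place_data


def follow(graph, start, *predicates):
    """Walk a chain of predicates from `start`; return the values reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for (s, p, o) in graph if s in frontier and p == predicate}
    return frontier


populations = follow(
    merged, EX + "LeeFeigenbaum",
    EX + "employer", EX + "headquarters", EX + "population",
)
print(populations)
```

The walk from a person to the population of their employer's headquarters city only succeeds on the merged graph, because the last hop lives in the second source.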
33. RDF is the model, for which there are several concrete syntaxes: RDF/XML – standard, complex XML syntax Turtle – common, textual, triples-oriented syntax N3 – more expressive superset of Turtle N-Triples – textual, line-oriented, useful for streaming What does RDF look like? When writing RDF by hand and in many guides, examples, and discussions these days, you’ll see Turtle most often.
34. Write a triple by writing its parts separated by spaces (subject predicate object) A Bit of Turtle @prefix ex: <http://example.org/myvocab/> . @prefix geo: <http://geonames.example/> . ex:LeeFeigenbaum ex:employer ex:CambridgeSemantics . ex:LeeFeigenbaum ex:birthYear 1978 . ex:CambridgeSemantics ex:headquarters geo:BostonMA . geo:BostonMA ex:population 574000 .
37. SPARQL is… A SQL-like language for querying sets of RDF graphs.
38. SPARQL is… A simple protocol for issuing queries and receiving results over HTTP. So… Every SPARQL client works with every SPARQL server!
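As a sketch of what "over HTTP" means in practice: in the SPARQL protocol, a query can be sent as an HTTP GET request with the query text URL-encoded in a `query` parameter. The endpoint URL below is hypothetical, and the example only builds the request URL rather than contacting a server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol query request."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL recovers the original query text unchanged.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Since the request is just a URL, any HTTP client can act as a SPARQL client, which is what makes clients and servers from different vendors interchangeable.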
39. SPARQL lets us: Pull information from structured and semi-structured data. Explore data by discovering unknown relationships. Query and search an integrated view of disparate data sources. Glue separate software applications together by transforming data from one vocabulary to another. Why SPARQL?
40. Dealer 1 Dealer 2 Dealer 3 Employee Directory ERP / Budget System Web EPA Fuel Efficiency Spreadsheet SPARQL Query Engine What automobiles get more than 25 miles per gallon, fit within my department’s budget, and can be purchased at a dealer located within 10 miles of one of my employees? SELECT ?automobile WHERE { ?automobile a ex:Car ; epa:mpg ?mpg ; ex:dealer ?dealer . ?employee a ex:Employee ; geo:loc ?loc . ?dealer geo:loc ?dealerloc . FILTER(?mpg > 25 && geo:dist(?loc, ?dealerloc) <= 10) . } Web dashboard SPARQL query
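To make the shape of such a query less mysterious, here is a toy sketch of how a query engine might evaluate triple patterns: each pattern is matched against every triple, variables (written with a leading `?`) are bound consistently across patterns, and a filter is applied to the resulting bindings. This is an illustration under simplifying assumptions, not how any particular SPARQL engine is implemented; the data and the `match` and `solve` helpers are made up.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(patterns, graph):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Made-up data using prefixed names as plain strings.
graph = {
    ("ex:civic", "rdf:type", "ex:Car"),
    ("ex:civic", "epa:mpg", 33),
    ("ex:tank", "rdf:type", "ex:Car"),
    ("ex:tank", "epa:mpg", 12),
}

patterns = [
    ("?automobile", "rdf:type", "ex:Car"),
    ("?automobile", "epa:mpg", "?mpg"),
]

# Equivalent in spirit to FILTER(?mpg > 25).
efficient = sorted(
    b["?automobile"] for b in solve(patterns, graph) if b["?mpg"] > 25
)
print(efficient)
```

The triples could have come from any number of merged sources; the matcher neither knows nor cares which source contributed which triple.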
43. 3 pieces of the Semantic Web technology stack are about describing a domain well enough to capture (some of) the meaning of resources and relationships in the domain RDF Schema OWL RIF From the explicit to the inferred Apply knowledge to data to get more data.
45. Elements of: Vocabulary (defining terms) I define a relationship called “prescribed dose.” Schema (defining types) “prescribed dose” relates “treatments” to “dosages” (my prescribed dose is 2mg; therefore 2mg is a dosage) Taxonomy (defining hierarchies) Any “doctor” is a “medical professional” (therefore Dr. Brown is a medical professional) RDF Schema is…
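The two "therefore" inferences on this slide can be sketched as rules applied to a set of triples until nothing new appears. The sketch covers only a range rule and a subclass rule, uses made-up `ex:` names, and treats the literal "2mg" as if it were a resource for simplicity; a real RDFS reasoner handles more rules and more care.

```python
SUBCLASS, RANGE, TYPE = "rdfs:subClassOf", "rdfs:range", "rdf:type"


def rdfs_closure(triples):
    """Apply a range rule and a subclass rule until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            # Range rule: if predicate p has range C, each object of p is a C.
            for (p2, r, c) in inferred:
                if r == RANGE and p2 == p:
                    new.add((o, TYPE, c))
            # Subclass rule: an instance of a class is an instance of
            # each of its superclasses.
            if p == TYPE:
                for (c1, r, c2) in inferred:
                    if r == SUBCLASS and c1 == o:
                        new.add((s, TYPE, c2))
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("ex:prescribedDose", RANGE, "ex:Dosage"),
    ("ex:myTreatment", "ex:prescribedDose", "2mg"),
    ("ex:Doctor", SUBCLASS, "ex:MedicalProfessional"),
    ("ex:DrBrown", TYPE, "ex:Doctor"),
}

closure = rdfs_closure(facts)
for triple in sorted(closure - facts):
    print(triple)
```

The printed triples are the ones that were never stated explicitly: knowledge applied to data to get more data.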
47. Elements of ontology Same/different identity “author” and “auteur” are the same relation two resources with the same “ISBN” are the same “book” More expressive type definitions A “cycle” is a “vehicle” with at least one “wheel” A “bicycle” is a “cycle” with exactly two “wheels” More expressive relation definitions “sibling” is a symmetric predicate the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc” OWL is…
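Two of these ideas can be sketched over plain triples: a symmetric predicate (what OWL calls an `owl:SymmetricProperty`) and identity inferred from a shared key such as an ISBN (in OWL, an `owl:InverseFunctionalProperty`). The helper functions, resource names, and the placeholder ISBN are assumptions for illustration only.

```python
def symmetric_closure(triples, symmetric_predicates):
    """Add the reversed triple for every use of a symmetric predicate."""
    reversed_triples = {
        (o, p, s) for (s, p, o) in triples if p in symmetric_predicates
    }
    return set(triples) | reversed_triples


def same_by_key(triples, key_predicate):
    """Infer owl:sameAs between distinct resources sharing a key value."""
    by_value = {}
    for (s, p, o) in triples:
        if p == key_predicate:
            by_value.setdefault(o, set()).add(s)
    return {
        (a, "owl:sameAs", b)
        for group in by_value.values()
        for a in group
        for b in group
        if a != b
    }


facts = {
    ("ex:alice", "ex:sibling", "ex:bob"),
    # Placeholder ISBN, not a real book.
    ("lib:book42", "ex:isbn", "0-000-00000-0"),
    ("shop:item7", "ex:isbn", "0-000-00000-0"),
}

with_siblings = symmetric_closure(facts, {"ex:sibling"})
identities = same_by_key(facts, "ex:isbn")
print(sorted(identities))
```

The identity inference is what lets records about the same book from a library and a shop be treated as one resource after merging.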
48. A class is a (named) collection of things with similar attributes OWL: Rich Class Definitions
53. Standard representation for exchanging sets of logical and business rules Logical rules A buyer buys an item from a seller if the seller sells the item to the buyer A customer becomes a "Gold" customer as soon as his cumulative purchases during the current year top $5000 Production rules Customers that become "Gold" customers must be notified immediately, and a golden customer card will be printed and sent to them within one week For shopping carts worth more than $1000, "Gold" customers receive an additional discount of 10% of the total amount RIF is…
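To give a feel for what these rules compute, here is the "Gold" customer rule and the discount rule written as ordinary Python functions. This is not RIF syntax and says nothing about how a rule engine would exchange or execute the rules; the function names and constants are invented for the sketch, and the notification rule is not modeled.

```python
GOLD_THRESHOLD = 5000    # dollars of cumulative purchases in the current year
CART_THRESHOLD = 1000    # dollars; carts worth more than this get a discount
DISCOUNT_PERCENT = 10    # additional discount for "Gold" customers


def is_gold(purchases_this_year):
    """Logical rule: Gold once cumulative purchases top the threshold."""
    return sum(purchases_this_year) > GOLD_THRESHOLD


def amount_due(cart_total, gold):
    """Production rule: Gold customers get a discount on large carts."""
    if gold and cart_total > CART_THRESHOLD:
        return cart_total - cart_total * DISCOUNT_PERCENT // 100
    return cart_total


gold = is_gold([3000, 2500])
due = amount_due(2000, gold)
print(gold, due)
```

A rule interchange format matters because rules like these are usually authored in one vendor's engine and need to be moved to another without being rewritten by hand.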
54. Fantasy Land Architecture Ontology / Schema + Custom UI Custom UI Custom UI Custom UI Custom UI Custom UI
69. Semantic Web Tools In 2010, there are a wide variety of open-source and commercial Semantic Web tools available.
70. Triple stores Built on relational database Native RDF store Development libraries Full-featured application servers Types of RDF Tools Most RDF tools contain some elements of each of these.
72. Query engines Things that can run queries Most RDF stores provide a SPARQL engine Query rewriters E.g. to query relational databases (more later) Endpoints Things that accept queries on the Web and return results Client libraries Things that make it easy to ask queries Types of SPARQL Tools
73. Community-maintained list of query engines http://esw.w3.org/topic/SparqlImplementations Publicly accessible SPARQL endpoints http://esw.w3.org/topic/SparqlEndpoints Michael Bergman’s Sweet Tools searchable list: http://www.mkbergman.com/?page_id=325 Finding SPARQL Tools
79. What about… everything else? Standards don’t yet exist, but many tools exist to derive RDF and/or run SPARQL queries against other sources of data.
85. On the Web Google, Yahoo! Best Buy NY Times US Government UK Government Where is it being used?
86. Industries Oil & Gas (integration, classification) Finance (structured data, ontologies, XBRL) Publishing (metadata) Government (structured data, metadata, classification) Libraries & museums (metadata, classification) IT (rapid application development & evolution) Where is it being used?
87. Health Care Cleveland Clinic Clinical research Data integration, classification (= better search) UT School of Health Public health surveillance SAPPHIRE—classification, ontology-driven development Various Clinical Decision Support Agile, rule-driven, scalable in the face of change Where is it being used?
88. Life Sciences Agile knowledgebases at Pfizer Target assessment at Eli Lilly Integrated information links at Novartis Astra Zeneca, J&J, UCB, … Where is it being used? CSHALS chronicles many of these uses and many more.
90. These are horizontal, enabling technologies. But they apply particularly well to problems with these characteristics: Heterogeneous data from multiple sources Increasing reliance on connections within this data Rapidly changing information needs Significant early-mover advantage Large amounts of data that would benefit from classification Why are Semantic Web technologies appropriate for the life sciences? Many tactical and strategic challenges in the life sciences industry feature these traits.
92. Getting Started with Semantic Web technologies Goal: quick tactical wins on the path to large strategic value Be sure to consider the operational ramifications Who does what differently? Ideal Semantic Web projects/applications have an incremental path towards broad deployment that generates demonstrable value along the way
93. Look beyond the core Semantic Web capabilities and consider: integration with existing enterprise systems development & extension models deployment, logging, maintenance, backup tooling user experience Choose practical, enterprise-ready tools If you choose to build new components and assemble existing components together, it’s quite likely you’ll end up reinventing the wheel.
94. What level of expertise is necessary? Technologies only? Technologies + API? Technologies + tooling? Tooling only? … How will we acquire the expertise? In-house (and if so, how?) Vendor services 3rd-party services Open-source community Plan for Acquiring Expertise
95. I’m always happy to field questions & engage in discussion: lee@cambridgesemantics.com Thanks & Discussion
Editor's notes
One of the goals of this tutorial is to demystify all of the names of technologies, tools, projects, etc. that swirl around the Semantic Web story. And since I saw as I researched this presentation that everyone seems to like this particular Gary Larson cartoon, it behooved me to include it.
The good: emphasizes the importance of the foundational layers (URIs and RDF); emphasizes the long-term roadmap/vision of what’s needed for the Semantic Web. The bad: implies that perhaps things can’t be taken seriously until all the pieces are in place; implies an order to the research; various versions of the cake tell different stories (importance of XML, absence of query, lack of UI/application layer, …). Valentin Zacharias wrote about the “infamy” part of the layer cake here: http://www.valentinzacharias.de/blog/2007/04/ban-semantic-web-layer-cake.html
The Ontology/ontology dichotomy is captured well by Jim Hendler at http://www.cs.rpi.edu/%7Ehendler/presentations/SemTech2008-2Towers.pdf
Definition.
Prescriptive.
Descriptive.
Formal.
The first is as opposed to relational tables or XML schemas, where the schema needs to be explicitly adjusted to accommodate whatever data is being merged. The second is due to the expressivity of the model: it can handle lists, trees, n-ary relations, etc. The third is as opposed to table & column identifiers or XML attribute names.
Definition.
Prescriptive.
Descriptive.
Descriptive (part 2). This is leagues ahead of the situation with SQL!