SlideShare ist ein Scribd-Unternehmen logo
1 von 89
Downloaden Sie, um offline zu lesen
Ivan Herman
F2F Meeting of the W3C Business Group on Oil, Gas, and Chemicals
                   Houston, February 13, 2012
(2)
(3)
}    An intelligent system manipulating and analyzing
            knowledge bases
            §  e.g., via big ontologies, vocabularies
      }  A means to manage large amount of data
      }  Improve search by adding structure to embedded
          data
      }  A means to integrate many different pieces of data

      }  And a mixture of all these…




(5)
}    Help in finding the best drug regimen for a
                 specific patient




(6)   Courtesy of Erick Von Schweber, PharmaSURVEYOR Inc., (SWEO Use Case)
(7)   Courtesy of the BBC
(8)   Courtesy of the BBC
(9)
(10)
(11)
}  We have to acknowledge that the field has grown
           and has become multi-faceted
       }  All different “views” have their success stories

       }  There are also no clear and water-proof boundaries
           between the different views
       }  The question is: where is the emphasis?




(13)
}    There are more and more data on the Web
             §  government data, health related data, general knowledge,
                 company information, flight information, restaurants,…
       }    More and more applications rely on the availability of
             that data




(14)
Photo credit “nepatterson”, Flickr
}    A “Web” where
             §  documents are available for download on the Internet
             §  but there would be no hyperlinks among them




(16)
(17)
}    We need a proper infrastructure for a real Web of
             Data
             §  data is available on the Web
               •  accessible via standard Web technologies
             §  data are interlinked over the Web
               •  the terms used for linkage are well defined
       }    I.e.: data can be integrated over the Web




(18)
Photo credit “kxlly”, Flickr
(20)
Web of Data
                              Applications
                     Stand Alone         Browser
                     Applications      Applications



                                       Query and Update




                     Common Format &
       Inferencing      Common
                       Vocabularies



                                       “Bridges”




                          Data on the Web
(21)
}    Some technologies are in the process of finalization
             §  SPARQL 1.1 (SPARQL Protocol and RDF Query Language)
             §  RDB2RDF (Relational Databases to RDF)
             §  RDFa 1.1 (RDF in attributes)




(23)
}    Some areas are subject of intensive work
             §  RDF update (Resource Description Framework)
             §  Provenance




(24)
}    We are discussing new works, new areas, e.g.,
             §  Linked Data Platform
             §  Access Control issues
             §  Constraint checking on Semantic Web data
             §  …




(25)
}  Various communities have different emphasis on
           which part of the Semantic Web they want to use
       }  W3C has contacts with some of those
             §  health care and life sciences (a separate IG is up and
                 running)
             §  libraries, publishing
             §  financials
             §  and of course… the oil, gas, and chemicals community!
             




(26)
Photo credit “reedster”, Flickr
}  SPARQL is a query language on RDF data
       }  SPARQL is defined in terms of a protocol, to send
           query and results over the Web
       }  Is based on the idea of “graph pattern matching”:
             1.  a graph pattern is described in the query, with real and
                 unknown nodes (“variables”)
             2.  if the pattern can match a portion of the graph, the
                 unknown nodes are replaced by the “real” ones
             3.  resulting information is returned
       }     First version of SPARQL was published in 2008


(28)
}  Nested queries (i.e., SELECT within a WHERE clause)
       }  Negation (MINUS, and a NOT EXIST filter)

       }  Aggregate function on search results (SUM, MIN,…)

       }  Property path expression (?x foaf:knows+ ?y)

       }  SPARQL UPDATE facilities (INSERT, DELETE,
           CREATE)
       }  Combination with entailment regimes




(29)
SPARQL Engine with entailment
       RDF Data

       RDFS/OWL/RIF data                          entailment
       SPARQL Pattern




                                     RDF Data with extra triples

                                     SPARQL Pattern


       Query result                               pattern
                                                  matching




(30)
SPARQL Endpoint




                                                                                       SPARQL Endpoint
                                              Application

                                                                                 ct
                                                                             stru
                                                                          Con
                                                                    RQL
                                                                 SPA
   Triple store                                                                                           Database
                                           SPARQL Processor




                                                NLP Techniques




                                                                                      SQLRDF
       RDF Graph                                                                                         Relational
                                                                                                         Database




                                    HTML     Unstructured Text     XML/XHTML
(31)
SPARQL Endpoint




                                                                                       SPARQL Endpoint
                                              Application

                                                                                 ct
                                                                             stru
                                                                          Con
                                                                    RQL
                                                                 SPA
   Triple store                                                                                           Database
                                           SPARQL Processor

  Inferencing




                                                NLP Techniques




                                                                                      SQLRDF
       RDF Graph                                                                                         Relational
                                                                                                         Database


                                                                                                     Inferencing

                                    HTML     Unstructured Text     XML/XHTML
(32)
}  Technology has been finalized
       }  Goes to “candidate recommendation” soon

       }  Should be finished this summer




(33)
Photo credit “mayhem”, Flickr
}  Most of the data on the Web is, in fact, in RDB-s
       }  Proven technology, huge systems, many vendors…

       }  Data integration on the Web must provide access to
           RDB-s




(35)
Common “view”
            in
           RDF




          export
           query




(36)
}    “Export” does not necessarily mean physical
             conversion
             §  for very large databases a “duplication” would not be an
                 option
             §  systems may provide SPARQL⇔SQL “bridges” to make
                 queries on the fly
       }    Result of export is a “logical” view of the RDB
             content




(37)
}  A standard RDF “view” of RDB tables
       }  Does not require any more information than what is
           in the RDB Schema
       }  Fundamental approach:
             §  each row is turned into a series of triples with a common
                 subject
             §  column names provide the predicate names
             §  cell contents are the objects as literals
             §  linked tables are expressed with URI subjects




(38)
ISBN        Author           Title        Publisher            Year
       0006511409X           id_xyz     The Glass Palace   id_qpr        2000
       0007179871            id_xyz     The Hungry Tide    id_qpr        2004




                 ID           Name                    Homepage
        id_xyz          Ghosh, Amitav       http://www.amitavghosh.com




(39)
ISBN        Author           Title        Publisher            Year
                                                                                       Each row is a
       0006511409X           id_xyz     The Glass Palace   id_qpr        2000
       0007179871            id_xyz     The Hungry Tide    id_qpr        2004
                                                                                       subject




                 ID           Name                    Homepage
        id_xyz          Ghosh, Amitav       http://www.amitavghosh.com




(40)
Each column name provides a predicate


                 ISBN        Author           Title        Publisher            Year
                                                                                       Each row is a
       0006511409X           id_xyz     The Glass Palace   id_qpr        2000
       0007179871            id_xyz     The Hungry Tide    id_qpr        2004
                                                                                       subject




                 ID           Name                    Homepage
        id_xyz          Ghosh, Amitav       http://www.amitavghosh.com




(41)
Each column name provides a predicate


                 ISBN        Author           Title        Publisher            Year
                                                                                       Each row is a
       0006511409X           id_xyz     The Glass Palace   id_qpr        2000
       0007179871            id_xyz     The Hungry Tide    id_qpr        2004
                                                                                       subject




        Table references are
        URI objects



                 ID           Name                    Homepage
        id_xyz          Ghosh, Amitav       http://www.amitavghosh.com




(42)
Each column name provides a predicate


                 ISBN        Author           Title        Publisher            Year
                                                                                           Each row is a
       0006511409X           id_xyz     The Glass Palace   id_qpr        2000
       0007179871            id_xyz     The Hungry Tide    id_qpr        2004
                                                                                           subject




        Table references are
        URI objects
                                                                       Cells are Literal objects


                 ID           Name                    Homepage
        id_xyz          Ghosh, Amitav       http://www.amitavghosh.com




(43)
Tables
         RDB     Direct
       Schema   Mapping




(44)
Tables
         RDB     Direct
       Schema   Mapping




                          “Direct Graph”




(45)
}    Pros:
             §  Direct Mapping is simple, does not require any other
                 concepts
             §  know the Schema ⇒ know the RDF graph structure
             §  know the RDF graph structure ⇒ good idea of the
                 Schema(!)
       }    Cons:
             §  the resulting graph is not what the application really wants




(46)
Tables
         RDB         Direct
       Schema       Mapping




                                “Direct Graph”


                 Graph Processing
                (Rules, SPARQL, …)



                                Final, Application Graph


(47)
}    Separate vocabulary to control the details of the
             mapping, e.g.:
             §  finer control over the choice of the subject
             §  creation of URI references from cells
             §  predicates may be chosen from a vocabulary
             §  datatypes may be assigned
             §  etc.
       }    Gets to the final RDF graph with one processing
             step



(48)
R2RML           Tables
         RDB    Instance
       Schema




                 R2RML
                Mapping




                           Final, Application Graph


(49)
}    Fundamentals are similar:
             §  each row is turned into a series of triples with a common
                 subject
       }    Direct mapping is a “default” R2RML mapping




(50)
}  Technology has been finalized
       }  Should go to “candidate recommendation” these
           days
       }  Should be finished this summer




(51)
}  Not necessarily large amount of data per page, but
           lots of them…
       }  Has become very valuable to search engines
             §  Google, Bing, Yahoo!, or Yandex (i.e., schema.org) all
                 committed to use such data
       }    Two syntaxes have emerged at W3C:
             §  microdata with HTML5
             §  RDFa with the HTML5, XHTML, and with XML languages in
                 general




(53)
(54)
}    Both have similar philosophies:
             §  the structured data is expressed via attributes only (no
                 specialized elements)
             §  both define some special attributes
              •  e.g., itemscope for microdata, resource for RDFa
             §  both reuse some HTML core attributes (e.g., href)
             §  both reuse the textual content of the HTML source, if
                 needed
       }    RDF data can be extracted from both
             §  i.e., HTML+RDFa and HTML+microdata have become an
                 additional source of Linked Data


(55)
}    Microdata has been optimized for simpler use cases,
             concentrating on
             §  one vocabulary at a time
             §  tree shaped data
             §  no datatypes, no language control beyond HTML’s
       }    RDFa provides a full serialization of RDF in XML or
             HTML
             §  the price is an extra complexity compared to microdata
       }    RDFa 1.1 Lite is a simplified authoring profile of
             RDFa, very similar to microdata


(56)
}    For RDFa 1.1
             §  Technology has been finalized
             §  Should go to “candidate recommendation” in March
             §  Should be finished this summer
       }    For microdata
             §  Technology has been finalized
             §  Is part of HTML5, hence its advancement depends on other
                 technologies




(57)
Nexus Simulation Credit Erich Bremmer
}    Resource Description Framework: a graph-based
             model for (Web) data and its relationships
             §  has a simple (subject,predicate,object) model
             §  makes use of URI-s for the naming of terms
              •  objects can also be Literals
             §  informally: defines named relationships (named links)
                 among entities on the Web
             §  has different serialization formats
       }    Latest version was published in 2004




(59)
}    Many issues have come up since 2004:
             §  deployment issues
             §  new functionalities are needed
             §  underlying technology may have moved on (e.g., datatypes)
       }  The goal of the RDF Working Group is to refresh
           RDF
       }  NOT a complete reshaping of the standard!




(60)
}  Standardize Turtle as a serialization format
       }  Clean up some aspects of datatyping, e.g.:
             §  plain vs. typed literals
             §  details and role of rdf:XMLLiteral
       }    Proper definition for “named graphs”
             §  including concepts, semantics, syntax, …
              •  obviously important for linked data access
              •  but generates quite some discussions on the details
       }    etc.



(61)
}    Cleanup the documents, make them more readable
             §  possibly rewrite all documents
             §  maybe a completely new primer
             §  new structure for the Semantics document




(62)
}  Work has begun a bit less than a year ago
       }  Turtle is almost finalized

       }  Agreement on most of the literal cleanup

       }  Lots of discussion currently on named graphs…




(63)
}    We should be able to express all sorts of “meta”
             information on the data
             §  creator: who played what role in creating the data (author,
                 reviewer, etc.)
             §  view of the full revision chain of the data
             §  in case of a integrated data: which part comes from which
                 original data and under what process
             §  what vocabularies/ontologies/rules were used to generate
                 some portions of the data
             §  etc.




(65)
}  Requires a complete model describing the various
           constituents (actors, revisions, etc.)
       }  The model should be describable and usable with
           RDF
       }  Has to find a balance between
             §  simple (“scruffy”) provenance: easily usable and editable
             §  complex (“complete”) provenance: allows for a detailed
                 reporting of origins, versions, etc.
       }    That is the role of the Provenance Working Group
             (started in 2011)


(66)
ex:chart




(67)
ex:chart

                 prov:wasGeneratedBy

       ex:illustrate




(68)
ex:aggregation                                               ex:chart

                        prov:used             prov:wasGeneratedBy

                                    ex:illustrate




(69)
ex:aggregation                                                   ex:chart

                        prov:used             prov:wasGeneratedBy

                                    ex:illustrate


                                             prov:wasControlledBy




                                                                           foaf:name

                                                                                Derek

                                                                    foaf:mbox


                                                            mailto:derek@ex.com


(70)
ex:aggregation                                                   ex:chart

                        prov:used             prov:wasGeneratedBy

                                    ex:illustrate


         prov:wasGeneratedBy                 prov:wasControlledBy




                   ex:aggregate

                                                                           foaf:name

                                                                                Derek

                                                                    foaf:mbox


                                                            mailto:derek@ex.com


(71)
ex:aggregation                                                      ex:chart

                          prov:used              prov:wasGeneratedBy

                                       ex:illustrate


          prov:wasGeneratedBy                   prov:wasControlledBy




                       ex:aggregate

                                                                              foaf:name

                                                                                   Derek
             prov:used            prov:used
                                                                       foaf:mbox

       ex:regionList                  ex:dataSet
                                                               mailto:derek@ex.com


(72)
ex:aggregation                                                      ex:chart

                          prov:used              prov:wasGeneratedBy

                                       ex:illustrate


          prov:wasGeneratedBy                   prov:wasControlledBy



                                          prov:wasControlledBy
                       ex:aggregate

                                                                              foaf:name

                                                                                   Derek
             prov:used            prov:used
                                                                       foaf:mbox

       ex:regionList                  ex:dataSet
                                                                 mailto:derek@ex.com


(73)
}    There are ways to express more complex
             provenance situation
             §  giving more details on the action, on the exact role a person
                 has played
             §  information on versioning changes
             §  etc.




(74)
ex:aggregation                                                      ex:chart

                          prov:used              prov:wasGeneratedBy

                                       ex:illustrate


          prov:wasGeneratedBy                   prov:wasControlledBy



                                          prov:wasControlledBy
                       ex:aggregate

                                                                              foaf:name

                                                                                   Derek
             prov:used            prov:used
                                                                       foaf:mbox

       ex:regionList                  ex:dataSet
                                                                 mailto:derek@ex.com


(75)
}    “Linked Data” is also a set of principles:
             §  put things on the Web through URI-s
             §  use HTTP URI-s so that things could be dereferenced
             §  provide useful information when a URI is dereferenced
             §  include links to other URI-s
       }    RDF is an ideal vehicle to realize these principles




(77)
(78)Courtesy   of Richard Cyganiak and Anja Jentzsch
(79)Courtesy   of Frank van Harmelen, ISWC2011 keynote address
}  Scale: we are talking about billions of triples,
           increasing every day
       }  Highly distributed: data spread over the Web,
           connected via http links
       }  Very heterogeneous data of different origins

       }  Integrity, constraint checking of data becomes more
           an more important




(80)
}    Knowledge structures vs. data is very different: very
             shallow, simple vocabularies for huge sets of data
             §  The role of reasoning is different (vocabularies, OWL DL,
                 etc., may not be feasible)
       }  Highly distributed SPARQL implementations are
           necessary
       }  SPARQL endpoint may be too complicated
             §  direct HTTP access to RDF data may become an alternative
       }    etc.



(81)
(82)Courtesy   of Frank van Harmelen, ISWC2011 keynote address
(83)Courtesy   of Frank van Harmelen, ISWC2011 keynote address
}    Two major work areas:
             1.  Linked Data Profiles: subsets of existing Semantic Web
                 standards to be used for Linked Data, e.g.,
              •  use only a subset of datatypes
              •  use some subset of RDFS and OWL only (e.g., OWL 2 RL or
                 part thereof)
              •  use HTTP URI-s only, restrict the usage of blank nodes
              •  etc.
             2.  Define an HTTP protocol to 
              •  access and update RDF data 
              •  RESTful API



(84)
}    Two major work areas:
             1.  Linked Data Profiles: subsets of existing Semantic Web
                 standards to be used for Linked Data, e.g.,
              •  use only a subset of datatypes
              •  use some subset of RDFS and OWL only (e.g., OWL 2 RL or

                                          !!!
                 part thereof)

                                   lanned
              •  use HTTP URI-s only, restrict the usage of blank nodes
              •  etc.
            P
             2.  Define an HTTP protocol to 
              •  access and update RDF data 
              •  RESTful API



(85)
}  Standardized approaches for Access Control to data
       }  Reconsider rule languages for (e.g., for Linked Data
           applications)
       }  Constraint checking of Data

       }  JSON serialization of RDF

       }  API-s for client-side Web Application Developers

       }  More general view on Web of Data
             §  harmonizing the RDF, XML, and other views?
       }    …


(87)
}  Emphasis is on challenges by the Web of Data
       }  New aspects of Semantic Web technologies have to
           be explored
       }  There is work for everybody, join the club!




(88)
These slides are also available on the Web:
       
         http://www.w3.org/2012/Talks/0213-Houston-IH/



(89)

Weitere ähnliche Inhalte

Was ist angesagt? (6)

Enabling ontology based streaming data access final
Enabling ontology based streaming data access finalEnabling ontology based streaming data access final
Enabling ontology based streaming data access final
 
Interactive exploration of complex relational data sets in a web - SemWeb.Pro...
Interactive exploration of complex relational data sets in a web - SemWeb.Pro...Interactive exploration of complex relational data sets in a web - SemWeb.Pro...
Interactive exploration of complex relational data sets in a web - SemWeb.Pro...
 
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
Geospatial querying in Apache Marmotta - ApacheCon Big Data Europe 2015
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 2 (...
 
Swib12 workshop lod_beginners
Swib12 workshop lod_beginnersSwib12 workshop lod_beginners
Swib12 workshop lod_beginners
 
NOSQL Overview, Neo4j Intro And Production Example (QCon London 2010)
NOSQL Overview, Neo4j Intro And Production Example (QCon London 2010)NOSQL Overview, Neo4j Intro And Production Example (QCon London 2010)
NOSQL Overview, Neo4j Intro And Production Example (QCon London 2010)
 

Andere mochten auch (10)

Ontologies and Vocabularies
Ontologies and VocabulariesOntologies and Vocabularies
Ontologies and Vocabularies
 
Connotation (mad)
Connotation (mad)Connotation (mad)
Connotation (mad)
 
Ontology based sentiment analysis
Ontology based sentiment analysisOntology based sentiment analysis
Ontology based sentiment analysis
 
Valentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introductionValentina Presutti - Ontology Design Patterns: an introduction
Valentina Presutti - Ontology Design Patterns: an introduction
 
Using ontology for natural language processing
Using ontology for natural language processingUsing ontology for natural language processing
Using ontology for natural language processing
 
Vocabulary Slides - Selecting & Teaching
Vocabulary Slides - Selecting & TeachingVocabulary Slides - Selecting & Teaching
Vocabulary Slides - Selecting & Teaching
 
Teach vocabulary
Teach vocabularyTeach vocabulary
Teach vocabulary
 
Denotation and Connotation Exercises
Denotation and Connotation ExercisesDenotation and Connotation Exercises
Denotation and Connotation Exercises
 
Denotation and Conotation
Denotation and ConotationDenotation and Conotation
Denotation and Conotation
 
Denotation and-connotation
Denotation and-connotationDenotation and-connotation
Denotation and-connotation
 

Ähnlich wie Ivan Herman - Semantic Web Activities @ W3C

SPARQL and SQL: technical aspects and synergy
SPARQL and SQL: technical aspects and synergySPARQL and SQL: technical aspects and synergy
SPARQL and SQL: technical aspects and synergy
Yannis Kalfoglou
 
RESTful writable APIs for the web of Linked Data using relational storage sol...
RESTful writable APIs for the web of Linked Data using relational storage sol...RESTful writable APIs for the web of Linked Data using relational storage sol...
RESTful writable APIs for the web of Linked Data using relational storage sol...
Antonio Garrote Hernández
 
Web standards, why care?
Web standards, why care?Web standards, why care?
Web standards, why care?
Thomas Roessler
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...
తేజ దండిభట్ల
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
Rathachai Chawuthai
 
Triplestore and SPARQL
Triplestore and SPARQLTriplestore and SPARQL
Triplestore and SPARQL
Lino Valdivia
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
DBOnto
 

Ähnlich wie Ivan Herman - Semantic Web Activities @ W3C (20)

SPARQL and SQL: technical aspects and synergy
SPARQL and SQL: technical aspects and synergySPARQL and SQL: technical aspects and synergy
SPARQL and SQL: technical aspects and synergy
 
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3CSemantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
 
Web Spa
Web SpaWeb Spa
Web Spa
 
RESTful writable APIs for the web of Linked Data using relational storage sol...
RESTful writable APIs for the web of Linked Data using relational storage sol...RESTful writable APIs for the web of Linked Data using relational storage sol...
RESTful writable APIs for the web of Linked Data using relational storage sol...
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streams
 
Web standards, why care?
Web standards, why care?Web standards, why care?
Web standards, why care?
 
Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...Achieving time effective federated information from scalable rdf data using s...
Achieving time effective federated information from scalable rdf data using s...
 
Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)Practical Cross-Dataset Queries with SPARQL (Introduction)
Practical Cross-Dataset Queries with SPARQL (Introduction)
 
A year on the Semantic Web @ W3C
A year on the Semantic Web @ W3CA year on the Semantic Web @ W3C
A year on the Semantic Web @ W3C
 
Transient and persistent RDF views over relational databases in the context o...
Transient and persistent RDF views over relational databases in the context o...Transient and persistent RDF views over relational databases in the context o...
Transient and persistent RDF views over relational databases in the context o...
 
Semantika Introduction
Semantika IntroductionSemantika Introduction
Semantika Introduction
 
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...8th TUC Meeting -  Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
8th TUC Meeting - Zhe Wu (Oracle USA). Bridging RDF Graph and Property Graph...
 
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as KnowledgeRDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
 
Sparql semantic information retrieval by
Sparql semantic information retrieval bySparql semantic information retrieval by
Sparql semantic information retrieval by
 
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONSSPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
SPARQL: SEMANTIC INFORMATION RETRIEVAL BY EMBEDDING PREPOSITIONS
 
SRBench Streaming RDF SPARQL Benchmark
SRBench Streaming  RDF SPARQL BenchmarkSRBench Streaming  RDF SPARQL Benchmark
SRBench Streaming RDF SPARQL Benchmark
 
Standards for Semantic Mashups
Standards for Semantic MashupsStandards for Semantic Mashups
Standards for Semantic Mashups
 
Triplestore and SPARQL
Triplestore and SPARQLTriplestore and SPARQL
Triplestore and SPARQL
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
 

Mehr von sssw2012 (7)

Semantic Search
Semantic SearchSemantic Search
Semantic Search
 
Collaborative ontology development
Collaborative ontology developmentCollaborative ontology development
Collaborative ontology development
 
Manfred Linking the Real World
Manfred Linking the Real WorldManfred Linking the Real World
Manfred Linking the Real World
 
The Web of Data - Tom Heath
The Web of Data - Tom HeathThe Web of Data - Tom Heath
The Web of Data - Tom Heath
 
Linked Data Applications: There is No-One-Size-Fits-All Formula - Asun Gomez ...
Linked Data Applications: There is No-One-Size-Fits-All Formula - Asun Gomez ...Linked Data Applications: There is No-One-Size-Fits-All Formula - Asun Gomez ...
Linked Data Applications: There is No-One-Size-Fits-All Formula - Asun Gomez ...
 
jerome Euzenat - Ontology Matching
jerome Euzenat - Ontology Matchingjerome Euzenat - Ontology Matching
jerome Euzenat - Ontology Matching
 
Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective
Aldo Gangemi - Meaning on the Web: An Empirical Design PerspectiveAldo Gangemi - Meaning on the Web: An Empirical Design Perspective
Aldo Gangemi - Meaning on the Web: An Empirical Design Perspective
 

Kürzlich hochgeladen

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 

Kürzlich hochgeladen (20)

psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 

Ivan Herman - Semantic Web Activities @ W3C

  • 1. Ivan Herman F2F Meeting of the W3C Business Group on Oil, Gas, and Chemicals Houston, February 13, 2012
  • 2. (2)
  • 3. (3)
  • 4.
  • 5. }  An intelligent system manipulating and analyzing knowledge bases §  e.g., via big ontologies, vocabularies }  A means to manage large amount of data }  Improve search by adding structure to embedded data }  A means to integrate many different pieces of data }  And a mixture of all these… (5)
  • 6. }  Help in finding the best drug regimen for a specific patient (6) Courtesy of Erick Von Schweber, PharmaSURVEYOR Inc., (SWEO Use Case)
  • 7. (7) Courtesy of the BBC
  • 8. (8) Courtesy of the BBC
  • 9. (9)
  • 10. (10)
  • 11. (11)
  • 12.
  • 13. }  We have to acknowledge that the field has grown and has become multi-faceted }  All different “views” have their success stories }  There are also no clear and water-proof boundaries between the different views }  The question is: where is the emphasis? (13)
  • 14. }  There are more and more data on the Web §  government data, health related data, general knowledge, company information, flight information, restaurants,… }  More and more applications rely on the availability of that data (14)
  • 16. }  A “Web” where §  documents are available for download on the Internet §  but there would be no hyperlinks among them (16)
  • 17. (17)
  • 18. }  We need a proper infrastructure for a real Web of Data §  data is available on the Web •  accessible via standard Web technologies §  data are interlinked over the Web •  the terms used for linkage are well defined }  I.e.: data can be integrated over the Web (18)
  • 20. (20)
  • 21. Web of Data Applications Stand Alone Browser Applications Applications Query and Update Common Format & Inferencing Common Vocabularies “Bridges” Data on the Web (21)
  • 22.
  • 23. }  Some technologies are in the process of finalization §  SPARQL 1.1 (SPARQL Protocol and RDF Query Language) §  RDB2RDF (Relational Databases to RDF) §  RDFa 1.1 (RDF in attributes) (23)
  • 24. }  Some areas are subject of intensive work §  RDF update (Resource Description Framework) §  Provenance (24)
  • 25. }  We are discussing new works, new areas, e.g., §  Linked Data Platform §  Access Control issues §  Constraint checking on Semantic Web data §  … (25)
  • 26. }  Various communities have different emphasis on which part of the Semantic Web they want to use }  W3C has contacts with some of those §  health care and life sciences (a separate IG is up and running) §  libraries, publishing §  financials §  and of course… the oil, gas, and chemicals community! (26)
  • 28. }  SPARQL is a query language on RDF data }  SPARQL is defined in terms of a protocol, to send query and results over the Web }  Is based on the idea of “graph pattern matching”: 1.  a graph pattern is described in the query, with real and unknown nodes (“variables”) 2.  if the pattern can match a portion of the graph, the unknown nodes are replaced by the “real” ones 3.  resulting information is returned }  First version of SPARQL was published in 2008 (28)
  • 29. }  Nested queries (i.e., SELECT within a WHERE clause) }  Negation (MINUS, and a NOT EXIST filter) }  Aggregate function on search results (SUM, MIN,…) }  Property path expression (?x foaf:knows+ ?y) }  SPARQL UPDATE facilities (INSERT, DELETE, CREATE) }  Combination with entailment regimes (29)
  • 30. SPARQL Engine with entailment RDF Data RDFS/OWL/RIF data entailment SPARQL Pattern RDF Data with extra triples SPARQL Pattern Query result pattern matching (30)
  • 31. SPARQL Endpoint SPARQL Endpoint Application ct stru Con RQL SPA Triple store Database SPARQL Processor NLP Techniques SQLRDF RDF Graph Relational Database HTML Unstructured Text XML/XHTML (31)
  • 32. SPARQL Endpoint SPARQL Endpoint Application ct stru Con RQL SPA Triple store Database SPARQL Processor Inferencing NLP Techniques SQLRDF RDF Graph Relational Database Inferencing HTML Unstructured Text XML/XHTML (32)
  • 33. }  Technology has been finalized }  Goes to “candidate recommendation” soon }  Should be finished this summer (33)
  • 35. }  Most of the data on the Web is, in fact, in RDB-s }  Proven technology, huge systems, many vendors… }  Data integration on the Web must provide access to RDB-s (35)
  • 36. Common “view” in RDF export query (36)
  • 37. }  “Export” does not necessarily mean physical conversion §  for very large databases a “duplication” would not be an option §  systems may provide SPARQL⇔SQL “bridges” to make queries on the fly }  Result of export is a “logical” view of the RDB content (37)
  • 38. }  A standard RDF “view” of RDB tables }  Does not require any more information than what is in the RDB Schema }  Fundamental approach: §  each row is turned into a series of triples with a common subject §  column names provide the predicate names §  cell contents are the objects as literals §  linked tables are expressed with URI subjects (38)
  • 39. ISBN Author Title Publisher Year 0006511409X id_xyz The Glass Palace id_qpr 2000 0007179871 id_xyz The Hungry Tide id_qpr 2004 ID Name Homepage id_xyz Ghosh, Amitav http://www.amitavghosh.com (39)
  • 40. ISBN Author Title Publisher Year Each row is a 0006511409X id_xyz The Glass Palace id_qpr 2000 0007179871 id_xyz The Hungry Tide id_qpr 2004 subject ID Name Homepage id_xyz Ghosh, Amitav http://www.amitavghosh.com (40)
  • 41. Each column name provides a predicate ISBN Author Title Publisher Year Each row is a 0006511409X id_xyz The Glass Palace id_qpr 2000 0007179871 id_xyz The Hungry Tide id_qpr 2004 subject ID Name Homepage id_xyz Ghosh, Amitav http://www.amitavghosh.com (41)
  • 42. Each column name provides a predicate ISBN Author Title Publisher Year Each row is a 0006511409X id_xyz The Glass Palace id_qpr 2000 0007179871 id_xyz The Hungry Tide id_qpr 2004 subject Table references are URI objects ID Name Homepage id_xyz Ghosh, Amitav http://www.amitavghosh.com (42)
  • 43. Each column name provides a predicate ISBN Author Title Publisher Year Each row is a 0006511409X id_xyz The Glass Palace id_qpr 2000 0007179871 id_xyz The Hungry Tide id_qpr 2004 subject Table references are URI objects Cells are Literal objects ID Name Homepage id_xyz Ghosh, Amitav http://www.amitavghosh.com (43)
  • 44. Tables RDB Direct Schema Mapping (44)
  • 45. Tables RDB Direct Schema Mapping “Direct Graph” (45)
  • 46. }  Pros: §  Direct Mapping is simple, does not require any other concepts §  know the Schema ⇒ know the RDF graph structure §  know the RDF graph structure ⇒ good idea of the Schema(!) }  Cons: §  the resulting graph is not what the application really wants (46)
  • 47. Tables RDB Direct Schema Mapping “Direct Graph” Graph Processing (Rules, SPARQL, …) Final, Application Graph (47)
  • 48. }  Separate vocabulary to control the details of the mapping, e.g.: §  finer control over the choice of the subject §  creation of URI references from cells §  predicates may be chosen from a vocabulary §  datatypes may be assigned §  etc. }  Gets to the final RDF graph with one processing step (48)
  • 49. R2RML Tables RDB Instance Schema R2RML Mapping Final, Application Graph (49)
  • 50. }  Fundamentals are similar: §  each row is turned into a series of triples with a common subject }  Direct mapping is a “default” R2RML mapping (50)
  • 51. }  Technology has been finalized }  Should go to “candidate recommendation” these days }  Should be finished this summer (51)
  • 52.
  • 53. }  Not necessarily large amount of data per page, but lots of them… }  Has become very valuable to search engines §  Google, Bing, Yahoo!, or Yandex (i.e., schema.org) all committed to use such data }  Two syntaxes have emerged at W3C: §  microdata with HTML5 §  RDFa with the HTML5, XHTML, and with XML languages in general (53)
  • 54. (54)
  • 55. }  Both have similar philosophies: §  the structured data is expressed via attributes only (no specialized elements) §  both define some special attributes •  e.g., itemscope for microdata, resource for RDFa §  both reuse some HTML core attributes (e.g., href) §  both reuse the textual content of the HTML source, if needed }  RDF data can be extracted from both §  i.e., HTML+RDFa and HTML+microdata have become an additional source of Linked Data (55)
  • 56. }  Microdata has been optimized for simpler use cases, concentrating on §  one vocabulary at a time §  tree shaped data §  no datatypes, no language control beyond HTML’s }  RDFa provides a full serialization of RDF in XML or HTML §  the price is an extra complexity compared to microdata }  RDFa 1.1 Lite is a simplified authoring profile of RDFa, very similar to microdata (56)
  • 57. }  For RDFa 1.1 §  Technology has been finalized §  Should go to “candidate recommendation” in March §  Should be finished this summer }  For microdata §  Technology has been finalized §  Is part of HTML5, hence its advancement depends on other technologies (57)
  • 58. Nexus Simulation Credit Erich Bremmer
  • 59. }  Resource Description Framework: a graph-based model for (Web) data and its relationships §  has a simple (subject,predicate,object) model §  makes use of URI-s for the naming of terms •  objects can also be Literals §  informally: defines named relationships (named links) among entities on the Web §  has different serialization formats }  Latest version was published in 2004 (59)
  • 60. }  Many issues have come up since 2004: §  deployment issues §  new functionalities are needed §  underlying technology may have moved on (e.g., datatypes) }  The goal of the RDF Working Group is to refresh RDF }  NOT a complete reshaping of the standard! (60)
  • 61. }  Standardize Turtle as a serialization format }  Clean up some aspects of datatyping, e.g.: §  plain vs. typed literals §  details and role of rdf:XMLLiteral }  Proper definition for “named graphs” §  including concepts, semantics, syntax, … •  obviously important for linked data access •  but generates quite some discussions on the details }  etc. (61)
  • 62. }  Cleanup the documents, make them more readable §  possibly rewrite all documents §  maybe a completely new primer §  new structure for the Semantics document (62)
  • 63. }  Work has begun a bit less than a year ago }  Turtle is almost finalized }  Agreement on most of the literal cleanup }  Lots of discussion currently on named graphs… (63)
  • 64.
  • 65. }  We should be able to express all sorts of “meta” information on the data §  creator: who played what role in creating the data (author, reviewer, etc.) §  view of the full revision chain of the data §  in case of a integrated data: which part comes from which original data and under what process §  what vocabularies/ontologies/rules were used to generate some portions of the data §  etc. (65)
  • 66. }  Requires a complete model describing the various constituents (actors, revisions, etc.) }  The model should be describable and usable with RDF }  Has to find a balance between §  simple (“scruffy”) provenance: easily usable and editable §  complex (“complete”) provenance: allows for a detailed reporting of origins, versions, etc. }  That is the role of the Provenance Working Group (started in 2011) (66)
  • 68. ex:chart prov:wasGeneratedBy ex:illustrate (68)
  • 69. ex:aggregation ex:chart prov:used prov:wasGeneratedBy ex:illustrate (69)
  • 70. ex:aggregation ex:chart prov:used prov:wasGeneratedBy ex:illustrate prov:wasControlledBy foaf:name Derek foaf:mbox mailto:derek@ex.com (70)
  • 71. ex:aggregation ex:chart prov:used prov:wasGeneratedBy ex:illustrate prov:wasGeneratedBy prov:wasControlledBy ex:aggregate foaf:name Derek foaf:mbox mailto:derek@ex.com (71)
  • 72. ex:aggregation ex:chart prov:used prov:wasGeneratedBy ex:illustrate prov:wasGeneratedBy prov:wasControlledBy ex:aggregate foaf:name Derek prov:used prov:used foaf:mbox ex:regionList ex:dataSet mailto:derek@ex.com (72)
  • 73. ex:aggregation ex:chart prov:used prov:wasGeneratedBy ex:illustrate prov:wasGeneratedBy prov:wasControlledBy prov:wasControlledBy ex:aggregate foaf:name Derek prov:used prov:used foaf:mbox ex:regionList ex:dataSet mailto:derek@ex.com (73)
  • 74. }  There are ways to express more complex provenance situation §  giving more details on the action, on the exact role a person has played §  information on versioning changes §  etc. (74)
  • 75. ex:aggregation ex:chart prov:used prov:wasGeneratedBy ex:illustrate prov:wasGeneratedBy prov:wasControlledBy prov:wasControlledBy ex:aggregate foaf:name Derek prov:used prov:used foaf:mbox ex:regionList ex:dataSet mailto:derek@ex.com (75)
  • 76.
  • 77. }  “Linked Data” is also a set of principles: §  put things on the Web through URI-s §  use HTTP URI-s so that things could be dereferenced §  provide useful information when a URI is dereferenced §  include links to other URI-s }  RDF is an ideal vehicle to realize these principles (77)
  • 78. (78)Courtesy of Richard Cyganiak and Anja Jentzsch
  • 79. (79)Courtesy of Frank van Harmelen, ISWC2011 keynote address
  • 80. }  Scale: we are talking about billions of triples, increasing every day }  Highly distributed: data spread over the Web, connected via http links }  Very heterogeneous data of different origins }  Integrity, constraint checking of data becomes more an more important (80)
  • 81. }  Knowledge structures vs. data is very different: very shallow, simple vocabularies for huge sets of data §  The role of reasoning is different (vocabularies, OWL DL, etc., may not be feasible) }  Highly distributed SPARQL implementations are necessary }  SPARQL endpoint may be too complicated §  direct HTTP access to RDF data may become an alternative }  etc. (81)
  • 82. (82)Courtesy of Frank van Harmelen, ISWC2011 keynote address
  • 83. (83)Courtesy of Frank van Harmelen, ISWC2011 keynote address
  • 84. }  Two major work areas: 1.  Linked Data Profiles: subsets of existing Semantic Web standards to be used for Linked Data, e.g., •  use only a subset of datatypes •  use some subset of RDFS and OWL only (e.g., OWL 2 RL or part thereof) •  use HTTP URI-s only, restrict the usage of blank nodes •  etc. 2.  Define an HTTP protocol to •  access and update RDF data •  RESTful API (84)
  • 85. }  Two major work areas: 1.  Linked Data Profiles: subsets of existing Semantic Web standards to be used for Linked Data, e.g., •  use only a subset of datatypes •  use some subset of RDFS and OWL only (e.g., OWL 2 RL or !!! part thereof) lanned •  use HTTP URI-s only, restrict the usage of blank nodes •  etc. P 2.  Define an HTTP protocol to •  access and update RDF data •  RESTful API (85)
  • 86.
  • 87. }  Standardized approaches for Access Control to data }  Reconsider rule languages for (e.g., for Linked Data applications) }  Constraint checking of Data }  JSON serialization of RDF }  API-s for client-side Web Application Developers }  More general view on Web of Data §  harmonizing the RDF, XML, and other views? }  … (87)
  • 88. }  Emphasis is on challenges by the Web of Data }  New aspects of Semantic Web technologies have to be explored }  There is work for everybody, join the club! (88)
  • 89. These slides are also available on the Web: http://www.w3.org/2012/Talks/0213-Houston-IH/ (89)