USEWOD2012




History and Background of the
   USEWOD Data Challenge

        Knud Möller, Talis Systems Ltd.
              @knudmoeller


Möller, K., Hausenblas, M., Cyganiak,
R., Grimnes, G., and Handschuh, S.
(2010). Learning from linked open
data usage: Patterns & metrics. In
WebScience 2010, Raleigh, NC, USA.
http://journal.webscience.org/302/

Linked Data


Conventional “Eye-ball” Web   Web of Linked Data

interlinked documents         interlinked items of data
                              (URIs, RDF)

mainly people / Web           mainly machine agents (but
browsers                      also people?)



How is Linked Data being used?
• plenty of research on conventional Web
  usage
• what about usage of linked data?
Why?
• how healthy is the Web of linked data?
• who is using the data and how? Is it useful?
  Are there trends?
• providers: improve hosting
• ... just curiosity!
Approach

particular sites:
– a URI for each data item ➙ a request for each data item
  (resource)
– content negotiation best practices
– redirection (HTTP 303)

plain resource URI:
  http://data.semanticweb.org/conference/www/2009
    ➙ RDF document URI:  http://data.semanticweb.org/conference/www/2009/rdf
    ➙ HTML document URI: http://data.semanticweb.org/conference/www/2009/html
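
The redirection pattern above can be sketched in a few lines. This is only an illustration, assuming the Dog Food convention shown on the slide (document URIs formed by appending /rdf or /html to the resource URI). The function name `redirect_target`, the list of RDF media types and the simplistic Accept-header check (no q-values) are assumptions made for the example, not the server's actual implementation.

```python
# Media types treated as "the client wants RDF" (an assumed, incomplete list).
RDF_TYPES = ("application/rdf+xml", "text/turtle", "text/n3")


def redirect_target(resource_uri: str, accept: str) -> tuple[int, str]:
    """Return the (status, Location) pair a server might answer with."""
    # Rough negotiation: any RDF media type in the Accept header wins,
    # everything else falls back to the HTML document.
    wants_rdf = any(media_type in accept for media_type in RDF_TYPES)
    suffix = "/rdf" if wants_rdf else "/html"
    return 303, resource_uri.rstrip("/") + suffix


resource = "http://data.semanticweb.org/conference/www/2009"
print(redirect_target(resource, "application/rdf+xml"))
print(redirect_target(resource, "text/html"))
```

In a server log, such an exchange shows up as two hits: one on the plain resource URI answered with 303, and a second one on the document URI answered with 200.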
Approach (ctd.)

• server log files
  – common log format (CLF), combined log format

  80.219.211.147 - - [23/May/2009:09:52:03 +0100] "GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0"
       200 64674 "-" "ARC Reader (http://arc.semsol.org/)"

  Figure 1: The combined log format. Fields: Request IP, Request Date, Request String,
  Response Code, Response Size, Referrer, User Agent

• RDF requests vs. “semantic” requests

  90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1"
       303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
  90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1"
       200 453586 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
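
A line in the combined log format shown above can be split into its fields with a regular expression. The sketch below is one possible way to do it; the pattern, the field names and the `parse_line` helper are assumptions made for illustration, and real logs may need extra care (for example, escaped quotes inside the request string).

```python
import re
from datetime import datetime

# One named group per field of the combined log format.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> dict:
    """Split one combined-log-format line into a dict of fields."""
    match = COMBINED.match(line)
    if match is None:
        raise ValueError("not a combined log format line")
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # Month names are parsed with the default (English) locale.
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# The example line from the slide.
line = ('80.219.211.147 - - [23/May/2009:09:52:03 +0100] '
        '"GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0" '
        '200 64674 "-" "ARC Reader (http://arc.semsol.org/)"')
entry = parse_line(line)
print(entry["status"], entry["size"], entry["agent"])
```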


Table 1: Overview of four LOD datasets

              # triples  # days  total # hits  # plain hits  # RDF hits  # HTML hits      SPARQL
Dog Food         79,175     597     8,427,967     1,923,945     259,031    1,647,205     879,932
                                     (14,117)       (3,223)       (434)      (2,759)     (1,471)
DBpedia     109,750,000     118    87,203,310    22,821,475   7,008,310   22,999,237  20,972,630
                                    (739,011)     (193,402)    (59,392)    (194,909)   (177,734)
DBTune       74,209,000      61     7,467,125     1,952,185   1,135,509      677,904   3,055,493
                                    (122,412)      (32,003)    (18,615)     (11,113)    (50,090)
RKBExplorer  91,501,684      29       529,938             -           -            -       9,327
                                     (18,274)           (-)         (-)          (-)       (322)

[Pie charts: share of request types per dataset]

             Plain    HTML     RDF  Semantic
DBpedia      47.7%   46.5%    5.8%      2.8%
DBTune       45%     39.9%   14.9%      4.2%
Dog Food     41.0%   51.1%    7.8%      2.5%
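
The figures in parentheses in Table 1 appear to be per-day averages, i.e. the hit count divided by the number of days. That reading is an assumption based on the arithmetic, not something the slide states. A quick check for two rows, with the helper name `hits_per_day` made up for the example:

```python
def hits_per_day(total_hits: int, days: int) -> int:
    """Average number of hits per day, rounded to the nearest integer."""
    return round(total_hits / days)


# Values copied from Table 1.
dbpedia_daily = hits_per_day(87_203_310, 118)  # table shows (739,011)
rkb_daily = hits_per_day(529_938, 29)          # table shows (18,274)
print(dbpedia_daily, rkb_daily)
```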
Agents

[Figure: hits per agent, http://data.semanticweb.org, 21/07/2008 - 20/06/2009; x-axis: agents (0 to 30), y-axis: hits (0 to 500000). Labelled bars include Googlebot, Yahoo! Slurp, msnbot, SindiceFetcher, multicrawler, rdfbot/1.0 and ARC Reader.]

ordinary traffic: the usual suspects
Agents: How “semantic” are they?

[Figure: semantic hits / total hits (scale 0 to 1) per agent, for agents with >100 semantic hits. Agents shown: attributor/1.13.2, triplr, sindicebot, rdflib-2.4.2, Ripple, OL_Virtuoso_RDF_crawler, Morph_Converter_Service, Falconsbot, Speedy, Slug_SW_Crawler, yacybot, hclsreport-crawler, MJ12bot, PycURL, heritrix/1.14.3, SindiceFetcher, heritrix/pom.version, heritrix/2.0.2, multicrawler, SindiceBot, ia_archiver, Zitgist-APlusPlus-Agent, rdflib-2.4.1, Mp3Bot, curl, Zend_Http_Client, Speedy_Spider, nxcrawler, marbles, -, Java, rdflib-2.4.0, (unknown), ARC_Reader, MLBot, Mozilla, Jakarta_HttpClient, Wget, libwww-perl, MSIE, Firefox, Python-urllib, sindice_ontology_fetcher, Googlebot.]
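
The ratio plotted for each agent (semantic hits divided by total hits, restricted to agents with more than 100 semantic hits) could be computed along these lines. This is a sketch only: the input shape, a list of (agent, is_semantic) pairs, and the sample counts are invented for illustration and are not taken from the logs.

```python
from collections import Counter


def semantic_ratios(requests, min_semantic_hits=100):
    """Map each agent with more than min_semantic_hits semantic hits
    to its share of semantic hits among all of its hits."""
    total = Counter()
    semantic = Counter()
    for agent, is_semantic in requests:
        total[agent] += 1
        if is_semantic:
            semantic[agent] += 1
    return {
        agent: semantic[agent] / total[agent]
        for agent in total
        if semantic[agent] > min_semantic_hits
    }


# Invented sample data, for illustration only.
sample = (
    [("rdflib-2.4.0", True)] * 150
    + [("Googlebot", True)] * 120
    + [("Googlebot", False)] * 880
    + [("curl", True)] * 5
)
ratios = semantic_ratios(sample)
print(ratios)
```

With the invented sample, the agent with only 5 semantic hits is dropped by the threshold, which mirrors the ">100 semantic hits" filter in the chart.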
Demand for LOD?

[Figure: DBpedia Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 300000]

no increase for semantic requests
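
The plot names a smoothing factor of 0.05 but not the smoothing method. One common reading is simple exponential smoothing, which is assumed in the sketch below; the function name and the sample values are made up for illustration.

```python
def exponential_smoothing(values, alpha=0.05):
    """Simple exponential smoothing: s[0] = x[0],
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Invented daily hit counts with a one-day spike.
daily_hits = [100, 100, 500, 100]
print(exponential_smoothing(daily_hits))
```

With a factor this small, a single-day spike barely moves the curve, so only sustained changes in demand become visible.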
Impact of Real-world Events

[Figure: Irish Lisbon Treaty Referendum (smoothing factor 0.05); x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 9; series:
 http://dbpedia.org/resource/Republic_of_Ireland
 http://dbpedia.org/resource/European_Union
 http://dbpedia.org/resource/Treaty_of_Lisbon]

possible impact
Impact of Real-world Events

[Figure: Michael Jackson Memorial Service (smoothing factor 0.05); x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 4.5; series:
 http://dbpedia.org/resource/Staples_Center
 http://dbpedia.org/resource/Michael_Jackson_memorial_service
 http://dbpedia.org/resource/Michael_Jackson]

possible impact
• this research: one motivation for
  USEWOD
• expand the dataset, encourage more
  and different analyses



USEWOD Data Challenge 2012

2nd International Workshop on Usage Analysis
             and the Web of Data

       Sponsored by the LATC project
USEWOD Data Challenge


Moving forward by releasing a dataset:
 • to ease the difficulty of obtaining real-life usage
   data
 • to allow comparison and combination of
   approaches applied to the same dataset
 • to make available a relatively new type of log: usage
   on the Web of Data.
Also for longer-term use.
The USEWOD Dataset 2011


Server logs from two major Web of Data
  servers:
• DBpedia
 • several weeks of requests within a 2-month period
• Semantic Web Dog Food
 • 2 years of requests, Dec 2008 – Dec 2010
USEWOD 2011 Challenge Participants

• At the time of the workshop, 11 groups had
  requested the 2011 data
• By now: 28
• 7 data challenge paper submissions
• Winner of the 2011 USEWOD data challenge prize:
  • Mario Arias Gallego, Javier D. Fernández, Miguel A.
    Martínez-Prieto and Pablo De La Fuente. An Empirical
    Study of Real-World SPARQL Queries.
The USEWOD Dataset 2012


Server logs from two major Web of Data
  servers:
• DBpedia
 • several weeks of requests within a 2-month period
• Semantic Web Dog Food
 • 2 years of requests, Dec 2008 – Dec 2010
• Linked Open Geo Data
• Bio2RDF
USEWOD 2012 Challenge Participants


• 20 groups requested the data, so far.
• 2 data challenge paper submissions…
• 1 winner of the USEWOD data
  challenge prize.
 • kindly brought to you by LATC
DBpedia
Semantic Web Dog Food

[Screenshots and image taken from http://data.semanticweb.org/]
Requests for Human / Machine readable Web data


Both servers get requests for RDF
 • http://dbpedia.org/data/Berlin.rdf
as well as HTML
 • http://dbpedia.org/page/Berlin
And requests for the URI itself:
 • http://dbpedia.org/resource/Berlin (will be
   served HTML or RDF)
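
For log analysis, the three DBpedia URI forms above can be told apart by their path prefix. The sketch below assumes exactly the prefixes shown on the slide (/data/, /page/, /resource/) plus /sparql for the endpoint; the function name and the category labels are chosen for illustration.

```python
from urllib.parse import urlparse


def classify_dbpedia_request(url: str) -> str:
    """Classify a DBpedia request URL by its path prefix."""
    path = urlparse(url).path
    if path.startswith("/sparql"):
        return "sparql"
    if path.startswith("/data/"):
        return "rdf"    # RDF document
    if path.startswith("/page/"):
        return "html"   # HTML document
    if path.startswith("/resource/"):
        return "plain"  # resource URI, answered via content negotiation
    return "other"


for url in ("http://dbpedia.org/data/Berlin.rdf",
            "http://dbpedia.org/page/Berlin",
            "http://dbpedia.org/resource/Berlin"):
    print(url, "->", classify_dbpedia_request(url))
```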
Requests to SPARQL endpoints

• Both servers have a SPARQL endpoint
  to request RDF data:
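  # prefix IRIs are the conventional namespaces, assumed here
  PREFIX dc:   <http://purl.org/dc/elements/1.1/>
  PREFIX swrc: <http://swrc.ontoware.org/ontology#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT DISTINCT ?s ?t ?y ?to ?h
  WHERE {
    ?s dc:title ?t .
    ?s swrc:year ?y .
    OPTIONAL { ?s foaf:homepage ?h } .
    OPTIONAL { ?s foaf:topic ?t }
  }
  ORDER BY DESC(?y)
  LIMIT 200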
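
In the server logs, such a query appears URL-encoded inside the request string (note the "LIMIT+200" in the log example earlier). A minimal sketch of recovering the query text from a logged request line, using only the Python standard library; the helper name `extract_sparql` and the short sample query are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def extract_sparql(request_line: str):
    """Return the decoded SPARQL query from a request line such as
    'GET /sparql?query=... HTTP/1.0', or None if there is none."""
    parts = request_line.split(" ")
    if len(parts) < 2:
        return None
    params = parse_qs(urlsplit(parts[1]).query)
    queries = params.get("query")
    return queries[0] if queries else None


# Build an encoded request the way an HTTP client would, then decode it again.
original = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 200"
encoded = "/sparql?" + urlencode({"query": original})
query = extract_sparql(f"GET {encoded} HTTP/1.0")
print(encoded)
print(query)
```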
Anonymizing the USEWOD Dataset


• IP addresses:
 • replace all IPs with 0.0.0.0
 • add the country code for the original IP
   address -> track location of requests
 • add an identifier of the original IP -> track
   individual requestors
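
The three steps above could look roughly like the sketch below. It is not the procedure actually used for the USEWOD dataset: the output layout (country code and identifier appended at the end of the line), the salted SHA-256 identifier and the `lookup_country` callback (a stand-in for a GeoIP lookup) are all assumptions made for illustration.

```python
import hashlib
import re

# IPv4 address at the start of a log line.
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})")


def anonymize(line: str, salt: str, lookup_country) -> str:
    """Replace the leading IP with 0.0.0.0 and append a country code
    and a stable pseudonymous identifier for the original IP."""
    match = IP_AT_START.match(line)
    if match is None:
        return line
    ip = match.group(1)
    # Same IP and salt always give the same identifier, so individual
    # requesters can still be tracked without exposing the address.
    requester_id = hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()[:12]
    country = lookup_country(ip)
    return f"0.0.0.0{line[match.end():]} {country} {requester_id}"


# 192.0.2.1 is a documentation address; "ZZ" is a placeholder country code.
line = '192.0.2.1 - - [23/May/2009:09:52:03 +0100] "GET /sparql HTTP/1.0" 200 64674 "-" "-"'
anonymized = anonymize(line, salt="example-salt", lookup_country=lambda ip: "ZZ")
print(anonymized)
```

Keeping the salt secret matters in a scheme like this one, since the IPv4 address space is small enough to hash exhaustively.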
USEWOD2011, Hyderabad, India

• M. Kirchberg, R. K. L. Ko, and B. S.
  Lee. From linked data to relevant data
  - time is the essence.
  http://arxiv.org/abs/1103.5046
• M. A. Gallego, J. D. Fernández, M. A.
  Martínez-Prieto, and P. D. L. Fuente.
  An empirical study of real-world
  SPARQL queries.
  http://arxiv.org/abs/1103.5043
USEWOD2012, Lyon, France


• A. Raghuveer. Characterizing Machine
  Agent Behavior through SPARQL Query
  Mining.
  http://ir.ii.uam.es/usewod2012/usewod2012_raghuveer.pdf
• J. Hoxha, M. Junghans, S. Agarwal.
  Enabling Semantic Analysis of User
  Browsing Patterns in the Web of Data.
  http://arxiv.org/abs/1204.2713
What could be improved?
• does not work well with embedded metadata (e.g.,
  RDFa-based sites)

• does not take into account usage through meta sites
  (indexes, search engines, mirrors, ...)

• does (probably) not take into account usage through
  apps

• what about caches?

• what about bulk/dump downloads of data?

• not enough usage to be interesting yet?

Weitere ähnliche Inhalte

Was ist angesagt?

RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031kwangsub kim
 
Contributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataMarcia Zeng
 
Linked Data Modeling for Beginner
Linked Data Modeling for BeginnerLinked Data Modeling for Beginner
Linked Data Modeling for BeginnerMyungjin Lee
 
The Semantic Web #5 - RDF (2)
The Semantic Web #5 - RDF (2)The Semantic Web #5 - RDF (2)
The Semantic Web #5 - RDF (2)Myungjin Lee
 
Publishing Linked Open Data in 15 minutes
Publishing Linked Open Data in 15 minutesPublishing Linked Open Data in 15 minutes
Publishing Linked Open Data in 15 minutesAlvaro Graves
 
Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...
Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...
Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...Victor de Boer
 
Querying Linked Data on Android
Querying Linked Data on AndroidQuerying Linked Data on Android
Querying Linked Data on AndroidEUCLID project
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Rothamsted Research, UK
 
Querying Trust in RDF Data with tSPARQL
Querying Trust in RDF Data with tSPARQLQuerying Trust in RDF Data with tSPARQL
Querying Trust in RDF Data with tSPARQLOlaf Hartig
 
Linking UK Government Data, John Sheridan
Linking UK Government Data, John SheridanLinking UK Government Data, John Sheridan
Linking UK Government Data, John SheridanSemantic Web Company
 

Was ist angesagt? (11)

RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031RDF Tutorial - SPARQL 20091031
RDF Tutorial - SPARQL 20091031
 
Contributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library DataContributing to the Smart City Through Linked Library Data
Contributing to the Smart City Through Linked Library Data
 
Linked Data Modeling for Beginner
Linked Data Modeling for BeginnerLinked Data Modeling for Beginner
Linked Data Modeling for Beginner
 
The Semantic Web #5 - RDF (2)
The Semantic Web #5 - RDF (2)The Semantic Web #5 - RDF (2)
The Semantic Web #5 - RDF (2)
 
Publishing Linked Open Data in 15 minutes
Publishing Linked Open Data in 15 minutesPublishing Linked Open Data in 15 minutes
Publishing Linked Open Data in 15 minutes
 
Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...
Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...
Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag...
 
Querying Linked Data on Android
Querying Linked Data on AndroidQuerying Linked Data on Android
Querying Linked Data on Android
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
 
Querying Trust in RDF Data with tSPARQL
Querying Trust in RDF Data with tSPARQLQuerying Trust in RDF Data with tSPARQL
Querying Trust in RDF Data with tSPARQL
 
Linking UK Government Data, John Sheridan
Linking UK Government Data, John SheridanLinking UK Government Data, John Sheridan
Linking UK Government Data, John Sheridan
 
xcap
xcapxcap
xcap
 

Andere mochten auch

Building a Distributed Secure System on Multi-Agent Platform Depending on the...
Building a Distributed Secure System on Multi-Agent Platform Depending on the...Building a Distributed Secure System on Multi-Agent Platform Depending on the...
Building a Distributed Secure System on Multi-Agent Platform Depending on the...CSCJournals
 
The EU Data Cloud - Introduction
The EU Data Cloud - IntroductionThe EU Data Cloud - Introduction
The EU Data Cloud - IntroductionKnud Möller
 
Practical Applications of Semantic Web in Retail -- Semtech 2014
Practical Applications of Semantic Web in Retail -- Semtech 2014 Practical Applications of Semantic Web in Retail -- Semtech 2014
Practical Applications of Semantic Web in Retail -- Semtech 2014 Jay Myers
 
EU Data Cloud - On to the Cloud
EU Data Cloud - On to the CloudEU Data Cloud - On to the Cloud
EU Data Cloud - On to the CloudKnud Möller
 
Digitales Graffiti und vernetzte Daten für digitale Städte
Digitales Graffiti und vernetzte Daten für digitale StädteDigitales Graffiti und vernetzte Daten für digitale Städte
Digitales Graffiti und vernetzte Daten für digitale StädteKnud Möller
 
The Semantic Web (and what it can deliver for your business)
The Semantic Web (and what it can deliver for your business)The Semantic Web (and what it can deliver for your business)
The Semantic Web (and what it can deliver for your business)Knud Möller
 

Andere mochten auch (7)

Building a Distributed Secure System on Multi-Agent Platform Depending on the...
Building a Distributed Secure System on Multi-Agent Platform Depending on the...Building a Distributed Secure System on Multi-Agent Platform Depending on the...
Building a Distributed Secure System on Multi-Agent Platform Depending on the...
 
The EU Data Cloud - Introduction
The EU Data Cloud - IntroductionThe EU Data Cloud - Introduction
The EU Data Cloud - Introduction
 
Practical Applications of Semantic Web in Retail -- Semtech 2014
Practical Applications of Semantic Web in Retail -- Semtech 2014 Practical Applications of Semantic Web in Retail -- Semtech 2014
Practical Applications of Semantic Web in Retail -- Semtech 2014
 
Linked GeoRef
Linked GeoRefLinked GeoRef
Linked GeoRef
 
EU Data Cloud - On to the Cloud
EU Data Cloud - On to the CloudEU Data Cloud - On to the Cloud
EU Data Cloud - On to the Cloud
 
Digitales Graffiti und vernetzte Daten für digitale Städte
Digitales Graffiti und vernetzte Daten für digitale StädteDigitales Graffiti und vernetzte Daten für digitale Städte
Digitales Graffiti und vernetzte Daten für digitale Städte
 
The Semantic Web (and what it can deliver for your business)
The Semantic Web (and what it can deliver for your business)The Semantic Web (and what it can deliver for your business)
The Semantic Web (and what it can deliver for your business)
 





History and Background of the USEWOD Data Challenge

  • 1. USEWOD2012: History and Background of the USEWOD Data Challenge. Knud Möller, Talis Systems Ltd. (@knudmoeller)
  • 2. Möller, K., Hausenblas, M., Cyganiak, R., Grimnes, G., and Handschuh, S. (2010). Learning from linked open data usage: Patterns & metrics. In WebScience 2010, Raleigh, NC, USA. http://journal.webscience.org/302/
  • 3. Linked Data
       – Conventional “eye-ball” Web: interlinked documents; used mainly by people / Web browsers
       – Web of Linked Data: interlinked items of data (URIs, RDF); used mainly by machine agents (but also people?)
  • 5. How is Linked Data being used?
       – plenty of research on conventional Web usage
       – what about usage of linked data?
       Why?
       – how healthy is the Web of linked data?
       – who is using the data and how? Is it useful? Are there trends?
       – providers: improve hosting
       – ... just curiosity!
  • 7. Approach
       particular sites:
       – a URI for each data item ➙ a request for each data item (resource)
       – content negotiation best practices
       – redirection (HTTP 303)
       Example:
       – plain resource URI: http://data.semanticweb.org/conference/www/2009
       – RDF document URI: http://data.semanticweb.org/conference/www/2009/rdf
       – HTML document URI: http://data.semanticweb.org/conference/www/2009/html
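The URI pattern on the Approach slide can be turned into a small classifier for logged request paths. This is a minimal sketch that assumes the `/rdf` and `/html` suffix convention shown for data.semanticweb.org; the function name `classify_dogfood_uri` is made up for illustration and is not part of any USEWOD tooling.

```python
from urllib.parse import urlparse


def classify_dogfood_uri(uri: str) -> str:
    """Classify a URI or request path as 'rdf', 'html', or 'plain'.

    Assumption: document URIs end in a final '/rdf' or '/html' path
    segment, as in the example on the slide. Anything else is treated
    as a plain resource URI.
    """
    path = urlparse(uri).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment == "rdf":
        return "rdf"
    if last_segment == "html":
        return "html"
    return "plain"
```

Applied to every request path in a log, a function like this yields the plain / RDF / HTML hit counts that the later slides report.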
  • 8. Approach (ctd.)
       – server log files: common log format (CLF), combined log format
         80.219.211.147 - - [23/May/2009:09:52:03 +0100] "GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0" 200 64674 "-" "ARC Reader (http://arc.semsol.org/)"
         Fields, in order: request IP, request date, request string, response code, response size, referrer, user agent
       – RDF requests vs. “semantic” requests
         90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" 303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
         90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" 200 453586 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
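A log line in the combined log format can be split into the fields named on the slide with one regular expression. The sketch below is an illustration only: the pattern, the `parse_line` helper and the `SAMPLE` constant are assumptions made here, not the parser used for the original study.

```python
import re
from datetime import datetime

# One named group per field of the combined log format.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The first rdflib request shown on the slide.
SAMPLE = (
    '90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] '
    '"GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" '
    '303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = None if rec["size"] == "-" else int(rec["size"])
    # The request string is "<method> <path> <protocol>".
    method, _, rest = rec["request"].partition(" ")
    rec["method"] = method
    rec["path"] = rest.rsplit(" ", 1)[0] if " " in rest else rest
    return rec
```

For `SAMPLE`, the parsed record has status 303, which is the redirect from the plain resource URI; the second line on the slide is the follow-up request for the `/rdf` document.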
  • 9. Source Data
       Figure 1: The combined log format (log line and field labels as on slide 8)

       Table 1: Overview of four LOD datasets (parenthesized values as given on the slide)

       Dataset       # triples     # days   total # hits   # plain hits   # RDF hits   # HTML hits   SPARQL
       Dog Food      79,175        597      8,427,967      1,923,945      259,031      1,647,205     879,932
                                            (14,117)       (3,223)        (434)        (2,759)       (1,471)
       DBpedia       109,750,000   118      87,203,310     22,821,475     7,008,310    22,999,237    20,972,630
                                            (739,011)      (193,402)      (59,392)     (194,909)     (177,734)
       DBTune        74,209,000    61       7,467,125      1,952,185      1,135,509    677,904       3,055,493
                                            (122,412)      (32,003)       (18,615)     (11,113)      (50,090)
       RKBExplorer   91,501,684    29       529,938        -              -            -             9,327
                                            (18,274)       (-)            (-)          (-)           (322)

       [Pie charts for Dog Food, DBpedia and DBTune: shares of plain, HTML, RDF and semantic requests. Slice labels as extracted: RDF 5.8%, Semantic 2.8%; RDF 14.9%, Semantic 4.2%; RDF 7.8%, Semantic 2.5%; Plain 47.7%, 45%, 41.0%; HTML 46.5%, 39.9%, 51.1%]

       Cropped text excerpts from the paper (truncated at the crop edges):
       – "[...] we had access to log [...] two periods: from 24/05/2009–21/06/2009 and from [...]/2009–29/10/2009, i.e., roughly two months."
       – "RKBExplorer [11] is another meta-dataset currently com[...] 44 sub-datasets covering various topics and sources [...] the domain of academic research, as well as a Web [...] that allows users to access and browse its content [...]. Both RDF and HTML documents [...] the resources in all datasets are available."
       – "[...] a SPARQL query, we assume that it is [...] handling the query result, i.e., either a [...] bindings (in the case of a SELECT query), [...] containing URIs of RDF resources, or an RDF [...] (in the case of a CONSTRUCT or DESCRIBE q[...])"
       – "RDF requests: if an agent directly requests [...] from a server, we assume that it knows how t[...] data in this format. Directly here mean[...] the agent specified an RDF syntax such as [...] as an acceptable response in the header of its re[...]"
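The parenthesized values in Table 1 are not labelled on the slide. Dividing each dataset's total hits by its number of days reproduces the parenthesized figure in the total-hits column, which suggests they are daily averages. That reading is an assumption, checked below only for the total-hits column; the names `ROWS`, `hits_per_day` and `DAILY` are illustrative.

```python
# (dataset, number of days, total hits), copied from Table 1 on the slide.
ROWS = [
    ("Dog Food", 597, 8_427_967),
    ("DBpedia", 118, 87_203_310),
    ("DBTune", 61, 7_467_125),
    ("RKBExplorer", 29, 529_938),
]


def hits_per_day(total_hits: int, days: int) -> int:
    """Average hits per day, rounded to the nearest integer."""
    return round(total_hits / days)


# Assumed interpretation: the parenthesized table values are daily averages.
DAILY = {name: hits_per_day(total, days) for name, days, total in ROWS}
```

The four computed values can be compared directly with the parenthesized entries under "total # hits".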
  • 10. Agents
       [Bar chart: hits per agent on http://data.semanticweb.org, 21/07/2008 - 20/06/2009; y-axis: hits (0 to 500000); x-axis: agents (0 to 30); annotation: "ordinary traffic: the usual suspects"]
  • 11. Agents: How “semantic” are they?
       [Bar chart: semantic hits/total hits (>100 semantic hits), scale 0 to 1, one bar per agent]
       Agents shown: attributor/1.13.2, triplr, sindicebot, rdflib-2.4.2, Ripple, OL_Virtuoso_RDF_crawler, Morph_Converter_Service, Falconsbot, Speedy, Slug_SW_Crawler, yacybot, hclsreport-crawler, MJ12bot, PycURL, heritrix/1.14.3, SindiceFetcher, heritrix/pom.version, heritrix/2.0.2, multicrawler, SindiceBot, ia_archiver, Zitgist-APlusPlus-Agent, rdflib-2.4.1, Mp3Bot, curl, Zend_Http_Client, Speedy_Spider, nxcrawler, marbles - Java, rdflib-2.4.0, (unknown), ARC_Reader, MLBot, Mozilla, Jakarta_HttpClient, Wget, libwww-perl, MSIE, Firefox, Python-urllib, sindice_ontology_fetcher, Googlebot
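The metric on this slide is the share of an agent's hits that are "semantic", restricted to agents with more than 100 semantic hits. A minimal sketch of that ratio follows; the function name and the input layout (agent name mapped to a pair of counts) are assumptions, and deciding which hits count as semantic is left to an earlier step.

```python
def semantic_ratios(counts, min_semantic=100):
    """Return {agent: semantic_hits / total_hits} for sufficiently active agents.

    `counts` maps an agent name to a (semantic_hits, total_hits) pair.
    Only agents with more than `min_semantic` semantic hits are kept,
    mirroring the ">100 semantic hits" filter in the chart title.
    """
    return {
        agent: semantic / total
        for agent, (semantic, total) in counts.items()
        if semantic > min_semantic and total > 0
    }
```

With made-up counts such as `{"agent-a": (150, 200), "agent-b": (50, 5000)}`, only `agent-a` passes the filter and gets a ratio of 0.75.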
  • 12. Demand for LOD?
       [Line chart: DBpedia Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; y-axis: 0 to 300000; x-axis: 2009-06-20 to 2009-11-07]
       no increase for semantic requests
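The charts mention a "smoothing factor 0.05" without naming the smoothing method. One common technique with a single factor is simple exponential smoothing, sketched below purely as an assumption about what the factor could mean; it is not confirmed to be the method behind these plots.

```python
def exponential_smoothing(values, alpha=0.05):
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1-alpha)*s[t-1].

    Assumption: `alpha` plays the role of the "smoothing factor" named
    in the chart titles. Smaller values give smoother curves.
    """
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```

With a factor as low as 0.05, a single-day spike moves the curve only slightly, so a visible bump in the smoothed series needs several days of raised traffic.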
  • 13. Impact of Real-world Events
       [Line chart: Irish Lisbon Treaty Referendum (smoothing factor 0.05); y-axis: 0 to 9; x-axis: 2009-06-20 to 2009-11-07; annotation: "possible impact"]
       Series: http://dbpedia.org/resource/Republic_of_Ireland, http://dbpedia.org/resource/European_Union, http://dbpedia.org/resource/Treaty_of_Lisbon
  • 14. Impact of Real-world Events
       [Line chart: Michael Jackson Memorial Service (smoothing factor 0.05); y-axis: 0 to 4.5; x-axis: 2009-06-20 to 2009-11-07; annotation: "possible impact"]
       Series: http://dbpedia.org/resource/Staples_Center, http://dbpedia.org/resource/Michael_Jackson_memorial_service, http://dbpedia.org/resource/Michael_Jackson
  • 15. – this research: one motivation for USEWOD
       – expand the dataset, encourage more and different analyses
  • 16. USEWOD Data Challenge 2012. 2nd International Workshop on Usage Analysis and the Web of Data. Sponsored by the LATC project.
  • 19. USEWOD Data Challenge
       Moving forward by releasing a dataset:
       – to relieve the difficulty of obtaining real-life usage data
       – to allow comparison and combination of approaches applied to the same dataset
       – by releasing a relatively new type of logs: usage on the Web of Data
       Also for longer-term use.
  • 20. The USEWOD Dataset 2011
       Server logs from two major Web of Data servers:
       – DBpedia: several weeks during 2 months of requests
       – Semantic Web Dog Food: 2 years of requests from Dec 2008 – Dec 2010
  • 26. USEWOD 2011 Challenge Participants
       – At the time of the workshop, 11 groups had requested the 2011 data
       – By now: 28
       – 7 data challenge paper submissions
       – Winner of the 2011 USEWOD data challenge prize: Mario Arias Gallego, Javier D. Fernández, Miguel A. Martínez-Prieto and Pablo De La Fuente. An Empirical Study of Real-World SPARQL Queries.
  • 27. The USEWOD Dataset 2012
       Server logs from major Web of Data servers:
       – DBpedia: several weeks during 2 months of requests
       – Semantic Web Dog Food: 2 years of requests from Dec 2008 – Dec 2010
       – Linked Open Geo Data
       – Bio2RDF
  • 28. USEWOD 2012 Challenge Participants
       – 20 groups requested the data, so far
       – 2 data challenge paper submissions
       – 1 winner of the USEWOD data challenge prize, kindly brought to you by LATC
  • 33. Semantic Web Dog Food [Screenshots and images taken from http://data.semanticweb.org/]
  • 36. Requests for Human / Machine readable Web data
       Both servers get requests for RDF:
       – http://dbpedia.org/data/Berlin.rdf
       as well as HTML:
       – http://dbpedia.org/page/Berlin
       and requests for the URI itself:
       – http://dbpedia.org/resource/Berlin (will be served HTML or RDF)
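How a request for the URI itself ends up as HTML or RDF can be illustrated with a toy content-negotiation function. This is a simplified sketch under stated assumptions: it only recognises `application/rdf+xml`, ignores q-values in the `Accept` header, and uses the `/resource/`, `/data/` and `/page/` paths from the slide. It is not DBpedia's actual server logic, and the name `negotiate` is invented here.

```python
def negotiate(path: str, accept: str):
    """Return (status, target path) for a request, mimicking a 303 redirect.

    Requests for /resource/<name> are redirected to the RDF document when
    the Accept header lists application/rdf+xml, otherwise to the HTML page.
    Any other path is answered directly with status 200.
    """
    prefix = "/resource/"
    if not path.startswith(prefix):
        return 200, path
    name = path[len(prefix):]
    # Strip parameters such as ";q=0.9" and surrounding whitespace.
    accepted = [part.split(";")[0].strip() for part in accept.split(",")]
    if "application/rdf+xml" in accepted:
        return 303, "/data/" + name + ".rdf"
    return 303, "/page/" + name
```

In the server log, such an exchange appears as two lines from the same agent: a 303 for the resource URI followed by a 200 for the document URI.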
  • 37. Requests to SPARQL endpoints
       Both servers have a SPARQL endpoint to request RDF data:
         SELECT DISTINCT ?s ?t ?y ?to ?h
         WHERE {
           ?s dc:title ?t .
           ?s swrc:year ?y .
           OPTIONAL { ?s foaf:homepage ?h } .
           OPTIONAL { ?s foaf:topic ?t }
         }
         order by desc(?y)
         LIMIT 200
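In the logs, a SPARQL query of this kind appears URL-encoded in the request string, as in the `GET /sparql?query=PREFIX [..] LIMIT+200` line on the log-format slide. The sketch below shows the encoding and decoding with Python's standard library; the `/sparql` endpoint path is taken from that log line, while the helper names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_request_path(query: str, endpoint: str = "/sparql") -> str:
    """Build the request path a client would send for a GET-style SPARQL query."""
    # urlencode() percent-encodes reserved characters and turns spaces into '+'.
    return endpoint + "?" + urlencode({"query": query})


def extract_query(path: str):
    """Recover the SPARQL query text from a logged request path, or None."""
    params = parse_qs(urlsplit(path).query)
    values = params.get("query")
    return values[0] if values else None
```

Decoding the `query` parameter this way is the first step for any analysis of the SPARQL queries contained in the logs.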
  • 39. Anonymizing the USEWOD Dataset
       IP addresses:
       – replace all IPs with 0.0.0.0
       – add the country code for the original IP address -> track location of requests
       – add an identifier of the original IP -> track individual requestors
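The three anonymization steps can be sketched as a single pass over the log lines. Everything specific here is an assumption for illustration: the output layout (`0.0.0.0 <country> <id> <rest of line>`), the sequential numeric identifiers, and the `country_of` callable, which stands in for a real IP-to-country lookup. The actual USEWOD file layout may differ.

```python
def anonymize(lines, country_of):
    """Replace the leading IP of each log line and append derived fields.

    `country_of` is a hypothetical callable mapping an IP string to a
    country code. Each distinct IP receives a stable numeric identifier
    so that requests from the same origin can still be grouped.
    """
    ids = {}
    result = []
    for line in lines:
        ip, _, rest = line.partition(" ")
        # First occurrence of an IP gets the next free identifier.
        ident = ids.setdefault(ip, len(ids) + 1)
        result.append(f"0.0.0.0 {country_of(ip)} {ident} {rest}")
    return result
```

Because the identifier is assigned per distinct IP, individual requestors remain distinguishable while the original address no longer appears in the output.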
  • 40. USEWOD2011, Hyderabad, India
       – M. Kirchberg, R. K. L. Ko, and B. S. Lee. From linked data to relevant data - time is the essence. http://arxiv.org/abs/1103.5046
       – M. A. Gallego, J. D. Fernández, M. A. Martínez-Prieto, and P. D. L. Fuente. An empirical study of real-world SPARQL queries. http://arxiv.org/abs/1103.5043
  • 41. USEWOD2012, Lyon, France
       – A. Raghuveer. Characterizing Machine Agent Behavior through SPARQL Query Mining. http://ir.ii.uam.es/usewod2012/usewod2012_raghuveer.pdf
       – J. Hoxha, M. Junghans, S. Agarwal. Enabling Semantic Analysis of User Browsing Patterns in the Web of Data. http://arxiv.org/abs/1204.2713
  • 42. What could be improved?
       – does not work well with embedded metadata (e.g., RDFa-based sites)
       – does not take into account usage through meta sites (indexes, search engines, mirrors, ...)
       – does (probably) not take into account usage through apps
       – what about caches?
       – what about bulk/dump downloads of data?
       – not enough usage to be interesting yet?

Editor's Notes

  1. Not so much about USEWOD in general, but more about the challenge data in particular.
  5. – You have a URI for the thing itself, the subject of a document.
     – You have different URIs for documents about that thing.
     – Servers are set up so that they return a document based on the kind of data that the requesting agent wants.
     – That shows up in the server logs.
  9. Ratio of semantic/total hits.
  13. The challenge dataset grew from the dataset used in this paper and has been constantly growing since then.
  17. Close to (at the same university as) some of the people behind the DBpedia project. One of the main drivers of this project.
  18-22. Observations about the logs: Which are the most used language elements? What are the characteristics of the triple patterns used? Very insightful for designing query evaluation engines or fine-tuning RDF stores.
  23. TODO: ask Knud: the same DBpedia data? TODO: ask Knud: what time span is the data from? Open Geo Data: OpenStreetMap as RDF. Bio2RDF: Linked Data for life sciences. Credits go to Knud and Markus. Markus is close to (at the same university as) some of the people behind the DBpedia project. Knud was one of the main drivers of this project.
  24. TODO: check numbers.
  25-27. The linked data twin of Wikipedia.
  28-29. Screenshot of DBpedia.
  31. TODO: different example.
  32. Raw data, but anonymization.
  33. Last year, two interesting papers provided analysis of the dataset. Note: not all USEWOD papers are about the challenge dataset, just like this year.