 Copyright 2010 Knud Möller
                                Except where otherwise noted, this work is licensed under
                                http://creativecommons.org/licenses/by-sa/3.0/




                                        Learning from Linked Open Data Usage:
                                                   Patterns & Metrics

                                                 Knud Möller, Michael Hausenblas, Richard Cyganiak,
                                                       Gunnar Grimnes, Siegfried Handschuh




      WebScience 2010, Raleigh, NC, USA
      26/04/2010
 Copyright 2010 Digital Enterprise Research Institute. All rights reserved.



Monday 26 April 2010
What is Linked (Open) Data? (in <1 minute)




        •Conventional “Eye-ball” Web
              – interlinked documents
              – mainly people / Web browsers
        •Web of Linked Data
              – interlinked items of data (URIs, RDF)
              – mainly machine agents






            Linked Open Data cloud (the set of interlinked, Semantic
            Web datasets)




            [Figure: Linked Open Data cloud diagrams, February 2008 and July 2009]
 http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData


Question: How is Linked Data being Used?


        •plenty of research on conventional Web usage
        •what about usage of linked data?


        Why?
        •how healthy is the Web of linked data?
        •who is using the data and how? Is it useful? Are there
         trends?
        •providers: improve hosting
        •... just curiosity!


        Webometrics?

Approach


        •particular sites:
              – a URI for each data item ➙ a request for each data item
                (resource)
              – content negotiation best practices
              – redirection (HTTP 303)




        •example: the plain resource URI redirects to one of two document URIs
              – plain resource URI: http://data.semanticweb.org/conference/www/2009
              – RDF document URI: http://data.semanticweb.org/conference/www/2009/rdf
              – HTML document URI: http://data.semanticweb.org/conference/www/2009/html
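
The redirection pattern above can be sketched as a small function. This is only an illustration, not a description of the actual server: the function name `negotiate`, the set of RDF media types, and the decision to ignore q-values are all assumptions.

```python
# Hypothetical sketch of 303-based content negotiation for a linked data site.
# The media types below are an assumed selection, not taken from the slides.
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}


def negotiate(resource_uri, accept_header):
    """Return (status_code, location) for a request to a plain resource URI."""
    # Split "type;q=0.9, type2" into bare media types, ignoring q-values.
    requested = [part.split(";")[0].strip().lower()
                 for part in accept_header.split(",") if part.strip()]
    wants_rdf = any(media_type in RDF_TYPES for media_type in requested)
    suffix = "/rdf" if wants_rdf else "/html"
    # 303 See Other: the resource itself is not a document, so the client
    # is pointed at a document describing it.
    return 303, resource_uri.rstrip("/") + suffix


uri = "http://data.semanticweb.org/conference/www/2009"
print(negotiate(uri, "application/rdf+xml"))
print(negotiate(uri, "text/html,application/xhtml+xml;q=0.9"))
```

An RDF-aware agent is sent to the `/rdf` document, while a browser falls through to the `/html` document.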


Approach (ctd.)


        •server log files
              – common log format (CLF), combined log format
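     80.219.211.147 - - [23/May/2009:09:52:03 +0100] "GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0"
          200 64674 "-" "ARC Reader (http://arc.semsol.org/)"

              – Request IP: 80.219.211.147
              – Request Date: [23/May/2009:09:52:03 +0100]
              – Request String: "GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0"
              – Response Code: 200
              – Response Size: 64674
              – Referrer: "-"
              – User Agent: "ARC Reader (http://arc.semsol.org/)"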



        •RDF requests vs. “semantic” requests
 •90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" 303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
 •90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" 200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
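
Log lines like the ones above can be split into their fields with a regular expression. The following is a minimal sketch, not the authors' tooling: the names `LOG_PATTERN` and `parse_line` are made up for illustration, and the pattern assumes well-formed combined log format lines without escaped quotes.

```python
import re

# Assumed pattern for the combined log format; escaped quotes are not handled.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # The request string is assumed to be "METHOD path PROTOCOL";
    # the path itself may contain spaces, so split from both ends.
    method, _, rest = fields["request"].partition(" ")
    path, _, protocol = rest.rpartition(" ")
    fields.update(method=method, path=path, protocol=protocol)
    return fields


line = ('90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] '
        '"GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" '
        '200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"')
entry = parse_line(line)
print(entry["ip"], entry["status"], entry["path"])
```

Lines that do not follow the format yield `None`, so malformed entries can be skipped rather than crashing an analysis run.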


       Source Data
                                            Figure 1: The combined log format


                     # triples   # days   total # hits   # plain hits   # RDF hits   # HTML hits       SPARQL
      Dog Food          79,175      597      8,427,967      1,923,945      259,031     1,647,205      879,932
                                              (14,117)        (3,223)        (434)       (2,759)      (1,471)
      DBpedia      109,750,000      118     87,203,310     22,821,475    7,008,310    22,999,237   20,972,630
                                             (739,011)      (193,402)     (59,392)     (194,909)    (177,734)
      DBTune        74,209,000       61      7,467,125      1,952,185    1,135,509       677,904    3,055,493
                                             (122,412)       (32,003)     (18,615)      (11,113)     (50,090)
      RKBExplorer   91,501,684       29        529,938              —            —             —        9,327
                                              (18,274)            (—)          (—)           (—)        (322)


      Table 1: Overview of four LOD datasets

      Share of request types per dataset (pie charts):

                      Plain    HTML     RDF      Semantic
      DBpedia         47.7%    46.5%    5.8%     2.8%
      DBTune          45%      39.9%    14.9%    4.2%
      Dog Food        41.0%    51.1%    7.8%     2.5%

      Paper text partly visible behind the charts ([...] marks cropped text):

      [...] are served. For our evaluation, we had access to log [...] two periods: from 24/05/2009–21/06/2009 and from [...]2009–29/10/2009, i.e., roughly two months.

      RKBExplorer
      RKBExplorer [11] is another meta-dataset currently com[...] 44 sub-datasets covering various topics and sources [...] the domain of academic research, as well as a Web application that allows users to access and browse its content [...] integrated fashion. Both RDF and HTML documents [...] the resources in all datasets are available. Apart from [...] linked data, the site also features a module that [...] co-reference resolution functionality [10]. For our evaluation, we had access to log files in the period from [...]2009–21/06/2009, i.e., roughly one month. However, [...]

      [...] containing a SPARQL query, we assume that it is capable of handling the query result, i.e., either a [...] bindings (in the case of a SELECT query), potentially containing URIs of RDF resources, or an RDF [...] (in the case of a CONSTRUCT or DESCRIBE query [...]

      • RDF requests: if an agent directly requests [...] from a server, we assume that it knows how to process data in this format. Directly here means [...] the agent specified an RDF syntax such as rd[...] as an acceptable response in the header of its request. Merely requesting the URI of an RDF representation does not suffice to indicate semanticity, as this [...] simply mean that the agent followed a link to that representation.
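
The request types counted above (plain, RDF, HTML, SPARQL) and the notion of a "semantic" request can be approximated from parsed log entries. The sketch below is a guess at one workable heuristic, not the authors' method. The combined log format does not record the Accept header, so it assumes that an RDF document request counts as semantic only when the same IP and agent were previously redirected (303) from the plain URI, and that every SPARQL request counts as semantic. The `/rdf`, `/html` and `/sparql` URI conventions are taken from the Dog Food examples; the function names and the third sample entry are hypothetical.

```python
def classify(path):
    """Classify a request path by URI convention (assumed from the examples)."""
    bare = path.split("?", 1)[0].rstrip("/")
    if bare.startswith("/sparql"):
        return "sparql"
    if bare.endswith("/rdf"):
        return "rdf"
    if bare.endswith("/html"):
        return "html"
    return "plain"


def semantic_requests(entries):
    """Yield entries that look semantic under the assumed heuristic.

    entries: dicts with keys ip, agent, path, status, in log order.
    """
    redirected = set()  # (ip, agent, plain path) triples that received a 303
    for entry in entries:
        kind = classify(entry["path"])
        client = (entry["ip"], entry["agent"])
        if kind == "sparql":
            yield entry
        elif kind == "plain" and entry["status"] == 303:
            redirected.add(client + (entry["path"].rstrip("/"),))
        elif kind == "rdf" and entry["status"] == 200:
            plain_path = entry["path"].rstrip("/")[:-len("/rdf")]
            if client + (plain_path,) in redirected:
                yield entry


base = "/organization/vrije-universiteit-amsterdam-the-netherlands"
entries = [
    {"ip": "90.21.243.141", "agent": "rdflib-2.4.0", "path": base, "status": 303},
    {"ip": "90.21.243.141", "agent": "rdflib-2.4.0", "path": base + "/rdf", "status": 200},
    # Hypothetical client that followed a link straight to the RDF document.
    {"ip": "192.0.2.1", "agent": "ExampleBot", "path": base + "/rdf", "status": 200},
]
for entry in semantic_requests(entries):
    print(entry["ip"], entry["path"])
```

Only the rdflib request for the `/rdf` document is reported; the direct request from the hypothetical client is an RDF hit but not a semantic one under this heuristic.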

Agents: Ordinary Traffic

        •ordinary traffic: the usual suspects

        [Figure: hits per agent, SW Dog Food (http://data.semanticweb.org, 21/07/2008 - 20/06/2009); x-axis: agents (0 to 30); y-axis: hits (0 to 500000). Labelled agents: Googlebot, Yahoo! Slurp, msnbot, SindiceFetcher, multicrawler, rdfbot/1.0 (7342), ARC Reader (6808)]

Agents: How “Semantic” are they?

        •semantic traffic: new kinds of agents

        [Figure: semantic hits / total hits (0 to 1) per agent, for agents with >100 semantic hits. Agents shown: attributor/1.13.2, triplr, sindicebot, rdflib-2.4.2, Ripple, OL_Virtuoso_RDF_crawler, Morph_Converter_Service, Falconsbot, Speedy, Slug_SW_Crawler, yacybot, hclsreport-crawler, MJ12bot, PycURL, heritrix/1.14.3, SindiceFetcher, heritrix/pom.version, heritrix/2.0.2, multicrawler, SindiceBot, ia_archiver, Zitgist-APlusPlus-Agent, rdflib-2.4.1, Mp3Bot, curl, Zend_Http_Client, Speedy_Spider, nxcrawler, marbles, -, Java, rdflib-2.4.0, (unknown), ARC_Reader, MLBot, Mozilla, Jakarta_HttpClient, Wget, libwww-perl, MSIE, Firefox, Python-urllib, sindice_ontology_fetcher, Googlebot]
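
The ratio plotted per agent is semantic hits divided by total hits, restricted to agents with more than 100 semantic hits. A minimal sketch of that computation follows, assuming each hit has already been labelled as semantic or not. The function name `semanticity`, the input shape, and the counts in the usage example are illustrative assumptions.

```python
from collections import Counter


def semanticity(hits, min_semantic=100):
    """Map agent -> semantic hits / total hits.

    hits: iterable of (agent, is_semantic) pairs.
    Only agents with more than min_semantic semantic hits are kept.
    """
    total, semantic = Counter(), Counter()
    for agent, is_semantic in hits:
        total[agent] += 1
        if is_semantic:
            semantic[agent] += 1
    return {agent: semantic[agent] / total[agent]
            for agent in total if semantic[agent] > min_semantic}


# Made-up counts for two hypothetical agents.
hits = ([("agent-a", True)] * 150 + [("agent-a", False)] * 50
        + [("agent-b", True)] * 20 + [("agent-b", False)] * 980)
print(semanticity(hits))  # {'agent-a': 0.75}
```

The threshold keeps agents with only a handful of semantic hits from showing up with misleadingly high or low ratios.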

Is Demand for LOD increasing?

        •no increase for semantic requests

        [Figure: Dog Food Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; x-axis: 2008-07-01 to 2010-05-01; y-axis: 0 to 6000]
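
The plot title mentions a smoothing factor of 0.05 but not the smoothing method. One common reading is simple exponential smoothing, sketched below under that assumption; the function name and the daily hit counts are made up for illustration.

```python
def exponential_smoothing(values, alpha=0.05):
    """Simple exponential smoothing (an assumed method, see lead-in).

    s[0] = x[0]; s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    """
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


daily_hits = [100, 100, 300, 100]  # made-up daily hit counts
print(exponential_smoothing(daily_hits))
```

With a small factor such as 0.05, a one-day spike barely moves the curve, which makes longer-term trends easier to see.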

Is Demand for LOD increasing? (ctd.)

        •no increase for semantic requests

        [Figure: DBpedia Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 300000]

Do Real-world Events have an Impact on LOD Usage?

        •possible impact

        [Figure: Demand for Events (smoothing factor 0.05); series: iswc2008, www2009, eswc2009, iswc2009; x-axis: 2008-07-01 to 2010-05-01; y-axis: 0 to 700]

Do Real-world Events have an Impact on LOD Usage?

        •possible impact

        [Figure: Irish Lisbon Treaty Referendum (smoothing factor 0.05); series: http://dbpedia.org/resource/Republic_of_Ireland, http://dbpedia.org/resource/European_Union, http://dbpedia.org/resource/Treaty_of_Lisbon; x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 9]

Do Real-world Events have an Impact on LOD Usage?

        •possible impact

        [Figure: Michael Jackson Memorial Service (smoothing factor 0.05); series: http://dbpedia.org/resource/Staples_Center, http://dbpedia.org/resource/Michael_Jackson_memorial_service, http://dbpedia.org/resource/Michael_Jackson; x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 4.5]
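
Curves like these require counting, per day, the requests for one particular resource. A minimal sketch follows, under these assumptions: entries are (log date, request path) pairs already extracted from the log, and hits on the `/rdf` and `/html` document URIs are attributed to the plain resource, as on the Dog Food site. The function name and the sample entries are made up.

```python
from collections import Counter


def daily_hits(entries, resource_path):
    """Count hits per day for one resource.

    entries: iterable of (log date, request path) pairs, e.g.
             ("20/Apr/2009:10:00:00 +0100", "/conference/www/2009").
    """
    counts = Counter()
    for log_date, path in entries:
        bare = path.split("?", 1)[0].rstrip("/")
        # Fold the /rdf and /html document URIs back onto the resource.
        for suffix in ("/rdf", "/html"):
            if bare.endswith(suffix):
                bare = bare[:-len(suffix)]
        if bare == resource_path:
            day = log_date.split(":", 1)[0]  # keep only "20/Apr/2009"
            counts[day] += 1
    return counts


# Made-up sample entries.
entries = [
    ("20/Apr/2009:10:00:00 +0100", "/conference/www/2009"),
    ("20/Apr/2009:10:00:01 +0100", "/conference/www/2009/html"),
    ("21/Apr/2009:09:00:00 +0100", "/conference/www/2009/rdf"),
    ("21/Apr/2009:09:30:00 +0100", "/conference/iswc/2008"),
]
print(daily_hits(entries, "/conference/www/2009"))
```

The resulting per-day counts can then be smoothed and plotted against the date of the real-world event.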

Conclusion (of sorts)


        •Generic approach for analysing usage of LOD sites (but
         see below), based on server log files
        •Metric for semanticity of agents
        •Did not notice a rising demand for LOD
        •However: real-world events do seem to have an effect
         on LOD usage
        •Restrictions:
              – does not work well with embedded metadata (e.g., RDFa-based
                sites)
              – does not take into account usage through meta sites (indexes,
                search engines, ...)


 



Learning from Linked Open Data Usage

  • 1. Learning from Linked Open Data Usage: Patterns & Metrics. Knud Möller, Michael Hausenblas, Richard Cyganiak, Gunnar Grimnes, Siegfried Handschuh. WebScience 2010, Raleigh, NC, USA, Monday 26 April 2010. Copyright 2010 Knud Möller. Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-sa/3.0/ Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  • 2. What is Linked (Open) Data? (in <1 minute) Conventional “Eye-ball” Web: interlinked documents; mainly people / Web browsers. Web of Linked Data: interlinked items of data (URIs, RDF); mainly machine agents.
  • 3. What is Linked (Open) Data? (in <1 minute) Linked Open Data cloud (the set of interlinked Semantic Web datasets), shown as of February 2008 and July 2009. Source: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
  • 4. Question: How is Linked Data being Used? •plenty of research on conventional Web usage •what about usage of linked data? Why? •how healthy is the Web of linked data? •who is using the data and how? Is it useful? Are there trends? •providers: improve hosting •... just curiosity!
  • 5. Question: How is Linked Data being Used? (same points as slide 4, with an added annotation: webometrics?)
  • 6. Approach •particular sites: – a URI for each data item ➙ a request for each data item (resource) – content negotiation best practices – redirection (HTTP 303)
  • 7. Approach •particular sites: – a URI for each data item ➙ a request for each data item (resource) – content negotiation best practices – redirection (HTTP 303). Example: the plain resource URI http://data.semanticweb.org/conference/www/2009 redirects to the RDF document URI http://data.semanticweb.org/conference/www/2009/rdf or the HTML document URI http://data.semanticweb.org/conference/www/2009/html
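The redirection pattern on this slide can be sketched in a few lines. This is a minimal illustration, not the server code used by the sites studied: the function name `negotiate`, the set of RDF media types, and the decision to ignore q-value ranking are all assumptions made for the example.

```python
# Hypothetical sketch of 303-based content negotiation for a plain resource URI.
# RDF_TYPES is an assumed, non-exhaustive list of RDF media types.
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}


def negotiate(resource_uri, accept):
    """Return (status, location) for a request to a plain resource URI."""
    # Drop parameters such as ";q=0.9"; q-value ranking is ignored here.
    wanted = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    suffix = "rdf" if any(t in RDF_TYPES for t in wanted) else "html"
    return 303, resource_uri.rstrip("/") + "/" + suffix


status, location = negotiate(
    "http://data.semanticweb.org/conference/www/2009", "application/rdf+xml"
)
print(status, location)
```

A browser sending `text/html` would be redirected to the `/html` document instead, which is why each kind of request leaves a distinct trace in the server log.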
  • 8. Approach (ctd.) •server log files – common log format (CLF), combined log format. Fields, in order: Request IP, Request Date, Request String, Response Code, Response Size, Referrer, User Agent. Example:
      80.219.211.147 - - [23/May/2009:09:52:03 +0100] "GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0" 200 64674 "-" "ARC Reader (http://arc.semsol.org/)"
    •RDF requests vs. “semantic” requests:
      90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" 303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
      90.21.243.141 - - [06/Oct/2008:16:08:02 +0100] "GET /organization/vrije-universiteit-amsterdam-the-netherlands/rdf HTTP/1.1" 200 45358 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"
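As a rough illustration of how such entries can be read, the sketch below parses one combined-log-format line with a regular expression. The pattern, the function name `parse_line` and the dictionary keys are assumptions for this example; the slides do not say which tooling the authors used.

```python
import re
from datetime import datetime

# Assumed pattern for the combined log format shown on the slide.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["date"] = datetime.strptime(record["date"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    # The request string is "<method> <path> <protocol>"; keep method and path.
    parts = record["request"].split()
    record["method"] = parts[0] if len(parts) >= 2 else None
    record["path"] = parts[1] if len(parts) >= 2 else None
    return record


SAMPLE = (
    '90.21.243.141 - - [06/Oct/2008:16:07:58 +0100] '
    '"GET /organization/vrije-universiteit-amsterdam-the-netherlands HTTP/1.1" '
    '303 7592 "-" "rdflib-2.4.0 (http://rdflib.net/; eikeon@eikeon.com)"'
)
record = parse_line(SAMPLE)
print(record["status"], record["path"], record["agent"])
```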
  • 9. Source Data. Table 1: Overview of four LOD datasets (figures in parentheses are hits per day):
      Dog Food: 79,175 triples; 597 days; total hits 8,427,967 (14,117); plain hits 1,923,945 (3,223); RDF hits 259,031 (434); HTML hits 1,647,205 (2,759); SPARQL 879,932 (1,471)
      DBpedia: 109,750,000 triples; 118 days; total hits 87,203,310 (739,011); plain hits 22,821,475 (193,402); RDF hits 7,008,310 (59,392); HTML hits 22,999,237 (194,909); SPARQL 20,972,630 (177,734)
      DBTune: 74,209,000 triples; 61 days; total hits 7,467,125 (122,412); plain hits 1,952,185 (32,003); RDF hits 1,135,509 (18,615); HTML hits 677,904 (11,113); SPARQL 3,055,493 (50,090)
      RKBExplorer: 91,501,684 triples; 29 days; total hits 529,938 (18,274); plain, RDF and HTML hits n/a; SPARQL 9,327 (322)
    Figure (pie charts for DBpedia, DBTune and Dog Food): request shares by type. Labels as extracted: Plain 47.7%, 45%, 41.0%; HTML 46.5%, 39.9%, 51.1%; RDF 5.8%, 14.9%, 7.8%; Semantic 2.8%, 4.2%, 2.5%.
    Clipped paper excerpt (dataset descriptions): […] are served. For our evaluation, we had access to log […] two periods: from 24/05/2009–21/06/2009 and from […]2009–29/10/2009, i.e., roughly two months. RKBExplorer [11] is another meta-dataset currently com[…] 44 sub-datasets covering various topics and sources […] the domain of academic research, as well as a Web application that allows users to access and browse its content […] integrated fashion. Both RDF and HTML documents […] the resources in all datasets are available. Apart from […] linked data, the site also features a module that […] co-reference resolution functionality [10]. For our […], we had access to log files in the period from […]2009–21/06/2009, i.e., roughly one month. However, […]
    Clipped paper excerpt (request definitions): […]taining a SPARQL query, we assume that it is capable of handling the query result, i.e., either a […] bindings (in the case of a SELECT query), potentially containing URIs of RDF resources, or an RDF […] (in the case of a CONSTRUCT or DESCRIBE query). RDF requests: if an agent directly requests […] from a server, we assume that it knows how to process data in this format. Directly here means that the agent specified an RDF syntax such as rd[…] as an acceptable response in the header of its request. Merely requesting the URI of an RDF representation does not suffice to indicate semanticity, as this […] simply mean that the agent followed a link to the representation.
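The excerpt counts SPARQL requests and directly negotiated RDF requests as semantic. The combined log format does not record the Accept header, so the sketch below assumes a heuristic: a 303 response on a plain URI, followed shortly by a request for the matching `/rdf` document from the same IP and agent, is treated as a negotiated RDF request (the pattern of the rdflib example on slide 8). The function names, record keys, the `/rdf` and `/html` suffix convention and the 10-second window are all hypothetical.

```python
def request_kind(path):
    """Classify a request path as 'sparql', 'rdf', 'html' or 'plain'."""
    base = path.split("?", 1)[0].rstrip("/")
    if base == "/sparql":
        return "sparql"
    if base.endswith("/rdf"):
        return "rdf"
    if base.endswith("/html"):
        return "html"
    return "plain"


def count_semantic(records, window_seconds=10):
    """Count semantic requests in time-ordered records.

    Each record is a dict with keys ip, agent, time (seconds), path, status.
    """
    pending = {}  # (ip, agent, expected RDF document path) -> time of the 303
    semantic = 0
    for r in records:
        kind = request_kind(r["path"])
        who = (r["ip"], r["agent"])
        if kind == "sparql":
            semantic += 1
        elif kind == "plain" and r["status"] == 303:
            pending[who + (r["path"].rstrip("/") + "/rdf",)] = r["time"]
        elif kind == "rdf":
            started = pending.pop(who + (r["path"].rstrip("/"),), None)
            if started is not None and r["time"] - started <= window_seconds:
                semantic += 1
    return semantic


EXAMPLE = [
    {"ip": "90.21.243.141", "agent": "rdflib-2.4.0", "time": 0,
     "path": "/organization/x", "status": 303},
    {"ip": "90.21.243.141", "agent": "rdflib-2.4.0", "time": 4,
     "path": "/organization/x/rdf", "status": 200},
    {"ip": "10.0.0.1", "agent": "Mozilla", "time": 5,
     "path": "/organization/y/rdf", "status": 200},  # followed a link only
]
print(count_semantic(EXAMPLE))
```

The third record is not counted, which mirrors the point in the excerpt that merely requesting the URI of an RDF representation does not indicate semanticity.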
  • 10. Agents: Ordinary Traffic. ordinary traffic: the usual suspects. Bar chart: hits per agent for SW Dog Food (http://data.semanticweb.org, 21/07/2008 - 20/06/2009); x-axis: agents (0 to 30), y-axis: hits (0 to 500000). Labelled agents: Yahoo! Slurp, Googlebot, msnbot, SindiceFetcher, multicrawler, rdfbot/1.0, ARC Reader.
  • 11. Agents: How “Semantic” are they? semantic traffic: new kinds of agents. Bar chart: semantic hits/total hits per agent (agents with >100 semantic hits), axis 0 to 1. Agents shown: attributor/1.13.2, triplr, sindicebot, rdflib-2.4.2, Ripple, OL_Virtuoso_RDF_crawler, Morph_Converter_Service, Falconsbot, Speedy, Slug_SW_Crawler, yacybot, hclsreport-crawler, MJ12bot, PycURL, heritrix/1.14.3, SindiceFetcher, heritrix/pom.version, heritrix/2.0.2, multicrawler, SindiceBot, ia_archiver, Zitgist-APlusPlus-Agent, rdflib-2.4.1, Mp3Bot, curl, Zend_Http_Client, Speedy_Spider, nxcrawler, marbles, -, Java, rdflib-2.4.0, (unknown), ARC_Reader, MLBot, Mozilla, Jakarta_HttpClient, Wget, libwww-perl, MSIE, Firefox, Python-urllib, sindice_ontology_fetcher, Googlebot.
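The ratio plotted on this slide (semantic hits divided by total hits, for agents with more than 100 semantic hits) can be sketched as follows. The function name `semanticity`, the input shape and the hit counts in the usage example are made up for illustration and are not taken from the study.

```python
from collections import Counter


def semanticity(hits, min_semantic=100):
    """Return {agent: semantic hits / total hits}.

    hits is an iterable of (agent, is_semantic) pairs; only agents with
    more than min_semantic semantic hits are kept, as on the slide.
    """
    total = Counter()
    semantic = Counter()
    for agent, is_semantic in hits:
        total[agent] += 1
        if is_semantic:
            semantic[agent] += 1
    return {
        agent: semantic[agent] / total[agent]
        for agent in total
        if semantic[agent] > min_semantic
    }


# Made-up counts, purely to show the shape of the computation.
hits = (
    [("rdflib-2.4.0", True)] * 150
    + [("rdflib-2.4.0", False)] * 50
    + [("Googlebot", True)] * 5
    + [("Googlebot", False)] * 995
)
print(semanticity(hits))
```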
  • 12. Is Demand for LOD increasing? Line chart: Dog Food Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; y-axis: hits, 0 to 6000; x-axis: 2008-07-01 to 2010-05-01. Finding: no increase for semantic requests.
  • 13. Is Demand for LOD increasing? (ctd.) Line chart: DBpedia Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; y-axis: hits, 0 to 300000; x-axis: 2009-06-20 to 2009-11-07. Finding: no increase for semantic requests.
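The charts state a smoothing factor of 0.05 but not the smoothing method. Purely as an assumption, the sketch below uses simple exponential smoothing, where each smoothed value is `alpha` times the new observation plus `1 - alpha` times the previous smoothed value; the authors may well have used a different technique.

```python
def exponential_smoothing(values, alpha=0.05):
    """Smooth a series of daily hit counts (assumed method, see lead-in)."""
    smoothed = []
    for value in values:
        # The first smoothed value is the first observation itself.
        previous = smoothed[-1] if smoothed else value
        smoothed.append(alpha * value + (1 - alpha) * previous)
    return smoothed


# Made-up daily counts with a single spike on the third day.
print(exponential_smoothing([100, 100, 500, 100, 100]))
```

With a factor as small as 0.05, a one-day spike moves the curve only slightly, which fits the slowly varying lines described for these charts.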
  • 14. Do Real-world Events have an Impact on LOD Usage? Line chart: Demand for Events (smoothing factor 0.05); series: iswc2008, www2009, eswc2009, iswc2009; y-axis: 0 to 700; x-axis: 2008-07-01 to 2010-05-01. Annotation: possible impact.
  • 15. Do Real-world Events have an Impact on LOD Usage? Line chart: Irish Lisbon Treaty Referendum (smoothing factor 0.05); series: http://dbpedia.org/resource/Republic_of_Ireland, http://dbpedia.org/resource/European_Union, http://dbpedia.org/resource/Treaty_of_Lisbon; y-axis: 0 to 9; x-axis: 2009-06-20 to 2009-11-07. Annotation: possible impact.
  • 16. Do Real-world Events have an Impact on LOD Usage? Line chart: Michael Jackson Memorial Service (smoothing factor 0.05); series: http://dbpedia.org/resource/Staples_Center, http://dbpedia.org/resource/Michael_Jackson_memorial_service, http://dbpedia.org/resource/Michael_Jackson; y-axis: 0 to 4.5; x-axis: 2009-06-20 to 2009-11-07. Annotation: possible impact.
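The event charts track demand for individual resources over time. A minimal sketch of the underlying count, assuming log records have already been reduced to (day, path) pairs, is shown below; the function name `daily_hits` and the sample records are hypothetical.

```python
from collections import Counter
from datetime import date


def daily_hits(records, resource_path):
    """Count hits per day for one resource path.

    records is an iterable of (day, path) pairs; the result maps each day
    with at least one hit to its count, in chronological order.
    """
    counts = Counter()
    for day, path in records:
        if path == resource_path:
            counts[day] += 1
    return dict(sorted(counts.items()))


# Made-up records, for illustration only.
records = [
    (date(2009, 7, 6), "/resource/Staples_Center"),
    (date(2009, 7, 7), "/resource/Staples_Center"),
    (date(2009, 7, 7), "/resource/Staples_Center"),
    (date(2009, 7, 7), "/resource/Michael_Jackson"),
]
print(daily_hits(records, "/resource/Staples_Center"))
```

A series like this, once smoothed, can be compared against the date of a real-world event to look for a possible impact.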
  • 17. Conclusion (of sorts) •Generic approach for analysing usage of LOD sites (but see below), based on server log files •Metric for semanticity of agents •Did not notice a rising demand in LOD •However: real-world events do seem to have an effect on LOD usage •Restrictions: – does not work well with embedded metadata (e.g., RDFa-based sites) – does not take into account usage through meta sites (indexes, search engines, ...)