2. Möller, K., Hausenblas, M., Cyganiak, R., Grimnes, G., and Handschuh, S. (2010). Learning from linked open data usage: Patterns & metrics. In WebScience 2010, Raleigh, NC, USA. http://journal.webscience.org/302/
3. Linked Data
• Conventional “Eye-ball” Web: interlinked documents; used mainly by people / Web browsers
• Web of Linked Data: interlinked items of data (URIs, RDF); used mainly by machine agents (but also people?)
5. How is Linked Data being used?
• plenty of research on conventional Web usage
• what about usage of linked data?
Why?
• how healthy is the Web of linked data?
• who is using the data and how? Is it useful? Are there trends?
• providers: improve hosting
• ... just curiosity!
7. Approach
Analyse the server logs of particular sites that use:
– a URI for each data item ➙ a request for each data item (resource)
– content negotiation best practices
– redirection (HTTP 303)
[Diagram: the plain resource URI http://data.semanticweb.org/conference/www/2009 is redirected either to the RDF document URI http://data.semanticweb.org/conference/www/2009/rdf or to the HTML document URI http://data.semanticweb.org/conference/www/2009/html]
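To illustrate the redirection step, the Python sketch below picks the document URI that a plain resource URI would be redirected to with HTTP 303. It is a simplified sketch under stated assumptions, not how data.semanticweb.org is actually implemented: the "/rdf" and "/html" suffixes are taken from the example URIs above, the set of RDF media types is an assumption, and Accept-header quality values are ignored.

```python
# Simplified sketch of 303-based content negotiation for a plain resource URI.
# Assumptions (not from the slides): the RDF_MEDIA_TYPES set below, and
# q-values in the Accept header are ignored: if any RDF type is listed,
# the RDF document is chosen.

RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle", "application/n-triples"}


def negotiate(resource_uri: str, accept_header: str) -> tuple[int, str]:
    """Return (HTTP status, Location header) for a GET on a plain resource URI."""
    # "text/html;q=0.9, application/rdf+xml" -> {"text/html", "application/rdf+xml"}
    media_types = {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    }
    base = resource_uri.rstrip("/")
    if media_types & RDF_MEDIA_TYPES:
        # Agent asking for RDF: redirect to the RDF document URI.
        return 303, base + "/rdf"
    # Browsers, "*/*" or an empty header: redirect to the HTML document URI.
    return 303, base + "/html"


if __name__ == "__main__":
    uri = "http://data.semanticweb.org/conference/www/2009"
    print(negotiate(uri, "application/rdf+xml"))
    print(negotiate(uri, "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

A production server would honour q-values and probably offer more formats; leaving them out keeps the focus on the 303 mechanism, which is what makes each request appear in the logs as a plain, an RDF or an HTML hit.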
9. The combined log format
219.211.147 - - [23/May/2009:09:52:03 +0100] "GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0" 200 64674 "-" "ARC Reader (http://arc.semsol.org/)"
Figure 1: The combined log format (labelled fields include source, response code, response size, referrer and user agent)
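The log line above can be split into its fields with a regular expression. The Python sketch below is one way to do this. The field names are our own choice, and the pattern assumes well-formed lines in which the request and user-agent strings contain no unescaped double quotes. The host field accepts any non-space token, so the truncated address in the example still parses.

```python
import re
from typing import Optional

# host ident authuser [date] "request" status size "referrer" "user agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one combined-log-format line into a dict, or return None if it does not match."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A size of "-" means no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


if __name__ == "__main__":
    example = ('219.211.147 - - [23/May/2009:09:52:03 +0100] '
               '"GET /sparql?query=PREFIX [..] LIMIT+200 HTTP/1.0" '
               '200 64674 "-" "ARC Reader (http://arc.semsol.org/)"')
    print(parse_line(example))
```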
Table 1: Overview of four LOD datasets (values in parentheses are average hits per day)
Dataset | # triples | # days | total # hits | # plain hits | # RDF hits | # HTML hits | # SPARQL hits
Dog Food | 79,175 | 597 | 8,427,967 (14,117) | 1,923,945 (3,223) | 259,031 (434) | 1,647,205 (2,759) | 879,932 (1,471)
DBpedia | 109,750,000 | 118 | 87,203,310 (739,011) | 22,821,475 (193,402) | 7,008,310 (59,392) | 22,999,237 (194,909) | 20,972,630 (177,734)
DBTune | 74,209,000 | 61 | 7,467,125 (122,412) | 1,952,185 (32,003) | 1,135,509 (18,615) | 677,904 (11,113) | 3,055,493 (50,090)
RKBExplorer | 91,501,684 | 29 | 529,938 (18,274) | - | - | - | 9,327 (322)
[Pie charts for DBpedia, DBTune and Dog Food: shares of plain, HTML and RDF requests, and of semantic requests]
Excerpts from the paper:
"[…] For our evaluation, we had access to log[s from] two periods: from 24/05/2009–21/06/2009 and from […]/2009–29/10/2009, i.e., roughly two months."
"RKBExplorer [11] is another meta-dataset currently comprising 44 sub-datasets covering various topics and sources from the domain of academic research, as well as a Web application that allows users to access and browse its content in an integrated fashion. Both RDF and HTML documents of the resources in all datasets are available."
"[…] containing a SPARQL query, we assume that it is capable of handling the query result, i.e., either a […] bindings (in the case of a SELECT query), potentially containing URIs of RDF resources, or an RDF […] (in the case of a CONSTRUCT or DESCRIBE query)."
"• RDF requests: if an agent directly requests […] from a server, we assume that it knows how to process data in this format. Directly here means the agent specified an RDF syntax such as r[…] as an acceptable response in the header of its request."
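Most figures in parentheses in Table 1 equal the corresponding count divided by the number of days, rounded to the nearest integer. The short Python check below reproduces them from the table's own numbers; no other data is assumed. One cell does not match: for the Dog Food SPARQL column, 879,932 / 597 is about 1,474 rather than the 1,471 shown.

```python
# Total hits and number of days per dataset, copied from Table 1.
TABLE_1 = {
    "Dog Food": (8_427_967, 597),
    "DBpedia": (87_203_310, 118),
    "DBTune": (7_467_125, 61),
    "RKBExplorer": (529_938, 29),
}


def hits_per_day(hits: int, days: int) -> int:
    """Average number of hits per day, rounded to the nearest integer."""
    return round(hits / days)


if __name__ == "__main__":
    for dataset, (total, days) in TABLE_1.items():
        print(f"{dataset}: {hits_per_day(total, days):,} hits/day")
```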
10. Agents
[Bar chart: hits per agent for http://data.semanticweb.org, 21/07/2008 - 20/06/2009 (x-axis: agents; y-axis: hits, 0 to 500,000). Annotation: "ordinary traffic: the usual suspects"]
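A chart like this one can be produced by counting log entries per User-Agent string. The Python sketch below assumes that each parsed log entry is a dict with an "agent" key, which is a hypothetical representation. Grouping by the raw User-Agent string is also an assumption: different versions of the same crawler are counted separately unless the names are normalised first. The default of 30 agents is an arbitrary choice that roughly matches the chart's x-axis.

```python
from collections import Counter
from typing import Iterable


def hits_per_agent(entries: Iterable[dict], top: int = 30) -> list[tuple[str, int]]:
    """Return the `top` most frequent User-Agent strings with their hit counts."""
    counts = Counter(entry["agent"] for entry in entries)
    # most_common() sorts by count, highest first.
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        {"agent": "ExampleBot/1.0"},  # hypothetical agent names
        {"agent": "ExampleBot/1.0"},
        {"agent": "ExampleBrowser/2.0"},
    ]
    print(hits_per_agent(sample))
```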
19. USEWOD Data Challenge
Moving forward by releasing a dataset:
• to ease the difficulty of obtaining real-life usage data
• to allow comparison and combination of approaches applied to the same dataset
• by releasing a relatively new type of log: usage of the Web of Data
Also for longer-term use.
20. The USEWOD Dataset 2011
Server logs from two major Web of Data servers:
• DBpedia
  • several weeks of requests within a 2-month period
• Semantic Web Dog Food
  • 2 years of requests, Dec 2008 – Dec 2010
26. USEWOD 2011 Challenge Participants
• At the time of the workshop, 11 groups had requested the 2011 data
• By now, 28 groups have.
• 7 data challenge paper submissions
• Winner of the 2011 USEWOD data challenge prize:
  • Mario Arias Gallego, Javier D. Fernández, Miguel A. Martínez-Prieto and Pablo De La Fuente. An Empirical Study of Real-World SPARQL Queries.
27. The USEWOD Dataset 2012
Server logs from four major Web of Data servers:
• DBpedia
  • several weeks of requests within a 2-month period
• Semantic Web Dog Food
  • 2 years of requests, Dec 2008 – Dec 2010
• Linked Open Geo Data
• Bio2RDF
28. USEWOD 2012 Challenge Participants
• 20 groups have requested the data so far.
• 2 data challenge paper submissions…
• 1 winner of the USEWOD data challenge prize
  • kindly brought to you by LATC
33. Semantic Web Dog Food
[Screenshots and images taken from http://data.semanticweb.org/]
36. Requests for human- and machine-readable Web data
Both servers get requests for RDF:
• http://dbpedia.org/data/Berlin.rdf
as well as for HTML:
• http://dbpedia.org/page/Berlin
and requests for the URI itself:
• http://dbpedia.org/resource/Berlin (served as HTML or RDF, depending on what the client asks for)
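Given these URL patterns, each logged request can be assigned to one of the request types counted in Table 1 (plain, RDF, HTML, SPARQL). The Python sketch below does this for DBpedia-style paths. The prefixes come from the example URIs on this slide, and the /sparql path matches the logged request on slide 9. The function name and the "other" bucket are our own. Other servers use different URL schemes, so the rules would need adapting for each site.

```python
from urllib.parse import urlsplit


def classify_request(url: str) -> str:
    """Classify a requested URL (absolute, or a path as logged) by request type."""
    path = urlsplit(url).path
    if path.startswith("/sparql"):
        return "sparql"  # query sent to the SPARQL endpoint
    if path.startswith("/resource/"):
        return "plain"   # URI of the thing itself; answered with a 303 redirect
    if path.startswith("/data/"):
        return "rdf"     # RDF document about the thing
    if path.startswith("/page/"):
        return "html"    # HTML document about the thing
    return "other"       # anything else, e.g. images or stylesheets


if __name__ == "__main__":
    for url in ("http://dbpedia.org/data/Berlin.rdf",
                "http://dbpedia.org/page/Berlin",
                "http://dbpedia.org/resource/Berlin"):
        print(url, "->", classify_request(url))
```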
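37. Requests to SPARQL endpoints
• Both servers have a SPARQL endpoint through which agents can query the RDF data, e.g.:
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX swrc: <http://swrc.ontoware.org/ontology#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?s ?t ?y ?to ?h
WHERE {
  ?s dc:title ?t .
  ?s swrc:year ?y .
  OPTIONAL { ?s foaf:homepage ?h }
  OPTIONAL { ?s foaf:topic ?to }
}
ORDER BY DESC(?y)
LIMIT 200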
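To study real-world SPARQL usage, you first have to recover the query text from logged requests such as the one on slide 9, where the query is URL-encoded in the query parameter. The Python sketch below decodes it and flags which SPARQL keywords occur. Matching keywords with regular expressions is a deliberate simplification that keywords inside IRIs or string literals can fool; a careful analysis would parse each query with a SPARQL parser. The keyword list and function names are illustrative choices.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Illustrative selection of SPARQL keywords to look for.
KEYWORDS = ("SELECT", "CONSTRUCT", "DESCRIBE", "ASK", "DISTINCT",
            "OPTIONAL", "FILTER", "UNION", "ORDER BY", "LIMIT")


def extract_query(request_line: str) -> Optional[str]:
    """Return the decoded 'query' parameter of a line like 'GET /sparql?query=... HTTP/1.0'."""
    _, _, rest = request_line.partition(" ")  # drop the HTTP method
    target = rest.rsplit(" ", 1)[0]           # drop the protocol, e.g. "HTTP/1.0"
    # parse_qs decodes "+" to a space and %XX escapes to characters.
    values = parse_qs(urlsplit(target).query).get("query")
    return values[0] if values else None


def keywords_used(query: str) -> set[str]:
    """Return the keywords from KEYWORDS that occur in the query (case-insensitive)."""
    found = set()
    for keyword in KEYWORDS:
        pattern = r"\b" + keyword.replace(" ", r"\s+") + r"\b"
        if re.search(pattern, query, flags=re.IGNORECASE):
            found.add(keyword)
    return found


if __name__ == "__main__":
    line = "GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10 HTTP/1.1"
    query = extract_query(line)
    print(query, sorted(keywords_used(query)))
```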
39. Anonymizing the USEWOD Dataset
• IP addresses:
  • replace all IPs with 0.0.0.0
  • add the country code of the original IP address -> track the location of requests
  • add an identifier for the original IP -> track individual requesters
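The anonymization steps above can be sketched in Python as follows. The slide does not say how the identifier for the original IP was computed. The sketch assumes a keyed hash (HMAC-SHA256) with a secret key, because a plain hash of an IPv4 address could be reversed by hashing all 2^32 possible addresses. The country lookup is passed in as a function, for example one backed by a GeoIP database. That lookup, the "country" and "requester_id" field names, and the function name are all hypothetical.

```python
import hashlib
import hmac
from typing import Callable


def anonymize_entry(entry: dict, secret_key: bytes,
                    lookup_country: Callable[[str], str]) -> dict:
    """Return a copy of a parsed log entry with its IP address replaced.

    Adds a country code (via the caller-supplied, hypothetical lookup) and a
    stable pseudonymous requester identifier derived from the original IP.
    """
    ip = entry["host"]
    anonymized = dict(entry)  # leave the input entry untouched
    anonymized["host"] = "0.0.0.0"
    anonymized["country"] = lookup_country(ip)
    # Same IP + same key -> same identifier, so individual requesters can be
    # tracked without revealing their addresses.
    digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    anonymized["requester_id"] = digest[:16]
    return anonymized


if __name__ == "__main__":
    countries = {"192.0.2.10": "DE"}  # stand-in for a real GeoIP lookup
    entry = {"host": "192.0.2.10", "request": "GET /page/Berlin HTTP/1.1"}
    print(anonymize_entry(entry, b"example-secret", lambda ip: countries.get(ip, "--")))
```

Truncating the digest to 16 hex characters keeps identifiers short while still making collisions between distinct addresses very unlikely. The secret key has to be kept out of the released data.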
40. USEWOD2011, Hyderabad, India
• M. Kirchberg, R. K. L. Ko, and B. S. Lee. From linked data to relevant data - time is the essence. - http://arxiv.org/abs/1103.5046
• M. A. Gallego, J. D. Fernández, M. A. Martínez-Prieto, and P. D. L. Fuente. An empirical study of real-world SPARQL queries. - http://arxiv.org/abs/1103.5043
41. USEWOD2012, Lyon, France
• A. Raghuveer. Characterizing Machine Agent Behavior through SPARQL Query Mining. - http://ir.ii.uam.es/usewod2012/usewod2012_raghuveer.pdf
• J. Hoxha, M. Junghans, S. Agarwal. Enabling Semantic Analysis of User Browsing Patterns in the Web of Data. - http://arxiv.org/abs/1204.2713
42. What could be improved?
• does not work well with embedded metadata (e.g., RDFa-based sites)
• does not take into account usage through meta sites (indexes, search engines, mirrors, ...)
• does (probably) not take into account usage through apps
• what about caches?
• what about bulk/dump downloads of data?
• not enough usage to be interesting yet?
Editor's Notes
- not so much about USEWOD in general, but more about the challenge data in particular
- you have a URI for the thing itself, the subject of a document
- you have different URIs for documents about that thing
- servers are set up to return a document based on the kind of data the requesting agent wants
- that shows up in the server logs
- ratio of semantic/total hits
- the challenge dataset grew out of the dataset used in this paper
- it has been growing constantly since then
Close to (at the same university as) some of the people behind the DBpedia project.
One of the main drivers of this project.
Observations about the logs:
- which language elements are used most?
- what are the characteristics of the triple patterns used?
Very insightful for designing query evaluation engines or fine-tuning RDF stores.
TODO: Ask Knud: the same DBpedia data?
TODO: Ask Knud: what time span is the data from?
Open Geo Data: OpenStreetMap as RDF
Bio2RDF: Linked Data for the life sciences
Credits go to Knud and Markus.
Markus is close to (at the same university as) some of the people behind the DBpedia project.
Knud was one of the main drivers of this project.
TODO: check the numbers
The linked data twin of Wikipedia
Screenshot of DBpedia
TODO: another example.
Raw data, but anonymized
- last year: two interesting papers analysing the dataset
- note: not all USEWOD papers are about the challenge dataset, and the same holds this year