What Are Links in Linked Open Data?
A Characterization and Evaluation of Links between
Knowledge Graphs on the Web
[Invited Talk 04/08/21 at spatial@UCSB]
Armin Haller
Associate Professor, ANU
Armin Haller, Javier D. Fernández, Maulik R. Kamdar, Axel Polleres: What Are Links in Linked Open Data?
A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM Journal of Data
and Information Quality 12(2): 9:1-9:34 (2020)
Knowledge Graphs (KGs)
“A Knowledge Graph is a graph of data intended to accumulate and
convey knowledge of the real world, whose nodes represent entities of
interest and whose edges represent relations between these entities.”
[Hogan et al. 2020]
• Knowledge graphs are created collaboratively by many users
• Information can be added in a relatively arbitrary manner as
structural constraints are few
Closed KGs (~2019)
Microsoft ~2bn entities, ~55bn facts
Google ~1bn entities, ~70bn assertions
Facebook ~50m entities, ~500m assertions
eBay ~1bn triples
IBM ~100m entities, 5bn relationships
Open KGs (April 2021)
DBpedia ~4.58m entities, ~9.25GB
Yago4 ~50m entities, ~18.4GB
Wikidata ~93m entities, ~99GB
N. Noy, Y. Gao, A. Jain, A. Narayanan, A. Patterson, J. Taylor: Industry-scale Knowledge Graphs: Lessons and Challenges. ACM Queue 17(2), 2019.
A. Hogan et al.: Knowledge Graphs. CoRR abs/2003.02320, 2020.
RDF Knowledge Graphs
[Figure: ABox (Data) and TBox (Schema) of an RDF KG, with the labels Generic (applies to many), Specific (applies to few), Limited (many entities) and Comprehensive (fewer entities). Wikidata examples: Q76 Barack Obama (3,947 axioms), Q58043963 Armin Haller (189 axioms); schema terms Q35120 Entity, Person, Chess, Q73145133, P361 partOf, P1872 minimum no of players.]
KG development methodology
[Figure: Top-Down (schema first, data later) vs. Bottom-Up (data first, schema later), each spanning ABox (Data) and TBox (Schema).]
Linked (Open) data principles
• LDP1: Use URIs as identifiers for things;
• LDP2: Use HTTP URIs so those identifiers can be
dereferenced;
• LDP3: return useful information upon dereferencing of those
URIs using a standard format (typically, RDF);
• LDP4: include links using externally dereferenceable URIs
T. Berners-Lee. 2006. Linked Data. W3C Design Issues. From http://www.w3.org/DesignIssues/LinkedData.html, 2010.
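LDP2 and LDP3 together mean that a client can take an HTTP URI, fetch it, and ask for RDF in return. The following is a minimal sketch using only Python's standard library, assuming the server supports content negotiation on the `Accept` header; the media-type preference string and the helper names `rdf_request` and `dereference` are our own illustrative choices, not part of the principles.

```python
import urllib.request

# Preferred RDF serializations, most preferred first (an illustrative choice).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/n-triples;q=0.8"


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request for an HTTP URI that asks for an RDF serialization (LDP2/LDP3)."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> tuple:
    """Dereference a URI; urlopen follows redirects (e.g. 303 See Other) automatically.

    Returns the final URL and the Content-Type the server answered with.
    """
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type", "")


if __name__ == "__main__":
    req = rdf_request("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart")
    print(req.full_url, req.get_header("Accept"))
    # dereference(...) needs network access, so it is not called here.
```

If the returned Content-Type is not an RDF media type, the URI satisfies LDP2 but arguably not LDP3.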
Challenges with Links in Linked Data (KGs)
• References to many inaccessible URIs (i.e., broken
links) may render a dataset largely useless
• Changes in linked external datasets are outside the control of
the data publisher
• No definition of what constitutes “internal links” (i.e.,
links between parts of one coherent dataset/KG), and
“external links” (i.e., links between different
datasets/KGs)
Related Work
Availability and Discoverability of Linked Open Data sources
• P.-Y. Vandenbussche, J. Umbrich, L. Matteis, A. Hogan, C. B. Aranda. 2017. SPARQLES: Monitoring public SPARQL endpoints. Semantic Web 8, 6
(2017), 1049–1065.
• J. Debattista, C. Lange, S. Auer, and D. Cortis. 2018. Evaluating the Quality of the LOD Cloud: An Empirical Investigation. Semantic Web 9, 6 (2018),
859–901.
• A. Polleres, M. R. Kamdar, J. D. Fernández, T. Tudorache, and M. A. Musen. 2018. A More Decentralized Vision for Linked Data. In Proc. of the
2nd Workshop on Decentralizing the Semantic Web, co-located with ISWC, Vol. 2165. CEUR-WS.org, Monterey, CA, USA, 2019.
Metadata Representation and Quality
• K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao. 2011. Describing Linked Datasets with the VoID Vocabulary. W3C Interest Group Note 03
March 2011. W3C.
• A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. 2016. Quality assessment for Linked Data: A Survey. Semantic Web 7, 1
(2016), 63–93.
• A. Hogan, J. Umbrich, A. Harth, R. Cyganiak, A. Polleres, and S. Decker. 2012. An empirical survey of Linked Data conformance. Journal of Web
Semantics 14 (2012), 14 – 44.
• L. Rietveld, W. Beek, R. Hoekstra, and S. Schlobach. 2017. Metadata for a lot of LOD. Semantic Web 8, 6 (2017), 1067–1080.
Authoritative Namespaces and Links Between Linked Datasets
• M. Schmachtenberg, C. Bizer, and H. Paulheim. 2014. Adoption of the Linked Data Best Practices in Different Topical Domains. In Proc.
of ISWC. LNCS, Riva del Garda, Italy, 245–260.
• A. Harth, S. Kinsella, and S. Decker. 2009. Using Naming Authority to Rank Data and Ontologies for Web Search. In Proc. of ISWC. LNCS,
Washington, DC, USA, 277–292.
• A. Hogan, A. Harth, A. Passant, S. Decker, and A. Polleres. 2010. Weaving the Pedantic Web. In Proc. of the WWW2010
Workshop on Linked Data on the Web, LDOW (CEUR Workshop Proceedings), Vol. 628. CEUR-WS.org, Raleigh, USA, 1–10.
• A. S. Butt, A. Haller, and L. Xie. 2014. Ontology Search: An Empirical Evaluation. In Proc. of ISWC. LNCS, Riva del Garda, Italy, 130–147.
Linked Data Profiling and Link Analysis Tools
• C. Böhm, F. Naumann, Z. Abedjan, D. Fenz, T. Grütze, D. Hefenbrock, M. Pohl, and D. Sonnabend. 2010. Profiling linked open data with ProLOD. In
Proc. of 26th ICDEW 2010. IEEE, Long Beach, CA, USA, 175–178.
• N. Mihindukulasooriya, M. Poveda-Villalón, R. García-Castro, and A. Gómez-Pérez. 2015. Loupe - An Online Tool for Inspecting Datasets in the Linked
Data Cloud. In Proc. of ISWC (Posters & Demos). CEUR-WS.org, Bethlehem, PA, USA.
• C. B. Neto, K. Müller, M. Brümmer, D. Kontokostas, and S. Hellmann. 2016. LODVader: An Interface to LOD Visualization, Analytics and DiscovERy
in Real-time. In Proc. of 25th WWW Conference. ACM, Montreal, Quebec, Canada, 163–166.
• M. Ben Ellefi, Z. Bellahsene, S. Dietze, and K. Todorov. 2016. Dataset Recommendation for Data Linking: An Intensional Approach. In Proc. of
ESWC. LNCS, Crete, Greece, 36–51.
• J. Debattista, S. Auer, and C. Lange. 2016. Luzzu - A Methodology and Framework for Linked Data Quality Assessment. Journal of Data and
Information Quality 8, 1 (Oct. 2016), 4:1–4:32.
What is a dataset?
• There is no agreed notion of which set of triples forms a dataset/KG
– Linked Data datasets/KGs published on the Web are often partitioned
into several files, are made available through Linked Data APIs or are
in separate named graphs behind SPARQL endpoints
• Common practice suggests that single datasets and the URIs
“belonging” to these datasets can be referred to by sharing a
common namespace
• This notion of a namespace is typically not tied to a notion of
authority, as opposed to the original intention of URIs in the
Web architecture, where authority is an integral part of URIs:
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
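To make the grammar concrete, here is a small sketch with Python's `urllib.parse.urlsplit`, which calls the authority component `netloc`; the helper name `uri_components` is ours. It shows that the authority (e.g. `dbpedia.org`) is explicit in the URI, while the boundary between a namespace and a local name (e.g. `/resource/` vs. `/ontology/`) is not encoded anywhere in the syntax.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the RFC 3986 components named in the grammar above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib calls the authority "netloc"
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


print(uri_components("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"))
print(uri_components("http://rdfs.org/ns/void#Dataset"))
```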
What is a dataset?
• In RDF, whether a namespace (and thereby an authority) can be identified depends on the
serialization and on whether the prefix that determines an identifier's namespace is clearly
recognizable as such (as opposed to XML)
• Best practice suggests declaring the namespace prefixes a dataset authoritatively
owns in its metadata (using, for example,
vann:preferredNamespacePrefix and vann:preferredNamespaceUri).
Examples of namespaces and their prefixes:
dbr: http://dbpedia.org/resource/
dbo: http://dbpedia.org/ontology/
foaf: http://xmlns.com/foaf/0.1/
wd: http://www.wikidata.org/entity/
– However, 53.8% of all datasets in our analysis did not explicitly declare their namespace(s)
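Because namespaces are only recoverable from declared prefixes, a consumer typically resolves them with a prefix table. A minimal sketch, assuming the four prefixes listed above; `expand` and `namespace_of` are hypothetical helper names. The lookup takes the longest matching namespace because namespaces can be nested, and it returns `None` for URIs whose namespace was never declared, which is the situation for the 53.8% of datasets mentioned above.

```python
# Prefix declarations from the list above.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "wd": "http://www.wikidata.org/entity/",
}


def expand(curie: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'dbr:Berlin' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + local


def namespace_of(uri: str, prefixes: dict = PREFIXES):
    """Return (prefix, namespace) of the longest declared namespace uri starts with, else None."""
    matches = [(p, ns) for p, ns in prefixes.items() if uri.startswith(ns)]
    return max(matches, key=lambda m: len(m[1])) if matches else None


print(expand("dbr:Wolfgang_Amadeus_Mozart"))
print(namespace_of("http://www.wikidata.org/entity/Q254"))
print(namespace_of("http://example.org/Q254"))  # None: no declared namespace matches
```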
What is a link?
• Links in Linked Data do not have a clearly defined direction
(in contrast to hyperlinks)
t1:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
t2:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, dbo:Person]
t3:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
• The direction of a link does not depend on which URI is in
its subject position, but on the dataset (KG) in which the
triple appears, e.g., 𝑡1 can be published in DBpedia or in
Wikidata, pointing in either direction.
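The point about direction can be illustrated by computing the link targets of the same triple from two different publishing datasets. This is a sketch under an assumed namespace assignment (DBpedia owning `dbr:`/`dbo:`, Wikidata owning `wd:`); `link_targets` is a hypothetical helper, not code from the paper.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Assumed authoritative namespaces per dataset (illustrative, not measured).
NS_BY_DATASET = {
    "DBpedia": {"http://dbpedia.org/resource/", "http://dbpedia.org/ontology/"},
    "Wikidata": {"http://www.wikidata.org/entity/"},
}


def link_targets(triple, hosting_dataset, ns_by_dataset=NS_BY_DATASET):
    """Datasets, other than the one publishing the triple, whose namespaces occur in it."""
    targets = set()
    for term in triple:
        for ds, namespaces in ns_by_dataset.items():
            if ds != hosting_dataset and any(term.startswith(ns) for ns in namespaces):
                targets.add(ds)
    return targets


t1 = ("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart", OWL_SAME_AS,
      "http://www.wikidata.org/entity/Q254")
print(link_targets(t1, "DBpedia"))   # {'Wikidata'}
print(link_targets(t1, "Wikidata"))  # {'DBpedia'}
```

The identical triple is a DBpedia-to-Wikidata link when published by DBpedia and a Wikidata-to-DBpedia link when published by Wikidata.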
Dataset
Definition 1: A dataset is a collection of
one or more associated RDF graphs,
published by a single controlling entity.
Given a dataset 𝑑𝑠, we denote by 𝐺𝑑𝑠 the
merge of all of its graphs.
Namespace
Definition 2: Let us assume that each
dataset uses a finite set of namespaces,
some of which it controls authoritatively.
Given a dataset 𝑑𝑠, we denote by 𝑁𝑆𝑑𝑠 the
set of its authoritative namespaces.
We assume each namespace is
authoritatively controlled by at most a single
dataset. That is, we assume that 𝑑𝑠1 ≠ 𝑑𝑠2
implies 𝑁𝑆𝑑𝑠1 ∩ 𝑁𝑆𝑑𝑠2 = ∅.
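The disjointness assumption of Definition 2 is easy to check mechanically once each dataset's authoritative namespaces are known. A minimal sketch; the dataset names and namespace sets below are hypothetical examples, and `namespace_clashes` is our own helper name.

```python
from itertools import combinations


def namespace_clashes(ns_by_dataset):
    """Return (ds1, ds2, shared) for every pair of datasets violating NS_ds1 ∩ NS_ds2 = ∅."""
    clashes = []
    for (ds1, ns1), (ds2, ns2) in combinations(ns_by_dataset.items(), 2):
        shared = ns1 & ns2
        if shared:
            clashes.append((ds1, ds2, shared))
    return clashes


# Hypothetical assignments; the second one lets a mirror claim DBpedia's namespace.
ok = {
    "DBpedia": {"http://dbpedia.org/resource/", "http://dbpedia.org/ontology/"},
    "Wikidata": {"http://www.wikidata.org/entity/"},
}
bad = dict(ok, Mirror={"http://dbpedia.org/resource/"})
print(namespace_clashes(ok))   # []
print(namespace_clashes(bad))
```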
Non-standard use
Definition 3: (Non-Standard-use). Let 𝑅𝐷𝐹, 𝑅𝐷𝐹𝑆, 𝑂𝑊𝐿 and 𝑋𝑆𝐷,
respectively, denote the reserved namespaces. Let 𝐺𝑅𝐷𝐹, 𝐺𝑅𝐷𝐹𝑆,
and 𝐺𝑂𝑊𝐿, resp., denote the RDF graphs accessible at these URIs,
where we write 𝐺𝑟𝑒𝑠 = 𝐺𝑅𝐷𝐹 ∪ 𝐺𝑅𝐷𝐹𝑆 ∪ 𝐺𝑂𝑊𝐿. A non-standard triple
in any RDF graph other than 𝐺𝑟𝑒𝑠 is a triple where:
• a class in 𝐺𝑟𝑒𝑠 appears in a position other than the object of an
rdf:type triple, or
t:[rdfs:Class, rdfs:subClassOf, rdf:Property]
• a property in 𝐺𝑟𝑒𝑠 appears outside of the predicate position.
t:[rdfs:range, rdfs:subPropertyOf, rdf:Property]
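Definition 3 can be sketched as a per-triple check. Terms are written as prefixed names for readability, and the two reserved sets below are a small illustrative subset that we assume from the RDF, RDFS and OWL vocabularies; a faithful implementation would load 𝐺𝑅𝐷𝐹, 𝐺𝑅𝐷𝐹𝑆 and 𝐺𝑂𝑊𝐿 and apply the check only to triples outside 𝐺𝑟𝑒𝑠.

```python
# Illustrative subsets of the classes and properties defined in G_res
# (an assumption for this sketch; a real check would load the vocabulary graphs).
RESERVED_CLASSES = {
    "rdfs:Resource", "rdfs:Class", "rdfs:Literal", "rdfs:Datatype", "rdf:Property",
    "owl:Class", "owl:Thing", "owl:ObjectProperty", "owl:DatatypeProperty",
}
RESERVED_PROPERTIES = {
    "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range",
    "rdfs:label", "owl:sameAs", "owl:equivalentClass", "owl:equivalentProperty",
}


def is_non_standard(triple):
    """Definition 3, for a triple taken from a graph other than G_res."""
    s, p, o = triple
    for position, term in (("s", s), ("p", p), ("o", o)):
        # A reserved class may only appear as the object of rdf:type.
        if term in RESERVED_CLASSES and not (position == "o" and p == "rdf:type"):
            return True
        # A reserved property may only appear in predicate position.
        if term in RESERVED_PROPERTIES and position != "p":
            return True
    return False


print(is_non_standard(("rdfs:Class", "rdfs:subClassOf", "rdf:Property")))          # True
print(is_non_standard(("dbr:Wolfgang_Amadeus_Mozart", "rdf:type", "dbo:Person")))  # False
```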
Class position
Definition 4: A URI 𝑢 outside of one of the reserved
namespaces in an RDF triple 𝑡 = (𝑠, 𝑝, 𝑜) is in a
class position if:
• 𝑠 = 𝑢 ∧ 𝑝 ∈ {𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑑𝑜𝑚𝑎𝑖𝑛, 𝑜𝑤𝑙: 𝐶𝑙𝑎𝑠𝑠) ∈ 𝐺𝑟𝑒𝑠 ∨
(𝑝, 𝑟𝑑𝑓𝑠: 𝑑𝑜𝑚𝑎𝑖𝑛, 𝑟𝑑𝑓𝑠: 𝐶𝑙𝑎𝑠𝑠) ∈ 𝐺𝑟𝑒𝑠}
t:[foaf:Person, rdfs:subClassOf, foaf:Agent]
• 𝑜 = 𝑢 ∧ 𝑝 ∈ {𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑟𝑎𝑛𝑔𝑒, 𝑜𝑤𝑙: 𝐶𝑙𝑎𝑠𝑠) ∈ 𝐺𝑟𝑒𝑠 ∨
(𝑝, 𝑟𝑑𝑓𝑠: 𝑟𝑎𝑛𝑔𝑒, 𝑟𝑑𝑓𝑠: 𝐶𝑙𝑎𝑠𝑠) ∈ 𝐺𝑟𝑒𝑠}
t:[foaf:knows, rdfs:range, foaf:Person]
• 𝑜 = 𝑢 ∧ 𝑝 = 𝑟𝑑𝑓: 𝑡𝑦𝑝𝑒
t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, dbo:Person]
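The three bullets of Definition 4 translate directly into a predicate over a triple. In this sketch the sets `CLASS_DOMAIN` and `CLASS_RANGE` stand in for the two set comprehensions over 𝐺𝑟𝑒𝑠; they are a small subset we believe matches the RDFS and OWL 2 vocabularies, not the full sets, and the caller is assumed to pass only URIs outside the reserved namespaces.

```python
# Properties whose rdfs:domain is owl:Class or rdfs:Class in G_res (illustrative subset) ...
CLASS_DOMAIN = {"rdfs:subClassOf", "owl:equivalentClass", "owl:disjointWith"}
# ... and properties whose rdfs:range is owl:Class or rdfs:Class (illustrative subset).
CLASS_RANGE = {
    "rdfs:subClassOf", "owl:equivalentClass", "owl:disjointWith",
    "rdfs:domain", "rdfs:range", "owl:someValuesFrom", "owl:allValuesFrom",
}


def in_class_position(u, triple):
    """Definition 4: is the (non-reserved) URI u in a class position in (s, p, o)?"""
    s, p, o = triple
    return (
        (s == u and p in CLASS_DOMAIN)      # subject of a class-domain property
        or (o == u and p in CLASS_RANGE)    # object of a class-range property
        or (o == u and p == "rdf:type")     # object of rdf:type
    )


print(in_class_position("foaf:Person", ("foaf:knows", "rdfs:range", "foaf:Person")))  # True
```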
Property position
Definition 5: A URI 𝑢 outside of the reserved namespaces
in an RDF triple 𝑡 = (𝑠, 𝑝, 𝑜) is in a property position if:
• 𝑠 = 𝑢 ∧ 𝑝 ∈ {𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑑𝑜𝑚𝑎𝑖𝑛, 𝑜𝑤𝑙: 𝑂𝑏𝑗𝑒𝑐𝑡𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦) ∈ 𝐺𝑟𝑒𝑠} ∪
{𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑑𝑜𝑚𝑎𝑖𝑛, 𝑟𝑑𝑓: 𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦) ∈ 𝐺𝑟𝑒𝑠}
t:[foaf:knows, rdfs:domain, foaf:Person]
• 𝑝 = 𝑢
t:[wd:Q58043963, foaf:knows, wd:Q54860587]
• 𝑜 = 𝑢 ∧ 𝑝 ∈ {𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑟𝑎𝑛𝑔𝑒, 𝑜𝑤𝑙: 𝑂𝑏𝑗𝑒𝑐𝑡𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦) ∈ 𝐺𝑟𝑒𝑠} ∪
{𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑟𝑎𝑛𝑔𝑒, 𝑟𝑑𝑓: 𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦) ∈ 𝐺𝑟𝑒𝑠}
t:[foaf:homepage, rdfs:subPropertyOf, foaf:page]
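Definition 5 follows the same pattern, with the middle bullet (𝑝 = 𝑢) needing no lookup at all. As before, `PROPERTY_DOMAIN` and `PROPERTY_RANGE` are an assumed illustrative subset of the 𝐺𝑟𝑒𝑠 properties whose domain or range is owl:ObjectProperty or rdf:Property.

```python
# Properties whose rdfs:domain is owl:ObjectProperty or rdf:Property (illustrative subset).
PROPERTY_DOMAIN = {
    "rdfs:domain", "rdfs:range", "rdfs:subPropertyOf", "owl:equivalentProperty", "owl:inverseOf",
}
# Properties whose rdfs:range is owl:ObjectProperty or rdf:Property (illustrative subset).
PROPERTY_RANGE = {"rdfs:subPropertyOf", "owl:equivalentProperty", "owl:inverseOf", "owl:onProperty"}


def in_property_position(u, triple):
    """Definition 5: is the (non-reserved) URI u in a property position in (s, p, o)?"""
    s, p, o = triple
    return (
        (s == u and p in PROPERTY_DOMAIN)
        or p == u                            # used as the predicate itself
        or (o == u and p in PROPERTY_RANGE)
    )


print(in_property_position("foaf:page", ("foaf:homepage", "rdfs:subPropertyOf", "foaf:page")))  # True
```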
Property position (cont’d)
Definition 6: A URI 𝑢 outside of the reserved
namespaces in an RDF triple 𝑡 = (𝑠, 𝑝, 𝑜) is in a
datatype position if:
• 𝑠 = 𝑢 ∧ 𝑝 ∈ {𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑑𝑜𝑚𝑎𝑖𝑛, 𝑟𝑑𝑓𝑠: 𝐷𝑎𝑡𝑎𝑡𝑦𝑝𝑒) ∈ 𝐺𝑟𝑒𝑠}
t:[sosa:resultTime, rdfs:range, xsd:dateTime]
• 𝑢 occurs as the datatype of a typed literal 𝑜 = ”𝑙”^^𝑢
t:[ex:observation/123, sosa:hasSimpleResult, "12.4m"^^cdt:length]
• 𝑜 = 𝑢 ∧ 𝑝 ∈ {𝑝|(𝑝, 𝑟𝑑𝑓𝑠: 𝑟𝑎𝑛𝑔𝑒, 𝑟𝑑𝑓𝑠: 𝐷𝑎𝑡𝑎𝑡𝑦𝑝𝑒) ∈ 𝐺𝑟𝑒𝑠}
t:[wd:Q58043963, foaf:name, “Armin Haller”]
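For Definition 6, the typed-literal case requires distinguishing literals from URIs, so this sketch introduces a small `Literal` type of our own. The domain/range sets contain two OWL 2 properties (`owl:onDatatype`, `owl:datatypeComplementOf`) that, to our understanding, have rdfs:Datatype as domain and range; treat them as an assumption rather than a complete list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """An RDF literal; datatype is None for plain or language-tagged literals."""
    lexical: str
    datatype: Optional[str] = None


# Assumed subset of G_res: properties whose rdfs:domain / rdfs:range is rdfs:Datatype.
DATATYPE_DOMAIN = {"owl:onDatatype", "owl:datatypeComplementOf"}
DATATYPE_RANGE = {"owl:onDatatype", "owl:datatypeComplementOf"}


def in_datatype_position(u, triple):
    """Definition 6: is the (non-reserved) URI u in a datatype position in (s, p, o)?"""
    s, p, o = triple
    if s == u and p in DATATYPE_DOMAIN:
        return True
    if isinstance(o, Literal) and o.datatype == u:  # o = "l"^^u
        return True
    return o == u and p in DATATYPE_RANGE


obs = ("ex:observation/123", "sosa:hasSimpleResult", Literal("12.4m", "cdt:length"))
print(in_datatype_position("cdt:length", obs))  # True
```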
Instance position
Definition 7: A URI 𝑢 outside of the
reserved namespaces in an RDF triple
𝑡 = (𝑠, 𝑝, 𝑜) that is neither in a class, nor
property, nor datatype position, is in an
instance position.
t:[ex:observation/108, sosa:observedProperty, ex:tree/124/height]
Link Types
Definition 8: Let 𝑑𝑠1, 𝑑𝑠2 be datasets. Then, we call a triple 𝑡 ∈ 𝐺𝑑𝑠1
a link from 𝑑𝑠1 to 𝑑𝑠2 if 𝑡 contains a URI 𝑢 from a namespace in 𝑁𝑆𝑑𝑠2.
Depending on the position of 𝑢, we distinguish:
• 𝑡 is called an instance link, if 𝑢 is in an instance position in 𝑡
t:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
• otherwise, 𝑡 is called an ontology link, where we further distinguish:
– 𝑡 is called a class link, if 𝑢 is in a class position other than the 𝑜 position of an rdf:type triple
t:[dbo:Person, rdfs:subClassOf, foaf:Person]
– 𝑡 is called an instance typing link, if 𝑢 is in the class position 𝑢 = 𝑜 of an rdf:type triple
t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, foaf:Person]
– 𝑡 is called a property link, if 𝑢 is in a property position other than 𝑝
t:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
– 𝑡 is called an instance role link, if 𝑢 is in the property position 𝑢 = 𝑝
t:[dbr:Wolfgang_Amadeus_Mozart, foaf:knows, wd:Q51088] (Antonio Salieri)
• If 𝑢 does not appear in 𝐺𝑑𝑠2, we call 𝑡 a broken link.
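Given the position of the target-dataset URI 𝑢 (obtained, for instance, as in Definitions 4 to 7, with "instance" as the fallback), the link type of Definition 8 is a small case analysis. This sketch takes the position as an input rather than recomputing it; `link_kind`, `is_broken` and the tiny target graph are hypothetical illustrations.

```python
RDF_TYPE = "rdf:type"


def link_kind(triple, u, position):
    """Definition 8: type of the link formed by the target-dataset URI u.

    position is where u occurs in the triple: "class", "property",
    "datatype" or "instance" (the fallback of Definition 7).
    """
    _, p, o = triple
    if u not in triple:
        raise ValueError(f"{u!r} does not occur in {triple!r}")
    if position == "instance":
        return "instance link"
    if position == "class":
        return "instance typing link" if (p == RDF_TYPE and o == u) else "class link"
    if position == "property":
        return "instance role link" if p == u else "property link"
    return "ontology link"  # datatype position: Definition 8 names no finer subtype


def is_broken(u, target_graph):
    """A link is broken if u does not appear anywhere in the target dataset's graph."""
    return not any(u in t for t in target_graph)


mozart = "dbr:Wolfgang_Amadeus_Mozart"
print(link_kind((mozart, "owl:sameAs", "wd:Q254"), "wd:Q254", "instance"))     # instance link
print(link_kind((mozart, RDF_TYPE, "foaf:Person"), "foaf:Person", "class"))    # instance typing link
```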
Empirical dataset
• Crawl of the LODcloud + historical datasets from the LODcloud that were
cached in the LODLaundromat
• 430 linked datasets in the resulting corpus, each encoded in HDT, for a total
size of 51 GB (3.3bn triples)
                         # of datasets   % of total   Available   Available as % of total
Total # of datasets      1,359           100%
SPARQL endpoint          459             33.5%        125         9.1%
Available as download    890             65.4%        226         16.6%
Characteristic Median Mean
Number of Triples 4,478 17,860,436
Number of Unique Subjects 613 1,774,578
Number of Unique Predicates 31 65.4%
Number of unique objects 2,245 5,296,390
A. Abele, J. P. McCrae, P. Buitelaar, A. Jentzsch, and R. Cyganiak. 2017. Linking open data cloud diagram. URL: http://lod-cloud.net. Insight-Centre.
Ontology corpus
• Ontologies typically consist only of terminological axioms 𝑇 (TBox),
but they may also include a set of assertional axioms 𝐴 (ABox) (e.g.,
codelists or thesaurological terms)
• Datasets registered in the LODcloud are typically not ontologies (i.e.,
in our analysis only 3/430 are ontologies)
• Ontology corpus created through a crawl of prefix.cc and through the
analysis of all ontology links in our empirical dataset
# of unique Classes 204,616
# of unique Properties 1,821
Ratio 1/112
Authoritative namespace
• To identify links (ontology or instance links) we need to identify
the dataset authority for each namespace
1. Use HDT to extract all namespaces in each RDF dataset
2. Compute the 'relative occurrence' of each namespace in the dataset
3. Check whether a namespace that is used extensively in a dataset is in fact
an external link to a dataset that is not in the corpus. Therefore, we
define only one authoritative namespace for each dataset
4. Only consider the Pay Level Domains (PLD) of the authoritative namespace
# of datasets in our corpus 430
# of datasets with authoritative namespace 395
# of datasets with namespace in LOD Cloud metadata 257
# of datasets matching authoritative namespace and LOD Cloud metadata 162
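Steps 1, 2 and 4 of the procedure above can be sketched in a few lines; step 3 (checking against datasets outside the corpus) is omitted. Several simplifications are our own assumptions: the namespace of a URI is cut at its last `#` or `/`, namespaces are counted over subject URIs only, and a three-entry `MULTI_LABEL_SUFFIXES` set stands in for the Public Suffix List that a real PLD computation would use.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical stand-in for the Public Suffix List, enough for a few domains in the corpus.
MULTI_LABEL_SUFFIXES = {"co.uk", "ac.uk", "re.kr"}


def namespace(uri):
    """Heuristic: the namespace is the URI up to and including its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def relative_occurrence(uris):
    """Step 2: share of URIs that fall into each namespace."""
    counts = Counter(namespace(u) for u in uris)
    total = sum(counts.values())
    return {ns: n / total for ns, n in counts.items()}


def pay_level_domain(uri):
    """Step 4: reduce a namespace URI to its pay-level domain."""
    labels = (urlsplit(uri).hostname or "").split(".")
    n = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-n:])


def authoritative_pld(subject_uris):
    """Pick the most frequent namespace among subject URIs and return its PLD."""
    occ = relative_occurrence(subject_uris)
    return pay_level_domain(max(occ, key=occ.get))


subjects = [
    "http://data.ordnancesurvey.co.uk/id/postcodeunit/A1",
    "http://data.ordnancesurvey.co.uk/id/postcodeunit/A2",
    "http://xmlns.com/foaf/0.1/Person",
]
print(authoritative_pld(subjects))  # ordnancesurvey.co.uk
```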
Link analysis
Class Links
http://vivo.iu.edu 119,538
http://vivo.scripps.edu 63,128
http://www.imagesnippets.com 12,874
http://core.kmi.open.ac.uk 9,143
http://commons.wikimedia.org 8,258
http://vivo.psm.edu 8,036
http://datos.bne.es 2,778
http://dbpedia.org 1,614
http://www.productontology.org 1,000
http://vivoweb.org 84
Median 0
Mean 1,299
% above 0 44%
Property Links
http://commons.wikimedia.org 4,995
http://datos.bne.es 1,255
http://vivo.iu.edu 510
http://vivo.psm.edu 481
http://vivoweb.org 386
http://vivo.scripps.edu 187
http://semanticscience.org 168
http://www.iupac.org 102
http://dbpedia.org 101
http://tkm.kiom.re.kr 60
Median 0
Mean 47
% above 0 18%
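The summary rows (median, mean, share of datasets above 0) can be computed from per-dataset link counts with Python's `statistics` module. The counts below are hypothetical and only illustrate why a median of 0 can coexist with a large mean: a few datasets contribute almost all links.

```python
from statistics import mean, median


def link_summary(counts):
    """Median, mean and percentage of datasets with at least one link."""
    return {
        "median": median(counts),
        "mean": mean(counts),
        "pct_above_0": 100 * sum(1 for c in counts if c > 0) / len(counts),
    }


# Hypothetical per-dataset class-link counts: most datasets have none, one has very many.
counts = [0, 0, 0, 0, 0, 0, 12, 40, 300, 119_538]
print(link_summary(counts))
```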
Link analysis (cont’d)
• Most used class/property URIs (other than RDFS/OWL
URIs) in our corpus
Class URI # of datasets
http://rdfs.org/ns/void#Dataset 118
http://rdfs.org/ns/void#Linkset 90
http://xmlns.com/foaf/0.1/Person 74
http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#Word 65
http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#Sentence 64
http://www.w3.org/2004/02/skos/core#Concept 56
http://xmlns.com/foaf/0.1/Organization 51
http://vivoweb.org/ontology/core#CoreLaboratory 30
http://vivoweb.org/ontology/core#Center 28
http://xmlns.com/foaf/0.1/Agent 24
Property URI # of datasets
http://purl.org/dc/terms/title 163
http://purl.org/dc/terms/creator 140
http://purl.org/dc/terms/description 134
http://xmlns.com/foaf/0.1/homepage 125
http://purl.org/dc/terms/publisher 112
http://purl.org/dc/terms/subject 105
http://rdfs.org/ns/void#vocabulary 103
http://purl.org/dc/terms/modified 98
http://rdfs.org/ns/void#exampleResource 96
http://rdfs.org/ns/void#subset 88
Instance Typing Links
Link analysis (cont’d)
Instance Links
http://webisa.webdatacommons.org 101,491,507
http://commons.wikimedia.org 100,022,186
http://lod.b3kat.de 40,674,519
http://lod.hebis.de 39,160,423
http://d-nb.info 20,096,228
http://datos.bne.es 7,419,630
http://data.ordnancesurvey.co.uk 5,653,997
http://data.europeana.eu 4,987,332
http://id.loc.gov 1,570,877
http://data.bibsys.no 1,440,011
http://ld.zdb-services.de 398,381,851
http://commons.wikimedia.org 319,988,690
http://d-nb.info 14,160,649
http://data.ordnancesurvey.co.uk 13,277,718
https://data.gov.cz 3,081,559
http://core.kmi.open.ac.uk 1,696,618
http://lod.hebis.de 1,624,579
http://id.loc.gov 1,143,545
http://data.europeana.eu 687,735
http://spraakbanken.gu.se 451,081
http://www.imagesnippets.com 214,362
http://data.coi.cz 34,277
Median 206
Mean 1,967,570
% above 0 97%
Median 206
Mean 4,240,890
% above 0 72%
Link analysis (cont’d)
• Selected predicates used in links
owl:sameAs owl:differentFrom rdfs:seeAlso owl:AllDifferent
Median 0 0 0 0
Mean 503,859 581 2,735 0
% above 0 53% <1% 14% 0
P90% 1,460 0 1 0
Top datasets by number of links:
– owl:sameAs: 1st http://commons.wikimedia.org (40,636,493); 2nd http://ld.zdb-services.de (18,049,155); 3rd http://d-nb.info (17,410,586)
– owl:differentFrom: 1st # 103,439; 2nd and 3rd N/A
– rdfs:seeAlso: 1st # 324,659; 2nd http://stitch.cs.vu.nl; 3rd http://data.nobelprize.org
– owl:AllDifferent: N/A
Link analysis (cont’d)
Total Links
http://ld.zdb-services.de 421,206,061
http://commons.wikimedia.org 420,024,129
http://webisa.webdatacommons.org 101,491,507
http://lod.hebis.de 40,785,002
http://lod.b3kat.de 40,677,795
http://d-nb.info 34,256,877
http://data.ordnancesurvey.co.uk 18,931,817
http://datos.bne.es 7,428,111
http://data.europeana.eu 5,675,067
https://data.gov.cz 3,958,043
Median 416
Mean 6,209,808
% above 0 96%
Broken Class URIs and Broken Property URIs, by HTTP response code
HTTP Code   Class URIs, Prefix.cc crawl (# %)   Class URIs, LOD corpus (# %)   Property URIs, Prefix.cc crawl (# %)   Property URIs, LOD corpus (# %)
200 7,175 12.3% 2,579 12.8% 814 44.7% 58,108 40.9%
301 18,598 31.8% 2,610 12.9% 442 24.3% 1,137 0.8%
302 4,331 7.4% 925 0.5% 194 10.7% 1,391 1.0%
303 12,805 21.9% 3,903 19.3% 108 5.9% 5,247 3.7%
40x 12,054 20.6% 8,664 42.9% 130 7.1% 73,366 51.7%
50x 66 <0.1% 111 <0.1% 4 <0.1% 362 0.3%
No response 146,145 5.9% 1,425 7% 129 7.1% 2,332 1.6%
Total 204,616 100% 20,217 100% 1,821 100% 141,943 100%
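Once a status code has been recorded for each dereferenced URI, the rows of the table above amount to a simple bucketing step. This sketch assumes the codes were collected without following redirects (otherwise 301/302/303 would never be observed) and uses `None` for "no response"; `status_bucket` and `status_table` are hypothetical names, and the sample input is made up.

```python
from collections import Counter
from typing import Optional


def status_bucket(status: Optional[int]) -> str:
    """Map an HTTP status code (None = no response) to a row of the table above."""
    if status is None:
        return "No response"
    if status in (200, 301, 302, 303):
        return str(status)
    if 400 <= status < 500:
        return "40x"
    if 500 <= status < 600:
        return "50x"
    return "other"


def status_table(statuses):
    """Count URIs per row and give each row's share of the total, in percent."""
    counts = Counter(status_bucket(s) for s in statuses)
    total = len(statuses)
    return {row: (n, round(100 * n / total, 1)) for row, n in counts.items()}


print(status_table([200, 404, 410, None, 303]))
```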
Link analysis in Wikidata
• Wikidata is by far the largest openly available KG and the only one truly built bottom-up, which causes
many modelling errors/inconsistencies (e.g., 4,182 separate properties for external identifiers)
• Wikidata is not part of the LOD Cloud and was therefore not included in our paper; however, we analysed
the Wikidata dump of 9 March 2020 (HDT file, 49.4 GB compressed)
Number of triples 3,381,623,911
Number of unique subjects 1,327,447,995
Number of predicates 32,713
Number of unique objects 2,010,015,636
Number of shared subject-object 1,173,987,281
Unique Individuals 75,261,968
Class Links 375,351,770
Property Links 2,723,834
of which sameAs links 2,723,834
Instance Typing Links 77,479,623
# of Classes 1,045,455
# of Properties 74,746
Ratio 1/14
# of unique Properties 7,259
Breakdown of the unique properties:
External identifier properties, 4,182 (e.g., P2014, Museum of Modern Art work ID; P6276, Amazon Music artist ID; P6145, Academy Awards Database film ID)
Others, 3,077 (e.g., P4330, contains; P150, contains administrative territorial entity; P1383, contains settlement; P2821, by-product; P2822, by-product of)
Discussion
• Instance Typing Links are the most used
– They are the only ones that grow linearly with dataset size
– Instance links are less popular than one would assume, and do not grow linearly with size
• Ontologies are reused widely
– Only a few datasets define their own ontology. This is a sign that:
1. Dataset publishers follow best practices and separate the ontology namespace from the authoritative
namespace of the dataset
2. A large number of ontologies already cover many domains and can be readily reused
• Need for ontology publishing best practices
– Authoritative ontology register needed (e.g., the LOV portal)
– A persistence mechanism that assigns a DOI to an ontology and persists the document
• Ubiquity of broken Class and Property links
– Alarming number of broken links, i.e., more than half of all class and property URIs
– Data publishers should consider replicating the ontologies they link to
P.-Y. Vandenbussche, G. Atemezing, M. Poveda-Villalón, and B. Vatant. 2017. Linked Open Vocabularies (LOV): A gateway to
reusable semantic vocabularies on the Web. Semantic Web 8, 3 (2017), 437–452.
Discussion (cont’d)
• Lack of ABox Links
– Many (28% of all) datasets do not use any Instance Links, and owl:sameAs is not
particularly popular at all (other than in Wikidata)
1. these links are expensive to establish manually
2. expensive to maintain, and
3. even if they exist, there is no incentive to publish them openly.
• Lack of and incorrect namespace declarations
– Only 59% of all datasets in our corpus publish their namespace in their metadata, and of
those 257, only 162 match the namespace we obtained through our analysis
• Plethora of data and metadata formats
– heterogeneity of publication formats, potentially involving parse errors, constituted a
major part of the effort used for our experiments
– Once each dataset node/dump had been converted to HDT, the analysis was easy:
link computations can be done at scale on even large datasets in HDT

module for grade 9 for distance learninglevieagacer
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.Silpa
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curveAreesha Ahmad
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Silpa
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 

Kürzlich hochgeladen (20)

module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 

What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web

  • 1. What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web [Invited Talk 04/08/21 at spatial@UCSB] Armin Haller Associate Professor, ANU Armin Haller, Javier D. Fernández, Maulik R. Kamdar, Axel Polleres: What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM Journal of Data and Information Quality 12(2): 9:1-9:34 (2020)
  • 2. Knowledge Graphs (KGs)
    “A Knowledge Graph is a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities.” [Hogan et al. 2020]
    • Knowledge graphs are created collaboratively by many users
    • Information can be added in a relatively arbitrary manner, as structural constraints are few
    Closed KGs (~2019):
      Microsoft: ~2bn entities, ~55bn facts
      Google: ~1bn entities, ~70bn assertions
      Facebook: ~50m entities, ~500m assertions
      eBay: ~1bn triples
      IBM: ~100m entities, 5bn relationships
    Open KGs (April 2021):
      DBpedia: ~4.58m entities, ~9.25GB
      Yago4: ~50m entities, ~18.4GB
      Wikidata: ~93m entities, ~99GB
    N. Noy, Y. Gao, A. Jain, A. Narayanan, A. Patterson, J. Taylor: Industry-scale Knowledge Graphs: Lessons and Challenges. ACM Queue 17(2), 2019.
    A. Hogan et al.: Knowledge Graphs. CoRR abs/2003.02320, 2020.
  • 3. RDF Knowledge Graphs
    [Figure: ABox (Data) and TBox (Schema), with axis labels Limited (many entities), Comprehensive (fewer entities), Generic (applies to many) and Specific (applies to few); Wikidata examples: Q76 Barack Obama (3,947 axioms), Q58043963 Armin Haller (189 axioms), Q35120 Entity, Person, Chess, Q73145133, P361 partOf, P1872 minimum no. of players]
  • 4. KG development methodology
    • Top-down: schema first, data later
    • Bottom-up: data first, schema later
    [Figure: ABox (Data) and TBox (Schema)]
  • 5. Linked (Open) data principles
    • LDP1: Use URIs as identifiers for things;
    • LDP2: Use HTTP URIs so those identifiers can be dereferenced;
    • LDP3: Return useful information upon dereferencing of those URIs, using a standard format (typically, RDF);
    • LDP4: Include links using externally dereferenceable URIs.
    T. Berners-Lee. 2006. Linked Data. W3C Design Issues. From http://www.w3.org/DesignIssues/LinkedData.html, 2010.
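LDP2 and LDP3 can be illustrated with Python's standard library. This is a minimal sketch, not tooling from the paper: the helper names (`is_http_uri`, `rdf_request`, `dereference`) and the Accept header value are assumptions for illustration. It checks that an identifier is an HTTP(S) URI and asks for an RDF serialization via content negotiation; whether a server honours the Accept header is up to the publisher.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Hypothetical Accept header preferring common RDF serializations (LDP3).
RDF_ACCEPT = "text/turtle, application/n-triples;q=0.9, application/rdf+xml;q=0.8"


def is_http_uri(uri: str) -> bool:
    """LDP2: the identifier must be an HTTP(S) URI with an authority component."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def rdf_request(uri: str) -> Request:
    """Build (but do not send) a request that asks for RDF via content negotiation."""
    if not is_http_uri(uri):
        raise ValueError(f"not dereferenceable over HTTP: {uri}")
    return Request(uri, headers={"Accept": RDF_ACCEPT})


def dereference(uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Send the request (redirects are followed) and return content type and body."""
    with urlopen(rdf_request(uri), timeout=timeout) as resp:
        return resp.headers.get_content_type(), resp.read()
```

Only `dereference` touches the network. Because `urlopen` follows 301/302/303 redirects by default, observing those status codes themselves would require a non-redirecting opener.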
  • 6. Challenges with Links in Linked Data (KGs)
    • References to many inaccessible URIs (i.e., broken links) may render a dataset largely useless
    • Changes in linked external datasets are outside the control of the data publisher
    • There is no definition of what constitutes “internal links” (i.e., links between parts of one coherent dataset/KG) and “external links” (i.e., links between different datasets/KGs)
  • 7. Related Work
    Availability and Discoverability of Linked Open Data sources
    • P.-Y. Vandenbussche, J. Umbrich, L. Matteis, A. Hogan, and C. B. Aranda. 2017. SPARQLES: Monitoring public SPARQL endpoints. Semantic Web 8, 6 (2017), 1049–1065.
    • J. Debattista, C. Lange, S. Auer, and D. Cortis. 2018. Evaluating the Quality of the LOD Cloud: An Empirical Investigation. Semantic Web 9, 6 (2018), 859–901.
    • A. Polleres, M. R. Kamdar, J. D. Fernández, T. Tudorache, and M. A. Musen. 2018. A More Decentralized Vision for Linked Data. In Proc. of the 2nd Workshop on Decentralizing the Semantic Web, co-located with ISWC, Vol. 2165. CEUR-WS.org, Monterey, CA, USA, 2019.
    Metadata Representation and Quality
    • K. Alexander, R. Cyganiak, M. Hausenblas, and J. Zhao. 2011. Describing Linked Datasets with the VoID Vocabulary. W3C Interest Group Note 03 March 2011. W3C.
    • A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. 2016. Quality assessment for Linked Data: A Survey. Semantic Web 7, 1 (2016), 63–93.
    • A. Hogan, J. Umbrich, A. Harth, R. Cyganiak, A. Polleres, and S. Decker. 2012. An empirical survey of Linked Data conformance. Journal of Web Semantics 14 (2012), 14–44.
    • L. Rietveld, W. Beek, R. Hoekstra, and S. Schlobach. 2017. Metadata for a lot of LOD. Semantic Web 8, 6 (2017), 1067–1080.
    Authoritative Namespaces and Links Between Linked Datasets
    • M. Schmachtenberg, C. Bizer, and H. Paulheim. 2014. Adoption of the Linked Data Best Practices in Different Topical Domains. In Proc. of ISWC. LNCS, Riva del Garda, Italy, 245–260.
    • A. Harth, S. Kinsella, and S. Decker. 2009. Using Naming Authority to Rank Data and Ontologies for Web Search. In Proc. of ISWC. LNCS, Washington, DC, USA, 277–292.
    • A. Hogan, A. Harth, A. Passant, S. Decker, and A. Polleres. 2010. Weaving the Pedantic Web. In Proc. of the WWW2010 Workshop on Linked Data on the Web, LDOW (CEUR Workshop Proceedings), Vol. 628. CEUR-WS.org, Raleigh, USA, 1–10.
    • A. S. Butt, A. Haller, and L. Xie. 2014. Ontology Search: An Empirical Evaluation. In Proc. of ISWC. LNCS, Riva del Garda, Italy, 130–147.
    Linked Data Profiling and Link Analysis Tools
    • C. Böhm, F. Naumann, Z. Abedjan, D. Fenz, T. Grütze, D. Hefenbrock, M. Pohl, and D. Sonnabend. 2010. Profiling linked open data with ProLOD. In Proc. of 26th ICDEW 2010. IEEE, Long Beach, CA, USA, 17–178.
    • N. Mihindukulasooriya, M. P.-Villalón, R. García-Castro, and A. Gómez-Pérez. 2015. Loupe - An Online Tool for Inspecting Datasets in the Linked Data Cloud. In Proc. of ISWC (Posters & Demos). CEUR-WS.org, Bethlehem, PA, USA.
    • C. B. Neto, K. Müller, M. Brümmer, D. Kontokostas, and S. Hellmann. 2016. LODVader: An Interface to LOD Visualization, Analytics and DiscovERy in Real-time. In Proc. of 25th WWW Conference. ACM, Montreal, Quebec, Canada, 163–166.
    • M. Ben Ellefi, Z. Bellahsene, S. Dietze, and K. Todorov. 2016. Dataset Recommendation for Data Linking: An Intensional Approach. In Proc. of ESWC. LNCS, Crete, Greece, 36–51.
    • J. Debattista, S. Auer, and C. Lange. 2016. Luzzu - A Methodology and Framework for Linked Data Quality Assessment. Journal of Data and Information Quality 8, 1 (Oct. 2016), 4:1–4:32.
  • 8. What is a dataset?
    • There is no notion of which sets of triples form a dataset/KG
      – Linked Data datasets/KGs published on the Web are often partitioned into several files, made available through Linked Data APIs, or held in separate named graphs behind SPARQL endpoints
    • Common practice suggests that single datasets, and the URIs “belonging” to them, can be identified by a shared common namespace
    • This notion of a namespace is typically not tied to a notion of authority, contrary to the original intention of URIs in the Web architecture, where authority is an integral part of a URI: URI = scheme ":" [//authority] path ["?" query] ["#" fragment]
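The URI grammar above can be explored with `urllib.parse.urlsplit`, which splits a URI into exactly these components. The helper names are hypothetical, and the namespace heuristic in `guess_namespace` (cut after the last `#` or `/`) is an assumption for illustration; nothing in RDF fixes where a namespace ends.

```python
from urllib.parse import urlsplit


def uri_parts(uri: str) -> dict[str, str]:
    """Split a URI along scheme ":" [//authority] path ["?" query] ["#" fragment]."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def guess_namespace(uri: str) -> str:
    """Hypothetical heuristic: everything up to and including the last '#' or '/'."""
    return uri[: max(uri.rfind("#"), uri.rfind("/")) + 1]
```

For example, `http://dbpedia.org/resource/...` and `http://dbpedia.org/ontology/...` share the authority `dbpedia.org` but belong to different namespaces (dbr: and dbo:), so the authority component alone does not delimit a dataset's URIs.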
  • 9. What is a dataset?
    • In RDF, a namespace (and thereby authority) depends on the RDF serialization, and on whether the prefix that determines an identifier's namespace is clearly recognizable as such (as opposed to XML)
    • Best practice suggests declaring the namespace prefixes that a dataset authoritatively owns in its metadata (for example, using vann:preferredNamespacePrefix and vann:preferredNamespaceUri)
      – However, 53.8% of all datasets in our analysis did not explicitly declare their namespace(s)
    Examples of namespaces and their prefixes:
      dbr: http://dbpedia.org/resource/
      dbo: http://dbpedia.org/ontology/
      foaf: http://xmlns.com/foaf/0.1/
      wd: http://www.wikidata.org/entity/
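The prefix table can be read as a mapping from prefixes to namespaces. The sketch below (helper names hypothetical) expands prefixed names into full URIs and compacts full URIs again using the longest matching namespace. It also shows why the serialization matters: a format that writes every URI in full, such as N-Triples, does not record the namespace boundary, so `compact` can only recover it when the prefix has been declared somewhere.

```python
# Namespace table from the slide; any further prefixes would be assumptions.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "wd": "http://www.wikidata.org/entity/",
}


def expand(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'wd:Q254' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(uri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Turn a full URI back into a prefixed name, preferring the longest namespace."""
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri  # no declared namespace: the boundary cannot be recovered
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(ns):]}"
```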
  • 10. What is a link?
    • Links in Linked Data do not have a clear direction (in contrast to hyperlinks):
      t1:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
      t2:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, dbo:Person]
      t3:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
    • The direction of a link does not depend on the URI in its subject, but on the dataset (KG) in which the triple appears; e.g., t1 can be defined in DBpedia or in Wikidata, pointing in either direction.
  • 11. Dataset
    Definition 1: A dataset is a collection of one or more associated RDF graphs, published by a single controlling entity. Given a dataset $ds$, we denote by $G_{ds}$ the merge of all of its graphs.
  • 12. Namespace
    Definition 2: Let us assume that each dataset uses a finite set of namespaces, some of which it controls authoritatively. Given a dataset $ds$, we denote by $NS_{ds}$ the set of its authoritative namespaces. We assume each namespace is authoritatively controlled by at most one dataset, that is, $ds_1 \neq ds_2$ implies $NS_{ds_1} \cap NS_{ds_2} = \emptyset$.
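The disjointness assumption of Definition 2 can be checked mechanically on a given assignment of namespaces to datasets. A minimal sketch; the function name and the dataset names used with it are hypothetical.

```python
from itertools import combinations


def namespace_clashes(ns_by_dataset: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of datasets that both claim a namespace authoritatively.

    Definition 2 assumes ds1 != ds2 implies NS_ds1 and NS_ds2 are disjoint,
    so an empty result means the assumption holds for this input.
    """
    clashes = []
    for (ds1, ns1), (ds2, ns2) in combinations(ns_by_dataset.items(), 2):
        shared = ns1 & ns2
        if shared:
            clashes.append((ds1, ds2, shared))
    return clashes
```

With DBpedia claiming `http://dbpedia.org/resource/` and `http://dbpedia.org/ontology/` and Wikidata claiming `http://www.wikidata.org/entity/`, the result is empty; adding a hypothetical mirror that also claims `http://dbpedia.org/resource/` yields one clash.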
  • 13. Non-standard use
    Definition 3 (Non-standard use): Let $RDF$, $RDFS$, $OWL$ and $XSD$, respectively, denote the reserved namespaces. Let $G_{RDF}$, $G_{RDFS}$ and $G_{OWL}$, respectively, denote the RDF graphs accessible at these URIs, where we write $G_{res} = G_{RDF} \cup G_{RDFS} \cup G_{OWL}$. A non-standard triple in any RDF graph other than $G_{res}$ is a triple where:
    • a class in $G_{res}$ appears in a position other than as the value of rdf:type, e.g., t:[rdfs:Class, rdfs:subClassOf, rdf:Property]
    • a property in $G_{res}$ appears outside of the predicate position, e.g., t:[rdfs:range, rdfs:subPropertyOf, rdf:Property]
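Definition 3 can be approximated in a few lines. This sketch assumes terms are prefixed-name strings and uses a small, hand-picked excerpt of reserved classes and properties in place of the full G_res; a real check would load the RDF, RDFS and OWL vocabularies. The constant and function names are hypothetical.

```python
# Hypothetical, hand-picked excerpt of classes and properties defined in G_res;
# the full RDF, RDFS and OWL vocabularies are larger.
RESERVED_CLASSES = {"rdfs:Class", "rdf:Property", "owl:Class", "rdfs:Datatype"}
RESERVED_PROPERTIES = {
    "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range",
}


def is_non_standard(triple: tuple[str, str, str]) -> bool:
    """Check the two conditions of Definition 3 against the excerpt above."""
    s, p, o = triple
    for position, term in (("s", s), ("p", p), ("o", o)):
        # A reserved class may only appear as the object (value) of rdf:type.
        if term in RESERVED_CLASSES and not (position == "o" and p == "rdf:type"):
            return True
        # A reserved property may only appear in predicate position.
        if term in RESERVED_PROPERTIES and position != "p":
            return True
    return False
```

Both examples on the slide are flagged, whereas an ordinary typing triple such as t:[foaf:Person, rdf:type, owl:Class] is not.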
  • 14. Class position
    Definition 4: A URI $u$ outside of the reserved namespaces in an RDF triple $t = (s, p, o)$ is in a class position if:
    • $s = u \wedge p \in \{p \mid (p, \texttt{rdfs:domain}, \texttt{owl:Class}) \in G_{res} \vee (p, \texttt{rdfs:domain}, \texttt{rdfs:Class}) \in G_{res}\}$, e.g., t:[foaf:Person, rdfs:subClassOf, foaf:Agent]
    • $o = u \wedge p \in \{p \mid (p, \texttt{rdfs:range}, \texttt{owl:Class}) \in G_{res} \vee (p, \texttt{rdfs:range}, \texttt{rdfs:Class}) \in G_{res}\}$, e.g., t:[foaf:knows, rdfs:range, foaf:Person]
    • $o = u \wedge p = \texttt{rdf:type}$, e.g., t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, dbo:Person]
  • 15. Property position
    Definition 5: A URI $u$ outside of the reserved namespaces in an RDF triple $t = (s, p, o)$ is in a property position if:
    • $s = u \wedge p \in \{p \mid (p, \texttt{rdfs:domain}, \texttt{owl:ObjectProperty}) \in G_{res}\} \cup \{p \mid (p, \texttt{rdfs:domain}, \texttt{rdf:Property}) \in G_{res}\}$, e.g., t:[foaf:knows, rdfs:domain, foaf:Person]
    • $p = u$, e.g., t:[wd:Q58043963, foaf:knows, wd:Q54860587]
    • $o = u \wedge p \in \{p \mid (p, \texttt{rdfs:range}, \texttt{owl:ObjectProperty}) \in G_{res}\} \cup \{p \mid (p, \texttt{rdfs:range}, \texttt{rdf:Property}) \in G_{res}\}$, e.g., t:[foaf:homepage, rdfs:subPropertyOf, foaf:page]
  • 16. Datatype position
    Definition 6: A URI $u$ outside of the reserved namespaces in an RDF triple $t = (s, p, o)$ is in a datatype position if:
    • $s = u \wedge p \in \{p \mid (p, \texttt{rdfs:domain}, \texttt{rdfs:Datatype}) \in G_{res}\}$, e.g., t:[sosa:resultTime, rdfs:range, xsd:dateTime]
    • $u$ occurs as the datatype of a typed literal, i.e., $o$ = "$l$"^^$u$, e.g., t:[ex:observation/123, sosa:hasSimpleResult, "12.4m"^^cdt:length]
    • $o = u \wedge p \in \{p \mid (p, \texttt{rdfs:range}, \texttt{rdfs:Datatype}) \in G_{res}\}$, e.g., t:[wd:Q58043963, foaf:name, "Armin Haller"]
  • 17. Instance position
    Definition 7: A URI $u$ outside of the reserved namespaces in an RDF triple $t = (s, p, o)$ that is not in a class, property, or datatype position is in an instance position, e.g., t:[ex:observation/108, sosa:observedProperty, ex:tree/124/height]
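Definitions 4 to 7 can be read as a decision procedure over the domain and range axioms in G_res. The sketch below rests on several assumptions: terms are prefixed-name strings, literals use a small hypothetical `Literal` class, G_res is replaced by a hand-picked excerpt of RDFS domain/range axioms, and, because the definitions can overlap, it checks class, then property, then datatype position, an ordering the slides do not prescribe. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # e.g. "cdt:length" in "12.4m"^^cdt:length


Term = Union[str, Literal]
RESERVED = ("rdf:", "rdfs:", "owl:", "xsd:")

# Hypothetical excerpt of G_res: domain/range axioms of some RDFS properties.
G_RES = {
    ("rdfs:subClassOf", "rdfs:domain", "rdfs:Class"),
    ("rdfs:subClassOf", "rdfs:range", "rdfs:Class"),
    ("rdfs:domain", "rdfs:domain", "rdf:Property"),
    ("rdfs:domain", "rdfs:range", "rdfs:Class"),
    ("rdfs:range", "rdfs:domain", "rdf:Property"),
    ("rdfs:range", "rdfs:range", "rdfs:Class"),
    ("rdfs:subPropertyOf", "rdfs:domain", "rdf:Property"),
    ("rdfs:subPropertyOf", "rdfs:range", "rdf:Property"),
}
CLASSES = {"owl:Class", "rdfs:Class"}
PROPERTIES = {"owl:ObjectProperty", "rdf:Property"}
DATATYPES = {"rdfs:Datatype"}


def preds(axis: str, targets: set[str]) -> set[str]:
    """All p with (p, axis, c) in G_RES for some c in targets (axis: rdfs:domain or rdfs:range)."""
    return {s for (s, p, o) in G_RES if p == axis and o in targets}


def position(u: str, t: tuple[Term, Term, Term]) -> str:
    """Classify the position of URI u in triple t following Definitions 4 to 7."""
    if u.startswith(RESERVED):
        return "reserved"  # the definitions only cover URIs outside the reserved namespaces
    s, p, o = t
    # Definition 4: class position
    if ((s == u and p in preds("rdfs:domain", CLASSES))
            or (o == u and p in preds("rdfs:range", CLASSES))
            or (o == u and p == "rdf:type")):
        return "class"
    # Definition 5: property position
    if ((s == u and p in preds("rdfs:domain", PROPERTIES))
            or p == u
            or (o == u and p in preds("rdfs:range", PROPERTIES))):
        return "property"
    # Definition 6: datatype position
    if ((s == u and p in preds("rdfs:domain", DATATYPES))
            or (isinstance(o, Literal) and o.datatype == u)
            or (o == u and p in preds("rdfs:range", DATATYPES))):
        return "datatype"
    # Definition 7: instance position (none of the above)
    return "instance"
```

With this excerpt, foaf:Person in t:[foaf:knows, rdfs:range, foaf:Person] and dbo:Person in an rdf:type triple both come out as class positions, while ex:tree/124/height falls through to instance position.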
  • 18. Link Types
    Definition 8: Let $ds_1$, $ds_2$ be datasets. We call a triple $t \in G_{ds_1}$ a link from $ds_1$ to $ds_2$ if $t$ contains a URI $u$ from a namespace in $NS_{ds_2}$. Depending on the position of $u$, we distinguish:
    • $t$ is called an instance link if $u$ is in an instance position in $t$, e.g., t:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
    • otherwise, $t$ is called an ontology link, where we further distinguish:
      – $t$ is called a class link if $u$ is in a class position other than the $o$ position of an rdf:type triple, e.g., t:[dbo:Person, rdfs:subClassOf, foaf:Person]
      – $t$ is called an instance typing link if $u$ is in the class position $u = o$ of an rdf:type triple, e.g., t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, foaf:Person]
      – $t$ is called a property link if $u$ is in a property position other than $p$, e.g., t:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
      – $t$ is called an instance role link if $u$ is in the property position $u = p$, e.g., t:[dbr:Wolfgang_Amadeus_Mozart, foaf:knows, wd:Q51088] (Antonio Salieri)
    • If $u$ does not appear in $G_{ds_2}$, we call $t$ a broken link.
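Definition 8 can likewise be sketched as a function that, for a triple t already known to be in G_ds1, reports each URI from NS_ds2 together with its link type and whether it is broken. To stay self-contained, it uses a deliberately simplified stand-in for Definitions 4 to 7 that only recognises rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range. Namespaces are assumed to be given as prefix strings such as "wd:", and `terms_ds2` stands in for the set of terms appearing in G_ds2; all names are hypothetical.

```python
def simple_position(u: str, t: tuple[str, str, str]) -> str:
    """Simplified stand-in for Definitions 4-7 (hand-picked schema predicates only)."""
    s, p, o = t
    if p == u:
        return "p"  # property position u = p
    if p == "rdf:type" and o == u:
        return "type-o"  # class position u = o of an rdf:type triple
    if (p == "rdfs:subClassOf" and u in (s, o)) or (p in ("rdfs:domain", "rdfs:range") and o == u):
        return "class"
    if (p == "rdfs:subPropertyOf" and u in (s, o)) or (p in ("rdfs:domain", "rdfs:range") and s == u):
        return "property"
    return "instance"


LINK_TYPES = {
    "instance": "instance link",
    "class": "class link",
    "type-o": "instance typing link",
    "property": "property link",
    "p": "instance role link",
}


def classify_links(t: tuple[str, str, str], ns_ds2: set[str],
                   terms_ds2: set[str]) -> list[tuple[str, str, bool]]:
    """For t in G_ds1, list (u, link type, broken?) for each URI u from a namespace in NS_ds2."""
    found = []
    for u in dict.fromkeys(t):  # each distinct term once, in s, p, o order
        if any(u.startswith(ns) for ns in ns_ds2):
            kind = LINK_TYPES[simple_position(u, t)]
            found.append((u, kind, u not in terms_ds2))  # broken if u does not appear in G_ds2
    return found
```

For t1 from slide 10, stored in DBpedia, with `ns_ds2 = {"wd:"}` and `terms_ds2 = {"wd:Q254"}`, the result is a single non-broken instance link to wd:Q254.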
  • 19. Empirical dataset
    • Crawl of the LODcloud, plus historical LODcloud datasets that were cached in the LODLaundromat
    • 430 linked datasets in the resulting corpus, each encoded in HDT, for a total size of 51 GB (3.3bn triples)
    Dataset availability (# | % of total | available | available as % of total):
      Total # of datasets: 1,359 | 100%
      SPARQL endpoint: 459 | 33.5% | 125 | 9.1%
      Available as download: 890 | 65.4% | 226 | 16.6%
    Characteristic (median | mean):
      Number of triples: 4,478 | 17,860,436
      Number of unique subjects: 613 | 1,774,578
      Number of unique predicates: 31 | 65.4%
      Number of unique objects: 2,245 | 5,296,390
    A. Abele, J. P. McCrae, P. Buitelaar, A. Jentzsch, and R. Cyganiak. 2017. Linking open data cloud diagram. URL: http://lod-cloud.net. Insight-Centre.
  • 20. Ontology corpus
    • Ontologies typically consist only of terminological axioms T (TBox), but they may also include a set of assertional axioms A (ABox) (e.g., codelists or thesaurus terms)
    • Datasets registered in the LODcloud are typically not ontologies (in our analysis, only 3 of 430 are ontologies)
    • The ontology corpus was created through a crawl of prefix.cc and through the analysis of all ontology links in our empirical dataset
    # of unique classes: 204,616
    # of unique properties: 1,821
    Ratio (properties/classes): 1/112
  • 21. Authoritative namespace
    • To identify links (ontology or instance links), we need to identify the dataset authority for each namespace:
      1. Use HDT to extract all namespaces in each RDF dataset
      2. Compute the ‘relative occurrence’ of each namespace in the dataset
      3. Check whether a namespace that is used extensively in a dataset is in fact an external link to a dataset that is not in the corpus; hence, we define only one authoritative namespace for each dataset
      4. Only consider the Pay-Level Domain (PLD) of the authoritative namespace
    # of datasets in our corpus: 430
    # of datasets with authoritative namespace: 395
    # of datasets with namespace in LOD Cloud metadata: 257
    # of datasets matching authoritative namespace and LOD Cloud metadata: 162
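The four steps can be sketched as follows, assuming the URIs of a dataset have already been extracted (e.g., from its HDT file) into a Python list. The namespace heuristic, the `external` parameter standing in for the check in step 3, and the host-level reduction in step 4 are simplifications with hypothetical names; computing a true pay-level domain would require the Public Suffix List.

```python
from collections import Counter
from urllib.parse import urlsplit


def namespace_of(uri: str) -> str:
    """Heuristic (assumption): cut after the last '#' or '/'."""
    return uri[: max(uri.rfind("#"), uri.rfind("/")) + 1]


def relative_occurrence(uris: list[str]) -> dict[str, float]:
    """Step 2: share of URI occurrences that fall into each namespace."""
    counts = Counter(namespace_of(u) for u in uris)
    total = sum(counts.values())
    return {ns: n / total for ns, n in counts.items()}


def authoritative_namespace(uris: list[str], external: frozenset[str] = frozenset()) -> str:
    """Steps 2-3: the single most frequent namespace not known to belong elsewhere."""
    occ = {ns: share for ns, share in relative_occurrence(uris).items() if ns not in external}
    return max(occ, key=occ.get)  # raises ValueError if every namespace was excluded


def authority_of(namespace: str) -> str:
    """Step 4 (simplified): reduce a namespace to scheme://host rather than a true PLD."""
    parts = urlsplit(namespace)
    return f"{parts.scheme}://{parts.hostname}"
```

If a dataset uses a vocabulary such as FOAF more often than its own namespace, the most frequent namespace is the vocabulary's; passing it in `external` mimics step 3 and leaves the dataset's own namespace as the authoritative one.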
  • 22. Link analysis
    Class links (PLD | # of links):
      http://vivo.iu.edu | 119,538
      http://vivo.scripps.edu | 63,128
      http://www.imagesnippets.com | 12,874
      http://core.kmi.open.ac.uk | 9,143
      http://commons.wikimedia.org | 8,258
      http://vivo.psm.edu | 8,036
      http://datos.bne.es | 2,778
      http://dbpedia.org | 1,614
      http://www.productontology.org | 1,000
      http://vivoweb.org | 84
      Median 0 | Mean 1,299 | % above 0: 44%
    Property links (PLD | # of links):
      http://commons.wikimedia.org | 4,995
      http://datos.bne.es | 1,255
      http://vivo.iu.edu | 510
      http://vivo.psm.edu | 481
      http://vivoweb.org | 386
      http://vivo.scripps.edu | 187
      http://semanticscience.org | 168
      http://www.iupac.org | 102
      http://dbpedia.org | 101
      http://tkm.kiom.re.kr | 60
      Median 0 | Mean 47 | % above 0: 18%
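The summary rows (median, mean, % above 0) reported per link type can be computed with Python's `statistics` module. The function name is hypothetical, and the counts in the usage note are made up: they only show how a few very large datasets push the mean far above a median of 0, as in the class and property link figures above.

```python
from statistics import mean, median


def summarize(link_counts: list[int]) -> dict[str, float]:
    """Median, mean and percentage of datasets with at least one link of a given type."""
    return {
        "median": median(link_counts),
        "mean": mean(link_counts),
        "pct_above_0": 100 * sum(1 for c in link_counts if c > 0) / len(link_counts),
    }
```

For the hypothetical counts `[0, 0, 0, 0, 0, 3, 10, 12000]`, `summarize` gives a median of 0, a mean of 1501.625 and 37.5% above 0.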
  • 23. Link analysis (cont’d)
    • Most used class/property URIs (other than RDFS/OWL URIs) in our corpus
    Class URI | # of datasets:
      http://rdfs.org/ns/void#Dataset | 118
      http://rdfs.org/ns/void#Linkset | 90
      http://xmlns.com/foaf/0.1/Person | 74
      http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#Word | 65
      http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#Sentence | 64
      http://www.w3.org/2004/02/skos/core#Concept | 56
      http://xmlns.com/foaf/0.1/Organization | 51
      http://vivoweb.org/ontology/core#CoreLaboratory | 30
      http://vivoweb.org/ontology/core#Center | 28
      http://xmlns.com/foaf/0.1/Agent | 24
    Property URI | # of datasets:
      http://purl.org/dc/terms/title | 163
      http://purl.org/dc/terms/creator | 140
      http://purl.org/dc/terms/description | 134
      http://xmlns.com/foaf/0.1/homepage | 125
      http://purl.org/dc/terms/publisher | 112
      http://purl.org/dc/terms/subject | 105
      http://rdfs.org/ns/void#vocabulary | 103
      http://purl.org/dc/terms/modified | 98
      http://rdfs.org/ns/void#exampleResource | 96
      http://rdfs.org/ns/void#subset | 88
  • 24. Link analysis (cont’d)
    Instance typing links (PLD | # of links):
      http://webisa.webdatacommons.org | 101,491,507
      http://commons.wikimedia.org | 100,022,186
      http://lod.b3kat.de | 40,674,519
      http://lod.hebis.de | 39,160,423
      http://d-nb.info | 20,096,228
      http://datos.bne.es | 7,419,630
      http://data.ordnancesurvey.co.uk | 5,653,997
      http://data.europeana.eu | 4,987,332
      http://id.loc.gov | 1,570,877
      http://data.bibsys.no | 1,440,011
      Median 206 | Mean 1,967,570 | % above 0: 97%
    Instance links (PLD | # of links):
      http://ld.zdb-services.de | 398,381,851
      http://commons.wikimedia.org | 319,988,690
      http://d-nb.info | 14,160,649
      http://data.ordnancesurvey.co.uk | 13,277,718
      https://data.gov.cz | 3,081,559
      http://core.kmi.open.ac.uk | 1,696,618
      http://lod.hebis.de | 1,624,579
      http://id.loc.gov | 1,143,545
      http://data.europeana.eu | 687,735
      http://spraakbanken.gu.se | 451,081
      http://www.imagesnippets.com | 214,362
      http://data.coi.cz | 34,277
      Median 206 | Mean 4,240,890 | % above 0: 72%
  • 25. Link analysis (cont’d)
    • Selected predicates used in links (owl:sameAs | owl:differentFrom | rdfs:seeAlso | owl:AllDifferent):
      Median: 0 | 0 | 0 | 0
      Mean: 503,859 | 581 | 2,735 | 0
      % above 0: 53% | <1% | 14% | 0
      P90%: 1,460 | 0 | 1 | 0
      1st / 1st #: http://commons.wikimedia.org N/A 40,636,493 103,439 324,659
      2nd / 2nd #: http://ld.zdb-services.de 18,049,155 N/A http://stitch.cs.vu.nl N/A
      3rd / 3rd #: http://d-nb.info 17,410,586 N/A http://data.nobelprize.org N/A
  • 26. Link analysis (cont’d)
    Total links (PLD | # of links):
      http://ld.zdb-services.de | 421,206,061
      http://commons.wikimedia.org | 420,024,129
      http://webisa.webdatacommons.org | 101,491,507
      http://lod.hebis.de | 40,785,002
      http://lod.b3kat.de | 40,677,795
      http://d-nb.info | 34,256,877
      http://data.ordnancesurvey.co.uk | 18,931,817
      http://datos.bne.es | 7,428,111
      http://data.europeana.eu | 5,675,067
      https://data.gov.cz | 3,958,043
      Median 416 | Mean 6,209,808 | % above 0: 96%
    Broken class and property URIs by HTTP code, # (%):
      HTTP code | Class URIs, Prefix.cc crawl | Class URIs, LOD corpus | Property URIs, Prefix.cc crawl | Property URIs, LOD corpus
      200 | 7,175 (12.3%) | 2,579 (12.8%) | 814 (44.7%) | 58,108 (40.9%)
      301 | 18,598 (31.8%) | 2,610 (12.9%) | 442 (24.3%) | 1,137 (0.8%)
      302 | 4,331 (7.4%) | 925 (0.5%) | 194 (10.7%) | 1,391 (1.0%)
      303 | 12,805 (21.9%) | 3,903 (19.3%) | 108 (5.9%) | 5,247 (3.7%)
      40x | 12,054 (20.6%) | 8,664 (42.9%) | 130 (7.1%) | 73,366 (51.7%)
      50x | 66 (<0.1%) | 111 (<0.1%) | 4 (<0.1%) | 362 (0.3%)
      No response | 146,145 (5.9%) | 1,425 (7%) | 129 (7.1%) | 2,332 (1.6%)
      Total | 204,616 (100%) | 20,217 (100%) | 1,821 (100%) | 141,943 (100%)
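The HTTP-code rows of the broken-URI table can be reproduced from a list of observed status codes. This is a minimal sketch, assuming the statuses were collected beforehand without following redirects (so 301/302/303 stay visible) and that `None` marks a request that got no response. The bucket labels mirror the table; anything else lands in a hypothetical `other` bucket, and the function names are illustrative.

```python
from collections import Counter
from typing import Optional


def bucket(status: Optional[int]) -> str:
    """Map an HTTP status code (None = no response) to a row label of the table."""
    if status is None:
        return "No response"
    if status in (200, 301, 302, 303):
        return str(status)
    if 400 <= status < 500:
        return "40x"
    if 500 <= status < 600:
        return "50x"
    return "other"  # hypothetical catch-all, e.g. 307 or 308


def tally(statuses: list[Optional[int]]) -> dict[str, tuple[int, float]]:
    """Count URIs per bucket and give each bucket's share of the total in percent."""
    counts = Counter(bucket(s) for s in statuses)
    total = len(statuses)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}
```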
  • 27. Link analysis in Wikidata
    • Wikidata is by far the largest openly available KG and the only one truly built bottom-up, which causes many modelling errors/inconsistencies (e.g., 4,182 separate properties for external identifiers)
    • Wikidata is not part of the LODCloud and was therefore not included in our paper; however, we analysed the 9 March 2020 Wikidata dump (HDT file, 49.4GB compressed)
    Number of triples: 3,381,623,911
    Number of unique subjects: 1,327,447,995
    Number of predicates: 32,713
    Number of unique objects: 2,010,015,636
    Number of shared subject-objects: 1,173,987,281
    Unique individuals: 75,261,968
    Class links: 375,351,770
    Property links: 2,723,834 (of which sameAs links: 2,723,834)
    Instance typing links: 77,479,623
    # of classes: 1,045,455
    # of properties: 74,746
    Ratio (properties/classes): 1/14
    # of unique properties: 7,259
    [Figure: breakdown of the 7,259 unique properties into ID properties (4,182; e.g., P2014 Museum of Modern Art work ID, P6276 Amazon Music artist ID, P6145 Academy Awards Database film ID) and others (3,077; e.g., P4330 contains, P150 contains administrative territorial entity, P1383 contains settlement, P2821 by-product, P2822 by-product of)]
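The “Ratio” rows on slides 20 and 27 compare the number of properties with the number of classes. Worked out from the counts reported there:

```latex
\frac{1{,}821}{204{,}616} \approx \frac{1}{112.4}
\quad\text{(LOD ontology corpus)},
\qquad
\frac{74{,}746}{1{,}045{,}455} \approx \frac{1}{14.0}
\quad\text{(Wikidata)}
```

So, per class, Wikidata has roughly eight times as many properties as the LOD ontology corpus.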
  • 28. Discussion
    • Instance typing links are the most used
      – They are the only ones that grow linearly with dataset size
      – Instance links are less popular than one would assume, and do not grow linearly with dataset size
    • Ontologies are reused widely
      – Only a few datasets define their own ontology. This is a sign that:
        1. Dataset publishers follow best practices and separate the ontology namespace from the authoritative namespace of the dataset
        2. A large number of ontologies already exist that cover many domains and can be readily reused
    • Need for ontology publishing best practices
      – An authoritative ontology register is needed (e.g., the LOV portal)
      – A persistence mechanism is needed that assigns a DOI to an ontology and persists the document
    • Ubiquity of broken class and property links
      – An alarming number of broken links, i.e., more than half of all class and property URIs
      – Data publishers should consider replicating linked ontologies
    P.-Y. Vandenbussche, G. Atemezing, M. Poveda-Villalón, and B. Vatant. 2017. Linked Open Vocabularies (LOV): A gateway to reusable semantic vocabularies on the Web. Semantic Web 8, 3 (2017), 437–452.
  • 29. Discussion (cont’d)
    • Lack of ABox links
      – Many datasets (28% of all) do not use any instance links, and owl:sameAs is not particularly popular (other than in Wikidata):
        1. these links are expensive to establish manually,
        2. expensive to maintain, and
        3. even if they exist, there is no incentive to publish them openly.
    • Lack of and incorrect namespace declarations
      – Only 59% of all datasets in our corpus publish their namespace in their metadata, and of those 257, only 162 match the namespace we obtained through our analysis
    • Plethora of data and metadata formats
      – The heterogeneity of publication formats, potentially involving parse errors, accounted for a major part of the effort in our experiments
      – Once each dataset node/dump had been converted to HDT, the analysis was easy: link computations can be done at scale, even on large datasets, in HDT