SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Downloaden Sie, um offline zu lesen
10/22/17 Heiko Paulheim 1
Towards Knowledge Graph Profiling
Heiko Paulheim
10/22/17 Heiko Paulheim 2
Introduction
• You’ve seen this, haven’t you?
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae,
Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
10/22/17 Heiko Paulheim 3
Introduction
• Knowledge Graphs on the Web
• Everybody talks about them, but what is a Knowledge Graph?
– I don’t have a definition either...
10/22/17 Heiko Paulheim 4
Introduction
• Knowledge Graph definitions
• Many people talk about KGs, few give definitions
• Working definition: a Knowledge Graph
– mainly describes instances and their relations in a graph
• Unlike an ontology
• Unlike, e.g., WordNet
– Defines possible classes and relations in a schema or ontology
• Unlike schema-free output of some IE tools
– Allows for interlinking arbitrary entities with each other
• Unlike a relational database
– Covers various domains
• Unlike, e.g., Geonames
10/22/17 Heiko Paulheim 5
Introduction
• Knowledge Graphs out there (not guaranteed to be complete)
public
private
Paulheim: Knowledge graph refinement: A survey of approaches and evaluation
methods. Semantic Web 8:3 (2017), pp. 489-508
10/22/17 Heiko Paulheim 6
Motivation
• In the coming days, you’ll see quite a few works
– that use DBpedia for doing X
– that use Wikidata for doing Y
– ...
• If you see them, do you ever ask yourselves:
– Why DBpedia and not Wikidata?
(or the other way round?)
10/22/17 Heiko Paulheim 7
Motivation
• Questions:
– which knowledge graph should I use for which purpose?
– are there significant differences?
– would it help to combine them?
• For answering those questions, we need knowledge graph profiling
– making quantitative statements about knowledge graphs
– defining measures
– defining setups in which to measure them
10/22/17 Heiko Paulheim 8
Outline
• How are Knowledge Graphs created?
• Key objectives for profiling Knowledge Graphs
– Size
– Timeliness
– Level of detail
– Overlap
– ...
• Key figures for public Knowledge Graphs
• Common Shortcomings of Knowledge Graphs
– ...and how to address them
• New Kids on the Block
10/22/17 Heiko Paulheim 9
Knowledge Graph Creation: CyC
• The beginning
– Encyclopedic collection of knowledge
– Started by Douglas Lenat in 1984
– Estimation: 350 person years and 250,000 rules
should do the job
of collecting the essence of the world’s knowledge
• The present
– >900 person years
– Far from completion
– Used to exist until 2017
10/22/17 Heiko Paulheim 10
Knowledge Graph Creation
• Lesson learned no. 1:
– Trading efforts against accuracy
Min. efforts Max. accuracy
10/22/17 Heiko Paulheim 11
Knowledge Graph Creation: Freebase
• The 2000s
– Freebase: collaborative editing
– Schema not fixed
• Present
– Acquired by Google in 2010
– Powered first version of Google’s Knowledge Graph
– Shut down in 2016
– Partly lives on in Wikidata (see in a minute)
10/22/17 Heiko Paulheim 12
Knowledge Graph Creation
• Lesson learned no. 2:
– Trading formality against number of users
Max. user involvement Max. degree of formality
10/22/17 Heiko Paulheim 13
Knowledge Graph Creation: Wikidata
• The 2010s
– Wikidata: launched 2012
– Goal: centralize data from Wikipedia languages
– Collaborative
– Imports other datasets
• Present
– One of the largest public knowledge graphs
(see later)
– Includes rich provenance
10/22/17 Heiko Paulheim 14
Knowledge Graph Creation
• Lesson learned no. 3:
– There is not one truth (but allowing for plurality adds complexity)
Max. simplicity Max. support for plurality
10/22/17 Heiko Paulheim 15
Knowledge Graph Creation: DBpedia & YAGO
• The 2010s
– DBpedia: launched 2007
– YAGO: launched 2008
– Extraction from Wikipedia
using mappings & heuristics
• Present
– Two of the most used knowledge graphs
10/22/17 Heiko Paulheim 16
Knowledge Graph Creation
• Lesson learned no. 4:
– Heuristics help increasing coverage (at the cost of accuracy)
Max. accuracy Max. coverage
10/22/17 Heiko Paulheim 17
Knowledge Graph Creation: NELL
• The 2010s
– NELL: Never ending language learner
– Input: ontology, seed examples, text corpus
– Output: facts, text patterns
– Large degree of automation, occasional human feedback
• Today
– Still running
– New release every few days
10/22/17 Heiko Paulheim 18
Knowledge Graph Creation
• Lesson learned no. 5:
– Quality cannot be maximized without human intervention
Min. human intervention Max. accuracy
10/22/17 Heiko Paulheim 19
Summary of Trade Offs
• (Manual) effort vs. accuracy and completeness
• User involvement (or usability) vs. degree of formality
• Simplicity vs. support for plurality and provenance
→ all those decisions influence the profile of a knowledge graph!
10/22/17 Heiko Paulheim 20
Non-Public Knowledge Graphs
• Many companies have their
own private knowledge graphs
– Google: Knowledge Graph,
Knowledge Vault
– Yahoo!: Knowledge Graph
– Microsoft: Satori
– Facebook: Entities Graph
– Thomson Reuters: permid.org
(partly public)
• However, we usually know only little about them
10/22/17 Heiko Paulheim 21
Comparison of Knowledge Graphs
• Release cycles
Instant updates:
DBpedia live,
Freebase
Wikidata
Days:
NELL
Months:
DBpedia
Years:
YAGO
Cyc
• Size and density
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
Caution!
10/22/17 Heiko Paulheim 22
Comparison of Knowledge Graphs
• What do they actually contain?
• Experiment: pick 25 classes of interest
– And find them in respective ontologies
• Count instances (coverage)
• Determine in and out degree (level of detail)
10/22/17 Heiko Paulheim 23
Comparison of Knowledge Graphs
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 24
Comparison of Knowledge Graphs
• Summary findings:
– Persons: more in Wikidata
(twice as many persons as DBpedia and YAGO)
– Countries: more details in Wikidata
– Places: most in DBpedia
– Organizations: most in YAGO
– Events: most in YAGO
– Artistic works:
• Wikidata contains more movies and albums
• YAGO contains more songs
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 25
Caveats
• Reading the diagrams right…
• So, Wikidata contains more persons
– but less instances of all the interesting subclasses?
• There are classes like Actor in Wikidata
– but they are hardly used
– rather: modeled using profession relation
10/22/17 Heiko Paulheim 26
Caveats
• Reading the diagrams right… (ctd.)
• So, Wikidata contains more data on countries, but less countries?
• First: Wikidata only counts current, actual countries
– DBpedia and YAGO also count historical countries
• “KG1 contains less of X than KG2” can mean
– it actually contains less instances of X
– it contains equally many or more instances,
but they are not typed with X (see later)
• Second: we count single facts about countries
– Wikidata records some time indexed information, e.g., population
– Each point in time contributes a fact
10/22/17 Heiko Paulheim 27
Overlap of Knowledge Graphs
• How largely do knowledge graphs overlap?
• They are interlinked, so we can simply count links
– For NELL, we use links to Wikipedia as a proxy
DBpedia
YAGO
Wikidata
NELL Open
Cyc
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 28
Overlap of Knowledge Graphs
• How largely do knowledge graphs overlap?
• They are interlinked, so we can simply count links
– For NELL, we use links to Wikipedia as a proxy
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 29
Overlap of Knowledge Graphs
• Links between Knowledge Graphs are incomplete
– The Open World Assumption also holds for interlinks
• But we can estimate their number
• Approach:
– find link set automatically with different heuristics
– determine precision and recall on existing interlinks
– estimate actual number of links
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 30
Overlap of Knowledge Graphs
• Idea:
– Given that the link set F is found
– And the (unknown) actual link set would be C
• Precision P: Fraction of F which is actually correct
– i.e., measures how much |F| is over-estimating |C|
• Recall R: Fraction of C which is contained in F
– i.e., measures how much |F| is under-estimating |C|
• From that, we estimate |C|=|F|⋅P⋅
1
R
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 31
Overlap of Knowledge Graphs
• Mathematical derivation:
– Definition of recall:
– Definition of precision:
• Resolve both to , substitute, and resolve to
R=
|Fcorrect|
|C|
P=
|Fcorrect|
|F|
|Fcorrect| |C|
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
|C|=|F|⋅P⋅
1
R
10/22/17 Heiko Paulheim 32
Overlap of Knowledge Graphs
• Experiment:
– We use the same 25 classes as before
– Measure 1: overlap relative to smaller KG (i.e., potential gain)
– Measure 2: overlap relative to explicit links
(i.e., importance of improving links)
• Link generation with 16 different metrics and thresholds
– Intra-class correlation coefficient for |C|: 0.969
– Intra-class correlation coefficient for |F|: 0.646
• Bottom line:
– Despite variety in link sets generated, the overlap is estimated reliably
– The link generation mechanisms do not need to be overly accurate
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 33
Overlap of Knowledge Graphs
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 34
Overlap of Knowledge Graphs
• Summary findings:
– DBpedia and YAGO cover roughly the same instances
(not much surprising)
– NELL is the most complementary to the others
– Existing interlinks are insufficient for out-of-the-box parallel usage
Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
10/22/17 Heiko Paulheim 35
Common Shortcomings of Knowledge Graphs
• Knowledge Graph Profiling can reveal certain shortcomings
– ...and once we know them, we can address them
• What is the impact of that?
– Example: answering a SPARQL query
10/22/17 Heiko Paulheim 36
Finding Information in Knowledge Graphs
• Find list of science fiction writers in DBpedia
select ?x where
{?x a dbo:Writer .
?x dbo:genre dbr:Science_Fiction}
order by ?x
10/22/17 Heiko Paulheim 37
Finding Information in Knowledge Graphs
• Results from DBpedia
Arthur C. Clarke?
H.G. Wells?
Isaac Asimov?
10/22/17 Heiko Paulheim 38
Common Shortcomings of Knoweldge Graphs
• What reasons can cause incomplete results?
• Two possible problems:
– The resource at hand is not of type dbo:Writer
– The genre relation to dbr:Science_Fiction is missing
select ?x where
{?x a dbo:Writer .
?x dbo:genre dbr:Science_Fiction}
order by ?x
10/22/17 Heiko Paulheim 39
Common Shortcomings of Knowledge Graphs
• Various works on Knowledge Graph Refinement
– Knowledge Graph completion
– Error detection
• See, e.g., 2017 survey in
Semantic Web Journal
Paulheim: Knowledge Graph Refinement – A Survey
of Approaches and Evaluation Methods. SWJ 8(3), 2017
Tuesday, 4:30 pm
Journal track
paper presentation
10/22/17 Heiko Paulheim 40
New Kids on the Block
Subjective age:
Measured by the fraction
of the audience
that understands a reference
to your young days’
pop culture...
10/22/17 Heiko Paulheim 41
New Kids on the Block
• Wikipedia-based Knowledge Graphs will remain
an essential building block of Semantic Web applications
• But they suffer from...
– ...a coverage bias
– ...limitations of the creating heuristics
10/22/17 Heiko Paulheim 42
Wikipedia’s Coverage Bias
• One (but not the only!) possible source of coverage bias
– Articles about long-tail entities become deleted
10/22/17 Heiko Paulheim 43
Work in Progress: DBkWik
• Why stop at Wikipedia?
• Wikipedia is based on the MediaWiki software
– ...and so are thousands of Wikis
– Fandom by Wikia: >385,000 Wikis on special topics
– WikiApiary: reports >20,000 installations of MediaWiki on the Web
10/22/17 Heiko Paulheim 44
Work in Progress: DBkWik
• Back to our original example...
10/22/17 Heiko Paulheim 45
Work in Progress: DBkWik
• Back to our original example...
10/22/17 Heiko Paulheim 46
Work in Progress: DBkWik
• The DBpedia Extraction Framework consumes MediaWiki dumps
• Experiment
– Can we process dumps from arbitrary Wikis with it?
– Are the results somewhat meaningful?
10/22/17 Heiko Paulheim 47
Work in Progress: DBkWik
• Example from Harry Potter Wiki
http://dbkwik.webdatacommons.org/
10/22/17 Heiko Paulheim 48
Work in Progress: DBkWik
• Differences to DBpedia
– DBpedia has manually created mappings to an ontology
– Wikipedia has one page per subject
– Wikipedia has global infobox conventions (more or less)
• Challenges
– On-the-fly ontology creation
– Instance matching
– Schema matching
10/22/17 Heiko Paulheim 49
Work in Progress: DBkWik
Dump
Downloader
Extraction
Framework
Interlinking
Instance
Matcher
Schema
Matcher
MediaWiki Dumps Extracted RDF
Internal Linking
Instance
Matcher
Schema
Matcher
Consolidated
Knowledge Graph
DBkWik
Linked
Data
Endpoint
1 2
34
5
• Avoiding O(n²) internal linking:
– Match to DBpedia first
– Use common links to DBpedia as blocking keys for internal matching
10/22/17 Heiko Paulheim 50
Work in Progress: DBkWik
• Downloaded ~15k Wiki dumps from Fandom
– 52.4GB of data, roughly the size of the English Wikipedia
• Prototype: extracted data for ~250 Wikis
– 4.3M instances, ~750k linked to DBpedia
– 7k classes, ~1k linked to DBpedia
– 43k properties, ~20k linked to DBpedia
– ...including duplicates!
• Link quality
– Good for classes, OK for properties (F1 of .957 and .852)
– Needs improvement for instances (F1 of .641)
Monday 6:30pm
Poster presentation
10/22/17 Heiko Paulheim 51
Work in Progress: WebIsALOD
• Background: Web table interpretation
• Most approaches need typing information
– DBpedia etc. have too little coverage
on the long tail
– Wanted: extensive type database
10/22/17 Heiko Paulheim 52
Work in Progress: WebIsALOD
• Extraction of type information using Hearst-like patterns, e.g.,
– T, such as X
– X, Y, and other T
• Text corpus: common crawl
– ~2 TB crawled web pages
– Fast implementation: regex over text
– “Expensive” operations only applied once regex has fired
• Resulting database
– 400M hypernymy relations
Seitner et al.: A large DataBase of hypernymy relations extracted from the Web.
LREC 2016
10/22/17 Heiko Paulheim 53
Work in Progress: WebIsALOD
• Back to our original example...
http://webisa.webdatacommons.org/
10/22/17 Heiko Paulheim 54
Work in Progress: WebIsALOD
• Initial effort: transformation to a LOD dataset
– including rich provenance information
Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted
from the Web as Linked Open Data. ISWC 2017
10/22/17 Heiko Paulheim 55
Work in Progress: WebIsALOD
• Estimated contents breakdown
Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted
from the Web as Linked Open Data. ISWC 2017
10/22/17 Heiko Paulheim 56
Work in Progress: WebIsALOD
• Main challenge
– Original dataset is quite noisy (<10% correct statements)
– Recap: coverage vs. accuracy
– Simple thresholding removes too much knowledge
• Approach
– Train RandomForest model for predicting correct vs. wrong statements
– Using all the provenance information we have
– Use model to compute confidence scores
Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted
from the Web as Linked Open Data. ISWC 2017
10/22/17 Heiko Paulheim 57
Work in Progress: WebIsALOD
• Current challenges and works in progress
– Distinguishing instances and classes
• i.e.: subclass vs. instance of relations
– Splitting instances
• Bauhaus is a goth band
• Bauhaus is a German school
– Knowledge extraction from pre and post modifiers
• Bauhaus is a goth band → genre(Bauhaus, Goth)
• Bauhaus is a German school → location(Bauhaus, Germany)
Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted
from the Web as Linked Open Data. ISWC 2017
Tuesday, 2:30 pm
Resource track
paper presentation
10/22/17 Heiko Paulheim 58
Take Aways
• Knowledge Graphs contain a massive amount of information
– Various trade offs in their creation
– That lead to different profiles
– ...and different shortcomings
• Knowledge Graph Profiling
– What is in a knowledge graph?
– At which level of detail is it described?
– How different are knowledge graphs?
• Various methods exist for
– ...addressing the various shortcomings
• New kids on the block
– DBkWik and WebIsALOD
– Focus on long tail entities
10/22/17 Heiko Paulheim 59
Towards Knowledge Graph Profiling
Heiko Paulheim

Weitere ähnliche Inhalte

Was ist angesagt?

New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF DataHeiko Paulheim
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge GraphsHeiko Paulheim
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Heiko Paulheim
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Heiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Heiko Paulheim
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge DiscoveryHeiko Paulheim
 
Researcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebResearcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebHerbert Van de Sompel
 
The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...Lukas Koster
 
The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about itHerbert Van de Sompel
 
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...Europeana
 
They have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersThey have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersRichard Wallis
 

Was ist angesagt? (18)

New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF Data
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Researcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebResearcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized Web
 
The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...
 
The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about it
 
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
 
Perseverance on Persistence
Perseverance on PersistencePerseverance on Persistence
Perseverance on Persistence
 
They have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library UsersThey have left the building: The Web Route to Library Users
They have left the building: The Web Route to Library Users
 

Ähnlich wie Comparing Knowledge Graphs for Size, Timeliness, Detail and Overlap

What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfHeiko Paulheim
 
EDUC-1092 Week Three (eTV)
EDUC-1092 Week Three (eTV)EDUC-1092 Week Three (eTV)
EDUC-1092 Week Three (eTV)RDC ZP
 
After You've Opened Pandora's Box
After You've Opened Pandora's BoxAfter You've Opened Pandora's Box
After You've Opened Pandora's BoxEd Rodley
 
Project Management in Libraries for UCLA IS 410
Project Management in Libraries for UCLA IS 410Project Management in Libraries for UCLA IS 410
Project Management in Libraries for UCLA IS 410Karen S Calhoun
 
Introduction to information visualisation for humanities PhDs
Introduction to information visualisation for humanities PhDsIntroduction to information visualisation for humanities PhDs
Introduction to information visualisation for humanities PhDsMia
 
AMIA 2017 - Data visualisation
AMIA 2017 - Data visualisationAMIA 2017 - Data visualisation
AMIA 2017 - Data visualisationNickRichardson44
 
A multi-institutional model for advancing open access journals and reclaiming...
A multi-institutional model for advancing open access journals and reclaiming...A multi-institutional model for advancing open access journals and reclaiming...
A multi-institutional model for advancing open access journals and reclaiming...NASIG
 
Shortkeynote at the CoCreation und Collaboration Workshop
Shortkeynote at the CoCreation und Collaboration WorkshopShortkeynote at the CoCreation und Collaboration Workshop
Shortkeynote at the CoCreation und Collaboration Workshopjovoto GmbH
 
Advocating Open Access: Before, during and after HEFCE
Advocating Open Access: Before, during and after HEFCEAdvocating Open Access: Before, during and after HEFCE
Advocating Open Access: Before, during and after HEFCENick Sheppard
 
Carpe Diem Comes of Age: Learning Design for Programmes
Carpe Diem Comes of Age: Learning Design for ProgrammesCarpe Diem Comes of Age: Learning Design for Programmes
Carpe Diem Comes of Age: Learning Design for ProgrammesGilly Salmon
 
Introduction_to_knowledge_graph.pdf
Introduction_to_knowledge_graph.pdfIntroduction_to_knowledge_graph.pdf
Introduction_to_knowledge_graph.pdfJaberRad1
 
Week 10 Writing & Disseminating Social Science Research.pptx
Week 10 Writing & Disseminating Social Science Research.pptxWeek 10 Writing & Disseminating Social Science Research.pptx
Week 10 Writing & Disseminating Social Science Research.pptxDr Igor Calzada, MBA, FeRSA
 
UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...
UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...
UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...UKSG: connecting the knowledge community
 
B2: Open Up: Open Data in the Public Sector
B2: Open Up: Open Data in the Public SectorB2: Open Up: Open Data in the Public Sector
B2: Open Up: Open Data in the Public SectorMarieke Guy
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine LearningHeiko Paulheim
 
The Benefits Of Sharing - SDLC AGM
 The Benefits Of Sharing - SDLC AGM The Benefits Of Sharing - SDLC AGM
The Benefits Of Sharing - SDLC AGMStuart Lewis
 
What is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili SaghafiWhat is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili SaghafiProfessor Lili Saghafi
 
KTP learning lunch: Digital Game-based Learning for Young Learners
KTP learning lunch: Digital Game-based Learning for Young LearnersKTP learning lunch: Digital Game-based Learning for Young Learners
KTP learning lunch: Digital Game-based Learning for Young LearnersRuby Rennie
 
Preparing For The Future: Helping Libraries Respond to Changing Technological...
Preparing For The Future: Helping Libraries Respond to Changing Technological...Preparing For The Future: Helping Libraries Respond to Changing Technological...
Preparing For The Future: Helping Libraries Respond to Changing Technological...lisbk
 

Ähnlich wie Comparing Knowledge Graphs for Size, Timeliness, Detail and Overlap (20)

What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
EDUC-1092 Week Three (eTV)
EDUC-1092 Week Three (eTV)EDUC-1092 Week Three (eTV)
EDUC-1092 Week Three (eTV)
 
After You've Opened Pandora's Box
After You've Opened Pandora's BoxAfter You've Opened Pandora's Box
After You've Opened Pandora's Box
 
Project Management in Libraries for UCLA IS 410
Project Management in Libraries for UCLA IS 410Project Management in Libraries for UCLA IS 410
Project Management in Libraries for UCLA IS 410
 
Introduction to information visualisation for humanities PhDs
Introduction to information visualisation for humanities PhDsIntroduction to information visualisation for humanities PhDs
Introduction to information visualisation for humanities PhDs
 
AMIA 2017 - Data visualisation
AMIA 2017 - Data visualisationAMIA 2017 - Data visualisation
AMIA 2017 - Data visualisation
 
A multi-institutional model for advancing open access journals and reclaiming...
A multi-institutional model for advancing open access journals and reclaiming...A multi-institutional model for advancing open access journals and reclaiming...
A multi-institutional model for advancing open access journals and reclaiming...
 
Community building in complex settings: exploration based on Swiss multi-lib...
Community building in complex settings:  exploration based on Swiss multi-lib...Community building in complex settings:  exploration based on Swiss multi-lib...
Community building in complex settings: exploration based on Swiss multi-lib...
 
Shortkeynote at the CoCreation und Collaboration Workshop
Shortkeynote at the CoCreation und Collaboration WorkshopShortkeynote at the CoCreation und Collaboration Workshop
Shortkeynote at the CoCreation und Collaboration Workshop
 
Advocating Open Access: Before, during and after HEFCE
Advocating Open Access: Before, during and after HEFCEAdvocating Open Access: Before, during and after HEFCE
Advocating Open Access: Before, during and after HEFCE
 
Carpe Diem Comes of Age: Learning Design for Programmes
Carpe Diem Comes of Age: Learning Design for ProgrammesCarpe Diem Comes of Age: Learning Design for Programmes
Carpe Diem Comes of Age: Learning Design for Programmes
 
Introduction_to_knowledge_graph.pdf
Introduction_to_knowledge_graph.pdfIntroduction_to_knowledge_graph.pdf
Introduction_to_knowledge_graph.pdf
 
Week 10 Writing & Disseminating Social Science Research.pptx
Week 10 Writing & Disseminating Social Science Research.pptxWeek 10 Writing & Disseminating Social Science Research.pptx
Week 10 Writing & Disseminating Social Science Research.pptx
 
UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...
UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...
UKSG Conference 2016 Breakout Session - Measuring the research impact of digi...
 
B2: Open Up: Open Data in the Public Sector
B2: Open Up: Open Data in the Public SectorB2: Open Up: Open Data in the Public Sector
B2: Open Up: Open Data in the Public Sector
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
The Benefits Of Sharing - SDLC AGM
 The Benefits Of Sharing - SDLC AGM The Benefits Of Sharing - SDLC AGM
The Benefits Of Sharing - SDLC AGM
 
What is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili SaghafiWhat is digital humanities ,By: Professor Lili Saghafi
What is digital humanities ,By: Professor Lili Saghafi
 
KTP learning lunch: Digital Game-based Learning for Young Learners
KTP learning lunch: Digital Game-based Learning for Young LearnersKTP learning lunch: Digital Game-based Learning for Young Learners
KTP learning lunch: Digital Game-based Learning for Young Learners
 
Preparing For The Future: Helping Libraries Respond to Changing Technological...
Preparing For The Future: Helping Libraries Respond to Changing Technological...Preparing For The Future: Helping Libraries Respond to Changing Technological...
Preparing For The Future: Helping Libraries Respond to Changing Technological...
 

Mehr von Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...Heiko Paulheim
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterHeiko Paulheim
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionHeiko Paulheim
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesHeiko Paulheim
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerHeiko Paulheim
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Heiko Paulheim
 
Detecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaDetecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaHeiko Paulheim
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionHeiko Paulheim
 
Extending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List PagesExtending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List PagesHeiko Paulheim
 

Mehr von Heiko Paulheim (9)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
 
Detecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaDetecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpedia
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
 
Extending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List PagesExtending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List Pages
 

Kürzlich hochgeladen

Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 

Kürzlich hochgeladen (20)

Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

Comparing Knowledge Graphs for Size, Timeliness, Detail and Overlap

  • 1. 10/22/17 Heiko Paulheim 1 Towards Knowledge Graph Profiling Heiko Paulheim
  • 2. 10/22/17 Heiko Paulheim 2 Introduction • You’ve seen this, haven’t you? Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
  • 3. 10/22/17 Heiko Paulheim 3 Introduction • Knowledge Graphs on the Web • Everybody talks about them, but what is a Knowledge Graph? – I don’t have a definition either...
  • 4. 10/22/17 Heiko Paulheim 4 Introduction • Knowledge Graph definitions • Many people talk about KGs, few give definitions • Working definition: a Knowledge Graph – mainly describes instances and their relations in a graph • Unlike an ontology • Unlike, e.g., WordNet – Defines possible classes and relations in a schema or ontology • Unlike schema-free output of some IE tools – Allows for interlinking arbitrary entities with each other • Unlike a relational database – Covers various domains • Unlike, e.g., Geonames
  • 5. 10/22/17 Heiko Paulheim 5 Introduction • Knowledge Graphs out there (not guaranteed to be complete) public private Paulheim: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8:3 (2017), pp. 489-508
  • 6. 10/22/17 Heiko Paulheim 6 Motivation • In the coming days, you’ll see quite a few works – that use DBpedia for doing X – that use Wikidata for doing Y – ... • If you see them, do you ever ask yourselves: – Why DBpedia and not Wikidata? (or the other way round?)
  • 7. 10/22/17 Heiko Paulheim 7 Motivation • Questions: – which knowledge graph should I use for which purpose? – are there significant differences? – would it help to combine them? • For answering those questions, we need knowledge graph profiling – making quantitative statements about knowledge graphs – defining measures – defining setups in which to measure them
  • 8. 10/22/17 Heiko Paulheim 8 Outline • How are Knowledge Graphs created? • Key objectives for profiling Knowledge Graphs – Size – Timeliness – Level of detail – Overlap – ... • Key figures for public Knowledge Graphs • Common Shortcomings of Knowledge Graphs – ...and how to address them • New Kids on the Block
  • 9. 10/22/17 Heiko Paulheim 9 Knowledge Graph Creation: CyC • The beginning – Encyclopedic collection of knowledge – Started by Douglas Lenat in 1984 – Estimation: 350 person years and 250,000 rules should do the job of collecting the essence of the world’s knowledge • The present – >900 person years – Far from completion – Used to exist until 2017
  • 10. 10/22/17 Heiko Paulheim 10 Knowledge Graph Creation • Lesson learned no. 1: – Trading efforts against accuracy Min. efforts Max. accuracy
  • 11. 10/22/17 Heiko Paulheim 11 Knowledge Graph Creation: Freebase • The 2000s – Freebase: collaborative editing – Schema not fixed • Present – Acquired by Google in 2010 – Powered first version of Google’s Knowledge Graph – Shut down in 2016 – Partly lives on in Wikidata (see in a minute)
  • 12. 10/22/17 Heiko Paulheim 12 Knowledge Graph Creation • Lesson learned no. 2: – Trading formality against number of users Max. user involvement Max. degree of formality
  • 13. 10/22/17 Heiko Paulheim 13 Knowledge Graph Creation: Wikidata • The 2010s – Wikidata: launched 2012 – Goal: centralize data from Wikipedia languages – Collaborative – Imports other datasets • Present – One of the largest public knowledge graphs (see later) – Includes rich provenance
  • 14. 10/22/17 Heiko Paulheim 14 Knowledge Graph Creation • Lesson learned no. 3: – There is not one truth (but allowing for plurality adds complexity) Max. simplicity Max. support for plurality
  • 15. 10/22/17 Heiko Paulheim 15 Knowledge Graph Creation: DBpedia & YAGO • The 2010s – DBpedia: launched 2007 – YAGO: launched 2008 – Extraction from Wikipedia using mappings & heuristics • Present – Two of the most used knowledge graphs
  • 16. 10/22/17 Heiko Paulheim 16 Knowledge Graph Creation • Lesson learned no. 4: – Heuristics help increasing coverage (at the cost of accuracy) Max. accuracy Max. coverage
  • 17. 10/22/17 Heiko Paulheim 17 Knowledge Graph Creation: NELL • The 2010s – NELL: Never ending language learner – Input: ontology, seed examples, text corpus – Output: facts, text patterns – Large degree of automation, occasional human feedback • Today – Still running – New release every few days
  • 18. 10/22/17 Heiko Paulheim 18 Knowledge Graph Creation • Lesson learned no. 5: – Quality cannot be maximized without human intervention Min. human intervention Max. accuracy
  • 19. 10/22/17 Heiko Paulheim 19 Summary of Trade Offs • (Manual) effort vs. accuracy and completeness • User involvement (or usability) vs. degree of formality • Simplicity vs. support for plurality and provenance → all those decisions influence the profile of a knowledge graph!
  • 20. 10/22/17 Heiko Paulheim 20 Non-Public Knowledge Graphs • Many companies have their own private knowledge graphs – Google: Knowledge Graph, Knowledge Vault – Yahoo!: Knowledge Graph – Microsoft: Satori – Facebook: Entities Graph – Thomson Reuters: permid.org (partly public) • However, we usually know only little about them
  • 21. 10/22/17 Heiko Paulheim 21 Comparison of Knowledge Graphs • Release cycles Instant updates: DBpedia live, Freebase Wikidata Days: NELL Months: DBpedia Years: YAGO Cyc • Size and density Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017 Caution!
  • 22. 10/22/17 Heiko Paulheim 22 Comparison of Knowledge Graphs • What do they actually contain? • Experiment: pick 25 classes of interest – And find them in respective ontologies • Count instances (coverage) • Determine in and out degree (level of detail)
  • 23. 10/22/17 Heiko Paulheim 23 Comparison of Knowledge Graphs Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 24. 10/22/17 Heiko Paulheim 24 Comparison of Knowledge Graphs • Summary findings: – Persons: more in Wikidata (twice as many persons as DBpedia and YAGO) – Countries: more details in Wikidata – Places: most in DBpedia – Organizations: most in YAGO – Events: most in YAGO – Artistic works: • Wikidata contains more movies and albums • YAGO contains more songs Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 25. 10/22/17 Heiko Paulheim 25 Caveats • Reading the diagrams right… • So, Wikidata contains more persons – but less instances of all the interesting subclasses? • There are classes like Actor in Wikidata – but they are hardly used – rather: modeled using profession relation
  • 26. 10/22/17 Heiko Paulheim 26 Caveats • Reading the diagrams right… (ctd.) • So, Wikidata contains more data on countries, but less countries? • First: Wikidata only counts current, actual countries – DBpedia and YAGO also count historical countries • “KG1 contains less of X than KG2” can mean – it actually contains less instances of X – it contains equally many or more instances, but they are not typed with X (see later) • Second: we count single facts about countries – Wikidata records some time indexed information, e.g., population – Each point in time contributes a fact
  • 27. 10/22/17 Heiko Paulheim 27 Overlap of Knowledge Graphs • How largely do knowledge graphs overlap? • They are interlinked, so we can simply count links – For NELL, we use links to Wikipedia as a proxy DBpedia YAGO Wikidata NELL Open Cyc Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 28. 10/22/17 Heiko Paulheim 28 Overlap of Knowledge Graphs • How largely do knowledge graphs overlap? • They are interlinked, so we can simply count links – For NELL, we use links to Wikipedia as a proxy Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 29. 10/22/17 Heiko Paulheim 29 Overlap of Knowledge Graphs • Links between Knowledge Graphs are incomplete – The Open World Assumption also holds for interlinks • But we can estimate their number • Approach: – find link set automatically with different heuristics – determine precision and recall on existing interlinks – estimate actual number of links Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 30. 10/22/17 Heiko Paulheim 30 Overlap of Knowledge Graphs • Idea: – Given that the link set F is found – And the (unknown) actual link set would be C • Precision P: Fraction of F which is actually correct – i.e., measures how much |F| is over-estimating |C| • Recall R: Fraction of C which is contained in F – i.e., measures how much |F| is under-estimating |C| • From that, we estimate |C|=|F|⋅P⋅ 1 R Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 31. 10/22/17 Heiko Paulheim 31 Overlap of Knowledge Graphs • Mathematical derivation: – Definition of recall: – Definition of precision: • Resolve both to , substitute, and resolve to R= |Fcorrect| |C| P= |Fcorrect| |F| |Fcorrect| |C| Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017 |C|=|F|⋅P⋅ 1 R
  • 32. 10/22/17 Heiko Paulheim 32 Overlap of Knowledge Graphs • Experiment: – We use the same 25 classes as before – Measure 1: overlap relative to smaller KG (i.e., potential gain) – Measure 2: overlap relative to explicit links (i.e., importance of improving links) • Link generation with 16 different metrics and thresholds – Intra-class correlation coefficient for |C|: 0.969 – Intra-class correlation coefficient for |F|: 0.646 • Bottom line: – Despite variety in link sets generated, the overlap is estimated reliably – The link generation mechanisms do not need to be overly accurate Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 33. 10/22/17 Heiko Paulheim 33 Overlap of Knowledge Graphs Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 34. 10/22/17 Heiko Paulheim 34 Overlap of Knowledge Graphs • Summary findings: – DBpedia and YAGO cover roughly the same instances (not much surprising) – NELL is the most complementary to the others – Existing interlinks are insufficient for out-of-the-box parallel usage Ringler & Paulheim: One Knowledge Graph to Rule them All? KI 2017
  • 35. 10/22/17 Heiko Paulheim 35 Common Shortcomings of Knowledge Graphs • Knowledge Graph Profiling can reveal certain shortcomings – ...and once we know them, we can address them • What is the impact of that? – Example: answering a SPARQL query
  • 36. 10/22/17 Heiko Paulheim 36 Finding Information in Knowledge Graphs • Find list of science fiction writers in DBpedia select ?x where {?x a dbo:Writer . ?x dbo:genre dbr:Science_Fiction} order by ?x
  • 37. 10/22/17 Heiko Paulheim 37 Finding Information in Knowledge Graphs • Results from DBpedia Arthur C. Clarke? H.G. Wells? Isaac Asimov?
  • 38. 10/22/17 Heiko Paulheim 38 Common Shortcomings of Knoweldge Graphs • What reasons can cause incomplete results? • Two possible problems: – The resource at hand is not of type dbo:Writer – The genre relation to dbr:Science_Fiction is missing select ?x where {?x a dbo:Writer . ?x dbo:genre dbr:Science_Fiction} order by ?x
  • 39. 10/22/17 Heiko Paulheim 39 Common Shortcomings of Knowledge Graphs • Various works on Knowledge Graph Refinement – Knowledge Graph completion – Error detection • See, e.g., 2017 survey in Semantic Web Journal Paulheim: Knowledge Graph Refinement – A Survey of Approaches and Evaluation Methods. SWJ 8(3), 2017 Tuesday, 4:30 pm Journal track paper presentation
  • 40. 10/22/17 Heiko Paulheim 40 New Kids on the Block Subjective age: Measured by the fraction of the audience that understands a reference to your young days’ pop culture...
  • 41. 10/22/17 Heiko Paulheim 41 New Kids on the Block • Wikipedia-based Knowledge Graphs will remain an essential building block of Semantic Web applications • But they suffer from... – ...a coverage bias – ...limitations of the creating heuristics
  • 42. 10/22/17 Heiko Paulheim 42 Wikipedia’s Coverage Bias • One (but not the only!) possible source of coverage bias – Articles about long-tail entities become deleted
  • 43. 10/22/17 Heiko Paulheim 43 Work in Progress: DBkWik • Why stop at Wikipedia? • Wikipedia is based on the MediaWiki software – ...and so are thousands of Wikis – Fandom by Wikia: >385,000 Wikis on special topics – WikiApiary: reports >20,000 installations of MediaWiki on the Web
  • 44. 10/22/17 Heiko Paulheim 44 Work in Progress: DBkWik • Back to our original example...
  • 45. 10/22/17 Heiko Paulheim 45 Work in Progress: DBkWik • Back to our original example...
  • 46. 10/22/17 Heiko Paulheim 46 Work in Progress: DBkWik • The DBpedia Extraction Framework consumes MediaWiki dumps • Experiment – Can we process dumps from arbitrary Wikis with it? – Are the results somewhat meaningful?
  • 47. 10/22/17 Heiko Paulheim 47 Work in Progress: DBkWik • Example from Harry Potter Wiki http://dbkwik.webdatacommons.org/
  • 48. 10/22/17 Heiko Paulheim 48 Work in Progress: DBkWik • Differences to DBpedia – DBpedia has manually created mappings to an ontology – Wikipedia has one page per subject – Wikipedia has global infobox conventions (more or less) • Challenges – On-the-fly ontology creation – Instance matching – Schema matching
  • 49. 10/22/17 Heiko Paulheim 49 Work in Progress: DBkWik Dump Downloader Extraction Framework Interlinking Instance Matcher Schema Matcher MediaWiki Dumps Extracted RDF Internal Linking Instance Matcher Schema Matcher Consolidated Knowledge Graph DBkWik Linked Data Endpoint 1 2 34 5 • Avoiding O(n²) internal linking: – Match to DBpedia first – Use common links to DBpedia as blocking keys for internal matching
  • 50. 10/22/17 Heiko Paulheim 50 Work in Progress: DBkWik • Downloaded ~15k Wiki dumps from Fandom – 52.4GB of data, roughly the size of the English Wikipedia • Prototype: extracted data for ~250 Wikis – 4.3M instances, ~750k linked to DBpedia – 7k classes, ~1k linked to DBpedia – 43k properties, ~20k linked to DBpedia – ...including duplicates! • Link quality – Good for classes, OK for properties (F1 of .957 and .852) – Needs improvement for instances (F1 of .641) Monday 6:30pm Poster presentation
  • 51. 10/22/17 Heiko Paulheim 51 Work in Progress: WebIsALOD • Background: Web table interpretation • Most approaches need typing information – DBpedia etc. have too little coverage on the long tail – Wanted: extensive type database
  • 52. 10/22/17 Heiko Paulheim 52 Work in Progress: WebIsALOD • Extraction of type information using Hearst-like patterns, e.g., – T, such as X – X, Y, and other T • Text corpus: common crawl – ~2 TB crawled web pages – Fast implementation: regex over text – “Expensive” operations only applied once regex has fired • Resulting database – 400M hypernymy relations Seitner et al.: A large DataBase of hypernymy relations extracted from the Web. LREC 2016
  • 53. 10/22/17 Heiko Paulheim 53 Work in Progress: WebIsALOD • Back to our original example... http://webisa.webdatacommons.org/
  • 54. 10/22/17 Heiko Paulheim 54 Work in Progress: WebIsALOD • Initial effort: transformation to a LOD dataset – including rich provenance information Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted from the Web as Linked Open Data. ISWC 2017
  • 55. 10/22/17 Heiko Paulheim 55 Work in Progress: WebIsALOD • Estimated contents breakdown Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted from the Web as Linked Open Data. ISWC 2017
  • 56. 10/22/17 Heiko Paulheim 56 Work in Progress: WebIsALOD • Main challenge – Original dataset is quite noisy (<10% correct statements) – Recap: coverage vs. accuracy – Simple thresholding removes too much knowledge • Approach – Train RandomForest model for predicting correct vs. wrong statements – Using all the provenance information we have – Use model to compute confidence scores Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted from the Web as Linked Open Data. ISWC 2017
  • 57. 10/22/17 Heiko Paulheim 57 Work in Progress: WebIsALOD • Current challenges and works in progress – Distinguishing instances and classes • i.e.: subclass vs. instance of relations – Splitting instances • Bauhaus is a goth band • Bauhaus is a German school – Knowledge extraction from pre and post modifiers • Bauhaus is a goth band → genre(Bauhaus, Goth) • Bauhaus is a German school → location(Bauhaus, Germany) Hertling & Paulheim: WebIsALOD: Providing Hypernymy Relations extracted from the Web as Linked Open Data. ISWC 2017 Tuesday, 2:30 pm Resource track paper presentation
  • 58. 10/22/17 Heiko Paulheim 58 Take Aways • Knowledge Graphs contain a massive amount of information – Various trade offs in their creation – That lead to different profiles – ...and different shortcomings • Knowledge Graph Profiling – What is in a knowledge graph? – At which level of detail is it described? – How different are knowledge graphs? • Various methods exist for – ...addressing the various shortcomings • New kids on the block – DBkWik and WebIsALOD – Focus on long tail entities
  • 59. 10/22/17 Heiko Paulheim 59 Towards Knowledge Graph Profiling Heiko Paulheim