Extending DBpedia
with Wikipedia List Pages

10/22/13

Heiko Paulheim, Simone Paolo Ponzetto

Disclaimer
• This presentation shows an idea
– after all, it says “position paper”
– We don't know if it works!
– (but we are quite confident)

Lists in Wikipedia
• Wikipedia loves lists
• As of June 2013, there are almost 600,000 list pages
• Lists organize Wikipedia pages
– that correspond to DBpedia instances
• Example:
– List of African-American writers

Lists in Wikipedia
• Different types of lists
– simple bullet point lists
– broken bullet point lists (i.e., different sections)
  • sometimes, the sections are semantically meaningful
– tables
– ...

[Chart legend: Simple Bullet List, Broken Bullet List, Table, Other]
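
For the simplest of these list types, instance extraction could look roughly like the sketch below. It assumes the raw wikitext of a list page is available as a string and that the first wiki link on each bullet line is the listed entity. Both are assumptions for illustration, and the function name and sample page are made up.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#section|label]];
# group 1 is the link target without section or label.
LINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_bullet_entries(wikitext):
    """Return the first link target of every bullet line, in page order."""
    entries = []
    for line in wikitext.splitlines():
        stripped = line.strip()
        if not stripped.startswith("*"):
            continue  # headings, prose and table rows are ignored in this sketch
        match = LINK.search(stripped)
        if match:
            # Wikipedia titles use spaces, DBpedia resource names use underscores.
            entries.append(match.group(1).strip().replace(" ", "_"))
    return entries

# Made-up sample page, not taken from Wikipedia.
sample = """== A ==
* [[Example Writer One]], poet
* [[Example Writer Two]], novelist
Some introductory prose with a [[Link]].
* [[Example Writer Three|Three, Example Writer]]
"""
entries = extract_bullet_entries(sample)
print(entries)
```

Broken bullet lists and tables would need more than this, for example tracking section headings or choosing the right table column.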

Lists in Wikipedia
• What information is in a list?
– the linked things have the same “type”
• The type can be a complex construct
– e.g., Writer ∩ ∀nationality.{United States} ∩ ∀ethnicity.{African American}
• Sometimes there is additional information
– e.g., birth dates for persons

Extracting Information from Lists
• Goal:
– find the common characteristics of all things in the list
• Example: African-American writers
– all instances are writers (25% in DBpedia)
– all instances have nationality=United_States (12% in DBpedia)
– all instances have ethnicity=African_American (3% in DBpedia)
• Information in DBpedia is far from complete
– makes extraction difficult
– but: big potential to add information to DBpedia

Extracting Information from Lists
• Possible approach: finding characteristics with high TF-IDF
– TF: percentage of instances in the list that carry the characteristic
– IDF: 1 / (percentage of all DBpedia instances that carry the characteristic)
• Rationale: going by frequency alone would rate owl:Thing the highest
• Example: African-American writers
– type=Writer: 0.608 (maximal across all possible classes)
– nationality=United_States: 0.277
– ethnicity=African_American: 0.127
• But:
– deathPlace=New_York_City: 0.157 :-(
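
A minimal sketch of the TF and IDF definitions above, on toy data. The slides do not say how the two factors are combined or normalized, so the plain product used here is an assumption, and the sketch is not expected to reproduce the scores quoted above. The function name and the toy instances are hypothetical, and list members are assumed to be a subset of all instances.

```python
from collections import Counter

def tf_idf_scores(list_members, all_instances):
    """Score each (property, value) characteristic for one list page.

    Both arguments map an instance name to its set of characteristics.
    TF  = share of list members that carry the characteristic
    IDF = 1 / share of all instances that carry the characteristic
    """
    in_list = Counter(c for chars in list_members.values() for c in chars)
    overall = Counter(c for chars in all_instances.values() for c in chars)
    scores = {}
    for characteristic, count in in_list.items():
        tf = count / len(list_members)
        idf = 1.0 / (overall[characteristic] / len(all_instances))
        scores[characteristic] = tf * idf  # assumed combination: plain product
    return scores

# Toy knowledge base: four instances, two of them on the list page.
THING = ("rdf:type", "owl:Thing")
WRITER = ("rdf:type", "Writer")
all_instances = {
    "a": {THING, WRITER},
    "b": {THING, WRITER},
    "c": {THING},
    "d": {THING},
}
list_members = {name: all_instances[name] for name in ("a", "b")}

scores = tf_idf_scores(list_members, all_instances)
print(scores[WRITER], scores[THING])
```

In this toy setup, owl:Thing and Writer have the same frequency within the list, but Writer scores higher because it is rarer across all instances, which is the rationale given on the slide.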

Extracting Information from Lists
• Example: African-American writers
– ethnicity=African_American: 0.127
– deathPlace=New_York_City: 0.157
• Exploit further information from the list page
– e.g., wiki:African_American is linked from the page, New_York_City is not
– e.g., analyze the list page title, e.g., using DBpedia Spotlight
  • African_American is recognized as an entity
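
One way such page-level evidence might be combined with the statistical score is sketched below. The slides leave the combination open, so the multiplicative boost and its value are purely illustrative assumptions. The two scores are the ones quoted above.

```python
def rerank_with_page_links(scores, linked_entities, boost=2.0):
    """Boost characteristics whose value is linked from the list page.

    `scores` maps (property, value) pairs to a statistical score.
    `linked_entities` is the set of entities linked from the list page.
    `boost` is a hypothetical weight, not a value from the slides.
    """
    reranked = {}
    for (prop, value), score in scores.items():
        factor = boost if value in linked_entities else 1.0
        reranked[(prop, value)] = score * factor
    # Highest combined score first.
    return sorted(reranked.items(), key=lambda item: item[1], reverse=True)

scores = {
    ("ethnicity", "African_American"): 0.127,
    ("deathPlace", "New_York_City"): 0.157,
}
ranking = rerank_with_page_links(scores, linked_entities={"African_American"})
print(ranking[0][0])
```

With this boost, ethnicity=African_American moves ahead of deathPlace=New_York_City. The same hook could take entities recognized in the page title instead of page links.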

Lists of Lists in Wikipedia
• Wikipedia also knows ~600 lists of lists
– organize lists
– form a hierarchy
• E.g.:
– Lists of Writers
– Lists of American writers
– List of African American writers

From Lists of Lists to an Extended Ontology
• Idea:
– find corresponding lists of... pages for DBpedia classes
– extend hierarchy

Class hierarchy (corresponding Wikipedia page in parentheses):

DBpedia Ontology
owl:Thing
  ...
  Agent
    ...
    Person
      Artist
        ...
        Writer  (Lists of Writers)
Extended Ontology
          American Writer  (Lists of American Writers)
            ...
            African-American Writer  (List of African-American Writers)

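
The hierarchy extension could be sketched as follows, assuming the link structure between lists-of-lists pages and a mapping from list pages to class names are already known. The function, the mapping, and the class names such as AmericanWriter are hypothetical and only illustrate the idea.

```python
def subclass_axioms(list_hierarchy, page_to_class):
    """Turn parent/child links between list pages into subclass statements.

    `list_hierarchy` maps a lists-of-lists page to the list pages it links to.
    `page_to_class` maps a list page title to a class name (hypothetical mapping).
    """
    axioms = []
    for parent_page, child_pages in list_hierarchy.items():
        for child_page in child_pages:
            # Only emit a statement when both pages have a corresponding class.
            if parent_page in page_to_class and child_page in page_to_class:
                axioms.append(
                    (page_to_class[child_page], "rdfs:subClassOf", page_to_class[parent_page])
                )
    return axioms

list_hierarchy = {
    "Lists of writers": ["Lists of American writers"],
    "Lists of American writers": ["List of African-American writers"],
}
page_to_class = {
    "Lists of writers": "Writer",
    "Lists of American writers": "AmericanWriter",
    "List of African-American writers": "AfricanAmericanWriter",
}
axioms = subclass_axioms(list_hierarchy, page_to_class)
for axiom in axioms:
    print(axiom)
```

Here Writer is the existing DBpedia class, and the two classes below it are the new ones that the list pages would contribute.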
Potential of the Idea
• Assuming we extract everything correctly from List of African American writers, we get
– 814 new type statements (only DBpedia ontology)
– 1409 new property assertions
– two entirely new instances
• ...and there are ~600,000 list pages
– extrapolation: we can roughly double the information in DBpedia
• Many list pages contain extra information
– e.g., birth places and birth dates of persons

Challenges
• Robust extraction of instances
– from different kinds of list pages
– e.g., picking the right column in a table
– tables and bullet point lists already account for 75%
• Picking good scoring functions
– TF-IDF does not look bad at first glance
• Combining statistical and textual evidence
• Scalable implementation
– Advantage: perfectly parallelizable
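
Because each list page can be processed independently, the work maps naturally onto a worker pool. The sketch below uses Python's standard `concurrent.futures`. The per-page function is a placeholder standing in for extraction and scoring, and its dummy return value exists only so that the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

def process_list_page(title):
    """Placeholder for the per-page work (fetch, extract instances, score)."""
    return title, len(title)  # dummy result, not a real extraction

def process_all(titles, workers=4):
    # Pages share no state, so they can be handled in any order and in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_list_page, titles))

results = process_all(["List of African-American writers", "Lists of writers"])
print(sorted(results))
```

For CPU-bound extraction, a process pool or a cluster framework would likely be a better fit than threads. The structure of the job stays the same.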


All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 

Extending DBpedia with Wikipedia List Pages

  • 1. Extending DBpedia with Wikipedia List Pages. Heiko Paulheim, Simone Paolo Ponzetto. 10/22/13
  • 2. Disclaimer
      • This presentation shows an idea
          – after all, it says “position paper”
          – We don't know if it works!
          – (but we are quite confident)
  • 3. Lists in Wikipedia
      • Wikipedia loves lists
      • As of June 2013, there are almost 600,000 list pages
      • Lists organize Wikipedia pages
          – that correspond to DBpedia instances
      • Example:
          – List of African-American writers
  • 4. Lists in Wikipedia
  • 5. Lists in Wikipedia
      • Different types of lists
          – simple bullet point lists
          – broken bullet point lists (i.e., different sections)
              • sometimes, the sections are semantically meaningful
          – tables
          – ...
      • [Figure legend: Simple Bullet List, Broken Bullet List, Table, Other]
  • 6. Lists in Wikipedia
      • What information is in a list?
          – the linked things have the same “type”
      • The type can be a complex construct
          – e.g., Writer ∩ ∀nationality.{United States} ∩ ∀ethnicity.{African American}
      • Sometimes, there are additional pieces of information
          – e.g., birth dates for persons
  • 7. Extracting Information from Lists
      • Goal:
          – find the common characteristics of all things in the list
      • Example: African-American writers
          – all instances are writers (25%)
          – all instances have nationality=United_States (12%)
          – all instances have ethnicity=African_American (3%)
      • Information in DBpedia is far from complete
          – makes extraction difficult
          – but: big potential to add information to DBpedia
  • 8. Extracting Information from Lists
      • Possible approach: finding characteristics with high TF-IDF
          – TF: percentage of instances in the list that carry the characteristic
          – IDF: 1 / (percentage of all DBpedia instances that carry the characteristic)
      • Rationale: only going by frequency would rate owl:Thing the highest
      • Example: African-American writers
          – type=Writer: 0.608 (maximal across all possible classes)
          – nationality=United_States: 0.277
          – ethnicity=African_American: 0.127
      • But:
          – deathPlace=New_York_City: 0.157 :-(
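The TF and IDF definitions on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy instance data, the function name `tfidf_scores`, and the choice to multiply TF by IDF are assumptions. The slide does not state how its scores (0.608, 0.277, ...) are scaled, so the numbers produced here are not comparable to them.

```python
from collections import Counter

def tfidf_scores(list_members, all_instances):
    """Score each characteristic by TF * IDF, reading slide 8 literally.

    list_members: names of the instances linked from the list page.
    all_instances: dict mapping instance name -> set of (property, value) pairs.
    Multiplying TF by IDF is an assumption; the slide only defines the two parts.
    """
    def fractions(names):
        counts = Counter(c for n in names for c in all_instances.get(n, set()))
        return {c: k / len(names) for c, k in counts.items()}

    tf = fractions(list_members)           # share of list members carrying c
    df = fractions(list(all_instances))    # share of all instances carrying c
    return {c: tf[c] * (1.0 / df[c]) for c in tf}

# Hypothetical toy knowledge base (deliberately incomplete, not real DBpedia data).
kb = {
    "Maya_Angelou":  {("type", "owl:Thing"), ("type", "Writer"),
                      ("nationality", "United_States")},
    "Toni_Morrison": {("type", "owl:Thing"), ("type", "Writer")},
    "Mark_Twain":    {("type", "owl:Thing"), ("type", "Writer"),
                      ("nationality", "United_States")},
    "Berlin":        {("type", "owl:Thing")},
    "Paris":         {("type", "owl:Thing")},
}
scores = tfidf_scores(["Maya_Angelou", "Toni_Morrison"], kb)
best = max(scores, key=scores.get)
```

In this toy data every list member is an owl:Thing, but so is every other instance, so the IDF factor leaves its score at 1.0 while type=Writer ends up ranked highest. This is the effect the rationale bullet on the slide describes.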
  • 9. Extracting Information from Lists
      • Example: African-American writers
          – ethnicity=African_American: 0.127
          – deathPlace=New_York_City: 0.157
      • Exploit further information from list page
          – e.g., wiki:African_American is linked from page, New_York_City is not
          – e.g., analyze list page title, e.g., using DBpedia Spotlight
              • African_American is recognized as an entity
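One way to combine the statistical score with the page-level evidence from slide 9 is to boost candidates whose value is linked from the list page or mentioned in its title. The sketch below is hypothetical: the function `rerank`, the boost factor of 2.0, and the plain substring match on the title are assumptions for illustration. The substring match is only a crude stand-in for an entity recognizer such as DBpedia Spotlight, which is not called here.

```python
def rerank(scores, page_links, page_title, boost=2.0):
    """Boost characteristics whose value is supported by the list page itself.

    scores: dict mapping (property, value) -> statistical score.
    page_links: set of entity names linked from the list page.
    page_title: title of the list page.
    boost: hypothetical multiplier, applied once per kind of evidence.
    """
    title = page_title.lower().replace("-", " ")
    reranked = {}
    for (prop, value), score in scores.items():
        if value in page_links:
            score *= boost
        # Crude stand-in for entity recognition in the title.
        if value.lower().replace("_", " ") in title:
            score *= boost
        reranked[(prop, value)] = score
    return reranked

scores = {
    ("ethnicity", "African_American"): 0.127,   # scores as given on slide 9
    ("deathPlace", "New_York_City"): 0.157,
}
result = rerank(scores, {"African_American"}, "List of African-American writers")
```

With these inputs, ethnicity=African_American receives both boosts and overtakes deathPlace=New_York_City, which has no support on the page.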
  • 10. Lists of Lists in Wikipedia
      • Wikipedia also knows ~600 lists of lists
          – organize lists
          – form a hierarchy
      • E.g.:
          – Lists of Writers
          – Lists of American writers
          – List of African American writers
  • 11. From Lists of Lists to an Extended Ontology
      • Idea:
          – find corresponding “Lists of ...” pages for DBpedia classes
          – extend hierarchy
      • [Diagram: the DBpedia ontology (owl:Thing ... Agent ... Person ... Artist ... Writer) is extended below Writer with American Writer and African-American Writer; the corresponding Wikipedia pages are Lists of Writers, Lists of American Writers, and List of African-American Writers]
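The hierarchy extension on slide 11 can be pictured as turning the nesting of list pages into subclass statements. The following sketch assumes that the nesting and the page-to-class mapping are already known. The dictionaries, the `ex:` prefix for new classes, and the function `subclass_triples` are hypothetical; how to derive them automatically is left open by the slides.

```python
def subclass_triples(parent_of, class_for):
    """Turn list-page nesting into rdfs:subClassOf statements.

    parent_of: dict mapping a list page to the lists-of-lists page containing it.
    class_for: dict mapping each list page to its (existing or new) class name.
    """
    return [
        (class_for[child], "rdfs:subClassOf", class_for[parent])
        for child, parent in parent_of.items()
    ]

parent_of = {
    "Lists of American writers": "Lists of writers",
    "List of African-American writers": "Lists of American writers",
}
class_for = {
    "Lists of writers": "dbo:Writer",                   # existing DBpedia class
    "Lists of American writers": "ex:AmericanWriter",   # new class, hypothetical name
    "List of African-American writers": "ex:AfricanAmericanWriter",
}
triples = subclass_triples(parent_of, class_for)
for triple in triples:
    print(" ".join(triple))
```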
  • 12. Potential of the Idea
      • Given that we extract everything correctly from List of African American writers, we get
          – 814 new type statements (only DBpedia ontology)
          – 1409 new property assertions
          – two entirely new instances
      • ...and there are ~600,000 list pages
          – extrapolation: we can roughly double the information in DBpedia
      • many list pages contain extra information
          – e.g., birth places and birth dates of persons
  • 13. Challenges
      • Robust extraction of instances
          – from different kinds of list pages
          – e.g., picking the right column in a table
          – tables and bullet point lists already account for 75%
      • Picking good scoring functions
          – TF-IDF seems not bad at first glance
      • Combining statistical and textual evidence
      • Scalable implementation
          – Advantage: perfectly parallelizable
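The remark that the approach is perfectly parallelizable follows from each list page being processed independently of all others. A minimal sketch using Python's standard `concurrent.futures` module is shown below; `process_list_page` is a hypothetical placeholder for the real extraction and scoring, and the input pages are toy data.

```python
from concurrent.futures import ThreadPoolExecutor

def process_list_page(page):
    """Placeholder for the per-page work: here it only counts linked instances."""
    title, members = page
    return title, len(members)

def process_all(pages, workers=4):
    # No page depends on another, so a plain parallel map suffices;
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_list_page, pages))

pages = [
    ("List of African-American writers", ["Maya_Angelou", "Toni_Morrison"]),
    ("List of German writers", ["Thomas_Mann"]),
]
results = process_all(pages)
```

For CPU-bound extraction, `ProcessPoolExecutor` offers the same `map` interface, and the same per-page function could be distributed across machines.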
  • 14. Extending DBpedia with Wikipedia List Pages