WikiTables DERI Talk
Extending DBpedia (LOD) using WikiTables

Emir Muñoz
Unit for Reasoning and Querying
emir.munoz@deri.org
Linked Open Data

[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]

                                                      October 12, 2012 -- E. Muñoz
Linked Open Data

• DBpedia, an export of Wikipedia’s structured data

DBpedia provides an RDF version of all of Wikipedia’s structured data (infoboxes)
But there is not yet a version of ordinary Wikipedia tables (wikitables)
Tables as a source of LOD
• Tables are inherently concise as well as information rich
• Infoboxes: attribute-value pairs
• Column headers represent types of information
• The values represent instances of those types
• The caption may appear as another row

[Figure: example tables from http://en.wikipedia.org/wiki/Dublin and http://en.wikipedia.org/wiki/Galway]
Reasoning over Wikipedia Tables

Recovering Table Semantics…

[Figure: table captioned “Dublin is twinned with the following places:”, from http://en.wikipedia.org/wiki/Dublin]
Reasoning over Wikipedia Tables

Entity annotation for cells: mappings to DBpedia resources (http://en.wikipedia.org/wiki/Dublin)

Column headers mapped to properties:
  dbpedia.org/property/city | dbpedia.org/property/nation | dbpedia.org/property/since (values typed as xsd:integer)

Cells mapped to resources (city | nation):
  dbpedia.org/resource/San_Jose,_California | dbpedia.org/resource/United_States
  dbpedia.org/resource/Liverpool            | dbpedia.org/resource/United_Kingdom
  dbpedia.org/resource/Matsue,_Shimane      | dbpedia.org/resource/Japan
  dbpedia.org/resource/Barcelona            | dbpedia.org/resource/Spain
  dbpedia.org/resource/Beijing              | dbpedia.org/resource/People’s_Republic_of_China

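Mapping a linked cell to a DBpedia resource, as in the table above, can follow DBpedia's usual naming convention: the resource URI is http://dbpedia.org/resource/ plus the Wikipedia page title, with spaces replaced by underscores. The sketch below is an illustration under assumptions, not the talk's code: it takes a link target that has already been extracted from the cell, ignores redirects and any percent-encoding, and `dbpedia_resource` is a hypothetical name.

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def dbpedia_resource(link_target: str) -> str:
    """Map a Wikipedia link target to a DBpedia resource URI (simplified, hypothetical helper)."""
    # Drop a section anchor: "Liverpool#History" identifies the page "Liverpool".
    title = link_target.strip().split("#", 1)[0]
    if not title:
        raise ValueError("same-page link: no resource to map")
    title = title.replace(" ", "_")
    # MediaWiki treats the first letter of a page title as upper case.
    title = title[0].upper() + title[1:]
    return DBPEDIA_RESOURCE + title


print(dbpedia_resource("San Jose, California"))
print(dbpedia_resource("matsue, Shimane"))
```

Redirect handling would sit in front of this step, since a link target may be a redirect to a differently named resource.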
Reasoning over Wikipedia Tables

Extracting relations (http://en.wikipedia.org/wiki/Dublin)

Using the annotated city/nation table above, the relations found between the city and nation columns are:
• dbpedia.org/ontology/country
• dbpedia.org/property/subdivisionName
• read in the other direction, the nation “is dbpedia.org/ontology/country of” the city
Reasoning over Wikipedia Tables

•   <http://dbpedia.org/resource/San_Jose,_California>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/United_States> .
•   <http://dbpedia.org/resource/San_Jose,_California>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/United_States> .
•   <http://dbpedia.org/resource/Liverpool>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/United_Kingdom> .
•   <http://dbpedia.org/resource/Liverpool>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/United_Kingdom> .
•   <http://dbpedia.org/resource/Matsue,_Shimane>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/Japan> .
•   <http://dbpedia.org/resource/Matsue,_Shimane>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/Japan> .
•   <http://dbpedia.org/resource/Barcelona>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/Spain> .
•   <http://dbpedia.org/resource/Barcelona>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/Spain> .
•   <http://dbpedia.org/resource/Beijing>
    <http://dbpedia.org/property/subdivisionName>
    <http://dbpedia.org/resource/People's_Republic_of_China> .
•   <http://dbpedia.org/resource/Beijing>
    <http://dbpedia.org/ontology/country>
    <http://dbpedia.org/resource/People's_Republic_of_China> .

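Each annotated row yields one triple per discovered relation, which is how the list above follows from the five city/nation rows. A minimal sketch, assuming the cells have already been resolved to DBpedia resource names; `row_triples` is a hypothetical helper that simply writes N-Triples lines, without checking whether DBpedia already contains them.

```python
RES = "http://dbpedia.org/resource/"


def row_triples(pairs: list[tuple[str, str]], properties: list[str]) -> list[str]:
    """Emit one N-Triples line per (row, property): <subject> <property> <object> ."""
    lines = []
    for subject, obj in pairs:
        for prop in properties:
            lines.append(f"<{RES}{subject}> <{prop}> <{RES}{obj}> .")
    return lines


pairs = [("Liverpool", "United_Kingdom"), ("Barcelona", "Spain")]
props = [
    "http://dbpedia.org/property/subdivisionName",
    "http://dbpedia.org/ontology/country",
]
for line in row_triples(pairs, props):
    print(line)
```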
Reasoning over Wikipedia Tables

• Let’s analyze these cases …

• Liverpool

• Matsue

• Beijing

Not that simple…

• Web tables usually lack explicit semantics of their own.
• Main issues:
  –   Complex tables with row and column spans
  –   Captions inside the table as another row
  –   Tables that are not well-formed (i.e., not a matrix)
  –   Filters are needed (e.g., at least 2 columns and 2 rows)
• We extract relations at the row level, and between the
  page’s main entity and the resources in the table

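A filter along the lines of the slide might look like the sketch below. It assumes a table has already been parsed into rows of cell strings (header row included), treats “well-formed” as “every row has the same number of cells”, and uses the example thresholds of 2 rows and 2 columns; `keep_table` is a hypothetical name, not part of the talk.

```python
def keep_table(rows: list[list[str]], min_rows: int = 2, min_cols: int = 2) -> bool:
    """Return True if a parsed table passes the basic size and shape filters."""
    if len(rows) < min_rows:
        return False
    widths = {len(row) for row in rows}
    # Well-formed here means matrix-shaped: every row has the same number of cells.
    if len(widths) != 1:
        return False
    return widths.pop() >= min_cols


print(keep_table([["City", "Nation"], ["Liverpool", "United Kingdom"]]))  # True
print(keep_table([["City", "Nation"], ["Liverpool"]]))                     # False
```

Running this after span expansion, rather than on the raw markup, avoids rejecting tables whose only irregularity is a rowspan.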
Parsing: Extracting Tables

First step: parsing the wiki markup

[Figure: table from http://en.wikipedia.org/wiki/People%27s_Republic_of_China, annotated with: caption as another row; rowspans with pictures; table split]

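Rowspans and colspans are one reason a parsed table is not a matrix. A common remedy is to copy each spanning cell into every grid position it covers. This sketch assumes the parser yields, for each row, tuples of (text, rowspan, colspan); `expand_spans` is a hypothetical name, and the talk does not state that its parser works this way.

```python
from typing import Optional

# One parsed cell: (text, rowspan, colspan).
Cell = tuple[str, int, int]


def expand_spans(rows: list[list[Cell]]) -> list[list[Optional[str]]]:
    """Copy every spanning cell into each grid position it covers."""
    grid: dict[tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from a row above.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    n_cols = max((c for _, c in grid), default=-1) + 1
    # Positions no cell covers (ragged rows) stay None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


table = [
    [("Header", 1, 2)],
    [("A", 2, 1), ("B", 1, 1)],
    [("C", 1, 1)],
]
for row in expand_spans(table):
    print(row)
```

A `None` left in the output marks a table that is still not a matrix after span expansion, which a shape filter can then reject.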
Parsing: Extracting Tables

• Problems with parsing the cell’s content

                                         http://en.wikipedia.org/wiki/Danny_Kaye




Parsing: Extracting Tables

• Same-page links
• Many different formats
• Anchor text vs. content text

[Figure: examples from http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s]

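In MediaWiki markup a link is written [[Target]] or [[Target|anchor text]], and [[#Section]] points to a section of the same page. The sketch below separates the link target (which identifies the entity) from the anchor text and from the visible content of the cell. It is a simplified, hypothetical helper: it does not handle templates, nested links, or external links.

```python
import re

# [[Target]] or [[Target|anchor text]]; the target cannot contain "]" or "|".
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def parse_links(cell: str) -> list[tuple[str, str, bool]]:
    """Return (target, anchor_text, is_same_page) for each wiki link in a cell."""
    links = []
    for match in WIKILINK.finditer(cell):
        target = match.group(1).strip()
        anchor = match.group(2) if match.group(2) is not None else target
        # "[[#Section]]" points into the current page, not to another entity.
        links.append((target, anchor, target.startswith("#")))
    return links


def content_text(cell: str) -> str:
    """Visible cell text: every link is replaced by its anchor text."""
    return WIKILINK.sub(
        lambda m: m.group(2) if m.group(2) is not None else m.group(1), cell
    )


cell = "[[Beijing]], [[People's Republic of China|China]]"
print(parse_links(cell))
print(content_text(cell))
```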
Extracting Relations

[Figure: a table containing tables, from http://en.wikipedia.org/wiki/AFC_Ajax]
Extracting Relations

• Also relations between the main entity and
  the entities in the table (http://en.wikipedia.org/wiki/AFC_Ajax)

Main entity: dbpedia.org/resource/AFC_Ajax; table of 16 players

freq   relationship
14     dbpedia.org/ontology/team
14     dbpedia.org/property/clubs
11     dbpedia.org/property/currentclub
3      dbpedia.org/property/youthclubs

Annotation on one of the players: “In his DBpedia page there is no mention of AFC Ajax.”
Example: dbpedia.org/resource/Christian_Eriksen (from http://en.wikipedia.org/wiki/AFC_Ajax)
Disambiguation page: dbpedia.org/resource/Ajax

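The slides do not say how relations between the main entity and a table entity are retrieved. One option, assuming access to a SPARQL endpoint over DBpedia, is to ask for any property linking the two resources in either direction. The hypothetical helper below only builds the query string; sending it to an endpoint is left out. If a player's data points to a disambiguation page such as dbpedia.org/resource/Ajax instead of dbpedia.org/resource/AFC_Ajax, a query like this finds no link to the main entity.

```python
RESOURCE = "http://dbpedia.org/resource/"


def relation_query(main_entity: str, cell_entity: str) -> str:
    """Build a SPARQL query for properties linking two DBpedia resources in either direction."""
    main = f"<{RESOURCE}{main_entity}>"
    cell = f"<{RESOURCE}{cell_entity}>"
    return (
        "SELECT DISTINCT ?p ?direction WHERE {\n"
        f'  {{ {cell} ?p {main} . BIND("cell->main" AS ?direction) }}\n'
        "  UNION\n"
        f'  {{ {main} ?p {cell} . BIND("main->cell" AS ?direction) }}\n'
        "}"
    )


print(relation_query("AFC_Ajax", "Christian_Eriksen"))
```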
Our Dataset

• enwiki dump from 2012-09-03 02:17:37
• 8.6 GB of Wikipedia pages, comprising
  – 10,531,986 documents (HTML pages)
  – Only 413,256 of these pages contain tables
  – 2,989,098 tables
  – 905,929 tables after filtering
     • about 30.3% of all tables
  – 0.46 tables per page (or 2.15 when discarding pages
    without tables)
Methodology
Ranking of Relationships

• The current ranking function is naïve:

      $\mathit{score} = \dfrac{f(rel)}{n_{rows}}$

  where f(rel) is the frequency of the relationship (freq column) and n_rows is the number of rows.
  Example: http://en.wikipedia.org/wiki/AFC_Ajax (16 players)

freq   relationship                        score
14     dbpedia.org/ontology/team           0.875
14     dbpedia.org/property/clubs          0.875
11     dbpedia.org/property/currentclub    0.6875
3      dbpedia.org/property/youthclubs     0.1875

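The naïve score can be reproduced directly from the frequency table. A minimal sketch with a hypothetical function name, taking the frequencies as given; with n_rows = 16 (the AFC Ajax example) it returns the scores listed above.

```python
def score_relations(freq: dict[str, int], n_rows: int) -> list[tuple[str, float]]:
    """score = f(rel) / n_rows, highest first; ties keep their input order."""
    scored = [(rel, f / n_rows) for rel, f in freq.items()]
    # sorted() is stable, also with reverse=True, so equal scores keep input order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


freq = {
    "dbpedia.org/ontology/team": 14,
    "dbpedia.org/property/clubs": 14,
    "dbpedia.org/property/currentclub": 11,
    "dbpedia.org/property/youthclubs": 3,
}
for rel, score in score_relations(freq, n_rows=16):
    print(f"{score:<8} {rel}")
```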
Ranking of Relationships

• For cases like this one the function does not work: the score falls outside [0, 1] ($\mathit{score} \notin [0,1]$)

[Figure: example from http://en.wikipedia.org/wiki/Danny_Kaye]

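The slide does not explain why the score leaves [0, 1]. One possible cause, stated here only as an assumption, is that f(rel) counts every entity occurrence, so a row whose cell links several entities contributes more than once (multiple entities per cell is listed among the open problems). Counting each relationship at most once per row guarantees f(rel) ≤ n_rows, so the score stays in [0, 1]. The function names and the relation label `rel_x` are hypothetical.

```python
from collections import Counter


def occurrence_scores(rows: list[list[str]]) -> dict[str, float]:
    """f(rel) counts every occurrence, so one row can add several hits."""
    n_rows = len(rows)
    freq = Counter(rel for rels in rows for rel in rels)
    return {rel: f / n_rows for rel, f in freq.items()}


def row_scores(rows: list[list[str]]) -> dict[str, float]:
    """f(rel) counts each relationship at most once per row, so score <= 1."""
    n_rows = len(rows)
    freq = Counter(rel for rels in rows for rel in set(rels))
    return {rel: f / n_rows for rel, f in freq.items()}


# Hypothetical table: the first row's cell links three entities, all related via rel_x.
rows = [["rel_x", "rel_x", "rel_x"], ["rel_x"]]
print(occurrence_scores(rows))  # {'rel_x': 2.0}
print(row_scores(rows))         # {'rel_x': 1.0}
```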
Ongoing Work and Challenges

• Improve the ranking function for relations.
• Store the 5.5M DBpedia (transitive) redirects
  locally to reduce lookup time.
• Statistical analysis of Wikipedia tables
  – Number of columns and rows
  – Headers and captions
  – External and internal links
• The next big challenge is the evaluation.

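With the redirects stored locally, each cell link can be resolved by dictionary lookups rather than remote queries. A minimal sketch, assuming the redirects are already loaded into a Python dict from source title to target title (loading and names are hypothetical). If the stored map is not already transitive, the final page is found by following links with a hop limit and a cycle check; flattening computes the transitive version once, so each link needs a single lookup afterwards.

```python
def resolve_redirect(title: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow redirect links until a non-redirect page (or a loop) is reached."""
    seen = {title}
    current = title
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None or target in seen:  # final page, or a redirect loop
            break
        seen.add(target)
        current = target
    return current


def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Precompute the transitive closure: every redirect maps straight to its final page."""
    return {source: resolve_redirect(source, redirects) for source in redirects}


# Hypothetical redirect chain: Old_Name -> Short_Name -> Final_Page.
redirects = {"Old_Name": "Short_Name", "Short_Name": "Final_Page"}
print(flatten_redirects(redirects))
```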
What’s next?

• Some ideas in mind:
  – Use the extracted relations to classify WikiTables
  – Define a similarity function for WikiTables

[Figure: tables from the English and Italian Wikipedia]

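The talk leaves the similarity function open. One simple candidate, offered only as an assumption, is the Jaccard coefficient over the sets of relationships extracted from two tables, |A ∩ B| / |A ∪ B|. The relation sets below are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical relation sets extracted from two tables.
table_1 = {"dbpedia.org/ontology/country", "dbpedia.org/property/subdivisionName"}
table_2 = {"dbpedia.org/ontology/country"}
print(jaccard(table_1, table_2))  # 0.5
```

The same relation sets could serve as features for classifying tables, the other idea listed above.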
What’s next?

http://en.wikipedia.org/wiki/Electronegativity

• What does this number mean?
• There is no reference to those numbers here!
What’s next?

[Figure: http://en.wikipedia.org/wiki/Electronegativity, http://dbpedia.org/page/Chlorous_acid and http://en.wikipedia.org/wiki/Chlorine, annotated “Chlorous acid is a chlorite”]
Open problems

•   Handle multiple entities in the same cell
•   Improve the ranking function
•   Handle redirects before querying DBpedia
•   How to evaluate the outcome

Thanks!
Q&A

Emir Muñoz
Unit for Reasoning and Querying
emir.munoz@deri.org

More Related Content

Similar to WikiTables DERI Talk

Demo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open DataDemo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open DataStefan Dietze
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupalemmanuel_jamin
 
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebStefan Dietze
 
06 gioca-ontologies
06 gioca-ontologies06 gioca-ontologies
06 gioca-ontologiesnidzokus
 
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...Gaurav Vaidya
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositoriesandrea huang
 
Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Juan Sequeda
 
TEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition YarnTEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition YarnClaudiu Mihăilă
 
Wikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsWikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsJakob .
 
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintSw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintokeee
 
Wikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization SystemWikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization SystemJakob .
 
Wikipedia takes angkor ppt & demo - final 20121003
Wikipedia takes angkor   ppt & demo - final 20121003Wikipedia takes angkor   ppt & demo - final 20121003
Wikipedia takes angkor ppt & demo - final 20121003Kounila Keo
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1manujam
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic WebMark Matienzo
 
¿ARCHIVO?
¿ARCHIVO?¿ARCHIVO?
¿ARCHIVO?ESPOL
 
que hisciste el verano pasado
que hisciste el verano pasadoque hisciste el verano pasado
que hisciste el verano pasadoespol
 

Similar to WikiTables DERI Talk (20)

Demo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open DataDemo: Profiling & Exploration of Linked Open Data
Demo: Profiling & Exploration of Linked Open Data
 
Linking Open Data with Drupal
Linking Open Data with DrupalLinking Open Data with Drupal
Linking Open Data with Drupal
 
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic WebWeb Science Synergies: Exploring Web Knowledge through the Semantic Web
Web Science Synergies: Exploring Web Knowledge through the Semantic Web
 
06 gioca-ontologies
06 gioca-ontologies06 gioca-ontologies
06 gioca-ontologies
 
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
Extracting Data from Historical Documents: Crowdsourcing Annotations on Wikis...
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
 
Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010Consuming Linked Data by Machines - WWW2010
Consuming Linked Data by Machines - WWW2010
 
TEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition YarnTEDDY - Thesaurus Editor: Design and Definition Yarn
TEDDY - Thesaurus Editor: Design and Definition Yarn
 
Social Work Subject Guide
Social Work Subject GuideSocial Work Subject Guide
Social Work Subject Guide
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
Wikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization SystemsWikipedia as source of collaboratively created Knowledge Organization Systems
Wikipedia as source of collaboratively created Knowledge Organization Systems
 
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprintSw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
Sw 3 bizer etal-d bpedia-crystallization-point-jws-preprint
 
Towards Knowledge-Enabled Society
Towards Knowledge-Enabled SocietyTowards Knowledge-Enabled Society
Towards Knowledge-Enabled Society
 
Wikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization SystemWikipedia as Knowledge Organization System
Wikipedia as Knowledge Organization System
 
Linked Open Data stuff
Linked Open Data stuffLinked Open Data stuff
Linked Open Data stuff
 
Wikipedia takes angkor ppt & demo - final 20121003
Wikipedia takes angkor   ppt & demo - final 20121003Wikipedia takes angkor   ppt & demo - final 20121003
Wikipedia takes angkor ppt & demo - final 20121003
 
121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1121004 linking open_data_with_drupal_v1
121004 linking open_data_with_drupal_v1
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic Web
 
¿ARCHIVO?
¿ARCHIVO?¿ARCHIVO?
¿ARCHIVO?
 
que hisciste el verano pasado
que hisciste el verano pasadoque hisciste el verano pasado
que hisciste el verano pasado
 

More from Emir Muñoz

A Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review MoviesA Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review MoviesEmir Muñoz
 
Web Intelligence - 2010
Web Intelligence - 2010Web Intelligence - 2010
Web Intelligence - 2010Emir Muñoz
 
μRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elementsμRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elementsEmir Muñoz
 
Learning Content Patterns from Linked Data
Learning Content Patterns from Linked DataLearning Content Patterns from Linked Data
Learning Content Patterns from Linked DataEmir Muñoz
 
Claves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y ValidaciónClaves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y ValidaciónEmir Muñoz
 
Using Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesUsing Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesEmir Muñoz
 
Reading Group 2014
Reading Group 2014Reading Group 2014
Reading Group 2014Emir Muñoz
 
Soft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML DataSoft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML DataEmir Muñoz
 
DRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From WikitablesDRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From WikitablesEmir Muñoz
 

More from Emir Muñoz (10)

A Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review MoviesA Linked Data-Based Decision Tree Classifier to Review Movies
A Linked Data-Based Decision Tree Classifier to Review Movies
 
Web Intelligence - 2010
Web Intelligence - 2010Web Intelligence - 2010
Web Intelligence - 2010
 
μRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elementsμRaptor: A DOM-based system with appetite for hCard elements
μRaptor: A DOM-based system with appetite for hCard elements
 
Learning Content Patterns from Linked Data
Learning Content Patterns from Linked DataLearning Content Patterns from Linked Data
Learning Content Patterns from Linked Data
 
Claves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y ValidaciónClaves XML: Una Implementación de Algoritmos de Implicación y Validación
Claves XML: Una Implementación de Algoritmos de Implicación y Validación
 
Using Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's TablesUsing Linked Data to Mine RDF from Wikipedia's Tables
Using Linked Data to Mine RDF from Wikipedia's Tables
 
Reading Group 2014
Reading Group 2014Reading Group 2014
Reading Group 2014
 
Soft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML DataSoft Cardinality Constraints on XML Data
Soft Cardinality Constraints on XML Data
 
DRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From WikitablesDRETa: Extracting RDF From Wikitables
DRETa: Extracting RDF From Wikitables
 
DEXA 2012 Talk
DEXA 2012 TalkDEXA 2012 Talk
DEXA 2012 Talk
 

Recently uploaded

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 

Recently uploaded (20)

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...

WikiTables DERI Talk

  • 1. Extending DBpedia (LOD) using WikiTables. Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org
  • 2. Linked Open Data. Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch (http://lod-cloud.net/). October 12, 2012 -- E. Muñoz
  • 3. Linked Open Data
      • DBpedia, an export of Wikipedia’s structured data
      • DBpedia provides an RDF version of all Wikipedia structured data (infoboxes)
  • 4. Linked Open Data
      • DBpedia, an export of Wikipedia’s structured data
      • DBpedia provides an RDF version of all Wikipedia structured data (infoboxes)
      • But there is not yet a version of all ordinary Wikipedia tables (wikitables)
  • 5. Tables as a source of LOD
      • Tables are inherently concise as well as information-rich
      • Column headers represent types of information; the values represent instances of those types
      • The caption can appear as another row
      • Infoboxes hold attribute-value pairs
      • Examples: http://en.wikipedia.org/wiki/Dublin, http://en.wikipedia.org/wiki/Galway
  • 6. Reasoning over Wikipedia Tables
      • Recovering table semantics … Example: "Dublin is twinned with the following places:" (http://en.wikipedia.org/wiki/Dublin)
  • 7. Reasoning over Wikipedia Tables
      • Entity annotation for cells, with mappings to DBpedia resources (http://en.wikipedia.org/wiki/Dublin)
      • Column headers: dbpedia.org/property/city, dbpedia.org/property/nation, dbpedia.org/property/since (xsd:integer)
      • Rows (city, nation):
          – dbpedia.org/resource/San_Jose,_California, dbpedia.org/resource/United_States
          – dbpedia.org/resource/Liverpool, dbpedia.org/resource/United_Kingdom
          – dbpedia.org/resource/Matsue,_Shimane, dbpedia.org/resource/Japan
          – dbpedia.org/resource/Barcelona, dbpedia.org/resource/Spain
          – dbpedia.org/resource/Beijing, dbpedia.org/resource/People's_Republic_of_China
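Slide 7 maps cell values to DBpedia resources. A minimal sketch, assuming the naive convention that a DBpedia resource URI is the Wikipedia page title with spaces replaced by underscores, under http://dbpedia.org/resource/. The helper `to_dbpedia_uri` is hypothetical; a real annotator would also need redirect and disambiguation handling.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def to_dbpedia_uri(title: str) -> str:
    """Hypothetical helper: map a Wikipedia page title (e.g. the link target
    of a table cell) to a DBpedia resource URI, naive convention only."""
    name = title.strip().replace(" ", "_")
    # Wikipedia titles start with an upper-case letter.
    name = name[:1].upper() + name[1:]
    # Percent-encode unusual characters, but keep those that DBpedia
    # resource names commonly contain unescaped.
    return DBPEDIA_RESOURCE + quote(name, safe="_,()'-.:/")


# One row of the Dublin twin-cities table (slide 7): city and nation columns.
row = {"city": "San Jose, California", "nation": "United States"}
print({column: to_dbpedia_uri(value) for column, value in row.items()})
```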
  • 8. Reasoning over Wikipedia Tables
      • Extracting relations: between the city and nation columns of the same annotated table (http://en.wikipedia.org/wiki/Dublin), the relations dbpedia.org/ontology/country and dbpedia.org/property/subdivisionName are found
      • E.g., dbpedia.org/resource/United_States is the dbpedia.org/ontology/country of dbpedia.org/resource/San_Jose,_California
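The slides do not say how candidate relations between two column entities are retrieved. Assuming they are looked up against a DBpedia SPARQL endpoint, one sketch is a query asking for any property that links the two resources in either direction. `relations_query` is a hypothetical helper; only the query string is built here, and no endpoint is contacted.

```python
def relations_query(subject_uri: str, object_uri: str) -> str:
    """Hypothetical helper: build a SPARQL 1.1 query listing every property
    that links two resources, tagged "forward" (subject -> object) or
    "inverse" (object -> subject)."""
    return (
        "SELECT DISTINCT ?p ?dir WHERE {\n"
        f'  {{ <{subject_uri}> ?p <{object_uri}> . BIND("forward" AS ?dir) }}\n'
        "  UNION\n"
        f'  {{ <{object_uri}> ?p <{subject_uri}> . BIND("inverse" AS ?dir) }}\n'
        "}"
    )


print(relations_query(
    "http://dbpedia.org/resource/San_Jose,_California",
    "http://dbpedia.org/resource/United_States",
))
```

Querying both directions matters because a relation such as dbpedia.org/ontology/country may hold from the city to the nation even when the columns are read the other way round.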
  • 9. Reasoning over Wikipedia Tables
      • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_States> .
      • <http://dbpedia.org/resource/San_Jose,_California> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_States> .
      • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/United_Kingdom> .
      • <http://dbpedia.org/resource/Liverpool> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/United_Kingdom> .
      • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Japan> .
      • <http://dbpedia.org/resource/Matsue,_Shimane> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Japan> .
      • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/Spain> .
      • <http://dbpedia.org/resource/Barcelona> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Spain> .
      • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/property/subdivisionName> <http://dbpedia.org/resource/People's_Republic_of_China> .
      • <http://dbpedia.org/resource/Beijing> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/People's_Republic_of_China> .
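The triples above follow one pattern: each (city, nation) row pair yields one triple per selected relation. A minimal sketch that serialises them as N-Triples. `to_ntriples` is a hypothetical helper, and the local names are assumed to already be valid IRI fragments (no escaping is done).

```python
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
DBP = "http://dbpedia.org/property/"


def to_ntriples(pairs, properties):
    """Hypothetical helper: emit one N-Triples line per (subject, object)
    row pair and per property. Local names are not escaped."""
    lines = []
    for subject, obj in pairs:
        for prop in properties:
            lines.append(f"<{DBR}{subject}> <{prop}> <{DBR}{obj}> .")
    return lines


rows = [("Liverpool", "United_Kingdom"), ("Barcelona", "Spain")]
for line in to_ntriples(rows, [DBP + "subdivisionName", DBO + "country"]):
    print(line)
```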
  • 11. Reasoning over Wikipedia Tables
      • Let’s analyze these cases … Liverpool, Matsue, Beijing
  • 12. Not that simple…
      • Web tables usually don’t have explicit semantics by themselves.
      • Main issues:
          – Complex tables with spans
          – Captions inside the table as another row
          – Tables that are not well-formed (i.e., not a matrix)
          – Filters are needed (e.g., at least 2 columns and 2 rows)
      • We extract relations at the row level, and between the page's main entity and the resources in the table
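Slide 12 mentions tables that are not a matrix and a filter of at least 2 columns and 2 rows. A minimal sketch of such a filter, assuming a table has already been parsed into a list of rows of cell strings with spans expanded. The thresholds come from the slide. The function names and the exact rule (reject any ragged table) are assumptions.

```python
def is_well_formed(table):
    """True if every row has the same number of cells (i.e. a matrix)."""
    return len({len(row) for row in table}) == 1


def passes_filter(table, min_rows=2, min_cols=2):
    """Hypothetical filter: keep well-formed tables with at least
    min_rows rows and min_cols columns."""
    return (
        len(table) >= min_rows
        and is_well_formed(table)
        and len(table[0]) >= min_cols
    )


print(passes_filter([["City", "Nation"], ["Liverpool", "United Kingdom"]]))
print(passes_filter([["City", "Nation"], ["Liverpool"]]))  # ragged: rejected
```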
  • 13. Parsing: Extracting Tables
      • First step: parsing the wiki markup
      • Problems shown on http://en.wikipedia.org/wiki/People%27s_Republic_of_China: caption as another row, rowspans, a table split by pictures
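Rowspans are one of the parsing problems listed on slide 13. A common normalisation, shown here as a sketch rather than the authors' parser, expands every spanning cell into a dense grid by copying its value into each position it covers. The `Cell` type and `expand_spans` are hypothetical names. The input is assumed to be rows of already-parsed cells carrying their `rowspan`/`colspan` values.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical parsed cell: its text plus its span attributes."""
    text: str
    rowspan: int = 1
    colspan: int = 1


def expand_spans(rows):
    """Expand rowspan/colspan cells into a grid of strings, copying each
    spanning value into every covered position. The result is ragged if
    the source table is not a matrix."""
    grid = {}  # (row index, column index) -> text
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    grid[(r + dr, c + dc)] = cell.text
            c += cell.colspan
    n_rows = max((r for r, _ in grid), default=-1) + 1
    return [
        [grid[(r, c)] for c in sorted(cc for rr, cc in grid if rr == r)]
        for r in range(n_rows)
    ]


table = [[Cell("a", rowspan=2), Cell("b")], [Cell("c")]]
print(expand_spans(table))  # [['a', 'b'], ['a', 'c']]
```

Copying values (rather than leaving blanks) keeps every row self-contained, which suits extracting relations at the row level.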
  • 14. Parsing: Extracting Tables
      • Problems with parsing the cells' content (http://en.wikipedia.org/wiki/Danny_Kaye)
  • 16. Parsing: Extracting Tables
      • Same-page links
      • Many different formats
      • Anchor text vs. content text
      • Example: http://en.wikipedia.org/wiki/List_of_animated_television_series_of_the_1990s
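Slide 16 separates same-page links and the anchor text of a link from a cell's content text. A minimal regex-based sketch, assuming cells are available as raw wiki markup. It extracts `[[target|anchor]]` links, flags same-page links (targets starting with `#`), and rebuilds the visible content text. Templates, nested markup, and the many other cell formats the slide mentions are not handled, and `parse_cell` is a hypothetical name.

```python
import re

# Matches [[Target]], [[Target|anchor]] and [[#Section|anchor]].
WIKILINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")


def parse_cell(markup):
    """Hypothetical helper: split a cell's wiki markup into its links and
    its visible content text (each link replaced by its anchor text)."""
    links = []
    for m in WIKILINK.finditer(markup):
        target = m.group(1).strip()
        anchor = (m.group(2) or target).strip()
        links.append({
            "target": target,
            "anchor": anchor,
            "same_page": target.startswith("#"),
        })
    text = WIKILINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), markup)
    return links, text.strip()


print(parse_cell("[[Liverpool]] and [[#Notes|see notes]]"))
```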
  • 17. Extracting Relations
      • A table containing tables (http://en.wikipedia.org/wiki/AFC_Ajax)
  • 18. Extracting Relations
      • There are also relations between the main entity and the entities in the table (http://en.wikipedia.org/wiki/AFC_Ajax)
      • For the 16 players in the table and the main entity dbpedia.org/resource/AFC_Ajax, the relations found and their frequencies are:
          – 14 dbpedia.org/ontology/team
          – 14 dbpedia.org/property/clubs
          – 11 dbpedia.org/property/currentclub
          – 3 dbpedia.org/property/youthclubs
      • In one player's DBpedia page there is no mention of AFC Ajax
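Frequencies like those above can be aggregated from the relations found for each row. A minimal sketch, assuming each row has already been reduced to the set of DBpedia relations linking its entity to the page's main entity. Counting a relation at most once per row is an assumption, not stated in the slides. It keeps each frequency at or below the number of rows, which the score on slide 22 relies on to stay within [0,1].

```python
from collections import Counter


def relation_frequencies(relations_per_row):
    """Hypothetical helper: count in how many rows each relation links the
    row's entity to the main entity (at most once per row)."""
    counts = Counter()
    for relations in relations_per_row:
        counts.update(set(relations))
    return counts


# Hypothetical input: relations found for three rows of a squad table.
rows = [
    {"dbpedia.org/ontology/team", "dbpedia.org/property/clubs"},
    {"dbpedia.org/ontology/team"},
    set(),  # e.g. a player whose DBpedia page never mentions the club
]
print(relation_frequencies(rows).most_common())
```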
  • 19. dbpedia.org/resource/Christian_Eriksen (http://en.wikipedia.org/wiki/AFC_Ajax)
      • Disambiguation page: dbpedia.org/resource/Ajax
  • 20. Our Dataset
      • enwiki dump from 2012-09-03 02:17:37
      • 8.6 GB of Wikipedia pages, comprising:
          – 10,531,986 documents (HTML pages)
          – Only 413,256 HTML pages contain tables
          – 2,989,098 tables
          – 905,929 tables after the filter
              • 27.7% of all tables
          – 0.46 tables per page (or 2.15 when discarding pages without tables)
  • 21. Methodology
  • 22. Ranking of Relationships
      • The current ranking function is naïve: $\mathit{score} = \frac{f_{\mathit{rel}}}{n_{\mathit{rows}}}$, where $f_{\mathit{rel}}$ is the frequency of a relationship and $n_{\mathit{rows}}$ is the number of rows
      • Example: http://en.wikipedia.org/wiki/AFC_Ajax, 16 players ($n_{\mathit{rows}} = 16$)
          freq  relationship                        score
          14    dbpedia.org/ontology/team           0.875
          14    dbpedia.org/property/clubs          0.875
          11    dbpedia.org/property/currentclub    0.6875
          3     dbpedia.org/property/youthclubs     0.1875
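The naive ranking divides each relationship's frequency by the number of rows and sorts by the result. A minimal sketch of that formula. The function name `rank_relations` is hypothetical, and the frequencies are the AFC Ajax figures from the slide.

```python
def rank_relations(freqs, n_rows):
    """Naive ranking: score = freq / n_rows, highest score first.
    Ties keep their input order (Python's sort is stable)."""
    scores = {rel: f / n_rows for rel, f in freqs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


freqs = {
    "dbpedia.org/ontology/team": 14,
    "dbpedia.org/property/clubs": 14,
    "dbpedia.org/property/currentclub": 11,
    "dbpedia.org/property/youthclubs": 3,
}
for rel, score in rank_relations(freqs, n_rows=16):
    print(f"{score:.4f}  {rel}")
```

With these inputs the sketch reproduces the slide's scores of 0.875, 0.875, 0.6875 and 0.1875.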
  • 23. Ranking of Relationships
      • For cases like this one the function performs poorly, and $\mathit{score} \notin [0,1]$ (http://en.wikipedia.org/wiki/Danny_Kaye)
  • 24. Ongoing Work and Challenges
      • Improve the ranking function for relations.
      • Store the 5.5M DBpedia (transitive) redirects locally (to reduce query time).
      • Statistical analysis of Wikipedia tables:
          – Number of columns and rows
          – Headers, captions
          – External and internal links
      • The next big challenge is evaluation.
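Storing DBpedia's redirects locally means chains of redirects can be resolved without querying the endpoint. A minimal in-memory sketch, assuming the redirects are available as a mapping from each redirect page to its direct target (loading them is not shown). `close_redirects` precomputes the transitive closure once, so resolving a title before querying DBpedia becomes a single dictionary lookup. The function names, the hop limit, and the cycle handling are assumptions. The titles in the example are placeholders.

```python
def resolve_redirect(title, redirects, max_hops=10):
    """Follow redirects (title -> direct target) until reaching a page
    that is not a redirect. Stops on a cycle or after max_hops."""
    seen = {title}
    current = title
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None or target in seen:
            break
        seen.add(target)
        current = target
    return current


def close_redirects(redirects):
    """Precompute the transitive closure: every redirect maps straight
    to its final target."""
    return {source: resolve_redirect(source, redirects) for source in redirects}


# Placeholder titles: A redirects to B, which redirects to C.
print(close_redirects({"A": "B", "B": "C"}))
```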
  • 25. What’s next?
      • Some ideas in mind:
          – Use the extracted relations to classify WikiTables
          – Define a similarity function for WikiTables (figure labels: English, Italian)
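One candidate similarity function, offered as an assumption and not as the authors' design, is the Jaccard overlap of the relation sets extracted from two tables. Relations are DBpedia properties rather than language-specific header strings. Such a measure could therefore compare tables across language editions, such as English and Italian, provided both are mapped to the same properties. `table_similarity` is a hypothetical name.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 when both
    sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def table_similarity(relations_a, relations_b):
    """Hypothetical WikiTable similarity: overlap of the DBpedia relations
    extracted from each table."""
    return jaccard(relations_a, relations_b)


print(table_similarity(
    {"dbpedia.org/ontology/country", "dbpedia.org/property/subdivisionName"},
    {"dbpedia.org/ontology/country"},
))
```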
  • 26. What’s next?
      • http://en.wikipedia.org/wiki/Electronegativity: What does this number mean? There is no reference to those numbers here!
  • 27. What’s next?
      • http://dbpedia.org/page/Chlorous_acid, http://en.wikipedia.org/wiki/Electronegativity, http://en.wikipedia.org/wiki/Chlorine
      • Chlorous acid is a chlorite
  • 28. Open problems
      • Handle multiple entities in the same cell
      • Improve the ranking function
      • Handle redirects before querying DBpedia
      • How to evaluate the outcome
      Thanks! Q&A
      Emir Muñoz, Unit for Reasoning and Querying, emir.munoz@deri.org