Daniel Gerber
Axel-Cyrille Ngonga Ngomo
               AKSW, Universität Leipzig
Bootstrapping the Data Web
Motivation



             ๏       Most knowledge bases are extracted from (semi-)structured data
             ๏       Only 15-20 % of information is in structured data
             ๏       Semantic Web ⬌ Document Web
             ๏       How can we extract data from the document-oriented web?


WeKEx@ISWC - 17.01.2012   - Page 2                              http://boa.aksw.org
Idea I



     dbpedia:Barack_Obama   dbpedia-owl:birthPlace   dbpedia:Honolulu,_Hawaii
     dbpedia:Barack_Obama   dbpedia-owl:party        dbpedia:Democratic_Party
     dbpedia:Barack_Obama   dbpedia-owl:spouse       dbpedia:Michelle_Obama


Idea II



     Barack Obama was born in Honolulu, Hawaii.
     Barack Hussein Obama is a politician of the Democratic Party.
     Obama married Michelle Robinson in 1992.


Idea III




     married
          Jackie Bouvier Kennedy Onassis who married John F. Kennedy was tied to
          the Auchinclosses via her sister's marriage into the Auchincloss family.

     is a politician of the
          Joseph Martin "Joschka" Fischer (born 1948-04-12) is a politician of the
          German Green Party.

     was born in
          Dietrich's only child, Maria Elisabeth Sieber, was born in Berlin on
          13 December 1924.


Related Work




          ๏   ReadTheWeb Project: N(ever) E(nding) L(anguage) L(earner)
          ๏   PROSPERA: Scalable Knowledge Harvesting with High Precision and High Recall



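The BOA approach

[Figure: BOA architecture with steps numbered 1 to 5. Components shown: Corpus Extraction (Crawler, Cleaner, Indexer) turning the Web into Corpora; Knowledge Acquisition via SPARQL turning the Data Web into Background Knowledge; Pattern Search producing Patterns; Filtering by Pattern Scoring; RDF Generation. An arrow labelled "Use in next iteration" closes the loop.]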



Knowledge acquisition

     SELECT ?x ?xLabel ?prop ?y ?yLabel ?domain ?range
     WHERE {
       ?x rdf:type dbpedia-owl:[Organisation|Person|Place] .
       ?x rdfs:label ?xLabel . ?y rdfs:label ?yLabel .
       [?y ?prop ?x | ?x ?prop ?y] .
       FILTER ( lang(?xLabel) = 'en' && lang(?yLabel) = 'en' ) .
       ?prop rdfs:range ?range . ?prop rdfs:domain ?domain .
     }


     Example result:
       ?x        http://dbpedia.org/resource/Google
       ?xLabel   "Google"
       ?prop     http://dbpedia.org/ontology/subsidiary
       ?y        http://dbpedia.org/resource/YouTube
       ?yLabel   "Youtube"
       ?domain   http://dbpedia.org/ontology/Company
       ?range    http://dbpedia.org/ontology/Company
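
The bracket notation in the query above is not plain SPARQL. A minimal sketch of one way to read it, assuming each bracketed list stands for alternatives that are expanded into separate concrete queries (one per class and per triple direction). The names `CLASSES`, `DIRECTIONS`, `TEMPLATE` and `expand_queries` are hypothetical, and the `dbpedia-owl:` prefix IRI is taken from the ontology URIs in the example result.

```python
from itertools import product

# Alternatives written as [A|B|C] on the slide (assumed to mean "one query each").
CLASSES = ("Organisation", "Person", "Place")
DIRECTIONS = ("?x ?prop ?y", "?y ?prop ?x")

PREFIXES = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
"""

# Doubled braces are literal braces for str.format.
TEMPLATE = """SELECT ?x ?xLabel ?prop ?y ?yLabel ?domain ?range
WHERE {{
  ?x rdf:type dbpedia-owl:{cls} .
  ?x rdfs:label ?xLabel . ?y rdfs:label ?yLabel .
  {direction} .
  FILTER ( lang(?xLabel) = 'en' && lang(?yLabel) = 'en' )
  ?prop rdfs:range ?range . ?prop rdfs:domain ?domain .
}}"""


def expand_queries():
    """Return one concrete SPARQL query string per (class, direction) pair."""
    return [
        PREFIXES + TEMPLATE.format(cls=cls, direction=direction)
        for cls, direction in product(CLASSES, DIRECTIONS)
    ]


queries = expand_queries()
print(len(queries), "queries")
print(queries[0])
```

Each resulting string could be sent to a SPARQL endpoint; a single query using `UNION` would be an alternative reading of the same notation.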

Pattern Search


            (1) Set of entities s and o connected through p
            (2) Find all sentences which contain s and o
            (3) Replace labels with variables (?D?, ?R?)


            BOA pattern:
                dbpedia-owl:spouse    “?D? with his wife ?R?”

            BOA pattern mapping:
                dbpedia-owl:spouse    “?D? with his wife ?R?”
                dbpedia-owl:spouse    “?D? and her husband ?R?”
                dbpedia-owl:spouse    “?D? and his wife ?R?”
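
Steps (2) and (3) above can be sketched as plain string matching. This is only an illustration under simplifying assumptions: labels are matched literally, only the first occurrence of each label is used, and the pattern is the text between the two labels. The function name `find_patterns` and the example sentences are hypothetical.

```python
def find_patterns(sentences, subject_label, object_label):
    """Return patterns for one (subject, object) label pair.

    The subject label becomes ?D? and the object label becomes ?R?;
    the pattern keeps the text found between the two labels.
    """
    patterns = []
    for sentence in sentences:
        s_pos = sentence.find(subject_label)
        o_pos = sentence.find(object_label)
        if s_pos == -1 or o_pos == -1:
            continue  # step (2): the sentence must contain both labels
        if s_pos < o_pos:
            between = sentence[s_pos + len(subject_label):o_pos]
            patterns.append(f"?D?{between}?R?")  # step (3)
        else:
            between = sentence[o_pos + len(object_label):s_pos]
            patterns.append(f"?R?{between}?D?")
    return patterns


# Hypothetical sentences for illustration only.
spouse = find_patterns(
    ["Abel Pacheco with his wife Leyla Rodriguez Stahl.", "An unrelated sentence."],
    "Abel Pacheco",
    "Leyla Rodriguez Stahl",
)
subsidiary = find_patterns(
    ["DoubleClick was acquired by Google."], "Google", "DoubleClick"
)
print(spouse)
print(subsidiary)
```

A real implementation would also need to handle label variants and overlapping labels, which this sketch ignores.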

Pattern Scoring - Support



             Support
             A pattern should be used across several triples in the background knowledge


             subsidiary ↣ “?R? was acquired by ?D?”
             ๏    [Google, DoubleClick] ↣ 2
             ๏    [General Motors, Opel] ↣ 1
             ๏    [Cablevision, Rainbow Media] ↣ 4
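
The slide does not give a formula for support, so the following is only a sketch under stated assumptions: the numbers after ↣ are read as how often the pattern was found for that entity pair, and the score (a hypothetical choice) grows with both the number of distinct pairs and the largest per-pair count.

```python
from math import log


def support(occurrences):
    """Illustrative support score, not the formula used by BOA.

    occurrences maps (domain_label, range_label) pairs to the number of
    times the pattern was found with that pair.
    """
    if not occurrences:
        return 0.0
    distinct_pairs = len(occurrences)
    max_count = max(occurrences.values())
    # Assumed combination: both factors are damped with a logarithm.
    return log(1 + distinct_pairs) * log(1 + max_count)


# Counts from the slide for subsidiary / "?R? was acquired by ?D?".
acquired_by = {
    ("Google", "DoubleClick"): 2,
    ("General Motors", "Opel"): 1,
    ("Cablevision", "Rainbow Media"): 4,
}
# Hypothetical pattern seen once for a single pair.
rare = {("Google", "DoubleClick"): 1}

print(support(acquired_by) > support(rare))
```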


Pattern Scoring - Specificity



             Specificity
             A pattern should not be used by many pattern mappings

               ๏   subsidiary: “?D? agreed to buy ?R?”
               ๏   subsidiary: “?R? is a part of ?D?”
               ๏   foundationOrganisation: “?R? is a part of ?D?”
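
One way to turn this idea into a number is an inverse-document-frequency style score. The sketch below assumes that form (it is not stated on the slide) and uses the three examples above as data; `specificity` and `mappings` are hypothetical names.

```python
from math import log


def specificity(pattern, mappings):
    """IDF-like score: lower when more pattern mappings contain the pattern.

    mappings maps a property name to the set of patterns found for it.
    """
    containing = sum(1 for patterns in mappings.values() if pattern in patterns)
    if containing == 0:
        raise ValueError("pattern does not occur in any mapping")
    return log(len(mappings) / containing)


mappings = {
    "subsidiary": {"?D? agreed to buy ?R?", "?R? is a part of ?D?"},
    "foundationOrganisation": {"?R? is a part of ?D?"},
}

specific = specificity("?D? agreed to buy ?R?", mappings)
unspecific = specificity("?R? is a part of ?D?", mappings)
print(specific > unspecific)
```

Under this assumption, "?R? is a part of ?D?" scores lowest because every mapping in the example contains it.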



Pattern Scoring - Typicity



             Typicity
             A pattern should connect entities of the correct types

            ๏   Hypercom was acquired by Verifone .
                ๏   Hypercom_ORG was_O acquired_O by_O Verifone_ORG ._O

            ๏   Maktoob was acquired by Yahoo!
                ๏   Maktoob_PER was_O acquired_O by_O Yahoo_ORG ._O
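
A minimal sketch of how the tagged sentences above could feed a typicity score, assuming typicity is the share of pattern occurrences whose entity tags match the types expected for the property. The mapping of the expected types to the NER tag `ORG` and the helper names are assumptions.

```python
def entity_tags(tagged_sentence):
    """Return the non-O tags of a 'Token_TAG' sentence, in order.

    Multi-token entities would repeat their tag; the examples here
    only contain single-token entities.
    """
    tags = [item.rsplit("_", 1)[1] for item in tagged_sentence.split()]
    return [tag for tag in tags if tag != "O"]


def typicity(tagged_sentences, expected_tags):
    """Fraction of sentences whose entity tags equal expected_tags."""
    if not tagged_sentences:
        return 0.0
    hits = sum(1 for s in tagged_sentences if entity_tags(s) == expected_tags)
    return hits / len(tagged_sentences)


occurrences = [
    "Hypercom_ORG was_O acquired_O by_O Verifone_ORG ._O",
    "Maktoob_PER was_O acquired_O by_O Yahoo_ORG ._O",
]
# Assumed: both arguments of the pattern should be tagged ORG.
score = typicity(occurrences, ["ORG", "ORG"])
print(score)
```

Only the first sentence has the expected tags, so half of the occurrences count as typical in this toy example.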



RDF Generation
     Pattern:    ?D? with his wife ?R?
     Sentence:   Pacheco arrived with his wife Leyla Rodriguez Stahl and several...
     NER tags:   Pacheco_PER arrived_O with_O his_O wife_O Leyla_PER Rodriguez_PER Stahl_PER and_O

     Resulting RDF (NEW marks generated resources and statements):
       dbpedia:Abel_Pacheco        dbpedia-owl:spouse   boa:Leyla_Rodriguez_Stahl (NEW)    NEW
       dbpedia:Abel_Pacheco        rdf:type             dbpedia-owl:Person
       dbpedia:Abel_Pacheco        rdfs:label           "Abel Pacheco"@en
       boa:Leyla_Rodriguez_Stahl   rdf:type             dbpedia-owl:Person                 NEW
       boa:Leyla_Rodriguez_Stahl   rdfs:label           "Leyla Rodriguez Stahl"@en         NEW
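
The example above can be sketched in code. Assumptions: the pattern body is located in the tagged sentence and the nearest PER entity on each side is taken (the slide's sentence has "arrived" between "Pacheco" and the pattern body, which suggests a match of this kind); `KNOWN_RESOURCES` is a hypothetical stand-in for looking up an existing DBpedia resource; unknown labels get a `boa:` URI as on the slide.

```python
def parse_tagged(tagged_sentence):
    """Split 'Token_TAG' items into (token, tag) pairs."""
    return [tuple(item.rsplit("_", 1)) for item in tagged_sentence.split()]


def nearest_entity(tokens, start, step, tag):
    """Walk from start in direction step; return the first run of tokens with tag."""
    i = start
    while 0 <= i < len(tokens) and tokens[i][1] != tag:
        i += step
    run = []
    while 0 <= i < len(tokens) and tokens[i][1] == tag:
        run.append(tokens[i][0])
        i += step
    return " ".join(run if step > 0 else reversed(run))


def apply_pattern(tagged_sentence, body, tag="PER"):
    """Return (domain_label, range_label) for a '?D? <body> ?R?' pattern, or None."""
    tokens = parse_tagged(tagged_sentence)
    words = [word for word, _ in tokens]
    n = len(body)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == body:
            left = nearest_entity(tokens, i - 1, -1, tag)
            right = nearest_entity(tokens, i + n, +1, tag)
            if left and right:
                return left, right
    return None


# Hypothetical lookup table standing in for entity linking against DBpedia.
KNOWN_RESOURCES = {"Pacheco": "dbpedia:Abel_Pacheco"}


def to_triples(domain_label, range_label, prop):
    """Build triples; resources not in the lookup table get a boa: URI."""
    subject = KNOWN_RESOURCES.get(domain_label, "boa:" + domain_label.replace(" ", "_"))
    obj = KNOWN_RESOURCES.get(range_label, "boa:" + range_label.replace(" ", "_"))
    triples = [(subject, prop, obj)]
    if obj.startswith("boa:"):
        # Type is hard-coded for this example (assumed from the property's range).
        triples.append((obj, "rdf:type", "dbpedia-owl:Person"))
        triples.append((obj, "rdfs:label", f'"{range_label}"@en'))
    return triples


tagged = "Pacheco_PER arrived_O with_O his_O wife_O Leyla_PER Rodriguez_PER Stahl_PER and_O"
match = apply_pattern(tagged, ["with", "his", "wife"])
triples = to_triples(*match, "dbpedia-owl:spouse")
for triple in triples:
    print(" ".join(triple))
```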

Evaluation I

                          en-wiki              en-news
     Language             English              English
     Topic                general knowledge    news
     # of lines           44.7M                256.1M
     # of words           1,032.1M             5,068.7M

[Figure: number of triples (# of triples) for Place, Person and Organisation, split into "is subject" and "is object". Properties listed in the legend: riverMouth, musicalArtist, musicalBand, award, writer, almaMater, occupation, formerTeam, deathPlace, birthPlace. Bar labels: 158697, 551693, 327430, 137990, 72820, 64239.]


Evaluation II

                                           en-wiki                          en-news
                                   LOC       PER       ORG        LOC       PER       ORG
      Triples extracted            1465      8817      2567       488       903       916
      Triples in DBpedia           138       183       48         52        44        7
      Evaluated triples            100 (8)   100 (1)   100 (1)    100 (1)   100 (7)   100 (0)
      Precision (%)                90.5      97        99         61.5      73.5      91
      New true statements*         1200      8375      2494       268       631       827
      Found pattern mappings       62        72        59         49        70        55
      Found patterns               123k      136k      38k        569k      465k      92k
      Scored patterns              1045      612       241        3832      7294      1077

        * Number of extracted statements not found in DBpedia multiplied by the precision of our approach
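
The footnote can be checked against the table with a few lines of arithmetic. The sketch below recomputes "new true statements" as (extracted minus already in DBpedia) times precision, and averages the three en-wiki precisions; the data structure and function names are hypothetical, the numbers are copied from the table.

```python
# (extracted, in DBpedia, precision in %, reported new true statements)
ROWS = {
    "en-wiki LOC": (1465, 138, 90.5, 1200),
    "en-wiki PER": (8817, 183, 97.0, 8375),
    "en-wiki ORG": (2567, 48, 99.0, 2494),
    "en-news LOC": (488, 52, 61.5, 268),
    "en-news PER": (903, 44, 73.5, 631),
    "en-news ORG": (916, 7, 91.0, 827),
}


def new_true_statements(extracted, in_dbpedia, precision_percent):
    """Statements not yet in DBpedia, scaled by the measured precision."""
    return (extracted - in_dbpedia) * precision_percent / 100


deviations = {}
for name, (extracted, known, precision, reported) in ROWS.items():
    estimate = new_true_statements(extracted, known, precision)
    deviations[name] = abs(estimate - reported)
    print(f"{name}: estimated {estimate:.1f}, reported {reported}")

wiki_precisions = [ROWS[k][2] for k in ROWS if k.startswith("en-wiki")]
mean_wiki_precision = sum(wiki_precisions) / len(wiki_precisions)
print(f"mean en-wiki precision: {mean_wiki_precision:.1f}")
```

The recomputed values agree with the reported ones to within one triple, and the en-wiki mean corresponds to the 95.5% precision quoted in the conclusion.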

Future Work


            ๏   Iteration 1+
            ๏   Human feedback
            ๏   Pattern generalization
            ๏   Datatype Properties
            ๏   Languages/Corpora
            ๏   Web services

Conclusion



           ๏    No manually created seed patterns needed
           ๏    95.5% precision on DBpedia/Wikipedia
           ๏    Output easily integrable into the LOD Cloud
           ๏    Library of natural-language representations of formal relations, Demo
           ๏    Quasi language-independent (German/Korean)

Thank you!
                                           Questions?
Daniel Gerber
Johannisgasse 26, Room 5-21
04103 Leipzig, Germany
SIMBA@AKSW
http://bis.informatik.uni-leipzig.de/DanielGerber
http://boa.aksw.org
http://code.google.com/p/boa


BOA - Bootstrapping Linked Data

  • 1. Daniel Gerber Axel-Cyrille Ngonga Ngomo AKSW, Universität Leipzig
  • 2. Bootstrapping the Data Web Motivation ๏ Most knowledge bases extracted from (semi)- structured data ๏ Only 15-20 % of information in structured data ๏ Semantic Web ⬌ Document Web ๏ How can we extract data from the document- oriented web? WeKEx@ISWC - 17.01.2012 - Page 2 http://boa.aksw.org
  • 3. Bootstrapping the Data Web Idea I dbpedia:Barack_Obama dbpedia-owl:birthPlace dbpedia-owl:spouse dbpedia-owl:party dbpedia:Honolulu,_Hawaii dbpedia:Michelle_Obama dbpedia:Democratic_Party WeKEx@ISWC - 17.01.2012 - Page 3 http://boa.aksw.org
  • 4. Bootstrapping the Data Web Idea II Barack Obama was born in Honolulu, Hawaii. is a politician of the Barack Hussein Obama is a politician of the Democratic Party. Obama married Michelle Robinson in 1992. WeKEx@ISWC - 17.01.2012 - Page 4 http://boa.aksw.org
  • 5. Bootstrapping the Data Web Idea III married is a politician of the Jackie Bouvier Kennedy Onassis who married John F. Kennedy was tied to Joseph Martin "Joschka" Fischer (born 1948-04-12) the Auchinclosses via her sister's is a politician of the German Green Party. marriage into the Auchincloss family. was born in Dietrich's only child, Maria Elisabeth Sieber, was born in Berlin on 13 December 1924. WeKEx@ISWC - 17.01.2012 - Page 5 http://boa.aksw.org
  • 6. Related Work
       ๏ ReadTheWeb Project: N(ever) E(nding) L(anguage) L(earner)
       ๏ PROSPERA: Scalable Knowledge Harvesting with High Precision and High Recall
  • 7. The BOA approach
       [Figure: BOA architecture. Labels on the slide: Data Web, SPARQL, Background Knowledge, Web, Crawler, Cleaner, Indexer, Corpora, Corpus Extraction, Knowledge Acquisition, Pattern Search, Pattern Scoring, Filtering, Patterns, RDF Generation, step numbers 1 to 5, and a feedback arrow "Use in next iteration".]
  • 8. Knowledge acquisition
       SELECT ?x ?xLabel ?prop ?y ?yLabel ?domain ?range
       WHERE {
         ?x rdf:type dbpedia-owl:[Organisation|Person|Place] .
         ?x rdfs:label ?xLabel .
         ?y rdfs:label ?yLabel .
         [?y ?prop ?x | ?x ?prop ?y] .
         FILTER ( lang(?xLabel) = 'en' && lang(?yLabel) = 'en' ) .
         ?prop rdfs:range ?range .
         ?prop rdfs:domain ?domain .
       }
       Example result:
         ?x       http://dbpedia.org/resource/Google
         ?xLabel  “Google”
         ?prop    http://dbpedia.org/ontology/subsidiary
         ?y       http://dbpedia.org/resource/YouTube
         ?yLabel  “Youtube”
         ?domain  http://dbpedia.org/ontology/Company
         ?range   http://dbpedia.org/ontology/Company
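The bracketed alternatives in the slide 8 query (`[Organisation|Person|Place]` and the two triple directions) are slide shorthand, not SPARQL syntax. The following is a minimal sketch of how such a template could be expanded into concrete query strings. The names `build_queries`, `CLASSES`, `DIRECTIONS`, and `QUERY_TEMPLATE` are hypothetical, and the PREFIX declarations are the common DBpedia ones, added here as an assumption because the slide omits them. The sketch only builds strings and does not contact an endpoint.

```python
from itertools import product

# Alternatives taken from the bracketed shorthand on the slide.
CLASSES = ["Organisation", "Person", "Place"]
DIRECTIONS = ["?x ?prop ?y", "?y ?prop ?x"]

# Prefixes are an assumption; the slide does not show them.
QUERY_TEMPLATE = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?x ?xLabel ?prop ?y ?yLabel ?domain ?range
WHERE {{
  ?x rdf:type dbpedia-owl:{cls} .
  ?x rdfs:label ?xLabel .
  ?y rdfs:label ?yLabel .
  {direction} .
  FILTER ( lang(?xLabel) = 'en' && lang(?yLabel) = 'en' ) .
  ?prop rdfs:range ?range .
  ?prop rdfs:domain ?domain .
}}"""


def build_queries():
    """Return one concrete query per (class, direction) combination."""
    return [
        QUERY_TEMPLATE.format(cls=cls, direction=direction)
        for cls, direction in product(CLASSES, DIRECTIONS)
    ]


if __name__ == "__main__":
    for query in build_queries():
        print(query)
        print("-" * 40)
```

Three classes and two directions give six queries; each could then be sent to a SPARQL endpoint with whatever client is available.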
  • 9. Pattern Search
       (1) Set of entities s and o connected through p
       (2) Find all sentences which contain s and o
       (3) Replace labels with variables (?D?, ?R?)
       BOA pattern:
         dbpedia-owl:spouse “?D? with his wife ?R?”
       BOA pattern mapping:
         dbpedia-owl:spouse “?D? with his wife ?R?”
         dbpedia-owl:spouse “?D? and her husband ?R?”
         dbpedia-owl:spouse “?D? and his wife ?R?”
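Steps (2) and (3) of slide 9 can be sketched as a small string operation: find both labels in a sentence and keep the text between them, with ?D? standing for the subject (domain) label and ?R? for the object (range) label. This is a simplified illustration, not the BOA implementation; the function name `extract_pattern` is hypothetical, and plain substring matching is an assumption (a real system would need tokenisation and surface-form handling).

```python
def extract_pattern(sentence, subject_label, object_label):
    """Return a pattern such as '?D? married ?R?', or None if no pattern is found.

    ?D? replaces the subject label, ?R? replaces the object label.
    Only the first occurrence of each label is considered.
    """
    s = sentence.find(subject_label)
    o = sentence.find(object_label)
    if s == -1 or o == -1 or s == o:
        return None
    if s < o:
        between = sentence[s + len(subject_label):o]
        first, second = "?D?", "?R?"
    else:
        between = sentence[o + len(object_label):s]
        first, second = "?R?", "?D?"
    # Normalise whitespace; an empty middle part (adjacent or
    # overlapping labels) yields no usable pattern.
    between = " ".join(between.split())
    if not between:
        return None
    return f"{first} {between} {second}"


if __name__ == "__main__":
    # Sentence from slide 4.
    print(extract_pattern("Obama married Michelle Robinson in 1992.",
                          "Obama", "Michelle Robinson"))
    # Made-up sentence for the subsidiary pair [Google, DoubleClick].
    print(extract_pattern("DoubleClick was acquired by Google.",
                          "Google", "DoubleClick"))
```

The second call shows why the variable order matters: the object label appears first in the sentence, so the pattern starts with ?R?.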
  • 10. Pattern Scoring - Support
       Support: a pattern should be used across several triples in the background knowledge
       subsidiary ↣ “?R? was acquired by ?D?”
       ๏ [Google, DoubleClick] ↣ 2
       ๏ [General Motors, Opel] ↣ 1
       ๏ [Cablevision, Rainbow Media] ↣ 4
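Slide 10 gives the intuition behind support but no formula. The sketch below only computes the raw quantities a support score could be built from (how many distinct entity pairs a pattern was found with, and how often), using the counts from the slide. The function name `support_stats` and the choice of these three quantities are assumptions for illustration.

```python
def support_stats(pair_counts):
    """Return (distinct pairs, total occurrences, maximum occurrences for one pair)."""
    counts = list(pair_counts.values())
    if not counts:
        return (0, 0, 0)
    return (len(counts), sum(counts), max(counts))


# Occurrence counts of "?R? was acquired by ?D?" per entity pair, from the slide.
SLIDE_COUNTS = {
    ("Google", "DoubleClick"): 2,
    ("General Motors", "Opel"): 1,
    ("Cablevision", "Rainbow Media"): 4,
}

if __name__ == "__main__":
    print(support_stats(SLIDE_COUNTS))  # (3, 7, 4)
```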
  • 11. Pattern Scoring - Specificity
       Specificity: a pattern should not be used by many pattern mappings
       ๏ subsidiary: “?D? agreed to buy ?R?”
       ๏ subsidiary: “?R? is a part of ?D?”
       ๏ foundationOrganisation: “?R? is a part of ?D?”
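The specificity idea on slide 11 resembles inverse document frequency: a pattern shared by many pattern mappings says little about any one property. The sketch below uses an IDF-style logarithm as a stand-in; the slide does not state a formula, so both the formula and the name `specificity` are assumptions.

```python
import math


def specificity(pattern, mappings):
    """IDF-style score: log(number of mappings / mappings containing the pattern)."""
    containing = sum(1 for patterns in mappings.values() if pattern in patterns)
    if containing == 0:
        raise ValueError("pattern does not occur in any mapping")
    return math.log(len(mappings) / containing)


# Pattern mappings from the slide: property -> set of patterns.
MAPPINGS = {
    "subsidiary": {"?D? agreed to buy ?R?", "?R? is a part of ?D?"},
    "foundationOrganisation": {"?R? is a part of ?D?"},
}

if __name__ == "__main__":
    for pattern in ("?D? agreed to buy ?R?", "?R? is a part of ?D?"):
        print(pattern, specificity(pattern, MAPPINGS))
```

With these two mappings, the pattern that occurs in both scores 0, while the pattern unique to subsidiary scores log 2.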
  • 12. Pattern Scoring - Typicity
       Typicity: a pattern should be used to connect entities of the correct type
       ๏ Hypercom was acquired by Verifone.
       ๏ Hypercom_ORG was_O acquired_O by_O Verifone_ORG ._O
       ๏ Maktoob was acquired by Yahoo!
       ๏ Maktoob_PER was_O acquired_O by_O Yahoo_ORG ._O
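One way to read slide 12 is as a ratio: of all sentences in which a pattern occurs, how many have named-entity tags that match the types expected for the property? The sketch below computes that ratio for the two tagged sentences on the slide. The function names, the `word_TAG` input format, and the use of the first and last entity tag are assumptions; the slide does not give the actual scoring formula.

```python
def entity_tags(tagged_sentence):
    """Return the NER tags of all non-'O' tokens in a 'word_TAG word_TAG ...' string."""
    tags = []
    for token in tagged_sentence.split():
        _, tag = token.rsplit("_", 1)
        if tag != "O":
            tags.append(tag)
    return tags


def typicity(tagged_sentences, expected_first, expected_last):
    """Fraction of sentences whose first and last entity tags match the expected types."""
    if not tagged_sentences:
        return 0.0
    hits = 0
    for sentence in tagged_sentences:
        tags = entity_tags(sentence)
        if tags and tags[0] == expected_first and tags[-1] == expected_last:
            hits += 1
    return hits / len(tagged_sentences)


# Tagged sentences from the slide; for "?R? was acquired by ?D?" both
# entities are expected to be organisations.
SENTENCES = [
    "Hypercom_ORG was_O acquired_O by_O Verifone_ORG ._O",
    "Maktoob_PER was_O acquired_O by_O Yahoo_ORG ._O",
]

if __name__ == "__main__":
    print(typicity(SENTENCES, "ORG", "ORG"))  # 0.5
```

The second sentence lowers the score because the tagger labels Maktoob as a person.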
  • 13. RDF Generation
       Pattern: ?D? with his wife ?R?
       Sentence: Pacheco arrived with his wife Leyla Rodriguez Stahl and several...
       NER-tagged: Pacheco_PER arrived_O with_O his_O wife_O Leyla_PER Rodriguez_PER Stahl_PER and_O
       Resulting graph (the slide marks four graph elements as NEW):
         dbpedia:Abel_Pacheco dbpedia-owl:spouse boa:Leyla_Rodriguez_Stahl
         dbpedia:Abel_Pacheco rdf:type dbpedia-owl:Person
         dbpedia:Abel_Pacheco rdfs:label "Abel Pacheco"@en
         boa:Leyla_Rodriguez_Stahl rdf:type dbpedia-owl:Person
         boa:Leyla_Rodriguez_Stahl rdfs:label "Leyla Rodriguez Stahl"@en
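The RDF generation step on slide 13 can be approximated as follows: group the NER-tagged tokens into entities, look for two entities whose connecting text ends with the pattern's middle part, and emit triples for the matched pair. This is a rough sketch under several assumptions: the names `chunk`, `mint`, `generate_triples`, and `NER_TO_CLASS` are hypothetical, tags are limited to PER, ORG, LOC, and O, and every entity gets a freshly minted `boa:` URI. Linking "Pacheco" to the existing resource dbpedia:Abel_Pacheco, as the slide does, is not covered.

```python
from itertools import groupby

# Assumed mapping from NER tags to DBpedia ontology classes.
NER_TO_CLASS = {
    "PER": "dbpedia-owl:Person",
    "ORG": "dbpedia-owl:Organisation",
    "LOC": "dbpedia-owl:Place",
}


def chunk(tagged_sentence):
    """Merge consecutive tokens with the same tag into (text, tag) chunks."""
    pairs = [token.rsplit("_", 1) for token in tagged_sentence.split()]
    return [
        (" ".join(word for word, _ in group), tag)
        for tag, group in groupby(pairs, key=lambda pair: pair[1])
    ]


def mint(label):
    """Create a boa: identifier from a label (illustrative only)."""
    return "boa:" + label.replace(" ", "_")


def generate_triples(tagged_sentence, pattern, prop):
    """Return triples for entity pairs connected by the pattern's middle text."""
    tokens = pattern.split()
    first_var, middle = tokens[0], " ".join(tokens[1:-1])
    chunks = chunk(tagged_sentence)
    triples = []
    # Slide over (entity, connecting text, entity) windows.
    for (left, ltag), (text, ttag), (right, rtag) in zip(chunks, chunks[1:], chunks[2:]):
        if ltag == "O" or ttag != "O" or rtag == "O":
            continue
        if text != middle and not text.endswith(" " + middle):
            continue
        entities = [(left, ltag), (right, rtag)]
        if first_var == "?R?":
            entities.reverse()  # the range entity came first in the sentence
        (domain, _), (range_, _) = entities
        triples.append((mint(domain), prop, mint(range_)))
        for label, tag in entities:
            triples.append((mint(label), "rdf:type", NER_TO_CLASS[tag]))
            triples.append((mint(label), "rdfs:label", f'"{label}"@en'))
    return triples


if __name__ == "__main__":
    tagged = ("Pacheco_PER arrived_O with_O his_O wife_O "
              "Leyla_PER Rodriguez_PER Stahl_PER and_O")
    for triple in generate_triples(tagged, "?D? with his wife ?R?", "dbpedia-owl:spouse"):
        print(triple)
```

Note that grouping by tag merges two adjacent entities of the same type into one chunk, which a real system would have to handle separately.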
  • 14. Evaluation I
                      en-wiki              en-news
       Language       english              english
       Topic          general knowledge    news
       # of lines     44.7M                256.1M
       # of words     1,032.1M             5,068.7M
       [Chart: # of triples per property (riverMouth, musicalArtist, musicalBand, award, writer, almaMater, occupation, formerTeam, deathPlace, birthPlace), split into "is subject" and "is object" and grouped by Place, Person, Organisation. Values legible on the slide: 158697, 551693, 327430, 137990, 72820, 64239.]
  • 15. Evaluation II
                                 en-wiki                      en-news
                                 LOC      PER      ORG        LOC      PER      ORG
       Triples extracted         1465     8817     2567       488      903      916
       Triples in DBpedia        138      183      48         52       44       7
       Evaluated Triples         100 (8)  100 (1)  100 (1)    100 (1)  100 (7)  100 (0)
       Precision (%)             90.5     97       99         61.5     73.5     91
       New true Statements*      1200     8375     2494       268      631      827
       Found pattern mappings    62       72       59         49       70       55
       Found patterns            123k     136k     38k        569k     465k     92k
       Scored patterns           1045     612      241        3832     7294     1077
       * Number of extracted statements not found in DBpedia multiplied by the precision of our approach
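The footnote on slide 15 defines "New true Statements" as the number of extracted statements not found in DBpedia multiplied by the precision. The short check below recomputes that estimate from the other rows of the table; the function name `new_true_statements` and the `ROWS` layout are illustrative. The recomputed values agree with the slide to within one statement (the en-wiki LOC column differs by rounding).

```python
def new_true_statements(extracted, in_dbpedia, precision_percent):
    """Estimate: (extracted - already in DBpedia) * precision."""
    return (extracted - in_dbpedia) * precision_percent / 100.0


# (corpus, type): (triples extracted, triples in DBpedia, precision %, value on slide)
ROWS = {
    ("en-wiki", "LOC"): (1465, 138, 90.5, 1200),
    ("en-wiki", "PER"): (8817, 183, 97.0, 8375),
    ("en-wiki", "ORG"): (2567, 48, 99.0, 2494),
    ("en-news", "LOC"): (488, 52, 61.5, 268),
    ("en-news", "PER"): (903, 44, 73.5, 631),
    ("en-news", "ORG"): (916, 7, 91.0, 827),
}

if __name__ == "__main__":
    for key, (extracted, known, precision, reported) in ROWS.items():
        estimate = new_true_statements(extracted, known, precision)
        print(key, round(estimate, 2), "reported:", reported)
```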
  • 16. Future Work
       ๏ Iteration 1+
       ๏ Human feedback
       ๏ Pattern generalization
       ๏ Datatype Properties
       ๏ Languages/Corpora
       ๏ Webservices
  • 17. Conclusion
       ๏ No manually created seed patterns needed
       ๏ 95.5% precision on DBpedia/Wikipedia
       ๏ Output easily integrable into the LOD Cloud
       ๏ Library of natural-language representations of formal relations, Demo
       ๏ Quasi language-independent (German/Korean)
  • 18. Thank you! Questions?
       Daniel Gerber
       Johannisgasse 26, Room 5-21
       04103 Leipzig, Germany
       SIMBA@AKSW
       http://bis.informatik.uni-leipzig.de/DanielGerber
       http://boa.aksw.org
       http://code.google.com/p/boa