SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasoning
Tobias Wunner, Unit for Natural Language Processing (UNLP), firstname.lastname@deri.org
Wednesday, 22nd June 2011, DERI Reading Group
Based on: “SOFIE: A Self-Organizing Framework for Information Extraction”. Authors: Fabian Suchanek, Mauro Sozio, Gerhard Weikum. Published: World Wide Web Conference (WWW), Madrid, 2009
Overview: Introduction; SOFIE model and rules; Excursion: satisfiability; SOFIE approach; Evaluation experiments; Conclusion
Motivation: Classical IE on text is pattern-based: about 80%. The semi-structured approach (Wikipedia infoboxes): 95%. Idea of the paper: combine both, using text (hypotheses) + ontology (trusted facts).
Example. Document 1: “Einstein attended secondary school in Germany.” YAGO ontology: familyName(AlbertEinstein, Einstein), bornIn(AlbertEinstein, Germany). New knowledge: attendedSchoolIn(AlbertEinstein, Germany)
General idea: express extraction patterns as facts, use rules to understand the usage of terms, and add restrictions.
patternOcc(“X went to school in Y”, Einstein, Switzerland)
patternOcc(Pattern, X, Y) and R(X, Y) ⇒ express(Pattern, R)
Contribution: a unified approach to pattern matching, word sense disambiguation and reasoning; large-scale; on unstructured data.
Pattern extraction with WICs: extract patterns based on ‘interesting’ entities.
Documents: “Einstein was born at Ulm in Württemberg, Germany, on March 18, 1879. When Albert was around four, his father gave him a magnetic compass. When Albert became older, he went to a school in Switzerland. After he graduated, he got a job in the patent office there.”
Knowledge base: patternOcc(“Einstein was born in Ulm”, Einstein@D1, Ulm@D1) [1]; patternOcc(“Ulm is in Württemberg, Germany”, Ulm@D1, Germany@D1) [1]; patternOcc(“Albert .. Switzerland”, Albert@D1, Switzerland@D1) [1]
WICs (words in context), e.g. Einstein@D1, Ulm@D1
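The extraction step can be sketched as follows: for every ordered pair of interesting words in a sentence, record the text between them as a pattern together with the two WICs. This is a minimal illustration, not SOFIE's extractor; the set `INTERESTING`, the function `pattern_occurrences`, and the choice to write the pattern with `X`/`Y` placeholders are assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical set of "interesting" words; in SOFIE these would be driven by
# the ontology (e.g. YAGO entity names), here they are hard-coded.
INTERESTING = {"Einstein", "Albert", "Ulm", "Germany", "Switzerland"}

def pattern_occurrences(sentence: str, doc_id: str) -> List[Tuple[str, str, str]]:
    """Return (pattern, wic_x, wic_y) for each ordered pair of interesting
    words in one sentence; the pattern is the text between the two words."""
    tokens = re.findall(r"\w+", sentence)
    hits = [(i, t) for i, t in enumerate(tokens) if t in INTERESTING]
    facts = []
    for a in range(len(hits)):
        for b in range(a + 1, len(hits)):
            (i, x), (j, y) = hits[a], hits[b]
            pattern = " ".join(["X"] + tokens[i + 1:j] + ["Y"])
            # WICs are document-specific: the same word in another document is a different WIC
            facts.append((pattern, f"{x}@{doc_id}", f"{y}@{doc_id}"))
    return facts

for fact in pattern_occurrences("Einstein was born at Ulm in Württemberg, Germany.", "D1"):
    print(fact)
```

Enumerating all pairs is quadratic in the number of interesting words per sentence, which stays small in practice.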
Grounding: to test the rules, find an instance that satisfies the formulae.
Rule: bornIn(X, Ulm) ⇒ ¬bornIn(X, Timbuktu); studiedIn(X, Ulm)
Grounded with X = Einstein: bornIn(Einstein, Ulm) ⇒ ¬bornIn(Einstein, Timbuktu); studiedIn(Einstein, Ulm)
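Grounding replaces each variable by a constant, producing one propositional instance per assignment. A minimal sketch, assuming a hypothetical representation where a literal is `(is_positive, predicate, arguments)` and the rule's variables are listed explicitly:

```python
from itertools import product
from typing import Dict, List, Tuple

# A literal is (is_positive, predicate, arguments); a rule is (premises, conclusions).
Literal = Tuple[bool, str, Tuple[str, ...]]
Rule = Tuple[List[Literal], List[Literal]]

def substitute(lit: Literal, binding: Dict[str, str]) -> Literal:
    positive, pred, args = lit
    # Arguments that are not variables (constants such as "Ulm") stay unchanged.
    return positive, pred, tuple(binding.get(a, a) for a in args)

def ground(rule: Rule, variables: List[str], constants: List[str]) -> List[Rule]:
    """Instantiate the rule once for every assignment of constants to its variables."""
    premises, conclusions = rule
    instances = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        instances.append(([substitute(l, binding) for l in premises],
                          [substitute(l, binding) for l in conclusions]))
    return instances

def show(lit: Literal) -> str:
    positive, pred, args = lit
    return ("" if positive else "¬") + f"{pred}({','.join(args)})"

rule: Rule = ([(True, "bornIn", ("X", "Ulm"))], [(False, "bornIn", ("X", "Timbuktu"))])
for premises, conclusions in ground(rule, ["X"], ["Einstein"]):
    print(" and ".join(map(show, premises)), "⇒", " and ".join(map(show, conclusions)))
# prints: bornIn(Einstein,Ulm) ⇒ ¬bornIn(Einstein,Timbuktu)
```

The number of instances grows as |constants|^|variables|, which is why grounding over a large corpus produces very large clause sets.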
Rules (hypotheses, truth value [?] still unknown): Disambiguation: disambiguatesAs(Albert@D, AlbertEinstein) [?]. Expresses a new fact: expresses(P, livedIn(Einstein, Switzerland)) [?]. New facts: CityIn(Ulm, Germany) [?]
New fact rule ... with disambiguation: “Pattern P expresses relation R when the WICs of its occurrence are disambiguated to entities X and Y such that R(X, Y) holds.”
patternOcc(P, WX, WY) and disambiguatesAs(WX, X) and disambiguatesAs(WY, Y) and R(X, Y) ⇒ express(P, R)
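Read as a database join, the rule combines pattern occurrences, disambiguations and known facts. The sketch below treats `disambiguatesAs` as an already-decided mapping, which is a simplifying assumption: in SOFIE it is a hypothesis whose truth value the MAX SAT reasoner decides. The function name `expresses` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

def expresses(pattern_occ: List[Tuple[str, str, str]],
              disambiguates_as: Dict[str, str],
              facts: Set[Tuple[str, str, str]]) -> Set[Tuple[str, str]]:
    """Derive (pattern, relation) pairs: pattern P expresses R if its WICs
    disambiguate to X and Y and R(X, Y) is a known fact."""
    derived = set()
    for pattern, wx, wy in pattern_occ:
        x, y = disambiguates_as.get(wx), disambiguates_as.get(wy)
        if x is None or y is None:
            continue  # a WIC without a disambiguation cannot support the rule
        for relation, a, b in facts:
            if (a, b) == (x, y):
                derived.add((pattern, relation))
    return derived

pattern_occ = [("X went to school in Y", "Albert@D1", "Switzerland@D1")]
disambiguates_as = {"Albert@D1": "AlbertEinstein", "Switzerland@D1": "Switzerland"}
facts = {("livedIn", "AlbertEinstein", "Switzerland")}
print(expresses(pattern_occ, disambiguates_as, facts))
```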
Restrictions: disambiguation. The disambiguation prior should influence the choice of disambiguation:
disambPrior(W, X, N) ⇒ disambiguatesAs(W, X)
where N is given by any disambiguation function, e.g. N = |words(D1) ∩ rel(AlbertEinstein)| / |words(D1)|
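The example disambiguation function is a simple overlap ratio. The sketch assumes `words(D)` is the set of lower-cased tokens of the document and that `rel(e)` is some set of words related to entity `e`; the two `rel` sets below are invented for illustration only.

```python
import re
from typing import Set

def disamb_prior(document: str, related_words: Set[str]) -> float:
    """N = |words(D) ∩ rel(e)| / |words(D)|, with words(D) taken as a set of lower-cased tokens."""
    words = set(re.findall(r"\w+", document.lower()))
    if not words:
        return 0.0
    return len(words & related_words) / len(words)

d1 = "When Albert became older he went to a school in Switzerland"
rel_albert = {"physics", "relativity", "patent", "school", "switzerland"}  # hypothetical rel(AlbertEinstein)
rel_hermann = {"engineer", "munich", "ulm", "business"}                    # hypothetical rel(HermannEinstein)

print(disamb_prior(d1, rel_albert), disamb_prior(d1, rel_hermann))
```

The higher prior for AlbertEinstein would then become the weight of the corresponding disambPrior ⇒ disambiguatesAs clause.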
Restrictions: functional restrictions.
R(X, Y) and type(R, function) and different(Y, Z) ⇒ ¬R(X, Z)
E.g. “Where was Albert@D1 born?” WICs are document-specific: Albert@D1 ≠ Albert@D2.
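For a set of facts that has already been decided, the functional restriction amounts to checking that no subject has two different objects for a functional relation. This is only a checking sketch with hypothetical names; in SOFIE the restriction is a weighted clause given to the reasoner, not a post-hoc filter.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)

def functional_violations(facts: Set[Fact], functional: Set[str]) -> Dict[Tuple[str, str], Set[str]]:
    """Return every (relation, subject) of a functional relation with more than one
    distinct object, i.e. a violation of R(X,Y) and type(R, function) and different(Y,Z) => not R(X,Z)."""
    objects: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for relation, subject, obj in facts:
        if relation in functional:
            objects[(relation, subject)].add(obj)
    return {key: objs for key, objs in objects.items() if len(objs) > 1}

facts = {("bornIn", "AlbertEinstein", "Ulm"), ("bornIn", "AlbertEinstein", "Timbuktu"),
         ("livedIn", "AlbertEinstein", "Ulm"), ("livedIn", "AlbertEinstein", "Bern")}
print(functional_violations(facts, functional={"bornIn"}))
```

livedIn is deliberately left out of the functional set, so two residences are allowed.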
SOFIE rules: a framework to test the hypotheses. Question: “How can all of them be satisfied?”
Rules: disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein); patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R); disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
Trusted facts: Country(Germany), livedIn(AlbertEinstein, Ulm)

SAT / MAX SAT
SAT (satisfiability): prove that a formula can be made TRUE.
Complexity classes: P (good), e.g. N^k steps; NP-complete (bad), best known algorithms need c^N steps. E.g. a naive algorithm for 100 variables: 2^100 rows × 10^-10 s per row ≈ 4 × 10^12 years. Not always that bad: 3SAT can be solved by a randomized algorithm in about (4/3)^N steps (details: Schöning 2010).
MAX SAT: find an assignment that satisfies as many clauses as possible.
SAT solver examples:
F = (X or Y or Z) and (¬X or Y or Z) and (¬X or ¬Y or ¬Z); its truth table has 2^3 rows
G = (X or Y) and (¬X or ¬Y) and (X)
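A brute-force check of the two example formulas makes the truth-table argument concrete. This is a minimal sketch, not a practical SAT solver: it enumerates all 2^N rows. Clauses use a DIMACS-style convention chosen for illustration, where X, Y, Z are variables 1, 2, 3 and a negative number denotes a negated literal.

```python
from itertools import product
from typing import List, Optional, Tuple

Clause = List[int]  # DIMACS style: 1 means X1, -1 means ¬X1

def satisfied(clause: Clause, assignment: Tuple[bool, ...]) -> bool:
    return any(assignment[abs(l) - 1] == (l > 0) for l in clause)

def brute_force_sat(clauses: List[Clause], n: int) -> Optional[Tuple[bool, ...]]:
    """Return the first truth-table row satisfying all clauses, or None."""
    for assignment in product([False, True], repeat=n):
        if all(satisfied(c, assignment) for c in clauses):
            return assignment
    return None

def brute_force_max_sat(clauses: List[Clause], n: int) -> Tuple[int, Tuple[bool, ...]]:
    """Return (number of satisfied clauses, assignment) for a best row."""
    return max((sum(satisfied(c, a) for c in clauses), a)
               for a in product([False, True], repeat=n))

F = [[1, 2, 3], [-1, 2, 3], [-1, -2, -3]]  # X, Y, Z = 1, 2, 3
G = [[1, 2], [-1, -2], [1]]
print(brute_force_sat(F, 3))  # (False, False, True)
print(brute_force_sat(G, 2))  # (True, False)
```

F is satisfiable (the first row found is X = false, Y = false, Z = true), and G is satisfied only by X = true, Y = false. For an unsatisfiable formula such as (X) and (¬X), `brute_force_sat` returns None while `brute_force_max_sat` still reports that one of the two clauses can be satisfied.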
Weighted MAX SAT in SOFIE: back to SOFIE, this is a MAX SAT problem, but with weights.
Rules: disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein); patternOcc(P, X, Y) and R(X, Y) ⇒ express(P, R); disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein)
Trusted facts: Country(Germany), livedIn(AlbertEinstein, Ulm)
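A toy weighted MAX SAT instance can be built from the two disambiguation rules: variable 1 stands for disambiguatesAs(Albert@D1, AlbertEinstein) with weight 10, variable 2 for disambiguatesAs(Albert@D1, HermannEinstein) with weight 3. The extra clause ¬1 ∨ ¬2 with weight 100, meaning "a word has one meaning in a document", is an assumption added for the example. Exhaustive search finds the optimum, which is only feasible for tiny instances.

```python
from itertools import product
from typing import List, Optional, Tuple

WeightedClause = Tuple[List[int], int]  # (DIMACS-style literals, weight)

def weighted_max_sat(clauses: List[WeightedClause], n: int) -> Tuple[int, Optional[Tuple[bool, ...]]]:
    """Exhaustively find an assignment maximizing the total weight of satisfied clauses."""
    best_weight, best = -1, None
    for a in product([False, True], repeat=n):
        w = sum(weight for lits, weight in clauses
                if any(a[abs(l) - 1] == (l > 0) for l in lits))
        if w > best_weight:
            best_weight, best = w, a
    return best_weight, best

toy = [([1], 10),       # disambPrior 10 => disambiguatesAs(Albert@D1, AlbertEinstein)
       ([2], 3),        # disambPrior 3  => disambiguatesAs(Albert@D1, HermannEinstein)
       ([-1, -2], 100)] # hypothetical: Albert@D1 has at most one meaning
print(weighted_max_sat(toy, 2))  # (110, (True, False))
```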
Weighted MAX SAT in SOFIE: weighted MAX SAT is NP-hard, so finding the optimal solution is impractical and only approximation algorithms are used. Example solver: Johnson’s algorithm (approximation guarantee 2/3).
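Johnson's algorithm is often presented as derandomizing a uniformly random assignment with the method of conditional expectations: fix variables one at a time, each time keeping the value with the higher expected satisfied weight. The sketch below follows that presentation, assumes no clause repeats a variable, and uses hypothetical function names; it is not SOFIE's own solver.

```python
from typing import Dict, List, Tuple

WeightedClause = Tuple[List[int], float]  # (DIMACS-style literals, weight)

def expected_weight(clauses: List[WeightedClause], fixed: Dict[int, bool]) -> float:
    """Expected satisfied weight when fixed variables keep their values
    and every other variable is True/False with probability 1/2."""
    total = 0.0
    for lits, w in clauses:
        free, sat = 0, False
        for l in lits:
            v = abs(l)
            if v in fixed:
                if fixed[v] == (l > 0):
                    sat = True
                    break
            else:
                free += 1
        if sat:
            total += w
        elif free:
            total += w * (1 - 0.5 ** free)  # clause fails only if all free literals are false
    return total

def johnson(clauses: List[WeightedClause], n: int) -> Dict[int, bool]:
    fixed: Dict[int, bool] = {}
    for v in range(1, n + 1):
        fixed[v] = True
        e_true = expected_weight(clauses, fixed)
        fixed[v] = False
        e_false = expected_weight(clauses, fixed)
        fixed[v] = e_true >= e_false
    return fixed

def satisfied_weight(clauses: List[WeightedClause], assignment: Dict[int, bool]) -> float:
    return sum(w for lits, w in clauses if any(assignment[abs(l)] == (l > 0) for l in lits))

toy = [([1], 10), ([2], 3), ([-1, -2], 100)]
a = johnson(toy, 2)
print(a, satisfied_weight(toy, a))  # {1: False, 2: True} 103
```

On this toy instance the greedy choice reaches weight 103, while the optimum is 110 (variable 1 true, variable 2 false): an approximation, but well within the 2/3 guarantee.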
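Weighted MAX SAT in SOFIE: Functional MAX SAT. Specialized reasoning with support for functional properties; approximation guarantee 1/2. Considers only unit clauses and propagates dominating unit clauses: a unit clause dominates if its weight exceeds the total weight of all clauses containing its negation.
Clauses: A ∨ B [w1], A ∨ B [w2], B ∨ C [w3], C [w4]
Example: ¬A ∨ B [10], ¬A [10], A [30] ⇒ A = true, since 30 > 10 + 10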
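The propagation step can be sketched as a loop: find a unit clause whose weight exceeds the summed weight of all clauses containing the opposite literal, make its literal true, drop the clauses it satisfies, and remove the now-false literal from the rest (which can create new unit clauses). This is a hypothetical sketch of that step only, on the example ¬A ∨ B [10], ¬A [10], A [30] with A = 1 and B = 2; it does not reproduce the rest of SOFIE's Functional MAX SAT algorithm.

```python
from typing import Dict, List, Tuple

WeightedClause = Tuple[List[int], int]  # (DIMACS-style literals, weight)

def propagate_dominating_units(clauses: List[WeightedClause]) -> Tuple[Dict[int, bool], List[WeightedClause]]:
    clauses = [(list(lits), w) for lits, w in clauses]
    assignment: Dict[int, bool] = {}
    changed = True
    while changed:
        changed = False
        for lits, w in clauses:
            if len(lits) != 1:
                continue
            lit = lits[0]
            opposing = sum(w2 for l2, w2 in clauses if -lit in l2)
            if w > opposing:  # dominating unit clause
                assignment[abs(lit)] = lit > 0
                # Drop clauses satisfied by lit; strip the now-false literal -lit.
                # An empty clause left behind is a clause that can no longer be satisfied.
                clauses = [([l for l in l2 if l != -lit], w2)
                           for l2, w2 in clauses if lit not in l2]
                changed = True
                break
    return assignment, clauses

example = [([-1, 2], 10), ([-1], 10), ([1], 30)]
print(propagate_dominating_units(example))  # ({1: True, 2: True}, [([], 10)])
```

After A = true, the clause ¬A ∨ B shrinks to the unit clause B [10], which now dominates (nothing contains ¬B), so B is set as well; the clause ¬A [10] is the weight given up.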
Controlled experiment: corpus of 100 Wikipedia articles with infoboxes. The ground truth is known!
Controlled experiment, large-scale: corpus of 2000 Wikipedia articles; 13 frequent relations from YAGO. Parsing = 87 min, reasoning = 77 min.
Unstructured text sources: 150 newspaper articles. Relation under test: headquarterOf (a functional relation). YAGO, modified with relation seeds. Parsing 87 min, weighted MAX SAT 77 min. The disambiguated entries (provenance) could be assessed manually.
Unstructured text sources, large-scale: 10 biographies for each of 400 US senators; 5 relationships. Disambiguation with YAGO was not ideal (e.g. 13 different James Watsons). Parsing 7 h, weighted MAX SAT 9 h. Results: 4 good, 1 bad (misleading patterns).
MAX SAT can’t handle OWL per se (OWL makes the open-world assumption). Approach: reformulate OWL in propositional logic: OWL → FOL → Skolem normal form → propositional logic. Caveat: this might yield OWL-inconsistent ontologies because of the open-world assumption.
Example: define Student as a subclass of “attends some Course”:
⇒ ∀x: Student(x) → ∃y: attends(x, y) ∧ Course(y)
⇒ Skolem normal form: ∀x: Student(x) → attends(x, k(x)) ∧ Course(k(x))
⇒ grounded over the individuals x1 .. xn with ki = k(xi): (¬Student(xi) ∨ attends(xi, ki)) ∧ (¬Student(xi) ∨ Course(ki))
Inferred ontology: { Student(alex), Student(bob), Student subClassOf attends some Course, attends(alex, SemanticWeb) } (details: JMC 2010)
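For the axiom Student subClassOf (attends some Course), the Skolemized formula can be grounded into propositional clauses by introducing one fresh Skolem constant per individual. A minimal sketch; the function name and the `k_<name>` naming of Skolem constants are assumptions for illustration.

```python
from typing import List, Tuple

Literal = Tuple[bool, str]  # (is_positive, ground atom written as a string)
Clause = List[Literal]

def ground_student_axiom(individuals: List[str]) -> List[Clause]:
    """Ground  forall x: Student(x) -> attends(x, k(x)) and Course(k(x))
    into two propositional clauses per individual."""
    clauses: List[Clause] = []
    for x in individuals:
        k = f"k_{x}"  # fresh Skolem constant standing for "the course that x attends"
        clauses.append([(False, f"Student({x})"), (True, f"attends({x},{k})")])
        clauses.append([(False, f"Student({x})"), (True, f"Course({k})")])
    return clauses

def show(clause: Clause) -> str:
    return " ∨ ".join(("" if pos else "¬") + atom for pos, atom in clause)

for c in ground_student_axiom(["alex", "bob"]):
    print(show(c))
```

With Skolem constants, Student(bob) does not force a known course for bob: setting attends(bob,k_bob) and Course(k_bob) to true satisfies his clauses, mirroring OWL's open-world reading.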
Conclusions: ontology-based IE (OBIE) reformulated as a weighted MAX SAT problem; approximation algorithm with guarantee 1/2; works and scales (large corpus + YAGO).
Limitations: a specialized approximation algorithm that accounts for SOFIE rules, NOT for OWL; MAX SAT restrictions are expressed in propositional logic, not in first-order logic; an ontology population approach (cannot infer new relations).
References
F. Suchanek, M. Sozio, G. Weikum: SOFIE: A Self-Organizing Framework for Information Extraction. Proceedings of the 18th International Conference on World Wide Web (WWW '09), 2009.
J. McCrae: Automatic Extraction of Logically Consistent Ontologies from Text. PhD thesis, National Institute of Informatics, Japan, 2009.
U. Schöning: Das SAT-Problem. Informatik Spektrum 33(5): 479-483, 2010.
F. Suchanek: Automated Construction and Growth of a Large Ontology. PhD thesis, Saarland University, Saarbrücken, Germany, 2009.

Weitere Àhnliche Inhalte

Was ist angesagt?

Gamma sag semi ti spaces in topological spaces
 Gamma sag semi ti spaces in topological spaces Gamma sag semi ti spaces in topological spaces
Gamma sag semi ti spaces in topological spacesAlexander Decker
 
11. gamma sag semi ti spaces in topological spaces
11. gamma sag semi ti spaces in topological spaces11. gamma sag semi ti spaces in topological spaces
11. gamma sag semi ti spaces in topological spacesAlexander Decker
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Umberto Picchini
 
Absolute and Relative Clustering
Absolute and Relative ClusteringAbsolute and Relative Clustering
Absolute and Relative ClusteringToshihiro Kamishima
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GANSEMINARGROOT
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Umberto Picchini
 
Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4Arthur Charpentier
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsChristian Robert
 
Note on closed sets in topological spaces
Note on    closed sets in topological spacesNote on    closed sets in topological spaces
Note on closed sets in topological spacesAlexander Decker
 
Lecture 2 predicates quantifiers and rules of inference
Lecture 2 predicates quantifiers and rules of inferenceLecture 2 predicates quantifiers and rules of inference
Lecture 2 predicates quantifiers and rules of inferenceasimnawaz54
 
Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Umberto Picchini
 
An Overview of Separation Axioms by Nearly Open Sets in Topology.
An Overview of Separation Axioms by Nearly Open Sets in Topology.An Overview of Separation Axioms by Nearly Open Sets in Topology.
An Overview of Separation Axioms by Nearly Open Sets in Topology.IJERA Editor
 
MarkDrachMeinelThesisFinal
MarkDrachMeinelThesisFinalMarkDrachMeinelThesisFinal
MarkDrachMeinelThesisFinalMark Drach-Meinel
 
Predicates and Quantifiers
Predicates and Quantifiers Predicates and Quantifiers
Predicates and Quantifiers Istiak Ahmed
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Christian Robert
 
My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...Umberto Picchini
 
CONTINUITY ON N-ARY SPACES
CONTINUITY ON N-ARY SPACESCONTINUITY ON N-ARY SPACES
CONTINUITY ON N-ARY SPACESIAEME Publication
 

Was ist angesagt? (20)

Gamma sag semi ti spaces in topological spaces
 Gamma sag semi ti spaces in topological spaces Gamma sag semi ti spaces in topological spaces
Gamma sag semi ti spaces in topological spaces
 
11. gamma sag semi ti spaces in topological spaces
11. gamma sag semi ti spaces in topological spaces11. gamma sag semi ti spaces in topological spaces
11. gamma sag semi ti spaces in topological spaces
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)
 
Absolute and Relative Clustering
Absolute and Relative ClusteringAbsolute and Relative Clustering
Absolute and Relative Clustering
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...
 
Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: Models
 
Note on closed sets in topological spaces
Note on    closed sets in topological spacesNote on    closed sets in topological spaces
Note on closed sets in topological spaces
 
Lecture 2 predicates quantifiers and rules of inference
Lecture 2 predicates quantifiers and rules of inferenceLecture 2 predicates quantifiers and rules of inference
Lecture 2 predicates quantifiers and rules of inference
 
Verification of Data-Aware Processes at ESSLLI 2017 3/6 - Verification Logics
Verification of Data-Aware Processes at ESSLLI 2017 3/6 - Verification LogicsVerification of Data-Aware Processes at ESSLLI 2017 3/6 - Verification Logics
Verification of Data-Aware Processes at ESSLLI 2017 3/6 - Verification Logics
 
Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...
 
An Overview of Separation Axioms by Nearly Open Sets in Topology.
An Overview of Separation Axioms by Nearly Open Sets in Topology.An Overview of Separation Axioms by Nearly Open Sets in Topology.
An Overview of Separation Axioms by Nearly Open Sets in Topology.
 
MarkDrachMeinelThesisFinal
MarkDrachMeinelThesisFinalMarkDrachMeinelThesisFinal
MarkDrachMeinelThesisFinal
 
QMC: Operator Splitting Workshop, Composite Infimal Convolutions - Zev Woodst...
QMC: Operator Splitting Workshop, Composite Infimal Convolutions - Zev Woodst...QMC: Operator Splitting Workshop, Composite Infimal Convolutions - Zev Woodst...
QMC: Operator Splitting Workshop, Composite Infimal Convolutions - Zev Woodst...
 
Predicates and Quantifiers
Predicates and Quantifiers Predicates and Quantifiers
Predicates and Quantifiers
 
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
Statistics (1): estimation Chapter 3: likelihood function and likelihood esti...
 
My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...
 
CONTINUITY ON N-ARY SPACES
CONTINUITY ON N-ARY SPACESCONTINUITY ON N-ARY SPACES
CONTINUITY ON N-ARY SPACES
 
SASA 2016
SASA 2016SASA 2016
SASA 2016
 

Ähnlich wie SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasonig

Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...
Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...
Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...Cristiano Longo
 
20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrison20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrisonComputer Science Club
 
Cuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An IntroductionCuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An IntroductionXin-She Yang
 
Hidden Markov Models
Hidden Markov ModelsHidden Markov Models
Hidden Markov ModelsMinesh A. Jethva
 
PAGOdA paper
PAGOdA paperPAGOdA paper
PAGOdA paperDBOnto
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
Mechanizing set theory: cardinal arithmetic and the axiom of choice
Mechanizing set theory: cardinal arithmetic and the axiom of choiceMechanizing set theory: cardinal arithmetic and the axiom of choice
Mechanizing set theory: cardinal arithmetic and the axiom of choiceLawrence Paulson
 
FOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptxFOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptxFaiz Zeya
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbolsAxel de Romblay
 
Fosdem 2013 petra selmer flexible querying of graph data
Fosdem 2013 petra selmer   flexible querying of graph dataFosdem 2013 petra selmer   flexible querying of graph data
Fosdem 2013 petra selmer flexible querying of graph dataPetra Selmer
 
Poggi analytics - star - 1a
Poggi   analytics - star - 1aPoggi   analytics - star - 1a
Poggi analytics - star - 1aGaston Liberman
 
Jarrar.lecture notes.aai.2011s.ch7.p logic
Jarrar.lecture notes.aai.2011s.ch7.p logicJarrar.lecture notes.aai.2011s.ch7.p logic
Jarrar.lecture notes.aai.2011s.ch7.p logicPalGov
 
RuleML 2015
RuleML 2015RuleML 2015
RuleML 2015livpre
 
An Implicit Cover Problem In Wild Population Study
An Implicit Cover Problem In Wild Population StudyAn Implicit Cover Problem In Wild Population Study
An Implicit Cover Problem In Wild Population StudyMichele Thomas
 
Introduction to set theory by william a r weiss professor
Introduction to set theory by william a r weiss professorIntroduction to set theory by william a r weiss professor
Introduction to set theory by william a r weiss professormanrak
 
Imprecision in learning: an overview
Imprecision in learning: an overviewImprecision in learning: an overview
Imprecision in learning: an overviewSebastien Destercke
 
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.Anirbit Mukherjee
 
Theory of computing
Theory of computingTheory of computing
Theory of computingBipul Roy Bpl
 

Ähnlich wie SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasonig (20)

Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...
Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...
Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo...
 
20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrison20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrison
 
Cuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An IntroductionCuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An Introduction
 
Fol
FolFol
Fol
 
Hidden Markov Models
Hidden Markov ModelsHidden Markov Models
Hidden Markov Models
 
PAGOdA paper
PAGOdA paperPAGOdA paper
PAGOdA paper
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
Mechanizing set theory: cardinal arithmetic and the axiom of choice
Mechanizing set theory: cardinal arithmetic and the axiom of choiceMechanizing set theory: cardinal arithmetic and the axiom of choice
Mechanizing set theory: cardinal arithmetic and the axiom of choice
 
FOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptxFOLBUKCFAIZ.pptx
FOLBUKCFAIZ.pptx
 
dma_ppt.pdf
dma_ppt.pdfdma_ppt.pdf
dma_ppt.pdf
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbols
 
Fosdem 2013 petra selmer flexible querying of graph data
Fosdem 2013 petra selmer   flexible querying of graph dataFosdem 2013 petra selmer   flexible querying of graph data
Fosdem 2013 petra selmer flexible querying of graph data
 
Poggi analytics - star - 1a
Poggi   analytics - star - 1aPoggi   analytics - star - 1a
Poggi analytics - star - 1a
 
Jarrar.lecture notes.aai.2011s.ch7.p logic
Jarrar.lecture notes.aai.2011s.ch7.p logicJarrar.lecture notes.aai.2011s.ch7.p logic
Jarrar.lecture notes.aai.2011s.ch7.p logic
 
RuleML 2015
RuleML 2015RuleML 2015
RuleML 2015
 
An Implicit Cover Problem In Wild Population Study
An Implicit Cover Problem In Wild Population StudyAn Implicit Cover Problem In Wild Population Study
An Implicit Cover Problem In Wild Population Study
 
Introduction to set theory by william a r weiss professor
Introduction to set theory by william a r weiss professorIntroduction to set theory by william a r weiss professor
Introduction to set theory by william a r weiss professor
 
Imprecision in learning: an overview
Imprecision in learning: an overviewImprecision in learning: an overview
Imprecision in learning: an overview
 
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems.
 
Theory of computing
Theory of computingTheory of computing
Theory of computing
 

KĂŒrzlich hochgeladen

How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 

KĂŒrzlich hochgeladen (20)

How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỏ TUYỂN SINH TIáșŸNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 

SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasonig

  • 1. SOFIE - A Unified Approach To Ontology-Based Information Extraction Using Reasonig Tobias Wunner Unit for Natural Language Processing (UNLP) firstname.lastname@deri.org Wednesday,22nd June, 2011 DERI, Reading Group 1
  • 2. Based On: “SOFIE: A Self-Organizing Framework for Information Extraction” Authors: Fabian Suchanek, Mauro Sozio, Gerhard Weikum Published: World Wide Web Conference (WWW) Madrid, 2009 2
  • 3. Overview Introduction SOFIE Model + Rules Excursion: Satisfiability SOFIE Approach Evaluation experiments Conclusion 3
  • 4. Motivation Classical IE on text pattern-based  80pc Semistructural approach Wikipedia infoboxes 95% Idea of Paper: combine use text (hypotheses) + ontology (trusted facts) 4
  • 5. Example: Document1: “Einstein attended secondary school in Germany.” YAGO ontology: familyName(AlbertEinstein, Einstein), bornIn(AlbertEinstein, Germany). New knowledge: attendedSchoolIn(AlbertEinstein, Germany).
  • 6. General idea: express extraction patterns as facts, e.g. patternOcc(“X went to school in Y”, Einstein, Switzerland); use rules to understand the usage of terms, e.g. patternOcc(Pattern, X, Y) and R(X, Y) ⇒ expresses(Pattern, R); add restrictions.
  • 7. Contribution: a unified approach to pattern matching, word sense disambiguation and reasoning, at large scale, on unstructured data.
  • 8. Pattern extraction with WICs: extract patterns based on ‘interesting’ entities. Documents: “Einstein was born at Ulm in Württemberg, Germany, on March 18, 1879. When Albert was around four, his father gave him a magnetic compass. When Albert became older, he went to a school in Switzerland. After he graduated, he got a job in the patent office there.”
 Knowledge base: patternOcc(“Einstein was born in Ulm”, Einstein@D1, Ulm@D1) [1]; patternOcc(“Ulm is in Württemberg, Germany”, Ulm@D1, Germany@D1) [1]; patternOcc(“Albert .. Switzerland”, Albert@D1, Switzerland@D1) [1]. WICs (words in context): Einstein@D1 is the word “Einstein” as it occurs in document D1.
  • 9. Grounding: test the rules. How? Replace the variables by constants, then find an instance that satisfies the formulae. Rule: bornIn(X,Ulm) ⇒ ¬bornIn(X,Timbuktu); studiedIn(X,Ulm). Grounded with X = Einstein: bornIn(Einstein,Ulm) ⇒ ¬bornIn(Einstein,Timbuktu); studiedIn(Einstein,Ulm).
  • 10. Rules (hypotheses): Disambiguation: disambiguatesAs(Albert@D, AlbertEinstein) [?]. Expressing a new fact: expresses(P, livedIn(Einstein, Switzerland)) [?]. New facts: CityIn(Ulm, Germany) [?]. ([?] marks a hypothesis whose truth value is still to be decided.)
  • 11. New-fact rule ...with disambiguation: “Pattern P expresses relation R when the WICs of its occurrence are disambiguated to entities X and Y with R(X, Y)”: patternOcc(P, WX, WY) and disambiguatesAs(WX, X) and disambiguatesAs(WY, Y) and R(X, Y) ⇒ expresses(P, R)
  • 12. Restrictions: disambiguation. The disambiguation prior should influence the choice of disambiguation: disambPrior(W, X, N) ⇒ disambiguatesAs(W, X), where N can come from any disambiguation function, e.g. N = | words(D1) ∩ rel(AlbertEinstein) | / | words(D1) |.
  • 13. Restrictions: functional restrictions: R(X,Y) and type(R, function) and different(Y,Z) ⇒ ¬R(X,Z). Example: “Albert@D1 born in?”; note that Albert@D1 ≠ Albert@D2 (distinct words in context).
  • 14. SOFIE rules: a framework to test the hypotheses. Question: “How to satisfy all of them?” Rules: disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein); disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein); patternOcc(P, X, Y) and R(X, Y) ⇒ expresses(P, R). Trusted facts: Country(Germany), livedIn(AlbertEinstein, Ulm).

  • 15. SAT / MAX SAT. SAT (satisfiability): decide whether a formula can be made TRUE, e.g. F = (X or Y or Z) and (¬X or Y or Z) and (¬X or ¬Y or ¬Z), or G = (X or Y) and (¬X or ¬Y) and (X); the truth table for F has 2^3 = 8 rows. Complexity classes: P is good (e.g. N^k steps); NP is bad (the best known algorithms for NP-complete problems such as SAT need about c^N steps). E.g. a naive algorithm for 100 variables: 2^100 rows × 10^-10 s per row ≈ 4 × 10^12 years. Not always that bad: 3-SAT can be solved in about (4/3)^N steps. SAT solvers. Details: Schöning 2010.
  • 16. SAT / MAX SAT: same content as slide 15, plus MAX SAT: find an assignment that satisfies as many clauses as possible. Details: Schöning 2010.
  • 17. Weighted MAX SAT in SOFIE: back to SOFIE, this is MAX SAT but with weights. Rules + trusted facts: disambPrior(Albert@D1, AlbertEinstein, 10) ⇒ disambiguatesAs(Albert@D1, AlbertEinstein); disambPrior(Albert@D1, HermannEinstein, 3) ⇒ disambiguatesAs(Albert@D1, HermannEinstein); patternOcc(P, X, Y) and R(X, Y) ⇒ expresses(P, R); Country(Germany); livedIn(AlbertEinstein, Ulm).
  • 18. Weighted MAX SAT in SOFIE: weighted MAX SAT is NP-hard, so finding the optimal solution is impractical and only approximation algorithms are used. Example: Johnson's algorithm, approximation guarantee 2/3.
  • 19. Weighted MAX SAT in SOFIE: Functional MAX SAT. Specialized reasoning (support for functional properties); approximation guarantee 1/2; propagates dominating unit clauses (considers only unit clauses). Clauses: A v B [w1], A v B [w2], B v C [w3], C [w4]. Example: ¬A v B [10], ¬A [10], A [30] ⇒ A = true, since 30 > 10 + 10.
  • 20. Controlled experiment: corpus from Wikipedia infoboxes, 100 articles; the semantics is known!
  • 21. Controlled experiment, large scale: corpus of 2,000 Wikipedia articles; 13 frequent relations from YAGO. Parsing: 87 min; reasoning: 77 min.
  • 22. Unstructured text sources: 150 newspaper articles; relation under test: headquarterOf (a functional relation); YAGO modified with relation seeds. Parsing: 87 min; Weighted MAX SAT: 77 min. Disambiguated entries (provenance) could be manually assessed.
  • 23. Unstructured text sources, large scale: 10 biographies for each of 400 US senators; 5 relations. Disambiguation was not ideal for YAGO (13 entities named James Watson). Parsing: 7 h; W-MAX-SAT: 9 h. Results: 4 relations good, 1 bad (misleading patterns).
  • 24. MAX SAT can’t do OWL per se (Open World Assumption). Reformulate OWL in propositional logic: OWL → FOL → Skolem normal form → propositional logic. Might find OWL-inconsistent ontologies due to the Open World Assumption. Example: define Student as a subclass of “attends some Course”: ∀x: Student(x) → ∃y: attends(x,y) and Course(y) ⇒ ∀x: Student(x) → attends(x,f(x)) and Course(f(x)) (Skolem function f) ⇒ (¬Student(x_i) or attends(x_i,k_i)) and (¬Student(x_i) or Course(k_i)) for each individual x_i, with k_i = f(x_i), i = 1..n. Inferred ontology: { Student(alex), Student(bob), Student subClassOf attends some Course, attends(alex, SemanticWeb) }. Details: JMC 2010.
  • 25. Conclusions: ontology-based IE (OBIE) reformulated as a weighted MAX SAT problem; approximation algorithm with guarantee 1/2; works and scales (large corpus + YAGO).
  • 26. Limitations: specialized approximation algorithm that accounts for the SOFIE rules, NOT for OWL; MAX SAT restrictions ∈ propositional logic, ∉ first-order logic; ontology population approach (can't infer new relations).
  • 27. References: F. Suchanek, M. Sozio, G. Weikum: SOFIE: A Self-Organizing Framework for Information Extraction. In Proceedings of the 18th International Conference on World Wide Web (WWW '09), 2009. J. McCrae: Automatic Extraction of Logically Consistent Ontologies from Text. PhD thesis, National Institute of Informatics, Japan, 2009. U. Schöning: Das SAT-Problem (The SAT Problem). Informatik Spektrum 33(5): 479-483, 2010. F. Suchanek: Automated Construction and Growth of a Large Ontology. PhD thesis, Saarland University, Saarbrücken, Germany, 2009.
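The rule from slide 6, patternOcc(Pattern, X, Y) and R(X, Y) ⇒ expresses(Pattern, R), can be illustrated with a small Python sketch. It is not SOFIE's implementation: facts are assumed to be plain (relation, x, y) tuples, and the function derive_expresses and the sample fact are hypothetical.

```python
# Hypothetical sketch (not SOFIE's code): apply the rule
#   patternOcc(Pattern, X, Y) and R(X, Y) => expresses(Pattern, R)
# to pattern occurrences and trusted facts stored as plain tuples.
from collections import defaultdict

PatternOcc = tuple[str, str, str]  # (pattern, x, y)
Fact = tuple[str, str, str]        # (relation, x, y)


def derive_expresses(occurrences: list[PatternOcc], facts: set[Fact]) -> set[tuple[str, str]]:
    """Return every (pattern, relation) pair the rule licenses."""
    # Index the trusted facts by their argument pair for quick lookup.
    relations_by_pair = defaultdict(set)
    for relation, x, y in facts:
        relations_by_pair[(x, y)].add(relation)

    derived = set()
    for pattern, x, y in occurrences:
        # The rule fires for every relation R with R(x, y) among the trusted facts.
        for relation in relations_by_pair.get((x, y), ()):
            derived.add((pattern, relation))
    return derived


occurrences = [("X went to school in Y", "Einstein", "Switzerland")]
facts = {("attendedSchoolIn", "Einstein", "Switzerland")}  # illustrative fact
print(derive_expresses(occurrences, facts))
# {('X went to school in Y', 'attendedSchoolIn')}
```

Indexing facts by argument pair keeps the join linear in the number of occurrences rather than comparing every occurrence with every fact.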
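Slide 8's pattern extraction with WICs can be approximated as follows. This is a simplified sketch under stated assumptions, not SOFIE's extractor: sentences are split at ., ! or ?, tokens are runs of word characters, a pattern is the text between two consecutive “interesting” words (with X and Y standing for them), and a WIC is written word@doc_id as on the slide. The function name extract_pattern_occurrences is hypothetical.

```python
# Simplified sketch of pattern extraction with words in context (WICs).
# Assumptions (not from SOFIE): naive sentence splitting and \w+ tokens;
# a pattern is "X <words in between> Y" for consecutive interesting words.
import re


def extract_pattern_occurrences(doc_id: str, text: str, interesting: set[str]) -> list[tuple[str, str, str]]:
    """Return (pattern, wic_x, wic_y) triples, i.e. patternOcc facts."""
    occurrences = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"\w+", sentence)
        # Positions of the interesting words within this sentence.
        hits = [i for i, token in enumerate(tokens) if token in interesting]
        for i, j in zip(hits, hits[1:]):
            middle = " ".join(tokens[i + 1 : j])
            occurrences.append((f"X {middle} Y", f"{tokens[i]}@{doc_id}", f"{tokens[j]}@{doc_id}"))
    return occurrences


text = ("Einstein was born at Ulm in Württemberg, Germany, on March 18, 1879. "
        "When Albert became older, he went to a school in Switzerland.")
interesting = {"Einstein", "Ulm", "Germany", "Albert", "Switzerland"}
for occ in extract_pattern_occurrences("D1", text, interesting):
    print(occ)
# ('X was born at Y', 'Einstein@D1', 'Ulm@D1')
# ('X in Württemberg Y', 'Ulm@D1', 'Germany@D1')
# ('X became older he went to a school in Y', 'Albert@D1', 'Switzerland@D1')
```

Because each WIC carries its document id, Albert@D1 and Albert@D2 stay distinct and can later be disambiguated independently.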
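Grounding (slide 9) means substituting constants for the variables of a rule. A minimal sketch, assuming that single upper-case letters such as X are variables and that a rule is written as a clause of (negated, predicate, arguments) literals; bornIn(X,Ulm) ⇒ ¬bornIn(X,Timbuktu) becomes the clause ¬bornIn(X,Ulm) ∨ ¬bornIn(X,Timbuktu). All function names are hypothetical.

```python
# Sketch of grounding: substitute constants for variables in a clause.
# Assumption: single upper-case letters such as "X" are variables.
from itertools import product

Literal = tuple[bool, str, tuple[str, ...]]  # (negated, predicate, arguments)


def is_variable(term: str) -> bool:
    return len(term) == 1 and term.isupper()


def substitute(literal: Literal, binding: dict[str, str]) -> Literal:
    negated, predicate, args = literal
    return negated, predicate, tuple(binding.get(t, t) for t in args)


def ground(clause: list[Literal], constants: list[str]) -> list[list[Literal]]:
    """One grounded copy of the clause per assignment of constants to its variables."""
    variables = sorted({t for _, _, args in clause for t in args if is_variable(t)})
    grounded = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded.append([substitute(lit, binding) for lit in clause])
    return grounded


def show(literal: Literal) -> str:
    negated, predicate, args = literal
    return ("¬" if negated else "") + f"{predicate}({','.join(args)})"


# bornIn(X,Ulm) ⇒ ¬bornIn(X,Timbuktu), written as ¬bornIn(X,Ulm) ∨ ¬bornIn(X,Timbuktu)
rule = [(True, "bornIn", ("X", "Ulm")), (True, "bornIn", ("X", "Timbuktu"))]
for clause in ground(rule, ["Einstein"]):
    print(" ∨ ".join(show(lit) for lit in clause))
# ¬bornIn(Einstein,Ulm) ∨ ¬bornIn(Einstein,Timbuktu)
```

The number of grounded clauses grows as |constants|^|variables|, which is why grounding a large corpus produces a very large propositional problem.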
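Slide 11's rule adds disambiguation between the words of a pattern occurrence and the entities of the ontology. The sketch below fixes one chosen entity per WIC in a dictionary purely for illustration; in SOFIE, disambiguatesAs is itself a hypothesis ([?] on slide 10) that the reasoner decides. The function name and the sample data are hypothetical.

```python
# Sketch of the rule, assuming one chosen entity per WIC:
#   patternOcc(P, WX, WY) and disambiguatesAs(WX, X)
#   and disambiguatesAs(WY, Y) and R(X, Y) => expresses(P, R)


def derive_expresses_disambiguated(
    occurrences: list[tuple[str, str, str]],  # (pattern, wic_x, wic_y)
    disambiguation: dict[str, str],           # wic -> entity (hypothetical choice)
    facts: set[tuple[str, str, str]],         # (relation, x, y)
) -> set[tuple[str, str]]:
    derived = set()
    for pattern, wic_x, wic_y in occurrences:
        x, y = disambiguation.get(wic_x), disambiguation.get(wic_y)
        if x is None or y is None:
            continue  # without an entity for both WICs the rule cannot fire
        for relation, fx, fy in facts:
            if (fx, fy) == (x, y):
                derived.add((pattern, relation))
    return derived


occurrences = [("X was born in Y", "Einstein@D1", "Ulm@D1")]
disambiguation = {"Einstein@D1": "AlbertEinstein", "Ulm@D1": "Ulm"}
facts = {("bornIn", "AlbertEinstein", "Ulm")}  # illustrative trusted fact
print(derive_expresses_disambiguated(occurrences, disambiguation, facts))
# {('X was born in Y', 'bornIn')}
```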
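The disambiguation prior on slide 12, N = | words(D1) ∩ rel(AlbertEinstein) | / | words(D1) |, is a simple overlap ratio. This sketch assumes words(D) is the set of lower-cased word tokens of the document and that rel(entity) is a given set of related words; the related-word sets below are invented for illustration only.

```python
# Sketch of the prior N = |words(D) ∩ rel(entity)| / |words(D)|.
# Assumptions: words(D) = set of lower-cased \w+ tokens; rel(entity) is a
# hand-written set of related words (hypothetical data).
import re


def disamb_prior(document: str, related_words: set[str]) -> float:
    words = set(re.findall(r"\w+", document.lower()))
    if not words:
        return 0.0  # avoid division by zero on an empty document
    return len(words & related_words) / len(words)


doc = ("When Albert became older, he went to a school in Switzerland. "
       "After he graduated, he got a job in the patent office there.")
rel_albert_einstein = {"relativity", "physics", "switzerland", "patent"}  # hypothetical
rel_hermann_einstein = {"ulm", "business", "electrical"}                 # hypothetical

print(disamb_prior(doc, rel_albert_einstein))   # 2 shared words out of 19
print(disamb_prior(doc, rel_hermann_einstein))  # 0.0
```

The resulting values would then serve as the N in disambPrior(W, X, N), so that Albert@D1 leans towards AlbertEinstein.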
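The functional restriction on slide 13, R(X,Y) and type(R, function) and different(Y,Z) ⇒ ¬R(X,Z), can be checked over a set of candidate facts. In this sketch it only reports conflicts rather than resolving them (resolution is the reasoner's job); the candidate facts and the function name are illustrative.

```python
# Sketch: report violations of R(X,Y) and type(R, function) and
# different(Y,Z) => ¬R(X,Z) among candidate facts (illustrative data).
from collections import defaultdict


def functional_conflicts(
    facts: set[tuple[str, str, str]], functional_relations: set[str]
) -> dict[tuple[str, str], set[str]]:
    """Map (R, X) to its set of Y values whenever a functional R has several."""
    values = defaultdict(set)
    for relation, x, y in facts:
        if relation in functional_relations:
            values[(relation, x)].add(y)
    return {key: ys for key, ys in values.items() if len(ys) > 1}


candidates = {
    ("bornIn", "AlbertEinstein", "Ulm"),
    ("bornIn", "AlbertEinstein", "Timbuktu"),
    ("livedIn", "AlbertEinstein", "Ulm"),
    ("livedIn", "AlbertEinstein", "Switzerland"),
}
print(functional_conflicts(candidates, functional_relations={"bornIn"}))
```

Only bornIn is reported: livedIn is not declared functional, so two places of residence do not clash.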
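The formulas F and G from slides 15 and 16 can be checked with the naive truth-table algorithm the slides describe, which visits all 2^N rows. The sketch assumes clauses are lists of (variable, polarity) pairs, with polarity False for a negated literal; H is an extra unsatisfiable formula added here only to show MAX SAT doing something SAT cannot.

```python
# Brute-force SAT and MAX SAT over the full truth table (2^N rows).
# Clause format (an assumption): list of (variable, polarity); False = negated.
from itertools import product

F = [[("X", True), ("Y", True), ("Z", True)],
     [("X", False), ("Y", True), ("Z", True)],
     [("X", False), ("Y", False), ("Z", False)]]
G = [[("X", True), ("Y", True)],
     [("X", False), ("Y", False)],
     [("X", True)]]
H = [[("X", True)], [("X", False)]]  # extra example: unsatisfiable


def satisfied(clause, assignment):
    return any(assignment[var] == polarity for var, polarity in clause)


def truth_table(formula):
    variables = sorted({var for clause in formula for var, _ in clause})
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))


def solve_sat(formula):
    """First satisfying assignment in truth-table order, or None."""
    for assignment in truth_table(formula):
        if all(satisfied(clause, assignment) for clause in formula):
            return assignment
    return None


def solve_max_sat(formula):
    """(assignment, number of satisfied clauses) with the largest count."""
    return max(
        ((a, sum(satisfied(c, a) for c in formula)) for a in truth_table(formula)),
        key=lambda pair: pair[1],
    )


print(solve_sat(F))         # {'X': False, 'Y': False, 'Z': True}
print(solve_sat(G))         # {'X': True, 'Y': False}
print(solve_sat(H))         # None
print(solve_max_sat(H)[1])  # 1
```

This is exactly the exponential enumeration the slide warns about; real SAT solvers prune the search instead of visiting every row.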
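Slide 17's weighted MAX SAT view can be made concrete on a tiny instance built from the slide's two priors (weights 10 and 3). Assumptions: the disambPrior facts are trusted, so each prior rule reduces to a weighted unit clause, and a weight-100 clause stands in for a hypothetical “a WIC has only one meaning” constraint. Brute force is only feasible because the instance is tiny.

```python
# Brute-force weighted MAX SAT for a tiny SOFIE-style instance.
# Weights 10 and 3 come from the slide; the 100-weight clause is a
# hypothetical stand-in for a functional "one meaning per WIC" constraint.
from itertools import product

A = "disambiguatesAs(Albert@D1,AlbertEinstein)"
H = "disambiguatesAs(Albert@D1,HermannEinstein)"

clauses = [
    (10, [(A, True)]),
    (3, [(H, True)]),
    (100, [(A, False), (H, False)]),
]


def satisfied_weight(clauses, assignment):
    return sum(w for w, lits in clauses if any(assignment[v] == pol for v, pol in lits))


def brute_force_weighted_max_sat(clauses):
    variables = sorted({v for _, lits in clauses for v, _ in lits})
    best_assignment, best_weight = None, -1
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = satisfied_weight(clauses, assignment)
        if weight > best_weight:
            best_assignment, best_weight = assignment, weight
    return best_assignment, best_weight


assignment, weight = brute_force_weighted_max_sat(clauses)
print(assignment[A], assignment[H], weight)  # True False 110
```

The heavier prior wins: choosing AlbertEinstein satisfies 10 + 100, while HermannEinstein would only reach 3 + 100.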
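Johnson's algorithm (slide 18) can be written as the method of conditional expectations over a uniformly random assignment: each open clause with k unassigned literals contributes w / 2^k, and each variable is fixed to the side with the larger total. This is a sketch; the variable order and tie-breaking are assumptions. The simple argument below guarantees at least half of the total weight; the slide's 2/3 guarantee rests on a finer analysis not reproduced here.

```python
# Sketch of Johnson's greedy algorithm for weighted MAX SAT.
# Clause format (an assumption): (weight, [(variable, polarity), ...]).


def satisfied_weight(clauses, assignment):
    return sum(w for w, lits in clauses if any(assignment[v] == pol for v, pol in lits))


def johnson_max_sat(clauses, variables=None):
    if variables is None:
        variables = sorted({v for _, lits in clauses for v, _ in lits})
    open_clauses = [(w, list(lits)) for w, lits in clauses]
    assignment = {}
    for var in variables:
        # An open clause with k unassigned literals contributes w / 2**k.
        score_true = sum(w / 2 ** len(lits) for w, lits in open_clauses if (var, True) in lits)
        score_false = sum(w / 2 ** len(lits) for w, lits in open_clauses if (var, False) in lits)
        value = score_true >= score_false
        assignment[var] = value
        remaining = []
        for w, lits in open_clauses:
            if (var, value) in lits:
                continue  # satisfied by this choice
            lits = [lit for lit in lits if lit[0] != var]  # remove the falsified literal
            if lits:
                remaining.append((w, lits))
            # a clause left with no literals can no longer be satisfied; it is dropped
        open_clauses = remaining
    return assignment


clauses = [(10, [("A", False), ("B", True)]), (10, [("A", False)]), (30, [("A", True)])]
result = johnson_max_sat(clauses)
print(result, satisfied_weight(clauses, result))  # {'A': True, 'B': True} 40
```

Each choice never lowers the expected satisfied weight of a random completion, and that expectation starts at no less than half the total weight, so the final assignment keeps at least half.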
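Slide 19's propagation of dominating unit clauses can be sketched as below, using the slide's example ¬A v B [10], ¬A [10], A [30]. Assumption: a unit clause [l] of weight w “dominates” when w is at least the total weight of the clauses containing the opposite literal; setting l true then cannot lower the best achievable weight. This is an illustration, not SOFIE's Functional MAX SAT algorithm.

```python
# Sketch of dominating-unit-clause propagation.
# Clause format (an assumption): (weight, frozenset of (variable, polarity)).


def simplify(clauses, var, value):
    """Drop clauses satisfied by var=value and remove falsified literals."""
    result = []
    for w, lits in clauses:
        if (var, value) in lits:
            continue
        rest = frozenset(lit for lit in lits if lit[0] != var)
        if rest:
            result.append((w, rest))
    return result


def propagate_dominating_units(clauses):
    assignment = {}
    changed = True
    while changed:
        changed = False
        for weight, lits in clauses:
            if len(lits) != 1:
                continue  # only unit clauses are considered
            var, pol = next(iter(lits))
            opposing = sum(w for w, other in clauses if (var, not pol) in other)
            if weight >= opposing:
                assignment[var] = pol
                clauses = simplify(clauses, var, pol)
                changed = True
                break  # restart the scan on the simplified clause list
    return assignment, clauses


# Slide 19's example: ¬A v B [10], ¬A [10], A [30]
example = [
    (10, frozenset({("A", False), ("B", True)})),
    (10, frozenset({("A", False)})),
    (30, frozenset({("A", True)})),
]
print(propagate_dominating_units(example))
# ({'A': True, 'B': True}, [])
```

A = true because 30 ≥ 10 + 10; after simplification the remaining unit clause B [10] has no opposition, so B is fixed as well.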
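Slide 24's reformulation of “Student subClassOf attends some Course” into propositional clauses can be sketched by grounding the Skolemized axiom over a finite set of individuals. The Skolem constant name k_<x> and the function names are hypothetical. The second function reads every unlisted atom as false (a closed-world reading), which is the contrast the Open World Assumption is about.

```python
# Sketch: ground ∀x: Student(x) → ∃y: attends(x,y) and Course(y)
# after Skolemizing y as f(x) and naming f(x_i) as the constant k_<x_i>.
# Each individual yields ¬Student(x) ∨ attends(x,k_x) and ¬Student(x) ∨ Course(k_x).

Clause = list[tuple[str, bool]]  # (atom, polarity)


def ground_some_values_from(sub_class: str, prop: str, filler: str, individuals: list[str]) -> list[Clause]:
    clauses = []
    for x in individuals:
        k = f"k_{x}"  # hypothetical Skolem constant for f(x)
        clauses.append([(f"{sub_class}({x})", False), (f"{prop}({x},{k})", True)])
        clauses.append([(f"{sub_class}({x})", False), (f"{filler}({k})", True)])
    return clauses


def violated_under_closed_world(clauses: list[Clause], true_atoms: set[str]) -> list[Clause]:
    """Clauses that fail if every atom not listed as true is taken to be false."""
    return [c for c in clauses if not any((atom in true_atoms) == pol for atom, pol in c)]


clauses = ground_some_values_from("Student", "attends", "Course", ["alex", "bob"])
facts = {"Student(alex)", "Student(bob)", "attends(alex,SemanticWeb)"}
for clause in violated_under_closed_world(clauses, facts):
    print(clause)
```

Under the closed-world reading all four clauses fail, since nothing asserts attends(alex,k_alex) or Course(k_alex); under OWL's open-world semantics the same facts are consistent, because the course each student attends may simply be unknown. A MAX SAT solver could of course set the Skolem atoms to true; the point is only the different treatment of missing facts.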