Automated Focus Extraction for
    Question Answering over Topic Maps

     Rani Pinchuk, Alexander Mikhailian and Tiphaine Dalmas




Automated Focus Extraction for Question Answering over Topic Maps   TMRA’09, Leipzig




       Context: domain-portable Question
          Answering over Topic Maps
•Partly funded by the Flemish government as part of the ITEA2
 project LINDO (ITEA2-06011)
•The research towards domain-portable question answering over
 Topic Maps is done within the Belgian part of the LINDO project.








                            Why Topic Maps?
      • Space industry needs a solution to the knowledge
        retention problem.
      • More structured than mind maps, less formal than
          RDF/OWL.
      • Allow information to be organized in an ontological view.
      • An ISO standard.








                            Why Topic Maps?

                                                 Who is the composer of La Bohème?

                                                      Puccini








         LINDO-BE General Architecture

[Diagram, left to right: Question; Focus Extractor and Time Exp. Extractor; Anchorer; Graph Reducer; Answer Extractor; Answer. The Topic Map Engine is drawn beneath the pipeline.]








                            Question Focus
The focus is the type of the answer, expressed in the terminology of the question.

                                                 Who is the composer of La Bohème?

                                                      Puccini








                                           Focus

Asking Point (AP), explicit:
    “Who is the librettist of La Tilda?”

Expected Answer Type (EAT), implicit:
    HUMAN: “Who wrote the libretto for La Tilda?”

EAT classes: TIME, NUMERIC, DEFINITION, LOCATION, HUMAN, OTHER




           Is it difficult to find the focus?
      •   Where was Puccini born?
      •   What is Puccini's place of birth?
      •   What is Puccini's birthplace?
      •   What is the birth place of Puccini?
      •   What city was Puccini born in?
      •   What place was Puccini born in?
      •   Where is Puccini from?

[Figure: topic map fragment. Puccini (role: person) is linked by a "born in" association to Lucca (role: place); Lucca "is a" City.]








Why should AP take precedence over EAT?
    “Who is the librettist of La Tilda?”

    EAT = HUMAN → Person
    AP  = Librettist








                         Precision and Recall

    P = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}

    R = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}






Why should AP take precedence over EAT?
    “Who is the librettist of La Tilda?”

    EAT   = HUMAN → Person
    AP    = Librettist

    P_AP  = 57/57   = 1
    P_EAT = 57/1165 = 0.049
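
The two precision figures can be reproduced with a small sketch. It assumes, as an interpretation that the slide does not state, that 57 is the number of Librettist topics and 1165 the number of Person topics in the topic map, and that every librettist is also a person. The topic identifiers are made up for illustration.

```python
def precision(relevant, retrieved):
    """Share of retrieved topics that are relevant."""
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant, retrieved):
    """Share of relevant topics that were retrieved."""
    return len(relevant & retrieved) / len(relevant)


# Hypothetical topic ids: the first 57 persons are the librettists.
persons = set(range(1165))
librettists = set(range(57))

# For "Who is the librettist of ...?" the topics of the right type are the librettists.
relevant = librettists

p_ap = precision(relevant, retrieved=librettists)  # search restricted by the AP
p_eat = precision(relevant, retrieved=persons)     # search restricted by the EAT

print(p_ap, round(p_eat, 3))
```

Under these assumptions both strategies reach full recall, so the gain from the asking point shows up entirely in precision.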







Why should AP take precedence over EAT?
         Results over 100 annotated questions:

                               Name            Precision            Recall
                               AP                  0.311              0.30
                               EAT                 0.089              0.21








                              Focus Branching








            Focus Extractor Architecture
• Supervised machine learning based on the
  principle of maximum entropy (Maxent).
• 2100 questions have been annotated:
   • 1500 from the Li & Roth corpus
   • 500 from TREC-10
   • 100 asked over the Italian Opera topic map
• The corpus was split into 80% training and
  20% test data. The evaluation was run 10 times,
  each time shuffling the training and test data.

Pipeline: Question → Tokenizer → POS Tagger → Syntactic Parser → Lexical Analysis → Focus Extractor → Focus
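
The evaluation protocol described above (an 80/20 split, repeated 10 times with reshuffling) can be sketched as follows. This is only an illustration of the splitting step, not the authors' code: the function name `shuffled_splits`, the fixed seed, and the integer stand-ins for questions are assumptions.

```python
import random


def shuffled_splits(items, runs=10, train_percent=80, seed=0):
    """Yield (train, test) pairs, reshuffling the whole corpus before each run."""
    rng = random.Random(seed)
    cut = len(items) * train_percent // 100
    for _ in range(runs):
        shuffled = list(items)
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]


# Integer stand-ins for the 2100 annotated questions.
corpus = list(range(2100))
splits = list(shuffled_splits(corpus))

for train, test in splits:
    print(len(train), len(test))
```

With 2100 questions each run trains on 1680 and tests on 420. Because the corpus is reshuffled every time, the test sets of different runs may overlap, which distinguishes this scheme from 10-fold cross-validation.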






                   Question Annotation

Asking Point (AP classifier), one tag per token:
           O: What
          AP: opera
           O: did
           O: Puccini
           O: write
           O: ?

Expected Answer Type (EAT classifier), one label per question:
       HUMAN: Who is Puccini?
  DEFINITION: What is Tosca?
    LOCATION: Where did Dante die?
        TIME: When did Puccini die?
     NUMERIC: How many characters have been killed by poisoning?
       OTHER: What did Heinrich Heine write?
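
The two annotation schemes differ in granularity: the asking point is tagged per token, while the expected answer type labels the whole question. A minimal sketch of how such annotations might be held in memory is shown below. The variable names and the helper `asking_point` are hypothetical; only the examples themselves come from the slide.

```python
# Token-level annotation for the AP classifier: (token, tag) pairs.
ap_annotation = [
    ("What", "O"),
    ("opera", "AP"),
    ("did", "O"),
    ("Puccini", "O"),
    ("write", "O"),
    ("?", "O"),
]

# Question-level annotation for the EAT classifier: (question, label) pairs.
eat_annotation = [
    ("Who is Puccini?", "HUMAN"),
    ("What is Tosca?", "DEFINITION"),
    ("Where did Dante die?", "LOCATION"),
    ("When did Puccini die?", "TIME"),
    ("What did Heinrich Heine write?", "OTHER"),
]


def asking_point(tagged_tokens):
    """Return the tokens tagged AP, in question order."""
    return [token for token, tag in tagged_tokens if tag == "AP"]


print(asking_point(ap_annotation))
```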






                                        AP Results

           Class                  Precision                    Recall       F-Score
     AskingPoint                             0.854                  0.734        0.789
     Other                                   0.973                  0.987        0.980
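
The F-Score column in this table is consistent with the balanced F-measure, the harmonic mean of precision and recall. The slides do not state which F-measure was used, so the formula below is an assumption that happens to reproduce both rows.

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f_asking_point = f_score(0.854, 0.734)
f_other = f_score(0.973, 0.987)

print(round(f_asking_point, 3), round(f_other, 3))
```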








                                        EAT Results
            Class                  Precision                    Recall      F-Score
      DEFINITION                              0.887                 0.800        0.841
      LOCATION                                0.834                 0.812        0.821
      HUMAN                                   0.904                 0.753        0.820
      TIME                                    0.880                 0.802        0.838
      NUMERIC                                 0.943                 0.782        0.854
      OTHER                                   0.746                 0.893        0.812







                                   Overall Results
       The overall results are provided as the accuracy
       of the classifier.

         Accuracy = correct instances / overall instances

                                         Value                      Std dev      Std err

   Focus (AP+EAT)                               0.827                    0.020         0.006
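
The reported standard error agrees with the standard deviation divided by the square root of the number of runs, if the 10 evaluation runs are taken as the sample. That reading is an assumption; the slide does not say how the standard error was computed. The helper `summarize` and the accuracies passed to it are hypothetical.

```python
import math
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation, standard error) of per-run accuracies."""
    mean = statistics.mean(accuracies)
    std_dev = statistics.stdev(accuracies)
    std_err = std_dev / math.sqrt(len(accuracies))
    return mean, std_dev, std_err


# Consistency check on the reported figures, assuming 10 runs.
reported_std_dev = 0.020
runs = 10
implied_std_err = reported_std_dev / math.sqrt(runs)
print(round(implied_std_err, 3))

# Made-up per-run accuracies, only to show the helper in use.
print(summarize([0.81, 0.83, 0.85]))
```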








                         Prediction of Accuracy








                                   Conclusions
       • We achieved 82.7% accuracy for focus extraction.
       • The specificity of the focus degrades gracefully (we first try
         to extract the AP, and fall back to the EAT).
       • The focus is identified dynamically instead of relying on
         a static taxonomy of question types.
       • Machine learning techniques were used throughout the
         application stack.
       • The results could be improved with more training data.
       • The whole setting is domain independent.
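
The graceful degradation mentioned above (try the asking point first, fall back to the expected answer type) can be sketched as a small selection function. The function name `choose_focus` and its return shape are assumptions made for illustration; the inputs stand for the outputs of the AP and EAT classifiers.

```python
def choose_focus(ap_tokens, eat_label):
    """Prefer the asking point; fall back to the expected answer type."""
    if ap_tokens:
        return ("AP", " ".join(ap_tokens))
    return ("EAT", eat_label)


# "Who is the librettist of La Tilda?": the AP classifier finds an explicit asking point.
print(choose_focus(["librettist"], "HUMAN"))

# "Who wrote the libretto for La Tilda?": no asking point, so the EAT is used.
print(choose_focus([], "HUMAN"))
```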







                                     Questions?


                                          Thank you



