Finding the Right Answer with Question Answering
Prof. Dr. Jens Albrecht
Technische Hochschule Nürnberg
https://www.m3-konferenz.de/nlp.php#programm
M3 NLP: Question Answering
Prof. Dr. Jens Albrecht, TH Nürnberg
Questions upon Questions
› What is Amazon's revenue?
› Which language is spoken in Afghanistan?
› What is the difference between TensorFlow and PyTorch?
› What is annoying about the new iPhone?
› Who can help me with depression?
› Why is the camera no good?
Search vs. Question Answering
› Search: few keywords, many results
› Question answering: specific question, specific answer
https://www.kryptowissen.de/enigma.html#:~:text=Im%20Jahre%201940%20kam%20der,die%20%22Turing-Bombe%22.
Closed-Domain Question Answering
› Narrowly delimited domain (e.g. IT support)
› Alternatively: only specific question types
› Often implemented as a knowledge-based system on top of a structured database (ontology, knowledge graph)
[Diagram: question → conversion into a DB query → structured data store → answer generation → answer; annotated "always correct"]
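The closed-domain pipeline above (question → DB query → answer generation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `device` table, its rows, the two question patterns and the `answer` helper are hypothetical and not taken from the talk. A real system would typically query an ontology or knowledge graph instead of an in-memory SQLite table.

```python
import re
import sqlite3
from typing import Optional

# Hypothetical structured data store: a tiny table of IT-support facts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (name TEXT PRIMARY KEY, os TEXT, owner TEXT)")
conn.executemany(
    "INSERT INTO device VALUES (?, ?, ?)",
    [("laptop-17", "Ubuntu 22.04", "Alice"), ("laptop-23", "Windows 11", "Bob")],
)

# Each supported question type maps a pattern to a parameterized query
# and an answer template. Anything else is treated as out of domain.
QUESTION_TYPES = [
    (re.compile(r"which os runs on (?P<name>[\w-]+)\??$", re.I),
     "SELECT os FROM device WHERE name = ?",
     "{name} runs {value}."),
    (re.compile(r"who owns (?P<name>[\w-]+)\??$", re.I),
     "SELECT owner FROM device WHERE name = ?",
     "{name} is owned by {value}."),
]


def answer(question: str) -> Optional[str]:
    """Convert a question into a DB query and render the result as text."""
    for pattern, sql, template in QUESTION_TYPES:
        match = pattern.match(question.strip())
        if match is None:
            continue
        name = match.group("name")
        row = conn.execute(sql, (name,)).fetchone()
        if row is None:
            return None  # question type known, but no matching record
        return template.format(name=name, value=row[0])
    return None  # question type not supported


print(answer("Which OS runs on laptop-17?"))
print(answer("What is the weather today?"))
```

The sketch only returns what is stored in the table, which is the sense in which such a system can be precise, and it returns `None` for any question outside its fixed set of question types.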
Open-Domain Question Answering
› Arbitrary questions in arbitrary contexts
› Answered with the help of unstructured text documents
› Uses transformer models for text understanding (machine reading comprehension)
[Diagram: question → "NLP magic" over a collection of unstructured text data → answer]
Extractive QA systems
› Input: text (context) + question
› Output: span = start and end of the answer in the text
Transfer Learning for QA
› Pretrained base model: trained on lots of text from the web (task: language modeling)
› Classification model: base model fine-tuned on classification data
› A specific problem requires specific data
› QA model: base model fine-tuned on QA training data (SQuAD: 150,000 QA pairs); SQuAD often yields good results
› Better QA model: QA model fine-tuned on additional QA training data (MLQA: 5k QA pairs each for 7 languages)
SQuAD 2.0 (Stanford Question Answering Dataset)
https://rajpurkar.github.io/SQuAD-explorer/
SQuAD 2.0 contains control questions that fit the context but cannot be answered from the text alone.
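The difference between answerable questions and control questions shows up directly in the data. The following sketch walks over a hand-written record that follows the field layout of the published SQuAD 2.0 JSON files as I understand it (`data` → `paragraphs` → `qas`, with `is_impossible` and `answer_start`). The record itself is invented for illustration and is not taken from the dataset.

```python
import json

# Invented example record in the SQuAD 2.0 file layout.
raw = """
{
  "version": "v2.0",
  "data": [{
    "title": "Example",
    "paragraphs": [{
      "context": "Dari and Pashto are the official languages of Afghanistan.",
      "qas": [
        {"id": "q1",
         "question": "Which languages are official in Afghanistan?",
         "is_impossible": false,
         "answers": [{"text": "Dari and Pashto", "answer_start": 0}]},
        {"id": "q2",
         "question": "How many people speak Pashto?",
         "is_impossible": true,
         "answers": []}
      ]
    }]
  }]
}
"""


def iter_examples(dataset):
    """Yield (context, qa) pairs from a SQuAD-style dictionary."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield paragraph["context"], qa


dataset = json.loads(raw)
answerable, unanswerable = [], []
for context, qa in iter_examples(dataset):
    if qa.get("is_impossible", False):
        unanswerable.append(qa["id"])  # control question: no span in the text
        continue
    for ans in qa["answers"]:
        start = ans["answer_start"]  # character offset into the context
        assert context[start:start + len(ans["text"])] == ans["text"]
    answerable.append(qa["id"])

print("answerable:", answerable, "unanswerable:", unanswerable)
```

A model trained on such data has to learn to abstain on questions like `q2` instead of returning the most plausible-looking span.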
MLQA: Multi-Lingual Question Answering
https://github.com/facebookresearch/MLQA
SQuAD 2.0 Leaderboard
EM (Exact Match)
› Binary metric: EM = 1 if the ground-truth span is predicted exactly, 0 otherwise
F1 Score
› Harmonic mean of precision and recall
› Computed from the word overlap between the predicted answer and the ground truth
https://rajpurkar.github.io/SQuAD-explorer/
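Both metrics can be written down in a few lines. The sketch below is modeled on the conventions commonly used for SQuAD-style evaluation (lowercasing, stripping punctuation and English articles before comparing); it is an illustration under those assumptions, not the official evaluation script, and the example strings are invented.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, squeeze whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> int:
    """1 if the normalized strings are identical, 0 otherwise."""
    return int(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    if not pred_tokens or not true_tokens:
        # Both empty counts as a match (relevant for unanswerable questions).
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Turing Bombe", "Turing Bombe"))
print(round(f1_score("in 1940 the Turing Bombe", "Turing Bombe"), 3))
```

EM penalizes any extra or missing word, while F1 gives partial credit for overlapping tokens, which is why leaderboards usually report both.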
SQuAD 2.0 Leaderboard
https://paperswithcode.com/sota/question-answering-on-squad20
Answer Extraction
Subtasks:
› Tokenization
› Span classification
› Handling long texts (longer than the model allows)
https://mccormickml.com/2020/03/10/question-answering-with-a-fine-tuned-BERT/
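One common way to handle texts that exceed the model's input limit is to split the context into overlapping windows and run the reader on each window. The sketch below shows that idea on a plain token list. The names `sliding_windows`, `to_document_span`, `max_len` and `step` are hypothetical, and in practice the budget per window is smaller because the question tokens and special tokens also count against the limit.

```python
from typing import List, Tuple


def sliding_windows(tokens: List[str], max_len: int,
                    step: int) -> List[Tuple[int, List[str]]]:
    """Split tokens into windows of at most max_len tokens.

    Consecutive windows start `step` tokens apart, so they overlap by
    max_len - step tokens. Returns (offset, window_tokens) pairs.
    """
    if max_len <= 0 or not 0 < step <= max_len:
        raise ValueError("need max_len > 0 and 0 < step <= max_len")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the text
        start += step
    return windows


def to_document_span(offset: int, local_start: int,
                     local_end: int) -> Tuple[int, int]:
    """Map a span found inside one window back to document positions."""
    return offset + local_start, offset + local_end


tokens = [f"t{i}" for i in range(10)]
windows = sliding_windows(tokens, max_len=4, step=2)
for offset, window in windows:
    print(offset, window)
```

The overlap ensures that an answer cut in half at one window boundary appears whole in a neighbouring window; the best-scoring span across all windows is then mapped back with `to_document_span`.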
Retriever-Reader Model
Retriever-Reader
[Diagram: lots of unstructured text documents are kept in a document store. For a question, the retriever (sparse or dense) searches the store for relevant docs and passes the top-k retrieved contexts to the reader (BERT & Co.), which returns the answer(s).]
Zhu et al. (2021): Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering.
https://arxiv.org/abs/2101.00774
Karpukhin et al. (2020): Dense Passage Retrieval for Open-Domain Question Answering.
https://arxiv.org/abs/2004.04906
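The two-stage flow can be sketched without any model at all. In the following illustration the retriever is a simple sparse TF-IDF-style scorer, and the reader is a deliberately naive placeholder that picks the sentence with the largest word overlap. A real system would put a transformer reader (and possibly a dense retriever) in these places. All class names, function names and documents are assumptions made for the example.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class SparseRetriever:
    """Toy document store plus TF-IDF-style sparse retrieval."""

    def __init__(self, documents: List[str]):
        self.documents = documents
        self.doc_tokens = [Counter(tokenize(d)) for d in documents]
        n = len(documents)
        doc_freq = Counter(t for toks in self.doc_tokens for t in toks)
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0
                    for t, c in doc_freq.items()}

    def retrieve(self, question: str, top_k: int = 2) -> List[str]:
        q_terms = tokenize(question)
        scored = []
        for idx, toks in enumerate(self.doc_tokens):
            score = sum(toks[t] * self.idf.get(t, 0.0) for t in q_terms)
            scored.append((score, idx))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.documents[i] for s, i in scored[:top_k] if s > 0]


def toy_reader(question: str, contexts: List[str]) -> Optional[str]:
    """Placeholder reader: returns the best-overlapping sentence.

    Stands in for a transformer model that would extract an answer span.
    """
    q_terms = set(tokenize(question))
    best_sentence, best_overlap = None, 0
    for context in contexts:
        for sentence in re.split(r"(?<=[.!?])\s+", context):
            overlap = len(q_terms & set(tokenize(sentence)))
            if overlap > best_overlap:
                best_sentence, best_overlap = sentence, overlap
    return best_sentence


docs = [
    "PyTorch is a deep learning framework. It builds computation graphs dynamically.",
    "Dari and Pashto are the official languages of Afghanistan.",
    "SQuAD is a reading comprehension dataset built from Wikipedia articles.",
]
retriever = SparseRetriever(docs)
question = "Which languages are official in Afghanistan?"
contexts = retriever.retrieve(question, top_k=2)
print(toy_reader(question, contexts))
```

The split keeps the expensive reader away from most of the collection: only the top-k contexts selected by the cheap retriever are read in detail.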
Summary and Outlook
What can QA be used for?
› Information-Retrieval++
› Aspect-based analyses
» Analysis of the answers with word clouds, topic modeling, clustering
› Support in customer service
› Chatbots (e.g. fed with FAQ documents)
› Iterative question scenarios:
» Which companies build solar systems?
» Then for each company: Which technology is used?
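The iterative scenario amounts to a loop in which the answers to a first question become the subjects of a follow-up question. A minimal sketch, assuming some QA function `ask` that returns a list of answers, is shown here. The lookup table standing in for a QA system, the company names and the technologies are all placeholders invented for the example.

```python
from typing import Callable, Dict, List


def iterative_qa(ask: Callable[[str], List[str]],
                 first_question: str,
                 follow_up_template: str) -> Dict[str, List[str]]:
    """Ask one question, then ask a follow-up for every answer returned."""
    results = {}
    for entity in ask(first_question):
        results[entity] = ask(follow_up_template.format(entity=entity))
    return results


# Placeholder for a real QA system: a fixed lookup table with made-up data.
KNOWN_ANSWERS = {
    "Which companies build solar systems?": ["Company A", "Company B"],
    "Which technology does Company A use?": ["thin-film modules"],
    "Which technology does Company B use?": ["monocrystalline modules"],
}


def toy_ask(question: str) -> List[str]:
    return KNOWN_ANSWERS.get(question, [])


report = iterative_qa(toy_ask,
                      "Which companies build solar systems?",
                      "Which technology does {entity} use?")
print(report)
```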
Challenges: Long-form QA
Current QA challenges
• Q: What's the nearest restaurant?
• Q: What is the largest lake in the world?
• Q: What time is it in Tokyo right now?
Long-form QA challenges
• Q: Why are some restaurants better than others if they serve basically the same food?
• Q: What are the differences between bodies of water like lakes, rivers, and seas?
• Q: Why do we feel more jet lagged when traveling east?
https://ai.facebook.com/blog/longform-qa/
Long-form questions require a whole paragraph as the answer!
› Extractive: extraction of a long span
› Abstractive: generation of a synthetic answer
Further Links
› Natural Language Processing with Transformers, O'Reilly, March 2022, Ch. 4
https://www.oreilly.com/library/view/natural-language-processing/9781098103231/
› A nice visual introduction:
https://mccormickml.com/2020/03/10/question-answering-with-a-fine-tuned-BERT/
› Detailed overview of current approaches:
https://lilianweng.github.io/lil-log/2020/10/29/open-domain-question-answering.html
› Details on SQuAD: https://rajpurkar.github.io/mlx/qa-and-squad/
› Details on GermanQuAD: https://www.deepset.ai/blog/enabling-german-neural-search-announcing-germanquad-and-germandpr
› Zhu et al. (2021): Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering,
https://arxiv.org/abs/2101.00774
Questions?
Contact: jens.albrecht@th-nuernberg.de
