Minerva:
Solving Quantitative Reasoning
Problems with Language Models
MINERVA
Google Research
조해창 (presenter), 박희수, 허정원
CONTENTS
Quantitative Reasoning Problems
Minerva - Overview
Prompt
Majority voting
Conclusions
Quantitative Reasoning Problems
• Quantitative reasoning problems
  • Math and science problems
  • Problems that involve numerical computation and have a single well-defined answer
• Datasets:
  • Math Word Problem (MWP)
  • Grade School Math (GSM8k)
  • Massive Multitask Language Understanding (MMLU)
Modern model
Quantitative Reasoning Problems
• Earlier approaches used GPT and BERT.
• Language models do not learn quantitative information (such as numbers) well.
• They may need to rely on external tools.
Quantitative Reasoning Problems
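<Math word problem>
A regular hexagon has a side length of 24 cm, and a regular octagon has the same perimeter. Find the side length of the regular octagon in cm.
<Solution process> (generated by the AI model)
a = 24  # side length of the regular hexagon (cm)
b = 6   # number of sides of a hexagon
c = 8   # number of sides of an octagon
y = a * b // c  # shared perimeter divided by the octagon's side count
print(y)
<Answer> (computed by an external tool)
18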
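The tool-based flow on this slide (the model writes a program, an external tool runs it to get the answer) can be sketched roughly as follows. This is only an illustration, not the method of any particular system: run_generated_program is a hypothetical helper, and using Python's built-in exec as the "external tool" is an assumption made for the example.

```python
import contextlib
import io


def run_generated_program(source: str) -> str:
    """Execute model-generated Python source and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Assumption: the source is trusted. A real system would sandbox this step.
        exec(source, {})
    return buffer.getvalue().strip()


# The solution program from the slide, treated here as the model's output.
generated = """
a = 24
b = 6
c = 8
y = a * b // c
print(y)
"""

answer = run_generated_program(generated)
print(answer)  # prints 18
```

The model only has to produce a correct program; the arithmetic itself is delegated to the interpreter.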
Minerva
• Training data:
  • Math Web Pages
  • arXiv
  • General Natural Language Data
• Methods:
  • few-shot prompting
  • chain of thought & scratchpad prompting
  • majority voting with nucleus sampling
• Based on the PaLM general-purpose language models
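To make few-shot chain-of-thought prompting concrete, here is a minimal sketch of how such a prompt might be assembled. The exemplars, the "Problem:" / "Solution:" / "Final Answer:" labels, and the function name build_few_shot_prompt are all illustrative assumptions, not the prompt actually used for Minerva.

```python
from typing import List, Tuple

# Hypothetical worked examples (problem, step-by-step solution); not from the paper.
EXEMPLARS: List[Tuple[str, str]] = [
    (
        "What is the perimeter of a square with side length 5 cm?",
        "A square has 4 equal sides, so the perimeter is 4 * 5 = 20 cm. "
        "Final Answer: 20",
    ),
    (
        "A train travels 120 km in 2 hours. What is its average speed in km/h?",
        "Average speed is distance divided by time: 120 / 2 = 60 km/h. "
        "Final Answer: 60",
    ),
]


def build_few_shot_prompt(question: str, exemplars=EXEMPLARS) -> str:
    """Concatenate worked examples, then leave the new solution open."""
    blocks = [f"Problem: {q}\nSolution: {s}" for q, s in exemplars]
    # The final block ends at "Solution:" so the model continues from there.
    blocks.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(blocks)


print(build_few_shot_prompt("What is 2 + 3?"))
```

Because each exemplar shows its intermediate steps before the final answer, the model is nudged to write out its reasoning for the new problem in the same style.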
Prompt
• A hint that reminds a speaker of their next line
user@host:/home$ _
Chain of thought
Scratchpad Prompt
Minerva Prompt
Majority voting
• Sample 256 or more outputs per problem.
• Effective because the set of possible answers is not restricted.
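Majority voting over sampled outputs can be sketched as below. This is a simplified illustration: the "Final Answer:" marker, the helper names, and the three sample completions are assumptions made for the example, and a real run would vote over many more samples.

```python
from collections import Counter
from typing import List, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Final Answer:' marker, if any."""
    marker = "Final Answer:"
    if marker not in completion:
        return None
    return completion.rsplit(marker, 1)[1].strip()


def majority_vote(completions: List[str]) -> Optional[str]:
    """Pick the most common final answer across sampled completions."""
    answers = [a for a in map(extract_final_answer, completions) if a]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical sampled solutions to the hexagon/octagon problem.
samples = [
    "24 * 6 = 144, and 144 / 8 = 18. Final Answer: 18",
    "The perimeter is 144, so each side is 18. Final Answer: 18",
    "24 * 8 / 6 = 32. Final Answer: 32",
]
print(majority_vote(samples))  # prints 18
```

Only the final answers are compared, so two samples that reach the same answer by different reasoning still count as agreeing.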
Nucleus sampling
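Nucleus (top-p) sampling draws the next token only from the smallest set of most likely tokens whose cumulative probability reaches a threshold p. A minimal standard-library sketch follows; the toy distribution, the threshold value, and the function name nucleus_sample are illustrative assumptions, not settings taken from the slides.

```python
import random
from typing import Dict, Optional


def nucleus_sample(probs: Dict[str, float], top_p: float,
                   rng: Optional[random.Random] = None) -> str:
    """Sample a token from the smallest set whose probability mass reaches top_p."""
    rng = rng or random.Random()
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    nucleus, mass = [], 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        mass += p
        if mass >= top_p:
            break
    # Sample within the nucleus; random.choices treats the weights as relative.
    tokens = [t for t, _ in nucleus]
    weights = [p for _, p in nucleus]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution.
next_token_probs = {"18": 0.6, "32": 0.25, "144": 0.1, "7": 0.05}
draws = {nucleus_sample(next_token_probs, top_p=0.8, rng=random.Random(i))
         for i in range(200)}
print(sorted(draws))
```

With top_p=0.8 the nucleus here is {"18", "32"}, so the low-probability tokens "144" and "7" can never be drawn. This keeps the samples diverse enough for voting while cutting off the unlikely tail.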
Conclusions
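• With a high-quality dataset, quantitative problems can be solved effectively without the help of external tools.
• There is no way to verify that the solution steps, and not just the final answer, are correct.
• Combining the model with a code-generation model could improve performance further.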
THANK YOU