GPT-3: Language Models are Few-Shot Learners
ALMA MATER STUDIORUM UNIVERSITY OF BOLOGNA
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING – DISI
Luca Ragazzi, Giacomo Frisoni, Lorenzo Valgimigli
PhD Students, XXXVI Cycle
Department of Computer Science and Engineering – DISI
University of Bologna, Cesena, Italy
l.ragazzi@unibo.it, giacomo.frisoni@unibo.it, lorenzo.valgimigli@unibo.it
"Neural architectures: from the McCulloch-Pitts model to GPT-3" Presentation
October 29th, 2021
GPT-3: Language Models are Few-Shot Learners 2
Overview of GPT-3
• Generative Pre-trained Transformer – 3
• Developed by OpenAI in May 2020
• The largest neural network ever created (at the time)
• Philosophy: the bigger, the better
What is the motivation behind it?
GPT-3: Language Models are Few-Shot Learners 3
Pre-trained language models
• Current state-of-the-art models in NLP
• Trained in a semi-supervised fashion on large corpora
• Both inputs (X) and targets (Y) are extracted from the text itself, without a prior labeled
dataset (see the sketch at the end of this slide)
• Acquire strong capabilities for modeling natural language (with a task-agnostic architecture)
• Limitation (i): need for downstream task-specific datasets and fine-tuning
– Difficult to collect large supervised training datasets for every new task
• Limitation (ii): mismatch with how humans learn
– Humans do not require large supervised datasets to learn most language tasks; brief
directives or a tiny number of demonstrations suffice
– So, why give models a large dataset of labeled examples for every new task?
Why not try to create NLP systems with the same fluidity and
generality as humans?
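A minimal Python sketch of this self-supervised setup, assuming plain whitespace tokenization for readability (GPT-3 actually uses byte-pair-encoded subwords): every prefix of a raw sentence serves as an input X, and the token that follows it serves as the target Y.

```python
# Minimal sketch: deriving (X, Y) pairs from raw, unlabeled text.
# Whitespace tokenization is a simplification; GPT-3 actually uses
# byte-pair-encoded subwords.

text = "language models learn from raw text without labels"
tokens = text.split()

# Every prefix is an input X; the token that follows it is the target Y.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for x, y in pairs[:3]:
    print("X =", x, "-> Y =", y)
```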
GPT-3: Language Models are Few-Shot Learners 4
Solution: more parameters and in-context learning
• Let models develop a broad set of skills and pattern recognition abilities during
pre-training and use them at inference time to adapt to the desired task rapidly
• Since in-context learning involves absorbing many skills and tasks within the
model's parameters, it is plausible that learning abilities correlate with model size.
OpenAI created GPT-3 to show that very large unsupervised
language models, trained on vast amounts of data, can multitask at
the level of fine-tuned state-of-the-art models
GPT-3: Language Models are Few-Shot Learners 5
Model Architecture and Training Process
• The GPT-3 model architecture is the same as that of its GPT-2 predecessor
– Transformer-based, built using only decoder blocks (the opposite of BERT, which uses only encoder blocks)
– Stronger at natural language generation (NLG) than at producing contextual embeddings
• An auto-regressive language model
– GPT-3 is trained with next-word prediction, outputting one token (a subword) at a time
– Unlike bidirectional models such as BERT, the prediction at each step is conditioned only
on the left context (masked self-attention; see the sketch below)
• From an architecture perspective, GPT-3 is not actually very novel!
– … So, what makes it so special and magical? It’s really big
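A minimal NumPy sketch of the masked self-attention just described; the sequence length, embedding size, and values are toy assumptions, not GPT-3's configuration:

```python
import numpy as np

# Minimal sketch of masked (causal) self-attention: position i may only
# attend to positions <= i, so every prediction is conditioned on the
# left context alone. Sizes and values are toy assumptions, not GPT-3's.

def causal_self_attention(q, k, v):
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)                     # (t, t) attention logits
    future = np.triu(np.ones((t, t), dtype=bool), 1)  # strictly upper triangle
    scores[future] = -np.inf                          # hide future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v                                # (t, d) mixed values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                  # 5 tokens, 8-dim embeddings
print(causal_self_attention(x, x, x).shape)  # (5, 8)
```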
GPT-3: Language Models are Few-Shot Learners 6
Trained Models
• More layers, wider layers, and more data to train on
– GPT-3 comes in eight sizes, ranging from 125M to 175B parameters
– GPT-3 175B (referenced by default) → ~507x BERT-Large (345M), ~117x the largest GPT-2 (1.5B),
and 10x the previous record holder, Turing-NLG (17B)
– The largest model ever created (at the time of writing), with 96 attention layers, each with
96 heads of dimension 128, and a 3.2M-token batch size 😱
• “With great size comes great cost” 🦸💰
– A single training run costs over $4.6M on Tesla V100 cloud instances (3.14E23 required
FLOPs at 28 TFLOPS of HW capacity ≈ 355 GPU-years; see the check below)
– Time is not the only enemy: GPT-3 needs 700GB of memory to store its FP32 parameters (4 bytes
each), while the maximum memory of a single GPU at the time was 48GB (Quadro RTX 8000)
– OpenAI used model parallelism on a high-bandwidth cluster (with V100 GPUs) provided by Microsoft
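A quick back-of-the-envelope check of these figures, using only the numbers quoted on this slide:

```python
# Back-of-the-envelope check of the slide's figures (all inputs are the
# numbers quoted above; nothing here is independently measured).

params = 175e9                         # GPT-3: 175B parameters
print(params * 4 / 1e9, "GB")          # 4 bytes per FP32 weight -> 700.0 GB

flops = 3.14e23                        # total training compute (FLOPs)
throughput = 28e12                     # 28 TFLOPS per V100
gpu_years = flops / throughput / (3600 * 24 * 365)
print(int(gpu_years), "GPU-years")     # ~355, matching the slide
```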
GPT-3: Language Models are Few-Shot Learners 7
Training Datasets
• Extensive training on massive unlabeled text datasets (300B tokens in total)
– Since neural networks are a compressed/compiled version of their training data, the size of the
dataset should scale with the size of the model
– The authors mainly use Common Crawl, a crawl of over 50B web pages (filtered down for quality)
• GPT-3 has a lower data compression ratio (training tokens per parameter) than GPT-2
– 300/175 ≈ 1.71 (GPT-3) vs 10/1.5 ≈ 6.67 (GPT-2)… This raises the question: “Is it only a big
memory?” (see the two-line check below)
– The filtered Common Crawl amounts to 570GB of compressed plaintext (45TB before filtering);
for comparison, GPT-2 was trained on 40GB of Internet text (10B tokens)
– A filtering bug caused some overlaps between the training data and the dev/test sets to be
ignored, but training costs made re-training unfeasible
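The two ratios, spelled out:

```python
# The slide's tokens-per-parameter ratios: fewer training tokens per
# parameter means more capacity per token seen.

gpt3 = 300e9 / 175e9   # 300B training tokens / 175B parameters
gpt2 = 10e9 / 1.5e9    # 10B training tokens / 1.5B parameters
print(f"GPT-3: {gpt3:.2f}  GPT-2: {gpt2:.2f}")  # 1.71 vs 6.67
```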
GPT-3: Language Models are Few-Shot Learners 8
Zero-, one-, few-shot vs fine-tuning
• GPT-3 can perform specific tasks without any special tuning 🧙 🔮
– Most other pre-trained language models require elaborate fine-tuning (sometimes with
architectural changes) on thousands of samples to perform well on downstream tasks
– GPT-3 doesn’t need a fine-tuning step and directly uses a single pre-trained model for
all downstream tasks (plug-and-play 🔌), demonstrating even superior performance
• Three different evaluation settings focused on task-agnostic performance,
which allow zero, one, or a few examples to be prefixed to the model input (as sketched below)
– Fine-tuning (repeated gradient updates using a large corpus of example tasks) → postponed
– (i) Zero-shot, (ii) one-shot, and (iii) few-shot: the prefixed context better informs the model
about what it is expected to do
– Intuition: “If I were to see this text somewhere on the Web, what would be the most likely
next word?”
GPT-3: Language Models are Few-Shot Learners 9
Results - i
• Different sizes of GPT-3 were tested on
different benchmarks across various tasks (e.g.,
question answering, translation, summarization)
to study the generalization ability of such
large models. In particular, they were evaluated in
three settings:
• zero-shot learning
• one-shot learning
• few-shot learning
• In every case, raising the
number of parameters made
generalization
capabilities emerge
GPT-3: Language Models are Few-Shot Learners 10
Results - ii
• GPT-3, in its largest version, was compared to the SOTA solutions on different
datasets
• LAMBADA, StoryCloze, HellaSwag, TriviaQA, translation (BLEU)…
• In many cases, it matches or outperforms the previous SOTA, i.e., neural
models fine-tuned on each dataset
GPT-3: Language Models are Few-Shot Learners 11
Limits
• Despite the great improvements of GPT-3, it still has notable weaknesses: it has
difficulty with common-sense physics and with long text generation
• Loss of coherence
• Contradictions
• Pointless semantic repetition
• Large language models are not grounded in other domains of experience, such as video
or real-world physical interaction, so they lack a large amount of context
• Even after pre-training, GPT-3 is still far from human level
• Humans show strong zero-shot capabilities
• As of now, it is impossible to say how GPT-3 learns during training
• It is even harder to understand what it learns at inference time
• Does it learn the new task from scratch? Does it reshape similar tasks it has
learned?
• Finally, it shares some limitations common to most deep learning models
• The learned knowledge is not interpretable
• It requires large resources and a long time to train
• It is strongly affected by biases in the data
GPT-3: Language Models are Few-Shot Learners 12
Ethical Concerns - i
• GPT-3 can be misused in dangerous
ways
• fake news generation, phishing,
fraudulent academic essays
• It is affected by biases in the data along
several dimensions
• Gender
• Race
• Religion
GPT-3: Language Models are Few-Shot Learners 13
Ethical Concerns - ii
• The energy consumption of such a large model is a problem that needs to be
underlined
– GPT-3 consumed several thousand petaflop/s-days during pre-training, while
GPT-2 consumed tens of petaflop/s-days
• It is important to consider how these
resources are amortized over the
lifecycle of the model
• GPT-3 consumes significant resources during
training, but it is surprisingly efficient once
trained
• The full version of GPT-3 can generate 100
pages of content from the trained model at
a cost of only about 0.4 kWh
GPT-3: Language Models are Few-Shot Learners 14
Thanks for the attention
(is all you need)