Abstract: Large Language Models receive a lot of attention in the media these days. We have all experienced that generative language models of the GPT family are very fluent and can convincingly answer complex questions. But they also have their limitations and pitfalls. In this presentation I will introduce Transformer-based language models and explain the relation between BERT, GPT, and the 130 thousand other models available on https://huggingface.co. I will discuss their use and applications and why they are so powerful. Then I will point out the challenges and pitfalls of Large Language Models and the consequences for our daily work and education.
2. Today's talk
Large Language Models
BERT
Huggingface
Generative Pretrained Transformers (GPT)
Challenges and problems
Consequences for work and education
4. Large Language Models
Transformers: "Attention is all you need" (2017)
Designed for sequence-to-sequence tasks (e.g. translation)
Encoder-decoder architecture
Explanation of this paper: https://www.youtube.com/watch?v=iDulhoQ2pro
How it all started…
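Not on the original slide: a minimal sketch of running a pretrained encoder-decoder (sequence-to-sequence) Transformer for translation with the Huggingface pipeline API; the Helsinki-NLP/opus-mt-en-nl checkpoint is just one illustrative choice of model.

from transformers import pipeline

# encoder-decoder (sequence-to-sequence) Transformer applied to translation
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-nl")
print(translator("Attention is all you need"))
# e.g. [{'translation_text': '...'}]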
5. Large Language Models
Transformers are powerful because of:
the long-distance relations between all words (attention; see the sketch below)
parallel processing instead of sequential processing
unsupervised pre-training on a HUGE amount of data
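Not on the original slide: a minimal NumPy sketch of scaled dot-product attention, the mechanism referred to above; every position attends to every other position, and the whole computation is a few matrix products that run in parallel.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # each query is compared with every key, however far apart the words are
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V  # one matrix product: all positions computed in parallel

x = np.random.randn(4, 8)  # toy example: 4 token vectors of dimension 8
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)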
6. Large Language Models
BERT (Bidirectional Encoder Representations from Transformers)
An encoder-only transformer
Input is text, output is embeddings
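Not on the original slide: a minimal sketch of getting contextual embeddings out of a pre-trained BERT encoder with the Huggingface transformers library; bert-base-uncased is the standard English checkpoint.

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Large Language Models are everywhere", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# one contextual embedding (768 dimensions) per input token
print(outputs.last_hidden_state.shape)  # torch.Size([1, number_of_tokens, 768])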
Next…
7. Some
linguistics…
BERT is based on the distributional hypothesis
The context of a word defines its meaning
Words that occur in similar contexts tend to be similar
Harris, Z. (1954). "Distributional structure". Word, 10(2-3): 146-162
8. Word Embeddings
BERT embeddings are learned from unlabelled data
through a process called 'masked language modelling' with self-supervision
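Not on the original slide: a minimal sketch of masked language modelling at inference time with the Huggingface fill-mask pipeline; the model predicts which token was hidden behind [MASK], which is exactly the objective it was pre-trained on.

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
# the model ranks candidate tokens for the masked position
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))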
9. BERT
BERT is so powerful because it is used in a transfer learning setting:
Pre-training: learning embeddings from huge unlabeled data (self-supervised)
Fine-tuning: learning the classification model from smaller labeled data (supervised) for any NLP task (e.g. sentiment, named entities); see the sketch below
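Not on the original slide: a minimal sketch of the fine-tuning step with the Huggingface Trainer, assuming a labeled sentiment dataset; the IMDB dataset here stands in for your own smaller labeled data.

from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")  # stand-in for your own labeled data
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# the pre-trained encoder gets a fresh classification head on top
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-sentiment", num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small labeled set
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()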
10. Huggingface
But also because:
The authors (from Google) open-sourced the model implementation
and publicly released pretrained models (which are computationally expensive to pretrain from scratch)
https://huggingface.co/ is the de facto standard implementation package for training and applying Transformer models
Currently over 150k models have been published on Huggingface
13. Huggingface
Working with Huggingface
Take a pre-trained model
Run ‘zero-shot’:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
output = sentiment_pipeline(data)
print(output)
[{'label': 'POSITIVE', 'score': 0.9998656511306763},
{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
Or fine-tune on your own data
Default model: distilbert-base-uncased-finetuned-sst-2-english
15. GPT
GPT is a decoder-only transformer model
It does not have an encoder
Instead: use the prompt to generate outputs
A growing family of models since 2018: GPT-2, DialoGPT, GPT-3, GPT-3.5, ChatGPT, GPT-4
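Not on the original slide: a minimal sketch of decoder-only generation with the Huggingface text-generation pipeline; the small open gpt2 checkpoint stands in for the larger GPT-family models.

from transformers import pipeline

# decoder-only model: give it a prompt and it continues the text
generator = pipeline("text-generation", model="gpt2")
print(generator("Large Language Models are", max_new_tokens=30)[0]["generated_text"])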
16. GPT-3
GPT is trained to generate the most probable/plausible text
Trained on crawled internet data, open-source books, and Wikipedia, sampled early 2022
After each word, predict the most probable next word given all the previous words
It will give you fluent text that looks very real
17. Few-shot learning
Few-shot learning: learn from a small number of examples
'Old paradigm'
• pre-training
• fine-tuning with ~100s-1000s training samples
'New paradigm'
• pre-training
• prompting with ~3-50 examples in the prompt (see the sketch below)
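Not on the original slide: a minimal sketch of the 'new paradigm', building a few-shot prompt in plain Python; the example reviews and labels are made up for illustration, and the resulting string would be sent to a generative model of your choice.

# a handful of labeled examples go directly into the prompt; no parameters are updated
examples = [
    ("I love this movie", "positive"),
    ("This was a waste of time", "negative"),
    ("What a wonderful surprise", "positive"),
]
query = "The plot was dull and predictable"

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"
print(prompt)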
20. Why are LLMs so powerful?
Because they are HUGE (many parameters)
And trained on HUGE amounts of data
https://huggingface.co/blog/large-language-models
22. Challenges and problems
Computational power
Environmental footprint
Heavy GPU computing required for training models
Lengthy texts are challenging
Low-resource languages
Low-resource domains
Closed models ('OpenAI') vs open-source models
https://lessen-project.nl/
Together, the project partners will develop, implement and evaluate safe and transparent chat-based conversational AI agents based on state-of-the-art neural architectures. The focus is on lesser-resourced tasks, domains, and scenarios.
27. Challenges and problems
Search engines allow us to verify the source of the information
Interfaces to generative language models should do the same
29. Consequences for work and education
Do not replace humans, but assist them to do their work better
When the boring part of the work is done by computational models, the human can do the interesting part
(think about graphic designers using generative models for creating images)
30. Consequences for work and education
Computational methods can help humans (students):
Search engines
Spelling correction
Grammarly
… Generative language models?
New regulations
We have to stress the importance of sources and of writing your own texts (and code!)
and carefully pick our homework assignments
31. Research opportunities
Use generative models to:
develop tools (e.g. QA systems, chatbots, summarizers)
generate training data [1]
The prompting can be engineered to be more effective
study linguistic phenomena
which errors does the model make?
study social phenomena
simulate communication (opinionated / political content) [2]
[1] https://github.com/arian-askari/ChatGPT-RetrievalQA
[2] Chris Congleton, Peter van der Putten, and Suzan Verberne. Tracing Political Positioning of Dutch Newspapers. In: Disinformation in Open Online Media, MISDOOM 2022.
32. Final recommendations
Listen to the interview with Emily Bender
Find me: https://duckduckgo.com/?t=ffab&q=suzan+verberne&ia=web