Quantifying Subjective Phenomena: Simulation, Detection, and Synthetic Training Data with LLMs

Fabian Haak, Björn Engelmann
CIR @ CAIML31
Jul 2, 2024
Quantifying Subjective Phenomena
Simulation, Quantification, and Synthetic Training Data with LLMs
TH Köln

This Presentation
● Introduction: Data and Subjectivity
● Research: How we use LLMs to generate synthetic training data

What if we do not have data,
or cannot easily label it?

THE DATA PROBLEM

Example:
Spam Mail Detection

Data Model
Classification,
Generation, …
Mails
● Private Communication
● Business Communication
● Automated Notifications
● Password resets
● Phishing
● “Spam”
● …
Features
● Text length
● Punctuation
● Sender mail address
● Orthographic correctness
● Lexical features
● …
Spam?
● Dichotomous Label
● Multi-Class Categorical Labels
● Probabilities
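A minimal Python sketch of how a few of the listed features could be computed for a single mail. The function name `extract_features`, the feature names, and the choice of features are illustrative assumptions and not taken from the talk.

```python
import re


def extract_features(sender: str, subject: str, body: str) -> dict:
    """Compute a few simple surface features of one mail (illustrative only)."""
    text = f"{subject}\n{body}"
    words = re.findall(r"\w+", text)
    return {
        # Text length in characters, including the subject line.
        "text_length": len(text),
        # Punctuation features: heavy use of "!" or "?" may be informative.
        "exclamation_marks": text.count("!"),
        "question_marks": text.count("?"),
        # Sender mail address, reduced to its lower-cased domain.
        "sender_domain": sender.rsplit("@", 1)[-1].lower(),
        # A very simple lexical feature: number of word tokens.
        "num_words": len(words),
    }
```

Such a feature dictionary could feed a model that outputs a dichotomous label, a class, or a probability; which label type is appropriate is exactly the question the slide raises.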

To: fabian_haak@th-koeln.de ✉
From: Schwiegermutter61@gmail.com
Subject: Re: Re: Re: The funny little bear video
Hi Fabian!!!!
You still haven't replied to the funny little bear video
I sent you yesterday.
Here is another one, isn't it super funny?
https://www.youtube.com/watch?v=abUxMmcXoNo
PS: You are coming over for dinner today, right??!
Kisses

Subject: Re: Re: Re: The funny little bear wideo
Hi Fabian!!!!
You still haven't replied to the funny little bear wideo
I Sent you yesterday.
Kisses
SPAM?
Private Communication?

Both?

Problem:
SUBJECTIVITY

Subjective Phenomena
● Usually involve user-labeled data
● Judgment is highly subjective and personal
● Furthermore: temporal and regional differences
Examples: Bias, Text Simplicity, Relevance, Profanity, Sentiment, Morality, Beauty of Poem/Art, …

Data/Object
Model/Annotator
Modality
● Text
○ Sentence
○ Paragraph
○ Poem
● Table
● Image
● …
Subjective Quantification
User Context
● Demographics
● Personal experiences and morality
● Cultural/ethnic aspects
● Knowledge
● Language expertise
● …
Object properties
● Semantic aspects
● Linguistic aspects
● Image features
● …
Model Context
● Interaction language
● Prompt
● Model properties
● …
Phenomena
● Profanity
● Simplicity
● Beauty
● …
Subjective Definition

Data
Features
Model
What if something is missing?

BATS & ARTS
Evaluating Text Readability and Simplicity

CLEF23 SimpleText:
Simplification of scientific texts
for non-experts.

Current Evaluation Challenges
● Limitations of current evaluation approaches
○ too shallow: Flesch-Kincaid and similar metrics only measure
basic, surface-level readability
○ reference-based: SARI, BLEU, etc. only assess similarity to a
reference simplification that is assumed to be optimal
● Lack of explainability
● Domain and target audience are not taken into account
● Lack of datasets
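To illustrate why surface metrics are limited, here is a small Python sketch of the Flesch-Kincaid grade level, which depends only on words per sentence and syllables per word. The syllable counter is a crude vowel-group heuristic for English and is an assumption of this sketch; real implementations use better syllable estimation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: number of vowel groups, at least 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level, computed from surface counts only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Nothing in this formula looks at meaning, domain, or target audience: two texts with the same sentence and word lengths receive the same score, regardless of how hard their content is for a given reader.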

BATS

BenchmArking Text Simplicity:
How to evaluate
simplicity?

From Features to Snorkel Labels
Features (37) → Interpretations (135) → Parametrizations (1249)
Interpretations
● few unique entities
● few long words
● few words per sentence
● few negations
● low depth of the syntactic tree
● …
Parametrizations
○ entities/sentence
○ entities/text
○ entities/tokens
○ entities/sentence
■ < 5
■ < 4
■ < 3
■ < 2
■ < 1
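The step from one interpretation ("few words per sentence") to several thresholded parametrizations can be sketched in plain Python. This sketch does not use the Snorkel library itself; it only mirrors the label-or-abstain convention of Snorkel labeling functions. The feature choice, the threshold values, and all names are hypothetical.

```python
import re
from typing import Callable, List

SIMPLE, ABSTAIN = 1, -1  # label-or-abstain convention, as in Snorkel


def words_per_sentence(text: str) -> float:
    """Feature: average number of word tokens per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return len(words) / max(1, len(sentences))


def make_labeling_function(
    feature: Callable[[str], float], threshold: float
) -> Callable[[str], int]:
    """Turn 'feature value is low' into one concrete parametrization."""

    def labeling_function(text: str) -> int:
        return SIMPLE if feature(text) < threshold else ABSTAIN

    labeling_function.__name__ = f"{feature.__name__}_lt_{threshold}"
    return labeling_function


# Hypothetical thresholds; each one yields a separate labeling function.
THRESHOLDS = [25, 20, 15, 10, 5]
labeling_functions: List[Callable[[str], int]] = [
    make_labeling_function(words_per_sentence, t) for t in THRESHOLDS
]


def label_row(text: str) -> List[int]:
    """Apply every labeling function to one text."""
    return [lf(text) for lf in labeling_functions]
```

Applying many such functions to a corpus yields a label matrix with one column per parametrization, which is the kind of input a label model can then aggregate.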

Which dataset-, target audience-, and domain-speciﬁc
characteristics can be found regarding simplicity?

Available datasets are inconsistent

Data
Features
Model
RE: What if something is missing?

We need more, better datasets!
(faster & cheaper!)

ARTS

Assessing Readability &
Text Simplicity
1. Overcoming subjectivity
2. LLM-labeled synthetic data

Hard:
How simple is this text on a scale
from 0 to 100?

Easier:
Which of these two texts is
simpler?

Generating Labeled Data with ARTS
1. Compare texts pairwise
2. Apply the Elo algorithm
3. Derive a ranking
4. Assign simplicity/readability scores
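The four steps can be sketched in Python as follows. The Elo update uses the standard expected-score formula; everything else is an assumption of this sketch: the judge `shorter_is_simpler` is a trivial stand-in for the LLM comparison, the values of `k` and `start` are arbitrary, and the min-max rescaling to 0..100 is only one possible way to turn ratings into scores. Texts are assumed to be unique strings.

```python
from itertools import combinations
from typing import Callable, Dict, List


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A wins against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_rank(
    texts: List[str],
    is_simpler: Callable[[str, str], bool],
    k: float = 32.0,
    start: float = 1000.0,
) -> Dict[str, float]:
    """Steps 1 and 2: compare every pair once and update Elo ratings."""
    ratings = {t: start for t in texts}
    for a, b in combinations(texts, 2):
        score_a = 1.0 if is_simpler(a, b) else 0.0
        exp_a = expected_score(ratings[a], ratings[b])
        ratings[a] += k * (score_a - exp_a)
        ratings[b] += k * ((1.0 - score_a) - (1.0 - exp_a))
    return ratings


def to_scores(ratings: Dict[str, float]) -> Dict[str, float]:
    """Step 4 (assumed): rescale ratings linearly to the range 0..100."""
    lo, hi = min(ratings.values()), max(ratings.values())
    span = (hi - lo) or 1.0
    return {t: 100 * (r - lo) / span for t, r in ratings.items()}


def shorter_is_simpler(a: str, b: str) -> bool:
    """Hypothetical stand-in for the pairwise LLM judgment."""
    return len(a) < len(b)
```

Step 3, the ranking, is then `sorted(texts, key=ratings.get, reverse=True)`. In practice the judge would be an LLM call, and pairs would likely be sampled rather than compared exhaustively, since the number of pairs grows quadratically with the number of texts.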

Elo Algorithm

Back to BATS…

ARTS3000
BATS
embeddings
Test: ARTS94
RF/GB Model

SYNTHETIC DATA

Idea:
Simulate user decisions with
Large Language Models.

Example: User Aspect Simulation
Motoki et al. (2024). More human than human: measuring ChatGPT political bias.
https://doi.org/10.1007/s11127-023-01097-2

Small Study:
Simulating user groups with
GPT personas

For 1000 German news headlines:
“How agreeable/biased/correct/offensive is the
following news headline on a scale from 1 to 10?
Answer as a typical CDU/SPD/Grüne/AFD/Linkspartei voter”
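A short Python sketch of how the prompt template could be expanded over all aspects and voter personas. The helper names and the exact layout of the prompt (for example the trailing `Headline:` line) are assumptions; the sketch only builds the prompt strings and does not call any model.

```python
from itertools import product
from typing import Dict, List

ASPECTS = ["agreeable", "biased", "correct", "offensive"]
PARTIES = ["CDU", "SPD", "Grüne", "AFD", "Linkspartei"]


def build_prompt(headline: str, aspect: str, party: str) -> str:
    """Fill the template for one headline, one aspect, and one persona."""
    return (
        f"How {aspect} is the following news headline on a scale from 1 to 10? "
        f"Answer as a typical {party} voter.\n\n"
        f"Headline: {headline}"
    )


def build_all_prompts(headlines: List[str]) -> List[Dict[str, str]]:
    """One prompt per (headline, aspect, party) combination."""
    return [
        {
            "headline": h,
            "aspect": a,
            "party": p,
            "prompt": build_prompt(h, a, p),
        }
        for h, a, p in product(headlines, ASPECTS, PARTIES)
    ]
```

With 4 aspects and 5 personas, each headline yields 20 prompts, so 1000 headlines correspond to 20,000 model queries if every combination is asked separately.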

How agreeable is the news headline?

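How agreeable is the news headline?
Headline | CDU | SPD | Grüne | AFD | Linkspartei
Apple liefert Ersatzteile nun an Kunden (Apple now delivers spare parts to customers) | 9 | 9 | 9 | 9 | 9
Verbietet den Test von Weltraumwummen! (Ban the testing of space guns!) | 5 | 7 | 9 | 3 | 8
"Ungeimpfte gefährden uns alle" ("The unvaccinated endanger us all") | 9 | 9 | 9 | 2 | 8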

Agreeability: average absolute difference
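A small Python sketch of one way to compute such a measure, assuming it is the mean absolute difference between the ratings of each pair of personas across headlines. The ratings below are only the three example headlines shown on the agreeability slide, not the full set of 1000, and the function names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Agreeability ratings for the three example headlines, per persona.
RATINGS: Dict[str, List[int]] = {
    "CDU": [9, 5, 9],
    "SPD": [9, 7, 9],
    "Grüne": [9, 9, 9],
    "AFD": [9, 3, 2],
    "Linkspartei": [9, 8, 8],
}


def mean_abs_diff(a: List[int], b: List[int]) -> float:
    """Average absolute rating difference over the same headlines."""
    if len(a) != len(b):
        raise ValueError("both personas must rate the same headlines")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pairwise_differences(
    ratings: Dict[str, List[int]]
) -> Dict[Tuple[str, str], float]:
    """Mean absolute difference for every unordered pair of personas."""
    return {
        (p, q): mean_abs_diff(ratings[p], ratings[q])
        for p, q in combinations(ratings, 2)
    }
```

On these three headlines, for example, the CDU and AFD personas differ by (0 + 2 + 7) / 3 = 3.0 points on average, while CDU and SPD differ by only 2/3 of a point.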

Limitation: Unintended Biases
Gupta et al. (2023). Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs.
http://arxiv.org/abs/2311.04892

Typical Workflow of Using LLMs for Synthetic Data
Inputs: Unlabelled Data, Expert Knowledge, Labelled Data
LLM / LLM Personas: Classification, Generation, …
Synthetic Training Data
Small Model: Classification, Generation, …

When and Why Synthetic Data
● Rare Events and Edge Cases
● Unavailable Users/Data
● Subjective Phenomena
● Accelerated Development
● Cost
● But: Biases & Validation

Questions?
