Constrained text generation to measure reading performance:
A new approach based on constraint programming
Jean-Charles Régin(3)
Joint work with Alexandre Bonlarron(1,3), Aurélie Calabrèse(2), Pierre Kornprobst(1)
(1) Université Côte d’Azur, Inria, France
(2) Aix Marseille Université, CNRS, LPC, Marseille, France
(3) Université Côte d’Azur, I3S, France
Standardized Text (Mansfield et al., 1993)
• Standardized text: sentences read at the same speed
• Usability: to assess reading performance
Constrained text generation
This is a problem dominated by rules (constraints).
Example: the MNREAD Chart.
MNREAD Rules
• Display Rules — e.g., fitting inside the rectangle
• Lexical Rules — e.g., 3000 words from CE2 textbooks
• Grammatical Rules — e.g., no punctuation
• Length Rules — e.g., 60 characters, between 9 and 15 words
[Figure: example of an MNREAD sentence]
There are 38 MNREAD sentences in French.
Are there enough sentences?
There are 38 MNREAD sentences in French.
Questions:
• Are there enough sentences? No: a few thousand sentences are needed to detect and
monitor visual pathologies throughout life.
• Is it really difficult to obtain more sentences that respect the rules?
Naive method
Search for MNREAD-type sentences in a corpus.
Filtering 2300 books (about 10,000,000 sentences) through the rule families
(Display, Lexicon, Grammar, Length) shrinks the candidate set to roughly
1,000,000, then 10,000, then 6, and finally 3 sentences.
Problem: this method does not scale up.
Solution: we have to generate them, but how?
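To make the funnel concrete, here is a minimal sketch of the naive filter (illustrative only, not the authors' code): it checks a 60-character Length rule, a 9-to-15 word count, a no-punctuation rule, and membership in an allowed word list (`lexicon` is a hypothetical input). Almost no corpus sentence passes every check, which is why the approach does not scale.

```java
import java.util.List;
import java.util.Set;

// Naive corpus filter (sketch): keep only sentences satisfying the rules.
final class NaiveFilter {
    static boolean isCandidate(String sentence, Set<String> lexicon) {
        if (sentence.length() != 60) return false;                // Length: 60 characters
        String[] words = sentence.split(" ");
        if (words.length < 9 || words.length > 15) return false;  // Length: 9 to 15 words
        if (!sentence.matches("[\\p{L} ]+")) return false;        // Grammar: no punctuation
        for (String w : words)                                    // Lexicon: allowed words only
            if (!lexicon.contains(w.toLowerCase())) return false;
        return true;
    }

    static long countCandidates(List<String> corpus, Set<String> lexicon) {
        return corpus.stream().filter(s -> isCandidate(s, lexicon)).count();
    }
}
```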
How to generate standardized sentences?
LLM-based approach (GPT, BERT) + search: good text quality, but unlikely to find an
instance that satisfies the constraints.
It generates sentences word by word, selecting the most suitable next word (in practice
tokens are used instead of words).
Prompt (ChatGPT 3.5): give me a sentence of sixty characters with spaces included
“Elephants march majestically through the savannah at sunset, their presence captivating”
Prompt (ChatGPT 3.5): give me a sentence of sixty characters
“The cat sat on the mat and purred softly”
(Note that neither reply actually has sixty characters.)
How to generate standardized sentences?
Ad hoc method: a recent method proposed by the creators of MNREAD, based on
hand-defined models (Mansfield et al., 2019).
• Only one "good" sentence out of 8000; memorization bias
• A semi-automatic method, for the English language
• Non-trivial extension to Latin languages! E.g., gender agreement in French:
"mon ami est beau" vs. "mon amie est belle"
("my (male) friend is handsome" / "my (female) friend is beautiful")
How to generate standardized sentences?
n-gram based methods (Papadopoulos et al., 2015)
Pipeline: Corpus → n-grams → Generation
When used with a random walk, this produces sentences in the style of an author.
Problems:
• How to integrate the constraints?
• How to manage the meaning of the sentences?
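For intuition, here is a toy bigram (2-gram) random walk over a corpus — an illustrative sketch of this style of generation, not the authors' implementation. Note that nothing in it enforces length, lexicon, or meaning, which is exactly the problem raised above.

```java
import java.util.*;

// Toy bigram random walk: each next word is drawn proportionally to how often
// it followed the previous word in the corpus (duplicates in the successor
// lists encode frequency).
final class BigramWalk {
    static Map<String, List<String>> successors(List<String[]> corpus) {
        Map<String, List<String>> next = new HashMap<>();
        for (String[] sentence : corpus)
            for (int i = 0; i + 1 < sentence.length; i++)
                next.computeIfAbsent(sentence[i], k -> new ArrayList<>()).add(sentence[i + 1]);
        return next;
    }

    static String generate(Map<String, List<String>> next, String start, int maxWords, Random rnd) {
        StringBuilder out = new StringBuilder(start);
        String word = start;
        for (int i = 1; i < maxWords && next.containsKey(word); i++) {
            List<String> choices = next.get(word);
            word = choices.get(rnd.nextInt(choices.size()));
            out.append(' ').append(word);
        }
        return out.toString();
    }
}
```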
Multi-Valued Decision Diagram (MDD)
• A generalization of Binary Decision Diagrams (BDDs)
• Each layer represents a variable
• Each path between the root and tt is a valid assignment of the variables
• An MDD models all tuples that satisfy a constraint
[Figure: an MDD having 3 solutions: (a,b), (a,a), (b,b)]
• A data structure for computing and storing problem solutions in a
compressed form, using a directed acyclic graph
• Advantage: a powerful modeling tool. With one billion arcs we can
represent 10^90 solutions!
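A minimal MDD sketch in Java (purely illustrative; MDDLib's actual API differs): nodes hold labeled outgoing edges, and every root-to-terminal path is one solution. The `main` method rebuilds the 3-solution example from the slide.

```java
import java.util.*;

// A minimal MDD sketch: each node keeps its labeled outgoing edges, and
// every root-to-terminal path is one solution (one tuple).
final class MddNode {
    final Map<String, MddNode> edges = new LinkedHashMap<>();

    MddNode child(String label) {
        return edges.computeIfAbsent(label, k -> new MddNode());
    }

    // Enumerate all root-to-leaf paths, i.e. the solutions stored in the MDD.
    void paths(Deque<String> prefix, List<List<String>> out) {
        if (edges.isEmpty()) { out.add(new ArrayList<>(prefix)); return; }
        for (Map.Entry<String, MddNode> e : edges.entrySet()) {
            prefix.addLast(e.getKey());
            e.getValue().paths(prefix, out);
            prefix.removeLast();
        }
    }

    public static void main(String[] args) {
        MddNode root = new MddNode();            // the 3-solution example:
        MddNode n1 = root.child("a");            // root -a-> n1, root -b-> n2
        MddNode n2 = root.child("b");
        n1.child("a"); n1.child("b");            // n1 -a-> tt, n1 -b-> tt
        n2.child("b");                           // n2 -b-> tt
        List<List<String>> solutions = new ArrayList<>();
        root.paths(new ArrayDeque<>(), solutions);
        System.out.println(solutions);           // [[a, a], [a, b], [b, b]]
    }
}
```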
MDD and compression
● Example: the sum of 3 variables
● Corresponds to an automaton
● The last layer gives the sum value
Reduction
● Operation which merges equivalent nodes
● Two nodes are equivalent if they have the same outgoing edges
(same destinations + same labels)
● Analogous to the minimization of finite automata
[Figure: two frames of an example MDD, before and after reduction; the
equivalent nodes c and e are merged into a single node ce]
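A sketch of the bottom-up reduction pass, reusing the MddNode class above (illustrative, not MDDLib's implementation): layers are processed from the terminal layer upward, and two nodes merge when their outgoing edges carry the same labels to the same already-reduced children.

```java
import java.util.*;

// Bottom-up reduction of a layered MDD. Layers are given deepest first
// (terminal layer included, so all tt leaves merge into a single node).
final class Reduction {
    static void reduce(List<List<MddNode>> layersDeepestFirst) {
        Map<MddNode, MddNode> rep = new IdentityHashMap<>(); // node -> representative
        Map<MddNode, Integer> id = new IdentityHashMap<>();  // representative -> canonical id
        int fresh = 0;
        for (List<MddNode> layer : layersDeepestFirst) {
            Map<String, MddNode> canon = new HashMap<>();    // signature -> representative
            for (MddNode n : layer) {
                // Redirect edges to the representatives chosen in the layer below...
                n.edges.replaceAll((label, child) -> rep.getOrDefault(child, child));
                // ...then build a canonical signature: sorted (label -> child id) pairs.
                StringBuilder sig = new StringBuilder();
                new TreeMap<>(n.edges).forEach((label, child) ->
                        sig.append(label).append("->").append(id.get(child)).append(';'));
                MddNode r = canon.computeIfAbsent(sig.toString(), k -> n);
                if (!id.containsKey(r)) id.put(r, fresh++);
                rep.put(n, r);
            }
        }
    }
}
```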
Reduction
● Reduction may gain an exponential factor
● Consequence:
○ an MDD can be exponentially smaller than an equivalent automaton
Compression gain
● Compression may gain an exponential factor
● It often does!
● Example
○ an MDD with 600,000 edges can represent 10^90 solutions, i.e. a
compression factor of 10^86.
● Sometimes it can be subtle
Alldiff constraint
● #nodes = 2^n
● #solutions = n!
● n!/2^n ???
● It is exponential
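A quick justification of that last bullet (a standard bound, added here for completeness): since $n! \ge (n/e)^n$,

```latex
\frac{n!}{2^n} \;\ge\; \frac{(n/e)^n}{2^n} \;=\; \left(\frac{n}{2e}\right)^{\!n} \;\xrightarrow[\;n\to\infty\;]{}\; \infty,
```

so the number of solutions per node grows (at least) exponentially.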
MDD: creation
● An MDD can be created without enumerating the solution set
● It can be built by Dynamic Programming
● A kind of search compression
● So what?
● Operations!
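For example, the sum-of-variables MDD from the compression slide can be built layer by layer by Dynamic Programming, with one node per reachable partial sum — a sketch reusing the MddNode class above (names are illustrative):

```java
import java.util.*;

// DP construction of an MDD for x1 + ... + xn, xi in a fixed domain.
// Each node of layer i represents one reachable partial sum, so equivalent
// prefixes are merged as we go: no enumeration of the solution set, and the
// last layer carries the possible sum values, as in the slide's example.
final class SumMdd {
    static MddNode build(int nVars, int[] domain) {
        MddNode root = new MddNode();
        Map<Integer, MddNode> layer = Map.of(0, root);   // partial sum -> node
        for (int i = 0; i < nVars; i++) {
            Map<Integer, MddNode> next = new HashMap<>();
            for (Map.Entry<Integer, MddNode> e : layer.entrySet())
                for (int v : domain) {
                    MddNode child = next.computeIfAbsent(e.getKey() + v, s -> new MddNode());
                    e.getValue().edges.put(String.valueOf(v), child);
                }
            layer = next;
        }
        return root;
    }
}
```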
MDD: operations
● Intersection, union, difference, negation, etc.
● Operations are performed without decompression
● Intersecting two MDDs is equivalent to taking the conjunction of the two
constraints represented by the MDDs
● Relation between MDD operations and constraint combination:
○ Intersection : conjunction
○ Union : disjunction
○ Negation : negation
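A compact sketch of pairwise intersection, again on the MddNode class above (illustrative; MDDLib differs in detail): the two MDDs are walked in lockstep, with one product node per reachable pair of nodes, keeping only edges whose label exists in both. A real implementation then deletes product nodes that cannot reach tt (the pruning visible in the animation below) and reduces the result.

```java
import java.util.*;

// Intersection of two MDDs: the product node for (a, b) has an l-labeled
// edge iff both a and b have one, and its child is the product of the two
// children. Memoization ensures each pair is built only once.
final class Intersection {
    static MddNode intersect(MddNode a, MddNode b, Map<List<MddNode>, MddNode> memo) {
        List<MddNode> key = List.of(a, b);               // identity-based pair key
        MddNode done = memo.get(key);
        if (done != null) return done;
        MddNode product = new MddNode();
        memo.put(key, product);
        for (Map.Entry<String, MddNode> e : a.edges.entrySet()) {
            MddNode other = b.edges.get(e.getKey());     // label must exist in both MDDs
            if (other != null)
                product.edges.put(e.getKey(), intersect(e.getValue(), other, memo));
        }
        return product;
    }
}
```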
Intersection
[Figure: step-by-step animation of the intersection of the two example MDDs.
Starting from the pair of roots, the product nodes ak', cel' and dl' are
created one by one and then connected to tt; edges whose labels do not appear
in both MDDs, and nodes that cannot reach tt, are discarded along the way.]
Intersection
● Be careful: do not assume that the number of nodes/edges of the
intersection will be smaller
● The resulting MDD can be exponentially larger, because it can be locally
decompressed
Operations
● In-place
● On-the-fly (i.e., avoids having to define the MDD in advance; proceeds
level by level)
3 IDEAS
First idea
Store and retrieve n-grams efficiently
Successions Constraint
• All the n-grams of the corpus are inserted into the MDD as solutions.
• The MDD acts as a TRIE
• To store and reTRIEve n-grams.
Example query: what is the next word of "The white cat"?
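A word-level trie sketch answering exactly that query (illustrative; in the actual approach the n-grams live in an MDD, which behaves the same way here): after inserting the corpus 4-grams, the children of the node reached by "The white cat" are the possible next words. The example data is hypothetical.

```java
import java.util.*;

// n-grams stored in a word-level trie: one root-to-leaf path per n-gram, so
// the children of the node reached by (w1, ..., w(n-1)) are exactly the next
// words observed after that (n-1)-gram in the corpus.
final class NgramTrie {
    final Map<String, NgramTrie> next = new HashMap<>();

    void insert(String... ngram) {
        NgramTrie node = this;
        for (String w : ngram)
            node = node.next.computeIfAbsent(w, k -> new NgramTrie());
    }

    Set<String> nextWords(String... prefix) {
        NgramTrie node = this;
        for (String w : prefix) {
            node = node.next.get(w);
            if (node == null) return Set.of();
        }
        return node.next.keySet();
    }

    public static void main(String[] args) {
        NgramTrie trie = new NgramTrie();
        trie.insert("The", "white", "cat", "sleeps");     // hypothetical 4-grams
        trie.insert("The", "white", "cat", "purrs");
        System.out.println(trie.nextWords("The", "white", "cat")); // [sleeps, purrs] (any order)
    }
}
```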
Second idea
Integrate constraints on-the-fly
MDD Unfolding (top-down)
- Using the first MDD (successions),
- we compile the second one;
- constraints are checked on-the-fly.
Example: "The girl and the boy walked through the forest under the majestic oak trees"
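A simplified sketch of that compilation, built on the NgramTrie above (illustrative: a real unfolding slides a (k-1)-gram window over the successions MDD, and the rule set here is reduced to a character budget and a word count): states violating a check are pruned on-the-fly instead of being expanded.

```java
import java.util.*;

// Top-down unfolding with on-the-fly constraint checks (simplified): a state
// is expanded only while the Length rules can still be satisfied; prefixes
// that already break a rule are pruned immediately.
final class Unfolder {
    record State(NgramTrie node, int words, int chars) {}

    static void expand(List<String> prefix, State s, int maxWords, int maxChars, List<String> out) {
        if (s.words() >= 9 && s.chars() == maxChars)          // Length rule met: emit sentence
            out.add(String.join(" ", prefix));
        for (Map.Entry<String, NgramTrie> e : s.node().next.entrySet()) {
            String w = e.getKey();
            int chars = s.chars() + (s.words() == 0 ? 0 : 1) + w.length();
            if (s.words() + 1 > maxWords || chars > maxChars) continue; // on-the-fly pruning
            prefix.add(w);
            expand(prefix, new State(e.getValue(), s.words() + 1, chars), maxWords, maxChars, out);
            prefix.remove(prefix.size() - 1);
        }
    }
}
```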
The modeling properties of MDDs lead to solving the problem by representing
each rule by an MDD and intersecting them.
Modeling: from rules to MDDs

Rule                                MDD            An arc is...                        A state is...
9 to 15 words                       MDD Universel  a word                              —
Language restriction (3000 lemmas)  MDD Lexique    a lemma                             —
59 characters                       MDD Size       the number of characters of a word  the sum
Corpus                              MDD Corpus     a word                              a k-gram
Intersection: from MDDs to sentences
The four MDDs (MDD Universel, MDD Lexique, MDD Size, MDD Corpus) are intersected.
[Figure: the intersection built word by word; prefixes such as "# Le" and
"Le sac" survive only while every rule can still be satisfied.]
The intersection of the MDDs gives, e.g.: "Le sac noir" ("The black bag")
Third idea
Use an LLM to select the best sentences
LLM sentence scoring: Perplexity
• Transformers (very large context window)
• Perplexity is derived from Shannon entropy
• It quantifies the uncertainty of a model with respect to a sample
• The lower the better; the range is [1, +inf)
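For reference, the standard definition (added here; the slide does not spell it out): the perplexity of a sentence $w_1,\dots,w_N$ under a language model $p$ is

```latex
\mathrm{PPL}(w_1,\dots,w_N) \;=\; \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\left(w_i \mid w_1,\dots,w_{i-1}\right)\right)
```

i.e., the exponential of the average negative log-likelihood per token; a model that predicts every token with certainty reaches the lower bound of 1.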
Experimental conditions
• Input: 443 books belonging to the youth category (FR)
• Input: 75 books belonging to the fiction category (EN)
• Evaluation:
• MNREAD candidate sentence set (syntax and meaning correct)
• Ineligible sentence set (syntax and/or meaning problems)
• Software & Hardware:
• The model is implemented in Java 17 in an MDD solver (MDDLib) @I3S.
• The LLM used to rank sentences is GPT-2
• Machine: Ubuntu 18.04 with an Intel(R) Xeon(R) Gold 5222 @ 3.80GHz CPU and 256 GB RAM.
Do we have sentences? Sentences generated with 3-grams
• With 1% of the corpus, i.e. 63,000 sentences, 3-grams yield 9899 sentences, e.g.:
• J'aimerais bien que le soleil commence à se rendre au salon
("I wish the sun would start coming into the living room")
• Ils sont morts et les yeux sur le nom de ce que vous croyez
• Mes yeux se posent sur le nom qui lui a dit que vous croyez
• Ses mains n'étaient pas de sa mère dans ses bras et le même
• Aucun de ses pieds nus sur les yeux de ce qui ne se passera
("None of his bare feet on the eyes of what will happen")
• L'expression a pris un coup de poing et de leur sort demain
• Y en a pas de nous préparer à tout bout de sa petite bouche
• J'en ai dit que si je vous en emparez et vous ne pouvez pas
• Entrez là et tu as de ma part de sa main dans le monde voit
• Bien que je ne veux pas que les yeux de ce que ça me plaira
• These sentences are not admissible.
• With 3-grams, the large majority of generated sentences have problems of meaning and syntax.
Are MNREAD sentences generated?
• YES! With 5-grams and 443 books (FR), we generate thousands of sentences (7028).
• YES! With 5-grams and 75 books (EN), we generate hundreds of sentences (204).
Performance analysis

MNREAD sentence generation:
Language  Books  Memory  Time  Sentences
FR        443    3 GB    72 s  7028
EN        75     <<1 GB  3 s   204

Scoring takes roughly 1 hour for 7000 sentences with GPT-2 (pylia).
Scoring takes roughly 30 minutes for 7000 sentences with GPT-3 (OpenAI cloud).
Recent benchmark: 569.77 ms / 15 tokens (37.98 ms per token) ≈ 1 sentence with
llama.cpp (comparable to GPT-3) (pylia).
Discussion
• Sentences are selected using GPT-2 or a similar generative model.
• Examples (everybody can have a personal opinion about the scores!):
• Very good: "The two men looked at each other in a state of stupefaction" (10)
• Moderately good: "The wolves had for the most part wholly ignorant of warfare" (270)
• Bad: "The farmer sat down on the Museum steps except the nice one" (930)
• Poetic (medium): "Il est tombé dans le vide avec une sorte de douceur absente" (100)
("He fell into the void with a sort of absent softness")
• Complex:
• "The aircraft will be as common as I can to hinder their way" (380)
• "The depth was very great and it seemed to me to do as I did" (97)
Discussion
• Scores are related to frequency of occurrence:
• "The wolves had for the most part wholly ignorant of warfare" (272)
• Replacing words with more frequent ones improves the score:
• "The wolves had for the most part completely ignored the war" (90)
English Ranking
"The two men looked at each other in a state of stupefaction"  (10.47)
"The wolves had for the most part wholly ignorant of warfare"  (272)
New Constraints
• Generating sentences with only ten words (or 12, 11, ...): no problem
• Changing the level of vocabulary: no problem
• Modifying the size: no problem
• Other constraints: be careful with the combinatorics. If the main
constraints are relaxed, then the number of solutions explodes!
Conclusion
• Promising method: more suitable for handling constraints than generic
methods (e.g., GPT, BERT) and more flexible than the ad hoc method of
Mansfield et al. [3].
• Advantages: modularity (easy to add and/or remove rules), constraints taken
into account at generation time, potentially applicable to other languages.
• Perspectives: a perplexity constraint.
Thanks.
