It is rare that texts or entire books written in a Controlled Natural Language (CNL) become very popular, but exactly this has happened with a book that has been published last year. Randall Munroe's Thing Explainer uses only the 1'000 most often used words of the English language together with drawn pictures to explain complicated things such as nuclear reactors, jet engines, the solar system, and dishwashers. This restricted language is a very interesting new case for the CNL community. I describe here its place in the context of existing approaches on Controlled Natural Languages, and I provide a first analysis from a scientific perspective, covering the word production rules and word distributions.
Formation of low mass protostars and their circumstellar disks
The Controlled Natural Language of Randall Munroe’s Thing Explainer
1. The Controlled Natural Language of Randall
Munroe’s Thing Explainer
Tobias Kuhn
http://www.tkuhn.org
@txkuhn
Department of Computer Science, VU University Amsterdam
Fifth Workshop on Controlled Natural Language (CNL 2016)
Aberdeen, Scotland
26 July 2016
2. Does CNL have a Visibility Problem?
Controlled Natural Languages have been successfully
applied in a wide range of domains...
... but how many people know what CNLs are, outside
of our small community?
But maybe things are about to change...
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 2 / 24
3. Bill Gates on Controlled Natural Language:
“It is a brilliant concept.”
https://www.gatesnotes.com/Books/Thing-Explainer
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 3 / 24
4. Well, OK, Bill Gates wasn’t referring to CNL in general but to a very
specific one:
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 4 / 24
5. The author of Thing Explainer, Randall Munroe,
is also the creator of the xkcd webcomics:
https://xkcd.com/1443/
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 5 / 24
6. Thing Explainer —
Complicated Stuff in Simple Words
Intriguingly simple idea: Only use the 1000 most common words of
English
“This is a book of pictures and simple words [...] using only the ten
hundred words in our language that people use the most.” [Thing
Explainer]
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 6 / 24
7. Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 7 / 24
8. Excerpt from the Book
from: R. Munroe. Thing Explainer Complicated Stuff in Simple Words. Houghton Mifflin Harcourt, 2015.
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 8 / 24
9. Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 9 / 24
10. from: R. Munroe. Thing Explainer Complicated Stuff in Simple Words. Houghton Mifflin Harcourt, 2015.
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 10 / 24
11. Language Properties
PENS class: P1E5N5S1 (like unrestricted English)
Comparison to similar languages:
• Basic English: 850 manually selected words (only 18 verbs!) with
specified categories (nouns, verbs, etc.), and strict usage rules
including grammar restrictions
• Special English: 1500 manually selected words (evolve over time)
with specified categories
• Thing Explainer language: 1000 words chosen by frequency and
without specified categories, no restrictions on word usage or
grammar
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 11 / 24
12. Similar Language: Special English
from: VOA Special English word book, Voice of America, 2009
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 12 / 24
13. Word List
Randall Munroe: “I spent several months going back over a bunch of
different lists and generating some of my own based on the Google
Books corpus and even my own email inbox. Then I combined the
lists and where they disagreed I just let my sense of consistency be
the tie-breaker.” [Heaven 2015]
“In this set, I count different word forms—like ‘talk,’ ‘talking,’ and
‘talked’—as one word. I also allowed most ‘thing’ forms of ‘doing’
words, like ‘talker’—especially if, like ‘goer,’ it wasn’t a real word but
it sounded funny.” [Thing Explainer]
D. Heaven. Its not a rocket its an up goer. New Scientist, 228(3049):3233, 2015.
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 13 / 24
14. Implicit Rules
1. The word forms on the list of the 1’000 most often used words.
2. All conjugation forms of verbs on the list. This includes third
singular present (-s), past (-ed), and infinitive from (-ing),
including irregular forms.
3. Noun forms built from verbs on the list by -er, for example
carrier.
4. The plural forms of nouns on the list (-s), for example things,
including irregular forms like teeth. This rule can also be applied
to the word other to produce others, even though it is not a
noun.
5. Comparative (-er) and superlative forms (-est) built from
adjectives on the list, for example smaller or fastest, and
including irregular forms like worse.
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 14 / 24
15. Implicit Rules (Continued)
6. Adjective forms built from nouns on the list by -y or -ful, for
example pointy or colorful.
7. Adverb forms built from adjectives on the list by -ly, for example
normally.
8. Noun forms built from adjectives on the list by -ness, for
example thickness.
9. Different case and possessive forms of pronouns on the list: they
for them, us/ours from we/our, and his from he.
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 15 / 24
16. I don’t want to go all Language Nerd, but ...
https://xkcd.com/1443/
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 16 / 24
17. Implicit Rules (Continued)
10. Verb forms of nouns on the list and noun forms of verbs on the
list when the two forms are similar but not equal, such as
thought from think, and live from life, including deduced forms
like thoughts and living. (only these two cases)
11. More basic word forms for words on the list, such as nouns from
which adjectives on the list were built (person from personal)
and verbs from which nouns on the list were built (build from
building). (only these two cases)
12. Verb forms built from adjectives on the list, such as lower from
low, including conjugated forms like lowering and lowered. (only
this one case)
13. Common acronyms for words on the list, such as TV for
television. (only this one case)
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 17 / 24
18. Word Form Counts
780listed form (rule 1)
361* + s (rules 2/4)
260noun-verb + s (rules 2/4)
168* + er (rules 3/5)
167verb + ing (rule 2)
119verb + er (rule 3)
114verb + ed (rule 2)
84irr. verb form (rule 2)
68noun + s (rule 4)
32verb + s (rule 2)
28adj. + er (rule 5)
21adj. + est (rule 5)
21verb-adj. + er (rules 3/5)
7noun + y (rule 6)
6extra word
6basic form (rule 11)
4pronoun form (rule 9)
4adj. + -ly (rule 7)
3noun to verb (rule 10)
3adj. + ness (rule 8)
3irr. noun form (rule 4)
2adj. to verb (rule 12)
1verb to noun (rule 10)
1other + s (rule 4)
1adj. + ful (rule 6)
1acronym (rule 13)
0 100 200 300 400 500 600 700 800
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 18 / 24
19. Word Occurrence Counts
40’494listed form (rule 1)
4’688* + s (rules 2/4)
3’232noun-verb + s (rules 2/4)
1’795irr. verb form (rule 2)
1’274verb + ing (rule 2)
1’140* + er (rules 3/5)
1’050noun + s (rule 4)
790verb + er (rule 3)
628verb + ed (rule 2)
529pronoun form (rule 9)
398verb + s (rule 2)
225adj. + er (rule 5)
205extra word
125verb-adj. + er (rules 3/5)
99basic form (rule 11)
77adj. + est (rule 5)
68noun to verb (rule 10)
36noun + y (rule 6)
31irr. noun form (rule 4)
10adj. + -ly (rule 7)
8other + s (rule 4)
3adj. + ness (rule 8)
3adj. + ful (rule 6)
2verb to noun (rule 10)
2adj. to verb (rule 12)
2acronym (rule 13)
0k 10k 20k 30k 40k
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 19 / 24
20. Six Extra Words and
the Importance of Tool Support
These six words are used in Thing Explainer, but are not on the list
(nor can they be produced by any obvious rule from the words on the
list): some, mad, hat, apart, rid, and worth
Caused by the use of a faulty spell-checker?
Randall Munroe: “As I wrote, I had tools that would warn me if I used
a word that was not on the list, like a spell-checker.” [Heaven 2015]
D. Heaven. Its not a rocket its an up goer. New Scientist, 228(3049):3233, 2015.
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 20 / 24
21. Inconsistency with Creation/Use of Word List
Some words (comparative and superlative forms of adjectives;
adjectives built by -ly; and different case and possessive forms of
pronouns) count as different when the list is defined, but count as
the same word when the list is used.
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 21 / 24
22. Word Distribution
q
q q q qq
qqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
1 5 10 50 500
1550500
Frequency distribution of non−lemmatized terms
rank
count
q
q q q qqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
1 5 10 50 5001550500
Frequency distribution of lemmatized terms
rank
count
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 22 / 24
23. Conclusion
So, what makes the Thing Explainer language special?
• Very popular (I don’t know of any other book written in a CNL
that is as popular as Thing Explainer...)
• Intriguingly simple restriction of English (even though not as
simple as it looks at first)
• Demonstration that virtually everything can be expressed in it in
an easy-to-grasp manner (I don’t know of a similarly convincing
demonstration for Basic English or Special English...)
• Some things could be improved to make the language even more
powerful (and maybe a bit less funny)
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 23 / 24
24. Thank you for your attention!
Questions?
Tobias Kuhn, VU Amsterdam The Controlled Natural Language of Randall Munroe’s Thing Explainer 24 / 24