This paper describes the first edition of the “Solving language games” (NLP4FUN) task at the EVALITA 2018 campaign. The task consists of designing an artificial player for “The Guillotine” (La Ghigliottina, in Italian), a challenging language game which demands knowledge covering a broad range of topics. The game consists of finding a word which is semantically correlated with a set of 5 words called clues. Artificial players for the game can take advantage of open repositories available on the web, such as Wikipedia, which provide the system with the cultural and linguistic background needed to find the solution.
1. EVALITA 2018
EVALUATION OF NLP AND SPEECH TOOLS FOR ITALIAN
Overview of the EVALITA 2018 Solving Language Games (NLP4FUN) Task
Pierpaolo Basile, Marco de Gemmis
Lucia Siciliani, Giovanni Semeraro
Dipartimento di Informatica
Università degli Studi di Bari Aldo Moro, Italy
3. EVALITA 2018 Workshop
December 12-13 2018, Turin
“La Ghigliottina”
The solution is pacco (package):
✓ “Pacco, doppio pacco e contropaccotto” (movie)
✓ Carta da pacco (wrapping paper)
✓ Pacco di soldi (a pile of money)
✓ Pacco di pasta (a pack of pasta)
✓ Pacco regalo (gift package)
Motivation
● Language games have attracted the attention of researchers in the fields of AI and NLP
○ Jeopardy!, crossword puzzles
● “La Ghigliottina” is a challenging language game which demands knowledge covering a broad range of topics
○ artificial players can take advantage of open repositories and the web
○ cultural and linguistic background is necessary to understand the clues
Task and dataset
● The task: given a set of five words (the clues), each linked in some way to a specific word that represents the unique solution of the game, find that solution
○ clues are unrelated to each other
○ the player has one minute to find the solution!
● Dataset: set of games taken from
○ the TV show “L’Eredità”
○ the board game “L’Eredità”
Data format
<games>
<game>
<id>3fc953bd...</id>
<clue>uomo</clue>
<clue>cane</clue>
<clue>musica</clue>
<clue>casa</clue>
<clue>pietra</clue>
<solution>chiesa</solution>
<type>TV</type>
</game>
...
</games>
● XML format
● a root element games which contains several game elements
● each game has five clue elements and one solution
● the element type specifies the type of the game: TV or board game
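The structure described above can be read with a few lines of Python. This is a minimal sketch that uses only the standard library; the `parse_games` helper and the `example-id` identifier are hypothetical names introduced for illustration (the real ids are longer strings, shown truncated in the slide).

```python
import xml.etree.ElementTree as ET

# Small sample mirroring the slide; "example-id" is a made-up placeholder id.
SAMPLE = """<games>
  <game>
    <id>example-id</id>
    <clue>uomo</clue>
    <clue>cane</clue>
    <clue>musica</clue>
    <clue>casa</clue>
    <clue>pietra</clue>
    <solution>chiesa</solution>
    <type>TV</type>
  </game>
</games>"""


def parse_games(xml_text):
    """Return one dict per <game> element: id, clues, solution, type."""
    root = ET.fromstring(xml_text)
    games = []
    for game in root.findall("game"):
        games.append({
            "id": game.findtext("id"),
            "clues": [clue.text for clue in game.findall("clue")],
            "solution": game.findtext("solution"),
            "type": game.findtext("type"),
        })
    return games


games = parse_games(SAMPLE)
print(games[0]["clues"], "->", games[0]["solution"])
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`.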
Output
The participants must return a ranked list of solutions in a plain text file, one candidate per line:
id solution score rank time
For example:
3fc953bd-... porta 0.978 1 3459
3fc953bd-... chiesa 0.932 2 3251
3fc953bd-... santo 0.897 3 4321
...
3fc953bd-... carta 0.321 100 2343
At most 100 candidate solutions for each game.
The time taken by the system to compute the solution is reported in milliseconds.
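A ranked list in this layout could be produced as sketched below. The `format_ranked_list` helper and the `example-id` identifier are hypothetical, and the single-space separator is an assumption taken from how the example lines appear on the slide; the official guidelines may prescribe a different delimiter.

```python
def format_ranked_list(game_id, candidates, max_candidates=100):
    """Build 'id solution score rank time' lines for one game.

    candidates: iterable of (word, score, time_ms) tuples.
    Candidates are sorted by descending score and cut at max_candidates.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:max_candidates]
    return [
        f"{game_id} {word} {score:.3f} {rank} {time_ms}"
        for rank, (word, score, time_ms) in enumerate(ranked, start=1)
    ]


lines = format_ranked_list(
    "example-id",  # placeholder id, not a real game id
    [("chiesa", 0.932, 3251), ("porta", 0.978, 3459), ("santo", 0.897, 4321)],
)
print("\n".join(lines))
```

Sorting inside the helper keeps the rank column consistent with the scores even when the candidates arrive unordered.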
Dataset: statistics
● Games have different levels of difficulty
○ instances taken both from the TV game and
from the official board game
● Training set: 315 instances of the game
○ 64.8% (TV game), 35.2% (board game)
● Test set: 105 instances of the game
○ 62.9% (TV game)
○ 37.1% (board game)
● 300 fake games (automatically created) added to the evaluation data
Evaluation
● a (time) weighted version of Mean
Reciprocal Rank (MRR)
● G is the set of games
● r_g is the rank of the solution
● t_g denotes the minutes taken by the system to give the solution
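The metric can be sketched in code from the symbols above. The exact time weighting is not spelled out in this text, so the `time_weight` function below (linear decay over a 10-minute budget, never below 0.05) is an assumption made for illustration and should be checked against the task guidelines. Setting the weight to 1 for every game gives the standard MRR.

```python
def time_weight(minutes, budget=10.0, floor=0.05):
    """Assumed weighting: linear decay over `budget` minutes, floored at `floor`."""
    return max(floor, (budget - minutes) / budget)


def weighted_mrr(results, weight=time_weight):
    """Average of weight(t_g) / r_g over all games in G.

    results: list of (rank, minutes) pairs, one per game; rank is None
    when the solution does not appear in the returned list.
    """
    if not results:
        return 0.0
    total = 0.0
    for rank, minutes in results:
        if rank is not None:
            total += weight(minutes) / rank
    return total / len(results)


def standard_mrr(results):
    """Plain MRR: same average with the time weight fixed to 1."""
    return weighted_mrr(results, weight=lambda minutes: 1.0)


# Three toy games: solved at rank 1 instantly, rank 2 after 5 minutes, unsolved.
example = [(1, 0.0), (2, 5.0), (None, 1.0)]
print(weighted_mrr(example), standard_mrr(example))
```

With any weighting of this kind, a system that answers almost instantly has a weighted MRR equal to its standard MRR, while a slow system scores lower on the weighted variant.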
Participants
● 12 registered teams
● only 2 teams submitted results
○ UNIOR4NLP: the idea is that clue words and
the corresponding solution are often part of a
multiword expression (multiword expressions
are filtered by linguistic patterns)
○ LucaSquadrone: co-occurrences of clues and
candidate solutions
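The co-occurrence idea can be illustrated with a toy example. This is not either participant's system: the `rank_candidates` function, the tiny corpus, and the clue list (inferred from the expressions listed for the pacco game) are all assumptions, and a real player would need a large corpus, stopword filtering, and a better association measure.

```python
from collections import Counter


def rank_candidates(clues, documents, top_k=100):
    """Score each non-clue word by how many clues it co-occurs with.

    For every document, each word that is not itself a clue receives one
    point per distinct clue found in the same document.
    """
    clue_set = {clue.lower() for clue in clues}
    scores = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        matched = tokens & clue_set
        if not matched:
            continue
        for token in tokens - clue_set:
            scores[token] += len(matched)
    return scores.most_common(top_k)


# Toy corpus built from the expressions of the pacco example.
corpus = [
    "pacco regalo",
    "carta da pacco",
    "pacco di pasta",
    "pacco di soldi",
    "doppio pacco",
    "carta da regalo",
]
clues = ["doppio", "carta", "soldi", "pasta", "regalo"]
ranking = rank_candidates(clues, corpus)
print(ranking[0])
```

Function words such as "da" and "di" also collect points here, which is why a practical system would filter them out or normalize by word frequency.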
Results
● UNIOR4NLP reports a very high MRR: the system is able to place the solution in the top positions
● The Squadrone system takes more time to solve games, so its time-weighted MRR differs from the standard MRR, MRR (std)

System      MRR      MRR (std)   Solved
UNIOR4NLP   0.6428   0.6428      81.90%
Squadrone   0.0134   0.0350      25.71%
Comments
Reported results are remarkable, but some difficult games requiring inference remain unsolved:
● uno, notte, la trippa, auto, palazzo → portiere
○ uno (one) is the number generally assigned to the goalkeeper (portiere)
○ “La Trippa” is the surname of “Antonio La Trippa”, a character in the Italian movie “Gli onorevoli”, who works as the porter (portiere) of a building
Conclusions
● Challenging task
● Good results when the clues and the solution form a multiword expression
○ inference is hard to tackle
● Few participants
○ Is the task too difficult?
○ Do non-classification tasks attract few participants?
● Mobile app “Ghigliottiniamo”
○ integrate your artificial player through the REST API; contact support@quiztime.io
Thank you!
Download our dataset from the GitHub
EVALITA 2018 repository
https://github.com/evalita2018/data