Studying Migrations Routes: New data and Tools
1. Roberto Navigli
BabelNet and beyond: a huge multilingual
semantic network and its potential for
interconnecting migration routes
16th June 2016 – Rome
http://lcl.uniroma1.it
2. Roberto Navigli
20/06/2016 2
• Associate Professor in the Department of Computer
Science (Sapienza, Rome)
• Principal investigator of several projects:
– ERC Starting Grant (MultiJEDI)
– FP7 CSA (LIDER)
– Google Focused Research Award (co-PI)
• Manages a team of 10 researchers, 6 of whom are
Ph.D. students
BabelNet, Babelfy and Beyond!
Roberto Navigli
3. Outline of the talk
• BabelNet: a huge multilingual semantic network
• Babelfy: a state-of-the-art multilingual disambiguation
system
• What's next: how to help interconnect/detect migration
flows with the aid of our technologies
6. The resource diaspora
• There are many online dictionaries and encyclopedias
• Each covers one or a limited number of languages
• The knowledge found in different resources is often
complementary
– To get coverage of more languages
– To get additional information about the entry
– To obtain links to geographical information
• However, each resource provides a different meaning
inventory
7. Key Objective 1: create knowledge for all languages
BabelNet: Unifying Lexical Knowledge Resources into
a Single Semantic Network
WordNet
MultiWordNet
WOLF
MCR
GermaNet
BalkaNet
9. Wikipedia [The Web Community, 2001-today]
[Figure: Wikipedia pages as concepts, connected by (unspecified) semantic relations]
10. Merging entries from different resources into BabelNet
• We collect lexicalizations, definitions, translations,
images, etc. from each of the merged resources
[Figure label: WordNet]
11. BabelNet: concepts and semantic relations
• We encode knowledge as a labeled directed graph:
– Each vertex is a Babel synset (=synonym set)
– Each edge is a semantic relation between synsets:
• is-a (balloon is-a aircraft)
• part-of (gasbag part-of balloon)
• instance-of (Einstein instance-of physicist)
• …
• unspecified/relatedness (balloon related-to flight)
Example Babel synset: balloon (EN), Ballon (DE), aerostato (ES), aerostato (IT), pallone aerostatico (IT), montgolfière (FR)
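The labeled directed graph described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `BabelSynset` and `SemanticGraph`, the method names, and the `syn:` identifiers are hypothetical and are not BabelNet's actual API or id scheme.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BabelSynset:
    """A vertex: one concept plus its lexicalizations (hypothetical class)."""
    synset_id: str
    senses: tuple = ()  # (language, lemma) pairs

    def lemmas(self, lang):
        """Return the lemmas of this synset in the given language."""
        return [lemma for code, lemma in self.senses if code == lang]


class SemanticGraph:
    """Labeled directed graph whose edges carry a relation name."""

    def __init__(self):
        self._edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_relation(self, source, relation, target):
        self._edges[source.synset_id].append((relation, target.synset_id))

    def related(self, source, relation=None):
        """Targets reachable from source, optionally filtered by relation."""
        return [t for r, t in self._edges.get(source.synset_id, [])
                if relation is None or r == relation]


# Made-up ids; lexicalizations taken from the balloon example on this slide
balloon = BabelSynset("syn:balloon", (("EN", "balloon"), ("DE", "Ballon"),
                                      ("IT", "aerostato"),
                                      ("IT", "pallone aerostatico")))
aircraft = BabelSynset("syn:aircraft", (("EN", "aircraft"),))
flight = BabelSynset("syn:flight", (("EN", "flight"),))

graph = SemanticGraph()
graph.add_relation(balloon, "is-a", aircraft)
graph.add_relation(balloon, "related-to", flight)

print(balloon.lemmas("IT"))
print(graph.related(balloon, "is-a"))
```

Keeping the relation name on each edge is what lets a query ask for only is-a neighbours, or for every neighbour regardless of relation.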
12. What is BabelNet?
• A merger of resources of different kinds:
– WordNet: the most popular computational lexicon of English
– Open Multilingual WordNet: a collection of open wordnets
– WoNeF: a French WordNet
– Wikipedia: the largest collaborative encyclopedia
– Wikidata: the largest collaborative knowledge base
– Wiktionary: the largest collaborative dictionary
– OmegaWiki: a medium-size collaborative multilingual dictionary
– GeoNames: a worldwide geographical database
– Microsoft Terminology: a computer science thesaurus
– High-quality automatic sense-based translations
16. Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
– 6M concepts and 7.7M named entities
– 119M word senses
– 378M semantic relations (27 relations per concept on avg.)
– 11M images associated with concepts
– 41M textual definitions
– 2M concepts with domains associated
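The headline figures can be cross-checked with simple arithmetic. The sketch below assumes that the "14 million entries" are the sum of concepts and named entities, and that the per-concept average is taken over all of those entries; both readings are assumptions, not statements from the slide.

```python
# Figures as stated on the slide
concepts = 6_000_000
named_entities = 7_700_000
relations = 378_000_000

# Assumption: an "entry" is either a concept or a named entity
entries = concepts + named_entities
relations_per_entry = relations / entries

print(f"{entries / 1e6:.1f}M entries")
print(f"{relations_per_entry:.1f} relations per entry on average")
```

Under these assumptions the totals are consistent with the rounded values quoted above.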
19. Why do we need BabelNet?
• Multilinguality: the same concept is expressed in tens of
languages
• Coverage: 271 languages and 14 million entries!
• Concepts and named entities together: dictionary and
encyclopedic knowledge is semantically interconnected
• "Dictionary of the future": semantic network structure
with labeled relations, pictures, multilingual synsets
• Media coverage and prestigious prizes
21. Lexical ambiguity!
• Thomas and Mario played as strikers in Munich.
22. Word Sense Disambiguation and Entity Linking
• Thomas and Mario are strikers playing in Munich
• Entity Linking: the task of discovering mentions of entities within a text and linking them to a knowledge base.
• WSD: the task of assigning meanings to word occurrences within a text.
23. Multilingual Joint Word Sense Disambiguation
(MultiJEDI)
Key Objective 2: use all languages to disambiguate one
25. Step 1: Find all possible meanings of words
“Thomas and Mario are strikers playing in Munich”
Thomas (novel)
Seth Thomas
Thomas Müller
Mario Gómez
Mario (Album)
Mario (Character)
Striker (Movie)
Striker (Video Game)
striker (Sport)
Munich (City)
FC Bayern Munich
Munich (Song)
Ambiguity!
26. Step 2: Connect all the candidate meanings
Thomas and Mario are strikers playing in Munich
27. Step 3: Extract a dense subgraph
Thomas and Mario are strikers playing in Munich
29. Step 4: Select the most reliable meanings
“Thomas and Mario are strikers playing in Munich”
Thomas (novel)
Seth Thomas
Thomas Müller
Mario Gómez
Mario (Album)
Mario (Character)
Striker (Movie)
Striker (Video Game)
striker (Sport)
Munich (City)
FC Bayern Munich
Munich (Song)
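The four steps can be sketched on the example sentence as follows. This is a toy illustration, not Babelfy's actual algorithm: the `RELATED` edges are invented for the example, and the dense-subgraph step is replaced by a simplified greedy pruning that repeatedly drops the least-connected candidate of any mention that is still ambiguous. All function names are hypothetical.

```python
from itertools import combinations

# Step 1: candidate meanings per mention (taken from the slide)
CANDIDATES = {
    "Thomas": ["Thomas (novel)", "Seth Thomas", "Thomas Müller"],
    "Mario": ["Mario Gómez", "Mario (Album)", "Mario (Character)"],
    "strikers": ["Striker (Movie)", "Striker (Video Game)", "striker (Sport)"],
    "Munich": ["Munich (City)", "FC Bayern Munich", "Munich (Song)"],
}

# Hypothetical semantic-network edges between meanings (illustrative only)
RELATED = {frozenset(pair) for pair in [
    ("Thomas Müller", "FC Bayern Munich"),
    ("Thomas Müller", "striker (Sport)"),
    ("Mario Gómez", "FC Bayern Munich"),
    ("Mario Gómez", "striker (Sport)"),
    ("FC Bayern Munich", "Munich (City)"),
    ("FC Bayern Munich", "striker (Sport)"),
    ("Mario (Character)", "Striker (Video Game)"),
]}


def connect(candidates):
    """Step 2: link related candidates that belong to different mentions."""
    owner = {m: mention for mention, ms in candidates.items() for m in ms}
    return {frozenset((a, b)) for a, b in combinations(owner, 2)
            if owner[a] != owner[b] and frozenset((a, b)) in RELATED}


def degree(meaning, edges, alive):
    """Number of edges linking a meaning to other surviving candidates."""
    return sum(1 for e in edges if meaning in e and e <= alive)


def densify(candidates, edges):
    """Step 3 (simplified): prune the weakest candidate until unambiguous."""
    remaining = {m: list(ms) for m, ms in candidates.items()}
    while any(len(ms) > 1 for ms in remaining.values()):
        alive = {x for ms in remaining.values() for x in ms}
        mention, worst = min(
            ((m, x) for m, ms in remaining.items() if len(ms) > 1 for x in ms),
            key=lambda pair: degree(pair[1], edges, alive),
        )
        remaining[mention].remove(worst)
    return remaining


def disambiguate(candidates):
    """Steps 2 to 4: connect, densify, keep one meaning per mention."""
    dense = densify(candidates, connect(candidates))
    return {mention: ms[0] for mention, ms in dense.items()}


result = disambiguate(CANDIDATES)
for mention, meaning in result.items():
    print(f"{mention} -> {meaning}")
```

On this toy data the football-related meanings reinforce one another, so they survive the pruning while isolated candidates such as the novel or the song are dropped.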
33. Live demo – Crazy polyglot!
EN In today's knowledge and information society
FR le paysage lexicographique est plus hétérogène que jamais.
IT Possono le risorse stand-alone competere
ES con múltiples funciones, portale lexicográficas multilingüe y servicios web,
ZH Web服务,定制的喜好和个人用户的个人资料?
34. 1) Geographical named entities are interlinked
• Each geographical entity comes with:
– geolocation information
– translations in dozens of languages
– connections to other concepts and named entities (e.g.
politicians, important places, concepts, events, etc.)
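As a rough sketch of what such an interlinked geographical entry might carry, the record below bundles geolocation, translations, and connections, and uses the coordinates to compute a distance. The `GeoEntity` class, its field names, and the `geo:` ids are hypothetical, and the coordinates are approximate values chosen for illustration.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt


@dataclass
class GeoEntity:
    """Hypothetical record for one geographical named entity."""
    entity_id: str
    lat: float
    lon: float
    names: dict = field(default_factory=dict)    # language code -> name
    related: list = field(default_factory=list)  # ids of connected entries


def distance_km(a, b):
    """Great-circle (haversine) distance between two entities, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


# Approximate, illustrative coordinates; the ids are made up
idomeni = GeoEntity("geo:idomeni", 41.12, 22.51,
                    {"EN": "Idomeni", "EL": "Ειδομένη"})
thessaloniki = GeoEntity("geo:thessaloniki", 40.64, 22.94,
                         {"EN": "Thessaloniki", "EL": "Θεσσαλονίκη",
                          "IT": "Salonicco"},
                         related=["geo:idomeni"])

print(round(distance_km(idomeni, thessaloniki)), "km")
```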
35. 2) Named entities, events and actions are expressed in
any language
• We can process tweets and Facebook/Instagram/blog posts,
identify these entities, and interconnect them
independently of the language they are expressed in
Οδεύουμε προς τη #Μακεδονία (We are moving to #Macedonia)
Greek police started phase 2 of #Idomeni evacuation, emptying
camp near Polykastro-1,828 people 2B moved
إخالء!الغاز محطة الشغب مكافحة شرطة حاصرت EKO! تركه رافضا الالجئين!إشعار أيمسبق
(EKO Evacuation! Riot police have surrounded EKO gas station!
Refugees refusing to leave! No prior notice given)
نقل سيتمسكان Idomeni سالونيك ،مدينة أكبر ثاني في ذلك في بما ،جديدة مخيمات إلى.
(Idomeni residents will be moved to new camps, including in the
second-largest city, Thessaloniki.)
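The idea of linking mentions independently of language can be sketched with a small lookup from surface forms to language-independent ids. This is a deliberately naive stand-in for real entity linking: the `SURFACE_FORMS` index, the `geo:` ids, and the function name are hypothetical, and the third post is a simplified Arabic sentence written for this example.

```python
import re
import unicodedata

# Hypothetical index: lowercase surface form -> language-independent id
SURFACE_FORMS = {
    "idomeni": "geo:idomeni",
    "ειδομένη": "geo:idomeni",
    "macedonia": "geo:macedonia",
    "μακεδονία": "geo:macedonia",
    "thessaloniki": "geo:thessaloniki",
    "سالونيك": "geo:thessaloniki",
}


def link_entities(post):
    """Return the sorted ids of all known entities mentioned in a post."""
    text = unicodedata.normalize("NFC", post).lower()
    tokens = re.findall(r"\w+", text)  # \w is Unicode-aware for str patterns
    return sorted({SURFACE_FORMS[t] for t in tokens if t in SURFACE_FORMS})


posts = [
    "Οδεύουμε προς τη #Μακεδονία",
    "Greek police started phase 2 of #Idomeni evacuation",
    "نقل سكان Idomeni إلى سالونيك",
]
for post in posts:
    print(link_entities(post))
```

Because every surface form maps to the same id whatever its script, posts in Greek, English, and Arabic end up connected through shared entities.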
37. 3) Predicting where the migration flows are moving next
• Intentions can be automatically identified and
extracted from text
– Including the next most popular actions and events (e.g.
moving, evacuating, going back, etc.)
• Integrated with GPS and satellite views of the places
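A very rough sketch of counting actions across posts is shown below. It assumes simple cue phrases instead of real intention extraction, so it should be read as a placeholder for the idea only; `ACTION_CUES` and `count_actions` are hypothetical names, and the posts are the English glosses of the example tweets.

```python
from collections import Counter

# Hypothetical cue phrases per action; a real system would rely on
# semantic analysis, not substring matching
ACTION_CUES = {
    "moving": ("moving to", "heading to", "moved to"),
    "evacuating": ("evacuation", "evacuating"),
    "going back": ("going back", "returning to"),
}


def count_actions(posts):
    """Count how many posts mention each action at least once."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for action, cues in ACTION_CUES.items():
            if any(cue in text for cue in cues):
                counts[action] += 1
    return counts


posts = [
    "We are moving to #Macedonia",
    "Greek police started phase 2 of #Idomeni evacuation",
    "EKO Evacuation! Riot police have surrounded EKO gas station!",
    "Idomeni residents will be moved to new camps",
]
counts = count_actions(posts)
for action, n in counts.most_common():
    print(action, n)
```

Aggregating such counts per place and per day would be one possible input to a prediction of where flows move next.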