Learning with the
Web
Structuring data to ease
machine understanding
http://twitter.com/giusepperizzo
July 11th, 2013 Università di Torino, Italy 2/44
Google
Knowledge
Graph
Viewer
Google Knowledge Graph
The Google Knowledge Graph bulk:
encyclopedic sources
The Web community has highlighted the road,
but ...
Vast wealth of unstructured data
“80% of data on the Web and on internal
corporate intranets is unstructured”
“Semantic Web and Information Extraction Workshop”, SWAIE
at RANLP2013
The entire digital universe is going to
be part of the Web
“unstructured data will account for 90 percent of
all data created in the next decade”
IDC IVIEW, “Extracting Value from Chaos”, June 2011
Structured means
making those
resources available to be easily processed
by machines
A Web of Linked Entities
http://wole2013.eurecom.fr
http://wole2012.eurecom.fr
➢ GGG (global giant graph)
http://goo.gl/fH3h
➢ Nodes are Web entities
➢ Entities provide disambiguation
pointers
➢ Entities can be referred to unambiguously
(disambiguated)
➢ Entities as centroids for topic
generation and understanding
Chapter 1:
Named Entity Recognition (NER)
and
Named Entity Linking (NEL)
I want to book a room in an hotel located in the
heart of Paris, just a stone’s throw from the
Eiffel Tower
Eric Charton, “Named Entity Detection and
Entity Linking in the Context of Semantic Web:
Exploring the ambiguity question”
Part of Speech
Token   POS tag
I       PRP
want    VBP
to      TO
book    VB
a       DT
room    NN
in      IN
..      ..
Paris   NNP
NER: What is Paris?
NEL: Which Paris are we talking about?
What is Paris?
Type ambiguity
asteroid, location/city, film
Entity recognition
Token   POS tag   Entity label
I       PRP       O
want    VBP       O
to      TO        O
book    VB        O
a       DT        O
room    NN        O
in      IN        O
..      ..        ..
Paris   NNP       LOC
NER: State of the art
➢ CRFs (Conditional Random Fields)
➢ FSM (Finite-State Machine)
➢ HMM (Hidden Markov Model)
➢ Gazetteers
➢ Wikipedia/DBpedia
➢ In-house dictionaries
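The gazetteer approach listed above can be sketched as a dictionary lookup over token spans. This is a minimal illustration, not the method of any system named in these slides: the `GAZETTEER` entries and the `tag_entities` helper are assumptions made up for the example, and real gazetteers (for instance ones derived from Wikipedia/DBpedia) are far larger.

```python
# Toy gazetteer: lowercase surface form -> entity type.
# The entries are illustrative assumptions, not taken from a real resource.
GAZETTEER = {
    "paris": "LOC",
    "eiffel tower": "LOC",
}


def tag_entities(tokens, gazetteer=GAZETTEER, max_len=3):
    """Label each token with an entity type, or "O" when nothing matches.

    The longest matching span wins, so "Eiffel Tower" is tagged as a
    single two-token entity rather than token by token.
    """
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in gazetteer:
                for j in range(i, i + n):
                    labels[j] = gazetteer[phrase]
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


tokens = "I want to book a room in Paris".split()
labels = tag_entities(tokens)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

A pure lookup like this cannot resolve type ambiguity (Paris the city versus Paris the film), which is one reason statistical models such as CRFs are listed alongside gazetteers.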
Which Paris?
Name ambiguity
Paris, Kentucky Paris, Maine Paris, Tennessee
Paris, France Paris, Ontario
Paris, Idaho
Entity linking
Token   POS tag   Entity label   Link
I       PRP       O              O
want    VBP       O              O
to      TO        O              O
book    VB        O              O
a       DT        O              O
room    NN        O              O
in      IN        O              O
..      ..        ..             ..
Paris   NNP       LOC            http://en.wikipedia.org/wiki/Paris
Ambiguity resolution: linking to an
external knowledge base
➢ Wikipedia/DBpedia
➢ Gigaword Corpus
➢ In-house dataset
➢ LOD dataset
➢ DBLP
➢ ACM
➢ BBC
➢ ...
NEL: State of the art
➢ Clustering
➢ Vector Space Model (Cosine similarity or
Maximum Entropy) – it requires a priori
knowledge of the spotted entities
➢ Conditional probability – it requires a priori
knowledge of the spotted entities
➢ Dictionaries
➢ Wikipedia/DBpedia
➢ In-house dataset
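As a rough sketch of the Vector Space Model approach with cosine similarity, the mention's context can be compared against a short description of each candidate entity. The candidate descriptions, the `link` helper, and the raw term counts are assumptions chosen for illustration; a realistic system would draw descriptions from a knowledge base and would likely weight terms (for example with TF-IDF) so that stop words do not dominate the score.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts over lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical one-line descriptions of three candidates for "Paris".
CANDIDATES = {
    "Paris, France": "capital city of france home of the eiffel tower",
    "Paris, Kentucky": "small city in bourbon county kentucky united states",
    "Paris, Ontario": "community on the grand river in ontario canada",
}


def link(context, candidates=CANDIDATES):
    """Return the best-scoring candidate and the score of every candidate."""
    ctx = bow(context)
    scores = {name: cosine(ctx, bow(desc)) for name, desc in candidates.items()}
    return max(scores, key=scores.get), scores


context = ("I want to book a room in a hotel in the heart of Paris "
           "near the Eiffel Tower")
best, scores = link(context)
print(best)
```

The candidate list must exist before scoring, which is the "a priori knowledge of the spotted entities" requirement noted above.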
Processing natural language texts
➢ Several attempts from the Web community to
structure the large wealth of data available
➢ Numerous off-the-shelf systems (commercial and
academic) that perform the NER+NEL chain
➢ AlchemyAPI
➢ DBpedia Spotlight
➢ Wikimeta
➢ TextRazor
➢ Stanford CRF
➢ ...
The NERD initiative
http://nerd.eurecom.fr
Combination of off-the-shelf systems
and properly trained CRFs
The strength of this approach lies in the fact that
the supported off-the-shelf systems have access
to large knowledge bases of entities such as
DBpedia and Freebase, while CRFs are domain
specific
Diversity
Extractor          Classification schema          Number of classes
AlchemyAPI         Alchemy                        324
DBpedia Spotlight  DBpedia, FreeBase, Schema.org  320
Extractiv          Extractiv                      34
Lupedia            DBpedia, LinkedMDB             319
OpenCalais         OpenCalais                     95
Saplo              Saplo                          5
SemiTags           CoNLL-3                        4
Wikimeta           ESTER                          7
Yahoo!             Yahoo                          13
Zemanta            FreeBase                       81
NERD Ontology
NERD type Occurrence
Person 10
Organization 10
Country 6
Company 6
Location 6
Continent 5
City 5
RadioStation 5
Album 5
Product 5
... ...
The NERD ontology has been integrated into the NIF project, developed within
the EU FP7 project LOD2: Creating Knowledge out of Interlinked Data
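One way to picture the role of a shared ontology is as a lookup that maps each extractor's own type labels onto a common set of NERD types. The sketch below is only illustrative: the `ALIGNMENT` entries, the native label names, the `to_nerd_type` helper, and the `"Thing"` fallback are assumptions, not the actual NERD alignment.

```python
# Hypothetical alignment: (extractor, extractor-specific type) -> NERD type.
# Both the native labels and the mappings are illustrative assumptions.
ALIGNMENT = {
    ("SemiTags", "LOC"): "Location",
    ("SemiTags", "PER"): "Person",
    ("Wikimeta", "LOC"): "Location",
    ("AlchemyAPI", "City"): "City",
    ("AlchemyAPI", "Person"): "Person",
}


def to_nerd_type(extractor, native_type, alignment=ALIGNMENT):
    """Map an extractor-specific type to a shared type.

    Unknown pairs fall back to a generic "Thing" label (an assumed default).
    """
    return alignment.get((extractor, native_type), "Thing")


print(to_nerd_type("SemiTags", "LOC"))
print(to_nerd_type("AlchemyAPI", "City"))
print(to_nerd_type("Saplo", "SomethingElse"))
```

Once every output is expressed in the same vocabulary, annotations from extractors with 4 classes and extractors with over 300 classes can be compared and combined.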
Learning with the Web
➢ FSM-core based
➢ combination of the NERD supported off-the-shelf
systems
➢ ML-core based
➢ combination of the NERD supported off-the-shelf
systems
– and a CRF, properly trained with the given corpus
Challenges and benchmark
ETAPE 2012 - Entity Extraction
Challenge
➢ French transcripts of radio and video programs
➢ Challenge objective: entity typing
➢ Submitted system:
➢ FSM-core based
➢ Annotation priority given to the systems that have
fine-grained classification schemes
➢ Ranked 7th/7
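The priority rule mentioned above can be sketched as follows. This shows only the conflict-resolution step, not the finite-state machine itself, and it assumes that the number of classes in each extractor's schema (taken from the Diversity slide) is a usable proxy for how fine-grained the schema is. The `combine_by_priority` helper and the sample annotations are hypothetical.

```python
# Number of classes per extractor, as listed on the Diversity slide.
NUM_CLASSES = {
    "AlchemyAPI": 324, "DBpedia Spotlight": 320, "Extractiv": 34,
    "Lupedia": 319, "OpenCalais": 95, "Saplo": 5, "SemiTags": 4,
    "Wikimeta": 7, "Yahoo!": 13, "Zemanta": 81,
}


def combine_by_priority(annotations, num_classes=NUM_CLASSES):
    """Keep one type per text span, preferring finer-grained extractors.

    annotations: iterable of (extractor, start, end, entity_type) tuples.
    Returns a dict mapping (start, end) to the winning entity type.
    """
    best = {}
    for extractor, start, end, entity_type in annotations:
        span = (start, end)
        current = best.get(span)
        # Replace the current winner only if this extractor has more classes.
        if current is None or num_classes[extractor] > num_classes[current[0]]:
            best[span] = (extractor, entity_type)
    return {span: entity_type for span, (_, entity_type) in best.items()}


# Hypothetical outputs for "Paris" in "I want to book a room in Paris".
annotations = [
    ("SemiTags", 25, 30, "Location"),
    ("AlchemyAPI", 25, 30, "City"),
    ("Wikimeta", 25, 30, "Location"),
]
combined = combine_by_priority(annotations)
print(combined)
```

Here the coarse "Location" labels are overridden by the more specific "City", at the cost of trusting a single extractor even when the others agree with each other.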
#MSM'13 - Concept Extraction
Challenge
➢ English Twitter microposts
➢ Challenge objective: entity typing
➢ Submitted system:
➢ ML-core based: SVM
➢ Features = linguistic features (some of them are
capitalization, 3 chars of prefix and suffix, POS), output
of a CRF properly trained with the challenge training
dataset, outputs of the off-the-shelf systems
➢ Ranked 2nd/22
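The feature set described above can be sketched as a per-token dictionary. Only the feature kinds come from the slide (capitalization, 3-character prefix and suffix, POS, CRF output, off-the-shelf outputs); the `token_features` helper, the feature names, and the sample labels are assumptions. The resulting dictionaries would still need to be vectorized before being passed to an SVM.

```python
def token_features(token, pos_tag, crf_label, extractor_labels):
    """Build a feature dictionary for one token.

    extractor_labels maps an extractor name to the label it assigned
    to this token (or "O" when it produced no annotation).
    """
    features = {
        "capitalized": token[:1].isupper(),
        "prefix3": token[:3],
        "suffix3": token[-3:],
        "pos": pos_tag,
        "crf": crf_label,
    }
    # One feature per off-the-shelf system, so the classifier can learn
    # how far to trust each of them.
    for name, label in extractor_labels.items():
        features["ext_" + name] = label
    return features


# Hypothetical inputs for the token "Paris".
features = token_features(
    "Paris",
    "NNP",
    crf_label="LOC",
    extractor_labels={"AlchemyAPI": "City", "SemiTags": "LOC"},
)
print(features)
```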
CoNLL-2003
➢ English newswire corpus
➢ Benchmark objective: entity typing
➢ System:
➢ ML-core based: SVM and NB
➢ Features = linguistic features (some of them are capitalization, 3
chars of prefix, 3 chars of suffix, POS), output of a CRF properly
trained with the challenge training dataset, output of the
off-the-shelf systems
➢ Results: significantly outperformed all the off-the-shelf systems
used as inputs, as well as the Stanford CRF properly trained with the
CoNLL-2003 training corpus
TAC KBP 2011
➢ English newswire corpus
➢ Benchmark objective: entity linking
➢ System:
➢ FSM-core based
➢ Features: outputs of the off-the-shelf systems,
harmonized with the Gigaword corpus
ongoing
NERD in action
http://nerd.eurecom.fr/annotation/247957
Chapter 2:
Annotating streams of
heterogeneous data coming from
social platforms for topic
generation
The Social Web is growing fast and is becoming
crucially important for research and
companies
Social Web = Big Data
Gartner “3V” definition: Volume, Velocity, Variety
of microposts
Microposts
➢ Short (~140 characters) and informal text
➢ Grammar-free text
➢ Slang
➢ Media items
➢ Picture
➢ Video
Can we make sense out of the massive and
rapidly changing amount of information shared in
the Social Web?
Live topic generation
http://youtu.be/8iRiwz7cDYY
http://mediafinder.eurecom.fr
Tracking and analyzing an event
➢ 1 week period
➢ We collected microposts with attached pictures
➢ We followed the 2013 Italian Election
➢ We compared the results with the articles
published during those days in well-known newspapers
http://youtu.be/jIMdnwMoWnk
http://mediafinder.eurecom.fr/story/elezioni2013
Outlook: an entity graph from the open and
Social Web
Thanks for your time and attention
http://www.slideshare.net/giusepperizzo
Do you have any questions?

Weitere ähnliche Inhalte

Andere mochten auch

Edisi 23 Maret2010 Nas
Edisi 23 Maret2010 NasEdisi 23 Maret2010 Nas
Edisi 23 Maret2010 Nasepaper
 
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.Deepak Ravindran
 
Edisi19mei nas
Edisi19mei nasEdisi19mei nas
Edisi19mei nasepaper
 
7jun aceh
7jun aceh7jun aceh
7jun acehepaper
 
Edisi 3 April Nas
Edisi 3 April NasEdisi 3 April Nas
Edisi 3 April Nasepaper
 
Antonio Mogollon investigacion 2
Antonio  Mogollon investigacion 2Antonio  Mogollon investigacion 2
Antonio Mogollon investigacion 2Antonio Mogollon
 
Learning with the Web: Spotting Named Entities on the intersection of NERD an...
Learning with the Web: Spotting Named Entities on the intersection of NERD an...Learning with the Web: Spotting Named Entities on the intersection of NERD an...
Learning with the Web: Spotting Named Entities on the intersection of NERD an...Giuseppe Rizzo
 
Journalism Today #2 - slideshare
Journalism Today #2 -  slideshareJournalism Today #2 -  slideshare
Journalism Today #2 - slideshareJill Falk
 
Being a Pirate - TEDx Delhi
Being a Pirate - TEDx DelhiBeing a Pirate - TEDx Delhi
Being a Pirate - TEDx DelhiDeepak Ravindran
 
Edisi 1 April Nas
Edisi 1 April NasEdisi 1 April Nas
Edisi 1 April Nasepaper
 
Binder24 Nas
Binder24 NasBinder24 Nas
Binder24 Nasepaper
 
04 Mar Aceh
04 Mar Aceh04 Mar Aceh
04 Mar Acehepaper
 
Is the grass greener in ireland? A comparison of UX in Dublin and Melbourne
Is the grass greener in ireland? A comparison of UX in Dublin and MelbourneIs the grass greener in ireland? A comparison of UX in Dublin and Melbourne
Is the grass greener in ireland? A comparison of UX in Dublin and MelbourneCory-Ann Joseph
 
Themm Ryan Resume 2012
Themm Ryan Resume 2012Themm Ryan Resume 2012
Themm Ryan Resume 2012rthemm
 
Binderaceh29
Binderaceh29Binderaceh29
Binderaceh29epaper
 
11 M Ar Aceh
11 M Ar Aceh11 M Ar Aceh
11 M Ar Acehepaper
 
Anemia y embarazo
Anemia y embarazoAnemia y embarazo
Anemia y embarazojenniefer
 

Andere mochten auch (20)

Edisi 23 Maret2010 Nas
Edisi 23 Maret2010 NasEdisi 23 Maret2010 Nas
Edisi 23 Maret2010 Nas
 
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
Innoz Presentation on SMSGYAN at MIT-EmTech 2011,Bangalore.
 
Edisi19mei nas
Edisi19mei nasEdisi19mei nas
Edisi19mei nas
 
7jun aceh
7jun aceh7jun aceh
7jun aceh
 
Edisi 3 April Nas
Edisi 3 April NasEdisi 3 April Nas
Edisi 3 April Nas
 
2010 4.7
2010 4.72010 4.7
2010 4.7
 
Antonio Mogollon investigacion 2
Antonio  Mogollon investigacion 2Antonio  Mogollon investigacion 2
Antonio Mogollon investigacion 2
 
2010 table 5.1 5.4
2010 table 5.1 5.42010 table 5.1 5.4
2010 table 5.1 5.4
 
Learning with the Web: Spotting Named Entities on the intersection of NERD an...
Learning with the Web: Spotting Named Entities on the intersection of NERD an...Learning with the Web: Spotting Named Entities on the intersection of NERD an...
Learning with the Web: Spotting Named Entities on the intersection of NERD an...
 
Journalism Today #2 - slideshare
Journalism Today #2 -  slideshareJournalism Today #2 -  slideshare
Journalism Today #2 - slideshare
 
Change
ChangeChange
Change
 
Being a Pirate - TEDx Delhi
Being a Pirate - TEDx DelhiBeing a Pirate - TEDx Delhi
Being a Pirate - TEDx Delhi
 
Edisi 1 April Nas
Edisi 1 April NasEdisi 1 April Nas
Edisi 1 April Nas
 
Binder24 Nas
Binder24 NasBinder24 Nas
Binder24 Nas
 
04 Mar Aceh
04 Mar Aceh04 Mar Aceh
04 Mar Aceh
 
Is the grass greener in ireland? A comparison of UX in Dublin and Melbourne
Is the grass greener in ireland? A comparison of UX in Dublin and MelbourneIs the grass greener in ireland? A comparison of UX in Dublin and Melbourne
Is the grass greener in ireland? A comparison of UX in Dublin and Melbourne
 
Themm Ryan Resume 2012
Themm Ryan Resume 2012Themm Ryan Resume 2012
Themm Ryan Resume 2012
 
Binderaceh29
Binderaceh29Binderaceh29
Binderaceh29
 
11 M Ar Aceh
11 M Ar Aceh11 M Ar Aceh
11 M Ar Aceh
 
Anemia y embarazo
Anemia y embarazoAnemia y embarazo
Anemia y embarazo
 

Ähnlich wie Learning with the Web. Structuring data to ease machine understanding

New trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsNew trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsMaría Poveda Villalón
 
Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...
Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...
Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...Ghislain ATEMEZING
 
Curriculum data enrichment with ontologies
Curriculum data enrichment with ontologiesCurriculum data enrichment with ontologies
Curriculum data enrichment with ontologiesILOT Project
 
Interactions 34: The Sorbonne Universities (SU) cluster and interdisciplinarity
Interactions 34: The Sorbonne Universities (SU) cluster and interdisciplinarityInteractions 34: The Sorbonne Universities (SU) cluster and interdisciplinarity
Interactions 34: The Sorbonne Universities (SU) cluster and interdisciplinarityUniversité de Technologie de Compiègne
 
Transforming repositories: from repository managers to institutional data man...
Transforming repositories: from repository managers to institutional data man...Transforming repositories: from repository managers to institutional data man...
Transforming repositories: from repository managers to institutional data man...JISC KeepIt project
 
ESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionErik Mannens
 
Presentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayPresentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayEuropeana Newspapers
 
Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...
Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...
Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...Paolo Nesi
 
Mmit Mobile Learning
Mmit Mobile LearningMmit Mobile Learning
Mmit Mobile Learningjontrinder
 
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city ServicesOntology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city ServicesPaolo Nesi
 
Publishing Linked Open Data on the Web & the Role of Ontologies
Publishing Linked Open Data on the Web & the Role of OntologiesPublishing Linked Open Data on the Web & the Role of Ontologies
Publishing Linked Open Data on the Web & the Role of OntologiesMaría Poveda Villalón
 
SemanticXO: connecting the XO with the World’s largest information network
SemanticXO: connecting the XO with the World’s largest information networkSemanticXO: connecting the XO with the World’s largest information network
SemanticXO: connecting the XO with the World’s largest information networkChristophe Guéret
 
OpenAIRE at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...
OpenAIRE  at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...OpenAIRE  at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...
OpenAIRE at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...OpenAIRE
 
Digital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the PondDigital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the PondBenoit Pauwels
 
Digital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the PondDigital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the PondULB - Bibliothèques
 
GeoForAll: a successful OSGeo Initiative
GeoForAll: a successful OSGeo InitiativeGeoForAll: a successful OSGeo Initiative
GeoForAll: a successful OSGeo InitiativeMaria Antonia Brovelli
 
Online Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesOnline Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesFabio Benedetti
 
The Learning Registry: Social networking for open educational resources?
The Learning Registry: Social networking for open educational resources?The Learning Registry: Social networking for open educational resources?
The Learning Registry: Social networking for open educational resources?Lorna Campbell
 

Ähnlich wie Learning with the Web. Structuring data to ease machine understanding (20)

New trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and toolsNew trends in ontological engineering, practices and tools
New trends in ontological engineering, practices and tools
 
Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...
Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...
Semantic Web Methodologies, Best Practices and Ontology Engineering Applied t...
 
Curriculum data enrichment with ontologies
Curriculum data enrichment with ontologiesCurriculum data enrichment with ontologies
Curriculum data enrichment with ontologies
 
Interactions 34: The Sorbonne Universities (SU) cluster and interdisciplinarity
Interactions 34: The Sorbonne Universities (SU) cluster and interdisciplinarityInteractions 34: The Sorbonne Universities (SU) cluster and interdisciplinarity
Interactions 34: The Sorbonne Universities (SU) cluster and interdisciplinarity
 
Transforming repositories: from repository managers to institutional data man...
Transforming repositories: from repository managers to institutional data man...Transforming repositories: from repository managers to institutional data man...
Transforming repositories: from repository managers to institutional data man...
 
ESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking SessionESWC 2015 - EU Networking Session
ESWC 2015 - EU Networking Session
 
Presentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayPresentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information Day
 
Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...
Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...
Open Data Day 2016, Km4City, L’universita’ come aggregatore di Open Data del ...
 
Mmit Mobile Learning
Mmit Mobile LearningMmit Mobile Learning
Mmit Mobile Learning
 
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city ServicesOntology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
 
[DataPlat2023] Opening
[DataPlat2023] Opening[DataPlat2023] Opening
[DataPlat2023] Opening
 
Publishing Linked Open Data on the Web & the Role of Ontologies
Publishing Linked Open Data on the Web & the Role of OntologiesPublishing Linked Open Data on the Web & the Role of Ontologies
Publishing Linked Open Data on the Web & the Role of Ontologies
 
Lesson1 esa summer_school_brovelli
Lesson1 esa summer_school_brovelliLesson1 esa summer_school_brovelli
Lesson1 esa summer_school_brovelli
 
SemanticXO: connecting the XO with the World’s largest information network
SemanticXO: connecting the XO with the World’s largest information networkSemanticXO: connecting the XO with the World’s largest information network
SemanticXO: connecting the XO with the World’s largest information network
 
OpenAIRE at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...
OpenAIRE  at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...OpenAIRE  at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...
OpenAIRE at the 8th e-Infrastructure Concetration Meeting Nov 5, 2010 CERN -...
 
Digital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the PondDigital Preservation Best Practices: Lessons Learned From Across the Pond
Digital Preservation Best Practices: Lessons Learned From Across the Pond
 
Digital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the PondDigital Presentation Best Practices: Lessons Learned From Across the Pond
Digital Presentation Best Practices: Lessons Learned From Across the Pond
 
GeoForAll: a successful OSGeo Initiative
GeoForAll: a successful OSGeo InitiativeGeoForAll: a successful OSGeo Initiative
GeoForAll: a successful OSGeo Initiative
 
Online Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data SourcesOnline Index Extraction from Linked Open Data Sources
Online Index Extraction from Linked Open Data Sources
 
The Learning Registry: Social networking for open educational resources?
The Learning Registry: Social networking for open educational resources?The Learning Registry: Social networking for open educational resources?
The Learning Registry: Social networking for open educational resources?
 

Mehr von Giuseppe Rizzo

Artificial intelligence for social good
Artificial intelligence for social goodArtificial intelligence for social good
Artificial intelligence for social goodGiuseppe Rizzo
 
COMPRENDE, PERSONALIZZA, INTERAGISCE E IMPARA: L’AI COGNITIVA PER L’HR
COMPRENDE, PERSONALIZZA, INTERAGISCE E  IMPARA: L’AI COGNITIVA PER L’HRCOMPRENDE, PERSONALIZZA, INTERAGISCE E  IMPARA: L’AI COGNITIVA PER L’HR
COMPRENDE, PERSONALIZZA, INTERAGISCE E IMPARA: L’AI COGNITIVA PER L’HRGiuseppe Rizzo
 
Understand, Answer and Argument: Conversational Agents
Understand, Answer and Argument: Conversational AgentsUnderstand, Answer and Argument: Conversational Agents
Understand, Answer and Argument: Conversational AgentsGiuseppe Rizzo
 
AI For Profiling Your Customers
AI For Profiling Your CustomersAI For Profiling Your Customers
AI For Profiling Your CustomersGiuseppe Rizzo
 
AI for Personalized Chatbot
AI for Personalized ChatbotAI for Personalized Chatbot
AI for Personalized ChatbotGiuseppe Rizzo
 
Tourist Knowledge Graph Creation to Automating Travel Bookings
Tourist Knowledge Graph Creation to Automating Travel BookingsTourist Knowledge Graph Creation to Automating Travel Bookings
Tourist Knowledge Graph Creation to Automating Travel BookingsGiuseppe Rizzo
 
The SentiME System at the SSA Challenge Task 1
The SentiME System at the SSA Challenge Task 1The SentiME System at the SSA Challenge Task 1
The SentiME System at the SSA Challenge Task 1Giuseppe Rizzo
 
Context-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity LinkingContext-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity LinkingGiuseppe Rizzo
 
From Data to Knowledge for Tourists
From Data to Knowledge for TouristsFrom Data to Knowledge for Tourists
From Data to Knowledge for TouristsGiuseppe Rizzo
 
Enabling Visitors to Explore a Smart City
Enabling Visitors to Explore a Smart CityEnabling Visitors to Explore a Smart City
Enabling Visitors to Explore a Smart CityGiuseppe Rizzo
 
NEEL2015 challenge summary
NEEL2015 challenge summaryNEEL2015 challenge summary
NEEL2015 challenge summaryGiuseppe Rizzo
 
Inductive Entity Typing Alignment
Inductive Entity Typing AlignmentInductive Entity Typing Alignment
Inductive Entity Typing AlignmentGiuseppe Rizzo
 
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...Giuseppe Rizzo
 
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot FrameworksCrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot FrameworksGiuseppe Rizzo
 
NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud
NERD meets NIF:  Lifting NLP Extraction Results to the Linked Data CloudNERD meets NIF:  Lifting NLP Extraction Results to the Linked Data Cloud
NERD meets NIF: Lifting NLP Extraction Results to the Linked Data CloudGiuseppe Rizzo
 
L'enorme archivio di dati: il Web
L'enorme archivio di dati: il WebL'enorme archivio di dati: il Web
L'enorme archivio di dati: il WebGiuseppe Rizzo
 
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of DataNERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of DataGiuseppe Rizzo
 
Zenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of DataZenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of DataGiuseppe Rizzo
 

Mehr von Giuseppe Rizzo (20)

Artificial intelligence for social good
Artificial intelligence for social goodArtificial intelligence for social good
Artificial intelligence for social good
 
AI in 60 minutes
AI in 60 minutesAI in 60 minutes
AI in 60 minutes
 
COMPRENDE, PERSONALIZZA, INTERAGISCE E IMPARA: L’AI COGNITIVA PER L’HR
COMPRENDE, PERSONALIZZA, INTERAGISCE E  IMPARA: L’AI COGNITIVA PER L’HRCOMPRENDE, PERSONALIZZA, INTERAGISCE E  IMPARA: L’AI COGNITIVA PER L’HR
COMPRENDE, PERSONALIZZA, INTERAGISCE E IMPARA: L’AI COGNITIVA PER L’HR
 
Understand, Answer and Argument: Conversational Agents
Understand, Answer and Argument: Conversational AgentsUnderstand, Answer and Argument: Conversational Agents
Understand, Answer and Argument: Conversational Agents
 
AI For Profiling Your Customers
AI For Profiling Your CustomersAI For Profiling Your Customers
AI For Profiling Your Customers
 
AI for Personalized Chatbot
AI for Personalized ChatbotAI for Personalized Chatbot
AI for Personalized Chatbot
 
Tourist Knowledge Graph Creation to Automating Travel Bookings
Tourist Knowledge Graph Creation to Automating Travel BookingsTourist Knowledge Graph Creation to Automating Travel Bookings
Tourist Knowledge Graph Creation to Automating Travel Bookings
 
The SentiME System at the SSA Challenge Task 1
The SentiME System at the SSA Challenge Task 1The SentiME System at the SSA Challenge Task 1
The SentiME System at the SSA Challenge Task 1
 
Context-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity LinkingContext-Enhanced Adaptive Entity Linking
Context-Enhanced Adaptive Entity Linking
 
From Data to Knowledge for Tourists
From Data to Knowledge for TouristsFrom Data to Knowledge for Tourists
From Data to Knowledge for Tourists
 
Enabling Visitors to Explore a Smart City
Enabling Visitors to Explore a Smart CityEnabling Visitors to Explore a Smart City
Enabling Visitors to Explore a Smart City
 
NEEL2015 challenge summary
NEEL2015 challenge summaryNEEL2015 challenge summary
NEEL2015 challenge summary
 
Inductive Entity Typing Alignment
Inductive Entity Typing AlignmentInductive Entity Typing Alignment
Inductive Entity Typing Alignment
 
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
Benchmarking the Extraction and Disambiguation of Named Entities on the Seman...
 
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot FrameworksCrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
CrossLanguageSpotter: A Library for Detecting Relations in Polyglot Frameworks
 
NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud
NERD meets NIF:  Lifting NLP Extraction Results to the Linked Data CloudNERD meets NIF:  Lifting NLP Extraction Results to the Linked Data Cloud
NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud
 
The NERD project
The NERD projectThe NERD project
The NERD project
 
L'enorme archivio di dati: il Web
L'enorme archivio di dati: il WebL'enorme archivio di dati: il Web
L'enorme archivio di dati: il Web
 
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of DataNERD: Evaluating Named Entity Recognition Tools in the Web of Data
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
 
Zenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of DataZenaminer: driving the SCORM tandard towards the Web of Data
Zenaminer: driving the SCORM tandard towards the Web of Data
 

Learning with the Web. Structuring data to ease machine understanding

  • 1. Learning with the Web: Structuring data to ease machine understanding http://twitter.com/giusepperizzo
  • 2. July 11th, 2013 Università di Torino, Italy 2/44 Google Knowledge Graph Viewer
  • 3. Google Knowledge Graph
  • 4. The Google Knowledge Graph bulk: encyclopedic sources
  • 5. The Web community has highlighted the road, but ...
  • 6. Vast wealth of unstructured data: “80% of data on the Web and on internal corporate intranets is unstructured” (“Semantic Web and Information Extraction Workshop”, SWAIE at RANLP2013)
  • 7. The entire digital universe, going to be part of the Web: “unstructured data will account for 90 percent of all data created in the next decade” (IDC IVIEW, “Extracting Value from Chaos”, June 2011)
  • 8. Structured means making those resources available to be easily processed by machines
  • 9. A Web of Linked Entities http://wole2013.eurecom.fr http://wole2012.eurecom.fr ➢ GGG (global giant graph) http://goo.gl/fH3h ➢ Nodes are Web entities ➢ Entities provide disambiguation pointers ➢ Entities can be univocally referred to (disambiguated) ➢ Entities as centroids for topic generation and understanding
  • 10. Chapter 1: Named Entity Recognition (NER) and Named Entity Linking (NEL)
  • 11. “I want to book a room in an hotel located in the heart of Paris, just a stone’s throw from the Eiffel Tower” (Eric Charton, “Named Entity Detection and Entity Linking in the Context of Semantic Web: Exploring the ambiguity question”)
  • 12. Part of Speech. Tokens: I want to book a room in .. Paris. POS: PRP VBP TO VB DT NN IN .. NNP. NER: What is Paris? NEL: Which Paris are we talking about?
  • 13. What is Paris? Type ambiguity: asteroid, location/city, film
  • 14. Entity recognition. Tokens: I want to book a room in .. Paris. POS: PRP VBP TO VB DT NN IN .. NNP. NER: O O O O O O O .. LOC
  • 15. NER: State of the art ➢ CRFs (Conditional Random Fields) ➢ FSM (Finite-State Machine) ➢ HMM (Hidden Markov Model) ➢ Gazetteers ➢ Wikipedia/DBpedia ➢ In-house dictionaries
  • 16. Which Paris? Name ambiguity: Paris, Kentucky; Paris, Maine; Paris, Tennessee; Paris, France; Paris, Ontario; Paris, Idaho
  • 17. Entity linking. Tokens: I want to book a room in .. Paris. POS: PRP VBP TO VB DT NN IN .. NNP. NER: O O O O O O O .. LOC. NEL: O O O O O O O .. http://en.wikipedia.org/wiki/Paris
  • 18. Ambiguity resolution: linking to an external knowledge base ➢ Wikipedia/DBpedia ➢ Gigaword Corpus ➢ In-house dataset ➢ LOD dataset ➢ DBLP ➢ ACM ➢ BBC ➢ ...
  • 19. NEL: State of the art ➢ Clustering ➢ Vector Space Model (Cosine similarity or Maximum Entropy) – it requires a priori knowledge of the spotted entities ➢ Conditional probability – it requires a priori knowledge of the spotted entities ➢ Dictionaries ➢ Wikipedia/DBpedia ➢ In-house dataset
  • 20. Processing natural language texts ➢ Several attempts from the Web community to structure the large wealth of data available ➢ Numerous off-the-shelf systems (commercial and academic) that perform the NER+NEL chain ➢ AlchemyAPI ➢ DBpedia Spotlight ➢ Wikimeta ➢ TextRazor ➢ Stanford CRF ➢ ...
  • 21. The NERD initiative http://nerd.eurecom.fr
  • 22. Combination of off-the-shelf systems and properly trained CRFs
  • 23. The strength of this approach lies in the fact that the supported off-the-shelf systems have access to large knowledge bases of entities such as DBpedia and Freebase, while CRFs are domain specific
  • 24. Diversity. Tool (classification schema; number of classes): Alchemy API (Alchemy; 324), DBpedia Spotlight (DBpedia, Freebase, Schema.org; 320), Extractiv (Extractiv; 34), Lupedia (DBpedia, LinkedMDB; 319), Open Calais (Open Calais; 95), Saplo (Saplo; 5), Semi Tags (CoNLL-3; 4), Wikimeta (ESTER; 7), Yahoo! (Yahoo; 13), Zemanta (Freebase; 81)
  • 25. NERD Ontology. NERD type (occurrence): Person (10), Organization (10), Country (6), Company (6), Location (6), Continent (5), City (5), RadioStation (5), Album (5), Product (5), ... The NERD ontology has been integrated into the NIF project, an EU FP7 project in the context of LOD2: Creating Knowledge out of Interlinked Data
  • 26. Learning with the Web ➢ FSM-core based ➢ combination of the NERD supported off-the-shelf systems ➢ ML-core based ➢ combination of the NERD supported off-the-shelf systems – and a CRF, properly trained with the given corpus
  • 27. Challenges and benchmark
  • 28. ETAPE 2012 - Entity Extraction Challenge ➢ French transcripts of radio and video programs ➢ Challenge objective: entity typing ➢ Submitted system: ➢ FSM-core based ➢ Annotation priority given to the systems that have fine-grained classification schemes ➢ Ranked 7th/7
  • 29. #MSM'13 - Concept Extraction Challenge ➢ English Twitter microposts ➢ Challenge objective: entity typing ➢ Submitted system: ➢ ML-core based: SVM ➢ Features = linguistic features (including capitalization, 3 chars of prefix and suffix, POS), output of a CRF properly trained with the challenge training dataset, outputs of the off-the-shelf systems ➢ Ranked 2nd/22
  • 30. CoNLL-2003 ➢ English newswire corpus ➢ Benchmark objective: entity typing ➢ System: ➢ ML-core based: SVM and NB ➢ Features = linguistic features (including capitalization, 3 chars of prefix, 3 chars of suffix, POS), output of a CRF properly trained with the challenge training dataset, outputs of the off-the-shelf systems ➢ Results: significantly outperformed all the off-the-shelf systems used as inputs, as well as the Stanford CRF properly trained with the CoNLL-2003 training corpus
  • 31. TAC KBP 2011 ➢ English newswire corpus ➢ Benchmark objective: entity linking ➢ System: ➢ FSM-core based ➢ Features: outputs of the off-the-shelf systems, harmonized with the Gigaword corpus (ongoing)
  • 32. NERD in action http://nerd.eurecom.fr/annotation/247957
  • 33. Chapter 2: Annotating streams of heterogeneous data coming from social platforms for topic generation
  • 34. The Social Web is growing fast and is becoming crucially important for research and companies
  • 35. Social Web = Big Data. Gartner “3V” definition: Volume, Velocity, Variety of microposts
  • 36. Microposts ➢ Short (~140 characters) and informal text ➢ Grammar-free text ➢ Slang ➢ Media items ➢ Picture ➢ Video
  • 37. Can we make sense of the massive and rapidly changing amount of information shared on the Social Web?
  • 38. Live topic generation http://youtu.be/8iRiwz7cDYY
  • 39. http://mediafinder.eurecom.fr
  • 40. Tracking and analyzing an event ➢ 1 week period ➢ We collected microposts with attached pictures ➢ We followed the 2013 Italian Election ➢ We compared the results with the articles published in those days in major newspapers http://youtu.be/jIMdnwMoWnk
  • 41. http://mediafinder.eurecom.fr/story/elezioni2013
  • 42. Outlook: an entity graph from the open and Social Web
  • 43. Thanks for your time and attention http://www.slideshare.net/giusepperizzo
  • 44. Do you have any questions?
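
Slides 29 and 30 list the linguistic features fed to the ML-core classifier: capitalization, 3 characters of prefix, 3 characters of suffix, and POS. The sketch below shows one possible way to compute those per-token features for the example sentence from slide 12. It is only an illustration under assumptions: the function name `token_features` and the dictionary keys are hypothetical, the POS tags are copied from the slide rather than produced by a tagger, and the CRF output and the off-the-shelf system outputs that the slides also use as features are not reproduced here.

```python
from typing import Dict, List, Tuple


def token_features(token: str, pos: str) -> Dict[str, object]:
    """Per-token linguistic features named on slides 29 and 30 (hypothetical layout)."""
    return {
        "capitalized": token[:1].isupper(),  # does the token start with an uppercase letter?
        "prefix3": token[:3],                # first 3 characters (whole token if shorter)
        "suffix3": token[-3:],               # last 3 characters (whole token if shorter)
        "pos": pos,                          # part-of-speech tag, taken as given
    }


# Tokens and POS tags as shown on slide 12; the elided middle of the sentence is left out.
tagged: List[Tuple[str, str]] = [
    ("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("book", "VB"),
    ("a", "DT"), ("room", "NN"), ("in", "IN"), ("Paris", "NNP"),
]

features = [token_features(tok, pos) for tok, pos in tagged]

for (tok, _), feats in zip(tagged, features):
    print(tok, feats)
```

In a full pipeline, each feature dictionary would be extended with the labels returned by the CRF and by every off-the-shelf extractor before being passed to the SVM or NB classifier; how the slides' system encodes those extra features is not stated, so that step is left out.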