SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Downloaden Sie, um offline zu lesen
G. Futia F. Cairo F. Morando L. Leschiutta
Exploiting Linked Open Data
and Natural Language Processing for
Classification of Political Speech
Krems, 22nd
May 2014
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 2
Introduction
●
Our goal:
● assist anyone interested in automatic categorization of political
speeches, to identify unambiguously the main political trends
addressed by the White House
●
What we have to achieve our goal:
● TellMeFirst (http://tellmefirst.polito.it/), a topic extraction tool:
– it leverages DBpedia knowledge base and English Wikipedia
linguistic corpus
– it exploits Linked Open Data (LOD) and Natural Language
Processing (NLP) techniques
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 3
DBpedia
● A crowd-sourced community effort to extract
structured information from Wikipedia and a
central interlinking hub for the Linking Open Data
project.
● It is a suitable knowledge base for text classification
(Mendes et al., 2012; Hellmann et al., 2013; Steinmetz
et al., 2013)
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 4
Why DBpedia for US
political speeches?
Comparison between the
coverage of US politics and the
coverage of politics of other
countries
The coverage of politics in Wikipedia is “often very good for recent or
prominent topics but is lacking on older or more obscure topics”
(Brown, 2011).
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 5
Text Categorization Approach
● An instance-based approch:TellMeFirst assigns target
documents to classes based on a local comparison between
a set of pre-classified documents and the target
document itself
● This training set consists of all the Wikipedia paragraphs
where a wikilink occurs.These paragraphs are stored in a
Lucene index, where each document represents a DBpedia
resource
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 6
Success rate (%) of the TellMeFirst classification
process on US Presidents profiles
1st topic Within the
first 2 topics
Within the
first 7 topics
Full text of the Presidents profiles 95.4% 100% 100%
President profiles without name
and surname
45.4% 61.3% 90.9%
TellMeFirst provides as output the seven most relevant topics
(in the form of DBpedia URI) of the document sorted by relevance
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 7
whitehouse.gov
●
3173 videos in English were available on the White House
website on the 24th of November 2013
● These videos are categorized according to a taxonomy not
related to the subject of the speeches
● They need a semantic layer that point out the content of the
speeches, so that questions such as “what is the First Lady
talking about?” could be automatically answered
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 8
Not just a bag-of-words tool
Results obtained with TellMeFirst (on the left) and withTagCrowd (on the right)
«President Obama Speaks on the Affordable Care Act»
http://1.usa.gov/1jR4Ky2
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 9
Results (i)
Occ. % overall % 2013 % 2012 % 2011 % 2010 % 2009
Barack Obama 607 4.88% 5.68% 4.52% 5.51% 4.45% 3.88%
Patient Protection and
Affordable Care Act
286 2.30% 3.06% 1.35% 1.91% 2.47% 2.71%
American Recovery and
Reinvestment Act of 2009
278 2.23% 1.09% 1.82% 2.88% 2.84% 1.88%
Social Security 272 2.19% 2.58% 1.77% 3.54% 1.61% 0.78%
Amount and percentage of topic
occurrences extracted with TellMeFirst
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 10
Results (ii)
● “New Deal” (141 occurrences), probably used as a metaphor
within the political speeches of President Obama
● “Libya” has a value corresponding to 1.00% in 2011.This result can
be related to the full-scale revolt beginning on 17 February 2011 in
Libya
● “Deepwater Horizon oil spill” reaches the 1.05% in 2010.This
result is related to the marine oil spill which took place in the Gulf
of Mexico that began on 20 april 2010
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 11
Correlation among topics
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 12
A focus on the First Lady (i)
● According to Michelle Obama’s page on the White House
website, the First Lady “looks forward to continuing her work
on the issues close to her heart”:
● supporting military families
● helping working women balance career and family
encouraging national service
● promoting the arts and arts education
● fostering healthy eating and healthy living for children and
families across the country
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 13
A focus on the First Lady (ii)
● We tested whether TellMeFirst confirms or not these
impressions and claims, manually selecting nine Wikipedia
categories which seemed to be related to these issues
● We then interrogated the SPARQL end-point of DBpedia with
a query to collect all the topics of these categories
●
We then associated each topic to one or more of the nine
high-level categories: these categories encompassed
almost 75% of the topics
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 14
A focus on the First Lady (iii)
Wikipedia Category First Lady sp.
9 categories
All speeches
9 categories
Government of the United States 26.68% 32.68%
Education 21.64% 5.40%
Nutrition 19.96% 1.61%
Social issues 14.71% 28.38%
Barack Obama 13.66% 14.00%
Health care 11.34% 7.57%
Arts 8.61% 1.11%
Military personnel 3.99% 3.16%
Gender equality 2.73% 0.84%
Others (unclassified topics) 25.63% 38.34%
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 15
Conclusions (i)
● The ability for citizens to easily retrieve the content of political
speeches and decisions is a crucial factor in e-participation
● Not guaranteed by a traditional keywords search, as in
most of the public administration websites (the White
House website included)
● Example: in a keyword-based system, by typing the word
"education", for instance, users get as result only videos that
have the word education in their title
● All terms that belong to the semantic area of education
are omitted
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 16
Conclusions (ii)
● When documents are semantically classified through
DBpedia URIs all synonyms, hypernyms and hyponyms of
lemmas are traced to the same concept making
user search more effective
● Leveraging Wikipedia categories would allow to go
even a step further, taking advantage of the links
between concepts as designed by the Wikipedia
community
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 17
Next steps
● Building a content search/navigation layer around the
scraping/classification module
● Integration with other Linked Open Data repositories on the
Web, combining the extracted topics with other information
(President Obama's federal budget proposal?)
Thank you!
Giuseppe Futia (giuseppe.futia@polito.it)
This paper was drafted in the context of the Network of Excellence in Internet Science EINS (GA n°288021), and, in
particular, in relation with the activities concerning Evidence and Experimentation (JRA3).
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 19
Appendix - The algorithm
●
The classifier needs to hold in memory all the instances of the
training set and calculate, during classification stage, the vector
distance between training documents and target documents.
● Specifically, the algorithm used by TMF is k-Nearest Neighbor
(kNN), a type of memory-based approach which selects the
categories for a target document on the basis of the k most
similar documents within the vector space.
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 20
Appendix - Scoring formula
●
In a Lucene query, both the target document and the training
set become weighed terms vectors, where terms are weighted
by means of the TF-IDF algorithm.The query returns a list of
documents in the form of DBpedia URIs, ordered by similarity
score. Scoring formula is:
22nd
May 2014 Giuseppe Futia – Politecnico di Torino 21
Appendix - Basic concepts
● Natural Language Processing - A field of computer science,
concerned with the interactions between computers and human
(natural) languages.
● Linked Data - A recommended best practice for exposing, sharing,
and connecting pieces of data, information, and knowledge on the
Semantic Web using URIs and RDF.
● DBpedia - A crowd-sourced community effort to extract
structured information from Wikipedia and a central interlinking hub
for the Linking Open Data project.

Weitere ähnliche Inhalte

Andere mochten auch (12)

2009 BIOL503 Class 3 Torts Zyprexa Lillysignedsettlementagreement
2009 BIOL503 Class 3 Torts Zyprexa Lillysignedsettlementagreement2009 BIOL503 Class 3 Torts Zyprexa Lillysignedsettlementagreement
2009 BIOL503 Class 3 Torts Zyprexa Lillysignedsettlementagreement
 
imtiyaz cv'16
imtiyaz cv'16imtiyaz cv'16
imtiyaz cv'16
 
HND Rayner Meyer 1
HND Rayner Meyer 1HND Rayner Meyer 1
HND Rayner Meyer 1
 
Piri Point
Piri PointPiri Point
Piri Point
 
2015年JSET全国大会 SIG-05 SIGセッションスライド
2015年JSET全国大会 SIG-05 SIGセッションスライド2015年JSET全国大会 SIG-05 SIGセッションスライド
2015年JSET全国大会 SIG-05 SIGセッションスライド
 
PoonamMalhotra_CV
PoonamMalhotra_CVPoonamMalhotra_CV
PoonamMalhotra_CV
 
fannie mae 2005 Form 10-K
fannie mae 2005 Form 10-Kfannie mae 2005 Form 10-K
fannie mae 2005 Form 10-K
 
intel First Quarter 2008
intel First Quarter 2008 intel First Quarter 2008
intel First Quarter 2008
 
Montaggio Doccia Chiocciola
Montaggio Doccia ChiocciolaMontaggio Doccia Chiocciola
Montaggio Doccia Chiocciola
 
vijesh resume
vijesh resumevijesh resume
vijesh resume
 
sprint nextel Quarterly Results 2007 3rd
sprint nextel Quarterly Results 2007 3rdsprint nextel Quarterly Results 2007 3rd
sprint nextel Quarterly Results 2007 3rd
 
Lineárny park Petržalka
Lineárny park PetržalkaLineárny park Petržalka
Lineárny park Petržalka
 

Ähnlich wie Cedem futia-2014

Science & Society -- From Dissemination to Deliberation
Science & Society -- From Dissemination to DeliberationScience & Society -- From Dissemination to Deliberation
Science & Society -- From Dissemination to Deliberation
Prof. Alexander Gerber
 
Cooperation needs on Field Operational Tests
Cooperation needs on Field Operational TestsCooperation needs on Field Operational Tests
Cooperation needs on Field Operational Tests
euroFOT
 
UNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritus
UNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritusUNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritus
UNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritus
Ed Dodds
 
IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...
IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...
IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...
IDSP - IE Dissertation Support Project
 
Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latest
Jaganadh Gopinadhan
 

Ähnlich wie Cedem futia-2014 (20)

A multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methodsA multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methods
 
Civic Monitoring - the example of the Italian open finance platforms OpenCoes...
Civic Monitoring - the example of the Italian open finance platforms OpenCoes...Civic Monitoring - the example of the Italian open finance platforms OpenCoes...
Civic Monitoring - the example of the Italian open finance platforms OpenCoes...
 
MOOC Strategies in Higher Education
MOOC Strategies in Higher EducationMOOC Strategies in Higher Education
MOOC Strategies in Higher Education
 
Scanning for emerging s&t issues
Scanning for emerging s&t issuesScanning for emerging s&t issues
Scanning for emerging s&t issues
 
OpenCoesione Monithon at CTG Albany
OpenCoesione Monithon at CTG AlbanyOpenCoesione Monithon at CTG Albany
OpenCoesione Monithon at CTG Albany
 
Science & Society -- From Dissemination to Deliberation
Science & Society -- From Dissemination to DeliberationScience & Society -- From Dissemination to Deliberation
Science & Society -- From Dissemination to Deliberation
 
G4 report from breakout group 4
G4 report from breakout group 4G4 report from breakout group 4
G4 report from breakout group 4
 
SocialUniversity:How Do Universities Use Social Media? An Empirical Survey of...
SocialUniversity:How Do Universities Use Social Media? An Empirical Survey of...SocialUniversity:How Do Universities Use Social Media? An Empirical Survey of...
SocialUniversity:How Do Universities Use Social Media? An Empirical Survey of...
 
Open Data in and from schools
Open Data in and from schoolsOpen Data in and from schools
Open Data in and from schools
 
The Role ok TAG and extend TAG in SDG 4
The Role ok TAG and extend TAG in SDG 4The Role ok TAG and extend TAG in SDG 4
The Role ok TAG and extend TAG in SDG 4
 
Open learning- Text analysis basics
Open learning- Text analysis basicsOpen learning- Text analysis basics
Open learning- Text analysis basics
 
Cooperation needs on Field Operational Tests
Cooperation needs on Field Operational TestsCooperation needs on Field Operational Tests
Cooperation needs on Field Operational Tests
 
The Role of Technical Advisory Group
The Role of Technical Advisory GroupThe Role of Technical Advisory Group
The Role of Technical Advisory Group
 
Research Policy Monitoring in the Era of Open Science & Big Data Workshop Report
Research Policy Monitoring in the Era of Open Science & Big Data Workshop ReportResearch Policy Monitoring in the Era of Open Science & Big Data Workshop Report
Research Policy Monitoring in the Era of Open Science & Big Data Workshop Report
 
Virtual meeting on the COVID-19 response in the area of Communication and Hum...
Virtual meeting on the COVID-19 response in the area of Communication and Hum...Virtual meeting on the COVID-19 response in the area of Communication and Hum...
Virtual meeting on the COVID-19 response in the area of Communication and Hum...
 
UNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritus
UNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritusUNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritus
UNESCO , ICT and the Millennium Institute - Tapio Varis, professor emeritus
 
Michela Insenga: 1.3) INSTEM – Innovation Network in STEM
Michela Insenga: 1.3)	INSTEM – Innovation Network in STEM Michela Insenga: 1.3)	INSTEM – Innovation Network in STEM
Michela Insenga: 1.3) INSTEM – Innovation Network in STEM
 
IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...
IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...
IDSP19C#F - B - Mingjun Lan - Updated - What ideologies and realities can be ...
 
Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latest
 
NEGOTIATING THE ANALOG MAINSTREAM WITH DIGITAL METHODS IN HAND VISIONS FROM T...
NEGOTIATING THE ANALOG MAINSTREAM WITH DIGITAL METHODS IN HAND VISIONS FROM T...NEGOTIATING THE ANALOG MAINSTREAM WITH DIGITAL METHODS IN HAND VISIONS FROM T...
NEGOTIATING THE ANALOG MAINSTREAM WITH DIGITAL METHODS IN HAND VISIONS FROM T...
 

Mehr von Danube University Krems, Centre for E-Governance

Mehr von Danube University Krems, Centre for E-Governance (20)

Smart Cities workshop at CeDEM17
Smart Cities workshop at CeDEM17Smart Cities workshop at CeDEM17
Smart Cities workshop at CeDEM17
 
#CeDEM17 - Towards an Open Data based ICT Reference Architecture for Smart Ci...
#CeDEM17 - Towards an Open Data based ICT Reference Architecture for Smart Ci...#CeDEM17 - Towards an Open Data based ICT Reference Architecture for Smart Ci...
#CeDEM17 - Towards an Open Data based ICT Reference Architecture for Smart Ci...
 
#CeDEM17 - Financial Payments and Smart Cities
#CeDEM17 - Financial Payments and Smart Cities #CeDEM17 - Financial Payments and Smart Cities
#CeDEM17 - Financial Payments and Smart Cities
 
#CeDEM2017 Smart Cities of Self-Determined Data Subjects
#CeDEM2017 Smart Cities of Self-Determined Data Subjects#CeDEM2017 Smart Cities of Self-Determined Data Subjects
#CeDEM2017 Smart Cities of Self-Determined Data Subjects
 
Open Data as Enabler of Public Service Co-creation: Exploring the Drivers and...
Open Data as Enabler of Public Service Co-creation:Exploring the Drivers and...Open Data as Enabler of Public Service Co-creation:Exploring the Drivers and...
Open Data as Enabler of Public Service Co-creation: Exploring the Drivers and...
 
DatalEt-Ecosystem Provider - The DEEP project
DatalEt-Ecosystem Provider - The DEEP projectDatalEt-Ecosystem Provider - The DEEP project
DatalEt-Ecosystem Provider - The DEEP project
 
Towards Open Justice: ICT acceptance in the Greek justice system
Towards Open Justice: ICT acceptance in the Greek justice systemTowards Open Justice: ICT acceptance in the Greek justice system
Towards Open Justice: ICT acceptance in the Greek justice system
 
[X]CHANGING PERSPECTIVES
[X]CHANGING PERSPECTIVES[X]CHANGING PERSPECTIVES
[X]CHANGING PERSPECTIVES
 
Using fuzzy cognitive maps as decision support tool for smart cities goraczek
Using fuzzy cognitive maps as decision support tool for smart cities  goraczekUsing fuzzy cognitive maps as decision support tool for smart cities  goraczek
Using fuzzy cognitive maps as decision support tool for smart cities goraczek
 
Understanding of smartphone divide dal yong
Understanding of smartphone divide  dal yongUnderstanding of smartphone divide  dal yong
Understanding of smartphone divide dal yong
 
The motivations behind open access publishing judith schossboeck
The motivations behind open access publishing  judith schossboeckThe motivations behind open access publishing  judith schossboeck
The motivations behind open access publishing judith schossboeck
 
Social media as hobed of racism and hate speech kobayashi, kaigo, kwak
Social media as hobed of racism and hate speech kobayashi, kaigo, kwakSocial media as hobed of racism and hate speech kobayashi, kaigo, kwak
Social media as hobed of racism and hate speech kobayashi, kaigo, kwak
 
Social media and citizen engagement in asia skoric
Social media and citizen engagement in asia  skoricSocial media and citizen engagement in asia  skoric
Social media and citizen engagement in asia skoric
 
Realizin modeling and evaluation city's enerfy efficiency leonidas anthopoulos
Realizin modeling and evaluation city's enerfy efficiency leonidas anthopoulosRealizin modeling and evaluation city's enerfy efficiency leonidas anthopoulos
Realizin modeling and evaluation city's enerfy efficiency leonidas anthopoulos
 
Post 2015 paris c limate conference politics on the internet manuela hartwig
Post 2015 paris c limate conference politics on the internet  manuela hartwigPost 2015 paris c limate conference politics on the internet  manuela hartwig
Post 2015 paris c limate conference politics on the internet manuela hartwig
 
Open government and national sovereignty ivo babaja
Open government and national sovereignty  ivo babajaOpen government and national sovereignty  ivo babaja
Open government and national sovereignty ivo babaja
 
Health r isk communication in the digital era myojung chung
Health r isk communication in the digital era myojung chungHealth r isk communication in the digital era myojung chung
Health r isk communication in the digital era myojung chung
 
An analysis of japanese local government facebook profiles muneo kaigo
An analysis of japanese local government facebook profiles muneo kaigoAn analysis of japanese local government facebook profiles muneo kaigo
An analysis of japanese local government facebook profiles muneo kaigo
 
GovCamp 2016 - Co-Creation
GovCamp 2016 - Co-CreationGovCamp 2016 - Co-Creation
GovCamp 2016 - Co-Creation
 
Datenschutzbeauftragte werden in Zukunft eine wichtige Rolle im Unternehmen s...
Datenschutzbeauftragte werden in Zukunft eine wichtige Rolle im Unternehmen s...Datenschutzbeauftragte werden in Zukunft eine wichtige Rolle im Unternehmen s...
Datenschutzbeauftragte werden in Zukunft eine wichtige Rolle im Unternehmen s...
 

Kürzlich hochgeladen

{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
hyt3577
 

Kürzlich hochgeladen (20)

WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
 
China's soft power in 21st century .pptx
China's soft power in 21st century   .pptxChina's soft power in 21st century   .pptx
China's soft power in 21st century .pptx
 
Nara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's Development
Nara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's DevelopmentNara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's Development
Nara Chandrababu Naidu's Visionary Policies For Andhra Pradesh's Development
 
Transformative Leadership: N Chandrababu Naidu and TDP's Vision for Innovatio...
Transformative Leadership: N Chandrababu Naidu and TDP's Vision for Innovatio...Transformative Leadership: N Chandrababu Naidu and TDP's Vision for Innovatio...
Transformative Leadership: N Chandrababu Naidu and TDP's Vision for Innovatio...
 
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
 
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
 
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
Julius Randle's Injury Status: Surgery Not Off the Table
Julius Randle's Injury Status: Surgery Not Off the TableJulius Randle's Injury Status: Surgery Not Off the Table
Julius Randle's Injury Status: Surgery Not Off the Table
 
BDSM⚡Call Girls in Indirapuram Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Indirapuram Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Indirapuram Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Indirapuram Escorts >༒8448380779 Escort Service
 
02052024_First India Newspaper Jaipur.pdf
02052024_First India Newspaper Jaipur.pdf02052024_First India Newspaper Jaipur.pdf
02052024_First India Newspaper Jaipur.pdf
 
Embed-2 (1).pdfb[k[k[[k[kkkpkdpokkdpkopko
Embed-2 (1).pdfb[k[k[[k[kkkpkdpokkdpkopkoEmbed-2 (1).pdfb[k[k[[k[kkkpkdpokkdpkopko
Embed-2 (1).pdfb[k[k[[k[kkkpkdpokkdpkopko
 
06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdf06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdf
 
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 48 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 48 (Gurgaon)Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 48 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 48 (Gurgaon)
 
BDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 135 Noida Escorts >༒8448380779 Escort Service
 
declarationleaders_sd_re_greens_theleft_5.pdf
declarationleaders_sd_re_greens_theleft_5.pdfdeclarationleaders_sd_re_greens_theleft_5.pdf
declarationleaders_sd_re_greens_theleft_5.pdf
 
Embed-4.pdf lkdiinlajeklhndklheduhuekjdh
Embed-4.pdf lkdiinlajeklhndklheduhuekjdhEmbed-4.pdf lkdiinlajeklhndklheduhuekjdh
Embed-4.pdf lkdiinlajeklhndklheduhuekjdh
 
Enjoy Night ≽ 8448380779 ≼ Call Girls In Palam Vihar (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Palam Vihar (Gurgaon)Enjoy Night ≽ 8448380779 ≼ Call Girls In Palam Vihar (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Palam Vihar (Gurgaon)
 
Verified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover Back
Verified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover BackVerified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover Back
Verified Love Spells in Little Rock, AR (310) 882-6330 Get My Ex-Lover Back
 
BDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Greater Noida Escorts >༒8448380779 Escort Service
 

Cedem futia-2014

  • 1. G. Futia F. Cairo F. Morando L. Leschiutta Exploiting Linked Open Data and Natural Language Processing for Classification of Political Speech Krems, 22nd May 2014
  • 2. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 2 Introduction ● Our goal: ● assist anyone interested in automatic categorization of political speeches, to identify unambiguously the main political trends addressed by the White House ● What we have to achieve our goal: ● TellMeFirst (http://tellmefirst.polito.it/), a topic extraction tool: – it leverages DBpedia knowledge base and English Wikipedia linguistic corpus – it exploits Linked Open Data (LOD) and Natural Language Processing (NLP) techniques
  • 3. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 3 DBpedia ● A crowd-sourced community effort to extract structured information from Wikipedia and a central interlinking hub for the Linking Open Data project. ● It is a suitable knowledge base for text classification (Mendes et al., 2012; Hellmann et al., 2013; Steinmetz et al., 2013)
  • 4. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 4 Why DBpedia for US political speeches? Comparison between the coverage of US politics and the coverage of politics of other countries The coverage of politics in Wikipedia is “often very good for recent or prominent topics but is lacking on older or more obscure topics” (Brown, 2011).
  • 5. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 5 Text Categorization Approach ● An instance-based approch:TellMeFirst assigns target documents to classes based on a local comparison between a set of pre-classified documents and the target document itself ● This training set consists of all the Wikipedia paragraphs where a wikilink occurs.These paragraphs are stored in a Lucene index, where each document represents a DBpedia resource
  • 6. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 6 Success rate (%) of the TellMeFirst classification process on US Presidents profiles 1st topic Within the first 2 topics Within the first 7 topics Full text of the Presidents profiles 95.4% 100% 100% President profiles without name and surname 45.4% 61.3% 90.9% TellMeFirst provides as output the seven most relevant topics (in the form of DBpedia URI) of the document sorted by relevance
  • 7. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 7 whitehouse.gov ● 3173 videos in English were available on the White House website on the 24th of November 2013 ● These videos are categorized according to a taxonomy not related to the subject of the speeches ● They need a semantic layer that point out the content of the speeches, so that questions such as “what is the First Lady talking about?” could be automatically answered
  • 8. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 8 Not just a bag-of-words tool Results obtained with TellMeFirst (on the left) and withTagCrowd (on the right) «President Obama Speaks on the Affordable Care Act» http://1.usa.gov/1jR4Ky2
  • 9. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 9 Results (i) Occ. % overall % 2013 % 2012 % 2011 % 2010 % 2009 Barack Obama 607 4.88% 5.68% 4.52% 5.51% 4.45% 3.88% Patient Protection and Affordable Care Act 286 2.30% 3.06% 1.35% 1.91% 2.47% 2.71% American Recovery and Reinvestment Act of 2009 278 2.23% 1.09% 1.82% 2.88% 2.84% 1.88% Social Security 272 2.19% 2.58% 1.77% 3.54% 1.61% 0.78% Amount and percentage of topic occurrences extracted with TellMeFirst
  • 10. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 10 Results (ii) ● “New Deal” (141 occurrences), probably used as a metaphor within the political speeches of President Obama ● “Libya” has a value corresponding to 1.00% in 2011.This result can be related to the full-scale revolt beginning on 17 February 2011 in Libya ● “Deepwater Horizon oil spill” reaches the 1.05% in 2010.This result is related to the marine oil spill which took place in the Gulf of Mexico that began on 20 april 2010
  • 11. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 11 Correlation among topics
  • 12. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 12 A focus on the First Lady (i) ● According to Michelle Obama’s page on the White House website, the First Lady “looks forward to continuing her work on the issues close to her heart”: ● supporting military families ● helping working women balance career and family encouraging national service ● promoting the arts and arts education ● fostering healthy eating and healthy living for children and families across the country
  • 13. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 13 A focus on the First Lady (ii) ● We tested whether TellMeFirst confirms or not these impressions and claims, manually selecting nine Wikipedia categories which seemed to be related to these issues ● We then interrogated the SPARQL end-point of DBpedia with a query to collect all the topics of these categories ● We then associated each topic to one or more of the nine high-level categories: these categories encompassed almost 75% of the topics
  • 14. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 14 A focus on the First Lady (iii) Wikipedia Category First Lady sp. 9 categories All speeches 9 categories Government of the United States 26.68% 32.68% Education 21.64% 5.40% Nutrition 19.96% 1.61% Social issues 14.71% 28.38% Barack Obama 13.66% 14.00% Health care 11.34% 7.57% Arts 8.61% 1.11% Military personnel 3.99% 3.16% Gender equality 2.73% 0.84% Others (unclassified topics) 25.63% 38.34%
  • 15. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 15 Conclusions (i) ● The ability for citizens to easily retrieve the content of political speeches and decisions is a crucial factor in e-participation ● Not guaranteed by a traditional keywords search, as in most of the public administration websites (the White House website included) ● Example: in a keyword-based system, by typing the word "education", for instance, users get as result only videos that have the word education in their title ● All terms that belong to the semantic area of education are omitted
  • 16. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 16 Conclusions (ii) ● When documents are semantically classified through DBpedia URIs all synonyms, hypernyms and hyponyms of lemmas are traced to the same concept making user search more effective ● Leveraging Wikipedia categories would allow to go even a step further, taking advantage of the links between concepts as designed by the Wikipedia community
  • 17. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 17 Next steps ● Building a content search/navigation layer around the scraping/classification module ● Integration with other Linked Open Data repositories on the Web, combining the extracted topics with other information (President Obama's federal budget proposal?)
  • 18. Thank you! Giuseppe Futia (giuseppe.futia@polito.it) This paper was drafted in the context of the Network of Excellence in Internet Science EINS (GA n°288021), and, in particular, in relation with the activities concerning Evidence and Experimentation (JRA3).
  • 19. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 19 Appendix - The algorithm ● The classifier needs to hold in memory all the instances of the training set and calculate, during classification stage, the vector distance between training documents and target documents. ● Specifically, the algorithm used by TMF is k-Nearest Neighbor (kNN), a type of memory-based approach which selects the categories for a target document on the basis of the k most similar documents within the vector space.
  • 20. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 20 Appendix - Scoring formula ● In a Lucene query, both the target document and the training set become weighed terms vectors, where terms are weighted by means of the TF-IDF algorithm.The query returns a list of documents in the form of DBpedia URIs, ordered by similarity score. Scoring formula is:
  • 21. 22nd May 2014 Giuseppe Futia – Politecnico di Torino 21 Appendix - Basic concepts ● Natural Language Processing - A field of computer science, concerned with the interactions between computers and human (natural) languages. ● Linked Data - A recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF. ● DBpedia - A crowd-sourced community effort to extract structured information from Wikipedia and a central interlinking hub for the Linking Open Data project.