Introduction
Framework
Sentiment analysis
Case studies
Conclusions
A Descriptive Analysis of Twitter Activity Around
Boston Terror Attacks
Álvaro Cuesta David F. Barrero María D. R-Moreno
Computer Engineering Department
Universidad de Alcalá, Spain
ICCCI 2013
Craiova, Romania
September 11, 2013
ICCCI 2013, Craiova, Romania A Descriptive Analysis of Twitter Activity 1 / 25
Summary
1 Introduction
Motivation
Objectives
Case studies
2 Framework
Framework overview
Framework messaging
Framework components
3 Sentiment analysis
Overview
Classifier
4 Case studies
Boston Terror Attack
Political analysis
5 Conclusions and future work
Introduction
Motivation
Great expansion of social networks in recent years
One of the most successful is Twitter
Microblogging platform
Short messages known as tweets
Open nature
Twitter offers great research opportunities
Open nature
Distributed human sensor network
Easy data extraction, difficult data processing
Twitter + sentiment analysis
Lack of tools for sentiment analysis in Spanish
Introduction
Objectives
Twitter offers an excellent API ... however, some infrastructure is still needed (mainly storage and reporting)
Objectives
1 Develop a framework for Twitter data extraction and analysis
2 Provide reporting tools
3 Foundation for sentiment analysis in Spanish
Introduction
Case studies
To assess the framework, we include two case studies
Event-driven - Boston terror attack
Regular usage - Political activity on Twitter in Spanish
Framework architecture
Overview
Requirements
Easy to use, extensible, massive data processing
Design decisions
Modular design: Collection of independent scripts
Focus on open data formats
Built around the database: MongoDB
Set of independent scripts exchanging data
Framework architecture
Framework messaging
Framework architecture
Framework components: Miner
Miner
Extracts and stores
tweets
Stream API
Several filters
Written in Python
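A minimal sketch of how such a miner could be organized; it is not the authors' implementation. It assumes tweets arrive as dictionaries with a `text` field and that `store` is any callable that persists one tweet (for example the `insert_one` method of a pymongo collection). The names `make_track_filter`, `make_user_filter` and `mine` are hypothetical, and the connection to the Stream API itself is left out.

```python
from typing import Callable, Iterable

Tweet = dict
TweetFilter = Callable[[Tweet], bool]


def make_track_filter(phrase: str) -> TweetFilter:
    """Keep tweets whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return lambda tweet: needle in tweet.get("text", "").lower()


def make_user_filter(screen_names: Iterable[str]) -> TweetFilter:
    """Keep tweets that mention any of the given accounts."""
    handles = {"@" + name.lstrip("@").lower() for name in screen_names}

    def accept(tweet: Tweet) -> bool:
        words = tweet.get("text", "").lower().split()
        # Drop trailing punctuation so "@psoe," still matches "@psoe".
        return any(word.strip(".,:;!?") in handles for word in words)

    return accept


def mine(stream: Iterable[Tweet], keep: TweetFilter,
         store: Callable[[Tweet], None]) -> int:
    """Pass every accepted tweet to `store` and return how many were stored."""
    stored = 0
    for tweet in stream:
        if keep(tweet):
            store(tweet)
            stored += 1
    return stored
```

Keeping the filter and the storage step as plain callables mirrors the modular design: either can be swapped without touching the loop.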
Framework architecture
Framework components: Database
Database
Storage for further processing
MongoDB
NoSQL database
High performance
Framework architecture
Framework components: Reporting
Reporting
CSV export for further processing
R processing
Extensibility
Powerful libraries
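The CSV export step could be sketched as follows. This is an illustration under assumptions, not the framework's code: tweets are taken to be flat dictionaries, `export_csv` is a hypothetical name, and the field names in the demo are invented sample data.

```python
import csv
import io


def export_csv(tweets, fields, out):
    """Write one CSV row per tweet, keeping only `fields`; return the row count."""
    writer = csv.DictWriter(out, fieldnames=list(fields))
    writer.writeheader()
    rows = 0
    for tweet in tweets:
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({field: tweet.get(field, "") for field in fields})
        rows += 1
    return rows


if __name__ == "__main__":
    sample = [
        {"id": 1, "lang": "es", "text": "hola, mundo"},
        {"id": 2, "lang": "es", "text": "otro tweet"},
    ]
    buffer = io.StringIO()
    export_csv(sample, ["id", "lang", "text"], buffer)
    print(buffer.getvalue())
```

In practice `out` would be a file opened with `newline=""`, and the resulting file can be loaded in R with `read.csv`.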
Framework architecture
Framework components: Sentiment analysis
Sentiment analysis
Supervised learning
Need for labeling
Tools for labeling
Classifier building
Classifier testing
Sentiment analysis
Overview
Supervised learning with the Natural Language Toolkit (NLTK)
Three classes: "positive", "negative" and "neutral"
A labeled corpus is needed
Several exist in English ...
... none in Spanish
Thousands of manually classified tweets are needed
Collaborative labeling
Web application to label tweets
Sentiment analysis
Classifier
Naïve Bayes classifier
Stop words removed
Some parameters to set
Optimal parameter setting depends on the dataset
Classifier evaluation is needed
Tester
Cross validation
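The evaluation step named above (a tester based on cross-validation) can be sketched independently of the classifier. The helper names `k_fold_indices` and `cross_validate` are hypothetical, and `train` and `accuracy` are placeholder callables. With NLTK one could presumably pass `nltk.NaiveBayesClassifier.train` and `nltk.classify.accuracy`, assuming the items are `(featureset, label)` pairs; this is an assumption about usage, not a description of the authors' tester.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Shuffle 0..n_items-1 and deal the indices into k folds of near-equal size."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validate(items, k, train, accuracy, seed=0):
    """Return the mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(items), k, seed):
        held = set(held_out)
        # Train on everything outside the fold, evaluate on the fold itself.
        train_set = [item for i, item in enumerate(items) if i not in held]
        test_set = [items[i] for i in held_out]
        model = train(train_set)
        scores.append(accuracy(model, test_set))
    return sum(scores) / len(scores)
```

Fixing the seed keeps the folds identical across parameter settings, so accuracies from different configurations are compared on the same splits.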
Case study
Boston Terror Attack
Main objective
Evaluate the platform
Secondary objective
Describe activity around an event
Stream by string filter
The event
Terror attack on 15 Apr 2013 14:49 (GMT-4) in Boston
Internet witch-hunt motivated by the release of some photos
Shooting and manhunt
Data acquisition
Begin: Tue, 16 Apr 2013 00:43 (GMT)
End: Tue, 23 Apr 2013 00:43 (GMT)
Filter: "Maratón de Boston" (Boston Marathon in Spanish)
Case study
Boston Terror Attack: Dataset description
Metric         Value        Relative    Average
Tweets         28,892                   1.16/user
Non-retweets   16,029       55.48 %
Retweets       12,863       44.52 %
Geolocated     255          0.88 %
Users          24,989
Mentions       18,937       65.54 %
Replies        849          2.94 %
Non-replies    18,088       62.61 %
Size           96.39 MB                 3.38 KB/tweet
Index size     0.91 MB
Disk           132.99 MB
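The relative and average figures can be re-derived from the raw counts, which also makes the structure of the table explicit: percentages are taken over all tweets, retweets and non-retweets partition the tweets, and replies and non-replies partition the mentions. A small check using the counts reported for this dataset; `pct` is a hypothetical helper.

```python
tweets = 28_892
users = 24_989
counts = {
    "non_retweets": 16_029,
    "retweets": 12_863,
    "geolocated": 255,
    "mentions": 18_937,
    "replies": 849,
    "non_replies": 18_088,
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Relative column: share of all collected tweets.
relative = {name: pct(n, tweets) for name, n in counts.items()}
# Average column: tweets per distinct user.
tweets_per_user = round(tweets / users, 2)

for name, value in relative.items():
    print(f"{name}: {value} %")
print(f"tweets per user: {tweets_per_user}")
```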
Case study
Boston Terror Attack: activity
[Figure: three time series from Apr 17 to Apr 23 (x-axis: Time): Tweets, Tweets (excluding RTs), and Retweets.]
Dashed line: Bombing
Dotted line: Photo release
Solid line: Shooting
Gray background: Manhunt
Case study
Boston Terror Attack: activity
[Figure: three time series from Thu 23:00 to Sat 00:00 (x-axis: Time): Tweets, Tweets (excluding RTs), and Retweets.]
Dotted line: Photo release
Solid line: Shooting
Gray background: Manhunt
Case study
Political analysis: Overview
Main objective
Evaluate sentiment analysis
Secondary objective
Describe regular Twitter activity
Stream by user filter
Selection of Spanish political actors
Selected by activity and controversy
Account owner           Accounts
Political party         @PPopular, @PSOE, @iunida, @UPyD
Politician              @agarzon, @EduMadina, @ToniCanto1, @RevillaMiguelA, @ccifuentes, @_Rubalcaba_
Journalist              @jordievole, @iescolar
Activist organization   @LA_PAH
Data acquisition
Begin: Tue, 16 Apr 2013 00:00 (GMT)
End: Thu, 18 Apr 2013 04:00 (GMT)
Filter: Account name ("@account")
Case study
Political analysis: Dataset description
Metric         Value        Relative    Average
Tweets         65,043                   1.9/user
Non-retweets   28,175       43.32 %
Retweets       36,868       56.68 %
Geolocated     528          0.81 %
Users          34,195
Mentions       56,713       87.19 %
Non-replies    46,981       72.23 %
Replies        9,732        14.96 %
Size           227.51 MB                3.58 KB/tweet
Index size     2.05 MB
Disk           237.95 MB
Case study
Political analysis: Activity
[Figure: three time series from Tue to Thu (x-axis: Time): Tweets, Tweets (excluding RTs), and Retweets.]
Case study
Political analysis: Sentiment analysis
9,884 tweets were manually classified collaboratively
4,739 non-neutral tweets
1,062 positive, 3,677 negative
Unbalanced dataset
We tried several parameters for the Naïve Bayes classifier
N-grams: {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}
Minimum score: 0, 1, 2, 3, 4, 5, 6 and 10
10-fold cross-validation
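The parameter grid above yields 6 x 8 = 48 configurations. A sketch of how it could be enumerated, together with a simple n-gram presence feature extractor. The naming scheme in `config_name` is inferred from labels such as `NaiveBayes-1_2-min3` in the reported results, and `ngram_features` is a hypothetical helper. The slides do not define how the minimum score is applied, so it is only carried as a parameter here.

```python
from itertools import product

NGRAM_SETS = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]
MIN_SCORES = [0, 1, 2, 3, 4, 5, 6, 10]


def config_name(ngrams, min_score):
    """Build a label such as 'NaiveBayes-1_2-min3' (naming scheme is inferred)."""
    return "NaiveBayes-{}-min{}".format("_".join(map(str, ngrams)), min_score)


def ngram_features(tokens, sizes):
    """Return presence features for every n-gram whose length is in `sizes`."""
    features = {}
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] = True
    return features


# Every (n-gram set, minimum score) pair to be evaluated.
GRID = list(product(NGRAM_SETS, MIN_SCORES))

if __name__ == "__main__":
    print(len(GRID), "configurations")
    print(config_name(*GRID[0]))
```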
Case study
Political analysis: Sentiment analysis
Configuration          Accuracy
NaiveBayes-1_2-min3    0.8543
NaiveBayes-1-min3      0.8510
NaiveBayes-1_3-min3    0.8507
NaiveBayes-1-min4      0.8476
NaiveBayes-1_3-min5    0.8474
NaiveBayes-1_2-min4    0.8469
NaiveBayes-1_3-min4    0.8467
NaiveBayes-1_3-min1    0.8459
NaiveBayes-1-min6      0.8452
NaiveBayes-1-min1      0.8448
NaiveBayes-1_2-min5    0.8446
NaiveBayes-1_3-min6    0.8438
NaiveBayes-1_2-min6    0.8436
NaiveBayes-1-min5      0.8406
NaiveBayes-1_2-min1    0.8389
NaiveBayes-2_3-min6    0.8385
Case study
Political analysis: Normalized sentiment
[Figure: time series from Tue to Thu (x-axis: Time; y-axis: Positive, from 0.0 to 1.0).]
Conclusions and future work
We developed a framework that eases data extraction and
analysis on Twitter
Ready for production
It will be released soon with a free licence
We briefly described two case studies
Event-driven activity - Boston terror attacks
Regular activity - Political activity
Sentiment analysis is intrinsically difficult
Future work
Lemmatization
Natural language processing
Time series analysis
Thanks for your attention!
David F. Barrero
david@aut.uah.es
@dfbarrero

Weitere Àhnliche Inhalte

Andere mochten auch

Temas para tesis de diseño gråfico
Temas para tesis de diseño gråfico Temas para tesis de diseño gråfico
Temas para tesis de diseño gråfico bastiano10
 
Temas de diseño grafico
Temas de diseño graficoTemas de diseño grafico
Temas de diseño graficorapero1115
 
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/LinuxMemoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/LinuxMaximiliano Vilchez
 
Resumen de propuestas de tesis
Resumen de propuestas de tesisResumen de propuestas de tesis
Resumen de propuestas de tesisEdgardo Vegega
 
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUCMemoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUCRodrigo Moren Pizarro
 
Diseño Gråfico Digital en Software Libre v3.21
Diseño Gråfico Digital en Software Libre v3.21Diseño Gråfico Digital en Software Libre v3.21
Diseño Gråfico Digital en Software Libre v3.21Leonardo J. Caballero G.
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Memorias descriptivas
Memorias descriptivasMemorias descriptivas
Memorias descriptivaszonibri
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 

Andere mochten auch (10)

Memoria descriptiva maqueta sobre burgos
Memoria descriptiva maqueta sobre burgosMemoria descriptiva maqueta sobre burgos
Memoria descriptiva maqueta sobre burgos
 
Temas para tesis de diseño gråfico
Temas para tesis de diseño gråfico Temas para tesis de diseño gråfico
Temas para tesis de diseño gråfico
 
Temas de diseño grafico
Temas de diseño graficoTemas de diseño grafico
Temas de diseño grafico
 
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/LinuxMemoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
Memoria descriptiva aplicada el rediseño del sitio web Canaima GNU/Linux
 
Resumen de propuestas de tesis
Resumen de propuestas de tesisResumen de propuestas de tesis
Resumen de propuestas de tesis
 
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUCMemoria Proyecto de Título Diseño Industrial - Referencial DuocUC
Memoria Proyecto de Título Diseño Industrial - Referencial DuocUC
 
Diseño Gråfico Digital en Software Libre v3.21
Diseño Gråfico Digital en Software Libre v3.21Diseño Gråfico Digital en Software Libre v3.21
Diseño Gråfico Digital en Software Libre v3.21
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Memorias descriptivas
Memorias descriptivasMemorias descriptivas
Memorias descriptivas
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 

Ähnlich wie Presentacion

Identification and Characterization of Events in Social Media
Identification and Characterization of Events in Social MediaIdentification and Characterization of Events in Social Media
Identification and Characterization of Events in Social MediaHila Becker
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDivyaPatel729457
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Farida Vis
 
sun2021.pdf
sun2021.pdfsun2021.pdf
sun2021.pdfJayaramB11
 
Changing trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurementChanging trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurementMunesh Kumar
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20Monisha100
 
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social MediaCredibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social MediaIIIT Hyderabad
 
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...Sebastian Dennerlein
 
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...University of Groningen (The Netherlands)
 
Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...Wylliams Santos
 
Spammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksSpammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksIRJET Journal
 
A Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data AnalysisA Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data AnalysisZina Petrushyna
 
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasuresExploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasuresCybersecurity Education and Research Centre
 
Microblogging meets politics
Microblogging meets politicsMicroblogging meets politics
Microblogging meets politicsGabriela Grosseck
 
2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network AnalysisMarc Smith
 
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Anthony Fisher Camilleri
 
Berlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram HorstmannBerlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram HorstmannCornelius Puschmann
 
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...Danube University Krems, Centre for E-Governance
 

Ähnlich wie Presentacion (20)

Identification and Characterization of Events in Social Media
Identification and Characterization of Events in Social MediaIdentification and Characterization of Events in Social Media
Identification and Characterization of Events in Social Media
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptx
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
sun2021.pdf
sun2021.pdfsun2021.pdf
sun2021.pdf
 
Changing trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurementChanging trends in citation analysis and challenges in API measurement
Changing trends in citation analysis and challenges in API measurement
 
Individual project 2.20
Individual project 2.20Individual project 2.20
Individual project 2.20
 
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social MediaCredibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
 
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
Web 2.0 Messaging Tools for Knowledge Management? Exploring the Potentials of...
 
Amia now! session one
Amia now! session oneAmia now! session one
Amia now! session one
 
Amia now! session one
Amia now! session oneAmia now! session one
Amia now! session one
 
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
Automated Analysis of Journalists' and Politicians' Online Behavior on Social...
 
Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...Multivocal literature reviews in software engineering: preliminary findings f...
Multivocal literature reviews in software engineering: preliminary findings f...
 
Spammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social NetworksSpammer Detection and Fake User Identification on Social Networks
Spammer Detection and Fake User Identification on Social Networks
 
A Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data AnalysisA Near-Real Time Application for Twitter Data Analysis
A Near-Real Time Application for Twitter Data Analysis
 
Exploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasuresExploration of gaps in Bitly's spam detection and relevant countermeasures
Exploration of gaps in Bitly's spam detection and relevant countermeasures
 
Microblogging meets politics
Microblogging meets politicsMicroblogging meets politics
Microblogging meets politics
 
2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis2010 Catalyst Conference - Trends in Social Network Analysis
2010 Catalyst Conference - Trends in Social Network Analysis
 
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
Blockchain for Education: A Study on Digital Accreditation of Personal and Ac...
 
Berlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram HorstmannBerlin 6 Open Access Conference: Wolfram Horstmann
Berlin 6 Open Access Conference: Wolfram Horstmann
 
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
Anja Adler – Liquid Democracy-Norm, Code and Developers of Democracy beyond R...
 

KĂŒrzlich hochgeladen

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

KĂŒrzlich hochgeladen (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024

Presentation

  • 1. Introduction Framework Sentiment analysis Case studies Conclusions A Descriptive Analysis of Twitter Activity Around Boston Terror Attacks. Álvaro Cuesta, David F. Barrero, María D. R-Moreno. Computer Engineering Department, Universidad de Alcalá, Spain. ICCCI 2013, Craiova, Romania, September 11, 2013. ICCCI 2013, Craiova, Romania A Descriptive Analysis of Twitter Activity 1 / 25
  • 2. Summary. 1 Introduction (Motivation, Objectives, Case studies); 2 Framework (Framework overview, Framework messaging, Framework components); 3 Sentiment analysis (Overview, Classifier); 4 Case studies (Boston Terror Attack, Political analysis); 5 Conclusions and future work.
  • 3. Introduction: Motivation. Great expansion of social networks in recent years. One of the most successful is Twitter: a microblogging platform with short messages known as tweets and an open nature. Twitter offers great research opportunities: open nature; a distributed human sensor network; easy data extraction, difficult data processing. Twitter + sentiment analysis: lack of tools for sentiment analysis in Spanish.
  • 4. Introduction: Objectives. Twitter offers an excellent API; however, some infrastructure is needed (mainly storage and reporting). Objectives: 1 Develop a framework for Twitter data extraction and analysis; 2 Provide reporting tools; 3 Lay a foundation for sentiment analysis in Spanish.
  • 5. Introduction: Case studies. To assess the framework, we included two case studies. Event driven: Boston terror attack. Regular usage: political activity on Twitter in Spanish.
  • 6. Framework architecture: Overview. Requirements: easy to use, extensible, massive data processing. Design decisions: modular design (a collection of independent scripts); focus on open data formats; built around the database (MongoDB); a set of independent scripts interchanging data.
  • 7. Framework architecture: Framework messaging.
  • 8. Framework components: Miner. Extracts and stores tweets; uses the Stream API; several filters; written in Python.
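The miner slide above describes a filter-then-store loop. A minimal sketch of that idea follows, assuming tweets arrive as dictionaries with a `text` field. The names `matches_filter` and `mine` are hypothetical, the list stands in for the database, and the word-matching rule is a local stand-in for filtering that the Stream API performs on the server side; none of this is the authors' code.

```python
def matches_filter(text, phrase):
    """Return True if every word of `phrase` occurs as a word in `text`, ignoring case."""
    words = text.lower().split()
    return all(term in words for term in phrase.lower().split())


def mine(stream, phrase, store):
    """Append each matching tweet from `stream` to `store`; return how many were kept."""
    kept = 0
    for tweet in stream:
        if matches_filter(tweet["text"], phrase):
            store.append(tweet)  # a real miner would insert into the database here
            kept += 1
    return kept
```

Any iterable of tweet dictionaries can play the role of `stream`, so the same loop works with a recorded sample or a live connection.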
  • 9. Framework components: Database. Storage for further processing; MongoDB; NoSQL database; high performance.
  • 10. Framework components: Reporting. CSV export for further processing. R processing: extensibility, powerful libraries.
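The CSV export step mentioned in the reporting slide can be sketched with Python's standard `csv` module. The function name `tweets_to_csv` and the field names used in the test are assumptions for illustration; the slides do not say which fields the framework exports.

```python
import csv
import io


def tweets_to_csv(tweets, fields):
    """Serialize tweet dictionaries to CSV text, keeping only the chosen fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for tweet in tweets:
        # Missing fields become empty cells so every row has the same columns.
        writer.writerow({name: tweet.get(name, "") for name in fields})
    return buffer.getvalue()
```

The resulting text can be written to a file and read from R with `read.csv`.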
  • 11. Framework components: Sentiment analysis. Supervised learning; need for labeling; tools for labeling; classifier building; classifier testing.
  • 12. Sentiment analysis: Overview. Supervised learning with the Natural Language Toolkit (NLTK). Three classes: “positive”, “negative” and “neutral”. A labeled corpus is needed: several exist in English, none in Spanish. Thousands of manually classified tweets are needed. Collaborative labeling: a web application to label tweets.
  • 13. Sentiment analysis: Classifier. Naïve Bayes classifier; stop words removed; some parameters to set; the optimal parameter setting depends on the dataset. Classifier evaluation is needed: a tester with cross-validation.
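To make the classifier slide concrete, here is a minimal pure-Python sketch of a Naïve Bayes text classifier with stop-word removal and add-one smoothing. It is not the NLTK classifier the authors used: the class name, the tiny stop-word list and the whitespace tokenizer are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"de", "la", "el", "en", "y", "que"}


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        """Return the class with the highest log-posterior for `text`."""
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.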
  • 14. Case study: Boston Terror Attack. Main objective: evaluate the platform. Secondary objective: describe activity around an event. Stream by string filter. The event: terror attack on 15 Apr 2013 14:49 (GMT-4) in Boston; Internet witch-hunt motivated by the release of some photos; shooting and manhunt. Data acquisition: begin Tue, 16 Apr 2013 00:43 (GMT); end Tue, 23 Apr 2013 00:43 (GMT); filter: “Maratón de Boston” (Boston Marathon in Spanish).
  • 15. Case study: Boston Terror Attack, dataset description (value, with relative share or average in parentheses). Tweets: 28,892 (1.16/user). Non-retweets: 16,029 (55.48 %). Retweets: 12,863 (44.52 %). Geolocated: 255 (0.88 %). Users: 24,989. Mentions: 18,937 (65.54 %); replies: 849 (2.94 %); non-replies: 18,088 (62.61 %). Size: 96.39 MB (3.38 KB/tweet). Index size: 0.91 MB. Disk: 132.99 MB.
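The relative figures in the dataset description above appear to be each count divided by the total number of tweets, and the per-user average the tweet total divided by the number of users. The following sketch recomputes them from the reported counts; the helper name `percentage` is hypothetical and the interpretation of the columns is an assumption.

```python
def percentage(part, total):
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)


tweets = 28892
users = 24989
counts = {
    "Non-retweets": 16029,
    "Retweets": 12863,
    "Geolocated": 255,
    "Mentions": 18937,
    "Replies": 849,
}

# Relative share of each category over all collected tweets.
relative = {name: percentage(value, tweets) for name, value in counts.items()}

# Average number of tweets per distinct user.
tweets_per_user = round(tweets / users, 2)
```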
  • 16. Case study: Boston Terror Attack, activity. [Figure: three time series over Apr 17 to Apr 23 with time on the x-axis: tweets, tweets excluding retweets, and retweets. Dashed line: bombing. Dotted line: photo release. Solid line: shooting. Gray background: manhunt.]
  • 17. Case study: Boston Terror Attack, activity. [Figure: three time series from Thu 23:00 to Sat 00:00 with time on the x-axis: tweets, tweets excluding retweets, and retweets. Dotted line: photo release. Solid line: shooting. Gray background: manhunt.]
  • 18. Case study: Political analysis, overview. Main objective: evaluate sentiment analysis. Secondary objective: describe regular Twitter activity. Stream by user filter. Selection of Spanish political actors, selected by activity and controversy. Accounts by owner: political party (@PPopular, @PSOE, @iunida, @UPyD); politician (@agarzon, @EduMadina, @ToniCanto1, @RevillaMiguelA, @ccifuentes, @_Rubalcaba_); journalist (@jordievole, @iescolar); activist organization (@LA_PAH). Data acquisition: begin Tue, 16 Apr 2013 00:00 (GMT); end 18 Apr 2013 04:00 (GMT); filter: account name (“@account”).
  • 19. Case study: Political analysis, dataset description (value, with relative share or average in parentheses). Tweets: 65,043 (1.9/user). Non-retweets: 28,175 (43.32 %). Retweets: 36,868 (56.68 %). Geolocated: 528 (0.81 %). Users: 34,195. Mentions: 56,713 (87.19 %); non-replies: 46,981 (72.23 %); replies: 9,732 (14.96 %). Size: 227.51 MB (3.58 KB/tweet). Index size: 2.05 MB. Disk: 237.95 MB.
  • 20. Case study: Political analysis, activity. [Figure: three time series from Tue to Thu with time on the x-axis: tweets, tweets excluding retweets, and retweets.]
  • 21. Case study: Political analysis, sentiment analysis. 9,884 tweets were manually classified in a collaborative way: 4,739 non-neutral tweets (1,062 positive, 3,677 negative), an unbalanced dataset. We tried several parameters for the Naïve Bayes classifier. N-grams: {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}. Minimum score: 0, 1, 2, 3, 4, 5, 6 and 10. 10-fold cross-validation.
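The parameter grid above combines n-gram sizes, a minimum score and k-fold cross-validation. The sketch below shows one way these pieces could fit together. The slides do not define "minimum score"; here it is treated as a minimum corpus frequency for keeping a feature, which is an assumption, and all function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, sizes):
    """Return every n-gram of the requested sizes as a tuple of tokens."""
    features = []
    for n in sizes:
        for start in range(len(tokens) - n + 1):
            features.append(tuple(tokens[start:start + n]))
    return features


def select_features(corpus, sizes, min_count):
    """Keep the n-grams that occur at least `min_count` times across the corpus."""
    counts = Counter()
    for tokens in corpus:
        counts.update(ngrams(tokens, sizes))
    return {feature for feature, count in counts.items() if count >= min_count}


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists so each item is tested exactly once."""
    folds = [list(range(offset, n_items, k)) for offset in range(k)]
    for held_out, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test
```

A configuration such as `NaiveBayes-1_2-min3` would then correspond to `sizes=[1, 2]` and `min_count=3` under this reading, evaluated by averaging accuracy over the folds produced by `k_fold_indices(n, 10)`.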
  • 22. Case study: Political analysis, sentiment analysis. Accuracy by classifier configuration: NaiveBayes-1_2-min3: 0.8543; NaiveBayes-1-min3: 0.8510; NaiveBayes-1_3-min3: 0.8507; NaiveBayes-1-min4: 0.8476; NaiveBayes-1_3-min5: 0.8474; NaiveBayes-1_2-min4: 0.8469; NaiveBayes-1_3-min4: 0.8467; NaiveBayes-1_3-min1: 0.8459; NaiveBayes-1-min6: 0.8452; NaiveBayes-1-min1: 0.8448; NaiveBayes-1_2-min5: 0.8446; NaiveBayes-1_3-min6: 0.8438; NaiveBayes-1_2-min6: 0.8436; NaiveBayes-1-min5: 0.8406; NaiveBayes-1_2-min1: 0.8389; NaiveBayes-2_3-min6: 0.8385.
  • 23. Case study: Political analysis, normalized sentiment. [Figure: x-axis: time (Tue to Thu); y-axis: Positive (0.0 to 1.0).]
  • 24. Conclusions and future work. We developed a framework that eases data extraction and analysis on Twitter: it is ready for production and will be released soon with a free licence. We briefly described two case studies: event-driven activity (Boston terror attacks) and regular activity (political activity). Sentiment analysis is intrinsically difficult. Future work: lemmatization; natural language processing; time series analysis.
  • 25. Thanks for your attention! David F. Barrero david@aut.uah.es @dfbarrero