Social Media and Digital Volunteering in Disaster Management
Carlos Castillo
Universitat Pompeu Fabra
Data Science for Emergency Management Workshop
Co-located with IEEE Big Data. Boston, MA, US, Dec. 2017.
www.bigcrisisdata.org
Research topics in this space (just my opinion)
Crowded
Event detection
Isolated messages
Hashtag-based collection
Crowd labeling
Isolated quality aspects
“Over-the-fence” transfer
Having some elbow room
“Big picture” inferences
Conversational streams
Adaptive information filtering
Participatory mining
Holistic content quality
Interdisciplinary research
Carlos Castillo — www.bigcrisisdata.org
From "actionable insights" to the "big picture"
"Big picture" questions
Understanding general parameters of a disaster
How many people were affected?
(Number of displaced, injured, dead)
How much will the disaster cost?
(Damaged or destroyed infrastructure)
How many resource units need to be mobilized?
(Beds in emergency shelters, tons of food, water)
What is the extent of the affected area?
Problematic for supervised learning
Source: USGS
Large-scale disasters are fortunately rare
Impact distributions are skewed
The relationship between disaster impact
and social media response is not straightforward
(concave response conjecture)
What we learn in one place and time may
not transfer to another
Let's not forget we have other types of data
(traditional, or "authoritative" sources)
[Diagram: authoritative ("Auth") and social data leading to predictions, in two set-ups: "Modeling and data fusion done by user" and, with a model producing the predictions, "Data fusion done by user"]
Example 1: Floods
Top: Sensor data
Bottom: Tweets about floods
Flood activity on Twitter is more pronounced
in populated areas close to the Elbe river
[De Albuquerque et al. 2015]
Example 2: Landslides
Authoritative (e.g., rainfall) and non-authoritative
(e.g., Twitter, YouTube, Facebook) information
can be combined.
Keyword search: landslide, mudslide
[Musaev et al. 2014;2015]
[Diagram: authoritative ("Auth") and social data feeding one model that produces predictions]
Could we do this?
[Musaev et al. 2014;2015]
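The single-model fusion asked about in "Could we do this?" can be sketched as follows. This is a minimal illustration under stated assumptions, not the system of Musaev et al.: the social feature is simply the number of posts matching the slide's keywords (landslide, mudslide), the authoritative feature is a rainfall reading in mm, and the logistic weights (`w_rain`, `w_social`, `bias`) and function names are hypothetical placeholders that would normally be fitted on labeled events.

```python
import math
import re

# Keywords from the slide; case-insensitive whole-word match, optional plural.
KEYWORDS = re.compile(r"\b(landslide|mudslide)s?\b", re.IGNORECASE)

def social_signal(posts):
    """Social (non-authoritative) feature: count of keyword-matching posts."""
    return sum(1 for text in posts if KEYWORDS.search(text))

def landslide_score(rainfall_mm, posts, w_rain=0.05, w_social=0.8, bias=-4.0):
    """Fuse an authoritative feature (rainfall) and a social feature into one
    logistic score in (0, 1). Weights and bias are HYPOTHETICAL, not fitted."""
    z = bias + w_rain * rainfall_mm + w_social * social_signal(posts)
    return 1.0 / (1.0 + math.exp(-z))

posts = [
    "Huge mudslide blocked the road near the village",
    "Landslides reported after the storm",
    "Nice weather today",
]
print(social_signal(posts))  # 2
print(round(landslide_score(40.0, posts), 3))
```

Putting both signals into one model lets either source raise the score on its own, which is the main difference from leaving the fusion to the user.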
What about modeling an intermediate variable?
[Diagram: Model 1 produces a partial prediction that feeds Model 2, which outputs the predictions; inputs are authoritative ("Auth") and social data]
Example with intermediate variable: rain and floods
Based on geocoded tweets (<2% of total) filtered to
select those having keywords related to rain ("chuva"
in Portuguese).
These tweets are used to estimate rainfall (supervised setting);
the estimate is then fed into the flood risk prediction model.
[Restrepo-Estrada et al. 2018]
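A two-stage pipeline in the spirit of the rain-and-floods example might look like this. Everything here is a hypothetical stand-in: the hourly counts and gauge values are invented toy numbers, Model 1 is a one-variable least-squares line from keyword-post counts to rainfall, and Model 2 (`flood_risk`, with an assumed 20 mm threshold) replaces the flood risk model used in the cited work, which is not reproduced here.

```python
import re

# Rain keyword from the slide ("chuva", Portuguese for rain).
RAIN_KEYWORD = re.compile(r"\bchuva\b", re.IGNORECASE)

def rain_post_count(posts):
    """Number of geocoded posts in a time window that mention rain."""
    return sum(1 for p in posts if RAIN_KEYWORD.search(p))

def fit_line(xs, ys):
    """Model 1 (supervised): ordinary least squares for y = a + b*x,
    mapping keyword-post counts to rainfall in mm."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def flood_risk(rain_mm, threshold_mm=20.0):
    """Model 2 (HYPOTHETICAL stand-in): estimated rainfall -> risk in [0, 1]."""
    return min(1.0, max(0.0, rain_mm / threshold_mm))

# Toy training data: hourly keyword-post counts vs rain-gauge readings (mm).
counts = [0, 2, 5, 10]
gauge_mm = [0.0, 4.0, 10.0, 20.0]
a, b = fit_line(counts, gauge_mm)

# New hour: posts -> rainfall estimate (partial prediction) -> flood risk.
hour_posts = ["Muita chuva no centro", "chuva forte agora", "bom dia"]
est_rain = a + b * rain_post_count(hour_posts)
print(est_rain, flood_risk(est_rain))  # 4.0 0.2
```

Keeping the rainfall estimate as an explicit intermediate variable means Model 2 can be a model built for authoritative rainfall data, with social media only supplying a proxy input.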
Challenge: biases [Olteanu et al. 2016]
General biases
Population biases
(e.g., Twitter users tend to be more affluent, younger, and more often white than the general population)
Behavioral biases
(decisions to perform an action or not depend on factors we do not observe)
Content biases / linking biases
(e.g., choices to post or not post certain types of content)
Temporal variations
Redundancy, which may or may not be helpful
Issues at the data source/platform
People's behavior on a platform is affected by, among other things:
Functional biases
What the platform allows or encourages
(e.g., are longer tweets beneficial for our work?)
Normative biases
Written and unwritten norms of what is acceptable behavior
Data acquisition and processing introduce biases
The way in which we acquire, query, and filter data introduces biases
We often collect data in an adversarial manner
Our choice of query (e.g., by geo, by keywords) changes the data we obtain
The way in which we clean, enrich, and aggregate data introduces biases
Our annotation processes are not perfect
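A toy illustration of how the choice of query changes the data we obtain. The posts and bounding box below are hypothetical. A keyword query and a geographic query over the same stream overlap only partly: the keyword query misses on-topic posts that lack the keyword, and the geo query misses every post without coordinates.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Post:
    text: str
    coords: Optional[Tuple[float, float]]  # (lat, lon) if geotagged, else None

def keyword_query(posts, keywords):
    """Indices of posts whose text contains any keyword (case-insensitive)."""
    kws = [k.lower() for k in keywords]
    return {i for i, p in enumerate(posts) if any(k in p.text.lower() for k in kws)}

def geo_query(posts, bbox):
    """Indices of geotagged posts inside (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return {
        i for i, p in enumerate(posts)
        if p.coords is not None
        and min_lat <= p.coords[0] <= max_lat
        and min_lon <= p.coords[1] <= max_lon
    }

# Hypothetical stream: only some posts are geotagged, not all use the keyword.
posts = [
    Post("Flood water in the street", (51.0, 13.7)),
    Post("Thinking of everyone affected by the flood", None),
    Post("Road closed, water everywhere", (51.1, 13.8)),
    Post("Flood relief donations here", (40.4, -3.7)),
]
by_kw = keyword_query(posts, ["flood"])
by_geo = geo_query(posts, (50.5, 13.0, 51.5, 14.5))
print(sorted(by_kw), sorted(by_geo), sorted(by_kw & by_geo))  # [0, 1, 3] [0, 2] [0]
```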
We often classify when we want to quantify
Instead of using a text quantification
framework, we do it indirectly by
running a classifier and then selecting
a threshold
Tweet sentiment: From classification to quantification [Gao and Sebastiani 2015]
[Figure: messages about water (W), food (F), and shelter (S)]
If shade represents classification confidence
(from certain to uncertain), are there more
messages about water, food, or shelter?
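One standard way to go from classification to quantification is adjusted classify and count: estimate the classifier's true and false positive rates on held-out labeled data, then correct the raw fraction of positives. The sketch below assumes a binary classifier with scores in [0, 1]; it is a generic correction shown for illustration, not necessarily a method evaluated by Gao and Sebastiani, and the rates and scores are made-up numbers.

```python
def classify_and_count(scores, threshold=0.5):
    """Naive estimate: fraction of items whose score passes the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def adjusted_classify_and_count(scores, tpr, fpr, threshold=0.5):
    """Correct the naive fraction using the classifier's true/false positive
    rates (estimated on held-out labeled data): p = (cc - fpr) / (tpr - fpr)."""
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ")
    cc = classify_and_count(scores, threshold)
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion

# Made-up example: 24 of 100 items are flagged by the classifier.
scores = [0.9] * 24 + [0.1] * 76
print(classify_and_count(scores),
      round(adjusted_classify_and_count(scores, tpr=0.8, fpr=0.1), 3))  # 0.24 0.2
```

With a true prevalence of 20%, a classifier with tpr = 0.8 and fpr = 0.1 would flag about 0.8 × 0.2 + 0.1 × 0.8 = 24% of items; the correction maps that back to 20%. The catch is that tpr and fpr must be estimated on labeled data that resembles the event at hand.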
Interpretation challenges
Mixing qualitative and quantitative methods
Use of abstract metrics vs. domain-specific metrics
E.g., “dollars saved, lives preserved, time conserved, effort reduced, quality
of living increased” [Wagstaff 2012]
More on the limits of social data:
http://www.aolteanu.com/SocialDataLimitsTutorial/
Hybrid systems
Typical crowdsourcing tasks
Credit: MicroMappers / SBTF / QCRI
Typical crowdsourcing tasks (cont.)
[Ofli et al. 2016]
Credit: MicroMappers / UAViators / QCRI
Crisis Mapping
Volunteers are fast!
They take just 15 seconds
to label an item, so 5 of
them can do about one
item every 3 seconds.
In 3 seconds, more items arrive than are tagged
For instance, 1M items per day ≈ 11.6 items per second,
i.e., ≈ 35 new items every 3 seconds.
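The throughput arithmetic on the last two slides can be checked in a few lines. The 15 s per item, the 5 volunteers, and the 1M items per day come from the slides; the queue-growth line is an extra step that assumes constant rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def volunteer_rate(n_volunteers, seconds_per_item=15.0):
    """Items labeled per second by a team working in parallel."""
    return n_volunteers / seconds_per_item

def arrival_rate(items_per_day):
    """Average item arrivals per second for a given daily volume."""
    return items_per_day / SECONDS_PER_DAY

labeled = volunteer_rate(5)         # 5 / 15 ≈ 0.333 items/s, one every 3 s
arriving = arrival_rate(1_000_000)  # ≈ 11.57 items/s
print(round(1 / labeled, 1), round(arriving, 1), round(3 * arriving))  # 3.0 11.6 35

# Assuming constant rates, the unlabeled queue grows by this many items per second.
backlog_growth = arriving - labeled
print(round(backlog_growth * SECONDS_PER_DAY))  # items left unlabeled per day
```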
Space is not the problem. Time is.
These items have waited too long: information expires!
Hundreds of people working 24/7
would be needed to keep up with
these high item arrival rates
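A rough staffing estimate behind the "hundreds of people" claim: the number of volunteers busy at any moment is the arrival rate times the time per item (the offered load, in queueing terms). The 8-hour shift length used to turn that into a 24/7 head count is an assumption, not a figure from the slides.

```python
import math

def volunteers_needed(items_per_day, seconds_per_item=15.0, shift_hours=8.0):
    """Return (volunteers busy at any moment, people needed for 24/7 coverage).
    Concurrent workers = arrival rate (items/s) * labeling time (s/item).
    shift_hours is an ASSUMPTION; the slides do not specify shifts."""
    rate = items_per_day / 86_400
    concurrent = math.ceil(rate * seconds_per_item)
    shifts = math.ceil(24 / shift_hours)
    return concurrent, concurrent * shifts

print(volunteers_needed(1_000_000))  # (174, 522)
```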
Maybe AI can help?
AI can learn from humans
and tag 30-40 items per
second!
[Imran et al. 2013]
Copies waiting to
be tagged by
humans
AI creates copies of
items that were hard for
it to tag
[Imran et al. 2013]
AI learns from the more
ambiguous cases and
improves over time
[Imran et al. 2013]
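The loop described on these slides resembles active learning with uncertainty sampling. A minimal sketch, assuming a binary classifier that outputs P(relevant) per item: every item gets a machine label, and copies of the k items closest to 0.5 go to the human queue. The selection rule, `k`, and the toy probabilities are assumptions, not details of the system in Imran et al. 2013.

```python
def machine_tag(prob, threshold=0.5):
    """Machine label assigned to every item, regardless of confidence."""
    return prob >= threshold

def select_for_humans(probs, k):
    """Ids of the k items whose P(relevant) is closest to 0.5,
    i.e., the ones the model is least certain about."""
    ranked = sorted(probs.items(), key=lambda kv: abs(kv[1] - 0.5))
    return [item_id for item_id, _ in ranked[:k]]

# Toy model outputs: item id -> P(relevant)
probs = {"a": 0.97, "b": 0.52, "c": 0.08, "d": 0.45, "e": 0.70}
machine_labels = {i: machine_tag(p) for i, p in probs.items()}
human_queue = select_for_humans(probs, k=2)
print(human_queue)  # ['b', 'd']
```

Human labels for the queued copies would then be added to the training set and the model retrained; ideally, fewer items fall near the decision boundary as training data accumulates.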
A general framework: Expert-Machine-Crowd
The crowd annotates a few items and provides training data
The machine annotates most items
The expert designs and validates the annotations
[Imran et al. 2013]
Real-time annotations
https://www.youtube.com/watch?v=uKgE3yWJ0_I
Credit: UAViators / SBTF
Could volunteers do more?
What motivates them?
Values: "I feel it is important to help others"
Understanding: "It lets me learn through direct, hands-on experience"
Enhancement: "It makes me feel better about myself"
Career: "It may help me get my foot in the door at a place where I want to work"
Social: "People I know share an interest in community service"
Protective: "It is a good escape from my own troubles"
[Clary and Snyder 1999]
Example motivations
Coleman et al. [2009]:
● Altruism
● Professional or personal interest
● Intellectual stimulation
● Social reward
● Enhanced personal reputation
Capelo et al. [2012]:
● Ideology
● Personal satisfaction
● Community
● Humanitarian values
● Desire to apply and improve
technical knowledge
Starbird et al. [2012]:
● Social capital: new friends and/or stronger
relationships
● Symbolic capital: reputation
● Self-improvement: learning new skills
● Benevolence: to benefit others
● Entertainment
Some elements are shared with Free/Libre
Open Source Software groups [Benkler, 2006]
Some of these have been corroborated by various surveys
In this setting, they could do much more
Create and refine categorization schemes
Detect and describe outliers
Generate hypotheses
Suggest high-level interpretations
… participatory mining instead of mere crowd processing
Conclusions
Two interesting directions:
From "actionable insights" to the "big picture"
From crowd processing to participatory mining
Two powerful combinations:
Authoritative and non-authoritative data
Human and machine intelligence
Thank you!
@chatox @bigcrisisdata
Free chapters available:
http://bigcrisisdata.org/

Socia Media and Digital Volunteering in Disaster Management @ DSEM 2017

  • 1. Social Media and Digital Volunteering in Disaster Management Carlos Castillo Universitat Pompeu Fabra Data Science for Emergency Management Workshop Co-Located with IEEE Big Data. Boston, MA, US, Dec. 2018. www.bigcrisisdata.org
  • 2. Research topics in this space just my opinion 2 Crowded Event detection Isolated messages Hashtag-based collection Crowd labeling Isolated quality aspects “Over-the-fence” transfer Having some elbow room “Big picture” inferences Conversational streams Adaptive information filtering Participatory mining Holistic content quality Interdisciplinary research Carlos Castillo — www.bigcrisisdata.org
  • 3. From "actionable insights" to the "big picture" 3Carlos Castillo — www.bigcrisisdata.org
  • 4. "Big picture" questions Understanding general parameters of a disaster How many people were affected? (Number of displaced, injured, dead) How much will the disaster cost? (Damaged or destroyed infrastructure) How many resource units need to be mobilized? (Beds in emergency shelters, tons of food, water) What is the extent of the affected area? 4Carlos Castillo — www.bigcrisisdata.org
  • 5. Problematic for supervised learning 5 Source: USGS Large-scale disasters are fortunately rare Impact distributions are skewed Dependency between disaster impacts and social media response is not trivial (concave response conjecture) What we learn in one place and time may not transfer to another Carlos Castillo — www.bigcrisisdata.org
  • 6. Let's not forget we have other types of data (traditional, or "authoritative" sources) 6 Auth Social Auth Social Predictions Model Modeling and data fusion done by user Data fusion done by user
  • 7. Example 1: Floods Top: Sensor data Bottom: Tweets about floods Flood activity in Twitter is more pronounced in populated areas close to the Elbe river 7[De Albuquerque et al. 2015] Auth Soci al
  • 8. Example 2: Landslides 8 Authoritative (e.g., rainfall) and non-authoritative (e.g., Twitter, Youtube, Facebook) information can be combined. Keyword search: landslide, mudslide [Musaev et al. 2014;2015] Auth Social Predictions Model
  • 9. 9 Auth Social Predictions Model Could we do this? [Musaev et al. 2014;2015]
  • 10. 10 Auth Social Predictions Model 1 Model 2 What about modeling an intermediate variable? Partial prediction
  • 11. Example with intermediate variable: rain and floods Based on geocoded tweets (<2% of total) filtered to select those having keywords related to rain ("chuva" in Portuguese). Used to estimate rainfall (supervised setting), which is then fed into the flood risk prediction model 11[Restrepo-Estrada et al. 2018] Auth Social Predictions Model 1 Model 2Partial prediction
  • 12. Challenge: biases [Olteanu et al. 2016] 12Carlos Castillo — www.bigcrisisdata.org
  • 13. General biases Population biases (e.g., Twitter users more affluent, white, young than general population) Behavioral biases (decisions to perform actions or not depends on things we don't observe) Content biases / linking biases (e.g., choices to post or not post certain types of content) Temporal variations Redundancy which can be helpful or not 13Carlos Castillo — www.bigcrisisdata.org
  • 14. Issues at the data source/platform. People's behavior on a platform is affected by, among other things: functional biases, i.e., what the platform allows or encourages (e.g., are longer tweets beneficial for our work?); and normative biases, i.e., written and unwritten norms of what is acceptable behavior.
  • 15. Data acquisition and processing introduce biases. The way in which we acquire, query, and filter data introduces biases: we often collect data in an adversarial manner, and our choice of query (e.g., by geo, by keywords) changes the data we obtain. The way in which we clean, enrich, and aggregate data introduces biases. Our annotation processes are not perfect.
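To illustrate how the choice of query changes the data obtained, here is a minimal sketch with four made-up posts, a keyword filter, and a geographic filter. The `Post` class, its field, and both filters are hypothetical simplifications; real collection goes through each platform's own API. In this toy sample the two queries return disjoint subsets: the keyword query misses a relevant Portuguese post, and the geo query keeps an irrelevant one.

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    geotagged_in_area: bool  # True only if the post has a geotag inside the study area


# Made-up sample posts.
posts = [
    Post("chuva forte no centro", True),
    Post("heavy rain downtown, streets flooding", False),
    Post("flood warning issued for the river", False),
    Post("nice weather today", True),
]


def keyword_query(posts, keywords):
    """Keep posts whose text contains any keyword (naive substring match)."""
    return [p for p in posts if any(k in p.text.lower() for k in keywords)]


def geo_query(posts):
    """Keep only posts geotagged inside the area of interest."""
    return [p for p in posts if p.geotagged_in_area]


by_keyword = [p.text for p in keyword_query(posts, ["rain", "flood"])]
by_geo = [p.text for p in geo_query(posts)]
```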
  • 16. We often classify when we want to quantify. Instead of using a text quantification framework, we estimate proportions indirectly by running a classifier, selecting a threshold, and counting (see "Tweet sentiment: From classification to quantification" [Gao and Sebastiani 2015]). [Figure: messages about water (W), food (F), and shelter (S), shaded by classification certainty] If shade represents classification accuracy (from certain to uncertain): are there more messages about water, food, or shelter?
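The gap between classifying and quantifying can be made concrete with a minimal sketch of classify-and-count versus adjusted classify-and-count (ACC), a standard correction from the quantification literature; it is not necessarily the method evaluated by Gao and Sebastiani. It assumes the classifier's true- and false-positive rates were measured on held-out labeled data. The scores, rates, and "water" category below are made-up illustration values.

```python
def classify_and_count(scores, threshold=0.5):
    """Naive prevalence estimate: share of items whose score clears the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def adjusted_classify_and_count(cc, tpr, fpr):
    """Adjusted classify-and-count (ACC).

    If p is the true prevalence, the expected share of predicted positives is
    cc = tpr * p + fpr * (1 - p); solving for p gives (cc - fpr) / (tpr - fpr).
    tpr and fpr must be estimated on held-out labeled data.
    """
    if tpr <= fpr:
        raise ValueError("classifier must beat chance (tpr > fpr)")
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid proportion


# Made-up scores for 1,000 messages, 50 of which (5%) are truly about water.
# Positives: 40 scored above threshold, 10 below (tpr = 0.8).
# Negatives: 95 scored above threshold, 855 below (fpr = 0.1).
scores = [0.9] * 40 + [0.2] * 10 + [0.7] * 95 + [0.1] * 855
cc = classify_and_count(scores)                          # 0.135, i.e. 13.5%
acc = adjusted_classify_and_count(cc, tpr=0.8, fpr=0.1)  # about 0.05, i.e. 5%
```

Counting thresholded predictions overstates the rare category almost threefold here (13.5% vs. 5%), because the 95 false positives from the large majority class far outnumber the 10 missed positives.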
  • 17. Interpretation challenges. Mixing qualitative and quantitative methods. Use of abstract metrics vs. domain-specific metrics, e.g., “dollars saved, lives preserved, time conserved, effort reduced, quality of living increased” [Wagstaff 2012]. More on the limits of social data: http://www.aolteanu.com/SocialDataLimitsTutorial/
  • 18. Hybrid systems
  • 19. Typical crowdsourcing tasks. Credit: MicroMappers / SBTF / QCRI
  • 20. Typical crowdsourcing tasks (cont.) [Ofli et al. 2016]. Credit: MicroMappers / UAViators / QCRI
  • 22. Volunteers are fast! Each takes just 15 seconds to label an item, so 5 of them working in parallel can label about one item every 3 seconds.
  • 23. In 3 seconds, more items arrive than are tagged. For instance, 1M items per day ≈ 11.6 items per second, i.e., ≈ 35 new items every 3 seconds.
  • 24. Space is not the problem; time is. These items waited too long: information expires!
  • 25. Hundreds of people working 24/7 would be needed to keep up with such high item arrival rates. Maybe AI can help?
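The arithmetic behind slides 22 to 25 can be checked with a few lines of Python. The 15 seconds per item, 5 volunteers, and 1M items per day come from the slides; the function names and the one-hour backlog scenario are illustrative assumptions.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def arrival_rate(items_per_day):
    """Items arriving per second."""
    return items_per_day / SECONDS_PER_DAY


def labeling_rate(volunteers, seconds_per_item):
    """Items labeled per second by volunteers working in parallel."""
    return volunteers / seconds_per_item


def volunteers_needed(items_per_day, seconds_per_item):
    """Smallest number of simultaneous volunteers whose labeling rate
    is at least the arrival rate."""
    return math.ceil(arrival_rate(items_per_day) * seconds_per_item)


def backlog_after(seconds, items_per_day, volunteers, seconds_per_item):
    """Items still waiting after `seconds`, starting from an empty queue."""
    surplus = arrival_rate(items_per_day) - labeling_rate(volunteers, seconds_per_item)
    return max(0.0, surplus * seconds)


print(labeling_rate(5, 15))                   # about 0.33 items/s: one item every 3 s
print(arrival_rate(1_000_000) * 3)            # about 34.7 new items every 3 s
print(volunteers_needed(1_000_000, 15))       # 174 volunteers labeling at once
print(backlog_after(3600, 1_000_000, 5, 15))  # about 40,467 items waiting after 1 hour
```

The 174 figure counts people labeling at the same moment. Covering all 168 hours of a week with, say, 40-hour shifts multiplies it by about 4.2, to roughly 730 people, which is consistent with "hundreds of people working 24/7".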
  • 26. AI can learn from humans and tag 30-40 items per second! [Imran et al. 2013]
  • 27. AI creates copies of items that were hard for it to tag; these copies wait to be tagged by humans [Imran et al. 2013].
  • 28. AI learns from the more ambiguous cases and improves over time [Imran et al. 2013].
  • 29. A general framework: Expert-Machine-Crowd. The crowd annotates a few items and provides training data; the machine annotates most items; the expert designs and validates the annotations [Imran et al. 2013].
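Slides 26 to 29 describe a loop in which the machine tags everything, copies the items it finds ambiguous to human volunteers, and learns from their labels. Below is a minimal sketch of that loop, assuming a toy online logistic-regression tagger over words and an uncertainty-sampling rule (an item is copied to humans when its probability is within `margin` of 0.5). The class, functions, and threshold are hypothetical; the actual classifier and routing rule of Imran et al. [2013] may differ.

```python
import math


class OnlineTagger:
    """Hypothetical toy tagger: logistic regression over lowercase words, trained online."""

    def __init__(self, learning_rate=0.5):
        self.weights = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def prob(self, text):
        """Probability that the item belongs to the target category."""
        z = self.bias + sum(self.weights.get(w, 0.0) for w in text.lower().split())
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, text, label):
        """One stochastic-gradient step on the log-loss for a human label (0 or 1)."""
        error = label - self.prob(text)
        self.bias += self.learning_rate * error
        for w in text.lower().split():
            self.weights[w] = self.weights.get(w, 0.0) + self.learning_rate * error


def route(tagger, items, margin=0.2):
    """Machine-tag every item; copy ambiguous ones to the human queue."""
    machine_tags, human_queue = [], []
    for text in items:
        p = tagger.prob(text)
        machine_tags.append((text, p >= 0.5))
        if abs(p - 0.5) < margin:  # close to the decision boundary: ask humans
            human_queue.append(text)
    return machine_tags, human_queue


# Illustrative run: an untrained tagger is unsure about everything...
tagger = OnlineTagger()
_, first_queue = route(tagger, ["need water in shelter", "great game tonight"])

# ...volunteers label the queued copies, and the tagger learns from them.
human_labels = {"need water in shelter": 1, "great game tonight": 0}
for _ in range(10):
    for text, label in human_labels.items():
        tagger.learn(text, label)

later_tags, later_queue = route(tagger, ["water needed at shelter", "game tonight"])
```

After a few human labels, similar new items are tagged confidently and no longer reach the human queue, which is how a small crowd can keep pace with a much larger stream.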
  • 31. Could volunteers do more?
  • 32. What motivates them? Values: "I feel it is important to help others." Understanding: "It lets me learn through direct, hands-on experience." Enhancement: "It makes me feel better about myself." Career: "It may help me get my foot in the door at a place where I want to work." Social: "People I know share an interest in community service." Protective: "It is a good escape from my own troubles." [Clary and Snyder 1999]
  • 33. Example motivations. Coleman et al. [2009]: altruism; professional or personal interest; intellectual stimulation; social reward; enhanced personal reputation. Capelo et al. [2012]: ideology; personal satisfaction; community; humanitarian values; desire to apply and improve technical knowledge. Starbird et al. [2012]: social capital (new friends and/or stronger relationships); symbolic capital (reputation); self-improvement (learning new skills); benevolence (to benefit others); entertainment. These share some elements with Free/Libre Open Source Software groups [Benkler 2006], and some of them are corroborated by various surveys.
  • 34. In this setting, they could do much more: create and refine categorization schemes, detect and describe outliers, generate hypotheses, and suggest high-level interpretations. …participatory mining instead of mere crowd processing.
  • 35. Conclusions
  • 36. Conclusions. Two interesting directions: from "actionable insights" to the "big picture"; from crowd processing to participatory mining. Two powerful combinations: authoritative and non-authoritative data; human and machine intelligence.
  • 37. Thank you! @chatox, @bigcrisisdata. Free chapters available at http://bigcrisisdata.org/