This document discusses using social media and digital volunteering in disaster management. It outlines how crowdsourcing can be used to extract insights from social media data during disasters through tasks like event detection, content labeling, and quality assessment. However, it notes challenges like biases in the data. The document proposes moving beyond individual insights to develop a "big picture" understanding of disasters. It also suggests moving beyond basic crowd processing to more advanced participatory mining with volunteers. Combining authoritative data with social media and integrating human and machine intelligence are presented as promising approaches.
Social Media and Digital Volunteering in Disaster Management @ DSEM 2017
1. Social Media and Digital Volunteering in Disaster Management
Carlos Castillo
Universitat Pompeu Fabra
Data Science for Emergency Management Workshop
Co-located with IEEE Big Data. Boston, MA, US, Dec. 2017.
www.bigcrisisdata.org
2. Research topics in this space (just my opinion)
Crowded:
● Event detection
● Isolated messages
● Hashtag-based collection
● Crowd labeling
● Isolated quality aspects
● “Over-the-fence” transfer

Having some elbow room:
● “Big picture” inferences
● Conversational streams
● Adaptive information filtering
● Participatory mining
● Holistic content quality
● Interdisciplinary research
4. "Big picture" questions
Understanding general parameters of a disaster
How many people were affected?
(Number of displaced, injured, dead)
How much will the disaster cost?
(Damaged or destroyed infrastructure)
How many resource units need to be mobilized?
(Beds in emergency shelters, tons of food, water)
What is the extent of the affected area?
5. Problematic for supervised learning
Source: USGS
Large-scale disasters are fortunately rare
Impact distributions are skewed
Dependency between disaster impacts and social media response is not trivial (concave response conjecture)
What we learn in one place and time may not transfer to another
6. Let's not forget we have other types of data (traditional, or "authoritative" sources)
[Diagrams: authoritative ("Auth") and social media data feeding a model that produces predictions; variants range from modeling and data fusion done by the user to only the data fusion done by the user.]
7. Example 1: Floods
Top: Sensor data
Bottom: Tweets about floods
Flood activity in Twitter is more pronounced
in populated areas close to the Elbe river
[De Albuquerque et al. 2015]
8. Example 2: Landslides
Authoritative (e.g., rainfall) and non-authoritative (e.g., Twitter, YouTube, Facebook) information can be combined.
Keyword search: landslide, mudslide
[Musaev et al. 2014; 2015]
[Diagram: authoritative and social data feed a model that produces predictions.]
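As an illustration of this kind of fusion (a hypothetical sketch, not the actual pipeline of Musaev et al.), one could combine a per-region rainfall reading from an authoritative source with counts of keyword-matching social media posts; every field name and threshold below is invented for the example.

```python
# Illustrative only: fuse an authoritative rainfall signal with social media
# keyword counts to flag possible landslide events. Values are hypothetical.
from dataclasses import dataclass

KEYWORDS = ("landslide", "mudslide")  # keyword search terms from the slide

@dataclass
class RegionDay:
    region: str
    rainfall_mm: float   # authoritative source, e.g., a rain gauge
    posts: list[str]     # social media text collected for that region/day

def keyword_hits(posts):
    """Count posts mentioning any tracked keyword."""
    return sum(any(k in p.lower() for k in KEYWORDS) for p in posts)

def flag_candidate(obs, rain_threshold=50.0, min_hits=2):
    """Flag a region/day when both signals agree (made-up rule)."""
    return obs.rainfall_mm >= rain_threshold and keyword_hits(obs.posts) >= min_hits

obs = RegionDay("valley-01", rainfall_mm=72.0,
                posts=["Huge mudslide blocked the road",
                       "Rain all night",
                       "Landslide near the school, stay away"])
print(flag_candidate(obs))  # True: heavy rain plus two keyword-matching posts
```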
11. Example with intermediate variable: rain and floods
Based on geocoded tweets (<2% of total) filtered to select those having keywords related to rain ("chuva" in Portuguese).
Used to estimate rainfall (supervised setting), which is then fed into the flood risk prediction model.
[Restrepo-Estrada et al. 2018]
[Diagram: social data feeds Model 1, whose partial prediction is combined with authoritative data in Model 2 to produce the final predictions.]
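A minimal sketch of the two-stage idea, with synthetic data and placeholder models rather than those of Restrepo-Estrada et al.: a first model maps rain-related tweet activity to a rainfall estimate (the partial prediction), which is then combined with an authoritative signal in a second model that predicts flood risk.

```python
# Illustrative two-stage pipeline on synthetic data:
# Model 1: rain-keyword tweet counts -> estimated rainfall (partial prediction)
# Model 2: estimated rainfall + authoritative river level -> flood risk
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Stage 1: train where gauge rainfall is available (supervised setting)
tweet_counts = rng.poisson(20, size=200).reshape(-1, 1)       # tweets/day
true_rain_mm = 2.0 * tweet_counts.ravel() + rng.normal(0, 5, 200)
rain_model = LinearRegression().fit(tweet_counts, true_rain_mm)

# Stage 2: flood risk from (estimated rainfall, authoritative river level)
river_level = rng.normal(3.0, 1.0, 200)                        # sensor data
est_rain = rain_model.predict(tweet_counts)
flooded = (est_rain + 10 * river_level > 70).astype(int)       # synthetic labels
X = np.column_stack([est_rain, river_level])
flood_model = LogisticRegression().fit(X, flooded)

# Use: a new day with many rain-related tweets and a high river level
new_rain = rain_model.predict([[45]])                          # partial prediction
risk = flood_model.predict_proba([[new_rain[0], 4.2]])[0, 1]
print(f"Estimated rainfall: {new_rain[0]:.1f} mm, flood risk: {risk:.2f}")
```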
13. General biases
Population biases
(e.g., Twitter users are more affluent, white, and young than the general population)
Behavioral biases
(decisions to perform actions or not depend on things we don't observe)
Content biases / linking biases
(e.g., choices to post or not post certain types of content)
Temporal variations
Redundancy, which can be helpful or not
14. Issues at the data source/platform
People's behavior on a platform is affected, among other things, by ...
Functional biases
What the platform allows or encourages
(e.g., are longer tweets beneficial for our work?)
Normative biases
Written and unwritten norms of what is acceptable behavior
15. Data acquisition and processing introduce biases
The way in which we acquire, query, and filter data introduces biases
We often collect data in an adversarial manner
Our choice of query (e.g., by geo, by keywords) changes the data we obtain
The way in which we clean, enrich, and aggregate data introduces biases
Our annotation processes are not perfect
16. We often classify when we want to quantify
Instead of using a text quantification framework, we do it indirectly by running a classifier and then selecting a threshold
Tweet sentiment: From classification to quantification [Gao and Sebastiani 2015]
[Figure: three sets of messages labeled W (water), F (food), and S (shelter), shaded from certain to uncertain.]
If shade represents classification accuracy (certain to uncertain): are there more messages about water, food, or shelter?
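To make the contrast concrete, here is a toy sketch (not Gao and Sebastiani's actual method): naive "classify and count" simply thresholds classifier scores, while a basic quantification correction such as adjusted classify-and-count rescales that raw count using the classifier's estimated true and false positive rates; all scores and rates below are hypothetical.

```python
# Toy comparison: classify-and-count vs adjusted classify-and-count (ACC).
# "Water" stands in for one of the W/F/S categories; all numbers are made up.
def classify_and_count(scores, threshold=0.5):
    """Naive prevalence estimate: fraction of items scored above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def adjusted_count(raw_prevalence, tpr, fpr):
    """ACC correction: p = (raw - fpr) / (tpr - fpr), clipped to [0, 1]."""
    p = (raw_prevalence - fpr) / (tpr - fpr)
    return min(max(p, 0.0), 1.0)

# Classifier scores for "message is about water" on an unlabeled stream.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1, 0.05]

raw = classify_and_count(scores)
# tpr/fpr would be measured on a held-out validation set; hypothetical here.
corrected = adjusted_count(raw, tpr=0.8, fpr=0.1)
print(f"classify-and-count: {raw:.2f}, adjusted: {corrected:.2f}")
```

The point is that the threshold choice drives the naive estimate, so comparing raw counts of water vs food vs shelter messages can mislead unless the counts are corrected or a proper quantification method is used.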
17. Interpretation challenges
Mixing qualitative and quantitative methods
Usage of abstract metrics vs domain-specific metrics
E.g., “dollars saved, lives preserved, time conserved, effort reduced, quality of living increased” [Wagstaff 2012]
More on the limits of social data:
http://www.aolteanu.com/SocialDataLimitsTutorial/
22. Volunteers are fast!
They take just 15 seconds to label an item, so 5 of them can do about one item every 3 seconds.
23. In 3 seconds, more items arrive than are tagged
For instance: 1M per day = 11.6 per second:
≈ 35 new items in 3 seconds.
24. Space is not the problem. Time is.
These items waited for too long: information expires!
25. Hundreds of people working 24/7 would be needed to keep up with these high item arrival rates
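A back-of-the-envelope check using the figures from the previous slides: at 1M items/day ≈ 11.6 items per second, and one volunteer labeling one item every 15 seconds (≈ 0.067 items per second), keeping up would take about 11.6 × 15 ≈ 174 people labeling at every moment, i.e., several hundred across 24/7 shifts.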
Maybe AI can help?
26. AI can learn from humans and tag 30-40 items per second!
[Imran et al. 2013]
27. The AI creates copies of items that were hard for it to tag; these copies wait to be tagged by humans.
[Imran et al. 2013]
28. AI learns from the more ambiguous cases and improves over time
[Imran et al. 2013]
29. A general framework: Expert-Machine-Crowd
The crowd annotates a few items and provides training data
The machine annotates most items
The expert designs and validates the annotations
[Imran et al. 2013]
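A minimal sketch of such a loop, using a generic uncertainty-sampling setup rather than the exact system of Imran et al.; the categories, example texts, and confidence threshold are all hypothetical. The crowd supplies a small labeled seed, the machine tags the stream, and low-confidence items are queued back for human labeling.

```python
# Generic expert-machine-crowd sketch on synthetic data.
# Crowd labels a seed set; the machine labels the stream; items it is least
# confident about are copied back into the human labeling queue.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Expert-designed categories and a crowd-labeled seed (hypothetical).
seed_texts = ["we need drinking water", "bridge collapsed downtown",
              "send bottled water please", "road blocked by debris"]
seed_labels = ["water", "infrastructure", "water", "infrastructure"]

stream = ["water running out at the shelter", "anyone have supplies?",
          "the highway overpass is cracked", "thoughts and prayers"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(seed_texts), seed_labels)

X_stream = vec.transform(stream)
confidence = clf.predict_proba(X_stream).max(axis=1)

CONF_THRESHOLD = 0.6  # hypothetical cut-off chosen by the expert
for text, label, conf in zip(stream, clf.predict(X_stream), confidence):
    if conf >= CONF_THRESHOLD:
        print(f"machine-tagged [{label}]: {text}")
    else:
        print(f"queued for the crowd: {text}")  # hard item, sent back to humans
```

In the real setting the crowd's new labels would be folded back into the training data, which is how the classifier improves on the ambiguous cases over time, as the preceding slides describe.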
32. What motivates them?
Values: "I feel it is important to help others"
Understanding: "It lets me learn through direct, hands-on experience"
Enhancement: "It makes me feel better about myself"
Career: "It may help me get my foot in the door at a place where I want to work"
Social: "People I know share an interest in community service"
Protective: "It is a good escape from my own troubles"
[Clary and Snyder 1999]
33. Example motivations
Coleman et al. [2009]:
● Altruism
● Professional or personal interest
● Intellectual stimulation
● Social reward
● Enhanced personal reputation
Capelo et al. [2012]:
● Ideology
● Personal satisfaction
● Community
● Humanitarian values
● Desire to apply and improve technical knowledge
Starbird et al. [2012]:
● Social capital: new friends and/or stronger relationships
● Symbolic capital: reputation
● Self-improvement: learning new skills
● Benevolence: to benefit others
● Entertainment
Some elements in common with Free/Libre Open Source Software groups [Benkler, 2006]
Some of these are corroborated by various surveys
34. In this setting, they could do much more
Create and refine categorization schemes
Detect and describe outliers
Generate hypotheses
Suggest high-level interpretations
… participatory mining instead of mere crowd processing
36. Conclusions
Two interesting directions:
From "actionable insights" to the "big picture"
From crowd processing to participatory mining
Two powerful combinations:
Authoritative and non-authoritative data
Human and machine intelligence