This document discusses using social media and digital volunteering in disaster management. It outlines how crowdsourcing can be used to extract insights from social media data during disasters through tasks like event detection, content labeling, and quality assessment. However, it notes challenges like biases in the data. The document proposes moving beyond individual insights to develop a "big picture" understanding of disasters. It also suggests moving beyond basic crowd processing to more advanced participatory mining with volunteers. Combining authoritative data with social media and integrating human and machine intelligence are presented as promising approaches.
Social Media and Digital Volunteering in Disaster Management @ DSEM 2017
1. Social Media and Digital Volunteering in Disaster Management
Carlos Castillo
Universitat Pompeu Fabra
Data Science for Emergency Management Workshop
Co-located with IEEE Big Data. Boston, MA, US, Dec. 2017.
www.bigcrisisdata.org
2. Research topics in this space (just my opinion)
Crowded:
● Event detection
● Isolated messages
● Hashtag-based collection
● Crowd labeling
● Isolated quality aspects
● “Over-the-fence” transfer

Having some elbow room:
● “Big picture” inferences
● Conversational streams
● Adaptive information filtering
● Participatory mining
● Holistic content quality
● Interdisciplinary research
4. "Big picture" questions
Understanding general parameters of a disaster
How many people were affected?
(Number of displaced, injured, dead)
How much will the disaster cost?
(Damaged or destroyed infrastructure)
How many resource units need to be mobilized?
(Beds in emergency shelters, tons of food, water)
What is the extent of the affected area?
5. Problematic for supervised learning
Source: USGS
Large-scale disasters are fortunately rare
Impact distributions are skewed
Dependency between disaster impacts and social media response is not trivial (concave response conjecture)
What we learn in one place and time may not transfer to another
6. Let's not forget we have other types of data (traditional, or "authoritative" sources)
[Diagrams: authoritative ("Auth") and social media data feeding a model that produces predictions; variants range from modeling and data fusion done by the user to only the data fusion done by the user.]
7. Example 1: Floods
Top: Sensor data
Bottom: Tweets about floods
Flood activity in Twitter is more pronounced
in populated areas close to the Elbe river
[De Albuquerque et al. 2015]
8. Example 2: Landslides
Authoritative (e.g., rainfall) and non-authoritative (e.g., Twitter, YouTube, Facebook) information can be combined.
Keyword search: landslide, mudslide
[Musaev et al. 2014; 2015]
[Diagram: authoritative and social data feed a model that produces predictions.]
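As an illustration of this kind of fusion (a hypothetical sketch, not the actual pipeline of Musaev et al.), one could combine a per-region rainfall reading from an authoritative source with counts of keyword-matching social media posts; every field name and threshold below is invented for the example.

```python
# Illustrative only: fuse an authoritative rainfall signal with social media
# keyword counts to flag possible landslide events. Values are hypothetical.
from dataclasses import dataclass

KEYWORDS = ("landslide", "mudslide")  # keyword search terms from the slide

@dataclass
class RegionDay:
    region: str
    rainfall_mm: float   # authoritative source, e.g., a rain gauge
    posts: list[str]     # social media text collected for that region/day

def keyword_hits(posts):
    """Count posts mentioning any tracked keyword."""
    return sum(any(k in p.lower() for k in KEYWORDS) for p in posts)

def flag_candidate(obs, rain_threshold=50.0, min_hits=2):
    """Flag a region/day when both signals agree (made-up rule)."""
    return obs.rainfall_mm >= rain_threshold and keyword_hits(obs.posts) >= min_hits

obs = RegionDay("valley-01", rainfall_mm=72.0,
                posts=["Huge mudslide blocked the road",
                       "Rain all night",
                       "Landslide near the school, stay away"])
print(flag_candidate(obs))  # True: heavy rain plus two keyword-matching posts
```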
11. Example with intermediate variable: rain and floods
Based on geocoded tweets (<2% of total) filtered to select those having keywords related to rain ("chuva" in Portuguese).
Used to estimate rainfall (supervised setting), which is then fed into the flood risk prediction model.
[Restrepo-Estrada et al. 2018]
[Diagram: social data feeds Model 1, whose partial prediction is combined with authoritative data in Model 2 to produce the final predictions.]
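A minimal sketch of the two-stage idea, with synthetic data and placeholder models rather than those of Restrepo-Estrada et al.: a first model maps rain-related tweet activity to a rainfall estimate (the partial prediction), which is then combined with an authoritative signal in a second model that predicts flood risk.

```python
# Illustrative two-stage pipeline on synthetic data:
# Model 1: rain-keyword tweet counts -> estimated rainfall (partial prediction)
# Model 2: estimated rainfall + authoritative river level -> flood risk
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Stage 1: train where gauge rainfall is available (supervised setting)
tweet_counts = rng.poisson(20, size=200).reshape(-1, 1)       # tweets/day
true_rain_mm = 2.0 * tweet_counts.ravel() + rng.normal(0, 5, 200)
rain_model = LinearRegression().fit(tweet_counts, true_rain_mm)

# Stage 2: flood risk from (estimated rainfall, authoritative river level)
river_level = rng.normal(3.0, 1.0, 200)                        # sensor data
est_rain = rain_model.predict(tweet_counts)
flooded = (est_rain + 10 * river_level > 70).astype(int)       # synthetic labels
X = np.column_stack([est_rain, river_level])
flood_model = LogisticRegression().fit(X, flooded)

# Use: a new day with many rain-related tweets and a high river level
new_rain = rain_model.predict([[45]])                          # partial prediction
risk = flood_model.predict_proba([[new_rain[0], 4.2]])[0, 1]
print(f"Estimated rainfall: {new_rain[0]:.1f} mm, flood risk: {risk:.2f}")
```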
13. General biases
Population biases
(e.g., Twitter users are more affluent, white, and young than the general population)
Behavioral biases
(decisions to perform actions or not depend on things we don't observe)
Content biases / linking biases
(e.g., choices to post or not post certain types of content)
Temporal variations
Redundancy, which can be helpful or not
14. Issues at the data source/platform
People's behavior on a platform is affected, among other things, by ...
Functional biases
What the platform allows or encourages
(e.g., are longer tweets beneficial for our work?)
Normative biases
Written and unwritten norms of what is acceptable behavior
15. Data acquisition and processing introduce biases
The way in which we acquire, query, and filter data introduces biases
We often collect data in an adversarial manner
Our choice of query (e.g., by geo, by keywords) changes the data we obtain
The way in which we clean, enrich, and aggregate data introduces biases
Our annotation processes are not perfect
16. We often classify when we want to quantify
Instead of using a text quantification framework, we do it indirectly by running a classifier and then selecting a threshold
Tweet sentiment: From classification to quantification [Gao and Sebastiani 2015]
[Figure: three sets of messages labeled W (water), F (food), and S (shelter), shaded from certain to uncertain.]
If shade represents classification accuracy (certain to uncertain): are there more messages about water, food, or shelter?
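To make the contrast concrete, here is a toy sketch (not Gao and Sebastiani's actual method): naive "classify and count" simply thresholds classifier scores, while a basic quantification correction such as adjusted classify-and-count rescales that raw count using the classifier's estimated true and false positive rates; all scores and rates below are hypothetical.

```python
# Toy comparison: classify-and-count vs adjusted classify-and-count (ACC).
# "Water" stands in for one of the W/F/S categories; all numbers are made up.
def classify_and_count(scores, threshold=0.5):
    """Naive prevalence estimate: fraction of items scored above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def adjusted_count(raw_prevalence, tpr, fpr):
    """ACC correction: p = (raw - fpr) / (tpr - fpr), clipped to [0, 1]."""
    p = (raw_prevalence - fpr) / (tpr - fpr)
    return min(max(p, 0.0), 1.0)

# Classifier scores for "message is about water" on an unlabeled stream.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1, 0.05]

raw = classify_and_count(scores)
# tpr/fpr would be measured on a held-out validation set; hypothetical here.
corrected = adjusted_count(raw, tpr=0.8, fpr=0.1)
print(f"classify-and-count: {raw:.2f}, adjusted: {corrected:.2f}")
```

The point is that the threshold choice drives the naive estimate, so comparing raw counts of water vs food vs shelter messages can mislead unless the counts are corrected or a proper quantification method is used.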
17. Interpretation challenges
Mixing qualitative and quantitative methods
Usage of abstract metrics vs domain-specific metrics
E.g., “dollars saved, lives preserved, time conserved, effort reduced, quality of living increased” [Wagstaff 2012]
More on the limits of social data:
http://www.aolteanu.com/SocialDataLimitsTutorial/
22. Volunteers are fast!
They take just 15 seconds to label an item, so 5 of them can do about one item every 3 seconds.
23. In 3 seconds, more items arrive than are tagged
For instance: 1M per day = 11.6 per second:
≈ 35 new items in 3 seconds.
24. Space is not the problem. Time is.
These items waited for too long: information expires!
25. Hundreds of people working 24/7 would be needed to keep up with these high item arrival rates
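A back-of-the-envelope check using the figures from the previous slides: at 1M items/day ≈ 11.6 items per second, and one volunteer labeling one item every 15 seconds (≈ 0.067 items per second), keeping up would take about 11.6 × 15 ≈ 174 people labeling at every moment, i.e., several hundred across 24/7 shifts.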
Maybe AI can help?
26. AI can learn from humans and tag 30-40 items per second!
[Imran et al. 2013]
27. The AI creates copies of items that were hard for it to tag; these copies wait to be tagged by humans.
[Imran et al. 2013]
28. AI learns from the more ambiguous cases and improves over time
[Imran et al. 2013]
29. A general framework: Expert-Machine-Crowd
The crowd annotates a few items and provides training data
The machine annotates most items
The expert designs and validates the annotations
[Imran et al. 2013]
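A minimal sketch of such a loop, using a generic uncertainty-sampling setup rather than the exact system of Imran et al.; the categories, example texts, and confidence threshold are all hypothetical. The crowd supplies a small labeled seed, the machine tags the stream, and low-confidence items are queued back for human labeling.

```python
# Generic expert-machine-crowd sketch on synthetic data.
# Crowd labels a seed set; the machine labels the stream; items it is least
# confident about are copied back into the human labeling queue.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Expert-designed categories and a crowd-labeled seed (hypothetical).
seed_texts = ["we need drinking water", "bridge collapsed downtown",
              "send bottled water please", "road blocked by debris"]
seed_labels = ["water", "infrastructure", "water", "infrastructure"]

stream = ["water running out at the shelter", "anyone have supplies?",
          "the highway overpass is cracked", "thoughts and prayers"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(seed_texts), seed_labels)

X_stream = vec.transform(stream)
confidence = clf.predict_proba(X_stream).max(axis=1)

CONF_THRESHOLD = 0.6  # hypothetical cut-off chosen by the expert
for text, label, conf in zip(stream, clf.predict(X_stream), confidence):
    if conf >= CONF_THRESHOLD:
        print(f"machine-tagged [{label}]: {text}")
    else:
        print(f"queued for the crowd: {text}")  # hard item, sent back to humans
```

In the real setting the crowd's new labels would be folded back into the training data, which is how the classifier improves on the ambiguous cases over time, as the preceding slides describe.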
32. What motivates them?
Values: "I feel it is important to help others"
Understanding: "It lets me learn through direct, hands-on experience"
Enhancement: "It makes me feel better about myself"
Career: "It may help me get my foot in the door at a place where I want to work"
Social: "People I know share an interest in community service"
Protective: "It is a good escape from my own troubles"
[Clary and Snyder 1999]
33. Example motivations
Coleman et al. [2009]:
● Altruism
● Professional or personal interest
● Intellectual stimulation
● Social reward
● Enhanced personal reputation
Capelo et al. [2012]:
● Ideology
● Personal satisfaction
● Community
● Humanitarian values
● Desire to apply and improve technical knowledge
Starbird et al. [2012]:
● Social capital: new friends and/or stronger relationships
● Symbolic capital: reputation
● Self-improvement: learning new skills
● Benevolence: to benefit others
● Entertainment
Some elements in common with Free/Libre Open Source Software groups [Benkler, 2006]
Some of these are corroborated by various surveys
34. In this setting, they could do much more
Create and refine categorization schemes
Detect and describe outliers
Generate hypotheses
Suggest high-level interpretations
… participatory mining instead of mere crowd processing
36. Conclusions
Two interesting directions:
From "actionable insights" to the "big picture"
From crowd processing to participatory mining
Two powerful combinations:
Authoritative and non-authoritative data
Human and machine intelligence