We present an end-to-end social media image processing system called Image4Act. The system collects, denoises, and classifies imagery content posted on social media platforms to help humanitarian organizations gain situational awareness and launch relief operations. The system combines human computation and machine learning techniques to process high-volume social media imagery content in real time during natural and human-made disasters. To cope with the noisy nature of social media imagery data, we use a deep neural network and perceptual hashing techniques to filter out irrelevant and duplicate images. Furthermore, we present a specific use case to assess the severity of infrastructure damage incurred by a disaster. Evaluations of the system on existing disaster datasets, as well as a real-world deployment during a recent cyclone, demonstrate the effectiveness of the system.
Image4Act: Online Social Media Image Processing for Disaster Response
1. Image4Act: Online Social Media Image Processing for Disaster Response
Firoj Alam, Muhammad Imran, Ferda Ofli
Qatar Computing Research Institute
Hamad Bin Khalifa University, Qatar
2. Time-Critical Events and Information Gaps
Diagram: a disaster event (earthquake, flood) brings destruction and damage; humanitarian organizations and local administrations need information to launch relief operations. Information gathering, especially in real time, is the most challenging part.
3. 2013 Pakistan Earthquake
September 28 at 07:34 UTC
2010 Haiti Earthquake
January 12 at 21:53 UTC
Social Media Data and Opportunities
Social Media Platforms
Availability of Immense Data:
Around 16 thousand tweets per minute
were posted during Hurricane Sandy in the US.
Opportunities:
- Early warning and event detection
- Situational awareness
- Actionable information
- Rapid crisis response
- Post-disaster analysis
- Disease outbreaks
6. Social Media is Noisy
(Irrelevant & Duplicate Content)
Examples of irrelevant images showing cartoons, banners, advertisements, celebrities, etc.
Posted during the 2015 Nepal earthquake
Examples of near-duplicate images posted during the 2015 Nepal Earthquake
10. Relevancy Filtering
Examples of irrelevant images showing cartoons, banners, advertisements, celebrities, etc.
Performance of the relevancy filtering
Task: Build a binary classifier to identify irrelevant images
Approach: Transfer learning
(fine-tune a pre-trained convolutional neural network, e.g., VGG16)
11. Duplicate Filtering
Examples of near-duplicate images
Task: Compute similarity between a pair of images
Approach: Perceptual Hash + Hamming Distance (w/ threshold)
12. Before/After Image Filtering
Number of images that remain in our dataset after each image filtering operation
Chart: approximate fractions of images remaining after successive filtering steps: ~2%, ~2%, ~50%, ~58%, ~50%, ~30%.
Assuming tagging an image costs $1, we could have gotten the same job done
while paying $17k less, saving almost two-thirds of the budget!
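A quick sanity check of this back-of-the-envelope saving. The raw collection size below is not reported in the talk; it is inferred from the slide's own figures ($17k saved being about two-thirds of the budget at $1 per image):

```python
cost_per_image = 1.0    # assumed $1 per tagged image
raw_images = 25_500     # illustrative size implied by the slide's figures
kept_fraction = 1 / 3   # filtering keeps roughly a third of the images

kept = raw_images * kept_fraction
saving = cost_per_image * (raw_images - kept)
print(f"saved ${saving:,.0f} of ${raw_images * cost_per_image:,.0f}")
# saved $17,000 of $25,500
```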
13. Infrastructure Damage Assessment
• Three-class classification
– Categories: severe, mild & little-to-none
• Distinction between categories is ambiguous.
• Agreement among human annotators is low.
– in particular for the mild category
• Fine-tuning a pre-trained CNN (e.g., VGG16)
15. Thanks – Q & A
Follow this project: @aidr_qcri
We are looking for a PostDoc
(Computer vision, natural language processing, system development)
Contact us: mimran@hbku.edu.qa
Presenter Notes
Sudden-onset disasters and emergencies such as earthquakes and floods bring destruction and damage to our critical infrastructure such as roads, bridges, and critical buildings. Government organizations and other humanitarian organizations seek information after a disaster hits to gain situational awareness and other actionable information.
Social media played a major role during disasters such as 2005 Hurricane Katrina, the 2011 Japanese earthquake and tsunami, and more recently Typhoon Haiyan, followed by the Nepal tragedy. Consequently, more and more emergency managers are turning to social media as a vital tool in disaster management. Twitter, the most widely used tool for updates, response, and relief, enabled greater connectivity and information-sharing capabilities.
During situations like mass emergencies, disasters, and epidemics, social media platforms like Twitter provide unique opportunities for both affected people and emergency responders. Affected people share situational awareness messages and ask for help, donations, food, water, shelter, and so on. On the other hand, responders want to help.
A number of works have been proposed to use textual reports on social media for various humanitarian tasks. In this work, we use images that people post on social media during disasters.
These are a few example images collected during three disasters.
A number of tasks can be done with these images:
- Damage detection
- Damage severity measurement
- Injured people detection
- Conditions and capacity of the shelters
However, collected images are not always so nice and clean. Social media imagery data is noisy. There is a lot of irrelevant and duplicate content that needs to be filtered out.
The first component of the system is the relevancy filtering.
As you can imagine, we do not always get nice and clean images like the ones I have been showing you so far. Here are some example images, taken from our datasets, showing cartoons, banners, advertisements, celebrities, and so on. Obviously, these images are not relevant to the disaster event and should be excluded from further processing.
For this purpose, we used a subset of our dataset where we sampled irrelevant images from the none category and relevant images from the severe and mild categories to build a binary classifier. Keep in mind that our focus is on getting rid of the irrelevant content.
For training this binary classifier, we considered a transfer learning approach in which we took a state-of-the-art image classification model, which these days happens to be a convolutional neural network. In our particular case, we used VGG16, named after the Visual Geometry Group, Andrew Zisserman's research group at Oxford that proposed the architecture. This network is originally trained on about 1 million images to recognize 1000 object categories. We fine-tune this network for the binary classification scenario using our own dataset.
The performance of the resulting relevancy filtering model is shown in the table, which seems pretty good.
The second important component of the system is duplicate or near-duplicate filtering…
For example, there are cases when people simply re-tweet an existing tweet containing an image, or they post images with little modification (e.g. cropping/resizing, background padding, changing intensity, embedding text, etc.).
Such posting behavior produces a high number of near-duplicate images in an online data collection, which should again be eliminated from the processing pipeline.
To perform this task, we need to compute a similarity score or a distance measure between a given pair of images. Based on a certain threshold, then we decide whether the given pair of images are duplicate or not.
For this purpose, we used the popular perceptual hashing approach, which represents the fingerprint of an image derived from features of its content. Perceptual hash functions preserve perceptual similarity: small modifications to an image produce only small changes in its hash, so near-duplicates can be detected reliably by comparing hashes.
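The idea can be sketched with a difference hash (dHash), one common perceptual-hash variant, in pure NumPy. This is an illustrative stand-in for whichever perceptual hash the system actually uses; the synthetic images and threshold below are for demonstration only:

```python
import numpy as np

def dhash(image, hash_size=8):
    """Difference hash: compare adjacent pixels of a block-averaged
    grayscale image; robust to resizing and uniform brightness shifts."""
    h, w = image.shape
    rows = np.array_split(np.arange(h), hash_size)
    cols = np.array_split(np.arange(w), hash_size + 1)
    # Downscale to hash_size x (hash_size + 1) by block averaging.
    small = np.array([[image[np.ix_(r, c)].mean() for c in cols] for r in rows])
    # Each bit encodes whether a pixel is brighter than its right neighbor.
    return (small[:, 1:] > small[:, :-1]).flatten()

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))

# Synthetic grayscale image, a slightly brightened near-duplicate,
# and an unrelated image.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
near_dup = np.clip(img + 0.05, 0, 1)
other = rng.random((64, 64))

THRESHOLD = 10  # Hamming-distance threshold, as on the slide
print(hamming(dhash(img), dhash(near_dup)))  # small distance: duplicate
print(hamming(dhash(img), dhash(other)))     # large distance: distinct
```

A pair is declared duplicate when the Hamming distance between the two hashes falls below the threshold.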
To determine a threshold, we again created a collection of image pairs for which we know whether they are duplicates or not. Then, for different threshold values, we computed the overall accuracy achieved by that threshold value. As you can see from the plot, a threshold of 10 seems to work well for us.
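The threshold selection described above amounts to a simple accuracy sweep over labeled pairs. The distances and labels below are made up for illustration; the real sweep runs over the annotated pair collection:

```python
# Hypothetical labeled pairs: (Hamming distance, is_duplicate).
pairs = [(0, True), (2, True), (5, True), (8, True), (9, True),
         (12, False), (14, False), (18, False), (25, False), (31, False)]

def accuracy(threshold):
    """Fraction of pairs classified correctly when distances at or
    below the threshold are predicted as duplicates."""
    correct = sum((dist <= threshold) == is_dup for dist, is_dup in pairs)
    return correct / len(pairs)

# Sweep candidate thresholds (0..32 bits for a 64-bit hash is plenty)
# and keep the most accurate one.
best = max(range(33), key=accuracy)
print(best, accuracy(best))
```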
Consequently, our image filtering pipeline reduces the size of the raw image data collection by almost a factor of 3 while retaining the most relevant and informative image content for further analyses.
So, now that we have a nicer and cleaner set of images, it is time to do more interesting things such as assessing the level of infrastructure damage, which is crucial for humanitarian organizations; they crave this kind of timely information.
The low prevalence of the mild category compared to the other categories also has some effect on classifier performance.
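One standard remedy for such class imbalance (a common technique, not something the talk describes using) is to weight the training loss inversely to class frequency, so rare classes like mild contribute more per sample. A minimal NumPy sketch with made-up counts and predictions:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy averaged with per-sample weights taken from
    each sample's class."""
    w = class_weights[labels]                              # per-sample weights
    nll = -np.log(probs[np.arange(len(labels)), labels])   # negative log-lik.
    return float(np.sum(w * nll) / np.sum(w))

# Hypothetical class counts: severe, mild (rare), little-to-none.
counts = np.array([500, 50, 450])
class_weights = counts.sum() / (len(counts) * counts)  # inverse frequency

# Dummy predicted probabilities for 3 samples, one per class.
probs = np.array([[0.7, 0.1, 0.2],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 1, 2])
print(weighted_cross_entropy(probs, labels, class_weights))
```

With inverse-frequency weighting, the rare mild class gets the largest weight, so mistakes on it dominate the loss.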
Remember, damage assessment during disaster times is a core situational awareness task for many humanitarian organizations that traditionally takes weeks or months. But here we show we can do this within hours or days after the disaster strikes.