Image4Act: Online Social Media Image
Processing for Disaster Response
Firoj Alam, Muhammad Imran, Ferda Ofli
Qatar Computing Research Institute
Hamad Bin Khalifa University, Qatar
Time-Critical Events and Information Gaps
Disaster event (earthquake, flood) → destruction, damage
Humanitarian organizations and local administration need information to help and launch a response (relief operations).
Information gathering, especially in real time, is the most challenging part.
2013 Pakistan Earthquake
September 28 at 07:34 UTC
2010 Haiti Earthquake
January 12 at 21:53 UTC
Social Media Data and Opportunities
Social Media
Platforms
Availability of immense data:
Around 16 thousand tweets per minute were posted during
Hurricane Sandy in the US.
Opportunities:
- Early warning and event detection
- Situational awareness
- Actionable information
- Rapid crisis response
- Post-disaster analysis
Disease outbreaks
Social Media Images During Disasters
Damage Severity Assessment from Images
Social Media is Noisy
(Irrelevant & Duplicate Content)
Examples of irrelevant images showing cartoons, banners, advertisements, celebrities, etc.
Posted during the 2015 Nepal earthquake
Examples of near-duplicate images posted during the 2015 Nepal Earthquake
Automatic Image Processing Pipeline
Detailed Architecture
[Architecture diagram. Labeled components and data flows:]
- Tweet Collector: tweets → image URLs
- Image Collector: "Is URL duplicate?" check against an in-memory DB, then image downloading
- Image Filtering: relevancy filtering model ("Is relevant?") and de-duplication model ("Is duplicate?", backed by an image hash DB)
- Image Classifier(s): damage images, injured people, rescue efforts
- Crowd Task Manager (web): crowd tasks & answers, crowd image labels
- Persister: classified images on the filesystem, classified image paths in a Postgres DB
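The flow suggested by the diagram labels can be sketched roughly as below. This is an illustrative reading of the slide, not the system's actual code: the class name `ImagePipeline`, its fields, and the injected callables are all hypothetical, the in-memory set and list stand in for the in-memory DB and image hash DB, and the order of the checks (URL duplicate, relevancy, near-duplicate, classification) is an assumption taken from the order of the labels.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class ImagePipeline:
    """Hypothetical sketch: URL check -> download -> filtering -> classification."""

    download: Callable[[str], bytes]      # stands in for image downloading
    is_relevant: Callable[[bytes], bool]  # stands in for the relevancy model
    image_hash: Callable[[bytes], int]    # stands in for a perceptual hash
    classify: Callable[[bytes], str]      # stands in for the image classifier(s)
    hash_threshold: int = 10              # assumed Hamming-distance cut-off
    seen_urls: Set[str] = field(default_factory=set)
    seen_hashes: List[int] = field(default_factory=list)

    def process(self, url: str) -> Optional[str]:
        """Return a class label, or None if the image is filtered out."""
        # "Is URL duplicate?" check before any download happens.
        if url in self.seen_urls:
            return None
        self.seen_urls.add(url)

        image = self.download(url)

        # "Is relevant?" check.
        if not self.is_relevant(image):
            return None

        # "Is duplicate?" check: compare against every stored hash.
        new_hash = self.image_hash(image)
        for old_hash in self.seen_hashes:
            if bin(new_hash ^ old_hash).count("1") <= self.hash_threshold:
                return None
        self.seen_hashes.append(new_hash)

        return self.classify(image)
```

Passing the models in as callables keeps the sketch independent of any particular model or storage backend. A real deployment would also persist results and route samples to the crowd task manager, which is left out here.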
Labeled Datasets
NE: Nepal earthquake; EE: Ecuador earthquake; TR: Typhoon Ruby; HM: Hurricane Matthew
Relevancy Filtering
Examples of irrelevant images showing cartoons, banners, advertisements, celebrities, etc.
Performance of the relevancy filtering
Task: Build a binary classifier to identify irrelevant images
Approach: Transfer learning
(fine-tune a pre-trained convolutional neural network, e.g., VGG16)
Duplicate Filtering
Examples of near-duplicate images
Task: Compute similarity between a pair of images
Approach: Perceptual Hash + Hamming Distance (w/ threshold)
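A minimal sketch of the approach named on this slide. The slides do not say which perceptual hash is used, so a simple average hash is swapped in here as a stand-in. Images are plain nested lists of grayscale values so that no imaging library is needed. The names `average_hash`, `hamming_distance`, and `is_near_duplicate` are illustrative, and treating a distance of at most 10 as "duplicate" is an assumption based on the threshold mentioned in the speaker notes.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values in 0..255


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a (size * size)-bit average hash of a grayscale image."""
    height, width = len(pixels), len(pixels[0])
    # Downsample to a size x size grid by nearest-neighbour sampling.
    small = [
        pixels[(r * height) // size][(c * width) // size]
        for r in range(size)
        for c in range(size)
    ]
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        # One bit per cell: 1 if the cell is brighter than the mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 10) -> bool:
    """Treat two images as near-duplicates if their hashes are close."""
    return hamming_distance(average_hash(a), average_hash(b)) <= threshold
```

Because each bit only records whether a cell is above the image's own mean, a uniform brightness shift leaves the hash unchanged, which is the kind of small modification the filter is meant to tolerate.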
Before/After Image Filtering
Number of images that remain in our dataset after each image filtering operation
[Figure labels: ~2 %, ~2 %, ~50 %, ~58 %, ~50 %, ~30 %]
Assuming that tagging an image costs $1, we could have gotten the same job done
for $17k less, saving almost two-thirds of the budget.
Infrastructure Damage Assessment
• Three-class classification
– Categories: severe, mild & little-to-none
• Distinction between categories is ambiguous.
• Agreement among human annotators is low.
– in particular for mild category
• Fine-tuning a pre-trained CNN (e.g., VGG16)
Deployment and Evaluation during
Cyclone Debbie Event
Randomly selected 500 images
Manually labeled irrelevant images
Relevancy Filtering
- Precision: 0.67
Duplicate Images
- Precision: 0.92
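For reference, precision is the fraction of items flagged by a filter that a human annotator agrees should have been flagged. The sketch below shows the calculation on boolean labels. The function name `precision` and its argument names are illustrative, and the slide does not describe how the evaluation was actually scripted.

```python
from typing import Sequence


def precision(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Precision = true positives / (true positives + false positives)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_pos = sum(p and a for p, a in zip(predicted, actual))
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return true_pos / (true_pos + false_pos)
```

Here `predicted[i]` would be True when the filter flagged image `i` (as irrelevant or as a duplicate), and `actual[i]` would be True when the manual label agrees.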
Thanks – Q & A
Follow this project: @aidr_qcri
We are looking for a PostDoc
(Computer vision, natural language processing, system development)
Contact us: mimran@hbku.edu.qa

Editor's Notes

  1. Sudden-onset disasters and emergencies such as earthquakes and floods bring destruction and damage to critical infrastructure such as roads, bridges, and buildings. Government and other humanitarian organizations seek information after a disaster hits to gain situational awareness and other actionable information.
  2. Social media played a major role during disasters such as Hurricane Katrina in 2005, the 2011 Japanese earthquake and tsunami, and more recently Typhoon Haiyan, followed by the Nepal tragedy. Consequently, more and more emergency managers are turning to social media as a vital tool in disaster management. Twitter, the most used tool for updates, response, and relief, has enabled greater connectivity and information sharing. During mass emergencies, disasters, and epidemics, social media platforms like Twitter provide unique opportunities for both affected people and emergency responders. People share situational awareness messages and ask for help, donations, food, water, shelter, and so on, while responders want to help.
  3. A number of works have proposed using textual reports on social media for various humanitarian tasks. In this work, we use images that people post on social media during disasters. These are a few example images collected during three disasters. A number of tasks can be done with these images: damage detection, damage severity measurement, injured people detection, and assessing the conditions and capacity of shelters.
  4. However, collected images are not always so nice and clean. Social media imagery data is noisy. There is a lot of irrelevant and duplicate content that needs to be filtered out.
  5. The first component of the system is relevancy filtering. As you can imagine, we do not always get images as nice and clean as the ones I have shown so far. Here are some example images, taken from our datasets, showing cartoons, banners, advertisements, celebrities, and so on. Obviously, these images are not relevant to the disaster event and should be excluded from further processing. For this purpose, we used a subset of our dataset, sampling irrelevant images from the none category and relevant images from the severe and mild categories, to build a binary classifier. Keep in mind that our focus is on getting rid of the irrelevant content. To train this binary classifier, we took a transfer learning approach: we started from a state-of-the-art image classification model, which these days means a convolutional neural network. In our case we used VGG16, named after the Visual Geometry Group, Andrew Zisserman's research group at Oxford, which proposed the architecture. This network was originally trained on about 1 million images to recognize 1,000 object categories. We fine-tuned it for the binary classification scenario using our own dataset. The performance of the resulting relevancy filtering model is shown in the table, and it seems pretty good.
  6. The second important component of the system is duplicate and near-duplicate filtering. For example, people often simply retweet an existing tweet containing an image, or they post images with small modifications (e.g., cropping or resizing, background padding, changing intensity, embedding text). This posting behavior produces a high number of near-duplicate images in an online data collection, which should also be eliminated from the processing pipeline. To perform this task, we need to compute a similarity score or a distance measure between a given pair of images. We then decide, based on a threshold, whether the two images are duplicates. For this purpose, we used the popular perceptual hashing technique, which computes a fingerprint of an image from features of its content. Perceptual hash functions preserve perceptual equality, so two similar images receive similar hashes even when their binary representations differ slightly. To determine the threshold, we again created a collection of image pairs for which we know whether they are duplicates. Then, for different threshold values, we computed the overall accuracy achieved. As you can see from the plot, a threshold of 10 seems to work fine for us.
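The threshold selection described in this note can be sketched as follows, assuming each labeled pair has already been reduced to a Hamming distance between the two image hashes. The names `accuracy_at` and `best_threshold` are illustrative, and breaking ties in favor of the smallest threshold is an assumption made here, not something stated in the talk.

```python
from typing import Iterable, List, Tuple

# (Hamming distance between the two hashes, manual "is duplicate" label)
LabeledPair = Tuple[int, bool]


def accuracy_at(pairs: List[LabeledPair], threshold: int) -> float:
    """Fraction of pairs classified correctly when distance <= threshold means duplicate."""
    correct = sum((dist <= threshold) == label for dist, label in pairs)
    return correct / len(pairs)


def best_threshold(
    pairs: List[LabeledPair], candidates: Iterable[int]
) -> Tuple[int, float]:
    """Return (threshold, accuracy) for the most accurate candidate threshold."""
    scored = [(accuracy_at(pairs, t), t) for t in candidates]
    best_accuracy = max(acc for acc, _ in scored)
    # Assumption: among equally accurate thresholds, prefer the smallest one.
    threshold = min(t for acc, t in scored if acc == best_accuracy)
    return threshold, best_accuracy
```

Plotting `accuracy_at` over a range of candidate values would give a curve like the one referred to in the note.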
  7. Consequently, our image filtering pipeline reduces the size of the raw image data collection by almost a factor of 3 while retaining the most relevant and informative image content for further analyses.
  8. So, now that we have a nicer and cleaner set of images, it is time to do more interesting things such as assessing the level of infrastructure damage, which is crucial for humanitarian organizations; they crave this kind of timely information. The low prevalence of the mild category compared to the other categories also has some effect. Remember, damage assessment is a core situational awareness task for many humanitarian organizations, and it traditionally takes weeks or months. Here we show that it can be done within hours or days after the disaster strikes.