Empowering First Responders through Automated
Multimodal Content Moderation
Divam Gupta, Indira Sen, Niharika Sachdeva, Ponnurangam
Kumaraguru, Arun Balaji Buduru
Why should we care about Sensitive content?
- Event- or crisis-related sensitive content can cause offline ramifications
- It can have large-scale social and economic impact
Who does it affect?
- Community moderators are strongly affected by exposure to such content
Why multimodal?
● Most tweets contain multimedia content such as images, videos, etc.
● Current text-based models fail when the main content is in the tweet's media rather than its text
● With a multimodal approach we can jointly model the different content sources of the tweet
Roadmap
- Why should we care about sensitive content?
- Previous Work
- What is sensitive content?
- Data Collection
- Methodology
- Results
- Takeaways
Previous Work and Research Gaps
Content Moderation
- Detecting personal attacks using logistic regression and large-scale annotations, by Wulczyn et al. [1] (forms our baseline)
- Detecting hate speech in Yahoo comments using advanced NLP techniques, by Nobata et al. [2]
Previous Work and Research Gaps
Multimodal detection
- Multimodal detection of pro-anorexia content using CNNs [3]
Previous Work and Research Gaps
[Figure labels: Content Moderation, Multimodal detection, Our work]
What is sensitive content?
Sensitivity Rulebook
- Hate Speech: Content that shows citizens disrespect "on grounds of religion, race, place of birth, residence, language, caste or community or any other ground whatsoever".
- Violent/Gory: Violent or gory content that is primarily intended to be shocking, sensational, or disrespectful.
- Political Criticism: Content that brings or attempts to bring into hatred or contempt, or excites or attempts to excite disaffection towards the Government.
Some examples:
- Situational Information: Event-based content that is informative: it curates or produces content, contributes to situational awareness, or gives contextual information to better understand the situation.
- Mobilisation: Content that seeks to organize a movement or protest, or content that reports such an event.
Text Sensitivity Dataset
● Level 1 Dataset:
○ Tweets collected from sensitive and non-sensitive hashtags.

Sensitive hashtag       No. of tweets
AsaramBapuji            190696
Freekashmir             74237
3rdhinduadhiveshan      38823
Owaisi                  33098
lovejihad               24297

Non-sensitive hashtag   No. of tweets
Nifty                   202894
IndvsSA                 136096
MondayMotivation        110178
IPLfinal                103083
MWC16                   92309
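The Level 1 labels are weak labels: a tweet inherits its label from the hashtag it was collected under. A minimal sketch of that idea, assuming tweets are plain strings and that the hashtags listed in the tables above are the complete lists; the function name `weak_label` and the tie-breaking rule (any sensitive hashtag wins) are illustrative assumptions, not the authors' code:

```python
import re

# Hashtags from the Level 1 tables, lowercased for case-insensitive matching.
SENSITIVE_TAGS = {"asarambapuji", "freekashmir", "3rdhinduadhiveshan", "owaisi", "lovejihad"}
NON_SENSITIVE_TAGS = {"nifty", "indvssa", "mondaymotivation", "iplfinal", "mwc16"}


def weak_label(tweet_text):
    """Return 1 (sensitive), 0 (non-sensitive), or None if no listed hashtag occurs.

    Assumption: a tweet carrying any sensitive hashtag is weakly labeled sensitive,
    even if it also carries a non-sensitive one.
    """
    tags = {tag.lower() for tag in re.findall(r"#(\w+)", tweet_text)}
    if tags & SENSITIVE_TAGS:
        return 1
    if tags & NON_SENSITIVE_TAGS:
        return 0
    return None
```

Labels produced this way are noisy by construction, which is why a smaller manually annotated set is still needed.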
Text Sensitivity Dataset
● Level 2 Dataset:
○ Tweets from sensitive hashtags, annotated manually using the codebook (a tweet falling under one or more sensitive categories is marked as sensitive).

Hashtag             # Sensitive tweets   # Non-sensitive tweets
CauveryProtest      2129                 796
JaichandKejriwal    768                  270
DhakaEid            1280                 64
TamilNaduBandh      334                  85
Kashmir             358                  110
Jallikattu          1329                 363
Image Sensitivity Dataset
- 4,500 sensitive and nonsensitive images.
Multimodal Sensitivity detection
Detecting Sensitivity in Text
● We use Recurrent Neural Networks to classify the text
as sensitive or non-sensitive
● We learn randomly initialized word embeddings along with
the RNN classifier.
● The hidden state of the last time-step is passed to a fully
connected layer with softmax to predict the probability of
sensitivity
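The three bullets above describe a standard recurrent classifier. A minimal forward-pass sketch in plain Python, assuming a simple (Elman) RNN cell in place of whichever gated cell the authors used, with random untrained weights; the class name `TinyRNNClassifier` and all dimensions are hypothetical:

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random weights; stands in for a trained parameter matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyRNNClassifier:
    """Plain RNN with a softmax output layer (untrained, for illustration only)."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.embeddings = rand_matrix(vocab_size, embed_dim, rng)  # randomly initialised
        self.w_in = rand_matrix(hidden_dim, embed_dim, rng)
        self.w_rec = rand_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = rand_matrix(2, hidden_dim, rng)  # two classes
        self.hidden_dim = hidden_dim

    def predict_proba(self, token_ids):
        h = [0.0] * self.hidden_dim
        for tok in token_ids:
            x = self.embeddings[tok]
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
        # Only the hidden state of the last time step reaches the output layer.
        return softmax(matvec(self.w_out, h))


model = TinyRNNClassifier(vocab_size=50)
probs = model.predict_proba([3, 17, 42, 8])  # [P(non-sensitive), P(sensitive)]
```

In training, the embedding table would be updated together with the recurrent and output weights, which is what learning the embeddings "along with" the classifier refers to.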
Detecting Sensitivity in Images
● We use a two-stream Convolutional Neural Network to
classify sensitive images
● The object recognition model is pre-trained on the
ImageNet dataset
● The scene recognition model is pre-trained on the MIT Places
dataset
Multimodal Sensitivity detection
● We combine the text model and the image model,
which enables the combined model to learn the features jointly
● We concatenate the intermediate outputs of the image
model and the text model.
● Finally, we use a fully connected layer with softmax to
predict the probability of sensitivity
● Combining the two models improves the results
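The fusion step above can be sketched as concatenation followed by one dense layer. This is a minimal illustration, assuming each model exposes a fixed-length feature vector; the function name `fuse_and_classify` and every number below are made up for the example:

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_features, image_features, weights, bias):
    """Concatenate both feature vectors, apply one fully connected layer, then softmax."""
    joint = list(text_features) + list(image_features)
    logits = [sum(w * x for w, x in zip(row, joint)) + b for row, b in zip(weights, bias)]
    return softmax(logits)


text_features = [0.2, -0.5, 0.1]   # hypothetical intermediate output of the text model
image_features = [0.7, 0.3]        # hypothetical intermediate output of the image model
weights = [[0.1, 0.0, -0.2, 0.3, 0.1],     # one row per class, one column per joint feature
           [-0.1, 0.2, 0.4, 0.5, -0.3]]
bias = [0.0, 0.1]

probs = fuse_and_classify(text_features, image_features, weights, bias)
```

Because the dense layer sees both modalities at once, gradients flow back into both sub-models during training, which is what allows the features to be learned jointly.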
Multilevel Sensitivity Classification
● Due to the skewness of the data, we get a lot of positives.
● To address this, we first train a model that filters out tweets which
are definitely not sensitive.
● We train this level 1 model on the large, weakly annotated dataset
● After filtering, we train a level 2 classifier,
which gives the final sensitivity score
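The two-level scheme above amounts to a cascade at inference time. A minimal sketch, assuming both levels return a probability of sensitivity; the function name `cascade_score`, the threshold value, and the toy scorers are hypothetical and not taken from the slides:

```python
def cascade_score(tweet, level1_prob, level2_prob, filter_threshold=0.1):
    """Two-level scoring: level 1 discards clearly non-sensitive tweets, level 2 scores the rest.

    level1_prob and level2_prob are callables returning P(sensitive) in [0, 1].
    filter_threshold is an assumed cut-off for "definitely not sensitive".
    """
    if level1_prob(tweet) < filter_threshold:
        return 0.0  # filtered out by the level 1 model
    return level2_prob(tweet)


# Stand-in scorers for demonstration only; real ones would be trained models.
def toy_level1(tweet):
    return 0.9 if "protest" in tweet.lower() else 0.02


def toy_level2(tweet):
    return 0.8 if "violence" in tweet.lower() else 0.3


tweets = ["Match highlights tonight", "Protest turns to violence", "Peaceful protest march"]
scores = [cascade_score(t, toy_level1, toy_level2) for t in tweets]
```

Keeping the filter threshold low favors recall at level 1, so the cheaper weakly supervised model only removes tweets it is very sure about.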
Quantitative Results
Method                   F1 Score   Accuracy
VGG16 Finetuning         0.5350     0.5500
VGG16 Features + SVM     0.8065     0.8069
Object Model             0.8343     0.8438
Object + Scene Model     0.8547     0.8550
● Results on the Image Only Dataset
Quantitative Results
Method                                       F1 Score   Accuracy
SVM Baseline                                 0.682      0.701
2 layer word LSTM (level 1 text model)       0.7372     0.7385
Character Level GRU (level 2 text model)     0.7180     0.7619
Word Level GRU (level 2 text model)          0.7760     0.7816
Image + Text Model                           0.8013     0.8051
● Results on the Tweets Dataset
Hyperparameters of the Best Performing Model
(Text + Image)
We obtained the optimal hyperparameters via grid search with
cross-validation

Hyperparameter                      Value
Number of tokens                    30
Dimension of the word embeddings    150
Number of GRU units                 512
Image size                          224 x 224
Learning rate                       0.01
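Grid search with cross-validation tries every combination in a predefined grid and keeps the one with the best mean validation score. A minimal sketch of that loop follows; the search space, the fold count, and the placeholder `dummy_evaluate` are assumptions for illustration (the slides give only the winning values, not the grid or the number of folds), and no model is trained here:

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, val))
        start += size
    return folds


def grid_search(param_grid, evaluate, n_samples, k=5):
    """Return the parameter combination with the best mean validation score.

    evaluate(params, train_idx, val_idx) -> float is supplied by the caller;
    it would train on train_idx and score on val_idx.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [evaluate(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical search space; only 150, 512 and 0.01 appear on the slide.
grid = {"embedding_dim": [50, 150], "gru_units": [256, 512], "learning_rate": [0.1, 0.01]}


def dummy_evaluate(params, train_idx, val_idx):
    # Placeholder score that peaks at the slide's values so the demo has a clear winner.
    return -(abs(params["embedding_dim"] - 150) / 150
             + abs(params["gru_units"] - 512) / 512
             + abs(params["learning_rate"] - 0.01))


best, score = grid_search(grid, dummy_evaluate, n_samples=100, k=5)
```

In practice `evaluate` would wrap model training, so the cost grows with the product of the grid sizes times the number of folds.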
Qualitative Results: Visualizing the text model
● We use gradient-based class activation mapping to find
the words contributing to the sensitivity score
● Words such as "boycott" and "fighters" contribute to
the sensitivity score
"Two suspected Bangladeshi terrorists arrested with fake aadhaar card along with an arms dealer in Kolkata"
"Entire nation should boycott this movie. We r never allow to someone destroy our history. We will fight & we will win."
"Indian commando, three fighters killed in Kashmir"
Visualizing the image model
● We use class activation mapping to visualize the areas of
the image contributing to the sensitivity
Qualitative analysis: Human Moderator Study
● We label 100 random nonsensitive tweets and 100
sensitive tweets with our classifier.
● Two annotators look at the scores given by our system and
find 75% to be correctly labeled
● There is only one false negative, implying that our system
has a very low miss rate
            Labeled Positive   Labeled Negative
Positive    99                 1
Negative    33                 67
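The rates behind these statements can be derived directly from the four counts. The sketch below reads the rows as the true class and the columns as the classifier's label; that reading is an assumption, though it is consistent with the single false negative reported above:

```python
# Counts from the table above (rows: true class, columns: classifier label; assumed reading).
tp, fn = 99, 1    # true positives, false negatives
fp, tn = 33, 67   # false positives, true negatives

precision = tp / (tp + fp)                   # share of tweets labeled positive that are truly positive
recall = tp / (tp + fn)                      # share of truly positive tweets that were caught
miss_rate = fn / (tp + fn)                   # share of truly positive tweets that were missed
accuracy = (tp + tn) / (tp + fn + fp + tn)   # share of all tweets labeled correctly

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"miss_rate={miss_rate:.2f} accuracy={accuracy:.2f}")
# prints: precision=0.75 recall=0.99 miss_rate=0.01 accuracy=0.83
```

Under this reading, precision is 99/132 = 0.75, which matches the 75% figure quoted above, and the miss rate is 1%.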
Conclusion
● A large weakly annotated corpus and a smaller dataset annotated by
first responders
● A multimodal classifier for detecting sensitive content on
social media
● Our model improves performance over the other models we
evaluated
● We also inspect the model to see what it is learning
● Future work: extend to videos and GIFs, and include other kinds
of sensitive content
References
1. Wulczyn, Ellery, Nithum Thain, and Lucas Dixon. "Ex machina: Personal attacks seen at scale." Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
2. Nobata, Chikashi, et al. "Abusive language detection in online user content." Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.
3. Chancellor, Stevie, et al. "Multimodal classification of moderated online pro-eating disorder content." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 2017.
Thanks!
arunb@iiitd.ac.in

 
FOREST FIRE USING IoT-A Visual to UG students
FOREST FIRE USING IoT-A Visual to UG studentsFOREST FIRE USING IoT-A Visual to UG students
FOREST FIRE USING IoT-A Visual to UG students
 
Governors ppt.pdf .
Governors ppt.pdf                              .Governors ppt.pdf                              .
Governors ppt.pdf .
 
A brief about Jeypore Sub-station Presentation
A brief about Jeypore Sub-station PresentationA brief about Jeypore Sub-station Presentation
A brief about Jeypore Sub-station Presentation
 

Empowering First Responders through Automated Multimodal Content Moderation

  • 1. Empowering First Responders through Automated Multimodal Content Moderation
       Divam Gupta, Indira Sen, Niharika Sachdeva, Ponnurangam Kumaraguru, Arun Balaji Buduru
  • 2. Why should we care about Sensitive content?
  • 3. Why should we care about Sensitive content?
  • 4. Why should we care about Sensitive content?
       - Event- or crisis-related sensitive content can have offline ramifications
       - It can have large-scale social and economic impact
  • 5. Who does it affect?
       - Community moderators are strongly affected by exposure to such content
  • 6. Why multimodal?
       ● Most tweets contain multimedia content such as images and videos
       ● Current text-based models fail when the main content of the tweet is in its media rather than its text
       ● With a multimodal approach we can jointly model the different content sources of a tweet
  • 7. Roadmap
       - Why should we care about sensitive content?
       - Previous Work
       - What is sensitive content?
       - Data Collection
       - Methodology
       - Results
       - Takeaways
  • 8. Previous Work and Research Gaps: Content Moderation
       - Detecting personal attacks using Logistic Regression and large-scale annotations, by Wulczyn et al. [1] (forms our baseline)
       - Detecting hate speech in Yahoo comments using advanced NLP techniques, by Nobata et al. [2]
  • 9. Previous Work and Research Gaps: Multimodal detection
       - Multimodal detection of pro-anorexia content using CNNs, by Chancellor et al. [3]
  • 10. Previous Work and Research Gaps (diagram labels: Content Moderation; Multimodal detection; Our work)
  • 11. What is sensitive content?
  • 12. Sensitivity Rulebook
       Hate Speech: shows the citizen disrespect "on grounds of religion, race, place of birth, residence, language, caste or community or any other ground whatsoever".
       Violent/Gory: violent or gory content that's primarily intended to be shocking, sensational, or disrespectful.
       Political Criticism: content that brings or attempts to bring into hatred or contempt, or excites or attempts to excite disaffection towards the Government.
       Some examples:
       Situational Information: event-based content that is informative; curating or producing content; contributes to situational awareness; situational information; contextual information to better understand the situation.
       Mobilisation: content that seeks to organize a movement or protest, or content that reports such an event.
  • 13. Text Sensitivity Dataset
       ● Level 1 Dataset:
         ○ Tweets collected from sensitive and non-sensitive hashtags.

       Sensitive hashtag        No. of tweets
       AsaramBapuji             190696
       Freekashmir              74237
       3rdhinduadhiveshan       38823
       Owaisi                   33098
       lovejihad                24297

       Non-sensitive hashtag    No. of tweets
       Nifty                    202894
       IndvsSA                  136096
       MondayMotivation         110178
       IPLfinal                 103083
       MWC16                    92309
  • 14. Text Sensitivity Dataset
       ● Level 2 Dataset:
         ○ Tweets from sensitive hashtags, annotated manually using the codebook (a tweet marked with one or more sensitive categories is labeled sensitive).

       Hashtag              # Sensitive tweets    # Non-sensitive tweets
       CauveryProtest       2129                  796
       JaichandKejriwal     768                   270
       DhakaEid             1280                  64
       TamilNaduBandh       334                   85
       Kashmir              358                   110
       Jallikattu           1329                  363
  • 15. Image Sensitivity Dataset - 4,500 sensitive and nonsensitive images.
  • 16. Roadmap
       - Why should we care about sensitive content?
       - What is sensitive content?
       - Data Collection
       - Methodology
       - Results
       - Takeaways
  • 18. Detecting Sensitivity in Text
       ● We use Recurrent Neural Networks to classify text as sensitive or non-sensitive
       ● We learn randomly initialized word embeddings jointly with the RNN classifier
       ● The hidden state of the last time-step is passed to a fully connected layer with softmax to predict the probability of sensitivity
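The text pipeline on slide 18 (embeddings, a recurrent layer, then the last hidden state fed to a fully connected softmax layer) can be sketched as a forward pass. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain tanh (Elman) recurrent cell rather than the GRU/LSTM cells named in the results, the weights are untrained, and `TinyRNNClassifier`, the toy vocabulary, and all dimensions are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyRNNClassifier:
    """Untrained forward pass: embeddings -> tanh RNN -> last hidden state -> softmax."""

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.index = {word: i for i, word in enumerate(vocab)}
        # Randomly initialised embeddings; during training they would be
        # updated together with the recurrent and output weights.
        self.embeddings = random_matrix(len(vocab), embed_dim, rng)
        self.w_in = random_matrix(hidden_dim, embed_dim, rng)
        self.w_rec = random_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = random_matrix(2, hidden_dim, rng)  # two output classes
        self.hidden_dim = hidden_dim

    def last_hidden(self, tokens):
        """Run the recurrence over the tokens and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for tok in tokens:
            if tok not in self.index:
                continue  # this sketch simply skips out-of-vocabulary tokens
            x = self.embeddings[self.index[tok]]
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
        return h

    def predict_proba(self, tokens):
        """Return [P(non-sensitive), P(sensitive)] for a tokenised tweet."""
        return softmax(matvec(self.w_out, self.last_hidden(tokens)))


model = TinyRNNClassifier(vocab=["boycott", "this", "movie", "match", "today"])
probs = model.predict_proba("boycott this movie".split())
print(probs)  # probabilities from untrained weights, so not meaningful yet
```

Only the last hidden state reaches the output layer, so the recurrence has to summarise the whole tweet in that one vector.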
  • 19. Detecting Sensitivity in Images
       ● We use a two-stream Convolutional Neural Network to classify sensitive images
       ● The object recognition model is pre-trained on the ImageNet dataset
       ● The scene recognition model is pre-trained on the MIT Places dataset
  • 20. Multimodal Sensitivity detection
       ● We combine the text model and the image model, which lets the combined model learn the features jointly
       ● We concatenate the intermediate outputs of the image model and the text model
       ● Finally, a fully connected layer with softmax predicts the probability of sensitivity
       ● We show that combining the two models improves the results
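The fusion step on slide 20 (concatenate the intermediate text and image outputs, then apply one fully connected softmax layer) might look like the sketch below. The feature vectors, weights, and the function name `fuse_and_classify` are hypothetical placeholders; in the real model the features would come from the trained text and image networks.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_features, image_features, weights, bias):
    """Concatenate both feature vectors and apply one dense softmax layer."""
    joint = list(text_features) + list(image_features)  # late fusion by concatenation
    if any(len(row) != len(joint) for row in weights):
        raise ValueError("each weight row must match the fused feature length")
    logits = [sum(w * x for w, x in zip(row, joint)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


rng = random.Random(0)
text_features = [rng.uniform(-1, 1) for _ in range(4)]    # stand-in for the RNN output
image_features = [rng.uniform(-1, 1) for _ in range(6)]   # stand-in for the CNN output
weights = [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in range(2)]
bias = [0.0, 0.0]

fused_probs = fuse_and_classify(text_features, image_features, weights, bias)
print(fused_probs)  # [P(non-sensitive), P(sensitive)] from random weights
```

Because the dense layer sees both feature sets at once, its weights can pick up interactions between text and image evidence when the whole network is trained end to end.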
  • 22. Multilevel Sensitivity Classification
       ● Due to the skewness of the data, we get a lot of positives
       ● To address this, we train a level 1 model that filters out tweets which are definitely not sensitive
       ● The level 1 model is trained on a large, weakly annotated dataset
       ● After filtering, we train a level 2 classifier, which gives the final sensitivity score
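At inference time, the two-level scheme on slide 22 amounts to a cascade: a cheap first model discards clearly non-sensitive tweets and only the remainder reaches the second model. The sketch below illustrates that control flow only. The keyword scorers `toy_level1` and `toy_level2`, the `filter_threshold` value, and the choice to return 0.0 for filtered tweets are all assumptions, since the slides do not specify them.

```python
SENSITIVE_HINTS = {"protest", "bandh", "boycott"}  # hypothetical keyword list


def toy_level1(tweet):
    """Stand-in for the weakly supervised level 1 model: 1.0 if any hint word appears."""
    return 1.0 if SENSITIVE_HINTS & set(tweet.lower().split()) else 0.0


def toy_level2(tweet):
    """Stand-in for the level 2 classifier: more hint words give a higher score."""
    hits = sum(word in SENSITIVE_HINTS for word in tweet.lower().split())
    return min(1.0, hits / 2)


def cascade_score(tweet, level1=toy_level1, level2=toy_level2, filter_threshold=0.2):
    """Return the final sensitivity score for one tweet.

    Tweets that level 1 scores below the threshold are treated as definitely
    non-sensitive and never reach level 2 (assumed here to score 0.0).
    """
    if level1(tweet) < filter_threshold:
        return 0.0
    return level2(tweet)


for tweet in ["great match today", "join the protest", "boycott the protest"]:
    print(tweet, "->", cascade_score(tweet))
```

A lower `filter_threshold` lets more tweets through to level 2, trading extra computation for fewer sensitive tweets lost at the filter.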
  • 23. Quantitative Results
       ● Results on the Image Only Dataset

       Method                   F1 Score    Accuracy
       VGG16 Finetuning         0.5350      0.5500
       VGG16 Features + SVM     0.8065      0.8069
       Object Model             0.8343      0.8438
       Object + Scene Model     0.8547      0.8550
  • 24. Quantitative Results
       ● Results on the Tweets Dataset

       Method                                      F1 Score    Accuracy
       SVM Baseline                                0.682       0.701
       2 layer word LSTM (level 1 text model)      0.7372      0.7385
       Character Level GRU (level 2 text model)    0.7180      0.7619
       Word Level GRU (level 2 text model)         0.7760      0.7816
       Image + Text Model                          0.8013      0.8051
  • 25. Hyperparameters of the Best Performing Model (Text + Image)
       We found the optimal hyperparameters via grid search with cross-validation.

       Hyperparameter                      Value
       Number of tokens                    30
       Dimension of the word embeddings    150
       Number of GRU units                 512
       Image size                          224 x 224
       Learning rate                       0.01
  • 26. Qualitative Results: Visualizing the text model
       ● We use gradient-based class activation mapping to find the words that contribute to the sensitivity score
       ● Words like "boycott" and "fighters" contribute to the sensitivity score
       Example tweets:
       - "Two suspected Bangladeshi terrorists arrested with fake aadhaar card along with an arms dealer in Kolkata"
       - "Entire nation should boycott this movie. We r never allow to someone destroy our history. We will fight & we will win."
       - "Indian commando, three fighters killed in Kashmir"
  • 27. Visualizing the image model ● We use class activation mapping to visualize the areas of the image contributing to the sensitivity
  • 28. Qualitative analysis: Human Moderator Study
       ● We label 100 random nonsensitive tweets and 100 sensitive tweets with our classifier
       ● Two annotators look at the scores given by our system and find 75% to be correctly labeled
       ● There is only one false negative, implying that our system has a very low miss rate

                   Labeled Positive    Labeled Negative
       Positive    99                  1
       Negative    33                  67
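The confusion matrix on slide 28 can be turned into standard metrics with a few lines of arithmetic. The sketch assumes rows are the true classes (100 sensitive, 100 nonsensitive tweets, matching the row sums) and columns are the classifier's labels. Under that reading the precision is 99/132 = 0.75, which matches the 75% figure on the slide; that match is an inference, since the slide does not name the metric.

```python
# Counts copied from the slide-28 confusion matrix
# (assumed layout: rows = true class, columns = classifier label).
tp, fn = 99, 1    # sensitive tweets labeled positive / negative
fp, tn = 33, 67   # nonsensitive tweets labeled positive / negative

total = tp + fn + fp + tn
accuracy = (tp + tn) / total        # share of all tweets labeled correctly
precision = tp / (tp + fp)          # share of positive labels that are right
recall = tp / (tp + fn)             # share of sensitive tweets that are caught
miss_rate = fn / (tp + fn)          # share of sensitive tweets that are missed
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} miss_rate={miss_rate:.3f} f1={f1:.3f}")
```

The single false negative gives a miss rate of 1 in 100 sensitive tweets, which is the basis for the low-miss-rate claim; the 33 false positives are what pull precision down.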
  • 29. Conclusion
       ● A large corpus of weakly annotated data and a smaller dataset annotated by first responders
       ● A multimodal classifier for detecting sensitive content on social media
       ● We show that our model improves on the performance of other state-of-the-art models
       ● We also inspect the model to see what it is learning
       ● Future work: extend to videos and GIFs, and include other kinds of sensitive content
  • 30. References
       1. Wulczyn, Ellery, Nithum Thain, and Lucas Dixon. "Ex machina: Personal attacks seen at scale." Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2017.
       2. Nobata, Chikashi, et al. "Abusive language detection in online user content." Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.
       3. Chancellor, Stevie, et al. "Multimodal Classification of Moderated Online Pro-Eating Disorder Content." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM, 2017.