Hate Speech in Pixels:
Detection of Offensive Memes
towards Automatic Moderation
Benet Oriol Sàbat
Co-Directed by:
Xavier Giró
Cristian Canton
Contents
● Motivation
● System Description
● Experiments - Results
● Qualitative Results
● Further Work
● Conclusion
Motivation (I): Memes
What are memes?
Motivation (II): Hate Memes
What are hate memes?
Motivation (II): Hate Memes
What are hate memes?
Motivation (III): Hate Memes Detection
[Figure: Hate Speech Detection pipeline]
Overall System
[Figure: Hate Speech Detection pipeline]
OCR Extraction (I)
[Figure: Hate Speech Detection pipeline]
OCR Extraction (II)
[Figure: meme image → OCR → "When you act up in class and your teacher starts calling your parents but you gave her the number to Pizza Hut"]
Tesseract 4.0
Uses neural networks (an LSTM-based recognition engine)
About 0.5 s per image → the text is extracted once, before training
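Because OCR costs about 0.5 s per image, the text is extracted once ahead of training. A minimal sketch of such a precomputed cache, assuming a JSON file keyed by image path; the file layout and the names `build_ocr_cache`, `normalize_ocr_text` and `tesseract_ocr` are illustrative, not the project's code. Only `pytesseract.image_to_string` is a real library call:

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable

# Assumption: the cache is a JSON dict {image_path: text}; all names are illustrative.


def normalize_ocr_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces typical of OCR output."""
    return " ".join(raw.split())


def build_ocr_cache(
    image_paths: Iterable[str],
    ocr_fn: Callable[[str], str],
    cache_path: str,
) -> Dict[str, str]:
    """Run OCR once per image and persist the text as JSON.

    Images already present in the cache are skipped, so training can read
    the cached text instead of paying the OCR cost on every epoch.
    """
    cache_file = Path(cache_path)
    cache: Dict[str, str] = {}
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    for path in image_paths:
        if path not in cache:
            cache[path] = normalize_ocr_text(ocr_fn(path))
    cache_file.write_text(json.dumps(cache, ensure_ascii=False), encoding="utf-8")
    return cache


def tesseract_ocr(path: str) -> str:
    """OCR backend based on pytesseract (requires a Tesseract installation)."""
    import pytesseract  # imported lazily so the cache logic runs without it
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)
```

Usage would be along the lines of `build_ocr_cache(paths, tesseract_ocr, "ocr_cache.json")`; passing the OCR function in keeps the caching logic independent of the OCR engine.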
Text Feature Extraction (I)
[Figure: Hate Speech Detection pipeline]
Text Feature Extraction (II)
[Figure: OCR text "When you act up in class and your teacher starts calling your parents but you gave her the number to Pizza Hut" → OCR Text Embedder → feature vector $(t_1, t_2, \dots, t_M)$, e.g. [0.32, -0.79, ..., 1.04, 0.02]]
Text Feature Extraction (III). BERT
[Figure: OCR text "When you act up in class and your teacher starts calling your parents but you gave her the number to Pizza Hut" → BERT → feature vector $(t_1, t_2, \dots, t_M)$, e.g. [0.32, -0.79, ..., 1.04, 0.02]]
Image Feature Extraction (I)
[Figure: Hate Speech Detection pipeline]
Image Feature Extraction (II)
[Figure: meme image → Image embedder → feature vector $(i_1, i_2, \dots, i_N)$, e.g. [0.01, -1.2, …, 0.5, 0.52]]
Image Feature Extraction (III)
We assume that the hidden layers contain information relevant to tasks other than ImageNet classification, the task the network was trained for [ref].
[Figure: Scheme of the VGG-16]
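A sketch of reading a hidden-layer activation from VGG-16 instead of its ImageNet scores, assuming a torchvision-style model that exposes `features`, `avgpool` and `classifier`. Which hidden layer the project actually used is not stated; dropping only the last `classifier` module (the 1000-way output) is an assumption, and `VGGHiddenFeatures` is a hypothetical name:

```python
import torch
from torch import nn


class VGGHiddenFeatures(nn.Module):
    """Wrap a torchvision-style VGG and return a hidden-layer activation.

    Assumption: the layer used by the project is not stated; by default only
    the final ImageNet classification layer (`n_drop=1`) is removed.
    """

    def __init__(self, vgg: nn.Module, n_drop: int = 1, freeze: bool = True):
        super().__init__()
        self.features = vgg.features
        self.avgpool = vgg.avgpool
        # Keep every classifier module except the last `n_drop` ones.
        self.head = nn.Sequential(*list(vgg.classifier.children())[:-n_drop])
        if freeze:  # frozen descriptor, as in the baseline
            for p in self.parameters():
                p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.avgpool(self.features(x))
        return self.head(torch.flatten(x, 1))
```

With torchvision's pretrained weights, e.g. `VGGHiddenFeatures(vgg16(weights=VGG16_Weights.IMAGENET1K_V1)).eval()`, a 224x224 input would yield a 4096-dimensional vector. Calling `.eval()` matters because the retained part of `classifier` contains dropout layers.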
Feature Fusion (I)
[Figure: Hate Speech Detection pipeline]
Feature Fusion (II). Concatenation
[Figure: feature fusion by concatenation]
Image embedding $(i_1, i_2, \dots, i_N)$ + text embedding $(t_1, t_2, \dots, t_M)$ → concatenation → image + text embedding $(i_1, i_2, \dots, i_N, t_1, t_2, \dots, t_M)$
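Concatenation is plain early fusion: the two vectors are joined end to end, image features first, as in the figure. A minimal PyTorch sketch; the helper name `concat_fusion` is hypothetical:

```python
import torch


def concat_fusion(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Early fusion by concatenation (hypothetical helper name).

    image_emb: (batch, N) image features (i_1, ..., i_N)
    text_emb:  (batch, M) text features  (t_1, ..., t_M)
    returns:   (batch, N + M) vector (i_1, ..., i_N, t_1, ..., t_M)
    """
    if image_emb.shape[0] != text_emb.shape[0]:
        raise ValueError("image and text batches must have the same size")
    return torch.cat([image_emb, text_emb], dim=1)
```

For example, a 4096-d VGG-16 hidden layer and a 768-d BERT-base vector (both assumed sizes) would give a 4864-d fused vector.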
Hate Predictor (I)
[Figure: Hate Speech Detection pipeline]
Hate Predictor (II)
[Figure: feature fusion $(i_1, i_2, \dots, i_N, t_1, t_2, \dots, t_M)$ → Hate predictor → hate score $\in \mathbb{R}$]
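The predictor maps the fused vector to a single real number. A sketch of the baseline MLP described in the experiments (two hidden layers of size 100), assuming ReLU activations and a linear output; the activation choice and the class name `HatePredictor` are assumptions:

```python
import torch
from torch import nn


class HatePredictor(nn.Module):
    """MLP with two hidden layers mapping the fused vector to a hate score.

    Assumption: ReLU activations and a linear output are not specified in the slides.
    """

    def __init__(self, in_dim: int, hidden: int = 100):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # (batch, N + M) -> (batch,) real-valued hate score
        return self.mlp(fused).squeeze(-1)
```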
Dataset (I)
● No labelled data for our task
● Downloaded neutral (non-hate) memes from the Reddit Memes Dataset (3325 memes)
● Downloaded memes from Google Images with the following keywords (1695 memes):
○ racist meme: 643 memes
○ jew meme: 551 memes
○ muslim meme: 501 memes
● Total of 5020 memes.
● Dubious quality of annotations
● Train: 85%
● Validation: 15%
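A sketch of how the weakly labelled dataset and the 85/15 split could be assembled, assuming label 0 for the Reddit memes, label 1 for the keyword-search memes and a plain (non-stratified) seeded shuffle; the helper names are hypothetical:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumption: 0 = non-hate (Reddit), 1 = hate (keyword search); non-stratified split.


def make_labelled_list(neutral: Sequence[str], hate: Sequence[str]) -> List[Tuple[str, int]]:
    """Label Reddit memes as non-hate (0) and keyword-search memes as hate (1)."""
    return [(path, 0) for path in neutral] + [(path, 1) for path in hate]


def train_val_split(
    items: Sequence[T], train_frac: float = 0.85, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then cut into train and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]
```

With 5020 memes this yields 4267 training and 753 validation examples.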
Implementation - Setup
● Main framework: Python
● Neural Nets Framework: PyTorch
● VGG16 Implementation and Pretrained weights: Torchvision
● BERT Implementation and Pretrained weights:
https://github.com/huggingface/pytorch-pretrained-BERT
● OCR: Tesseract 4.0 via the pytesseract Python wrapper
Preprocessing
● OCR text extracted once, before training → much faster training process.
● Character sequence → BERT token sequence (BERT input)
● Crop / pad the BERT token sequence to 50 tokens
● Resize images to 224x224 (VGG input size)
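The crop/pad step can be sketched as follows, assuming the tokens have already been mapped to BERT vocabulary ids and that padding uses id 0 (the `[PAD]` id in the standard BERT vocabularies). The attention mask is an addition the slides do not mention; BERT implementations accept one so that padded positions are ignored:

```python
from typing import List, Tuple

MAX_TOKENS = 50  # sequence length used in the slides
PAD_ID = 0       # Assumption: [PAD] id of the standard BERT WordPiece vocabularies


def crop_or_pad(
    token_ids: List[int], max_len: int = MAX_TOKENS, pad_id: int = PAD_ID
) -> Tuple[List[int], List[int]]:
    """Crop or pad a BERT token-id sequence to a fixed length.

    Returns the fixed-length ids and an attention mask (1 = real token, 0 = padding).
    """
    ids = token_ids[:max_len]
    mask = [1] * len(ids)
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad
```

Cropping a sequence that ends in `[SEP]` removes that token; a more careful variant would keep the first 49 ids and re-append `[SEP]`.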
Experiments and Results (I). Baseline
● No existing baseline for our task.
● Starting point:
○ Frozen VGG16 and BERT
○ Classifier: a Multi-Layer Perceptron (MLP) with two hidden layers, hidden size = 100.
○ Optimizer: SGD with momentum. Learning rate = 0.01, momentum = 0.9.
○ Batch size = 30
○ Loss function: Mean Squared Error (MSE).
Result: 82.6% validation accuracy
[Figure: (a) validation accuracy; (b) training loss]
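The baseline optimization settings translate directly into PyTorch. A sketch assuming the classifier returns one score per example and that predictions are thresholded at 0.5 for accuracy (the threshold and helper names are assumptions). Passing only parameters with `requires_grad=True` to SGD keeps the frozen VGG16 and BERT weights out of the update:

```python
import torch
from torch import nn


def make_baseline_training(model: nn.Module):
    """SGD (lr=0.01, momentum=0.9) and MSE loss, as on the baseline slide."""
    trainable = [p for p in model.parameters() if p.requires_grad]  # skips frozen descriptors
    optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
    return optimizer, nn.MSELoss()


def train_step(model, optimizer, loss_fn, fused, labels) -> float:
    """One update on a batch of fused features (batch size 30 in the slides)."""
    model.train()
    optimizer.zero_grad()
    scores = model(fused).squeeze(-1)
    loss = loss_fn(scores, labels.float())  # MSE between score and 0/1 label
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def accuracy(model, fused, labels, threshold: float = 0.5) -> float:
    """Assumption: a score >= threshold counts as a hate prediction."""
    model.eval()
    preds = (model(fused).squeeze(-1) >= threshold).long()
    return (preds == labels.long()).float().mean().item()
```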
Experiments and Results (II). Data Augmentation
● Resize image to 255x255 (instead of 224x224)
● Randomly crop a 224x224 patch
● Result: Accuracy 82.0%
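The augmentation resizes to 255x255 and then takes a random 224x224 patch. A torch-only sketch (the function name is hypothetical); with PIL images, torchvision's `transforms.Resize((255, 255))` followed by `transforms.RandomCrop(224)` would do the same:

```python
from typing import Optional

import torch
import torch.nn.functional as F


def resize_and_random_crop(
    img: torch.Tensor,
    resize_to: int = 255,
    crop: int = 224,
    generator: Optional[torch.Generator] = None,
) -> torch.Tensor:
    """Resize a float (C, H, W) image to resize_to x resize_to, then cut a random crop x crop patch.

    Hypothetical helper; bilinear resizing is an assumption.
    """
    resized = F.interpolate(
        img.unsqueeze(0), size=(resize_to, resize_to), mode="bilinear", align_corners=False
    ).squeeze(0)
    max_offset = resize_to - crop
    top = int(torch.randint(0, max_offset + 1, (1,), generator=generator))
    left = int(torch.randint(0, max_offset + 1, (1,), generator=generator))
    return resized[:, top:top + crop, left:left + crop]
```

At validation time the random crop would normally be replaced by a deterministic resize or center crop.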
Experiments and Results (III). Capacity Reduction
● No data Augmentation
● Hidden size = 50 (not 100)
● Result: Accuracy 82%
Experiments and Results (IV). Dropout
● No data Augmentation
● Hidden size = 100
● Result: Accuracy 81%
● Dropout:
○ All the MLP layers (p=0.5)
Experiments and Results (V). Dropout
● No data Augmentation
● Hidden size = 50
● Result: Accuracy 81.7%
● Dropout:
○ First MLP layer (p=0.2)
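The two dropout configurations can be expressed with one builder. Where exactly the dropout layers sit is not specified in the slides; this sketch assumes dropout after each hidden activation for "all the MLP layers" and after the first hidden activation only for the p=0.2 variant. `build_mlp` and the 4864-d input size are hypothetical:

```python
import torch
from torch import nn


def build_mlp(in_dim: int, hidden: int, dropout_p: float, first_layer_only: bool) -> nn.Sequential:
    """Two-hidden-layer MLP that outputs one hate score per example.

    Assumption on dropout placement:
    first_layer_only=False: dropout after both hidden layers (slide IV, p=0.5)
    first_layer_only=True:  dropout after the first hidden layer only (slide V, p=0.2)
    """
    layers = [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(dropout_p)]
    layers += [nn.Linear(hidden, hidden), nn.ReLU()]
    if not first_layer_only:
        layers.append(nn.Dropout(dropout_p))
    layers.append(nn.Linear(hidden, 1))
    return nn.Sequential(*layers)


all_layers = build_mlp(in_dim=4864, hidden=100, dropout_p=0.5, first_layer_only=False)
first_only = build_mlp(in_dim=4864, hidden=50, dropout_p=0.2, first_layer_only=True)

# Dropout is only active in train mode; eval() makes predictions deterministic.
first_only.eval()
with torch.no_grad():
    scores = first_only(torch.randn(3, 4864))
```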
Experiments and Results (VI).
Regularization Summary:
● Baseline: 82.6%. Overfitting
● Data augmentation (Random Cropping): 81%. Overfitting
● Capacity Reduction: 82%. Overfitting
● Dropout:
○ All MLP layers, p=0.5: 81%, random forgetting
○ First MLP layer only, hidden size 50, p=0.2: 81.7%, no overfitting
Multimodal Fusion. Single-modality systems
[Figure: results of the single-modality systems, annotated "Dataset lower bound!"]
Fine-tuning the descriptors (I). BERT
[Figure: text-only classifier, with and without BERT fine-tuning]
Fine-tuning the descriptors (II). BERT & VGG
After unfreezing BERT and VGG's classifier (top layers), we reached an accuracy of 83.0%.
Fine-tuning the descriptors (III). BERT & VGG
Progressive fine-tuning: we unfreeze the weights at epoch X.
[Figure: (a) validation accuracy; (b) validation loss. Blue: no fine-tuning. Light blue: fine-tuning from epoch 10, accuracy 83.7%. Pink: fine-tuning from epoch 50, accuracy 84.3%.]
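Progressive fine-tuning can be sketched as training with frozen descriptors and, at a chosen epoch, switching `requires_grad` back on and registering those weights with the optimizer. The separate (smaller) learning rate for the unfrozen weights is an assumption, as are the helper names:

```python
from typing import Iterable

import torch
from torch import nn


def set_trainable(module: nn.Module, trainable: bool) -> None:
    """Freeze (False) or unfreeze (True) every parameter of a module."""
    for p in module.parameters():
        p.requires_grad = trainable


def maybe_unfreeze(
    epoch: int,
    unfreeze_epoch: int,
    descriptors: Iterable[nn.Module],
    optimizer: torch.optim.Optimizer,
    lr: float,
) -> bool:
    """At `unfreeze_epoch`, unfreeze the descriptors and hand their weights to the optimizer.

    Assumption: unfrozen descriptors get their own learning rate `lr`.
    """
    if epoch != unfreeze_epoch:
        return False
    for module in descriptors:
        set_trainable(module, True)
        optimizer.add_param_group({"params": list(module.parameters()), "lr": lr})
    return True
```

In the setting above, `descriptors` would be BERT and VGG's `classifier` module, with `unfreeze_epoch` set to 10 or 50. `add_param_group` raises an error if the parameters already belong to the optimizer, so this assumes the optimizer was built from the trainable head only.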
Fine-tuning the descriptors (IV). Summary
Failed experiments (I). Unsupervised Pretraining
[Figure: Hate Speech Detection architecture, pretrained on an unsupervised task (image + text matching)]
We downloaded 1500 unlabelled images and kept them separate from the labelled data.
The model did not learn this task (50% accuracy, i.e. chance level).
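One plausible form of an image-text matching pretext task, sketched under assumptions because the slides do not describe how pairs were built: each unlabelled meme is paired once with its own text and once with another meme's text, giving balanced classes, so 50% accuracy corresponds to chance. `make_matching_pairs` is a hypothetical name:

```python
import random
from typing import List, Sequence, Tuple

# Assumption: the slides do not describe how matching pairs were constructed.


def make_matching_pairs(
    images: Sequence[str], texts: Sequence[str], seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Build a balanced image-text matching dataset from unlabelled memes.

    Each image appears once with its own OCR text (label 1) and once with the
    text of a different, randomly chosen meme (label 0).
    """
    if len(images) != len(texts) or len(images) < 2:
        raise ValueError("need at least two aligned (image, text) pairs")
    rng = random.Random(seed)
    pairs = []
    for i, image in enumerate(images):
        pairs.append((image, texts[i], 1))
        j = rng.randrange(len(images) - 1)
        if j >= i:
            j += 1  # skip the image's own text
        pairs.append((image, texts[j], 0))
    rng.shuffle(pairs)
    return pairs
```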
Failed experiments (II). Introducing expert knowledge
We compiled a list of 12 words that may indicate hate speech. We encode the presence of each of these words in the OCR-extracted text as a binary (multi-hot) vector and concatenate it with the image and text features.
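A sketch of the keyword feature, assuming whole-word, case-insensitive matching on the OCR text. The 12 words are not listed in the slides, so `KEYWORDS` holds hypothetical placeholders:

```python
import re
from typing import List, Sequence

# Hypothetical placeholders: the slides do not list the 12 words used.
KEYWORDS: List[str] = [f"keyword{k}" for k in range(12)]


def keyword_presence(ocr_text: str, keywords: Sequence[str] = KEYWORDS) -> List[float]:
    """Binary vector: 1.0 if a keyword occurs as a whole word in the text, else 0.0.

    Assumption: case-insensitive, single-word matching on the OCR output.
    """
    tokens = set(re.findall(r"[a-z0-9']+", ocr_text.lower()))
    return [1.0 if kw.lower() in tokens else 0.0 for kw in keywords]
```

The resulting 12-d vector would then be concatenated with the other features, e.g. `torch.cat([image_emb, text_emb, torch.tensor([keyword_presence(t) for t in texts])], dim=1)`. Multi-word expressions would need phrase matching instead of the token set used here.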
Qualitative analysis (I). Best predictions
Qualitative analysis (II). Worst predictions
Further work
● Dataset
○ Poor annotation
○ Probably visually biased
○ Small
● Descriptors
○ XLNet models
○ Expert knowledge
● Better ways of fusing multimodal embeddings
● OCR extraction
Conclusions
● Accuracy up to 84.4%
● Explored regularization techniques
● This unsupervised pre-training did not help
● Poor dataset
● Need to find a way to introduce expert knowledge.
manishkhaire30
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 

Kürzlich hochgeladen (20)

一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 

Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation

  • 1. Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation Benet Oriol Sàbat Co-Directed by: Xavier Giró Cristian Canton
  • 2. Contents ● Motivation ● System Description ● Experiments - Results ● Qualitative Results ● Further Work ● Conclusion 2
  • 4. Motivation (II): Hate Memes What are hate memes? 4
  • 5. Motivation (II): Hate Memes What are hate memes? 5
  • 6. Motivation (III): Hate Memes Detection Hate Speech Detection 6
  • 8. OCR Extraction (I) Hate Speech Detection 8
  • 9. OCR Extraction (II). OCR example: "When you act up in class and your teacher starts calling your parents but you gave her the number to Pizza Hut". Tesseract 4.0 uses neural networks and takes about 0.5 s per image, so the text is extracted once, in advance. 9
  • 10. Text Feature Extraction (I) Hate Speech Detection 10
  • 11. Text Feature Extraction (II). OCR text ("When you act up in class and your teacher starts calling your parents but you gave her the number to Pizza Hut") → text embedder → feature vector $(t_1, t_2, \dots, t_M)$, e.g. [0.32, -0.79, ..., 1.04, 0.02] 11
  • 12. Text Feature Extraction (III). BERT. OCR text ("When you act up in class and your teacher starts calling your parents but you gave her the number to Pizza Hut") → BERT → feature vector $(t_1, t_2, \dots, t_M)$, e.g. [0.32, -0.79, ..., 1.04, 0.02] 12
  • 13. Image Feature Extraction (I) Hate Speech Detection 13
  • 14. Image Feature Extraction (II). Image → image embedder → feature vector $(i_1, i_2, \dots, i_N)$, e.g. [0.01, -1.2, …, 0.5, 0.52] 14
  • 15. Image Feature Extraction (III). We assume that the hidden layers of VGG-16 carry information relevant to tasks other than ImageNet classification, which is the task the network was trained for [ref]. Figure: scheme of the VGG-16. 15
  • 16. Feature Fusion (I) Hate Speech Detection 16
  • 17. Feature Fusion (II). Concatenation. The image embedding $(i_1, i_2, \dots, i_N)$ and the text embedding $(t_1, t_2, \dots, t_M)$ are concatenated into one image + text embedding $(i_1, i_2, \dots, i_N, t_1, t_2, \dots, t_M)$. 17
  • 18. Hate Predictor (I) Hate Speech Detection 18
  • 19. Hate Predictor (II). Fused feature vector $(i_1, i_2, \dots, i_N, t_1, t_2, \dots, t_M)$ → hate predictor → hate score $\in \mathbb{R}$. 19
  • 20. Dataset (I) 20 ● No labelled data exists for our task. ● Neutral (non-hate) memes downloaded from the Reddit Memes Dataset: 3325 memes. ● Memes downloaded from Google Images with the following search keywords (1695 memes): ○ racist meme: 643 memes ○ jew meme: 551 memes ○ muslim meme: 501 memes ● Total: 5020 memes. ● Annotation quality is dubious. ● Train: 85% ● Validation: 15%
  • 21. Implementation - Setup 21 ● Main language: Python ● Neural network framework: PyTorch ● VGG16 implementation and pretrained weights: Torchvision ● BERT implementation and pretrained weights: https://github.com/huggingface/pytorch-pretrained-BERT ● OCR: Tesseract 4.0, through the pytesseract Python wrapper
  • 22. Preprocessing 22 ● OCR text is extracted once, before training, which makes training much faster. ● Character sequences are converted to BERT token sequences (the BERT input). ● BERT token sequences are cropped or padded to 50 tokens. ● Images are resized to 224x224 (the VGG input size).
  • 23. Experiments and Results (I). Baseline 23 ● No existing baseline for our task. ● Starting point: ○ Frozen VGG16 and BERT ○ Classifier: a multi-layer perceptron (MLP) with two hidden layers, hidden size = 100 ○ Optimizer: SGD with momentum, learning rate = 0.01, momentum = 0.9 ○ Batch size = 30 ○ Loss function: mean squared error (MSE). Result: 82.6% validation accuracy. Figure: (a) validation accuracy, (b) training loss.
  • 24. Experiments and Results (II). Data Augmentation 24 ● Resize images to 255x255 (instead of 224x224) ● Randomly crop a 224x224 patch ● Result: accuracy 82.0%
  • 25. Experiments and Results (III). Capacity Reduction 25 ● No data augmentation ● Hidden size = 50 (instead of 100) ● Result: accuracy 82%
  • 26. Experiments and Results (IV). Dropout 26 ● No data augmentation ● Hidden size = 100 ● Dropout on all MLP layers (p=0.5) ● Result: accuracy 81%
  • 27. Experiments and Results (V). Dropout 27 ● No data augmentation ● Hidden size = 50 ● Dropout on the first MLP layer only (p=0.2) ● Result: accuracy 81.7%
  • 28. Experiments and Results (VI). Regularization Summary 28 ● Baseline: 82.6%, overfitting ● Data augmentation (random cropping): 81%, overfitting ● Capacity reduction: 82%, overfitting ● Dropout: ○ All MLP layers, p=0.5: 81%, random forgetting ○ First MLP layer only, hidden size 50, p=0.2: 81.7%, no overfitting
  • 29. Multimodal Fusion. Mono-mode systems 29 Dataset lower bound!
  • 30. Fine-tuning the descriptors (I). BERT 30 Text Only classifier, with and without BERT finetuning
  • 31. Fine-tuning the descriptors (II). BERT & VGG 31 After unfreezing BERT and the top (classifier) layers of VGG, we reached an accuracy of 83.0%.
  • 32. Fine-tuning the descriptors (III). BERT & VGG 32 Progressive fine-tuning: we unfreeze the weights at epoch X. Figure: (a) validation accuracy, (b) validation loss. Blue: no fine-tuning. Light blue: fine-tuning from epoch 10, accuracy 83.7%. Pink: fine-tuning from epoch 50, accuracy 84.3%.
  • 33. Fine-tuning the descriptors (IV). Summary 33
  • 34. Failed experiments (I). Unsupervised Pretraining 34 The hate speech detection architecture was pretrained on an unsupervised task (image + text matching). We downloaded 1500 unlabelled images and kept them separate from the labelled data. The model did not learn this task (50% accuracy).
  • 35. Failed experiments (II). Introducing expert knowledge 35 We compiled a list of 12 words that may indicate hate speech. We one-hot encoded the presence of each word in the OCR-extracted text and concatenated this vector with the image and text features.
  • 36. Qualitative analysis (I). Best predictions 36
  • 37. Qualitative analysis (II). Worst predictions 37
  • 38. Further work 38 ● Dataset ○ Poorly annotated ○ Probably visually biased ○ Small ● Descriptors ○ XLNet models ○ Expert knowledge ● Better ways of fusing multimodal embeddings ● OCR extraction
  • 39. Conclusions 39 ● Accuracy up to 84.4% ● Explored regularization techniques ● The unsupervised pre-training we tried did not help ● The dataset is poor ● A way to introduce expert knowledge is still needed
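The 85% / 15% train/validation split on slide 20 can be sketched in plain Python. This is a minimal illustration, not the project's code. The shuffling, the fixed seed, and the label values (0 for the Reddit memes, 1 for the keyword-retrieved Google Images memes) are assumptions. The slides only give the counts and the ratio.

```python
import random


def split_dataset(items, train_frac=0.85, seed=0):
    """Shuffle a copy of `items` and split it into (train, validation).

    Assumption: one random split with a fixed seed; the slides only
    state the 85% / 15% ratio.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical records: (source, index, assumed label)
memes = [("reddit", i, 0) for i in range(3325)]
memes += [("google", i, 1) for i in range(1695)]

train, val = split_dataset(memes)
print(len(memes), len(train), len(val))  # 5020 4267 753
```

With 5020 memes, this gives 4267 training and 753 validation examples. Given the class imbalance (3325 vs 1695), a stratified split would be a reasonable alternative. The slides do not say which kind of split was used.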
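The preprocessing step on slide 22 crops or pads every BERT token sequence to 50 tokens. A minimal sketch of that step on lists of token ids follows. It assumes right-padding with id 0, which is `[PAD]` in the bert-base-uncased vocabulary. The attention mask is an addition that BERT implementations commonly take alongside the ids. The slides do not mention it.

```python
def crop_or_pad(token_ids, max_len=50, pad_id=0):
    """Crop or right-pad a token-id sequence to exactly `max_len` ids.

    Returns (ids, attention_mask): the mask is 1 for real tokens and 0 for
    padding. pad_id=0 is [PAD] in bert-base-uncased; an assumption for
    any other tokenizer.
    """
    ids = list(token_ids[:max_len])
    n_pad = max_len - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad
    return ids + [pad_id] * n_pad, attention_mask


# Hypothetical token ids; 101 and 102 are [CLS] and [SEP] in bert-base-uncased.
short_ids, short_mask = crop_or_pad([101, 7, 8, 102])
long_ids, long_mask = crop_or_pad(list(range(1000, 1080)))
print(len(short_ids), sum(short_mask))  # 50 4
print(len(long_ids), sum(long_mask))    # 50 50
```

Plain truncation like this can drop the final `[SEP]` token. A common variant truncates to `max_len - 1` and re-appends `[SEP]`, but the slides do not say whether the project did that.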
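Slides 17, 19 and 23 describe the predictor: concatenate the image and text embeddings, then map the fused vector to a real-valued hate score with an MLP that has two hidden layers of size 100. The dependency-free sketch below shows only the forward pass. Several details are assumptions for illustration: the ReLU activations, the uniform weight initialisation, and the toy embedding sizes `N` and `M`. The slides do not give the actual VGG16 and BERT feature sizes. In the project, this would be a trained PyTorch module rather than random weights.

```python
import random


def fuse(image_emb, text_emb):
    """Concatenation fusion: (i_1..i_N) and (t_1..t_M) -> (i_1..i_N, t_1..t_M)."""
    return list(image_emb) + list(text_emb)


def linear(x, weights, bias):
    """Affine map; `weights` holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_layer(n_in, n_out, rng):
    """Small uniform random weights and zero biases (arbitrary choice for this sketch)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def hate_score(image_emb, text_emb, layers):
    """Two ReLU hidden layers followed by one linear output unit."""
    x = fuse(image_emb, text_emb)
    for weights, bias in layers[:-1]:
        x = relu(linear(x, weights, bias))
    (score,) = linear(x, *layers[-1])
    return score


rng = random.Random(0)
N, M, HIDDEN = 8, 6, 100  # toy embedding sizes; hidden size 100 as on slide 23
layers = [init_layer(N + M, HIDDEN, rng), init_layer(HIDDEN, HIDDEN, rng), init_layer(HIDDEN, 1, rng)]
image_emb = [rng.gauss(0.0, 1.0) for _ in range(N)]
text_emb = [rng.gauss(0.0, 1.0) for _ in range(M)]
print(hate_score(image_emb, text_emb, layers))
```

A single unbounded output fits the MSE loss on slide 23. A sigmoid output with a cross-entropy loss would be the usual alternative for binary labels.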
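The baseline on slide 23 trains with SGD with momentum (learning rate 0.01, momentum 0.9) on an MSE loss. The toy sketch below isolates that update rule on a one-parameter regression problem. It assumes PyTorch's formulation without dampening or Nesterov momentum. For brevity, it uses the whole toy dataset for each gradient instead of batches of 30.

```python
def mse(preds, targets):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def fit_with_momentum(xs, ys, lr=0.01, momentum=0.9, steps=300):
    """Fit y ~ w * x by minimising the MSE with the momentum update.

    Update rule (as in PyTorch's SGD with momentum, no dampening):
        v <- momentum * v + grad
        w <- w - lr * v
    """
    w, v = 0.0, 0.0
    for _ in range(steps):
        # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        v = momentum * v + grad
        w -= lr * v
    return w


xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]  # toy targets generated with w = 2
w = fit_with_momentum(xs, ys)
print(round(w, 4), mse([w * x for x in xs], ys))
```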
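The augmentation on slide 24 resizes each image to 255x255 and then takes a random 224x224 patch. The sketch below computes only the crop box. It uses the `(left, upper, right, lower)` tuple convention of PIL's `Image.crop`, with offsets drawn uniformly. In torchvision, the same idea is usually written as `transforms.Resize((255, 255))` followed by `transforms.RandomCrop(224)`. The slides do not say whether the project used exactly those transforms.

```python
import random


def random_crop_box(size=255, crop=224, rng=random):
    """Pick a random crop x crop window inside a size x size image.

    Returns (left, upper, right, lower), the box convention of PIL's
    Image.crop. Offsets are drawn uniformly from 0..size-crop.
    """
    max_offset = size - crop  # 31 for 255 -> 224
    left = rng.randint(0, max_offset)
    upper = rng.randint(0, max_offset)
    return left, upper, left + crop, upper + crop


rng = random.Random(0)
print(random_crop_box(rng=rng))
```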
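The failed expert-knowledge experiment on slide 35 encodes which of 12 listed words occur in the OCR text and appends that vector to the image and text features. The sketch below assumes whole-word, case-insensitive matching. The word list is a hypothetical placeholder, because the slides do not give the actual words. Several words can be present at once, so the result is strictly a multi-hot vector rather than a one-hot one.

```python
import re

# Hypothetical placeholders: the slides mention 12 words but do not list them.
KEYWORDS = [f"keyword{i}" for i in range(1, 13)]


def keyword_presence(ocr_text, keywords=KEYWORDS):
    """1.0 for each keyword that appears as a whole word in the OCR text, else 0.0."""
    tokens = set(re.findall(r"[a-z0-9']+", ocr_text.lower()))
    return [1.0 if kw in tokens else 0.0 for kw in keywords]


def fuse_with_keywords(image_emb, text_emb, ocr_text):
    """Concatenate image features, text features and the keyword-presence vector."""
    return list(image_emb) + list(text_emb) + keyword_presence(ocr_text)


features = fuse_with_keywords([0.1, 0.2], [0.3], "some OCR text with keyword3 and KEYWORD12.")
print(len(features), features[3:])
```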
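Progressive fine-tuning (slides 31 and 32) keeps the descriptors frozen at first. It then unfreezes BERT and the top layers of VGG at a chosen epoch: 10 for the light-blue curve, 50 for the pink one. The sketch below models only that schedule. The group names are hypothetical. In PyTorch, the switch would typically set `requires_grad = True` on the parameters of those modules at the start of the epoch. The optimizer must also be given those parameters.

```python
def trainable_groups(epoch, unfreeze_epoch=None):
    """Parameter groups that receive gradient updates at `epoch` (0-based).

    The MLP classifier is always trained; BERT and the top (classifier)
    layers of VGG16 join once `epoch` reaches `unfreeze_epoch`.
    unfreeze_epoch=None reproduces the frozen-descriptor baseline.
    """
    groups = ["mlp"]
    if unfreeze_epoch is not None and epoch >= unfreeze_epoch:
        groups += ["bert", "vgg_top"]
    return groups


for start in (None, 10, 50):
    first = next((e for e in range(100) if "bert" in trainable_groups(e, start)), None)
    print(start, first)
```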