Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation

Benet Oriol Sàbat
Co-Directed by:
Xavier Giró
Cristian Canton

Contents
● Motivation
● System Description
● Experiments - Results
● Qualitative Results
● Further Work
● Conclusion

Motivation (I): Memes
What are memes?

Motivation (II): Hate Memes
What are hate memes?

Motivation (III): Hate Memes Detection
Hate Speech Detection

Overall System

OCR Extraction (I)

OCR Extraction (II)
OCR output for an example meme:
"When you act up in class and your teacher starts calling your parents but you gave her the number to Pizza Hut"
● Tesseract 4.0
● Uses neural networks
● 0.5 s / image → text is extracted ahead of training
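At roughly 0.5 s per image, OCR is worth running once and storing rather than repeating every epoch. The following is a minimal sketch of such a cache. The `ocr_fn` callable (for example a thin wrapper around Pytesseract), the function name and the JSON file layout are all assumptions made for illustration, not details from the slides.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def extract_text_once(
    image_paths: Iterable[str],
    ocr_fn: Callable[[str], str],  # hypothetical: maps an image path to its OCR text
    cache_file: str,
) -> Dict[str, str]:
    """Run OCR only for images that are not already stored in the JSON cache."""
    cache_path = Path(cache_file)
    cache: Dict[str, str] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    for path in image_paths:
        if path not in cache:  # the slow OCR call happens at most once per image
            cache[path] = ocr_fn(path)

    cache_path.write_text(json.dumps(cache), encoding="utf-8")
    return cache
```

During training, the data loader can then read the cached strings instead of calling the OCR engine.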

Text Feature Extraction (I)

Text Feature Extraction (II)
Meme ("… Pizza Hut") → OCR → Text Embedder → Feature Vector
(t1, t2, …, tM) = [0.32, -0.79, ..., 1.04, 0.02]

Text Feature Extraction (III). BERT
OCR text ("… Pizza Hut") → BERT → Feature Vector
(t1, t2, …, tM) = [0.32, -0.79, ..., 1.04, 0.02]

Image Feature Extraction (I)

Image Feature Extraction (II)
Image embedder → Feature Vector
(i1, i2, …, iN) = [0.01, -1.2, …, 0.5, 0.52]

Image Feature Extraction (III)
We assume that the hidden layers of VGG-16 carry information that is relevant to tasks other
than ImageNet classification (for which the network was trained) [ref].
Figure: scheme of the VGG-16.

Feature Fusion (I)

Feature Fusion (II). Concatenation
Feature fusion by concatenation:
● Image Embedding: (i1, i2, …, iN)
● Text Embedding: (t1, t2, …, tM)
● Image + Text Embedding: (i1, i2, …, iN, t1, t2, …, tM)
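Concatenation simply places the M text features after the N image features. A small sketch with toy values follows; the function name and the numbers are illustrative, and real embeddings are much longer.

```python
from typing import List, Sequence


def fuse_by_concatenation(image_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Return (i1, ..., iN, t1, ..., tM): image features first, then text features."""
    return list(image_emb) + list(text_emb)


# Toy values, chosen only for illustration.
image_embedding = [0.01, -1.2, 0.5, 0.52]   # N = 4
text_embedding = [0.32, -0.79, 1.04]        # M = 3
fused = fuse_by_concatenation(image_embedding, text_embedding)
print(len(fused))  # N + M
```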

Hate Predictor (I)

Hate Predictor (II)
Feature fusion (i1, i2, …, iN, t1, t2, …, tM) → Hate score ∈ R

Dataset (I)
● No labelled data for our task
● Neutral (non-hate) memes downloaded from the Reddit Memes Dataset (3325 memes)
● Memes downloaded from Google Images with the following keywords (1695 memes):
○ racist meme: 643 memes
○ jew meme: 551 memes
○ muslim meme: 501 memes
● Total of 5020 memes
● Dubious quality of annotations
● Train: 85%
● Validation: 15%
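The counts above can be checked, and the 85% / 15% split sketched, in a few lines. The slides do not say how the split was drawn, so the seeded shuffle and integer rounding below are assumptions.

```python
import random
from typing import List, Tuple

N_REDDIT = 3325                 # memes from the Reddit Memes Dataset
N_GOOGLE = 643 + 551 + 501      # memes from Google Images, three keywords


def train_val_split(n_items: int, train_percent: int = 85, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle item indices and split them into train and validation index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)      # the seed is an assumption, for repeatability
    n_train = n_items * train_percent // 100  # integer arithmetic avoids float rounding
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = train_val_split(N_REDDIT + N_GOOGLE)
print(len(train_idx), len(val_idx))
```

With this rounding, 5020 memes split into 4267 for training and 753 for validation.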

Implementation - Setup
● Main language: Python
● Neural network framework: PyTorch
● VGG16 implementation and pretrained weights: Torchvision
● BERT implementation and pretrained weights:
https://github.com/huggingface/pytorch-pretrained-BERT
● OCR: Tesseract 4.0, through the Pytesseract wrapper for Python

Preprocessing
● OCR text extracted ahead of training → much faster training process
● Character sequence converted to a BERT token sequence (BERT input)
● BERT token sequence cropped or padded to 50 tokens
● Images resized to 224x224 (VGG input size)
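Cropping or padding to a fixed length can be sketched as below. The padding id of 0 and the attention mask helper are assumptions: the slides only state the target length of 50 tokens.

```python
from typing import List

MAX_TOKENS = 50   # sequence length from the slide
PAD_ID = 0        # assumption: id of the padding token in the vocabulary


def crop_or_pad(token_ids: List[int], length: int = MAX_TOKENS, pad_id: int = PAD_ID) -> List[int]:
    """Truncate to `length` tokens, or right-pad with `pad_id` up to `length`."""
    cropped = token_ids[:length]
    return cropped + [pad_id] * (length - len(cropped))


def attention_mask(token_ids: List[int], length: int = MAX_TOKENS) -> List[int]:
    """1 for real tokens, 0 for padding positions (not mentioned on the slide)."""
    n_real = min(len(token_ids), length)
    return [1] * n_real + [0] * (length - n_real)
```

A fixed length lets every sample in a batch share the same tensor shape.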

Experiments and Results (I). Baseline
● No baseline for our task.
● Starting point:
○ Frozen VGG16 and BERT
○ Classifier: a Multi-Layer Perceptron (MLP) with two hidden layers, hidden size = 100
○ Optimizer: SGD with momentum. Learning rate = 0.01, momentum = 0.9
○ Batch size = 30
○ Loss function: Mean Squared Error (MSE)
● Result: 82.6% validation accuracy
Figure: (a) validation accuracy and (b) train loss.
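To make the optimizer and loss settings concrete, here is a framework-free sketch of the MSE loss and one SGD-with-momentum update, using the learning rate and momentum from the slide. The update follows the convention of PyTorch's `torch.optim.SGD` without dampening or Nesterov momentum. The one-weight toy problem is made up for illustration and is not the MLP from the experiments.

```python
from typing import List, Tuple


def mse(preds: List[float], targets: List[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def sgd_momentum_step(param: float, grad: float, buf: float,
                      lr: float = 0.01, momentum: float = 0.9) -> Tuple[float, float]:
    """One update: buf <- momentum * buf + grad; param <- param - lr * buf."""
    buf = momentum * buf + grad
    return param - lr * buf, buf


# Toy problem (illustrative only): fit a single weight w so that w * x matches y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w, buf = 0.0, 0.0
for _ in range(300):
    # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w, buf = sgd_momentum_step(w, grad, buf)
final_loss = mse([w * x for x in xs], ys)
```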

Experiments and Results (II). Data Augmentation
● Resize image to 255x255 (instead of 224x224)
● Randomly crop a 224x224 patch
● Result: Accuracy 82.0%
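The random-crop augmentation can be sketched as picking a random window inside the resized image. This assumes a square 224x224 crop (the VGG input size); the function name and the fixed seed are illustrative choices.

```python
import random
from typing import Tuple

RESIZED = 255   # side of the resized image, from the slide
CROP = 224      # side of the crop, equal to the VGG input size


def random_crop_box(rng: random.Random, size: int = RESIZED, crop: int = CROP) -> Tuple[int, int, int, int]:
    """Pick a random (left, top, right, bottom) box of side `crop` inside a size x size image."""
    left = rng.randint(0, size - crop)   # randint is inclusive on both ends
    top = rng.randint(0, size - crop)
    return left, top, left + crop, top + crop


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
box = random_crop_box(rng)
```

Each epoch then sees a slightly shifted view of every meme, with at most 31 pixels of offset along each axis.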

Experiments and Results (III). Capacity Reduction
● No data Augmentation
● Hidden size = 50 (not 100)
● Result: Accuracy 82%

Experiments and Results (IV). Dropout
● Hidden size = 100
● Dropout:
○ All the MLP layers (p=0.5)
● Result: Accuracy 81%
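For reference, dropout with probability p zeroes each activation with probability p during training and is the identity at evaluation time. The sketch below uses the inverted-dropout scaling that `torch.nn.Dropout` also applies; it is a plain-Python illustration with a hypothetical function name, not the code used in the experiments.

```python
import random
from typing import List


def dropout(values: List[float], p: float, training: bool, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each value with probability p, rescale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)          # identity at evaluation time
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]
```

With p=0.5 every surviving activation is doubled, so the expected value of each unit is unchanged between training and evaluation.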

Experiments and Results (V). Dropout
● Hidden size = 50
● Dropout:
○ First MLP layer (p=0.2)
● Result: Accuracy 81.7%

Experiments and Results (VI).
Regularization Summary:
● Baseline: 82.6%. Overfitting
● Data augmentation (random cropping): 81%. Overfitting
● Capacity reduction: 82%. Overfitting
● Dropout:
○ All the MLP layers, p=0.5: 81%. Random forgetting
○ First MLP layer, hidden size 50, p=0.2: 81.7%. No overfitting

Multimodal Fusion. Mono-mode systems
Dataset lower bound!

Fine-tuning the descriptors (I). BERT
Figure: text-only classifier, with and without BERT fine-tuning.

Fine-tuning the descriptors (II). BERT & VGG
After unfreezing BERT and VGG's classifier (top layers), we obtained an accuracy of 83.0%.

Fine-tuning the descriptors (III). BERT & VGG
Progressive fine-tuning: we unfreeze the weights at epoch X.
Figure: (a) validation accuracy and (b) validation loss.
Blue: no fine-tuning. Light blue: fine-tuning from epoch 10 (accuracy 83.7%). Pink: fine-tuning from
epoch 50 (accuracy 84.3%).
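The schedule can be sketched as a per-epoch decision about which parts of the model receive gradient updates. The component names and the 0-indexed epoch convention are assumptions; in PyTorch the flags would typically be applied by setting `requires_grad` on the corresponding parameters.

```python
from typing import Dict


def trainable_flags(epoch: int, unfreeze_epoch: int) -> Dict[str, bool]:
    """Which parts are updated at a given epoch (0-indexed) under progressive fine-tuning."""
    descriptors_trainable = epoch >= unfreeze_epoch
    return {
        "mlp_classifier": True,                   # trained from the start
        "bert": descriptors_trainable,            # frozen until unfreeze_epoch
        "vgg_top_layers": descriptors_trainable,  # frozen until unfreeze_epoch
    }
```

Setting `unfreeze_epoch` to 10 or 50 corresponds to the two fine-tuned curves described above.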

Fine-tuning the descriptors (IV). Summary

Failed experiments (I). Unsupervised Pretraining
Figure: architecture for the unsupervised task (image + text matching).
We downloaded 1500 unlabelled images and kept them separate from the labelled data.
We were not able to learn anything from this task (50% accuracy).
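The slides do not detail how the matching task was built. One plausible construction, shown here purely as an assumption, pairs each image with its own OCR text (label 1) and with the text of a different meme (label 0). On such a balanced set, 50% accuracy is chance level.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str, int]  # (image_id, text, label): 1 = text belongs to image, 0 = mismatched


def make_matching_pairs(items: List[Tuple[str, str]], rng: random.Random) -> List[Pair]:
    """For each (image_id, text) item emit one true pair and one pair with another item's text."""
    if len(items) < 2:
        raise ValueError("need at least two items to build mismatched pairs")
    pairs: List[Pair] = []
    for idx, (image_id, text) in enumerate(items):
        pairs.append((image_id, text, 1))
        other = rng.randrange(len(items) - 1)  # pick among the other items
        if other >= idx:
            other += 1                         # skip the item itself
        pairs.append((image_id, items[other][1], 0))
    return pairs
```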

Failed experiments (II). Introducing expert knowledge
We compiled a list of 12 words that can potentially signal hate speech. We encode the
presence of each of these words in the OCR-extracted text as a 12-dimensional binary vector
and concatenate it with the image and text features.
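A sketch of the keyword feature follows. The slides do not list the 12 words, so the vocabulary below consists of placeholders, and the whole-word matching rule and function names are assumptions.

```python
import re
from typing import List, Sequence

# Placeholder vocabulary: stand-ins for the 12 words, which the slides do not list.
KEYWORDS: List[str] = [f"keyword{k}" for k in range(1, 13)]


def keyword_presence(ocr_text: str, keywords: Sequence[str] = KEYWORDS) -> List[float]:
    """Return one entry per keyword: 1.0 if it occurs as a whole word in the text, else 0.0."""
    tokens = set(re.findall(r"[a-z0-9']+", ocr_text.lower()))
    return [1.0 if kw in tokens else 0.0 for kw in keywords]


def fuse(image_emb: Sequence[float], text_emb: Sequence[float], kw_vec: Sequence[float]) -> List[float]:
    """Concatenate image features, text features and the keyword vector."""
    return list(image_emb) + list(text_emb) + list(kw_vec)
```

The fused vector grows by 12 entries, so the first layer of the classifier must be widened accordingly.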

Qualitative analysis (I). Best predictions

Qualitative analysis (II). Worst predictions

Further work
● Dataset
○ Poor annotation
○ Probably visually biased
○ Small
● Descriptors
○ XLNet Models
○ Expert knowledge
● Better ways of fusing multimodal embeddings
● OCR extraction

Conclusions
● Accuracy up to 84.4%
● Explored regularization techniques
● The unsupervised pre-training task we tried did not help
● Poor dataset
● Need to find a way to introduce expert knowledge
