Human Emotions based on Facial Expression using CNN
1. Human Emotions based on Facial Expression using CNN
Big Data Ecosystems Spring 17 - Course Project
Team 12 - Happy Gator
Chaitanya Maddala, Vineeth Kamisetty, Sandeep Basavaraju
Department of CISE
University of Florida
2. Introduction
Humans communicate through several channels, such as speech, gestures, and facial expressions.
Understanding a person's emotion from facial expression is challenging compared to speech and gestures.
The input to our system is an image of a facial expression; the model then predicts the emotion.
Goal: give an artificial neural network the capability to interpret human facial
expressions, that is, to recognize one of six categories of human emotion (Angry,
Fear, Happy, Sad, Surprise, Neutral).
3. Applications
Surveillance and behavioural classification by law enforcement
Automatic camera capture when a person smiles
Humanization of artificially intelligent systems
4. Literature Survey
ImageNet classification with deep convolutional neural networks.
A landmark paper in the history of deep learning by Krizhevsky, Sutskever,
and Hinton on image classification, in which a neural network with 5
convolutional, 3 max-pooling, and 3 fully connected layers was developed.
Trained on 1.2 million images from the ImageNet LSVRC-2010 contest.
Obtained a top-1 error rate of 37.5%, the best reported at the time.
It demonstrated the capability of CNNs, along with max pooling and techniques
to reduce overfitting, such as dropout.
5. Recognizing semantic features in faces using deep learning
One of the most recent studies on emotion recognition describes a neural
network able to recognize race, age, gender, and emotion from pictures of
faces.
The dataset used for the last category is from the Facial Expression
Recognition Challenge (FERC-2013).
A deep neural network consisting of 3 convolutional layers, 1 fully connected
layer, and some small layers in between obtained an average accuracy of
67% on emotion classification,
equal to previous state-of-the-art publications on that dataset.
6. DataSets
● FERC (Facial Expression Recognition Challenge) 2013:
○ 28,000+ face samples of training data
○ 4,000+ test images
● RaFD (Radboud Faces Database): high-quality images converted to grayscale
and resized to the model input size, i.e., 48x48
7. Image preprocessing
We used OpenCV to capture the live image.
We used the Haar-cascade image processing technique to detect faces.
We found situations where faces in the live image were not detected due to
lack of contrast,
so we applied histogram equalization to improve detection by increasing
contrast.
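Histogram equalization stretches the intensity histogram across the full 0-255 range (OpenCV provides this as cv2.equalizeHist); a minimal NumPy sketch of the idea, not the project's exact code:

```python
import numpy as np

def equalize_hist(img):
    """Histogram-equalize an 8-bit grayscale image (HxW uint8 array)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]            # first nonzero CDF value
    total = img.size
    # Map each gray level through the normalized cumulative histogram
    lut = np.clip(np.round((cdf - cdf_min) / max(total - cdf_min, 1) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# A low-contrast gradient (levels 100-139) gets stretched to span 0-255
low = np.tile(np.arange(100, 140, dtype=np.uint8), (48, 1))
eq = equalize_hist(low)
```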
8. Image preprocessing
● Haar-cascade:
○ Face detection using a Haar cascade is based on training a binary classifier with a
number of positive images that contain the object to be recognized (e.g., faces of
different persons against different backgrounds) and an even larger number of negative
images that do not contain it (images that are not faces but can be anything else, such
as chairs, tables, or walls).
[Figure: actual image alongside the extracted face]
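Under the hood, Viola-Jones-style Haar cascades evaluate their rectangle features in constant time using integral images (summed-area tables); a sketch of that idea, not OpenCV's actual implementation:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in O(1) via four corner lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# A two-rectangle Haar-like feature: left half minus right half of a window
img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
feat = rect_sum(ii, 0, 0, 4, 2) - rect_sum(ii, 0, 2, 4, 2)
```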
9. TFLearn
TFLearn is a modular and transparent deep learning library built on top of TensorFlow. TFLearn is a
high-level API for fast neural network building and training.
Layers: Defining a model entirely in TensorFlow is time-consuming and repetitive; TFLearn has
"layers" that represent an abstract set of operations, which makes building neural networks more
convenient.
TensorFlow:
with tf.name_scope('conv1'):
    W = tf.Variable(tf.random_normal([5, 5, 1, 32]), dtype=tf.float32, name='Weights')
    b = tf.Variable(tf.random_normal([32]), dtype=tf.float32, name='biases')
    x = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    x = tf.nn.relu(x)
TFLearn:
tflearn.conv_2d(x, 32, 5, activation='relu', name='conv1')
10. Training, Evaluating & Predicting: TensorFlow does not have a pre-built API to train a
network, but TFLearn has a set of functions that can handle any neural network training, regardless of the
number of inputs, outputs, and optimizers. TFLearn provides a model wrapper ('DNN') that automatically
performs neural network classifier tasks such as training, prediction, and save/restore.
network = ....
network = regression(network, optimizer='sgd', loss='categorical_crossentropy')
model = DNN(network)
model.fit(X, Y)
Scopes & weight sharing: TFLearn makes it easy to share variables among multiple layers and
hence is suitable for distributed training. It supports a 'scope' argument; layers with the same scope name
will share the same weights.
def my_model(x):
    x = tflearn.fully_connected(x, 32, scope='fc1')
    x = tflearn.fully_connected(x, 32, scope='fc2')
14. Models:
Comparison of the different final models (layers listed in order):
Model A:   Conv(3x32)+Conv(5x32), Conv(3x64)+Conv(5x64), Conv(3x128)+Conv(5x128), fc(1024), fc(1024)
Model B:   Conv(5x32), Conv(5x64), Conv(5x128), fc(1024), fc(1024)
Model C:   Conv(5x64), Conv(5x64), Conv(5x128), fc(1024), fc(1024)
Model D:   Conv(5x64), Conv(5x64), Conv(5x64), Conv(4x128), Conv(4x128), fc(3072)
Model res: Conv(5x64), Residual_bottleneck(3,16,64), Residual_bottleneck(1,32,128), Residual_bottleneck(2,32,128), Residual_bottleneck(1,64,256), Residual_bottleneck(2,64,256), fc(3072)
15. Experimental Results: FERC
Model      Training Acc.   Validation Acc.   Testing Acc. (Top-1 & Top-2)
Model A    51.4            48.7              48.67% & 68.07%
Model B    45.38           45.50             43.91% & 61.85%
Model C    71.15           59.68             60.1%  & 78.37%
Model D    57              51                49.18% & 69.10%
Model res  69              59.35             58.15% & 75.24%
16. Experimental Results: RaFD
Model      Complete RaFD (1400 images), Top-1 & Top-2
Model C    59.12% & 82.88%

Model                                    Test dataset (400 RaFD images), Top-1 & Top-2
Model C                                  63.34% & 83.29%
Model C + RaFD-trained (1000 images)     91.15% & 98.52%
17. Accuracy of existing models
The network proposed by Krizhevsky and Hinton consists of three convolutional
layers and two fully connected layers, combined with max-pooling layers to
reduce the image size and a dropout layer to reduce the chance of
overfitting.
The final accuracy on the FERC testing data is around 71% as of 2016.
18. Training graphs
[Figures: training accuracy and validation accuracy curves for all 5 models (A, B, C, D, resnet)]
19. Performance Evaluation
The trained model is tested with the 3,500-image test data set provided by
FERC-2013.
The model is also tested on RaFD frontal faces, and the accuracy results
obtained are …
Top-1 and Top-2 accuracy results on the testing set are recorded and compared
to evaluate the trained model.
Top-2 accuracy on FERC test data: 78.97%
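For reference, the Top-1/Top-2 metrics reported here can be computed from the model's per-class scores as follows (a generic sketch, not the project's exact evaluation code):

```python
import numpy as np

def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(probs, axis=1)[:, -k:]
    return np.mean([labels[i] in topk[i] for i in range(len(labels))])

# Three samples, three classes: only the second is Top-1 correct,
# but all three have the true label within the two highest scores.
probs = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3],
                  [0.2, 0.3, 0.5]])
labels = np.array([2, 0, 1])
top1 = top_k_accuracy(probs, labels, 1)
top2 = top_k_accuracy(probs, labels, 2)
```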
20. Results and Summary
In addition to results on the FERC-2013 test data set, results from other
datasets are also included.
A comparison across multiple datasets is shown, i.e., training and testing on
different datasets.
22. Prediction Matrix:
This gives us insight into confusions
across emotions.
From the figure, we can infer that fear is
confused with sadness.
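A prediction (confusion) matrix of this kind can be built from true and predicted labels; a small sketch, with the six-emotion ordering from the introduction assumed as the class indexing:

```python
import numpy as np

EMOTIONS = ['Angry', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

def prediction_matrix(true, pred, n_classes=6):
    """Rows = true emotion, columns = predicted emotion."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(m, (true, pred), 1)        # count each (true, pred) pair
    return m

# Fear (index 1) mistaken for Sad (index 3) shows up off-diagonal in row 1
true = np.array([1, 1, 2, 3])
pred = np.array([3, 1, 2, 3])
m = prediction_matrix(true, pred)
```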
23. Model A vs Model C
Prediction matrix for Model A on the FERC dataset.
Prediction matrix for Model C on the FERC dataset.
24. Analyzing features extracted
in between layers
[Figure: activation map for Surprise]
[Figure: activation map for Happy]
25. Final Demo :
● An application that recognizes emotion in real time, capturing images
using OpenCV.
● Capture live images from the video frame and format them to 48x48-pixel grayscale.
● The image is sent to the model for prediction, and the predicted emotion is output to
the video frame.
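The frame-formatting step above can be sketched as follows; the demo itself would use OpenCV routines, so the to_model_input helper, the BT.601 luminance weights, and the nearest-neighbor resize here are illustrative assumptions:

```python
import numpy as np

def to_model_input(frame, size=48):
    """RGB uint8 frame (HxWx3) -> size x size grayscale array in [0, 1]."""
    gray = frame @ np.array([0.299, 0.587, 0.114])   # ITU-R BT.601 luminance
    h, w = gray.shape
    ys = np.arange(size) * h // size                 # nearest-neighbor row picks
    xs = np.arange(size) * w // size                 # nearest-neighbor col picks
    return gray[np.ix_(ys, xs)] / 255.0              # scale to [0, 1]

# A mock 240x320 camera frame becomes a 48x48 model input
frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
x = to_model_input(frame)
```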
27. References:
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep
convolutional neural networks. In Advances in Neural Information Processing
Systems.
Kaggle. Challenges in representation learning: Facial expression recognition
challenge, 2013.
Y. Lv, Z. Feng, and C. Xu. Facial expression recognition via deep learning. In
Smart Computing (SMARTCOMP), 2014 International Conference on.
TFLearn: Deep learning library featuring a higher-level API for TensorFlow.
URL: http://tflearn.org/.