The document describes a study that used deep learning algorithms to classify workload levels based on electroencephalography (EEG) data. Five deep learning models - artificial neural networks, support vector machines, radial basis function, linear discriminant analysis, and stacked autoencoders - were trained on EEG features extracted from subjects performing high, medium, and low workload tasks. The trained models achieved accurate classification of workload levels based on new EEG data, demonstrating the potential of using deep learning with EEG for workload monitoring.
CERTIFICATE
This is to certify that the project entitled ‘Brain Informatics Using Deep
Learning’ is the bonafide work of Vikramank Singh conducted in the Biomedical
Engineering Department of the Institute of Nuclear Medicine and Allied Sciences,
DRDO, Delhi under the supervision and guidance of Mr. Sushil Chandra, Scientist
‘F’.
Sh. Sushil Chandra
Scientist ‘F’
Biomedical Engg. Department
INMAS (DRDO)
ACKNOWLEDGEMENT
I hereby take this opportunity to express my sincere gratitude to all the people
who have contributed with their knowledge and experience in aiding me with my
project. Without them, it would have been quite a difficult task for me to complete this work.
I am thankful to Mr. Sushil Chandra, Scientist ‘F’ & Head, B.M.E. Deptt., INMAS
(DRDO) for coordinating this training and giving me an invaluable opportunity to
work in a competitive yet amicable atmosphere and providing me with all the
facilities and paraphernalia required to carry out this project. His profound
knowledge and understanding provided me with an entirely new perspective on
my project. It was always a new and unique experience working with him.
I would like to express my gratitude towards Mrs. Greeshma Sharma, my project
monitor, for her worthwhile suggestions and fruitful help, and also for all the
knowledge she imparted to me during the course of the project.
Finally I would like to express my deep appreciation to my family and friends
who have been a constant source of inspiration. I am eternally grateful to them
for always encouraging me and being with me wherever and whenever I needed
them.
About the Organization
Defense Research and Development Organization (DRDO)
DRDO was formed in 1958 from the amalgamation of the then already
functioning Technical Development Establishment (TDEs) of the Indian Army and
the Directorate of Technical Development & Production (DTDP) with the Defense
Science Organization (DSO). DRDO was then a small organization with 10
establishments or laboratories. Over the years, it has grown multi-directionally in
terms of the variety of subject disciplines, number of laboratories and achievements.
Today, DRDO is a network of more than 50 laboratories which are deeply
engaged in developing defense technologies covering various disciplines, like
aeronautics, armaments, electronics, combat vehicles, engineering systems,
instrumentation, missiles, advanced computing and simulation, special materials,
naval systems, life sciences, training, information systems and agriculture.
Presently, the Organization is backed by over 5000 scientists and about 25,000
other scientific, technical and supporting personnel. Several major projects for
the development of missiles, armaments, light combat aircraft, radars, electronic
warfare systems etc. are on hand, and significant achievements have already
been made in several such technologies.
Institute of Nuclear Medicine and Allied Sciences (INMAS)
At the instance of Pandit Jawaharlal Nehru, the first Prime Minister of India, a
Radiation Cell was established in 1956 at Defence Science Laboratory, Delhi. The
initial assignment was to undertake a study on the consequences of the use of
nuclear and other weapons of mass destruction. But it was soon realized that
nuclear energy can also be harnessed for the good of mankind.
Radioisotopes could find peaceful medical applications. The scope of work was,
therefore, enlarged and the cell upgraded to Radiation Medicine Division in 1959.
As awareness increased, so did the work and a full-fledged establishment was
created in 1961 and named Institute of Nuclear Medicine and Allied Sciences.
Since then it has traversed a long way, carrying out R&D and providing service as
a model of excellence in various aspects of Nuclear Medicine and Allied Sciences.
The activities of the Institute have proliferated enormously over the years. Its
areas of activity have been diversified to cover many fields of radiation and
bio-medical sciences.
Vision
The Vision of INMAS is to be a centre of excellence in
biomedical and clinical research with special reference to ionizing radiation.
Mission
The Mission of INMAS is clinical research in nuclear medicine and non-invasive
imaging methods with a focus on biological radio-protectors and thyroid
disorders.
Basic Background and Theory
Project Background
Institute of Nuclear Medicine and Allied Sciences (INMAS), a wing of Defence
Research and Development Organization (DRDO), is currently in the third year of
its four-year project “Cognition Enhancement using Non-Invasive Interventions”.
This project would not only benefit the training regimen for defence personnel by
enhancing their reasoning, attention, planning, decision making, memory
and sensory input processing abilities, but would also contribute to the treatment
of cognitive disorders like ADD and ADHD, executive dysfunction in stroke
patients, autism and cognitive skill degradation due to natural ageing.
Fig 1: Research at BME, INMAS
Brain Informatics Using Deep Learning
Final Research Report
4th February 2016
1. Abstract
Electroencephalography (EEG) technology has gained growing popularity in
various applications. In this report we propose a deep learning based automated
system which can classify workload into 3 categories - High, Medium and Low -
using the electroencephalographic (EEG) signals acquired by an inexpensive EEG
device (Emotiv EEG). Workload is a critical factor influencing the performance of
an individual in any field, from research and corporate jobs to army personnel.
In this study, a 14 channel EEG was used to acquire the brain signals while the
subjects performed tasks that were grouped by the workload they impose on an
individual. The acquired signals were then passed to various deep learning
algorithms as training sets. The trained models were then used to classify the
workload on an individual by acquiring that individual's EEG signals and passing
them through the models.
Keywords: Deep Learning, Artificial Neural Networks, Radial Basis Function,
Support Vector Machines (SVM), Stacked Autoencoders, Linear
Discriminant Analysis (LDA), EEG, EEG Feature Extraction
2. Introduction
In this research work we trained five deep learning algorithms and compared
their results to determine which algorithm best suited our data. The Emotiv EEG
machine was used to gather the 14 channel data. Electroencephalographic data
contains a lot of noise and other artifacts which, if fed directly into the
algorithms as training data, can produce aberrant results. Hence, the acquired
EEG data was treated with various digital signal processing techniques to filter
out the noise and make the signal as clean as possible.
Various noise reduction filters were applied to eliminate the noise from the data
as far as possible. The filtered data was then passed through a Butterworth filter
in order to perform feature extraction of the EEG signals. The Alpha, Beta, Gamma,
Delta and Theta features were extracted from the EEG signals based on their
frequencies. These features were then used as the input training sets to
train the various deep learning algorithms. The five deep learning algorithms
used were Artificial Neural Networks (ANNs), Support Vector Machines (SVM),
Radial Basis Function (RBF) networks, Linear Discriminant Analysis (LDA) and
Stacked Autoencoders. We go through each algorithm in detail below.
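The idea of frequency-based feature extraction can be sketched as follows. This is an illustrative Python sketch, not the R code used in the study. It estimates band power with a plain discrete Fourier transform rather than the Butterworth filtering described above, and the band edges and the 128 Hz sampling rate are assumed values that the report does not state.

```python
import math

# Conventional band edges in Hz (assumed; the report does not list them).
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 45.0)}

def band_powers(signal, fs):
    """Return the mean DFT power per band for one channel (naive O(n^2) DFT)."""
    n = len(signal)
    powers = {name: [] for name in BANDS}
    for k in range(1, n // 2):
        freq = k * fs / n
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
        power = (re * re + im * im) / n
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name].append(power)
    return {name: (sum(v) / len(v) if v else 0.0) for name, v in powers.items()}

fs = 128  # assumed sampling rate in Hz
# Synthetic one-second signal: a pure 10 Hz sine, which lies in the alpha band.
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = band_powers(signal, fs)
dominant = max(features, key=features.get)
```

For a real recording, the five values in `features` would be computed per channel and per time window before being passed to a classifier.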
Once the models were trained and the classification was performed, the next step
in the study was to discern any correlation between the various features of the EEG
signals for all three workload cases. We also tested for significant
differences between the features under each workload condition using
various statistical methods.
2.1 Artificial Neural Networks
The first deep learning model that we made use of was the Artificial Neural
Network. We developed a deep neural network consisting of 1 hidden layer with
8 hidden neurons. The inputs to the network were the 14 channel EEG signals, and
thus the input layer consisted of 14 neurons. The output that we wanted was a
classifier which could classify, on the basis of EEG signals, the workload into 3
categories, and hence the output layer consisted of 3 neurons.
The figure below shows the structure of the artificial neural network.
Fig 1 - Deep Neural Network (14, 8, 3)
The 14 input neurons represent the 14 EEG channels - AF3, F7, F3, FC5, T7, P7, O1,
O2, P8, T8, FC6, F4, F8 and AF4. The 3 output neurons represent BL (Base
Line, i.e. no workload), LWL (Low Workload) and HWL (High Workload). We
used the R programming language for the entire research work, and the neural
network shown above was also coded in R. We used the Resilient
Backpropagation technique (Rprop+) to train the deep neural net.
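A forward pass through a (14, 8, 3) network can be sketched as below. This is a Python illustration under stated assumptions, not the R implementation from the study: the logistic activation, the random initial weights and the synthetic input sample are all placeholders, and no training (Rprop or otherwise) is performed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer with a logistic activation (assumed)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Small random weights; a trained network would learn these values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
w_hidden, b_hidden = init_layer(14, 8, rng)   # 14 EEG channels -> 8 hidden neurons
w_out, b_out = init_layer(8, 3, rng)          # 8 hidden neurons -> 3 workload classes

sample = [rng.random() for _ in range(14)]    # one normalized 14-channel sample (synthetic)
hidden = dense(sample, w_hidden, b_hidden)
scores = dense(hidden, w_out, b_out)
labels = ["BL", "LWL", "HWL"]
predicted = labels[scores.index(max(scores))]
```

The class whose output neuron has the largest activation is taken as the prediction.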
In order to train the deep neural network, we first needed to normalize the entire
input data set. We made use of the normalization function available in the RSNNS
package on CRAN. The testing data was also normalized before being fed into the
network for testing. The classification output obtained was therefore in a
normalized form, and we had to denormalize it using the denormalization function
available in the same package. The denormalized values thus obtained were the
actual values which represented whether the workload is Base, Low or High.
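The normalize and denormalize round trip can be illustrated with a minimal Python sketch. It assumes min-max scaling to the range [0, 1]; the report does not say which scaling was used, and the function names and sample values here are hypothetical rather than the RSNNS API.

```python
def fit_min_max(column):
    """Record the range of one feature column from the training data."""
    return min(column), max(column)

def normalize(values, params):
    low, high = params
    return [(v - low) / (high - low) for v in values]

def denormalize(values, params):
    low, high = params
    return [v * (high - low) + low for v in values]

train = [4100.0, 4250.0, 4175.0, 4300.0]   # synthetic readings for one channel
test = [4200.0, 4150.0]

params = fit_min_max(train)                # parameters come from the training data only
train_scaled = normalize(train, params)
test_scaled = normalize(test, params)      # test data reuses the training parameters
restored = denormalize(test_scaled, params)
```

Reusing the training parameters for the test data keeps both sets on the same scale, so the denormalized values can be compared directly.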
The data that we had was from 10 students, which we divided in a ratio of
8:2 for training and testing. We trained the neural net
with the EEG data of 8 students and then tested it with the data of the
remaining 2 students.
The input / training data which we fed into the neural net was as shown below.
The class labels for the 14 channel EEG samples were transformed into a binary
matrix with 3 columns, in a format such as (1,0,0), signifying that each sample
can belong to only one of the 3 cases. Hence, when the output
of the neural net was denormalized using the denormalization function, the
outputs of the 3 neurons were in the same format, e.g. (0,1,0), as the labels in
the training data set.
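The binary label format can be sketched in Python as follows. The column order (BL, LWL, HWL) and the example output values are assumptions made for illustration.

```python
CLASSES = ["BL", "LWL", "HWL"]   # assumed column order

def one_hot(label):
    """Encode a class label as a binary row with a single 1."""
    return [1 if label == c else 0 for c in CLASSES]

def decode(outputs):
    """Map three network outputs back to the class with the largest value."""
    return CLASSES[max(range(len(outputs)), key=lambda i: outputs[i])]

targets = [one_hot(label) for label in ["BL", "HWL", "LWL"]]
prediction = decode([0.08, 0.81, 0.11])   # synthetic network output
```

Decoding by the largest output handles the case where the network returns values close to, but not exactly, 0 and 1.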
2.2 Support Vector Machines
Support Vector Machines are based on the concept of decision planes that define
decision boundaries. A decision plane is one that separates sets of
objects having different class memberships. Classifiers that draw
separating lines to distinguish between objects of different class
memberships are known as hyperplane classifiers. Support Vector Machines are
particularly suited to handle such tasks.
The illustration below shows the basic idea behind Support Vector Machines.
Here we see the original objects (left side of the schematic) mapped, i.e.,
rearranged, using a set of mathematical functions known as kernels. The process
of rearranging the objects is known as mapping (transformation). Note that in
this new setting, the mapped objects (right side of the schematic) are linearly
separable and, thus, instead of constructing the complex curve (left schematic),
all we have to do is to find an optimal line that can separate the GREEN and the
RED objects.
In our case, we used an SVM as one of the classification models to
classify the workloads, with a kernel function for the classification. In R,
the SVM was trained with the kernel type set to “Radial”. The output of the SVM
was about as accurate as that of the ANN.
The SVM was trained on the same dataset that was used to train the Artificial
Neural Network.
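The radial kernel mentioned above is commonly defined as exp(-gamma * ||x - y||^2). The Python sketch below evaluates that function on synthetic points; the value of gamma is an assumption, since the report does not give the kernel parameters.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (radial) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

gamma = 0.5                      # assumed value; the report does not state it
a = [0.2, 0.4, 0.1]
b = [0.3, 0.1, 0.5]
same = rbf_kernel(a, a, gamma)   # identical points give the maximum similarity
near = rbf_kernel(a, b, gamma)
far = rbf_kernel(a, [5.0, 5.0, 5.0], gamma)
```

The kernel value falls from 1 towards 0 as two samples move apart, which is what lets the SVM draw a non-linear boundary in the original feature space.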
2.3 Stacked Autoencoders (SDAs)
A stacked autoencoder is a neural network consisting of multiple layers of sparse
autoencoders in which the outputs of each layer are wired to the inputs of the
successive layer. Formally, consider a stacked autoencoder with n layers. Let
$W^{(k,1)}, W^{(k,2)}, b^{(k,1)}, b^{(k,2)}$ denote the parameters
$W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$ (the encoding and decoding weights and
biases) of the kth autoencoder, and let $f$ be the activation function. Then the
encoding step for the stacked autoencoder is given by running the encoding step
of each layer in forward order:
$$a^{(l)} = f(z^{(l)})$$
$$z^{(l+1)} = W^{(l,1)} a^{(l)} + b^{(l,1)}$$
The decoding step is given by running the decoding stack of each autoencoder in
reverse order:
$$a^{(n+l)} = f(z^{(n+l)})$$
$$z^{(n+l+1)} = W^{(n-l,2)} a^{(n+l)} + b^{(n-l,2)}$$
The information of interest is contained within $a^{(n)}$, which is the activation of the
deepest layer of hidden units. This vector gives us a representation of the input in
terms of higher-order features.
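Running the encoders in forward order and the decoders in reverse order can be sketched in Python as below. The layer sizes (4, 3, 2), the constant weights and the logistic activation are placeholders chosen for illustration; a real stacked autoencoder would learn its weights through training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine_sigmoid(weights, biases, a):
    """Compute f(W a + b) for one layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, a)) + b)
            for row, b in zip(weights, biases)]

def constant_matrix(rows, cols, value):
    return [[value] * cols for _ in range(rows)]

# Layer sizes 4 -> 3 -> 2; each entry holds (W_enc, b_enc, W_dec, b_dec)
# for one autoencoder in the stack (untrained placeholder weights).
sizes = [4, 3, 2]
stack = []
for n_in, n_out in zip(sizes, sizes[1:]):
    stack.append((constant_matrix(n_out, n_in, 0.1), [0.0] * n_out,
                  constant_matrix(n_in, n_out, 0.1), [0.0] * n_in))

def encode(x):
    a = x
    for w_enc, b_enc, _, _ in stack:            # forward order
        a = affine_sigmoid(w_enc, b_enc, a)
    return a

def decode(code):
    a = code
    for _, _, w_dec, b_dec in reversed(stack):  # reverse order
        a = affine_sigmoid(w_dec, b_dec, a)
    return a

code = encode([0.2, 0.4, 0.6, 0.8])
reconstruction = decode(code)
```

Here `code` plays the role of the deepest hidden activation, and `reconstruction` has the same length as the input.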
A good way to obtain good parameters for a stacked autoencoder is to use greedy
layer-wise training. To do this, first train the first layer on raw input to obtain
parameters $W^{(1,1)}, W^{(1,2)}, b^{(1,1)}, b^{(1,2)}$. Use the first layer to
transform the raw input into a vector consisting of the activations of the hidden
units, A. Train the second layer on this vector to obtain parameters
$W^{(2,1)}, W^{(2,2)}, b^{(2,1)}, b^{(2,2)}$. Repeat for subsequent
layers, using the output of each layer as input for the subsequent layer.
This method trains the parameters of each layer individually while freezing
parameters for the remainder of the model. To produce better results, after this
phase of training is complete, fine-tuning using backpropagation can be used to
improve the results by tuning the parameters of all layers at the
same time.
A stacked autoencoder enjoys all the benefits of any deep network of greater
expressive power.
Further, it often captures a useful "hierarchical grouping" or "part-whole
decomposition" of the input. To see this, recall that an autoencoder tends to learn
features that form a good representation of its input. The first layer of a stacked
autoencoder tends to learn first-order features in the raw input (such as edges in
an image). The second layer of a stacked autoencoder tends to learn second-order
features corresponding to patterns in the appearance of first-order features (e.g.,
in terms of what edges tend to occur together--for example, to form contour or
corner detectors). Higher layers of the stacked autoencoder tend to learn even
higher-order features.
The training and testing process for the Stacked Autoencoder was much the
same as that for the ANN. Initially, the training dataset was normalized and then
fed into the neural net. The output thus obtained was in a normalized form, and it
was necessary to denormalize it to get it into a usable form. However, the
output of the SDA was not as accurate as that of the ANN
and SVM.
2.4 Radial Basis Function (RBF)
In the field of mathematical modeling, a radial basis function network is an
artificial neural network that uses radial basis functions as activation functions.
The output of the network is a linear combination of radial basis functions of the
inputs and neuron parameters.
Radial basis function (RBF) networks typically have three layers: an input layer, a
hidden layer with a non-linear RBF activation function and a linear output layer.
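The input can be modeled as a vector of real numbers $\mathbf{x} \in \mathbb{R}^n$. The output of the
network is then a scalar function of the input vector, $\varphi : \mathbb{R}^n \to \mathbb{R}$, and is given
by
$$\varphi(\mathbf{x}) = \sum_{i=1}^{N} w_i \, \rho(\lVert \mathbf{x} - \mathbf{c}_i \rVert)$$
where $N$ is the number of neurons in the hidden layer, $\mathbf{c}_i$ is the center vector of
neuron $i$, $w_i$ is its weight in the linear output layer, and $\rho$ is the radial basis function.
RBF networks are typically trained by a two-step algorithm. In the first step, the
center vectors $\mathbf{c}_i$ of the RBF functions in the hidden layer are chosen. This step
can be performed in several ways; centers can be randomly sampled from some
set of examples, or they can be determined using k-means clustering. Note that
this step is unsupervised.
The second step simply fits a linear model with coefficients $w_i$ to the hidden
layer's outputs with respect to some objective function. A third backpropagation
step can be performed to fine-tune all of the RBF net's parameters.[3] A common
objective function for the second step, at least for regression/function
estimation, is the least squares function, summed over the training samples $t$:
$$K(\mathbf{w}) = \sum_{t} K_t(\mathbf{w})$$
where
$$K_t(\mathbf{w}) = \big[ y(t) - \varphi(\mathbf{x}(t), \mathbf{w}) \big]^2 .$$
We have explicitly included the dependence on the weights. Minimization of the
least squares objective function by optimal choice of weights optimizes accuracy
of fit.
There are occasions in which multiple objectives, such as smoothness as well as
accuracy, must be optimized. In that case it is useful to optimize a regularized
objective function such as
$$H(\mathbf{w}) = K(\mathbf{w}) + \lambda S(\mathbf{w}) = \sum_{t} H_t(\mathbf{w})$$
where
$$S(\mathbf{w}) = \sum_{t} S_t(\mathbf{w})$$
and
$$H_t(\mathbf{w}) = K_t(\mathbf{w}) + \lambda S_t(\mathbf{w})$$
where optimization of $S$ maximizes smoothness and $\lambda$ is known as a
regularization parameter.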
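The network output and the least squares objective can be sketched in Python as follows. The sketch assumes a Gaussian radial basis function with a width parameter `beta`; the centers, weights and samples are synthetic, and the weights are set by hand rather than fitted.

```python
import math

def rbf_output(x, centers, weights, beta):
    """phi(x) = sum_i w_i * exp(-beta * ||x - c_i||^2), a Gaussian RBF network."""
    total = 0.0
    for c, w in zip(centers, weights):
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, c))
        total += w * math.exp(-beta * dist_sq)
    return total

def sum_squared_error(samples, targets, centers, weights, beta):
    """K(w): sum over samples of [y(t) - phi(x(t), w)]^2."""
    return sum((y - rbf_output(x, centers, weights, beta)) ** 2
               for x, y in zip(samples, targets))

centers = [[0.0, 0.0], [1.0, 1.0]]   # e.g. chosen by sampling or k-means
weights = [1.0, -1.0]
beta = 50.0                          # narrow Gaussians (assumed width)

samples = [[0.0, 0.0], [1.0, 1.0]]
targets = [1.0, -1.0]
sse = sum_squared_error(samples, targets, centers, weights, beta)
```

With narrow Gaussians each center responds almost only to its own sample, so these hand-picked weights already reproduce the targets and the error is close to zero.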
In our case, the plot of weighted SSE versus iterations shows a gradual reduction,
which is a positive sign; however, some fluctuations in between show that the
model is still not ideal.
The above diagram shows the SSE versus iteration plot, with the
result shown at the top.
2.5 Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) is a generalization of Fisher's linear
discriminant, a method used in statistics, pattern recognition and machine
learning to find a linear combination of features that characterizes or separates
two or more classes of objects or events. The resulting combination may be used
as a linear classifier, or, more commonly, for dimensionality reduction before
later classification.
LDA is closely related to analysis of variance (ANOVA) and regression analysis,
which also attempt to express one dependent variable as a linear combination of
other features or measurements. However, ANOVA uses categorical independent
variables and a continuous dependent variable, whereas discriminant analysis
has continuous independent variables and a categorical dependent variable (i.e.
the class label).[3]
Logistic regression and probit regression are more similar to
LDA than ANOVA is, as they also explain a categorical variable by the values of
continuous independent variables. These other methods are preferable in
applications where it is not reasonable to assume that the independent variables
are normally distributed, which is a fundamental assumption of the LDA method.
LDA is also closely related to principal component analysis (PCA) and factor
analysis in that they both look for linear combinations of variables which best
explain the data. LDA explicitly attempts to model the difference between the
classes of data. PCA on the other hand does not take into account any difference in
class, and factor analysis builds the feature combinations based on differences
rather than similarities. Discriminant analysis is also different from factor
analysis in that it is not an interdependence technique: a distinction between
independent variables and dependent variables (also called criterion variables)
must be made.
In the case where there are more than two classes, the analysis used in the
derivation of the Fisher discriminant can be extended to find a subspace which
appears to contain all of the class variability. This generalization is due to C. R.
Rao. Suppose that each of C classes has a mean $\mu_i$ and the same covariance $\Sigma$.
Then the scatter between class variability may be defined by the sample
covariance of the class means
$$\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^T$$
where $\mu$ is the mean of the class means. The class separation in a direction $w$ in
this case will be given by
$$S = \frac{w^T \Sigma_b w}{w^T \Sigma w}$$
This means that when $w$ is an eigenvector of $\Sigma^{-1} \Sigma_b$ the separation will be equal
to the corresponding eigenvalue.
If $\Sigma^{-1} \Sigma_b$ is diagonalizable, the variability between features will be contained in
the subspace spanned by the eigenvectors corresponding to the C − 1 largest
eigenvalues (since $\Sigma_b$ is of rank C − 1 at most). These eigenvectors are primarily
used in feature reduction, as in PCA. The eigenvectors corresponding to the
smaller eigenvalues will tend to be very sensitive to the exact choice of training
data, and it is often necessary to use regularisation.
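The between-class scatter and the separation along a direction can be computed as in the Python sketch below. The three class means and the identity covariance are synthetic values chosen so that the result is easy to check by hand; they are not taken from the study.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def between_class_scatter(class_means):
    """Sigma_b = (1/C) * sum_i (mu_i - mu)(mu_i - mu)^T."""
    mu = mean_vector(class_means)
    d = len(mu)
    scatter = [[0.0] * d for _ in range(d)]
    for m in class_means:
        diff = [a - b for a, b in zip(m, mu)]
        for r in range(d):
            for c in range(d):
                scatter[r][c] += diff[r] * diff[c] / len(class_means)
    return scatter

def quadratic_form(w, matrix):
    return sum(w[r] * matrix[r][c] * w[c]
               for r in range(len(w)) for c in range(len(w)))

def separation(w, sigma_b, sigma):
    """S = (w^T Sigma_b w) / (w^T Sigma w)."""
    return quadratic_form(w, sigma_b) / quadratic_form(w, sigma)

class_means = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]   # synthetic means for 3 classes
sigma = [[1.0, 0.0], [0.0, 1.0]]                     # assumed shared covariance
sigma_b = between_class_scatter(class_means)
along = separation([1.0, 0.0], sigma_b, sigma)
across = separation([0.0, 1.0], sigma_b, sigma)
```

Because the class means differ only along the first axis, the separation is large in that direction and zero in the perpendicular one.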
If classification is required, instead of dimension reduction, there are a number of
alternative techniques available. For instance, the classes may be partitioned, and
a standard Fisher discriminant or LDA used to classify each partition. A common
example of this is "one against the rest" where the points from one class are put in
one group, and everything else in the other, and then LDA applied. This will result
in C classifiers, whose results are combined. Another common method is pairwise
classification, where a new classifier is created for each pair of classes (giving C(C
− 1)/2 classifiers in total), with the individual classifiers combined to produce a
final classification.
The LDA plot for the given training dataset is shown below.
3. Correlation and Significance Analysis
In this section we perform a statistical analysis over the features of the EEG
signals to check whether there exists any significant relationship or correlation
between the alpha, beta, gamma and delta components of the EEG signals.
We performed this analysis for each workload class (Base Line, Low and High)
and for each possible pair of components.
Firstly, we performed the one-way ANOVA (Analysis of Variance) test to check for
any significant difference between the values for each class.
The first table shows the ANOVA result for the alpha values and the
beta values for the Base Line class. A difference is considered significant when
the P-value is less than 0.05; on this basis we report a significant difference
between the values of alpha and beta for the Base Line. The same test can be
carried out for each and every class.
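The F statistic behind a one-way ANOVA can be computed as in the Python sketch below. The three groups are synthetic numbers, not the study's data, and the sketch stops at the F statistic; the P-value is obtained by comparing F against the F distribution with the two degrees of freedom returned.

```python
def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

alpha_power = [1.0, 2.0, 3.0]   # synthetic values, not the study's data
beta_power = [2.0, 3.0, 4.0]
gamma_power = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([alpha_power, beta_power, gamma_power])
```

A large F relative to the within-group variation corresponds to a small P-value, which is what indicates a significant difference between the groups.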
The next step is the correlation analysis. We made use of
Pearson’s correlation technique and compared the Pearson coefficients to check
for positive or negative correlation between these components for all the 3
classes.
The first table shows the correlation between all the possible combinations
of components of EEG for the Base Line class. Conclusions about the
relationships between the components can thus be drawn from the above tables.
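Pearson's coefficient for a pair of components can be computed as in the Python sketch below. The band values are synthetic and chosen to give one perfectly positive and one perfectly negative correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

alpha = [1.0, 2.0, 3.0, 4.0]       # synthetic band values
beta = [2.0, 4.0, 6.0, 8.0]        # rises with alpha
theta = [8.0, 6.0, 4.0, 2.0]       # falls as alpha rises
r_positive = pearson_r(alpha, beta)
r_negative = pearson_r(alpha, theta)
```

A coefficient near +1 means the two components rise together, a coefficient near -1 means one falls as the other rises, and values near 0 indicate no linear relationship.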
4. Conclusion
Thus, we made use of 14 channels of EEG to classify the workload on an
individual using 5 deep learning techniques, and at the end made use of various
statistical methods to draw inferences from the obtained results. Below is shown
a visualization of the channel locations on the head surface, where we can locate
the 14 channels that we used to train our models.
The models showed some variance in their results, and thus
not all of them can be considered equally suitable for workload classification.
The Artificial Neural Network and the Support Vector Machine were among the
best performing algorithms for the classification and can be trusted more than the
others.
Fig: EEG channel visualization
In our case we made use of a 14 channel EEG device, the Emotiv. The
application developed, however, can be used with any of the channel counts 14, 128
or 256. The UI developed in R Shiny is designed so that a drop-down menu can be
used to select which kind of data the user is training the machine with.
Following is a screenshot of the complete application developed in R Shiny -
5. References
1. Abdulhamit Subasi (a), M. Kemal Kiymik (a), Ahmet Alkan (a), Etem
Koklukaya (b). Neural Network Classification of EEG Signals by Using AR
with MLE Preprocessing for Epileptic Seizure Detection. (a) Department of
Electrical and Electronics Engineering, Kahramanmaraş Sütçü İmam
University, 46100 Kahramanmaraş, Turkey. (b) Department of Electrical and
Electronics Engineering, Sakarya University, 54187 Sakarya, Turkey.
2. Arjon Turnip and Keum-Shik Hong. Classifying Mental Activities from
EEG-P300 Signals Using Adaptive Neural Networks.
3. L.M. Patnaik, Ohil K. Manyam. Epileptic EEG Detection Using Neural
Networks and Post-Classification.
4. A. S. Muthanantha Murugavel, S. Ramakrishnan. Multi-class SVM for EEG
Signal Classification Using Wavelet Based Approximate Entropy.
5. P Bhuvaneswari (Research Scholar, Bharathiar University, Coimbatore),
J Satheesh Kumar (Assistant Professor, Bharathiar University, Coimbatore).
Support Vector Machine Technique for EEG Signals.