SlideShare ist ein Scribd-Unternehmen logo
1 von 9
Downloaden Sie, um offline zu lesen
CUDA Accelerated Face Recognition
Numaan. A
Third Year Undergraduate
Department of Electrical Engineering
Indian Institute of Technology, Madras, India
ee08b044@smail.iitm.ac.in
Sibi. A
NeST-NVIDIA Centre for GPU Computing
NeST, India
sibi.a@nestgroup.net
July 26, 2010
Abstract
This paper presents a study of the efficiency and
performance speedup achieved by applying Graph-
ics Processing Units for Face Recognition Solutions.
We explore one of the possibilities of parallelizing and
optimizing a well-known Face Recognition algorithm,
Principal Component Analysis (PCA) with Eigen-
faces.
1 Introduction
In recent years, the Graphics Processing Units (GPU)
has been the subject of extensive research and the
computation speed of GPUs has been rapidly increas-
ing. The computational power of the latest genera-
tion of GPUs, measured in Flops1
, is several times
that of a high end CPU and for this reason, they
are being increasingly used for non-graphics applica-
tions or general-purpose computing (GPGPU). Tra-
ditionally, this power of the GPUs could only be
harnessed through graphics APIs and was primarily
used only by professionals familiar with these APIs.
1floating-point operations per second
CUDA (Compute Unified Device Architecture) de-
veloped by NVIDIA, tries to address this issue by in-
troducing a familiar C like development environment
to GPGPU programming and allows programmers to
launch hundreds of concurrent threads to run on the
“massively” parallel NVIDIA GPUs, with very little
software overhead. This paper portrays our efforts to
use this power to tame a computationally intensive,
yet highly parallelizable PCA based algorithm used in
face recognition solutions. We developed both CPU
serial code and GPU parallel code to compare the ex-
ecution time in each case and measure the speed up
achieved by the GPU over the CPU.
2 PCA Theory
2.1 Introduction
Principal Component Analysis (PCA) is one of the
early and most successful techniques that have been
used for face recognition. PCA aims to reduce the
dimensionality of data so that it can be economically
represented and processed. Information contained in
a human face is highly redundant, with each pixel
highly correlated to its neighboring pixels and the
1
main idea of using PCA for face recognition is to
remove this redundancy, and extract the features re-
quired for comparison of faces. To increase accuracy
of the algorithm, we are using a slightly modified
method called Improved PCA, in which the images
used for training are grouped into different classes
and each class contains multiple images of a single
person with different facial expressions. The mathe-
matics behind PCA is described in the following sub-
sections.
2.2 Mathematics of PCA
Firstly, the 2-D facial images are resized using bilin-
ear interpolation to reduce the dimension of data and
increase the speed of computation. The resized im-
age is represented as a 1-D vector by concatenating
each row into one long vector . Let’s suppose we have
M training samples per class, each of size N (=total
pixels in the resized image). Let the training vectors
be represented as xi. pj’s represent pixel values.
xi = [p1 . . . pN ]T
, i = 1, . . . , M
The training vectors are then normalized by sub-
tracting the class mean image m from them.
m =
1
M
M
i=1
xi
Let wi be the normalized images.
wi = xi − m
The matrix W is composed by placing the normal-
ized image vectors wi side by side. The eigenvectors
and eigenvalues of the covariance matrix C is com-
puted.
C = WWT
The size of C is N × N which could be enormous.
For example, images of size 16 × 16 give a covari-
ance of size 256 × 256. It is not practical to solve
for eigenvectors of C directly. Hence the eigenvectors
of the surrogate matrix WT
W of size M × M are
computed and the first M −1 eigenvectors and eigen-
values of C are given by Wdi and µi, where di and
µi are eigenvectors and eigenvalues of the surrogate
matrix, respectively.
The eigenvectors corresponding to non-zero eigen-
values of the covariance matrix make an orthonormal
basis for the subspace within which most of the image
data can be represented with minimum error. The
eigenvector associated with the highest eigenvalue re-
flects the greatest variance in the image. These eigen-
vectors are known as eigenimages or eigenfaces and
when normalized look like faces. The eigenvectors of
all the classes are computed similarly and all these
eigenvectors are placed side by side to make up the
eigenspace S.
The mean image of the entire training set, m
is computed and each training vector xi is normal-
ized. The normalized training vectors wi are pro-
jected onto the eigenspace S, and the projected fea-
ture vectors yi of the training samples are obtained.
wi = xi − m
yi = ST
wi
The simplest method for determining which class
the test face falls under is to find the class k, that
minimizes the Euclidean distance. The test image
is projected onto the eigenspace and the Euclidean
distance between the projected test image and each
of the projected training samples are computed. If
the minimum Euclidean distance falls under a prede-
fined threshold θ, the face is classified as belonging to
the class to which contained the feature vector that
yielded the minimum Euclidean distance.
3 CPU Implementation
3.1 Database
For our implementation, we are using one of the most
common databases used for testing face recognition
solutions, called the ORL Face Database (formerly
known as the AT&T Database). The ORL Database
2
contains 400 grayscale images of resolution 112 ×
92 of 40 subjects with 10 images per subject. The
images are taken under various situations, such as
different time, different angles, different expressions
(happy, angry, surprise, etc.) and different face de-
tails (with/without spectacles, with/without beard,
different hair styles etc). To truly show the power
of a GPU, the image database has to be large. So
in our implementation we scaled the ORL Database
multiple times by copying and concatenating, to cre-
ate bigger databases. This allowed us to measure the
GPU performance for very large databases, as high
as 15000 images.
3.2 Training phase
The most time consuming operation in the training
phase is the extraction of feature vector from the
training samples by projecting each one of them to
the eigenspace. The computation of eigenfaces and
eigenspace is relatively less intensive since we have
used the surrogate matrix workaround to decrease the
size of the matrix and compute the eigenvectors with
ease. Hence we have decided to parallelize only pro-
jection step of the training process. The steps till the
projection of training samples are done in MATLAB
and the projection is written in C.
The MATLAB routine acquires the images from
files, resizes them to a standard 16 × 16 resolution
and computes the eigenfaces and eigenspace. The
data required for the projection step, namely, the
resized training samples, the database mean image
and the eigenvectors, are then dumped into a bi-
nary file with is later read by the C routine to com-
plete the training process. The C routine reads the
data dumped by the MATLAB routine and extracts
the feature vectors by normalizing the resized train-
ing samples and projecting each of them onto the
eigenspace. With this the training process is com-
plete and the feature vectors are dumped onto a bi-
nary file to be used in the testing process.
3.3 Testing phase
The entire testing process is written in C. OpenCV
is used to acquire the testing images from file, and
resize them to the standard 16 × 16 resolution using
bilinear interpolation. The resized image is normal-
ized with the database mean image and projected
onto the eigenspace computed in the training phase.
The euclidean distance between the test image fea-
ture vector and the training sample feature vectors
are computed and the index of the feature vector
yielding the minimum euclidean distance is found.
The face that yielded this feature vector is the most
probable match for the input test face.
4 GPU Implementation
4.1 Training phase
4.1.1 Introduction
As mentioned in Section 3.2, only the final feature ex-
traction process of the training phase is parallelized.
Before the training samples can be projected, all the
data required for the projection process is copied to
the device’s global memory and the time taken for
copying is noted. As all the data are of a read-only
nature, they are bound as texture to take advantage
of the cached texture memory.
4.1.2 Kernel
The projection process is highly parallelizable and
can be parallelized in two ways. The threads can be
launched to parallelize the computation of a partic-
ular feature vector, wherein, each thread computes a
single element of the feature vector. Or, the threads
can be launched to parallelize projection of multiple
training samples, wherein each thread projects and
computes the feature vector of a particular training
sample. Since the number of training samples is large,
the latter is adopted for the projection operation in
training phase. We have adopted the former in the
testing phase, where only one image has to be pro-
jected, details of which are explained in Section 4.2.
Before the projection kernel is called, the execution
configuration is set. The number of threads per block,
T1, is set to a standard of 256 and the total number of
blocks is B1, where B1 = (ceil) (N1/T1), N1 = total
number of training samples.
3
Each thread projects and computes the feature vec-
tor of a particular training sample by serially comput-
ing each element of the feature vector one by one.
Each element of the feature vector is obtained by
taking inner product of the training image vector
and the corresponding eigenvector in the eigenspace.
The training sample is normalized with the database
mean image element by element as it is fetched from
texture memory and the intermediate sum of the in-
ner product with eigenvector is stored in the shared
memory. After each element of the feature vector is
computed, the data is written back into the global
memory and the next element is computed. All the
data is aligned in a columnar fashion to avoid un-
coalesced memory accesses and shared memory bank
conflicts.
Figure 1: Threads in Projection Kernel (Training)
After the kernel has finished running on the device
the entire data is copied back to the host memory
and dumped as a binary file to be used in the testing
phase.
4.2 Testing Phase
4.2.1 Introduction
The testing process is completely run on the GPU
and is handled by three kernels. The first kernel
normalizes and projects it onto the eigenspace and
extracts the feature vector. The second kernel par-
allely computes the euclidean distance between the
feature vector of the test image and that of the train-
ing images. The final kernel, finds the minimum of
the euclidean distance and index of the training sam-
ple which yielded that minimum. The resized test
image, database mean image, eigenvectors and the
projected training samples are first copied to the de-
vice memory and the test image, mean image and
eigenvectors are bound as texture to take advantage
of the cached texture memory. Due to the relatively
larger size of the projected training samples data and
the limitation on maximum texture memory, the pro-
jected training samples are not bound as texture.
Figure 2: Recognition pipeline
4.2.2 Projection Kernel
As mentioned in section 4.1.2, the projection kernel
in testing process is parallelized to concurrently com-
pute each element of the feature vector. The number
of threads per block, T2, is set to a standard of 256
and the total number of blocks is B2, where B2 =
(ceil) (N2/T2), N2 = size of feature vector.
Each thread computes each element of the feature
vector, which is obtained by taking inner product of
the test image vector and the corresponding eigenvec-
tor in the eigenspace. The test image is normalized
with the database mean image, element by element
as it is fetched from texture memory and the inter-
mediate sum of the inner product with eigenvector is
4
stored in the shared memory.
Figure 3: Threads in Projection Kernel (Testing)
After the entire feature vector is computed, the
data is written back into global memory. The colum-
nar alignment of all the eigenvectors avoids unco-
alesced memory accesses and shared memory bank
conflicts.
4.2.3 Euclidean Distance Kernel
The kernel for computing the euclidean distance is
very similar to the projection kernel used in training
phase. Threads are launched to concurrently com-
pute the euclidean distance between the test image
feature vector and the training sample feature vec-
tors. The number of threads per block, T3, is set to
a standard of 256 and the total number of blocks is
B3, where B3 = (ceil) (N3/T3), N3 = total number
of training samples.
Each thread computes a particular euclidean dis-
tance serially. The difference of each element of the
vectors are computed, squared and summed. The in-
termediate sum is stored in shared memory. After all
the euclidean distances are computed, the data from
the shared memory is written to the global mem-
ory. The columnar alignment of training sample fea-
ture vectors avoids uncoalesced memory accesses and
shared memory bank conflicts.
Figure 4: Threads in Euclidean Distance Kernel
4.2.4 Minimum Kernel
The minimum kernel computes the minimum value
of euclidean distance and its distance. The vector
containing euclidean distances is divided into smaller
blocks and each thread serially finds the value and
index of the minimum in a particular block. The ker-
nel is called iteratively with fewer and fewer threads
till only one block is left. After execution of the ker-
nel the minimum value and its index is copid back
to host memory. The training sample at the index
computed by the kernel is the most probable match
for the test image.
5 Performance Statistics
To test the performance of CPU and GPU, 5 images
per subject of the ORL Database was selected as the
training set. This set of 200 images was then repli-
cated and concatenated to create databases of size
ranging from 1000 to 15000 images. The eigenvectors
corresponding to the 4 highest eigenvalues per class,
were selected for forming the eigenspace. This led
to feature vectors which grew in size as the database
grew. This replicated database was trained with CPU
and GPU and the execution time was noted and the
GPU speedup for the training process was calculated.
5
For testing, one image per subject from the ORL
Database were selected and the total time taken by
the CPU and GPU to test all 40 test images was
noted and was used to calculate the GPU speedup
for testing process.
To get accurate performance statistics, the training
and testing processes were run multiple times on dif-
ferent CPUs and GPUs. The following graphs were
plotted with the data obtained from the performance
tests. All the CPU times are based on single-core
performances.
Fig. 5 shows the time taken three different CPUs
to execute the projection process during training.
Figure 5: Training time for different CPUs
Fig. 6 shows the time taken by different NVIDIA
GPUs to execute the projection process during train-
ing. It includes the time taken for data transfers to
and from the device.
Figure 6: Training time for different GPUs
Fig. 7 shows the performance speedup of different
GPUs over Intel Core 2 Quad Q9550 CPU during
training databases of varying sizes.
Figure 7: Training Speedup
Fig. 8 shows total time taken by different CPUs
for testing 40 images.
6
Figure 8: Testing using GPU
Fig. 9 shows total time taken by GPUs to test 40
images. For this test, the trained database was copied
to device memory once and 40 images were tested
one by one. It includes time taken for transferring
test image to device and getting match index from
device.
Figure 9: Testing using GPU
Fig. 10 shows the performance speedup of differ-
ent GPUs over Intel Core 2 Quad Q9550 CPU when
testing 40 images.
Figure 10: Testing Speedup
Fig 11 shows the execution time of the recognition
pipeline on the GPU for varying database sizes.
Figure 11: Recognition pipeline on CPU
7
Fig. 12 shows the execution time of the recognition
pipeline on the GPU for varying database sizes. It is
the time taken to transfer test image to device, find
the match and transfer its index back to host.
Figure 12: Recognition Pipeline on GPU
Fig. 13 shows the performance speedup of differ-
ent GPUs over Intel Core 2 Quad Q9550 CPU when
executing the recognition pipeline.
Figure 13: Recognition Pipeline Speedup
6 Conclusion
The recognition rate of a PCA based face recognition
solution depends heavily on the exhaustiveness of the
training samples. Higher the number of training sam-
ples, higher the recognition rate. But as the num-
ber of training samples increases, CPUs get highly
strained and the training process will take several
minutes to complete (refer Fig. 5). But the same
process, when run on a GPU, will be completed in
a manner of seconds (refer Fig. 6). The highest
speedup achieved was 207x for training process, 330x
for the recognition pipeline and 165x for overall test-
ing process on the latest GeForce GTX 480 GPU, for
a database size of 15,000 images.
The execution time of the recognition pipeline on
the GPU is in the order of a few milli seconds even
for very large databases and and this allows the GPU
based testing to be integrated with real time video
and used for other applications involving large vol-
umes of test images. Our primary purpose in writ-
ing this paper is to make clear, the high perfor-
mance boosts that can be obtained by developing
GPU based face recognition solutions.
8
7 Future Works
Our future plans on this field include the paralleliza-
tion of other face recognition algorithms like LDA
(Linear Discriminant Analysis) and to replace the eu-
clidean distance based matching process with a neu-
ral network based one. We feel that algorithms with
a high degree of parallelism in them, like neural net-
works, will benefit immensely, if implemented on the
GPU. We are also working on integrating the GPU
recognition pipeline with real time video.
9

Weitere ähnliche Inhalte

Was ist angesagt?

Images in matlab
Images in matlabImages in matlab
Images in matlabAli Alvi
 
IRJET - Hand Gesture Recognition to Perform System Operations
IRJET -  	  Hand Gesture Recognition to Perform System OperationsIRJET -  	  Hand Gesture Recognition to Perform System Operations
IRJET - Hand Gesture Recognition to Perform System OperationsIRJET Journal
 
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformContent Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformIOSR Journals
 
An empirical assessment of different kernel functions on the performance of s...
An empirical assessment of different kernel functions on the performance of s...An empirical assessment of different kernel functions on the performance of s...
An empirical assessment of different kernel functions on the performance of s...riyaniaes
 
Electricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANNElectricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANNNaren Chandra Kattla
 
Black-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelBlack-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelIJECEIAES
 
Electricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural NetworkElectricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural NetworkNaren Chandra Kattla
 
A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...Pablo Velarde A
 
Lecture 2 Introduction to digital image
Lecture 2 Introduction to digital imageLecture 2 Introduction to digital image
Lecture 2 Introduction to digital imageVARUN KUMAR
 
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...ijwmn
 
A Novel Algorithm for Watermarking and Image Encryption
A Novel Algorithm for Watermarking and Image Encryption A Novel Algorithm for Watermarking and Image Encryption
A Novel Algorithm for Watermarking and Image Encryption cscpconf
 
Comparison of Different Methods for Fusion of Multimodal Medical Images
Comparison of Different Methods for Fusion of Multimodal Medical ImagesComparison of Different Methods for Fusion of Multimodal Medical Images
Comparison of Different Methods for Fusion of Multimodal Medical ImagesIRJET Journal
 
Lecture 1 Introduction to image processing
Lecture 1 Introduction to image processingLecture 1 Introduction to image processing
Lecture 1 Introduction to image processingVARUN KUMAR
 
Lecture 4 Relationship between pixels
Lecture 4 Relationship between pixelsLecture 4 Relationship between pixels
Lecture 4 Relationship between pixelsVARUN KUMAR
 
ROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEGROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEGIJERA Editor
 
IRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog ImagesIRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog ImagesIRJET Journal
 
GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...
GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...
GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...IJCSIS Research Publications
 

Was ist angesagt? (18)

Images in matlab
Images in matlabImages in matlab
Images in matlab
 
IRJET - Hand Gesture Recognition to Perform System Operations
IRJET -  	  Hand Gesture Recognition to Perform System OperationsIRJET -  	  Hand Gesture Recognition to Perform System Operations
IRJET - Hand Gesture Recognition to Perform System Operations
 
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformContent Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
 
An empirical assessment of different kernel functions on the performance of s...
An empirical assessment of different kernel functions on the performance of s...An empirical assessment of different kernel functions on the performance of s...
An empirical assessment of different kernel functions on the performance of s...
 
Electricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANNElectricity Demand Forecasting Using ANN
Electricity Demand Forecasting Using ANN
 
Black-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX modelBlack-box modeling of nonlinear system using evolutionary neural NARX model
Black-box modeling of nonlinear system using evolutionary neural NARX model
 
Electricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural NetworkElectricity Demand Forecasting Using Fuzzy-Neural Network
Electricity Demand Forecasting Using Fuzzy-Neural Network
 
A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...A complete user adaptive antenna tutorial demonstration. a gui based approach...
A complete user adaptive antenna tutorial demonstration. a gui based approach...
 
A0270107
A0270107A0270107
A0270107
 
Lecture 2 Introduction to digital image
Lecture 2 Introduction to digital imageLecture 2 Introduction to digital image
Lecture 2 Introduction to digital image
 
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
BER Performance of Antenna Array-Based Receiver using Multi-user Detection in...
 
A Novel Algorithm for Watermarking and Image Encryption
A Novel Algorithm for Watermarking and Image Encryption A Novel Algorithm for Watermarking and Image Encryption
A Novel Algorithm for Watermarking and Image Encryption
 
Comparison of Different Methods for Fusion of Multimodal Medical Images
Comparison of Different Methods for Fusion of Multimodal Medical ImagesComparison of Different Methods for Fusion of Multimodal Medical Images
Comparison of Different Methods for Fusion of Multimodal Medical Images
 
Lecture 1 Introduction to image processing
Lecture 1 Introduction to image processingLecture 1 Introduction to image processing
Lecture 1 Introduction to image processing
 
Lecture 4 Relationship between pixels
Lecture 4 Relationship between pixelsLecture 4 Relationship between pixels
Lecture 4 Relationship between pixels
 
ROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEGROI Based Image Compression in Baseline JPEG
ROI Based Image Compression in Baseline JPEG
 
IRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog ImagesIRJET- Image Classification – Cat and Dog Images
IRJET- Image Classification – Cat and Dog Images
 
GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...
GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...
GPU Parallel Computing of Support Vector Machines as applied to Intrusion Det...
 

Ähnlich wie Real-Time Face Tracking with GPU Acceleration

One dimensional vector based pattern
One dimensional vector based patternOne dimensional vector based pattern
One dimensional vector based patternijcsit
 
Face Recognition Using Neural Networks
Face Recognition Using Neural NetworksFace Recognition Using Neural Networks
Face Recognition Using Neural NetworksCSCJournals
 
Iaetsd multi-view and multi band face recognition
Iaetsd multi-view and multi band face recognitionIaetsd multi-view and multi band face recognition
Iaetsd multi-view and multi band face recognitionIaetsd Iaetsd
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Editor IJARCET
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Editor IJARCET
 
Bangla Handwritten Digit Recognition Report.pdf
Bangla Handwritten Digit Recognition  Report.pdfBangla Handwritten Digit Recognition  Report.pdf
Bangla Handwritten Digit Recognition Report.pdfKhondokerAbuNaim
 
Eigenfaces , Fisherfaces and Dimensionality_Reduction
Eigenfaces , Fisherfaces and Dimensionality_ReductionEigenfaces , Fisherfaces and Dimensionality_Reduction
Eigenfaces , Fisherfaces and Dimensionality_Reductionmostafayounes012
 
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...csandit
 
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
IRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCAIRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCA
IRJET- Efficient Face Detection from Video Sequences using KNN and PCAIRJET Journal
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedIRJET Journal
 
Quality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionQuality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionIJTET Journal
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodIJERA Editor
 
Survey paper on image compression techniques
Survey paper on image compression techniquesSurvey paper on image compression techniques
Survey paper on image compression techniquesIRJET Journal
 
Brain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector MachineBrain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector MachineIRJET Journal
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONcscpconf
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONcsandit
 
Median based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstructionMedian based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstructioncsandit
 
Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback  Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback dannyijwest
 

Ähnlich wie Real-Time Face Tracking with GPU Acceleration (20)

One dimensional vector based pattern
One dimensional vector based patternOne dimensional vector based pattern
One dimensional vector based pattern
 
IPT.pdf
IPT.pdfIPT.pdf
IPT.pdf
 
Y34147151
Y34147151Y34147151
Y34147151
 
Face Recognition Using Neural Networks
Face Recognition Using Neural NetworksFace Recognition Using Neural Networks
Face Recognition Using Neural Networks
 
Iaetsd multi-view and multi band face recognition
Iaetsd multi-view and multi band face recognitionIaetsd multi-view and multi band face recognition
Iaetsd multi-view and multi band face recognition
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113
 
Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113Volume 2-issue-6-2108-2113
Volume 2-issue-6-2108-2113
 
Bangla Handwritten Digit Recognition Report.pdf
Bangla Handwritten Digit Recognition  Report.pdfBangla Handwritten Digit Recognition  Report.pdf
Bangla Handwritten Digit Recognition Report.pdf
 
Eigenfaces , Fisherfaces and Dimensionality_Reduction
Eigenfaces , Fisherfaces and Dimensionality_ReductionEigenfaces , Fisherfaces and Dimensionality_Reduction
Eigenfaces , Fisherfaces and Dimensionality_Reduction
 
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...
FACE RECOGNITION USING PRINCIPAL COMPONENT ANALYSIS WITH MEDIAN FOR NORMALIZA...
 
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
IRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCAIRJET-  	  Efficient Face Detection from Video Sequences using KNN and PCA
IRJET- Efficient Face Detection from Video Sequences using KNN and PCA
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing Impaired
 
Quality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint CompressionQuality Prediction in Fingerprint Compression
Quality Prediction in Fingerprint Compression
 
Multimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution MethodMultimodal Biometrics Recognition by Dimensionality Diminution Method
Multimodal Biometrics Recognition by Dimensionality Diminution Method
 
Survey paper on image compression techniques
Survey paper on image compression techniquesSurvey paper on image compression techniques
Survey paper on image compression techniques
 
Brain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector MachineBrain Tumor Classification using Support Vector Machine
Brain Tumor Classification using Support Vector Machine
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
 
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTIONMEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
MEDIAN BASED PARALLEL STEERING KERNEL REGRESSION FOR IMAGE RECONSTRUCTION
 
Median based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstructionMedian based parallel steering kernel regression for image reconstruction
Median based parallel steering kernel regression for image reconstruction
 
Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback  Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback
 

Mehr von QuEST Global (erstwhile NeST Software)

High Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming ParadigmsHigh Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming ParadigmsQuEST Global (erstwhile NeST Software)
 
Focal Cortical Dysplasia Lesion Analysis with Complex Diffusion Approach
Focal Cortical Dysplasia Lesion Analysis with Complex Diffusion ApproachFocal Cortical Dysplasia Lesion Analysis with Complex Diffusion Approach
Focal Cortical Dysplasia Lesion Analysis with Complex Diffusion ApproachQuEST Global (erstwhile NeST Software)
 

Mehr von QuEST Global (erstwhile NeST Software) (20)

High Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming ParadigmsHigh Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming Paradigms
 
HPC Platform options: Cell BE and GPU
HPC Platform options: Cell BE and GPUHPC Platform options: Cell BE and GPU
HPC Platform options: Cell BE and GPU
 
A Whitepaper on Hybrid Set-Top-Box
A Whitepaper on Hybrid Set-Top-BoxA Whitepaper on Hybrid Set-Top-Box
A Whitepaper on Hybrid Set-Top-Box
 
CUDA Accelerated Face Recognition
CUDA Accelerated Face RecognitionCUDA Accelerated Face Recognition
CUDA Accelerated Face Recognition
 
Test Optimization Using Adaptive Random Testing Techniques
Test Optimization Using Adaptive Random Testing TechniquesTest Optimization Using Adaptive Random Testing Techniques
Test Optimization Using Adaptive Random Testing Techniques
 
Ultra Fast SOM using CUDA
Ultra Fast SOM using CUDAUltra Fast SOM using CUDA
Ultra Fast SOM using CUDA
 
UpnP in Digital Home Networking
UpnP in Digital Home NetworkingUpnP in Digital Home Networking
UpnP in Digital Home Networking
 
MR Brain Volume Analysis Using BrainAssist
MR Brain Volume Analysis Using BrainAssistMR Brain Volume Analysis Using BrainAssist
MR Brain Volume Analysis Using BrainAssist
 
Image Denoising Using WEAD
Image Denoising Using WEADImage Denoising Using WEAD
Image Denoising Using WEAD
 
Focal Cortical Dysplasia Lesion Analysis with Complex Diffusion Approach
Focal Cortical Dysplasia Lesion Analysis with Complex Diffusion ApproachFocal Cortical Dysplasia Lesion Analysis with Complex Diffusion Approach
Focal Cortical Dysplasia Lesion Analysis with Complex Diffusion Approach
 
Speckle Reduction in Images with WEAD and WECD
Speckle Reduction in Images with WEAD and WECD  Speckle Reduction in Images with WEAD and WECD
Speckle Reduction in Images with WEAD and WECD
 
Software Defined Networking – Virtualization of Traffic Engineering
Software Defined Networking – Virtualization of Traffic EngineeringSoftware Defined Networking – Virtualization of Traffic Engineering
Software Defined Networking – Virtualization of Traffic Engineering
 
A New Generation Software Test Automation Framework – CIVIM
A New Generation Software Test Automation Framework – CIVIMA New Generation Software Test Automation Framework – CIVIM
A New Generation Software Test Automation Framework – CIVIM
 
BX-D – A Business Component & XML Driven Test Automation Framework
BX-D – A Business Component & XML Driven Test Automation FrameworkBX-D – A Business Component & XML Driven Test Automation Framework
BX-D – A Business Component & XML Driven Test Automation Framework
 
An Effective Design and Verification Methodology for Digital PLL
An Effective Design and Verification Methodology for Digital PLLAn Effective Design and Verification Methodology for Digital PLL
An Effective Design and Verification Methodology for Digital PLL
 
FaSaT An Interoperable Test Automation Solution
FaSaT An Interoperable Test Automation SolutionFaSaT An Interoperable Test Automation Solution
FaSaT An Interoperable Test Automation Solution
 
An Improved Hybrid Model for Molecular Image Denoising
An Improved Hybrid Model for Molecular Image DenoisingAn Improved Hybrid Model for Molecular Image Denoising
An Improved Hybrid Model for Molecular Image Denoising
 
A 1.2V 10-bit 165MSPS Video ADC
A 1.2V 10-bit 165MSPS Video ADCA 1.2V 10-bit 165MSPS Video ADC
A 1.2V 10-bit 165MSPS Video ADC
 
Advanced Driver Assistance System using FPGA
Advanced Driver Assistance System using FPGAAdvanced Driver Assistance System using FPGA
Advanced Driver Assistance System using FPGA
 
Real Time Video Processing in FPGA
Real Time Video Processing in FPGA Real Time Video Processing in FPGA
Real Time Video Processing in FPGA
 

Kürzlich hochgeladen

Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 

Kürzlich hochgeladen (20)

Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 18, Noida Call girls :8448380779 Model Escorts | 100% verified
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
ManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide DeckManageIQ - Sprint 236 Review - Slide Deck
ManageIQ - Sprint 236 Review - Slide Deck
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 

Real-Time Face Tracking with GPU Acceleration

  • 1. CUDA Accelerated Face Recognition Numaan. A Third Year Undergraduate Department of Electrical Engineering Indian Institute of Technology, Madras, India ee08b044@smail.iitm.ac.in Sibi. A NeST-NVIDIA Centre for GPU Computing NeST, India sibi.a@nestgroup.net July 26, 2010 Abstract This paper presents a study of the efficiency and performance speedup achieved by applying Graph- ics Processing Units for Face Recognition Solutions. We explore one of the possibilities of parallelizing and optimizing a well-known Face Recognition algorithm, Principal Component Analysis (PCA) with Eigen- faces. 1 Introduction In recent years, the Graphics Processing Units (GPU) has been the subject of extensive research and the computation speed of GPUs has been rapidly increas- ing. The computational power of the latest genera- tion of GPUs, measured in Flops1 , is several times that of a high end CPU and for this reason, they are being increasingly used for non-graphics applica- tions or general-purpose computing (GPGPU). Tra- ditionally, this power of the GPUs could only be harnessed through graphics APIs and was primarily used only by professionals familiar with these APIs. 1floating-point operations per second CUDA (Compute Unified Device Architecture) de- veloped by NVIDIA, tries to address this issue by in- troducing a familiar C like development environment to GPGPU programming and allows programmers to launch hundreds of concurrent threads to run on the “massively” parallel NVIDIA GPUs, with very little software overhead. This paper portrays our efforts to use this power to tame a computationally intensive, yet highly parallelizable PCA based algorithm used in face recognition solutions. We developed both CPU serial code and GPU parallel code to compare the ex- ecution time in each case and measure the speed up achieved by the GPU over the CPU. 2 PCA Theory 2.1 Introduction Principal Component Analysis (PCA) is one of the early and most successful techniques that have been used for face recognition. PCA aims to reduce the dimensionality of data so that it can be economically represented and processed. Information contained in a human face is highly redundant, with each pixel highly correlated to its neighboring pixels and the 1
  • 2. main idea of using PCA for face recognition is to remove this redundancy, and extract the features re- quired for comparison of faces. To increase accuracy of the algorithm, we are using a slightly modified method called Improved PCA, in which the images used for training are grouped into different classes and each class contains multiple images of a single person with different facial expressions. The mathe- matics behind PCA is described in the following sub- sections. 2.2 Mathematics of PCA Firstly, the 2-D facial images are resized using bilin- ear interpolation to reduce the dimension of data and increase the speed of computation. The resized im- age is represented as a 1-D vector by concatenating each row into one long vector . Let’s suppose we have M training samples per class, each of size N (=total pixels in the resized image). Let the training vectors be represented as xi. pj’s represent pixel values. xi = [p1 . . . pN ]T , i = 1, . . . , M The training vectors are then normalized by sub- tracting the class mean image m from them. m = 1 M M i=1 xi Let wi be the normalized images. wi = xi − m The matrix W is composed by placing the normal- ized image vectors wi side by side. The eigenvectors and eigenvalues of the covariance matrix C is com- puted. C = WWT The size of C is N × N which could be enormous. For example, images of size 16 × 16 give a covari- ance of size 256 × 256. It is not practical to solve for eigenvectors of C directly. Hence the eigenvectors of the surrogate matrix WT W of size M × M are computed and the first M −1 eigenvectors and eigen- values of C are given by Wdi and µi, where di and µi are eigenvectors and eigenvalues of the surrogate matrix, respectively. The eigenvectors corresponding to non-zero eigen- values of the covariance matrix make an orthonormal basis for the subspace within which most of the image data can be represented with minimum error. The eigenvector associated with the highest eigenvalue re- flects the greatest variance in the image. These eigen- vectors are known as eigenimages or eigenfaces and when normalized look like faces. The eigenvectors of all the classes are computed similarly and all these eigenvectors are placed side by side to make up the eigenspace S. The mean image of the entire training set, m is computed and each training vector xi is normal- ized. The normalized training vectors wi are pro- jected onto the eigenspace S, and the projected fea- ture vectors yi of the training samples are obtained. wi = xi − m yi = ST wi The simplest method for determining which class the test face falls under is to find the class k, that minimizes the Euclidean distance. The test image is projected onto the eigenspace and the Euclidean distance between the projected test image and each of the projected training samples are computed. If the minimum Euclidean distance falls under a prede- fined threshold θ, the face is classified as belonging to the class to which contained the feature vector that yielded the minimum Euclidean distance. 3 CPU Implementation 3.1 Database For our implementation, we are using one of the most common databases used for testing face recognition solutions, called the ORL Face Database (formerly known as the AT&T Database). The ORL Database 2
  • 3. contains 400 grayscale images of resolution 112 × 92 of 40 subjects with 10 images per subject. The images are taken under various situations, such as different time, different angles, different expressions (happy, angry, surprise, etc.) and different face de- tails (with/without spectacles, with/without beard, different hair styles etc). To truly show the power of a GPU, the image database has to be large. So in our implementation we scaled the ORL Database multiple times by copying and concatenating, to cre- ate bigger databases. This allowed us to measure the GPU performance for very large databases, as high as 15000 images. 3.2 Training phase The most time consuming operation in the training phase is the extraction of feature vector from the training samples by projecting each one of them to the eigenspace. The computation of eigenfaces and eigenspace is relatively less intensive since we have used the surrogate matrix workaround to decrease the size of the matrix and compute the eigenvectors with ease. Hence we have decided to parallelize only pro- jection step of the training process. The steps till the projection of training samples are done in MATLAB and the projection is written in C. The MATLAB routine acquires the images from files, resizes them to a standard 16 × 16 resolution and computes the eigenfaces and eigenspace. The data required for the projection step, namely, the resized training samples, the database mean image and the eigenvectors, are then dumped into a bi- nary file with is later read by the C routine to com- plete the training process. The C routine reads the data dumped by the MATLAB routine and extracts the feature vectors by normalizing the resized train- ing samples and projecting each of them onto the eigenspace. With this the training process is com- plete and the feature vectors are dumped onto a bi- nary file to be used in the testing process. 3.3 Testing phase The entire testing process is written in C. OpenCV is used to acquire the testing images from file, and resize them to the standard 16 × 16 resolution using bilinear interpolation. The resized image is normal- ized with the database mean image and projected onto the eigenspace computed in the training phase. The euclidean distance between the test image fea- ture vector and the training sample feature vectors are computed and the index of the feature vector yielding the minimum euclidean distance is found. The face that yielded this feature vector is the most probable match for the input test face. 4 GPU Implementation 4.1 Training phase 4.1.1 Introduction As mentioned in Section 3.2, only the final feature ex- traction process of the training phase is parallelized. Before the training samples can be projected, all the data required for the projection process is copied to the device’s global memory and the time taken for copying is noted. As all the data are of a read-only nature, they are bound as texture to take advantage of the cached texture memory. 4.1.2 Kernel The projection process is highly parallelizable and can be parallelized in two ways. The threads can be launched to parallelize the computation of a partic- ular feature vector, wherein, each thread computes a single element of the feature vector. Or, the threads can be launched to parallelize projection of multiple training samples, wherein each thread projects and computes the feature vector of a particular training sample. Since the number of training samples is large, the latter is adopted for the projection operation in training phase. We have adopted the former in the testing phase, where only one image has to be pro- jected, details of which are explained in Section 4.2. Before the projection kernel is called, the execution configuration is set. The number of threads per block, T1, is set to a standard of 256 and the total number of blocks is B1, where B1 = (ceil) (N1/T1), N1 = total number of training samples. 3
  • 4. Each thread projects and computes the feature vec- tor of a particular training sample by serially comput- ing each element of the feature vector one by one. Each element of the feature vector is obtained by taking inner product of the training image vector and the corresponding eigenvector in the eigenspace. The training sample is normalized with the database mean image element by element as it is fetched from texture memory and the intermediate sum of the in- ner product with eigenvector is stored in the shared memory. After each element of the feature vector is computed, the data is written back into the global memory and the next element is computed. All the data is aligned in a columnar fashion to avoid un- coalesced memory accesses and shared memory bank conflicts. Figure 1: Threads in Projection Kernel (Training) After the kernel has finished running on the device the entire data is copied back to the host memory and dumped as a binary file to be used in the testing phase. 4.2 Testing Phase 4.2.1 Introduction The testing process is completely run on the GPU and is handled by three kernels. The first kernel normalizes and projects it onto the eigenspace and extracts the feature vector. The second kernel par- allely computes the euclidean distance between the feature vector of the test image and that of the train- ing images. The final kernel, finds the minimum of the euclidean distance and index of the training sam- ple which yielded that minimum. The resized test image, database mean image, eigenvectors and the projected training samples are first copied to the de- vice memory and the test image, mean image and eigenvectors are bound as texture to take advantage of the cached texture memory. Due to the relatively larger size of the projected training samples data and the limitation on maximum texture memory, the pro- jected training samples are not bound as texture. Figure 2: Recognition pipeline 4.2.2 Projection Kernel As mentioned in section 4.1.2, the projection kernel in testing process is parallelized to concurrently com- pute each element of the feature vector. The number of threads per block, T2, is set to a standard of 256 and the total number of blocks is B2, where B2 = (ceil) (N2/T2), N2 = size of feature vector. Each thread computes each element of the feature vector, which is obtained by taking inner product of the test image vector and the corresponding eigenvec- tor in the eigenspace. The test image is normalized with the database mean image, element by element as it is fetched from texture memory and the inter- mediate sum of the inner product with eigenvector is 4
  • 5. stored in the shared memory. Figure 3: Threads in Projection Kernel (Testing) After the entire feature vector is computed, the data is written back into global memory. The colum- nar alignment of all the eigenvectors avoids unco- alesced memory accesses and shared memory bank conflicts. 4.2.3 Euclidean Distance Kernel The kernel for computing the euclidean distance is very similar to the projection kernel used in training phase. Threads are launched to concurrently com- pute the euclidean distance between the test image feature vector and the training sample feature vec- tors. The number of threads per block, T3, is set to a standard of 256 and the total number of blocks is B3, where B3 = (ceil) (N3/T3), N3 = total number of training samples. Each thread computes a particular euclidean dis- tance serially. The difference of each element of the vectors are computed, squared and summed. The in- termediate sum is stored in shared memory. After all the euclidean distances are computed, the data from the shared memory is written to the global mem- ory. The columnar alignment of training sample fea- ture vectors avoids uncoalesced memory accesses and shared memory bank conflicts. Figure 4: Threads in Euclidean Distance Kernel 4.2.4 Minimum Kernel The minimum kernel computes the minimum value of euclidean distance and its distance. The vector containing euclidean distances is divided into smaller blocks and each thread serially finds the value and index of the minimum in a particular block. The ker- nel is called iteratively with fewer and fewer threads till only one block is left. After execution of the ker- nel the minimum value and its index is copid back to host memory. The training sample at the index computed by the kernel is the most probable match for the test image. 5 Performance Statistics To test the performance of CPU and GPU, 5 images per subject of the ORL Database was selected as the training set. This set of 200 images was then repli- cated and concatenated to create databases of size ranging from 1000 to 15000 images. The eigenvectors corresponding to the 4 highest eigenvalues per class, were selected for forming the eigenspace. This led to feature vectors which grew in size as the database grew. This replicated database was trained with CPU and GPU and the execution time was noted and the GPU speedup for the training process was calculated. 5
  • 6. For testing, one image per subject from the ORL Database were selected and the total time taken by the CPU and GPU to test all 40 test images was noted and was used to calculate the GPU speedup for testing process. To get accurate performance statistics, the training and testing processes were run multiple times on dif- ferent CPUs and GPUs. The following graphs were plotted with the data obtained from the performance tests. All the CPU times are based on single-core performances. Fig. 5 shows the time taken three different CPUs to execute the projection process during training. Figure 5: Training time for different CPUs Fig. 6 shows the time taken by different NVIDIA GPUs to execute the projection process during train- ing. It includes the time taken for data transfers to and from the device. Figure 6: Training time for different GPUs Fig. 7 shows the performance speedup of different GPUs over Intel Core 2 Quad Q9550 CPU during training databases of varying sizes. Figure 7: Training Speedup Fig. 8 shows total time taken by different CPUs for testing 40 images. 6
  • 7. Figure 8: Testing using GPU Fig. 9 shows total time taken by GPUs to test 40 images. For this test, the trained database was copied to device memory once and 40 images were tested one by one. It includes time taken for transferring test image to device and getting match index from device. Figure 9: Testing using GPU Fig. 10 shows the performance speedup of differ- ent GPUs over Intel Core 2 Quad Q9550 CPU when testing 40 images. Figure 10: Testing Speedup Fig 11 shows the execution time of the recognition pipeline on the GPU for varying database sizes. Figure 11: Recognition pipeline on CPU 7
  • 8. Fig. 12 shows the execution time of the recognition pipeline on the GPU for varying database sizes. It is the time taken to transfer test image to device, find the match and transfer its index back to host. Figure 12: Recognition Pipeline on GPU Fig. 13 shows the performance speedup of differ- ent GPUs over Intel Core 2 Quad Q9550 CPU when executing the recognition pipeline. Figure 13: Recognition Pipeline Speedup 6 Conclusion The recognition rate of a PCA based face recognition solution depends heavily on the exhaustiveness of the training samples. Higher the number of training sam- ples, higher the recognition rate. But as the num- ber of training samples increases, CPUs get highly strained and the training process will take several minutes to complete (refer Fig. 5). But the same process, when run on a GPU, will be completed in a manner of seconds (refer Fig. 6). The highest speedup achieved was 207x for training process, 330x for the recognition pipeline and 165x for overall test- ing process on the latest GeForce GTX 480 GPU, for a database size of 15,000 images. The execution time of the recognition pipeline on the GPU is in the order of a few milli seconds even for very large databases and and this allows the GPU based testing to be integrated with real time video and used for other applications involving large vol- umes of test images. Our primary purpose in writ- ing this paper is to make clear, the high perfor- mance boosts that can be obtained by developing GPU based face recognition solutions. 8
  • 9. 7 Future Works Our future plans on this field include the paralleliza- tion of other face recognition algorithms like LDA (Linear Discriminant Analysis) and to replace the eu- clidean distance based matching process with a neu- ral network based one. We feel that algorithms with a high degree of parallelism in them, like neural net- works, will benefit immensely, if implemented on the GPU. We are also working on integrating the GPU recognition pipeline with real time video. 9