BioHEL GBML System
                               BioHEL using CUDA
                           Experiments and results
                      Conclusions and Further Work




 Speeding up the Evaluation of Evolutionary
     Learning Systems using GPGPUs

      María A. Franco, Natalio Krasnogor and Jaume Bacardit

                                     University of Nottingham, UK,
                                       ASAP Research Group,
                                     School of Computer Science
                                     {mxf,nxk,jqb}@cs.nott.ac.uk


                                            July 10, 2010



M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham   Speeding up Evolutionary Learning using GPGPUs   1 / 27

Motivation


          Nowadays the rate of data collection easily exceeds the
          rate at which we can process and mine it.
                  Real-life problems = big + complex.


          There is a need to improve the efficiency of evolutionary
          learning systems to cope with large-scale
          domains [Sastry, 2005] [Bacardit and Llorà, 2009].

          This work focuses on boosting the performance of the
          BioHEL system by exploiting the processing capacity of
          GPGPUs.



Outline

   1    BioHEL
          BioHEL GBML System
          Characteristics of BioHEL
   2    BioHEL using CUDA
          How does CUDA work?
          Challenges of using CUDA
          Implementation details
   3    Experiments and results
          Stage 1: Raw evaluation
          Stage 2: Integration with BioHEL
   4    Conclusions and Further Work


The BioHEL GBML System




         BIOinformatics-oriented Hierarchical Evolutionary Learning
         (BioHEL) [Bacardit et al., 2009]
         BioHEL was designed to handle large-scale bioinformatics
         datasets [Stout et al., 2008]
         BioHEL is a GBML system that employs the Iterative Rule
         Learning (IRL) paradigm
                 First used in EC in Venturini's SIA system [Venturini, 1993]
                 Widely used for both fuzzy and non-fuzzy evolutionary
                 learning





Characteristics of BioHEL


          The fitness function is based on the
          Minimum Description Length (MDL) principle [Rissanen, 1978]
          and tries to
                  Evolve accurate rules
                  Evolve high-coverage rules
                  Evolve rules with low complexity, as general as possible


          BioHEL applies a supervised learning paradigm. To
          compute the fitness we need three metrics per classifier,
          computed over the training set:
             1    # of instances that match the condition of the rule
             2    # of instances that match the action of the rule
             3    # of instances that match both the condition and the action
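The three metrics above can be sketched serially as follows. This is an illustrative sketch only: the interval-rule representation, the function name and the toy data are assumptions, not BioHEL's actual code.

```python
# Hypothetical sketch of the three per-classifier metrics BioHEL needs,
# computed serially over a training set.

def match_counts(rule, instances):
    """Return (#matching condition, #matching action, #matching both)."""
    cond = act = both = 0
    for attributes, label in instances:
        # An instance matches the condition if every constrained
        # attribute falls inside the rule's interval for it.
        matches_cond = all(lo <= attributes[i] <= hi
                           for i, (lo, hi) in rule["intervals"].items())
        matches_act = (label == rule["action"])
        cond += matches_cond
        act += matches_act
        both += matches_cond and matches_act
    return cond, act, both

# Toy dataset: (attribute vector, class label) pairs.
data = [([0.2, 0.9], 1), ([0.5, 0.1], 0), ([0.4, 0.8], 1), ([0.9, 0.9], 0)]
rule = {"intervals": {0: (0.0, 0.5)}, "action": 1}  # attr0 in [0, 0.5] => class 1
print(match_counts(rule, data))  # (3, 2, 2)
```

From these three counts the accuracy and coverage terms of the MDL-based fitness can be derived without revisiting the training set.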



Characteristics of BioHEL


          The ILAS windowing scheme [Bacardit, 2004]
                  An efficiency enhancement method: the training set is
                  divided into non-overlapping strata, and each iteration uses
                  a different stratum for its fitness calculations.
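The windowing idea can be sketched as below. The round-robin partitioning and function names are illustrative assumptions (the actual ILAS implementation may stratify differently, e.g. preserving class balance).

```python
# Illustrative sketch of ILAS-style windowing: split the training set into
# non-overlapping strata and cycle through them, one stratum per GA
# iteration, so each fitness evaluation only sees a slice of the data.

def make_strata(instances, num_windows):
    """Partition the training set into num_windows non-overlapping strata."""
    return [instances[w::num_windows] for w in range(num_windows)]

def stratum_for_iteration(strata, iteration):
    """Each iteration uses a different stratum, cycling round-robin."""
    return strata[iteration % len(strata)]

training_set = list(range(10))           # stand-in for 10 instances
strata = make_strata(training_set, 3)    # 3 non-overlapping windows
print(strata)                            # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(stratum_for_iteration(strata, 4))  # iteration 4 -> stratum 1 -> [1, 4, 7]
```

Each iteration thus pays only 1/num_windows of the full matching cost, at the price of a noisier fitness estimate.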





Evaluation process in BioHEL




      The computationally heaviest stage of the evaluation
      process is the match process.
      We perform stages 1 and 2 inside the GPGPU.





Why use GPGPUs?




         GPGPU acceleration has already been used in
         GP [Langdon and Harrison, 2008], GAs [Maitre et al., 2009]
         and LCS [Loiacono and Lanzi, 2009]
         The use of GPGPUs in machine learning involves a
         greater challenge because it deals with very large
         volumes of data
         However, this also means that it is potentially more
         parallelizable.





CUDA Architecture




          NVIDIA's Compute Unified Device Architecture (CUDA) is a
          parallel computing architecture that exploits the capacity
          of NVIDIA's Graphics Processing Units.
          CUDA runs thousands of threads at the same time ⇒
          SPMD (Single Program, Multiple Data) paradigm





CUDA Memories




         Different types of memory with different access speeds
         The memory is limited
         Memory copy operations consume a considerable
         amount of execution time
         Since we aim to work with large-scale datasets, a good
         strategy to minimise the execution time is based on
         careful memory usage





The ideal way would be...




          Copy all the classifiers and instances and launch one
          thread for each classifier-instance comparison
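This "one thread per comparison" mapping can be sketched as below. The flat-index decoding mirrors what a CUDA kernel would derive from blockIdx/threadIdx; the indexing scheme shown is an illustrative assumption.

```python
# Sketch of the "ideal" mapping: with N classifiers and M instances,
# launch N*M logical threads and let each one decode which
# (classifier, instance) pair it is responsible for comparing.

def pair_for_thread(tid, num_instances):
    """Map a flat thread id in [0, N*M) to a (classifier, instance) pair."""
    return tid // num_instances, tid % num_instances

N, M = 3, 4  # 3 classifiers x 4 instances -> 12 independent comparisons
pairs = [pair_for_thread(t, M) for t in range(N * M)]
print(pairs[0], pairs[5], pairs[11])  # (0, 0) (1, 1) (2, 3)
```

Because every comparison is independent, this mapping has no inter-thread communication during the match phase, which is what makes it attractive for the GPU.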

Challenges of using CUDA



          Problem 1: Memory copy operations
                  In each iteration the classifiers and the instances to
                  compare are different. They need to be copied into
                  global memory again in each iteration.



          Problem 2: Memory bounds
                  It might not be possible to store all the classifiers and the
                  example instances needed to make all the comparisons at
                  the same time.





Solution: Memory calculations




   If all the instances fit in memory...
   We copy the instances only once, at the beginning of each
   GA run, and access different windows by using a memory offset.

   If the instances do not all fit in memory...
   We calculate the number of classifiers and instances that fit in
   memory, in order to minimise the memory copy operations.
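A minimal sketch of this calculation is given below, assuming fixed per-classifier and per-instance sizes and a simple half-and-half chunking policy; these assumptions are illustrative, not BioHEL's exact scheme.

```python
import math

# Hedged sketch: given the device's global-memory budget, work out how
# many evaluation passes are needed and how to chunk the data so the
# number of memory copy operations is minimised.

def plan_passes(num_classifiers, num_instances, bytes_per_classifier,
                bytes_per_instance, device_bytes):
    """Return (number of passes, instances resident per pass)."""
    instance_bytes = num_instances * bytes_per_instance
    if instance_bytes < device_bytes:
        # All instances fit: copy them once, chunk only the classifiers.
        remaining = device_bytes - instance_bytes
        per_pass = max(1, remaining // bytes_per_classifier)
        return math.ceil(num_classifiers / per_pass), num_instances
    # Otherwise split the budget between classifier and instance chunks.
    half = device_bytes // 2
    cls_per_pass = max(1, half // bytes_per_classifier)
    ins_per_pass = max(1, half // bytes_per_instance)
    return (math.ceil(num_classifiers / cls_per_pass)
            * math.ceil(num_instances / ins_per_pass)), ins_per_pass

# 500 classifiers of 1 KB vs. 100,000 instances of 1 KB in a 64 MB budget:
print(plan_passes(500, 100_000, 1024, 1024, 64 * 1024 * 1024))  # (4, 32768)
```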





Challenges of using CUDA




          Problem 3: Output structure size
                  If we make N × M comparisons we have to copy back to the
                  host a structure of size O(N × M), which is very slow



          Solution: Compute the totals in device memory
                  Reduce the three values in device memory in order to
                  minimise the time spent on memory copy operations.
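The reduction idea can be sketched as follows. This simulates the per-block partial sums a GPU reduction would compute; the block structure and names are illustrative, not the actual kernel.

```python
# Sketch of the device-side reduction: instead of copying an O(N*M) match
# matrix back to the host, each "block" of comparisons sums its three
# counters locally, and only the reduced totals cross the
# device-to-host boundary.

def block_reduce(per_comparison_results, block_size):
    """Sum (cond, act, both) triples per block, then reduce blocks to one."""
    partials = []
    for start in range(0, len(per_comparison_results), block_size):
        block = per_comparison_results[start:start + block_size]
        partials.append(tuple(map(sum, zip(*block))))   # one triple per block
    return tuple(map(sum, zip(*partials)))              # final triple

# 6 comparisons, each yielding (matched_cond, matched_act, matched_both):
results = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0), (1, 0, 0)]
print(block_reduce(results, block_size=2))  # (4, 3, 2)
```

Only three integers per classifier reach the host, regardless of how many instances were matched.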





Kernel functions





Experimental Setup


           Two stages of experiments
                   The evaluation process in isolation
                   Integration with the learning process
           Different functions to manage discrete, continuous and
           mixed problems
           We check the integration with the ILAS windowing scheme

   CUDA experiments: Pentium 4 @ 3.6GHz with 2GB RAM and a
   Tesla C1060 with 4GB of global memory and 30 multiprocessors
   Serial experiments: HPC facility at the UoN; each node with
   2 quad-core processors (Intel Xeon E5472 @ 3.0GHz)


Speed in the evaluation process




        Name     |T|    #Att  #Disc  #Cont  #Cl    T. Serial (s)       T. CUDA (s)   Speed Up
  Cont.
        sat      5790     36      0     36    6       3.60±   0.21      1.92±0.01        1.9
        wav      4539     40      0     40    3       2.57±   0.08      1.59±0.01        1.6
        pen      9892     16      0     16   10       4.94±   0.24      2.25±0.02        2.2
        SS      75583    300      0    300    3     770.61± 119.49     14.69±0.23       52.4
        CN     234638    180      0    180    2    1555.90± 452.79     42.35±0.55       36.7
  Mixed
        adu     43960     14      8      6    2     147.86±  30.93     10.38±0.09       14.2
        far     90868     29     24      5    8     420.78±  90.58     23.13±1.04       18.2
        kdd    444619     41     15     26   23    1715.66± 632.40     95.89±1.42       17.9
        SA     493788    270     26    244    2    3776.36±1212.84     90.45±1.17       41.8
        Par    235929     18     18      0    2     863.72± 163.13     60.04±0.58       14.4
        c-4     60803     42     42      0    3     343.75±  71.93     17.86±0.18       19.2

Speed Up vs. Training set size

   [Figure: Speed up according to the training set size. Speedup (0-60,
   y-axis) vs. training set size (100 to 1e+06, log-scale x-axis), one
   curve per dataset: adu (14 atts), pen (16), Par (18), far (29),
   sat (36), wav (40), kdd (41), c-4 (42), CN (180), SA (270), SS (300).]


Integration with the ILAS Windowing scheme

   [Figure: Total speedup according to the number of windows. Speedup
   (0-700, y-axis) vs. number of ILAS windows (5 to 50, x-axis), one
   curve per dataset: adu (14 atts), pen (16), Par (18), far (29),
   sat (36), wav (40), kdd (41), c-4 (42), CN (180), SA (270), SS (300).]


How do we get speed up?




          The continuous problems achieve higher speedups than the
          mixed problems.
          Problems with a large number of attributes or a large
          training set achieve higher speedups.
          There is a sweet spot where the combination of the
          CUDA fitness function and the ILAS windowing scheme
          produces the greatest speedup.
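
          The interplay above can be illustrated with a toy windowing
          scheme. This is a hedged sketch, not BioHEL's implementation:
          the function names (make_windows, window_for_iteration) are
          illustrative, and real ILAS windows are stratified by class,
          whereas this sketch simply deals instances out round-robin.

          ```python
          # Toy sketch of an ILAS-style windowing scheme: the training
          # set is split into non-overlapping windows, and each GA
          # iteration evaluates individuals against a single window
          # instead of the full training set.

          def make_windows(training_set, num_windows):
              """Deal instances round-robin into num_windows slices."""
              windows = [[] for _ in range(num_windows)]
              for i, instance in enumerate(training_set):
                  windows[i % num_windows].append(instance)
              return windows

          def window_for_iteration(windows, iteration):
              """Each iteration uses a different window, cycling through all."""
              return windows[iteration % len(windows)]

          data = list(range(10))            # toy "training set" of 10 instances
          windows = make_windows(data, 3)
          # iteration 0 evaluates against window 0, iteration 1 against window 1, ...
          assert window_for_iteration(windows, 0) == [0, 3, 6, 9]
          assert window_for_iteration(windows, 4) == [1, 4, 7]
          ```

          With more windows each evaluation touches fewer instances, which
          compounds with the per-window speedup of the GPU evaluation, up to
          the point where windows become too small to keep the GPU busy.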





Speed up of BioHEL using CUDA



           Name      |T|   #Att  #Disc  #Cont  #Cl       T. Serial (s)          T. CUDA (s)   Speed Up

   Cont.
           sat      5790     36      0     36    6          0.03±    0.01        25.91±  2.45       3.7
           wav      4539     40      0     40    3         75.47±    9.38        24.69±  0.81       3.1
           pen      9892     16      0     16   10        149.70±   19.93        40.04±  2.94       3.7
           SS      75583    300      0    300    3     347979.80±60982.74      5992.28±247.50      58.1
           CN     234638    180      0    180    2     821464.70±167542.04    18644.31±943.98      44.1

   Mixed
           adu     43960     14      8      6    2       5422.78± 1410.71       271.73± 26.03      20.0
           far     90868     29     24      5    8       2471.28±  701.83        94.99± 41.53      26.0
           kdd    444619     41     15     26   23      76442.32±23533.21      2102.41±191.34      36.4
           SA     493788    270     26    244    2    1252976.80±203186.55    28759.71±552.00      38.3
           Par    235929     18     18      0    2     524706.70±98949.46     19559.79±671.70      26.8
           c-4     60803     42     42      0    3      52917.95± 8059.55      2417.83±170.19      21.9




  The experiments that took 2 weeks to finish now run in 8 hours!





Conclusions




          CUDA allows us to exploit the intrinsic parallelism within
          the populations by evaluating groups of individuals at the
          same time.
          We can now handle much larger problems, as is the case
          for most real-life problems.
          This fusion of CUDA and genetic algorithms helps push
          forward the boundaries of evolutionary learning by
          overcoming technical limitations.





Further work




           Extend our methodology to use more than one GPGPU at
           a time.
           Develop models to determine the limits of the usefulness of
           CUDA, disabling the CUDA evaluation when the training set
           is small.
           Study the impact of the ILAS windowing scheme on accuracy.
           Adapt the CUDA methodology to other evolutionary
           learning systems.





      Bacardit, J. (2004).
      Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and
      run-time.
      PhD thesis, Ramon Llull University, Barcelona, Spain.

      Bacardit, J., Burke, E., and Krasnogor, N. (2009).
      Improving the scalability of rule-based evolutionary learning.
      Memetic Computing, 1(1):55–67.

      Bacardit, J. and Llorà, X. (2009).
      Large scale data mining using genetics-based machine learning.
      In GECCO ’09: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary
      Computation Conference, pages 3381–3412, New York, NY, USA. ACM.

      Langdon, W. B. and Harrison, A. P. (2008).
      GP on SPMD parallel graphics hardware for mega bioinformatics data mining.
      Soft Comput., 12(12):1169–1183.

      Loiacono, D. and Lanzi, P. L. (2009).
      Speeding up matching in XCS.
      In 12th International Workshop on Learning Classifier Systems.

      Maitre, O., Baumes, L. A., Lachiche, N., Corma, A., and Collet, P. (2009).
      Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA.
      In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 1403–1410,
      Montreal, Québec, Canada. ACM.

      Rissanen, J. (1978).
      Modeling by shortest data description.
      Automatica, vol. 14:465–471.


      Sastry, K. (2005).
      Principled efficiency enhancement techniques.
      Genetic and Evolutionary Computation Conference - GECCO 2005- Tutorial.

      Stout, M., Bacardit, J., Hirst, J. D., and Krasnogor, N. (2008).
      Prediction of recursive convex hull class assignments for protein residues.
      Bioinformatics, 24(7):916–923.

      Venturini, G. (1993).
      SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts.
      In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine
      Learning, pages 280–296. Springer-Verlag.








                    Questions or comments?





Iterative Rule Learning

           IRL has been used for many years in the ML community under the
           name of separate-and-conquer.

          Algorithm 4.1: IterativeRuleLearning(Examples)

            Theory ← ∅
            while Examples ≠ ∅ do
                Rule ← FindBestRule(Examples)
                Covered ← Cover(Rule, Examples)
                if RuleStoppingCriterion(Rule, Theory, Examples)
                    then exit
                Examples ← Examples − Covered
                Theory ← Theory ∪ Rule
            return (Theory)
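
           Algorithm 4.1 can be rendered as a minimal Python sketch. This
           is illustrative only: FindBestRule, Cover and the stopping
           criterion are GA-based components in BioHEL, while here a
           "rule" is just a predicate over examples and the helper names
           are hypothetical.

           ```python
           # Minimal rendering of the Iterative Rule Learning loop
           # (separate-and-conquer): learn a rule, remove the examples
           # it covers, and repeat until no examples remain or the
           # stopping criterion fires.

           def iterative_rule_learning(examples, find_best_rule, stopping_criterion):
               theory = []
               examples = list(examples)
               while examples:
                   rule = find_best_rule(examples)          # Rule <- FindBestRule
                   if stopping_criterion(rule, theory, examples):
                       break                                # exit the loop early
                   examples = [e for e in examples if not rule(e)]  # remove Covered
                   theory.append(rule)                      # Theory <- Theory U Rule
               return theory

           # Toy usage: each "rule" covers the 5 smallest remaining integers.
           def find_best_rule(examples):
               lo = min(examples)
               return lambda e, lo=lo: e <= lo + 4

           rules = iterative_rule_learning(range(10), find_best_rule,
                                           lambda r, t, ex: False)
           assert len(rules) == 2       # two rules suffice to cover 0..9
           ```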



BioHEL fitness function


           The coverage term penalizes rules that do not cover a minimum
           percentage of the examples.
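
           The per-rule counts that the fitness needs (instances matching
           the rule's condition, its class, and both) and a coverage
           penalty of this kind can be sketched as follows. This is a
           hedged illustration: the function names are hypothetical and
           the exact MDL-based fitness formula of BioHEL is not
           reproduced here, only the shape of the coverage term.

           ```python
           # Compute, for one rule, the three counts BioHEL's supervised
           # fitness is built from, plus an illustrative coverage penalty.

           def rule_counts(rule_condition, rule_class, training_set):
               matches_cond = matches_class = matches_both = 0
               for attributes, label in training_set:
                   cond = rule_condition(attributes)   # instance matches condition?
                   cls = (label == rule_class)         # instance matches class?
                   matches_cond += cond
                   matches_class += cls
                   matches_both += cond and cls
               return matches_cond, matches_class, matches_both

           def coverage_term(matches_cond, num_instances, min_coverage=0.1):
               """Scale fitness down for rules covering fewer than
               min_coverage of the training instances."""
               coverage = matches_cond / num_instances
               return coverage / min_coverage if coverage < min_coverage else 1.0

           # Toy one-attribute dataset: label is True when x > 5.
           data = [((x,), x > 5) for x in range(10)]
           c, k, b = rule_counts(lambda a: a[0] >= 4, True, data)
           assert (c, k, b) == (6, 4, 4)
           assert coverage_term(c, len(data)) == 1.0   # 60% coverage: no penalty
           ```

           Because these counts are independent per instance, they are the
           natural target for the GPU: each CUDA thread can match one
           instance against the rule and the counts are then reduced.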





Weitere ähnliche Inhalte

Ähnlich wie Speeding Up the Evaluation of Evolutionary Learning Systems Using GPGPUs

Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...
Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...
Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...IJECEIAES
 
An interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsAn interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsRavi Kumar
 
Genetic Programming in Automated Test Code Generation
Genetic Programming in Automated Test Code GenerationGenetic Programming in Automated Test Code Generation
Genetic Programming in Automated Test Code GenerationDVClub
 
Gbml - Genetics Based Machine Learning
Gbml - Genetics Based Machine LearningGbml - Genetics Based Machine Learning
Gbml - Genetics Based Machine LearningMahalingam Ramaswamy
 
Quline manual
Quline manualQuline manual
Quline manualFOODCROPS
 
Knowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningKnowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningjaumebp
 
JBEI Highlights September 2015
JBEI Highlights September 2015JBEI Highlights September 2015
JBEI Highlights September 2015Irina Silva
 
An Hybrid Learning Approach using Particle Intelligence Dynamics and Bacteri...
An Hybrid Learning Approach using Particle Intelligence  Dynamics and Bacteri...An Hybrid Learning Approach using Particle Intelligence  Dynamics and Bacteri...
An Hybrid Learning Approach using Particle Intelligence Dynamics and Bacteri...IJMER
 
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...Anjani Dhrangadhariya
 
JBEI Research Highlights September 2016
JBEI Research Highlights September 2016JBEI Research Highlights September 2016
JBEI Research Highlights September 2016Irina Silva
 
JBEI October 2020 Research Highlights
JBEI October 2020 Research HighlightsJBEI October 2020 Research Highlights
JBEI October 2020 Research HighlightsSaraHarmon4
 
iGEM 2014: UC Santa Cruz BioE Project
iGEM 2014: UC Santa Cruz BioE Project iGEM 2014: UC Santa Cruz BioE Project
iGEM 2014: UC Santa Cruz BioE Project Aaron Maloney
 
Opening Remarks
Opening RemarksOpening Remarks
Opening RemarksIEA-ETSAP
 
Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...
Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...
Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...Ken Rogan
 
JBEI Research Highlights - February 2022
JBEI Research Highlights - February  2022JBEI Research Highlights - February  2022
JBEI Research Highlights - February 2022SaraHarmon4
 
May 2015 c. vulgaris to biofuel presentation
May 2015 c. vulgaris to biofuel presentationMay 2015 c. vulgaris to biofuel presentation
May 2015 c. vulgaris to biofuel presentationJoseph Barnes
 
Employing ForteBio Octet platform for the development of a dual-binding poten...
Employing ForteBio Octet platform for the development of a dual-binding poten...Employing ForteBio Octet platform for the development of a dual-binding poten...
Employing ForteBio Octet platform for the development of a dual-binding poten...KBI Biopharma
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentationmultimediaeval
 

Ähnlich wie Speeding Up the Evaluation of Evolutionary Learning Systems Using GPGPUs (20)

Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...
Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...
Novel Bacteria Foraging Optimization for Energy-efficient Communication in Wi...
 
An interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patternsAn interactive approach to multiobjective clustering of gene expression patterns
An interactive approach to multiobjective clustering of gene expression patterns
 
Genetic Programming in Automated Test Code Generation
Genetic Programming in Automated Test Code GenerationGenetic Programming in Automated Test Code Generation
Genetic Programming in Automated Test Code Generation
 
Gbml - Genetics Based Machine Learning
Gbml - Genetics Based Machine LearningGbml - Genetics Based Machine Learning
Gbml - Genetics Based Machine Learning
 
Kk341721880
Kk341721880Kk341721880
Kk341721880
 
Quline manual
Quline manualQuline manual
Quline manual
 
Knowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learningKnowledge extraction and visualisation using rule-based machine learning
Knowledge extraction and visualisation using rule-based machine learning
 
JBEI Highlights September 2015
JBEI Highlights September 2015JBEI Highlights September 2015
JBEI Highlights September 2015
 
An Hybrid Learning Approach using Particle Intelligence Dynamics and Bacteri...
An Hybrid Learning Approach using Particle Intelligence  Dynamics and Bacteri...An Hybrid Learning Approach using Particle Intelligence  Dynamics and Bacteri...
An Hybrid Learning Approach using Particle Intelligence Dynamics and Bacteri...
 
In silico methods in drug discovery and development
In silico methods in drug discovery and developmentIn silico methods in drug discovery and development
In silico methods in drug discovery and development
 
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,...
 
JBEI Research Highlights September 2016
JBEI Research Highlights September 2016JBEI Research Highlights September 2016
JBEI Research Highlights September 2016
 
JBEI October 2020 Research Highlights
JBEI October 2020 Research HighlightsJBEI October 2020 Research Highlights
JBEI October 2020 Research Highlights
 
iGEM 2014: UC Santa Cruz BioE Project
iGEM 2014: UC Santa Cruz BioE Project iGEM 2014: UC Santa Cruz BioE Project
iGEM 2014: UC Santa Cruz BioE Project
 
Opening Remarks
Opening RemarksOpening Remarks
Opening Remarks
 
Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...
Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...
Bioengineered 3D Co culture Lung In Vitro Models: Platforms to Integrate Cell...
 
JBEI Research Highlights - February 2022
JBEI Research Highlights - February  2022JBEI Research Highlights - February  2022
JBEI Research Highlights - February 2022
 
May 2015 c. vulgaris to biofuel presentation
May 2015 c. vulgaris to biofuel presentationMay 2015 c. vulgaris to biofuel presentation
May 2015 c. vulgaris to biofuel presentation
 
Employing ForteBio Octet platform for the development of a dual-binding poten...
Employing ForteBio Octet platform for the development of a dual-binding poten...Employing ForteBio Octet platform for the development of a dual-binding poten...
Employing ForteBio Octet platform for the development of a dual-binding poten...
 
Deep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp SegmentationDeep Conditional Adversarial learning for polyp Segmentation
Deep Conditional Adversarial learning for polyp Segmentation
 

Kürzlich hochgeladen

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Kürzlich hochgeladen (20)

A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Speeding Up the Evaluation of Evolutionary Learning Systems Using GPGPUs

  • 1. BioHEL GBML System BioHEL using CUDA Experiments and results Conclusions and Further Work Speeding up the Evaluation of Evolutionary Learning Systems using GPGPUs María A. Franco, Natalio Krasnogor and Jaume Bacardit University of Nottingham, UK, ASAP Research Group, School of Computer Science {mxf,nxk,jqb}@cs.nott.ac.uk July 10, 2010 M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham Speeding up Evolutionary Learning using GPGPUs 1 / 27
  • 2. BioHEL GBML System BioHEL using CUDA Experiments and results Conclusions and Further Work Motivation Nowadays the data collection rate easily exceeds the processing/data-mining rate. Real-life problems = big + complex. There is a need to improve the efficiency of evolutionary learning systems to cope with large scale domains[Sastry, 2005][Bacardit and Llorà, 2009]. This work is focused on boosting the performance of the BioHEL system by using the processing capacity inside GPGPUs. M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham Speeding up Evolutionary Learning using GPGPUs 2 / 27
  • 3. BioHEL GBML System BioHEL using CUDA Experiments and results Conclusions and Further Work Outline 1 BioHEL BioHEL GBML System Characteristics of BioHEL 2 BioHEL using CUDA How does CUDA works? Challenges of using CUDA Implementation details 3 Experiments and results Stage 1: Raw evaluation Stage 2: Integration with BioHEL 4 Conclusions and Further Work M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham Speeding up Evolutionary Learning using GPGPUs 3 / 27
  • 4. BioHEL GBML System BioHEL using CUDA BioHEL GBML System Experiments and results Characteristics of BioHEL Conclusions and Further Work The BioHEL GBML System BIOinformatics-oriented Hierarchical Evolutionary Learning - BioHEL[Bacardit et al., 2009] BioHEL was designed to handle large scale bioinformatics datasets[Stout et al., 2008] BioHEL is a GBML system that employs the Iterative Rule Learning (IRL) paradigm First used in EC in Venturini’s SIA system[Venturini, 1993] Widely used for both Fuzzy and non-fuzzy evolutionary learning M. Franco, N. Krasnogor, J. Bacardit. Uni. Nottingham Speeding up Evolutionary Learning using GPGPUs 4 / 27
• 7. Characteristics of BioHEL — The fitness function is based on the Minimum Description Length (MDL) principle [Rissanen, 1978] and tries to evolve accurate rules, high-coverage rules, and rules with low complexity that are as general as possible. BioHEL applies a supervised learning paradigm. To compute the fitness we need three metrics per classifier, computed from the training set: (1) the number of instances that match the condition of the rule; (2) the number of instances that match the action of the rule; (3) the number of instances that match both the condition and the action.
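The three metrics above can be sketched as follows — a minimal Python simulation, not BioHEL's actual C++/CUDA code; the rule representation (a condition predicate plus a predicted class) is an assumption for illustration:

```python
def match_metrics(rule_condition, rule_class, training_set):
    """Count, over the training set:
      1) instances matching the rule's condition,
      2) instances matching the rule's action (class),
      3) instances matching both (correctly covered)."""
    matches_cond = matches_action = matches_both = 0
    for attributes, label in training_set:
        cond = rule_condition(attributes)   # does the rule's condition fire?
        act = (label == rule_class)         # does the class agree?
        matches_cond += cond
        matches_action += act
        matches_both += cond and act
    return matches_cond, matches_action, matches_both

# Hypothetical rule: attribute 0 below 0.5, predicted class 1
data = [([0.2], 1), ([0.4], 0), ([0.9], 1), ([0.1], 1)]
print(match_metrics(lambda a: a[0] < 0.5, 1, data))  # (3, 3, 2)
```

From these three counters the MDL fitness derives the accuracy and coverage terms, which is why they are the only quantities the match process has to return.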
• 9. Characteristics of BioHEL — The ILAS windowing scheme [Bacardit, 2004] is an efficiency enhancement method: the training set is divided into non-overlapping strata, and each iteration uses a different stratum for its fitness calculations.
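The windowing idea can be sketched like this — an illustrative Python sketch, not BioHEL's actual implementation (in particular, the round-robin partitioning here is an assumption; the real scheme also has to look after class proportions):

```python
def make_strata(training_set, num_windows):
    """Partition the training set once into non-overlapping strata."""
    return [training_set[w::num_windows] for w in range(num_windows)]

def stratum_for_iteration(strata, iteration):
    """Each GA iteration evaluates fitness on a different stratum."""
    return strata[iteration % len(strata)]

examples = list(range(10))
strata = make_strata(examples, 3)
print(strata)                            # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
print(stratum_for_iteration(strata, 4))  # [1, 4, 7]
```

Each iteration therefore touches only |T|/w examples, which is where the scheme's speedup comes from.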
• 10. Evaluation process in BioHEL — The computationally heavy stage of the evaluation process is the match process. We perform stages 1 and 2 inside the GPGPU.
• 12. Why use GPGPUs? — GPGPU acceleration has already been used in GP [Langdon and Harrison, 2008], GAs [Maitre et al., 2009] and LCS [Loiacono and Lanzi, 2009]. Using GPGPUs in machine learning poses a greater challenge because it deals with very large volumes of data. However, this also means the problem is potentially more parallelizable.
• 15. CUDA Architecture — NVIDIA Compute Unified Device Architecture (CUDA) is a parallel computing architecture that exploits the capacity of NVIDIA's Graphics Processing Units. CUDA runs thousands of threads at the same time ⇒ SPMD (Same Program, Multiple Data) paradigm.
• 17. CUDA Architecture — [Diagram slide]
• 18. CUDA Memories — There are different types of memory with different access speeds. Memory is limited, and memory copy operations take a considerable amount of execution time. Since we aim to work with large-scale datasets, a good strategy to minimize execution time is based on memory usage.
• 21. The ideal way would be... — Copy all the classifiers and instances and launch one thread for each classifier–instance comparison.
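The one-thread-per-comparison layout above amounts to a flat index mapping — here a Python sketch of the arithmetic a CUDA kernel would do from its block/thread indices (names are illustrative):

```python
def pair_for_thread(t, num_instances):
    """Thread t in a flat grid of N*M threads handles one
    (classifier, instance) pair."""
    classifier = t // num_instances
    instance = t % num_instances
    return classifier, instance

N, M = 3, 4  # 3 classifiers x 4 instances -> 12 threads
pairs = [pair_for_thread(t, M) for t in range(N * M)]
print(pairs[5])   # (1, 1)
print(pairs[-1])  # (2, 3)
```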
• 22. Challenges of using CUDA — Problem 1: memory copy operations. In each iteration the classifiers and the instances to compare are different; they need to be copied into global memory again in each iteration. Problem 2: memory bounds. It might not be possible to store all the classifiers and example instances needed to make all the comparisons at the same time.
• 25. Solution: memory calculations — If all the instances fit in memory, we copy the instances only once at the beginning of each GA run and access different windows by using a memory offset. If all the instances do not fit in memory, we calculate the number of classifiers and instances that fit in memory in order to minimise the memory copy operations.
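A back-of-the-envelope version of that calculation might look like this — the half/half budget split and the byte sizes are illustrative assumptions, not the slides' actual policy:

```python
def tile_sizes(mem_bytes, classifier_bytes, instance_bytes,
               num_classifiers, num_instances):
    """How many classifiers and instances can be resident at once?
    Keep all instances on the device if they fit; otherwise tile."""
    if num_instances * instance_bytes <= mem_bytes // 2:
        inst_fit = num_instances  # copy instances once per GA run
    else:
        inst_fit = (mem_bytes // 2) // instance_bytes
    clas_fit = min(num_classifiers, (mem_bytes // 2) // classifier_bytes)
    return clas_fit, inst_fit

# 4 GB card (as on the Tesla C1060), hypothetical 1200 B per instance
# and 400 B per classifier
print(tile_sizes(4 * 1024**3, 400, 1200, 500, 500_000))  # (500, 500000)
```

Here the whole population and the whole training set fit, so the instances cross the PCIe bus only once per GA run.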
• 27. Challenges of using CUDA — Problem 3: output structure size. If we make N × M comparisons we have to copy back to the host a structure of size O(N × M), which is very slow. Solution: compute the totals in device memory. The three values are reduced in device memory in order to minimize the time spent in memory copy operations.
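The reduction step can be simulated like this — a Python sketch of the pairwise tree reduction pattern a CUDA block reduction uses, so that only N totals (not an N × M matrix) cross the bus; this mirrors the idea, not the deck's actual kernel:

```python
def tree_reduce(values):
    """Pairwise tree reduction: halve the active range each step,
    adding the upper half into the lower half."""
    values = list(values)
    while len(values) > 1:
        half = (len(values) + 1) // 2
        for i in range(len(values) - half):  # one "step" of parallel adds
            values[i] += values[i + half]
        values = values[:half]
    return values[0]

# Per-instance match flags (1 = matched) for one classifier
flags = [1, 0, 1, 1, 0, 1, 1, 0]
print(tree_reduce(flags))  # 5
```

On the device the inner loop runs as one synchronised step per level, so the sum over M flags takes O(log M) steps instead of O(M).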
• 28. Kernel functions — [Diagram slide]
• 30. Experimental setup — Two stages of experiments: the evaluation process on its own, and integration with the learning process. Different kernel functions manage discrete, continuous and mixed problems. We also check the integration with the ILAS windowing scheme. CUDA experiments: a Pentium 4 at 3.6 GHz with 2 GB of RAM and a Tesla C1060 with 4 GB of global memory and 30 multiprocessors. Serial experiments: the UoN HPC facility, each node with 2 quad-core processors (Intel Xeon E5472 at 3.0 GHz).
• 32. Speed of the evaluation process:

| Type  | Name | Instances | #Att | #Disc | #Cont | #Cl | T. Serial (s)    | T. CUDA (s)  | Speedup |
|-------|------|-----------|------|-------|-------|-----|------------------|--------------|---------|
| Cont. | sat  | 5790      | 36   | 0     | 36    | 6   | 3.60 ± 0.21      | 1.92 ± 0.01  | 1.9     |
| Cont. | wav  | 4539      | 40   | 0     | 40    | 3   | 2.57 ± 0.08      | 1.59 ± 0.01  | 1.6     |
| Cont. | pen  | 9892      | 16   | 0     | 16    | 10  | 4.94 ± 0.24      | 2.25 ± 0.02  | 2.2     |
| Cont. | SS   | 75583     | 300  | 0     | 300   | 3   | 770.61 ± 119.49  | 14.69 ± 0.23 | 52.4    |
| Cont. | CN   | 234638    | 180  | 0     | 180   | 2   | 1555.90 ± 452.79 | 42.35 ± 0.55 | 36.7    |
| Mixed | adu  | 43960     | 14   | 8     | 6     | 2   | 147.86 ± 30.93   | 10.38 ± 0.09 | 14.2    |
| Mixed | far  | 90868     | 29   | 24    | 5     | 8   | 420.78 ± 90.58   | 23.13 ± 1.04 | 18.2    |
| Mixed | kdd  | 444619    | 41   | 15    | 26    | 23  | 1715.66 ± 632.40 | 95.89 ± 1.42 | 17.9    |
| Mixed | SA   | 493788    | 270  | 26    | 244   | 2   | 3776.36 ± 1212.84| 90.45 ± 1.17 | 41.8    |
| Mixed | Par  | 235929    | 18   | 18    | 0     | 2   | 863.72 ± 163.13  | 60.04 ± 0.58 | 14.4    |
| Mixed | c-4  | 60803     | 42   | 42    | 0     | 3   | 343.75 ± 71.93   | 17.86 ± 0.18 | 19.2    |
• 33. Speedup vs. training set size — [Figure: speedup according to training set size; x-axis: training set size (100 to 1e+06, log scale); y-axis: speedup (0–60); one curve per dataset, labelled by number of attributes: adu 14, pen 16, Par 18, far 29, sat 36, wav 40, kdd 41, c-4 42, CN 180, SA 270, SS 300.]
• 34. Integration with the ILAS windowing scheme — [Figure: total speedup according to the number of windows; x-axis: number of windows (5–50); y-axis: speedup (0–700); same eleven datasets, labelled by number of attributes.]
• 35. How do we get speedup? — The continuous problems obtain more speedup than the mixed problems. Problems with a large number of attributes or large training sets obtain more speedup. There is a sweet spot where the combination of the CUDA fitness function and the ILAS windowing scheme produces the most speedup.
• 37. Speedup of BioHEL using CUDA:

| Type  | Name | Instances | #Att | #Disc | #Cont | #Cl | T. Serial (s)           | T. CUDA (s)       | Speedup |
|-------|------|-----------|------|-------|-------|-----|-------------------------|-------------------|---------|
| Cont. | sat  | 5790      | 36   | 0     | 36    | 6   | 0.03 ± 0.01             | 25.91 ± 2.45      | 3.7     |
| Cont. | wav  | 4539      | 40   | 0     | 40    | 3   | 75.47 ± 9.38            | 24.69 ± 0.81      | 3.1     |
| Cont. | pen  | 9892      | 16   | 0     | 16    | 10  | 149.70 ± 19.93          | 40.04 ± 2.94      | 3.7     |
| Cont. | SS   | 75583     | 300  | 0     | 300   | 3   | 347979.80 ± 60982.74    | 5992.28 ± 247.50  | 58.1    |
| Cont. | CN   | 234638    | 180  | 0     | 180   | 2   | 821464.70 ± 167542.04   | 18644.31 ± 943.98 | 44.1    |
| Mixed | adu  | 43960     | 14   | 8     | 6     | 2   | 5422.78 ± 1410.71       | 271.73 ± 26.03    | 20.0    |
| Mixed | far  | 90868     | 29   | 24    | 5     | 8   | 2471.28 ± 701.83        | 94.99 ± 41.53     | 26.0    |
| Mixed | kdd  | 444619    | 41   | 15    | 26    | 23  | 76442.32 ± 23533.21     | 2102.414 ± 191.34 | 36.4    |
| Mixed | SA   | 493788    | 270  | 26    | 244   | 2   | 1252976.80 ± 203186.55  | 28759.71 ± 552.00 | 38.3    |
| Mixed | Par  | 235929    | 18   | 18    | 0     | 2   | 524706.70 ± 98949.46    | 19559.79 ± 671.70 | 26.8    |
| Mixed | c-4  | 60803     | 42   | 42    | 0     | 3   | 52917.95 ± 8059.55      | 2417.83 ± 170.19  | 21.9    |

The experiments that took 2 weeks to finish now run in 8 hours!
• 38. Conclusions — CUDA allows us to exploit the intrinsic parallelism within the populations by evaluating a group of individuals at the same time. We can now handle much larger problems, which is the case for most real-life problems. This fusion between CUDA and genetic algorithms helps push forward the boundaries of evolutionary learning by overcoming technical barriers.
• 39. Further work — Extend our methodology to use more than one GPGPU at a time. Develop models to determine the limits of the use of CUDA, disabling the CUDA evaluation when the training set is small. Study the impact of ILAS on accuracy. Adapt the CUDA methodology to other evolutionary learning systems.
• 40. References —
Bacardit, J. (2004). Pittsburgh Genetics-Based Machine Learning in the Data Mining Era: Representations, Generalization, and Run-time. PhD thesis, Ramon Llull University, Barcelona, Spain.
Bacardit, J., Burke, E., and Krasnogor, N. (2009). Improving the scalability of rule-based evolutionary learning. Memetic Computing, 1(1):55–67.
Bacardit, J. and Llorà, X. (2009). Large scale data mining using genetics-based machine learning. In GECCO '09: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference, pages 3381–3412, New York, NY, USA. ACM.
Langdon, W. B. and Harrison, A. P. (2008). GP on SPMD parallel graphics hardware for mega bioinformatics data mining. Soft Computing, 12(12):1169–1183.
Loiacono, D. and Lanzi, P. L. (2009). Speeding up matching in XCS. In 12th International Workshop on Learning Classifier Systems.
Maitre, O., Baumes, L. A., Lachiche, N., Corma, A., and Collet, P. (2009). Coarse grain parallelization of evolutionary algorithms on GPGPU cards with EASEA. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pages 1403–1410, Montreal, Québec, Canada. ACM.
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14:465–471.
• 41. References (continued) —
Sastry, K. (2005). Principled efficiency enhancement techniques. Tutorial, Genetic and Evolutionary Computation Conference (GECCO 2005).
Stout, M., Bacardit, J., Hirst, J. D., and Krasnogor, N. (2008). Prediction of recursive convex hull class assignments for protein residues. Bioinformatics, 24(7):916–923.
Venturini, G. (1993). SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine Learning, pages 280–296. Springer-Verlag.
• 42. Questions or comments?
• 43. Iterative Rule Learning — IRL has been used for many years in the ML community under the name separate-and-conquer.

Algorithm 4.1: IterativeRuleLearning(Examples)
  Theory ← ∅
  while Examples ≠ ∅
    Rule ← FindBestRule(Examples)
    Covered ← Cover(Rule, Examples)
    if RuleStoppingCriterion(Rule, Theory, Examples)
      then exit
    Examples ← Examples − Covered
    Theory ← Theory ∪ Rule
  return (Theory)
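The separate-and-conquer loop of Algorithm 4.1 can be sketched as a runnable Python function — FindBestRule, the covering test and the stopping criterion are stubbed with illustrative toy versions (in BioHEL, FindBestRule is a full GA run):

```python
def iterative_rule_learning(examples, find_best_rule, covers,
                            stopping_criterion):
    theory = []
    while examples:
        rule = find_best_rule(examples)
        covered = [e for e in examples if covers(rule, e)]
        if stopping_criterion(rule, theory, examples):
            break
        # "separate": remove covered examples, "conquer": keep the rule
        examples = [e for e in examples if e not in covered]
        theory.append(rule)
    return theory

# Toy run: examples are (value, label); a "rule" is just the majority
# label of the remaining examples, covering the examples with that label.
examples = [(1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'b')]
theory = iterative_rule_learning(
    examples,
    find_best_rule=lambda ex: max(set(l for _, l in ex),
                                  key=[l for _, l in ex].count),
    covers=lambda rule, e: e[1] == rule,
    stopping_criterion=lambda r, t, ex: False)
print(theory)  # ['b', 'a']
```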
• 44. BioHEL fitness function — The coverage term penalizes rules that do not cover a minimum percentage of examples.