Presentation HC-4016, Heterogeneous Implementation of Neural Network Algorithms, by Dmitri Yudanov and Leon Reznik at the AMD Developer Summit (APU13) November 11-13, 2013.
4. NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
OUTLINE
! From Biological to Artificial Neural Networks (ANN)
! ANN Applications
‒ Application categories
‒ Examples
! Why ANN?
! Why Spiking Neural Network (SNN)?
5. FROM BIOLOGICAL TO ARTIFICIAL NEURAL NETWORK (ANN)
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
! An ANN is a simplification of a biological neural network.
! An ANN consists of simple elements (neurons) analogous to the biological neurons in the brain.
! The neurons are connected by weighted links and form a network.
! The links pass signals (numbers) from one neuron to another. Neurons operate on the weighted signals and retransmit the results.
! The network can learn by adjusting the weights (the behavior is encoded in the weights); see the sketch after this list.
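As a concrete illustration of the bullets above, here is a minimal, hypothetical sketch of a single artificial neuron: it sums weighted input signals, applies an activation function, and its weights can be nudged with a simple delta-rule update. Names such as `Neuron` and `learning_rate` are illustrative and not from the slides.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A single artificial neuron: weighted sum of inputs followed by an activation function.
struct Neuron {
    std::vector<double> weights;   // one weight per incoming link
    double bias = 0.0;

    // Operate on the weighted signals and retransmit the result.
    double output(const std::vector<double>& inputs) const {
        double sum = bias;
        for (std::size_t i = 0; i < weights.size(); ++i)
            sum += weights[i] * inputs[i];
        return 1.0 / (1.0 + std::exp(-sum));          // sigmoid activation
    }

    // Learning: adjust the weights, so the behavior ends up encoded in them.
    void adjust(const std::vector<double>& inputs, double target, double learning_rate) {
        double error = target - output(inputs);
        for (std::size_t i = 0; i < weights.size(); ++i)
            weights[i] += learning_rate * error * inputs[i];   // simple delta rule
        bias += learning_rate * error;
    }
};
```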
6. ANN APPLICATION CATEGORIES
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
[Bar chart: share of ANN-related patents and applications per application category, roughly 0%-18% each]
! Based on a patent and application search (US Patent and Trademark Office, EU Patent Office, Google Patent Search), conducted in 2012 by students of the Machine Learning class (Dr. Leon Reznik, RIT).
7. WHY ANN? EXAMPLES
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
! Recognition
‒ Character (e.g. mail), speech, image (e.g. image clustering), odor (e.g. locust antennal lobe), face and emotion
! Gaming
‒ AI features in games
! Robotics
‒ Vision, spatial navigation and planning (e.g. mental maps with place cells), positioning, decision making
! Control
‒ Missile guidance
‒ Anti-lock brakes (Ford)
‒ Self-driving cars, UAVs
! Crime prevention and security
‒ Bomb sniffer (JFK airport)
‒ Credit card fraud detection (Visa)
! Biomedical
‒ Neuroscience: brain modeling and simulation
‒ US BRAIN Initiative (expected 300 EB/day)
‒ EU Human Brain Project
‒ Neurology (e.g. disease modeling and forecasting, ModelDB)
‒ Cardiology (e.g. adaptive biventricular pacemaker)
‒ Prosthesis: BCI, neuromorphic chips
! Financial analysis
‒ Mortgage risk evaluation (AVCO, Irvine)
‒ Currency trading (Citibank)
! Difficulties
‒ Need to compute fast, but the problem size is large
‒ How to get the right ANN circuit for an application?
8. WHY ANN?
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
! Novel algorithms
‒ The performance of conventional algorithms is not satisfactory in numerous problems with dynamic changes (e.g. face recognition may fail if the view angle is different or the person is smiling).
! Learning, adaptability
‒ ANNs continuously learn from the available data and adapt to new conditions.
! Reliability
‒ Performance tends to degrade gracefully under partial damage. Parts of the network can learn to perform the function of damaged parts. In contrast, most programs and engineered systems are brittle: if you remove some arbitrary parts, very likely the whole system ceases to function.
! Low power. Neuromorphic engineering
‒ Switching speed of biological neurons is less than 1 kHz (CPU: 3 GHz)
‒ Switching energy of biological neurons is ~1.0E-17 joules/op (CPU: ~1.0E-5 joules/op)
‒ Conduction speed of a biological neural network is ~100 m/s
! Parallel
‒ The brain performs massively parallel computations very efficiently. Data and processing have global impact. For example, complex visual perception occurs within less than 100 ms, that is, 10 processing steps.
! AI. Consciousness. Intelligence. Self-awareness.
9. WHY SNN? NEURAL NETWORK CATEGORIES
NEURAL NETWORKS: ORIGIN, FEATURES, APPLICATIONS
[Chart: neural network models arranged by Learning Ability, Time Dynamics, and Complexity: Rosenblatt, ADALINE, Hopfield, MLP, RBF, Recurrent, LVQ, SOM, Neural Gas, iSNN, ASNN, Biological]
! Which level of abstraction to choose?
! Which one is right for the target application?
! Point-to-point connected spiking neural network (SNN): time (spikes), polychronization (memory capacity), unsupervised learning (synaptic plasticity)
12. HETEROGENEOUS IMPLEMENTATION: SIMULATORS AND ABSTRACTION LEVEL
SNN: HETEROGENEOUS IMPLEMENTATION
! Population model
‒ Nengo
! Point-neuron network models
‒ NEST
‒ PCSIM
‒ Brian
! Compartmental neuron and membrane models
‒ NEURON
‒ GENESIS
! Reaction-diffusion model of biochemical signaling pathways
‒ STEPS
13. SNN MODELS: TRADEOFFS
SNN SIMULATION PRINCIPLES
[Plots: example spike responses of the HH, IZ, and IF models]
! Integrate-and-Fire (IF): simple, but has a poor spiking response
! Hodgkin-Huxley (HH): has a rich response, but is complex
! Izhikevich (IZ): simple and has a rich response, but is phenomenological
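To make the IZ tradeoff concrete, below is a minimal sketch of the Izhikevich point-neuron update (Izhikevich, 2003, listed in the literature slide), advanced here with a plain forward-Euler step; the step size, input current, and the regular-spiking parameter set are placeholder choices, not values from the deck.

```cpp
#include <cstdio>

// Izhikevich neuron: v' = 0.04*v*v + 5*v + 140 - u + I,  u' = a*(b*v - u);
// when v reaches 30 mV: v <- c, u <- u + d.
struct IzNeuron {
    double v = -65.0, u = -13.0;                       // membrane potential and recovery variable
    double a = 0.02, b = 0.2, c = -65.0, d = 8.0;      // regular-spiking parameter set

    // Advance by dt milliseconds with input current I; returns true if the neuron spiked.
    bool step(double dt, double I) {
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I);
        u += dt * (a * (b * v - u));
        if (v >= 30.0) { v = c; u += d; return true; }
        return false;
    }
};

int main() {
    IzNeuron n;
    for (int i = 0; i < 1000; ++i)                     // 500 ms of 0.5 ms steps, constant drive
        if (n.step(0.5, 10.0)) std::printf("spike at t = %.1f ms\n", i * 0.5);
}
```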
15. TIME-DRIVEN (SYNCHRONOUS) SIMULATION
SNN SIMULATION PRINCIPLES
! Events are aligned to a time grid
‒ Can update all neurons at the same time
‒ Good for parallel implementation
! Time quantization error
‒ Delayed or missing events
‒ Can be controlled by the size of dt: the smaller the step, the smaller the error, but the more computation per unit of simulated time
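A minimal sketch of the time-driven idea, using a plain leaky integrate-and-fire neuron as a stand-in model (the time constant and threshold are placeholder values): every neuron is advanced by the same dt in one pass, which is why the loop parallelizes well, and spikes are only detected and delivered at grid points, which is exactly where the quantization error comes from.

```cpp
#include <cstddef>
#include <vector>

// Minimal leaky integrate-and-fire state: membrane potential plus the input
// accumulated for the current grid step.
struct LIF { double v = 0.0, input = 0.0; };

// One synchronous step over the whole population.
std::vector<int> synchronous_step(std::vector<LIF>& neurons, double dt) {
    const double tau = 20.0, v_threshold = 1.0, v_reset = 0.0;   // placeholder constants
    std::vector<int> spiked;
    for (std::size_t i = 0; i < neurons.size(); ++i) {           // same update for every neuron
        LIF& n = neurons[i];
        n.v += dt * (-n.v / tau + n.input);        // forward-Euler update of dv/dt = -v/tau + I
        n.input = 0.0;                             // inputs are consumed once per grid step
        if (n.v >= v_threshold) { n.v = v_reset; spiked.push_back((int)i); }
    }
    return spiked;   // expanded into events that arrive at later grid points
}
```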
16. EVENT-DRIVEN (ASYNCHRONOUS) SIMULATION
SNN SIMULATION PRINCIPLES
! Events are unique in time:
‒ A single event can change the state of the whole system
‒ Have to update neurons sequentially, in the order of events
‒ Minimum transmission latency is unknown
‒ Assumes an analytical solution for the model equations
‒ … or a timed event-driven update
! Time quantization error
‒ No error caused by the simulation type
‒ Better event accuracy
‒ Good for STDP
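A minimal sketch of the event-driven idea, again with a leaky integrate-and-fire stand-in whose state between events has a closed-form (analytical) solution: events are pulled one at a time from a priority queue ordered by time, so there is no grid and no quantization error, but the updates are inherently sequential.

```cpp
#include <cmath>
#include <queue>
#include <vector>

struct Event { double time; int target; double weight; };
struct Later { bool operator()(const Event& a, const Event& b) const { return a.time > b.time; } };

struct LIF { double v = 0.0, last_update = 0.0; };

// Event-driven loop: neurons are updated one event at a time, in global time order.
// Between events the LIF state decays analytically: v(t) = v0 * exp(-(t - t0) / tau).
void simulate(std::vector<LIF>& neurons,
              std::priority_queue<Event, std::vector<Event>, Later>& queue,
              double tau, double v_threshold) {
    while (!queue.empty()) {
        Event e = queue.top(); queue.pop();
        LIF& n = neurons[e.target];
        n.v *= std::exp(-(e.time - n.last_update) / tau);   // analytical decay since last event
        n.last_update = e.time;
        n.v += e.weight;                                     // apply the arriving event
        if (n.v >= v_threshold) {
            n.v = 0.0;
            // A full simulator would now create events for this neuron's targets at
            // e.time + their synaptic delays and push them into the queue.
        }
    }
}
```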
17. TIMED EVENT-DRIVEN (HYBRID) SIMULATION
SNN SIMULATION PRINCIPLES
! Events are unique in time:
‒ A single event can change the state of the whole system, but not within the minimum transmission delay
‒ Time grid: dt is equal to the minimum delay
‒ Update all neurons at the same time at every dt increment
‒ Also, between dt increments, update every neuron in the order of the events it receives within the increment
‒ Good for parallel implementation, but there is computation divergence across neurons
! Time quantization error
‒ No error caused by the simulation type
‒ Better event accuracy
‒ Good for STDP
18. NUMERICAL INTEGRATION METHODS
SNN SIMULATION PRINCIPLES
! Motivation. Need to solve an initial value problem (IVP).
! Euler. Compute the next y based on the tangent at the current y.
! Modified Euler. Predict with Euler, correct with the average slope.
! Runge-Kutta (4th order). Evaluate several slopes and average them.
! Bulirsch-Stoer
‒ Uses the modified midpoint method with evaluation and error-tolerance checking via extrapolation with rational functions. Provides adaptive order. Generally better suited for smooth functions.
! Parker-Sochacki
‒ Expresses the IVP in terms of power series. Provides adaptive order.
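For reference, here is a minimal sketch of the Euler, modified-Euler (predict-correct), and classical 4th-order Runge-Kutta steps for a generic IVP y' = f(t, y); the particular f and step size used in main() are placeholders chosen only so the output can be checked against the exact solution.

```cpp
#include <cstdio>
#include <functional>

using RHS = std::function<double(double, double)>;       // f(t, y)

double euler_step(const RHS& f, double t, double y, double h) {
    return y + h * f(t, y);                               // follow the tangent at the current y
}

double modified_euler_step(const RHS& f, double t, double y, double h) {
    double predict = y + h * f(t, y);                     // predict with Euler
    return y + 0.5 * h * (f(t, y) + f(t + h, predict));   // correct with the average slope
}

double rk4_step(const RHS& f, double t, double y, double h) {
    double k1 = f(t, y);                                  // evaluate four slopes and average
    double k2 = f(t + 0.5 * h, y + 0.5 * h * k1);
    double k3 = f(t + 0.5 * h, y + 0.5 * h * k2);
    double k4 = f(t + h, y + h * k3);
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4);
}

int main() {
    RHS f = [](double, double y) { return -y; };          // y' = -y, exact solution exp(-t)
    double y_e = 1.0, y_m = 1.0, y_rk = 1.0, h = 0.1;
    for (int i = 0; i < 10; ++i) {
        y_e  = euler_step(f, i * h, y_e, h);
        y_m  = modified_euler_step(f, i * h, y_m, h);
        y_rk = rk4_step(f, i * h, y_rk, h);
    }
    std::printf("t=1: euler %.6f, modified %.6f, rk4 %.6f (exact 0.367879)\n", y_e, y_m, y_rk);
}
```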
20. NUMERICAL INTEGRATION METHODS: PARKER-SOCHACKI
SNN SIMULATION PRINCIPLES
! A typical IVP
! Assume that the solution function can be represented by a power series.
! Therefore, based on Maclaurin series properties, its derivative is
! As a result:
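The equations these bullets refer to are images in the original slides; a LaTeX reconstruction of the standard Parker-Sochacki setup they describe (following Stewart and Bair, 2009) would read roughly as follows.

```latex
% A typical IVP:
y'(t) = f\bigl(y(t)\bigr), \qquad y(t_0) = y_0
% Power-series ansatz for the solution:
y(t) = \sum_{k \ge 0} y_k \,(t - t_0)^k
% Its derivative, by the Maclaurin-series properties:
y'(t) = \sum_{k \ge 0} (k + 1)\, y_{k+1} \,(t - t_0)^k
% As a result, matching coefficients with the series of f(y):
y_{k+1} = \frac{[\,f(y)\,]_k}{k + 1}
```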
21. NUMERICAL INTEGRATION METHODS: PARKER-SOCHACKI
SNN SIMULATION PRINCIPLES
! If f(y) is linear:
! Shift it to eliminate the constant term:
! As a result, the equation becomes:
! Benefit: adaptive order and error-tolerance control
‒ The local Lipschitz constant determines the number of iterations needed to achieve a certain error tolerance:
! With finite order N:
! Parallelism:
‒ Loop-level parallelism
‒ Parallel reduction
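As a hypothetical worked instance of the linear case, the sketch below applies the Parker-Sochacki recurrence to y' = lambda * (y - y_eq): shifting by y_eq removes the constant term, the coefficients follow z[k+1] = lambda * z[k] / (k + 1), and the order grows adaptively until the next term falls below the tolerance. The per-term summation at the end is the kind of loop that maps to a parallel reduction on the GPU; the constants in main() are placeholders.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Parker-Sochacki step for the linear ODE y' = lambda * (y - y_eq).  With the shift
// z = y - y_eq the equation becomes z' = lambda * z, whose power-series coefficients
// obey the recurrence z[k+1] = lambda * z[k] / (k + 1).
double ps_linear_step(double y0, double lambda, double y_eq, double dt, double tol, int max_order) {
    std::vector<double> z{y0 - y_eq};                       // z[0]
    for (int k = 0; k < max_order; ++k) {
        z.push_back(lambda * z[k] / (k + 1));               // next series coefficient
        if (std::fabs(z.back() * std::pow(dt, k + 1)) < tol) break;   // adaptive order
    }
    double y = y_eq, t_pow = 1.0;
    for (std::size_t k = 0; k < z.size(); ++k) {            // sum of terms: maps to a reduction
        y += z[k] * t_pow;
        t_pow *= dt;
    }
    return y;
}

int main() {
    // Exponential relaxation toward y_eq = -65 with rate lambda = -0.05, checked against the exact value.
    double approx = ps_linear_step(-30.0, -0.05, -65.0, 1.0, 1e-12, 30);
    double exact  = -65.0 + (-30.0 + 65.0) * std::exp(-0.05 * 1.0);
    std::printf("PS: %.10f  exact: %.10f\n", approx, exact);
}
```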
22. SUMMARY
SNN SIMULATION PRINCIPLES
! Result
! Neuron/Synapse Model
! Simulation Type
! Integration Method
! Application
! Requirements
24. OUTLINE
SNN: HETEROGENEOUS IMPLEMENTATION
! Simulation Flow
‒ Synchronous
‒ Hybrid
‒ Combined
! Implementation of the Hybrid Simulation Type
‒ Simulation Flow
‒ Simulation Phases
  ‒ Update
  ‒ Expand
  ‒ Sort
‒ Results
! Heterogeneous Implementation of the Synchronous Simulation Type
‒ NEST Simulator
‒ Software Architecture
25. SYNCHRONOUS SIMULATION FLOW
SNN: HETEROGENEOUS IMPLEMENTATION
! A simulation step (dt) has two phases:
‒ Update:
  ‒ Compute the new state for all neurons.
  ‒ Detect spiked neurons and process them separately to update the spike history (divergence reduction).
‒ Propagation:
  ‒ Expand spikes into arriving events.
26. HYBRID SIMULATION FLOW
SNN: HETEROGENEOUS IMPLEMENTATION
! A simulation step (dt) has two phases:
‒ Update:
  ‒ Compute the new state for all neurons at the times of arriving spikes (event-driven).
  ‒ Detect spiked neurons and process them separately to compute the spike time and update the spike history (divergence reduction).
‒ Propagation:
  ‒ Expand spikes into arriving events.
  ‒ Sort the events that are due for delivery in the current time step by arrival time, for each neuron.
  ‒ Create a pointer array that maps neurons to their sorted events.
27. COMBINED SIMULATION FLOW
SNN: HETEROGENEOUS IMPLEMENTATION
! Exchange spikes between compute nodes (MPI)
‒ A spike is (time stamp, source neuron ID)
! Store spikes in the spike ring buffer (sketched after this list)
‒ How many ring segments? int(max delay / min delay)
‒ The ring 'rotates' every step by one segment
! Expand spikes
‒ Spike segments are matched with the relevant delay segments (synaptic connectivity matrix)
‒ Arrival time is computed
‒ Synaptic events that are due are filtered out
! Sort synaptic events by arrival time for each target neuron (event-driven only)
! Update neurons
! Update synapses
! Gather new spikes
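A minimal sketch of the spike ring buffer described above: the number of segments is int(max delay / min delay), events due d steps ahead go into segment (current + d) mod segments, and each step consumes one segment and then "rotates" the ring by advancing the current-slot index. The class and field names are illustrative, not the deck's actual data structures.

```cpp
#include <vector>

struct SynapticEvent { int target; double weight; double arrival_time; };

// Spike ring buffer: one segment per minimum-delay step, int(max_delay / min_delay) segments total.
class SpikeRingBuffer {
public:
    SpikeRingBuffer(double max_delay, double min_delay)
        : segments_((int)(max_delay / min_delay)), current_(0), ring_(segments_) {}

    // Schedule an event that is due delay_steps steps from now (delay_steps < segments_).
    void schedule(int delay_steps, const SynapticEvent& e) {
        ring_[(current_ + delay_steps) % segments_].push_back(e);
    }

    // Events due in the current step.
    std::vector<SynapticEvent>& due_now() { return ring_[current_]; }

    // After the step, clear the consumed segment and rotate the ring by one.
    void rotate() {
        ring_[current_].clear();
        current_ = (current_ + 1) % segments_;
    }

private:
    int segments_, current_;
    std::vector<std::vector<SynapticEvent>> ring_;
};
```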
28. IMPLEMENTATION OF HYBRID SIMULATION: UPDATE PHASE
SNN: HETEROGENEOUS IMPLEMENTATION
! Wave-fronts (WFs) work on their segments of neurons, represented by parameters and state stored in global memory (GM)
! A work-item (WI) takes a neuron and updates its state at every arriving event
! The state is stored back to GM
! Spike data is accumulated in the local data store (LDS) and flushed to GM periodically
! Spiked neurons are processed in a separate kernel (divergence reduction)
‒ The spike time is computed with the Newton-Raphson (NR) method; see the sketch after this list
‒ Spiked neurons are updated for the rest of the arriving events
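Below is a minimal, generic sketch of the Newton-Raphson refinement of the spike time: once a step has carried the membrane potential past the threshold, the crossing time inside [0, dt] is found from the potential and its derivative. How v(t) and dv(t) are evaluated is left abstract here (for instance, from the local power-series expansion of the neuron state); the tolerance and iteration limit are placeholder values.

```cpp
#include <cmath>
#include <functional>

// Find t in [0, dt] with v(t) = v_threshold, given callables for the membrane
// potential and its time derivative over the step.
double spike_time_newton(const std::function<double(double)>& v,
                         const std::function<double(double)>& dv,
                         double v_threshold, double dt,
                         double tol = 1e-9, int max_iter = 20) {
    double t = 0.5 * dt;                         // start in the middle of the step
    for (int i = 0; i < max_iter; ++i) {
        double g = v(t) - v_threshold;
        if (std::fabs(g) < tol) break;
        double slope = dv(t);
        if (slope == 0.0) break;                 // avoid division by zero; keep current estimate
        t -= g / slope;                          // Newton-Raphson update
        if (t < 0.0) t = 0.0;                    // clamp the estimate to the step
        if (t > dt)  t = dt;
    }
    return t;
}
```

The refined time is what would be used to timestamp the outgoing spike before the neuron is updated for the remaining events of the step.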
29. IMPLEMENTATION OF HYBRID SIMULATION: EXPAND PHASE
SNN: HETEROGENEOUS IMPLEMENTATION
! Load source spike packets from GM and store them in a contiguous array in LDS.
! Load the synaptic pointer to LDS.
‒ Each neuron is connected to 100s or even 1000s of other neurons. The synaptic pointer describes where to get the synaptic data for the target neurons of a known spike source neuron.
! Main loop
‒ A WF picks a source spike (time stamp, source neuron ID) and the pointer
‒ A WI loads the synaptic data for a target neuron, computes the arrival time and stores the synaptic event in the ring buffer in GM.
! Along the way, the sort histogram (required by radix sort) is loaded into and stored back to LDS. It is updated to reflect the newly created synaptic events.
30. IMPLEMENTATION OF HYBRID SIMULATION: SORT PHASE
SNN: HETEROGENEOUS IMPLEMENTATION
Radix sort example: 1-bit radix, LSD sort.
! We need to order synaptic events by arrival time and by target ID
! Radix sort: select the next radix from LSD to MSD and group numbers based on the radix value, from smallest to largest
‒ Group numbers based on the current radix and compute a histogram (the count of numbers with the same radix value)
‒ Scan the histogram: compute the prefix sum (the global offset for the next grouping).
! 8 passes for 32-bit addressing and a 4-bit radix (sketched after this list).
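A minimal serial sketch of LSD radix sort with the histogram-plus-prefix-sum structure described above, using a 4-bit radix on 32-bit keys and therefore 8 passes; in this setting each key would be an encoding of (arrival time, target ID). A GPU version along the lines of the referenced Harada and Howes material builds the histograms per work-group and computes the scans in parallel, but the pass structure is the same.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of 32-bit keys with a 4-bit radix: 8 passes, each pass groups keys by the
// current digit using a histogram and an exclusive prefix sum (the global scatter offsets).
void radix_sort(std::vector<uint32_t>& keys) {
    const int radix_bits = 4, buckets = 1 << radix_bits, passes = 32 / radix_bits;   // 8 passes
    std::vector<uint32_t> scratch(keys.size());
    for (int pass = 0; pass < passes; ++pass) {
        const int shift = pass * radix_bits;
        std::vector<std::size_t> histogram(buckets, 0);
        for (uint32_t k : keys)                                // count keys per radix value
            ++histogram[(k >> shift) & (buckets - 1)];
        std::vector<std::size_t> offset(buckets, 0);
        for (int b = 1; b < buckets; ++b)                      // scan: exclusive prefix sum
            offset[b] = offset[b - 1] + histogram[b - 1];
        for (uint32_t k : keys)                                // scatter into the new grouping
            scratch[offset[(k >> shift) & (buckets - 1)]++] = k;
        keys.swap(scratch);
    }
}
```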
31. IMPLEMENTATION OF HYBRID SIMULATION: PERFORMANCE
SNN: HETEROGENEOUS IMPLEMENTATION

Network Size (neurons) | Average Synapses per Neuron | Average Events per Step | Average Spikes per Step | Total Synapse Count (millions) | "Tahiti" GPU Time per Step (ms)
2,100,000              | 90                          | 230,000                 | 2,522                   | 190                            | 13.5
131,000                | 1,458                       | 370,000                 | 257                     | 191                            | 5.7
16,000                 | 11,677                      | 300,000                 | 25                      | 191                            | 3.2
! Size-connection scalability in multi-precision networks with per-WF precision allocation
! 1000 iterations, 250 us step
! Randomly connected SNN with only AMPA synapses
! Speedups of up to 100x, depending on configuration and compared devices
32. HETEROGENEOUS IMPLEMENTATION: SIMULATOR ARCHITECTURE
SNN: HETEROGENEOUS IMPLEMENTATION
! Interface: Python – SLI – Network class (C++)
! Object-oriented: Nodes – Connections – Events
! Network: administrates node connections
! Scheduler: orchestrates the simulation
‒ Node management: update, prepare, finalize
‒ Execution type selection: serial, p-threads, OpenMP
‒ Step scheduling
‒ Event transmission via the Communicator
! Communicator
‒ Inter-process communication
‒ MPI
! Features
‒ Primarily used as a vehicle for neuroscience research
‒ Generic, suitable for SNN applications
‒ Both time- and event-driven simulation types
‒ Flexible node dynamics, a variety of built-in models
‒ Communication infrastructure to deliver both discrete and continuous events at the same time
‒ Emphasis on correctness, performance and scalability
33. HETEROGENEOUS IMPLEMENTATION: SOFTWARE ARCHITECTURE
SNN: HETEROGENEOUS IMPLEMENTATION
! Simplified UML diagram of the heterogeneous part of the implementation
! Neuron model templates (single and double precision) with an OpenCL™ update phase
! Object-oriented design with shared vector members (data redundancy reduction)
! STL-like containers with OpenCL™ memory/buffer types underneath (see the sketch after this list)
! On-the-fly CPU-GPU execution steering: adaptability
! Data structure size stability: statistical monitoring, steering, error reporting
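As an illustration of what an STL-like container with an OpenCL™ buffer underneath could look like, here is a minimal sketch: a host-side std::vector mirrored by a cl_mem device buffer, with explicit methods to push and pull the data. This is not the actual container from the implementation; it assumes a valid cl_context and cl_command_queue created elsewhere, and error handling and copy control are trimmed for brevity.

```cpp
#include <CL/cl.h>
#include <cstddef>
#include <vector>

// Host vector mirrored by an OpenCL device buffer (illustrative sketch only).
template <typename T>
class DeviceVector {
public:
    DeviceVector(cl_context ctx, cl_command_queue queue, std::size_t n)
        : queue_(queue), host_(n) {
        cl_int err = CL_SUCCESS;
        device_ = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(T), nullptr, &err);
    }
    ~DeviceVector() { if (device_) clReleaseMemObject(device_); }

    T& operator[](std::size_t i) { return host_[i]; }   // STL-like host-side access
    std::size_t size() const { return host_.size(); }
    cl_mem buffer() const { return device_; }           // handle passed to clSetKernelArg

    // Push the host mirror to the device before launching the OpenCL update kernel.
    void to_device() {
        clEnqueueWriteBuffer(queue_, device_, CL_TRUE, 0,
                             host_.size() * sizeof(T), host_.data(), 0, nullptr, nullptr);
    }
    // Pull the device contents back into the host mirror after the kernel finishes.
    void to_host() {
        clEnqueueReadBuffer(queue_, device_, CL_TRUE, 0,
                            host_.size() * sizeof(T), host_.data(), 0, nullptr, nullptr);
    }

private:
    cl_command_queue queue_ = nullptr;
    cl_mem device_ = nullptr;
    std::vector<T> host_;
};
```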
34. CONCLUSION
HETEROGENEOUS IMPLEMENTATION OF NEURAL NETWORK ALGORITHMS
! Thank You!
35. LITERATURE
HETEROGENEOUS IMPLEMENTATION OF NEURAL NETWORK ALGORITHMS
! R. Brette et al., "Simulation of networks of spiking neurons: A review of tools and strategies," Journal of Computational Neuroscience, vol. 23, no. 3, pp. 349-398, 2007.
! B. Gaster, D. R. Kaeli, L. Howes, and P. Mistry, Heterogeneous Computing with OpenCL™. Morgan Kaufmann, 2011.
! T. Harada and L. Howes, "Introduction to GPU Radix Sort," Heterogeneous Compute, Dec. 2011. [Online].
! E. M. Izhikevich, "Simple model of spiking neurons," IEEE Transactions on Neural Networks, vol. 14, pp. 1569-1572, 2003.
! R. Stewart and W. Bair, "Spiking neural network simulation: numerical integration with the Parker-Sochacki method," Journal of Computational Neuroscience, vol. 27, no. 1, pp. 115-133, August 2009.
! D. Yudanov and L. Reznik, "Scalable multi-precision simulation of spiking neural networks on GPU with OpenCL™," in Proc. 2012 International Joint Conference on Neural Networks (IJCNN), IEEE, 2012.
36. THANKS
HETEROGENEOUS IMPLEMENTATION OF NEURAL NETWORK ALGORITHMS
! Wayne Burleson
! Mayank Daga
! Markus Diesmann
! Joseph Dinh
! Tan Ho
! Austin Hung
! Jeremy Johnson
! John Keaty
! Bingley Li
! Marc-Oliver Gewaltig
! Saul Martinez
! Haibin Niu
! Kyle Pour
! Jason Shantz
! Jason Tang
! Yury Zaytsev