Low Power High-Performance Computing on the
BeagleBoard Platform
E. Principi, V. Colagiacomo, S. Squartini, and F. Piazza
A3Lab, Department of Information Engineering
Università Politecnica delle Marche
5th European DSP Education and Research Conference
13th and 14th September, 2012, Amsterdam, Netherlands
Introduction Purpose of this work The BeagleCluster Experiments Conclusions and Future Developments
Outline
1 Introduction
2 Purpose of this work
3 The BeagleCluster
Hardware Platform
Software Platform
4 Experiments
High-Performance Linpack
Matrix Multiplication
Speaker Diarization
Analysis of power consumption
5 Conclusions and Future Developments
Introduction
High-performance computing clusters are employed in computationally intensive tasks (e.g., weather prediction, astronomical modelling).
Usually, they are evaluated only in terms of Floating Point Operations Per Second (FLOPS) (e.g., the Top500 list).
The cost of energy and infrastructure exceeds the cost of the computing equipment itself, and this gap is expected to widen by 2014 [Belady, 2007].
A new metric
FLOPS/Watt
Trends in the industry
• Use of processors traditionally employed in the mobile world.
• Canonical built a 42-core ARM cluster for compiling the Ubuntu distribution.
• Calxeda developed the EnergyCore ECX-1000 series of server-on-a-chip processors based on the ARM Cortex-A9.
• Hewlett-Packard Redstone servers:
  • Four rack chassis = 2800 conventional servers
  • Energy saving: 90%
  • Space saving: 94%
  • Currently employed in the TryStack free cloud service (http://trystack.org)
Purpose of this work
Develop
Develop an energy-efficient cluster computer built from inexpensive off-the-shelf hardware and open software, and propose it to the scientific community.
Evaluate
Evaluate the cluster both with conventional benchmarks and with a real-time constrained speech processing application.
Measure
Measure the power consumption of the cluster, assess its energy efficiency, and compare it with that of a laptop PC.
Hardware Platform
Cluster description
The BeagleCluster is composed of five BeagleBoard-xM boards.
BeagleBoard-xM
Processor: TI DM3730
ARM subsystem: Cortex-A8 @ 1 GHz
DSP subsystem: C64x+ @ 800 MHz
Graphics accelerator: PowerVR SGX @ 200 MHz
RAM: 512 MB DDR @ 200 MHz
Network interface: Ethernet 10/100
Hardware Platform
Cluster description (cont.)
• Asymmetric topology: one head node and four worker nodes.
• Nodes are connected to a Hewlett-Packard ProCurve 1410-8G switch through the 10/100 Mbit Ethernet interface of each BeagleBoard-xM.
• Nodes are powered by a Lambda AC-DC power supply.
Software Platform
Software Platform
• Operating system: Ångström GNU/Linux distribution (worker nodes run without a GUI).
• Toolchain: CodeSourcery.
• Network File System (NFS): data and code are shared across the cluster.
• Cluster Command and Control (C3): a suite of tools for managing the cluster (e.g., terminating processes, rebooting worker nodes, pushing drive images).
• Message Passing Interface (MPICH2 implementation by Argonne National Laboratory): an application programming interface for exchanging messages and data among processes running on the nodes of a cluster.
Software Platform
Software Platform (cont.)
• Ganglia: a distributed monitoring system whose web interface is used to monitor the cluster activity and to detect abnormal behavior.
High-Performance Linpack
High-Performance Linpack (HPL)
• HPL is the de facto standard benchmark for measuring floating-point performance.
• It is employed in the Top500 and Green500 lists.
• HPL solves a dense system of linear equations using double-precision arithmetic.
• Parallelism is obtained by means of MPI.
• Computation is based on BLAS (Vesperix ATLAS-ARM).
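HPL solves the dense system A x = b through an LU factorization with partial pivoting followed by triangular solves. The reference code uses a blocked, distributed variant over a P × Q process grid; the following is only a minimal, single-process, unblocked sketch of the same idea (Gaussian elimination with partial pivoting on the augmented matrix). The function name `lu_solve` is hypothetical and not part of HPL.

```python
def lu_solve(A, b):
    """Solve A x = b by Gaussian elimination (LU factorization) with partial pivoting.

    A is a list of n rows of n numbers, b a list of n numbers; inputs are not modified.
    """
    n = len(A)
    # Build the augmented matrix [A | b] in double precision (Python floats).
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Partial pivoting: bring the row with the largest |M[i][k]| to position k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

For example, `lu_solve([[0, 1], [1, 0]], [2, 3])` returns `[3.0, 2.0]`; without the row swap the first pivot would be zero.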
High-Performance Linpack
High-Performance Linpack (HPL) (cont.)
MFLOPS: 258.6
MFLOPS/W: 13.26
Green500 500th position (June 2012)
Cray XT5 SixCore, Opteron Six Core 6C 2.6 GHz, XT4 Internal Interconnect: 32.05 MFLOPS/W
Note
Arithmetic operations are performed in double precision in the Vector Floating Point (VFP) unit: the NEON unit cannot be employed because it supports only single-precision floating point.
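The FLOPS/Watt metric relates the two figures above. A small sketch of the arithmetic: dividing the HPL result by the efficiency gives the average power implied during the run, and the last line compares the BeagleCluster with the 500th Green500 entry. The implied power is a derived value, not a measurement reported on the slides, and the helper names are hypothetical.

```python
def mflops_per_watt(mflops, watts):
    """Energy-efficiency metric used by the Green500 list."""
    return mflops / watts


def implied_power(mflops, mflops_per_w):
    """Average power (W) implied by a performance figure and an efficiency figure."""
    return mflops / mflops_per_w


beaglecluster_mflops = 258.6  # HPL result from the slide
beaglecluster_eff = 13.26     # MFLOPS/W from the slide
green500_last_eff = 32.05     # 500th Green500 entry, June 2012

power = implied_power(beaglecluster_mflops, beaglecluster_eff)
print(f"implied average power: {power:.1f} W")  # prints: implied average power: 19.5 W
print(f"fraction of the 500th Green500 entry: {beaglecluster_eff / green500_last_eff:.2f}")  # prints: 0.41
```

Under these numbers the cluster reaches roughly 41% of the efficiency of the last Green500 entry, consistent with the note that HPL cannot use the NEON unit.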
Matrix Multiplication
Matrix Multiplication
• This benchmark shows the performance improvement that can be obtained with NEON-optimized code.
• The benchmark multiplies an m × n matrix A by an n × p matrix B.
• It divides the rows of matrix A into groups and processes each group on a different node.
• Communication among nodes is based on MPI.
Platform                 Execution time
BeagleCluster            42.13 s
BeagleCluster w/ NEON     5.18 s
NEON-optimized code reduces the execution time by a factor of about 8 ⇒ HPL performance could be improved by properly exploiting NEON.
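The row-group decomposition described above can be sketched in plain Python. This is a sequential simulation under stated assumptions: the function names are hypothetical, each "node" is just one loop iteration, and neither MPI nor NEON is involved. On the cluster, each row group would instead be sent to a worker and the partial results gathered on the head node.

```python
def split_rows(m, workers):
    """Split row indices 0..m-1 into `workers` contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(m, workers)
    groups, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return groups


def multiply_rows(A, B, rows):
    """Compute the rows of C = A x B listed in `rows` (the work assigned to one node)."""
    n, p = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)] for i in rows]


def row_partitioned_matmul(A, B, workers):
    """Multiply an m x n matrix A by an n x p matrix B, one group of rows of A per simulated node."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    C = []
    for rows in split_rows(len(A), workers):
        C.extend(multiply_rows(A, B, rows))  # gather the partial results in row order
    return C
```

Because each node only needs its own rows of A plus the whole of B, the amount of data sent to every worker grows with B; this is one reason the network can become a bottleneck on a 100 Mbit cluster.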
Speaker Diarization
Speaker Diarization
• A speaker diarization algorithm detects “who speaks now”.
• The algorithm addressed here is based on the real-time implementation described in [Colagiacomo, et al. 2010].
• The calculation of the cross-correlations between the channel i signal x_i(t) and the channel j signal x_j(t) is the most computationally demanding part:
$$C_{ij}(t) = \max_{\tau} \left\{ \mathrm{IFFT}\left[ \mathrm{FFT}\big(x_i(t)\,x_j(t-\tau)\big) \bullet \mathrm{FFT}\big(w(t)\big) \right] \right\}$$
Here, t is the time index, τ is the correlation lag, w(t) is the Hamming window, and • denotes the element-wise product.
Speaker Diarization
Speaker Diarization (cont.)
• Cluster-wide parallelism has been obtained by assigning the feature extraction stage of each channel to one of the worker nodes.
• The server process on the head node dispatches audio frames to the worker nodes through the MPI_Bcast call and performs the final classification.
• Performance has been evaluated in terms of the Real-Time Factor (RTF), where RTF < 1 means faster than real time:
$$\mathrm{RTF} = \frac{\text{Total execution time}}{\text{Speech segment duration}}$$
Speaker Diarization
Speaker Diarization (cont.)
• Audio data: four lapel microphone signals of meeting ES2009b from the AMI corpus.
• Comparison with an Asus F9SG laptop (Intel Core 2 Duo T8300 CPU at 2.4 GHz, 2 GB of RAM).
• Power consumption is measured with the LCD monitor switched off.
Speaker Diarization
Speaker Diarization (cont.)
Single-board implementation results
• Real-time execution is achieved by using the NEON instruction set and by reducing the number of cross-correlations: the maximum of C_ij(t) is searched by incrementing τ in steps of ∆τ > 1.
RTF for different values of ∆τ:
∆τ     Laptop   BeagleBoard-xM
1      2.47     12.73
16     0.25      1.02
32     0.18      0.63
64     0.14      0.44
128    0.12      0.36
The choice of ∆τ is critical for both platforms: the laptop runs in real time only for ∆τ ≥ 16, the BeagleBoard-xM only for ∆τ ≥ 32.
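The coarse lag search can be illustrated with a simplified, time-domain stand-in. For each lag τ, this sketch computes the Hamming-windowed sum of x_i(t)·x_j(t − τ) over one frame, instead of the FFT-based smoothing in the C_ij(t) formula. It then keeps the maximum, visiting only every ∆τ-th lag. The frame length, the lag range `max_lag` and all function names are assumptions for illustration, not the authors' implementation.

```python
import math


def hamming(n):
    """Hamming window w(t) = 0.54 - 0.46*cos(2*pi*t/(n-1)), t = 0..n-1."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]


def windowed_xcorr(xi, xj, tau, w):
    """Time-domain sum over one frame of w(t) * x_i(t) * x_j(t - tau).

    Samples of x_j outside the frame are treated as zero.
    """
    total = 0.0
    for t in range(len(w)):
        s = t - tau
        if 0 <= s < len(xj):
            total += w[t] * xi[t] * xj[s]
    return total


def coarse_max_xcorr(xi, xj, max_lag, step):
    """Search tau in [-max_lag, max_lag] with stride `step` (the delta-tau of the slides).

    Returns (best value, best lag, number of lags evaluated).
    """
    w = hamming(len(xi))
    best_val, best_tau, evaluated = -math.inf, None, 0
    for tau in range(-max_lag, max_lag + 1, step):
        val = windowed_xcorr(xi, xj, tau, w)
        evaluated += 1
        if val > best_val:
            best_val, best_tau = val, tau
    return best_val, best_tau, evaluated
```

With `max_lag=20`, a step of 1 evaluates 41 lags while a step of 8 evaluates only 6. A larger step cuts the work roughly by the step size but may skip the lag where the true peak lies, which is the speed/accuracy trade-off behind the choice of ∆τ.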
Speaker Diarization
Speaker Diarization (cont.)
Cluster-wide implementation results (RTF)
∆τ     Single-board   Five nodes
1      12.73          4.71
16      1.02          1.69
32      0.63          1.63
64      0.44          1.56
128     0.36          1.55
• The MPI version is about 2.7 times as fast as the single-board one when ∆τ = 1.
• For ∆τ ≥ 16 the MPI implementation is slower than the single-board one and its RTF stays above 1.5: the communication overhead becomes the new bottleneck.
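Since the single-board and five-node runs process the same speech segment, the ratio of their RTFs equals the ratio of their execution times. The sketch below applies this to the values in the table above to get the speedup for each ∆τ and to check whether the five-node version runs in real time (RTF < 1). The helper names are hypothetical.

```python
def rtf(total_execution_time_s, speech_duration_s):
    """Real-Time Factor as defined on the slides; RTF < 1 means faster than real time."""
    return total_execution_time_s / speech_duration_s


def speedup(rtf_reference, rtf_candidate):
    """For the same speech segment, the RTF ratio equals the execution-time ratio."""
    return rtf_reference / rtf_candidate


# RTF values from the cluster-wide results table, keyed by delta-tau.
single_board = {1: 12.73, 16: 1.02, 32: 0.63, 64: 0.44, 128: 0.36}
five_nodes = {1: 4.71, 16: 1.69, 32: 1.63, 64: 1.56, 128: 1.55}

for dtau in sorted(single_board):
    s = speedup(single_board[dtau], five_nodes[dtau])  # > 1 means the cluster is faster
    print(f"delta-tau={dtau:3d}  speedup={s:.2f}  five-node real-time: {five_nodes[dtau] < 1}")
```

The speedup drops below 1 as soon as ∆τ reaches 16, which is the communication bottleneck described above.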
Speaker Diarization
Speaker Diarization (cont.)
Cluster-wide implementation
• That communication is the bottleneck has been verified on a four-node cluster:
• Nodes read audio data directly from the local file system.
• One of the worker nodes performs both the feature extraction and the classification tasks.
∆τ     Five nodes   Four nodes (w/ local data)
1      4.71         3.35
16     1.69         0.33
32     1.63         0.23
64     1.56         0.18
128    1.55         0.16
By reducing the communication overhead, real-time execution (RTF < 1) is achieved already with ∆τ = 16.
Speaker Diarization
Analysis of power consumption
BeagleCluster: 20.32 W
Laptop: 32.36 W
Energy ratio
$$E_r = \frac{\mathrm{RTF}_{\text{cluster}} \cdot P_{\text{cluster}}}{\mathrm{RTF}_{\text{laptop}} \cdot P_{\text{laptop}}} \cong 1.2$$
The communication overhead limits the energy efficiency of the BeagleCluster.
Energy ratio of the four-node cluster
$$E_r \cong 0.69$$
With reduced communication overhead, the BeagleCluster is more energy efficient than the laptop PC.
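Because RTF is execution time divided by speech duration, RTF · P is the energy spent per second of processed speech, so E_r compares the energy both platforms need for the same audio. The sketch below reproduces the five-node value. The choice of ∆τ = 1 is an assumption inferred because it yields about 1.2 with the slide values. The four-node ratio cannot be reproduced here because the power of that configuration is not reported.

```python
def energy_ratio(rtf_cluster, p_cluster_w, rtf_laptop, p_laptop_w):
    """E_r = (RTF_cluster * P_cluster) / (RTF_laptop * P_laptop); E_r < 1 favours the cluster."""
    return (rtf_cluster * p_cluster_w) / (rtf_laptop * p_laptop_w)


# Assumption: delta-tau = 1 (RTF 4.71 for five nodes, 2.47 for the laptop).
er = energy_ratio(rtf_cluster=4.71, p_cluster_w=20.32, rtf_laptop=2.47, p_laptop_w=32.36)
print(f"E_r = {er:.2f}")  # prints: E_r = 1.20
```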
Conclusions
• A cluster computer based on the BeagleBoard-xM platform has been described.
• The cluster relies on open software for executing parallel tasks and for managing and monitoring the nodes.
• High-Performance Linpack has been used to measure the number of floating-point operations per second.
• The performance improvement achievable with NEON-optimized code has been shown by means of a matrix multiplication benchmark.
• Processing time and power consumption have been measured by means of a cluster-wide speaker diarization algorithm to evaluate the real-time capabilities and the energy efficiency of the cluster.
Conclusions (cont.)
• Results showed that, using the 100 Mbit Ethernet interface, the BeagleCluster consumes 1.2 times the energy spent by the laptop PC.
• With the communication bottleneck removed, the BeagleCluster achieves a higher energy efficiency than the laptop PC.
• The five-node cluster costs 655 €. Compared to the laptop PC, which costs 1100 €, the BeagleCluster is 445 € cheaper.
Future developments
• The software platform will be expanded with a resource
manager and a scheduler to enable the execution of batch
jobs.
• The energy efficiency will be assessed in a High-Availability
scenario, for example using the cluster for hosting websites.
• The use of more efficient hardware platforms (e.g.,
PandaBoards) and of the DM3730 DSP will be considered.
Thank you for your attention!
Emanuele Principi Vito Colagiacomo
e.principi@univpm.it s1037562@studenti.univpm.it
Stefano Squartini Francesco Piazza
s.squartini@univpm.it f.piazza@univpm.it
Manufacturer: AMPROBE
Model: LH41A
Measuring range: 0-40 A, DC or AC peak
Resolution: 1 mA in the 4 A range; 10 mA in the 40 A range
Accuracy: ±1.3% + 5 digits
Frequency range: DC in DC mode; 40 Hz to 400 Hz in AC mode
High-Performance Linpack: details
Rmax: 258.6 MFLOPS
Problem size (N): 15000
Block size (NB): 16
Process grid (P × Q): 2 × 2
H. W. Meuer, “The TOP500 Project: Looking Back Over 15 Years of
Supercomputing Experience,” Informatik-Spektrum, vol. 31, no. 3, pp. 203–222,
2008. [Online]. Available: http://www.top500.org
C. L. Belady, “In the Data Center, Power and Cooling Cost More Than the IT
Equipment It Supports,” Electronics Cooling Magazine, vol. 13, no. 1, May 2007.
W.-c. Feng and K. Cameron, “The Green500 List: Encouraging Sustainable
Supercomputing,” IEEE Computer, vol. 40, no. 12, pp. 50–55, Dec. 2007.
[Online]. Available: http://www.green500.org
I. Ahmad and S. Ranka, Eds., Handbook of Energy-Aware and Green Computing,
1st ed., ser. Information Science. Boca Raton, US: CRC Press, Jan. 2012.
S. Andrade, J. Dourado, and C. Maciel, “Low-power cluster using OMAP3530,”
in Proc. of EDERC, Nice, France, Dec. 2010, pp. 220–224.
K. Fürlinger, C. Klausecker, and D. Kranzlmüller, “Towards energy efficient
parallel computing on consumer electronic devices,” in Proc. of ICT-GLOW.
Berlin, Heidelberg: Springer-Verlag, 2011, pp. 1–9.
M. Brim, R. Flanery, A. Geist, B. Luethke, and S. L. Scott, “Cluster Command
and Control (C3) Tool Suite,” Parallel and Distributed Computing Practices,
vol. 4, no. 4, Dec. 2001.
Argonne National Laboratory, “MPICH2.” [Online]. Available:
http://www.mcs.anl.gov/research/projects/mpich2/
M. L. Massie, B. N. Chun, and D. E. Culler, “The Ganglia distributed monitoring
system: design, implementation, and experience,” Parallel Computing, vol. 30,
no. 7, pp. 817–840, 2004.
M. Moattar and M. Homayounpour, “A review on speaker diarization systems
and approaches,” Speech Communication, vol. 54, no. 10, pp. 1065–1103, 2012.
E. Principi, R. Rotili, M. Wöllmer, F. Eyben, S. Squartini, and B. Schuller,
“Real-Time Activity Detection in a Multi-Talker Reverberated Environment,”
Cognitive Computation, pp. 1–12, 2012.
V. Colagiacomo, E. Principi, S. Cifani, and S. Squartini, “Real-Time Speaker
Diarization on TI OMAP3530,” in Proc. of EDERC, Nice, France, Dec. 1–2, 2010.
InfiniBand Trade Association, “InfiniBand Architecture Specification Release
1.2.1,” Jan. 2008.
N. J. Boden, D. Cohen, R. E. Felderman, A. Kulawik, C. Seitz, J. N. Seizovic,
and W. Su, “Myrinet: A Gigabit-per-second Local Area Network,” IEEE Micro,
vol. 15, no. 1, pp. 29–36, Feb. 1995.
Static Energy Prediction in Software: A Worst-Case Scenario ApproachStatic Energy Prediction in Software: A Worst-Case Scenario Approach
Static Energy Prediction in Software: A Worst-Case Scenario ApproachGreenLabAtDI
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)vaidehi87
 
Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project OverviewFloris Sluiter
 
Implementation of area optimized low power multiplication and accumulation
Implementation of area optimized low power multiplication and accumulationImplementation of area optimized low power multiplication and accumulation
Implementation of area optimized low power multiplication and accumulationkarthik annam
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC
 
IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...
IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...
IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...IRJET Journal
 
OpenACC Highlights: GTC Digital April 2020
OpenACC Highlights: GTC Digital April 2020OpenACC Highlights: GTC Digital April 2020
OpenACC Highlights: GTC Digital April 2020OpenACC
 
Parallex - The Supercomputer
Parallex - The SupercomputerParallex - The Supercomputer
Parallex - The SupercomputerAnkit Singh
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC
 
The OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDThe OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDLarry Smarr
 

Ähnlich wie Low Power High-Performance Computing on the BeagleBoard Platform (20)

Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdfParallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
Parallel Vector Tile-Optimized Library (PVTOL) Architecture-v3.pdf
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdf
 
Static Energy Prediction in Software: A Worst-Case Scenario Approach
Static Energy Prediction in Software: A Worst-Case Scenario ApproachStatic Energy Prediction in Software: A Worst-Case Scenario Approach
Static Energy Prediction in Software: A Worst-Case Scenario Approach
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)
 
Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project Overview
 
Implementation of area optimized low power multiplication and accumulation
Implementation of area optimized low power multiplication and accumulationImplementation of area optimized low power multiplication and accumulation
Implementation of area optimized low power multiplication and accumulation
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021
 
IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...
IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...
IRJET- Switch Level Implementation of A 4-Bit Logical Unit using Mixed Logic ...
 
OpenACC Highlights: GTC Digital April 2020
OpenACC Highlights: GTC Digital April 2020OpenACC Highlights: GTC Digital April 2020
OpenACC Highlights: GTC Digital April 2020
 
Parallex - The Supercomputer
Parallex - The SupercomputerParallex - The Supercomputer
Parallex - The Supercomputer
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019
 
The OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDThe OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XD
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
 

Mehr von a3labdsp

Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...
Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...
Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...a3labdsp
 
Hybrid Reverberation Algorithm: a Practical Approach
Hybrid Reverberation Algorithm: a Practical ApproachHybrid Reverberation Algorithm: a Practical Approach
Hybrid Reverberation Algorithm: a Practical Approacha3labdsp
 
Mixed Time Frequency Approach for Multipoint Room Response Equalization
Mixed Time Frequency Approach for Multipoint Room Response EqualizationMixed Time Frequency Approach for Multipoint Room Response Equalization
Mixed Time Frequency Approach for Multipoint Room Response Equalizationa3labdsp
 
Audio Morphing for Percussive Sound Generation
Audio Morphing for Percussive Sound GenerationAudio Morphing for Percussive Sound Generation
Audio Morphing for Percussive Sound Generationa3labdsp
 
An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...
An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...
An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...a3labdsp
 
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...a3labdsp
 
An Efficient DSP Based Implementation of a Fast Convolution Approach with non...
An Efficient DSP Based Implementation of a Fast Convolution Approach with non...An Efficient DSP Based Implementation of a Fast Convolution Approach with non...
An Efficient DSP Based Implementation of a Fast Convolution Approach with non...a3labdsp
 
A Hybrid Approach for Real-time Room Acoustic Response Simulation
A Hybrid Approach for Real-time Room Acoustic Response SimulationA Hybrid Approach for Real-time Room Acoustic Response Simulation
A Hybrid Approach for Real-time Room Acoustic Response Simulationa3labdsp
 

Mehr von a3labdsp (8)

Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...
Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...
Evaluation of a Multipoint Equalization System based on Impulse Responses Pro...
 
Hybrid Reverberation Algorithm: a Practical Approach
Hybrid Reverberation Algorithm: a Practical ApproachHybrid Reverberation Algorithm: a Practical Approach
Hybrid Reverberation Algorithm: a Practical Approach
 
Mixed Time Frequency Approach for Multipoint Room Response Equalization
Mixed Time Frequency Approach for Multipoint Room Response EqualizationMixed Time Frequency Approach for Multipoint Room Response Equalization
Mixed Time Frequency Approach for Multipoint Room Response Equalization
 
Audio Morphing for Percussive Sound Generation
Audio Morphing for Percussive Sound GenerationAudio Morphing for Percussive Sound Generation
Audio Morphing for Percussive Sound Generation
 
An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...
An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...
An Efficient DSP Implementation of a Dynamic Convolution Using Principal Comp...
 
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
Approximation of Dynamic Convolution Exploiting Principal Component Analysis:...
 
An Efficient DSP Based Implementation of a Fast Convolution Approach with non...
An Efficient DSP Based Implementation of a Fast Convolution Approach with non...An Efficient DSP Based Implementation of a Fast Convolution Approach with non...
An Efficient DSP Based Implementation of a Fast Convolution Approach with non...
 
A Hybrid Approach for Real-time Room Acoustic Response Simulation
A Hybrid Approach for Real-time Room Acoustic Response SimulationA Hybrid Approach for Real-time Room Acoustic Response Simulation
A Hybrid Approach for Real-time Room Acoustic Response Simulation
 

Kürzlich hochgeladen

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Kürzlich hochgeladen (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Low Power High-Performance Computing on the BeagleBoard Platform

  • 1. Low Power High-Performance Computing on the BeagleBoard Platform
    E. Principi, V. Colagiacomo, S. Squartini, and F. Piazza
    A3Lab, Department of Information Engineering, Università Politecnica delle Marche
    5th European DSP Education and Research Conference, 13th and 14th September 2012, Amsterdam, Netherlands
  • 2. Outline
    1 Introduction
    2 Purpose of this work
    3 The BeagleCluster
      Hardware Platform
      Software Platform
    4 Experiments
      High-Performance Linpack
      Matrix Multiplication
      Speaker Diarization
      Analysis of power consumption
    5 Conclusions and Future Developments
  • 3. Introduction
    High-performance computing clusters are employed in computationally intensive tasks (e.g., weather prediction, astronomical modelling).
    Usually, they are evaluated only in terms of Floating Point Operations Per Second (FLOPS) (e.g., the Top500 list).
    The costs of energy and infrastructure exceed the costs of the computational devices, and this gap is expected to grow by 2014 [Belady, 2007].
    A new metric: FLOPS/Watt
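Since a watt is one joule per second, the FLOPS/Watt metric reduces to floating-point operations per joule, i.e. it measures the energy spent per operation rather than speed alone. A short unit check:

```latex
\[
\frac{\text{FLOPS}}{\text{W}}
  = \frac{\text{FLOP}/\text{s}}{\text{J}/\text{s}}
  = \frac{\text{FLOP}}{\text{J}},
\qquad
1~\frac{\text{MFLOPS}}{\text{W}} = 10^{6}~\frac{\text{FLOP}}{\text{J}}
\;\Longleftrightarrow\;
1~\mu\text{J per FLOP}.
\]
```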
  • 9. Trends in the industry
    • Use of processors traditionally employed in the mobile world.
    • Canonical built a 42-core ARM cluster for compiling the Ubuntu distribution.
    • Calxeda developed the EnergyCore ECX-1000 series of server-on-a-chip processors based on the ARM Cortex-A9.
    • Hewlett-Packard Redstone servers:
      - four rack chassis are equivalent to 2800 conventional servers;
      - energy saving: 90%;
      - space saving: 94%;
      - currently employed in the TryStack free cloud service (http://trystack.org).
  • 10. Purpose of this work
    Develop: an energy-efficient cluster computer built from inexpensive off-the-shelf hardware and open software, to be proposed to the scientific community.
    Evaluate: the cluster, both with conventional benchmarks and with a speech processing application subject to real-time constraints.
    Measure: the power consumption of the cluster; assess its energy efficiency and compare it with that of a laptop PC.
  • 13. Hardware Platform: Cluster description
    The BeagleCluster is composed of five BeagleBoard-xM boards.
    BeagleBoard-xM
      Processor              TI DM3730
      ARM subsystem          Cortex-A8 @ 1 GHz
      DSP subsystem          C64x+ @ 800 MHz
      Graphics accelerator   PowerVR SGX @ 200 MHz
      RAM                    512 MB DDR @ 200 MHz
      Network interface      Ethernet 10/100
  • 14. Hardware Platform: Cluster description (cont.)
    • Asymmetric topology: one head node and four worker nodes.
    • Nodes are connected to a Hewlett-Packard ProCurve 1410-8G switch through the BeagleBoard-xM 100 Mbit/s Ethernet interface.
    • Nodes are powered by a Lambda AC-DC power supply.
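Because every node communicates through its 100 Mbit/s interface, the volume of data exchanged bounds the communication time. The following is a back-of-the-envelope sketch, not a measurement: it ignores protocol overhead, switch latency and MPI costs (so real transfers are slower), and the 2000 × 2000 double-precision matrix and the helper names are hypothetical. Such a matrix is 32 MB and needs at least about 2.56 s to cross one link.

```python
# Lower-bound estimate of the time needed to move data over one
# BeagleBoard-xM 100 Mbit/s Ethernet link. Protocol overhead (Ethernet,
# IP, TCP, MPI headers) and switch latency are ignored, so real transfers
# will be slower than these figures.

LINK_BITS_PER_SECOND = 100e6  # nominal 100 Mbit/s interface


def transfer_time_s(n_bytes: float, link_bps: float = LINK_BITS_PER_SECOND) -> float:
    """Seconds needed to push n_bytes through the link at its nominal rate."""
    return n_bytes * 8 / link_bps


def matrix_bytes(rows: int, cols: int, bytes_per_elem: int = 8) -> int:
    """Size of a dense matrix; 8 bytes per element for double precision."""
    return rows * cols * bytes_per_elem


if __name__ == "__main__":
    n = 2000  # hypothetical matrix order, for illustration only
    size = matrix_bytes(n, n)
    print(f"{n}x{n} doubles: {size / 1e6:.1f} MB, "
          f"at least {transfer_time_s(size):.2f} s per link transfer")
```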
  • 15. Software Platform
    • Operating system: Ångström GNU/Linux distribution (worker nodes do not have a GUI).
    • Tool-chain: CodeSourcery.
    • Network File System (NFS): data and code are shared across the cluster.
    • Cluster Command Control: a suite of tools for managing the cluster (e.g., terminating processes, rebooting worker nodes, pushing drive images).
    • Message Passing Interface (MPICH2 from Argonne National Laboratory): an application programming interface that lets processes running on the cluster nodes exchange messages and data.
  • 16. Software Platform (cont.)
    • Ganglia: offers a web interface used to monitor the cluster activity and to detect malfunctions.
  • 17. High-Performance Linpack (HPL)
    • HPL is the de facto standard benchmark for measuring floating-point performance.
    • It is employed in the Top500 and Green500 lists.
    • HPL solves a dense system of linear equations using double-precision arithmetic.
    • Parallelism is obtained by means of MPI.
    • Computation is based on BLAS (Vesperix ATLAS-ARM).
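A minimal, single-process sketch of what HPL computes, assuming that plain Gaussian elimination with partial pivoting can stand in for HPL's blocked, MPI-distributed LU factorization (the BLAS kernels that ATLAS-ARM accelerates are replaced here by Python loops). `hpl_flops` uses the operation count HPL credits for an order-n solve, 2/3 n³ + 3/2 n², and `scaled_residual` follows the form of HPL's residual check; the epsilon convention (`sys.float_info.epsilon`) is an assumption and may differ from HPL's by a factor of two. All function names are hypothetical.

```python
import random
import sys


def lu_solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    a is a list of n rows of length n, b a list of length n; inputs are not
    modified. Python floats are IEEE 754 double precision, as in HPL.
    """
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [A | b]
    for k in range(n):
        # Partial pivoting: move the largest |entry| of column k to row k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def hpl_flops(n):
    """Operation count HPL credits for an order-n solve: 2/3 n^3 + 3/2 n^2."""
    return 2.0 / 3.0 * n**3 + 1.5 * n**2


def scaled_residual(a, x, b):
    """||Ax - b||_inf / (eps * (||A||_inf * ||x||_inf + ||b||_inf) * n)."""
    n = len(a)
    eps = sys.float_info.epsilon
    r = max(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
            for row, bi in zip(a, b))
    norm_a = max(sum(abs(v) for v in row) for row in a)
    norm_x = max(abs(v) for v in x)
    norm_b = max(abs(v) for v in b)
    return r / (eps * (norm_a * norm_x + norm_b) * n)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 100
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    x = lu_solve(a, b)
    print("credited FLOP:", hpl_flops(n))
    print("scaled residual:", scaled_residual(a, x, b))
```

Dividing `hpl_flops(n)` by the measured wall-clock time gives the FLOPS figure that HPL reports.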
  • 18. High-Performance Linpack (HPL) (cont.)
    BeagleCluster results
      MFLOPS     258.6
      MFLOPS/W   13.26
    Green500, 500th position (June 2012): Cray XT5 SixCore, Opteron Six Core 6C 2.6 GHz, XT4 Internal Interconnect: 32.05 MFLOPS/W
    Note: arithmetic operations are performed in double precision in the Vector Floating Point (VFP) unit, so the NEON unit cannot be employed (on the Cortex-A8, NEON supports single-precision floating point only).
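The two HPL figures on the slide also determine the average power of the run. The arithmetic below is a simple sketch with hypothetical helper names: dividing MFLOPS by MFLOPS/W gives roughly 19.5 W for the cluster (an inferred value, not one reported on the slide); converting efficiency into energy per operation gives about 75 nJ per FLOP, against about 31 nJ for the last Green500 entry, i.e. roughly 41% of its efficiency.

```python
# Figures reported on the slide (HPL on the five-node BeagleCluster).
BEAGLE_MFLOPS = 258.6
BEAGLE_MFLOPS_PER_W = 13.26
# Green500 list, June 2012, 500th entry (Cray XT5 SixCore), as quoted on the slide.
GREEN500_LAST_MFLOPS_PER_W = 32.05


def implied_power_w(mflops: float, mflops_per_w: float) -> float:
    """Average power implied by a performance figure and an efficiency figure."""
    return mflops / mflops_per_w


def energy_per_flop_nj(mflops_per_w: float) -> float:
    """Energy per floating-point operation in nanojoules (1 MFLOPS/W = 1e6 FLOP/J)."""
    return 1e3 / mflops_per_w


if __name__ == "__main__":
    print(f"implied cluster power: {implied_power_w(BEAGLE_MFLOPS, BEAGLE_MFLOPS_PER_W):.1f} W")
    print(f"BeagleCluster: {energy_per_flop_nj(BEAGLE_MFLOPS_PER_W):.1f} nJ/FLOP")
    print(f"Green500 #500: {energy_per_flop_nj(GREEN500_LAST_MFLOPS_PER_W):.1f} nJ/FLOP")
    print(f"relative efficiency: {BEAGLE_MFLOPS_PER_W / GREEN500_LAST_MFLOPS_PER_W:.0%}")
```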
  • 20. Matrix Multiplication
    • This benchmark shows the performance improvement that can be obtained with NEON-optimized code.
    • The benchmark multiplies an m × n matrix A by an n × p matrix B.
    • The rows of matrix A are divided into groups, and each group is processed on a different node.
    • Communication among nodes is based on MPI.
      Platform                 Execution time
      BeagleCluster            42.13 s
      BeagleCluster w/ NEON    5.18 s
    NEON-optimized code reduces the execution time by a factor of about 8 (42.13 s vs. 5.18 s) ⇒ HPL performance can be improved by properly exploiting NEON.
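The row-block scheme described on the slide can be sketched in a single process. This is an illustration under stated assumptions, not the benchmark's code: it assumes four workers (one per worker node), that every worker receives the whole of B, and that the rows of A are split into contiguous, near-equal groups. On the cluster the slices would be exchanged through MPI rather than list slicing, and the inner loop would be the NEON-optimized kernel. All names are hypothetical.

```python
def row_blocks(m: int, n_workers: int):
    """Split row indices 0..m-1 into n_workers contiguous, near-equal ranges."""
    base, extra = divmod(m, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers get one more row
        ranges.append((start, start + size))
        start += size
    return ranges


def matmul(a, b):
    """Plain triple-loop product of an (r x n) and an (n x p) list-of-rows matrix."""
    n, p = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(p)] for row in a]


def distributed_matmul(a, b, n_workers=4):
    """Row-block product: each simulated worker multiplies its slice of A by all of B.

    Workers run sequentially here; their partial results are concatenated in
    row order, as a head node would gather them.
    """
    c = []
    for lo, hi in row_blocks(len(a), n_workers):
        c.extend(matmul(a[lo:hi], b))  # partial result from one worker
    return c


if __name__ == "__main__":
    a = [[i + j for j in range(3)] for i in range(10)]    # 10 x 3
    b = [[i * j + 1 for j in range(4)] for i in range(3)]  # 3 x 4
    print(row_blocks(len(a), 4))
    print(distributed_matmul(a, b) == matmul(a, b))
```

Contiguous blocks differ by at most one row, so no node waits long for a larger share, and each node only needs its own rows of A plus all of B.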
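The row-partitioning scheme can be sketched serially in Python. In this sketch each "node" gets all of B plus a contiguous block of A's rows. It computes its slice of A·B, and the slices are concatenated in order. In the real benchmark each group runs on a separate node, and MPI distributes the data and collects the partial results. The helper names and the even contiguous split are assumptions for illustration, not the authors' code.

```python
from typing import List

Matrix = List[List[float]]


def split_rows(m: int, n_nodes: int) -> List[range]:
    """Divide row indices 0..m-1 into n_nodes contiguous, nearly equal groups."""
    base, extra = divmod(m, n_nodes)
    groups, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)  # the first `extra` nodes get one more row
        groups.append(range(start, start + size))
        start += size
    return groups


def multiply_rows(a: Matrix, b: Matrix, rows: range) -> Matrix:
    """Compute the rows of A·B that belong to one node's group."""
    n, p = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)] for i in rows]


def distributed_matmul(a: Matrix, b: Matrix, n_nodes: int) -> Matrix:
    """Serial simulation: each 'node' multiplies its row group, results are gathered in order."""
    result: Matrix = []
    for rows in split_rows(len(a), n_nodes):
        result.extend(multiply_rows(a, b, rows))  # stands in for gathering results via MPI
    return result


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2
b = [[7.0, 8.0], [9.0, 10.0]]             # 2 x 2
print(distributed_matmul(a, b, n_nodes=2))
```

Every row of A·B depends only on one row of A and all of B, so the groups need no communication with each other while computing. With the timings on the slide, the NEON version is 42.13 / 5.18 ≈ 8.1 times faster.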
  • 22. Speaker Diarization
    • A speaker diarization algorithm detects "who speaks now".
    • The algorithm addressed here is based on the real-time implementation described in [Colagiacomo et al., 2010].
    • The calculation of the cross-correlations between the channel i signal $x_i(t)$ and the channel j signal $x_j(t)$ is the most computationally demanding part:
      $C_{ij}(t) = \max_{\tau} \left\{ \mathrm{IFFT}\left[ \mathrm{FFT}\left( x_i(t)\, x_j(t-\tau) \right) \bullet \mathrm{FFT}\left( w(t) \right) \right] \right\}$
      Here, $t$ is the time index, $\tau$ is the correlation lag, $w(t)$ is the Hamming window, and $\bullet$ denotes the element-wise product.
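One literal reading of the formula works like this. For each candidate lag τ, form the lagged product $x_i(t)\,x_j(t-\tau)$ and smooth it with the Hamming window through a product of FFTs, which is a circular convolution. Then keep the largest value over all lags at each time index. The NumPy sketch below follows that reading for a single frame. Applying the lag as a circular shift within the frame, and the function name, are assumptions. This is not the authors' implementation.

```python
import numpy as np


def windowed_xcorr_max(xi: np.ndarray, xj: np.ndarray, lags) -> np.ndarray:
    """C_ij(t) = max over tau of IFFT[FFT(x_i(t) x_j(t - tau)) * FFT(w(t))].

    The lag is applied as a circular shift inside the frame (an assumption).
    """
    n = len(xi)
    w_spec = np.fft.rfft(np.hamming(n))          # FFT of the Hamming window, computed once
    c = np.full(n, -np.inf)
    for tau in lags:
        lagged = xi * np.roll(xj, tau)            # np.roll gives xj[(t - tau) mod n]
        smoothed = np.fft.irfft(np.fft.rfft(lagged) * w_spec, n)  # element-wise product of spectra
        c = np.maximum(c, smoothed)               # running maximum over tau
    return c
```

Every lag costs one forward and one inverse FFT, so the number of lags evaluated dominates the cost of this stage.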
  • 23. Speaker Diarization (cont.)
    • Cluster-wide parallelism has been obtained by assigning the feature extraction stage of each channel to one of the worker nodes.
    • The server process on the head node dispatches audio frames to the worker nodes through the MPI_Bcast call and performs the final classification.
    • Performance has been evaluated in terms of the Real-Time Factor (RTF):
      $\mathrm{RTF} = \dfrac{\text{Total execution time}}{\text{Speech segment duration}}$
      An RTF below 1 means that the audio is processed faster than real time.
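The RTF definition takes only a few lines of Python. The 60 s segment and 15 s processing time in the example are hypothetical numbers, used only to show the arithmetic.

```python
def real_time_factor(execution_time_s: float, speech_duration_s: float) -> float:
    """RTF = total execution time / speech segment duration."""
    if speech_duration_s <= 0:
        raise ValueError("speech segment duration must be positive")
    return execution_time_s / speech_duration_s


def is_real_time(rtf: float) -> bool:
    """The system keeps up with the audio when processing takes less time than the audio lasts."""
    return rtf < 1.0


rtf = real_time_factor(15.0, 60.0)  # hypothetical: 60 s of speech processed in 15 s
print(rtf, is_real_time(rtf))
```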
  • 24. Speaker Diarization (cont.)
    • Audio data: the four lapel-microphone signals of meeting ES2009b in the AMI corpus.
    • Comparison with an Asus F9SG laptop (Intel Core 2 Duo T8300 CPU at 2.4 GHz, 2 GB of RAM).
    • The laptop's power consumption is measured with its LCD monitor switched off.
  • 25. Speaker Diarization (cont.)
    Single-board implementation results
    • Real-time execution is achieved by using the NEON instruction set and by reducing the number of cross-correlations: the maximum of $C_{ij}(t)$ is searched by incrementing τ in steps of ∆τ > 1.
      RTF as a function of ∆τ:
        ∆τ     Laptop   BeagleBoard-xM
        1      2.47     12.73
        16     0.25      1.02
        32     0.18      0.63
        64     0.14      0.44
        128    0.12      0.36
    The choice of ∆τ is critical for both the laptop and the BeagleBoard-xM.
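This Python sketch shows why a coarser ∆τ helps. It counts the lags visited when τ steps by ∆τ. It also reads the table above to find the smallest ∆τ that gives real-time operation (RTF below 1) on each platform. The slides do not give the lag range, so the symmetric range of ±512 samples is a hypothetical value chosen only for illustration.

```python
# Single-board RTF values from the table above (laptop vs. BeagleBoard-xM)
RTF_TABLE = {
    1: {"laptop": 2.47, "beagleboard_xm": 12.73},
    16: {"laptop": 0.25, "beagleboard_xm": 1.02},
    32: {"laptop": 0.18, "beagleboard_xm": 0.63},
    64: {"laptop": 0.14, "beagleboard_xm": 0.44},
    128: {"laptop": 0.12, "beagleboard_xm": 0.36},
}

MAX_LAG = 512  # hypothetical lag range; the slides do not state it


def lag_grid(max_lag: int, delta_tau: int) -> range:
    """Lags visited when tau steps by delta_tau over [-max_lag, max_lag]."""
    return range(-max_lag, max_lag + 1, delta_tau)


def smallest_real_time_step(platform: str) -> int:
    """Smallest tabulated delta_tau whose RTF is below 1 on the given platform."""
    return min(d for d, row in RTF_TABLE.items() if row[platform] < 1.0)


for d in sorted(RTF_TABLE):
    print(f"dtau={d:>3}: {len(lag_grid(MAX_LAG, d)):>4} lags evaluated")
print("laptop real time from dtau =", smallest_real_time_step("laptop"))
print("BeagleBoard-xM real time from dtau =", smallest_real_time_step("beagleboard_xm"))
```

According to the table, the laptop first reaches real time at ∆τ = 16. The BeagleBoard-xM (RTF 1.02 at ∆τ = 16) first does so at ∆τ = 32.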
  • 26. Speaker Diarization (cont.)
    Cluster-wide implementation results
      RTF as a function of ∆τ:
        ∆τ     Single-board   Five nodes
        1      12.73          4.71
        16      1.02          1.69
        32      0.63          1.63
        64      0.44          1.56
        128     0.36          1.55
    • The MPI version is almost 3 times as fast as the single-board one when ∆τ = 1.
    • As ∆τ increases, the advantage of the MPI implementation disappears: its RTF levels off at about 1.55, and from ∆τ = 16 onwards it is slower than the single board, because communication overhead becomes the new bottleneck.
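The Python snippet below computes the speedup of the five-node cluster over a single board from the RTF values in the table. A speedup below 1 means the cluster is slower.

```python
# RTF values from the cluster-wide table above
SINGLE_BOARD = {1: 12.73, 16: 1.02, 32: 0.63, 64: 0.44, 128: 0.36}
FIVE_NODES = {1: 4.71, 16: 1.69, 32: 1.63, 64: 1.56, 128: 1.55}


def speedup(baseline_rtf: float, parallel_rtf: float) -> float:
    """Speedup of the parallel run over the baseline; values below 1 mean a slowdown."""
    return baseline_rtf / parallel_rtf


speedups = {d: speedup(SINGLE_BOARD[d], FIVE_NODES[d]) for d in SINGLE_BOARD}
for d, s in speedups.items():
    print(f"dtau={d:>3}: five-node speedup {s:.2f}")
```

The speedup is about 2.7 at ∆τ = 1 and falls below 1 for every larger step. This fits the picture of per-node computation shrinking while the communication cost stays roughly constant.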
  • 27. Speaker Diarization (cont.)
    Cluster-wide implementation
    • The communication bottleneck has been verified on a four-node cluster, in which:
      • nodes read audio data directly from the local file system;
      • one of the worker nodes performs both the feature extraction and the classification tasks.
      RTF as a function of ∆τ:
        ∆τ     Five nodes   Four nodes (w/ local data)
        1      4.71         3.35
        16     1.69         0.33
        32     1.63         0.23
        64     1.56         0.18
        128    1.55         0.16
    By reducing the communication overhead, real-time execution can be achieved with ∆τ = 16.
  • 28. Analysis of power consumption
      Power consumption:
        BeagleCluster   20.32 W
        Laptop          32.36 W
      Energy ratio:
        $E_r = \dfrac{\mathrm{RTF}_{\text{cluster}} \cdot P_{\text{cluster}}}{\mathrm{RTF}_{\text{laptop}} \cdot P_{\text{laptop}}} \cong 1.2$
    The communication overhead limits the energy efficiency of the BeagleCluster.
      Energy ratio of the four-node cluster: $E_r \cong 0.69$
    By reducing the communication overhead, the BeagleCluster becomes more energy efficient than the laptop PC.
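RTF multiplied by power is the energy spent per second of processed speech. E_r is therefore the cluster-to-laptop ratio of energy per unit of audio. In the Python sketch below, the five-node and laptop RTF values for ∆τ = 1 come from the tables above. They give E_r ≈ 1.2, which suggests they are the values behind the slide, although the pairing is our inference. The four-node value of 0.69 cannot be recomputed here because the slides do not report the four-node cluster's power draw.

```python
def energy_ratio(rtf_cluster: float, p_cluster_w: float,
                 rtf_laptop: float, p_laptop_w: float) -> float:
    """E_r = (RTF_cluster * P_cluster) / (RTF_laptop * P_laptop).

    RTF * P is energy per second of speech, so E_r > 1 means the cluster
    spends more energy than the laptop to process the same audio.
    """
    return (rtf_cluster * p_cluster_w) / (rtf_laptop * p_laptop_w)


er_five = energy_ratio(4.71, 20.32, 2.47, 32.36)  # five nodes vs. laptop, dtau = 1
print(f"E_r (five nodes, dtau = 1): {er_five:.2f}")
```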
  • 29. Conclusions
    • A cluster computer based on the BeagleBoard-xM platform has been described.
    • The cluster relies on open software for executing parallel tasks and for managing and monitoring the status of the nodes.
    • High-Performance Linpack has been used to measure the number of floating-point operations per second.
    • A matrix multiplication benchmark has shown the performance improvement that NEON-optimized code can achieve.
    • Processing time and power consumption have been measured with a cluster-wide speaker diarization algorithm to evaluate the real-time capabilities and the energy efficiency of the cluster.
  • 30. Conclusions (cont.)
    • Results showed that, when using the 100 Mbit Ethernet interface, the BeagleCluster consumes 1.2 times the energy spent by the laptop PC.
    • Once the communication bottleneck is removed, the BeagleCluster achieves better energy efficiency than the laptop PC.
    • The five-node cluster costs 655 €. Compared to the laptop PC, which costs 1100 €, the BeagleCluster is 445 € cheaper.
  • 31. Future developments
    • The software platform will be extended with a resource manager and a scheduler to enable the execution of batch jobs.
    • Energy efficiency will be assessed in a high-availability scenario, for example by using the cluster to host websites.
    • The use of more efficient hardware platforms (e.g., PandaBoards) and of the DSP core of the DM3730 will be considered.
  • 32. Thank you for your attention!
    Emanuele Principi (e.principi@univpm.it), Vito Colagiacomo (s1037562@studenti.univpm.it), Stefano Squartini (s.squartini@univpm.it), Francesco Piazza (f.piazza@univpm.it)
  • 33. AMPROBE LH41A specifications
      Manufacturer       AMPROBE
      Model              LH41A
      Measuring range    0–40 A, DC or AC peak
      Resolution         1 mA (4 A range), 10 mA (40 A range)
      Accuracy           ±1.3% + 5 digits
      Frequency range    DC (DC mode); 40 Hz to 400 Hz (AC mode)
  • 34. High-Performance Linpack: details
      Rmax            258.6 MFLOPS
      Problem size    15000
      Block size      16
      Grid ratio      2 × 2
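The HPL parameters above are enough to estimate the size of the run. The Python sketch below makes several assumptions. It takes the operation count to be 2/3 N³ + 3/2 N², which is the count the HPL reference code uses to convert wall time into a flop rate, as far as we know. It assumes the 2 × 2 grid means four MPI processes that share the matrix equally. It also ignores HPL's workspace and padding.

```python
N = 15000             # HPL problem size (from the slide)
P, Q = 2, 2           # process grid (from the slide)
RMAX_FLOPS = 258.6e6  # sustained rate (from the slide)


def hpl_flop_count(n: int) -> float:
    """Assumed HPL operation count for an n x n system: 2/3 n^3 + 3/2 n^2."""
    return 2.0 / 3.0 * n**3 + 1.5 * n**2


matrix_bytes = 8 * N * N                  # one dense N x N matrix of 8-byte doubles
bytes_per_process = matrix_bytes / (P * Q)
runtime_s = hpl_flop_count(N) / RMAX_FLOPS

print(f"matrix: {matrix_bytes / 1e9:.2f} GB, per process: {bytes_per_process / 1e6:.0f} MB")
print(f"estimated run time: {runtime_s / 3600:.2f} h")
```

The result is a 1.8 GB matrix, or about 450 MB per process. That is just under the 512 MB of RAM on a BeagleBoard-xM, which would explain why N = 15000 was a practical upper limit (our inference). At Rmax the run takes roughly 2.4 hours.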
  • 35. References
    H. W. Meuer, “The TOP500 Project: Looking Back Over 15 Years of Supercomputing Experience,” Informatik-Spektrum, vol. 31, no. 3, pp. 203–222, 2008. [Online]. Available: http://www.top500.org
    C. L. Belady, “In the Data Center, Power and Cooling Cost More Than the IT Equipment It Supports,” Electronics Cooling Magazine, vol. 13, no. 1, May 2007.
    W.-c. Feng and K. Cameron, “The Green500 List: Encouraging Sustainable Supercomputing,” IEEE Computer, vol. 40, no. 12, pp. 50–55, Dec. 2007. [Online]. Available: http://www.green500.org
    I. Ahmad and S. Ranka, Eds., Handbook of Energy-Aware and Green Computing, 1st ed., ser. Information Science. Boca Raton, US: CRC Press, Jan. 2012.
    S. Andrade, J. Dourado, and C. Maciel, “Low-power cluster using OMAP3530,” in Proc. of EDERC, Nice, France, Dec. 2010, pp. 220–224.
    K. Fürlinger, C. Klausecker, and D. Kranzlmüller, “Towards energy efficient parallel computing on consumer electronic devices,” in Proc. of ICT-GLOW. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 1–9.
    M. Brim, R. Flanery, A. Geist, B. Luethke, and S. L. Scott, “Cluster Command and Control (C3) Tool Suite,” Parallel and Distributed Computing Practices, vol. 4, no. 4, Dec. 2001.
  • 36. References (cont.)
    Argonne National Laboratory, “MPICH2.” [Online]. Available: http://www.mcs.anl.gov/research/projects/mpich2/
    M. L. Massie, B. N. Chun, and D. E. Culler, “The Ganglia distributed monitoring system: design, implementation, and experience,” Parallel Computing, vol. 30, no. 7, pp. 817–840, 2004.
    M. Moattar and M. Homayounpour, “A review on speaker diarization systems and approaches,” Speech Communication, vol. 54, no. 10, pp. 1065–1103, 2012.
    E. Principi, R. Rotili, M. Wöllmer, F. Eyben, S. Squartini, and B. Schuller, “Real-Time Activity Detection in a Multi-Talker Reverberated Environment,” Cognitive Computation, pp. 1–12, 2012.
    V. Colagiacomo, E. Principi, S. Cifani, and S. Squartini, “Real-Time Speaker Diarization on TI OMAP3530,” in Proc. of EDERC, Nice, France, Dec. 1–2, 2010.
    InfiniBand Trade Association, “InfiniBand Architecture Specification Release 1.2.1,” Jan. 2008.
    N. J. Boden, D. Cohen, R. E. Felderman, A. Kulawik, C. Seitz, J. N. Seizovic, and W. Su, “Myrinet: A Gigabit-per-second Local Area Network,” IEEE Micro, vol. 15, no. 1, pp. 29–36, Feb. 1995.