SlideShare ist ein Scribd-Unternehmen logo
1 von 31
Downloaden Sie, um offline zu lesen
Performance Optimization of Smoothed Particle
Hydrodynamics for Multi/Many-Core Architectures
Dr. Fabio Baruffa fabio.baruffa@lrz.de
Leibniz Supercomputing Centre
MC² Series: Colfax Research Webinar, http://mc2series.com
March 7th
, 2017
2
Work contributors
● Member of the IPCC @ LRZ
● Expert in performance
optimization and HPC systems
● Member of the IPCC @ LRZ
● Expert in computational
astrophysics and simulations
Dr. Fabio Baruffa
Sr. HPC Application Specialist
Leibniz Supercomputing Centre
Dr. Luigi Iapichino
Scientific Computing Expert
Leibniz Supercomputing Centre
Email contacts: fabio.baruffa@lrz.de luigi.iapichino@lrz.de
3
Intel®
Parallel Computing Centers (IPCC)
● The IPCCs are an Intel initiative for code
modernization of technical computing codes.
● The work primary focus on code optimization
increasing parallelism and scalability on
multi/many core architectures.
● Currently ~70 IPCCs are funded worldwide.
● Our target is to prepare the simulation software
for new platforms achieving high
nodel-level performance and multi-node
scalability.
4
Outline of the talk
Preprint of this work: https://arxiv.org/abs/1612.06090
● Overview of the code: P-Gadget3 and SPH.
● Challenges in code modernization approach.
● Multi-threading parallelism and scalability.
● Enabling vectorization through:
Data layout optimization (AoS → SoA).
Reducing conditional branching.
● Performance results and outlook.
5
Gadget intro
● Leading application for simulating the
formation of the cosmological large-scale structure
(galaxies and clusters) and of processes at
sub-resolution scale (e.g. star formation, metal
enrichment).
● Publicly available, cosmological
TreePM N-body + SPH code.
● Good scaling performance up to
O(100k) Xeon cores
(SuperMUC @ LRZ).
Introduction
6
Smoothed particle hydrodynamics (SPH)
Introduction
● SPH is a Lagrangian particle method for solving the equations of fluid
dynamics, widely used in astrophysics.
● It is a mesh-free method, based on a particle discretization of the
medium.
● The local estimation of gas density (and all other derivation of the
governing equations) is based on a kernel-weighted summation over
neighbor particles:
7
Gadget features
Introduction
The code can be run at different levels of
complexity:
● N-Body-only (a.k.a. dark matter) simulations.
● N-Body + gas component (SPH).
● Additional physics (sub-resolution) modules:
radiative cooling, star formation,…
● More physics → more memory required
per particles (up to ~ 300B / particle).
8
Features of the code
Gadget features
● Gadget has been first developed in the late 90s as serial code, has later
evolved as an MPI and a hybrid code.
● After the last public release Gadget-2, many research groups all over the
world have developed their own branches.
● The branch used for this project (P-Gadget3) has been used for more than
30 research papers over the last two years.
● The code have ~200 files, ~400k code lines, extensive use of #IFDEF, ext.
libs (fftw,hdf5).
9
Basic principles of our development
Basic principle of our development
● Our intention is to ensure:
● Portability on all modern architectures (Intel®
Xeon/MIC, Power, GPU,…);
● Readability for non-experts in HPC;
● Consistency with all the existing functionalities.
● We perform code modifications which are minimally invasive.
● The domain scientists have to be able to modify the code without coping
with performance questions.
10
Code modernization approach
Code modernization
● Scalar optimization: compiler flags, data casting, precision consistency.
● Vectorization: prepare the code for SIMD, avoid vector dependencies.
● Memory access: improve data layout, cache access.
● Multi-threading: enable OpenMP, manage scheduling and pinning.
● Communication: enable MPI, offloading computation.
https://software.intel.com/en-us/articles/what-
is-code-modernization; colfaxresearch.com
11
Code modernization approach
Code modernization
● Scalar optimization: compiler flags, data casting, precision consistency.
● Vectorization: prepare the code for SIMD, avoid vector dependencies.
● Memory access: improve data layout, cache access.
● Multi-threading: enable OpenMP, manage scheduling and pinning.
● Communication: enable MPI, offloading computation.
https://software.intel.com/en-us/articles/what-
is-code-modernization; colfaxresearch.com
Preparation for the next generation processors and efficient usage of the current
hardware
12
Target architectures for our project
Intel®
architectures
● E5-2650v2 Ivy-Bridge (IVB) @ 2.6 GHz,
8-cores / socket.
TDP: 95W, RCP: $1116.
● AVX.
Intel®
Xeon processor Intel®
Xeon Phi™ coprocessor
1st
generation
● Knights Corner (KNC) coprocessor 5110P
@ 1.1GHz, 60 cores.
TDP: 225W, RCP: N/D.
● Native / offload computing.
● Directly login via ssh.
● SIMD 512 bits.
13
Further tested architectures
Intel®
architectures
● E5-2697v3 Haswell (HSW) @ 2.3 GHz,
14-cores / socket.
TDP: 145W, RCP: $2702.
● AVX2, FMA.
● E5-2699v4 Broadwell (BDW) @ 2.2 GHz,
22-cores / socket.
TDP: 145W, RCP: $4115.
● AVX2, FMA.
Intel®
Xeon processors
● Knights Landing (KNL) Processor 7250
@ 1.4 GHz, 68 cores.
TDP: 215W, RCP: $4876.
● Available as bootable processor.
● Binary-compatible with x86.
● High bandwidth memory.
● New AVX512 instructions set.
Intel®
Xeon Phi™ processor
2nd
generation
14
Optimization strategy
Optimization strategy
●
We isolate the representative code kernel subfind_density and run it in as
a stand-alone application, avoiding the overhead from the whole simulation.
●
As most code components, it consists of two sub-phases of nearly equal
execution time (40 to 45% for each of them), namely the neighbour-finding
phase and the remaining physics computations.
●
Our physics workload: ~ 500k particles. This is a typical workload per node of
simulations with moderate resolution.
●
We focus mainly on node-level performance.
●
We use tools from the Intel®
Parallel Studio XE (VTune Amplifier and Advisor).
Simulation details:
www.magneticum.org
15
Isolation of a kernel code
Data serialization
● Serialization: the process of converting data structures or objects into
a format that can be stored and easily retrieved.
● This allows to isolate the computational kernel using realistic input
workload (~ 551MB).
● Dumping data for compression.
Object Byte stream Byte streamByte stream ObjectDB
file
mem
16
Initial profiling
Multi-threading parallelism
thread spinning
● Severe shared-memory
parallelization overhead
● At later iterations, the
particle list is locked and
unlocked constantly due
to the recomputation
● Spinning time 41%
17
Algorithm pseudocode
Subfind algorithm
more_particles = partlist.length;
while(more_particles){
  int i=0;                 
  while(!error && i<partlist.length){
  #pragma omp parallel
  {
    #pragma omp critical
    {
   p = partlist[i++];  
    }
    if(!must_compute(p)) continue;
    ngblist = find_neighbours(p);
    sort(ngblist);
    for(auto n:select(ngblist,K)) 
       compute_interaction(p,n);
  }
  more_particles = mark_for_recomputation(partlist);
}
while loop over the full particle list
each thread gets the next particle
(private p) to process
check for computation
actual computation
18
Removing lock contention
Subfind algorithm
todo_partlist = partlist;
while(partlist.length){
  error=0;
  #pragma omp parallel for schedule(dynamic)
  for(auto p:todo_partlist){
    if(something_is_wrog) error=1;
    ngblist = find_neighbours(p);
    sort(ngblist);
    for(auto n:select(ngblist,K)) 
       compute_interaction(p,n);
  }
//...check for any error
  todo_particles = mark_for_recomputation(partlist);
}
creating a todo particle list
iterations over the todo list
(private ngblist)
actual computation
No-checks for computation
19
Improved performance
Multi-threading parallelism
no spinning
● Lockless scheme
● Time spent in spinning
only 3%
20
Improved speed-up
Multi-threading parallelism
● On IVB
● speed-up: 1.8x
● parallel efficiency: 92%
● On KNC
● speed-up: 5.2x
● parallel efficiency: 57%
21
Obstacles to efficient auto-vectorization
for(n = 0, n < neighboring_particles, n++ ){
    j = ngblist[n];   
           
    if (particle n within smoothing_length){   
                        
       inlined_function1(…, &w);
       inlined_function2(…, &w);
       rho   += P_AoS[j].mass*w;
       vel_x += P_AoS[j].vel_x;
       …
       v2 += vel_x*vel_x + … vel_z*vel_z;      
   }
Target loop
for loop over neighbors
check for computation
computing physics
Particles properties via
AoS
22
struct ParticleAoS
{
  float pos[3];
  float vel[3];
  float mass;
}
struct ParticleSoA
{
  float *pos_x, *pos_y, *pos_z;
  float *vel_x, *vel_y, *vel_z;
  float mass;
}
Data layout
pos[0]
...
pos[1]
pos[2]
vel[0]
...
pos[0]
pos[1]
pos[2]
...
mass
xi+1
xi+2
xi+3
xi+4
xi+5
xi+6
xi
xi+7
...
pos_x
...
pos_x
pos_x
pos_x
pos_x
pos_x
pos_x
pos_x
pos_x
...
xi+1
xi+2
xi+3
xi+4
xi+5
xi+6
xi
xi+7
...particles[i]particles[i+1]
p.pos_x[i]
p.pos_x[i+1]
p.pos_x[i+2]
p.pos_x[i+3]
p.pos_x[i+4]
p.pos_x[i+5]
p.pos_x[i+6]
p.pos_x[i+7]
p.pos_x[i+8]
Memory Memory
Vector
Register
Vector
Register
AoS SoA
Data layout: AoS vs SoA
Automatically vectorized loops can
contain loads from not contiguous
memory locations → non-unit stride
● The compiler has issued hardware
gather/scatter instructions.
 
23
Proposed solution: SoA
●
New particle data structure: defined as Structure of Arrays (SoA).
●
From the original set, only variables used in the kernel are included in the
SoA: ~ 60 bytes per particle.
●
Software gather / scatter routines.
●
Minimally invasive code changes:
●
SoA in the kernel.
●
AoS exposed to other parts of the code.
Data layout
24
Implementation details
Data layout
struct ParticleSoA
{
  float *pos_x, … , *vel_x, …, mass;
}
Particle_SoA P_SoA;
P_SoA.pos_x = malloc(N*sizeof(float));
…
       
…
rho   += P_AoS[j].mass*w;
vel_x += P_AoS[j].vel_x;
…
       
…
rho   += P_SoA.mass[j]*w;
vel_x += P_SoA.vel_x[j];
…
       
struct ParticleAoS
{
  float pos[3], vel[3], mass;
}
Particle_AoS *P_AoS;
P_AoS = malloc(N*sizeof(Particle_AoS);
    
void gather_Pdata(struct Particle_SoA *dst, struct Particle_AoS *src, int N )
for(int i = 0, i < N, i++ ){
    dst ­> pos_x[i] = src[i].pos[1]; dst ­> pos_y[i] = src[i].pos[2]; … 
}   
25
AoS to SoA: performance outcomes
●
Gather+scatter overhead at
most 1.8% of execution time.
→ intensive data-reuse
●
Performance improvement:
●
on IVB: 13%, on KNC: 48%
●
Xeon/Xeon Phi performance
ratio: from 0.15 to 0.45.
●
The data structure is now
vectorization-ready.
Data layout
1/exec.time
higher is better
26
Optimizing for vectorization
●
Modern multi/many-core architectures rely on vectorization as an additional
layer of parallelism to deliver performance.
●
Mind the constraint: keep Gadget readable and portable for the wide user
community! Wherever possible, avoid programming in intrinsics.
●
Analysis with Intel®
Advisor 2016:
• Most of the vectorization potential (10 to 20% of the workload) in the
kernel “compute” loop.
• Prototype loop in Gadget: iteration over the neighbors of a given particle.
●
Similarity with many other N-body codes.
Vectorization
27
Vectorization: improvements from IVB to KNL
●
Vectorization through localized
masking (if-statement moved
inside the inlined functions).
●
Vector efficiency:
perf. gain / vector length
on IVB: 55%
on KNC: 42%
on KNL: 83%
Vectorization
- Yellow + red bar: kernel workload
- Red bar: target loop for vectorization
28
Node-level performance comparison between HSW,
KNC and KNL
Features of the KNL tests:
●
KMP Affinity: scatter;
Memory mode: Flat;
MCDRAM via numactl;
Cluster mode: Quadrant.
Results:
●
Our optimization improves the
speed-up on all systems.
●
Better threading scalability up
to 136 threads on KNL.
●
Hyperthreading performance is
different between KNC and KNL.
Performance results on Knights Landing
29
Performance comparison: first results including KNL
and Broadwell
●
Initial vs. optimized including all
optimizations for subfind_density
●
IVB, HSW, BDW: 1 socket w/o
hyperthreading.
KNC: 1 MIC, 240 threads.
KNL: 1 node, 136 threads.
●
Performance gain:
●
Xeon Phi: 13.7x KNC, 20.1x KNL.
●
Xeon: 2.6x IVB, 4.8x HSW,
4.7x BDW.
Performance results
lower is better
30
Summary and outlook
●
Code modernization as the iterative process for improving the performance of an
HPC application.
●
Our IPCC example: P-Gadget3.
Threading parallelism
Data layout Key points of our work, guided by analysis tools.
Vectorization
●
This effort is (mostly) portable! Good performance found on new architectures (KNL
and BDW) basically out-of-the-box.
●
For KNL, architecture-specific features (MCDRAM, large vector registers and NUMA
characteristics) are currently under investigation for different workloads.
●
Investment on the future of well-established community applications, and crucial for
the effective use of forthcoming HPC facilities.
https://arxiv.org/abs/1612.06090
31
Acknowledgements
●
Research supported by the Intel®
Parallel Computing Center program.
●
Project coauthors: Nicolay J. Hammer (LRZ), Vasileios Karakasis (CSCS).
●
P-Gadget3 developers: Klaus Dolag, Margarita Petkova, Antonio Ragagnin.
●
Research collaborator at Technical University of Munich (TUM): Nikola Tchipev.
●
TCEs at Intel: Georg Zitzlsberger, Heinrich Bockhorst.
●
Thanks to the IXPUG community for useful discussion.
●
Special thanks to Colfax Research for proposing this contribution to the MC² Series,
and for granting access to their computing facilities.

Weitere ähnliche Inhalte

Was ist angesagt?

Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Pradeep Singh
 

Was ist angesagt? (20)

QuadIron An open source library for number theoretic transform-based erasure ...
QuadIron An open source library for number theoretic transform-based erasure ...QuadIron An open source library for number theoretic transform-based erasure ...
QuadIron An open source library for number theoretic transform-based erasure ...
 
On the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsOn the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC Applications
 
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
 
Developing an embedded video application on dual Linux + FPGA architecture
Developing an embedded video application on dual Linux + FPGA architectureDeveloping an embedded video application on dual Linux + FPGA architecture
Developing an embedded video application on dual Linux + FPGA architecture
 
RISC-V assembly
RISC-V assemblyRISC-V assembly
RISC-V assembly
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
AI is Impacting HPC Everywhere
AI is Impacting HPC EverywhereAI is Impacting HPC Everywhere
AI is Impacting HPC Everywhere
 
Improve Vectorization Efficiency
Improve Vectorization EfficiencyImprove Vectorization Efficiency
Improve Vectorization Efficiency
 
Cuda 2011
Cuda 2011Cuda 2011
Cuda 2011
 
An open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V coresAn open flow for dn ns on ultra low-power RISC-V cores
An open flow for dn ns on ultra low-power RISC-V cores
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
 
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
Development of Signal Processing Algorithms using OpenCL for FPGA based Archi...
 
Implementation of Soft-core processor on FPGA (Final Presentation)
Implementation of Soft-core processor on FPGA (Final Presentation)Implementation of Soft-core processor on FPGA (Final Presentation)
Implementation of Soft-core processor on FPGA (Final Presentation)
 
SFO15-BFO2: Reducing the arm linux kernel size without losing your mind
SFO15-BFO2: Reducing the arm linux kernel size without losing your mindSFO15-BFO2: Reducing the arm linux kernel size without losing your mind
SFO15-BFO2: Reducing the arm linux kernel size without losing your mind
 
Fpga implementation of encryption and decryption algorithm based on aes
Fpga implementation of encryption and decryption algorithm based on aesFpga implementation of encryption and decryption algorithm based on aes
Fpga implementation of encryption and decryption algorithm based on aes
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
 
Reverse Engineering of Rocket Chip
Reverse Engineering of Rocket ChipReverse Engineering of Rocket Chip
Reverse Engineering of Rocket Chip
 
Memory, IPC and L4Re
Memory, IPC and L4ReMemory, IPC and L4Re
Memory, IPC and L4Re
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
 

Andere mochten auch

ВIкторина "Фізика та кіно"
ВIкторина "Фізика та кіно"ВIкторина "Фізика та кіно"
ВIкторина "Фізика та кіно"
sveta7940
 
Проект "Розвиток космонавтики"
Проект "Розвиток космонавтики"Проект "Розвиток космонавтики"
Проект "Розвиток космонавтики"
sveta7940
 
Folkets Bryggeri sin uoffisielle sosiale medier strategi
Folkets Bryggeri sin uoffisielle sosiale medier strategi Folkets Bryggeri sin uoffisielle sosiale medier strategi
Folkets Bryggeri sin uoffisielle sosiale medier strategi
Frode L
 

Andere mochten auch (20)

SoC HPC: Design, Optimization, and Application to Algorithmic Trading
SoC HPC: Design, Optimization, and Application to Algorithmic TradingSoC HPC: Design, Optimization, and Application to Algorithmic Trading
SoC HPC: Design, Optimization, and Application to Algorithmic Trading
 
Sonny-Krikorian
Sonny-KrikorianSonny-Krikorian
Sonny-Krikorian
 
Cyberbullying
CyberbullyingCyberbullying
Cyberbullying
 
research and planning presentation
research and planning presentationresearch and planning presentation
research and planning presentation
 
ВIкторина "Фізика та кіно"
ВIкторина "Фізика та кіно"ВIкторина "Фізика та кіно"
ВIкторина "Фізика та кіно"
 
Проект "Розвиток космонавтики"
Проект "Розвиток космонавтики"Проект "Розвиток космонавтики"
Проект "Розвиток космонавтики"
 
Day 3 C2C - Smarter Africa Ghana Case Study
Day 3 C2C - Smarter Africa Ghana Case StudyDay 3 C2C - Smarter Africa Ghana Case Study
Day 3 C2C - Smarter Africa Ghana Case Study
 
[EN] ECM Enterprise Content Management | Dr. Ulrich Kampffmeyer | AIIM Confer...
[EN] ECM Enterprise Content Management | Dr. Ulrich Kampffmeyer | AIIM Confer...[EN] ECM Enterprise Content Management | Dr. Ulrich Kampffmeyer | AIIM Confer...
[EN] ECM Enterprise Content Management | Dr. Ulrich Kampffmeyer | AIIM Confer...
 
презентация кулигин
презентация кулигинпрезентация кулигин
презентация кулигин
 
Primavera software
Primavera softwarePrimavera software
Primavera software
 
El aparato reproductor
El aparato reproductorEl aparato reproductor
El aparato reproductor
 
Napoleon I Bonaparta
Napoleon I BonapartaNapoleon I Bonaparta
Napoleon I Bonaparta
 
Tamna strana fotosinteze
Tamna strana fotosintezeTamna strana fotosinteze
Tamna strana fotosinteze
 
The Problem With MarTech
The Problem With MarTechThe Problem With MarTech
The Problem With MarTech
 
Prince2 foundation and practitioner
Prince2 foundation and practitionerPrince2 foundation and practitioner
Prince2 foundation and practitioner
 
Folkets Bryggeri sin uoffisielle sosiale medier strategi
Folkets Bryggeri sin uoffisielle sosiale medier strategi Folkets Bryggeri sin uoffisielle sosiale medier strategi
Folkets Bryggeri sin uoffisielle sosiale medier strategi
 
Reinforced soil
Reinforced soilReinforced soil
Reinforced soil
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
LA MISTA - CUENTOS GROTESCOS
LA MISTA - CUENTOS GROTESCOSLA MISTA - CUENTOS GROTESCOS
LA MISTA - CUENTOS GROTESCOS
 

Ähnlich wie Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures

Design and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGADesign and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGA
IJERA Editor
 
Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...
Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...
Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...
Gaurav Raina
 

Ähnlich wie Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures (20)

Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
HOW Series: Knights Landing
HOW Series: Knights LandingHOW Series: Knights Landing
HOW Series: Knights Landing
 
Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)
Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)
Design of 32 Bit Processor Using 8051 and Leon3 (Progress Report)
 
Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...Automatically partitioning packet processing applications for pipelined archi...
Automatically partitioning packet processing applications for pipelined archi...
 
Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
Design and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGADesign and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGA
 
Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...
Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...
Presentation Thesis - Convolutional net on the Xeon Phi using SIMD - Gaurav R...
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
Introduction to FPGA acceleration
Introduction to FPGA accelerationIntroduction to FPGA acceleration
Introduction to FPGA acceleration
 
Multicore
MulticoreMulticore
Multicore
 
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
 
Fletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGAFletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGA
 
Feedback on Big Compute & HPC on Windows Azure
Feedback on Big Compute & HPC on Windows AzureFeedback on Big Compute & HPC on Windows Azure
Feedback on Big Compute & HPC on Windows Azure
 
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
Kubernetes @ Squarespace (SRE Portland Meetup October 2017)
 
Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures