Sasha Goldshtein
CTO
Sela Group
@goldshtn
blog.sashag.net
Task and Data Parallelism:
Real-World Examples
www.devconnections.com
GARBAGE COLLECTION PERFORMANCE TIPS
AGENDA
Multicore machines have been a cheap commodity for >10 years
Adoption of concurrent programming is still slow
Patterns and best practices are scarce
We discuss the APIs first…
…and then turn to examples, best practices, and tips
TPL EVOLUTION
2008
•Incubated for 3 years as “Parallel Extensions for .NET”
2010
•Released in full glory with .NET 4.0
2012
•Augmented with language support (await, async methods)
•DataFlow in .NET 4.5 (NuGet)
The Future
TASKS
A task is a unit of work
May be executed in parallel with other tasks by a scheduler (e.g. Thread Pool)
Much more than threads, and yet much cheaper
//The .NET 4.0 version
Task<string> t = Task.Factory.StartNew(
    () => { return DnaSimulation(…); });
t.ContinueWith(r => Show(r.Exception),
    TaskContinuationOptions.OnlyOnFaulted);
t.ContinueWith(r => Show(r.Result),
    TaskContinuationOptions.OnlyOnRanToCompletion);
DisplayProgress();

try { //The C# 5.0 version
    var task = Task.Run(DnaSimulation);
    DisplayProgress();
    Show(await task);
}
catch (Exception ex) {
    Show(ex);
}
PARALLEL LOOPS
Ideal for parallelizing work over a collection of data
Easy porting of for and foreach loops
Beware of inter-iteration dependencies!
Parallel.For(0, 100, i => {
...
});
Parallel.ForEach(urls, url => {
webClient.Post(url, options, data);
});
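The warning about inter-iteration dependencies can be illustrated with a running total, where every iteration wants to update the same variable. The following is a minimal sketch, not taken from the slides: the `data` array and the variable names are hypothetical. It uses the `Parallel.For` overload with task-local state so that the shared total is touched only once per task.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical input: the numbers 1..100000.
int[] data = Enumerable.Range(1, 100_000).ToArray();

// Every iteration contributes to the same total, so the iterations are not
// independent. A plain "total += data[i]" in the loop body would be a data race.
long total = 0;
Parallel.For(0, data.Length,
    () => 0L,                                          // per-task subtotal
    (i, loopState, subtotal) => subtotal + data[i],    // no shared state touched here
    subtotal => Interlocked.Add(ref total, subtotal)); // merge once per task

Console.WriteLine(total);
```

The body only reads `data` and its own subtotal, so the single shared write happens in the final merge step.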
PARALLEL LINQ
Mind-bogglingly easy parallelization of LINQ queries
Can introduce ordering into the pipeline, or preserve order of original elements
var query = from monster in monsters.AsParallel()
where monster.IsAttacking
let newMonster = SimulateMovement(monster)
orderby newMonster.XP
select newMonster;
query.ForAll(monster => Move(monster));
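The two ordering options can be sketched as follows. This is an illustrative example with a hypothetical integer `source`, not the monster simulation from the slide. Note that `ForAll` runs its delegate in parallel and makes no ordering promise, so an ordering is only observable when the results are enumerated, for example with `ToArray` or `foreach`.

```csharp
using System;
using System.Linq;

int[] source = Enumerable.Range(1, 20).ToArray();

// AsOrdered() asks PLINQ to preserve the order of the original elements.
int[] squaresInSourceOrder = source.AsParallel()
    .AsOrdered()
    .Where(n => n % 2 == 0)
    .Select(n => n * n)
    .ToArray();

// An orderby clause introduces a new ordering into the pipeline.
int[] descending = (from n in source.AsParallel()
                    orderby n descending
                    select n).ToArray();

Console.WriteLine(string.Join(", ", squaresInSourceOrder));
Console.WriteLine(string.Join(", ", descending));
```

Preserving order is not free, so it is probably best requested only when the consumer depends on it.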
MEASURING CONCURRENCY
Visual Studio Concurrency Visualizer to the rescue
RECURSIVE PARALLELISM EXTRACTION
Divide-and-conquer algorithms are often parallelized through the recursive call
Be careful with parallelization threshold and watch out for dependencies
void FFT(float[] src, float[] dst, int n, int r, int s) {
    if (n == 1) {
        dst[r] = src[r];
    } else {
        FFT(src, dst, n/2, r, s*2);
        FFT(src, dst, n/2, r+s, s*2);
        //Combine the two halves in O(n) time
    }
}
//Parallel version of the two recursive calls
Parallel.Invoke(
    () => FFT(src, dst, n/2, r, s*2),
    () => FFT(src, dst, n/2, r+s, s*2)
);
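The parallelization threshold can be sketched on a simpler divide-and-conquer problem than the FFT. The example below is an assumption-laden illustration: `ParallelSum`, its `threshold` parameter, and the value 4096 are all hypothetical and not from the slides. Below the threshold the recursion stays sequential, so tiny subproblems do not pay the overhead of `Parallel.Invoke`.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Sums a[lo..hi) recursively; ranges at or below the threshold run sequentially.
long ParallelSum(int[] a, int lo, int hi, int threshold)
{
    if (hi - lo <= threshold)
    {
        long sum = 0;
        for (int i = lo; i < hi; ++i) sum += a[i];
        return sum;
    }
    int mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;
    // The two halves touch disjoint ranges, so there is no dependency between them.
    Parallel.Invoke(
        () => left = ParallelSum(a, lo, mid, threshold),
        () => right = ParallelSum(a, mid, hi, threshold));
    return left + right;
}

int[] numbers = Enumerable.Range(1, 1_000_000).ToArray();
long total = ParallelSum(numbers, 0, numbers.Length, threshold: 4096);
Console.WriteLine(total);
```

A suitable threshold depends on the cost of the work per element and is best found by measuring.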
SYMMETRIC DATA PROCESSING
For a large set of uniform data items that need to be processed, parallel loops are usually the best choice and lead to ideal work distribution
Inter-iteration dependencies complicate things (think in-place blur)
Parallel.For(0, image.Rows, i => {
for (int j = 0; j < image.Cols; ++j) {
destImage.SetPixel(i, j, PixelBlur(image, i, j));
}
});
UNEVEN WORK DISTRIBUTION
With non-uniform data items, use custom partitioning or manual distribution
Primes: 7 is easier to check than 10,320,647
var work = Enumerable.Range(0, Environment.ProcessorCount)
.Select(n => Task.Run(() =>
CountPrimes(start+chunk*n, start+chunk*(n+1))));
Task.WaitAll(work.ToArray());
VS
Parallel.ForEach(Partitioner.Create(Start, End, chunkSize),
chunk => CountPrimes(chunk.Item1, chunk.Item2)
);
COMPLEX DEPENDENCY MANAGEMENT
Must extract all dependencies and incorporate them into the algorithm
Typical scenarios: 1D loops, dynamic programming algorithms
Edit distance: each task depends on 2 predecessors, wavefront computation
C = x[i-1] == y[i-1] ? 0 : 1;
D[i, j] = min(
D[i-1, j] + 1,
D[i, j-1] + 1,
D[i-1, j-1] + C);
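One way to turn the recurrence above into a wavefront computation is to process the matrix one anti-diagonal at a time. The sketch below is a simplified illustration with a hypothetical `EditDistance` function: it parallelizes individual cells, whereas a practical implementation would more likely assign whole blocks of cells to each task to amortize scheduling overhead.

```csharp
using System;
using System.Threading.Tasks;

int EditDistance(string x, string y)
{
    int m = x.Length, n = y.Length;
    var D = new int[m + 1, n + 1];
    for (int i = 0; i <= m; ++i) D[i, 0] = i;
    for (int j = 0; j <= n; ++j) D[0, j] = j;

    // Cells with the same i + j lie on one anti-diagonal and depend only on
    // the two previous anti-diagonals, so each diagonal can run in parallel.
    for (int d = 2; d <= m + n; ++d)
    {
        int iMin = Math.Max(1, d - n), iMax = Math.Min(m, d - 1);
        Parallel.For(iMin, iMax + 1, i =>
        {
            int j = d - i;
            int c = x[i - 1] == y[j - 1] ? 0 : 1;
            D[i, j] = Math.Min(Math.Min(D[i - 1, j] + 1, D[i, j - 1] + 1),
                               D[i - 1, j - 1] + c);
        });
    }
    return D[m, n];
}

Console.WriteLine(EditDistance("kitten", "sitting"));
```

The outer loop over diagonals stays sequential; it is the barrier that makes each cell's predecessors available before the cell is computed.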
[Figure: the D matrix, computed from cell (0,0) to cell (m,n)]
SYNCHRONIZATION > AGGREGATION
Excessive synchronization brings parallel code to its knees
Try to avoid shared state, or minimize access to it
Aggregate thread- or task-local state and merge later
Parallel.ForEach(
    Partitioner.Create(Start, End, ChunkSize),
    () => new List<int>(), //initial local state
    (range, pls, localPrimes) => { //aggregator
        for (int i = range.Item1; i < range.Item2; ++i)
            if (IsPrime(i)) localPrimes.Add(i);
        return localPrimes;
    },
    localPrimes => { lock (primes) //combiner
        primes.AddRange(localPrimes);
    });
CREATIVE SYNCHRONIZATION
We implement a collection of stock prices, initialized with 10^5 name/price pairs
10^7 reads/s, 10^6 “update” writes/s, 10^3 “add” writes/day
Many reader threads, many writer threads
GET(key):
if safe contains key then return safe[key]
lock { return unsafe[key] }
PUT(key, value):
if safe contains key then { safe[key] = value; return }
lock { unsafe[key] = value }
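One possible reading of the GET/PUT pseudocode in C# is sketched below. Everything here is an assumption made for illustration: the ticker names, prices stored as `long` cents, and the trick of keeping each price in a one-element array so that an update never modifies the dictionary itself. The slides do not say how the author made in-place updates of `safe` thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// "safe": built once and never structurally modified afterwards, so concurrent
// lookups need no lock. Each price sits in a one-element array (a mutable box),
// so an update overwrites the box instead of touching the dictionary.
var safe = new Dictionary<string, long[]>
{
    ["MSFT"] = new long[] { 4000 },   // hypothetical prices, in cents
    ["AAPL"] = new long[] { 9500 },
};
// "unsafe": receives the rare "add" writes; every access takes the lock.
var unsafeStore = new Dictionary<string, long>();
var gate = new object();

long Get(string key)
{
    if (safe.TryGetValue(key, out var box)) return Interlocked.Read(ref box[0]);
    lock (gate) { return unsafeStore[key]; }
}

void Put(string key, long value)
{
    if (safe.TryGetValue(key, out var box)) { Interlocked.Exchange(ref box[0], value); return; }
    lock (gate) { unsafeStore[key] = value; }
}

Put("MSFT", 4100);   // frequent "update" write: no lock taken
Put("SELA", 1234);   // rare "add" write: goes through the lock
Console.WriteLine($"{Get("MSFT")} {Get("SELA")}");
```

With the workload described on the slide, almost all traffic would hit the lock-free `safe` path, and only the rare added keys pay for the lock.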
LOCK-FREE PATTERNS (1)
Try to avoid Windows synchronization and use hardware synchronization
Primitive operations such as Interlocked.Increment, Interlocked.CompareExchange
Retry pattern with Interlocked.CompareExchange enables arbitrary lock-free algorithms
int InterlockedMultiply(ref int x, int y) {
    int t, r;
    do {
        t = x;
        r = t * y;
    }
    //CompareExchange(ref location, NewValue, Comparand) returns OldValue
    while (Interlocked.CompareExchange(ref x, r, t) != t);
    return r;
}
LOCK-FREE PATTERNS (2)
User-mode spinlocks (SpinLock class) can replace locks you acquire very often, which protect tiny computations
class __DontUseMe__SpinLock {
    private int _lck;
    public void Enter() {
        while (Interlocked.CompareExchange(ref _lck, 1, 0) != 0);
    }
    public void Exit() {
        _lck = 0;
        Thread.MemoryBarrier();
    }
}
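In place of a hand-rolled spinlock, the framework's `System.Threading.SpinLock` can be used. The sketch below is an illustrative usage example with a hypothetical shared `counter`; for a bare increment `Interlocked.Increment` would be the simpler choice, so the counter only stands in for a tiny critical section.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// SpinLock is a struct: keep it in one variable and never copy it.
var spinLock = new SpinLock(enableThreadOwnerTracking: false);
long counter = 0;

Parallel.For(0, 100_000, i =>
{
    bool lockTaken = false;
    try
    {
        spinLock.Enter(ref lockTaken);
        counter++; // the tiny computation protected by the lock
    }
    finally
    {
        if (lockTaken) spinLock.Exit();
    }
});

Console.WriteLine(counter);
```

The `lockTaken` flag and the `finally` block follow the documented usage pattern, which releases the lock only if it was actually acquired.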
MISCELLANEOUS TIPS (1)
Don’t mix several concurrency frameworks in the same process
Some parallel work is best organized in pipelines (TPL DataFlow)
BroadcastBlock<Uri> → TransformBlock<Uri, byte[]> → TransformBlock<byte[], string> → ActionBlock<string>
MISCELLANEOUS TIPS (2)
Some parallel work can be offloaded to the GPU (C++ AMP)
void vadd_exp(float* x, float* y, float* z, int n) {
array_view<const float,1> avX(n, x), avY(n, y);
array_view<float,1> avZ(n, z);
avZ.discard_data();
parallel_for_each(avZ.extent, [=](index<1> i) ... {
avZ[i] = avX[i] + fast_math::exp(avY[i]);
});
avZ.synchronize();
}
MISCELLANEOUS TIPS (3)
Invest in SIMD parallelization of heavy math or data-parallel algorithms
Make sure to take cache effects into account, especially on multiprocessor (MP) systems
START:
movups xmm0, [esi+4*ecx]
addps xmm0, [edi+4*ecx]
movups [ebx+4*ecx], xmm0
sub ecx, 4
jns START
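The assembly listing above adds two float arrays four elements at a time with SSE instructions. In managed code a similar effect can be sketched with `System.Numerics.Vector<T>`, which the JIT may map to SIMD instructions when `Vector.IsHardwareAccelerated` is true. The arrays `a`, `b`, `c` and their length are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Numerics;

float[] a = new float[1003], b = new float[1003], c = new float[1003];
for (int k = 0; k < a.Length; ++k) { a[k] = k; b[k] = 2 * k; }

int width = Vector<float>.Count; // lanes per vector; depends on the hardware
int i = 0;
for (; i <= a.Length - width; i += width)
{
    var va = new Vector<float>(a, i);
    var vb = new Vector<float>(b, i);
    (va + vb).CopyTo(c, i);                   // element-wise add of `width` floats
}
for (; i < a.Length; ++i) c[i] = a[i] + b[i]; // scalar loop for the leftover tail

Console.WriteLine(c[1002]);
```

The scalar tail loop handles array lengths that are not a multiple of the vector width.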
SUMMARY
•Avoid shared state and synchronization
•Parallelize judiciously and apply thresholds
•Measure and understand performance gains or losses
•Concurrency and parallelism are still hard
•A body of best practices, tips, patterns, examples is being built
THANK YOU!
Sasha Goldshtein @goldshtn
sashag@sela.co.il blog.sashag.net
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Task and Data Parallelism: Real-World Examples

  • 1. Task and Data Parallelism: Real-World Examples
    Sasha Goldshtein, CTO, Sela Group (@goldshtn, blog.sashag.net)
  • 2. AGENDA
    • Multicore machines have been a cheap commodity for more than 10 years
    • Adoption of concurrent programming is still slow
    • Patterns and best practices are scarce
    • We discuss the APIs first, and then turn to examples, best practices, and tips
  • 3. TPL EVOLUTION
    • 2008: Incubated for 3 years as "Parallel Extensions for .NET"
    • 2010: Released in full glory with .NET 4.0
    • 2012: Augmented with language support (await, async methods)
    • The Future: DataFlow in .NET 4.5 (NuGet)
  • 4. TASKS
    • A task is a unit of work
    • May be executed in parallel with other tasks by a scheduler (e.g. the thread pool)
    • Much more than threads, and yet much cheaper

        Task<string> t = Task.Factory.StartNew(
            () => { return DnaSimulation(…); });
        t.ContinueWith(r => Show(r.Exception),
            TaskContinuationOptions.OnlyOnFaulted);
        t.ContinueWith(r => Show(r.Result),
            TaskContinuationOptions.OnlyOnRanToCompletion);
        DisplayProgress();

        try { //The C# 5.0 version
            var task = Task.Run(DnaSimulation);
            DisplayProgress();
            Show(await task);
        }
        catch (Exception ex) {
            Show(ex);
        }
  • 5. PARALLEL LOOPS
    • Ideal for parallelizing work over a collection of data
    • Easy porting of for and foreach loops
    • Beware of inter-iteration dependencies!

        Parallel.For(0, 100, i => {
            ...
        });
        Parallel.ForEach(urls, url => {
            webClient.Post(url, options, data);
        });
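A minimal sketch of the inter-iteration dependency warning on slide 5. The class and method names (`LoopDependencies`, `Squares`, `RunningTotals`) are illustrative and not from the slides. The first loop is safe to port to `Parallel.For` because each iteration touches only its own element; the second is kept sequential because each iteration reads the previous iteration's result.

```csharp
using System;
using System.Threading.Tasks;

public static class LoopDependencies
{
    // Safe to parallelize: iteration i reads input[i] and writes only output[i].
    public static int[] Squares(int[] input)
    {
        var output = new int[input.Length];
        Parallel.For(0, input.Length, i => { output[i] = input[i] * input[i]; });
        return output;
    }

    // Not safe to parallelize as written: iteration i reads the value produced
    // by iteration i-1, so this loop is left sequential.
    public static int[] RunningTotals(int[] input)
    {
        var totals = new int[input.Length];
        for (int i = 0; i < input.Length; ++i)
            totals[i] = (i == 0 ? 0 : totals[i - 1]) + input[i];
        return totals;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        Console.WriteLine(string.Join(",", Squares(data)));        // 1,4,9,16
        Console.WriteLine(string.Join(",", RunningTotals(data)));  // 1,3,6,10
    }
}
```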
  • 6. PARALLEL LINQ
    • Mind-bogglingly easy parallelization of LINQ queries
    • Can introduce ordering into the pipeline, or preserve order of original elements

        var query = from monster in monsters.AsParallel()
                    where monster.IsAttacking
                    let newMonster = SimulateMovement(monster)
                    orderby newMonster.XP
                    select newMonster;
        query.ForAll(monster => Move(monster));
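The two ordering options mentioned on slide 6 can be sketched as follows. This is an illustration with made-up integer data rather than the slide's monsters; `PlinqOrdering` and its methods are hypothetical names. `AsOrdered()` preserves the order of the original elements, while `orderby` introduces a new ordering into the pipeline.

```csharp
using System;
using System.Linq;

public static class PlinqOrdering
{
    // AsOrdered() asks PLINQ to keep results in the order of the source sequence.
    public static int[] EvenSquaresInSourceOrder(int[] source) =>
        source.AsParallel()
              .AsOrdered()
              .Where(n => n % 2 == 0)
              .Select(n => n * n)
              .ToArray();

    // orderby introduces a new ordering (here: descending by square).
    public static int[] SquaresDescending(int[] source) =>
        (from n in source.AsParallel()
         let sq = n * n
         orderby sq descending
         select sq).ToArray();

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(string.Join(",", EvenSquaresInSourceOrder(data))); // 4,16,36
        Console.WriteLine(string.Join(",", SquaresDescending(data)));        // 36,25,16,9,4,1
    }
}
```

Without `AsOrdered()` or an `orderby`, PLINQ makes no promise about the order in which results come back.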
  • 7. MEASURING CONCURRENCY
    • Visual Studio Concurrency Visualizer to the rescue
  • 8. RECURSIVE PARALLELISM EXTRACTION
    • Divide-and-conquer algorithms are often parallelized through the recursive call
    • Be careful with the parallelization threshold and watch out for dependencies

        void FFT(float[] src, float[] dst, int n, int r, int s) {
            if (n == 1) {
                dst[r] = src[r];
            } else {
                FFT(src, dst, n/2, r, s*2);
                FFT(src, dst, n/2, r+s, s*2);
                //Combine the two halves in O(n) time
            }
        }

        // The two recursive calls, run in parallel:
        Parallel.Invoke(
            () => FFT(src, dst, n/2, r, s*2),
            () => FFT(src, dst, n/2, r+s, s*2)
        );
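The parallelization threshold from slide 8 can be illustrated with a simpler divide-and-conquer problem than the FFT. The sketch below sums an array recursively; `RecursiveSum` and the `Threshold` value of 4096 are assumptions for illustration, and a real threshold should be chosen by measuring.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class RecursiveSum
{
    // Ranges at or below this size are summed sequentially (assumed value; tune by measuring).
    private const int Threshold = 4096;

    public static long Sum(int[] data, int from, int to)
    {
        int length = to - from;
        if (length <= Threshold)
        {
            long total = 0;
            for (int i = from; i < to; ++i) total += data[i];
            return total;
        }

        int mid = from + length / 2;
        long left = 0, right = 0;
        // The halves cover disjoint ranges, so the two calls do not depend on each other.
        Parallel.Invoke(
            () => left = Sum(data, from, mid),
            () => right = Sum(data, mid, to));
        return left + right;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 100000).ToArray();
        Console.WriteLine(Sum(data, 0, data.Length)); // 5000050000
    }
}
```

Without the threshold, every level of the recursion would create new parallel work items, and the scheduling overhead would quickly outweigh the work done per item.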
  • 9. SYMMETRIC DATA PROCESSING
    • For a large set of uniform data items that need to be processed, parallel loops are usually the best choice and lead to ideal work distribution
    • Inter-iteration dependencies complicate things (think in-place blur)

        Parallel.For(0, image.Rows, i => {
            for (int j = 0; j < image.Cols; ++j) {
                destImage.SetPixel(i, j, PixelBlur(image, i, j));
            }
        });
  • 10. UNEVEN WORK DISTRIBUTION
    • With non-uniform data items, use custom partitioning or manual distribution
    • Primes: 7 is easier to check than 10,320,647

        var work = Enumerable.Range(0, Environment.ProcessorCount)
            .Select(n => Task.Run(() =>
                CountPrimes(start+chunk*n, start+chunk*(n+1))));
        Task.WaitAll(work.ToArray());

    versus

        Parallel.ForEach(Partitioner.Create(Start, End, chunkSize),
            chunk => CountPrimes(chunk.Item1, chunk.Item2)
        );
  • 11. COMPLEX DEPENDENCY MANAGEMENT
    • Must extract all dependencies and incorporate them into the algorithm
    • Typical scenarios: 1D loops, dynamic algorithms
    • Edit distance: each task depends on 2 predecessors, wavefront computation

        C = x[i-1] == y[j-1] ? 0 : 1;
        D[i, j] = min( D[i-1, j] + 1,
                       D[i, j-1] + 1,
                       D[i-1, j-1] + C);

    (Diagram: the table D, with corners labeled 0,0 and m,n.)
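One way to realize the wavefront computation from slide 11 is to sweep the table by anti-diagonals: every cell with the same `i + j` depends only on cells from the two previous anti-diagonals, so the cells of one anti-diagonal can be filled in parallel. This is a sketch under that assumption, not necessarily the speaker's implementation. `WavefrontEditDistance` is a hypothetical name, and a production version would probably process blocks of cells rather than single cells to reduce scheduling overhead.

```csharp
using System;
using System.Threading.Tasks;

public static class WavefrontEditDistance
{
    public static int Compute(string x, string y)
    {
        int m = x.Length, n = y.Length;
        var D = new int[m + 1, n + 1];
        for (int i = 0; i <= m; ++i) D[i, 0] = i;   // delete all of x[0..i)
        for (int j = 0; j <= n; ++j) D[0, j] = j;   // insert all of y[0..j)

        // d = i + j identifies one anti-diagonal of the table.
        for (int d = 2; d <= m + n; ++d)
        {
            int iMin = Math.Max(1, d - n);
            int iMax = Math.Min(m, d - 1);
            // Cells on the same anti-diagonal are independent of each other.
            Parallel.For(iMin, iMax + 1, i =>
            {
                int j = d - i;
                int c = x[i - 1] == y[j - 1] ? 0 : 1;
                D[i, j] = Math.Min(
                    Math.Min(D[i - 1, j] + 1, D[i, j - 1] + 1),
                    D[i - 1, j - 1] + c);
            });
        }
        return D[m, n];
    }

    public static void Main()
    {
        Console.WriteLine(Compute("kitten", "sitting")); // 3
    }
}
```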
  • 12. SYNCHRONIZATION > AGGREGATION
    • Excessive synchronization brings parallel code to its knees
    • Try to avoid shared state, or minimize access to it
    • Aggregate thread- or task-local state and merge later

        Parallel.ForEach(
            Partitioner.Create(Start, End, ChunkSize),
            () => new List<int>(),              //initial local state
            (range, pls, localPrimes) => {      //aggregator
                for (int i = range.Item1; i < range.Item2; ++i)
                    if (IsPrime(i)) localPrimes.Add(i);
                return localPrimes;
            },
            localPrimes => {
                lock (primes)                   //combiner
                    primes.AddRange(localPrimes);
            });
  • 13. CREATIVE SYNCHRONIZATION
    • We implement a collection of stock prices, initialized with 10^5 name/price pairs
    • 10^7 reads/s, 10^6 "update" writes/s, 10^3 "add" writes/day
    • Many reader threads, many writer threads

        GET(key):
            if safe contains key then return safe[key]
            lock { return unsafe[key] }

        PUT(key, value):
            if safe contains key then safe[key] = value
            else lock { unsafe[key] = value }
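The GET/PUT pseudocode on slide 13 could be sketched in C# as below. Several details are assumptions that the slide does not spell out: the "safe" dictionary is built once and never changes structurally afterwards, each of its values is a mutable holder (`PriceBox`, a hypothetical helper) so an update never writes to the dictionary itself, and the rare "add" writes go to a second dictionary behind a lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class StockPrices
{
    // Hypothetical holder: updating a price never modifies the dictionary structure.
    private sealed class PriceBox
    {
        private double _price;
        public PriceBox(double price) { _price = price; }
        // CompareExchange with equal value and comparand is an atomic read.
        public double Read() => Interlocked.CompareExchange(ref _price, 0.0, 0.0);
        public void Write(double price) => Interlocked.Exchange(ref _price, price);
    }

    // "safe": fixed key set, read by many threads without a lock.
    private readonly Dictionary<string, PriceBox> _safe = new Dictionary<string, PriceBox>();
    // "unsafe": keys added after initialization, always accessed under the lock.
    private readonly Dictionary<string, double> _unsafe = new Dictionary<string, double>();
    private readonly object _gate = new object();

    public StockPrices(IEnumerable<KeyValuePair<string, double>> initial)
    {
        foreach (var pair in initial) _safe[pair.Key] = new PriceBox(pair.Value);
    }

    public double Get(string key)
    {
        if (_safe.TryGetValue(key, out var box)) return box.Read();
        lock (_gate) { return _unsafe[key]; }
    }

    public void Put(string key, double value)
    {
        if (_safe.TryGetValue(key, out var box)) { box.Write(value); return; }
        lock (_gate) { _unsafe[key] = value; }
    }
}

public static class StockPricesDemo
{
    public static void Main()
    {
        var prices = new StockPrices(new Dictionary<string, double> { { "MSFT", 30.0 } });
        prices.Put("MSFT", 31.5);    // frequent "update" write: no lock taken
        prices.Put("NEWCO", 10.0);   // rare "add" write: goes through the lock
        Console.WriteLine(prices.Get("MSFT") == 31.5);  // True
        Console.WriteLine(prices.Get("NEWCO") == 10.0); // True
    }
}
```

The design bets on the workload described on the slide: the hot path (reads and updates of existing keys) takes no lock, and only the roughly 10^3 additions per day pay for synchronization.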
  • 14. LOCK-FREE PATTERNS (1)
    • Try to avoid Windows synchronization and use hardware synchronization
    • Primitive operations such as Interlocked.Increment, Interlocked.CompareExchange
    • Retry pattern with Interlocked.CompareExchange enables arbitrary lock-free algorithms

        int InterlockedMultiply(ref int x, int y) {
            int t, r;
            do {
                t = x;
                r = t * y;
            } while (Interlocked.CompareExchange(ref x, r, t) != t);
            return r;
        }

    (In Interlocked.CompareExchange(ref x, r, t), r is the new value, t is the comparand, and the return value is the old value.)
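The retry pattern from slide 14 carries over to other read-modify-write operations. As an illustration, the sketch below keeps a running maximum without locks; `LockFreeMax`, `InterlockedMax`, and `MaxOf` are hypothetical names, not framework methods.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockFreeMax
{
    // Raises target to candidate if candidate is larger; returns the resulting maximum.
    public static int InterlockedMax(ref int target, int candidate)
    {
        int seen;
        do
        {
            seen = Volatile.Read(ref target);
            if (candidate <= seen) return seen;   // nothing to update
            // Retry if another thread changed target between the read and the exchange.
        } while (Interlocked.CompareExchange(ref target, candidate, seen) != seen);
        return candidate;
    }

    public static int MaxOf(int[] values)
    {
        int max = int.MinValue;
        Parallel.ForEach(values, v => { InterlockedMax(ref max, v); });
        return max;
    }

    public static void Main()
    {
        Console.WriteLine(MaxOf(new[] { 3, 41, 7, 19, 41, 2 })); // 41
    }
}
```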
  • 15. LOCK-FREE PATTERNS (2)
    • User-mode spinlocks (SpinLock class) can replace locks you acquire very often, which protect tiny computations

        class __DontUseMe__SpinLock {
            private int _lck;
            public void Enter() {
                while (Interlocked.CompareExchange(ref _lck, 1, 0) != 0);
            }
            public void Exit() {
                _lck = 0;
                Thread.MemoryBarrier();
            }
        }
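In place of the hand-rolled lock on slide 15, the framework's `System.Threading.SpinLock` can be used roughly as follows. This is a minimal sketch: `SpinLockCounter` and `AddInParallel` are illustrative names, and the critical section is deliberately tiny, which is the case the slide recommends spinlocks for.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SpinLockCounter
{
    // SpinLock is a struct: keep it in a non-readonly field so it is never copied.
    private static SpinLock _lock = new SpinLock(enableThreadOwnerTracking: false);
    private static long _total;

    public static long AddInParallel(int iterations)
    {
        _total = 0;
        Parallel.For(0, iterations, i =>
        {
            bool lockTaken = false;
            try
            {
                _lock.Enter(ref lockTaken);
                _total += i;               // tiny critical section
            }
            finally
            {
                if (lockTaken) _lock.Exit();
            }
        });
        return _total;
    }

    public static void Main()
    {
        Console.WriteLine(AddInParallel(10000)); // 49995000
    }
}
```

For a plain sum like this one, `Interlocked.Add` or per-thread aggregation would be simpler. The sum only stands in for a short computation that cannot be expressed as a single interlocked operation.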
  • 16. MISCELLANEOUS TIPS (1)
    • Don't mix several concurrency frameworks in the same process
    • Some parallel work is best organized in pipelines (TPL DataFlow)

    Pipeline: BroadcastBlock<Uri> → TransformBlock<Uri, byte[]> → TransformBlock<byte[], string> → ActionBlock<string>
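The four-block pipeline on slide 16 could be wired up along these lines. This is a sketch under assumptions: it needs the `System.Threading.Tasks.Dataflow` library, the class name `DataflowPipeline` is hypothetical, and the "download" stage is a stand-in that encodes the URI text instead of touching the network, because the slide does not show what each stage does.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;
using System.Threading.Tasks.Dataflow;

public static class DataflowPipeline
{
    public static string[] Run(params string[] addresses)
    {
        var results = new ConcurrentQueue<string>();
        // Propagate completion so finishing the source eventually finishes the sink.
        var link = new DataflowLinkOptions { PropagateCompletion = true };

        var source = new BroadcastBlock<Uri>(uri => uri);
        // Stand-in for a download step (assumption): no network access here.
        var download = new TransformBlock<Uri, byte[]>(
            uri => Encoding.UTF8.GetBytes(uri.AbsoluteUri));
        var decode = new TransformBlock<byte[], string>(
            bytes => Encoding.UTF8.GetString(bytes));
        var sink = new ActionBlock<string>(text => results.Enqueue(text));

        source.LinkTo(download, link);
        download.LinkTo(decode, link);
        decode.LinkTo(sink, link);

        foreach (var address in addresses) source.Post(new Uri(address));
        source.Complete();
        sink.Completion.Wait();
        return results.ToArray();
    }

    public static void Main()
    {
        foreach (var text in Run("http://example.com/a", "http://example.com/b"))
            Console.WriteLine(text);
    }
}
```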
  • 17. MISCELLANEOUS TIPS (2)
    • Some parallel work can be offloaded to the GPU (C++ AMP)

        void vadd_exp(float* x, float* y, float* z, int n) {
            array_view<const float,1> avX(n, x), avY(n, y);
            array_view<float,1> avZ(n, z);
            avZ.discard_data();
            parallel_for_each(avZ.extent, [=](index<1> i) ... {
                avZ[i] = avX[i] + fast_math::exp(avY[i]);
            });
            avZ.synchronize();
        }
  • 18. MISCELLANEOUS TIPS (3)
    • Invest in SIMD parallelization of heavy math or data-parallel algorithms
    • Make sure to take cache effects into account, especially on MP systems

        START:
            movups xmm0, [esi+4*ecx]
            addps  xmm0, [edi+4*ecx]
            movups [ebx+4*ecx], xmm0
            sub    ecx, 4
            jns    START
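The assembly loop on slide 18 adds two float arrays four elements at a time. A managed counterpart can be sketched with `System.Numerics.Vector<T>`. The class name `SimdAdd` is illustrative, and whether the JIT maps this to SIMD instructions depends on the runtime and hardware (`Vector.IsHardwareAccelerated` reports it).

```csharp
using System;
using System.Numerics;

public static class SimdAdd
{
    public static float[] Add(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Arrays must have equal length.");

        var result = new float[a.Length];
        int width = Vector<float>.Count;   // elements per vector, hardware dependent
        int i = 0;

        // Vectorized part: process 'width' elements per iteration.
        for (; i <= a.Length - width; i += width)
        {
            var sum = new Vector<float>(a, i) + new Vector<float>(b, i);
            sum.CopyTo(result, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.Length; ++i) result[i] = a[i] + b[i];
        return result;
    }

    public static void Main()
    {
        float[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        float[] b = { 9, 8, 7, 6, 5, 4, 3, 2, 1 };
        Console.WriteLine(string.Join(",", Add(a, b))); // 10,10,10,10,10,10,10,10,10
    }
}
```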
  • 19. SUMMARY
    • Avoid shared state and synchronization
    • Parallelize judiciously and apply thresholds
    • Measure and understand performance gains or losses
    • Concurrency and parallelism are still hard
    • A body of best practices, tips, patterns, examples is being built
  • 21. THANK YOU!
    Sasha Goldshtein, @goldshtn, sashag@sela.co.il, blog.sashag.net