WHERE THE WILD THINGS ARE
Benchmarking and
Micro-Optimisations
Matt Warren
@matthewwarren
http://mattwarren.org/
Premature Optimization
“We should forget about small efficiencies, say
about 97% of the time: premature
optimization is the root of all evil. Yet we
should not pass up our opportunities in that
critical 3%.”
- Donald Knuth
Profiling Tools
• ANTS Performance Profiler - Redgate
• dotTrace & dotMemory - JetBrains
• PerfView - Microsoft (free)
• Visual Studio Profiling Tools (Ultimate, Premium or Professional)
• MiniProfiler - Stack Overflow (free)
BenchmarkDotNet
Why do you need a
benchmarking library?
static void Profile(int iterations, Action action)
{
    action();     // warm up
    GC.Collect(); // clean up
    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++)
    {
        action();
    }
    watch.Stop();
    Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}
Benchmarking small code samples in C#, can this implementation be improved?
http://stackoverflow.com/q/1047218/4500
static class Profiler<T>
{
    // Keeping the result makes it harder for the JIT to treat the call as dead code.
    private static T Result;

    public static void Profile(int iterations, Func<T> func)
    {
        func();       // warm up
        GC.Collect(); // clean up
        var watch = new Stopwatch();
        watch.Start();
        for (int i = 0; i < iterations; i++)
        {
            Result = func();
        }
        watch.Stop();
        Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds);
    }
}
BenchmarkDotNet project
Andrey Akinshin (the ‘Boss’)
@andrey_akinshin
http://aakinshin.net/en/blog/
Matt Warren (me)
Adam Sitnik (.NET Core guru)
@SitnikAdam
http://adamsitnik.com/
.NET Foundation
Goals of BenchmarkDotNet
Benchmarking library that is:
• Accurate
• Easy-to-use
• Helpful
Stopwatch under the hood http://aakinshin.net/en/blog/dotnet/stopwatch/
LegacyJIT-x86 and first method call http://aakinshin.net/en/blog/dotnet/legacyjitx86-and-first-method-call/
Proper docs!
benchmarkdotnet.org/
What BenchmarkDotNet doesn’t do
• Multi-threaded benchmarks
• Integrate with CI builds
• Unit test runner integration
• Anything else?
http://github.com/dotnet/BenchmarkDotNet/issues/
“Other Benchmarking tools are available”
• NBench
• https://github.com/petabridge/NBench
• Microsoft Xunit performance
• http://github.com/Microsoft/xunit-performance/
• Lambda Micro Benchmarking (“Clash of the Lambdas”)
• https://github.com/biboudis/LambdaMicrobenchmarking
• Etimo.Benchmarks
• http://etimo.se/blog/etimo-benchmarks-lightweight-net-benchmark-tool/
• MeasureIt
• https://blogs.msdn.microsoft.com/vancem/2009/02/06/measureit-update-tool-for-doing-microbenchmarks-for-net/
How it works
An invocation of the target method is an operation.
A bunch of operations is an iteration.
Iteration types:
• Pilot: The best operation count will be chosen.
• IdleWarmup, IdleTarget: BenchmarkDotNet overhead will be evaluated.
• MainWarmup: Warmup of the main method.
• MainTarget: Main measurements.
• Result = MainTarget – AverageOverhead
http://benchmarkdotnet.org/HowItWorks.htm
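The stages listed above can be sketched with nothing but Stopwatch. This is a simplified illustration, not BenchmarkDotNet's actual implementation: the class name StagedRunner, the doubling strategy in the pilot stage, and the default counts are all assumptions made for the example.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch of the iteration stages; names and defaults are illustrative only.
public static class StagedRunner
{
    // One iteration = 'operations' invocations; returns nanoseconds per operation.
    public static double MeasureIteration(Action action, long operations)
    {
        var watch = Stopwatch.StartNew();
        for (long i = 0; i < operations; i++)
            action();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds * 1e6 / operations;
    }

    // Pilot: double the operation count until one iteration lasts at least minIterationMs.
    public static long Pilot(Action action, double minIterationMs)
    {
        long operations = 1;
        while (true)
        {
            var watch = Stopwatch.StartNew();
            for (long i = 0; i < operations; i++)
                action();
            watch.Stop();
            // The upper bound is only a guard against overflowing the counter.
            if (watch.Elapsed.TotalMilliseconds >= minIterationMs || operations >= (1L << 40))
                return operations;
            operations *= 2;
        }
    }

    public static double Run(Action target, int warmupCount = 3, int targetCount = 5,
                             double minIterationMs = 10)
    {
        Action idle = () => { };  // empty delegate: measures harness overhead only
        long operations = Pilot(target, minIterationMs);

        // IdleWarmup, then IdleTarget: average cost of invoking an empty delegate.
        for (int i = 0; i < warmupCount; i++) MeasureIteration(idle, operations);
        double overhead = 0;
        for (int i = 0; i < targetCount; i++) overhead += MeasureIteration(idle, operations);
        overhead /= targetCount;

        // MainWarmup, then MainTarget: the method under test.
        for (int i = 0; i < warmupCount; i++) MeasureIteration(target, operations);
        double main = 0;
        for (int i = 0; i < targetCount; i++) main += MeasureIteration(target, operations);
        main /= targetCount;

        return main - overhead;   // Result = MainTarget - AverageOverhead
    }
}
```

Calling StagedRunner.Run(() => SomeMethod()) returns an estimate in nanoseconds per operation. The real library does far more (separate processes, statistics, outlier handling), which is the point of the earlier slides.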
What happens under the covers?
Image credit Albert Rodríguez @UncleFirefox
DEMO
‘Hello World’ Benchmark
Scale of benchmarks
• millisecond - ms
  • One thousandth of one second, single webapp request
• microsecond - us or µs
  • One millionth of one second, several in-memory operations
• nanosecond - ns
  • One billionth of one second, single operations
Who ‘times’ the timers?
[Benchmark]
public long StopwatchLatency()
{
    return Stopwatch.GetTimestamp();
}

[Benchmark]
public long StopwatchGranularity()
{
    // Loop until Stopwatch.GetTimestamp() gives us a different value
    long lastTimestamp = Stopwatch.GetTimestamp();
    while (Stopwatch.GetTimestamp() == lastTimestamp)
    {
    }
    return lastTimestamp;
}

[Benchmark]
public long DateTimeLatency()
{
    return DateTime.Now.Ticks;
}

[Benchmark]
public long DateTimeGranularity()
{
    // Loop until DateTime.Now gives us a different value
    long lastTimestamp = DateTime.Now.Ticks;
    while (DateTime.Now.Ticks == lastTimestamp)
    {
    }
    return lastTimestamp;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method | Mean | StdDev | Allocated |
--------------------- |---------------- |------------ |---------- |
StopwatchLatency | 12.9960 ns | 0.1609 ns | 0 B |
StopwatchGranularity | 374.3049 ns | 2.4388 ns | 0 B |
DateTimeLatency | 682.2320 ns | 8.9341 ns | 32 B |
DateTimeGranularity | 996,025.6492 ns | 413.9175 ns | 47.34 kB |
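The Frequency and Resolution values in the header are two views of the same number: the resolution is the length of one timer tick, the reciprocal of the frequency. A minimal sketch of that arithmetic follows; the TimerInfo class and its method names are made up for this example, while Stopwatch.Frequency and Stopwatch.IsHighResolution are standard .NET members.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of BenchmarkDotNet.
public static class TimerInfo
{
    // Duration of one timer tick in nanoseconds for a given tick frequency.
    public static double ResolutionNs(long frequencyHz)
    {
        return 1e9 / frequencyHz;
    }

    // Converts a tick delta from Stopwatch.GetTimestamp() to nanoseconds.
    public static double TicksToNs(long ticks, long frequencyHz)
    {
        return ticks * ResolutionNs(frequencyHz);
    }

    // Describes the timer of the machine the code is running on.
    public static string Describe()
    {
        return string.Format("Frequency={0} Hz, Resolution={1:F4} ns, HighResolution={2}",
            Stopwatch.Frequency, ResolutionNs(Stopwatch.Frequency), Stopwatch.IsHighResolution);
    }
}
```

With the frequency from the header, TimerInfo.ResolutionNs(2630673) gives roughly 380.13 ns, which is close to the measured StopwatchGranularity mean of about 374 ns.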
Loop-the-Loop
“Avoid foreach loop on everything except raw arrays?”
[Benchmark(Baseline = true)]
public int ForLoopArray()
{
    var counter = 0;
    for (int i = 0; i < anArray.Length; i++)
        counter += anArray[i];
    return counter;
}

[Benchmark]
public int ForEachArray()
{
    var counter = 0;
    foreach (var i in anArray)
        counter += i;
    return counter;
}

[Benchmark]
public int ForLoopList()
{
    var counter = 0;
    for (int i = 0; i < aList.Count; i++)
        counter += aList[i];
    return counter;
}

[Benchmark]
public int ForEachList()
{
    var counter = 0;
    foreach (var i in aList)
        counter += i;
    return counter;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method | Mean | StdDev | Scaled | Scaled-StdDev |
--------------- |-------------- |------------ |------- |-------------- |
ForLoopArray | 383.8279 ns | 2.9472 ns | 1.00 | 0.00 |
ForEachArray | 392.5611 ns | 4.1286 ns | 1.02 | 0.01 |
ForLoopList | 2,315.9658 ns | 12.1001 ns | 6.03 | 0.05 |
ForEachList | 2,663.5771 ns | 21.9822 ns | 6.94 | 0.08 |
Loop-the-Loop – ‘for loop’ - Arrays
Loop-the-Loop – ‘for loop’ - Lists
Abstractions - IDictionary v Dictionary
Dictionary<string, string> dictionary =
new Dictionary<string, string>();
IDictionary<string, string> iDictionary =
(IDictionary<string, string>)dictionary;
[Benchmark]
public Dictionary<string, string> DictionaryEnumeration()
{
    foreach (var item in dictionary) { ; }
    return dictionary;
}

[Benchmark]
public IDictionary<string, string> IDictionaryEnumeration()
{
    foreach (var item in iDictionary) { ; }
    return iDictionary;
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
Method | Mean | StdErr | StdDev | Gen 0 | Allocated |
----------------------- |----------- |---------- |---------- |------- |---------- |
DictionaryEnumeration | 24.0353 ns | 0.2403 ns | 0.9307 ns | - | 0 B |
IDictionaryEnumeration | 41.6301 ns | 0.4479 ns | 2.1944 ns | 0.0086 | 32 B |
// * Diagnostic Output - MemoryDiagnoser *
Note: the Gen 0/1/2 Measurements are per 1k Operations
Abstractions - IDictionary v Dictionary
Dictionary<string, string> dictionary =
new Dictionary<string, string>();
IDictionary<string, string> iDictionary =
(IDictionary<string, string>)dictionary;
// struct, so no allocation
Dictionary<string, string>.Enumerator structEnumerator =
    dictionary.GetEnumerator();
// interface, so the struct is boxed: allocates 56 B (64-bit) or 32 B (32-bit)
IEnumerator<KeyValuePair<string, string>> interfaceEnumerator =
    iDictionary.GetEnumerator();
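The difference can be checked without a profiler by inspecting the types involved. The sketch below is illustrative only: the EnumeratorBoxing class is a made-up name, and the dictionary is given one entry on the assumption that some newer runtimes may special-case empty collections.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class EnumeratorBoxing
{
    // Dictionary<TKey, TValue>.Enumerator is a struct, so foreach over the
    // concrete type can keep it on the stack.
    public static bool DirectEnumeratorIsStruct()
    {
        return typeof(Dictionary<string, string>.Enumerator).IsValueType;
    }

    // Through the interface the caller only sees a reference, so a value-type
    // enumerator has to be boxed on the heap before it can be returned.
    public static Type InterfaceEnumeratorType()
    {
        var dictionary = new Dictionary<string, string> { { "key", "value" } };
        IDictionary<string, string> iDictionary = dictionary;
        using (IEnumerator<KeyValuePair<string, string>> boxed = iDictionary.GetEnumerator())
        {
            return boxed.GetType();
        }
    }
}
```

If InterfaceEnumeratorType() reports a value type, the object behind the interface reference is a boxed struct, which is where the extra allocation in the results table comes from.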
Low-level increments
[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Program
{
    private double a, b, c, d;

    [Benchmark(OperationsPerInvoke = 4)]
    public void MethodA()
    {
        a++; b++; c++; d++;
    }

    [Benchmark(OperationsPerInvoke = 4)]
    public void MethodB()
    {
        a++; a++; a++; a++;
    }
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
RyuJitX64 : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
Runtime=Clr Allocated=0 B
Method | Job | Jit | Platform | Mean | StdErr | StdDev |
----------- |------------- |---------- |--------- |---------- |---------- |---------- |
Parallel | LegacyJitX64 | LegacyJit | X64 | 0.3420 ns | 0.0015 ns | 0.0057 ns |
Sequential | LegacyJitX64 | LegacyJit | X64 | 2.2038 ns | 0.0014 ns | 0.0051 ns |
Parallel | LegacyJitX86 | LegacyJit | X86 | 0.3276 ns | 0.0005 ns | 0.0020 ns |
Sequential | LegacyJitX86 | LegacyJit | X86 | 2.5229 ns | 0.0048 ns | 0.0187 ns |
Parallel | RyuJitX64 | RyuJit | X64 | 0.3686 ns | 0.0037 ns | 0.0144 ns |
Sequential | RyuJitX64 | RyuJit | X64 | 0.8959 ns | 0.0023 ns | 0.0090 ns |
MethodA() = Parallel, MethodB() = Sequential
http://en.wikipedia.org/wiki/Instruction-level_parallelism
Search - Linear v Binary
private static int LinearSearch(Data[] set, int key)
{
    for (int i = 0; i < set.Length; i++)
    {
        var c = set[i].Key - key;
        if (c == 0)
        {
            return i;
        }
        if (c > 0)
        {
            return ~i;
        }
    }
    return ~set.Length;
}

private static int BinarySearch(Data[] set, int key)
{
    int i = 0;
    int up = set.Length - 1;
    while (i <= up)
    {
        int mid = (up - i) / 2 + i;
        int c = set[mid].Key - key;
        if (c == 0)
        {
            return mid;
        }
        if (c < 0)
            i = mid + 1;
        else
            up = mid - 1;
    }
    return ~i;
}
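Both methods share one return convention: the index when the key is found, otherwise the bitwise complement of the index where the key would be inserted. The sketch below repeats the two searches in compact form so the convention can be tried out. The Data struct, the SortedSet helper and the Describe helper are assumptions for illustration, since the slides do not show how Data is defined.

```csharp
using System;

// Hypothetical element type: the slides use Data[] but do not show its definition.
public struct Data
{
    public int Key;
    public Data(int key) { Key = key; }
}

public static class SearchDemo
{
    public static int LinearSearch(Data[] set, int key)
    {
        for (int i = 0; i < set.Length; i++)
        {
            int c = set[i].Key - key;
            if (c == 0) return i;   // found
            if (c > 0) return ~i;   // first larger key marks the insertion point
        }
        return ~set.Length;         // key is larger than every element
    }

    public static int BinarySearch(Data[] set, int key)
    {
        int low = 0, high = set.Length - 1;
        while (low <= high)
        {
            int mid = (high - low) / 2 + low;
            int c = set[mid].Key - key;
            if (c == 0) return mid;
            if (c < 0) low = mid + 1; else high = mid - 1;
        }
        return ~low;                // complement of the insertion point
    }

    // Builds a sorted array with keys 0, 2, 4, ... so odd keys are always missing.
    public static Data[] SortedSet(int size)
    {
        var set = new Data[size];
        for (int i = 0; i < size; i++) set[i] = new Data(2 * i);
        return set;
    }

    // A non-negative result is an index; a negative one encodes the insertion point.
    public static string Describe(int result)
    {
        return result >= 0 ? "found at " + result : "insert at " + ~result;
    }
}
```

For example, searching the three-element set (keys 0, 2, 4) for the key 3 returns ~2 from either method, which Describe renders as "insert at 2". Note that the subtraction-based comparison can overflow for keys near the limits of int; that is acceptable for small benchmark data but is worth replacing with CompareTo in general-purpose code.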
Search - Linear v Binary
private readonly Data[][] dataSet;
private Data[] currentSet;
private int currentMid;
private int currentMax;

[Params(1, 2, 3, 4, 5, 7, 10, 12, 15)]
public int Size
{
    set
    {
        currentSet = dataSet[value];
        currentMax = value - 1;
        currentMid = value / 2;
    }
}
Linear Search v Binary Search
readonly fields
public struct Int256
{
    private readonly long bits0, bits1, bits2, bits3;

    public Int256(long bits0, long bits1, long bits2, long bits3)
    {
        this.bits0 = bits0; this.bits1 = bits1;
        this.bits2 = bits2; this.bits3 = bits3;
    }

    public long Bits0 { get { return bits0; } }
    public long Bits1 { get { return bits1; } }
    public long Bits2 { get { return bits2; } }
    public long Bits3 { get { return bits3; } }
}

[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Program
{
    private readonly Int256 readOnlyField = new Int256(1L, 5L, 10L, 100L);
    private Int256 field = new Int256(1L, 5L, 10L, 100L);

    [Benchmark]
    public long GetValue()
    {
        return field.Bits0 + field.Bits1 + field.Bits2 + field.Bits3;
    }

    [Benchmark]
    public long GetReadOnlyValue()
    {
        return readOnlyField.Bits0 + readOnlyField.Bits1 +
               readOnlyField.Bits2 + readOnlyField.Bits3;
    }
}
BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
RyuJitX64 : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
Runtime=Clr Allocated=0 B
Method | Job | Jit | Platform | Mean | StdErr | StdDev |
----------------- |------------- |---------- |--------- |---------- |---------- |---------- |
GetValue | LegacyJitX64 | LegacyJit | X64 | 0.7893 ns | 0.0078 ns | 0.0291 ns |
GetReadOnlyValue | LegacyJitX64 | LegacyJit | X64 | 9.5362 ns | 0.0251 ns | 0.0971 ns |
GetValue | LegacyJitX86 | LegacyJit | X86 | 1.4625 ns | 0.0506 ns | 0.1959 ns |
GetReadOnlyValue | LegacyJitX86 | LegacyJit | X86 | 1.9743 ns | 0.0641 ns | 0.2481 ns |
GetValue | RyuJitX64 | RyuJit | X64 | 0.3852 ns | 0.0183 ns | 0.0710 ns |
GetReadOnlyValue | RyuJitX64 | RyuJit | X64 | 9.6406 ns | 0.0803 ns | 0.3109 ns |
https://codeblog.jonskeet.uk/2014/07/16/micro-optimization-the-surprising-inefficiency-of-readonly-fields/
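The slowdown comes from defensive copies: when a member is called on a readonly field of struct type, the compiler first copies the struct so the call cannot modify the field. That copy is easy to observe with a struct that mutates itself. The Counter struct and DefensiveCopyDemo class below are made-up names for this sketch and are not part of the benchmark above.

```csharp
using System;

// Hypothetical mutable struct used only to make the hidden copy visible.
public struct Counter
{
    private int count;
    public int Increment() { return ++count; }   // mutates the struct it is called on
    public int Count { get { return count; } }
}

public class DefensiveCopyDemo
{
    private readonly Counter readOnlyField = new Counter();
    private Counter field = new Counter();

    // Both calls operate on the field itself, so the second call sees the first.
    public int IncrementMutableTwice()
    {
        field.Increment();
        return field.Increment();
    }

    // Each call operates on a fresh temporary copy of the readonly field,
    // so the increments are lost and the field itself never changes.
    public int IncrementReadOnlyTwice()
    {
        readOnlyField.Increment();
        return readOnlyField.Increment();
    }

    public int ReadOnlyCount { get { return readOnlyField.Count; } }
}
```

IncrementMutableTwice() returns 2, while IncrementReadOnlyTwice() returns 1 and ReadOnlyCount stays 0. For a 32-byte struct such as Int256, every property access on the readonly field pays for a copy like this. Declaring the struct itself as readonly (C# 7.2 and later) lets the compiler skip the copy.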
MOAR Benchmarks!!
Analysing Optimisations in the Wire Serialiser
• http://mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser/
Optimising LINQ
• http://mattwarren.org/2016/09/29/Optimising-LINQ/
Why is reflection slow?
• http://mattwarren.org/2016/12/14/Why-is-Reflection-slow/
Why Exceptions should be Exceptional
• http://mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional/
Resources
QUESTIONS?

Weitere ähnliche Inhalte

Was ist angesagt?

DTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database ForumDTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database Forum
Doug Burns
 
Profiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf ToolsProfiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf Tools
emBO_Conference
 
Device-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsDevice-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded Systems
emBO_Conference
 

Was ist angesagt? (20)

DTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database ForumDTrace - Miracle Scotland Database Forum
DTrace - Miracle Scotland Database Forum
 
Profiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf ToolsProfiling your Applications using the Linux Perf Tools
Profiling your Applications using the Linux Perf Tools
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
pstack, truss etc to understand deeper issues in Oracle database
pstack, truss etc to understand deeper issues in Oracle databasepstack, truss etc to understand deeper issues in Oracle database
pstack, truss etc to understand deeper issues in Oracle database
 
Verification of Concurrent and Distributed Systems
Verification of Concurrent and Distributed SystemsVerification of Concurrent and Distributed Systems
Verification of Concurrent and Distributed Systems
 
Demystifying cost based optimization
Demystifying cost based optimizationDemystifying cost based optimization
Demystifying cost based optimization
 
NetConf 2018 BPF Observability
NetConf 2018 BPF ObservabilityNetConf 2018 BPF Observability
NetConf 2018 BPF Observability
 
A deep dive about VIP,HAIP, and SCAN
A deep dive about VIP,HAIP, and SCAN A deep dive about VIP,HAIP, and SCAN
A deep dive about VIP,HAIP, and SCAN
 
Profiling Ruby
Profiling RubyProfiling Ruby
Profiling Ruby
 
Vertica trace
Vertica traceVertica trace
Vertica trace
 
Down to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap DumpsDown to Stack Traces, up from Heap Dumps
Down to Stack Traces, up from Heap Dumps
 
Px execution in rac
Px execution in racPx execution in rac
Px execution in rac
 
.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET
.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET
.NET Fest 2019. Николай Балакин. Микрооптимизации в мире .NET
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwn
 
Deep review of LMS process
Deep review of LMS processDeep review of LMS process
Deep review of LMS process
 
Optimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTESOptimizing Parallel Reduction in CUDA : NOTES
Optimizing Parallel Reduction in CUDA : NOTES
 
The Ring programming language version 1.7 book - Part 12 of 196
The Ring programming language version 1.7 book - Part 12 of 196The Ring programming language version 1.7 book - Part 12 of 196
The Ring programming language version 1.7 book - Part 12 of 196
 
Performance tuning a quick intoduction
Performance tuning   a quick intoductionPerformance tuning   a quick intoduction
Performance tuning a quick intoduction
 
The Ring programming language version 1.6 book - Part 11 of 189
The Ring programming language version 1.6 book - Part 11 of 189The Ring programming language version 1.6 book - Part 11 of 189
The Ring programming language version 1.6 book - Part 11 of 189
 
Device-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded SystemsDevice-specific Clang Tooling for Embedded Systems
Device-specific Clang Tooling for Embedded Systems
 

Ähnlich wie Where the wild things are - Benchmarking and Micro-Optimisations

Deep dumpster diving 2010
Deep dumpster diving 2010Deep dumpster diving 2010
Deep dumpster diving 2010
RonnBlack
 
Profiling ruby
Profiling rubyProfiling ruby
Profiling ruby
nasirj
 
QA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wantedQA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wanted
QAFest
 

Ähnlich wie Where the wild things are - Benchmarking and Micro-Optimisations (20)

Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"Adam Sitnik "State of the .NET Performance"
Adam Sitnik "State of the .NET Performance"
 
State of the .Net Performance
State of the .Net PerformanceState of the .Net Performance
State of the .Net Performance
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
 
Deep dumpster diving 2010
Deep dumpster diving 2010Deep dumpster diving 2010
Deep dumpster diving 2010
 
Profiling ruby
Profiling rubyProfiling ruby
Profiling ruby
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016The Road To Reactive with RxJava JEEConf 2016
The Road To Reactive with RxJava JEEConf 2016
 
Mysql handle socket
Mysql handle socketMysql handle socket
Mysql handle socket
 
LSFMM 2019 BPF Observability
LSFMM 2019 BPF ObservabilityLSFMM 2019 BPF Observability
LSFMM 2019 BPF Observability
 
Limits Profiling
Limits ProfilingLimits Profiling
Limits Profiling
 
Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨Nodejs性能分析优化和分布式设计探讨
Nodejs性能分析优化和分布式设计探讨
 
RxJava on Android
RxJava on AndroidRxJava on Android
RxJava on Android
 
5 must have patterns for your microservice - techorama
5 must have patterns for your microservice - techorama5 must have patterns for your microservice - techorama
5 must have patterns for your microservice - techorama
 
Benchmarking and PHPBench
Benchmarking and PHPBenchBenchmarking and PHPBench
Benchmarking and PHPBench
 
QA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wantedQA Fest 2019. Антон Молдован. Load testing which you always wanted
QA Fest 2019. Антон Молдован. Load testing which you always wanted
 
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard WorldMonitoring Complex Systems: Keeping Your Head on Straight in a Hard World
Monitoring Complex Systems: Keeping Your Head on Straight in a Hard World
 
Code instrumentation
Code instrumentationCode instrumentation
Code instrumentation
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Kürzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Where the wild things are - Benchmarking and Micro-Optimisations

  • 3. Premature Optimization “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.Yet we should not pass up our opportunities in that critical 3%.“ - Donald Knuth
  • 4. ProfilingTools • ANTS Performance Profiler - Redgate • dotTrace & dotMemory - Jet Brains • PerfView - Microsoft (free) • Visual Studio Profiling Tools (Ultimate, Premium or Professional) • MiniProfiler - Stack Overflow (free)
  • 5.
  • 6.
  • 7.
  • 9. Why do you need a benchmarking library?
  • 10. static void Profile(int iterations, Action action) { action(); // warm up GC.Collect(); // clean up var watch = new Stopwatch(); watch.Start(); for (int i = 0; i < iterations; i++) { action(); } watch.Stop(); Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds); } Benchmarking small code samples in C#, can this implementation be improved? http://stackoverflow.com/q/1047218/4500
  • 11. private static T Result; static void Profile<T>(int iterations, Func<T> func) { func(); // warm up GC.Collect(); // clean up var watch = new Stopwatch(); watch.Start(); for (int i = 0; i < iterations; i++) { Result = func(); } watch.Stop(); Console.WriteLine("Time Elapsed {0} ms", watch.ElapsedMilliseconds); } Benchmarking small code samples in C#, can this implementation be improved? http://stackoverflow.com/q/1047218/4500
  • 12. BenchmarkDotNet project Andrey Akinshin (the ‘Boss’) @andrey_akinshin http://aakinshin.net/en/blog/ Matt Warren (me) Adam Sitnik (.NET Core guru) @SitnikAdam http://adamsitnik.com/
  • 14. Goals of BenchmarkDotNet Benchmarking library that is: •Accurate •Easy-to-use •Helpful
  • 15. Benchmarking library that is: •Accurate •Easy-to-use •Helpful Stopwatch under the hood http://aakinshin.net/en/blog/dotnet/stopwatch/ LegacyJIT-x86 and first method call http://aakinshin.net/en/blog/dotnet/legacyjitx86-and-first-method-call/ Goals of BenchmarkDotNet
  • 17. What BenchmarkDotNet doesn’t do •Multi-threaded benchmarks •Integrate with C.I builds •Unit test runner integration •Anything else? http://github.com/dotnet/BenchmarkDotNet/issues/
  • 18. “Other Benchmarking tools are available” • NBench • https://github.com/petabridge/NBench • Microsoft Xunit performance • http://github.com/Microsoft/xunit-performance/ • Lambda Micro Benchmarking (“Clash of the Lambdas”) • https://github.com/biboudis/LambdaMicrobenchmarking • Etimo.Benchmarks • http://etimo.se/blog/etimo-benchmarks-lightweight-net-benchmark-tool/ • MeasureIt • https://blogs.msdn.microsoft.com/vancem/2009/02/06/measureit-update-tool-for- doing-microbenchmarks-for-net/
  • 19. How it works An invocation of the target method is an operation. A bunch of operations is an iteration. Iteration types: • Pilot:The best operation count will be chosen. • IdleWarmup, IdleTarget: BenchmarkDotNet overhead will be evaluated. • MainWarmup:Warmup of the main method. • MainTarget: Main measurements. • Result = MainTarget – AverageOverhead http://benchmarkdotnet.org/HowItWorks.htm
  • 20. What happens under the covers? Image credit Albert Rodríguez @UncleFirefox
  • 22. Scale of benchmarks
    • millisecond (ms): one thousandth of one second, e.g. a single webapp request
    • microsecond (us or µs): one millionth of one second, e.g. several in-memory operations
    • nanosecond (ns): one billionth of one second, e.g. single operations
  • 23. Who ‘times’ the timers?
        [Benchmark]
        public long StopwatchLatency()
        {
            return Stopwatch.GetTimestamp();
        }

        [Benchmark]
        public long StopwatchGranularity()
        {
            // Loop until Stopwatch.GetTimestamp()
            // gives us a different value
            long lastTimestamp = Stopwatch.GetTimestamp();
            while (Stopwatch.GetTimestamp() == lastTimestamp) { }
            return lastTimestamp;
        }

        [Benchmark]
        public long DateTimeLatency()
        {
            return DateTime.Now.Ticks;
        }

        [Benchmark]
        public long DateTimeGranularity()
        {
            // Loop until DateTime.Now
            // gives us a different value
            long lastTimestamp = DateTime.Now.Ticks;
            while (DateTime.Now.Ticks == lastTimestamp) { }
            return lastTimestamp;
        }
  • 25. Who ‘times’ the timers?
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      Job-FIDMNL : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

                   Method |            Mean |      StdDev | Allocated |
    --------------------- |---------------- |------------ |---------- |
         StopwatchLatency |      12.9960 ns |   0.1609 ns |       0 B |
     StopwatchGranularity |     374.3049 ns |   2.4388 ns |       0 B |
          DateTimeLatency |     682.2320 ns |   8.9341 ns |      32 B |
      DateTimeGranularity | 996,025.6492 ns | 413.9175 ns |  47.34 kB |
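The Resolution value in the header follows from the reported Frequency: one tick lasts 10^9 / 2630673 ≈ 380.13 ns, which sits close to the measured StopwatchGranularity of about 374 ns. A minimal sketch of that arithmetic, where the `TimerResolution` class and `ResolutionNs` method are names made up for this example:

```csharp
using System;
using System.Diagnostics;

public static class TimerResolution
{
    // Length of one timer tick in nanoseconds for a given tick frequency.
    public static double ResolutionNs(long frequencyHz)
    {
        return 1e9 / frequencyHz;
    }

    public static void Main()
    {
        // Frequency taken from the BenchmarkDotNet header on the slide.
        Console.WriteLine(ResolutionNs(2630673));

        // The same calculation for whatever machine this runs on.
        Console.WriteLine(ResolutionNs(Stopwatch.Frequency));
        Console.WriteLine(Stopwatch.IsHighResolution);
    }
}
```

The DateTimeGranularity figure of roughly 996,000 ns is about one millisecond, so on this machine `DateTime.Now` changed value only about once per millisecond even though a single call returned in under a microsecond.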
  • 26. Loop-the-Loop “Avoid foreach loop on everything except raw arrays?”
        [Benchmark(Baseline = true)]
        public int ForLoopArray()
        {
            var counter = 0;
            for (int i = 0; i < anArray.Length; i++)
                counter += anArray[i];
            return counter;
        }

        [Benchmark]
        public int ForEachArray()
        {
            var counter = 0;
            foreach (var i in anArray)
                counter += i;
            return counter;
        }

        [Benchmark]
        public int ForLoopList()
        {
            var counter = 0;
            for (int i = 0; i < aList.Count; i++)
                counter += aList[i];
            return counter;
        }

        [Benchmark]
        public int ForEachList()
        {
            var counter = 0;
            foreach (var i in aList)
                counter += i;
            return counter;
        }
  • 28. Loop-the-Loop “Avoid foreach loop on everything except raw arrays?”
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

           Method |          Mean |     StdDev | Scaled | Scaled-StdDev |
    ------------- |-------------- |----------- |------- |-------------- |
     ForLoopArray |   383.8279 ns |  2.9472 ns |   1.00 |          0.00 |
     ForEachArray |   392.5611 ns |  4.1286 ns |   1.02 |          0.01 |
      ForLoopList | 2,315.9658 ns | 12.1001 ns |   6.03 |          0.05 |
      ForEachList | 2,663.5771 ns | 21.9822 ns |   6.94 |          0.08 |
  • 29. Loop-the-Loop – ‘for loop’ - Arrays
  • 30. Loop-the-Loop – ‘for loop’ - Lists
  • 31. Abstractions - IDictionary v Dictionary
        Dictionary<string, string> dictionary = new Dictionary<string, string>();
        IDictionary<string, string> iDictionary = (IDictionary<string, string>)dictionary;

        [Benchmark]
        public Dictionary<string, string> DictionaryEnumeration()
        {
            foreach (var item in dictionary)
            {
                ;
            }
            return dictionary;
        }

        [Benchmark]
        public IDictionary<string, string> IDictionaryEnumeration()
        {
            foreach (var item in iDictionary)
            {
                ;
            }
            return iDictionary;
        }
  • 33. Abstractions - IDictionary v Dictionary
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]     : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      DefaultJob : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0

                     Method |       Mean |    StdErr |    StdDev |  Gen 0 | Allocated |
    ----------------------- |----------- |---------- |---------- |------- |---------- |
      DictionaryEnumeration | 24.0353 ns | 0.2403 ns | 0.9307 ns |      - |       0 B |
     IDictionaryEnumeration | 41.6301 ns | 0.4479 ns | 2.1944 ns | 0.0086 |      32 B |

    // * Diagnostic Output - MemoryDiagnoser *
    Note: the Gen 0/1/2 Measurements are per 1k Operations
  • 34. Abstractions - IDictionary v Dictionary
        Dictionary<string, string> dictionary = new Dictionary<string, string>();
        IDictionary<string, string> iDictionary = (IDictionary<string, string>)dictionary;

        // struct - so doesn't allocate
        Dictionary<string, string>.Enumerator enumerator = dictionary.GetEnumerator();

        // interface - allocates 56 B (64-bit) and 32 B (32-bit)
        IEnumerator<KeyValuePair<string, string>> interfaceEnumerator = iDictionary.GetEnumerator();
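The difference can be seen from the types alone. The sketch below assumes a non-empty dictionary (some newer runtimes may special-case empty collections, so the slide's empty-dictionary numbers need not carry over); the `EnumeratorSketch` class and its method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorSketch
{
    private static readonly Dictionary<string, string> dictionary =
        new Dictionary<string, string> { { "key", "value" } };
    private static readonly IDictionary<string, string> iDictionary = dictionary;

    // Dictionary<TKey, TValue>.Enumerator is a struct, so the concrete
    // GetEnumerator() call can hand it back without a heap allocation.
    public static bool DirectEnumeratorIsStruct()
    {
        return typeof(Dictionary<string, string>.Enumerator).IsValueType;
    }

    // Through the interface the enumerator is only visible as an
    // IEnumerator<T> reference, so a struct enumerator has to live on the heap.
    public static int CountViaInterface()
    {
        int count = 0;
        using (IEnumerator<KeyValuePair<string, string>> e = iDictionary.GetEnumerator())
        {
            while (e.MoveNext()) count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(DirectEnumeratorIsStruct());
        Console.WriteLine(CountViaInterface());
        // Prints the runtime type hiding behind the interface reference.
        Console.WriteLine(iDictionary.GetEnumerator().GetType());
    }
}
```

`foreach` over a variable typed as `Dictionary<,>` binds to the struct-returning `GetEnumerator()`, while the same loop over an `IDictionary<,>` variable goes through the interface method, which matches the 32 B allocation reported for the 32-bit run.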
  • 35. Low-level increments
        [LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
        public class Program
        {
            private double a, b, c, d;

            [Benchmark(OperationsPerInvoke = 4)]
            public void MethodA()
            {
                a++; b++; c++; d++;
            }

            [Benchmark(OperationsPerInvoke = 4)]
            public void MethodB()
            {
                a++; a++; a++; a++;
            }
        }
  • 37. Low-level increments
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
      LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
    Runtime=Clr  Allocated=0 B

         Method |          Job |       Jit | Platform |      Mean |    StdErr |    StdDev |
    ----------- |------------- |---------- |--------- |---------- |---------- |---------- |
       Parallel | LegacyJitX64 | LegacyJit |      X64 | 0.3420 ns | 0.0015 ns | 0.0057 ns |
     Sequential | LegacyJitX64 | LegacyJit |      X64 | 2.2038 ns | 0.0014 ns | 0.0051 ns |
       Parallel | LegacyJitX86 | LegacyJit |      X86 | 0.3276 ns | 0.0005 ns | 0.0020 ns |
     Sequential | LegacyJitX86 | LegacyJit |      X86 | 2.5229 ns | 0.0048 ns | 0.0187 ns |
       Parallel |    RyuJitX64 |    RyuJit |      X64 | 0.3686 ns | 0.0037 ns | 0.0144 ns |
     Sequential |    RyuJitX64 |    RyuJit |      X64 | 0.8959 ns | 0.0023 ns | 0.0090 ns |

    MethodA = Parallel, MethodB = Sequential
    http://en.wikipedia.org/wiki/Instruction-level_parallelism
  • 38. Search - Linear v Binary
        private static int LinearSearch(Data[] set, int key)
        {
            for (int i = 0; i < set.Length; i++)
            {
                var c = set[i].Key - key;
                if (c == 0)
                {
                    return i;
                }
                if (c > 0)
                {
                    return ~i;
                }
            }
            return ~set.Length;
        }

        private static int BinarySearch(Data[] set, int key)
        {
            int i = 0;
            int up = set.Length - 1;
            while (i <= up)
            {
                int mid = (up - i) / 2 + i;
                int c = set[mid].Key - key;
                if (c == 0)
                {
                    return mid;
                }
                if (c < 0)
                    i = mid + 1;
                else
                    up = mid - 1;
            }
            return ~i;
        }
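Both methods expect the array to be sorted by `Key` and, when the key is missing, return the bitwise complement of the index where it would be inserted (the same convention `Array.BinarySearch` uses). A small usage sketch follows; the `Data` struct here is a stand-in with a single `Key` field, since the slides do not show its definition, and `SearchSketch` and `MakeSet` are names invented for the example.

```csharp
using System;

// Hypothetical stand-in for the Data type used on the slide.
public struct Data
{
    public int Key;
    public Data(int key) { Key = key; }
}

public static class SearchSketch
{
    public static int LinearSearch(Data[] set, int key)
    {
        for (int i = 0; i < set.Length; i++)
        {
            var c = set[i].Key - key;
            if (c == 0) return i;   // found
            if (c > 0) return ~i;   // passed the slot where key would be
        }
        return ~set.Length;         // key is larger than every element
    }

    public static int BinarySearch(Data[] set, int key)
    {
        int i = 0;
        int up = set.Length - 1;
        while (i <= up)
        {
            int mid = (up - i) / 2 + i;
            int c = set[mid].Key - key;
            if (c == 0) return mid;
            if (c < 0) i = mid + 1; else up = mid - 1;
        }
        return ~i;
    }

    public static Data[] MakeSet(params int[] sortedKeys)
    {
        return Array.ConvertAll(sortedKeys, k => new Data(k));
    }

    public static void Main()
    {
        Data[] set = MakeSet(1, 3, 5, 7, 9);
        Console.WriteLine(LinearSearch(set, 5));   // 2: key found at index 2
        Console.WriteLine(BinarySearch(set, 5));   // 2
        Console.WriteLine(BinarySearch(set, 4));   // -3: not found
        Console.WriteLine(~BinarySearch(set, 4));  // 2: insertion index
    }
}
```

Because both searches agree on every input, a benchmark over them compares only the cost of the search strategy at each array size.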
  • 39. Search - Linear v Binary
        private readonly Data[][] dataSet;
        private Data[] currentSet;
        private int currentMid;
        private int currentMax;

        [Params(1, 2, 3, 4, 5, 7, 10, 12, 15)]
        public int Size
        {
            set
            {
                currentSet = dataSet[value];
                currentMax = value - 1;
                currentMid = value / 2;
            }
        }
  • 42. readonly fields
        public struct Int256
        {
            private readonly long bits0, bits1, bits2, bits3;

            public Int256(long bits0, long bits1, long bits2, long bits3)
            {
                this.bits0 = bits0;
                this.bits1 = bits1;
                this.bits2 = bits2;
                this.bits3 = bits3;
            }

            public long Bits0 { get { return bits0; } }
            public long Bits1 { get { return bits1; } }
            public long Bits2 { get { return bits2; } }
            public long Bits3 { get { return bits3; } }
        }

        [LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
        public class Program
        {
            // The two fields belong to the benchmark class, not the namespace.
            private readonly Int256 readOnlyField = new Int256(1L, 5L, 10L, 100L);
            private Int256 field = new Int256(1L, 5L, 10L, 100L);

            [Benchmark]
            public long GetValue()
            {
                return field.Bits0 + field.Bits1 + field.Bits2 + field.Bits3;
            }

            [Benchmark]
            public long GetReadOnlyValue()
            {
                return readOnlyField.Bits0 + readOnlyField.Bits1 + readOnlyField.Bits2 + readOnlyField.Bits3;
            }
        }
  • 44. readonly fields
    BenchmarkDotNet=v0.10.1, OS=Microsoft Windows NT 6.1.7601 Service Pack 1
    Processor=Intel(R) Core(TM) i7-4800MQ CPU 2.70GHz, ProcessorCount=8
    Frequency=2630673 Hz, Resolution=380.1309 ns, Timer=TSC
      [Host]       : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      LegacyJitX64 : Clr 4.0.30319.42000, 64bit LegacyJIT/clrjit-v4.6.1590.0;compatjit-v4.6.1590.0
      LegacyJitX86 : Clr 4.0.30319.42000, 32bit LegacyJIT-v4.6.1590.0
      RyuJitX64    : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1590.0
    Runtime=Clr  Allocated=0 B

               Method |          Job |       Jit | Platform |      Mean |    StdErr |    StdDev |
    ----------------- |------------- |---------- |--------- |---------- |---------- |---------- |
             GetValue | LegacyJitX64 | LegacyJit |      X64 | 0.7893 ns | 0.0078 ns | 0.0291 ns |
     GetReadOnlyValue | LegacyJitX64 | LegacyJit |      X64 | 9.5362 ns | 0.0251 ns | 0.0971 ns |
             GetValue | LegacyJitX86 | LegacyJit |      X86 | 1.4625 ns | 0.0506 ns | 0.1959 ns |
     GetReadOnlyValue | LegacyJitX86 | LegacyJit |      X86 | 1.9743 ns | 0.0641 ns | 0.2481 ns |
             GetValue |    RyuJitX64 |    RyuJit |      X64 | 0.3852 ns | 0.0183 ns | 0.0710 ns |
     GetReadOnlyValue |    RyuJitX64 |    RyuJit |      X64 | 9.6406 ns | 0.0803 ns | 0.3109 ns |

    https://codeblog.jonskeet.uk/2014/07/16/micro-optimization-the-surprising-inefficiency-of-readonly-fields/
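A likely explanation, and the one the linked post discusses, is that C# copies a struct stored in a readonly field before calling a member on it, so that the member cannot modify the field. The copy itself is easy to observe with a deliberately mutable struct. This is a sketch of the language behaviour only, not a benchmark; `Counter` and `ReadonlyCopySketch` are names invented for the example.

```csharp
using System;

// A mutable struct, used only to make the hidden copy visible.
public struct Counter
{
    private int count;
    public int Increment() { count++; return count; }
}

public class ReadonlyCopySketch
{
    private readonly Counter readOnlyCounter = new Counter();
    private Counter counter = new Counter();

    public int IncrementReadOnlyTwice()
    {
        readOnlyCounter.Increment();        // runs on a temporary copy of the field
        return readOnlyCounter.Increment(); // a fresh copy again, so this returns 1
    }

    public int IncrementTwice()
    {
        counter.Increment();                // mutates the field in place
        return counter.Increment();         // returns 2
    }

    public static void Main()
    {
        var sketch = new ReadonlyCopySketch();
        Console.WriteLine(sketch.IncrementReadOnlyTwice()); // 1
        Console.WriteLine(sketch.IncrementTwice());         // 2
    }
}
```

For a 32-byte struct such as `Int256`, one copy per property access would mean four copies per call to `GetReadOnlyValue`, which is consistent with the gap seen on the 64-bit JITs above.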
  • 45. MOAR Benchmarks!!
    • Analysing Optimisations in the Wire Serialiser: http://mattwarren.org/2016/08/23/Analysing-Optimisations-in-the-Wire-Serialiser/
    • Optimising LINQ: http://mattwarren.org/2016/09/29/Optimising-LINQ/
    • Why is reflection slow?: http://mattwarren.org/2016/12/14/Why-is-Reflection-slow/
    • Why Exceptions should be Exceptional: http://mattwarren.org/2016/12/20/Why-Exceptions-should-be-Exceptional/

Editor's notes

  1. This is what we aim for. This is why we wanted to build BenchmarkDotNet.
  2. The aim is to make a framework that can accurately measure milli, micro and nano benchmarks. In reality the main use cases are probably milli/micro benchmarks, so these must work above all else. Even lower down the scale are picoseconds (1…1000 ps): one trillionth of one second, the scale of pipelining.
  3. Avoid foreach loop on everything except raw arrays