Just as you can't defeat the laws of physics, there are natural laws that ultimately decide software performance. Even the latest technology beta is still bound by Newton's laws, and you can't change the speed of light, even in the cloud!
Natural Laws of Software Performance
1. Natural Laws of Software Performance The changing face of performance optimization
2. Who Am I? Kendall Miller One of the Founders of Gibraltar Software Small Independent Software Vendor Founded in 2008 Developers of VistaDB and Gibraltar Engineers, not Sales People Enterprise Systems Architect & Developer since 1995 BSE in Computer Engineering, University of Illinois Urbana-Champaign (UIUC) Twitter: @KendallMiller
3. Traditional Performance Optimization Run suspect use cases and find hotspots Very Linear Finds unexpected framework performance issues Final Polishing Step
5. Algorithms and Asymptotics Asymptotic (or "Big O") Notation Describes the growth rate of functions Answers the question… Does execution time of A grow faster or slower than B? The rules of asymptotic notation say A term of n^3 will tend to dominate a term of n^2 Therefore We can discount coefficients and lower-order terms And so f(n) = n^3 + 6n^2 + 2n + 3 Can be expressed as f(n) = O(n^3)
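The dominance claim above is easy to check numerically. A minimal sketch (in Python, since the slides show no code): as n grows, the ratio of f(n) to its leading term n^3 approaches 1, which is why the coefficients and lower-order terms can be discarded.

```python
# f(n) = n^3 + 6n^2 + 2n + 3 from the slide; the n^3 term dominates.
def f(n):
    return n**3 + 6 * n**2 + 2 * n + 3

# As n grows, f(n) / n^3 shrinks toward 1: lower-order terms stop mattering.
ratios = [f(n) / n**3 for n in (10, 100, 10_000)]
```

For n = 10 the lower-order terms still contribute over 60% extra; by n = 10,000 they contribute less than 0.1%.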
16. Before you Leap into Optimizing… Algorithms are your first step Cores are a constant multiplier; a better algorithm changes the growth rate itself Everything we talk about today disappears into the constant factors that Big-O notation discards Parallel processing on cores can get you a quick but modest boost, trading cost for speed Other tricks can get you more (and get more out of parallel)
17. Fork / Join Parallel Processing Split a problem into a number of independent problems Process each partition independently (potentially in parallel) Merge the results back together to get the final outcome (if necessary)
18. Fork / Join Examples Multicore Processors Server Farm Web Server Original HTTP servers literally forked a process for each request
19. Fork / Join in .NET System.Threading.ThreadPool Parallel.ForEach PLINQ Parallel.Invoke
21. Fork / Join Usage Tasks that can be broken into “large enough” chunks that are independent of each other Little shared state required to process Tasks with a low Join cost
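The fork/join shape described above can be sketched in a few lines. This is an illustrative Python version (the slides name .NET APIs such as Parallel.ForEach and PLINQ; the pattern is the same). Note that CPython threads don't run CPU-bound work in parallel because of the GIL; a process pool or a runtime like .NET would give a real speedup, but the split/process/merge structure is unchanged.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    # Process one independent partition; no shared state required.
    return sum(x * x for x in chunk)

def sum_of_squares(data, workers=4):
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]  # fork
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)         # process
    return sum(partials)                                            # join (cheap)
```

This fits the usage criteria on the slide: the chunks are "large enough", share no state, and the join (summing a handful of partial results) is nearly free.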
22. Pipelines Partition a task based on stages of processing instead of data for processing Each stage of the pipeline processes independently (and typically concurrently) Stages are typically connected by queues Producer (prev stage) & Consumer (next stage)
23. Pipeline Examples Order Entry & Order Processing Classic Microprocessor Design Break the instruction processing into stages and process one stage per clock cycle GPU Design Combines Fork/Join with Pipeline
24. Pipeline Examples in .NET Not the ASP.NET processing pipeline (it has no parallelism, multithreading, or queueing) Stream Processing Map Reduce BlockingCollection&lt;T&gt; Gibraltar Agent
25. Pipeline Usage Significant shared state between data elements prevents decoupling them Linear processing requirements within parts of the workflow
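A minimal sketch of the stage-per-thread, queue-connected pipeline described above, in Python (the slides would use BlockingCollection&lt;T&gt; in .NET; queue.Queue plays the same producer/consumer role, and the stage functions are illustrative):

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream

def stage(inbox, outbox, fn):
    # Each stage runs concurrently, consuming from the previous stage's
    # queue and producing into the next stage's queue.
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(fn(item))

def run_pipeline(items):
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    threading.Thread(target=stage, args=(q1, q2, lambda x: x * 2)).start()
    threading.Thread(target=stage, args=(q2, q3, lambda x: x + 1)).start()
    for item in items:
        q1.put(item)
    q1.put(SENTINEL)
    results = []
    while (item := q3.get()) is not SENTINEL:
        results.append(item)
    return results
```

Items flow through the stages in order, so the linear processing requirement is preserved even though the stages themselves overlap in time.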
26. Speculative Processing Isn’t there something you could be doing? Do the work now when you can, throw the results away if they aren’t useful
28. Speculative Processing Usage Shift work from a future, performance-critical operation to an earlier one Either always valid (never has to be rolled back) or easy to roll back
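A minimal sketch of the idea, with illustrative names (the slides give no code): start likely-needed work early on a spare thread, keep the result if the prediction holds, and throw it away if it doesn't.

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_lookup(key):
    # Stand-in for slow work (a remote call, a big computation, ...).
    return key.upper()

def handle_request(key, predicted_key, pool):
    # Speculate: start the likely-needed work before we know it's needed.
    speculative = pool.submit(expensive_lookup, predicted_key)
    # ... other request handling happens here ...
    if key == predicted_key:
        return speculative.result()   # speculation paid off
    speculative.cancel()              # wrong guess: discard the work
    return expensive_lookup(key)      # fall back to doing it on demand
```

This is the "easy to roll back" case: a discarded lookup result has no side effects, so abandoning it costs nothing but the wasted cycles.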
29. Latency – The Silent Killer The time for the first bit to get from here to there Typical LAN: 0.4ms
30. It’s the Law Speed of Light: 3×10^8 m/s About 0.240 seconds to geosynchronous orbit and back About 1 foot per nanosecond At 3 GHz, one clock period is 1/3 ns, in which light travels about 4 inches
31. London to New York: ~5,500 km ≈ 18 ms one way at the speed of light TCP Socket Establish (three one-way trips): ~54 ms
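The slide's numbers follow directly from the physics; a quick check (light in fiber is actually slower than c in vacuum, so these are optimistic lower bounds):

```python
C = 3.0e8                        # speed of light in vacuum, m/s
distance_m = 5_500_000           # London to New York, ~5,500 km

one_way_ms = distance_m / C * 1000
# TCP connect is a three-way handshake: SYN, SYN-ACK, ACK = 3 one-way trips.
handshake_ms = one_way_ms * 3
```

That's roughly 18 ms one way and ~54 ms just to open a socket, before a single byte of application data moves.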
33. Caching Save results of earlier work nearby where they are handy to use again later Cheat: Don’t make the call Cheat More: Apply in front of anything that’s time consuming
34. Why Caching? Apps ask a lot of repeating questions. Stateless applications even more so Answers don’t change often Authoritative information is expensive Loading the world is impractical
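The "repeating questions" point above is the whole case for caching, and a sketch makes it concrete. This illustrative Python version (names and values are made up for the example) memoizes an expensive lookup so the authoritative source is consulted once per key:

```python
import functools

calls = {"count": 0}

@functools.lru_cache(maxsize=1024)
def get_exchange_rate(currency):
    # Stand-in for an expensive authoritative call (database, remote API).
    calls["count"] += 1
    return {"USD": 1.0, "EUR": 0.92}[currency]   # illustrative values

for _ in range(1000):
    get_exchange_rate("USD")   # 999 of these never leave the process
```

The cheat is exactly as the slide says: don't make the call. The tradeoff, not shown here, is deciding when a cached answer is too stale to trust.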
35. Caching in Hardware Processor L1 Cache (typically same core) Processor L2 (shared by cores) Processor L3 (between proc & main RAM) Disk Controllers Disk Drives …
37. Go Asynchronous Delegate the latency to something that will notify you when it’s complete Do other useful stuff while waiting. Otherwise you’re just being efficient, not faster Maximize throughput by scheduling more work than can be done if there weren’t stalls
38. .NET Async Examples Standard Async IO Pattern .NET 4 Task<T> Combine with Queuing to maximize throughput even without parallelization
39. Visual Studio Async CTP async methods compile to run asynchronously await suspends the method until the async call completes before proceeding, without blocking the thread
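The slides describe the .NET async/await CTP; the same suspend-and-resume model exists in Python's asyncio, so a hedged sketch there shows the payoff: awaited calls overlap instead of running back to back.

```python
import asyncio
import time

async def fetch(name, delay):
    await asyncio.sleep(delay)   # stand-in for a latency-bound network call
    return name

async def main():
    start = time.monotonic()
    # Three 0.1 s "calls" run concurrently: await suspends each coroutine
    # without blocking the thread, so total time is ~0.1 s, not ~0.3 s.
    results = await asyncio.gather(
        fetch("a", 0.1), fetch("b", 0.1), fetch("c", 0.1)
    )
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
```

This is the "do other useful stuff while waiting" point from the earlier slide: the latency is still there, but it is paid once instead of three times.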
40. Batching Get your money’s worth out of every latency hit Tradeoff storage for duration
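A minimal sketch of the batching tradeoff, with illustrative names: buffer items in memory so that each latency hit carries many items instead of one.

```python
round_trips = {"count": 0}

def send_batch(items):
    # Stand-in for one network round trip; its cost is mostly latency,
    # so it's nearly the same price for 1 item or 100.
    round_trips["count"] += 1
    return len(items)

def send_all(items, batch_size=100):
    # Trade storage (the buffered batch) for duration (fewer round trips).
    sent = 0
    for i in range(0, len(items), batch_size):
        sent += send_batch(items[i:i + batch_size])
    return sent

sent = send_all(list(range(1000)))
```

Sending 1,000 items costs 10 round trips instead of 1,000, at the price of holding up to 100 items in memory at a time.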
43. Optimistic Messaging Assume it’s all going to work out and just keep sending Be ready to step back & go another way when it doesn’t work out
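The optimistic pattern above can be sketched with a toy transport (all names here are illustrative, not a real protocol): assume every send succeeds and keep going, then step back once to fill any gaps the receiver reports.

```python
class LossyNetwork:
    """Toy transport: the first attempt at every third message is 'lost'."""

    def __init__(self):
        self.delivered = {}
        self.attempted = set()

    def send(self, seq, msg):
        if seq % 3 == 2 and seq not in self.attempted:
            self.attempted.add(seq)      # simulated loss on first attempt
            return
        self.attempted.add(seq)
        self.delivered[seq] = msg

    def missing_sequences(self, expected):
        return [s for s in range(expected) if s not in self.delivered]

def send_optimistically(messages, net):
    for seq, msg in enumerate(messages):
        net.send(seq, msg)               # assume success; don't wait for acks
    # One reconciliation pass: step back and retry only what didn't arrive.
    for seq in net.missing_sequences(len(messages)):
        net.send(seq, messages[seq])
```

In the common case (nothing lost) this costs one round trip of reconciliation instead of one acknowledgement wait per message.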
44. Side Points Stateful interaction generally increases the cost of latency Minimize Copying It takes blocking time to copy data, introducing latency Your Mileage May Vary Latency on a LAN can be dramatically affected by hardware and configuration
45. Critical Lessons Learned Algorithms, Algorithms, Algorithms Plan for Latency & Failure Explicitly Design for Parallelism