This document discusses parallel computing technologies in Windows Server 2008 R2. It begins by explaining why parallel computing is needed due to limitations of increasing CPU speed alone. It then outlines the Windows Server 2008 R2 parallel computing platform, which uses multiple cores and supports 64-bit processing and virtualization. It also discusses programming models, tools, and techniques for developing parallel applications in Visual Studio 2010, including task parallelism, data parallelism, and debugging and profiling parallel apps.
1. Windows® Server 2008 R2: Develop With New Parallel Computing Technologies Clint Edmonson Architect Evangelist clinted@microsoft.com
3. Moore’s Law: Our Old Friend Gordon Moore From http://www.intel.com/technology/mooreslaw/ “The number of transistors on a chip will double about every 2 years” Typically manifested as clock-speed increases We mistakenly associate Moore’s Law with CPU speed
4. Why Can’t We Continue to Scale Up? Heat! Increased clock speed == increased power usage Increased power usage == increased heat output Image courtesy of http://www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html
5. Solution: Scale Out. Multi-core / Many-core Multi-core Two or more independent cores (or CPUs) Homogeneous architecture May be integrated onto a single integrated circuit die May be integrated onto multiple dies in a single chip package Many-core Number of processors exceeds traditional multi-processor techniques Heterogeneous architecture All cores share resources and services like memory and disk access
7. How to Program for Parallel Computing: The “Many-core Shift” Platform must manage resources effectively Programs must be written differently More than just managing threads Applications must scale up or down
10. Windows Server 2008 R2: Platform Hardware Trends Power Efficiency Multicore & NUMA 64-bit Virtualization
11. Helpful Terminology Logical processor (a.k.a. thread execution engine) Core A processing unit With hyper-threading, it can consist of 2 or 4 logical processors Socket (a.k.a. processor, package, CPU) Physical processor Consists of one or more cores NUMA node Set of logical processors and cache that are close to one another Group 1 or more NUMA nodes Set of up to 64 processors
12. Windows Organizes Many-Cores via Group: New with Windows 7 and Windows Server 2008 R2 [Diagram: a Group contains NUMA Nodes; each NUMA Node contains Sockets; each Socket contains Cores; each Core contains LPs. NUMA = Non-Uniform Memory Access, LP = Logical Processor]
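As a rough illustration of the 64-processor group limit from the terminology slide, the arithmetic can be sketched in standard C++. The helper name min_processor_groups and the constant are hypothetical, and the result is only a lower bound: groups are assembled from whole NUMA nodes, so a real system may report more groups than this.

```cpp
#include <cassert>
#include <cstddef>

// Assumption from the terminology slide: a group holds at most 64 logical processors.
constexpr std::size_t kMaxLogicalProcessorsPerGroup = 64;

// Hypothetical helper: the smallest number of groups that could hold
// `logical_processors`. This is a ceiling division, not a Windows API call.
inline std::size_t min_processor_groups(
    std::size_t logical_processors,
    std::size_t group_capacity = kMaxLogicalProcessorsPerGroup) {
    assert(group_capacity > 0 && "a group must hold at least one logical processor");
    return (logical_processors + group_capacity - 1) / group_capacity;
}
```

Under these assumptions, a 256-processor system needs at least four groups, while any system with 64 or fewer logical processors fits in one.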
18. 3 Ways to Express Parallelism in Your Apps Imperative Task Parallelism (fine grained) Imperative Data Parallelism (structured) Declarative Data Parallelism (PLINQ)
19. Imperative Task Parallelism Fine-grained parallelism Express potential parallelism via expressions and statements that take the form of lightweight tasks You have fine-grained control Semantics are similar to how threads and the threadpool work today
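A minimal sketch of the task idea, assuming standard C++ std::async as a portable stand-in for the task libraries this deck refers to. The names Stats, sum_values, count_even, and analyze are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical result type for two independent pieces of work.
struct Stats {
    long long total;
    long long evens;
};

// Hypothetical workload 1: sum of all values.
inline long long sum_values(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Hypothetical workload 2: number of even values.
inline long long count_even(const std::vector<int>& v) {
    return static_cast<long long>(
        std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; }));
}

// Each std::async call expresses one unit of potential parallelism.
// Both tasks only read `values`, so they can run concurrently without locks.
inline Stats analyze(const std::vector<int>& values) {
    auto total = std::async(std::launch::async, [&values] { return sum_values(values); });
    auto evens = std::async(std::launch::async, [&values] { return count_even(values); });
    // get() blocks until the corresponding task has finished.
    return Stats{total.get(), evens.get()};
}
```

The caller describes what can run independently and waits on the futures; deciding how the tasks map onto threads is left to the runtime, which is the main difference from managing threads by hand.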
20. Imperative Data Parallelism Structured parallelism Mechanisms used to express common imperative data-oriented operations For loops For each loop Invoke Think in terms of blocks of code Parallelize loops
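To show what "parallelize loops" can look like, here is a sketch of a parallel for loop built on std::thread. It is an assumption-laden stand-in, not the library implementation the slides describe: parallel_for here is a hypothetical helper that splits the index range into contiguous chunks and runs one thread per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls body(i) for every i in [begin, end).
// `body` must be safe to call concurrently for different indices.
template <typename Body>
void parallel_for(std::size_t begin, std::size_t end, Body body,
                  unsigned workers = std::thread::hardware_concurrency()) {
    assert(begin <= end);
    const std::size_t count = end - begin;
    if (count == 0) return;

    // hardware_concurrency() may return 0 when the value is unknown.
    const std::size_t n = std::min<std::size_t>(std::max(1u, workers), count);
    const std::size_t chunk = (count + n - 1) / n;  // ceiling division

    std::vector<std::thread> threads;
    threads.reserve(n);
    for (std::size_t start = begin; start < end; start += chunk) {
        const std::size_t stop = std::min(end, start + chunk);
        // Each thread handles one contiguous block of iterations.
        threads.emplace_back([start, stop, &body] {
            for (std::size_t i = start; i < stop; ++i) body(i);
        });
    }
    for (auto& t : threads) t.join();  // wait for every block to finish
}
```

A call such as parallel_for(0, data.size(), [&](std::size_t i) { data[i] *= 2; }) keeps the shape of the original loop. The body stays a block of code per index, and only the iteration strategy changes.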
21. Declarative Data Parallelism PLINQ Implementation of LINQ-to-objects that executes queries in parallel Express what you want to accomplish, rather than how you want to accomplish it Minimal impact to existing queries
22. Visual Studio 2010 (C++ Developer): Tools, Programming Models, Runtimes [Diagram: Tools: Parallel Debugger Tool Windows, Parallel Profiler Analysis. Programming Models: Parallel Pattern Library, Agents Library, Data Structures. C++ Concurrency Runtime: Task Scheduler, Resource Manager. Operating System: Threads, UMS Threads (Win7/R2). Legend: Native Library, Tools]
26. Parallel Performance Analyzer: Core Utilization View Profiling tools support multi-core/parallel execution Identify parallelism opportunities Enable performance tuning for parallel apps Improving the productivity of parallel development and performance tuning Integrated with the IDE Providing better visualizations Showing temporal relationships Illustrating interactions with OS, libraries and I/O Exposing causes of inefficiency Providing actionable data by linking behavior to source code whenever possible Analysis Views Core utilization and concurrency Thread blocking Cross-core thread migration Platforms Windows Vista®, Windows Server 2008, and Windows 7 32 and 64-bit Native and managed environments Core Utilization / Concurrency View Other processes Number of cores Idle time Your process
27. Parallel Performance Analyzer: Thread Blocking View Measure time for interesting segments Hide uninteresting threads Zoom in and out Detailed thread analysis (one channel per thread) Legend Thread execution breakdown
28. Parallel Performance Analyzer: Core Execution / Thread Migration View Each core in a swim lane One color per thread This thread migrates across all four cores Red indicates cross-core migrations
29. Summary: Call-to-Action Start learning to think in parallel Consider how your solution will scale on multi-core systems Utilize the parallel programming platform and tools to maximize your application scalability
Multi-core and many-core systems: In general, a “multi-core” chip refers to eight or fewer homogeneous cores in one microprocessor package, whereas a “many-core” chip has more than eight, possibly heterogeneous, cores in one microprocessor package. In a many-core system, all cores share the resources and services, including memory and disk access, provided by the operating system. Microsoft and industry partners anticipate the advent of affordable general-purpose “many-core” systems in a few years.
Herb Sutter, Microsoft Software Architect and C++ Standards Committee Chairman, made this statement in a recent Dr. Dobb’s Journal article (http://www.ddj.com/architect/208200273). What Herb meant is that developers can no longer count on processor speed improvements (the gains commonly attributed to Moore’s Law) to speed up their serial applications. Developers must now deal with parallel computing in order to achieve application scaling. This talk is about how Windows Server 2008 R2 and complementary tools within Visual Studio 2010 dramatically improve the situation for developers new to parallel computing concepts.
Many-core systems hold the promise of delivering 10 to 100 times the processing power in the next few years. Developing applications that harness the full power of many-core systems is difficult and requires software developers to transition from writing serial programs to writing parallel programs. Applications must scale up or down according to the capabilities of the system and must adapt to changing resources and power availability.
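One small piece of "scaling up or down" is sizing the work to the machine rather than hard-coding a thread count. The sketch below assumes standard C++ only; choose_worker_count is a hypothetical helper, and it does not cover adapting to resources or power availability that change while the program runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: how many workers to start for `work_items`
// independent items on a machine with `available` logical processors.
inline std::size_t choose_worker_count(
    std::size_t work_items,
    std::size_t available = std::thread::hardware_concurrency()) {
    if (work_items == 0) return 0;  // nothing to do, start no workers
    // hardware_concurrency() may return 0 when the value is unknown,
    // so fall back to a single worker in that case.
    const std::size_t processors = std::max<std::size_t>(1, available);
    // Never start more workers than there are items to process.
    return std::min(processors, work_items);
}
```

The same binary then uses four workers on a four-processor machine and far more on a 256-processor server, without any code change.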
Let’s start with the Platform.
A primary platform technology enhancement is scalability. R2 is the first Windows release to run on more than 64 processors. No previous Windows operating system could use more than 64 processors at once. Here you see a snapshot of Task Manager from a system with 256 processors. This is the biggest system we could find to test with. But systems of this size are going to be commodity in the near future. R2 is the platform of choice when that hardware trend comes to fruition. What this means for developers is that you’ll have a great parallel computing platform upon which to build new and powerful solutions. Our second session today addresses this topic in much more detail. And yes, you can tell the difference in performance, even on systems with far fewer resources. R2 is very fast.
The primary motivating factor for general-purpose parallel computing is that commodity computing hardware has changed. Multi-core systems are now available at commodity prices, and the trend is that even more “many-core” systems will be available in the near future. The operating system is now tasked with much greater system resource management requirements. And programmers are now tasked with making the most of multi-core systems.