1. Highly Scalable Java Programming for Multi-Core Systems. Zhi Gan (ganzhi@gmail.com), IBM China Development Lab, Next Generation Systems
3. Continuing evolution of multicore: varying trade-offs between thread speed and throughput, and varying assumptions about memory footprint and working sets.

| | Nehalem EX | POWER 7 | UltraSPARC T2 |
|---|---|---|---|
| Max cores per chip | 8 | 8 | 8 |
| Max threads per core | 2 | 4 | 8 |
| Last-level on-chip cache | 24 MB | 32 MB | 4 MB |
| Memory controllers per chip | 2 | 2 | 4 |
| Max chips per system | 8 | 32 | 4 |
| Max system size (threads) | 128 | 1,024 | 256 |
5. NUMA is the new normal. Highest affinity between threads on a core; next highest affinity between cores on a chip; then affinity between a chip and its locally attached DRAM. Example: IBM Power 750 (POWER 7), 32 cores, 128 threads. Note: memory systems on all major platforms have a similar hierarchical structure. [Diagram: per-core L1 and L2 caches and execution units, per-chip L3 caches, and locally attached DIMMs]
6. Balancing I/O and Server Capacity. Ultra-dense DRAM (MAX5): very high speed random r/w, highest cost and power, limited capacity. Enterprise NAND Flash: high speed random reads, lowest cost per IOPS, high capacity (TBs). Parallel disk array: high speed sequential r/w, lowest cost per GB, virtually unlimited capacity (PBs).
31. Why Lock-Free Often Means Better Scalability (I). Lock: all threads wait for one. Lock-free: no waiting, but only one thread can succeed; the others must retry.
32. Why Lock-Free Often Means Better Scalability (II). Lock: all threads wait for one. Lock-free: no waiting, but only one thread can succeed; the others often need to retry.
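The contrast on these two slides can be sketched in Java. This is an illustrative example, not code from the talk: a lock-based counter makes every other thread block on the lock, while a lock-free counter built on compare-and-swap (CAS) never blocks, but a thread that loses the race must retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch (not from the slides): lock-based vs. lock-free counter.
public class CounterDemo {

    static class LockedCounter {
        private int value;
        // Only one thread at a time may enter; the rest block on the lock.
        public synchronized int increment() { return ++value; }
    }

    static class LockFreeCounter {
        private final AtomicInteger value = new AtomicInteger();

        // Explicit CAS retry loop: no thread ever blocks, but a thread
        // that loses the race must retry with the freshly observed value.
        public int increment() {
            for (;;) {
                int current = value.get();
                int next = current + 1;
                if (value.compareAndSet(current, next)) {
                    return next;
                }
                // CAS failed: another thread updated the value first; retry.
            }
        }

        public int get() { return value.get(); }
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreeCounter counter = new LockFreeCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) counter.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // prints 40000: no update is lost
    }
}
```

In practice `AtomicInteger.incrementAndGet()` performs this same CAS loop internally; the explicit loop is spelled out here only to make the "only one succeeds, others retry" behavior visible.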
33. Performance of a Lock-Free Stack. Picture from: http://www.infoq.com/articles/scalable-java-components
34. Performance of a Lock-Free HashMap. Picture from: "A Fast Lock-Free Hash Table" by Cliff Click
What if all of the preceding best practices cannot meet your needs, and you would like to optimize your application manually?
msdk – a tool for detailed performance analysis of concurrent Java applications. It performs an in-depth analysis of the complete execution stack, from the hardware up to the application layer, gathering information from all four layers of the stack: hardware, operating system, JVM, and application.
For multi-threaded applications, the lock-free approach differs from the lock-based approach in several respects. When accessing a shared resource, the lock-based approach allows only one thread into the critical section while the others wait for it. The lock-free approach, on the contrary, allows every thread to attempt to modify the shared state; only one of them can succeed, and all the other threads detect that their action failed, so they retry or choose another action.
The real difference appears when something bad happens to the running thread. If a running thread is paused by the OS scheduler, the two approaches behave differently:
Lock-based approach: all other threads wait for the paused thread, and no one can make progress.
Lock-free approach: the other threads remain free to perform any operation; the paused thread may simply fail its current operation.
From this difference we can see that in a multi-core environment the lock-free approach has the advantage: it scales better because threads do not wait for each other. It does waste some CPU cycles under contention, but for most workloads this is not a problem, since we have more than enough CPU resources.
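The paused-thread scenario above is easiest to see in a concrete data structure. Below is a sketch of the classic Treiber-style lock-free stack (an assumed example, not code from the talk): a thread paused between reading `top` and issuing the compare-and-set simply fails its CAS when it resumes and retries, while every other thread keeps making progress.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a Treiber-style lock-free stack (illustrative, not from the slides).
public class LockFreeStackDemo {

    static class LockFreeStack<T> {
        private static class Node<E> {
            final E item;
            Node<E> next;
            Node(E item) { this.item = item; }
        }

        private final AtomicReference<Node<T>> top = new AtomicReference<>();

        public void push(T item) {
            Node<T> newHead = new Node<>(item);
            Node<T> oldHead;
            do {
                oldHead = top.get();      // observe the current top
                newHead.next = oldHead;   // link the new node behind it
                // If another thread changed top in the meantime, the CAS
                // fails and we simply retry; no thread ever blocks.
            } while (!top.compareAndSet(oldHead, newHead));
        }

        public T pop() {
            Node<T> oldHead;
            Node<T> newHead;
            do {
                oldHead = top.get();
                if (oldHead == null) return null; // empty stack
                newHead = oldHead.next;
            } while (!top.compareAndSet(oldHead, newHead));
            return oldHead.item;
        }
    }

    public static void main(String[] args) {
        LockFreeStack<String> stack = new LockFreeStack<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop()); // prints "b" (LIFO order)
        System.out.println(stack.pop()); // prints "a"
        System.out.println(stack.pop()); // prints "null": stack is empty
    }
}
```

Note that no thread holds a lock at any point: a thread descheduled in the middle of `push` cannot block anyone else, which is exactly the property the paragraph above describes.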