Java at Scale: Performance & GC
Presented to Dallas JUG
October 2013
Matt Schuetze
Product Manager
Where is Java Working?
• On the server
─ Enterprise applications: business rules
─ Monolithic & distributed computing

• On the client
─ Fat client computing
─ Thin client, browser-based

• Embedded
─ Android apps

© 2013 Azul Systems

What is Java’s Appeal?
• Portable
─ Write once, run anywhere (after testing everywhere)

• Productive
─ No bad features: no multiple inheritance, operator overloading
─ Do the Right Thing philosophy (vs. C++ Do the Efficient Thing)
─ Memory management reduces opportunities for error

• Efficient
─ Interpreter → JIT compilation → Dynamic recompilation

• Generic
─ Scala, Clojure, JRuby & more use Java runtime
─ Byte code is the new target architecture (ANDF)

• Scalable
─ Small to large platforms

Parkinson’s Law Applied to Software
• Hardware grows with Moore’s Law
─ Transistor counts double roughly every 18 months
─ Memory size grows around 100x every 10 years

• Application sizes grow with hardware
─ 1980: 100 KB data on ¼ – ½ MB server
─ 1990: 10 MB data on 16 – 32 MB server
─ 2000: 1 GB data on 2 – 4 GB server
─ 2010: 100 GB data on 256 GB server
─ (In-memory data size. Bigger data is cached or distributed.)
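The two hardware growth figures on this slide can be checked against each other with a little arithmetic. The sketch below converts "100x every 10 years" into a per-year factor and a doubling time; the class and method names are illustrative only.

```java
// Sketch: relate "100x every 10 years" to a yearly factor and a doubling time.
final class MemoryGrowth {
    /** Per-year growth factor, given a total factor over a span of years. */
    static double annualFactor(double totalFactor, double years) {
        return Math.pow(totalFactor, 1.0 / years);
    }

    /** Years needed to double, given a total factor over a span of years. */
    static double doublingTimeYears(double totalFactor, double years) {
        return years * Math.log(2.0) / Math.log(totalFactor);
    }

    public static void main(String[] args) {
        System.out.printf("annual factor: %.3f%n", annualFactor(100, 10));
        System.out.printf("doubling time: %.2f years%n", doublingTimeYears(100, 10));
    }
}
```

A 100x increase per decade works out to doubling about every 1.5 years, which lines up with the 18-month figure quoted for transistor counts.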

Big Memory Servers are the Standard
• Retail prices, major web server store (US $, Jan 2013)
• Cheap (< $1/GB/Month), and roughly linear to ~1TB
• 10s to 100s of GB/sec of memory bandwidth
─ 24 vCore, 128 GB server: $5K
─ 24 vCore, 256 GB server: $8K
─ 32 vCore, 384 GB server: $14K
─ 48 vCore, 512 GB server: $19K
─ 64 vCore, 1 TB server: $36K
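The "roughly linear" and "< $1/GB/Month" claims can be sanity-checked from the listed prices. This sketch takes 1 TB as 1024 GB and amortizes the purchase over 48 months; the slide does not state an amortization period, so that number is an assumption, as are all the names.

```java
// Sketch: price per GB for the servers listed on the slide.
final class PricePerGb {
    // Rows from the slide: memory in GB (1 TB taken as 1024 GB) and price in US dollars.
    static final int[] MEMORY_GB = {128, 256, 384, 512, 1024};
    static final int[] PRICE_USD = {5_000, 8_000, 14_000, 19_000, 36_000};

    /** Purchase price divided by memory size for row i. */
    static double dollarsPerGb(int i) {
        return (double) PRICE_USD[i] / MEMORY_GB[i];
    }

    /** Price per GB spread over an assumed service life in months. */
    static double dollarsPerGbMonth(int i, int months) {
        return dollarsPerGb(i) / months;
    }

    public static void main(String[] args) {
        for (int i = 0; i < MEMORY_GB.length; i++) {
            System.out.printf("%4d GB: $%.2f/GB, $%.2f/GB/month over 48 months%n",
                    MEMORY_GB[i], dollarsPerGb(i), dollarsPerGbMonth(i, 48));
        }
    }
}
```

Under these assumptions every row lands between roughly $31 and $39 per GB, so cost grows close to linearly with memory size.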

Has Java Kept Up? How Scalable is it?
• How big is your Java heap?
˃ .5 GB
˃ 1 GB
˃ 2 GB
˃ 4 GB
˃ 10 GB
˃ 20 GB
˃ 50 GB
˃ 100 GB
• Hardly anyone runs over 4 GB

Large Heaps are a Rarity
• Survey of heap sizes for Plumbr memory leak detector

─ Source: http://plumbr.eu/blog/most-popular-memory-configurations

Why So Few Big JVMs on Big Servers?
• Java performance gets worse with heap size

ehCache: 10 GB cache, 29 GB heap, 48 GB 16 core Ubuntu server

─ Pause frequency varies with application activity
─ Pause duration varies with amount to scan/copy

Think in Terms of Service Levels
• What are requirements (percentiles & worst case)?

─ Need to think beyond averages & standard deviations
─ GC pauses don’t fit a bell curve
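A small sketch of why averages hide pauses. The data set is hypothetical: 995 responses at 10 ms plus 5 responses stalled for 2 seconds, with a nearest-rank percentile. None of the numbers come from the talk.

```java
import java.util.Arrays;

// Sketch: mean versus percentiles on a latency sample containing a few long stalls.
final class LatencyPercentiles {
    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    /** Hypothetical data in milliseconds: 995 fast responses, 5 stalled ones. */
    static long[] sampleData() {
        long[] s = new long[1000];
        Arrays.fill(s, 10);
        for (int i = 0; i < 5; i++) {
            s[i * 200] = 2000;
        }
        return s;
    }

    public static void main(String[] args) {
        long[] s = sampleData();
        System.out.println("mean   = " + mean(s) + " ms");
        System.out.println("p50    = " + percentile(s, 50) + " ms");
        System.out.println("p99    = " + percentile(s, 99) + " ms");
        System.out.println("p99.9  = " + percentile(s, 99.9) + " ms");
        System.out.println("worst  = " + percentile(s, 100) + " ms");
    }
}
```

The mean is about 20 ms and even the 99th percentile is 10 ms, yet one request in 200 waits 2 seconds. Only the high percentiles and the worst case show it.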

A Classic Look at Application Response
• Key assumption: response time is a function of load

─ Source: IBM CICS server documentation, “understanding response times”

Java Response Has a Different Look
• Pauses may track with load, but not in as obvious a way

─ Source: ZOHO QEngine White Paper: performance testing report analysis

A Few Realities About GC
• First the good:
─ GC is very efficient, much better than malloc()
─ Dead objects cost nothing to collect
─ GC will find all the dead objects without help, even cyclic graphs

• Now the bad:
─ GC really does stop for ~1 second per GB of live objects
─ You can change when it happens, not if*
─ You can still have memory leaks: hold on to objects and GC can’t release them
─ No pauses in a 20 minute test doesn’t mean they’re gone: “You can pay me now, or you can pay me later.”
* We’ll talk about that later…
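A minimal sketch of the kind of leak the slide means: nothing is lost in the C sense, but a long-lived collection keeps every object reachable, so the collector must treat it all as live. The class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a "leak" in a garbage-collected language is unwanted reachability.
final class LeakyRegistry {
    // Long-lived collection: everything added here stays reachable from a static root.
    private static final List<byte[]> HANDLED = new ArrayList<>();

    /** Simulates handling a request and forgetting to drop the payload afterwards. */
    static void handleRequest(int payloadBytes) {
        byte[] payload = new byte[payloadBytes];
        HANDLED.add(payload); // the leak: this reference is never removed
    }

    static int retainedCount() {
        return HANDLED.size();
    }

    /** The fix is to remove references once they are no longer needed. */
    static void release() {
        HANDLED.clear();
    }
}
```

Every retained payload adds to the live set, and the live set is what drives pause length.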

How Does a Garbage Collector Work?
• Three phases to GC:
─ Identify the live objects: start with stack & statics, flag everything we reach
─ Reclaim resources held by dead objects: anything we didn’t flag in the 1st phase
─ Periodically relocate live objects (defrag): move objects together, correct references (remap)

• Sample implementations:
─ Mark/sweep/compact for old generation: three separate passes, minimal extra heap
─ Copying collector for new generation: move as we flag, do it all in one pass; requires 2x heap
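The first two phases can be sketched as a toy mark/sweep over an explicit object graph. This is an illustration of the idea only: the "heap" is a plain list, all names are made up, and the relocate/remap phase is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark/sweep: not how a real JVM lays out or scans its heap.
final class ToyMarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) {
            this.name = name;
        }
    }

    /** Phase 1: start from the roots and flag everything reachable. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) {
                continue; // already visited, which also makes cycles safe
            }
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Phase 2: drop whatever was not flagged, then clear flags for the next cycle. */
    static int sweep(List<Obj> heap) {
        int before = heap.size();
        heap.removeIf(o -> !o.marked);
        heap.forEach(o -> o.marked = false);
        return before - heap.size();
    }
}
```

Two objects that reference only each other are never reached from a root, so they are swept without any special handling for the cycle.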

Generational GC
Basic assumption: most objects die young

• Use copying collector on new objects
─ Scan small % of heap, need small space for copy area
─ Reclaim the most space for the least effort
─ Move objects that live long enough to old generation(s)

• Collect old gen as it fills up
─ Much less frequent, likely higher cost, lower benefit

• Requires a Remembered Set (e.g. via Card Marking)
─ Track references from outside into new gen
─ Use as roots for new gen collector scan

• Don’t absolutely need 2x memory for new gen GC
─ Can overflow into old gen space
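A rough sketch of card marking as one way to maintain a remembered set. Addresses are plain numbers, the 512-byte card size is an illustrative choice, and every name here is hypothetical; a real JVM emits the barrier in compiled code rather than calling a method.

```java
import java.util.ArrayList;
import java.util.List;

// Toy card table covering an old generation that starts at oldGenBase.
final class ToyCardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards (illustrative)
    private final boolean[] dirty;
    private final long oldGenBase;

    ToyCardTable(long oldGenBase, long oldGenBytes) {
        this.oldGenBase = oldGenBase;
        long cards = (oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        this.dirty = new boolean[(int) cards];
    }

    /** Write barrier: run whenever a reference field inside old gen is stored to. */
    void recordStore(long fieldAddress) {
        dirty[(int) ((fieldAddress - oldGenBase) >>> CARD_SHIFT)] = true;
    }

    /** At new gen collection: dirty cards are the extra roots to scan, then reset. */
    List<Integer> drainDirtyCards() {
        List<Integer> cards = new ArrayList<>();
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                cards.add(i);
                dirty[i] = false;
            }
        }
        return cards;
    }
}
```

The new gen collector then scans only the dirty cards for references into new gen, not the whole old generation.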

GC Terminology
• Concurrent vs. Parallel
─ A concurrent collector does GC while the application runs
─ A parallel collector uses multiple CPU cores to perform GC
─ A collector may be neither, one, or both

• Concurrent vs. Stop-The-World
─ A STW collector pauses the application during part of GC
─ A STW collector is not concurrent; it may be parallel

• Incremental
─ An incremental collector does its work in discrete chunks
─ Probably STW, with big gaps between increments

GC Terminology 2
• Precise vs. Conservative
─ A conservative collector doesn’t know every object reference or doesn’t know if some values are references or not
─ Can’t relocate objects if it can’t tell a ref from a value
─ A precise collector knows & can process every reference
─ Required to move objects
─ Compiler provides semantic information for the collector
─ Java relies on precise collection

• Safepoints
─ Places in execution (point or range) where collector can identify every reference in a thread’s execution stack
─ We bring a thread to a safepoint and keep it there during GC
─ Might mean pausing the thread, might not (e.g. JNI)
─ Safepoints need to be reached frequently
─ Global safepoints apply to all threads (STW)

Typical GC Combinations
• New generation
─ Usually a copying collector
─ Usually monolithic, stop-the-world

• Old generation
─ Usually Mark/Sweep/Compact
─ May be stop-the-world, or concurrent, or mostly concurrent, or incremental stop-the-world, or mostly incremental stop-the-world

• Mostly means not always
─ Fall back to monolithic stop-the-world (i.e. big pauses)

The Good Little Architect – A Moral Tale
A good architect must be able to impose her architectural
choices on her projects

• Once upon a time, Azul met an app with 18 sec pauses
─ App had 10s of millions of object finalizations every GC cycle
─ Back then, reference processing was a stop-the-world event

• Every class in the project had a finalizer
─ All the finalizers did was null every reference field
─ In theory, saves the GC from following pointers
─ Right for C++ reference counting, oh so wrong for Java

• Two morals:
─ Know the cost of your actions (learn the underlying system)
─ Just because it doesn’t cost now doesn’t mean it won’t later

Oracle HotSpot GC Options
• Parallel GC
─ New Gen: monolithic STW copying
─ Old Gen: monolithic STW mark/sweep/compact

• Concurrent Mark Sweep (CMS)
─ New Gen: monolithic STW copying
─ Old Gen: mostly concurrent non-compacting
─ Mostly concurrent marking (multipass)
─ Concurrent sweeping
─ No compaction: free list, no object movement
─ Fallback is monolithic STW mark/sweep/compact

Oracle HotSpot GC Options 2
• Garbage First (G1GC)
─ New Gen: monolithic STW copying
─ Old Gen:
─ Mostly concurrent marker
─ STW to catch up on mutations, reference processing
─ Track inter-region relationships in remembered sets
─ STW mostly incremental compactor
─ Compact regions that can be done in limited time
─ Delay compaction of popular objects & regions
─ Goal: “avoid, as much as possible, having a full GC”
─ Fallback is monolithic STW mark/sweep/compact
─ Required for compacting popular objects & regions
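To see which of these collectors a given JVM is actually running, the standard `java.lang.management` API can list them. The sketch below is a small helper (class name is illustrative); the bean names it prints vary by JVM and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: report the collectors active in this JVM, with counts and accumulated time.
final class ActiveCollectors {
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

On HotSpot the collector is chosen at startup with `-XX:+UseParallelGC`, `-XX:+UseConcMarkSweepGC` or `-XX:+UseG1GC`. CMS was removed in JDK 14, so that flag only applies to older releases.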

Where Do Pauses Matter?
• Interactive apps like ecommerce
─ Add many seconds to a transaction & maybe lose a customer
─ Batch apps care about start-to-finish time, not transactions

• Big data apps
─ Travel site wants to keep hotel inventory in memory
─ Search app wants to keep entire index in memory

• Efficiency & management
─ More work from fewer JVM instances

• Low latency apps
─ Financial apps process data as it arrives
─ Small number of msecs down to < 1 msec
─ Requires low latency OS & significant tuning

Characterizing GC Pauses
• Frequency relates to activity
─ Object creation rate
─ Object mutation rate

• Severity relates to memory size
─ The more we examine & copy, the longer it takes
─ New gen is usually not the problem (yet)

• Not how much GC overhead, but where it happens

Limits to GC Overhead
• Worst case: no empty memory = 100% GC
─ GC runs hard all the time, reclaiming nothing

• Best case: infinite empty memory = 0% GC
─ Just keep creating objects, never collecting

• In between, GC follows 1/x curve as memory grows
[Chart: CPU (0% to 100%) vs. heap size, with the live set marked]
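One simple way to see where a 1/x curve could come from, stated as a model and not as the talk's derivation: assume each GC cycle costs work proportional to the live set, and that a cycle is needed each time the empty memory (heap minus live set) fills up. GC work per byte allocated is then proportional to live / (heap − live).

```java
// Sketch of a toy model: GC work relative to allocation, under the assumptions above.
final class GcOverheadModel {
    /**
     * Relative GC work per unit of allocation.
     * Unbounded when there is no empty memory, and approaches zero as the heap grows.
     */
    static double relativeGcWork(double liveGb, double heapGb) {
        if (heapGb <= liveGb) {
            return Double.POSITIVE_INFINITY; // no empty memory: GC runs constantly
        }
        return liveGb / (heapGb - liveGb);
    }

    public static void main(String[] args) {
        double live = 10.0;
        for (double heap : new double[] {11, 15, 20, 30, 50, 110}) {
            System.out.printf("heap %5.0f GB -> relative GC work %.2f%n",
                    heap, relativeGcWork(live, heap));
        }
    }
}
```

In this model, doubling the empty memory halves the GC work, which is the practical reading of the curve: extra heap buys efficiency, not shorter individual pauses.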

How to Measure Pauses
• Identify the magnitude of the problem
─ jHiccup: free software from Azul’s CTO (jhiccup.com)
─ Does minimal work & records time to complete
─ Long delays indicate JVM wasn’t letting apps run
─ Run against your application
─ Results should map well to GC logs
─ Results will not include app inefficiencies
─ Run against idle JVM
─ Identify pauses from OS, VM, power management

• Don’t fix problems until you know where they lie
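The measurement idea described above (do minimal work, record how long it took) can be sketched in a few lines. This is not jHiccup's code, only an illustration of the principle; the 1 ms interval and all names are assumptions.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

// Sketch: a thread that asks to sleep 1 ms and records how much longer it really took.
final class HiccupMeter {
    /** Runs for roughly durationMillis and returns the worst overshoot in nanoseconds. */
    static long worstHiccupNanos(long durationMillis) {
        final long intervalNanos = TimeUnit.MILLISECONDS.toNanos(1);
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        long worst = 0;
        while (System.nanoTime() - deadline < 0) {
            long start = System.nanoTime();
            LockSupport.parkNanos(intervalNanos);
            // Anything beyond the requested interval is time the thread was not allowed to run.
            long overshoot = System.nanoTime() - start - intervalNanos;
            worst = Math.max(worst, overshoot);
        }
        return worst;
    }

    public static void main(String[] args) {
        long worst = worstHiccupNanos(5_000);
        System.out.printf("worst hiccup: %.3f ms%n", worst / 1_000_000.0);
    }
}
```

Because the loop does no real work, a long overshoot points at the platform (GC, OS scheduling, virtualization, power management) and not at application code.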

What To Do About Pauses
• Apply creative language (the Marketing solution)
─ “Guarantee a worst case of X msec, 99% of the time”
─ “Mostly concurrent, mostly incremental”
─ i.e. “Will at times exhibit long monolithic STW pauses”
─ “Fairly consistent”
─ i.e. “Will sometimes show results well outside this range”
─ “Typical pauses in the tens of milliseconds”
─ i.e. “Some pauses are a lot longer than that”

What To Do About Pauses
• Tune like crazy
─ Adjust GC parameters until behavior’s acceptable
─ A stopgap, not a solution

• Keep the heap small
─ Multiple small instances instead of fewer bigger ones
─ Move data out of heap (e.g. external cache)
─ Pool your objects (e.g. threads, DB connections)

• Commit ritual murder
─ Big heap, kill & restart instance before old gen GC
─ Yes, people really do this

• Change your GC
─ Move from one that rarely stalls to one that never stalls

Making JVM Pauseless: The Hard Parts
• Robust concurrent marking
─ References keep changing
─ Multipass marking is sensitive to mutation rate
─ Weak, Soft, Final references hard to deal with

• Concurrent compaction
─ Moving the objects isn’t the problem
─ It’s fixing all the references to the moved objects
─ How do you handle an app looking at a stale reference?
─ If you can’t, remapping is a monolithic STW operation

• New gen collection at scale
─ New gen is generally monolithic STW
─ Pauses are small because heaps are tiny
─ A 100 GB heap means new gen GC has a lot of work
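The stale-reference question can be made concrete with a toy. One general approach is to intercept reference loads and redirect any that point at a moved object. The sketch below models that with a forwarding map and numeric "addresses"; it is single-threaded, all names are hypothetical, and it is not a description of any particular collector's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy forwarding table: redirect and repair stale references at load time.
final class ToyForwarding {
    private final Map<Long, Long> forwarding = new HashMap<>(); // old address -> new address

    /** Collector side: record that an object has moved. */
    void relocate(long from, long to) {
        forwarding.put(from, to);
    }

    /**
     * Application side: every reference load goes through this check.
     * field[0] stands in for a reference field holding an address.
     */
    long load(long[] field) {
        Long moved = forwarding.get(field[0]);
        if (moved != null) {
            field[0] = moved; // repair the stale reference so later loads skip the lookup
        }
        return field[0];
    }
}
```

With a check like this on every load, references can be fixed lazily as the application touches them, so remapping no longer has to finish inside a single stop-the-world pause.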

Azul’s Zing JVM
• High performance production JVM
─ 64-bit Linux on X86
─ Red Hat, SuSE, Ubuntu, CentOS
─ Maximum heap size: 512 GB
─ Elastic memory to prevent out-of-memory failures
─ Overdraft protection for your JVM

• Always-on performance & execution monitoring
─ System level
─ JVM level
─ Application level

Azul’s C4 Collector
• Concurrent guaranteed-single-pass marker
─ Unaffected by mutation rate
─ Concurrent reference processing (weak, soft, final)

• Concurrent compactor
─ Moves objects without pausing your application
─ Remaps references without pausing your application
─ Can relocate entire generation (new/old) in every GC cycle

• Concurrent, compacting old generation
• Concurrent, compacting new generation
• No stop-the-world fallback. Ever.

Remember This Slide?
• Java performance gets worse with heap size

ehCache: 10 GB cache, 29 GB heap, 48 GB 16 core Ubuntu server

─ Pause frequency varies with application activity
─ Pause duration varies with amount to scan/copy

In-Memory Computing with Lucene
• Wikipedia English language index in memory
─ 132 GB data in 240 GB heap

─ Ref: blog.MikeMcCandless.com

Always-on Performance Monitoring
• System level activity: CPU, memory, network

Always-on Performance Monitoring
• JVM activity: CPU & memory

Real Time Execution Analysis

www.azulsystems.com
Technical papers
Free trials of Zing VM
Free licenses to OSS committers

[Charts: Parallel GC, Concurrent Mark/Sweep, G1GC, Zing C4]

Weitere ähnliche Inhalte

Was ist angesagt?

JVM and Garbage Collection Tuning
JVM and Garbage Collection TuningJVM and Garbage Collection Tuning
JVM and Garbage Collection Tuning
Kai Koenig
 

Was ist angesagt? (20)

Apache Gearpump - Lightweight Real-time Streaming Engine
Apache Gearpump - Lightweight Real-time Streaming EngineApache Gearpump - Lightweight Real-time Streaming Engine
Apache Gearpump - Lightweight Real-time Streaming Engine
 
Cassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in ProductionCassandra Summit 2014: Diagnosing Problems in Production
Cassandra Summit 2014: Diagnosing Problems in Production
 
Running Java Applications inside Kubernetes with Nested Container Architectur...
Running Java Applications inside Kubernetes with Nested Container Architectur...Running Java Applications inside Kubernetes with Nested Container Architectur...
Running Java Applications inside Kubernetes with Nested Container Architectur...
 
Determinism in finance
Determinism in financeDeterminism in finance
Determinism in finance
 
Garbage First & You
Garbage First & YouGarbage First & You
Garbage First & You
 
Data on its way to history, interrupted by analytics and silicon (@pavlobaron)
Data on its way to history, interrupted by analytics and silicon (@pavlobaron)Data on its way to history, interrupted by analytics and silicon (@pavlobaron)
Data on its way to history, interrupted by analytics and silicon (@pavlobaron)
 
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Performance Tuning -  Memory leaks, Thread deadlocks, JDK toolsPerformance Tuning -  Memory leaks, Thread deadlocks, JDK tools
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
 
What you need to know about GC
What you need to know about GCWhat you need to know about GC
What you need to know about GC
 
Gearpump akka streams
Gearpump akka streamsGearpump akka streams
Gearpump akka streams
 
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG TorinoDistributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
Distributed Systems explained (with NodeJS) - Bruno Bossola, JUG Torino
 
Open west 2015 talk ben coverston
Open west 2015 talk ben coverstonOpen west 2015 talk ben coverston
Open west 2015 talk ben coverston
 
Capacity Planning for fun & profit
Capacity Planning for fun & profitCapacity Planning for fun & profit
Capacity Planning for fun & profit
 
Investing the Effects of Overcommitting YARN resources
Investing the Effects of Overcommitting YARN resourcesInvesting the Effects of Overcommitting YARN resources
Investing the Effects of Overcommitting YARN resources
 
YARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformYARN: a resource manager for analytic platform
YARN: a resource manager for analytic platform
 
JVM and Garbage Collection Tuning
JVM and Garbage Collection TuningJVM and Garbage Collection Tuning
JVM and Garbage Collection Tuning
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Storing Cassandra Metrics
Storing Cassandra MetricsStoring Cassandra Metrics
Storing Cassandra Metrics
 

Andere mochten auch

คู่มือการติดตั้งและใช้งานJoomla cms
คู่มือการติดตั้งและใช้งานJoomla cmsคู่มือการติดตั้งและใช้งานJoomla cms
คู่มือการติดตั้งและใช้งานJoomla cms
withawat na wanma
 

Andere mochten auch (6)

Howard\'s House Plansbook
Howard\'s House PlansbookHoward\'s House Plansbook
Howard\'s House Plansbook
 
คู่มือการติดตั้งและใช้งานJoomla cms
คู่มือการติดตั้งและใช้งานJoomla cmsคู่มือการติดตั้งและใช้งานJoomla cms
คู่มือการติดตั้งและใช้งานJoomla cms
 
The Red River Stoles
The Red River StolesThe Red River Stoles
The Red River Stoles
 
Enterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up SearchEnterprise Search Summit - Speeding Up Search
Enterprise Search Summit - Speeding Up Search
 
Def Catalog 042108
Def Catalog 042108Def Catalog 042108
Def Catalog 042108
 
Understanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About ThemUnderstanding Application Hiccups - and What You Can Do About Them
Understanding Application Hiccups - and What You Can Do About Them
 

Ähnlich wie Java at Scale, Dallas JUG, October 2013

Ähnlich wie Java at Scale, Dallas JUG, October 2013 (20)

JVM Performance Tuning
JVM Performance TuningJVM Performance Tuning
JVM Performance Tuning
 
ZGC-SnowOne.pdf
ZGC-SnowOne.pdfZGC-SnowOne.pdf
ZGC-SnowOne.pdf
 
JVM Memory Management Details
JVM Memory Management DetailsJVM Memory Management Details
JVM Memory Management Details
 
Garbage collection
Garbage collectionGarbage collection
Garbage collection
 
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory UsageChoosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
Choosing Right Garbage Collector to Increase Efficiency of Java Memory Usage
 
GC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance EngineerGC Tuning Confessions Of A Performance Engineer
GC Tuning Confessions Of A Performance Engineer
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
 
Elastic JVM for Scalable Java EE Applications Running in Containers #Jakart...
Elastic JVM  for Scalable Java EE Applications  Running in Containers #Jakart...Elastic JVM  for Scalable Java EE Applications  Running in Containers #Jakart...
Elastic JVM for Scalable Java EE Applications Running in Containers #Jakart...
 
How Databases Work - for Developers, Accidental DBAs and Managers
How Databases Work - for Developers, Accidental DBAs and ManagersHow Databases Work - for Developers, Accidental DBAs and Managers
How Databases Work - for Developers, Accidental DBAs and Managers
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 
Chronicles Of Garbage Collection (GC)
Chronicles Of Garbage Collection (GC)Chronicles Of Garbage Collection (GC)
Chronicles Of Garbage Collection (GC)
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStax
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStax
 
Multi core programming 2
Multi core programming 2Multi core programming 2
Multi core programming 2
 
Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014
 
Observer, a "real life" time series application
Observer, a "real life" time series applicationObserver, a "real life" time series application
Observer, a "real life" time series application
 
Gopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracowGopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracow
 
Profiler Guided Java Performance Tuning
Profiler Guided Java Performance TuningProfiler Guided Java Performance Tuning
Profiler Guided Java Performance Tuning
 

Mehr von Azul Systems Inc.

Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
Azul Systems Inc.
 

Mehr von Azul Systems Inc. (20)

Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017Advancements ingc andc4overview_linkedin_oct2017
Advancements ingc andc4overview_linkedin_oct2017
 
Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017Understanding GC, JavaOne 2017
Understanding GC, JavaOne 2017
 
Zulu Embedded Java Introduction
Zulu Embedded Java IntroductionZulu Embedded Java Introduction
Zulu Embedded Java Introduction
 
What's New in the JVM in Java 8?
What's New in the JVM in Java 8?What's New in the JVM in Java 8?
What's New in the JVM in Java 8?
 
DotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
DotCMS Bootcamp: Enabling Java in Latency Sensitivie EnvironmentsDotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
DotCMS Bootcamp: Enabling Java in Latency Sensitivie Environments
 
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gapObjectLayout: Closing the (last?) inherent C vs. Java speed gap
ObjectLayout: Closing the (last?) inherent C vs. Java speed gap
 
Priming Java for Speed at Market Open
Priming Java for Speed at Market OpenPriming Java for Speed at Market Open
Priming Java for Speed at Market Open
 
Azul Systems open source guide
Azul Systems open source guideAzul Systems open source guide
Azul Systems open source guide
 
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
Start Fast and Stay Fast - Priming Java for Market Open with ReadyNow!
 
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and ToolsIntelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
Intelligent Trading Summit NY 2014: Understanding Latency: Key Lessons and Tools
 
Understanding Java Garbage Collection
Understanding Java Garbage CollectionUnderstanding Java Garbage Collection
Understanding Java Garbage Collection
 
The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014The evolution of OpenJDK: From Java's beginnings to 2014
The evolution of OpenJDK: From Java's beginnings to 2014
 
Push Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and ZingPush Technology's latest data distribution benchmark with Solarflare and Zing
Push Technology's latest data distribution benchmark with Solarflare and Zing
 
Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...Webinar: Zing Vision: Answering your toughest production Java performance que...
Webinar: Zing Vision: Answering your toughest production Java performance que...
 
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
Speculative Locking: Breaking the Scale Barrier (JAOO 2005)
 
Java vs. C/C++
Java vs. C/C++Java vs. C/C++
Java vs. C/C++
 
What's Inside a JVM?
What's Inside a JVM?What's Inside a JVM?
What's Inside a JVM?
 
The Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVMThe Java Evolution Mismatch - Why You Need a Better JVM
The Java Evolution Mismatch - Why You Need a Better JVM
 
Towards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding StyleTowards a Scalable Non-Blocking Coding Style
Towards a Scalable Non-Blocking Coding Style
 
Experiences with Debugging Data Races
Experiences with Debugging Data RacesExperiences with Debugging Data Races
Experiences with Debugging Data Races
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service

Java at Scale, Dallas JUG, October 2013

  • 1. Java at Scale: Performance & GC Presented to Dallas JUG October 2013 Matt Schuetze Product Manager
  • 2. Where is Java Working? • On the server ─ Enterprise applications: business rules ─ Monolithic & distributed computing • On the client ─ Fat client computing ─ Thin client, browser-based • Embedded ─ Android apps
  • 3. What is Java’s Appeal? • Portable ─ Write once, run anywhere (after testing everywhere) • Productive ─ No bad features: no multiple inheritance, operator overloading ─ Do the Right Thing philosophy (vs. C++ Do the Efficient Thing) ─ Memory management reduces opportunities for error • Efficient ─ Interpreter → JIT compilation → Dynamic recompilation • Generic ─ Scala, Clojure, JRuby & more use Java runtime ─ Byte code is the new target architecture (ANDF) • Scalable ─ Small to large platforms
  • 4. Parkinson’s Law Applied to Software • Hardware grows with Moore’s Law ─ Transistor counts double roughly every 18 months ─ Memory size grows around 100x every 10 years • Application sizes grow with hardware ─ 1980: 100 KB data on ¼ – ½ MB server ─ 1990: 10 MB data on 16 – 32 MB server ─ 2000: 1 GB data on 2 – 4 GB server ─ 2010: 100 GB data on 256 GB server ─ (In-memory data size. Bigger data is cached or distributed.)
  • 5. Big Memory Servers are the Standard • Retail prices, major web server store (US $, Jan 2013) ─ 24 vCore, 128 GB server: $5K ─ 24 vCore, 256 GB server: $8K ─ 32 vCore, 384 GB server: $14K ─ 48 vCore, 512 GB server: $19K ─ 64 vCore, 1 TB server: $36K • Cheap (< $1/GB/Month), and roughly linear to ~1TB • 10s to 100s of GB/sec of memory bandwidth
  • 6. Has Java Kept Up? How Scalable is it? • How big is your Java heap? ˃ .5 GB ˃ 1 GB ˃ 2 GB ˃ 4 GB ˃ 10 GB ˃ 20 GB ˃ 50 GB ˃ 100 GB • Hardly anyone runs over 4 GB
  • 7. Large Heaps are a Rarity • Survey of heap sizes for Plumbr memory leak detector ─ Source: http://plumbr.eu/blog/most-popular-memory-configurations
  • 8. Why So Few Big JVMs on Big Servers? • Java performance gets worse with heap size ─ Chart: ehCache, 10 GB cache, 29 GB heap, 48 GB 16 core Ubuntu server ─ Pause frequency varies with application activity ─ Pause duration varies with amount to scan/copy
  • 9. Think in Terms of Service Levels • What are requirements (percentiles & worst case)? ─ Need to think beyond averages & standard deviations ─ GC pauses don’t fit a bell curve
  • 10. A Classic Look at Application Response • Key assumption: response time is a function of load ─ Source: IBM CICS server documentation, “understanding response times”
  • 11. Java Response Has a Different Look • Pauses may track with load, but not in as obvious a way ─ Source: ZOHO QEngine White Paper: performance testing report analysis
  • 12. A Few Realities About GC • First the good: ─ GC is very efficient, much better than malloc() ─ Dead objects cost nothing to collect ─ GC will find all the dead objects without help, even cyclic graphs • Now the bad: ─ GC really does stop for ~1 second per GB of live objects ─ You can change when it happens, not if* ─ You can still have memory leaks ─ Hold on to objects so GC can’t release them ─ No pauses in a 20 minute test doesn’t mean they’re gone ─ “You can pay me now, or you can pay me later.” * We’ll talk about that later…
  • 13. How Does a Garbage Collector Work? • Three phases to GC: ─ Identify the live objects: start with stack & statics, flag everything we reach ─ Reclaim resources held by dead objects: anything we didn’t flag in the 1st phase ─ Periodically relocate live objects (defrag): move objects together, correct references (remap)
  • 14. How Does a Garbage Collector Work? • Three phases to GC: ─ Identify the live objects: start with stack & statics, flag everything we reach ─ Reclaim resources held by dead objects: anything we didn’t flag in the 1st phase ─ Periodically relocate live objects (defrag): move objects together, correct references (remap) • Sample implementations: ─ Mark/sweep/compact for old generation: three separate passes, minimal extra heap ─ Copying collector for new generation: move as we flag, do it all in one pass; requires 2x heap
  • 15. Generational GC Basic assumption: most objects die young • Use copying collector on new objects ─ Scan small % of heap, need small space for copy area ─ Reclaim the most space for the least effort ─ Move objects that live long enough to old generation(s) • Collect old gen as it fills up ─ Much less frequent, likely higher cost, lower benefit • Requires a Remembered Set (e.g. via Card Marking) ─ Track references from outside into new gen ─ Use as roots for new gen collector scan • Don’t absolutely need 2x memory for new gen GC ─ Can overflow into old gen space
  • 16. GC Terminology • Concurrent vs. Parallel ─ A concurrent collector does GC while the application runs ─ A parallel collector uses multiple CPU cores to perform GC ─ A collector may be neither, one, or both • Concurrent vs. Stop-The-World ─ A STW collector pauses the application during part of GC ─ A STW collector is not concurrent; it may be parallel • Incremental ─ An incremental collector does its work in discrete chunks ─ Probably STW, with big gaps between increments
  • 17. GC Terminology 2 • Precise vs. Conservative ─ A conservative collector doesn’t know every object reference or doesn’t know if some values are references or not ─ Can’t relocate objects if it can’t tell a ref from a value ─ A precise collector knows & can process every reference ─ Required to move objects ─ Compiler provides semantic information for the collector ─ Java relies on precise collection • Safepoints ─ Places in execution (point or range) where collector can identify every reference in a thread’s execution stack ─ We bring a thread to a safepoint and keep it there during GC ─ Might mean pausing the thread, might not (e.g. JNI) ─ Safepoints need to be reached frequently ─ Global safepoints apply to all threads (STW)
  • 18. Typical GC Combinations • New generation ─ Usually a copying collector ─ Usually monolithic, stop-the-world • Old generation ─ Usually Mark/Sweep/Compact ─ May be stop-the-world, or concurrent, or mostly concurrent, or incremental stop-the-world, or mostly incremental stop-the-world • Mostly means not always ─ Fall back to monolithic stop-the-world (i.e. big pauses)
  • 19. The Good Little Architect – A Moral Tale A good architect must be able to impose her architectural choices on her projects • Once upon a time, Azul met an app with 18 sec pauses ─ App had 10s of millions of object finalizations every GC cycle ─ Back then, reference processing was a stop-the-world event • Every class in the project had a finalizer ─ All the finalizers did was null every reference field ─ In theory, saves the GC from following pointers ─ Right for C++ reference counting, oh so wrong for Java • Two morals: ─ Know the cost of your actions (learn the underlying system) ─ Just because it doesn’t cost now doesn’t mean it won’t later
  • 20. Oracle HotSpot GC Options • Parallel GC ─ New Gen: monolithic STW copying ─ Old Gen: monolithic STW mark/sweep/compact • Concurrent Mark Sweep (CMS) ─ New Gen: monolithic STW copying ─ Old Gen: mostly concurrent non-compacting ─ Mostly concurrent marking (multipass) ─ Concurrent sweeping ─ No compaction: free list, no object movement ─ Fallback is monolithic STW mark/sweep/compact
  • 21. Oracle HotSpot GC Options 2 • Garbage First (G1GC) ─ New Gen: monolithic STW copying ─ Old Gen: ─ Mostly concurrent marker ─ STW to catch up on mutations, reference processing ─ Track inter-region relationships in remembered sets ─ STW mostly incremental compactor ─ Compact regions that can be done in limited time ─ Delay compaction of popular objects & regions ─ Goal: “avoid, as much as possible, having a full GC” ─ Fallback is monolithic STW mark/sweep/compact ─ Required for compacting popular objects & regions
  • 22. Where Do Pauses Matter? • Interactive apps like ecommerce ─ Add many seconds to a transaction & maybe lose a customer ─ Batch apps care about start-to-finish time, not transactions • Big data apps ─ Travel site wants to keep hotel inventory in memory ─ Search app wants to keep entire index in memory • Efficiency & management ─ More work from fewer JVM instances • Low latency apps ─ Financial apps process data as it arrives ─ Small number of msecs down to < 1 msec ─ Requires low latency OS & significant tuning
  • 23. Characterizing GC Pauses • Frequency relates to activity ─ Object creation rate ─ Object mutation rate • Severity relates to memory size ─ The more we examine & copy, the longer it takes ─ New gen is usually not the problem (yet) • Not how much GC overhead, but where it happens
  • 24. Limits to GC Overhead • Worst case: no empty memory = 100% GC ─ GC runs hard all the time, reclaiming nothing • Best case: infinite empty memory = 0% GC ─ Just keep creating objects, never collecting • In between, GC follows 1/x curve as memory grows ─ Figure: GC CPU overhead (0% to 100%) vs. heap size, starting from the live set
  • 25. How to Measure Pauses • Identify the magnitude of the problem ─ jHiccup: free software from Azul’s CTO (jhiccup.com) ─ Does minimal work & records time to complete ─ Long delays indicate JVM wasn’t letting apps run ─ Run against your application ─ Results should map well to GC logs ─ Results will not include app inefficiencies ─ Run against idle JVM ─ Identify pauses from OS, VM, power management • Don’t fix problems until you know where they lie
  • 26. What To Do About Pauses • Apply creative language (the Marketing solution) ─ “Guarantee a worst case of X msec, 99% of the time” ─ “Mostly concurrent, mostly incremental” ─ i.e. “Will at times exhibit long monolithic STW pauses” ─ “Fairly consistent” ─ i.e. “Will sometimes show results well outside this range” ─ “Typical pauses in the tens of milliseconds” ─ i.e. “Some pauses are a lot longer than that”
  • 27. What To Do About Pauses • Tune like crazy ─ Adjust GC parameters until behavior’s acceptable ─ A stopgap, not a solution • Keep the heap small ─ Multiple small instances instead of fewer bigger ones ─ Move data out of heap (e.g. external cache) ─ Pool your objects (e.g. threads, DB connections) • Commit ritual murder ─ Big heap, kill & restart instance before old gen GC ─ Yes, people really do this • Change your GC ─ Move from one that rarely stalls to one that never stalls
  • 28. Making JVM Pauseless: The Hard Parts • Robust concurrent marking ─ References keep changing ─ Multipass marking is sensitive to mutation rate ─ Weak, Soft, Final references hard to deal with • Concurrent compaction ─ Moving the objects isn’t the problem ─ It’s fixing all the references to the moved objects ─ How do you handle an app looking at a stale reference? ─ If you can’t, remapping is a monolithic STW operation • New gen collection at scale ─ New gen is generally monolithic STW ─ Pauses are small because heaps are tiny ─ A 100 GB heap means new gen GC has a lot of work
  • 29. Azul’s Zing JVM • High performance production JVM ─ 64-bit Linux on X86 ─ Red Hat, SuSE, Ubuntu, CentOS ─ Maximum heap size: 512 GB ─ Elastic memory to prevent out-of-memory failures ─ Overdraft protection for your JVM • Always-on performance & execution monitoring ─ System level ─ JVM level ─ Application level
  • 30. Azul’s C4 Collector • Concurrent guaranteed-single-pass marker ─ Unaffected by mutation rate ─ Concurrent reference processing (weak, soft, final) • Concurrent compactor ─ Moves objects without pausing your application ─ Remaps references without pausing your application ─ Can relocate entire generation (new/old) in every GC cycle • Concurrent, compacting old generation • Concurrent, compacting new generation • No stop-the-world fallback. Ever.
  • 31. Remember This Slide? • Java performance gets worse with heap size ─ Chart: ehCache, 10 GB cache, 29 GB heap, 48 GB 16 core Ubuntu server ─ Pause frequency varies with application activity ─ Pause duration varies with amount to scan/copy
  • 32. Think in Terms of Service Levels • What are requirements (percentiles & worst case)? ─ Need to think beyond averages & standard deviations ─ GC pauses don’t fit a bell curve
  • 33. In-Memory Computing with Lucene • Wikipedia English language index in memory ─ 132 GB data in 240 GB heap ─ Ref: blog.MikeMcCandless.com
  • 34. In-Memory Computing with Lucene • Wikipedia English language index in memory ─ 132 GB data in 240 GB heap ─ Ref: blog.MikeMcCandless.com
  • 35. Always-on Performance Monitoring • System level activity: CPU, memory, network
  • 36. Always-on Performance Monitoring • JVM activity: CPU & memory
  • 37. Real Time Execution Analysis
  • 38. www.azulsystems.com • Technical papers • Free trials of Zing VM • Free licenses to OSS committers
  • 39. Parallel GC
  • 40. Concurrent Mark/Sweep
  • 41. G1GC
  • 42. Zing C4
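
Slide 25 describes measuring pauses by doing minimal work and recording how long it takes, and slides 9 and 32 ask for percentiles and worst case instead of averages. The following is a minimal Java sketch of that idea only; it is not jHiccup's source. The class name HiccupMeter, the 1 ms sleep interval, the 2,000-sample run, and the nearest-rank percentile rule are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

/** Hypothetical hiccup meter (illustrative only, not jHiccup's implementation). */
public class HiccupMeter {

    /**
     * Sleeps for intervalMs repeatedly and records how late each wake-up was,
     * in microseconds. Lateness is time the thread was not allowed to run
     * (GC pause, OS scheduling, power management, and so on).
     */
    public static long[] sample(int samples, long intervalMs) {
        long[] delaysMicros = new long[samples];
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMs);
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and return what was collected so far.
                Thread.currentThread().interrupt();
                return Arrays.copyOf(delaysMicros, i);
            }
            long elapsed = System.nanoTime() - start;
            delaysMicros[i] = Math.max(0, (elapsed - intervalNanos) / 1_000);
        }
        return delaysMicros;
    }

    /** Nearest-rank percentile (0 < p <= 100) of an ascending, non-empty array. */
    public static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        long[] delays = sample(2_000, 1);
        if (delays.length == 0) {
            return; // interrupted before the first sample
        }
        Arrays.sort(delays);
        // Report percentiles and the worst case, not the mean.
        for (double p : new double[] {50, 90, 99, 99.9}) {
            System.out.printf("p%-5s %8d us%n", p, percentile(delays, p));
        }
        System.out.printf("max    %8d us%n", delays[delays.length - 1]);
    }
}
```

Run inside the same JVM as a loaded application, a meter like this should show its largest delays lining up with GC log entries; run in an idle JVM, whatever delays remain come from the OS, the VM, or power management, as slide 25 notes.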