Attila Szegedi, Software Engineer
@asz




                                1
Everything I ever
learned about JVM
performance tuning
     @twitter

        2
More than I ever
wanted to learn about
JVM performance
       tuning
      @twitter
        3
• Memory tuning
• CPU usage tuning
• Lock contention tuning
• I/O tuning


                   4
Twitter’s biggest enemy
       Latency




CC licensed image from http://www.flickr.com/photos/dunechaser/213255210/
Latency contributors


• By far the biggest contributor is the garbage collector
• Others are, in no particular order:
   • in-process locking and thread scheduling,
   • I/O,
   • application algorithmic inefficiencies.
                             6
Areas of performance
           tuning

• Memory tuning
• Lock contention tuning
• CPU usage tuning
• I/O tuning
                           7
Areas of memory
      performance tuning


• Memory footprint tuning
• Allocation rate tuning
• Garbage collection tuning

                              8
Memory footprint tuning


• So you got an OutOfMemoryError…
   • Maybe you just have too much data!
   • Maybe your data representation is fat!
   • You can also have a genuine memory leak…
                          9
Too much data

• Run with -verbosegc
• Observe numbers in “Full GC” messages:
  [Full GC $before->$after($total), $time secs]

• Can you give the JVM more memory?
• Do you need all that data in memory? Consider
  using:
   • an LRU cache, or…
   • soft references*
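
A minimal sketch of the LRU cache option, assuming a plain java.util.LinkedHashMap in access order is good enough. The class name LruCache and the maxEntries parameter are illustrative and not from the talk, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded map that drops the least-recently-accessed entry once it grows past maxEntries. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order follows access recency, not insertion order.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true evicts the least-recently-accessed entry.
        return size() > maxEntries;
    }
}
```

Wrapping an instance in Collections.synchronizedMap is one way to share it between threads, at the cost of a single lock.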
Fat data

• Can be a problem when you want to do wacky
  things, like
   • load the full Twitter social graph in a single
     JVM
   • load all user metadata in a single JVM
• Slimming internal data representation works at
  these economies of scale
                       11
Fat data: object header


 • JVM object header is normally two machine
     words.
 • That’s 16 bytes, or 128 bits on a 64-bit JVM!
 • new java.lang.Object() takes 16 bytes.
 •   new byte[0]   takes 24 bytes.


                         12
Fat data: padding

              class A {
                  byte x;
              }
              class B extends A {
                  byte y;
              }



•   new A()   takes 24 bytes.

•   new B()   takes 32 bytes.
                        13
Fat data: no inline structs

         class C {
           Object obj = new Object();
         }




  • new C() takes 40 bytes.
  • similarly, no inline array elements.
                         14
Slimming taken to
      extreme
• A research project had to load the full follower
  graph in memory
• Each vertex’s edges ended up being represented
  as int arrays
• If it grows further, we can consider variable-
  length differential encoding in a byte array
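
The variable-length differential encoding mentioned above could look roughly like this. It is a sketch under assumptions, not the project's code: vertex ids are taken to be non-negative and sorted ascending, and the DeltaVarint class and its method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;

final class DeltaVarint {
    /** Encodes an ascending array of non-negative ints as variable-length gaps. */
    static byte[] encode(int[] sorted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int value : sorted) {
            int delta = value - previous; // non-negative because the input is ascending
            previous = value;
            // 7 payload bits per byte; a set high bit means "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes count values produced by encode(). */
    static int[] decode(byte[] encoded, int count) {
        int[] result = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = encoded[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta; // undo the differencing
            result[i] = previous;
        }
        return result;
    }
}
```

Small gaps between neighbouring ids fit in one byte instead of four, which is where the saving over an int array would come from.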



                      15
Compressed object
     pointers


• Pointers become 4 bytes long
• Usable below 32 GB of max heap size
• Automatically used below 30 GB of max heap

                    16
Compressed object
           pointers
                   Uncompressed   Compressed   32-bit

  Pointer                8             4          4
  Object header         16            12*         8
  Array header          24            16         12
  Superclass pad         8             4          4

  (all sizes in bytes)
* Object can have 4 bytes of fields and still only take up 16 bytes
                                17
Avoid instances of
      primitive wrappers

• Hard-won experience with Scala 2.7.7:
 • a Seq[Int] stores java.lang.Integer
 • an Array[Int] stores int
 • the first needs (24 + 32 * length) bytes
 • the second needs (24 + 4 * length) bytes
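
To put the two formulas side by side, here is a small worked calculation. The constants come straight from the bullets above; the class and method names are illustrative only.

```java
final class SeqFootprint {
    /** Bytes for a boxed sequence: 24 bytes fixed plus 32 bytes per element. */
    static long boxedBytes(long length) {
        return 24 + 32 * length;
    }

    /** Bytes for a primitive int array: 24 bytes fixed plus 4 bytes per element. */
    static long primitiveBytes(long length) {
        return 24 + 4 * length;
    }
}
```

For one million elements this gives 32,000,024 bytes boxed against 4,000,024 bytes primitive, roughly an eightfold difference.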
                        18
Avoid instances of
     primitive wrappers

• This was fixed in Scala 2.8, but it shows that:
 • you often don’t know the performance
   characteristics of your libraries,
 • and won’t ever know them until you run your
   application under a profiler.


                          19
Map footprints


•   Guava MapMaker.makeMap() takes 2272 bytes!

•   MapMaker.concurrencyLevel(1).makeMap()
    takes 352 bytes!
• ConcurrentMap with level 1 makes sense
    sometimes (e.g. you don’t want a
    ConcurrentModificationException)

                       20
Thrift can be heavy



• Thrift generated classes are used to encapsulate a
  wire transfer format.
• Using them as your domain objects: almost never
  a good idea.



                         21
Thrift can be heavy


• Every Thrift class with a primitive field has a
  java.util.BitSet __isset_bit_vector       field.

• It adds between 52 and 72 bytes of overhead per
  object.



                            22
Thrift can be heavy


• Thrift does not support 32-bit floats.
• Coupling domain model with transport:
 • resistance to change domain model
• You also miss opportunities for interning and N-to-1
  normalization.

                          24
import java.util.List;

class Location {
   public String city;
   public String region;
   public String countryCode;
   public int metro;
   public List<String> placeIds;
   public double lat;
   public double lon;
   public double confidence;
}




                        25
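import java.util.List;

// Fields that repeat across many locations; one instance can be shared.
class SharedLocation {
   public String city;
   public String region;
   public String countryCode;
   public int metro;
   public List<String> placeIds;
}

// Per-instance fields plus a reference to the shared part.
class UniqueLocation {
   private SharedLocation sharedLocation;
   public double lat;
   public double lon;
   public double confidence;
}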
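
The split only saves memory if equal shared parts really end up as one object. A sketch of how that interning might be done, assuming an immutable variant of the shared part that can serve as a map key; SharedLocationKey and LocationInterner are hypothetical names, and the interner is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Immutable, value-comparable variant of the shared location fields. */
final class SharedLocationKey {
    final String city;
    final String region;
    final String countryCode;
    final int metro;
    final List<String> placeIds;

    SharedLocationKey(String city, String region, String countryCode,
                      int metro, List<String> placeIds) {
        this.city = city;
        this.region = region;
        this.countryCode = countryCode;
        this.metro = metro;
        // Defensive copy so the key cannot change after insertion into the map.
        this.placeIds = Collections.unmodifiableList(new ArrayList<>(placeIds));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SharedLocationKey)) return false;
        SharedLocationKey other = (SharedLocationKey) o;
        return metro == other.metro
            && Objects.equals(city, other.city)
            && Objects.equals(region, other.region)
            && Objects.equals(countryCode, other.countryCode)
            && placeIds.equals(other.placeIds);
    }

    @Override
    public int hashCode() {
        return Objects.hash(city, region, countryCode, metro, placeIds);
    }
}

/** Returns one canonical instance per distinct value (N-to-1 normalization). */
final class LocationInterner {
    private final Map<SharedLocationKey, SharedLocationKey> canonical = new HashMap<>();

    SharedLocationKey intern(SharedLocationKey candidate) {
        SharedLocationKey existing = canonical.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}
```

Every UniqueLocation would then hold the interned instance, so N equal shared parts cost one object plus N references.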




                        26
Careful with thread locals

• Thread locals stick around.
• Particularly problematic in thread pools with m⨯n
  resource association.
 • 200 pooled threads using 50 connections: you end
   up with 10 000 connection buffers.
• Consider using synchronized objects, or
• just create new objects all the time.
                          27
Part II:
fighting latency



       28
Performance tradeoff

               Memory




                 Time



  Convenient, but oversimplified view.
                  29
Performance triangle

          Memory footprint




 Throughput                  Latency




                 30
Performance triangle

                     Compactness




       Throughput                   Responsiveness
                     C ⨯ T ⨯ R = a
• Tuning: vary C, T, R for fixed a
• Optimization: increase a
Performance triangle

• Compactness: inverse of memory footprint
• Responsiveness: longest pause the application will
  experience
• Throughput: amount of useful application CPU work
  over time
• Can trade one for the other, within limits.
• If you have spare CPU, can be pure win.
                           32
Responsiveness vs.
   throughput




        33
Biggest threat to
responsiveness in the JVM
  is the garbage collector


            34
Memory pools

Eden       Survivor             Old



                         Code
       Permanent
                        cache




  This is entirely HotSpot specific!
                   35
How does young gen
           work?
          Eden          S1        S2      Old

• All new allocation happens in eden.
 • It only costs a pointer bump.
• When eden fills up, stop-the-world copy-collection
  into the survivor space.
 • Dead objects cost zero to collect.
• After several collections, survivors get tenured into
  old generation.
                             36
Ideal young gen operation


• Big enough to hold more than one set of all
  concurrent request-response cycle objects.
• Each survivor space big enough to hold active
  request objects + tenuring ones.
• Tenuring threshold such that long-lived objects
  tenure fast.

                           37
Old generation collectors

• Throughput collectors
 •   -XX:+UseSerialGC

 •   -XX:+UseParallelGC

 •   -XX:+UseParallelOldGC

• Low-pause collectors
 •   -XX:+UseConcMarkSweepGC

 •   -XX:+UseG1GC   (can’t discuss it here)

                             38
Adaptive sizing policy

• Throughput collectors can automatically tune
  themselves:
 •   -XX:+UseAdaptiveSizePolicy

 •   -XX:MaxGCPauseMillis=…      (e.g. 100)
 •   -XX:GCTimeRatio=…   (e.g. 19)



                            39
Adaptive sizing policy at
         work




            40
Choose a collector


• Bulk service: throughput collector, no adaptive sizing
  policy.
• Everything else: try throughput collector with
  adaptive sizing policy. If it didn’t work, use
  concurrent mark-and-sweep (CMS).



                           41
Always start with tuning
 the young generation
• Enable -XX:+PrintGCDetails, -XX:+PrintHeapAtGC,
  and -XX:+PrintTenuringDistribution.

• Watch survivor sizes! You’ll need to determine
  “desired survivor size”.
• There’s no such thing as a “desired eden size”, mind
  you. The bigger, the better, with some
  responsiveness caveats.
• Watch the tenuring threshold; might need to tune it
  to tenure long lived objects faster.
                          42
-XX:+PrintHeapAtGC


Heap after GC invocations=7000 (full 87):
  par new generation    total 4608000K, used 398455K
   eden space 4096000K,    0% used
   from space 512000K, 77% used
   to   space 512000K,    0% used
  concurrent mark-sweep generation total 3072000K, used 1565157K
  concurrent-mark-sweep perm gen total 53256K, used 31889K
}




                                43
-XX:+PrintTenuringDistribution

Desired   survivor size   262144000 bytes, new threshold 4 (max 4)
- age     1: 137474336    bytes, 137474336 total
- age     2:   37725496   bytes, 175199832 total
- age     3:   23551752   bytes, 198751584 total
- age     4:   14772272   bytes, 213523856 total



 • Things of interest:
  • Number of ages
  • Size distribution in ages
    • You want strongly declining.
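
A rough way to check the "declining" property mechanically. This is a sketch that assumes the log lines have exactly the shape shown above; TenuringLog is an illustrative helper, not a JVM API, and "strictly declining" is only an approximation of "strongly declining".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TenuringLog {
    // Matches lines such as "- age   2:   37725496   bytes, 175199832 total".
    private static final Pattern AGE_LINE =
        Pattern.compile("-\\s*age\\s+(\\d+):\\s+(\\d+)\\s+bytes");

    /** Extracts the per-age byte counts in the order they appear in the log. */
    static List<Long> bytesPerAge(String log) {
        List<Long> sizes = new ArrayList<>();
        Matcher m = AGE_LINE.matcher(log);
        while (m.find()) {
            sizes.add(Long.parseLong(m.group(2)));
        }
        return sizes;
    }

    /** True if every age holds fewer bytes than the age before it. */
    static boolean strictlyDeclining(List<Long> sizes) {
        for (int i = 1; i < sizes.size(); i++) {
            if (sizes.get(i) >= sizes.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```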
                                   44
Tuning the CMS

• Give your app as much memory as possible.
 • CMS is speculative. More memory, less punitive
   miscalculations.
• Try using CMS without tuning. Use -verbosegc and
  -XX:+PrintGCDetails.

 • Didn’t get any “Full GC” messages? You’re done!
• Otherwise, tune the young generation first.
                         45
Tuning the old generation


 • Goals:
  • Keep the fragmentation low.
  • Avoid full GC stops.
 • Fortunately, the two goals are not conflicting.

                          46
Tuning the old generation


 • Find the minimum and maximum working set size
   (observe “Full GC” numbers under stable state and
   under load).
 • Overprovision the numbers by 25-33%.
  • This gives CMS a cushion to concurrently clean
    memory as it’s used.

                           47
Tuning the old generation


 • Set -XX:CMSInitiatingOccupancyFraction to
   between 80 and 75, respectively.
  • corresponds to overprovisioned heap ratio.
 • You can lower initiating occupancy fraction to 0 if
   you have CPU to spare.
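
One reading of "corresponds to overprovisioned heap ratio" is that the fraction is the share of the old generation that the working set fills once the cushion is added. The helper below is a sketch of that arithmetic under this assumption; OccupancyFraction is not a JVM API.

```java
final class OccupancyFraction {
    /**
     * Percentage of the old generation taken by the working set when the
     * generation is sized as workingSet * (1 + overprovision).
     */
    static long forOverprovision(double overprovision) {
        return Math.round(100.0 / (1.0 + overprovision));
    }
}
```

With 25% overprovisioning this yields 80, and with 33% it yields 75, matching the two ends of the range above.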


                         48
Responsiveness still not
     good enough?
• Too many live objects during young gen GC:
 • Reduce NewSize, reduce survivor spaces, reduce
   tenuring threshold.
• Too many threads:
 • Find the minimal concurrency level, or
 • split the service into several JVMs.
                          49
Responsiveness still not
    good enough?
• Does the CMS abortable preclean phase, well,
  abort?
 • It is sensitive to number of objects in the new
   generation, so
   • go for smaller new generation
   • try to reduce the amount of short-lived garbage
     your app creates.

                         50
Part III:
let’s take a break from GC



            51
Thread coordination
        optimization

• You don’t have to always go for synchronized.
• Synchronization is a read barrier on entry; write
  barrier on exit.
• Sometimes you only need a half-barrier; e.g. in a
  producer-observer pattern.
• Volatiles can be used as half-barriers.
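
A minimal sketch of the producer-observer case, assuming exactly one writer thread and any number of readers. The Progress class is illustrative only.

```java
final class Progress {
    // volatile: a value written by the producer is visible to any later read by an observer.
    private volatile long itemsProcessed;

    /** Must be called by the single producer thread only. */
    void advance() {
        // Read-modify-write is not atomic, which is acceptable with exactly one writer.
        itemsProcessed = itemsProcessed + 1;
    }

    /** Safe to call from any thread; never blocks. */
    long observe() {
        return itemsProcessed;
    }
}
```

With more than one writer this would lose updates, and an atomic class or a lock would be needed instead.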
                            52
Thread coordination
        optimization

• For atomic update of a single value, you only need
  Atomic{Integer|Long}.compareAndSet().

• You can use AtomicReference.compareAndSet() for
  atomic update of composite values represented by
  immutable objects.
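
The AtomicReference variant might look like the following sketch: two values that must change together are wrapped in an immutable object and swapped with a compare-and-set retry loop. Stats and StatsRecorder are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Immutable composite value: both fields always change together. */
final class Stats {
    final long count;
    final long sum;

    Stats(long count, long sum) {
        this.count = count;
        this.sum = sum;
    }

    Stats plus(long sample) {
        return new Stats(count + 1, sum + sample);
    }
}

final class StatsRecorder {
    private final AtomicReference<Stats> current =
        new AtomicReference<>(new Stats(0, 0));

    /** Lock-free update: retry until no other thread changed the reference in between. */
    void record(long sample) {
        while (true) {
            Stats before = current.get();
            Stats after = before.plus(sample);
            if (current.compareAndSet(before, after)) {
                return;
            }
            // Another thread won the race; loop and recompute from the fresh value.
        }
    }

    Stats snapshot() {
        return current.get();
    }
}
```

Readers always see a consistent count and sum pair, because a snapshot is a single immutable object.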


                          53
Fight CMS fragmentation
   with slab allocators

• CMS doesn’t compact, so it’s prone to fragmentation,
  which will lead to a stop-the-world pause.
• Apache Cassandra uses a slab allocator internally.


                           54
Cassandra slab allocator


 • 2MB slab sizes
 • copy byte[] into them using compare-and-set
 • GC before: 30-60 seconds every hour
 • GC after: 5 seconds once in 3 days and 10 hours
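
The idea of copying byte[] data into a slab with compare-and-set can be sketched as below. This is not Cassandra's implementation; the Slab class, its constructor and the null-on-full convention are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

final class Slab {
    static final int DEFAULT_SIZE = 2 * 1024 * 1024; // 2 MB, as quoted above

    private final byte[] data;
    private final AtomicInteger nextFree = new AtomicInteger(0);

    Slab(int size) {
        this.data = new byte[size];
    }

    /**
     * Copies src into the slab and returns a view of the copy,
     * or null if the slab has no room left (caller then starts a new slab).
     */
    ByteBuffer allocate(byte[] src) {
        while (true) {
            int start = nextFree.get();
            if (src.length > data.length - start) {
                return null;
            }
            // Claim the region [start, start + src.length) with a CAS bump.
            if (nextFree.compareAndSet(start, start + src.length)) {
                System.arraycopy(src, 0, data, start, src.length);
                return ByteBuffer.wrap(data, start, src.length).slice();
            }
            // Lost the race to another thread; retry with the new offset.
        }
    }
}
```

Many small arrays become regions of one large, long-lived array, so the old generation sees a few big objects instead of many small ones of varying size.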

                       55
Slab allocator constraints
• Works for limited usage:
 • Buffers are written to linearly, flushed to disk and
   recycled when they fill up.
 • The objects need to be converted to binary
   representation anyway.
• If you need random freeing and compaction, you’re
  heading in the wrong direction.
• If you find yourself writing a full memory manager
  on top of byte buffers, stop!
                          56
Soft references revisited

• Soft reference clearing is based on the amount of
  free memory available when GC encounters the
  reference.
• By definition, throughput collectors always clear
  them.
• Can use them with CMS, but they increase memory
  pressure and make the behavior less predictable.
• Need two GC cycles to get rid of referenced objects.
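
For reference, a typical soft-reference holder looks like the sketch below. SoftCached is an illustrative name, the class is not thread-safe, and when the value actually gets cleared depends on the collector and its settings, as the bullets above point out.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Holds a value softly and reloads it if the collector has cleared it. */
final class SoftCached<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);

    SoftCached(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T value = ref.get(); // null if never loaded or already cleared by the GC
        if (value == null) {
            value = loader.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }
}
```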
                           57
More than I ever
wanted to learn about
JVM performance
       tuning
      @twitter
   Questions?
        58

Everything I Ever Learned About JVM Performance Tuning @Twitter

  • 1. Attila Szegedi, Software Engineer @asz 1
  • 2. Everything I ever learned about JVM performance tuning @twitter 2
  • 3. More than I ever wanted to learn about JVM performance tuning @twitter 3
  • 4. • Memory tuning • CPU usage tuning • Lock contention tuning • I/O tuning 4
  • 8. Twitter’s biggest enemy Latency CC licensed image from http://www.flickr.com/photos/dunechaser/213255210/ 5
  • 9. Latency contributors • By far the biggest contributor is garbage collector • others are, in no particular order: • in-process locking and thread scheduling, • I/O, • application algorithmic inefficiencies. 6
  • 10. Areas of performance tuning • Memory tuning • Lock contention tuning • CPU usage tuning • I/O tuning 7
  • 11. Areas of memory performance tuning • Memory footprint tuning • Allocation rate tuning • Garbage collection tuning 8
  • 12. Memory footprint tuning • So you got an OutOfMemoryError… • Maybe you just have too much data! • Maybe your data representation is fat! • You can also have a genuine memory leak… 9
  • 13. Too much data • Run with -verbosegc • Observe numbers in “Full GC” messages: [Full GC $before->$after($total), $time secs] • Can you give the JVM more memory? • Do you need all that data in memory? Consider using: • an LRU cache, or… • soft references* 10
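
The LRU cache suggested on the slide above can be sketched with the JDK's `LinkedHashMap` in access-order mode. This is a minimal illustration, not code from the talk; the class name `LruCache` and the capacity of 2 are assumptions chosen for the example, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical name; a bounded LRU cache built on LinkedHashMap's access order.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Consulted after each insertion; returning true evicts the eldest entry
        return size() > maxEntries;
    }
}

class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // touch "a" so that "b" becomes the eldest entry
        cache.put("c", 3);  // exceeds the capacity, so "b" is evicted
        System.out.println(cache.keySet());
    }
}
```

Wrapping the map with `Collections.synchronizedMap` is one way to share it between threads, at the cost of a single lock.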
  • 14. Fat data • Can be a problem when you want to do wacky things, like • load the full Twitter social graph in a single JVM • load all user metadata in a single JVM • Slimming internal data representation works at these economies of scale 11
  • 15. Fat data: object header • JVM object header is normally two machine words. • That’s 16 bytes, or 128 bits on a 64-bit JVM! • new java.lang.Object() takes 16 bytes. • new byte[0] takes 24 bytes. 12
  • 16. Fat data: padding class A { byte x; } class B extends A { byte y; } • new A() takes 24 bytes. • new B() takes 32 bytes. 13
  • 17. Fat data: no inline structs class C { Object obj = new Object(); } • new C() takes 40 bytes. • similarly, no inline array elements. 14
  • 18. Slimming taken to extreme • A research project had to load the full follower graph in memory • Each vertex’s edges ended up being represented as int arrays • If it grows further, we can consider variable-length differential encoding in a byte array 15
  • 19. Compressed object pointers • Pointers become 4 bytes long • Usable below 32 GB of max heap size • Automatically used below 30 GB of max heap 16
  • 20. Compressed object pointers (sizes in bytes: Uncompressed / Compressed / 32-bit) • Pointer: 8 / 4 / 4 • Object header: 16 / 12* / 8 • Array header: 24 / 16 / 12 • Superclass pad: 8 / 4 / 4 • * Object can have 4 bytes of fields and still only take up 16 bytes 17
  • 21. Avoid instances of primitive wrappers • Hard won experience with Scala 2.7.7: • a Seq[Int] stores java.lang.Integer • an Array[Int] stores int • first needs (24 + 32 * length) bytes • second needs (24 + 4 * length) bytes 18
  • 22. Avoid instances of primitive wrappers • This was fixed in Scala 2.8, but it shows that: • you often don’t know the performance characteristics of your libraries, • and won’t ever know them until you run your application under a profiler. 19
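
To make the wrapper overhead concrete, the two footprint formulas quoted for Scala 2.7.7 can be evaluated for a given length. The sketch below only restates the slide's arithmetic; the class and method names are hypothetical, and the constants are the slide's estimates for a 64-bit JVM, not measurements.

```java
// Hypothetical helper that evaluates the footprint formulas quoted on the slide.
class SeqFootprint {
    // Sequence of boxed java.lang.Integer values: 24 + 32 * length bytes (slide estimate)
    static long boxedBytes(long length) {
        return 24 + 32 * length;
    }

    // Primitive int array: 24 + 4 * length bytes (slide estimate)
    static long primitiveBytes(long length) {
        return 24 + 4 * length;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("boxed:     " + boxedBytes(n) + " bytes");
        System.out.println("primitive: " + primitiveBytes(n) + " bytes");
    }
}
```

For one million elements the estimates are 32,000,024 bytes boxed versus 4,000,024 bytes primitive, roughly an eightfold difference.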
  • 23. Map footprints • Guava MapMaker.makeMap() takes 2272 bytes! • MapMaker.concurrencyLevel(1).makeMap() takes 352 bytes! • ConcurrentMap with level 1 makes sense sometimes (i.e. you don’t want a ConcurrentModificationException) 20
  • 24. Thrift can be heavy • Thrift generated classes are used to encapsulate a wire transfer format. • Using them as your domain objects: almost never a good idea. 21
  • 25. Thrift can be heavy • Every Thrift class with a primitive field has a java.util.BitSet __isset_bit_vector field. • It adds between 52 and 72 bytes of overhead per object. 22
  • 26. Thrift can be heavy 23
  • 27. Thrift can be heavy • Thrift does not support 32-bit floats. • Coupling domain model with transport: • resistance to change domain model • You also miss opportunities for interning and N-to-1 normalization. 24
  • 28. class Location { public String city; public String region; public String countryCode; public int metro; public List<String> placeIds; public double lat; public double lon; public double confidence; } 25
  • 29. class SharedLocation { public String city; public String region; public String countryCode; public int metro; public List<String> placeIds; } class UniqueLocation { private SharedLocation sharedLocation; public double lat; public double lon; public double confidence; } 26
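
The split into `SharedLocation` and `UniqueLocation` pays off only when equal shared parts are actually stored once. A minimal sketch of such N-to-1 normalization follows, assuming an immutable `SharedLocation` reduced to three of the slide's fields for brevity; the `SharedLocationInterner` class is hypothetical and not part of the talk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Immutable shared part; a subset of the slide's fields, kept short for the example.
final class SharedLocation {
    final String city;
    final String region;
    final String countryCode;

    SharedLocation(String city, String region, String countryCode) {
        this.city = city;
        this.region = region;
        this.countryCode = countryCode;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SharedLocation)) return false;
        SharedLocation other = (SharedLocation) o;
        return Objects.equals(city, other.city)
            && Objects.equals(region, other.region)
            && Objects.equals(countryCode, other.countryCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(city, region, countryCode);
    }
}

// Hypothetical helper: hands out one canonical instance per distinct value.
class SharedLocationInterner {
    private final Map<SharedLocation, SharedLocation> pool = new HashMap<>();

    SharedLocation intern(SharedLocation candidate) {
        SharedLocation existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }
}

// Per-object part: only the coordinates differ between instances.
final class UniqueLocation {
    final SharedLocation sharedLocation;
    final double lat;
    final double lon;
    final double confidence;

    UniqueLocation(SharedLocation sharedLocation, double lat, double lon, double confidence) {
        this.sharedLocation = sharedLocation;
        this.lat = lat;
        this.lon = lon;
        this.confidence = confidence;
    }
}
```

With this arrangement, many `UniqueLocation` objects for the same city hold a reference to a single `SharedLocation` instead of each carrying its own copy of the strings.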
  • 30. Careful with thread locals • Thread locals stick around. • Particularly problematic in thread pools with m⨯n resource association. • 200 pooled threads using 50 connections: you end up with 10 000 connection buffers. • Consider using synchronized objects, or • just create new objects all the time. 27
  • 32. Performance tradeoff Memory Time Convenient, but oversimplified view. 29
  • 33. Performance triangle Memory footprint Throughput Latency 30
  • 34. Performance triangle Compactness Throughput Responsiveness C ⨯T ⨯ R = a • Tuning: vary C, T, R for fixed a • Optimization: increase a 31
  • 35. Performance triangle • Compactness: inverse of memory footprint • Responsiveness: longest pause the application will experience • Throughput: amount of useful application CPU work over time • Can trade one for the other, within limits. • If you have spare CPU, can be pure win. 32
  • 36. Responsiveness vs. throughput 33
  • 37. Biggest threat to responsiveness in the JVM is the garbage collector 34
  • 38. Memory pools • Eden • Survivor • Old • Permanent • Code cache • This is entirely HotSpot specific! 35
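
One way to see which memory pools a running JVM actually has is the standard `java.lang.management` API. The sketch below is an illustration added here, not part of the talk; the class name `MemoryPoolLister` is hypothetical, and the pool names it prints depend on the JVM version and the selected collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the memory pools of the current JVM.
class MemoryPoolLister {
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // getUsage() returns null for a pool that is no longer valid
            }
            System.out.printf("%-35s %-16s used=%dK%n",
                pool.getName(), pool.getType(), usage.getUsed() / 1024);
        }
    }
}
```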
  • 39. How does young gen work? Eden S1 S2 Old • All new allocation happens in eden. • It only costs a pointer bump. • When eden fills up, stop-the-world copy-collection into the survivor space. • Dead objects cost zero to collect. • After several collections, survivors get tenured into old generation. 36
  • 40. Ideal young gen operation • Big enough to hold more than one set of all concurrent request-response cycle objects. • Each survivor space big enough to hold active request objects + tenuring ones. • Tenuring threshold such that long-lived objects tenure fast. 37
  • 41. Old generation collectors • Throughput collectors • -XX:+UseSerialGC • -XX:+UseParallelGC • -XX:+UseParallelOldGC • Low-pause collectors • -XX:+UseConcMarkSweepGC • -XX:+UseG1GC (can’t discuss it here) 38
  • 42. Adaptive sizing policy • Throughput collectors can automatically tune themselves: • -XX:+UseAdaptiveSizePolicy • -XX:MaxGCPauseMillis=… (e.g. 100) • -XX:GCTimeRatio=… (e.g. 19) 39
  • 44. Choose a collector • Bulk service: throughput collector, no adaptive sizing policy. • Everything else: try throughput collector with adaptive sizing policy. If it didn’t work, use concurrent mark-and-sweep (CMS). 41
  • 45. Always start with tuning the young generation • Enable -XX:+PrintGCDetails, -XX:+PrintHeapAtGC, and -XX:+PrintTenuringDistribution. • Watch survivor sizes! You’ll need to determine “desired survivor size”. • There’s no such thing as a “desired eden size”, mind you. The bigger, the better, with some responsiveness caveats. • Watch the tenuring threshold; might need to tune it to tenure long lived objects faster. 42
  • 46. -XX:+PrintHeapAtGC Heap after GC invocations=7000 (full 87): par new generation total 4608000K, used 398455K eden space 4096000K, 0% used from space 512000K, 77% used to space 512000K, 0% used concurrent mark-sweep generation total 3072000K, used 1565157K concurrent-mark-sweep perm gen total 53256K, used 31889K } 43
  • 47. -XX:+PrintTenuringDistribution Desired survivor size 262144000 bytes, new threshold 4 (max 4) - age 1: 137474336 bytes, 137474336 total - age 2: 37725496 bytes, 175199832 total - age 3: 23551752 bytes, 198751584 total - age 4: 14772272 bytes, 213523856 total • Things of interest: • Number of ages • Size distribution in ages • You want strongly declining. 44
  • 48. Tuning the CMS • Give your app as much memory as possible. • CMS is speculative. More memory, less punitive miscalculations. • Try using CMS without tuning. Use -verbosegc and -XX:+PrintGCDetails. • Didn’t get any “Full GC” messages? You’re done! • Otherwise, tune the young generation first. 45
  • 49. Tuning the old generation • Goals: • Keep the fragmentation low. • Avoid full GC stops. • Fortunately, the two goals are not conflicting. 46
  • 50. Tuning the old generation • Find the minimum and maximum working set size (observe “Full GC” numbers under stable state and under load). • Overprovision the numbers by 25-33%. • This gives CMS a cushion to concurrently clean memory as it’s used. 47
  • 51. Tuning the old generation • Set -XX:CMSInitiatingOccupancyFraction to between 80 and 75, respectively. • corresponds to overprovisioned heap ratio. • You can lower initiating occupancy fraction to 0 if you have CPU to spare. 48
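
The link between the 25-33% overprovisioning and the occupancy fraction of 80 to 75 is simple arithmetic: if the old generation is sized at the working set times (1 + overprovision), the working set fills 1 / (1 + overprovision) of it. The sketch below works this out; it reflects one reading of the slides, and the class and method names are hypothetical.

```java
// Hypothetical helper for the overprovisioning arithmetic described above.
class CmsOccupancy {
    // Old generation size needed to hold the working set plus the cushion
    static long oldGenBytes(long workingSetBytes, double overprovision) {
        return Math.round(workingSetBytes * (1.0 + overprovision));
    }

    // Percentage of the overprovisioned old generation taken up by the working set
    static long initiatingOccupancyPercent(double overprovision) {
        return Math.round(100.0 / (1.0 + overprovision));
    }

    public static void main(String[] args) {
        long workingSet = 2L * 1024 * 1024 * 1024; // assumed 2 GB working set, for illustration
        for (double over : new double[] {0.25, 0.33}) {
            System.out.println("overprovision " + over
                + " -> old gen " + oldGenBytes(workingSet, over) + " bytes"
                + ", occupancy fraction " + initiatingOccupancyPercent(over));
        }
    }
}
```

A 25% cushion gives a fraction of 80 and a 33% cushion gives about 75, which matches the pair of values on the slide.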
  • 52. Responsiveness still not good enough? • Too many live objects during young gen GC: • Reduce NewSize, reduce survivor spaces, reduce tenuring threshold. • Too many threads: • Find the minimal concurrency level, or • split the service into several JVMs. 49
  • 53. Responsiveness still not good enough? • Does the CMS abortable preclean phase, well, abort? • It is sensitive to number of objects in the new generation, so • go for smaller new generation • try to reduce the amount of short-lived garbage your app creates. 50
  • 54. Part III: let’s take a break from GC 51
  • 55. Thread coordination optimization • You don’t have to always go for synchronized. • Synchronization is a read barrier on entry; write barrier on exit. • Sometimes you only need a half-barrier, e.g. in a producer-observer pattern. • Volatiles can be used as half-barriers. 52
  • 56. Thread coordination optimization • For atomic update of a single value, you only need Atomic{Integer|Long}.compareAndSet(). • You can use AtomicReference.compareAndSet() for atomic update of composite values represented by immutable objects. 53
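
The composite-value case can be sketched as a retry loop around `AtomicReference.compareAndSet()`: read the current immutable object, derive a new one, and swap it in only if nobody else changed the reference in the meantime. The `Range` and `RangeTracker` types below are hypothetical examples, not code from the talk.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable composite value (hypothetical example type): two fields updated together.
final class Range {
    final int min;
    final int max;

    Range(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // Returns a new Range widened to include v; the receiver is never modified
    Range include(int v) {
        return new Range(Math.min(min, v), Math.max(max, v));
    }
}

class RangeTracker {
    private final AtomicReference<Range> current;

    RangeTracker(int initial) {
        current = new AtomicReference<>(new Range(initial, initial));
    }

    void record(int value) {
        while (true) {
            Range old = current.get();
            Range updated = old.include(value);
            // Succeeds only if no other thread replaced the reference since get()
            if (current.compareAndSet(old, updated)) {
                return;
            }
            // Otherwise another thread won the race; retry against the fresh value
        }
    }

    Range get() {
        return current.get();
    }
}
```

Because `Range` is immutable, readers always see a consistent min/max pair without taking a lock.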
  • 57. Fight CMS fragmentation with slab allocators • CMS doesn’t compact, so it’s prone to fragmentation, which will lead to a stop-the-world pause. • Apache Cassandra uses a slab allocator internally. 54
  • 58. Cassandra slab allocator • 2MB slab sizes • copy byte[] into them using compare-and-set • GC before: 30-60 seconds every hour • GC after: 5 seconds once in 3 days and 10 hours 55
  • 59. Slab allocator constraints • Works for limited usage: • Buffers are written to linearly, flushed to disk and recycled when they fill up. • The objects need to be converted to binary representation anyway. • If you need random freeing and compaction, you’re heading down the wrong direction. • If you find yourself writing a full memory manager on top of byte buffers, stop! 56
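
The idea of copying `byte[]` values into a large slab with compare-and-set can be illustrated as follows. This is a simplified sketch under stated assumptions, not Cassandra's actual implementation: the `Slab` class and its methods are hypothetical, and only the 2 MB default size is taken from the slide.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical slab: one long-lived array that many small values are copied into.
class Slab {
    static final int DEFAULT_SIZE = 2 * 1024 * 1024; // 2 MB, as on the slide

    private final byte[] data;
    private final AtomicInteger nextFree = new AtomicInteger(0);

    Slab() {
        this(DEFAULT_SIZE);
    }

    Slab(int size) {
        data = new byte[size];
    }

    // Copies src into the slab and returns a view of the copy, or null if it does not fit
    ByteBuffer allocate(byte[] src) {
        while (true) {
            int offset = nextFree.get();
            if (src.length > data.length - offset) {
                return null; // slab is full; the caller would move on to a fresh slab
            }
            // Claim the region [offset, offset + src.length) with a single compare-and-set
            if (nextFree.compareAndSet(offset, offset + src.length)) {
                System.arraycopy(src, 0, data, offset, src.length);
                return ByteBuffer.wrap(data, offset, src.length).slice();
            }
            // Another thread claimed space first; retry with the new offset
        }
    }
}
```

Since every value lives inside one large array, the old generation holds a few big, uniformly sized objects instead of many small ones, which is what keeps fragmentation down in this scheme.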
  • 60. Soft references revisited • Soft reference clearing is based on the amount of free memory available when GC encounters the reference. • By definition, throughput collectors always clear them. • Can use them with CMS, but they increase memory pressure and make the behavior less predictable. • Need two GC cycles to get rid of referenced objects. 57
  • 61. More than I ever wanted to learn about JVM performance tuning @twitter Questions? 58
  • 62. Attila Szegedi, Software Engineer @asz 59
