Apache Direct Memory is an open source implementation of off-heap caching that uses ByteBuffer.allocateDirect to store objects in off-heap memory without degrading JVM performance. It provides a multi-layered caching solution and can be used to build a standalone cache server similar to Memcached. Current use cases include integrating with Ehcache for multi-level caching and implementing an off-heap output stream to process streaming data without filling heap memory. Future work includes benchmarking, improving the API, and integrating with more libraries.
Direct memory jugl-2012.03.08
1. JUG Lausanne
8. March 2012
Apache Direct Memory
Reducing Heap Memory Stress
The next battle horse for JVM
performance tuning
About me
• Benoit Perroud
• Apache Direct Memory Committer
• bperroud@apache.org
• @killerwhile
• Software craftsman
• BigData Engineer @
Today's Agenda
• Off Heap Caching
– Java Memory
– Garbage Collector (GC)
– On-heap vs. off-heap caching
• Apache Direct Memory
– Design and principles
– Use cases
• Multi-layered cache
• Standalone server “à la Memcached”
– Next steps
• Questions
Before starting
• Sorry for my bad English and my poor French
• Interrupt me anytime
• I have nothing to sell. It's just worth sharing
• Please do ask questions
Java Memory
• Automatic memory allocation
• Garbage collector (GC)
Garbage Collector
• Several types of GC
– Serial GC
– Parallel GC (throughput collector)
– Concurrent Mark & Sweep GC (concurrent low-pause collector)
– G1 GC (region-based, low-pause collector with concurrent marking)
Garbage Collector
• But all of these GCs still have stop-the-world pauses
• Pause time tends to grow with the size of the heap
• The result is application unresponsiveness
– A pain when dealing with tight SLAs
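To get a feeling for how much time a given workload spends in garbage collection, the JVM's standard management beans can be queried. The sketch below is only an illustration: the class name `GcTimeSketch` and the allocation loop are made up for this example, and the reported value is the accumulated collection time per collector, which approximates pause time but can also include concurrent phases.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcTimeSketch {

    // Sums the accumulated collection time reported by every collector, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // Allocate many short-lived arrays and retain a few, to give the GC some work.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            byte[] block = new byte[1024];
            if (i % 100 == 0) {
                retained.add(block);
            }
        }
        long spent = totalGcMillis() - before;
        System.out.println("GC time during loop: " + spent + " ms, retained " + retained.size() + " blocks");
    }
}
```

Running the same loop with different `-Xmx` values and collectors is one way to observe the effect described on this slide.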
Cache On-Heap vs. Off-Heap
• On-heap
– Objects tend to be promoted into tenured memory
– GC storm effect when the cache is refreshed
– No serialization overhead (objects are cached by reference)
On-Heap vs. Off-Heap
• Off-heap
– Object payload no longer affects the GC
– Serialization/deserialization overhead
• Fortunately, a lot of work has been done on serialization
(Protobuf, Avro, Thrift, msgpack, BSON, ...)
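The trade-off can be made concrete with a small sketch: the payload is copied into a direct ByteBuffer on write and rebuilt as a fresh heap object on every read. The `User` type and the hand-rolled binary layout are assumptions made for this illustration only; a real cache would plug in one of the serializers listed above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class OffHeapSerializationSketch {

    // Hypothetical value type, used only for this illustration.
    static final class User {
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Serialize: the fields are copied into off-heap memory (id, name length, name bytes).
    static ByteBuffer store(User user) {
        byte[] name = user.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(Integer.BYTES * 2 + name.length);
        buffer.putInt(user.id);
        buffer.putInt(name.length);
        buffer.put(name);
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    // Deserialize: a new on-heap object is built on every read.
    static User load(ByteBuffer stored) {
        ByteBuffer view = stored.duplicate(); // independent position, shared content
        int id = view.getInt();
        byte[] name = new byte[view.getInt()];
        view.get(name);
        return new User(id, new String(name, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        ByteBuffer stored = store(new User(42, "Benoit"));
        User copy = load(stored);
        System.out.println(copy.id + " " + copy.name + " direct=" + stored.isDirect());
    }
}
```

The GC only sees the small ByteBuffer wrapper object; the payload bytes live outside the heap, at the price of the copy in `store` and `load`.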
Apache Direct Memory
Apache Direct Memory is a multi-layered
cache implementation featuring
off-heap memory storage, which enables
caching of Java objects without
degrading JVM performance.
→ An open source alternative to
Terracotta BigMemory.
Apache Direct Memory
• Apache Software Foundation Incubator project
• Entered the Incubator in fall 2011
• 12 developers at the moment, 10+ contributors
• I joined on 1st January 2012
– the happy outcome of my hacky Christmas holiday :)
• Disclaimer: under heavy development
– I rewrote most of the memory allocation service
– APIs are subject to change, and bugs remain to be found
Design & Principles
• ByteBuffer.allocateDirect is the
foundation of the cache
• ByteBuffers are allocated in big chunks
and then split for internal use
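As a minimal sketch of this principle, one large direct buffer can be carved into independent windows with the standard `duplicate`, `position`, `limit` and `slice` calls. The class name and the sizes are chosen for illustration and are not taken from the Direct Memory code base.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class ChunkSlicingSketch {

    // Allocates one big off-heap chunk and splits it into equally sized slices.
    static List<ByteBuffer> split(int chunkSize, int sliceSize) {
        ByteBuffer chunk = ByteBuffer.allocateDirect(chunkSize);
        List<ByteBuffer> slices = new ArrayList<>();
        for (int offset = 0; offset + sliceSize <= chunkSize; offset += sliceSize) {
            ByteBuffer window = chunk.duplicate(); // shares memory, own position and limit
            window.position(offset);
            window.limit(offset + sliceSize);
            slices.add(window.slice()); // view restricted to [offset, offset + sliceSize)
        }
        return slices;
    }

    public static void main(String[] args) {
        List<ByteBuffer> slices = split(1 << 20, 4096); // 1 MiB chunk, 4 KiB slices
        slices.get(1).put(0, (byte) 1); // writes stay inside the slice they target
        System.out.println(slices.size() + " slices, slice 0 untouched: "
                + (slices.get(0).get(0) == 0));
    }
}
```

Only one call to `allocateDirect` is made, which matters because direct allocation is considerably more expensive than creating views on existing memory.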
Design & Principles
• Built in layers:
– CachingService
• Serializes objects (pluggable)
– MemoryManagerService
• Computes access statistics
– ByteBufferAllocatorService
• Ultimately deals with the ByteBuffers
ByteBuffers Allocation
Two different allocation strategies
• Merging ByteBuffers allocation
– No memory wasted
– Creation is free
– Suffers from fragmentation
– Needs synchronization at allocation and
deallocation
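The properties listed for the merging strategy can be seen in a small first-fit allocator. This is a sketch under assumptions, not the Direct Memory implementation: the class name, the offset-based API and the first-fit policy are all chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class MergingAllocatorSketch {
    private final ByteBuffer chunk;
    // Free ranges keyed by start offset; the value is the range length.
    private final TreeMap<Integer, Integer> free = new TreeMap<>();
    // Allocated ranges keyed by start offset; the value is the range length.
    private final Map<Integer, Integer> used = new HashMap<>();

    MergingAllocatorSketch(int capacity) {
        chunk = ByteBuffer.allocateDirect(capacity);
        free.put(0, capacity); // creation is cheap: a single free range
    }

    // First fit: returns the offset of the allocated range, or -1 if no range is big enough.
    synchronized int allocate(int size) {
        Integer found = null;
        for (Map.Entry<Integer, Integer> range : free.entrySet()) {
            if (range.getValue() >= size) {
                found = range.getKey();
                break;
            }
        }
        if (found == null) {
            return -1;
        }
        int remaining = free.remove(found) - size;
        if (remaining > 0) {
            free.put(found + size, remaining); // exact fit: no bytes are wasted
        }
        used.put(found, size);
        return found;
    }

    // Releases a range and merges it with adjacent free ranges.
    synchronized void free(int offset) {
        Integer size = used.remove(offset);
        if (size == null) {
            throw new IllegalArgumentException("offset " + offset + " is not allocated");
        }
        int start = offset;
        int length = size;
        Map.Entry<Integer, Integer> before = free.lowerEntry(offset);
        if (before != null && before.getKey() + before.getValue() == offset) {
            free.remove(before.getKey());
            start = before.getKey();
            length += before.getValue();
        }
        Integer after = free.remove(offset + size);
        if (after != null) {
            length += after;
        }
        free.put(start, length);
    }

    // Returns a view on an allocated range.
    synchronized ByteBuffer buffer(int offset) {
        Integer size = used.get(offset);
        if (size == null) {
            throw new IllegalArgumentException("offset " + offset + " is not allocated");
        }
        ByteBuffer window = chunk.duplicate();
        window.position(offset);
        window.limit(offset + size);
        return window.slice();
    }

    synchronized int freeRanges() {
        return free.size();
    }

    public static void main(String[] args) {
        MergingAllocatorSketch allocator = new MergingAllocatorSketch(300);
        int a = allocator.allocate(100);
        int b = allocator.allocate(100);
        int c = allocator.allocate(100);
        allocator.free(a);
        allocator.free(c);
        // 200 bytes are free, but in two separate ranges: this is fragmentation.
        System.out.println("allocate(150) while fragmented: " + allocator.allocate(150));
        allocator.free(b); // the three ranges merge back into one
        System.out.println("allocate(150) after merge: " + allocator.allocate(150));
    }
}
```

Every method is `synchronized` because allocation and deallocation both rewrite the shared free list, which is the synchronization cost mentioned on the slide.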
ByteBuffers Allocation
• Fixed-size ByteBuffers allocation
– Linux kernel SLAB-style allocation
• Select a set of fixed sizes
• Split bigger buffers (1 MB+) into slabs of those sizes
– Allocation is really fast, with good concurrency
• All structures are pre-instantiated
– Creation (or increasing the buffer's size) has a cost
• 1 GB split into 128-byte slabs means 8M+ buffers created
– Does not suffer from fragmentation
– Wastes memory if the selected sizes do not match the object sizes
• Works really well with HDFS, where all blocks are of the same size
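A reduced sketch of the slab idea follows, with a single slab size instead of a set of sizes to keep it short. The class name and the lock-free free list are assumptions for illustration; they are not the project's actual allocator.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

final class SlabAllocatorSketch {
    private final int slabSize;
    // All slabs are created up front; allocation is just a poll on a concurrent queue.
    private final ConcurrentLinkedQueue<ByteBuffer> freeSlabs = new ConcurrentLinkedQueue<>();

    SlabAllocatorSketch(int chunkSize, int slabSize) {
        this.slabSize = slabSize;
        ByteBuffer chunk = ByteBuffer.allocateDirect(chunkSize);
        for (int offset = 0; offset + slabSize <= chunkSize; offset += slabSize) {
            ByteBuffer window = chunk.duplicate();
            window.position(offset);
            window.limit(offset + slabSize);
            freeSlabs.add(window.slice()); // this loop is the creation cost
        }
    }

    // Returns a slab limited to the requested size, or null if none fits or none is left.
    ByteBuffer allocate(int size) {
        if (size > slabSize) {
            return null;
        }
        ByteBuffer slab = freeSlabs.poll();
        if (slab != null) {
            slab.clear();
            slab.limit(size); // the remaining slabSize - size bytes are wasted
        }
        return slab;
    }

    // Returns a slab to the pool; no merging is ever needed.
    void free(ByteBuffer slab) {
        freeSlabs.add(slab);
    }

    int available() {
        return freeSlabs.size();
    }

    // Number of slab objects needed to cover totalBytes.
    static long slabCount(long totalBytes, long slabSize) {
        return totalBytes / slabSize;
    }

    public static void main(String[] args) {
        SlabAllocatorSketch allocator = new SlabAllocatorSketch(1 << 20, 128);
        ByteBuffer slab = allocator.allocate(100);
        System.out.println("wasted bytes in this slab: " + (slab.capacity() - slab.limit()));
        allocator.free(slab);
        System.out.println("slabs for 1 GiB at 128 bytes: " + slabCount(1L << 30, 128));
    }
}
```

The last line of `main` reproduces the arithmetic from the slide: 2^30 / 2^7 = 2^23 = 8,388,608 buffers, hence the "8M+" figure and the noticeable creation cost.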
Use case 1: Multi-layered
cache
• Idea: the most used objects are cached on-heap,
the rest off-heap, and may overflow to disk.
• Sounds like BigMemory.
• See
net.sf.ehcache.store.offheap.OffHeapStore
• Currently we inject DM into Ehcache the same way
BigMemory does. Ouch ;)
• A comparison still needs to be done
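The layering idea can be sketched without Ehcache: a small LRU map on the heap whose evicted entries are serialized into direct buffers, and promoted back when they are read again. Everything here (class name, String-only values, no disk tier) is a simplification made for this illustration and does not reflect the Ehcache integration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class TwoTierCacheSketch {
    // Second tier: the map lives on the heap, the payloads live off-heap.
    private final Map<String, ByteBuffer> offHeap = new HashMap<>();
    // First tier: access-ordered LRU map holding plain references.
    private final LinkedHashMap<String, String> onHeap;

    TwoTierCacheSketch(final int maxOnHeapEntries) {
        onHeap = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > maxOnHeapEntries) {
                    demote(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    // Serializes the least recently used entry into a direct buffer.
    private void demote(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip();
        offHeap.put(key, buffer);
    }

    synchronized void put(String key, String value) {
        offHeap.remove(key); // avoid keeping a stale off-heap copy
        onHeap.put(key, value);
    }

    synchronized String get(String key) {
        String value = onHeap.get(key);
        if (value != null) {
            return value;
        }
        ByteBuffer buffer = offHeap.remove(key);
        if (buffer == null) {
            return null;
        }
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        value = new String(bytes, StandardCharsets.UTF_8);
        onHeap.put(key, value); // promote; this may demote another entry
        return value;
    }

    synchronized boolean isOnHeap(String key) {
        return onHeap.containsKey(key);
    }

    synchronized boolean isOffHeap(String key) {
        return offHeap.containsKey(key);
    }

    public static void main(String[] args) {
        TwoTierCacheSketch cache = new TwoTierCacheSketch(2);
        cache.put("a", "alpha");
        cache.put("b", "beta");
        cache.put("c", "gamma"); // "a" is the eldest entry and moves off-heap
        System.out.println("a off-heap: " + cache.isOffHeap("a"));
        System.out.println("get(a) = " + cache.get("a") + ", a on-heap: " + cache.isOnHeap("a"));
    }
}
```

A production version would also have to release or recycle the direct buffers explicitly, for instance through an allocator like the ones described earlier, instead of allocating one per demoted entry.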
Use case 2: Off-Heap Output
Stream
• Idea: read the Twitter firehose stream without
filling the precious heap memory
– Otherwise an OOM will lead to unpredictable behavior elsewhere in the
application
• From your socket, write directly off-heap using
OutputStream style
– allocate a fixed-size temporary buffer of your choice
• Read from this stream
– InputAndOutputStream parent class that holds both
OutputStream and InputStream instances
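A minimal sketch of the stream pair, assuming a single fixed-capacity direct buffer and one writer followed by one reader. The class `OffHeapStreamSketch` is hypothetical and only mirrors the idea of one parent object holding both streams; it is not the InputAndOutputStream class mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class OffHeapStreamSketch {
    private final ByteBuffer buffer;

    OffHeapStreamSketch(int capacity) {
        buffer = ByteBuffer.allocateDirect(capacity);
    }

    // Stream that appends bytes to the off-heap buffer.
    OutputStream output() {
        return new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                if (!buffer.hasRemaining()) {
                    throw new IOException("off-heap buffer is full");
                }
                buffer.put((byte) b);
            }

            @Override
            public void write(byte[] src, int off, int len) throws IOException {
                if (len > buffer.remaining()) {
                    throw new IOException("off-heap buffer is full");
                }
                buffer.put(src, off, len);
            }
        };
    }

    // Stream over everything written so far; call it once writing is finished.
    InputStream input() {
        final ByteBuffer view = buffer.duplicate();
        view.flip(); // limit = bytes written, position = 0
        return new InputStream() {
            @Override
            public int read() {
                return view.hasRemaining() ? (view.get() & 0xFF) : -1;
            }

            @Override
            public int read(byte[] dst, int off, int len) {
                if (len == 0) {
                    return 0;
                }
                if (!view.hasRemaining()) {
                    return -1;
                }
                int count = Math.min(len, view.remaining());
                view.get(dst, off, count);
                return count;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        OffHeapStreamSketch streams = new OffHeapStreamSketch(1024);
        byte[] chunk = new byte[64]; // small, fixed-size temporary heap buffer
        OutputStream out = streams.output();
        for (int i = 0; i < 4; i++) {
            out.write(chunk, 0, chunk.length); // e.g. bytes just read from a socket
        }
        int total = 0;
        InputStream in = streams.input();
        for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
            total += n;
        }
        System.out.println("bytes read back: " + total);
    }
}
```

Only the small temporary array ever lives on the heap; a full buffer surfaces as an `IOException` in the writer instead of an OutOfMemoryError somewhere else in the application.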
Use case 3: Standalone
cache server
• Idea: replace Memcached :)
– But with a native, plain REST API
• DM has all the building blocks to implement
such a server; it is worth trying
• See the server submodule
Next Steps
• JSR 107
• Real benchmarks
• Builder patterns
• Integration with more libs (Spring, Guice, …)
• Implementations with the DM lib (Cassandra (wip), Lucene,
Tomcat, …)
• Cache resizing
• Management and monitoring
• ...
• https://issues.apache.org/jira/browse/DIRECTMEMORY
Questions?
• Thanks for your attention