Java Collections
The Force Awakens
Darth @RaoulUK
Darth @RichardWarburto
Collection Problems
Java Episode 8 & 9
Persistent & Immutable Collections
HashMaps
Collection bugs
1. Element access (off-by-one error, ArrayIndexOutOfBoundsException)
2. Concurrent modification
3. Check-then-Act
Scenario 1
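List<String> jedis = new ArrayList<>(asList("Luke", "yoda"));
for (String jedi : jedis) {
    if (Character.isLowerCase(jedi.charAt(0))) {
        // Structural modification during for-each iteration:
        // the next iterator step throws ConcurrentModificationException
        jedis.remove(jedi);
    }
}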
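Scenario 1 fails because the list is structurally modified while the for-each loop's iterator is still active. A minimal sketch of two ways to avoid that, using only java.util; the class and method names (SafeRemoval, withRemoveIf, withIterator) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SafeRemoval {
    // Collection.removeIf (Java 8+) performs the filtering inside the collection itself.
    static List<String> withRemoveIf(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.removeIf(name -> Character.isLowerCase(name.charAt(0)));
        return copy;
    }

    // Pre-Java 8 alternative: remove through the iterator that is doing the traversal.
    static List<String> withIterator(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        for (Iterator<String> it = copy.iterator(); it.hasNext(); ) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> jedis = Arrays.asList("Luke", "yoda");
        System.out.println(withRemoveIf(jedis));
        System.out.println(withIterator(jedis));
    }
}
```

Both variants work on a copy so the caller's list is left untouched; that is a choice of this sketch, not something the slides prescribe.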
Scenario 2
Map<String, BigDecimal> movieViews = new HashMap<>();
BigDecimal views = movieViews.get(MOVIE);
if (views != null) {
    movieViews.put(MOVIE, views.add(BigDecimal.ONE));
}
Check: movieViews.get(MOVIE), views != null
Then
Act: movieViews.put(MOVIE, ...)
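Scenario 2 is harmless on a single thread, but if the map is shared, another thread can change the entry between the check and the act. One possible fix, sketched under the assumption that the map can be a ConcurrentHashMap, is to let the map do both steps in one atomic call. The class name, method names and the value of MOVIE are hypothetical:

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;

class AtomicViews {
    static final String MOVIE = "The Force Awakens"; // hypothetical key

    // Same semantics as Scenario 2 (only update an existing entry), but atomic.
    static void incrementIfPresent(ConcurrentHashMap<String, BigDecimal> views, String movie) {
        views.computeIfPresent(movie, (key, current) -> current.add(BigDecimal.ONE));
    }

    // Variant that also creates the entry when it is missing.
    static void incrementOrInsert(ConcurrentHashMap<String, BigDecimal> views, String movie) {
        views.merge(movie, BigDecimal.ONE, BigDecimal::add);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, BigDecimal> movieViews = new ConcurrentHashMap<>();
        incrementIfPresent(movieViews, MOVIE); // no entry yet, nothing happens
        incrementOrInsert(movieViews, MOVIE);  // inserts the first view
        incrementIfPresent(movieViews, MOVIE); // increments the existing entry
        System.out.println(movieViews);
    }
}
```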
Reducing scope for bugs
● ~280 bugs in 28 projects including Cassandra, Lucene
● ~80% check-then-act bugs discovered are put-if-absent
● Library designers can help by updating APIs as new idioms emerge
● Different data structures can provide alternatives by restricting reads & updates to reduce scope for bugs
CHECK-THEN-ACT Misuse of Java Concurrent Collections
http://dig.cs.illinois.edu/papers/checkThenAct.pdf
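Since put-if-absent is the most common check-then-act shape reported above, here is a hedged sketch of the racy idiom next to an atomic replacement based on ConcurrentHashMap.computeIfAbsent. The CastRegistry class and its methods are invented for this example and do not come from the paper:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class CastRegistry {
    private final ConcurrentHashMap<String, List<String>> castByMovie = new ConcurrentHashMap<>();

    // Racy: two threads can both observe null, and one freshly created list
    // (with its actor) is then overwritten by the other thread's put.
    void addRacy(String movie, String actor) {
        List<String> cast = castByMovie.get(movie);
        if (cast == null) {
            cast = new CopyOnWriteArrayList<>();
            castByMovie.put(movie, cast);
        }
        cast.add(actor);
    }

    // Atomic: the map creates the list at most once per key.
    void addAtomic(String movie, String actor) {
        castByMovie.computeIfAbsent(movie, key -> new CopyOnWriteArrayList<>()).add(actor);
    }

    List<String> cast(String movie) {
        return castByMovie.get(movie);
    }

    public static void main(String[] args) {
        CastRegistry registry = new CastRegistry();
        registry.addAtomic("Episode VII", "Daisy Ridley");
        registry.addAtomic("Episode VII", "John Boyega");
        System.out.println(registry.cast("Episode VII"));
    }
}
```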
Java 8 Lazy Collection Initialization
Many allocated HashMaps and ArrayLists are never written to, e.g. when used for the Null Object pattern
Java 8 adds lazy initialization for the default initialization case
Typically 1-2% reduction in memory consumption
http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0
Java 9 API updates
Collection factory methods
● Non-goal to provide persistent immutable collections
● http://openjdk.java.net/jeps/269
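java.util.Optional
● ifPresentOrElse(), or(), stream()
● getWhenPresent() was a proposed name at the time; the no-argument method eventually shipped as orElseThrow() in Java 10
● Optional.get() was proposed for deprecation; the deprecation did not ship in the released JDK
java.util.stream.Stream
● takeWhile, dropWhile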
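A short sketch of the Java 9 additions listed above, assuming a JDK 9 or later runtime. The class and method names are made up for illustration; the JDK calls used (List.of, Optional.ifPresentOrElse, Optional.or, Stream.takeWhile, Stream.dropWhile) are the real ones:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class Java9Apis {
    // takeWhile keeps the longest prefix matching the predicate.
    static List<Integer> leadingSmall(List<Integer> xs) {
        return xs.stream().takeWhile(x -> x < 3).collect(Collectors.toList());
    }

    // dropWhile discards that same prefix and keeps the rest.
    static List<Integer> afterLeadingSmall(List<Integer> xs) {
        return xs.stream().dropWhile(x -> x < 3).collect(Collectors.toList());
    }

    // ifPresentOrElse handles both the present and the empty case.
    static String describe(Optional<String> jedi) {
        StringBuilder out = new StringBuilder();
        jedi.ifPresentOrElse(name -> out.append("found ").append(name), () -> out.append("none"));
        return out.toString();
    }

    // or() supplies a fallback Optional lazily.
    static Optional<String> orYoda(Optional<String> jedi) {
        return jedi.or(() -> Optional.of("Yoda"));
    }

    // Lists from the factory methods reject modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("Darth Vader");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(leadingSmall(List.of(1, 2, 5, 1)));
        System.out.println(afterLeadingSmall(List.of(1, 2, 5, 1)));
        System.out.println(describe(Optional.of("Luke")));
        System.out.println(orYoda(Optional.empty()));
        System.out.println(rejectsAdd(List.of("Luke")));
    }
}
```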
Categorising Collections
● Mutable: Unsynchronized, Concurrent
● Unmodifiable View
● Immutable: Non-Persistent, Persistent
[Diagram legend: highlighted categories are available in the Core Library]
Mutable
● Popular friends include ArrayList, HashMap, TreeSet
● Memory-efficient modification operations
● State can be accidentally modified
● Can be thread-safe, but requires careful design
Unmodifiable
List<String> jedis = new ArrayList<>();
jedis.add("Luke Skywalker");
List<String> cantChangeMe = Collections.unmodifiableList(jedis);
// java.lang.UnsupportedOperationException
//cantChangeMe.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker]
jedis.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
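The unmodifiable wrapper above is only a view, so changes to the backing list show through. If a stable snapshot is wanted, one option is to copy first and wrap the copy. A minimal sketch; the Snapshot class and snapshotOf method are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Snapshot {
    // Copy, then wrap: nobody else holds a reference to the private copy,
    // so later changes to 'source' cannot be observed through the result.
    static <T> List<T> snapshotOf(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> jedis = new ArrayList<>();
        jedis.add("Luke Skywalker");
        List<String> frozen = snapshotOf(jedis);
        jedis.add("Darth Vader");
        System.out.println(frozen); // the snapshot still holds only the first element
    }
}
```

The copy costs O(n) time and memory, which is the trade-off the following slides on immutable and persistent collections address.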
Immutable & Non-persistent
● No updates
● Flexibility to convert the source into a more efficient representation
● No locking needed in a concurrent context
● Satisfies covariant subtyping requirements
● Can be copied with modifications to create a new version (can be expensive)
Immutable vs. Mutable hierarchy
ListIterable
● ImmutableList (extends ListIterable): + MutableList<T> toList()
● MutableList (extends ListIterable and java.util.List): + ImmutableList<T> toImmutable()
Eclipse Collections (formerly GS Collections) https://projects.eclipse.org/projects/technology.collections/
Immutable and Persistent
● Changing the source produces a new version of the collection
● The resulting collection shares structure with the source to avoid full copying on updates
Persistent List (aka Cons)
public final class Cons<T> implements ConsList<T> {
    private final T head;
    private final ConsList<T> tail;

    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Prepending allocates one node and shares the existing list as the tail
    @Override
    public ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }
}
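The slide shows only the Cons class. To make it runnable, the sketch below assumes a shape for the ConsList interface and adds an empty-list class, Nil; the methods isEmpty, head, tail, empty and toJavaList are assumptions of this example rather than part of the talk's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

interface ConsList<T> {
    ConsList<T> add(T e);   // prepend, returning a new list
    boolean isEmpty();
    T head();
    ConsList<T> tail();

    static <T> ConsList<T> empty() {
        return new Nil<>();
    }

    // Copies the elements into a java.util.List, head first.
    default List<T> toJavaList() {
        List<T> out = new ArrayList<>();
        for (ConsList<T> cur = this; !cur.isEmpty(); cur = cur.tail()) {
            out.add(cur.head());
        }
        return out;
    }
}

final class Nil<T> implements ConsList<T> {
    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
    @Override public boolean isEmpty() { return true; }
    @Override public T head() { throw new NoSuchElementException("head of empty list"); }
    @Override public ConsList<T> tail() { throw new NoSuchElementException("tail of empty list"); }
}

final class Cons<T> implements ConsList<T> {
    private final T head;
    private final ConsList<T> tail;

    Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
    @Override public boolean isEmpty() { return false; }
    @Override public T head() { return head; }
    @Override public ConsList<T> tail() { return tail; }
}

class ConsDemo {
    public static void main(String[] args) {
        ConsList<String> base = ConsList.<String>empty().add("C").add("B");
        ConsList<String> withA = base.add("A");
        ConsList<String> withD = base.add("D");
        // Both new lists reuse 'base' as their tail; nothing was copied.
        System.out.println(withA.toJavaList() + " " + withD.toJavaList());
        System.out.println(withA.tail() == withD.tail());
    }
}
```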
Updating Persistent List
Before: A B C X Y Z
After: A B D (new copies), followed by the shared X Y Z
Blue nodes indicate new copies
Purple nodes indicate nodes we wish to update
Concatenating Two Persistent Lists
Before: A B C and X Y Z
After: A B C (copied), followed by the shared X Y Z
- Poor locality due to pointer chasing
- Copying of nodes
Persistent List
● Structural sharing: no need to copy full structure
● Poor locality due to pointer chasing
● Copying becomes more expensive with larger lists
● Poor Random Access and thus Data Decomposition
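The update and concatenation cases can be sketched with a bare singly linked node type. This is an illustration under simplifying assumptions (null marks the empty list, recursion is used for brevity), and PNode with its methods is a made-up name:

```java
import java.util.ArrayList;
import java.util.List;

final class PNode<T> {
    final T value;
    final PNode<T> next; // null marks the end of the list

    PNode(T value, PNode<T> next) {
        this.value = value;
        this.next = next;
    }

    @SafeVarargs
    static <T> PNode<T> of(T... values) {
        PNode<T> list = null;
        for (int i = values.length - 1; i >= 0; i--) {
            list = new PNode<>(values[i], list);
        }
        return list;
    }

    // Replaces the element at 'index': nodes 0..index are copied, the rest is shared.
    static <T> PNode<T> set(PNode<T> list, int index, T value) {
        if (list == null) {
            throw new IndexOutOfBoundsException("index outside the list");
        }
        if (index == 0) {
            return new PNode<>(value, list.next);
        }
        return new PNode<>(list.value, set(list.next, index - 1, value));
    }

    // Concatenation copies every node of 'first' and shares all of 'second'.
    static <T> PNode<T> concat(PNode<T> first, PNode<T> second) {
        if (first == null) {
            return second;
        }
        return new PNode<>(first.value, concat(first.next, second));
    }

    // Walks 'index' links from the head: O(index), hence the poor random access.
    static <T> PNode<T> nodeAt(PNode<T> list, int index) {
        PNode<T> cur = list;
        for (int i = 0; i < index; i++) {
            cur = cur.next;
        }
        return cur;
    }

    static <T> List<T> toJavaList(PNode<T> list) {
        List<T> out = new ArrayList<>();
        for (PNode<T> cur = list; cur != null; cur = cur.next) {
            out.add(cur.value);
        }
        return out;
    }

    public static void main(String[] args) {
        PNode<String> before = of("A", "B", "C", "X", "Y", "Z");
        PNode<String> after = set(before, 2, "D");
        System.out.println(toJavaList(before) + " -> " + toJavaList(after));
        System.out.println(nodeAt(before, 3) == nodeAt(after, 3)); // X Y Z is shared
    }
}
```

The cost of set grows with the index and the cost of concat with the length of the first list, which is the "copying becomes more expensive with larger lists" point above.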
Updating Persistent Binary Tree
[Diagrams: the tree before and after an update]
Persistent Array
How do we get the immutability benefits with the performance of mutable variants?
Trie
1. Branch factor not limited to binary
2. Leaf nodes contain actual values
3. Picking the right branch is done by using parts of the key as a lookup
[Diagram: a trie under a root node, with edges labelled a, b, c, e, f and leaf values 10, 45, 20]
Persistent Array (Bitmapped Vector Trie)
[Diagram: a 32-way trie. Level 1 (root) and Level 2 nodes each have slots 0, 1, ..., 31; the leaf nodes hold the elements]
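A rough sketch of how a 32-way vector trie uses 5-bit slices of the index to pick a branch at each level, and how an update copies only the path to the changed leaf. To stay short it assumes a fixed depth of two levels (1024 slots) and no growth; real implementations such as Clojure's PersistentVector are more elaborate. TwoLevelVector and its methods are invented names:

```java
import java.util.Arrays;

final class TwoLevelVector {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS;        // 32 slots per node
    static final int MASK = WIDTH - 1;         // 0b11111
    static final int CAPACITY = WIDTH * WIDTH; // 1024 elements with two levels

    private final Object[][] root; // root[i] is a leaf array of WIDTH elements

    private TwoLevelVector(Object[][] root) {
        this.root = root;
    }

    static TwoLevelVector filledWith(Object value) {
        Object[][] root = new Object[WIDTH][];
        for (int i = 0; i < WIDTH; i++) {
            root[i] = new Object[WIDTH];
            Arrays.fill(root[i], value);
        }
        return new TwoLevelVector(root);
    }

    Object get(int index) {
        checkIndex(index);
        Object[] leaf = root[(index >>> BITS) & MASK]; // high 5 bits pick the leaf
        return leaf[index & MASK];                     // low 5 bits pick the slot
    }

    // Path copying: one root array and one leaf array are copied (2 x 32 slots),
    // the other 31 leaves are shared with the previous version.
    TwoLevelVector set(int index, Object value) {
        checkIndex(index);
        int leafIndex = (index >>> BITS) & MASK;
        Object[][] newRoot = root.clone();
        Object[] newLeaf = root[leafIndex].clone();
        newLeaf[index & MASK] = value;
        newRoot[leafIndex] = newLeaf;
        return new TwoLevelVector(newRoot);
    }

    boolean sharesLeafWith(TwoLevelVector other, int leafIndex) {
        return root[leafIndex] == other.root[leafIndex];
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= CAPACITY) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }

    public static void main(String[] args) {
        TwoLevelVector v1 = filledWith("x");
        TwoLevelVector v2 = v1.set(45, "y");
        System.out.println(v1.get(45) + " " + v2.get(45));
        System.out.println(v2.sharesLeafWith(v1, 0) + " " + v2.sharesLeafWith(v1, 1));
    }
}
```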
Trade-offs
● Large branching factor facilitates iteration but hinders updates
● Small branching factor facilitates updates but hinders traversal
Java Persistent Collections
- Not available as part of Java Core Library
- Existing projects include
- PCollections: https://github.com/hrldcpr/pcollections
- Port of Clojure DS: https://github.com/krukow/clj-ds
- Port of Scala DS: https://github.com/andrewoma/dexx
Memory usage survey
10,000,000 elements, heap < 32GB
int[] : 40MB
Integer[]: 160MB
ArrayList<Integer>: 215MB
PersistentVector<Integer>: 214MB (Clojure-DS)
Vector<Integer>: 206MB (Dexx, port of Scala-DS)
Data collected using Java Object Layout: http://openjdk.java.net/projects/code-tools/jol/
Primitive specialised collections
● Collections often hold boxed representations of primitive values
● Java 8 introduced IntStream, LongStream, DoubleStream and primitive specialised functional interfaces
● Other libraries, e.g. Agrona, Koloboke and Eclipse Collections, provide primitive specialised collections today.
● Project Valhalla is investigating primitive specialised generics
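A small illustration of the boxed versus primitive-specialised stream APIs mentioned above. Both methods compute the same sum; the first stays on int values, the second goes through Integer objects. The class and method names are made up for this sketch, and it makes no claim about measured performance:

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

class PrimitiveVsBoxed {
    // IntStream works on int values directly, with no Integer objects involved.
    static int sumPrimitive(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    // Stream<Integer> boxes every element and unboxes it again inside Integer::sum.
    static int sumBoxed(int n) {
        return Stream.iterate(1, i -> i + 1).limit(n).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(100));
        System.out.println(sumBoxed(100));
    }
}
```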
Takeaways
● Immutable collections reduce the scope for bugs
● Always a compromise between programming safety and performance
● Performance of persistent data structures is improving
HashMaps Basics
Han Solo: hash = 72309
Chewbacca: hash = 72309
HashMaps
● Chaining: a separate data structure for collision lookups
● Probing: store inline and have a probing sequence
Aliases: Palpatine vs Darth Sidious
HashMaps
● Chaining: aka Closed Addressing, aka Open Hashing
● Probing: aka Open Addressing, aka Closed Hashing
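To make "store inline and have a probing sequence" concrete, here is a deliberately small linear-probing map from int to int. It assumes a fixed capacity, no resizing and no removal, none of which a production map could get away with; ProbingIntMap is a hypothetical class, not taken from any library:

```java
final class ProbingIntMap {
    private final int[] keys;
    private final int[] values;
    private final boolean[] used;

    ProbingIntMap(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    void put(int key, int value) {
        int slot = slotOf(key);
        if (slot < 0) {
            throw new IllegalStateException("table is full");
        }
        used[slot] = true;
        keys[slot] = key;
        values[slot] = value;
    }

    // Returns null when the key is absent.
    Integer get(int key) {
        int slot = slotOf(key);
        if (slot < 0 || !used[slot]) {
            return null;
        }
        return values[slot];
    }

    // Linear probing: start at hash % capacity and step forward one slot at a time.
    // Returns the slot holding 'key', else the first free slot on its probe
    // sequence, else -1 when the table is full and the key is absent.
    int slotOf(int key) {
        int start = Math.floorMod(Integer.hashCode(key), keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (!used[slot] || keys[slot] == key) {
                return slot;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ProbingIntMap map = new ProbingIntMap(8);
        map.put(1, 10);
        map.put(9, 90); // 9 % 8 == 1, so it collides with key 1 and moves one slot on
        System.out.println(map.slotOf(1) + " " + map.slotOf(9));
        System.out.println(map.get(1) + " " + map.get(9) + " " + map.get(17));
    }
}
```

Keys and values live in flat arrays with no per-entry node objects, which is where the lower memory consumption of probing maps tends to come from.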
HashMaps
● Chaining: Linked List Based, Tree Based (java.util.HashMap)
● Probing
Chaining Based HashMap
Historically maintained a linked list per bucket in the case of a collision
Problem: with high collision rates, HashMap lookup approaches O(N)
java.util.HashMap in Java 8
Starts by using a list to store colliding values.
Trees are used when there are over 8 elements in a bucket
Tree based nodes use about twice the memory
Makes the heavy collision lookup case O(log(N)) rather than O(N)
Relies on keys being Comparable
https://github.com/RichardWarburton/map-visualiser
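A sketch of the situation the tree optimisation targets: many distinct keys that all land in the same bucket. The key class below is hypothetical and deliberately returns a constant hashCode while implementing Comparable. Whether a given bucket is actually treeified is an internal detail that the public HashMap API does not expose, so the example only shows that such a map stays correct.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKey implements Comparable<CollidingKey> {
    private final int id;

    CollidingKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 42; // deliberately terrible: every key collides
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    // Comparable gives a tree-based bucket an ordering to search by.
    @Override
    public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }
}

class CollisionDemo {
    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        // Real-world collisions exist too: these two strings share a hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode());
        Map<CollidingKey, Integer> map = build(1000);
        System.out.println(map.size() + " " + map.get(new CollidingKey(999)));
    }
}
```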
So which HashMap is best?
Benchmarking is about building a mental model of the performance tradeoffs
Example Jar-Jar Benchmark
call get() on a single value for a map of size 1
No model of the different factors that affect things!
Benchmarking HashMaps
Load Factor
Nonlinear key access
Successful vs Failed get()
Hash Collisions
Comparable vs Incomparable keys
Different Keys and Values
Cost of hashCode/equals
Tree Optimization - 60% Collisions
Tree Optimization - 10% Collisions
Probing vs Chaining
Probing maps usually have lower memory consumption
Small maps: probing never has long clusters, can be up to 91% faster.
In large maps with high collision rates, probing scales poorly and can be significantly slower.
Takeaways
There’s no clear-cut “winner”.
JDK implementations try to minimise the worst case.
Linear probing requires a good hashCode() distribution; hashmaps often “precondition” their hashes.
IdentityHashMap has low memory consumption and is fast; use it when reference-equality (==) key semantics are what you want.
3rd party libraries offer probing HashMaps, e.g. Koloboke & Eclipse Collections.
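As an illustration of "preconditioning", the sketch below folds the high 16 bits of a hash code into the low 16 bits before the bucket index is taken. OpenJDK 8's java.util.HashMap applies a step of this shape; the class and method names here are invented, and the power-of-two table size is an assumption of the example.

```java
class HashSpreading {
    // XOR the high half into the low half so high bits influence the bucket index.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table: only the low bits are used.
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int h1 = 1 << 16; // these two hashes differ only in their high bits
        int h2 = 2 << 16;
        // Without spreading both fall into bucket 0 of a 16-slot table.
        System.out.println(bucket(h1, 16) + " " + bucket(h2, 16));
        // After spreading they land in different buckets.
        System.out.println(bucket(spread(h1), 16) + " " + bucket(spread(h2), 16));
    }
}
```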
Conclusions
Interface Popularity
List 1576210
Set 980763
Map 803171
Queue 62024
Deque 3464
SortedSet 9121
NavigableSet 1735
SortedMap 8677
NavigableMap 1484
Implementation Popularity
ArrayList 225029
LinkedList 26850
ArrayDeque 1086
HashSet 68940
TreeSet 10108
EnumSet 10512
HashMap 137610
TreeMap 7734
WeakHashMap 3473
IdentityHashMap 2443
EnumMap 1904
Evolution can be interesting ...
Java 1.2 Java 10?
Any Questions?
www.pluralsight.com/author/richard-warburton
www.cambridgecoding.com
www.iteratrlearning.com
Further reading
Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays
https://infoscience.epfl.ch/record/64410/files/techlists.pdf
Smaller Footprint for Java Collections
http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf
Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections
http://michael.steindorfer.name/publications/oopsla15.pdf
RRB-Trees: Efficient Immutable Vectors
https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf
Doug Lea’s Analysis of the HashMap implementation tradeoffs
http://www.mail-archive.com/core-libs-dev@openjdk.java.net/msg02147.html
Java Specialists HashMap article
http://www.javaspecialists.eu/archive/Issue235.html
Sample and Benchmark Code
https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens
Debian code search used for popularity
https://codesearch.debian.net/

 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Collections forceawakens

  • 1. Java Collections The Force Awakens Darth @RaoulUK Darth @RichardWarburto
  • 2.
  • 3. Collection Problems Java Episode 8 & 9 Persistent & Immutable Collections HashMaps
  • 4. Collection bugs 1. Element access (off-by-one errors, ArrayIndexOutOfBoundsException) 2. Concurrent modification 3. Check-then-Act
  • 5. Scenario 1 List<String> jedis = new ArrayList<>(asList("Luke", "yoda")); for (String jedi: jedis) { if (Character.isLowerCase(jedi.charAt(0))) { jedis.remove(jedi); } }
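The loop in Scenario 1 removes from the list while a for-each iterator is walking it, which ArrayList reports as a ConcurrentModificationException. A minimal sketch of two common ways around it follows; the class and method names (`SafeRemoval`, `withRemoveIf`, `withIterator`) are made up for illustration and are not part of the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class SafeRemoval {

    // Option 1: let the collection do the removal in one call (Java 8+).
    static List<String> withRemoveIf(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.removeIf(name -> Character.isLowerCase(name.charAt(0)));
        return copy;
    }

    // Option 2: remove through the iterator, which keeps the iterator's
    // view of the list in sync with the modification.
    static List<String> withIterator(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        for (Iterator<String> it = copy.iterator(); it.hasNext(); ) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> jedis = Arrays.asList("Luke", "yoda");
        System.out.println(withRemoveIf(jedis));
        System.out.println(withIterator(jedis));
    }
}
```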
  • 6. Scenario 2 Map<String, BigDecimal> movieViews = new HashMap<>(); BigDecimal views = movieViews.get(MOVIE); if (views != null) { movieViews.put(MOVIE, views.add(BigDecimal.ONE)); } Check: movieViews.get and views != null. Then act: movieViews.put.
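One way to close the gap between the check and the act in Scenario 2 is to hand both steps to the map as a single operation. The sketch below assumes a ConcurrentHashMap, whose `computeIfPresent` and `merge` are performed atomically; the `ViewCounter` class and its method names are illustrative, not from the talk.

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical wrapper, for illustration only.
final class ViewCounter {
    private final ConcurrentMap<String, BigDecimal> movieViews = new ConcurrentHashMap<>();

    // Atomic "increment only if the movie is already present",
    // mirroring the null check in the scenario. Returns null if absent.
    BigDecimal incrementIfPresent(String movie) {
        return movieViews.computeIfPresent(movie, (key, views) -> views.add(BigDecimal.ONE));
    }

    // Atomic "start at 1 or increment", a common variant of the same idiom.
    BigDecimal recordView(String movie) {
        return movieViews.merge(movie, BigDecimal.ONE, BigDecimal::add);
    }

    BigDecimal views(String movie) {
        return movieViews.get(movie);
    }
}
```

With a plain HashMap the same methods exist but are not thread-safe, so the choice of map implementation matters as much as the choice of method.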
  • 7. Reducing scope for bugs ● ~280 bugs in 28 projects including Cassandra, Lucene ● ~80% check-then-act bugs discovered are put-if-absent ● Library designers can help by updating APIs as new idioms emerge ● Different data structures can provide alternatives by restricting reads & updates to reduce scope for bugs CHECK-THEN-ACT Misuse of Java Concurrent Collections http://dig.cs.illinois.edu/papers/checkThenAct.pdf
  • 8. Collection Problems Java Episode 8 & 9 Persistent & Immutable Collections HashMaps
  • 9. Java 8 Lazy Collection Initialization Many allocated HashMaps and ArrayLists are never written to, e.g. with the Null Object pattern. Java 8 adds lazy initialization for the default initialization case. Typically a 1-2% reduction in memory consumption. http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0
  • 10.
  • 11. Java 9 API updates Collection factory methods ● Non-goal to provide persistent immutable collections ● http://openjdk.java.net/jeps/269 java.util.Optional ● ifPresentOrElse(), or(), stream(), getWhenPresent() ● Optional.get() becomes deprecated (getWhenPresent() and the deprecation of Optional.get() were proposals at the time; neither is in the released Java 9 API) java.util.stream.Stream ● takeWhile, dropWhile
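A short sketch of the Java 9 additions named on this slide, limited to the calls that exist in the released API: `List.of`, `Stream.takeWhile`, `Stream.dropWhile` and `Optional.ifPresentOrElse`. It needs Java 9 or later; the class and method names are invented for the example.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical demo class, for illustration only.
final class Java9Apis {

    // takeWhile keeps the leading run of elements that match the predicate.
    static List<Integer> leadingSmall(List<Integer> values) {
        return values.stream().takeWhile(v -> v < 3).collect(Collectors.toList());
    }

    // dropWhile discards that leading run and keeps everything after it.
    static List<Integer> afterLeadingSmall(List<Integer> values) {
        return values.stream().dropWhile(v -> v < 3).collect(Collectors.toList());
    }

    // ifPresentOrElse handles both the present and the empty case.
    static String describe(Optional<String> jedi) {
        StringBuilder out = new StringBuilder();
        jedi.ifPresentOrElse(
                name -> out.append("found ").append(name),
                () -> out.append("nobody"));
        return out.toString();
    }

    // Lists from the factory methods reject structural changes.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("Darth Vader");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(leadingSmall(List.of(1, 2, 3, 1)));
        System.out.println(afterLeadingSmall(List.of(1, 2, 3, 1)));
        System.out.println(describe(Optional.of("Luke")));
        System.out.println(rejectsAdd(List.of("Luke", "Yoda")));
    }
}
```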
  • 12. Collection Problems Java Episode 8 & 9 Persistent & Immutable Collections HashMaps
  • 13. Categorising Collections Mutable Immutable Non-Persistent Persistent Unsynchronized Concurrent Unmodifiable View Available in Core Library
  • 14. Mutable ● Popular friends include ArrayList, HashMap, TreeSet ● Memory-efficient modification operations ● State can be accidentally modified ● Can be thread-safe, but requires careful design
  • 15. Unmodifiable List<String> jedis = new ArrayList<>(); jedis.add("Luke Skywalker"); List<String> cantChangeMe = Collections.unmodifiableList(jedis); // java.lang.UnsupportedOperationException //cantChangeMe.add("Darth Vader"); System.out.println(cantChangeMe); // [Luke Skywalker] jedis.add("Darth Vader"); System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
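The slide shows that an unmodifiable list is only a view: changes to the backing list still show through. If a fixed snapshot is wanted instead, one option is to copy before wrapping. This is a sketch with made-up names (`SnapshotVsView`, `view`, `snapshot`), not code from the talk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class SnapshotVsView {

    // Unmodifiable view: later changes to the source remain visible.
    static List<String> view(List<String> source) {
        return Collections.unmodifiableList(source);
    }

    // Unmodifiable snapshot: the private copy is never exposed,
    // so later changes to the source are not visible.
    static List<String> snapshot(List<String> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> jedis = new ArrayList<>();
        jedis.add("Luke Skywalker");
        List<String> view = view(jedis);
        List<String> snapshot = snapshot(jedis);
        jedis.add("Darth Vader");
        System.out.println(view.size());
        System.out.println(snapshot.size());
    }
}
```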
  • 16.
  • 17. Immutable & Non-persistent ● No updates ● Flexibility to convert source in a more efficient representation ● No locking in context of concurrency ● Satisfies co-variant subtyping requirements ● Can be copied with modifications to create a new version (can be expensive)
  • 18. Immutable vs. Mutable hierarchy ImmutableList MutableList + ImmutableList<T> toImmutable() java.util.List + MutableList<T> toList() Eclipse Collections (formerly GS Collections) https://projects.eclipse.org/projects/technology.collections/ ListIterable
  • 19. Immutable and Persistent ● Changing the source produces a new version of the collection ● The resulting collection shares structure with the source to avoid full copying on updates
  • 20. Persistent List (aka Cons) public final class Cons<T> implements ConsList<T> { private final T head; private final ConsList<T> tail; public Cons(T head, ConsList<T> tail) { this.head = head; this.tail = tail; } @Override public ConsList<T> add(T e) { return new Cons<>(e, this); } }
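The slide only shows the `Cons` cell. A runnable sketch needs a few more pieces, so the version below fills them in as assumptions: the shape of the `ConsList` interface, an empty-list class called `Nil`, and the accessors `head()`, `tail()` and `isEmpty()` are all invented here. Everything is nested in one class to keep the example self-contained.

```java
import java.util.NoSuchElementException;

// Hypothetical completion of the slide's Cons list, for illustration only.
final class ConsDemo {

    interface ConsList<T> {
        ConsList<T> add(T e);   // prepends and returns a new list
        boolean isEmpty();
        T head();
        ConsList<T> tail();
    }

    // The empty list (assumed name).
    static final class Nil<T> implements ConsList<T> {
        @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
        @Override public boolean isEmpty() { return true; }
        @Override public T head() { throw new NoSuchElementException("empty list"); }
        @Override public ConsList<T> tail() { throw new NoSuchElementException("empty list"); }
    }

    static final class Cons<T> implements ConsList<T> {
        private final T head;
        private final ConsList<T> tail;

        Cons(T head, ConsList<T> tail) {
            this.head = head;
            this.tail = tail;
        }

        @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
        @Override public boolean isEmpty() { return false; }
        @Override public T head() { return head; }
        @Override public ConsList<T> tail() { return tail; }
    }

    static <T> ConsList<T> empty() {
        return new Nil<>();
    }

    // Two lists built from the same base share that base as their tail.
    static boolean sharesTail() {
        ConsList<String> base = ConsDemo.<String>empty().add("Z").add("Y").add("X");
        ConsList<String> a = base.add("A");
        ConsList<String> b = base.add("B");
        return a.tail() == base && b.tail() == base;
    }

    public static void main(String[] args) {
        System.out.println(sharesTail()); // prints true
    }
}
```

Adding an element allocates exactly one new cell; the existing list is reused untouched, which is what makes the structure persistent.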
  • 21. Updating Persistent List A B C X Y Z Before
  • 22. Updating Persistent List Before: A B C X Y Z. After: A B D (new copies), with X Y Z shared. Blue nodes indicate new copies. Purple nodes indicate the nodes we wish to update.
  • 23. Concatenating Two Persistent Lists A B C X Y Z Before
  • 24. Concatenating Two Persistent Lists Before: A B C and X Y Z. After: A B C (copies). - Poor locality due to pointer chasing - Copying of nodes
  • 25. Persistent List ● Structural sharing: no need to copy full structure ● Poor locality due to pointer chasing ● Copying becomes more expensive with larger lists ● Poor Random Access and thus Data Decomposition
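The cost profile on this slide can be seen in a small sketch of an indexed update on a singly linked persistent list: every node in front of the updated position is copied, and everything behind it is shared. The `Node` type and the `of`, `set` and `nodeAt` helpers are hypothetical names used only for this example.

```java
// Hypothetical persistent-list update, for illustration only.
final class PersistentUpdate {

    static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    // Builds a list from the given values; null represents the empty list.
    @SafeVarargs
    static <T> Node<T> of(T... values) {
        Node<T> list = null;
        for (int i = values.length - 1; i >= 0; i--) {
            list = new Node<>(values[i], list);
        }
        return list;
    }

    // Returns a new list with the element at index replaced.
    // Nodes before index are copied; nodes after index are shared.
    static <T> Node<T> set(Node<T> list, int index, T value) {
        if (list == null) {
            throw new IndexOutOfBoundsException("index is outside the list");
        }
        if (index == 0) {
            return new Node<>(value, list.next);
        }
        return new Node<>(list.value, set(list.next, index - 1, value));
    }

    // Walks to the node at index: O(index) pointer chasing, no random access.
    static <T> Node<T> nodeAt(Node<T> list, int index) {
        Node<T> current = list;
        for (int i = 0; i < index; i++) {
            current = current.next;
        }
        return current;
    }
}
```

Updating index 2 of a six-element list copies two nodes and creates one new one, so the cost of an update grows with how deep in the list it lands.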
  • 28. Persistent Array How do we get the immutability benefits with performance of mutable variants?
  • 29. Trie 1. Branch factor not limited to binary 2. Leaf nodes contain actual values 3. Picking the right branch is done by using parts of the key as a lookup [diagram: trie with a root, letter-labelled branches and numeric leaf values]
  • 30. Persistent Array (Bitmapped Vector Trie) [diagram: Level 1 (root) and Level 2 nodes, each with slots 0, 1, ... 31, above the leaf nodes]
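As a rough illustration of how a bitmapped vector trie picks a branch, the sketch below uses a fixed two-level layout with 32 slots per node, so an index is split into two 5-bit chunks. It is a simplification under stated assumptions (fixed capacity of 1024 ints, no growth, invented names), not the layout of any particular library.

```java
// Hypothetical two-level vector trie, for illustration only.
final class VectorTrieLookup {
    static final int BITS = 5;            // 2^5 = 32 slots per node
    static final int WIDTH = 1 << BITS;
    static final int MASK = WIDTH - 1;

    // Builds root -> leaves for up to 32 * 32 = 1024 values.
    static int[][] build(int[] values) {
        if (values.length > WIDTH * WIDTH) {
            throw new IllegalArgumentException("sketch holds at most 1024 values");
        }
        int[][] root = new int[WIDTH][WIDTH];
        for (int i = 0; i < values.length; i++) {
            root[(i >>> BITS) & MASK][i & MASK] = values[i];
        }
        return root;
    }

    // The high 5 bits select the leaf, the low 5 bits select the slot in it.
    static int get(int[][] root, int index) {
        return root[(index >>> BITS) & MASK][index & MASK];
    }

    // Persistent update: copy the root and the one affected leaf;
    // the other 31 leaves are shared with the previous version.
    static int[][] set(int[][] root, int index, int value) {
        int leafIndex = (index >>> BITS) & MASK;
        int[][] newRoot = root.clone();           // shallow copy of 32 references
        int[] newLeaf = root[leafIndex].clone();  // copy of 32 ints
        newLeaf[index & MASK] = value;
        newRoot[leafIndex] = newLeaf;
        return newRoot;
    }
}
```

An update touches one node per level regardless of how many elements the structure holds, which is where the trade-off around branching factor comes from.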
  • 31. Trade-offs ● Large branching factor facilitates iteration but hinders updates ● Small branching factor facilitates updates but hinders traversal
  • 32. Java Persistent Collections - Not available as part of Java Core Library - Existing projects include - PCollections: https://github.com/hrldcpr/pcollections - Port of Clojure DS: https://github.com/krukow/clj-ds - Port of Scala DS: https://github.com/andrewoma/dexx
  • 33. Memory usage survey 10,000,000 elements, heap < 32GB int[]: 40MB Integer[]: 160MB ArrayList<Integer>: 215MB PersistentVector<Integer>: 214MB (Clojure-DS) Vector<Integer>: 206MB (Dexx, port of Scala-DS) Data collected using Java Object Layout: http://openjdk.java.net/projects/code-tools/jol/
  • 34. Primitive specialised collections ● Collections often hold boxed representations of primitive values ● Java 8 introduced IntStream, LongStream, DoubleStream and primitive specialised functional interfaces ● Other libraries, e.g. Agrona, Koloboke and Eclipse Collections, provide primitive specialised collections today ● Valhalla investigates primitive specialised generics
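To make the boxed versus primitive distinction concrete, here is a small sketch using only the Java 8 stream types mentioned on the slide. The class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class, for illustration only.
final class PrimitiveStreams {

    // Works on Integer objects: every element is a boxed value on the heap.
    static int sumBoxed(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // Works directly on ints: no Integer objects are involved.
    static int sumPrimitive(int[] values) {
        return IntStream.of(values).sum();
    }

    // Bridges from the boxed world to the primitive one.
    static int[] unbox(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        List<Integer> boxed = Arrays.asList(1, 2, 3, 4);
        System.out.println(sumBoxed(boxed));
        System.out.println(sumPrimitive(unbox(boxed)));
    }
}
```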
  • 35. Takeaways ● Immutable collections reduce the scope for bugs ● Always a compromise between programming safety and performance ● Performance of persistent data structures is improving
  • 36. Collection Problems Java Episode 8 & 9 Persistent & Immutable Collections HashMaps
  • 37.
  • 38. HashMaps Basics ... Han Solo hash = 72309 Chewbacca hash = 72309
  • 39. HashMaps: Chaining vs Probing Chaining: a separate data structure for collision lookups. Probing: store inline and have a probing sequence.
  • 40. Aliases: Palpatine vs Darth Sidious
  • 41. HashMaps: Chaining vs Probing Chaining: aka Closed Addressing, aka Open Hashing. Probing: aka Open Addressing, aka Closed Hashing.
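For contrast with java.util.HashMap's chaining, here is a deliberately minimal sketch of a probing (open addressing) map: entries live inline in two parallel arrays, and a collision simply moves on to the next slot. It assumes a fixed capacity, non-null keys, no removal and no resizing, and the class name `LinearProbingMap` is made up for the example.

```java
// Hypothetical linear-probing map, for illustration only.
final class LinearProbingMap<K, V> {
    private final Object[] keys;
    private final Object[] values;
    private int size;

    LinearProbingMap(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be at least 2");
        }
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Returns the slot holding key, or the first empty slot on its probe
    // sequence. One slot is always kept empty, so the loop terminates.
    private int slotFor(Object key) {
        int i = Math.floorMod(key.hashCode(), keys.length);
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length;   // collision: try the next slot inline
        }
        return i;
    }

    void put(K key, V value) {
        int i = slotFor(key);
        if (keys[i] == null) {
            if (size == keys.length - 1) {
                throw new IllegalStateException("map is full (sketch does not resize)");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        return (V) values[slotFor(key)];   // null when the key is absent
    }

    int size() {
        return size;
    }
}
```

The strings "Aa" and "BB" have the same hashCode, so putting both into this map exercises the probing path: the second key lands in the slot after the first.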
  • 43. java.util.HashMap Chaining-based HashMap. Historically maintained a linked list of entries in the case of a collision. Problem: with high collision rates, HashMap lookup approaches O(N).
  • 44. java.util.HashMap in Java 8 Starts by using a list to store colliding values. Trees are used when a bucket holds more than 8 elements. Tree-based nodes use about twice the memory. Makes the heavy-collision lookup case O(log(N)) rather than O(N). Relies on keys being Comparable. https://github.com/RichardWarburton/map-visualiser
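A sketch of the kind of key that exercises this path: every instance reports the same hash code, so all entries collide, and the key implements Comparable so that tree bins can order them. The `Key` class and `build` helper are hypothetical. The example only checks that lookups stay correct; it does not measure the O(log(N)) behaviour.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical worst-case keys, for illustration only.
final class CollidingKeys {

    static final class Key implements Comparable<Key> {
        final int id;

        Key(int id) {
            this.id = id;
        }

        // Deliberately terrible: every key lands in the same bucket.
        @Override public int hashCode() {
            return 42;
        }

        @Override public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).id == id;
        }

        // Gives HashMap an ordering to use once the bucket becomes a tree.
        @Override public int compareTo(Key other) {
            return Integer.compare(id, other.id);
        }
    }

    // Fills a map with n keys that all share one hash code.
    static Map<Key, Integer> build(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build(1_000);
        System.out.println(map.get(new Key(777)));
    }
}
```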
  • 45. So which HashMap is best?
  • 46. Benchmarking is about building a mental model of the performance tradeoffs
  • 47. Example Jar-Jar Benchmark call get() on a single value for a map of size 1 No model of the different factors that affect things!
  • 48. Benchmarking HashMaps: Load Factor; Nonlinear key access; Successful vs Failed get(); Hash Collisions; Comparable vs Incomparable keys; Different Keys and Values; Cost of hashCode/equals
  • 49. Tree Optimization - 60% Collisions
  • 50. Tree Optimization - 10% Collisions
  • 51. Probing vs Chaining Probing maps usually have lower memory consumption. Small maps: probing never has long clusters and can be up to 91% faster. In large maps with high collision rates, probing scales poorly and can be significantly slower.
  • 52. Takeaways There’s no clear-cut “winner”. JDK implementations try to minimise the worst case. Linear probing requires a good hashCode() distribution; hashmaps often “precondition” their hashes. IdentityHashMap has low memory consumption and is fast, use it! Third-party libraries offer probing HashMaps, e.g. Koloboke and Eclipse Collections.
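Two of these takeaways are easy to show in code. The `spread` method below reproduces the preconditioning step java.util.HashMap applies in Java 8 and later, folding the high 16 bits of a hash into the low 16. The second half is a caveat worth keeping in mind when reaching for IdentityHashMap: it compares keys by reference, not by equals(). The class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class, for illustration only.
final class HashTakeaways {

    // Same spreading step as java.util.HashMap (Java 8+): keys whose hashes
    // differ only in the high bits still end up in different low-bit buckets.
    static int spread(int hash) {
        return hash ^ (hash >>> 16);
    }

    // Number of entries after putting both keys into an IdentityHashMap,
    // which treats keys as equal only when they are the same object.
    static int identityEntries(String a, String b) {
        Map<String, Integer> map = new IdentityHashMap<>();
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    // Same experiment with HashMap, which uses equals() and hashCode().
    static int hashMapEntries(String a, String b) {
        Map<String, Integer> map = new HashMap<>();
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    public static void main(String[] args) {
        String first = new String("yoda");
        String second = new String("yoda");   // equal content, different object
        System.out.println(identityEntries(first, second));
        System.out.println(hashMapEntries(first, second));
    }
}
```

IdentityHashMap is therefore a fit for cases where object identity is the intended notion of equality, and not a drop-in replacement for HashMap.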
  • 54. Interface Popularity List: 1576210; Set: 980763; Map: 803171; Queue: 62024; Deque: 3464; SortedSet: 9121; NavigableSet: 1735; SortedMap: 8677; NavigableMap: 1484
  • 55. Implementation Popularity ArrayList: 225029; LinkedList: 26850; ArrayDeque: 1086; HashSet: 68940; TreeSet: 10108; EnumSet: 10512; HashMap: 137610; TreeMap: 7734; WeakHashMap: 3473; IdentityHashMap: 2443; EnumMap: 1904
  • 56. Evolution can be interesting ... Java 1.2 Java 10?
  • 57.
  • 59. Further reading Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays: https://infoscience.epfl.ch/record/64410/files/techlists.pdf ; Smaller Footprint for Java Collections: http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf ; Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections: http://michael.steindorfer.name/publications/oopsla15.pdf ; RRB-Trees: Efficient Immutable Vectors: https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf
  • 60. Further reading Doug Lea’s analysis of the HashMap implementation tradeoffs: http://www.mail-archive.com/core-libs-dev@openjdk.java.net/msg02147.html ; Java Specialists HashMap article: http://www.javaspecialists.eu/archive/Issue235.html ; Sample and benchmark code: https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens
  • 61. Further reading Debian code search, used for the popularity figures: https://codesearch.debian.net/