Java collections such as ArrayList and HashMap were discussed. Common problems with collections like concurrent modification exceptions were presented along with scenarios demonstrating check-then-act bugs. Immutable and persistent collections were described as alternatives that reduce bugs by restricting state changes. Tradeoffs between different collection implementations like memory usage and performance were examined.
7. Reducing scope for bugs
● ~280 bugs in 28 projects including Cassandra, Lucene
● ~80% of the check-then-act bugs discovered are put-if-absent
● Library designers can help by updating APIs as new idioms emerge
● Different data structures can provide alternatives by restricting reads &
updates to reduce scope for bugs
CHECK-THEN-ACT Misuse of Java Concurrent Collections
http://dig.cs.illinois.edu/papers/checkThenAct.pdf
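To make the put-if-absent bug concrete, here is a minimal sketch of the racy idiom next to an atomic alternative. The class and method names (PutIfAbsentSketch, racyCounter, safeCounter) are made up for illustration; only ConcurrentHashMap and its computeIfAbsent method come from the JDK.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not taken from the paper or the slides.
class PutIfAbsentSketch {
    // Racy: another thread can insert between containsKey() and put(),
    // so two threads may each store their own counter for the same key
    // and one thread's increments can be lost.
    static AtomicInteger racyCounter(ConcurrentMap<String, AtomicInteger> map, String key) {
        if (!map.containsKey(key)) {           // check
            map.put(key, new AtomicInteger()); // act
        }
        return map.get(key);
    }

    // Atomic: computeIfAbsent performs the check and the insert as one operation.
    static AtomicInteger safeCounter(ConcurrentMap<String, AtomicInteger> map, String key) {
        return map.computeIfAbsent(key, k -> new AtomicInteger());
    }

    public static void main(String[] args) {
        ConcurrentMap<String, AtomicInteger> hits = new ConcurrentHashMap<>();
        safeCounter(hits, "home").incrementAndGet();
        safeCounter(hits, "home").incrementAndGet();
        System.out.println(hits.get("home").get()); // prints 2
    }
}
```

Each individual call in racyCounter is thread-safe, yet the sequence is not, which is why this kind of bug survives in code that already uses concurrent collections.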
9. Java 8 Lazy Collection Initialization
Many allocated HashMaps and ArrayLists are never written to, e.g. with the
Null Object pattern
Java 8 adds Lazy Initialization for the default initialization case
Typically 1-2% reduction in memory consumption
http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0
14. Mutable
● Popular friends include ArrayList, HashMap, TreeSet
● Memory-efficient modification operations
● State can be accidentally modified
● Can be thread-safe, but requires careful design
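As a small illustration of accidental modification, the sketch below shows a class that hands out its internal ArrayList, next to a read-only view of the same list. The TeamSketch class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class used only to illustrate the bullet above.
class TeamSketch {
    private final List<String> members = new ArrayList<>();

    void join(String name) {
        members.add(name);
    }

    // Leaks the internal list: any caller can change this object's state.
    List<String> membersLeaky() {
        return members;
    }

    // Read-only view: mutating calls throw UnsupportedOperationException.
    List<String> membersReadOnly() {
        return Collections.unmodifiableList(members);
    }
}
```

A caller that runs `team.membersLeaky().clear()` silently empties the team, while the same call on `membersReadOnly()` fails fast.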
17. Immutable & Non-persistent
● No updates
● Flexibility to convert the source into a more efficient representation
● No locking needed in a concurrent context
● Satisfies covariant subtyping requirements
● Can be copied with modifications to create a new version (can be
expensive)
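The "copy with modifications" bullet can be sketched as follows. The helper name withAdded is an assumption, and the sketch sticks to Java 8 APIs; the point is that every update copies the whole source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating an immutable, non-persistent update.
class ImmutableCopySketch {
    // Returns a new unmodifiable list holding the source elements plus one more.
    // The full source is copied, so each "update" costs O(n) time and memory.
    static <T> List<T> withAdded(List<T> source, T element) {
        List<T> copy = new ArrayList<>(source.size() + 1);
        copy.addAll(source);
        copy.add(element);
        // The mutable copy never escapes, so the wrapper is effectively immutable.
        return Collections.unmodifiableList(copy);
    }
}
```

The source list is left untouched, and the returned version shares no structure with it.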
19. Immutable and Persistent
● Changing the source produces a new version of the collection
● The resulting collection shares structure with the source to avoid full
copying on updates
20. Persistent List (aka Cons)
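public final class Cons<T> implements ConsList<T> {
    private final T head;           // first element of the list
    private final ConsList<T> tail; // rest of the list, shared and never copied

    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Prepends e: the new node points at this list, which is left unchanged.
    @Override
    public ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }
}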
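The slide shows only the Cons class, so the following is a self-contained sketch of how it might fit together. The ConsList interface, the Empty class, the size() method and the ConsDemo driver are all assumptions added for illustration.

```java
// Hypothetical driver; placed first so the file can be launched directly.
class ConsDemo {
    public static void main(String[] args) {
        ConsList<String> base = new Empty<String>().add("c").add("b"); // b -> c
        ConsList<String> v1 = base.add("a"); // a -> b -> c
        ConsList<String> v2 = base.add("x"); // x -> b -> c
        // Both versions reuse the very same base nodes as their tail.
        System.out.println(((Cons<String>) v1).tail == ((Cons<String>) v2).tail); // true
        System.out.println(base.size()); // 2
    }
}

// Assumed shape of the interface; the slide does not show it.
interface ConsList<T> {
    ConsList<T> add(T e); // new list with e in front; the receiver is unchanged
    int size();
}

// Assumed empty-list terminator.
final class Empty<T> implements ConsList<T> {
    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
    @Override public int size() { return 0; }
}

final class Cons<T> implements ConsList<T> {
    final T head;
    final ConsList<T> tail;

    Cons(T head, ConsList<T> tail) { this.head = head; this.tail = tail; }

    @Override public ConsList<T> add(T e) { return new Cons<>(e, this); }
    @Override public int size() { return 1 + tail.size(); }
}
```

Adding to the front allocates exactly one node regardless of list length, which is the structural sharing the previous slide describes.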
24. Concatenating Two Persistent Lists
- Poor locality due to pointer chasing
- Copying of nodes
[Diagram: lists A B C and X Y Z before concatenation; copied nodes A B C after]
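A minimal sketch of why concatenation copies nodes: the first list's nodes cannot be relinked because they are immutable, so each one is rebuilt, while the second list is reused as is. The PNode class is hypothetical and uses null as the end-of-list marker to keep the example short.

```java
// Hypothetical singly linked immutable node; null marks the end of the list.
final class PNode<T> {
    final T head;
    final PNode<T> tail;

    PNode(T head, PNode<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Copies every node of `first` and links the last copy to `second`,
    // which is shared rather than copied. Cost is O(length of first).
    // Recursive for brevity, so very long lists would need an iterative version.
    static <T> PNode<T> concat(PNode<T> first, PNode<T> second) {
        if (first == null) {
            return second;
        }
        return new PNode<>(first.head, concat(first.tail, second));
    }
}
```

After `concat(abc, xyz)` the original A B C list still ends after C, and the result's fourth node is the very same object as the head of X Y Z.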
25. Persistent List
● Structural sharing: no need to copy full structure
● Poor locality due to pointer chasing
● Copying becomes more expensive with larger lists
● Poor random access, and thus poor data decomposition
28. Persistent Array
How do we get the immutability benefits with performance of mutable
variants?
29. Trie
1. Branch factor not limited to binary
2. Leaf nodes contain actual values
3. Picking the right branch is done by using parts of the key as a lookup
[Diagram: trie with a root node and child nodes labelled a, b, c, e, f]
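Point 3 can be sketched for an integer-indexed trie, where the "parts of the key" are 5-bit slices of the index and every node has 32 children. This is the general bit-partitioning idea used by Clojure-style persistent vectors, not code from any particular library; the class name, constants and the Object[] node layout are assumptions.

```java
// Hypothetical read-only lookup over a trie stored as nested Object[] nodes.
final class TrieLookupSketch {
    static final int BITS = 5;               // 2^5 = 32 children per node
    static final int MASK = (1 << BITS) - 1; // selects the low 5 bits

    // Walks from the root down to a leaf. `shift` is BITS * (height - 1),
    // the bit offset of the index slice consumed at the root level.
    static Object get(Object[] root, int shift, int index) {
        Object[] node = root;
        for (int level = shift; level > 0; level -= BITS) {
            // Use the slice of the index for this level to pick the branch.
            node = (Object[]) node[(index >>> level) & MASK];
        }
        // The lowest slice selects the value inside the leaf.
        return node[index & MASK];
    }
}
```

With a branching factor of 32, a two-level trie addresses 1,024 elements and a lookup touches only two arrays.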
31. Trade-offs
● Large branching factor facilitates iteration but hinders updates
● Small branching factor facilitates updates but hinders traversal
32. Java Persistent Collections
- Not available as part of Java Core Library
- Existing projects include
- PCollections: https://github.com/hrldcpr/pcollections
- Port of Clojure DS: https://github.com/krukow/clj-ds
- Port of Scala DS: https://github.com/andrewoma/dexx
33. Memory usage survey
10,000,000 elements, heap < 32GB
int[] : 40MB
Integer[]: 160MB
ArrayList<Integer>: 215MB
PersistentVector<Integer>: 214MB (Clojure-DS)
Vector<Integer>: 206MB (Dexx, port of Scala-DS)
Data collected using Java Object Layout:
http://openjdk.java.net/projects/code-tools/jol/
34. Primitive specialised collections
● Collections often hold boxed representations of primitive values
● Java 8 introduced IntStream, LongStream, DoubleStream and
primitive specialised functional interfaces
● Other libraries, e.g. Agrona, Koloboke and Eclipse-Collections, provide
primitive specialised collections today.
● Valhalla investigates primitive specialised generics
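A small sketch of the boxed versus primitive distinction using only the Java 8 stream types named above. The class and method names are made up for illustration.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical example contrasting Stream<Integer> with IntStream.
class PrimitiveStreamSketch {
    // Boxed: every element is an Integer object that is unboxed before adding.
    static long sumBoxed(int n) {
        return Stream.iterate(1, i -> i + 1)
                .limit(n)
                .mapToLong(Integer::longValue)
                .sum();
    }

    // Primitive: IntStream carries int values directly, with no boxing.
    static long sumPrimitive(int n) {
        return IntStream.rangeClosed(1, n).asLongStream().sum();
    }
}
```

Both methods return the same result; the difference is in allocation and memory traffic, which is what the primitive specialisations are meant to reduce.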
35. Takeaways
● Immutable collections reduce the scope for bugs
● Always a compromise between programming safety and performance
● Performance of persistent data structures is improving
44. java.util.HashMap in Java 8
Starts by using a linked list to store colliding entries.
A bin is converted to a tree when it holds more than 8 entries.
Tree-based nodes use about twice the memory.
Makes the heavy-collision lookup case O(log(N)) rather than O(N).
Relies on keys being Comparable.
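The heavy-collision case can be reproduced with a key type whose hashCode() is constant, so every entry lands in the same bin. The sketch below is hypothetical (CollidingKey and CollisionDemo are invented names); it shows the Comparable implementation that lets java.util.HashMap order keys inside a tree bin.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key whose hash code always collides.
final class CollidingKey implements Comparable<CollidingKey> {
    final int id;

    CollidingKey(int id) { this.id = id; }

    @Override public int hashCode() { return 42; } // deliberately terrible

    @Override public boolean equals(Object o) {
        return o instanceof CollidingKey && ((CollidingKey) o).id == id;
    }

    // Gives a tree bin an ordering to search by, instead of a linear scan.
    @Override public int compareTo(CollidingKey other) {
        return Integer.compare(id, other.id);
    }
}

class CollisionDemo {
    // Builds a map in which all n entries share a single bin.
    static Map<CollidingKey, Integer> build(int n) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new CollidingKey(i), i);
        }
        return map;
    }
}
```

The map behaves correctly either way; Comparable only affects how quickly a lookup finds its entry within the overloaded bin.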
https://github.com/RichardWarburton/map-visualiser
47. Example Jar-Jar Benchmark
Call get() on a single value for a map of size 1
No model of the different factors that affect things!
48. Benchmarking HashMaps
Load Factor
Nonlinear key access
Successful vs Failed get()
Hash Collisions
Comparable vs Incomparable keys
Different Keys and Values
Cost of hashCode/Equals
51. Probing vs Chaining
Probing Maps usually have lower memory consumption
Small maps: probing never builds long clusters and can be up to 91% faster.
In large maps with high collision rates, probing scales poorly and can be
significantly slower.
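For readers unfamiliar with probing, here is a minimal linear-probing map sketch for int keys and values. It is not how any of the libraries mentioned in this deck implement their maps; the class name, the reserved sentinel key, the 50% load limit and the hash mixing step are all assumptions chosen to keep the example short.

```java
import java.util.Arrays;

// Hypothetical open-addressing map: entries live directly in two flat arrays.
final class IntIntProbingMap {
    private static final int EMPTY = Integer.MIN_VALUE; // reserved, cannot be a key
    private int[] keys;
    private int[] values;
    private int size;

    // Capacity must be a power of two and at least 2.
    IntIntProbingMap(int capacityPowerOfTwo) {
        keys = new int[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        Arrays.fill(keys, EMPTY);
    }

    // Spreads high bits into the low bits used for the slot index.
    private static int mix(int h) { return h ^ (h >>> 16); }

    void put(int key, int value) {
        if (key == EMPTY) throw new IllegalArgumentException("reserved key");
        if ((size + 1) * 2 > keys.length) resize(); // keep the table at most half full
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        // Linear probing: step to the next slot until the key or a free slot is found.
        while (keys[i] != EMPTY && keys[i] != key) i = (i + 1) & mask;
        if (keys[i] == EMPTY) { keys[i] = key; size++; }
        values[i] = value;
    }

    int getOrDefault(int key, int fallback) {
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (keys[i] != EMPTY) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return fallback;
    }

    int size() { return size; }

    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        Arrays.fill(keys, EMPTY);
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != EMPTY) put(oldKeys[j], oldValues[j]);
        }
    }
}
```

There are no per-entry node objects, which is where the lower memory consumption comes from; the cost is that colliding keys form runs of occupied slots that every lookup in that region has to walk.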
52. Takeaways
There’s no clearcut “winner”.
JDK Implementations try to minimise worst case.
Linear probing requires a good hashCode() distribution, so hash maps often
“precondition” their hashes.
IdentityHashMap has low memory consumption and is fast; use it when comparing keys by reference (==) is acceptable!
3rd Party libraries offer probing HashMaps, eg Koloboke & Eclipse-Collections.