5. Distributed Hash Tables (DHT)
Source: Wikipedia - http://commons.wikimedia.org/wiki/File:DHT_en.svg
6. • Decentralized Hash Table functionality
• Interface
• put(K,V)
• get(K) -> V
• Nodes can fail, join and leave
• The system has to scale
Distributed Hash Tables (DHT)
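In Java, the interface on this slide amounts to little more than a map contract; a minimal sketch (the type and method names are illustrative, not from any particular library):

// Minimal DHT interface sketch; names are illustrative.
public interface DistributedHashTable<K, V> {
    void put(K key, V value);  // store a value under a key somewhere in the system
    V get(K key);              // locate and return the value stored under a key
}

The hard part is everything the signature hides: which node holds each key, and how that mapping survives nodes failing, joining, and leaving.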
7. • Flooding across N nodes
• put() – store on any node: O(1)
• get() – send the query to all nodes: O(N)
• Full replication across N nodes
• put() – store on all nodes: O(N)
• get() – check any node: O(1)
Simple solutions
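A sketch of both naive strategies against the interface above, assuming a hypothetical Node type with local put/get operations; the point is the asymmetry of the costs:

// Both naive strategies; Node is a stand-in for a remote peer.
import java.util.List;

interface Node<K, V> {
    void putLocal(K key, V value);
    V getLocal(K key);  // null if this node does not hold the key
}

class FloodingDHT<K, V> implements DistributedHashTable<K, V> {
    private final List<Node<K, V>> nodes;
    FloodingDHT(List<Node<K, V>> nodes) { this.nodes = nodes; }

    public void put(K key, V value) {
        nodes.get(0).putLocal(key, value);          // any one node: O(1)
    }
    public V get(K key) {
        for (Node<K, V> n : nodes) {                // query every node: O(N)
            V v = n.getLocal(key);
            if (v != null) return v;
        }
        return null;
    }
}

class ReplicatingDHT<K, V> implements DistributedHashTable<K, V> {
    private final List<Node<K, V>> nodes;
    ReplicatingDHT(List<Node<K, V>> nodes) { this.nodes = nodes; }

    public void put(K key, V value) {
        for (Node<K, V> n : nodes) n.putLocal(key, value);  // all nodes: O(N)
    }
    public V get(K key) {
        return nodes.get(0).getLocal(key);          // any one node: O(1)
    }
}

Either way one of the two operations costs O(N), which is exactly what structured DHTs such as Chord avoid.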
15. Chord: Peer-to-peer Lookup Protocol
• Load Balance – distributed hash function, spreading
keys evenly over nodes
• Decentralization – fully distributed, no single point of failure
• Scalability – lookup cost grows logarithmically with the number of nodes, so large systems are feasible
• Availability – automatically adjusts its internal tables
to ensure the node responsible for a key is always
found
• Flexible naming – key-space is flat (flexibility in how
to map names to keys)
16. Chord – Lookup O(N)
Source: Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
17. Chord – Lookup O(logN)
Source: Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
• K = 6, identifier space (0, 2^6 − 1)
• finger[i] = first node that succeeds (N + 2^(i−1)) mod 2^K, where 1 ≤ i ≤ K
• Successor/Predecessor – the next/previous node on circle
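A compact sketch of the finger-table construction for this K = 6 example, with the live ring modeled as a sorted set of node IDs (the real protocol fills these entries via remote lookups):

// Finger-table sketch for a K-bit Chord ring; the ring is modeled locally.
import java.util.TreeSet;

class ChordFingers {
    static final int K = 6;             // identifier bits
    static final int SPACE = 1 << K;    // 2^K = 64 positions on the circle

    // First live node at or after id, wrapping around the circle.
    static int successor(TreeSet<Integer> ring, int id) {
        Integer s = ring.ceiling(id);
        return (s != null) ? s : ring.first();
    }

    // finger[i] = first node that succeeds (n + 2^(i-1)) mod 2^K, 1 <= i <= K
    static int[] fingers(TreeSet<Integer> ring, int n) {
        int[] finger = new int[K + 1];  // index 0 unused, matching the paper
        for (int i = 1; i <= K; i++) {
            int start = (n + (1 << (i - 1))) % SPACE;
            finger[i] = successor(ring, start);
        }
        return finger;
    }
}

Because finger[i] jumps roughly 2^(i−1) positions ahead, each hop at least halves the remaining distance to the target key, which is where the O(log N) lookup comes from.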
18. Chord – Node Join
Source: Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications
Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan
• Node 26 joins the system between nodes 21 and 32.
• (a) Initial state: node 21 points to node 32;
• (b) node 26 finds its successor (i.e., node 32) and points to it;
• (c) node 26 copies all keys less than 26 from node 32;
• (d) the stabilize procedure updates the successor of node 21
to node 26.
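The same join, as a Java-flavored sketch of the paper's join/stabilize/notify pseudocode (remote calls and key transfer are elided; inOpenInterval tests membership in a circular interval):

// Sketch of Chord's join and stabilization; ChordNode stands in for a remote peer.
class ChordNode {
    int id;
    ChordNode successor, predecessor;

    // (a)-(b): the joining node asks any known node for its successor.
    void join(ChordNode known) {
        predecessor = null;
        successor = known.findSuccessor(id);    // node 26 finds node 32
        // (c): copy the keys it is now responsible for from the successor (elided)
    }

    // (d): runs periodically on every node; repairs successor/predecessor pointers.
    void stabilize() {
        ChordNode x = successor.predecessor;
        if (x != null && inOpenInterval(x.id, id, successor.id))
            successor = x;                      // node 21 adopts node 26 as successor
        successor.notifyOf(this);
    }

    void notifyOf(ChordNode candidate) {
        if (predecessor == null || inOpenInterval(candidate.id, predecessor.id, id))
            predecessor = candidate;
    }

    // True if x lies strictly between a and b going clockwise on the circle.
    static boolean inOpenInterval(int x, int a, int b) {
        return (a < b) ? (x > a && x < b) : (x > a || x < b);
    }

    ChordNode findSuccessor(int key) { return this; /* finger-table lookup, elided */ }
}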
19. • CAN (Hypercube), Chord (Ring), Pastry (Tree+Ring),
Tapestry (Tree+Ring), Viceroy, Kademlia, Skipnet,
Symphony (Ring), Koorde, Apocrypha, Land,
Bamboo, ORDI …
The world of DHTs …
21. Where do we store data?
One size does not fit all...
23. Infinispan – History
• 2002 – JBoss App Server needed a clustered solution for
HTTP and EJB session state replication in HA clusters.
JGroups (an open source group communication suite) had a
replicated map demo, which was expanded into a tree data
structure, with eviction and JTA transactions added.
• 2003 – this was moved to JBoss AS code base
• 2005 – JBoss Cache was extracted and became a standalone
project
… JBoss Cache evolved into Infinispan, with core parts redesigned
• 2009 – JBoss Cache 3.2 and Infinispan 4.0.0.ALPHA1 were
released
• 2015 - 7.2.0.Alpha1
• Check the Infinispan RoadMap for more details
28. Infinispan Clustering and Consistent Hashing
• JGroups Views
• Each node has a unique address
• View changes when nodes join, leave
• Keys are hashed using MurmurHash3
algorithm
• Hash Space is divided into segments
• Key → Segment → Owners
• Primary and Backup Owners
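An illustrative sketch of the key → segment → owners chain (this is not Infinispan's actual implementation; Object.hashCode stands in for MurmurHash3 and the owner table is given up front):

// Illustrative key -> segment -> owners mapping.
import java.util.List;

class SegmentMapper {
    private final int numSegments;             // the hash space is split into segments
    private final List<List<String>> owners;   // owners.get(s) = [primary, backups...]

    SegmentMapper(int numSegments, List<List<String>> owners) {
        this.numSegments = numSegments;
        this.owners = owners;
    }

    int segmentOf(Object key) {
        int h = key.hashCode();                // Infinispan uses MurmurHash3 here
        return Math.floorMod(h, numSegments);
    }

    String primaryOwner(Object key) {
        return owners.get(segmentOf(key)).get(0);
    }

    List<String> backupOwners(Object key) {
        List<String> o = owners.get(segmentOf(key));
        return o.subList(1, o.size());
    }
}

Because ownership is assigned per segment rather than per key, a view change only remaps whole segments between nodes.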
29. Does it scale?
• 320 nodes, 3000 caches, 20 TB RAM
• Largest cluster formed: 1000 nodes
38. If multiple nodes fail…
• CAP Theorem to the rescue:
• Formulated by Eric Brewer in 1998
• C - Consistency
• A - High Availability
• P - Tolerance to Network Partitions
• Only 2 can be satisfied at the same time:
• Consistency + Availability: The Ideal World where
network partitions do not exist
• Partitioning + Availability: Data might be different
between partitions
• Partitioning + Consistency: Do not corrupt data!
39. Infinispan Partition Handling Strategies
• In the presence of network partitions
• Prefer availability (partition handling DISABLED)
• Prefer consistency (partition handling ENABLED)
• Split Detection with partition handling ENABLED:
• Ensure stable topology
• Nodes lost > numOwners OR no simple majority
• Check segment ownership
• Mark partition as Available / Degraded
• Send PartitionStatusChangedEvent to listeners
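Programmatically, preferring consistency is a single switch on the cache configuration; a sketch against the 7.x API shown in these slides (the cache name is illustrative; verify the builder methods against your version):

// Enable partition handling, i.e. prefer consistency over availability.
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class PartitionHandlingConfig {
    public static void main(String[] args) {
        ConfigurationBuilder cache = new ConfigurationBuilder();
        cache.clustering()
             .cacheMode(CacheMode.DIST_SYNC)         // distributed, synchronous
             .partitionHandling().enabled(true);     // minority partitions go Degraded

        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());
        manager.defineConfiguration("consistent-cache", cache.build());
    }
}

With the flag left at its default (disabled), every partition stays Available and may accept conflicting writes.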
42. Merging Split Clusters
• Split Clusters see each other again
• Step 1: Ensure stable topology
• Step 2: Automatic, based on partition state
• 1 Available -> attempt merge
• All Degraded -> attempt merge
• Step 3: Manual
• Data was lost
• Custom listener on Merge
• Application decides
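A sketch of the custom-listener hook from Step 3, using Infinispan's cache-manager notification annotations (the reconciliation logic itself is the application's problem and only hinted at here):

// React to a cluster merge; register with cacheManager.addListener(new MergeHandler()).
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachemanagerlistener.annotation.Merged;
import org.infinispan.notifications.cachemanagerlistener.event.MergeEvent;

@Listener
public class MergeHandler {
    @Merged
    public void onMerge(MergeEvent event) {
        // The subgroups that just rejoined into one view; if both sides stayed
        // Available, their data may have diverged and must be reconciled here.
        System.out.println("Merged subgroups: " + event.getSubgroupsMerged());
    }
}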
43. Querying Infinispan
• Apache Lucene Index
• Native Query API (Query DSL)
• Hibernate Search and Apache Lucene to index and
search
• Native Map/Reduce
• Index-less
• Distributed Execution Framework
• Hadoop Integration (WIP)
• Run existing map/reduce jobs on Infinispan data
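For the Query DSL path, a sketch against the 7.x API (Person and its fields are stand-in domain types; the DSL also runs index-less over plain cached values):

// Native Query DSL sketch: find all cached Person entries with age >= 18.
import java.util.List;
import org.infinispan.Cache;
import org.infinispan.query.Search;
import org.infinispan.query.dsl.Query;
import org.infinispan.query.dsl.QueryFactory;

class Person {            // stand-in domain class for the example
    String name;
    int age;
}

public class QueryExample {
    public static List<?> adults(Cache<String, Person> cache) {
        QueryFactory qf = Search.getQueryFactory(cache);
        Query q = qf.from(Person.class)
                    .having("age").gte(18)
                    .toBuilder().build();
        return q.list();
    }
}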