ContextSpace is working to develop and support an open source implementation of the Camunda core engine that persists all of its data to Cassandra. This development addresses ACID guarantees as well as approaches to lock management. ContextSpace plans to integrate this implementation with its own product offering so that data and events generated from its identity, security, roles, messaging and contextual user activities can be managed by Camunda-driven business processes.
2. CONTEXTSPACE, CAMUNDA AND CASSANDRA
• ContextSpace is a platform for executing secure digital customer
engagement on a very large scale.
• Digital engagement requires consistency, a.k.a. Business Process
Management.
• Digital engagement requires Big Identity, with many contextual
profiles per person and massive quantities of behavioral data
from customer interactions and mobile devices, hence Cassandra.
4. WHO IS CONTEXTSPACE DESIGNED FOR?
Very large communities of consumers, with many contextual
relationships, typically found in:
• Health Care
• Telecom
• Smart Cities
• Digital Media
• Retail
• Energy
• Service Providers
6. TYPICAL CONTEXTSPACE USE OF CAMUNDA
ContextSpace employs Camunda as a toolset to allow its customers
to develop consistent digital engagement business processes, with a
common underlying API access to all services and data.
We also create Camunda-based service processes to support highly
specialized industrial patterns that require:
• Specific conditional processing
• Industry data formats (such as HL7 FHIR)
• Multiple service escalation paths
• Long running processes
8. HIGH VELOCITY HEALTH OBSERVATION INGESTION
[Diagram labels: mHealth, Camunda Processes, Cassandra Persistence, Analytics]
9. WHY DOES CONTEXTSPACE USE CASSANDRA?
• Exceptional performance, linearly scaled
• No DBAs, no tuning, fault tolerant, lights out operation
• Inherently multi-data centre aware for active-active
distributed operations
• Simple - replaces more than a dozen heterogeneous
technologies that our solution would otherwise require
All ContextSpace data services are based on Cassandra,
enabling reliable operations with low operational costs.
10. OPEN SOURCE CAMUNDA ON CASSANDRA
• Began in Berlin as a “hackathon session” in July 2015
between Camunda and ContextSpace
• Amazing Camunda Team!
• Maintained by ContextSpace
• Purpose: extend scalability, availability and
distributed processing to Camunda operations
• Initial goal: support core BPM Engine functions,
including Job Executor
• Target production launch: Q4 2015
11. CASSANDRA STRENGTHS
• Great for stable or immutable data
• Writes are faster than reads (both are very fast)
• Highly granular, tunable consistency
• Inherent data retention management
• Very reliable
• All nodes identical, no SPOF, no “fail-over” required
• No effective limit to quantity of managed data
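The "tunable consistency" point can be made concrete with a little arithmetic. The sketch below is illustrative only and the function names are hypothetical. It relies on the standard rule that a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor.

```python
def quorum(replication_factor):
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor


# With three replicas, quorum writes plus quorum reads overlap,
# while writing and reading a single replica each does not.
rf = 3
strong = reads_see_latest_write(rf, quorum(rf), quorum(rf))
weak = reads_see_latest_write(rf, 1, 1)
```

Choosing the two numbers per statement is what makes the consistency granular: a latency-sensitive write can use one replica while a critical read uses a quorum.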
12. CASSANDRA LIMITATIONS
• No ad-hoc searches: you need to model your queries, not your
data
• Deleting data can be problematic. Deleting entire rows
works well, “deleting” columns creates “tombstones” that
can decrease performance and compromise stability.
• Rapidly changing columnar data is an anti-pattern
• No native join operations
• Rudimentary locking support
• Rudimentary native indexing support
13. CASSANDRA INDICES
• “Out of the Box” secondary indices are limited to low-
cardinality data. These are useless or even dangerous
to use.
• Most developers employ custom indexing schemes very
successfully. For Camunda, a simple, unsorted reverse
lookup table will meet most requirements.
• Camunda job scheduler requires sorted indices
• Therefore, we have developed a Cassandra indexing
framework for Camunda that supports both sorted and
unsorted indexing.
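The two index styles above can be sketched with a small in-memory model. This is an illustration only, not the project's actual indexing framework: the class names `ReverseLookupIndex` and `SortedIndex` are hypothetical, and plain Python dictionaries and lists stand in for Cassandra tables.

```python
import bisect
from collections import defaultdict


class ReverseLookupIndex:
    """Unsorted reverse lookup: indexed value -> set of entity ids.

    Models a table whose partition key is the indexed value.
    """

    def __init__(self):
        self._rows = defaultdict(set)

    def put(self, value, entity_id):
        self._rows[value].add(entity_id)

    def remove(self, value, entity_id):
        self._rows[value].discard(entity_id)
        if not self._rows[value]:
            del self._rows[value]  # drop the whole row once it is empty

    def lookup(self, value):
        return set(self._rows.get(value, ()))


class SortedIndex:
    """Sorted index: entries are kept ordered by their sort key,
    the way rows are ordered inside a single partition."""

    def __init__(self):
        self._entries = []  # sorted list of (sort_key, entity_id)

    def put(self, sort_key, entity_id):
        bisect.insort(self._entries, (sort_key, entity_id))

    def range_up_to(self, max_key):
        """Entity ids whose sort key is <= max_key, in key order."""
        keys = [key for key, _ in self._entries]
        end = bisect.bisect_right(keys, max_key)
        return [entity_id for _, entity_id in self._entries[:end]]
```

The reverse lookup answers "which executions belong to this value?", while the sorted variant answers range questions such as "which jobs are due up to now?", which is the query the job scheduler needs.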
14. CASSANDRA LOCKING
• Native Cassandra locking is rudimentary
• Atomic batches can be used for transactions (multiple
insert, update, delete in a single logical operation)
• However, if a batch performs a lock, all of its statements can
only apply to one partition (row).
• With separate tables used for custom indices, scheduled
jobs, etc., we need to use multiple batches within one
transaction. However, this breaks the atomicity.
15. LOCKING APPROACH AND LIMITATIONS
• We lock an entire process, including executions, event
subscriptions, and variables by placing all this data in one
Cassandra partition (row).
• Indices and jobs are still not updated atomically, leading to
potential data integrity issues.
• When locking the entire process, parallel (non-exclusive) tasks will
execute sequentially due to optimistic locking. This is tolerable for
many use cases, but undesirable for others (such as events that
need to occur on time). For example, our urgent customer
messaging events need to conform to service levels.
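The effect of locking a whole process optimistically can be sketched as a version check on a single record. This is a simplified in-memory model under stated assumptions: `ProcessPartition`, `write_if_version` and `StaleVersionError` are hypothetical names, and the version comparison stands in for a conditional write against one Cassandra partition.

```python
import copy


class StaleVersionError(Exception):
    """Raised when another writer updated the process first."""


class ProcessPartition:
    """All state of one process instance, guarded by a single version."""

    def __init__(self):
        self.version = 0
        self.state = {"executions": {}, "event_subscriptions": {}, "variables": {}}

    def read(self):
        # Return a copy so callers work on their own snapshot.
        return self.version, copy.deepcopy(self.state)

    def write_if_version(self, expected_version, new_state):
        """Apply the write only if nobody else has written in between."""
        if expected_version != self.version:
            raise StaleVersionError(
                f"expected version {expected_version}, found {self.version}"
            )
        self.state = copy.deepcopy(new_state)
        self.version += 1
        return self.version
```

If two parallel tasks both read version 0, only the first write succeeds. The second raises `StaleVersionError` and has to re-read and retry, so the tasks effectively run one after the other.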
16. OVERCOMING LOCKING LIMITATIONS - ZOOKEEPER
• Industry best practice for Cassandra locking is to use an external
lock manager. ContextSpace uses Zookeeper, which shares
Cassandra's distributed and “lights-out” management strengths.
• With Zookeeper, we can maintain any degree of locking
granularity, which permits parallel execution to be performed.
• By removing locking from atomic batches, we can maintain full
atomicity for transactions.
• We are currently implementing Zookeeper support and
recommend this configuration for production applications.
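Granular locking through an external lock manager can be sketched as follows. This is not Zookeeper client code: `InMemoryLockManager` is a hypothetical single-process stand-in built on `threading.Lock`, and the path-style lock names are an assumption chosen only to show locks at a finer grain than the whole process.

```python
import threading
from contextlib import contextmanager


class InMemoryLockManager:
    """Stand-in for an external lock manager: one named lock per path."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def _lock_for(self, path):
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())

    @contextmanager
    def acquire(self, path, timeout=1.0):
        """Hold the lock for `path` for the duration of the with-block."""
        lock = self._lock_for(path)
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"could not lock {path}")
        try:
            yield
        finally:
            lock.release()


locks = InMemoryLockManager()

# Two executions of the same process are locked independently,
# so work on them does not have to be serialized.
with locks.acquire("/process/p1/execution/e1"):
    with locks.acquire("/process/p1/execution/e2"):
        pass  # both locks held at the same time
```

Because the lock no longer lives inside the database batch, the batch itself is free to span the process, job and index tables.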
17. JOB SCHEDULER (ASYNCH EXECUTION)
• Support for the Job Scheduler has recently been added to
this project.
• Camunda job scheduling presents us with another
challenge.
• We only need to pick jobs that are due. This requires using
global ordering of jobs by time.
• This is essentially a queue.
18. JOB SCHEDULER (ASYNCH EXECUTION)
• Cassandra and queuing patterns don’t like each other.
• Cassandra can only maintain order within a single partition.
• Remember the “tombstones?” Queuing operations generate a
heavy columnar delete workload. This creates the “perfect
storm” for Cassandra.
• Accessing a common partition (which is always stored on a
single Cassandra node), will also create a database hotspot.
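Why a queue kept in one partition degrades can be shown with a toy model. This is a deliberately simplified sketch, not Cassandra's storage engine: `QueuePartition` is a hypothetical class in which a deleted entry stays behind as a tombstone marker until `compact()` is called, and every read has to step over the markers in front of the first live entry.

```python
class QueuePartition:
    """Toy single-partition queue where deletes leave tombstones behind."""

    def __init__(self):
        self._cells = []  # ordered list of [job_id, is_tombstone]

    def enqueue(self, job_id):
        self._cells.append([job_id, False])

    def dequeue(self):
        """Return (job_id, tombstones_scanned); job_id is None if empty."""
        scanned = 0
        for cell in self._cells:
            if cell[1]:
                scanned += 1  # dead entry that still has to be read past
                continue
            cell[1] = True  # "delete" by marking, not by removing
            return cell[0], scanned
        return None, scanned

    def compact(self):
        """Physically drop tombstones, as a later cleanup pass would."""
        self._cells = [cell for cell in self._cells if not cell[1]]
```

Each successful dequeue makes the next one scan one more dead entry, so the cost of finding the head of the queue grows with the number of deletions since the last cleanup.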
19. JOB SCHEDULER (ASYNCH EXECUTION)
• In this situation, best practice is to partition scheduling data to
contain one time slice per partition (such as a day or an hour).
• This addresses the problem with excessive tombstones, but
does not fully alleviate the hotspot. However, given Cassandra
performance, this will only become an issue for very large
workloads.
• For high workload deployments, we will implement support for
an external queue manager. Kafka is a distributed queue
manager that shares lights-out characteristics with Cassandra
and Zookeeper and is already part of the ContextSpace
architecture.
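The time-slice partitioning described above can be sketched in a few lines. This is a minimal in-memory illustration assuming hourly slices; `bucket_for` and `TimeSlicedJobIndex` are hypothetical names, and a dictionary keyed by slice start stands in for Cassandra partitions.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_for(due):
    """Partition key: the start of the hour that contains `due`."""
    return due.replace(minute=0, second=0, microsecond=0)


class TimeSlicedJobIndex:
    """Jobs grouped into one partition per hour, sorted by due time."""

    def __init__(self):
        self._partitions = defaultdict(list)  # bucket -> sorted [(due, job_id)]

    def schedule(self, due, job_id):
        bisect.insort(self._partitions[bucket_for(due)], (due, job_id))

    def due_jobs(self, now, lookback_hours=1):
        """Jobs due at or before `now`, reading only recent slices."""
        result = []
        for hours_back in range(lookback_hours, -1, -1):
            bucket = bucket_for(now - timedelta(hours=hours_back))
            for due, job_id in self._partitions.get(bucket, []):
                if due <= now:
                    result.append(job_id)
        return result

    def drop_slice(self, bucket):
        """Remove a finished slice in one step instead of entry by entry."""
        self._partitions.pop(bucket, None)


index = TimeSlicedJobIndex()
index.schedule(datetime(2015, 9, 1, 10, 15), "job-a")
index.schedule(datetime(2015, 9, 1, 11, 15), "job-b")
```

Dropping a finished slice as a whole avoids the per-entry deletes that produce tombstones. All jobs due in the current hour still land in the same partition, which is why the hotspot is reduced rather than removed.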
20. CASSANDRA PERSISTENCE FOR CAMUNDA
• Is Cassandra a natural match for Camunda? In a word, not
really….
• Camunda creates and then deletes a lot. Cassandra generally
hates deletions.
• Cassandra locking is enforced at partition (row) level only. Using
Zookeeper, we can update process, job execution and indices
atomically.
• Locking one process within a row precludes parallel executions.
Again, Zookeeper locking addresses this.
• The Job Executor, which is always searching for the next jobs due
at the current time, will consistently create hotspots in Cassandra.
This is fine for most workloads, but for very high loads we will
support Kafka.
21. UNDERSTANDING THE STACK OPTIONS
• CAMUNDA + CASSANDRA implements optimistic locking
and will not support parallel executions, but will be highly
available across multiple distributed data centres.
• CAMUNDA + CASSANDRA + ZOOKEEPER implements
granular locking and can support the full capabilities of the
CAMUNDA BPM engine.
• CAMUNDA + CASSANDRA + ZOOKEEPER + KAFKA
eliminates all anti-patterns and will provide support for
virtually unlimited workloads in a distributed environment.
22. SUMMARY
• Cassandra is not a “native fit” for Camunda
• Therefore, much work has been done to counter anti-
patterns
• Nevertheless, backing Camunda with Cassandra promises
incredible performance, scalability, availability and
operational gains
• ContextSpace's focus is on completing operational support
for the core BPM engine
• Additional Camunda user application queries can then be
incrementally supported as they are required.
23. THANK YOU AND BRIEF Q&A
Stan Levine
stan.levine@contextspace.com