This document discusses using RabbitMQ and Java workers to provide atomic and idempotent counters on top of Cassandra. The workers consume messages from RabbitMQ queues, perform in-memory atomic counter operations, persist each message to Cassandra, and periodically push static counter values to Cassandra. This allows real-time analytics, fast operations, atomicity through single-threaded workers, and graceful recovery from a worker crash using the persisted static counter values. A single worker handles about 1 million operations per second; scaling beyond a single worker requires sharding data at the application layer.
2. Who is this guy?
I’m also the Founder, which in Latin means
“everyone else gets paid before me.”
~ Literal Translation
3. What is 46 Labs?
We build realtime telecom analytics
and security solutions for Carriers and Enterprises
Founded in 2012
Currently handle around 1/2 Billion call billing records
per day.
4. Shout Outs
#Cassandra IRC Channel
“Unbelievable resource”
“Thumbs up for the Startup Program”
Nate McCall
“Helped us in our time of need”
5. Patent Warning
So…we have parts of this process related to the
handling of telecom analytics and billing records patented.
Fair Warning to the telecom folks in the room.
To all of you who aren’t in that ballpark…feel free to take
the pitch and swing away.
6. What is idempotence?
Simple Answer:
You can perform an operation several times and get the same result as performing it once.
Example:
A “set” is idempotent. An “increment” or “decrement” isn’t. Not just with Cassandra, but with anything, by definition.
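To make the set-versus-increment distinction concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical and only for illustration; nothing here touches Cassandra.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class, not taken from the talk.
class IdempotenceDemo {
    // "set" overwrites the stored value, so replaying it leaves the same result.
    static long applySet(AtomicLong counter, long value, int times) {
        for (int i = 0; i < times; i++) {
            counter.set(value);
        }
        return counter.get();
    }

    // "increment" depends on the current value, so every replay changes the result.
    static long applyIncrement(AtomicLong counter, long delta, int times) {
        for (int i = 0; i < times; i++) {
            counter.addAndGet(delta);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(applySet(new AtomicLong(10), 42, 1));      // 42
        System.out.println(applySet(new AtomicLong(10), 42, 3));      // 42
        System.out.println(applyIncrement(new AtomicLong(10), 5, 1)); // 15
        System.out.println(applyIncrement(new AtomicLong(10), 5, 3)); // 25
    }
}
```

If a retry delivers the same operation three times, the set still ends at 42, while the increment drifts from 15 to 25.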
7. Why does it matter?
Because counters are NOT atomic in Cassandra.
But why?
Because it is really, really, really hard to do anything
atomic and distributed, especially counters.
8. So…???
Since counters aren’t idempotent by definition, and aren’t
atomic in Cassandra, repeating the same counter operation
100 times might give you a different result on each run.
9. And…?
It means that you can’t use Cassandra counters for anything
requiring precision…like billing balances, voting, statistical
analysis or any time-series data that must be exact.
The higher the volume and the more nodes you have, the
more inaccurate the counters become.
10. So I should use MySQL or Couchbase?
If you want atomic counters inside of a database
as of today’s date, then maybe.
Hint: We have tried both (and a lot more). They are slow. Like…really slow for this type of
operation, and they have hurdles way beyond just being slow.
11. So, All is Lost?
Is there a chance that a better alternative exists that will
allow me to use Cassandra and have atomic and
idempotent counters?
Yeap.
But it involves some helpers.
13. RabbitMQ
Our call billing records come off our infrastructure and go
into a RabbitMQ cluster.
Hint: you could use Kafka, Redis, 0MQ, etc.
The RabbitMQ queues are a nice and safe place for our messages to sit and
wait to be processed.
With RabbitMQ ACKs, we can be sure the messages are fully processed
before they are removed.
14. Workers
We wrote Java workers, whose sole job in life is to:
1. Consume messages from Rabbit.
2. Perform in-memory atomic counter operations (increment/decrement).
3. Persist the message to Cassandra.
4. Push a static counter value into Cassandra (i.e. a set instead of an increment) every X seconds.
5. ACK that the operation is complete back to Rabbit.
(You can use whatever language you prefer)
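The five steps above can be sketched in plain Java. This is a rough illustration under stated assumptions, not the workers from the talk: the queue, the message table and the counter table are in-memory stand-ins rather than the RabbitMQ or Cassandra client APIs, the message shape is invented, and the snapshot is triggered by an explicit call instead of a timer firing every X seconds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the five worker steps, using in-memory stand-ins.
class CounterWorkerSketch {
    // Assumed message shape: which counter it touches and by how much.
    static final class Message {
        final long id;
        final String counterKey;
        final long delta;

        Message(long id, String counterKey, long delta) {
            this.id = id;
            this.counterKey = counterKey;
            this.delta = delta;
        }
    }

    final Queue<Message> queue = new ArrayDeque<>();          // stands in for a RabbitMQ queue
    final List<Message> storedMessages = new ArrayList<>();   // stands in for the persisted messages
    final Map<String, Long> storedCounters = new HashMap<>(); // stands in for the static counter values
    final List<Long> acked = new ArrayList<>();               // ids acknowledged back to the broker
    final Map<String, Long> memory = new HashMap<>();         // in-memory counters, one thread only

    // Steps 1, 2, 3 and 5 for one message; returns false when the queue is empty.
    boolean processOne() {
        Message m = queue.poll();                             // 1. consume
        if (m == null) {
            return false;
        }
        memory.merge(m.counterKey, m.delta, Long::sum);       // 2. in-memory increment/decrement
        storedMessages.add(m);                                // 3. persist the message
        acked.add(m.id);                                      // 5. ACK only after the work is done
        return true;
    }

    // Step 4: write absolute values (a "set"), so repeating the flush is harmless.
    void flushSnapshot() {
        storedCounters.putAll(memory);
    }

    public static void main(String[] args) {
        CounterWorkerSketch w = new CounterWorkerSketch();
        w.queue.add(new Message(1, "account-a", 5));
        w.queue.add(new Message(2, "account-a", -2));
        w.queue.add(new Message(3, "account-b", 7));
        while (w.processOne()) { }
        w.flushSnapshot();
        w.flushSnapshot(); // second flush changes nothing
        System.out.println(w.storedCounters.get("account-a")); // 3
        System.out.println(w.storedCounters.get("account-b")); // 7
    }
}
```

Because only one thread ever touches `memory`, the increments need no locking, and because the flush writes absolute values, running it twice is the same as running it once.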
15. Why is this special?
1. You can stream analytics in realtime.
2. Being in-memory, it is ridiculously fast and lightweight.
3. It’s atomic because each counter constituent is in a single thread.
4. Cassandra can be used to atomically persist the counter.
5. The counter data exactly matches the underlying data used to generate it.
16. Wait…
What happens if the worker crashes…it’s all in memory!
Refer to step 4 of the worker’s job:
“Push a static counter value into Cassandra (i.e. a set instead of an increment) every X seconds.”
Since we push a static counter value into Cassandra, we now have an idempotent way
to recover gracefully in the event of a crash. The worker fires up, asks Cassandra what
it should have in its memory, then starts its atomic operations again. This backup
worker can come up (Zookeeper) on a different physical or virtual host if needed.
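A minimal sketch of that recovery path, again with a plain map standing in for the Cassandra table of static counter values. All names are hypothetical. The sketch assumes the crash happens right after a flush; how operations applied between the last flush and the crash get replayed is not covered by the slides and is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical recovery sketch; the snapshot store is an in-memory stand-in.
class RecoverySketch {
    static final class Worker {
        private final Map<String, Long> snapshotStore;
        private final Map<String, Long> memory = new HashMap<>();

        // On startup, ask the store what should be in memory.
        Worker(Map<String, Long> snapshotStore) {
            this.snapshotStore = snapshotStore;
            memory.putAll(snapshotStore);
        }

        // In-memory counter operation, single-threaded by design.
        void apply(String key, long delta) {
            memory.merge(key, delta, Long::sum);
        }

        // Push static values: a set, not an increment.
        void flush() {
            snapshotStore.putAll(memory);
        }

        long value(String key) {
            return memory.getOrDefault(key, 0L);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();

        Worker first = new Worker(store);
        first.apply("calls", 10);
        first.apply("calls", 5);
        first.flush();
        // The first worker "crashes" here; its memory is gone.

        Worker backup = new Worker(store); // possibly on a different host
        System.out.println(backup.value("calls")); // 15
        backup.apply("calls", 1);
        backup.flush();
        System.out.println(store.get("calls"));    // 16
    }
}
```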
17. You can’t grow!
Since you are limited to a single thread processing a single counter…once you run out of memory
or saturate the CPU for that counter you can’t grow!
Yeap. This is why we shard our data at the application layer and not the worker layer. We abstract
scalability further out knowing we have a finite amount of memory and processing power to play with at
the worker level.
We can atomically handle 1M ops/sec from a single worker on a single moderately
powered server. If you are taxing that single server you need to re-think your
architecture.
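One way to picture sharding at the application layer is a router that sends every operation for a given counter to the same worker. The slides do not say how the sharding is actually done, so the hash-based routing below is purely an assumption for illustration, and the names are hypothetical.

```java
// Hypothetical application-layer router; hash-based routing is an assumption.
class ShardRouterSketch {
    // Map a counter key to one of workerCount workers.
    // The same key always lands on the same worker, so each counter
    // is still owned by exactly one single-threaded worker.
    static int workerFor(String counterKey, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(counterKey.hashCode(), workerCount);
    }

    public static void main(String[] args) {
        int workers = 4;
        for (String key : new String[] {"account-a", "account-b", "account-c"}) {
            System.out.println(key + " -> worker " + workerFor(key, workers));
        }
    }
}
```

With plain modulo routing like this, changing the worker count moves many keys to different workers, so the counters would have to be flushed and reloaded around any such change.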
18. Does it work?
Sure it does.
We currently process over 2 million counter operations
per second using this method.
19. Questions?
If you think of any that you forgot to ask,
you can email me at trevor@46labs.com.