Michael Klishin from Pivotal gave an update on RabbitMQ. He discussed the state of the 3.7.x releases, which focused on deployment automation, operator friendliness, and stability. Future plans include improved scalability, improved reliability, and simpler upgrades. Key projects include quorum queues built on Raft as the next generation of queue mirroring, OAuth 2.0 support, a new schema storage system, and mixed version clusters. The 3.8 release will include quorum queues, OAuth 2.0 support, and initial mixed version cluster support.
12. Future directions
General directions
! Improved reliability and operator friendliness
! Improved scalability
! Simplify upgrades
! Continue improving and expanding the ecosystem
! Move away from home-grown distsys algorithms 🤦
! Improved correctness
! Repay technical debt
13. What problems are we addressing?
Scalability
! Mirroring uses a ring topology, so replication cost grows linearly with the number of mirrors
! Excessive bandwidth usage
! No DC awareness
! The core is not protocol-agnostic, which leads to overhead
Reliability
! Recovery from failures is not always predictable (or easy to reason about)
! Fails to pass certain resiliency tests in some scenarios
! Queue sync after failure can take too long and cause a thundering herd
! Schema data store (Mnesia) is overly opinionated and isn’t a great fit
! Lack of unified WAL/oplog makes backups hard
14. Future directions
Currently in progress
! Quorum queues (mirrored queues 2.0) based on Raft
! OAuth 2.0 support
! Mixed version clusters
! Mnevis: a new schema data store
! Protocol-agnostic core
16. How does Raft-based replication compare?
Improved scalability
! No ring topology, parallel replication
! More reasonable bandwidth usage*
! Opens the door to cross-DC replication
Improved reliability
! Recovery from failures is predictable and well defined
! Queue sync after failure transfers as little data as possible
! Passes more resiliency tests
! Opens the door to a unified WAL/oplog to simplify backups
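The scalability difference between ring and parallel replication can be illustrated with a deliberately simplified latency model. This is a sketch under stated assumptions, not a measurement of RabbitMQ: it assumes that in a ring a message must pass every mirror in turn before it is confirmed, and that with Raft-style replication the leader sends to all followers at once and waits only for a majority. The function names and latency figures are hypothetical.

```python
def ring_confirm_latency(hop_latencies_ms):
    """Ring replication (assumed model): the message visits each mirror in
    turn, so the per-hop delays add up."""
    return sum(hop_latencies_ms)


def quorum_confirm_latency(follower_latencies_ms):
    """Parallel replication (assumed model): the leader contacts every
    follower at once and waits only until a majority of the cluster,
    counting the leader itself, holds the entry."""
    cluster_size = len(follower_latencies_ms) + 1
    majority = cluster_size // 2 + 1
    acks_needed = majority - 1  # the leader already has the entry
    if acks_needed == 0:
        return 0
    # The confirm arrives with the acks_needed-th fastest follower.
    return sorted(follower_latencies_ms)[acks_needed - 1]


# Hypothetical per-replica latencies in milliseconds; one replica is slow.
latencies = [2, 3, 4, 50]
print(ring_confirm_latency(latencies))    # 59
print(quorum_confirm_latency(latencies))  # 3
```

In this model a single slow replica delays every confirm in the ring, while the majority-based scheme is unaffected as long as enough fast replicas remain.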
18. What is Raft?
! A group of algorithms for reaching consensus in a distributed system
! Implementer-focused
! Proven
! Multiple implementations
! Industry use
! TLA+ specification
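To make "reaching consensus" concrete, here is a minimal sketch of the majority rule at the heart of Raft: a log entry counts as committed once a majority of cluster members have stored it. The function names are hypothetical and the sketch omits the rest of the protocol (leader election, terms, and the rule that a leader only commits entries from its own term this way); it is not taken from the Ra library.

```python
def quorum_size(cluster_size):
    """Smallest number of members that forms a majority."""
    return cluster_size // 2 + 1


def commit_index(match_index):
    """Highest log index stored on a majority of members.

    match_index holds, for every member including the leader, the highest
    log index known to be replicated on that member.
    """
    ranked = sorted(match_index, reverse=True)
    # The quorum_size-th highest value is present on at least a majority.
    return ranked[quorum_size(len(ranked)) - 1]


# Five members: two are fully caught up, three lag behind.
print(commit_index([7, 7, 5, 3, 2]))  # 5
```

With five members the quorum is three, so index 5 is the highest entry held by three members and everything up to it is safe to apply.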
22. Quorum queues
The State of Quorum Queues
! Reasonably polished
! Throughput beats mirrored queues
! Passes a [slightly modified] Jepsen test
! Available in RabbitMQ 3.8.0-beta.1 today
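For readers who want to try the beta, the sketch below shows one way a client might request a quorum queue, assuming the queue type is selected with the `x-queue-type` declaration argument and that the third-party `pika` client is installed. The queue name `orders` and the helper names are hypothetical, and the second function needs a running broker, so it is defined but not called here.

```python
def quorum_queue_declare_kwargs(name):
    """Build queue.declare arguments for a quorum queue.

    Assumption: quorum queues are requested via the "x-queue-type"
    argument and must be durable.
    """
    return {
        "queue": name,
        "durable": True,
        "arguments": {"x-queue-type": "quorum"},
    }


def declare_quorum_queue(name, host="localhost"):
    """Declare the queue on a broker. Requires pika and a reachable node."""
    import pika  # third-party client, imported lazily so the sketch loads without it

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    try:
        channel = connection.channel()
        channel.queue_declare(**quorum_queue_declare_kwargs(name))
    finally:
        connection.close()


print(quorum_queue_declare_kwargs("orders"))
```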
23. Quorum queues
Implementing Quorum Queues
! Powered by our own from-scratch Raft implementation
! github.com/rabbitmq/ra
! Heavily geared towards but not specific to RabbitMQ
! Deviates from Raft a little bit to reduce network and disk I/O
! Does not sacrifice correctness
! Includes a peer unavailability detection library
! Optimized for throughput, adapts I/O for latency or throughput based on load
25. Quorum queues
The State of Quorum Queues
! Has limitations
! Doesn’t support some features, e.g. TTL won’t be supported
! Memory management is still a hard problem
! As with any new major feature, will take time to mature (you can help by giving it a try!)
27. One queue type to rule them all?
What are the chances that all these features can work efficiently or be easy to reason about when they are combined?
! Durable
! Auto-delete
! Mirrored
! Lazy
! With TTL
! With message TTL
! With length limit
! With dead-lettering
! With failure recovery settings
! With exclusive consumers
! Affected by plugins
! …
28. One queue type per workload?
Or is there a better way?
! Durable queues with consensus replication
! Transient queues, auto-delete with TTL and/or length limit
! “Infinite queues” (durable, aggressive paging to disk)
! Lower latency in-memory queues
30. OAuth 2.0 support
How does it work?
! Implemented as a plugin, rabbitmq_auth_backend_oauth2
! OAuth 2.0/JWT token scopes that follow naming conventions are translated to RabbitMQ permissions
! Clients can use any OAuth 2.0 authorization code flow
! Management UI will use the authorization code flow
! Officially supported clients will simplify token renewal
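The scope-to-permission translation can be sketched as follows. This is an illustration, not the plugin's code: it assumes scopes shaped like `<resource_server_id>.<permission>:<vhost_pattern>/<name_pattern>` with `*` as a wildcard, uses a hypothetical resource server id of `rabbitmq`, approximates wildcard matching with Python's `fnmatch`, and ignores any routing key pattern.

```python
from fnmatch import fnmatchcase

# RabbitMQ's three permission kinds.
PERMISSIONS = ("configure", "write", "read")


def parse_scope(scope, resource_server_id="rabbitmq"):
    """Split a scope into (permission, vhost_pattern, name_pattern).

    Returns None for scopes that do not follow the assumed convention.
    """
    prefix = resource_server_id + "."
    if not scope.startswith(prefix):
        return None
    permission, sep, patterns = scope[len(prefix):].partition(":")
    if not sep or permission not in PERMISSIONS:
        return None
    vhost_pattern, sep, name_pattern = patterns.partition("/")
    if not sep:
        return None
    return permission, vhost_pattern, name_pattern


def is_allowed(scopes, permission, vhost, name):
    """True if any scope grants `permission` on resource `name` in `vhost`."""
    for scope in scopes:
        parsed = parse_scope(scope)
        if parsed is None:
            continue  # unrelated scope, e.g. one meant for another service
        granted, vhost_pattern, name_pattern = parsed
        if (granted == permission
                and fnmatchcase(vhost, vhost_pattern)
                and fnmatchcase(name, name_pattern)):
            return True
    return False


# Hypothetical scopes as they might appear in a decoded token.
token_scopes = ["rabbitmq.read:*/*", "rabbitmq.write:prod/orders.*", "openid"]
print(is_allowed(token_scopes, "write", "prod", "orders.eu"))  # True
print(is_allowed(token_scopes, "write", "dev", "orders.eu"))   # False
```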
31. OAuth 2.0 support
The State of OAuth 2.0 Support
! Plugin is done, open source and currently targets RabbitMQ 3.8
! Will ship in 3.8.0 Milestone 2
! Management UI needs work
! Targets UAA and AD
33. How does next gen schema storage compare?
Improved scalability
! Mnesia does reasonably well here (for schema storage)
! Opens the door to DC awareness
! No long-entrenched Mnesia limitations
! Higher rate of development iteration
Improved reliability
! Recovery from failures that works for RabbitMQ users (and maintainers)
! Nodes no longer have to erase themselves in order to re-sync
! Can integrate with the unified WAL/oplog
34. Next gen schema storage
The State of Mnevis
! Currently an area of active research
! Mnesia will be used in a node-local way
! Mnesia is quite extensible
! Raft is not necessarily a great fit for transaction log propagation due to [relatively] high latency
36. Mixed version clusters
How do we get there?
! Relaxing overly conservative restrictions
! Capability testing instead of version testing
! Feature flags
! Extensive internal refactoring
! Somewhat funny APIs to work around Erlang record limitations
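Capability testing with feature flags can be sketched like this: instead of comparing version numbers, each node advertises the flags it understands, and a flag may be enabled only when every member of the cluster supports it. The node names, flag names, and functions below are hypothetical; this illustrates the idea and is not RabbitMQ's implementation.

```python
def enableable_flags(supported_by_node):
    """Flags that every node in the cluster supports.

    supported_by_node maps a node name to the set of feature flags that
    node understands.
    """
    flag_sets = [set(flags) for flags in supported_by_node.values()]
    if not flag_sets:
        return set()
    return set.intersection(*flag_sets)


def can_enable(flag, supported_by_node):
    """A flag can be turned on only if no node would be left behind."""
    return flag in enableable_flags(supported_by_node)


# A cluster in the middle of a rolling upgrade: node b is still on the
# older release and does not know about "new_metadata_format".
cluster = {
    "rabbit@a": {"quorum_queue", "new_metadata_format"},
    "rabbit@b": {"quorum_queue"},
    "rabbit@c": {"quorum_queue", "new_metadata_format"},
}
print(can_enable("quorum_queue", cluster))         # True
print(can_enable("new_metadata_format", cluster))  # False
```

Once the last node is upgraded, the intersection grows and the remaining flag becomes eligible, which is what lets nodes of different versions coexist during the upgrade.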
37. Mixed version clusters
The State of Mixed Version Clusters
! Feature flag experiment with promising results so far
! Meant to simplify upgrades
! Long-running mixed clusters are not a goal
! Cannot guarantee safety for every possible breaking change
! Needs more feedback