Let’s explore an inventory use case and discover how it’s possible to use an eventually consistent data store like Cassandra / DataStax Enterprise to support scalability, consistency, and continuous availability. Combine that with analytics, and I’ll show you how to build an inventory system for the future.
(Andrew)
Hotels - descriptive data about the hotels, their products, and policies. Quite static
Rates - prices that are charged for the products. These can change many times a day, and could come from an automated pricing system
Inventory - constantly changes as rooms are booked, cancelled, etc. Data quality and currency are extremely important here so we don’t oversell our hotels
Reservations - our contract with the customer. Generally changed only when initiated by the customer, so changes are infrequent (The Marriott may disagree after dealing with me)
(Andrew)
After we identified our key data types, they seemed like a good way to divide the work and the system landscape
As work commenced, we divided things a little further and kept the keyspace-per-service idea going, approaching a shared-nothing architecture style.
We stayed with a single cluster for now to ease operations and reduce cost.
On top of this we added services to encapsulate the complexities of shopping and booking
We used rules at this level to define business logic likely to change
We also built data maintenance applications to:
Synchronize data from other systems – our legacy system as well as some other systems that will stay in operation, such as property management systems
Verify data accuracy across systems and across service boundaries
Correct data issues caused by defects
(Jeff)
We’ve made use of Netflix’s Simian Army in order to build reliability into the system
Part of this was allowing Chaos Monkey to kill Cassandra nodes to make sure our clusters could survive losing nodes abruptly
This helped us mature our cluster monitoring and test automated cluster management capabilities.
It also helped us uncover an unexpected behavior of the Java driver, which has since been fixed in the 3.0 driver.
Our configuration used a DNS name to locate the nodes in the cluster. Calling addContactPoint() with that well-known name initially bound to a single address record. If that record happened to point at the node killed by Chaos Monkey, then until the record was cleared from DNS, the driver would fail to connect to that node and would be unable to bootstrap. We worked around this by calling addContactPoints() instead, which binds to multiple IP addresses so the driver can make multiple connection attempts.
This has been fixed in the 3.0 driver – addContactPoint() now resolves to multiple IPs if you give it a DNS name.
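Here is a minimal sketch of the pre-3.0 workaround, assuming a hypothetical DNS name; resolving every A record up front gives the driver more than one node to try while bootstrapping:

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import com.datastax.driver.core.Cluster;

    public class ContactPoints {
        // Resolve every IP behind the well-known name so the driver can fall
        // back to another node if the first one is down.
        public static Cluster buildCluster() throws UnknownHostException {
            InetAddress[] nodes =
                    InetAddress.getAllByName("cassandra.internal.example.com");
            return Cluster.builder()
                    .addContactPoints(nodes) // all resolved IPs, not just the first
                    .build();
        }
    }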
The moral of the story - to help mitigate common connection issues, we created a common library to manage connections across our services. It loads connection information from the environment, including the cluster name and security credentials.
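A rough sketch of what such a helper can look like (the environment variable names here are illustrative, not the ones we actually used):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public final class CassandraConnector {
        private CassandraConnector() {}

        // Build a session from environment-provided settings so every
        // service connects the same way.
        public static Session connect(String keyspace) {
            Cluster cluster = Cluster.builder()
                    // Comma-separated host names from the environment
                    .addContactPoints(
                            System.getenv("CASSANDRA_CONTACT_POINTS").split(","))
                    .withClusterName(System.getenv("CASSANDRA_CLUSTER_NAME"))
                    .withCredentials(
                            System.getenv("CASSANDRA_USERNAME"),
                            System.getenv("CASSANDRA_PASSWORD"))
                    .build();
            return cluster.connect(keyspace);
        }
    }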
(Jeff)
One of the challenges of a microservices architecture is keeping changes in sync across service boundaries. One example situation is in booking a reservation.
Since the reservation represents our contract with the customer to reserve a specific room at a specific price and with certain conditions, we need to mark a reservation as committed at the same time as we reserve the inventory.
This is important so that we don’t accidentally overbook our hotel. Making the situation more complicated, there could be simultaneous bookings and data maintenance activities also trying to access the same inventory
Since these types are split across microservice boundaries, there is no transaction mechanism. In fact, since the data is in different rows (and different tables), Cassandra’s lightweight transactions are of no use to us here.
We solved this with a layered approach – LWTs to protect inventory counts, retries within the booking service, and compensating processes to detect and clean up failures
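As a minimal sketch of the LWT layer, assuming a hypothetical available_rooms table keyed by hotel, stay date, and room type:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    public class InventoryGuard {
        // The IF clause makes this a lightweight transaction: the decrement
        // is applied only if no concurrent booking changed the count first.
        public static boolean reserveRoom(Session session, String hotelId,
                String stayDate, String roomType, int expectedCount) {
            ResultSet rs = session.execute(
                    "UPDATE available_rooms SET available = ? "
                    + "WHERE hotel_id = ? AND stay_date = ? AND room_type = ? "
                    + "IF available = ?",
                    expectedCount - 1, hotelId, stayDate, roomType, expectedCount);
            // false => another writer got there first; the booking service
            // re-reads the count and retries
            return rs.wasApplied();
        }
    }

If the retries are exhausted, the compensating processes catch and clean up the mismatch later.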
(Jeff)
Thankfully we have a variety of tools in our toolbox for guaranteeing consistency. Some of these are provided by Cassandra, and some are architectural approaches.
(Andrew)
We have separated the shopping and booking concerns from our analysis and history uses, which means that in the shopping and booking systems, historical data is of little use.
As we insert our data, we set the TTL for when it will no longer be needed, which saves us from developing our own cleanup process and reduces our storage footprint.
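For example, with a hypothetical rates table (the TTL is in seconds):

    import com.datastax.driver.core.Session;

    public class RateWriter {
        // The row silently expires once the stay date has passed and the
        // record is no longer shoppable - no cleanup job required.
        public static void insertRate(Session session, String hotelId,
                String stayDate, String roomType,
                java.math.BigDecimal price, int ttlSeconds) {
            session.execute(
                    "INSERT INTO rates (hotel_id, stay_date, room_type, price) "
                    + "VALUES (?, ?, ?, ?) USING TTL ?",
                    hotelId, stayDate, roomType, price, ttlSeconds);
        }
    }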
We still need the historic data for analysis and customer service purposes, though, so we store it in a separate data platform which we feed from the reservation system using asynchronous event processing
Our colleague Narasimhan Sampath is talking at Strata NYC later this month about our data and analytics platform, which is based on Spark and Hadoop. Make sure to check out his talk if you’re attending Strata.
(Jeff)
After religiously following the mantra of designing tables for each access pattern, we soon ran into cases where adding a table per unique access pattern proved to be too much
Take hotels, for example, and the number of ways various clients can search for them
Since the hotel records are quite large, imagine the impact of all of these tables on our cluster size and storage requirements for 6000+ hotels.
We reined this in by designing tables to support multiple queries and doing some filtering at the service layer, which helped control our computing costs.
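A sketch of the idea, using a hypothetical hotels_by_state table partitioned by state and clustered by city, so one schema serves both state-wide and city searches while narrower filters (brand, here) run in the service:

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import java.util.ArrayList;
    import java.util.List;

    public class HotelSearch {
        // Assumed schema: hotels_by_state(state text, city text,
        //   hotel_id text, name text, brand text,
        //   PRIMARY KEY ((state), city, hotel_id))
        public static List<Row> findHotels(Session session, String state,
                String city, String brand) {
            ResultSet rs = (city == null)
                    ? session.execute(
                            "SELECT * FROM hotels_by_state WHERE state = ?", state)
                    : session.execute(
                            "SELECT * FROM hotels_by_state "
                            + "WHERE state = ? AND city = ?", state, city);
            // Narrower filters run in the service layer instead of adding
            // yet another table per access pattern.
            List<Row> matches = new ArrayList<>();
            for (Row row : rs) {
                if (brand == null || brand.equals(row.getString("brand"))) {
                    matches.add(row);
                }
            }
            return matches;
        }
    }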
We’re also looking to move to Cassandra 3.x in order to take advantage of materialized views and SASI indexes, which will allow us to shift some of the processing burden back to the database
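For instance, against the hypothetical hotels_by_state table above, a materialized view could replace a hand-maintained city-first copy, and a SASI index could support prefix searches on hotel names:

    import com.datastax.driver.core.Session;

    public class SchemaUpgrades {
        public static void createViewAndIndex(Session session) {
            // Materialized view: Cassandra maintains the city-first copy
            // for us, replacing a hand-maintained denormalized table.
            session.execute(
                    "CREATE MATERIALIZED VIEW IF NOT EXISTS hotels_by_city AS "
                    + "SELECT * FROM hotels_by_state "
                    + "WHERE state IS NOT NULL AND city IS NOT NULL "
                    + "AND hotel_id IS NOT NULL "
                    + "PRIMARY KEY ((city), state, hotel_id)");
            // SASI index: supports LIKE 'Grand%' style prefix searches
            // on hotel names without another lookup table.
            session.execute(
                    "CREATE CUSTOM INDEX IF NOT EXISTS hotels_name_idx "
                    + "ON hotels_by_state (name) "
                    + "USING 'org.apache.cassandra.index.sasi.SASIIndex'");
        }
    }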