Acunu and Hailo: a realtime analytics case study on Cassandra
1. Hailo - a case study for Cassandra & Acunu
dai clegg
october 2013
JAX London
@daiclegg @acunu
2. What is Hailo?
‣ The world’s highest-rated taxi app – over 11,000 five-star reviews
‣ Over 500,000 registered passengers
‣ A Hailo hail is accepted around the world every 4 seconds
‣ Hailo operates in 15 cities on 3 continents from Tokyo to Toronto in nearly 2 years of operation
3. The Adoption of Cassandra & Acunu at Hailo
‣ Launched on AWS
‣ Two PHP/MySQL web apps plus a Java backend
‣ Mostly built by a team of 3 or 4 backend engineers
‣ MySQL multi-master for single availability zone resilience
  ‣ Get/create/update entity
  ‣ Analytics
  ‣ Text search
4. The Adoption of Cassandra & Acunu at Hailo
‣ A desire for greater resilience – “become a utility”
  ‣ Cassandra is designed for high availability
‣ Plans for international expansion around a single consumer app
  ‣ Cassandra is good at global replication
‣ Expected growth
  ‣ Cassandra scales linearly for both reads and writes
‣ Prior experience
  ‣ successful in-team experience with Cassandra
5. The Adoption of Cassandra & Acunu at Hailo
‣ Replacement of key consumer app functionality
  ‣ split PHP/MySQL web app into:
    ‣ a mixture of PHP/Java services
    ‣ backed by a Cassandra data store
‣ Launched into production in September 2012
  ‣ originally just powering North American expansion
  ‣ gradually switching over Dublin and London
6. The Adoption of Cassandra & Acunu at Hailo
‣ Further decompose functionality into Go/Java SOA
‣ Migrating:
  ‣ Entity databases to Cassandra
  ‣ Analytics to Acunu
  ‣ Search into Elasticsearch
9. Some Considerations for Data Modeling
‣ Do not read the entire entity, update one property and then write back a mutation containing every column
  ‣ Only mutate columns that have been set
  ‣ This avoids read-before-write race conditions
‣ Choose row key carefully, since this partitions the records
‣ Think about how many records you want in a single row
‣ Denormalise on write into many indexes/views
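
The first consideration above can be illustrated with a toy simulation. This is a minimal sketch using a plain Python dict as a stand-in for a row, not a Cassandra client; the function names, the `driver-1` key and the `status`/`city` columns are all hypothetical. Two writers read the same entity, then each changes a different column.

```python
def write_back_whole_entity(store, key, snapshot, column, value):
    """Anti-pattern: copy the entity read earlier, change one column, write every column."""
    entity = dict(snapshot)
    entity[column] = value
    store[key] = entity  # stale values for the other columns are written back too


def mutate_set_columns_only(store, key, snapshot, column, value):
    """Write only the column that was set; the earlier read is not used."""
    store[key][column] = value


def run_two_writers(write):
    """Both writers take their snapshot before either of them writes."""
    store = {"driver-1": {"status": "free", "city": "London"}}
    snapshot_a = dict(store["driver-1"])
    snapshot_b = dict(store["driver-1"])
    write(store, "driver-1", snapshot_a, "status", "busy")
    write(store, "driver-1", snapshot_b, "city", "Dublin")
    return store["driver-1"]


lossy = run_two_writers(write_back_whole_entity)
safe = run_two_writers(mutate_set_columns_only)
```

With whole-entity write-back, the second writer restores the stale `status` it read earlier, so the first writer's change is lost. With column-level mutation both changes survive.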
10. Some Considerations for Data Modeling
not obvious!
[Chart: average years of experience per team member, MySQL vs Cassandra]
12. Some considerations for Application Development
[Diagram: people who can attempt to query MySQL vs people who can attempt to query Cassandra]
15. Acunu Analytics
Hailo needed to understand system performance/business SLAs
‣ Raw Cassandra lacks analytic primitives
  ‣ e.g. COUNT, SUM, AVG, GROUP BY
‣ Acunu Analytics provides a platform for real-time analytics
  ‣ for pre-planned query templates
‣ It uses Cassandra as the store
  ‣ so it is highly available, resilient and globally distributed
‣ Integration is straightforward
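
One way to read "pre-planned query templates" is that aggregates are maintained as events arrive, so COUNT, SUM and AVG become lookups at read time. The sketch below is a minimal in-memory illustration of that idea under stated assumptions, not Acunu's implementation; the class name `RunningAggregate` and the event fields `city` and `wait_seconds` are hypothetical.

```python
from collections import defaultdict


class RunningAggregate:
    """Keeps COUNT and SUM per group so that AVG is a cheap lookup at read time."""

    def __init__(self, group_field, value_field):
        self.group_field = group_field
        self.value_field = value_field
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def record(self, event):
        """Update the aggregates for the event's group at write time."""
        group = event[self.group_field]
        self.count[group] += 1
        self.total[group] += event[self.value_field]

    def query(self, group):
        """Answer COUNT/SUM/AVG for one group without scanning raw events."""
        n = self.count.get(group, 0)
        if n == 0:
            return {"count": 0, "sum": 0.0, "avg": None}
        s = self.total[group]
        return {"count": n, "sum": s, "avg": s / n}


agg = RunningAggregate("city", "wait_seconds")
for event in [
    {"city": "London", "wait_seconds": 120},
    {"city": "London", "wait_seconds": 60},
    {"city": "Dublin", "wait_seconds": 90},
]:
    agg.record(event)

london = agg.query("London")
```

The trade-off is that the grouping and the measured value have to be chosen before the events arrive, which is what makes the query "pre-planned".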
18. Acunu Analytics: an example
‣ Rich instant queries over cubes
  ‣ SELECT, FROM, WHERE, AND, GROUP BY, JOIN, HAVING, ORDER BY
‣ Define aggregate cubes:
  CREATE CUBE APPROX TOP(keyword)
  WHERE browser, time
  GROUP BY time
‣ New events update cubes
  [Diagram: TOP(keyword) table for browser = ‘chrome’, time BETWEEN.., with columns d1, d2, ...]
‣ Populate new cubes from historic data (build cube from history)
‣ Drill down to raw events
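
The cube in this example can be pictured as a set of counters keyed by the WHERE dimensions. The following is a rough in-memory sketch of that picture, assuming hourly time buckets and exact counts rather than the approximate TOP the slide mentions; it is not Acunu's implementation, and `TopKeywordCube`, `backfill` and the event field names are hypothetical.

```python
from collections import Counter, defaultdict


class TopKeywordCube:
    """Exact (not approximate) keyword counts per (browser, time bucket) cell."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self.cells = defaultdict(Counter)  # (browser, bucket start) -> keyword counts

    def _bucket(self, timestamp):
        """Round a timestamp down to the start of its time bucket."""
        return timestamp - (timestamp % self.bucket_seconds)

    def update(self, event):
        """New events update the cube."""
        cell = (event["browser"], self._bucket(event["time"]))
        self.cells[cell][event["keyword"]] += 1

    def backfill(self, history):
        """Populate a new cube from historic events."""
        for event in history:
            self.update(event)

    def top(self, browser, start, end, k=3):
        """Top-k keywords for one browser, grouped by bucket, for buckets in [start, end)."""
        result = {}
        for (cell_browser, bucket), counts in self.cells.items():
            if cell_browser == browser and start <= bucket < end:
                result[bucket] = counts.most_common(k)
        return result


history = [
    {"browser": "chrome", "time": 10, "keyword": "taxi"},
    {"browser": "chrome", "time": 20, "keyword": "taxi"},
    {"browser": "chrome", "time": 30, "keyword": "cab"},
    {"browser": "firefox", "time": 40, "keyword": "cab"},
]
cube = TopKeywordCube(bucket_seconds=3600)
cube.backfill(history)
cube.update({"browser": "chrome", "time": 3700, "keyword": "hail"})
report = cube.top("chrome", 0, 7200, k=1)
```

A query only touches the cells that match its browser and time range, so its cost depends on the number of buckets, not the number of raw events.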
19. Acunu Analytics: summary
Overview of the workflow
‣ develop queries in AQL, query builder or self-service data explorer
‣ invoke queries from within applications with JSON query API
‣ define aggregation cubes with DDL or infer from self-service queries
‣ define alerts to be raised on trigger conditions
‣ populate new cubes from historic data (fill cube from history)
‣ define connector: either from library, toolkit or REST
‣ define pre-processors: programmatic, Java or JavaScript; or AQL query
‣ define event schema with DDL or infer from sample events
20. Acunu Analytics at Hailo
some sample screenshots
“drill-across” to see breakdown of data and in-depth analysis
21. Acunu Analytics at Hailo
use cases
‣ Infrastructure and Application monitoring
‣ Real-time A/B testing of app layout and incentives
‣ Real-time geo-view of supply/demand for drivers
‣ More in the pipeline
23. Conclusions
Choosing the Platform
‣ Solid Cassandra design
  ‣ High availability characteristics
  ‣ Easy multi-data centre setup
  ‣ Simplicity of operation
‣ With Acunu
  ‣ SQL-like rich queries
  ‣ easier data modeling
24. Conclusions
Exploiting the platform
‣ Have an advocate
  ‣ sell the dream
‣ Learn the fundamentals
  ‣ get the best out of Cassandra
‣ Invest in tools to make life easier
‣ Keep management in the loop
  ‣ explain the trade-offs
25. Thank You.
Apache, Apache Cassandra, Cassandra and the eye logo
are trademarks of the Apache Software Foundation.