5. Motivation
A system starts out simple…
…but gets complex in the real world
…as you address real requirements
[Diagram: an Application with a client library calls the System (Replica 1 …, Replica 2 …). Concerns that accumulate: Scale, Failover, Bootstrapping, Call Routing.]
6. Motivation
Scale
Failover
Bootstrapping
These are cluster management problems
Helix solves them once…
…so you can focus on your system
7. Outline
What is Helix
Use case 1: distributed data store
Architecture
Use case 2: consumer group
Helix at LinkedIn
Q&A
14. Use-case requirements
• Partition constraints
– 1 master per partition
– Balance partitions across cluster
– No single point of failure: replicas on different nodes
• Handle failures: transfer mastership
• Elasticity
– Distribute workload across added nodes
– Minimize partition movement
• Meet SLAs
– Throttle concurrent data movement
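The partition constraints above can be illustrated with a small placement sketch. This is not Helix's algorithm or API. It is a plain round-robin layout, and the names `assign_replicas` and `masters_per_node` are hypothetical. It assumes the first node in each list is the master, and it does not try to minimize partition movement when nodes are added.

```python
from collections import Counter


def assign_replicas(num_partitions, nodes, replicas_per_partition):
    """Return {partition: [master_node, slave_node, ...]} via round-robin.

    Illustrative only: the first node in each list is treated as the master.
    """
    if replicas_per_partition > len(nodes):
        raise ValueError("need at least as many nodes as replicas per partition")
    placement = {}
    for p in range(num_partitions):
        # Take consecutive nodes starting at p mod N. They are all distinct,
        # so no node holds two replicas of the same partition.
        placement[p] = [
            nodes[(p + r) % len(nodes)] for r in range(replicas_per_partition)
        ]
    return placement


def masters_per_node(placement):
    """Count how many partitions each node is master for."""
    return Counter(replica_nodes[0] for replica_nodes in placement.values())


if __name__ == "__main__":
    layout = assign_replicas(6, ["n1", "n2", "n3"], 2)
    for partition, replica_nodes in layout.items():
        print(partition, replica_nodes)
    print(masters_per_node(layout))
```

Because the starting node rotates with the partition number, masters spread evenly whenever the partition count is a multiple of the node count. Adding a node changes most assignments, which is exactly the movement the elasticity requirement asks a real rebalancer to avoid.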
15. Declarative Problem Statement
State machine
– States: offline (O), slave (S), master (M)
– Transitions: O-S, S-O, S-M, M-S
Constraints
– States: S COUNT=2, M COUNT=1
– Transitions: t1 ≤ 5
Objective
– Partition placement: minimize(max_{nj∈N} S(nj)), minimize(max_{nj∈N} M(nj))
[Diagram: states O, S, M connected by transitions t1, t2, t3, t4]
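The declarative statement above can be written down as plain data plus a few checks. The following is a minimal sketch, not Helix's API: the names `TRANSITIONS`, `STATE_COUNTS`, `is_legal`, `satisfies_counts`, and `max_load` are hypothetical. It assumes COUNT=2 applies to slaves and COUNT=1 to masters, treats both as exact per-partition targets, and leaves the t1 ≤ 5 throttle out.

```python
from collections import Counter

# Legal transitions from the slide: O-S, S-O, S-M, M-S.
TRANSITIONS = {
    ("OFFLINE", "SLAVE"),
    ("SLAVE", "OFFLINE"),
    ("SLAVE", "MASTER"),
    ("MASTER", "SLAVE"),
}

# Per-partition state counts (assumed reading: 1 master, 2 slaves).
STATE_COUNTS = {"MASTER": 1, "SLAVE": 2}


def is_legal(src, dst):
    """True if a replica may move directly from state src to state dst."""
    return (src, dst) in TRANSITIONS


def satisfies_counts(replica_states):
    """Check one partition's {node: state} map against STATE_COUNTS."""
    counts = Counter(replica_states.values())
    return all(counts.get(state, 0) == n for state, n in STATE_COUNTS.items())


def max_load(assignment, state):
    """Objective term max over nodes nj of state(nj).

    assignment maps partition -> {node: state}. The result is the largest
    number of replicas in `state` held by any single node.
    """
    per_node = Counter(
        node
        for states in assignment.values()
        for node, s in states.items()
        if s == state
    )
    return max(per_node.values(), default=0)


if __name__ == "__main__":
    assignment = {
        0: {"n1": "MASTER", "n2": "SLAVE", "n3": "SLAVE"},
        1: {"n1": "SLAVE", "n2": "MASTER", "n3": "SLAVE"},
    }
    print(is_legal("OFFLINE", "MASTER"))
    print(all(satisfies_counts(s) for s in assignment.values()))
    print(max_load(assignment, "MASTER"), max_load(assignment, "SLAVE"))
```

A placement is better under the stated objective when `max_load` is smaller for both masters and slaves, so comparing candidate assignments reduces to comparing these two numbers.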
30. Outline
What is Helix
Use case 1: distributed data store
Architecture
Use case 2: consumer group
Helix at LinkedIn
Q&A
31. Helix usage at LinkedIn (Pictures)
Espresso
– a timeline-consistent, distributed data store
Databus
– a change data capture service
Search as a Service
– a multi-tenant service for multiple search applications
More planned