Adobe has packaged HBase in Docker containers and uses Marathon and Mesos to schedule them, allowing us to decouple the RegionServer from the host, express resource requirements declaratively, and open the door to unassisted real-time deployments, elastic (up and down) real-time scalability, and more. In this talk, you'll hear what we've learned and why we believe this approach could fundamentally change HBase operations.
5. Why
• provisioning for peak load (which can be 30X the average)
• resource imbalance (CPU vs. I/O vs. RAM bound)
• incorrect usage predictions
• all of the above (and others)
6. Typical HBase Deployment
• (mostly) static deployment footprint
• infrequent scaling out by adding more nodes
• scaling down uncommon
• OLTP, OLAP workloads as separate clusters
• heaps kept under 32GB (to preserve compressed OOPs and keep GC pauses manageable)
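The sub-32GB heap convention above is typically set in hbase-env.sh; the values here are illustrative, not the deck's actual configuration:

```sh
# hbase-env.sh (illustrative values): keep the heap below ~32GB so the JVM
# can keep using compressed ordinary object pointers (OOPs),
# and GC pause times stay manageable.
export HBASE_HEAPSIZE=30g
export HBASE_OPTS="$HBASE_OPTS -XX:+UseCompressedOops"
```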
8. Idleness Costs
• idle servers still draw roughly 50% of their nominal power
• hardware depreciation accounts for ~40%
• in public clouds, idleness translates to 100% waste
(you are charged by time, not by resource use)
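A back-of-envelope sketch of why idleness is so expensive, using the ~50% idle-power figure above; the linear power model is a simplifying assumption, not a claim from the deck:

```python
# Energy cost per unit of useful work rises sharply at low utilization,
# because an idle server still draws roughly half its peak power.
IDLE_POWER_FRACTION = 0.5  # idle draw as a fraction of peak (from the slide)

def power_draw(utilization: float) -> float:
    """Linear power model: idle floor plus a utilization-proportional part."""
    return IDLE_POWER_FRACTION + (1 - IDLE_POWER_FRACTION) * utilization

def energy_per_unit_work(utilization: float) -> float:
    """Relative energy spent per unit of useful work (1.0 at full load)."""
    return power_draw(utilization) / utilization

# At full load each unit of work costs 1.0; at 20% utilization it costs
# roughly 3x as much, even though the machine is mostly "idle".
```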
10. Load is not Constant
• daily, weekly, seasonal variation (both up and down)
• load varies across workloads
• peaks are not synchronized
11. Opportunities
• datacenter as a single pool of shared resources
• resource oversubscription
• mixed workloads can scale elastically within pools
• shared extra capacity
25. Why: Docker Containers
• “static link” everything (including the OS)
• standard interface (resources, lifecycle, events)
• lightweight
• just another process
• minimal overhead, near-native performance
• fine-grained resources
• e.g. 0.5 cores, 32MB RAM, 32MB disk
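Fine-grained requests like these map directly onto a Marathon app definition. A minimal sketch (the app id and image name are hypothetical; `mem` and `disk` are in MB):

```json
{
  "id": "/hbase/regionserver",
  "cpus": 0.5,
  "mem": 32,
  "disk": 32,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "example/hbase-regionserver:1.0" }
  }
}
```

Marathon hands these requirements to Mesos, which offers matching resources from the shared pool.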
26. From .tgz/rpm + Puppet to Docker
• Goal: optimize for Mesos (not standalone)
• cluster, host agnostic (portability)
• env config injected through Marathon
• Self contained:
• OS-base + JDK + HBase
• centos-7 + java-1.8u40 + hbase-1.0
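A self-contained image along those lines might be built roughly like this; package names, paths, and the tarball are illustrative assumptions, not the deck's actual Dockerfile:

```dockerfile
# Illustrative sketch: OS base + JDK + HBase in one self-contained image.
FROM centos:7
RUN yum install -y java-1.8.0-openjdk-headless && yum clean all
# ADD auto-extracts the tarball into /opt/
ADD hbase-1.0.0-bin.tar.gz /opt/
ENV HBASE_HOME=/opt/hbase-1.0.0
# Cluster- and host-specific settings are NOT baked in; they are injected
# at run time as environment variables through Marathon.
CMD ["/opt/hbase-1.0.0/bin/hbase", "regionserver", "start"]
```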
36. Elasticity
• Scale up / down based on load
• traffic spikes, compactions, etc.
• yield unused resources
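One way to sketch such an elasticity policy is a pure sizing function; the thresholds and the requests-per-second budget here are hypothetical, not the deck's actual policy:

```python
import math

def target_instances(current_rps: float, rps_per_instance: float,
                     min_instances: int = 3, headroom: float = 1.2) -> int:
    """RegionServer containers needed to serve current_rps, with some
    headroom for spikes and a floor so we never scale below a safe minimum."""
    needed = math.ceil(current_rps * headroom / rps_per_instance)
    return max(min_instances, needed)
```

The computed count would then be applied by updating the app's `instances` field through Marathon's REST API; scaling down yields the freed resources back to the Mesos pool.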
37. Smaller, Better?
• multiple RS per node
• use all of a node's RAM without losing compressed OOPs
• smaller failure domain
• smaller heaps
• less GC-induced latency jitter
38. Simplified Tuning
• standard container sizes
• decoupled from physical hosts
• portable
• same tuning everywhere
• invariants based on resource ratios
• # threads to # cores to RAM to Bandwidth
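Those ratio invariants can be expressed as a single sizing function, so every container size is derived from the same rules; the specific ratios below are illustrative assumptions, not the deck's values:

```python
def container_profile(cores: float) -> dict:
    """Derive per-container tunables from core count, so containers of any
    size keep the same thread/CPU/RAM/bandwidth ratios."""
    HEAP_GB_PER_CORE = 4           # RAM scales with CPU (assumed ratio)
    HANDLERS_PER_CORE = 30         # RPC handler threads scale with CPU
    BANDWIDTH_MBPS_PER_CORE = 250  # network budget scales with CPU
    return {
        "heap_gb": cores * HEAP_GB_PER_CORE,
        "handler_threads": int(cores * HANDLERS_PER_CORE),
        "bandwidth_mbps": cores * BANDWIDTH_MBPS_PER_CORE,
    }
```

Because the profile depends only on the container size, the same tuning applies everywhere, regardless of which physical host the container lands on.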
41. Improvements
• drain regions before suspending
• schedule for data locality
• co-locate RegionServers and HFile blocks
• DataNode short-circuit reads through shared volumes
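Short-circuit reads are enabled with standard HDFS client properties; for containers, the domain socket path would need to live on a volume shared between the DataNode and the RegionServer container (the path below is an example):

```xml
<!-- hbase-site.xml / hdfs-site.xml fragment: let the client read HFile
     blocks directly from local disk, bypassing the DataNode data path. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
```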
43. Disaggregating HBase
• HBase is a consistent, highly available, distributed cache on top of HFiles in HDFS
• most *real* multi-tenancy and resource-isolation concerns revolve around a (single) table
• each table could have its own cluster (modulo some security-group concerns)
44. HMaster as a Scheduler?
• could fully manage HRS lifecycle (start/stop)
• in conjunction with region assignment
• considerations:
• Marathon is a generic long-running app scheduler
• extend Marathon's scheduling capabilities instead of
“reinventing” them?
46. Resources
• The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second edition - http://www.morganclaypool.com/doi/abs/10.2200/S00516ED2V01Y201306CAC024
• Omega: flexible, scalable schedulers for large compute clusters - http://research.google.com/pubs/pub41684.html
• Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center - https://www.cs.berkeley.edu/~alig/papers/mesos.pdf
• Marathon - https://github.com/mesosphere/marathon