2. Membase is an open source,
distributed, key-value database
management system optimized for
storing data behind interactive web
applications.
All aspects of Membase are simple, fast
and elastic by design.
6. Fast
• Original use case: speed up access to authoritative data as a distributed hashtable
• Must be at least as fast as a highly tuned DBMS
• Designed for modern datacenter substrate
  – Designed for VM and cloud deployments
7. Elastic
• Add nodes without losing access to data
• Maintain consistency when accessing data
  – Membase is a CP type system
• Scale linearly by just adding more nodes
8. Before: Application scales linearly, data hits wall
Application Scales Out
Just add more commodity web servers
Database Scales Up
Get a bigger, more complex server
9. Membase is a distributed database
Membase Servers
In the data center
Web application server
Application user
On the administrator console
10. Built-in Memcached Caching Layer
Figure: two panels, Memcached Mode and Membase Mode, each showing Memcached and the Membase Database.
Fact: The Membase development team has also contributed over
half of the code to the Memcached project.
11. Proven at small, and extra large scale
Leading cloud service (PaaS) provider
Over 65,000 hosted applications
Over 2,000 users to date
Membase Server serving over 3,000 Heroku customers
Social game leader – FarmVille, Mafia Wars, Café World
Over 230 million monthly users
Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World
12. After: Data layer scales like application logic layer
Data layer now scales with linear cost and constant performance.
Application Scales Out
Just add more commodity web servers
Database Scales Out
Just add more commodity data servers
Scaling out flattens the cost and performance curves.
Membase Servers
15. What is Project Arcus?
• Memcached
  – Common protocol across PHP, Java, C applications
• Moxi (Memcached proxy) based
• In-house automatic fault-detection and failover solution
• Collectd-based monitoring
• Proxy and cache server administration UI
• Private cloud service
16. Previous Deployments
• A few individual memcached installations
• Problems
  – No fault-tolerance
    • Hardware failures are common (heat, network switch failure, etc.)
  – No automatic scalability
    • To add or remove a memcached server, they need to rebuild code, distribute it, and restart all clients
17. Today
• Memcached clusters
  – Fault-tolerance transparent to clients
    • Consistent hashing in moxi (memcached proxy)
  – Cache As A Service (CaaS)
    • All major services in NHN started using cache
    • Multitenancy across cache services
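The slide above credits consistent hashing in the proxy for making server changes transparent to clients. The following is a minimal sketch of the general technique, not moxi's actual implementation: the `HashRing` class, the MD5-based hash, the 100 virtual nodes per server, and the server names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server is placed on the ring at `replicas` pseudo-random points.
        for i in range(self.replicas):
            point = ring_hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def lookup(self, key):
        """Return the first server clockwise from the key's ring position."""
        idx = bisect.bisect(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Hypothetical server names; measure how many keys move when a node is added.
ring = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
keys = [f"user:{n}" for n in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("cache-d:11211")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fourth server")
```

Only the keys that now belong to the new server change owner; with naive `hash(key) % server_count` placement, most keys would be remapped whenever the server count changes.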
19. Membase-Cloudera Partnership
"AOL serves more than 5 billion impressions per day from our ad
serving platforms, and any incremental improvement in processing
time translates to huge benefits in our ability to more effectively serve
the ads needed to meet our contractual commitments. Traditional
databases like MySQL lack the scalability required to support our goal
of five milliseconds per read/write. Creating user profiles with Hadoop,
then serving them from Membase, reduces profile read and write
access to under a millisecond, leaving the bulk of the processing time
budget for improved targeting and customization."
Pero Subasic
Chief Architect, AOL
20. Membase-Cloudera Partnership
Joint development of bi-directional software integration between Membase and Hadoop
• Membase NodeCode Module streaming interface to Cloudera Distribution for Hadoop via Flume interface
• Sqoop-derived command line utility for bi-directional batch movement of data between Membase and Cloudera Distribution for Hadoop
Joint marketing and sales of integrated distributed OLTP-OLAP solution
• Membase – the distributed OLTP solution
• Cloudera – the distributed OLAP solution
Cloudera to distribute integration
21. Customer use case – Ad targeting
events
profiles, campaigns
profiles, real time campaign
statistics
40 milliseconds to come
up with an answer.
25. Clustering
• Underlying cluster functionality based on Erlang/OTP
• Have a custom, vector clock based way of storing and propagating...
  – Cluster topology
  – vBucket mapping
• Collect statistics from many nodes of the cluster
  – Identify hot keys, resource utilization
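The clustering slide mentions a custom, vector clock based scheme for propagating configuration. The sketch below shows only the textbook vector clock operations (increment, merge, compare); it is not Membase's Erlang implementation, and the function names and node names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent' for a relative to b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Two nodes start from the same configuration version.
base = increment({}, "node_a")
on_a = increment(base, "node_a")   # node_a changes the configuration
on_b = increment(base, "node_b")   # node_b changes it independently
merged = merge(on_a, on_b)

print(compare(base, on_a))     # before
print(compare(on_a, on_b))     # concurrent
print(compare(merged, on_a))   # after
```

A "concurrent" result is the case that matters for cluster configuration: neither version supersedes the other, so the two have to be reconciled rather than one silently overwriting the other.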
31. TAP
• A generic, scalable method of streaming mutations from a given server
  – As data operations arrive, they can be sent to arbitrary TAP receivers
• Leverages the existing memcached engine interface, and the non-blocking IO interfaces to send data
• Three modes of operation
Figure: panels labeled Working set, Data, Mutations.
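The idea of streaming mutations to arbitrary receivers can be illustrated with a small in-process model. This is a sketch of the concept only: it does not use the TAP wire protocol or the memcached engine interface, and the `Store` class, the `backfill` flag, and the event tuples are assumptions made for this example rather than the three modes named on the slide.

```python
import queue


class Store:
    """In-memory key-value store that fans mutations out to receivers."""

    def __init__(self):
        self._data = {}
        self._receivers = []

    def register(self, backfill=False):
        """Attach a receiver queue; optionally replay existing items first."""
        q = queue.Queue()
        if backfill:
            for key, value in self._data.items():
                q.put(("set", key, value))
        self._receivers.append(q)
        return q

    def set(self, key, value):
        self._data[key] = value
        self._publish(("set", key, value))

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            self._publish(("delete", key, None))

    def _publish(self, event):
        # Every registered receiver sees every mutation, in arrival order.
        for q in self._receivers:
            q.put(event)


def drain(q):
    """Collect everything currently queued for one receiver."""
    events = []
    while not q.empty():
        events.append(q.get_nowait())
    return events


store = Store()
store.set("a", 1)
live = store.register()               # sees only mutations from now on
full = store.register(backfill=True)  # sees existing data, then mutations
store.set("b", 2)
store.delete("a")

live_events = drain(live)
full_events = drain(full)
print(live_events)
print(full_events)
```

A receiver of this kind could feed a replica, a backup job, or an external system without the writer knowing who is listening.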
32. Disk > Memory
Bucket Configuration
mem_high_wat
mem_low_wat
memory quota
The dataset may have many
infrequently accessed items.
However, memcached's eviction
behavior (LRU) differs from
what membase wants.
Still, most traditional
RDBMS implementations are
not 100% correct for us
either. The speed of a miss
is very, very important.
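One way to read the `mem_high_wat` and `mem_low_wat` settings is as a pair of thresholds: when memory use rises above the high watermark, values are ejected from memory until use falls to the low watermark, while the data itself stays on disk. The sketch below models that reading under stated assumptions. The `WatermarkCache` class, the choice to eject the least recently used values, and the byte counts are illustrative and not taken from Membase's code.

```python
from collections import OrderedDict


class WatermarkCache:
    """Keeps values resident in memory until usage crosses a high watermark."""

    def __init__(self, mem_high_wat, mem_low_wat):
        assert mem_low_wat < mem_high_wat
        self.mem_high_wat = mem_high_wat
        self.mem_low_wat = mem_low_wat
        self.resident = OrderedDict()  # key -> value, least recently used first
        self.disk = {}                 # stand-in for the persistent store
        self.mem_used = 0

    def set(self, key, value):
        self.disk[key] = value         # persist first so ejection loses nothing
        self._load(key, value)

    def get(self, key):
        if key in self.resident:
            self.resident.move_to_end(key)
            return self.resident[key]
        if key in self.disk:           # not resident: fetch from disk
            value = self.disk[key]
            self._load(key, value)
            return value
        return None                    # true miss: key does not exist

    def _load(self, key, value):
        if key in self.resident:
            self.mem_used -= len(self.resident.pop(key))
        self.resident[key] = value
        self.mem_used += len(value)
        if self.mem_used > self.mem_high_wat:
            self._eject()

    def _eject(self):
        # Drop values from memory only; they remain available on disk.
        while self.mem_used > self.mem_low_wat and self.resident:
            _, value = self.resident.popitem(last=False)
            self.mem_used -= len(value)


cache = WatermarkCache(mem_high_wat=100, mem_low_wat=60)
for i in range(12):
    cache.set(f"k{i}", b"x" * 10)

resident_after_load = list(cache.resident)
used_after_load = cache.mem_used
fetched = cache.get("k0")              # ejected earlier, served from disk
print(resident_after_load, used_after_load, fetched)
```

Unlike a pure cache, nothing is lost when a value leaves memory, and a lookup for a key that does not exist is answered from the in-memory `disk` index without any extra work, which reflects the slide's point that the speed of a miss matters.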
33. ns_server
membase
(memcached + membase engine)
moxi ns_server
vbucketmigrator
TAP
memcached operations
with tap commands
memcached operations
Client
port 11211
memcached operations
moxi + Client
port 11210
memcached operations REST/comet
cluster topology
and vbucket map
Clients, nodes and other nodes