My presentation from the NoSQL Now 2014 conference.
Abstract
NoSQL databases, including Couchbase, are increasingly being selected as the backend technology for web and mobile apps. Document databases in particular are well suited to a large number of use cases as an operational datastore.
This session provides a brief overview of Couchbase Server, a document database, and its underlying distributed architecture. In addition, Dipti will present some common use cases of Couchbase, with a drill-down into three specific customer use cases:
PayPal – A multi-data-center session store
LivePerson – A scalable, real-time analytics system
Orbitz – A highly available cache solution
3. Overview
Couchbase offers a full range of data management solutions:
• High-availability cache
• Key-value store
• Document database
• Mobile (on-device)
4. NoSQL Database Considerations
• Easy Scalability: grow the cluster without application changes and without downtime, when needed
• Consistent High Performance: an always-awesome experience for your application users
• Always On 24x7x365: the sun never sets on the Internet, so your application needs the database to always serve data
• Flexible Data Model (JSON): keep developers productive and allow fast and easy addition of new features
PERFORMANCE
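6. Single node – Couchbase Write Operation
[Diagram: an app server writes Doc 1 to a Couchbase Server node. The write lands in the node's managed cache and is then placed on the disk queue (persisted to disk) and on the replication queue (sent to another node).]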
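The write path in the diagram can be modelled as a toy in-memory node. This is a minimal sketch, not Couchbase's storage engine: `SketchNode`, `set` and `drain` are hypothetical names, plain deques stand in for the disk and replication queues, and acknowledging the app right after the cache write is an assumption about the default (asynchronous persistence) behaviour.

```python
from collections import deque


class SketchNode:
    """Toy model of one node's write path (hypothetical class, not Couchbase code)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}                   # managed (in-memory) cache
        self.disk = {}                    # stands in for the on-disk data files
        self.replicas = {}                # replica copies received from other nodes
        self.disk_queue = deque()         # mutations waiting to be persisted
        self.replication_queue = deque()  # mutations waiting to go to another node

    def set(self, key, value):
        # The write lands in the managed cache first...
        self.cache[key] = value
        # ...and the same mutation is queued for disk and for replication.
        self.disk_queue.append((key, value))
        self.replication_queue.append((key, value))
        return "ok"  # assumption: the app is acknowledged at this point

    def drain(self, other):
        """Process both queues (asynchronous in a real node, on demand here)."""
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value
        while self.replication_queue:
            key, value = self.replication_queue.popleft()
            other.replicas[key] = value


node_a, node_b = SketchNode("A"), SketchNode("B")
status = node_a.set("doc1", {"type": "session", "items": 3})
queued_before_drain = len(node_a.disk_queue)  # not persisted yet
node_a.drain(node_b)
```

Keeping the cache write separate from the two queues is what lets the app see cache-speed writes while persistence and replication catch up in the background.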
7. Single node – Couchbase Read Operation
[Diagram: an app server issues "Get Doc 1" to a Couchbase Server node, which returns Doc 1 from its managed cache; the disk queue, disk and replication queue are drawn as in the write path.]
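A matching sketch of the read path, again with hypothetical names. The diagram only shows the cache hit; the disk fallback on a miss is an assumption added for completeness.

```python
class ReadNode:
    """Toy model of one node's read path (hypothetical names)."""

    def __init__(self):
        self.cache = {}        # managed cache (RAM)
        self.disk = {}         # persisted documents
        self.cache_hits = 0
        self.disk_fetches = 0

    def get(self, key):
        if key in self.cache:
            self.cache_hits += 1              # the case drawn on the slide
            return self.cache[key]
        if key in self.disk:
            self.disk_fetches += 1            # assumption: a miss falls back to disk
            self.cache[key] = self.disk[key]  # ...and warms the cache
            return self.cache[key]
        return None                           # no such document


node = ReadNode()
node.cache["doc1"] = {"v": 1}
node.disk["doc2"] = {"v": 2}
results = [node.get("doc1"), node.get("doc2"), node.get("doc2"), node.get("doc3")]
```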
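8. Auto Sharding and Cluster Map
[Diagram: a hash function of the KEY assigns each document to one of the logical partitions (vBuckets vB1 to vB6); the cluster map assigns the partitions to physical servers A, B and C. When more scalability is required, a node is added and a new cluster map redistributes the partitions.]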
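The key-to-partition step can be sketched as below. The CRC32-based formula and the 1024 vBucket count are assumptions based on how Couchbase clients are commonly described, not something stated on the slide; the round-robin `build_cluster_map` is a hypothetical stand-in for the map the cluster actually publishes to clients.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed default partition count


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a document key to a logical partition (vBucket).

    Assumed scheme: CRC32 of the key, keep bits 16-30, then modulo the
    number of vBuckets.
    """
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def build_cluster_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Hypothetical cluster map: list index = vBucket id, value = server."""
    return [servers[vb % len(servers)] for vb in range(num_vbuckets)]


cluster_map = build_cluster_map(["A", "B", "C"])
vb = vbucket_for_key("user::1001")
server = cluster_map[vb]  # the client sends the request straight to this server
per_server = {s: cluster_map.count(s) for s in "ABC"}
```

Because the key hashes to a fixed vBucket, adding a node only changes the vBucket-to-server map, never the key-to-vBucket step.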
9. Couchbase Server Cluster Basic Operation
[Diagram: App Servers 1 and 2, each running the Couchbase client library with a copy of the cluster map, send read/write/update requests to a three-server cluster (user-configured replica count = 1):
  Server 1: active Docs 5, 2, 9; replicas of Docs 4, 1, 8
  Server 2: active Docs 4, 7, 8; replicas of Docs 6, 3, 2
  Server 3: active Docs 1, 3, 6; replicas of Docs 7, 9, 5]
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  – Only one server is active for a given doc at a time
• Client library provides app with simple interface to database
• Cluster map tells the client which server a doc is on
  – App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access the same document at the same time
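The routing behaviour in the bullets can be illustrated with the exact doc layout from the diagram. `Server` and `SketchClient` are hypothetical; to mirror the slide, the map is keyed by document name, whereas a real cluster map is keyed by vBucket.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    active: dict = field(default_factory=dict)   # docs this server owns
    replica: dict = field(default_factory=dict)  # copies of docs owned elsewhere


# Slide layout: doc -> (server holding the active copy, server holding the replica)
CLUSTER_MAP = {
    "doc1": ("server3", "server1"), "doc2": ("server1", "server2"),
    "doc3": ("server3", "server2"), "doc4": ("server2", "server1"),
    "doc5": ("server1", "server3"), "doc6": ("server3", "server2"),
    "doc7": ("server2", "server3"), "doc8": ("server2", "server1"),
    "doc9": ("server1", "server3"),
}


class SketchClient:
    """Hypothetical client library: the app just calls upsert/get by key."""

    def __init__(self, servers, cluster_map):
        self.servers = {s.name: s for s in servers}
        self.cluster_map = cluster_map

    def upsert(self, key, value):
        active, replica = self.cluster_map[key]
        self.servers[active].active[key] = value
        # In a real cluster the server replicates (replica count = 1);
        # that server-side copy is modelled inline here for brevity.
        self.servers[replica].replica[key] = value

    def get(self, key):
        active, _ = self.cluster_map[key]
        return self.servers[active].active[key]  # reads go to the active copy


servers = [Server("server1"), Server("server2"), Server("server3")]
app1 = SketchClient(servers, CLUSTER_MAP)
app2 = SketchClient(servers, CLUSTER_MAP)  # second app server, same cluster map
for n in range(1, 10):
    app1.upsert(f"doc{n}", {"n": n})
```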
10. Add Nodes to Cluster
[Diagram: Servers 4 and 5, each holding active and replica docs, join the three-server cluster (user-configured replica count = 1); App Servers 1 and 2 keep sending read/write/update requests through the client library and cluster map.]
• Two servers added with one-click operation
• Docs automatically rebalance across cluster
  – Even distribution of docs
  – Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
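One way to get "even distribution" with "minimum doc movement" is a greedy plan that only moves vBuckets off servers holding more than their fair share. This is a simplified sketch under that assumption, not Couchbase's rebalance algorithm, and it ignores replica vBuckets; `rebalance` is a hypothetical helper.

```python
from collections import defaultdict


def rebalance(assignment, servers):
    """Spread vBuckets evenly over `servers`, moving as few as possible.

    `assignment`: list where index = vBucket id and value = current server.
    """
    base, extra = divmod(len(assignment), len(servers))
    # Fair share per server; the first `extra` servers take one more.
    targets = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    owned = defaultdict(list)
    for vb, s in enumerate(assignment):
        owned[s].append(vb)

    # Only vBuckets above a server's fair share (or on removed servers) move.
    to_move = []
    for s, vbs in owned.items():
        to_move.extend(vbs[targets.get(s, 0):])

    new_assignment = list(assignment)
    for s in servers:
        have = min(len(owned.get(s, [])), targets[s])
        while have < targets[s]:
            new_assignment[to_move.pop()] = s
            have += 1
    return new_assignment


servers3 = ["S1", "S2", "S3"]
servers5 = servers3 + ["S4", "S5"]
old = [servers3[vb % 3] for vb in range(1024)]
new = rebalance(old, servers5)
moved = sum(1 for before, after in zip(old, new) if before != after)
new_counts = {s: new.count(s) for s in servers5}
print(moved)  # 409: exactly the vBuckets the two new servers must receive
```

Since the new servers start empty, any even layout must hand them 409 vBuckets; the greedy plan moves exactly that many and nothing else.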
11. Fail Over Node
[Diagram: the five-server cluster (user-configured replica count = 1) with App Servers 1 and 2. Server 3 fails, and the replicas of its active docs (Doc 1, Doc 3 and Doc 6) are promoted to active on the surviving servers.]
• App servers accessing docs
• Requests to Server 3 fail
• Cluster detects server failed
  – Promotes replicas of docs to active
  – Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
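The failover steps (promote replicas, update the cluster map) can be sketched over the same per-document layout used on the basic-operation slide. `fail_over` and `route` are hypothetical helpers; after promotion the affected docs have no replica until a rebalance recreates one, which is why a rebalance typically follows.

```python
CLUSTER_MAP = {  # doc -> (active server, replica server), as on the slide
    "doc1": ("server3", "server1"), "doc2": ("server1", "server2"),
    "doc3": ("server3", "server2"), "doc4": ("server2", "server1"),
    "doc5": ("server1", "server3"), "doc6": ("server3", "server2"),
    "doc7": ("server2", "server3"), "doc8": ("server2", "server1"),
    "doc9": ("server1", "server3"),
}


def fail_over(cluster_map, failed):
    """Return the updated cluster map after `failed` is removed."""
    new_map = {}
    for doc, (active, replica) in cluster_map.items():
        if active == failed:
            if replica is None:
                raise RuntimeError(f"{doc} has no replica to promote")
            new_map[doc] = (replica, None)  # promote the replica to active
        elif replica == failed:
            new_map[doc] = (active, None)   # active is fine; its replica is gone
        else:
            new_map[doc] = (active, replica)
    return new_map


def route(cluster_map, doc):
    """Clients send requests for `doc` to its current active server."""
    return cluster_map[doc][0]


after = fail_over(CLUSTER_MAP, "server3")
```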
12. Couchbase Server Cluster Indexing and Querying
[Diagram: App Servers 1 and 2 send a query to a three-server cluster (user-configured replica count = 1); each server holds its own active and replica docs, as in the basic-operation layout.]
• Indexing work is distributed amongst nodes
  – Large data set possible
  – Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
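A scatter-gather query over node-local indexes, as the bullets describe, might look like this. `IndexedNode`, `scatter_gather` and the `city` field are hypothetical; only active docs are indexed so that replica copies do not produce duplicate hits.

```python
from concurrent.futures import ThreadPoolExecutor


class IndexedNode:
    """Hypothetical node that indexes only the active docs it stores."""

    def __init__(self, active_docs):
        self.docs = active_docs
        self.index = {}  # local secondary index: city -> [doc keys]
        for key, doc in active_docs.items():
            self.index.setdefault(doc["city"], []).append(key)

    def query(self, city):
        return list(self.index.get(city, []))


def scatter_gather(nodes, city):
    """Send the query to every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda node: node.query(city), nodes))
    return sorted(key for part in partials for key in part)


nodes = [
    IndexedNode({"doc2": {"city": "NYC"}, "doc5": {"city": "SF"}}),
    IndexedNode({"doc4": {"city": "SF"}, "doc7": {"city": "NYC"}}),
    IndexedNode({"doc1": {"city": "SF"}, "doc3": {"city": "NYC"}}),
]
sf_docs = scatter_gather(nodes, "SF")
```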
13. Cross Data Center Replication (XDCR)
[Diagram: a Couchbase Server cluster in the NYC data center and another in the SF data center, each with three active servers (RAM and disk), replicating JSON documents between the two clusters via XDCR.]
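When both data centers accept writes, XDCR has to pick a winner for conflicting versions of a document. The sketch assumes a simplified "most updates wins" rule (higher revision count wins, higher CAS breaks ties); treat it as an illustration of revision-based conflict resolution, not the server's exact algorithm. `Doc`, `resolve` and `replicate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    value: dict
    rev: int  # revision count: how many times the doc has been mutated
    cas: int  # tie-breaker (hypothetical; assumed unique per mutation)


def resolve(local, incoming):
    """Simplified 'most updates wins': higher rev wins, higher CAS breaks ties."""
    if incoming.rev != local.rev:
        return incoming if incoming.rev > local.rev else local
    return incoming if incoming.cas > local.cas else local


def replicate(source, target):
    """One XDCR pass from `source` to `target` (both: key -> Doc)."""
    for key, doc in source.items():
        target[key] = resolve(target[key], doc) if key in target else doc


nyc = {"cart::42": Doc({"items": 3}, rev=5, cas=100)}
sf = {
    "cart::42": Doc({"items": 2}, rev=4, cas=120),  # conflicting, fewer updates
    "cart::7": Doc({"items": 1}, rev=1, cas=50),    # only exists in SF so far
}
replicate(nyc, sf)  # NYC -> SF
replicate(sf, nyc)  # SF -> NYC (bidirectional setup)
```

Because `resolve` is deterministic, both clusters pick the same winner regardless of the order in which the passes run, so they converge.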
15. High-Availability Caching (Use Case 1)
[Diagram: user requests reach the application layer, which sends read-write requests to the Couchbase distributed cache; cache misses and write requests go to the RDBMS.]
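The request flow in the diagram is the classic cache-aside pattern; refreshing the cache on writes (write-through) is an added assumption. In this sketch two dicts stand in for the Couchbase bucket and the RDBMS, and `CacheAside` is a hypothetical name.

```python
class CacheAside:
    """Toy cache in front of an RDBMS; both tiers are plain dicts here."""

    def __init__(self, cache, db):
        self.cache = cache  # stands in for the Couchbase distributed cache
        self.db = db        # stands in for the RDBMS (system of record)
        self.db_reads = 0

    def read(self, key):
        if key in self.cache:
            return self.cache[key]   # cache hit: RDBMS not touched
        self.db_reads += 1           # cache miss: go to the RDBMS
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # populate so the next read is a hit
        return value

    def write(self, key, value):
        self.db[key] = value         # write request goes to the RDBMS
        self.cache[key] = value      # and the cached copy is refreshed


store = CacheAside(cache={}, db={"page:home": "<html>home</html>"})
first = store.read("page:home")   # miss, loaded from the RDBMS
second = store.read("page:home")  # hit
store.write("page:about", "<html>about</html>")
third = store.read("page:about")  # hit, cached by the write
```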
16. High-Availability Caching (Use Case 1)
Data cached in Couchbase?
• Application objects
• Popular search query results
• Session information
• Heavily accessed web landing pages
Application characteristic
• Speed up RDBMS
• Consistently low response times for document / key lookups
• High availability 24x7x365
• Replacement for entire caching tier
[Figure: mock-up of a web page listing popular search query results.]
Couchbase, Inc. Confidential
17. High-Availability Caching (Use Case 1)
Why NoSQL and Couchbase?
• Low latency (sub-millisecond) with consistently high read/write throughput, using the built-in cache
• Always-on operations, even during database upgrades and maintenance, with zero downtime
19. Session Store (Use Case 2)
Application characteristic
• Extremely fast access to session data using a unique session ID
• Easy scalability to handle a fast-growing number of users and user-generated data
• Always-on functionality for a global user base
Data stored in Couchbase?
• Session values or cookies (stored as key-value pairs)
• Examples include: items in a shopping cart, flights selected, search results, etc.
20. Session Store (Use Case 2)
Why NoSQL and Couchbase?
• Low latency (sub-millisecond) with consistently high read/write throughput for session data, via the built-in object-level cache
• Linear throughput scalability to grow the database as user and data volume grow
• Always-on operations with particularly high availability, using Couchbase replication and failover
• Intra-cluster and cross-cluster (XDCR) replication for a globally distributed active-active platform
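A session store reduces to get/set by session ID with an expiry. Couchbase documents can carry an expiration time; here a dict plus an expiry timestamp stands in for that, and `SessionStore` with its injectable `clock` is a hypothetical sketch.

```python
import time
import uuid


class SessionStore:
    """Hypothetical session store: session ID -> (expiry time, session values)."""

    def __init__(self, ttl_seconds=1800, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._data = {}

    def create(self, values):
        session_id = uuid.uuid4().hex  # unique session ID used as the key
        self._data[session_id] = (self.clock() + self.ttl, dict(values))
        return session_id

    def get(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, values = entry
        if self.clock() >= expires_at:
            del self._data[session_id]  # expired: behave as if the doc is gone
            return None
        return values

    def update(self, session_id, **changes):
        """Merge changes and extend the TTL, as on each user request."""
        values = self.get(session_id)
        if values is None:
            raise KeyError(session_id)
        values.update(changes)
        self._data[session_id] = (self.clock() + self.ttl, values)


now = [0.0]
sessions = SessionStore(ttl_seconds=10, clock=lambda: now[0])
sid = sessions.create({"cart": ["item-1"]})
sessions.update(sid, flights=["JFK-SFO"])
now[0] = 5.0
live = sessions.get(sid)
now[0] = 20.0
expired = sessions.get(sid)
```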
22. Globally Distributed User Profile Store (Use Case 3)
Application characteristic
• Extremely fast access to individual profiles
• Always-online system, as multiple applications access user profiles
• Flexibility to add and update user attributes
• Easy scalability to handle a fast-growing number of users
Data stored in Couchbase?
• User profile with unique ID
• User settings / preferences
• User's network
• User application state
23. Globally Distributed User Profile Store (Use Case 3)
Why NoSQL and Couchbase?
• Low latency and high throughput for very quick lookups for millions of concurrent users, using the built-in cache
• Intra-cluster and cross-cluster (XDCR) replication for high availability and disaster recovery
• Active-active geo-distributed system to handle a globally distributed user base
• Online admin operations eliminate system downtime
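When several applications update the same profile, optimistic concurrency keeps them from silently overwriting each other. Couchbase exposes this through CAS values; the `ProfileStore` below is a hypothetical in-memory stand-in that only mimics the compare-and-swap idea, and `add_attribute` shows a new attribute being added without any schema change.

```python
import json


class CasMismatch(Exception):
    pass


class ProfileStore:
    """Hypothetical document store with CAS-style optimistic concurrency."""

    def __init__(self):
        self._docs = {}  # key -> (cas, JSON text)

    def insert(self, key, doc):
        self._docs[key] = (1, json.dumps(doc))

    def get(self, key):
        cas, text = self._docs[key]
        return json.loads(text), cas

    def replace(self, key, doc, cas):
        current_cas, _ = self._docs[key]
        if cas != current_cas:
            raise CasMismatch(key)  # another app updated the profile first
        self._docs[key] = (current_cas + 1, json.dumps(doc))
        return current_cas + 1


def add_attribute(store, key, name, value, retries=3):
    """Add or update one profile attribute, re-reading on concurrent writes."""
    for _ in range(retries):
        doc, cas = store.get(key)
        doc[name] = value  # new attributes need no schema migration
        try:
            return store.replace(key, doc, cas)
        except CasMismatch:
            continue
    raise CasMismatch(key)


store = ProfileStore()
store.insert("user::1001", {"name": "Laura", "settings": {"lang": "en"}})
stale_doc, stale_cas = store.get("user::1001")  # app A reads the profile
new_cas = add_attribute(store, "user::1001", "network", ["user::1002"])  # app B writes
```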
24. Data Aggregation (Use Case 4)
Application characteristic
• Flexibility to store any kind of content
• Flexibility to handle schema changes
• Full-text search across the data set
• High-speed data ingestion
• Scales horizontally as more content gets added to the system
Data stored in Couchbase?
• Social media feeds: Twitter, Facebook, LinkedIn
• Blogs, news, press articles
• Data service feeds: Hoovers, Reuters
• Data from other systems
25. Data Aggregation (Use Case 4)
Why NoSQL and Couchbase?
• JSON provides schema flexibility to store all types of content and metadata
• Fast access to individual documents via the built-in cache, plus high write throughput
• Indexing and querying provide real-time analytics capabilities across the dataset
• Integration with Elasticsearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
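The "indexing and querying" bullet refers to Couchbase views, whose map/reduce functions are written in JavaScript. The Python emulation below is only a sketch of the idea: a map function emits a key per document and a count-style reduce groups the results. The feed documents and the `source` field are hypothetical.

```python
from collections import defaultdict

# Hypothetical aggregated items: each feed keeps its own shape.
FEED_DOCS = {
    "tw::1": {"source": "twitter", "text": "sample tweet", "retweets": 4},
    "tw::2": {"source": "twitter", "text": "another tweet"},
    "news::9": {"source": "news", "title": "sample headline", "publisher": "Reuters"},
    "li::5": {"source": "linkedin", "post": "sample post", "company": "Acme"},
    "misc::3": {"payload": "record from another system"},  # no 'source' field
}


def by_source(key, doc):
    """Map function: emit (source, 1) for docs that have a 'source' field."""
    if "source" in doc:
        yield doc["source"], 1


def run_view(docs, map_fn):
    """Run the map over every doc, then reduce by summing values per key."""
    totals = defaultdict(int)
    for key, doc in docs.items():
        for emitted_key, value in map_fn(key, doc):
            totals[emitted_key] += value
    return dict(sorted(totals.items()))


counts = run_view(FEED_DOCS, by_source)
```

Documents without the field are simply skipped by the map, which is how a schema-flexible store copes with heterogeneous feeds.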
26. Content and Metadata Store (Use Case 5)
[Figure: a landscape photo shown with its content metadata, a keyword list: Nature, Field, Summer, Farm, Sky, Environment, Landscaped, Grass, Green, Blue, Oilseed, Rape, Agriculture, Scenics, Land, Spring, Non-Urban Scene, Environmental, Conservation, Sun, Meadow, Horizon, Season, Cloud, Landscapes, Travel Locations, Pasture, Cultivated Land, Stratosphere, cloudy day, Oilseed Rape, Rural Scene, Vibrant Color, No People, Beauty In Nature, Gold, Color Image, Beauty, Idyllic, Multicolored, Yellow, Colors, Cloudscape, Outdoors, Plant, Sunlight, Horizon Over Land.]
27. Content and Metadata Store (Use Case 5)
Application characteristic
• Flexibility to store any kind of content
• Fast access to content metadata (most accessed objects) and content
• Full-text search across the data set
• Scales horizontally as more content gets added to the system
Data stored in Couchbase?
• Content metadata
• Content: articles, text
• Landing pages for website
• Digital content: eBooks, magazines, research material
28. Content and Metadata Store (Use Case 5)
Why NoSQL and Couchbase?
• Fast access to metadata and content via the object-managed cache
• JSON provides schema flexibility to store all types of content and metadata
• Indexing and querying provide real-time analytics capabilities across the dataset
• Integration with Elasticsearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
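Tag lookup over content metadata can be sketched with an inverted index. This is a simple stand-in for the full-text search the slide delegates to Elasticsearch; `build_tag_index`, `search` and the sample keys are hypothetical.

```python
from collections import defaultdict


def build_tag_index(metadata):
    """Invert content metadata: lower-cased tag -> set of content keys."""
    index = defaultdict(set)
    for key, meta in metadata.items():
        for tag in meta["tags"]:
            index[tag.lower()].add(key)
    return index


def search(index, *tags):
    """Keys whose metadata carries every requested tag (case-insensitive)."""
    if not tags:
        return []
    matches = [index.get(tag.lower(), set()) for tag in tags]
    return sorted(set.intersection(*matches))


METADATA = {
    "img::field-01": {"type": "image", "tags": ["Nature", "Field", "Oilseed Rape", "Sky"]},
    "ebook::42": {"type": "ebook", "tags": ["Agriculture", "Nature"]},
    "mag::7": {"type": "magazine", "tags": ["Travel Locations", "Sky"]},
}
index = build_tag_index(METADATA)
```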
30. User Profile, Ad Targeting & Real-Time Analytics
• Company
  – Global leader in online payments
  – 132m active accounts, 193 markets, 25 currencies
• Scalability and Performance Requirements
  – 300m to 1bn documents, 3TB to 10TB of data
  – Billions of requests, with sub-200ms response times for access to JSON documents
  – 50/50 read/write mix with 5ms latency
• Existing Database Infrastructure
  – Multiple tiers: separate caching and durable store
  – MySQL, Oracle, Terracotta, Coherence
• Pain
  – Real-time access to identity mapping: eBay ID, PayPal ID, social ID, 3rd-party ID, email
  – Performance: an ad needs to be served in 200ms
  – Cost: multiple tiers for caching and durability
  – High availability: across large clusters and across data centers
• Couchbase Benefits
  – Performance: reduced latency, with 5ms access times
  – Cost: consolidation of database and cache layers
  – Cross data center availability
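One common key-value layout for the identity-mapping pain point is a profile document plus one small lookup document per external ID, so any ID resolves in two key lookups. The layout, key prefixes and `IdentityMap` class are assumptions for illustration, not PayPal's design.

```python
class IdentityMap:
    """Hypothetical layout: one profile doc plus one lookup doc per external ID."""

    def __init__(self):
        self.kv = {}  # stands in for a Couchbase bucket (key -> JSON-like value)

    def add_profile(self, profile_key, profile, external_ids):
        self.kv[profile_key] = profile
        for id_type, id_value in external_ids.items():
            # e.g. "ebay::e-123" -> {"profile": "profile::1001"}
            self.kv[f"{id_type}::{id_value}"] = {"profile": profile_key}

    def resolve(self, id_type, id_value):
        """Two key lookups: external ID -> profile key -> profile document."""
        pointer = self.kv.get(f"{id_type}::{id_value}")
        if pointer is None:
            return None
        return self.kv.get(pointer["profile"])


ids = IdentityMap()
ids.add_profile(
    "profile::1001",
    {"segments": ["travel"], "country": "US"},
    {"ebay": "e-123", "paypal": "p-456", "email": "user@example.com"},
)
by_ebay = ids.resolve("ebay", "e-123")
by_email = ids.resolve("email", "user@example.com")
```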
31. Why Couchbase?
§ Data volume
  • Online system; 300M to 1B documents @ 10k value size; 3 to 10TB total storage
§ Data access
  • Distributed caching
  • Persistence
§ Data structure
  • Flexible & schemaless
§ Read/write
  • 50% read / 50% write
  • Low latency < 10 msec
§ Partitioning
§ Replication
§ Auto healing
§ Availability and scalability
  • Resilient
  • Multi data center: DR/BCP
  • Linearly scalable
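The data-volume figures are consistent with each other: document count times value size gives the quoted storage range. The sketch reads "10k" as 10 KB (decimal) and ignores keys, metadata and replica copies, all of which add to the raw footprint.

```python
VALUE_SIZE_BYTES = 10_000  # "10k value size", read as 10 KB (decimal)


def raw_storage_tb(num_docs, value_size=VALUE_SIZE_BYTES):
    """Raw value bytes in decimal TB; ignores keys, metadata and replicas."""
    return num_docs * value_size / 1e12


low = raw_storage_tb(300_000_000)     # lower end of the document range
high = raw_storage_tb(1_000_000_000)  # upper end of the document range
print(f"{low:.0f}-{high:.0f} TB")     # prints "3-10 TB"
```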
32. Use cases at PayPal
• Ad tech targeting
• Cookie infrastructure
• Real-time analytics
33. Cookie architecture
[Diagram: three tiers. Front tier: interaction channels and application cookie libraries. Mid tier: CookieService and a data service (key-value cache interface, Couchbase client). Data tier: Couchbase clusters in DC A and DC B, linked by XDCR.]
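35. DEPLOYMENT MODEL
[Diagram: three Cookie Service deployments, each backed by a Couchbase (CB) cluster and linked by XDCR, labelled ACTIVE, ACTIVE and PASSIVE and annotated AVAILABILITY, REDUNDANCY and DISASTER RECOVERY, with WRITE and READ paths.]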
36. High Performance Caching
• Company
  – Leading online travel company
• Scalability and Performance Requirements
  – 11 clusters / 100 nodes
  – Over 3TB of data
  – 149,000 ops/sec
• Existing Database Infrastructure
  – Relational database technology, Terracotta
• Pain
  – Scalability/capacity planning: cannot be planned; dependent on external factors
  – Scalability: complex and time-consuming scale-out
  – Performance: caching too complex; weeks of planning, hours of downtime
  – Cost: multiple tiers of hardware for database and caching
• Couchbase Benefits
  – Scalability: over 70 nodes, with simple scale-out in minutes, not hours
  – Performance: response times improved by up to 47%, with consistent 3ms to 4ms response
  – Cost: consolidated caching and database tiers mean fewer machines and less power, cooling and footprint, for dramatic savings
  – Dynamic schema change: dramatically reduced downtime
37. High Availability Cache
• 11 clusters (4 mirrored), 100 nodes
• > 3 TB of data
• ~430m objects (146m in largest)
• Total ops/sec ~75k (*149k with HA)
38. Use Case #1
• Content: HTML, images, links
[Diagram: HA caches replicated via XDCR.]
40. Real-Time Analytics
• Company
  – Leading cloud company: allows enterprises to connect in real time with their customers via chat, voice, and content delivery
• Scalability and Performance Requirements
  – 13TB/month
  – 20m engagements/month
  – 1.8bn sessions/month
• Existing Database Infrastructure
  – MySQL
• Pain
  – Scalability
  – Performance: batch analytics and real-time access to customer profiles
  – Cross data center replication: 4 data centers
• Couchbase Benefits
  – Scalability
  – Performance: mixed read/write with very high throughput
  – Document store: ease of development
41. Use Case: 3rd-party data aggregation with analytics
Real-time analytics for LivePerson's customers
[Figure: LiveEngage dashboard.]
43. Requirements
• High throughput, really fast
• Linear scale
• Searchable (Views and M/R)
• Supports both K/V & document store
• Cross data center replication
• "Always on", resilience solution
The Problem: VOLUME
• 13 TB per month
• ~1 PB in total
• 1.8 B visits per month
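As a rough sense of scale, the monthly figures translate into average rates. This assumes a 30-day month and decimal terabytes, and it spreads the stored data evenly over visits; peak load will be well above these averages.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month

visits_per_month = 1.8e9  # "1.8 B visits per month"
bytes_per_month = 13e12   # "13 TB per month", decimal terabytes

avg_visits_per_sec = visits_per_month / SECONDS_PER_MONTH
avg_bytes_per_visit = bytes_per_month / visits_per_month
print(f"~{avg_visits_per_sec:.0f} visits/s, ~{avg_bytes_per_visit / 1e3:.1f} KB per visit")
```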