For more in-depth NoSQL content from Couchbase, check out http://www.couchbase.com/webinars
NoSQL databases have emerged as a better match than relational systems for modern interactive applications, offering cost-effective data management at “Big Data” scale. But there are significant differences between structured and schema-less database technology. What should architects and technical managers know as they explore NoSQL solutions for their teams?
In this workshop you will learn:
- How to evaluate NoSQL (both technical advantages and limitations) as a potential data management approach
- Critical differences between NoSQL and RDBMS for designing, building and running production applications
- Ideal use cases for NoSQL technology and sample reference architectures
3. Two big drivers for NoSQL adoption
• 49% Lack of flexibility / rigid schemas
• 35% Inability to scale out data
• 29% Performance challenges
• 16% Cost
• 12% All of these
• 11% Other
Source: Couchbase Survey, December 2011, n = 1351.
6. Document Databases
• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All databases require a unique key
• Documents are stored using JSON or XML or their derivatives
• Content can be indexed and queried
• Offer auto-sharding for scaling and replication for high availability

Example document:
{
  "UUID": "21f7f8de-8051-5b89-86…",
  "Time": "2011-04-01T13:01:02.42…",
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "dsallings@spy.net",
  "Details": {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags": [ "SERVER", "US-West", "API" ]
  }
}
9. Relational vs Document data model
Relational data model: highly structured table organization, with rigidly defined data formats and record structure.
Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.
10. Example: User Profile

User Info:
| KEY | First | Last   | ZIP_id |
|-----|-------|--------|--------|
| 1   | Dipti | Borkar | 2      |
| 2   | Joe   | Smith  | 2      |
| 3   | Ali   | Dodson | 2      |
| 4   | John  | Doe    | 3      |

Address Info:
| ZIP_id | CITY | STATE | ZIP   |
|--------|------|-------|-------|
| 1      | DEN  | CO    | 30303 |
| 2      | MV   | CA    | 94040 |
| 3      | CHI  | IL    | 60609 |
| 4      | NY   | NY    | 10010 |

To get information about a specific user, you perform a join across the two tables.
11. Document Example: User Profile
User Info row + Address Info row = a single JSON document:
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
All data in a single document.
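The join from the previous slide can be folded away by denormalizing at write time. A minimal Python sketch (row and field names follow the slides; `to_document` is an illustrative helper, not a Couchbase API):

```python
# Denormalize the relational "User Info" and "Address Info" rows
# into one self-contained JSON document.
import json

user_row = {"KEY": 1, "First": "Dipti", "Last": "Borkar", "ZIP_id": 2}
address_rows = {
    1: {"CITY": "DEN", "STATE": "CO", "ZIP": "30303"},
    2: {"CITY": "MV", "STATE": "CA", "ZIP": "94040"},
}

def to_document(user, addresses):
    """Fold the joined relational rows into a single document."""
    addr = addresses[user["ZIP_id"]]
    return {
        "ID": user["KEY"],
        "FIRST": user["First"],
        "LAST": user["Last"],
        "ZIP": addr["ZIP"],
        "CITY": addr["CITY"],
        "STATE": addr["STATE"],
    }

doc = to_document(user_row, address_rows)
print(json.dumps(doc))
```

The "join" happens once, when the document is built, instead of on every read.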
12. Making a Change Using RDBMS
[Figure: user data spread across a User Table, Photo Table, Status Table, Affiliations Table, and Country Table, all linked by ID columns. Adding country information to user records means altering the schema, introducing a new Country table, and joining across all of these tables in every query that needs the data.]
13. Making the Same Change with a Document Database
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": { "TEXT": "At Conf", "GEO_LOC": "134" },
  "COUNTRY": "USA"
}
Just add the new information to the document.
14. Document modeling
When considering how to model data for a given application:
• Think of a logical container for the data
• Think of how the data groups together
Questions to ask:
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?
15. Document Design Options
• One document that contains all related data
  – Data is de-normalized
  – Better performance and scale
  – Eliminates client-side joins
• Separate documents for different object types, with cross-references
  – Data duplication is reduced
  – Objects may not be co-located
  – Transactions are supported only on a document boundary
  – Most document databases do not support joins
16. Document ID / Key selection
• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID-based document lookup is extremely fast
• Usually an ID can appear only once in a bucket
Questions to ask:
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?
Options:
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human-readable) IDs
• Matching prefixes (for multiple related objects)
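The ID options above can be sketched as small helpers. The `type::id` prefix convention used here is a common community pattern, not something the database mandates:

```python
import uuid
from datetime import datetime

def uuid_key(obj_type):
    """Opaque, globally unique key."""
    return f"{obj_type}::{uuid.uuid4()}"

def date_key(obj_type, when):
    """Date-based key, useful for time-ordered data such as logs."""
    return f"{obj_type}::{when.strftime('%Y-%m-%dT%H:%M:%S')}"

def handcrafted_key(obj_type, natural_id):
    """Human-readable key built from a natural identifier."""
    return f"{obj_type}::{natural_id}"

# Matching prefixes let related objects be derived from one known key:
user_key = handcrafted_key("user", "dipti")
profile_key = f"{user_key}::profile"
posts_key = f"{user_key}::posts"

print(user_key, profile_key, posts_key)
```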
17. Example: Entities for a Blog
• User profile: the main pointer into the user data
• Badge settings, like a Twitter badge
• Blog entries
  – Blog posts: contain the blogs themselves
  – Blog comments: comments from other users
20. Threaded Comments
You can imagine how to take this to a threaded list: the blog document points to a list of comments, and each comment can point to its replies.
Advantages:
• Only fetch the data when you need it (for example, when rendering part of a web page)
• Spread the data and load across the entire cluster
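A sketch of that threaded structure, assuming each comment document carries a list of reply IDs (an in-memory dict stands in for the database; one lookup per rendered comment):

```python
store = {
    "post::1": {"text": "Blog post", "comments": ["c::1"]},
    "c::1": {"text": "First comment", "replies": ["c::2", "c::3"]},
    "c::2": {"text": "Reply A", "replies": []},
    "c::3": {"text": "Reply B", "replies": []},
}

def render_thread(store, comment_id, depth=0):
    """Fetch one comment and, lazily, its replies, depth-first."""
    node = store[comment_id]  # one document fetch per rendered comment
    lines = ["  " * depth + node["text"]]
    for reply_id in node.get("replies", []):
        lines.extend(render_thread(store, reply_id, depth + 1))
    return lines

print("\n".join(render_thread(store, "c::1")))
```

Replies that are never rendered are never fetched, which is the "only fetch when you need it" advantage above.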
22. Relational Technology Scales Up
The application tier scales out: just add more commodity web servers, and system cost rises in step with application performance as users grow.
An RDBMS scales up: you get a bigger, more complex server, and it won't scale beyond a certain point. Sharding is expensive and disruptive, and doesn't perform at web scale.
23. Couchbase Server Scales Out Like the App Tier
The application tier scales out: just add more commodity web servers.
A NoSQL database scales out the same way: with a Couchbase distributed data store, cost and performance mirror the app tier. Scaling out flattens the cost and performance curves.
25. The Process – From Evaluation to Go Live
No different from evaluating a relational database:
1. Analyze your requirements
2. Find solutions / products that match key requirements
3. Execute a proof of concept / performance evaluation
4. Begin development of the application
5. Deploy in staging and then production
New requirements → new solutions.
26. Step 1: Analyze your requirements
Common application requirements:
• Rapid application development
  – Changing market needs
  – Changing data needs
• Scalability
  – Unknown user demand
  – Constantly growing throughput
• Consistent performance
  – Low response time for a better user experience
  – High throughput to handle viral growth
• Reliability
  – Always online
27. Step 2: Find solutions that match key requirements
Requirements that point to NoSQL:
• Linear scalability
• Schema flexibility
• High performance
Requirements that point to an RDBMS:
• Multi-document transactions
• Database rollback
• Complex security needs
• Complex joins
• Extreme compression needs
Some requirements can be met by both; it depends on the data.
28. Step 3: Proof of concept / performance evaluation
Prototype a workload.
• Look for consistent performance…
  – Low response times / latency, for a better user experience
  – High throughput, to handle viral growth and for resource efficiency
• …across
  – Read-heavy / write-heavy / mixed workloads
  – Clusters of growing sizes
• …and watch for
  – Contention / heavy locking
  – Linear scalability
29. Step 3: Other considerations
Accessing data
– No standards exist yet
– Typically via SDKs or over HTTP
– Check whether the programming language of your choice is supported
Consistency
– Consistent only at the document level
– Most document stores currently don't support multi-document transactions
– Analyze your application's needs
Availability
– Each node stores active and replica data (Couchbase)
– Each node is either a master or a slave (MongoDB)
30. Step 3: Other considerations (continued)
Operations
– Monitoring the system
– Backup and restore
– Upgrades and maintenance
– Support
Ease of scaling
– Ease of adding and reducing capacity
– Single node type
– App availability on topology changes
Indexing and querying
– Secondary indexes (map functions)
– Aggregates and grouping (reduce functions)
– Basic querying
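The map/reduce view model behind secondary indexes can be sketched in a few lines of Python (this emulates the idea in memory; in Couchbase 2.0 the map and reduce functions are JavaScript run incrementally by the server):

```python
docs = [
    {"type": "comment", "author": "joe", "len": 9},
    {"type": "comment", "author": "ali", "len": 2},
    {"type": "post", "author": "joe", "len": 120},
]

def map_by_author(doc):
    """Map: emit (key, value) pairs to build a secondary index."""
    if doc["type"] == "comment":
        yield (doc["author"], doc["len"])

def reduce_sum(values):
    """Reduce: aggregate the emitted values into a group total."""
    return sum(values)

# Build the index from the map output.
index = {}
for doc in docs:
    for key, value in map_by_author(doc):
        index.setdefault(key, []).append(value)

# Apply the reduce function per key, as a grouped query would.
totals = {key: reduce_sum(values) for key, values in index.items()}
print(totals)
```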
31. Step 4: Begin development
Data modeling and document design.
32. Step 5: Deploying to staging and production
• Monitoring the system
  – RESTful interfaces / easy integration with monitoring tools
• High availability
  – Replication
  – Failover and auto-failover
• Always online, even for maintenance tasks
  – Database upgrades
  – Software (OS) and hardware upgrades
  – Backup and restore
  – Index building
  – Compaction
35. So are you being impacted by these?
Schema rigidity problems
• Do you store serialized objects in the database?
• Do you have lots of sparse tables with very few columns being used by most rows?
• Do you find that your application developers require schema changes frequently due to constantly changing data?
• Are you using your database as a key-value store?
Scalability problems
• Do you periodically need to upgrade systems to more powerful servers and scale up?
• Are you reaching the read/write throughput limit of a single database server?
• Is your server's read/write latency not meeting your SLA?
• Is your user base growing at a frightening pace?
36. Is NoSQL the right choice for you?
Does your application need rich database functionality?
• Multi-document transactions
• Complex security needs: user roles, document-level security, authentication, authorization integration
• Complex joins across buckets / collections
• BI integration
• Extreme compression needs
If so, NoSQL may not be the right choice for your application.
38. Performance-driven use cases
• Low latency
• High throughput matters
• Large number of users
• Unknown demand with sudden growth of users/data
• Predominantly direct document access
• Workloads with a very high mutation rate per document (temporal locality); a working set with heavy writes
39. Data-driven use cases
• Support for unlimited data growth
• Data with non-homogeneous structure
• Need to change data structure quickly and often
• 3rd-party or user-defined structure
• Variable-length documents
• Sparse data records
• Hierarchical data
41. Couchbase Server 2.0
A NoSQL distributed document database for interactive web applications.
42. Couchbase Server
• Easy scalability: grow the cluster without application changes, without downtime, with a single click
• Consistent, high performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always on, 24x7x365: no downtime for software upgrades, hardware maintenance, etc.
43. Flexible Data Model
{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
• No need to worry about the database when changing your application
• Records can have different structures; there is no fixed schema
• Allows painless data model changes for rapid application development
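In practice, schema flexibility means application code reads optional fields defensively rather than relying on a fixed layout. A minimal sketch (field names follow the slide's example; `display_name` is an illustrative helper):

```python
# Documents written by different application versions coexist in one bucket.
users = [
    {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "STATE": "CA"},
    {"ID": 2, "FIRST": "Joe", "LAST": "Smith"},               # no address yet
    {"ID": 3, "FIRST": "Ali", "LAST": "Dodson", "STATUS": {"TEXT": "At conf"}},
]

def display_name(user):
    """Works for every document version; missing fields get defaults."""
    state = user.get("STATE", "unknown")
    return f'{user["FIRST"]} {user["LAST"]} ({state})'

print([display_name(u) for u in users])
```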
45. Couchbase Server 2.0 Architecture
Each node runs a Data Manager and a Cluster Manager.
Data Manager (on each node):
• Moxi (port 11211, memcapable 1.0)
• Memcached (port 11210, memcapable 2.0)
• Couchbase EP Engine, with a storage interface to the new persistence layer
• Query Engine, serving the Query API on port 8092 (HTTP)
Cluster Manager (Erlang/OTP):
• REST management API / Web UI on port 8091 (HTTP)
• vBucket state and replication manager
• Global singleton supervisor and rebalance orchestrator (one per cluster)
• Configuration manager, node health monitor, process monitor, heartbeat
• Erlang port mapper (port 4369) and distributed Erlang (ports 21100–21199)
46. Couchbase Server 2.0 Architecture (same diagram as the previous slide)
47. Couchbase deployment
[Diagram: web applications talk to the cluster through the Couchbase client library; data flow and cluster management take separate paths.]
48. Single node – Couchbase Write Operation
The app server writes Doc 1 to a Couchbase server node:
1. The document is written into the managed cache.
2. It is placed on the replication queue, which sends it to other nodes.
3. It is placed on the disk queue and persisted to disk.
49. Single node – Couchbase Update Operation
The app server writes an updated version of the document (Doc 1'):
1. The update replaces Doc 1 in the managed cache.
2. It is placed on the replication queue to other nodes.
3. It is placed on the disk queue; the previous version of Doc 1 on disk is replaced once the update is persisted.
50. Single node – Couchbase Read Operation
The app server issues a GET for Doc 1. Because the document is resident in the managed cache, it is returned directly from memory with no disk access.
51. Single node – Couchbase Cache Eviction
When the managed cache fills (Docs 2 through 6 resident as new writes arrive), documents that have already been persisted to disk are evicted from the cache to make room. All documents remain on disk.
52. Single node – Couchbase Cache Miss
The app server issues a GET for Doc 1, which has been evicted from the managed cache (a cache miss). Doc 1 is read from disk, placed back into the managed cache, and returned to the app server.
53. Cluster wide – Basic Operation
• Docs are distributed evenly across the servers
• Each server stores both active and replica docs; only one copy is active at a time
• The client library provides the app with a simple interface to the database
• The cluster map provides a map of which server each doc is on; the app never needs to know
• The app reads, writes, and updates docs
• Multiple app servers can access the same document at the same time
[Figure: two app servers, each with the Couchbase client library and cluster map, issue read/write/update calls against a three-server cluster; each server holds active docs plus replicas of docs active elsewhere. User-configured replica count = 1.]
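The cluster-map routing described above can be sketched as hash-then-lookup. This is a simplified stand-in for the client library's logic: Couchbase calls the partitions vBuckets and uses 1,024 of them, but the hash function and server assignment here are illustrative assumptions.

```python
import hashlib

NUM_PARTITIONS = 1024  # fixed number of partitions, regardless of cluster size

def partition_for(doc_id):
    """Hash the document ID into one of the fixed partitions."""
    digest = hashlib.md5(doc_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# Cluster map: partition -> server. Here three servers, assigned round-robin.
cluster_map = {p: f"server{p % 3 + 1}" for p in range(NUM_PARTITIONS)}

def server_for(doc_id):
    """Every client computes the same location; no central lookup is needed."""
    return cluster_map[partition_for(doc_id)]

print(server_for("user::dipti"))
```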
54. Cluster wide – Add Nodes to Cluster
• Two servers added: a one-click operation
• Docs are automatically rebalanced across the cluster
  – Even distribution of docs
  – Minimum doc movement
• The cluster map is updated
• App database calls are now distributed over a larger number of servers
[Figure: the cluster grows from three to five servers; active and replica docs redistribute across all five. User-configured replica count = 1.]
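Why rebalancing moves a minimum of docs: keys hash to a fixed set of partitions, so adding nodes only reassigns whole partitions and never re-hashes document IDs. A simplified sketch (the greedy reassignment below is illustrative, not Couchbase's actual rebalancer):

```python
NUM_PARTITIONS = 12  # tiny for illustration; Couchbase uses 1024 vBuckets
old_map = {p: f"s{p % 3 + 1}" for p in range(NUM_PARTITIONS)}  # 3 servers

def add_servers(cluster_map, new_servers):
    """Reassign just enough partitions to the new servers."""
    new_map = dict(cluster_map)
    servers = sorted({*cluster_map.values(), *new_servers})
    target = len(cluster_map) // len(servers)   # fair share per server
    load = {s: 0 for s in servers}
    for s in cluster_map.values():
        load[s] += 1
    for p, s in sorted(cluster_map.items()):
        if load[s] > target:                    # donor has excess partitions
            for ns in new_servers:
                if load[ns] < target:           # new server still needs some
                    new_map[p] = ns
                    load[s] -= 1
                    load[ns] += 1
                    break
    return new_map

new_map = add_servers(old_map, ["s4", "s5"])
moved = sum(1 for p in old_map if old_map[p] != new_map[p])
print(f"{moved} of {NUM_PARTITIONS} partitions moved")
```

Only the partitions handed to the new servers move; every other document stays where it is.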
55. Cluster wide – Fail Over Node
• App servers are accessing docs when requests to Server 3 fail
• The cluster detects the failed server
  – Promotes replicas of its docs to active
  – Updates the cluster map
• Requests for those docs now go to the appropriate surviving server
• Typically a rebalance would follow
[Figure: a five-server cluster with Server 3 down; its docs are served from replicas promoted on the other servers. User-configured replica count = 1.]
56. Indexing and Querying
• Indexing work is distributed amongst the nodes
  – Large data sets are possible
  – The effort is parallelized
• Each node has an index for the data stored on it
• Queries combine the results from the required nodes
[Figure: a query from an app server fans out across a three-server cluster; each server indexes its own active docs. User-configured replica count = 1.]
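The scatter-gather pattern described above, sketched with per-node indexes as plain dicts (`query` is an illustrative helper, not a real API):

```python
# Each node indexes only the documents it stores.
node_indexes = {
    "server1": {"joe": [9]},
    "server2": {"joe": [4], "ali": [2]},
    "server3": {"ali": [7]},
}

def query(key):
    """Scatter the lookup to every node, then gather and merge the results."""
    results = []
    for index in node_indexes.values():   # scatter
        results.extend(index.get(key, []))
    return sorted(results)                # gather and merge

print(query("joe"))  # [4, 9]
```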
57. Cross Data Center Replication (XDCR)
[Figure: two three-server Couchbase clusters, one in a NY data center and one in an SF data center; each server holds active documents in RAM and on disk, and documents are replicated between the two clusters.]