NoSQL has turned many database concepts upside down. Consistency models, transactions, data models, and query interfaces are being reinvented. Tradeoffs between performance, availability, manageability, and usability are being rethought. In this talk, 10gen President Max Schireson reviews some of the different approaches being taken and offers opinions on the right choices for different uses.
2. My background
At Oracle from 1994 to 2003
At MarkLogic from 2003 to Feb 2011
Joined 10gen Feb 2011
3. The world has changed
 | 1970 | 2011
Main memory | Intel 1103, 1k bits | 4GB of RAM costs $25.99
Mass storage | IBM 3330 Model 1, 100 MB | 3TB Superspeed USB for $129
Microprocessor | Nearly: 4004 being developed; 4 bits and 92,000 instructions per second | Westmere EX has 10 cores, 30MB L3 cache, runs at 2.4GHz
Motor Trend Car of the Year | Ford Torino | Chevy Volt
President | Richard Nixon | Barack Obama
Ted Codd | In his 40s | Dead
Me | In diapers | In my 40s
4. More recent changes
 | A decade ago | Now
Faster | Buy a bigger server | Buy more servers
Faster storage | A SAN with more spindles | SSD
More reliable storage | More expensive SAN | More copies of local storage
Deployed in | Your data center | The cloud, private or public
Large user base | Thousands (employees) | Millions (consumers)
Tracking | Business transactions | Every click and more
5. Assumptions behind today's DBMS
Relational data model
Third normal form
ACID
SQL
Multi-statement transactions
Database is hardware agnostic
RAM is small and disks are slow
If it's too slow you can buy a faster computer
6. Yesterday's assumptions in today's world
Scale-out is hard
Distributed joins are hard
Making two-phase commits fast is hard
Custom solutions proliferate
Too slow? Just add a cache
ORM tools everywhere
More computers and disks are nearly free, but SANs and faster computers are expensive
7. Challenging some assumptions
Do you need a database at all
How does it scale out
What type of queries does it need to be able to do
How should it model data
How do you query it
How does it handle transactions and consistency
Is it enterprise software, open source, an appliance, or a cloud service
Does the data fit in memory?
What if your disks are SSD?
8. My opinions
Different use cases will produce different answers
Existing RDBMS solutions will continue to solve a
broad set of problems well but many applications
will work better on top of alternative technologies
Many new technologies will find niches but only
one or two will become mainstream
9. Do you need a database at all
Can you better solve your problem with a batch processing framework
Can you better solve your problem with an in-memory object store/cache
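To make the in-memory cache option concrete, here is a minimal sketch of a look-aside cache in Python. The class name `LookasideCache` and the dictionary standing in for a slower backing store are assumptions made for illustration; nothing here comes from a particular product.

```python
class LookasideCache:
    """Keeps loaded values in memory in front of a slower lookup function."""

    def __init__(self, load_from_store):
        # load_from_store is a hypothetical slow lookup, e.g. a database read.
        self._load = load_from_store
        self._cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._load(key)
        return self._cache[key]

    def invalidate(self, key):
        # Has to be called on every write, or readers keep seeing the old value.
        self._cache.pop(key, None)


backing_store = {"user:1": "Ann"}          # stand-in for the real data store
cache = LookasideCache(backing_store.get)

cache.get("user:1")                        # miss: goes to the backing store
cache.get("user:1")                        # hit: served from memory
backing_store["user:1"] = "Anne"           # a write that bypasses the cache
stale = cache.get("user:1")                # still the old value
cache.invalidate("user:1")
fresh = cache.get("user:1")                # reloaded after invalidation
```

The stale read in the middle is the part worth noticing: a cache answers the speed question but hands the consistency question back to the application.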
10. How does it scale out
Scale-out for working set size
Scale-out for total data size
Scale-out for write volume
Scale-out for read volume
Scale-out for redundancy
How do you incrementally add nodes or change configuration
How do you trade off query performance (which wants fewer index segments) for elasticity (which wants more index segments)
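The question about incrementally adding nodes can be illustrated with a small Python sketch. It assumes the simplest possible placement rule, a stable hash of the key modulo the number of shards; the function names and the `user:N` keys are made up for the example, and real systems typically use ranges, chunks, or consistent hashing instead.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash (not Python's randomized hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(1000)]   # hypothetical keys
moved = fraction_moved(keys, 4, 5)          # going from 4 nodes to 5
print(f"{moved:.0%} of keys change shard")
```

With modulo placement most keys land on a different node after the change, so adding one server means reshuffling most of the data. That cost is one reason the choice of partitioning scheme matters when you expect to grow a cluster gradually.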
11. What type of queries does it need to be able to do
Is a key/value store enough
Will you be retrieving your data by one key or by many
Is there a primary way you'll be viewing your data
Do you need specialized queries (e.g., time series, geospatial)
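A rough Python sketch of the "one key or many" question follows. The records and field names are invented for illustration. A plain key/value store answers lookups by primary key directly; retrieving by any other field means either scanning everything or maintaining a secondary index.

```python
from collections import defaultdict

# Hypothetical records in a key/value store, keyed by user id.
users = {
    "u1": {"name": "Ann", "city": "Oslo"},
    "u2": {"name": "Bob", "city": "Lima"},
    "u3": {"name": "Cy", "city": "Oslo"},
}


def get_by_key(store, key):
    """The only lookup a pure key/value store gives you."""
    return store.get(key)


def scan_by_field(store, field, value):
    """Without an index, a query on another field touches every record."""
    return sorted(k for k, rec in store.items() if rec.get(field) == value)


def build_index(store, field):
    """A secondary index: field value -> set of primary keys."""
    index = defaultdict(set)
    for key, rec in store.items():
        index[rec[field]].add(key)
    return index


city_index = build_index(users, "city")
oslo_users = sorted(city_index["Oslo"])   # same answer as the scan, without the scan
```

The index has to be kept up to date on every write, which is part of what a database with richer queries does for you.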
12. Imagine a garage…
You hand your valet the keys to your car
Before they park your car, they completely disassemble it
The pistons are stored in piston storage, brake pads with brake pads, steering wheels with steering wheels
Over time, they have storage areas for catalytic converters, DVD-based nav systems, headlight washers, and traction control systems
When you ask for your car back, the valet is incredibly fast at reassembly
One minor issue: you have to provide the disassembly and reassembly instructions and they will be followed literally, even if you say the spare tire should be used as a steering wheel and forget to specify re-insertion of spark plugs
A technological marvel
Might be a good way to store your car if you don’t know whether you’ll be asking
for a car back or lots of brake pads or pistons – for a salvage yard?
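The garage analogy can be written out as a small Python sketch. The car, its parts, and the table names are all invented for the example. The `disassemble` and `reassemble` functions play the role of the instructions you hand the valet; storing the car as one document skips both.

```python
# The car as the application sees it: one nested object ("document").
car = {
    "id": 1,
    "model": "Torino",
    "pistons": [{"pos": 1}, {"pos": 2}],
    "brake_pads": [{"wheel": "FL"}, {"wheel": "FR"}],
}


def disassemble(car, tables):
    """Normalized storage: every kind of part goes to its own table."""
    tables["cars"].append({"id": car["id"], "model": car["model"]})
    for piston in car["pistons"]:
        tables["pistons"].append({"car_id": car["id"], **piston})
    for pad in car["brake_pads"]:
        tables["brake_pads"].append({"car_id": car["id"], **pad})


def reassemble(car_id, tables):
    """The join: collect every part that belongs to this car."""
    row = next(r for r in tables["cars"] if r["id"] == car_id)

    def parts(name):
        return [
            {k: v for k, v in r.items() if k != "car_id"}
            for r in tables[name]
            if r["car_id"] == car_id
        ]

    return {**row, "pistons": parts("pistons"), "brake_pads": parts("brake_pads")}


tables = {"cars": [], "pistons": [], "brake_pads": []}
disassemble(car, tables)
rebuilt = reassemble(1, tables)

# Document storage: park the whole car under its id.
garage = {car["id"]: car}
parked = garage[1]
```

The normalized layout pays off when the common question is "give me all the brake pads", while the document layout pays off when the common question is "give me my car".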
13. How should it model data
Relational
Row oriented or column oriented
Key value
Document oriented
Graph oriented
14. How do you query it
Do you want an API, a language, or a map-reduce style interface?
Will most of your queries be hand-typed, embedded in code, or dynamically generated?
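As a rough illustration of the three interface styles, the Python sketch below expresses one query ("count clicks per page") three ways. The click records, the function names, and the tiny `map_reduce` helper are assumptions made for the example and do not mirror any specific database's interface.

```python
from collections import defaultdict

# Hypothetical click records.
clicks = [
    {"page": "/home", "user": "a"},
    {"page": "/buy", "user": "a"},
    {"page": "/home", "user": "b"},
]

# 1. Language style: a query string some engine would parse (not executed here).
sql_text = "SELECT page, COUNT(*) FROM clicks GROUP BY page"


# 2. API style: the query is a function call with structured arguments.
def count_by(records, field):
    counts = defaultdict(int)
    for rec in records:
        counts[rec[field]] += 1
    return dict(counts)


# 3. Map-reduce style: the caller supplies a mapper and a reducer.
def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def page_mapper(rec):
    yield rec["page"], 1


def sum_reducer(key, values):
    return sum(values)


api_result = count_by(clicks, "page")
mr_result = map_reduce(clicks, page_mapper, sum_reducer)
```

The language form is the easiest to type by hand, the API form is the easiest to generate from code, and the map-reduce form is the most general but asks the most of the person writing the query.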
15. How do you handle transactions and consistency
Do you need transactions at all
Be careful; web services, for example, need to be able to assign userIDs
Do you need multi-master updates
If so, how do you resolve conflicts
Do you need immediate consistency?
For some queries or all?
How do you handle failures
Are you optimizing for read availability or write availability
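Two of these questions can be made concrete with a short Python sketch. Both pieces are illustrative assumptions rather than anything prescribed in the talk: "last write wins" is only one possible conflict-resolution policy for multi-master updates, and the quorum check is the common rule of thumb that, with N replicas, reads of R copies are guaranteed to overlap writes of W copies only when R + W > N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    timestamp: int   # logical or wall-clock time of the write
    node: str        # which master accepted the write
    value: str


def last_write_wins(a: Version, b: Version) -> Version:
    """Keep the later write; break timestamp ties by node name.

    Simple, but the losing update is silently discarded.
    """
    return max(a, b, key=lambda v: (v.timestamp, v.node))


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read quorum overlaps every write quorum."""
    return r + w > n


from_east = Version(timestamp=10, node="east", value="alice@old.example")
from_west = Version(timestamp=12, node="west", value="alice@new.example")
winner = last_write_wins(from_east, from_west)

strict = reads_see_latest_write(n=3, w=2, r=2)   # overlapping quorums
loose = reads_see_latest_write(n=3, w=1, r=1)    # fast, but reads may be stale
```

Lowering W favors write availability and lowering R favors read availability, which is the tradeoff the last question on the slide is asking about.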
16. What is it
Enterprise software
Open source
With commercial support?
Appliance
Packaged with commodity hardware
Specialized hardware
Cloud service
Available for on-premise deployment?
Integrated in another PaaS offering?
Where on the net?
17. Does the data fit in memory
Transactions can be very, very fast
Do you trust enough copies in memory (perhaps
across multiple data centers) or do you require
some sort of sync to persistent storage
How big will the data be and how much do you
care about costs
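A back-of-the-envelope way to approach the size and cost question, sketched in Python. The only figure taken from the talk is the 2011 price of $25.99 for 4GB of RAM quoted earlier; the 1TB data size and the three copies are assumptions chosen for the example, and the estimate covers memory modules only, not the servers that hold them.

```python
PRICE_PER_4GB_USD = 25.99   # 2011 RAM price quoted earlier in the talk


def ram_cost_usd(data_gb: float, copies: int = 1) -> float:
    """Rough cost of the memory modules needed to hold the data set."""
    return data_gb / 4 * PRICE_PER_4GB_USD * copies


one_copy = ram_cost_usd(1024)                # hypothetical 1TB data set
three_copies = ram_cost_usd(1024, copies=3)  # e.g. replicas in several places
print(f"1 copy: ${one_copy:,.2f}, 3 copies: ${three_copies:,.2f}")
```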
18. What if your disks are SSD
Alleviate hotspots
Random accesses are measured in microseconds not
milliseconds
Degradation from in-memory to on-disk can be more graceful
But data representations on disk vs in memory may be
very different which may create significant overhead
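To put "microseconds not milliseconds" in numbers, here is a small Python calculation. The two latencies are assumed round figures for illustration (5 ms for a random read on a spinning disk, 100 microseconds on an SSD), not measurements from the talk.

```python
def random_reads_per_second(latency_seconds: float) -> float:
    """Upper bound on serial random reads for a device with the given latency."""
    return 1.0 / latency_seconds


DISK_LATENCY_S = 0.005    # assumed: 5 ms per random read on a spinning disk
SSD_LATENCY_S = 0.0001    # assumed: 100 microseconds per random read on an SSD

disk_rate = random_reads_per_second(DISK_LATENCY_S)
ssd_rate = random_reads_per_second(SSD_LATENCY_S)
print(f"disk: {disk_rate:.0f} reads/s, SSD: {ssd_rate:.0f} reads/s")
```

Under these assumptions a cache miss costs roughly fifty times less on SSD, which is why falling out of memory hurts less.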
19. In choosing a solution
Examine your requirements
They will dictate certain choices
Once you have narrowed the field
Prefer solutions that may become mainstream
Consider TCO:
Purchase cost
Learning curve
Productivity
Viability
20. Which solution sets will become mainstream
High confidence
Horizontally scalable: to take advantage of hardware trends
Non-relational: to enable scalability
Highly functional: for usage beyond mega-scale
Developer-friendly: because decision making has shifted
Freely available: for rapid adoption
My predictions
Document oriented: enables scalability, functionality,
developer friendliness, and agility
Open source: with multiple PaaS providers