Re-inventing the Database: What to Keep and What to Throw Away

Reinventing the Database
Max Schireson
President, 10gen

My background

At Oracle from 1994 to 2003

At MarkLogic from 2003 to Feb 2011

Joined 10gen in Feb 2011

The world has changed

 | 1970 | 2011
Main memory | Intel 1103, 1k bits | 4GB of RAM costs $25.99
Mass storage | IBM 3330 Model 1, 100 MB | 3TB Superspeed USB for $129
Microprocessor | Nearly: 4004 being developed; 4 bits and 92,000 instructions per second | Westmere EX has 10 cores, 30MB L3 cache, runs at 2.4GHz
Motor Trend Car of the Year | Ford Torino | Chevy Volt
President | Richard Nixon | Barack Obama
Ted Codd | In his 40’s | Dead
Me | In diapers | In my 40s

More recent changes

 | A decade ago | Now
Faster | Buy a bigger server | Buy more servers
Faster storage | A SAN with more spindles | SSD
More reliable storage | More expensive SAN | More copies of local storage
Deployed in | Your data center | The cloud – private or public
Large user base | Thousands - employees | Millions - consumers
Tracking | Business transactions | Every click and more

Assumptions behind today's DBMS
Relational data model
Third normal form
ACID
SQL
Multi-statement transactions
Database is hardware agnostic
RAM is small and disks are slow
If it's too slow you can buy a faster computer

Yesterday’s assumptions in today’s world

Scaleout is hard
Distributed joins are hard
Making two-phase commits fast is hard

Custom solutions proliferate

Too slow? Just add a cache

ORM tools everywhere

More computers and disk are nearly free but SAN
and faster computers are expensive

Challenging some assumptions
Do you need a database at all

How does it scale out

What type of queries does it need to be able to do

How should it model data

How do you query it

How does it handle transactions and consistency

Is it enterprise software, open source, an appliance, or a cloud service

Does the data fit in memory?

What if your disks are SSD?

My opinions

Different use cases will produce different answers

Existing RDBMS solutions will continue to solve a
broad set of problems well but many applications
will work better on top of alternative technologies

Many new technologies will find niches but only
one or two will become mainstream

Do you need a database at all
Can you better solve your problem with a batch
processing framework

Can you better solve your problem with an in-memory
object store/cache

How does it scale out

Scale-out for working set size

Scale-out for total data size

Scale-out for write volume

Scale-out for read volume

Scale-out for redundancy

How do you incrementally add nodes or change configuration

How do you trade off query performance (which wants fewer
index segments) for elasticity (which wants more index
segments)
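
One way to read the segment trade-off above, as a minimal sketch rather than any particular product's design: keys hash to a fixed number of segments, and segments (not individual keys) are assigned to nodes. Adding a node then means handing over a few whole segments. The function names, the node names, and the choice of 12 segments are all illustrative assumptions.

```python
import hashlib
from collections import Counter


def segment_for(key: str, num_segments: int) -> int:
    """Map a key to a fixed segment; this never changes when nodes are added."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def assign_segments(num_segments: int, nodes: list[str]) -> dict[int, str]:
    """Spread segments across nodes round-robin."""
    return {seg: nodes[seg % len(nodes)] for seg in range(num_segments)}


def add_node(assignment: dict[int, str], new_node: str) -> tuple[dict[int, str], list[int]]:
    """Hand whole segments to new_node until it holds its fair share."""
    old_nodes = sorted(set(assignment.values()))
    target = len(assignment) // (len(old_nodes) + 1)
    updated = dict(assignment)
    moved = []
    while len(moved) < target:
        counts = Counter(updated.values())
        # Take the highest-numbered segment from the currently fullest node.
        donor = max(old_nodes, key=lambda n: (counts[n], n))
        seg = max(s for s, n in updated.items() if n == donor)
        updated[seg] = new_node
        moved.append(seg)
    return updated, moved


before = assign_segments(12, ["node-a", "node-b", "node-c"])
after, moved = add_node(before, "node-d")
print("segments moved:", sorted(moved))
print("segments per node:", dict(Counter(after.values())))
```

With more segments, each move is smaller and the load evens out more finely, but a query that cannot be routed by key has to consult more segment indexes. That is the tension the slide describes.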

What type of queries does it need to be able to do

Is a key/value store enough

Will you be retrieving your data by one key or by
many

Is there a primary way you’ll be viewing your data

Do you need specialized queries (e.g., time series,
geospatial)
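
The "one key or many" question can be made concrete with a small sketch. It uses plain Python dictionaries as a stand-in for a key/value store; the records, field names, and helper functions are invented for illustration.

```python
from collections import defaultdict

# Primary key -> record: the only access path a plain key/value store offers.
users = {
    "u1": {"name": "Ana", "city": "Lisbon"},
    "u2": {"name": "Ben", "city": "Oslo"},
    "u3": {"name": "Cy", "city": "Lisbon"},
}


def get_by_id(user_id):
    """Lookup by the one key the store knows about."""
    return users[user_id]


def find_by_city_scan(city):
    """Without a second access path, every record must be examined."""
    return sorted(uid for uid, rec in users.items() if rec["city"] == city)


# A secondary index is a second key/value map that must be kept in sync.
city_index = defaultdict(set)
for uid, rec in users.items():
    city_index[rec["city"]].add(uid)


def find_by_city_indexed(city):
    """Lookup by the secondary key through the extra index."""
    return sorted(city_index.get(city, set()))


print(get_by_id("u2")["name"])
print(find_by_city_scan("Lisbon"), find_by_city_indexed("Lisbon"))
```

If every access is `get_by_id`, a key/value store is probably enough. Once lookups like `find_by_city_indexed` matter, either the application maintains such indexes itself or the database has to provide them.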

Imagine a garage…
You hand your valet the keys to your car

Before they park your car, they completely disassemble it

The pistons are stored in piston storage, brake pads with brake pads, steering
wheels with steering wheels

Over time, they have storage areas for catalytic converters, DVD-based nav
systems, headlight washers, and traction control systems

When you ask for your car back, the valet is incredibly fast at reassembly

One minor issue: you have to provide the disassembly and reassembly instructions
and they will be followed literally, even if you say the spare tire should be used as
a steering wheel and forget to specify re-insertion of the spark plugs

A technological marvel

Might be a good way to store your car if you don’t know whether you’ll be asking
for a car back or lots of brake pads or pistons – for a salvage yard?
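
The garage analogy can be sketched in a few lines, assuming a toy car with only two kinds of parts. The dictionary `car_document` stands for storing the car whole, while `disassemble` and `reassemble` stand for the instructions you must supply when each part type lives in its own storage area. All names and fields here are hypothetical.

```python
# The car as one self-contained document.
car_document = {
    "plate": "ABC-123",
    "pistons": ["p1", "p2", "p3", "p4"],
    "steering_wheel": "sw1",
}


def disassemble(car):
    """Split one car into per-part-type storage areas (one list per part type)."""
    pistons = [
        {"plate": car["plate"], "position": i, "part": p}
        for i, p in enumerate(car["pistons"])
    ]
    steering_wheels = [{"plate": car["plate"], "part": car["steering_wheel"]}]
    return {"pistons": pistons, "steering_wheels": steering_wheels}


def reassemble(tables, plate):
    """Rebuild the car by collecting its parts from every storage area."""
    pistons = sorted(
        (row for row in tables["pistons"] if row["plate"] == plate),
        key=lambda row: row["position"],
    )
    wheel = next(row for row in tables["steering_wheels"] if row["plate"] == plate)
    return {
        "plate": plate,
        "pistons": [row["part"] for row in pistons],
        "steering_wheel": wheel["part"],
    }


tables = disassemble(car_document)
print(reassemble(tables, "ABC-123") == car_document)
```

The per-part layout makes "all pistons in the garage" a single lookup, which is the salvage-yard case. Getting one whole car back depends on `reassemble` being written correctly for every part type.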

How should it model data

Relational
Row oriented or column oriented

Key value

Document oriented

Graph oriented

How do you query it

Do you want an API, a language, or a map-reduce
style interface?

Will most of your queries be hand-typed, embedded
in code or dynamically generated

How do you handle transactions and consistency
Do you need transactions at all
Be careful; web services, for example, need to be able to
assign userIDs

Do you need multi-master updates
If so, how do you resolve conflicts

Do you need immediate consistency?
For some queries or all?

How do you handle failures
Are you optimizing for read availability or write
availability
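
The userID caution above comes down to one operation that must be atomic even if nothing else is: check that a name is free and claim it in the same step. The sketch below uses an in-process lock as a stand-in for whatever the data store provides (a unique constraint, an atomic conditional insert, or similar). The `UserRegistry` class is hypothetical and not any database's API.

```python
import threading


class UserRegistry:
    """In-memory stand-in for a store that must hand out unique user IDs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids_by_name = {}
        self._next_id = 1

    def register(self, username):
        """Atomically claim username; return its new ID, or None if taken."""
        with self._lock:
            # The check and the write happen under one lock, so two
            # concurrent requests cannot both see the name as free.
            if username in self._ids_by_name:
                return None
            user_id = self._next_id
            self._next_id += 1
            self._ids_by_name[username] = user_id
            return user_id


registry = UserRegistry()
results = []
threads = [
    threading.Thread(target=lambda: results.append(registry.register("max")))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(None), "requests rejected")
```

Eight concurrent requests for the same name yield one ID and seven rejections. Without the lock, the check and the write could interleave and two callers could both believe they own the name.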

What is it

Enterprise software
Open source
With commercial support?

Appliance
Packaged with commodity hardware
Specialized hardware

Cloud service
Available for on-premise deployment?
Integrated in another PaaS offering?
Where on the net?

Does the data fit in memory
Transactions can be very very fast

Do you trust enough copies in memory (perhaps
across multiple data centers) or do you require
some sort of sync to persistent storage

How big will the data be and how much do you
care about costs

What if your disks are SSD

Alleviate hotspots

Random accesses are measured in microseconds not
milliseconds

Degradation from in-memory to on-disk can be
more graceful
But data representations on disk vs in memory may be
very different which may create significant overhead
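
A rough back-of-the-envelope sketch of the microseconds-versus-milliseconds point. The two latency figures are round numbers assumed purely for illustration, not measurements of any device.

```python
# Assumed, illustrative latencies for one random access.
ASSUMED_DISK_LATENCY_S = 10e-3   # spinning disk: on the order of milliseconds
ASSUMED_SSD_LATENCY_S = 100e-6   # SSD: on the order of microseconds


def serial_accesses_per_second(latency_s):
    """Random accesses one thread can complete per second, back to back."""
    return 1.0 / latency_s


def time_for_lookups(n, latency_s):
    """Seconds spent waiting on storage for n uncached random lookups."""
    return n * latency_s


for label, latency in [("disk", ASSUMED_DISK_LATENCY_S), ("ssd", ASSUMED_SSD_LATENCY_S)]:
    rate = serial_accesses_per_second(latency)
    wait = time_for_lookups(1000, latency)
    print(f"{label}: {rate:.0f} accesses/s, {wait:.2f} s for 1000 cache misses")
```

Under these assumed figures, a request that misses memory 1000 times waits about ten seconds on disk and about a tenth of a second on SSD. This is why falling out of memory can be far less of a cliff.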

In choosing a solution

Examine your requirements
They will dictate certain choices

Once you have narrowed the field
Prefer solutions that may become mainstream
Consider TCO:
Purchase cost
Learning curve
Productivity
Viability

Which solution sets will become mainstream
High confidence
Horizontally scalable: to take advantage of hardware trends
Non-relational: to enable scalability
Highly functional: for usage beyond mega-scale
Developer-friendly: because decision making has shifted
Freely available: for rapid adoption

My predictions
Document oriented: enables scalability, functionality,
developer friendliness, and agility
Open source: with multiple PaaS providers
