Abstract:
Cassandra is a new kind of database: it is more than a single-machine system. It naturally runs in a High-Availability configuration. All nodes in the system are symmetric; there is no single point of failure. As you add machines, failure becomes routine, and Cassandra is built to tolerate that with no interruptions.
Cassandra is linearly scalable with good performance characteristics for very small and very large data stores. Unlike earlier efforts, Cassandra is more than just a key-value store; it is a structured data store which can facilitate complex use cases and queries. Cassandra allows for random access to your data organized into rows and columns.
Cassandra is different, and exciting. This presentation will discuss the pros and cons of using Cassandra, and why it has seen such amazing adoption in the past year.
Bio:
Ben Coverston is Director of Operations at DataStax (formerly known as Riptano), a provider of software, support, services, training, resources and help for Cassandra. He has been involved in enterprise software his entire career. Working in the airline industry, he helped to build some of the highest-volume online booking sites in the world. He saw firsthand the consequences of trying to solve real-world scalability problems at the limit of what traditional relational databases are capable of.
1. Ben Coverston
Director of Operations
ben.coverston@datastax.com
Hosted By: Matthew O’Keefe, MorningStar
2. History
• Open sourced by Facebook in July 2008
• Apache Incubator: March 2009
• Graduated: March 2010
• Riptano founded: April 2010
• First Summit: August 2010
• Riptano changed to DataStax: January 2011
3. You Changed Your Name? Why!?
• Suits
  – Marketing
  – Relevancy
  – “Riptano” sounded too “skateboard”
• The Real Reason?
  – “The X makes it sound cool.”
    – Bender Bending Rodriguez, Futurama
4. Strengths
• Scalable
• Reliable
  – Replication that works
  – Multi-DC support
  – No single point of failure
• Analytics in the same system as OLTP (with “integrated” Hadoop support)
5. Weaknesses
• No ACID transactions
• Limited support for (OLTP) ad-hoc queries
• ...but you lost that when you started to shard your relational system.
6. A Short History of Big Data (Or Why Cassandra)
• Relational databases scale poorly
• B-trees are slow
  – ...and require read before write.
  – ...hope your dataset fits in memory
14. What do we end up with? (“The eBay Architecture,” Randy Shoup and Dan Pritchett)
16. BASE
• “BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.”
  – Dan Pritchett, NoSQL Pioneer, eBay Engineer
http://queue.acm.org/detail.cfm?id=1394128
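To make "consistency in a state of flux" concrete: when replicas disagree about a column, Cassandra keeps the version with the highest timestamp (last write wins), so replicas converge once they exchange data. The following is a minimal in-memory sketch of that merge, not Cassandra code; the Column tuple, reconcile and merge_replicas are hypothetical names, and breaking timestamp ties by comparing values is a choice made here so every replica picks the same winner.

```python
from typing import Dict, NamedTuple


class Column(NamedTuple):
    value: str
    timestamp: int  # microseconds, supplied by the writing client


def reconcile(a: Column, b: Column) -> Column:
    """Keep the newer version (last write wins).

    Sketch assumption: on a timestamp tie, compare values so the winner
    does not depend on the order in which replicas are merged.
    """
    return max(a, b, key=lambda c: (c.timestamp, c.value))


def merge_replicas(*replicas: Dict[str, Column]) -> Dict[str, Column]:
    """Merge several replicas' copies of one row, column by column."""
    merged: Dict[str, Column] = {}
    for replica in replicas:
        for name, col in replica.items():
            merged[name] = reconcile(merged[name], col) if name in merged else col
    return merged


# Two replicas that accepted different writes while out of touch.
replica_a = {"email": Column("old@example.com", 100), "city": Column("Denver", 250)}
replica_b = {"email": Column("new@example.com", 200)}

print(merge_replicas(replica_a, replica_b))
```

Because the merge is commutative, it does not matter which replica's data arrives first; every node ends up with the same row.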
17. Myth
• Lack of ACID means that I have to give up transactional guarantees and consistency.
• Paraphrasing: At Netflix we tend to be optimistic. When things don’t quite work out, we try again.
  – Siddharth Anand
• Achievable
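One reason consistency remains achievable: with replication factor N, if a write waits for W replicas and a read consults R replicas, then R + W > N forces the two sets to share at least one replica, so the read sees the latest acknowledged write. Cassandra lets each request pick its consistency level (ONE, QUORUM, and so on). The brute-force check below is only a sketch of that overlap argument; quorum and always_overlaps are illustrative helpers, not Cassandra APIs.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Majority of n replicas: the number QUORUM waits for."""
    return n // 2 + 1


def always_overlaps(n: int, w: int, r: int) -> bool:
    """True if every w-replica write set shares a node with every r-replica read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


for n in (3, 5):
    q = quorum(n)
    print(f"RF={n}: QUORUM={q}, QUORUM/QUORUM overlap={always_overlaps(n, q, q)}, "
          f"ONE/ONE overlap={always_overlaps(n, 1, 1)}")
```

Writing and reading at QUORUM always overlaps; writing and reading at ONE trades that guarantee for latency, and the application relies on eventual convergence instead.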
18. Cassandra In Production
• Netflix: streaming bookmarks
• Digital Reasoning: NLP & entity analytics
• OpenX: largest publisher-side ad network
• Cloudkick: performance data & aggregation
• SimpleGeo: location-as-API
• Ooyala: video analytics and business intelligence
• ngmoco: massively multiplayer online game worlds
• Kosmix: social media aggregation
• Reddit: vote tracking system
• Twitter: Rainbird, geo data, analytics
• … lots more
19. Who is investing in Cassandra?
• DataStax
• Twitter:
  – “We're investing in Cassandra every day. It'll be with us for a long time and our usage of it will only grow.”
• Rackspace
• > 100 different individuals have submitted patches to C*
• You?
20. Durability
• Write to commit log
  – fsync is cheap (append only)
  – Latency is only subject to rotational latency
    • Separate partition (no seeking)
    • SSD won’t hurt, but it may not help either.
• Write to memtable
• Flush memtable to SSTable
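The three steps on this slide can be sketched as a toy log-structured store: append each mutation to a commit log and fsync it, apply it to an in-memory memtable, and later flush the memtable as an immutable file sorted by key. Unlike a B-tree update, nothing is read before the write. TinyStore, the file names and the JSON record format are assumptions of this sketch; they are not Cassandra's on-disk formats.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


class TinyStore:
    """Toy log-structured store: commit log, memtable, sorted SSTable files."""

    def __init__(self, directory: str, flush_threshold: int = 3):
        self.directory = directory
        self.flush_threshold = flush_threshold
        self.memtable: Dict[str, str] = {}
        self.sstables: List[str] = []
        self.log_path = os.path.join(directory, "commitlog.jsonl")
        self.log = open(self.log_path, "a", encoding="utf-8")

    def write(self, key: str, value: str) -> None:
        # 1. Append to the commit log and fsync: a sequential write, no seek.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Update the memtable; nothing is read before writing.
        self.memtable[key] = value
        # 3. Flush when the memtable reaches its (tiny) size limit.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the memtable, sorted by key, as a new immutable file.
        path = os.path.join(self.directory, f"sstable-{len(self.sstables)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.memtable.items()), f)
            f.flush()
            os.fsync(f.fileno())
        self.sstables.append(path)
        self.memtable.clear()
        # The flushed entries are durable in the SSTable, so start a fresh log.
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):  # newest SSTable first
            with open(path, encoding="utf-8") as f:
                rows = dict(json.load(f))
            if key in rows:
                return rows[key]
        return None

    def close(self) -> None:
        self.log.close()


with tempfile.TemporaryDirectory() as d:
    store = TinyStore(d)
    for i in range(4):
        store.write(f"user{i}", f"name{i}")
    result = (len(store.sstables), store.get("user0"), store.get("user3"))
    store.close()

print(result)  # (1, 'name0', 'name3')
```

The first three writes fill the memtable and trigger one flush; the fourth stays in memory, and reads check the memtable before the SSTables.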
29. Replication
• Simple Replication Strategy
• Network Topology Strategy
  – How many replicas in each datacenter for each keyspace?
  – Generalization of Rack Aware Strategy
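A rough sketch of how the two strategies choose replicas, assuming a token ring where each node owns one token and a key's replicas start at the first node whose token is at or after the key's token, walking clockwise. The simple strategy takes the next N nodes; the per-datacenter variant keeps walking and takes nodes from each datacenter until that datacenter's configured count is met. The ring, node names, the MD5-mod-100 stand-in for the partitioner, and the {datacenter: count} mapping are all illustrative; the real NetworkTopologyStrategy also tries to spread replicas across racks, which this sketch omits.

```python
import bisect
import hashlib
from typing import Dict, Iterator, List, Tuple

# Hypothetical ring: (token, node name, datacenter), sorted by token.
RING: List[Tuple[int, str, str]] = sorted([
    (10, "a1", "DC1"), (30, "b1", "DC2"), (50, "a2", "DC1"),
    (70, "b2", "DC2"), (90, "a3", "DC1"),
])
TOKENS = [token for token, _, _ in RING]


def token_for(key: str) -> int:
    # Stand-in partitioner: hash the key onto a 0..99 ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100


def walk_ring(key: str) -> Iterator[Tuple[int, str, str]]:
    """Yield every node clockwise, starting at the first token >= the key's token."""
    start = bisect.bisect_left(TOKENS, token_for(key)) % len(RING)
    for i in range(len(RING)):
        yield RING[(start + i) % len(RING)]


def simple_strategy(key: str, rf: int) -> List[str]:
    return [node for _, node, _ in walk_ring(key)][:rf]


def network_topology_strategy(key: str, per_dc: Dict[str, int]) -> List[str]:
    chosen: List[str] = []
    remaining = dict(per_dc)
    for _, node, dc in walk_ring(key):
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
    return chosen


print(simple_strategy("user:42", 3))
print(network_topology_strategy("user:42", {"DC1": 2, "DC2": 1}))
```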
33. Reliability
• No single points of failure
• Multiple datacenters
• Monitorable
  – JMX (or whatever plugs into it; lots of counters)
  – Cacti
  – Munin
  – Nagios
34. Expectation of Failure
• C* is designed to fail
• No “clean shutdown”
• kill -9, it’s ok.
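Why kill -9 is acceptable: every acknowledged write was already appended and fsynced to the commit log, so on restart the node rebuilds its memtable by replaying that log. A minimal sketch of the replay, assuming a commit log of JSON lines holding [key, value, timestamp] (a format invented for this example); a record torn by the crash was never acknowledged, so it is skipped.

```python
import json
import os
import tempfile
from typing import Dict, Tuple


def replay(log_path: str) -> Dict[str, Tuple[str, int]]:
    """Rebuild a memtable from the commit log after an unclean shutdown."""
    memtable: Dict[str, Tuple[str, int]] = {}
    if not os.path.exists(log_path):
        return memtable
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            try:
                key, value, ts = json.loads(line)
            except ValueError:
                # Torn record from the crash: it was never acknowledged, skip it.
                continue
            old = memtable.get(key)
            if old is None or ts >= old[1]:  # keep the newest version per key
                memtable[key] = (value, ts)
    return memtable


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "commitlog.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps(["user1", "alice", 1]) + "\n")
        f.write(json.dumps(["user1", "alicia", 2]) + "\n")
        f.write('["user2", "bo')  # write cut off by kill -9
    recovered = replay(path)

print(recovered)  # {'user1': ('alicia', 2)}
```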
56. I can has smarter clients?
• Don't use Thrift directly
• Higher-level clients have a lot of features you want
  – Knowledge about data types
  – Connection pooling
  – Automatic retries
  – Logging
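The pooling, retry and logging behaviour that higher-level clients provide can be sketched generically. RetryingPool, the connect factory, FakeConn, and retrying on any ConnectionError are assumptions of this example; this is not the API of pycassa, Hector or any other client.

```python
import logging
import random
import time
from typing import Callable, Dict, List, TypeVar

log = logging.getLogger("client-sketch")
T = TypeVar("T")


class RetryingPool:
    """Hypothetical client wrapper: one cached connection per host, retries on failure."""

    def __init__(self, hosts: List[str], connect: Callable[[str], object],
                 max_retries: int = 3, backoff_s: float = 0.01):
        self.hosts = list(hosts)
        self.connect = connect          # factory that opens a connection to a host
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self.conns: Dict[str, object] = {}

    def _conn(self, host: str) -> object:
        # Connection pooling: open lazily, then reuse.
        if host not in self.conns:
            self.conns[host] = self.connect(host)
        return self.conns[host]

    def execute(self, op: Callable[[object], T]) -> T:
        order = random.sample(self.hosts, len(self.hosts))  # spread load across hosts
        last_error: Exception = ConnectionError("no hosts")
        for attempt in range(self.max_retries + 1):
            host = order[attempt % len(order)]  # each retry moves to the next host
            try:
                return op(self._conn(host))
            except ConnectionError as e:
                last_error = e
                self.conns.pop(host, None)  # discard the broken connection
                log.warning("attempt %d on %s failed: %s", attempt + 1, host, e)
                time.sleep(self.backoff_s * 2 ** attempt)  # exponential backoff
        raise last_error


class FakeConn:
    """Stand-in connection: host 10.0.0.1 is 'down'."""

    def __init__(self, host: str):
        self.host = host

    def get(self, key: str) -> str:
        if self.host == "10.0.0.1":
            raise ConnectionError("node down")
        return f"{key}@{self.host}"


pool = RetryingPool(["10.0.0.1", "10.0.0.2"], FakeConn)
result = pool.execute(lambda conn: conn.get("user:42"))
print(result)  # user:42@10.0.0.2
```

Whichever host is tried first, the call succeeds on the healthy node, and the failed attempt is logged rather than surfaced to the application.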
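58. Raw Thrift API: Inserting
# Assumes Python bindings generated from Cassandra's Thrift IDL (module path may
# differ), an open Cassandra.Client in `client`, and the row key in `useruuid`.
import time
from cassandra.ttypes import (Column, ColumnOrSuperColumn, ConsistencyLevel,
                              Mutation)

data = {'id': useruuid, ...}
# Thrift timestamps are 64-bit integers (microseconds by convention), not floats.
timestamp = int(time.time() * 1e6)
columns = [Column(k, v, timestamp)
           for (k, v) in data.items()]
mutations = [Mutation(ColumnOrSuperColumn(column=c))
             for c in columns]
rows = {useruuid: {'User': mutations}}
client.batch_mutate('Twissandra', rows,
                    ConsistencyLevel.ONE)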
61. Language support
• Python
  – pycassa
  – telephus
• Ruby
  – Speed is a negative
• Java
  – Hector
• PHP (soon with less suckage!)
62. Done yet?
• Still doing 1+N queries per page
• Solution: Supercolumns
• Err... Well, maybe…
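A sketch of the 1+N pattern, using plain dictionaries in place of column families (Twissandra-style names chosen for illustration). When the timeline row holds only tweet IDs, rendering a page costs one read for the timeline plus one per tweet. Storing each tweet's fields as a supercolumn inside the timeline row, one extra map layer, turns the page into a single row read. read_row and its read counter are hypothetical stand-ins for round trips to the cluster.

```python
from typing import Dict

reads = 0


def read_row(cf: Dict[str, dict], key: str) -> dict:
    """Stand-in for fetching one row; counts round trips to the cluster."""
    global reads
    reads += 1
    return cf.get(key, {})


# Normalized model: the timeline row only stores tweet IDs (column name = time key).
tweets = {"t1": {"user": "ben", "body": "hello"},
          "t2": {"user": "matt", "body": "welcome"}}
timeline = {"ben": {"0001": "t1", "0002": "t2"}}

reads = 0
ids = read_row(timeline, "ben")
page = [read_row(tweets, tweet_id) for _, tweet_id in sorted(ids.items())]
normalized_reads = reads  # 1 read for the timeline + N reads for the tweets

# Supercolumn model: row -> supercolumn (one per tweet) -> subcolumns.
timeline_sc = {"ben": {"0001": {"user": "ben", "body": "hello"},
                       "0002": {"user": "matt", "body": "welcome"}}}

reads = 0
page_sc = [sc for _, sc in sorted(read_row(timeline_sc, "ben").items())]
denormalized_reads = reads  # a single row read

print(normalized_reads, denormalized_reads, page == page_sc)  # 3 1 True
```

The saving is paid for on the write side: every tweet body is copied into each timeline that shows it.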
63. Supercolumns: limitations
• Requires reading an entire SC (not the entire row) from disk even if you just want one subcolumn
• No secondary indexes
• It’s just an extra map layer.
• Probably best to avoid them if you can.
64. UUIDs
• Column names should be UUIDs, not longs, to avoid collisions
• Version 1 UUIDs can be sorted by time (“TimeUUID”)
• Any UUID can be sorted by its raw bytes (“LexicalUUID”)
  – Usually Version 4
  – Slightly less overhead
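Python's standard uuid module shows why the two orderings differ. A version 1 UUID embeds a 60-bit timestamp, but its bytes begin with the low 32 bits of that timestamp, so sorting by raw bytes does not follow time order; a time-based comparator has to decode the timestamp first. The helper v1_with_time is hypothetical and builds version 1 UUIDs with chosen timestamps only so the example is deterministic.

```python
import uuid


def v1_with_time(ts: int) -> uuid.UUID:
    """Build a version-1 UUID carrying the 60-bit timestamp ts (clock and node fixed)."""
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                 # time_low
        (ts >> 32) & 0xFFFF,             # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,  # time_hi with version 1
        0x80,                            # clock_seq_hi with the RFC 4122 variant
        0x00,                            # clock_seq_low
        0x000000000000,                  # node
    ))


earlier = v1_with_time(0x0_FFFFFFFF)  # smaller timestamp
later = v1_with_time(0x1_00000000)    # larger timestamp

by_time = sorted([later, earlier], key=lambda u: u.time)    # TimeUUID-style
by_bytes = sorted([later, earlier], key=lambda u: u.bytes)  # raw-byte order

print(by_time == [earlier, later])   # True
print(by_bytes == [earlier, later])  # False: raw bytes start with time_low

# Real IDs: uuid1() is time-based, uuid4() is random.
print(uuid.uuid1().version, uuid.uuid4().version)  # 1 4
```

Random version 4 UUIDs carry no time information, so raw-byte order is the only order they have, which is why the slide pairs them with LexicalUUID.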
66. Lucandra
• What documents contain term X?
  – … and term Y?
  – … or start with Z?
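Lucandra keeps a Lucene-style inverted index in Cassandra. As a rough in-memory illustration of the three questions on this slide (hypothetical documents and helper names, not Lucandra's schema), an index maps each term to the set of documents containing it: "and" is an intersection, "or" a union, and "starts with Z" a range scan over the sorted terms, the same shape as a slice over sorted column names.

```python
import bisect
from typing import Dict, List, Set

docs = {
    1: "cassandra scales linearly",
    2: "cassandra replication works",
    3: "hadoop analytics on cassandra",
    4: "replication across datacenters",
}

# Inverted index: term -> set of document IDs containing it.
index: Dict[str, Set[int]] = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)
sorted_terms: List[str] = sorted(index)


def docs_with(term: str) -> Set[int]:
    return index.get(term, set())


def docs_with_prefix(prefix: str) -> Set[int]:
    # Range scan over the sorted term list, like a slice starting at the prefix.
    result: Set[int] = set()
    i = bisect.bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        result |= index[sorted_terms[i]]
        i += 1
    return result


print(sorted(docs_with("cassandra") & docs_with("replication")))  # term X and term Y
print(sorted(docs_with("hadoop") | docs_with_prefix("data")))     # term X or starts with Z
```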
67. FAQ: counting
• UUIDs + batch process
• Mutex (contrib/mutex or “cages”)
• Use redis or mysql or memcached
• column-per-app-server
• counter API (after 0.7 is out)
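The column-per-app-server workaround avoids read-modify-write races without a distributed lock: each application server is the only writer of its own column in the counter row, so no two servers ever update the same column, and readers sum all the columns. A minimal in-memory sketch; the store dictionary, AppServer and the row layout are assumptions made for illustration.

```python
import threading
from typing import Dict

# row key -> {column name -> value}; an in-memory stand-in for a column family.
store: Dict[str, Dict[str, int]] = {}


class AppServer:
    """Each app server owns exactly one column and is its only writer."""

    def __init__(self, name: str):
        self.name = name
        self.local = 0                # a real server would reload this on startup
        self.lock = threading.Lock()  # guards only this server's in-process count

    def increment(self, row: str, n: int = 1) -> None:
        with self.lock:
            self.local += n
            # Blind write of the running total; no read from the store is needed.
            store.setdefault(row, {})[self.name] = self.local


def total(row: str) -> int:
    """Readers sum every server's column to get the count."""
    return sum(store.get(row, {}).values())


def hit(server: AppServer, times: int) -> None:
    for _ in range(times):
        server.increment("page:home")


servers = [AppServer(f"app{i}") for i in range(3)]
threads = [threading.Thread(target=hit, args=(s, 1000)) for s in servers]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total("page:home"))  # 3000
```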
68. Tips
• Insert instead of check-then-insert
• Use client-side clock to your advantage
• Use TTL
• Wider rows (but not too wide)
• Start with queries, work backwards
• Avoid storing extra “timestamp” columns
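Several of these tips follow from the column model: every column already carries a write timestamp and may carry a TTL, and a newer write to a column simply supersedes the old one. The sketch below models that in memory (Cell, put and get are hypothetical helpers): a blind put replaces check-then-insert, the timestamp comes from the client's clock, the built-in timestamp makes a separate "timestamp" column unnecessary, and expired cells vanish on read. In this sketch the expiry is measured from the time the write is applied.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    value: str
    timestamp: int                      # client-supplied, microseconds since epoch
    expires_at: Optional[float] = None  # seconds since epoch; None means no TTL

    def expired(self, now: float) -> bool:
        return self.expires_at is not None and now >= self.expires_at


row: Dict[str, Cell] = {}


def put(name: str, value: str, ttl: Optional[int] = None,
        timestamp: Optional[int] = None, now: Optional[float] = None) -> None:
    """Blind write: the client never reads before writing."""
    now = time.time() if now is None else now
    ts = int(now * 1_000_000) if timestamp is None else timestamp
    cell = Cell(value, ts, now + ttl if ttl is not None else None)
    # The store, not the client, keeps whichever version is newer.
    current = row.get(name)
    if current is None or cell.timestamp >= current.timestamp:
        row[name] = cell


def get(name: str, now: Optional[float] = None) -> Optional[Cell]:
    now = time.time() if now is None else now
    cell = row.get(name)
    return None if cell is None or cell.expired(now) else cell


put("session", "abc123", ttl=60, timestamp=1_000_000_000, now=1000.0)
put("session", "stale", timestamp=999_000_000, now=1001.0)  # older timestamp loses

print(get("session", now=1030.0).value)      # abc123
print(get("session", now=1030.0).timestamp)  # 1000000000, no extra "timestamp" column
print(get("session", now=1061.0))            # None: the TTL has expired
```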