3. Daisy
(this is what we do)
Java open source CMS
MySQL: system indexes and metadata
Lucene: full-text indexes
filesystem: actual content
4. rather nice query language
rather nice publishing features
wanted to move away from WCMS
fast access control and facet browser require the active document set to reside in memory cache
SCALE?
5. Findings
we learned a lot (about content management) over the past 6 years (like: versioning, staging, multilinguality, searching, access control, publishing)
people don’t like Cocoon / XSLT, prefer templating instead
a lot of our project specifics were about finding the correct storage model for specific data structures (not all data was fit for our CMS)
customers with growing ambitions → Daisy has to follow!
6. Overhaul
get rid of Cocoon → Kauri
quest for differentiation: “the internet barrier”
combine massive storage with useful query-ability
REST, Maven, Spring, webapps
www.kauriproject.org
7. The Internet Barrier
fundamental technology gap: SQL → NoSQL
buy/use vs build
rise of semi- or free-structured information
focus on architecture and infrastructure
layered approach
http://github.com/blog/530-how-we-made-github-fast
8. Conclusions
let’s start from scratch (ouch)
a different architecture / foundations
scale big and be available
modularity (pluggability)
we’re not a banking application, so consistency might be less important
9. Storage challenges
sparse data structures
flexible, evolving data structures
lack of good fault-tolerance setups
cope with scale
CAP vs BASE
(Google) BigTable and (Amazon) Dynamo
10. ACID vs BASE
ACID | BASE
Atomicity, Consistency, Isolation, Durability | Basically Available, Soft state, Eventually consistent
Strong Consistency | Weak Consistency
Isolation | Availability First
Focus on “commit” | Best Effort
Nested Transactions | Approximate Answers OK
Availability? | Aggressive (optimistic)
Conservative (pessimistic) | Simpler!
Difficult Evolution (schema) | Faster, Easier Evolution
spectrum slide: Eric Brewer
11. Our CAP multilemma
consistency of data
availability (ping means results)
partition tolerance (‘cluster splits’ should not block)
scale
fault tolerance
12. C?A?P?
Initial gut feeling: cAP
A was a given
C would be a function of our datastore choice
however the P seemed like a nice-to-have (aka over-ambitious use-case)
13. CAPondering
HBase vs Cassandra
consistency vs SPOF ?
possible higher latency vs possibly frailer community ?
Cocoon trauma
http://www.cs.cornell.edu/projects/ladis2009/talks/ramakrishnan-keynote-ladis2009.pdf
14. Comparison Matrix
System | Partitioning (hash/sort) | Dynamic | Failures handled | Routing | Reads/writes during failure | Sync/async | Local/geo | Consistency | Durability | Storage
PNUTS | H+S | Y | Colo+server | Rtr | Read+write | Async | Local+geo | Timeline + Eventual | Double WAL | Buffer pages
MySQL | H+S | N | Colo+server | Cli | Read | Async | Local+nearby | ACID | WAL | Buffer pages
HDFS | Other | Y | Colo+server | Rtr | Read+write | Sync | Local+nearby | N/A (no updates) | Triple replication | Files
BigTable | S | Y | Colo+server | Rtr | Read+write | Sync | Local+nearby | Multi-version | Triple replication | LSM/SSTable
Dynamo | H | Y | Colo+server | P2P | Read+write | Async | Local+nearby | Eventual | WAL | Buffer pages
Cassandra | H+S | Y | Colo+server | P2P | Read+write | Sync+Async | Local+nearby | Eventual | WAL, triple replication | LSM/SSTable
Megastore | S | Y | Colo+server | Rtr | Read+write | Sync | Local+nearby | ACID/other | Triple replication | LSM/SSTable
Azure | S | N | Server | Cli | Read+write | Sync | Local | ACID | WAL | Buffer pages
15. HBase
a sorted, distributed, persisted, multi-dimensional, column-oriented, highly-available, high-performance storage system
adds random access reads and writes atop HDFS
17. People
Inventors:
Google BigTable ☺
Jim Kellerman (Powerset/Microsoft)
Mike Cafarella (UMich)
Project leads:
Michael Stack (Powerset/Microsoft)
Jonathan Gray (Streamy.com)
Ryan Rawson (StumbleUpon)
Jean-Daniel Cryans (SU)
Bryan Duxbury (Rapleaf)
19. HBase data model
Distributed multi-dimensional Keys are arbitrary strings
sparse map
Access to row data is atomic
Multi-dimensional keys:
(table, row, family:column,
timestamp) → value
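The (table, row, family:column, timestamp) → value model can be sketched with plain sorted maps. A toy illustration only, not the HBase client API; all names below are made up:

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one HBase table: row -> "family:qualifier" -> timestamp -> value.
// Rows and columns are kept sorted (TreeMap); timestamps are ordered
// newest-first, so a plain read returns the latest version, as in HBase.
public class DataModelSketch {
    static final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> table = new TreeMap<>();

    static void put(String row, String column, long ts, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(ts, value);
    }

    // Default read: newest timestamp wins.
    static String get(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> cols = table.get(row);
        if (cols == null || !cols.containsKey(column)) return null; // sparse: absent cells cost nothing
        return cols.get(column).firstEntry().getValue();
    }

    public static void main(String[] args) {
        put("page1", "info:title", 1L, "old title");
        put("page1", "info:title", 2L, "new title");
        System.out.println(get("page1", "info:title")); // prints: new title
    }
}
```

The nesting also shows why the map is "sparse": a cell that was never written simply has no entry, unlike a NULL column in a relational row.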
20. Date: Thu, 12 Nov 2009 18:19:50 -0800
Message-ID: <78568af10911121819x292527b2t7f8b7d857c3650b2@mail.gmail.com>
Subject: Re: newbie: need help on understanding HBase
From: Ryan Rawson <ryanobjc@gmail.com>
To: hbase-user@hadoop.apache.org
HBase is semi-column oriented. Column families are the storage model -
everything in a column family is stored in a file linearly in HDFS.
That means accessing data from a column family is really cheap and
easy. Adding more column families adds more files - it has the
performance profile of adding new tables, except you don't actually
have additional tables, so the conceptual complexity stays low.
Data is stored at the "intersection" of the rowid, and the column
family + qualifier. This is sometimes called a "Cell" - contains a
timestamp as well. You can have multiple versions all timestamped.
The timestamp is by default the int64/java system milli time. I have
to recommend against setting the timestamp explicitly if you can avoid
it. So when you retrieve a row, you can get everything, a list of
column qualifiers or a list of families or any combo. (eg: list of
these qualifiers out of family A and everything from family B)
[...]
The terms to use are:
- Column family (or just family): the unit of locality in hbase.
Everything in a family is stored in 1 (or a set) of files. A table is
a name and a list of families with attributes for those families (eg:
compression). A family is a string.
- Column qualifier (or just qualifier): allows you to store multiple
values for the same row in 1 family. This value is a byte array and
can be anything. The API converts null => new byte[0]. This is the
tricky bit, since most people don't think of "column names" as being
dynamic.
- Cell - the old name for a value + timestamp. The new API (see:
class Result) doesn't use this term, instead provides a different path
to read data.
You can use HBase as a normal datastore and use static names for the
qualifiers, and that is just fine. But if you need something special
to get past the lack of relations, you can start to do fun things with
the qualifier as data. Building a secondary index for example. The
row key would be the secondary value (eg: city) and the qualifier
would be the primary key (eg: userid) and the value would be a
placeholder to indicate the value exists.
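The qualifier-as-data trick described above can be sketched the same way, again with in-memory sorted maps standing in for an index table. A hypothetical illustration, not HBase client code:

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// "Users by city" secondary index: row key = secondary value (city),
// qualifier = primary key (userid), value = empty placeholder byte[].
// The existence of the cell *is* the information.
public class SecondaryIndexSketch {
    static final NavigableMap<String, NavigableMap<String, byte[]>> index = new TreeMap<>();

    static void addToIndex(String city, String userid) {
        index.computeIfAbsent(city, c -> new TreeMap<>())
             .put(userid, new byte[0]); // placeholder value
    }

    // One row read answers "who lives in this city?": the qualifiers are the answer.
    static Set<String> usersIn(String city) {
        NavigableMap<String, byte[]> row = index.get(city);
        return row == null ? Collections.emptySet() : row.keySet();
    }

    public static void main(String[] args) {
        addToIndex("Ghent", "user42");
        addToIndex("Ghent", "user7");
        addToIndex("Paris", "user99");
        System.out.println(usersIn("Ghent"));
    }
}
```

Because qualifiers within a family are stored sorted, a single cheap row read returns all userids for a city, already ordered.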
22. Getting data in and out
Java API
Thrift multi-language API
Stargate REST connector
HBase shell (JRuby IRB-based)
Processing: MapReduce and Cascading
Atomicity. All of the operations in the transaction will complete, or none will.
Consistency. The database will be in a consistent state when the transaction begins and ends.
Isolation. The transaction will behave as if it is the only operation being performed upon the database.
Durability. Upon completion of the transaction, the operation will not be reversed.
consistency of data: think serializability
availability: pinging a live node should produce results
partition tolerance: live nodes should not be blocked by partitions