4. But first... the CAP Theorem: Consistency, Availability, Partition Tolerance. “Thou shalt have but 2” - Conjecture made by Eric Brewer in 2000 - Published as formal proof in 2002 - See: http://en.wikipedia.org/wiki/CAP_theorem for more
5. CAP Theorem: Cassandra Style - Explicit choice of partition tolerance and availability - You can opt for more consistency at the cost of availability: consistency is tunable (per operation)
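The per-operation trade-off above can be sketched with simple replica arithmetic. This is a minimal illustration, not Cassandra code: the class and method names are hypothetical, and it assumes the usual rule that a read is guaranteed to see the latest write when the number of replicas written plus the number read exceeds the replication factor.

```java
public class TunableConsistency {
    /** Replicas that must respond for a quorum operation: floor(rf / 2) + 1. */
    public static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * True when any set of read replicas must overlap any set of write replicas,
     * i.e. W + R > RF. (Hypothetical helper for illustration only.)
     */
    public static boolean readsSeeLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("Quorum for RF=3: " + q);
        System.out.println("W=quorum, R=quorum overlap: " + readsSeeLatestWrite(rf, q, q));
        System.out.println("W=1, R=1 overlap: " + readsSeeLatestWrite(rf, 1, 1));
        System.out.println("W=all, R=1 overlap: " + readsSeeLatestWrite(rf, rf, 1));
    }
}
```

Waiting on more replicas per operation buys consistency, but the operation fails if that many replicas are unreachable, which is the availability cost the slide refers to.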
7. Generally complements other systems (not intended to be one-size-fits-all) *** You should always use the right tool for the right job anyway
10. vs. RDBMS - No Joins Unless: - you do them on the client - you do them via Map/Reduce
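A join "on the client" amounts to reading from two column families and matching rows in application code. The sketch below uses plain in-memory maps as stand-ins for the two reads; the class name, method name, and sample data are all hypothetical, and no Cassandra client is involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClientSideJoin {
    /**
     * Joins (commentId -> authorId) with (userId -> userName) in application code.
     * Comments whose author is missing are skipped, like an inner join.
     */
    public static List<String> join(Map<String, String> commentAuthors, Map<String, String> userNames) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> comment : commentAuthors.entrySet()) {
            String name = userNames.get(comment.getValue()); // lookup by row key
            if (name != null) {
                joined.add(comment.getKey() + ":" + name);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Stand-ins for the results of two separate reads (hypothetical data).
        Map<String, String> commentAuthors = new LinkedHashMap<>();
        commentAuthors.put("c1", "u1");
        commentAuthors.put("c2", "u2");
        commentAuthors.put("c3", "u9"); // author not present in userNames

        Map<String, String> userNames = new LinkedHashMap<>();
        userNames.put("u1", "Ann");
        userNames.put("u2", "Bob");

        System.out.println(join(commentAuthors, userNames));
    }
}
```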
11. vs. RDBMS - Schema Optional (Though you can add meta information for validation and type checking) *** Supports secondary indexes too: “ … WHERE state = 'TX' ”
13. vs. RDBMS - Prematerialized and Transaction-less - No ACID transactions - Limited support for ad-hoc queries *** You are going to give up both of these anyway when you shard an RDBMS ***
15. vs. RDBMS - Shared-Nothing Architecture Every node plays the same role: no masters, no slaves, no special nodes *** No single point of failure
51. Data Model – Prematerialized Query Additional examples: Timeline of tweets by a user Timeline of tweets by all of the people a user is following List of comments sorted by score List of friends grouped by state
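The "timeline of tweets by all of the people a user is following" example can be sketched as fan-out on write: each tweet is copied into one row per follower, so reading a timeline is a single row lookup with no join. This is an in-memory illustration under that assumption; the class, its methods, and the use of a sorted map to mimic timestamp-ordered columns are hypothetical, not Cassandra or Hector API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrematerializedTimeline {
    // followee -> followers
    private final Map<String, List<String>> followers = new HashMap<>();
    // One "row" per user; columns are kept sorted by timestamp, as a TreeMap.
    private final Map<String, TreeMap<Long, String>> timelines = new HashMap<>();

    public void follow(String follower, String followee) {
        followers.computeIfAbsent(followee, k -> new ArrayList<>()).add(follower);
    }

    /** Write path: materialize the tweet into every follower's timeline row. */
    public void tweet(String author, long timestamp, String text) {
        List<String> targets = followers.getOrDefault(author, Collections.<String>emptyList());
        for (String follower : targets) {
            timelines.computeIfAbsent(follower, k -> new TreeMap<>())
                     .put(timestamp, author + ": " + text);
        }
    }

    /** Read path: one row lookup, newest first. */
    public List<String> timeline(String user) {
        TreeMap<Long, String> row = timelines.get(user);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.descendingMap().values());
    }

    public static void main(String[] args) {
        PrematerializedTimeline store = new PrematerializedTimeline();
        store.follow("alice", "bob");
        store.tweet("bob", 1L, "first");
        store.tweet("bob", 2L, "second");
        System.out.println(store.timeline("alice"));
    }
}
```

The design choice is to pay at write time (one copy per follower) so that the query the application actually runs is already stored in the shape it will be read.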
55. Big Data: Map/Reduce Integration Cassandra Implementations of: - InputFormat and OutputFormat - RecordReader and RecordWriter - InputSplit for Column Families *** See org.apache.cassandra.hadoop package and examples for more
56. Big Data: Pig Integration
grunt> name_group = GROUP score_data BY name PARALLEL 3;
grunt> name_total = FOREACH name_group GENERATE group, COUNT(score_data.name), LongSum(score_data.score) AS total_score;
grunt> ordered_scores = ORDER name_total BY total_score DESC PARALLEL 3;
grunt> DUMP ordered_scores;
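For readers unfamiliar with Pig, the script above groups records by name, counts and sums the scores in each group, and orders the groups by total score, descending. The single-process Java sketch below computes the same shape of result on a small in-memory input. It is only an illustration of the logic: the class and method names and the sample data are hypothetical, and nothing here runs on Hadoop or reads from Cassandra.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScoreTotals {
    /** One output row: group name, record count, summed score. */
    public static final class Total {
        public final String name;
        public final long count;
        public final long totalScore;

        Total(String name, long count, long totalScore) {
            this.name = name;
            this.count = count;
            this.totalScore = totalScore;
        }

        @Override
        public String toString() {
            return "(" + name + "," + count + "," + totalScore + ")";
        }
    }

    /** Group by name, count and sum per group, then order by total score descending. */
    public static List<Total> orderedTotals(String[] names, long[] scores) {
        Map<String, long[]> acc = new LinkedHashMap<>(); // name -> {count, sum}
        for (int i = 0; i < names.length; i++) {
            long[] a = acc.computeIfAbsent(names[i], k -> new long[2]);
            a[0]++;
            a[1] += scores[i];
        }
        List<Total> out = new ArrayList<>();
        for (Map.Entry<String, long[]> e : acc.entrySet()) {
            out.add(new Total(e.getKey(), e.getValue()[0], e.getValue()[1]));
        }
        out.sort(Comparator.comparingLong((Total t) -> t.totalScore).reversed());
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the score_data relation.
        String[] names = {"ann", "bob", "ann"};
        long[] scores = {1L, 10L, 2L};
        System.out.println(orderedTotals(names, scores));
    }
}
```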
57. Using a Client - Hector Client: http://hector-client.org - Most popular Java client - In use at very large installations - A number of tools and utilities built on top - Very active community - MIT Licensed *** Like any open source project fully dependent on another open source project, it has its warts
61. Hector: ColumnFamilyTemplate
ColumnFamilyResult<String, String> res = cft.queryColumns("zznate");
String value = res.getString("email");
Date startDate = res.getDate("startDate");
(The two type parameters of ColumnFamilyResult are the key format and the column name format.)