Compendium of my Brisk, Cassandra & Hadoop talks from Summer 2011, delivered at JavaOne 2011. I personally like the content in this one because it covers a use-case-driven introduction to Cassandra and NoSQL, followed by an introduction to Hadoop: MapReduce, HDFS internals, the NameNode and the JobTracker. It also shows how Brisk eliminates the single points of failure in HDFS while providing a single platform for real-time and batch storage and processing.
(And the audience in attendance seemed to enjoy it.)
26. map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");
reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
Word count in MapReduce
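To make the word-count pseudocode concrete, here is a minimal single-JVM sketch. It is not the Hadoop API: the class and method names are hypothetical, `map` emits (word, "1") pairs, a `TreeMap` stands in for the shuffle that groups pairs by key, and `reduce` sums each group just as the pseudocode does.

```java
import java.util.*;

public class WordCountSketch {
    // map: emit an intermediate (word, "1") pair for every word in the document.
    // docName mirrors the pseudocode's key; it is unused here.
    static List<Map.Entry<String, String>> map(String docName, String contents) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String w : contents.split("\\s+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, "1"));
        }
        return out;
    }

    // reduce: sum the string counts emitted for a single word.
    static String reduce(String word, Iterator<String> values) {
        int result = 0;
        while (values.hasNext()) result += Integer.parseInt(values.next());
        return Integer.toString(result);
    }

    // Runs map over every document, groups the pairs by key (the "shuffle"),
    // then calls reduce once per distinct key, in sorted key order.
    static SortedMap<String, String> run(Map<String, String> docs) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (Map.Entry<String, String> kv : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        SortedMap<String, String> counts = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue().iterator()));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of("a.txt", "to be or not to be", "b.txt", "to scale");
        System.out.println(run(docs)); // {be=2, not=1, or=1, scale=1, to=3}
    }
}
```

In a real cluster the map and reduce calls run on different machines and the grouping step moves data over the network; the sketch keeps only the data flow.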
44. BriskSimpleSnitch.java
// Nodes running Hadoop trackers join the Brisk DC; plain Cassandra nodes join the Cassandra DC.
if (TrackerInitializer.isTrackerNode)
{
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
}
else
{
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}
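The snitch only labels each node with a data center; replicas are then placed per DC, which keeps batch (Hadoop) and real-time traffic on separate nodes. The sketch below is a simplified, hypothetical model of DC-aware placement in the spirit of Cassandra's NetworkTopologyStrategy: it assumes a token ring and a per-DC replica count, ignores racks, and is not Brisk's or Cassandra's actual code. The DC names `Brisk` and `Cassandra` are assumed values for `BRISK_DC` and `CASSANDRA_DC`.

```java
import java.util.*;

public class DcAwarePlacementSketch {
    // Assumed DC names; the real constants live in BriskSimpleSnitch.
    static final String BRISK_DC = "Brisk";
    static final String CASSANDRA_DC = "Cassandra";

    static final class Node {
        final String name;
        final long token;
        final boolean isTrackerNode;

        Node(String name, long token, boolean isTrackerNode) {
            this.name = name;
            this.token = token;
            this.isTrackerNode = isTrackerNode;
        }

        // Same rule as the snitch: Hadoop tracker nodes go to the Brisk DC.
        String dc() {
            return isTrackerNode ? BRISK_DC : CASSANDRA_DC;
        }
    }

    // Walk the ring clockwise from the first node whose token is >= keyToken
    // (wrapping around) and take the first rf.get(dc) nodes seen in each DC.
    static List<String> replicasFor(long keyToken, List<Node> ring, Map<String, Integer> rf) {
        List<Node> sorted = new ArrayList<>(ring);
        sorted.sort(Comparator.comparingLong((Node n) -> n.token));
        int start = 0;
        while (start < sorted.size() && sorted.get(start).token < keyToken) {
            start++;
        }
        Map<String, Integer> placed = new HashMap<>();
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Node n = sorted.get((start + i) % sorted.size());
            int have = placed.getOrDefault(n.dc(), 0);
            if (have < rf.getOrDefault(n.dc(), 0)) {
                replicas.add(n.name);
                placed.put(n.dc(), have + 1);
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        List<Node> ring = List.of(
            new Node("c1", 0, false), new Node("b1", 25, true),
            new Node("c2", 50, false), new Node("b2", 75, true));
        Map<String, Integer> rf = Map.of(CASSANDRA_DC, 1, BRISK_DC, 1);
        // One replica serves real-time reads, one sits next to the Hadoop trackers.
        System.out.println(replicasFor(30, ring, rf)); // [c2, b2]
    }
}
```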
46. hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
hive> LOAD DATA LOCAL INPATH
'$BRISK_HOME/resources/hive/examples/files/kv2.txt'
OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
hive> SELECT count(*), ds FROM invites GROUP BY ds;
http://www.datastax.com/docs/0.8/brisk/about_hive
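Semantically, the last query counts the rows in each `ds` partition. The in-memory sketch below reproduces that result with Java streams (`Collectors.groupingBy` plus `Collectors.counting()`) over a hypothetical `Invite` row type that mirrors the table's columns; it says nothing about how Hive actually executes the query, which it compiles into MapReduce jobs.

```java
import java.util.*;
import java.util.stream.*;

public class InvitesGroupBySketch {
    // Hypothetical row type mirroring: invites (foo INT, bar STRING) PARTITIONED BY (ds STRING)
    static final class Invite {
        final int foo;
        final String bar;
        final String ds; // partition column

        Invite(int foo, String bar, String ds) {
            this.foo = foo;
            this.bar = bar;
            this.ds = ds;
        }
    }

    // Same result as: SELECT count(*), ds FROM invites GROUP BY ds;
    static SortedMap<String, Long> countByDs(List<Invite> rows) {
        return rows.stream()
            .collect(Collectors.groupingBy(r -> r.ds, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Invite> rows = List.of(
            new Invite(1, "a", "2008-08-15"),
            new Invite(2, "b", "2008-08-15"),
            new Invite(3, "c", "2008-08-08"));
        System.out.println(countByDs(rows)); // {2008-08-08=1, 2008-08-15=2}
    }
}
```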
49. No me in team!
Ben Coverston, Ben Werther, Brandon Williams, Cathy Daw, Jackson Chung, Jake Luciani, Joaquin Casares, Jonathan Ellis, Michael Allen, Mike Bulman, Nate McCall, Nick M Bailey, Patricio Echague, Tyler Hobbs, SriSatish Ambati, Yewei Zhang
54. Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2 (T=2)
DNS
"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";
* N is the replication factor, not to be confused with T, the total number of nodes.
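The R + W > N rule can be checked mechanically: when it holds, every set of R replicas read must share at least one replica with every set of W replicas written. The small sketch below assumes the usual quorum size of floor(N/2) + 1 for QUORUM; the class and method names are hypothetical.

```java
public class QuorumSketch {
    // Replicas that must respond for QUORUM at replication factor n.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // True when every read replica set must intersect every write replica set.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        // The slide's ORACLE example: R=1, W=2, N=2.
        System.out.println(readsSeeLatestWrite(1, 2, 2)); // true (3 > 2)
        // QUORUM reads and writes at N=3: 2 + 2 > 3.
        int q = quorum(3);
        System.out.println(q + " " + readsSeeLatestWrite(q, q, 3)); // 2 true
        // R=1, W=1 at N=3: 1 + 1 is not > 3, so a read may miss the latest write.
        System.out.println(readsSeeLatestWrite(1, 1, 3)); // false
    }
}
```

Note that at N=2 a quorum is both replicas, so QUORUM reads and writes there need every replica to be up.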
64. Beautiful C 0 d e
= new code(); //less is more
~90k.java.concurrent.@annotate.
Bloom filters, Merkle trees.
Non-blocking, staged event-driven (SEDA).
Bigtable, Dynamo.
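Bloom filters let Cassandra skip SSTables that cannot contain a requested row key. The sketch below is a generic textbook Bloom filter, not Cassandra's implementation: it derives its k bit positions by double hashing from two simple, illustrative hash values computed from `String.hashCode()`, and the sizes used are arbitrary.

```java
import java.util.BitSet;

public class BloomFilterSketch {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    BloomFilterSketch(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Double hashing: position_i = h1 + i * h2 (mod m).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd, so it is never zero
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(String key) {
        for (int i = 0; i < k; i++) bits.set(position(key, i));
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-42");
        System.out.println(filter.mightContain("row-42")); // true: no false negatives
        System.out.println(filter.mightContain("row-7"));  // usually false; true would be a false positive
    }
}
```

The trade-off is a small, tunable false-positive rate in exchange for never reading an SSTable that definitely lacks the key.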
66. Community
Robust. Rapid. Brisk #
Professional support from DataStax.
git clone git@github.com:riptano/brisk.git
Engineers: independents, startups, large companies (Rackspace, Twitter, Netflix...).
Come join the effort!
67. Use case #4: first NoSQL, then scale!
SimpleDB → Cassandra
MongoDB → Cassandra
72. Summary
Cassandra – high-scale peer-to-peer datastore, best friend for multi-region, multi-zone availability.
Hadoop – HDFS engulfing the data world.
Brisk – best of both worlds!