HBase - random reads/writes - 45% of all Hadoop clusters\n\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
\n
Drill \nRemove schema requirement\nIn-situ for real since we’ll support multiple formats\n\nNote: MR needed for big joins so to speak\n
Drill\nWill support nested\nNo schema required\n
Protocol buffers are the conceptual data model\nWill support multiple data models\nWill have to define a way to explain data format\n (filtering, fields, etc)\nSchema-less will have perf penalty\nHBase will be one format\n
Likely to support these\nCould add HiveQL and more as well. Could even be clever and support HiveQL to MR or Drill based upon query\nPig as well\n\nPluggability\nData format\nQuery language\n\nSomething 6-9 months alpha quality\nCommunity driven, I can’t speak for project\n\nMapR\nFS gives better chunk size control\nNFS support may make small test drivers easier\nUnified namespace will allow multi-cluster access\nMight even have drill component that autoformats data\n\n\nRead only model\n
Example query that Drill should support\n\nNeed to talk more here about what Dremel does\n
Load data into Drill (optional)\nCould just use as is in “row” format\nMultiple query languages\nPluggability very important\n
Note: we have an already partially built execution engine\n
\n
\n
\n
\n
\n
\n
Be prepared for Apache questions\nCommitter vs committee vs contributor\n\nIf can’t answer question, ask them to answer and contribute\nLisa - Need landing page\nReferences to paper and such at end\n
\n
\n
\n
scaling is painful\npoor fault tolerance\ncoding is hard\n
\n
\n
tweets, stock ticks, manufacturing machine data, sensor messages\n
\n
\n
\n
\n
DAG\n\nruns continuously\n
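To make the DAG note concrete, here is a minimal Python sketch of a topology as a directed graph of processing steps. It is not Storm's API: `Topology`, `add_step`, `run`, and the word-count steps are hypothetical names chosen for illustration. Each item from the source is pushed through every downstream step as it arrives, so the same loop would keep running if the input stream never ended.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


class Topology:
    """Toy DAG of processing steps (illustrative only, not Storm's API)."""

    def __init__(self) -> None:
        self.steps: Dict[str, Callable] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)

    def add_step(self, name: str, fn: Callable, inputs: Iterable[str] = ()) -> None:
        # Register a step and an edge from each upstream step to it.
        self.steps[name] = fn
        for parent in inputs:
            self.edges[parent].append(name)

    def emit(self, name: str, item, sink: list) -> None:
        # Run one step; each output goes to every child, or to the sink at a leaf.
        for out in self.steps[name](item):
            children = self.edges.get(name, [])
            if not children:
                sink.append((name, out))
            for child in children:
                self.emit(child, out, sink)

    def run(self, source: str, stream: Iterable, sink: list) -> None:
        # The stream may be unbounded; items are processed one at a time.
        for item in stream:
            self.emit(source, item, sink)


def make_counter() -> Callable:
    counts = defaultdict(int)

    def count(word: str):
        counts[word] += 1
        return [(word, counts[word])]

    return count


def build_word_count() -> Topology:
    topo = Topology()
    topo.add_step("sentences", lambda s: [s])
    topo.add_step("split", lambda s: s.split(), inputs=["sentences"])
    topo.add_step("count", make_counter(), inputs=["split"])
    return topo
```

Passing a generator that never terminates as `stream` gives the "runs continuously" behaviour; a finite list is enough for trying it out.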
abstractions like Cascading, Hive, Pig make MR approachable\n\ncode size reduction\n
\n
\n
kestrel - via thrift\nkafka - transactional topologies, idempotency, process only once\nactivemq\n
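One way to get "process only once" results from a replayable source such as Kafka is to store a transaction id next to each piece of state and skip a batch whose id has already been applied. The Python sketch below illustrates that idea only. `IdempotentCounter` and `apply_batch` are hypothetical names, and it assumes that batches are committed in strict transaction-id order and that a replayed batch contains the same items as the original.

```python
class IdempotentCounter:
    """Counts keys so that replaying a batch does not double count.

    Assumes batches are applied in txid order and a replay of a txid
    carries exactly the same keys as the first attempt.
    """

    def __init__(self) -> None:
        # key -> (count, txid of the last batch that updated this key)
        self.state = {}

    def apply_batch(self, txid: int, keys) -> None:
        # Pre-aggregate the batch so each key is updated once per txid.
        batch = {}
        for key in keys:
            batch[key] = batch.get(key, 0) + 1

        for key, n in batch.items():
            count, last_txid = self.state.get(key, (0, None))
            if last_txid == txid:
                # This batch was already applied to this key: skip the replay.
                continue
            self.state[key] = (count + n, txid)

    def count(self, key) -> int:
        return self.state.get(key, (0, None))[0]
```

Because the check is per key, a batch that failed partway through can be replayed safely: keys that were already updated are skipped and the rest are applied.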
\n
current architecture\n\ndata ingest tool for hadoop (avoid Flume madness)\n