1. NEARING THE EVENT HORIZON.
HADOOP WAS PREDICTABLE, WHAT’S NEXT?
May 23, 2012 Mike Miller
mike@cloudant.com
@mlmilleratmit
2. What I Am
Cloudant Founder, Chief Scientist
(we’re hiring at all positions)
Affiliate Assistant Professor, Particle Physics(UW)
Background: machine learning, analysis, big data,
globally distributed systems
Mike Miller, GlueCon May 2012 2
3. What I Am
A CDN for your Application Data
4. What I Am Not
didn’t see these coming
Superluminal neutrinos
Red Sox epic collapse in September
Red Wings losing in the first round
...
But here I go anyway
5. My First Postulate of Big-Data
Google Matters
What matters for Google...
... matters for the internet...
...and therefore matters for the enterprise...
... will therefore be re-architected by Apache...
... and therefore matters to you.
6. Evidence
Business Week, 12/24/2007
9. The Old Canon
• Google File System (the important one)
http://labs.google.com/papers/gfs.html
• MapReduce (the big one)
http://labs.google.com/papers/mapreduce.html
• BigTable (clone me!)
http://labs.google.com/papers/bigtable.html
• Dynamo (ok, AWS. but masterless quorum)
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
copy these. use these. print $$$
10. MapReduce: The Awesome
• Approachable interface
“What do I do with a single piece of data?”
• Data Parallel
Developers can basically forget about scatter-gather
• Fault Tolerant
Failure at scale is the norm!
Protects both user and system operator
• IO Optimized
Built for sequential IO
commodity disks spinning forward at O(20 MB/sec) each
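The "single piece of data" interface above can be sketched as a toy in-memory map/reduce. This is an illustration of the programming model only (the `map_fn`/`reduce_fn`/`mapreduce` names are mine, not Hadoop's API); a real framework adds the distributed shuffle, fault tolerance, and sequential IO the slide describes.

```python
from collections import defaultdict

def map_fn(record):
    # "What do I do with a single piece of data?" -- emit (key, value) pairs
    for word in record.split():
        yield word, 1

def reduce_fn(key, values):
    # combine every value observed for one key
    return key, sum(values)

def mapreduce(records, map_fn, reduce_fn):
    # shuffle: group intermediate pairs by key (the framework's job,
    # so developers can "basically forget about scatter-gather")
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = mapreduce(["big data", "big graphs"], map_fn, reduce_fn)
# counts == {'big': 2, 'data': 1, 'graphs': 1}
```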
13. So... is that it?
http://gigaom.com/cloud/democratizing-big-data-is-hadoop-our-only-hope/
http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/
http://mackiemathew.com/2012/02/25/the-problems-in-hadoop-when-does-it-fail-to-deliver/
14. MapReduce: The not so Awesome
• Hadoop doesn’t power big data applications
Not a transactional datastore. Slosh back and forth via ETL
• Processing latency
Non-incremental, must re-slurp entire dataset every pass
• Ad-Hoc queries
Bare metal interface, data import
• Graphs
Only a handful of graph problems amenable to MR
http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.120
15. To the Event Horizon
16. Enter The New Canon
• Percolator
incremental processing
http://research.google.com/pubs/pub36726.html
• Dremel
ad-hoc analysis queries
http://research.google.com/pubs/pub36632.html
• Pregel
Big graphs
http://dl.acm.org/citation.cfm?id=1807184
Scalable, Fault Tolerant, Approachable
18. Percolator: incremental processing
• Replaced MapReduce as the tool to build search index
“However, reprocessing the entire web discards the work done in earlier runs and makes latency
proportional to the size of the repository, rather than the size of the update.”
• Bigtable alone can’t do it
“BigTable scales...but doesn’t provide tools to help programmers maintain data invariants in the
face of concurrent updates.”
• Applicability
Incrementally updating data
Computational output can be broken down into small pieces
Computation large in some dimension (data size, cpu, etc)
• Does it matter?
“...Converting the indexing system to an incremental system ... reduced the average document
processing latency by a factor of 100...”
19. Percolator: incremental processing
• BigTable plus...
Multi-row ACID transactions
snapshot isolation, lazy locks
up to 10s write latencies
Timestamps
start timestamp (read), commit timestamp (write)
Notifications
do not maintain invariants
Observer framework
your code is run upon notification of an update
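The notification/observer idea above can be sketched as a toy single-process loop, assuming a dirty-row set in place of Percolator's notification columns (all names here are illustrative; real Percolator layers ACID transactions and observers on Bigtable).

```python
# Toy sketch of Percolator-style notifications and observers.
table = {}             # row -> value
notifications = set()  # rows with unprocessed updates ("dirty" rows)

def write(row, value):
    table[row] = value
    notifications.add(row)   # leave a notification: this row changed

def run_observers(observer):
    # the framework scans for notified rows and invokes user code on
    # each; only changed rows are reprocessed, not the whole repository
    while notifications:
        row = notifications.pop()
        observer(row, table[row])

index = {}
def index_observer(row, value):
    # user code: incrementally maintain a derived index
    index[value] = row

write("doc1", "hadoop")
write("doc2", "dremel")
run_observers(index_observer)
```

This is what buys the factor-of-100 latency quoted above: each update triggers work proportional to the update, not to the size of the repository.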
23. Dremel: ad-hoc Query
• Scalable, interactive ad-hoc query system for read-only nested data
“...capable of running aggregation queries over trillion-row tables in seconds.”
• ... on nested data structures in situ
Web and scientific data is often non-relational
nested data (protocol buffers) underlies most structured data at Google
• Usage
DEFINE TABLE t AS /path/to/data/*
SELECT TOP(signal1,100), COUNT(*) FROM t
• Applicability
Analysis of crawled documents
Tracking of install data for apps on Android Market
Crash reports
Spam analysis...
Dream BI Tool
24. Dremel: ad-hoc Query
• Ingredients
In situ data
SQL like interface
Serving trees for query execution
Column striped data (3-10x)
Analysis Catalogs
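The column-striping ingredient can be sketched in a few lines: store one contiguous array per field, so an aggregation query touches only the stripe it reads. This is a simplification I've constructed for illustration; Dremel's actual format also encodes repetition/definition levels to handle nested records.

```python
# Row layout: each record is a unit; a query over one field still
# touches (deserializes, pages in) every field of every record.
rows = [
    {"signal1": 10, "signal2": 7, "url": "a"},
    {"signal1": 30, "signal2": 2, "url": "b"},
    {"signal1": 20, "signal2": 9, "url": "c"},
]

# Column-striped layout: one contiguous array per field.
columns = {k: [r[k] for r in rows] for k in rows[0]}

# SELECT TOP(signal1, 1) ... against each layout:
top_row = max(r["signal1"] for r in rows)   # scans whole records
top_col = max(columns["signal1"])           # scans one stripe only

assert top_row == top_col == 30
```

Reading one stripe instead of whole records (plus better compression of like-typed values) is where the 3-10x gain cited above comes from.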
25. Dremel: ad-hoc Query
Columns ~10x faster than records
26. Dremel: ad-hoc Query
Benchmark: MapReduce (via Sawzall) vs. Dremel (via SQL)
27. Dremel: ad-hoc Query
Significant Optimization Possible
Dremel ~100x Faster than Stock MR
28. Dremel: ad-hoc Query
Most Production Queries Executed in <10 seconds
30. Pregel: Big Graphs
• Massively parallel processing of big graphs
billions of vertices, trillions of edges
• Bulk synchronous parallel model
sequence of vertex oriented iterations
send/receive messages from other vertex computations
read/modify state of vertex, outgoing edges, graph topology
• Expressive, easy to program
distribution details hidden behind abstract API
• Iterative
computation continues until each vertex votes to terminate
• In production
PageRank 15 lines of code
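The bulk synchronous parallel model above can be sketched as a toy single-process PageRank in the same spirit as the "15 lines" the slide mentions. This is my illustration, not Pregel's API: real Pregel hashes vertices across workers, delivers messages between supersteps, and terminates when every vertex votes to halt (here, a fixed superstep count stands in for that).

```python
def pagerank(graph, supersteps=30, d=0.85):
    # graph: vertex -> list of outgoing edges
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        # superstep: each vertex sends rank/out_degree along its edges
        inbox = {v: [] for v in graph}
        for v, edges in graph.items():
            for u in edges:
                inbox[u].append(rank[v] / len(edges))
        # then each vertex updates its own state from received messages
        rank = {v: (1 - d) / n + d * sum(inbox[v]) for v in graph}
    return rank

g = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
r = pagerank(g)
```

The vertex only ever sees its own state and its inbox; that locality is what lets the real system scale the same program to billions of vertices.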
31. Pregel: Big Graphs
• Master “Name” node
connects processes for messaging
• Message Passing
no remote procedure calls or remote reads
• Graph hashed across nodes
vertex, outgoing edges stored in RAM
• Aggregators
global mechanism for aggregation
all but final reduce computed on node local data
• Checkpointing
configurable, enables automatic recovery
33. Pregel: Big Graphs
Near linear scaling to 1B nodes
34. Learn More
• Incremental Processing
Incremental, in-database map/reduce in Cloudant’s BigCouch
HBase 0.92 supports observers/coprocessors
Stream processing via Storm, HStreaming, etc.
• Ad Hoc Query
Google BigQuery
Column stores (Vertica, etc)
OpenDremel (stalled?)
?
• Big Graphs
Giraph on Hadoop (Apache Incubator)
Golden Orb (stalled?)
35. Lessons Learned
• Hire Jeff Dean and Sanjay Ghemawat
• GFS enables everything
• There is massive opportunity on the horizon