1. Tachyon and Apache Spark: heralds of the in-memory computing era
Roman Shaposhnik
Director of Open Source @Pivotal
(Twitter: @rhatr)
2. Who’s this guy?
• Director of Open Source @Pivotal
• Apache Software Foundation guy (Member, VP of Apache Incubator, committer on Hadoop, Giraph, Sqoop, etc.)
• Used to be root@Cloudera
• Used to be PHB@Yahoo! (original Hadoop team)
22. Spark innovations
• Resilient Distributed Datasets (RDDs)
• Distributed on a cluster
• Manipulated via parallel operators (map, etc.)
• Automatically rebuilt on failure
• A parallel ecosystem
• A solution to iterative and multi-stage apps
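The RDD bullets above (distributed, manipulated via parallel operators, automatically rebuilt on failure) can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark's API: the class `ToyRDD` and its methods `parallelize`, `map`, `partition`, `lose_partition`, and `collect` are hypothetical names chosen for illustration. It assumes only that each dataset remembers how it was derived from its parent, so a lost partition can be recomputed rather than restored from a replica.

```python
from typing import Callable, Dict, Iterable, List


class ToyRDD:
    """Toy stand-in for an RDD: a fixed number of partitions plus the
    recipe (lineage) needed to recompute any one of them."""

    def __init__(self, num_partitions: int, compute: Callable[[int], List]):
        self.num_partitions = num_partitions
        self._compute = compute            # partition index -> list of records
        self._cache: Dict[int, List] = {}  # materialized partitions

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int) -> "ToyRDD":
        # Split the input round-robin across partitions.
        items = list(data)
        return ToyRDD(num_partitions, lambda i: items[i::num_partitions])

    def map(self, f: Callable) -> "ToyRDD":
        # The child only records how to derive partition i from the
        # parent's partition i; nothing is computed yet.
        parent = self
        return ToyRDD(
            self.num_partitions,
            lambda i: [f(x) for x in parent.partition(i)],
        )

    def partition(self, i: int) -> List:
        # Materialize on demand; a missing partition is rebuilt from lineage.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a node failure that drops one materialized partition.
        self._cache.pop(i, None)

    def collect(self) -> List:
        out: List = []
        for i in range(self.num_partitions):
            out.extend(self.partition(i))
        return out


numbers = ToyRDD.parallelize(range(10), num_partitions=3)
squares = numbers.map(lambda x: x * x)

before = squares.collect()
squares.lose_partition(1)   # "failure": partition 1 is gone
after = squares.collect()   # partition 1 is recomputed from its parent
```

In this sketch the recovery cost is one recomputation of the lost partition, and no copy of `squares` is ever stored elsewhere; real RDDs apply the same idea across machines.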
31. It will be called Hadoop
[Diagram: the Hadoop ecosystem stack. Legend: ASF Projects, FLOSS Projects, Pivotal Products. Group labels: Coordination and workflow management; Hadoop UI. Components shown: HDFS, YARN, MapReduce, Tez, Spark, MLlib, Shark, GraphX, Streaming, GemFire with Tachyon, Hive, Pig, Crunch, Mahout, Giraph, Sqoop, Flume, Zookeeper, Oozie, Hue, SolrCloud, Phoenix, HBase, Impala, HAWQ, SpringXD, MADlib, Hamster, PivotalR, Command Center]
32. Spark/Tachyon recap
• Is it “Big Data”? (Yes)
• Is it “Hadoop”? (No)
• It’s one of those “in memory” things, right? (Yes)
• JVM, Java, or Scala? (All)
• Is it real, or just another shiny technology with a long but ultimately small tail? (Yes and ?)