SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
Tachyon and Apache Spark:  
heralds of in-memory computing era. 
Roman Shaposhnik 
Director of Open Source @Pivotal 
(Twitter: @rhatr)
Who’s this guy? 
• Director of Open Source @Pivotal 
• Apache Software Foundation guy (Member, VP of Apache 
Incubator, committer on Hadoop, Giraph, Sqoop, etc) 
• Used to be root@Cloudera 
• Used to be PHB@Yahoo! (original Hadoop team)
Dearly beloved…
20 minute to figure out 
Hadoop vs. Spark
20 minute to figure out 
Hadoop++ == Spark
20 minute to figure out 
Hadoop + Spark
But wait! There’s more! 
Tachyon
Long, long time ago… 
HDFS 
ASF Projects 
FLOSS Projects 
Pivotal Products 
MapReduce
In a blink of an eye 
MLib 
Shark 
GraphX 
Streaming 
HDFS 
Crunch Mahout 
Pig 
Sqoop Flume 
Coordination and 
workflow 
management 
Zookeeper 
Command 
Center 
ASF Projects 
FLOSS Projects 
Pivotal Products 
GemFire XD 
Oozie 
MapReduce 
Hive 
Tez 
Giraph 
Hadoop UI 
Hue 
SolrCloud 
Phoenix 
HBase 
Spark 
Impala 
HAWQ 
SpringXD 
MADlib 
Hamster 
PivotalR 
YARN 
Tachyon
A Spark view? 
HDFS 
MLib 
Shark 
YARN 
GraphX 
Streaming 
Tachyon 
Sqoop Flume 
Hadoop UI 
Hue 
Coordination and 
workflow 
management 
Zookeeper 
Command 
Center 
ASF Projects 
FLOSS Projects 
Pivotal Products 
GemFire XD 
Oozie 
SolrCloud 
Phoenix 
HBase Spark 
SpringXD
BDAS
Long, long time ago…
This is 2014
What changed?
Your datacenter 
… 
server 1 
server N
Hadoop’s view 
MapReduce 
server 1 
server N 
HDFS
HDFS: decoupled storage 
… 
MR 
HDFS 
MR
Anatomy of MapReduce 
HDFS mappers reducers HDFS 
a b c 
d a c 
a 3 
b 1 
c 2 
a 1 
b 1 
c 1 
a 1 
c 1 
a 1 
a 1 1 1 
b 1 
c 1 1
What’s wrong with MR? 
Source: UC Berkeley Spark project (just the image)
This looks familiar… 
$ grep –R | awk | sort …
Spark innovations 
• Resilient Distribtued Datasets (RDDs) 
• Distributed on a cluster 
• Manipulated via parallel operators (map, etc.) 
• Automatically rebuilt on failure 
• A parallel ecosystem 
• A solution to iterative and multi-stage apps
RDDs 
warnings = textFile(…).filter(_.contains(“warning”)) 
.map(_.split(‘ ‘)(1)) 
HadoopRDD 
path = hdfs:// 
FilteredRDD 
contains… 
MappedRDD 
split…
Parallel operators 
• map, reduce 
• sample, filter 
• groupBy, reduceByKey 
• join, leftOuterJoin, rightOuterJoin 
• union, cross
What is really happening? 
MLib 
Shark 
GraphX 
Streaming 
HDFS 
Crunch Mahout 
Pig 
Sqoop Flume 
Coordination and 
workflow 
management 
Zookeeper 
Command 
Center 
ASF Projects 
FLOSS Projects 
Pivotal Products 
GemFire XD 
Oozie 
MapReduce 
Hive 
Tez 
Giraph 
Hadoop UI 
Hue 
SolrCloud 
Phoenix 
HBase 
Spark 
Impala 
HAWQ 
SpringXD 
MADlib 
Hamster 
PivotalR 
YARN 
Tachyon
May be its not so bad 
server 1 
server N
But HDFS/YARN are safe? 
HDFS, Ceph, S3, NAS, etc. 
New 
HDFS 
New 
YARN
Tachyon 
• In-memory data-exchange layer 
• A set of evolving APIs: 
• filesystem 
• caching 
• RDDs 
• Materialized views
Tachyon
Spark is best for cloud
It will be called Hadoop 
MLib 
Shark 
GraphX 
Streaming 
HDFS 
Crunch Mahout 
Pig 
Sqoop Flume 
Coordination and 
workflow 
management 
Zookeeper 
Command 
Center 
ASF Projects 
FLOSS Projects 
Pivotal Products 
GemFire with Tachyon 
Oozie 
MapReduce 
Hive 
Tez 
Giraph 
Hadoop UI 
Hue 
SolrCloud 
Phoenix 
HBase 
Spark 
Impala 
HAWQ 
SpringXD 
MADlib 
Hamster 
PivotalR 
YARN
Spark/Tachyon recap 
• Is it “Big Data” (Yes) 
• Is it “Hadoop” (No) 
• It’s one of those “in memory” things, right (Yes) 
• JVM, Java, Scala (All) 
• Is it Real or just another shiny technology with 
a long, but ultimately small tail (Yes and ?)
A NEW PLATFORM FOR A NEW 
ERA
Questions ?

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelinprajods
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsBen Laird
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark Summit
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...Holden Karau
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)Claudiu Barbura
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkEvan Chan
 
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDsApache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDsTimothy Spann
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersDataWorks Summit/Hadoop Summit
 
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Spark Summit
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Alex Zeltov
 
Apache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data ProcessingApache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data Processingprajods
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigDataWorks Summit/Hadoop Summit
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Summit
 

Was ist angesagt? (20)

Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
 
Alpine academy apache spark series #1 introduction to cluster computing wit...
Alpine academy apache spark series #1   introduction to cluster computing wit...Alpine academy apache spark series #1   introduction to cluster computing wit...
Alpine academy apache spark series #1 introduction to cluster computing wit...
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
 
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDsApache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 
Apache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data ProcessingApache Spark: The Next Gen toolset for Big Data Processing
Apache Spark: The Next Gen toolset for Big Data Processing
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
Spark Streaming: Pushing the throughput limits by Francois Garillot and Gerar...
 

Andere mochten auch

Reactive Jersey Client
Reactive Jersey ClientReactive Jersey Client
Reactive Jersey ClientMichal Gajdos
 
Akka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsAkka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsNLJUG
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonClaudiu Barbura
 
A Journey to Reactive Function Programming
A Journey to Reactive Function ProgrammingA Journey to Reactive Function Programming
A Journey to Reactive Function ProgrammingAhmed Soliman
 
Reactive programming on Android
Reactive programming on AndroidReactive programming on Android
Reactive programming on AndroidTomáš Kypta
 
PSUG #52 Dataflow and simplified reactive programming with Akka-streams
PSUG #52 Dataflow and simplified reactive programming with Akka-streamsPSUG #52 Dataflow and simplified reactive programming with Akka-streams
PSUG #52 Dataflow and simplified reactive programming with Akka-streamsStephane Manciot
 
Reactive streams
Reactive streamsReactive streams
Reactive streamscodepitbull
 
Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeRoland Kuhn
 
Docker. Does it matter for Java developer ?
Docker. Does it matter for Java developer ?Docker. Does it matter for Java developer ?
Docker. Does it matter for Java developer ?Izzet Mustafaiev
 
Reactive Streams and RabbitMQ
Reactive Streams and RabbitMQReactive Streams and RabbitMQ
Reactive Streams and RabbitMQmkiedys
 
Resilient Applications with Akka Persistence - Scaladays 2014
Resilient Applications with Akka Persistence - Scaladays 2014Resilient Applications with Akka Persistence - Scaladays 2014
Resilient Applications with Akka Persistence - Scaladays 2014Björn Antonsson
 
Micro services, reactive manifesto and 12-factors
Micro services, reactive manifesto and 12-factorsMicro services, reactive manifesto and 12-factors
Micro services, reactive manifesto and 12-factorsDejan Glozic
 
12 Factor App: Best Practices for JVM Deployment
12 Factor App: Best Practices for JVM Deployment12 Factor App: Best Practices for JVM Deployment
12 Factor App: Best Practices for JVM DeploymentJoe Kutner
 

Andere mochten auch (13)

Reactive Jersey Client
Reactive Jersey ClientReactive Jersey Client
Reactive Jersey Client
 
Akka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based ApplicationsAkka in Practice: Designing Actor-based Applications
Akka in Practice: Designing Actor-based Applications
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, Tachyon
 
A Journey to Reactive Function Programming
A Journey to Reactive Function ProgrammingA Journey to Reactive Function Programming
A Journey to Reactive Function Programming
 
Reactive programming on Android
Reactive programming on AndroidReactive programming on Android
Reactive programming on Android
 
PSUG #52 Dataflow and simplified reactive programming with Akka-streams
PSUG #52 Dataflow and simplified reactive programming with Akka-streamsPSUG #52 Dataflow and simplified reactive programming with Akka-streams
PSUG #52 Dataflow and simplified reactive programming with Akka-streams
 
Reactive streams
Reactive streamsReactive streams
Reactive streams
 
Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in Practice
 
Docker. Does it matter for Java developer ?
Docker. Does it matter for Java developer ?Docker. Does it matter for Java developer ?
Docker. Does it matter for Java developer ?
 
Reactive Streams and RabbitMQ
Reactive Streams and RabbitMQReactive Streams and RabbitMQ
Reactive Streams and RabbitMQ
 
Resilient Applications with Akka Persistence - Scaladays 2014
Resilient Applications with Akka Persistence - Scaladays 2014Resilient Applications with Akka Persistence - Scaladays 2014
Resilient Applications with Akka Persistence - Scaladays 2014
 
Micro services, reactive manifesto and 12-factors
Micro services, reactive manifesto and 12-factorsMicro services, reactive manifesto and 12-factors
Micro services, reactive manifesto and 12-factors
 
12 Factor App: Best Practices for JVM Deployment
12 Factor App: Best Practices for JVM Deployment12 Factor App: Best Practices for JVM Deployment
12 Factor App: Best Practices for JVM Deployment
 

Ähnlich wie Tachyon and Apache Spark

Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?rhatr
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloudrhatr
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014cdmaxime
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Rajan Kanitkar
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014cdmaxime
 
Scalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worldsScalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worldsDataWorks Summit
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"Giivee The
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft PlatformJesus Rodriguez
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoMark Kromer
 
Lightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkLightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkManish Gupta
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewGreat Wide Open
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterDataWorks Summit
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたmoai kids
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with PythonDonald Miner
 

Ähnlich wie Tachyon and Apache Spark (20)

Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloud
 
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
Apache Spark - Las Vegas Big Data Meetup Dec 3rd 2014
 
Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014Big Data Hoopla Simplified - TDWI Memphis 2014
Big Data Hoopla Simplified - TDWI Memphis 2014
 
Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014Introduction to Spark - Phoenix Meetup 08-19-2014
Introduction to Spark - Phoenix Meetup 08-19-2014
 
Scalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worldsScalable Hadoop with succinct Python: the best of both worlds
Scalable Hadoop with succinct Python: the best of both worlds
 
Handling not so big data
Handling not so big dataHandling not so big data
Handling not so big data
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
Lightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache SparkLightening Fast Big Data Analytics using Apache Spark
Lightening Fast Big Data Analytics using Apache Spark
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきました
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Presentation
PresentationPresentation
Presentation
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with Python
 

Mehr von rhatr

Unikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystemUnikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystemrhatr
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...rhatr
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXrhatr
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofrhatr
 
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformApache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformrhatr
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...rhatr
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...rhatr
 

Mehr von rhatr (7)

Unikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystemUnikernels: in search of a killer app and a killer ecosystem
Unikernels: in search of a killer app and a killer ecosystem
 
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
You Call that Micro, Mr. Docker? How OSv and Unikernels Help Micro-services S...
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
 
OSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear ofOSv: probably the best OS for cloud workloads you've never hear of
OSv: probably the best OS for cloud workloads you've never hear of
 
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platformApache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
 
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
Building Google-in-a-box: using Apache SolrCloud and Bigtop to index your big...
 

Kürzlich hochgeladen

A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 

Kürzlich hochgeladen (20)

A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 

Tachyon and Apache Spark

  • 1. Tachyon and Apache Spark: heralds of in-memory computing era. Roman Shaposhnik Director of Open Source @Pivotal (Twitter: @rhatr)
  • 2. Who’s this guy? • Director of Open Source @Pivotal • Apache Software Foundation guy (Member, VP of Apache Incubator, committer on Hadoop, Giraph, Sqoop, etc) • Used to be root@Cloudera • Used to be PHB@Yahoo! (original Hadoop team)
  • 4. 20 minute to figure out Hadoop vs. Spark
  • 5. 20 minute to figure out Hadoop++ == Spark
  • 6. 20 minute to figure out Hadoop + Spark
  • 7. But wait! There’s more! Tachyon
  • 8. Long, long time ago… HDFS ASF Projects FLOSS Projects Pivotal Products MapReduce
  • 9. In a blink of an eye MLib Shark GraphX Streaming HDFS Crunch Mahout Pig Sqoop Flume Coordination and workflow management Zookeeper Command Center ASF Projects FLOSS Projects Pivotal Products GemFire XD Oozie MapReduce Hive Tez Giraph Hadoop UI Hue SolrCloud Phoenix HBase Spark Impala HAWQ SpringXD MADlib Hamster PivotalR YARN Tachyon
  • 10. A Spark view? HDFS MLib Shark YARN GraphX Streaming Tachyon Sqoop Flume Hadoop UI Hue Coordination and workflow management Zookeeper Command Center ASF Projects FLOSS Projects Pivotal Products GemFire XD Oozie SolrCloud Phoenix HBase Spark SpringXD
  • 11. BDAS
  • 12. Long, long time ago…
  • 15. Your datacenter … server 1 server N
  • 16. Hadoop’s view MapReduce server 1 server N HDFS
  • 17. HDFS: decoupled storage … MR HDFS MR
  • 18.
  • 19. Anatomy of MapReduce HDFS mappers reducers HDFS a b c d a c a 3 b 1 c 2 a 1 b 1 c 1 a 1 c 1 a 1 a 1 1 1 b 1 c 1 1
  • 20. What’s wrong with MR? Source: UC Berkeley Spark project (just the image)
  • 21. This looks familiar… $ grep –R | awk | sort …
  • 22. Spark innovations • Resilient Distribtued Datasets (RDDs) • Distributed on a cluster • Manipulated via parallel operators (map, etc.) • Automatically rebuilt on failure • A parallel ecosystem • A solution to iterative and multi-stage apps
  • 23. RDDs warnings = textFile(…).filter(_.contains(“warning”)) .map(_.split(‘ ‘)(1)) HadoopRDD path = hdfs:// FilteredRDD contains… MappedRDD split…
  • 24. Parallel operators • map, reduce • sample, filter • groupBy, reduceByKey • join, leftOuterJoin, rightOuterJoin • union, cross
  • 25. What is really happening? MLib Shark GraphX Streaming HDFS Crunch Mahout Pig Sqoop Flume Coordination and workflow management Zookeeper Command Center ASF Projects FLOSS Projects Pivotal Products GemFire XD Oozie MapReduce Hive Tez Giraph Hadoop UI Hue SolrCloud Phoenix HBase Spark Impala HAWQ SpringXD MADlib Hamster PivotalR YARN Tachyon
  • 26. May be its not so bad server 1 server N
  • 27. But HDFS/YARN are safe? HDFS, Ceph, S3, NAS, etc. New HDFS New YARN
  • 28. Tachyon • In-memory data-exchange layer • A set of evolving APIs: • filesystem • caching • RDDs • Materialized views
  • 30. Spark is best for cloud
  • 31. It will be called Hadoop MLib Shark GraphX Streaming HDFS Crunch Mahout Pig Sqoop Flume Coordination and workflow management Zookeeper Command Center ASF Projects FLOSS Projects Pivotal Products GemFire with Tachyon Oozie MapReduce Hive Tez Giraph Hadoop UI Hue SolrCloud Phoenix HBase Spark Impala HAWQ SpringXD MADlib Hamster PivotalR YARN
  • 32. Spark/Tachyon recap • Is it “Big Data” (Yes) • Is it “Hadoop” (No) • It’s one of those “in memory” things, right (Yes) • JVM, Java, Scala (All) • Is it Real or just another shiny technology with a long, but ultimately small tail (Yes and ?)
  • 33. A NEW PLATFORM FOR A NEW ERA