Spark
Next generation cloud
computing engine
Wisely Chen
Agenda
• What is Spark?
• Next big thing
• How to use Spark?
• Demo
• Q&A
Who am I?
• Wisely Chen ( thegiive@gmail.com ) 	

• Sr. Engineer in the Yahoo! Taiwan data team

• Loves to promote open source tech 	

• Hadoop Summit 2013 San Jose	

• Jenkins Conf 2013 Palo Alto	

• Coscup 2006, 2012, 2013, OSDC 2007, Webconf 2013,
PHPConf 2012, RubyConf 2012
Taiwan Data Team
[Diagram: Data Highway, BI Report, Serving API, Data Mart, ETL / Forecast, Machine Learning]
[Diagram: Machine Learning, Distributed Computing, Big Data; Recommendation, Forecast; HADOOP; Faster ML, Distributed Computing, Bigger Big Data]
Opinion from Cloudera
• The leading candidate for “successor to
MapReduce” today is Apache Spark
• No vendor — no new project — is likely to catch
up. Chasing Spark would be a waste of time,
and would delay availability of real-time analytic
and processing services for no good reason.
• From http://0rz.tw/y3OfM
What is Spark
• From UC Berkeley AMP Lab	

• Most active big data open
source project since Hadoop
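Where is Spark?
[Diagram, Hadoop 2.0: HDFS at the bottom, YARN above it; MapReduce, Storm, HBase, and others run on YARN]
Hadoop Architecture
[Diagram: Storage = HDFS, Resource Management = YARN, Computing Engine = MapReduce, SQL = Hive]
Hadoop vs Spark
[Diagram: on the same HDFS and YARN layers, Spark sits beside MapReduce as the computing engine, and Shark sits beside Hive as the SQL layer]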
Spark vs Hadoop
• Spark runs on YARN, Mesos, or in standalone mode
• Spark’s main concept is based on MapReduce
• Spark can read from
• HDFS: data locality
• HBase
• Cassandra
More than MapReduce
• Spark Core: MapReduce
• Shark: Hive
• GraphX: Pregel
• MLlib: Mahout
• Streaming: Storm
• All on top of HDFS and a resource management system (YARN, Mesos)
Why Spark?
Of all the martial arts under heaven, none is unbeatable except speed
3X to 25X faster than the MapReduce framework
From Matei’s paper: http://0rz.tw/VVqgP
Running time (s), MR vs Spark
• Logistic regression: MR 76, Spark 3
• KMeans: MR 106, Spark 33
• PageRank: MR 171, Spark 23
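The "3X to 25X" range can be checked against the chart values with a little arithmetic. The sketch below is only that: the numbers are the ones read off the three charts, and the names `running_time_s` and `speedup` are made up for illustration.

```python
# Running times in seconds, as read off the three charts (MR vs Spark).
running_time_s = {
    "Logistic regression": {"MR": 76, "Spark": 3},
    "KMeans": {"MR": 106, "Spark": 33},
    "PageRank": {"MR": 171, "Spark": 23},
}


def speedup(times):
    """How many times faster the Spark run finished than the MapReduce run."""
    return times["MR"] / times["Spark"]


speedups = {name: speedup(times) for name, times in running_time_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}X")
```

Logistic regression sits at the top of the range and KMeans at the bottom, which is where the 3X and 25X endpoints appear to come from.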
What is Spark
• Apache Spark™ is a very fast and general
engine for large-scale data processing
Why is Spark so fast?
HDFS
• About 100X slower than memory
• Stores data on network + disk
• Network is about 100X slower than memory
• Implements fault tolerance
MapReduce Pagerank
// ... readInputFromHDFS ...
for (int runs = 0; runs < iter_runnumber; runs++) {
    // ...
    isCompleted = runRankCalculation(inPath, lastResultPath);
    // ...
}
// ... writeOutputToHDFS ...
Workflow
• MapReduce: Input (HDFS) -> Iter 1 RunRank -> Tmp (HDFS) -> Iter 2 RunRank -> Tmp (HDFS) -> ... -> Iter N RunRank
• Spark: Input (HDFS) -> Iter 1 RunRank -> Tmp (Mem) -> Iter 2 RunRank -> Tmp (Mem) -> ... -> Iter N RunRank
PageRank on 1 billion URL records
• First iteration takes 200 sec
• 2nd iteration takes 20 sec
• 3rd iteration takes 20 sec
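The difference between the two workflows can be sketched in plain Python, without Spark or Hadoop. Everything here is an assumption made for illustration: the three-page graph, the damping factor, and the function names. One loop writes every intermediate result to a file and reads it back (standing in for Tmp on HDFS), the other keeps the ranks in a variable (Tmp in memory).

```python
import json
import os
import tempfile

# Hypothetical toy graph: page -> pages it links to.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
DAMPING = 0.85


def run_rank(ranks):
    """One PageRank iteration: every page splits its rank among its out-links."""
    contribs = {page: 0.0 for page in LINKS}
    for page, targets in LINKS.items():
        share = ranks[page] / len(targets)
        for target in targets:
            contribs[target] += share
    return {page: (1 - DAMPING) + DAMPING * c for page, c in contribs.items()}


def pagerank_via_files(iterations, workdir):
    """MapReduce-style loop: each iteration reads its input from storage and writes its output back."""
    path = os.path.join(workdir, "ranks_0.json")
    with open(path, "w") as f:
        json.dump({page: 1.0 for page in LINKS}, f)
    storage_round_trips = 0
    for i in range(iterations):
        with open(path) as f:
            ranks = json.load(f)
        path = os.path.join(workdir, f"ranks_{i + 1}.json")
        with open(path, "w") as f:
            json.dump(run_rank(ranks), f)
        storage_round_trips += 1
    with open(path) as f:
        return json.load(f), storage_round_trips


def pagerank_in_memory(iterations):
    """Spark-style loop: the intermediate ranks never leave memory."""
    ranks = {page: 1.0 for page in LINKS}
    for _ in range(iterations):
        ranks = run_rank(ranks)
    return ranks, 0


with tempfile.TemporaryDirectory() as workdir:
    file_ranks, file_trips = pagerank_via_files(10, workdir)
memory_ranks, memory_trips = pagerank_in_memory(10)

print(file_ranks == memory_ranks, file_trips, memory_trips)
```

Both loops compute the same ranks. The only difference is the storage round trip per iteration, which is the cost the slides attribute to writing Tmp to HDFS.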
RDD
• Resilient Distributed Dataset
• Collections of objects spread across a cluster,
stored in RAM or on Disk
• Built through parallel transformations
Fault Tolerance
Of all the martial arts under heaven, none is unbeatable except speed
RDD
val a = sc.textFile("hdfs://....")                    // RDD a
val b = a.filter( line => line.contains("Spark") )    // RDD b (transformation)
val c = b.count()                                     // Value c (action)
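The split between a transformation and an action can be modeled in a few lines of plain Python. This is a sketch, not Spark's implementation: `LazyDataset` and `read_lines` are hypothetical names, and the three input lines are invented.

```python
class LazyDataset:
    """Hypothetical stand-in for an RDD: it remembers how to build its data and builds it only on demand."""

    def __init__(self, source, steps=()):
        self._source = source  # zero-argument function that yields the raw records
        self._steps = steps    # recorded predicates, i.e. how this dataset was derived

    def filter(self, predicate):
        # Transformation: nothing is read here, the step is only recorded.
        return LazyDataset(self._source, self._steps + (predicate,))

    def count(self):
        # Action: the source is read now and every recorded step is applied.
        total = 0
        for record in self._source():
            if all(step(record) for step in self._steps):
                total += 1
        return total


reads = []  # one entry per actual scan of the source


def read_lines():
    reads.append("scan")
    return iter(["Spark is fast", "Hadoop MapReduce", "Spark on YARN"])


a = LazyDataset(read_lines)
b = a.filter(lambda line: "Spark" in line)
reads_before_action = len(reads)  # still 0: only transformations so far
c = b.count()                     # the action triggers the one and only scan

print(reads_before_action, c, len(reads))
```

Because each dataset keeps the steps that produced it, a lost result could be rebuilt by replaying those steps from the source, which is the idea behind the "Fault Tolerance" bullet.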
Log mining
val a = sc.textFile("hdfs://aaa.com/a.txt")
val err = a.filter( t => t.contains("ERROR") )
           .filter( t => t.contains("2014") )

err.cache()
err.count()

val m = err.filter( t => t.contains("MYSQL") )
           .count()
val apache = err.filter( t => t.contains("APACHE") )
           .count()
[Diagram: the Driver sends one Task to each of three Workers]
[Diagram sequence for the same log mining code]
• sc.textFile: each Worker holds one block of RDD a (Block1, Block2, Block3)
• filter: each Worker derives its partition of RDD err from its own block
• err.cache() and err.count(): each Worker keeps its partition of RDD err in memory (Cache1, Cache2, Cache3)
• MYSQL filter: RDD m is computed from Cache1, Cache2, Cache3
• APACHE filter: computed from the same caches, without rereading the blocks
RDD Cache
• 1st iteration (no cache) takes the same time
• With cache, it takes 7 sec
RDD Cache
• Data locality
• Cache
Self join on 5 billion records
• A big shuffle takes 20 min
• After cache, it takes only 265 ms
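Why the second and third queries are cheap can be shown with a small plain-Python model of the log mining example. It is a sketch under stated assumptions: the `Dataset` class, the `read_log` function, and the four log lines are invented, and the real RDD cache works per partition across workers rather than in one process.

```python
class Dataset:
    """Hypothetical stand-in for an RDD with an optional in-memory cache."""

    def __init__(self, compute):
        self._compute = compute       # zero-argument function returning a list of records
        self._cache_enabled = False
        self._cached = None

    def cache(self):
        # Only marks the dataset; nothing is computed until an action runs.
        self._cache_enabled = True
        return self

    def records(self):
        if self._cached is not None:
            return self._cached       # served from memory
        data = self._compute()
        if self._cache_enabled:
            self._cached = data
        return data

    def filter(self, predicate):
        return Dataset(lambda: [r for r in self.records() if predicate(r)])

    def count(self):
        return len(self.records())


scans = []  # one entry per full read of the log
LOG = [
    "ERROR 2014 MYSQL down",
    "INFO 2014 ok",
    "ERROR 2014 APACHE 500",
    "ERROR 2013 MYSQL slow",
]


def read_log():
    scans.append(1)
    return list(LOG)


a = Dataset(read_log)
err = a.filter(lambda t: "ERROR" in t).filter(lambda t: "2014" in t)
err.cache()
err_count = err.count()                                       # first action: reads the log, fills the cache
mysql_count = err.filter(lambda t: "MYSQL" in t).count()      # answered from the cache
apache_count = err.filter(lambda t: "APACHE" in t).count()    # answered from the cache

print(err_count, mysql_count, apache_count, len(scans))
```

The log is read once, on the first action; the two follow-up counts touch only the cached records.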
Easy to use
• Interactive Shell
• Multi Language API
• JVM: Scala, Java
• PySpark: Python
Scala Word Count
val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
                 .map(word => (word, 1))
                 .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
Step by Step
• file.flatMap(line => line.split(" ")) => (aaa,bb,cc)
• .map(word => (word, 1)) => ((aaa,1),(bb,1)..)
• .reduceByKey(_ + _) => ((aaa,123),(bb,23)…)
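The three steps can be traced on a tiny input with ordinary Python collections. This is only an illustration of what each stage produces; the two input lines are made up, and no Spark API is involved.

```python
from collections import defaultdict
from itertools import chain

lines = ["aaa bb aaa", "cc bb aaa"]  # hypothetical input lines

# flatMap: split every line and flatten the pieces into one sequence of words.
words = list(chain.from_iterable(line.split(" ") for line in lines))

# map: pair each word with the number 1.
pairs = [(word, 1) for word in words]

# reduceByKey(_ + _): add up the 1s that share the same word.
totals = defaultdict(int)
for word, one in pairs:
    totals[word] += one
counts = dict(totals)

print(words)
print(pairs)
print(counts)
```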
Java Wordcount
JavaRDD<String> file = spark.textFile("hdfs://...");
JavaRDD<String> words = file.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});
// Note: from Spark 1.0 on, this pair-producing map is named mapToPair
JavaPairRDD<String, Integer> pairs = words.map(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer a, Integer b) { return a + b; }
});
counts.saveAsTextFile("hdfs://...");
Java vs Scala
• Scala: file.flatMap(line => line.split(" "))
• Java version:
JavaRDD<String> words = file.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String s) {
        return Arrays.asList(s.split(" "));
    }
});
Python
file = spark.textFile("hdfs://...")
counts = file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile("hdfs://...")
Highly Recommend
• Scala : Latest API feature, Stable
• Python
• very familiar language
• Native Lib: NumPy, SciPy
FYI
• Combiner: reduceByKey(_ + _)
• Typical WordCount:
groupByKey().mapValues{ arr =>
    var r = 0 ; arr.foreach{ i => r += i } ; r
}
WordCount
• reduceByKey: reduces a lot on the map side
• Hadoop style shuffle: sends a lot of data over the network
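The effect of a map-side combiner can be counted on a toy example. The sketch below is plain Python with hypothetical names (`shuffle_group_by_key`, `shuffle_reduce_by_key`) and an invented two-partition input; it only counts how many records would have to cross the network in each style, not how Spark actually moves them.

```python
from collections import Counter

# Hypothetical input, already split across two map-side partitions.
partitions = [
    ["aaa", "bb", "aaa", "aaa"],
    ["bb", "aaa", "cc", "bb"],
]


def shuffle_group_by_key(parts):
    """groupByKey style: every (word, 1) pair is shuffled unchanged."""
    return [(word, 1) for part in parts for word in part]


def shuffle_reduce_by_key(parts):
    """reduceByKey style: each partition pre-sums its own pairs (the combiner) before the shuffle."""
    shuffled = []
    for part in parts:
        shuffled.extend(Counter(part).items())
    return shuffled


def final_counts(shuffled):
    """Reduce side: add up whatever arrived for each word."""
    totals = Counter()
    for word, n in shuffled:
        totals[word] += n
    return dict(totals)


grouped = shuffle_group_by_key(partitions)
combined = shuffle_reduce_by_key(partitions)

print(len(grouped), len(combined))
print(final_counts(grouped) == final_counts(combined))
```

Both styles end with the same counts, but the combined version shuffles one record per distinct word per partition instead of one per occurrence. On real data with many repeated words the gap is much larger than in this toy input.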
DEMO
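• Check in on Facebook with the Yahoo! recruiting message to get a Yahoo! bath duck
• Check in on Facebook saying "Yahoo! APP is awesome!!" and attach a screenshot of the Super Mall or News app; show the check-in record to get a duck wristband or a shopping bag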
Just memory?
• From Matei’s paper: http://0rz.tw/VVqgP	

• HBM: stores data in an in-memory HDFS instance. 	

• SP : Spark 	

• HBM’1, SP’1 : first run	

• Storage: HDFS with 256 MB blocks 	

• Node information 	

• m1.xlarge EC2 nodes 	

• 4 cores 	

• 15 GB of RAM
100 GB of data on a 100-node cluster
Running time (s)
• Logistic regression: HBM'1 139, HBM 62, SP'1 46, SP 3
• KMeans: HBM'1 182, HBM 87, SP'1 82, SP 33
There is more
• General DAG scheduler
• Control partition shuffle
• Fast driven RPC to launch task
• For more info, check http://0rz.tw/jwYwI
Osd ctw spark
Osd ctw spark

Weitere ähnliche Inhalte

Was ist angesagt?

Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015dhiguero
 
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayData Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayDatabricks
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsMiklos Christine
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLDatabricks
 
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarHomologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarDatabricks
 
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at ClouderaDataconomy Media
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Spark Summit
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Databricks
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaDatabricks
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying SparkDatabricks
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Chris Fregly
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 

Was ist angesagt? (20)

Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015
 
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayData Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
 
Keeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETLKeeping Spark on Track: Productionizing Spark for ETL
Keeping Spark on Track: Productionizing Spark for ETL
 
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex DadgarHomologous Apache Spark Clusters Using Nomad with Alex Dadgar
Homologous Apache Spark Clusters Using Nomad with Alex Dadgar
 
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Spark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the streamSpark Summit EU talk by Miklos Christine paddling up the stream
Spark Summit EU talk by Miklos Christine paddling up the stream
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
 
Operational Tips for Deploying Spark
Operational Tips for Deploying SparkOperational Tips for Deploying Spark
Operational Tips for Deploying Spark
 
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
Paris Spark Meetup Oct 26, 2015 - Spark After Dark v1.5 - Best of Advanced Ap...
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
Spark tutorial
Spark tutorialSpark tutorial
Spark tutorial
 

Ähnlich wie Osd ctw spark

OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"Giivee The
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEPaco Nathan
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsDataStax Academy
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkLi Ming Tsai
 
Sumedh Wale's presentation
Sumedh Wale's presentationSumedh Wale's presentation
Sumedh Wale's presentationpunesparkmeetup
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Data Con LA
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesCorley S.r.l.
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedwhoschek
 
10 Things About Spark
10 Things About Spark 10 Things About Spark
10 Things About Spark Roger Brinkley
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)Paul Chao
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...Holden Karau
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosEuangelos Linardos
 

Ähnlich wie Osd ctw spark (20)

OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Brief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICMEBrief Intro to Apache Spark @ Stanford ICME
Brief Intro to Apache Spark @ Stanford ICME
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Sumedh Wale's presentation
Sumedh Wale's presentationSumedh Wale's presentation
Sumedh Wale's presentation
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
10 Things About Spark
10 Things About Spark 10 Things About Spark
10 Things About Spark
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...
 
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos LinardosApache Spark Workshop, Apr. 2016, Euangelos Linardos
Apache Spark Workshop, Apr. 2016, Euangelos Linardos
 

Kürzlich hochgeladen

Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 

Kürzlich hochgeladen (20)

Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 

Osd ctw spark

  • 2. Agenda • What is Spark? • Next big thing • How to use Spark? • Demo • Q&A
  • 3. Who am I? • Wisely Chen (thegiive@gmail.com) • Sr. Engineer in the Yahoo! [Taiwan] data team • Loves to promote open source tech • Hadoop Summit 2013 San Jose • Jenkins Conf 2013 Palo Alto • COSCUP 2006, 2012, 2013, OSDC 2007, Webconf 2013, PHPConf 2012, RubyConf 2012
  • 9. Opinion from Cloudera • The leading candidate for “successor to MapReduce” today is Apache Spark • No vendor — no new project — is likely to catch up. Chasing Spark would be a waste of time, and would delay availability of real-time analytic and processing services for no good reason. • From http://0rz.tw/y3OfM
  • 10. What is Spark • From UC Berkeley AMP Lab • The most active big data open source project since Hadoop
  • 15. Spark vs Hadoop • Spark runs on YARN, Mesos, or in standalone mode • Spark’s main concept is based on MapReduce • Spark can read from: HDFS (with data locality), HBase, Cassandra
  • 16. More than MapReduce • Storage: HDFS • Resource management system: YARN, Mesos • Spark Core (counterpart: MapReduce) • Shark (counterpart: Hive) • GraphX (counterpart: Pregel) • MLlib (counterpart: Mahout) • Streaming (counterpart: Storm)
  • 19. 3X~25X faster than the MapReduce framework • From Matei’s paper: http://0rz.tw/VVqgP • Running time (s), MR vs Spark • Logistic regression: MR 76, Spark 3 • KMeans: MR 106, Spark 33 • PageRank: MR 171, Spark 23
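As a quick check of the "3X~25X" headline, the speedups can be recomputed from the running times on this slide. This is only a sketch in plain Python: the pairing of each number with MR or Spark is an assumption read off the chart (the slower time is taken to be MapReduce).

```python
# Running times in seconds as shown on the slide. Which value belongs to
# MR and which to Spark is an assumption based on the chart layout.
running_time_s = {
    "Logistic regression": {"MR": 76, "Spark": 3},
    "KMeans": {"MR": 106, "Spark": 33},
    "PageRank": {"MR": 171, "Spark": 23},
}


def speedup(times):
    """Return how many times faster the Spark run is than the MR run."""
    return times["MR"] / times["Spark"]


speedups = {name: speedup(t) for name, t in running_time_s.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
```

Under that assumption, the smallest ratio is a little over 3 and the largest a little over 25, which matches the range in the slide title.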
  • 20. What is Spark • Apache Spark™ is a very fast and general engine for large-scale data processing
  • 21. Why is Spark so fast?
  • 22. HDFS • 100X slower than memory • Stores data on network + disk • Network speed is 100X slower than memory • Implements fault tolerance
  • 23. MapReduce PageRank • …..readInputFromHDFS… • for (int runs = 0; runs < iter_runnumber; runs++) { • ………….. • isCompleted = runRankCalculation(inPath, lastResultPath); • ………… • } • …..writeOutputToHDFS….
  • 24. Workflow • MapReduce: Input (HDFS) → Iter 1 RunRank → Tmp (HDFS) → Iter 2 RunRank → Tmp (HDFS) → Iter N RunRank • Spark: Input (HDFS) → Iter 1 RunRank → Tmp (Mem) → Iter 2 RunRank → Tmp (Mem) → Iter N RunRank
  • 25. PageRank algorithm on 1 billion URL records • First iteration: takes 200 sec • 2nd iteration: takes 20 sec • 3rd iteration: takes 20 sec
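The difference between the two workflows can be sketched in plain Python (this is not Spark or Hadoop code). The update step `run_rank`, the JSON temp file, and the toy data are all hypothetical stand-ins: one loop round-trips every intermediate result through storage, as the MapReduce workflow does through HDFS, while the other keeps it in memory between iterations.

```python
import json
import os
import tempfile


def run_rank(ranks):
    """Hypothetical stand-in for one RunRank step: move each value halfway to the mean."""
    mean = sum(ranks.values()) / len(ranks)
    return {key: 0.5 * value + 0.5 * mean for key, value in ranks.items()}


def iterate_via_disk(ranks, n_iter, workdir):
    """MapReduce-style loop: write each intermediate result out, then read it back."""
    path = os.path.join(workdir, "ranks.json")
    for _ in range(n_iter):
        ranks = run_rank(ranks)
        with open(path, "w") as f:
            json.dump(ranks, f)
        with open(path) as f:
            ranks = json.load(f)
    return ranks


def iterate_in_memory(ranks, n_iter):
    """Spark-style loop: the intermediate result never leaves memory."""
    for _ in range(n_iter):
        ranks = run_rank(ranks)
    return ranks


initial = {"a": 1.0, "b": 2.0, "c": 3.0}
with tempfile.TemporaryDirectory() as workdir:
    disk_result = iterate_via_disk(initial, 5, workdir)
memory_result = iterate_in_memory(initial, 5)
```

Both loops produce the same values; the only difference is the extra write and read per iteration, which is the cost the slide attributes to the MapReduce workflow.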
  • 26. RDD • Resilient Distributed Dataset • Collections of objects spread across a cluster, stored in RAM or on Disk • Built through parallel transformations
  • 28. RDD • Transformation (RDD a → RDD b): val a = sc.textFile("hdfs://....") • val b = a.filter(line => line.contains("Spark")) • Action (RDD b → value c): val c = b.count()
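The split between transformations and actions on this slide can be illustrated with a toy class in plain Python. This is a sketch, not Spark's implementation: `LazyDataset`, `load_lines`, and the sample lines are hypothetical. A transformation only records what to do, and nothing is read until an action asks for a value.

```python
class LazyDataset:
    """Toy stand-in for an RDD: transformations are recorded, actions run them."""

    def __init__(self, source, predicates=()):
        self._source = source                # callable returning an iterable of lines
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Transformation: returns a new dataset and reads nothing yet.
        return LazyDataset(self._source, self._predicates + (predicate,))

    def count(self):
        # Action: only now is the source read and every filter applied.
        return sum(
            1
            for line in self._source()
            if all(p(line) for p in self._predicates)
        )


reads = []


def load_lines():
    reads.append("read")                     # record each real read of the source
    return ["Spark is fast", "Hadoop MapReduce", "Spark runs on YARN"]


a = LazyDataset(load_lines)
b = a.filter(lambda line: "Spark" in line)   # transformation, still lazy
reads_before_action = len(reads)
c = b.count()                                # action, triggers the read
```

Here `reads_before_action` stays at zero because building `a` and `b` does no work; the single read happens inside `count()`.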
  • 29. Log mining • val a = sc.textFile("hdfs://aaa.com/a.txt") • val err = a.filter(t => t.contains("ERROR")).filter(t => t.contains("2014")) • err.cache() • err.count() • val m = err.filter(t => t.contains("MYSQL")).count() • val a = err.filter(t => t.contains("APACHE")).count() • Diagram: the Driver sends a Task to each of three Workers
  • 30. Log mining (same code as slide 29) • Diagram: Driver and three Workers; each Worker holds one block (Block1, Block2, Block3) of RDD a
  • 31. Log mining (same code as slide 29) • Diagram: Driver and three Workers; each Worker holds part of RDD err, built from Block1, Block2, Block3
  • 32. Log mining (same code as slide 29) • Diagram: Driver and three Workers; each Worker holds part of RDD err, built from Block1, Block2, Block3
  • 33. Log mining (same code as slide 29) • Diagram: Driver and three Workers; each Worker holds part of RDD err in its cache (Cache1, Cache2, Cache3)
  • 34. Log mining (same code as slide 29) • Diagram: Driver and three Workers; each Worker computes RDD m from its cache (Cache1, Cache2, Cache3)
  • 35. Log mining (same code as slide 29) • Diagram: Driver and three Workers; each Worker computes RDD a from its cache (Cache1, Cache2, Cache3)
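The effect of `err.cache()` in the log mining walkthrough can be sketched with a toy class in plain Python. This is an illustration only, not Spark's caching mechanism: `CachedDataset`, `scan_errors`, and the sample log lines are hypothetical. The first action materializes the filtered data; the later counts reuse it instead of rescanning the input.

```python
class CachedDataset:
    """Toy stand-in for a cached RDD: the first action fills the cache, later ones reuse it."""

    def __init__(self, compute):
        self._compute = compute
        self._cached = None
        self.compute_calls = 0

    def _data(self):
        if self._cached is None:
            self.compute_calls += 1          # counts full scans of the input
            self._cached = list(self._compute())
        return self._cached

    def count(self, predicate=lambda line: True):
        return sum(1 for line in self._data() if predicate(line))


log_lines = [
    "2014 ERROR MYSQL connection lost",
    "2014 INFO APACHE started",
    "2014 ERROR APACHE timeout",
    "2013 ERROR MYSQL slow query",
]


def scan_errors():
    # Stands in for reading the file and applying both filters.
    return (line for line in log_lines if "ERROR" in line and "2014" in line)


err = CachedDataset(scan_errors)
total = err.count()                               # first action: scans the input
mysql = err.count(lambda line: "MYSQL" in line)   # served from the cache
apache = err.count(lambda line: "APACHE" in line) # served from the cache
```

After three counts, `err.compute_calls` is still 1, which mirrors the diagrams where the MYSQL and APACHE queries read from Cache1 to Cache3 rather than from the HDFS blocks.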
  • 36. RDD Cache • 1st iteration (no cache): takes the same time • With cache: takes 7 sec
  • 37. RDD Cache • Data locality • Cache • Self-join on 5 billion records: a big shuffle takes 20 min; after caching, it takes only 265 ms
  • 38. Easy to use • Interactive shell • Multi-language API • JVM: Scala, Java • PySpark: Python
  • 39. Scala Word Count • val file = spark.textFile("hdfs://...") • val counts = file.flatMap(line => line.split(" ")) • .map(word => (word, 1)) • .reduceByKey(_ + _) • counts.saveAsTextFile("hdfs://...")
  • 40. Step by Step • file.flatMap(line => line.split(" ")) => (aaa,bb,cc) • .map(word => (word, 1)) => ((aaa,1),(bb,1)..) • .reduceByKey(_ + _) => ((aaa,123),(bb,23)…)
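The three steps can be traced on a tiny input in plain Python. This is a sketch of what each step produces, not Spark code, and the sample lines are made up for illustration.

```python
from collections import defaultdict

lines = ["aaa bb cc", "aaa bb", "aaa"]

# flatMap: split every line and flatten the pieces into one list of words.
words = [word for line in lines for word in line.split(" ")]

# map: pair every word with a count of 1.
pairs = [(word, 1) for word in words]

# reduceByKey(_ + _): add up the counts that share the same key.
totals = defaultdict(int)
for word, n in pairs:
    totals[word] += n
counts = dict(totals)

print(words)
print(pairs)
print(counts)
```

Printing the three intermediate values shows the same progression as the slide: a flat list of words, then (word, 1) pairs, then one total per word.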
  • 41. Java Wordcount • JavaRDD<String> file = spark.textFile("hdfs://..."); • JavaRDD<String> words = file.flatMap(new FlatMapFunction<String, String>() { • public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); } • }); • JavaPairRDD<String, Integer> pairs = words.map(new PairFunction<String, String, Integer>() { • public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); } • }); • JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() { • public Integer call(Integer a, Integer b) { return a + b; } • }); • counts.saveAsTextFile("hdfs://...");
  • 42. Java vs Scala • Scala: file.flatMap(line => line.split(" ")) • Java version: • JavaRDD<String> words = file.flatMap(new FlatMapFunction<String, String>() { • public Iterable<String> call(String s) { • return Arrays.asList(s.split(" ")); } • });
  • 43. Python • file = spark.textFile("hdfs://...") • counts = file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b) • counts.saveAsTextFile("hdfs://...")
  • 44. Highly recommended • Scala: latest API features, stable • Python: a very familiar language; native libraries: NumPy, SciPy
  • 45. FYI • Combiner: reduceByKey(_ + _) • Typical WordCount: • groupByKey().mapValues { arr => • var r = 0; arr.foreach { i => r += i }; r • }
  • 46. WordCount • reduceByKey: reduces a lot on the map side • Hadoop-style shuffle: sends a lot of data over the network
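Why a map-side combiner sends less data can be shown with a small plain-Python simulation. It is a sketch only: the two partitions and the function names are hypothetical, and a real shuffle involves much more than counting records. One function pre-aggregates inside each partition before the "shuffle", the other sends every (word, 1) pair as-is.

```python
from collections import Counter

# Two hypothetical map-side partitions of (word, 1) pairs.
partitions = [
    [("aaa", 1), ("bb", 1), ("aaa", 1), ("aaa", 1)],
    [("aaa", 1), ("bb", 1), ("bb", 1)],
]


def shuffle_with_combiner(parts):
    """reduceByKey-style: pre-aggregate per partition, then send one record per key."""
    shuffled = []
    for part in parts:
        local = Counter()
        for word, n in part:
            local[word] += n
        shuffled.extend(local.items())
    return shuffled


def shuffle_without_combiner(parts):
    """groupByKey-style: send every record over the network unchanged."""
    return [pair for part in parts for pair in part]


def final_counts(shuffled):
    """Reduce side: sum whatever arrived for each key."""
    total = Counter()
    for word, n in shuffled:
        total[word] += n
    return dict(total)


combined = shuffle_with_combiner(partitions)
raw = shuffle_without_combiner(partitions)

print(len(combined), len(raw))
print(final_counts(combined) == final_counts(raw))
```

Both paths end with the same word counts, but the combiner path moves fewer records, and the gap grows with the number of repeated words per partition.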
  • 47. DEMO
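  • 48. • Check in on Facebook with the Yahoo! recruiting message to get a Yahoo! bath duck • Check in on Facebook saying "Yahoo! APP is awesome!!" and attach a screenshot of the Super Mall or News app; show your check-in record to get a duck wristband or a shopping bag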
  • 49. Just memory? • From Matei’s paper: http://0rz.tw/VVqgP • HBM: stores data in an in-memory HDFS instance. • SP : Spark • HBM’1, SP’1 : first run • Storage: HDFS with 256 MB blocks • Node information • m1.xlarge EC2 nodes • 4 cores • 15 GB of RAM
  • 50. 100 GB of data on a 100-node cluster • Running time (s) • Logistic regression: HBM'1 139, HBM 62, SP'1 46, SP 3 • KMeans: HBM'1 182, HBM 87, SP'1 82, SP 33
  • 51. There is more • General DAG scheduler • Control partition shuffle • Fast driven RPC to launch tasks • For more info, check http://0rz.tw/jwYwI