SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Downloaden Sie, um offline zu lesen
ggplot2.SparkR: Rebooting ggplot2
for Scalable Big Data Visualization
Jonghyun Bae†
, Sangoh Jeong*
, Wenjing Jin†
and
Jae W. Lee†
†
Sungkyunkwan University
*
SK Telecom
Speakers
2
• Sangoh Jeong (sangoh.jeong@sk.com)
– Senior Manager at in Korea
– Interested in
• Jonghyun Bae (jonghbae@skku.edu)
– Graduate student at
– Interested in R, JavaScript and Spark
Logstore
Datastore
Cloud
storage
Source: http://www.slideshare.net/BigDataCloud/big-data-analytics-with-google-cloud-platform
Store
Spark
Hadoop
Extract
& Transform
Query
Analyze
Log data
Structured
data
Unstructured
data
Visualization
Interactive
dashboard & app
Spreadsheets
This Work!
API
3
SQL
Big Data Analytics Pipeline
Why Big Data Visualization?
● Case of a business unit at SK Telecom
– Typical DB size: 70M records with 330 columns
– Analyzes the DB using R on a single-node scale-up server
– Has much bigger DBs that cannot be handled by this server
● The business unit’s visualization needs
– Use of R
– Easy-to-use APIs
– Scalable solution for the bigger DBs
4
R Has Great Visualization Packages
ggplot2
ggvis
base
graphics
igraph
But, these packages cannot
process Spark DataFrames.
5
ggplot2
• (Arguably) the most popular visualization
package for R
– Based on the “layered” grammar of graphics
– Making it easy to produce high-quality graphs
– Limited to single node processing
“Base graphics are good for drawing pictures;
ggplot2 graphics are good for understanding the data”
(Hadley Wickham, Creator of ggplot2, 2012) 6
Top 10 packages in R
7
Source: http://www.computerworld.com/article/2920117/business-intelligence/most-downloaded-r-packages-last-month.html
Top package
for visualization
May, 2015
ggplot2: Example Plots
Source: ggplot2 documentation, http://docs.ggplot2.org/current
8
• An R package extending ggplot2 to visualize big
data represented in Spark DataFrame
ggplot2.SparkR = SparkR+ggplot2!
9
ggplot2.SparkR
ggplot2
R data.frame
Variety of graphs
Simple and easy
API
Spark DataFrame
Scalability
Distributed
processing
ggplot2.SparkR Simplifies Plotting (1)
Example: Draw a histogram using DataFrame
10
ggplot2.SparkR Simplifies Plotting (2)
AFTER
# It just takes one line!
ggplot(df, aes(x = Price)) + geom_histogram()
BEFORE
# Pre-processing Spark DataFrame using SparkR API
range <- select(df, min(df$Price), max(df$Price))
breaks <- fullseq(range, diff(range / binwidth))
left <- breaks[-length(breaks)]; right <- breaks[-1]
breaks_df <- createDataFrame(sqlContext, data.frame(left = left,
right = right))
histogram_df <- join(df, breaks_df, df$Price >= breaks_df$left &
df$Price < breaks_df$right, “inner”)
histogram_df <- count(groupBy(histogram_df, “left”, “right”))
# Draw histogram chart using ggplot2 API
ggplot(collect(histogram_df), aes(xmin = left, xmax = right, ymin
= 0, ymax = count)) + geom_rect()
11
ggplot2.SparkR: Features
12
• Scalable
– Beyond the capacity of single node (cf. ggplot2)
– Performance scales to the number of nodes
• Easy to use
– No changes to ggplot2 API
– No training required for existing ggplot2 users
• Readily deployable
– No modifications required for Spark
– Using SparkR API only
The Rest of This Talk
Overview
How to Use It?
Architecture
Performance
Status & Plan
Summary 13
How to Use It?
14
Using ggplot2.SparkR is
as easy as 1-2-3!
15
Create
2
Draw
3
Install
1
R
1. Install from Github
16
devtools::install_github
(“SKKU-SKT/ggplot2.SparkR”)
17
df <- read.json(sqlContext,
“hdfs://localhost:9000/dataset”)
2. Create DataFrame
ggplot(df, aes(x = Item, fill =
Payment)) + geom_bar() + coord_flip()
18
3. Draw it (using ggplot2 API)!
Note that df is a Spark DataFrame object (not R data.frame).
Create
2
Draw
3
Install
1
R
Demo
19
Demo: Data Set
● Schema: Sales record from a department store chain
- Source: http://content.udacity-data.com/course/hadoop/forum_data.tar.gz
Date Item Payment Place Price Time
2012-01-11 Baby Discover Houston 426.32 09:05
2012-07-23 Books MasterCard Lexington 47.20 17:30
2012-04-21 Consumer Electronics MasterCard St. Louis 234.95 15:01
2012-10-26 Garden Cash Spokane 469.47 12:46
20
Supported Graph Types & Options
21
Name Descriptions
Graph
types
geom_bar
Bars, rectangles with bases
on x-axis.
geom_
histogram
Histogram.
stat_sum Sum unique values.
geom_
boxplot
Box and whiskers plot.
geom_
bin2d
Heatmap of 2d bin counts.
geom_
freqpoly
Frequency polygon.
Name Descriptions
Positions
position
_stack
Stack overlapping objects
on top of one another
position
_fill
Same as above, but the
range is standardized.
position
_dodge
Adjust position by dodging
overlaps to the side
Facets
facet_
null
Facet specification: a
single panel
facet_
grid Lay out panels in a grid
facet_
wrap
Wrap a 1d ribbon of
panels into 2d
Scales
scale_x_
log10 Put x-axis on a log scale
scale_y_
log10 Put y-axis on a log scale
Name Description
Coords
coord_
cartesian
Cartesian
coordinates
coord_flip Flip cartesian
coordinates
Ranges
xlim Set the ranges of
the x axis
ylim Set the ranges of
the y axis
Texts
xlab Change the label
of x-axis
ylab Change the label
of y-axis
ggtitle Change the graph
title
stat_sum geom_boxplot
geom_bin2d geom_freqpoly
22
Supported Graph Types
position_stack position_fill
position_dodge facet_grid facet_wrap
1 2 3
4 5 6
7 8 9
a b c
1
2
3
a1 b1 c1
a2
a3
b2
b3
c2
c3
23
Supported Graph Options
Architecture
24
Three-stage pipeline:
• Parameter extraction
– x, y, colour, facet, scale, geom, etc.
• Data processing
– Get data from the original source
– Process data using graph parameters
• Plotting
– Draw graphs on display windows
R data.frame or
Spark DataFrame
25
Parameter
extraction
Data
processing
(R data.frame)
Plotting
Parameter
extraction
g
Data
processing
(Spark DataFrame)
ggplot2.SparkRggplot2
Input Checking
ggplot2.SparkR: Architecture (1)
ggplot2.SparkR: Architecture (2)
R
data.frame
26
Parameter
extraction
Data
processing
(R data.frame)
Plotting
Parameter
extraction
g
Data
processing
(Spark DataFrame)
ggplot2.SparkRggplot2
Input Checking
ggplot2.SparkR: Architecture (3)
Spark
DataFrame
27
Parameter
extraction
Data
processing
(R data.frame)
Plotting
Parameter
extraction
g
Data
processing
(Spark DataFrame)
ggplot2.SparkRggplot2
Input Checking
Extract x, y, colour,
facet, scale, geom,
etc
Parameters
Layer option
Graph type
ggplot2.SparkR: Architecture (4)
Use 12 internal
stages to process
Spark DataFrame
and finally convert it
to R data.frame
Calculated R
data.frame
28
Parameter
extraction
Data
processing
(R data.frame)
Plotting
Parameter
extraction
g
Data
processing
(Spark DataFrame)
ggplot2.SparkRggplot2
Input Checking
Parameters
Layer option
Graph type
Spark
DataFrame
ggplot2.SparkR: Data Flow (1)
R Spark
Context
Java
Spark
Context
Spark
Executor
Worker
Spark
Executor
Master
Worker
Partitioned
DataFrame
New
DataFrame
Partitioned
DataFrame
New
DataFrame
Functions
29
Base
object
JNI
Source: https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf
Graph type
Parameters
Layer option
ggplot2.
SparkR
Graph
function
Spark
Executor
Spark
Executor
ggplot2.SparkR: Data Flow (2)
R Spark
Context
Java
Spark
Context
Worker
Master
Worker
Combined
DataFrame
ggplot2.
SparkR
30
Partitioned
DataFrame
Partitioned
DataFrame
JNI
R
data.frame
Source: https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf
Plotting
functions
Performance
31
Experimental Setup (1)
● Cluster setup:
Master+
Worker
Worker
8-node Spark Cluster
Node Parameters
CPU Intel® i7-4790 (Haswell) 4GHz 8 cores
Memory 32GB DDR3 1600MHz
OS Ubuntu 14.04 LTS
Hadoop Ver. 1.2.1 (stable)
Spark Ver. 1.5.0
R Ver. 3.2.2
JDK Ver. 1.8.0_60
Spark Worker 8 cores + 30GB / Worker
Network Gigabit Ethernet
32
Worker Worker
WorkerWorker
WorkerWorker
Experimental Setup (2)
33
• Workload: Bar graph (ggplot(df, aes(x=Item))+geom_bar())
Performance: Scalability
34
Performance scales to the number of cluster nodes.
Based on
12 runs
Maximum
Average
Minimum
Throughput (inverse of slope) remains relatively stable.
Performance: Varying Data Size
35
Largest data
for R single node
Based on 12 runs
Status & Plan
36
ggplot2.SparkR Project Page
Project page: http://skku-skt.github.io/ggplot2.SparkR
37
To Report Bugs or Request Features
● Report using our github issue page
https://github.com/SKKU-SKT/ggplot2.SparkR/issues
Or
● Email to the ggplot2.SparkR mailing list
ggplot2-sparkr@googlegroups.com
38
• ggplot2 API Coverage
– Total: 135
– Primary target: 45*
(100%)
– Implemented: 21 (47%)
• Future Plan
– Register the project to spark-packages.org (and CRAN)
– Improve API coverage
– Optimize performance
API Coverage & Future Plan
Total
Primary target (100%)
Implemented (47%)
39*
Suitable for big data visualization
Summary: ggplot2.SparkR
40
• R package extending ggplot2 to take Spark
DataFrame (as well as R data.frame) as input
• Scalable, easy to use, and readily deployable
• Feedback and contributions from Spark
Community will be greatly appreciated.
THANK YOU.
https://github.com/SKKU-SKT/ggplot2.SparkR

Weitere ähnliche Inhalte

Was ist angesagt?

Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Databricks
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Databricks
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
Databricks
 

Was ist angesagt? (20)

From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
 
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
Deep Learning with DL4J on Apache Spark: Yeah it’s Cool, but are You Doing it...
 
Spark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer Agarwal
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS...
 
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo InterlandiLazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
Lazy Join Optimizations Without Upfront Statistics with Matteo Interlandi
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick WendellApache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
 
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
 
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and Kafka
 
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
Temporal Operators For Spark Streaming And Its Application For Office365 Serv...
 
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
 
Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks Jump Start on Apache® Spark™ 2.x with Databricks
Jump Start on Apache® Spark™ 2.x with Databricks
 
Spark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan PuSpark Summit EU talk by Qifan Pu
Spark Summit EU talk by Qifan Pu
 

Andere mochten auch

Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Spark Summit
 
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
Spark Summit
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Spark Summit
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Spark Summit
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Spark Summit
 

Andere mochten auch (20)

Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
 
Devtools cheatsheet
Devtools cheatsheetDevtools cheatsheet
Devtools cheatsheet
 
Ingestão de Dados
Ingestão de DadosIngestão de Dados
Ingestão de Dados
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock DataEngConf: Apache Spark in Financial Modeling at BlackRock
DataEngConf: Apache Spark in Financial Modeling at BlackRock
 
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
SportsDataViz using Plotly, Shiny and Flexdashboard - PlotCon 2016
 
Learn to use dplyr (Feb 2015 Philly R User Meetup)
Learn to use dplyr (Feb 2015 Philly R User Meetup)Learn to use dplyr (Feb 2015 Philly R User Meetup)
Learn to use dplyr (Feb 2015 Philly R User Meetup)
 
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop Sample
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
 
Le SoTL comme voie de développement professionnel
Le SoTL comme voie de développement professionnelLe SoTL comme voie de développement professionnel
Le SoTL comme voie de développement professionnel
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkR
 
Spark Summit EU talk by Sudeep Das and Aish Faenton
Spark Summit EU talk by Sudeep Das and Aish FaentonSpark Summit EU talk by Sudeep Das and Aish Faenton
Spark Summit EU talk by Sudeep Das and Aish Faenton
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
 
Convergence of ABC methods
Convergence of ABC methodsConvergence of ABC methods
Convergence of ABC methods
 
GraphX and Pregel - Apache Spark
GraphX and Pregel - Apache SparkGraphX and Pregel - Apache Spark
GraphX and Pregel - Apache Spark
 
Lambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie StricklandLambda at Weather Scale by Robbie Strickland
Lambda at Weather Scale by Robbie Strickland
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
 

Ähnlich wie ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jonghyun Bae, Sangoh Jeong, Wenjing Jin and Jae Lee

SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
Chester Chen
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
Connected Data World
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic Optimizing
Databricks
 
GCD-FPGA-Based-DesignE
GCD-FPGA-Based-DesignEGCD-FPGA-Based-DesignE
GCD-FPGA-Based-DesignE
Ibrahim Hejab
 

Ähnlich wie ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jonghyun Bae, Sangoh Jeong, Wenjing Jin and Jae Lee (20)

Extended Property Graphs and Cypher on Gradoop
Extended Property Graphs and Cypher on GradoopExtended Property Graphs and Cypher on Gradoop
Extended Property Graphs and Cypher on Gradoop
 
Leveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
Leveraging Apache Spark to Develop AI-Enabled Products and Services at BoschLeveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
Leveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Final_show
Final_showFinal_show
Final_show
 
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalRMADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR
 
R programming for data science
R programming for data scienceR programming for data science
R programming for data science
 
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftSF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at Lyft
 
RAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needsRAPIDS cuGraph – Accelerating all your Graph needs
RAPIDS cuGraph – Accelerating all your Graph needs
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic Optimizing
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021
 
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Spark: The State of the Art Engine for Big Data Processing
Spark: The State of the Art Engine for Big Data ProcessingSpark: The State of the Art Engine for Big Data Processing
Spark: The State of the Art Engine for Big Data Processing
 
GCD-FPGA-Based-DesignE
GCD-FPGA-Based-DesignEGCD-FPGA-Based-DesignE
GCD-FPGA-Based-DesignE
 
Harnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data PayloadsHarnessing Spark Catalyst for Custom Data Payloads
Harnessing Spark Catalyst for Custom Data Payloads
 

Mehr von Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

Mehr von Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Kürzlich hochgeladen

➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 

Kürzlich hochgeladen (20)

hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization by Jonghyun Bae, Sangoh Jeong, Wenjing Jin and Jae Lee

  • 1. ggplot2.SparkR: Rebooting ggplot2 for Scalable Big Data Visualization Jonghyun Bae† , Sangoh Jeong* , Wenjing Jin† and Jae W. Lee† † Sungkyunkwan University * SK Telecom
  • 2. Speakers 2 • Sangoh Jeong (sangoh.jeong@sk.com) – Senior Manager at in Korea – Interested in • Jonghyun Bae (jonghbae@skku.edu) – Graduate student at – Interested in R, JavaScript and Spark
  • 3. Logstore Datastore Cloud storage Source: http://www.slideshare.net/BigDataCloud/big-data-analytics-with-google-cloud-platform Store Spark Hadoop Extract & Transform Query Analyze Log data Structured data Unstructured data Visualization Interactive dashboard & app Spreadsheets This Work! API 3 SQL Big Data Analytics Pipeline
  • 4. Why Big Data Visualization? ● Case of a business unit at SK Telecom – Typical DB size: 70M records with 330 columns – Analyzes the DB using R on a single-node scale-up server – Has much bigger DBs that cannot be handled by this server ● The business unit’s visualization needs – Use of R – Easy-to-use APIs – Scalable solution for the bigger DBs 4
  • 5. R Has Great Visualization Packages ggplot2 ggvis base graphics igraph But, these packages cannot process Spark DataFrames. 5
  • 6. ggplot2 • (Arguably) the most popular visualization package for R – Based on the “layered” grammar of graphics – Making it easy to produce high-quality graphs – Limited to single node processing “Base graphics are good for drawing pictures; ggplot2 graphics are good for understanding the data” (Hadley Wickham, Creator of ggplot2, 2012) 6
  • 7. Top 10 packages in R 7 Source: http://www.computerworld.com/article/2920117/business-intelligence/most-downloaded-r-packages-last-month.html Top package for visualization May, 2015
  • 8. ggplot2: Example Plots Source: ggplot2 documentation, http://docs.ggplot2.org/current 8
  • 9. • An R package extending ggplot2 to visualize big data represented in Spark DataFrame ggplot2.SparkR = SparkR+ggplot2! 9 ggplot2.SparkR ggplot2 R data.frame Variety of graphs Simple and easy API Spark DataFrame Scalability Distributed processing
  • 10. ggplot2.SparkR Simplifies Plotting (1) Example: Draw a histogram using DataFrame 10
  • 11. ggplot2.SparkR Simplifies Plotting (2) AFTER # It just takes one line! ggplot(df, aes(x = Price)) + geom_histogram() BEFORE # Pre-processing Spark DataFrame using SparkR API range <- select(df, min(df$Price), max(df$Price)) breaks <- fullseq(range, diff(range / binwidth)) left <- breaks[-length(breaks)]; right <- breaks[-1] breaks_df <- createDataFrame(sqlContext, data.frame(left = left, right = right)) histogram_df <- join(df, breaks_df, df$Price >= breaks_df$left & df$Price < breaks_df$right, “inner”) histogram_df <- count(groupBy(histogram_df, “left”, “right”)) # Draw histogram chart using ggplot2 API ggplot(collect(histogram_df), aes(xmin = left, xmax = right, ymin = 0, ymax = count)) + geom_rect() 11
  • 12. ggplot2.SparkR: Features 12 • Scalable – Beyond the capacity of single node (cf. ggplot2) – Performance scales to the number of nodes • Easy to use – No changes to ggplot2 API – No training required for existing ggplot2 users • Readily deployable – No modifications required for Spark – Using SparkR API only
  • 13. The Rest of This Talk Overview How to Use It? Architecture Performance Status & Plan Summary 13
  • 14. How to Use It? 14
  • 15. Using ggplot2.SparkR is as easy as 1-2-3! 15 Create 2 Draw 3 Install 1 R
  • 16. 1. Install from Github 16 devtools::install_github (“SKKU-SKT/ggplot2.SparkR”)
  • 18. ggplot(df, aes(x = Item, fill = Payment)) + geom_bar() + coord_flip() 18 3. Draw it (using ggplot2 API)! Note that df is a Spark DataFrame object (not R data.frame).
  • 20. Demo: Data Set ● Schema: Sales record from a department store chain - Source: http://content.udacity-data.com/course/hadoop/forum_data.tar.gz Date Item Payment Place Price Time 2012-01-11 Baby Discover Houston 426.32 09:05 2012-07-23 Books MasterCard Lexington 47.20 17:30 2012-04-21 Consumer Electronics MasterCard St. Louis 234.95 15:01 2012-10-26 Garden Cash Spokane 469.47 12:46 20
  • 21. Supported Graph Types & Options 21 Name Descriptions Graph types geom_bar Bars, rectangles with bases on x-axis. geom_ histogram Histogram. stat_sum Sum unique values. geom_ boxplot Box and whiskers plot. geom_ bin2d Heatmap of 2d bin counts. geom_ freqpoly Frequency polygon. Name Descriptions Positions position _stack Stack overlapping objects on top of one another position _fill Same as above, but the range is standardized. position _dodge Adjust position by dodging overlaps to the side Facets facet_ null Facet specification: a single panel facet_ grid Lay out panels in a grid facet_ wrap Wrap a 1d ribbon of panels into 2d Scales scale_x_ log10 Put x-axis on a log scale scale_y_ log10 Put y-axis on a log scale Name Description Coords coord_ cartesian Cartesian coordinates coord_flip Flip cartesian coordinates Ranges xlim Set the ranges of the x axis ylim Set the ranges of the y axis Texts xlab Change the label of x-axis ylab Change the label of y-axis ggtitle Change the graph title
  • 23. position_stack position_fill position_dodge facet_grid facet_wrap 1 2 3 4 5 6 7 8 9 a b c 1 2 3 a1 b1 c1 a2 a3 b2 b3 c2 c3 23 Supported Graph Options
  • 25. Three-stage pipeline: • Parameter extraction – x, y, colour, facet, scale, geom, etc. • Data processing – Get data from the original source – Process data using graph parameters • Plotting – Draw graphs on display windows R data.frame or Spark DataFrame 25 Parameter extraction Data processing (R data.frame) Plotting Parameter extraction g Data processing (Spark DataFrame) ggplot2.SparkRggplot2 Input Checking ggplot2.SparkR: Architecture (1)
  • 26. ggplot2.SparkR: Architecture (2) R data.frame 26 Parameter extraction Data processing (R data.frame) Plotting Parameter extraction g Data processing (Spark DataFrame) ggplot2.SparkRggplot2 Input Checking
  • 27. ggplot2.SparkR: Architecture (3) Spark DataFrame 27 Parameter extraction Data processing (R data.frame) Plotting Parameter extraction g Data processing (Spark DataFrame) ggplot2.SparkRggplot2 Input Checking Extract x, y, colour, facet, scale, geom, etc Parameters Layer option Graph type
  • 28. ggplot2.SparkR: Architecture (4) Use 12 internal stages to process Spark DataFrame and finally convert it to R data.frame Calculated R data.frame 28 Parameter extraction Data processing (R data.frame) Plotting Parameter extraction g Data processing (Spark DataFrame) ggplot2.SparkRggplot2 Input Checking Parameters Layer option Graph type Spark DataFrame
  • 29. ggplot2.SparkR: Data Flow (1) R Spark Context Java Spark Context Spark Executor Worker Spark Executor Master Worker Partitioned DataFrame New DataFrame Partitioned DataFrame New DataFrame Functions 29 Base object JNI Source: https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf Graph type Parameters Layer option ggplot2. SparkR Graph function
  • 30. Spark Executor Spark Executor ggplot2.SparkR: Data Flow (2) R Spark Context Java Spark Context Worker Master Worker Combined DataFrame ggplot2. SparkR 30 Partitioned DataFrame Partitioned DataFrame JNI R data.frame Source: https://spark-summit.org/2014/wp-content/uploads/2014/07/SparkR-SparkSummit.pdf Plotting functions
  • 32. Experimental Setup (1) ● Cluster setup: Master+ Worker Worker 8-node Spark Cluster Node Parameters CPU Intel® i7-4790 (Haswell) 4GHz 8 cores Memory 32GB DDR3 1600MHz OS Ubuntu 14.04 LTS Hadoop Ver. 1.2.1 (stable) Spark Ver. 1.5.0 R Ver. 3.2.2 JDK Ver. 1.8.0_60 Spark Worker 8 cores + 30GB / Worker Network Gigabit Ethernet 32 Worker Worker WorkerWorker WorkerWorker
  • 33. Experimental Setup (2) 33 • Workload: Bar graph (ggplot(df, aes(x=Item))+geom_bar())
  • 34. Performance: Scalability 34 Performance scales to the number of cluster nodes. Based on 12 runs Maximum Average Minimum
  • 35. Throughput (inverse of slope) remains relatively stable. Performance: Varying Data Size 35 Largest data for R single node Based on 12 runs
  • 37. ggplot2.SparkR Project Page Project page: http://skku-skt.github.io/ggplot2.SparkR 37
  • 38. To Report Bugs or Request Features ● Report using our github issue page https://github.com/SKKU-SKT/ggplot2.SparkR/issues Or ● Email to the ggplot2.SparkR mailing list ggplot2-sparkr@googlegroups.com 38
  • 39. • ggplot2 API Coverage – Total: 135 – Primary target: 45* (100%) – Implemented: 21 (47%) • Future Plan – Register the project to spark-packages.org (and CRAN) – Improve API coverage – Optimize performance API Coverage & Future Plan Total Primary target (100%) Implemented (47%) 39* Suitable for big data visualization
  • 40. Summary: ggplot2.SparkR 40 • R package extending ggplot2 to take Spark DataFrame (as well as R data.frame) as input • Scalable, easy to use, and readily deployable • Feedback and contributions from Spark Community will be greatly appreciated.