SCALABLE MONITORING USING
PROMETHEUS WITH APACHE SPARK
DIANE FEDDEMA
PRINCIPAL SOFTWARE ENGINEER
ZAK HASSAN
SOFTWARE ENGINEER
YOUR SPEAKERS
DIANE FEDDEMA
PRINCIPAL SOFTWARE ENGINEER - EMERGING TECHNOLOGY, DATA ANALYTICS
● Currently focused on developing and applying data science and machine learning techniques for performance
analysis, automating these analyses, and displaying data in novel ways.
● Previously worked as a performance engineer at the National Center for Atmospheric Research (NCAR),
working on optimization and tuning of parallel global climate models.
ZAK HASSAN
SOFTWARE ENGINEER - EMERGING TECHNOLOGY, DATA ANALYTICS
● Currently focused on developing an analytics platform on OpenShift, leveraging Apache Spark as the analytics
engine. Also developing data science apps and making metrics observable through cloud-native
technology.
● Previously worked as a software consultant in the financial services and insurance industry, building end-to-end
software solutions for clients.
OVERVIEW
OBSERVABILITY
● Motivation
● What Is Spark?
● What Is Prometheus?
● Our Story
● Spark Cluster JVM Instrumentation
PERFORMANCE TUNING
● Tuning Spark jobs
● Spark Memory Model
● Prometheus as a performance tool
● Comparing cached vs. non-cached dataframes
● Demo
MOTIVATION
● Rapid experimentation with data science apps
● Identify bottlenecks
● Improve performance
● Resolve incidents more quickly
● Improve memory usage to tune Spark jobs
OUR STORY
● Instrumented the Spark JVM to expose metrics in a Kubernetes pod
● Added the ability to monitor Spark with Prometheus
● Experimented with Grafana on top of Prometheus to provide more insight
● Sharing our experiments and experience using this setup to do performance
analysis of Spark jobs
● Demo at the very end
June 1, 2017 - https://github.com/radanalyticsio/openshift-spark/pull/28
- Added agent to report jolokia metrics endpoint in kubernetes pod
Nov 7, 2017 - https://github.com/radanalyticsio/openshift-spark/pull/35
- Added agent to report prometheus metrics endpoint in kubernetes pod
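The PRs above bake a metrics agent into the Spark container images. Outside those images, the same idea can be sketched with the Prometheus JMX exporter Java agent; the jar path, config path, and port below are illustrative, not the values used in the PRs:

```shell
# Attach the Prometheus JMX exporter as a Java agent to the Spark master JVM.
# /opt paths and port 9404 are example values.
export SPARK_DAEMON_JAVA_OPTS="-javaagent:/opt/jmx_prometheus_javaagent.jar=9404:/opt/jmx-exporter-config.yaml"
./sbin/start-master.sh
# Metrics are then served at http://<master-host>:9404/metrics for Prometheus to scrape.
```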
WHAT IS PROMETHEUS
● Open source monitoring system
● In 2016, Prometheus became the second member project of the CNCF
● Scrapes metrics from an endpoint
● Client libraries in Go, Java, Python, etc.
● Kubernetes comes instrumented out of the box with Prometheus endpoints
● If something lacks native Prometheus integration, there are many community
exporters that expose metrics from your infrastructure so it can be monitored
WHAT IS APACHE SPARK
Apache Spark is an in-demand data processing engine with a thriving community and a
steadily growing install base
● Supports interactive data exploration in addition to apps
● Batch and stream processing
● Machine learning libraries
● Distributed
● Separates storage and compute (in-memory processing)
● New external scheduler: Kubernetes
SPARK FEATURES
• Can run standalone, or with YARN, Mesos, or Kubernetes as the cluster manager
• Has language bindings for Java, Scala, Python, and R
• Accesses data from JDBC, HDFS, S3, or a regular filesystem
• Can persist data in different formats: Parquet, Avro, JSON, CSV, etc.
[Diagram: SQL, MLlib, Graph, and Streaming libraries sit on top of SPARK CORE]
SPARK APPLICATION
SPARK IN CONTAINERS
SPARK CLUSTER INSTRUMENT
[Diagram: the Spark master and each Spark worker run a Java agent; Prometheus scrapes metrics from these instrumented Java agent targets and notifies Alertmanager.]
PULL METRICS
● Prometheus lets you configure how often to scrape and which endpoints to
scrape. The Prometheus server pulls in the metrics that are configured.
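A minimal scrape configuration for a setup like this might look as follows; the job name, target hostnames, and port are hypothetical:

```yaml
scrape_configs:
  - job_name: 'spark'
    scrape_interval: 15s        # how often to scrape
    static_configs:
      - targets:                # which endpoints to scrape
          - 'spark-master:9404'
          - 'spark-worker-1:9404'
          - 'spark-worker-2:9404'
```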
ALERTMANAGER
• PromQL queries are used to create rules; you are notified when a rule is triggered.
• Alertmanager receives the notification and can notify you via
email, Slack, or other channels (see the docs for details).
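As a sketch, an alerting rule on the JVM heap metrics exposed by the Java agent could look like this; the rule name, threshold, and labels are illustrative:

```yaml
groups:
  - name: spark-alerts
    rules:
      - alert: SparkWorkerHighHeap
        # Fire when heap usage stays above 90% of max for 5 minutes
        expr: jvm_memory_bytes_used{area="heap"} > 0.9 * jvm_memory_bytes_max{area="heap"}
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Spark JVM heap usage above 90%"
```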
PROMQL
● Powerful query language for metrics on a Kubernetes cluster as well as Spark
clusters.
● What are gauges and counters?
Gauges: the latest value of a metric.
Counters: the total number of event occurrences, usually suffixed with “_total”.
You can use the range selector format prom_metric_total[1m] to get the last minute of samples.
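For example, assuming the standard JVM metrics exported by the Prometheus Java client, a gauge is read directly while a counter is usually wrapped in rate():

```promql
# Gauge: current heap usage of each instrumented JVM
jvm_memory_bytes_used{area="heap"}

# Counter: turn total GC collections into a per-second rate over the last minute
rate(jvm_gc_collection_seconds_count[1m])
```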
PART 2: Tuning Spark jobs
with Prometheus
Things we would like to know when tuning Spark programs:
● How much memory is the driver using?
● How much memory are the workers using?
● How is the JVM being utilized by Spark?
● Is my Spark job saturating the network?
● What is the cluster-wide view of network, CPU, and memory utilization?
We will demonstrate how Prometheus coupled with Grafana on Kubernetes
can help answer these types of questions.
Our Example Application
Focus on Memory:
Efficient Memory use is Key to good performance in Spark jobs.
How:
We will create Prometheus + Grafana dashboards to evaluate memory
usage under different conditions.
Example:
Our Spark Python example compares memory usage with and without
caching to illustrate how memory usage and timing change for a
PySpark program performing a cartesian product followed by a groupBy
operation.
A little Background
Memory allocation in Spark
● Spark is an "in-memory" computing framework
● Memory is a limited resource!
● There is competition for memory
● Caching reusable results can save overall memory usage under
certain conditions
● Memory runs out in many large jobs, forcing spills to disk
Spark Unified Memory Model
LRU eviction and user defined memory configuration options
[Diagram: the total JVM heap allocated to a Spark job is divided between memory allocated to EXECUTION and memory allocated to STORAGE; each region holds blocks, and blocks spill to disk when memory runs out.]
spark.memory.storageFraction: a user-specified unevictable amount reserved for storage.
EXECUTION takes precedence over STORAGE up to the user-defined unevictable amount.
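The unified-memory boundaries above are tunable at submit time. As a sketch, the two relevant Spark properties can be passed via --conf; the values and script name below are illustrative, not recommendations:

```shell
# Shrink the unevictable storage region so execution gets more room.
# spark.memory.fraction: share of heap for execution + storage combined.
# spark.memory.storageFraction: share of that region protected from eviction.
spark-submit \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.3 \
  my_job.py
```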
Using Spark SQL and Spark RDD
API together in a tuning exercise
We want to use Spark SQL to manipulate dataframes.
Spark SQL is a component of Spark:
● it provides structured data processing
● it is implemented as a library on top of Spark
Three main APIs:
● SQL syntax
● Dataframes
● Datasets
Two backend components:
● Catalyst - query optimizer
● Tungsten - off-heap memory management; eliminates the overhead of Java objects
Performance Optimizations with
Spark SQL
[Diagram: a JDBC console and user programs (Java, Python, Scala, etc.) feed SPARK SQL, whose Catalyst optimizer and Dataframe API sit on top of Spark Core and RDDs.]
Spark SQL performance benefits:
● Catalyst compiles Spark SQL programs down to an RDD
● Tungsten provides more efficient data storage compared
to Java objects on the heap
● The Dataframe API and RDD API can be intermixed
Using Prometheus + Grafana for
performance optimization
Specific code example:
Compare non-cached and cached dataframes that are reused in a
groupBy transformation
When is it a good idea to cache a dataframe?
● when the result of a computation is going to be reused later
● when it is costly to recompute that result
● when algorithms make several passes over the data
Determining memory consumption
for dataframes you want to cache
Example: Code for non-cached run
seed = 3
rdd1 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
rdd2 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
sc = spark.sparkContext
# convert each tuple in the rdd to a row
randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))
# create dataframes from the rdds
schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)
cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)
# aggregate
results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
results.show(n=100)
print("----------Count in cross-join--------------- {0}".format(cross_df.count()))
Example: Code for cached run
seed = 3
rdd1 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
rdd2 = RandomRDDs.normalVectorRDD(spark, nRow, nCol, numPartitions, seed)
sc = spark.sparkContext
# convert each tuple in the rdd to a row
randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))
# create dataframes from the rdds
schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)
# cache the dataframes
schemaRandomNumberDF1.cache()
schemaRandomNumberDF2.cache()
cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)
# aggregate
results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
results.show(n=100)
print("----------Count in cross-join--------------- {0}".format(cross_df.count()))
Query plan comparison
[Query plans shown side by side: non-cached vs. cached]
Example: Comparing cached vs non-cached runs
[Prometheus dashboards shown side by side: non-cached vs. cached]
Comparing non-cached vs cached runs
RIP (Relative Index of Performance):
RIP: 0 to 1 = improvement; 0 to -1 = degradation
% Change: negative values = improvement
Observed panel values (non-cached vs. cached):
RIP = 0.76 (% change = 76)
RIP = 0.63 (% change = 63)
RIP = 0.62 (% change = 62)
RIP = 0.63 (% change = 63)
RIP = 0.10 (% change = 10)
RIP = 0.00 (% change = 0)
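The RIP values above can be reproduced with a small helper; this function and its formula are inferred from the definition on this slide (the sign conventions for % change vary on the slide, so here positive means improvement for both measures):

```python
def rip(non_cached, cached):
    """Relative Index of Performance: positive = improvement, negative = degradation."""
    return (non_cached - cached) / non_cached

def pct_change(non_cached, cached):
    """Percent change; positive mirrors a positive RIP."""
    return 100.0 * rip(non_cached, cached)

# e.g. a job step taking 100s non-cached and 24s cached:
print(round(rip(100.0, 24.0), 2))   # 0.76
print(round(pct_change(100.0, 24.0)))  # 76
```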
SPARK JOB + PROMETHEUS + GRAFANA
DEMO
Demo Time!
Recap
You learned:
■ About our story on Spark cluster metrics monitoring with Prometheus
■ Spark features
■ How Prometheus can be integrated with Apache Spark
■ Spark applications and how memory works
■ Spark cluster JVM instrumentation
■ How to deploy a Spark job and monitor it via a Grafana dashboard
■ The performance difference between cached and non-cached dataframes
■ Monitoring tips and tricks
Thank You!
Questions?

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
WSO2Con2024 - From Blueprint to Brilliance: WSO2's Guide to API-First Enginee...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 

SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK

  • 1. SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
    DIANE FEDDEMA, PRINCIPAL SOFTWARE ENGINEER
    ZAK HASSAN, SOFTWARE ENGINEER
  • 2. YOUR SPEAKERS
    DIANE FEDDEMA, PRINCIPAL SOFTWARE ENGINEER - EMERGING TECHNOLOGY, DATA ANALYTICS
    ● Currently focused on developing and applying Data Science and Machine Learning techniques for performance analysis, automating these analyses, and displaying data in novel ways.
    ● Previously worked as a performance engineer at the National Center for Atmospheric Research (NCAR) on optimizations and tuning of parallel global climate models.
    ZAK HASSAN, SOFTWARE ENGINEER - EMERGING TECHNOLOGY, DATA ANALYTICS
    ● Currently focused on developing an analytics platform on OpenShift with Apache Spark as the analytics engine, developing data science apps, and making metrics observable through cloud-native technology.
    ● Previously worked as a software consultant in the financial services and insurance industry, building end-to-end software solutions for clients.
  • 3. OVERVIEW
    OBSERVABILITY
    ● Motivation
    ● What Is Spark?
    ● What Is Prometheus?
    ● Our Story
    ● Spark Cluster JVM Instrumentation
    PERFORMANCE TUNING
    ● Tuning Spark jobs
    ● Spark Memory Model
    ● Prometheus as a performance tool
    ● Comparing cached vs non-cached dataframes
    ● Demo
  • 4. MOTIVATION
    ● Rapid experimentation with data science apps
    ● Identify bottlenecks
    ● Improve performance
    ● Resolve incidents more quickly
    ● Improve memory usage to tune Spark jobs
  • 5. OUR STORY
    ● Instrumented the Spark JVM to expose metrics in a Kubernetes pod
    ● Added the ability to monitor Spark with Prometheus
    ● Experimented with using Grafana with Prometheus to provide more insight
    ● Sharing our experiments and experience using this setup for performance analysis of Spark jobs
    ● Demo at the very end
    June 1, 2017 - https://github.com/radanalyticsio/openshift-spark/pull/28 - Added agent to report Jolokia metrics endpoint in Kubernetes pod
    Nov 7, 2017 - https://github.com/radanalyticsio/openshift-spark/pull/35 - Added agent to report Prometheus metrics endpoint in Kubernetes pod
  • 6. WHAT IS PROMETHEUS
    ● Open source monitoring system
    ● In 2016 Prometheus became the second member project of the CNCF
    ● Scrapes metrics from an endpoint
    ● Client libraries in Go, Java, Python, etc.
    ● Kubernetes comes instrumented out of the box with Prometheus endpoints
    ● If a component has no native Prometheus integration, there are many community exporters that expose its metrics so it can be monitored
  • 7. WHAT IS APACHE SPARK
    Apache Spark is an in-demand data processing engine with a thriving community and steadily growing install base
    ● Supports interactive data exploration in addition to apps
    ● Batch and stream processing
    ● Machine learning libraries
    ● Distributed
    ● Separates storage and compute (in-memory processing)
    ● New external scheduler: Kubernetes
  • 8. SPARK FEATURES
    • Can run standalone, or with YARN, Mesos, or Kubernetes as the cluster manager
    • Has language bindings for Java, Scala, Python, and R
    • Can access data from JDBC, HDFS, S3, or a regular filesystem
    • Can persist data in different formats: Parquet, Avro, JSON, CSV, etc.
    Diagram: SQL, MLlib, Graph, and Streaming components sit on top of SPARK CORE.
  • 11. SPARK CLUSTER INSTRUMENT
    Architecture diagram: the Spark master and each Spark worker run a Java agent; Prometheus scrapes metrics from the agents and notifies the Alertmanager.
  • 14. PULL METRICS
    ● Prometheus lets you configure how often to scrape and which endpoints to scrape. The Prometheus server will pull in the metrics that are configured.
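    As an illustration, a minimal scrape configuration might look like the sketch below. The job name, target address, and port are hypothetical placeholders, not taken from the deck:

```yaml
# prometheus.yml -- minimal sketch; job name, target host, and port are made up
global:
  scrape_interval: 15s          # how often Prometheus pulls metrics
scrape_configs:
  - job_name: 'spark-cluster'
    static_configs:
      - targets: ['spark-master:7777']   # assumed agent metrics endpoint
```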
  • 15. ALERTMANAGER
    • PromQL queries are used to create rules; you are notified when a rule is triggered.
    • Alertmanager receives the notification and can notify you via email, Slack, or other options (see the docs for details).
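    For example, an alerting rule (evaluated by the Prometheus server; Alertmanager then routes the resulting notification) could look like this sketch. The metric name follows the JMX exporter convention, but the threshold and alert name here are invented for illustration:

```yaml
# alerts.yml -- hedged sketch; alert name and threshold are hypothetical
groups:
  - name: spark-alerts
    rules:
      - alert: SparkExecutorHeapHigh
        expr: jvm_memory_bytes_used{area="heap"} > 3.5e9   # fires above ~3.5 GB
        for: 5m                                            # must hold for 5 minutes
        labels:
          severity: warning
        annotations:
          summary: "Executor JVM heap usage above 3.5 GB for 5 minutes"
```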
  • 16. PROMQL
    ● Powerful query language for getting metrics from Kubernetes clusters as well as Spark clusters.
    ● What are gauges and counters?
    Gauges: latest value of a metric.
    Counters: total number of event occurrences, usually suffixed with "_total". Range selectors such as prom_metric_total[1m] return the last minute of samples.
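    To make the counter semantics concrete, here is a small pure-Python sketch (not the real Prometheus client library) that mimics how a rate() over a range like [1m] turns a monotonically increasing counter into a per-second rate:

```python
# Toy illustration of Prometheus counter semantics, for intuition only.
# A counter only goes up; rate() estimates the per-second increase over a window.

def rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns the per-second rate over the window, like PromQL's rate()."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# 60 seconds of scrapes: the counter grows from 100 to 400 events
window = [(0, 100), (15, 160), (30, 240), (45, 330), (60, 400)]
print(rate(window))  # 5.0 events per second
```

    A gauge, by contrast, would simply be read at its latest value (400 here) with no rate computation.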
  • 17. PART 2: Tuning Spark jobs with Prometheus
    Things we would like to know when tuning Spark programs:
    ● How much memory is the driver using?
    ● How much memory are the workers using?
    ● How is the JVM being utilized by Spark?
    ● Is my Spark job saturating the network?
    ● What is the cluster view of network, CPU, and memory utilization?
    We will demonstrate how Prometheus coupled with Grafana on Kubernetes can help answer these types of questions.
  • 18. Our Example Application
    Focus on memory: efficient memory use is key to good performance in Spark jobs.
    How: We will create Prometheus + Grafana dashboards to evaluate memory usage under different conditions.
    Example: Our Spark Python example will compare memory usage with and without caching, to illustrate how memory usage and timing change for a PySpark program performing a cartesian product followed by a groupBy operation.
  • 19. A little Background
    Memory allocation in Spark
    ● Spark is an "in-memory" computing framework
    ● Memory is a limited resource!
    ● There is competition for memory
    ● Caching reusable results can save overall memory usage under certain conditions
    ● Memory runs out in many large jobs, forcing spills to disk
  • 20. Spark Unified Memory Model
    LRU eviction and user-defined memory configuration options.
    Diagram summary: the JVM heap allocated to a Spark job is divided between EXECUTION and STORAGE memory. spark.memory.storageFraction sets a user-specified unevictable storage amount; EXECUTION takes precedence over STORAGE up to that amount, and blocks that no longer fit spill to disk.
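    The split above can be sketched numerically. Assuming the Spark 2.x unified memory model defaults (roughly 300 MB of reserved memory, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5), the pool sizes for a given executor heap work out approximately as follows; treat this as an approximation of the model, not Spark's exact internal accounting:

```python
def unified_memory_pools(heap_mb,
                         reserved_mb=300,       # fixed reserve in Spark 2.x (approx.)
                         memory_fraction=0.6,   # spark.memory.fraction default
                         storage_fraction=0.5): # spark.memory.storageFraction default
    """Approximate Spark unified memory pool sizes, in MB."""
    usable = heap_mb - reserved_mb
    unified = usable * memory_fraction       # shared execution + storage pool
    storage = unified * storage_fraction     # unevictable storage portion
    execution = unified - storage            # execution may borrow beyond this
    return {"unified": unified, "storage": storage, "execution": execution}

# e.g. a 4 GB executor heap
pools = unified_memory_pools(4096)
print(pools)
```

    For a 4096 MB heap this gives roughly 2278 MB of unified memory, of which about 1139 MB is protected storage, which is why oversized cached dataframes so quickly crowd out execution memory.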
  • 21. Using Spark SQL and Spark RDD API together in a tuning exercise
    We want to use Spark SQL to manipulate dataframes. Spark SQL is a component of Spark:
    ● it provides structured data processing
    ● it is implemented as a library on top of Spark
    Three main APIs:
    ● SQL syntax
    ● Dataframes
    ● Datasets
    Two backend components:
    ● Catalyst - query optimizer
    ● Tungsten - off-heap memory management that eliminates the overhead of Java objects
  • 22. Performance Optimizations with Spark SQL
    Diagram: the JDBC console and user programs (Java, Python, Scala, etc.) feed Spark SQL, whose Catalyst optimizer and Dataframe API sit on top of Spark Core and RDDs.
    Spark SQL performance benefits:
    ● Catalyst compiles Spark SQL programs down to an RDD
    ● Tungsten provides more efficient data storage compared to Java objects on the heap
    ● The Dataframe API and RDD API can be intermixed
  • 23. Using Prometheus + Grafana for performance optimization
    Specific code example: compare non-cached and cached dataframes that are reused in a groupBy transformation.
    When is it a good idea to cache a dataframe?
    ● when the result of a computation is going to be reused later
    ● when it is costly to recompute that result
    ● in cases where algorithms make several passes over the data
  • 24. Determining memory consumption for dataframes you want to cache
  • 25. Example: Code for non-cached run
    # normalVectorRDD expects a SparkContext, so obtain it first;
    # nRow, nCol, numPartitions, and the initial seed are assumed set earlier
    sc = spark.sparkContext
    rdd1 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
    seed = 3
    rdd2 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
    # convert each tuple in the rdd to a row
    randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
    randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))
    # create dataframe from rdd
    schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
    schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)
    cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)
    # aggregate
    results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
    results.show(n=100)
    print("----------Count in cross-join--------------- {0}".format(cross_df.count()))
  • 26. Example: Code for cached run
    # identical to the non-cached run except both dataframes are cached before reuse
    sc = spark.sparkContext
    rdd1 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
    seed = 3
    rdd2 = RandomRDDs.normalVectorRDD(sc, nRow, nCol, numPartitions, seed)
    # convert each tuple in the rdd to a row
    randomNumberRdd1 = rdd1.map(lambda x: Row(A=float(x[0]), B=float(x[1]), C=float(x[2]), D=float(x[3])))
    randomNumberRdd2 = rdd2.map(lambda x: Row(E=float(x[0]), F=float(x[1]), G=float(x[2]), H=float(x[3])))
    # create dataframe from rdd
    schemaRandomNumberDF1 = spark.createDataFrame(randomNumberRdd1)
    schemaRandomNumberDF2 = spark.createDataFrame(randomNumberRdd2)
    # cache the dataframes
    schemaRandomNumberDF1.cache()
    schemaRandomNumberDF2.cache()
    cross_df = schemaRandomNumberDF1.crossJoin(schemaRandomNumberDF2)
    # aggregate
    results = schemaRandomNumberDF1.groupBy("A").agg(func.max("B"), func.sum("C"))
    results.show(n=100)
    print("----------Count in cross-join--------------- {0}".format(cross_df.count()))
  • 28. Example: Comparing cached vs non-cached runs
    Prometheus dashboard screenshots: non-cached vs cached.
  • 29. Example: Comparing cached vs non-cached runs
    Prometheus dashboard screenshots: non-cached vs cached.
  • 30. Comparing non-cached vs cached runs
    RIP (Relative Index of Performance): 0 to 1 = improvement, 0 to -1 = degradation. % Change: negative values = improvement.
    Measured values: RIP = 0.76 (% Change = 76), RIP = 0.63 (63), RIP = 0.62 (62), RIP = 0.63 (63), RIP = 0.10 (10), RIP = 0.00 (0).
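    The deck does not give the RIP formula explicitly, and its sign conventions for % Change are ambiguous, so the following is only a plausible reconstruction: assuming RIP = (non_cached - cached) / non_cached, a RIP of 0.76 corresponds to the cached run taking 24% of the non-cached time.

```python
# Hedged sketch: assumed definition of the Relative Index of Performance (RIP).
# The formula is inferred from the reported 0-to-1 improvement range, not stated
# in the slides.

def rip(non_cached, cached):
    """0..1 = improvement, 0..-1 = degradation (assumed definition)."""
    return (non_cached - cached) / non_cached

def pct_change(non_cached, cached):
    """Percent change under the same assumption."""
    return rip(non_cached, cached) * 100

# e.g. a stage that took 100 s non-cached and 24 s cached
print(rip(100, 24))  # 0.76
print(pct_change(100, 24))
```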
  • 31. SPARK JOB + PROMETHEUS + GRAFANA DEMO Demo Time!
  • 32. Recap
    You learned:
    ■ About our story of Spark cluster metrics monitoring with Prometheus
    ■ Spark features
    ■ How Prometheus can be integrated with Apache Spark
    ■ Spark applications and how memory works
    ■ Spark cluster JVM instrumentation
    ■ How to deploy a Spark job and monitor it via a Grafana dashboard
    ■ The performance difference between cached and non-cached dataframes
    ■ Monitoring tips and tricks