Spark architecture


GauravBiswas9

Apache Spark architecture


➢ SPARK & ITS FEATURES
➢ SPARK ARCHITECTURE
➢ RESILIENT DISTRIBUTED DATASETS (RDDs)
➢ DIRECTED ACYCLIC GRAPH (DAG)
➢ ADVANTAGES & DRAWBACKS
➢ CONCLUSION
16-04-2019 2

➢ Apache Spark: an open-source cluster-computing framework for large-scale data processing, including near-real-time workloads
➢ According to Spark certified experts, Spark's performance is up to 100 times faster in memory and up to 10 times faster on disk than Hadoop MapReduce
➢ The main feature of Apache Spark is in-memory cluster computing, which increases the processing speed of an application

➢ Speed:
Spark runs up to 100 times faster than Hadoop MapReduce for large-scale data processing when data fits in memory
➢ Powerful Caching:
A simple programming layer provides powerful caching and disk persistence capabilities
➢ Deployment:
It can be deployed on Mesos, on Hadoop via YARN, or with Spark's own standalone cluster manager

➢ Real-Time:
It offers near-real-time computation and low latency because of in-memory computation
➢ Polyglot:
Spark provides high-level APIs in Java, Scala, Python, and R, so Spark code can be written in any of these four languages. It also provides interactive shells in Scala and Python
Figure: Apache Spark architecture

➢ SPARK DRIVER:
✓ Separate process that executes the user application
✓ Creates the SparkContext to schedule job execution and negotiate with the cluster manager
➢ EXECUTORS:
✓ Run tasks scheduled by the driver
✓ Store computation results in memory, on disk, or off-heap
✓ Interact with storage systems

➢ CLUSTER MANAGER:
✓ The SparkContext works with the cluster manager, which allocates cluster resources to the application, to manage its jobs
✓ The driver program and the SparkContext take care of job execution within the cluster
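
The division of labour between driver, executors, and cluster manager can be sketched with a toy model. This is plain Python, not Spark's API: a thread pool stands in for the executors a cluster manager would allocate, and the names `run_job`, `count_words`, and `num_executors` are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(partitions, task, num_executors=2):
    """Toy 'driver': schedule one task per partition and gather the results."""
    # The pool plays the role of the executors; in a real cluster the
    # cluster manager would decide how many the application gets.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        # Executor.map returns results in the same order as the partitions.
        return list(executors.map(task, partitions))


def count_words(lines):
    """Toy task: count the words in one partition of text lines."""
    return sum(len(line.split()) for line in lines)


# Three hypothetical partitions of a small text dataset.
partitions = [
    ["spark runs tasks", "on executors"],
    ["the driver schedules them"],
    ["results return to the driver"],
]

per_partition = run_job(partitions, count_words)  # work done by the "executors"
total_words = sum(per_partition)                  # final combine on the "driver"
print(per_partition, total_words)
```

The point of the sketch is only the shape of the flow: the driver splits work into tasks, the executors run them in parallel, and the partial results come back to the driver to be combined.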

➢ Apache Spark architecture is based on two main abstractions:
✓ Resilient Distributed Dataset (RDD)
✓ Directed Acyclic Graph (DAG)
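
How the two abstractions relate can be illustrated with a small dependency graph. The sketch below is a plain-Python toy, not Spark's scheduler: each hypothetical dataset name maps to the datasets it is derived from, and the standard-library `graphlib` module (Python 3.9+) produces one valid execution order. The dataset names and the helpers `execution_order` and `ancestors` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Each key is a hypothetical dataset; its value is the set of datasets
# it is derived from (its parents in the graph).
lineage = {
    "lines": set(),
    "words": {"lines"},
    "pairs": {"words"},
    "counts": {"pairs"},
    "stopwords": set(),
    "filtered": {"counts", "stopwords"},
}


def execution_order(dag):
    """Return one order in which every dataset comes after its parents."""
    return list(TopologicalSorter(dag).static_order())


def ancestors(dag, node):
    """Return every dataset that `node` depends on, directly or indirectly."""
    seen = set()
    stack = list(dag[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(dag[parent])
    return seen


order = execution_order(lineage)
print(order)
print(ancestors(lineage, "counts"))
```

Because the graph is acyclic, such an order always exists, and the recorded ancestors of a dataset are exactly what would be needed to rebuild it if it were lost, which is the idea behind the "resilient" in RDD.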

➢ RDDs support two types of operations:
✓ Transformations: operations applied to an RDD to create a new RDD. They are evaluated lazily.
✓ Actions: operations applied to an RDD that instruct Apache Spark to run the computation and pass the result back to the driver.
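
The difference between the two kinds of operation can be shown with a minimal plain-Python stand-in. `ToyRDD` is a hypothetical class, not part of Spark; its method names mirror the real RDD methods `map`, `filter`, `collect`, and `count`, but it runs on a single machine and only models the lazy-evaluation behaviour.

```python
class ToyRDD:
    """Toy stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, data, ops=()):
        self._data = list(data)
        self._ops = tuple(ops)  # pending transformations, in order

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._data, self._ops + (("filter", pred),))

    def _compute(self):
        # Chain the recorded steps as lazy iterators (built-in map/filter).
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: run the recorded steps and return a result to the caller.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())


calls = []  # records every element the mapped function actually sees


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before = list(calls)        # still empty: no action has run yet
result = evens_squared.collect()  # the action triggers the computation
print(calls_before, result)
```

Building `evens_squared` does no work; `square` is only called once `collect()` runs. That mirrors how a transformation merely describes a new RDD while an action forces evaluation and returns the result to the driver.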

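➢ ADVANTAGES:
✓ Integration with Hadoop
✓ Faster processing than Hadoop MapReduce
✓ Near-real-time stream processing
➢ DRAWBACKS:
✓ No file management system of its own
✓ No support for true real-time processing (streams are handled as micro-batches)
✓ Not cost-effective (in-memory processing needs large amounts of RAM)
✓ Manual optimization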

➢ Spark makes it easy to write and run complicated data processing jobs
➢ It enables computation at a very large scale
➢ Although Spark has many limitations, it is still trending in the big data world
➢ Because of these drawbacks, other technologies are overtaking Spark in some areas
➢ For example, Flink offers more complete real-time processing than Spark
➢ In this way, other technologies are overcoming the drawbacks of Spark

Gaurav Biswas, BIT Mesra, 16-04-2019
