The exponential growth of data is not a problem in itself; processing and managing the huge diversity of that data is the real concern. In this session, we discuss Apache Spark, one of the most popular distributed big data processing frameworks, and develop against its API using Scala, a language that has gained prominence among big data developers.
2. KnolX Etiquettes
Lack of etiquette and manners is a huge turn-off.
Punctuality
Respect KnolX session timings; you are requested not to join a session more than 5 minutes after its start time.
Feedback
Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter.
Mute
Please keep your window on mute.
Avoid Disturbance
Avoid leaving your window unmuted after asking a question.
3. Agenda
What, When & Why
Introduction to Apache Spark
01
Master-slave architecture
Spark Architecture
02
Situations where Spark is helpful.
Use-cases for Spark
03
Components & API in Spark eco-system
Spark Eco-System
04
Spark Scala in Action
Demonstration
05
5. What is Spark
LEARN NOW
● A general-purpose distributed data processing engine.
● One of the most popular distributed big data processing frameworks.
● A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
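Spark's core RDD API deliberately mirrors the Scala collections API, so the programming model can be previewed without a cluster. The sketch below is plain Scala with no Spark dependency; in Spark, the same flatMap/map chain (with reduceByKey in place of the groupBy-and-sum) would run distributed across executors.

```scala
object WordCountSketch {
  // Count words across lines using the same transformation chain
  // Spark's RDD API exposes (flatMap, map, reduceByKey), written
  // here against plain Scala collections -- no cluster needed.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split lines into words
      .map(word => (word, 1))     // pair each word with a count of 1
      .groupBy(_._1)              // groupBy + sum stands in for reduceByKey
      .map { case (word, ones) => (word, ones.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("spark is fast", "spark is distributed", "scala is concise")
    wordCount(lines).toSeq.sortBy(-_._2).foreach(println)
  }
}
```

In real Spark code, `lines` would be an RDD or Dataset read from distributed storage, but the transformation chain reads almost identically.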
6. Why Spark
● Supports multiple languages (Java, Python, Scala, R) and integrates with other popular products.
● Keeps intermediate data in memory, so it does far less reading from and writing to disk.
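The "less disk I/O" point is about caching: Spark can keep an intermediate result in memory (`cache()`/`persist()`) and reuse it across actions, instead of rewriting it to disk between jobs as MapReduce does. A plain-Scala analogy of the effect, with an illustrative counter tracking how often the expensive step runs:

```scala
object CachingSketch {
  // Analogy for Spark's cache()/persist(): an "expensive" transformation
  // is either recomputed on every action, or computed once and reused.
  var computations = 0 // counts how often the expensive step runs

  def expensiveSquare(xs: Seq[Int]): Seq[Int] = {
    computations += 1
    xs.map(x => x * x)
  }

  def main(args: Array[String]): Unit = {
    val data = 1 to 5

    // Without caching: every "action" recomputes the transformation,
    // much like re-reading intermediate data from disk between jobs.
    val sumNoCache = expensiveSquare(data).sum
    val maxNoCache = expensiveSquare(data).max
    println(s"no cache: $computations computations") // 2

    // With "caching": compute once, keep in memory, reuse for both actions.
    computations = 0
    val cached = expensiveSquare(data)
    println(s"sum=${cached.sum}, max=${cached.max}, computations=$computations") // 1 computation
  }
}
```

In Spark the trade-off is the same, just at cluster scale: cached RDDs/DataFrames occupy executor memory in exchange for skipping recomputation (or disk reads) on each subsequent action.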
7. When Spark
● Large-scale data processing that is too big or too slow for a single machine.
● Working with distributed storage (S3, XD, HDFS) and NoSQL databases (HBase, Cassandra, MongoDB).
● Machine learning and fog computing.
9. Master-Slave Architecture
A well-defined layered architecture whose components and layers are loosely coupled.
Spark Driver
● Controls the execution of the Spark application.
● Maintains all state of the Spark cluster.
● Interfaces with the cluster manager.
Spark Executor
● Processes that perform the tasks assigned by the Spark driver.
● Take the tasks assigned by the driver, run them, and report their state back.
Cluster Manager
● Responsible for maintaining the cluster of machines that will run your Spark application.
● Has its own “Driver” and “Worker” abstractions.
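The driver/executor division of labor can be sketched in plain Scala: a "driver" splits the data into partitions, hands each partition to an "executor" (here just a Future on a thread pool, standing in for a JVM on a worker machine), and combines the partial results. All names here are illustrative, not Spark's actual API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DriverExecutorSketch {
  // Illustrative only: the "driver" partitions the data, each "executor"
  // runs its task in parallel and reports a partial result back, and the
  // driver combines them -- the same flow Spark's driver and executors
  // follow, minus the cluster manager and the network.
  def distributedSum(data: Seq[Int], numExecutors: Int): Int = {
    // Driver: split the data into roughly numExecutors partitions.
    val partitions = data.grouped(math.max(1, data.size / numExecutors)).toSeq
    // Executors: each partition becomes a task run on the thread pool.
    val tasks: Seq[Future[Int]] = partitions.map(p => Future(p.sum))
    // Driver: wait for all tasks and combine their partial results.
    val partials = Await.result(Future.sequence(tasks), 10.seconds)
    partials.sum
  }

  def main(args: Array[String]): Unit =
    println(distributedSum(1 to 100, numExecutors = 4)) // 5050
}
```

In a real deployment the cluster manager (YARN, Kubernetes, or Spark standalone) is what launches those executor processes on worker machines; the driver only schedules tasks onto them.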
12. Ideal Situations to Use Spark
● Batch and Streaming: supports both batch and real-time processing.
● Big Data in the Cloud: easy to set up Spark with data lake technologies.
● Finance Industry: firms analyse the text inside the regulatory filings of their own reports.
● E-Commerce Sector: giants like eBay and Alibaba use Spark.
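The "batch and streaming" point works because Spark's Structured Streaming treats an unbounded stream as a series of small batch jobs (micro-batches), so the same batch logic applies to both. A pure-Scala sketch of that idea, with illustrative names (`processStream`, `batchJob` are not Spark API):

```scala
object MicroBatchSketch {
  // Micro-batching in miniature: consume an unbounded stream of events
  // as fixed-size groups and run the same batch function on each group.
  def processStream[A, B](events: Iterator[A], batchSize: Int)(batchJob: Seq[A] => B): List[B] =
    events.grouped(batchSize).map(batch => batchJob(batch.toSeq)).toList

  def main(args: Array[String]): Unit = {
    val clicks = Iterator.range(1, 11) // pretend this is a live event stream
    // The same aggregation serves batch (one big group) and streaming
    // (many small groups) -- only the batch size changes.
    val perBatchSums = processStream(clicks, batchSize = 3)(_.sum)
    println(perBatchSums) // List(6, 15, 24, 10)
  }
}
```

In Spark, the engine handles the batch boundaries, triggers, and fault tolerance for you; the point of the sketch is only that one piece of aggregation logic covers both processing modes.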