SlideShare ist ein Scribd-Unternehmen logo
1 von 22
B I G D A T A W O R K G R O U P . I R
WHAT IS SPARK
Apache Spark is an open source big data
processing framework built around speed, ease
of use, and sophisticated analytics. It was
originally developed in 2009 in UC Berkeley’s
AMPLab, and open sourced in 2010 as an
Apache project.
B I G D A T A W O R K G R O U P . I R
WHAT IS SPARK
Advantages: In Memory
 Spark enables applications in Hadoop clusters to run up to 100
times faster in memory and 10 times faster even when running
on disk.
B I G D A T A W O R K G R O U P . I R
WHAT IS SPARK
Advantages: Generic API
 Spark lets you quickly write applications in Java, Scala, or
Python. It comes with a built-in set of over 80 high-level
operators. And you can use it interactively to query data within
the shell.
B I G D A T A W O R K G R O U P . I R
WHAT IS SPARK
Advantages: Many Applications
 Spark gives us a comprehensive, unified framework to manage
big data processing requirements with a variety of data sets
that are diverse in nature (text data, graph data etc) as well as
the source of data (batch v. real-time streaming data).
B I G D A T A W O R K G R O U P . I R
WHAT IS SPARK
Advantages: Many Applications
 In addition to Map and Reduce operations, it supports SQL
queries, streaming data, machine learning and graph data
processing. Developers can use these capabilities stand-alone
or combine them to run in a single data pipeline use case.
B I G D A T A W O R K G R O U P . I R
HADOOP AND SPARK
Hadoop Spark
Map & Reduce -> suitable for on-
pass computations
multi-step data pipelines using
directed acyclic graph (DAG)
pattern.
Clusters are hard to set up and
manage
supports in-memory data sharing
across DAGs.
need to integrate with Mahout
(Machine Learning) and Storm
(Streaming data processing)
Spark as an alternative to Hadoop
MapReduce
B I G D A T A W O R K G R O U P . I R
SPARK FEATURES
Less expensive shuffles in the data processing. With capabilities like in-
memory data storage
Lazy evaluation of big data queries, which helps with optimization of the
steps in data processing workflows.
Higher level API to improve developer productivity and a consistent
architect model for big data solutions.
B I G D A T A W O R K G R O U P . I R
SPARK FEATURES
Spark holds intermediate results in memory rather than writing them to
disk
Spark can be used for processing datasets that larger than the aggregate
memory in a cluster.
B I G D A T A W O R K G R O U P . I R
SPARK ECOSYSTEM
Spark Streaming
 micro batch style of computing and processing.(DStream)
Spark SQL
 JDBC API, SQL like queries, ETL
Spark Mlib
 including classification, regression, clustering, collaborative filtering,
dimensionality reduction, as well as underlying optimization primitives
B I G D A T A W O R K G R O U P . I R
SPARK ECOSYSTEM
Spark GraphX
GraphX extends the Spark RDD by introducing the
Resilient Distributed Property Graph
Set of fundamental operators (e.g., subgraph,
joinVertices, and aggregateMessages)
B I G D A T A W O R K G R O U P . I R
SPARK ECOSYSTEM
BlinkDB
trade-off query accuracy for response time.
Tachyon
Caches working set files in memory
Spark Cassandra Connector
access data stored in a Cassandra database
SparkR
B I G D A T A W O R K G R O U P . I R
B I G D A T A W O R K G R O U P . I R
SPARK ARCHITECTURE
B I G D A T A W O R K G R O U P . I R
RESILIENT DISTRIBUTED DATASETS
Fault tolerance because an RDD know how to recreate and re-compute the
datasets.
RDDs are immutable.
B I G D A T A W O R K G R O U P . I R
RDD OPERATIONS
B I G D A T A W O R K G R O U P . I R
HOW TO RUN SPARK
B I G D A T A W O R K G R O U P . I R
HOW TO INTERACT WITH SPARK
spark-shell.cmd
B I G D A T A W O R K G R O U P . I R
SPARK WEB CONSOLE
http://localhost:4040
B I G D A T A W O R K G R O U P . I R
SHARED VARIABLES
Broadcast Variables
Accumulators
B I G D A T A W O R K G R O U P . I R
SPARK ECOSYSTEM
Spark SQL
 JDBC API, SQL like queries, ETL
B I G D A T A W O R K G R O U P . I R
SPARK ECOSYSTEM
Spark Streaming
 micro batch style of computing and processing.(DStream)
B I G D A T A W O R K G R O U P . I R

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP vinoth kumar
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core conceptsMaryan Faryna
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingCloudera, Inc.
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemMd. Hasan Basri (Angel)
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storagehybrid cloud
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 

Was ist angesagt? (20)

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP Introduction to Bigdata and HADOOP
Introduction to Bigdata and HADOOP
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Big data hadoop rdbms
Big data hadoop rdbmsBig data hadoop rdbms
Big data hadoop rdbms
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Hadoop core concepts
Hadoop core conceptsHadoop core concepts
Hadoop core concepts
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
 
Hadoop Presentation
Hadoop PresentationHadoop Presentation
Hadoop Presentation
 
Understanding hdfs
Understanding hdfsUnderstanding hdfs
Understanding hdfs
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Introduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-SystemIntroduction to Apache Hadoop Eco-System
Introduction to Apache Hadoop Eco-System
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Apache Hadoop at 10
Apache Hadoop at 10Apache Hadoop at 10
Apache Hadoop at 10
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Emergent Distributed Data Storage
Emergent Distributed Data StorageEmergent Distributed Data Storage
Emergent Distributed Data Storage
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 

Ähnlich wie Big data processing with apache spark part1

What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...Simplilearn
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdfMaheshPandit16
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the SurfaceJosi Aranda
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkHome
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache SparkEdureka!
 
Big Data Processing With Spark
Big Data Processing With SparkBig Data Processing With Spark
Big Data Processing With SparkEdureka!
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch ProcessingEdureka!
 
spark interview questions & answers acadgild blogs
 spark interview questions & answers acadgild blogs spark interview questions & answers acadgild blogs
spark interview questions & answers acadgild blogsprateek kumar
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkROlgun Aydın
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!Edureka!
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!Edureka!
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Edureka!
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxRahul Borate
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewMario Cartia
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...Edureka!
 

Ähnlich wie Big data processing with apache spark part1 (20)

Apache Spark PDF
Apache Spark PDFApache Spark PDF
Apache Spark PDF
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
Apache spark
Apache sparkApache spark
Apache spark
 
Spark from the Surface
Spark from the SurfaceSpark from the Surface
Spark from the Surface
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache Spark
 
Big Data Processing With Spark
Big Data Processing With SparkBig Data Processing With Spark
Big Data Processing With Spark
 
Spark For Faster Batch Processing
Spark For Faster Batch ProcessingSpark For Faster Batch Processing
Spark For Faster Batch Processing
 
spark interview questions & answers acadgild blogs
 spark interview questions & answers acadgild blogs spark interview questions & answers acadgild blogs
spark interview questions & answers acadgild blogs
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkR
 
5 things one must know about spark!
5 things one must know about spark!5 things one must know about spark!
5 things one must know about spark!
 
5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!5 Reasons why Spark is in demand!
5 Reasons why Spark is in demand!
 
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptxUnit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
 
Using pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 previewUsing pySpark with Google Colab & Spark 3.0 preview
Using pySpark with Google Colab & Spark 3.0 preview
 
Apache spark
Apache sparkApache spark
Apache spark
 
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
What is Apache Spark | Apache Spark Tutorial For Beginners | Apache Spark Tra...
 

Kürzlich hochgeladen

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 

Kürzlich hochgeladen (20)

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 

Big data processing with apache spark part1

  • 1. B I G D A T A W O R K G R O U P . I R
  • 2. WHAT IS SPARK Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab, and open sourced in 2010 as an Apache project. B I G D A T A W O R K G R O U P . I R
  • 3. WHAT IS SPARK Advantages: In Memory  Spark enables applications in Hadoop clusters to run up to 100 times faster in memory and 10 times faster even when running on disk. B I G D A T A W O R K G R O U P . I R
  • 4. WHAT IS SPARK Advantages: Generic API  Spark lets you quickly write applications in Java, Scala, or Python. It comes with a built-in set of over 80 high-level operators. And you can use it interactively to query data within the shell. B I G D A T A W O R K G R O U P . I R
  • 5. WHAT IS SPARK Advantages: Many Applications  Spark gives us a comprehensive, unified framework to manage big data processing requirements with a variety of data sets that are diverse in nature (text data, graph data etc) as well as the source of data (batch v. real-time streaming data). B I G D A T A W O R K G R O U P . I R
  • 6. WHAT IS SPARK Advantages: Many Applications  In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning and graph data processing. Developers can use these capabilities stand-alone or combine them to run in a single data pipeline use case. B I G D A T A W O R K G R O U P . I R
  • 7. HADOOP AND SPARK Hadoop Spark Map & Reduce -> suitable for on- pass computations multi-step data pipelines using directed acyclic graph (DAG) pattern. Clusters are hard to set up and manage supports in-memory data sharing across DAGs. need to integrate with Mahout (Machine Learning) and Storm (Streaming data processing) Spark as an alternative to Hadoop MapReduce B I G D A T A W O R K G R O U P . I R
  • 8. SPARK FEATURES Less expensive shuffles in the data processing. With capabilities like in- memory data storage Lazy evaluation of big data queries, which helps with optimization of the steps in data processing workflows. Higher level API to improve developer productivity and a consistent architect model for big data solutions. B I G D A T A W O R K G R O U P . I R
  • 9. SPARK FEATURES Spark holds intermediate results in memory rather than writing them to disk Spark can be used for processing datasets that larger than the aggregate memory in a cluster. B I G D A T A W O R K G R O U P . I R
  • 10. SPARK ECOSYSTEM Spark Streaming  micro batch style of computing and processing.(DStream) Spark SQL  JDBC API, SQL like queries, ETL Spark Mlib  including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as underlying optimization primitives B I G D A T A W O R K G R O U P . I R
  • 11. SPARK ECOSYSTEM Spark GraphX GraphX extends the Spark RDD by introducing the Resilient Distributed Property Graph Set of fundamental operators (e.g., subgraph, joinVertices, and aggregateMessages) B I G D A T A W O R K G R O U P . I R
  • 12. SPARK ECOSYSTEM BlinkDB trade-off query accuracy for response time. Tachyon Caches working set files in memory Spark Cassandra Connector access data stored in a Cassandra database SparkR B I G D A T A W O R K G R O U P . I R
  • 13. B I G D A T A W O R K G R O U P . I R
  • 14. SPARK ARCHITECTURE B I G D A T A W O R K G R O U P . I R
  • 15. RESILIENT DISTRIBUTED DATASETS Fault tolerance because an RDD know how to recreate and re-compute the datasets. RDDs are immutable. B I G D A T A W O R K G R O U P . I R
  • 16. RDD OPERATIONS B I G D A T A W O R K G R O U P . I R
  • 17. HOW TO RUN SPARK B I G D A T A W O R K G R O U P . I R
  • 18. HOW TO INTERACT WITH SPARK spark-shell.cmd B I G D A T A W O R K G R O U P . I R
  • 19. SPARK WEB CONSOLE http://localhost:4040 B I G D A T A W O R K G R O U P . I R
  • 20. SHARED VARIABLES Broadcast Variables Accumulators B I G D A T A W O R K G R O U P . I R
  • 21. SPARK ECOSYSTEM Spark SQL  JDBC API, SQL like queries, ETL B I G D A T A W O R K G R O U P . I R
  • 22. SPARK ECOSYSTEM Spark Streaming  micro batch style of computing and processing.(DStream) B I G D A T A W O R K G R O U P . I R