Spark Overcomes MapReduce Limitations

This document discusses how Apache Spark overcomes the limitations of Hadoop MapReduce. It explains that Spark can run up to 100 times faster than MapReduce on some workloads by keeping data in memory between processing steps rather than writing it to disk. Spark also supports workloads beyond batch processing, such as machine learning, streaming, and graph processing, through its libraries. Spark builds each job as a directed acyclic graph (DAG) of operators that can be rearranged and combined to cut down on disk reads and writes.

www.edureka.co/r-for-analytics
www.edureka.co/apache-spark-scala-training
Apache Spark: Beyond Hadoop MapReduce

Slide 2
Agenda
By the end of this webinar, you will know about:
• Strengths of MapReduce
• Things beyond MapReduce
• How MapReduce limitations can be overcome
• How Spark fits the bill
• Other exciting features in Spark

Slide 3
Strengths of MapReduce

Slide 4
Strengths of MapReduce
• Simple: independence of language of choice, such as Java, C++ or Python.
• Scalability: can process petabytes of data, stored in HDFS on one cl
• Fault tolerance: MapReduce takes care of failures using the replicated copies.
• Minimal data motion: computation moves to where the data resides, minimizing data movement across the network.

Slide 5
Limitations Of MapReduce (MR)

Slide 6
Limitations Of MR
• Real time
• Complex algorithm
• Re-reading and parsing data
• Minimal data motion
• Graph processing
• Iterative tasks
• Random access

Slide 7
Feature Comparison with Spark
Hadoop MapReduce          Spark
Fast                      Up to 100x faster than MapReduce
Batch processing          Batch and real-time processing
Stores data on disk       Stores data in memory
Written in Java           Written in Scala
Source: Databricks

Slide 8
How MR limitations can be overcome

Slide 9
Overcoming MR limitations
• Real time: cutting down on the number of reads and writes to disk

Slide 10
Overcoming MR limitations
• Complex algorithm: libraries for machine learning, streaming, and graph processing

Slide 11
Overcoming MR limitations
• Random access: cyclic data flows

Slide 12
How Spark Implements Features To Make Its Architecture Better Than MR

Slide 13
Spark Cuts Down Read/Write I/O To Disk
Spark tries to keep data in the memory of its distributed workers, allowing significantly faster, lower-latency computations, whereas MapReduce keeps moving intermediate data to and from disk.
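
The effect of keeping data in memory can be illustrated with a small, single-machine sketch. This is not Spark or Hadoop code: `CountingStore`, `iterate_rereading`, and `iterate_cached` are hypothetical names invented for this example, and the "disk" is just a Python list that counts how often it is read. The sketch assumes an iterative job that needs the same input on every pass.

```python
class CountingStore:
    """Hypothetical stand-in for disk storage that counts every read."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._records)


def iterate_rereading(store, iterations):
    """MapReduce-style loop: every pass loads its input from storage again."""
    total = 0
    for _ in range(iterations):
        data = store.read()
        total += sum(data)
    return total


def iterate_cached(store, iterations):
    """Spark-style loop: load once, keep the data in memory, reuse it."""
    cached = store.read()
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total


disk_a = CountingStore(range(5))
disk_b = CountingStore(range(5))
result_a = iterate_rereading(disk_a, iterations=10)
result_b = iterate_cached(disk_b, iterations=10)

# Same answer either way, but very different numbers of storage reads.
print(result_a == result_b, disk_a.reads, disk_b.reads)  # True 10 1
```

Both loops compute the same result; the cached version touches storage once instead of once per iteration, which is the kind of saving the slide attributes to Spark for iterative workloads.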

Slide 14
Libraries For ML, Graph Programming …
• MLlib: machine learning library
• GraphX: graph programming
• Spark SQL: Spark interface for RDBMS lovers
• Spark Streaming: utility for continuous ingestion of data

Slide 15
Cyclic Data Flows
• All jobs in Spark comprise a series of operators and run on a set of data.
• All the operators in a job are used to construct a DAG (Directed Acyclic Graph).
• The DAG is optimized by rearranging and combining operators where possible.
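
The idea of recording operators first and optimizing them before execution can be sketched in plain Python. This is a toy illustration under stated assumptions, not Spark's actual scheduler or optimizer: `LazyDataset` and its methods are hypothetical, the "DAG" is a simple linear chain of operators, and the only optimization shown is combining adjacent `map` operators into one.

```python
class LazyDataset:
    """Hypothetical, single-machine stand-in for a distributed dataset."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)  # recorded operators; nothing runs yet

    def map(self, fn):
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def plan(self):
        """Combine adjacent map operators into a single operator."""
        fused = []
        for kind, fn in self._ops:
            if kind == "map" and fused and fused[-1][0] == "map":
                prev = fused[-1][1]
                # Bind prev and fn as defaults so each fused step keeps its own pair.
                fused[-1] = ("map", lambda x, f=prev, g=fn: g(f(x)))
            else:
                fused.append((kind, fn))
        return fused

    def collect(self):
        """Run the optimized plan in one pass over the source."""
        steps = self.plan()
        out = []
        for item in self._source:
            for kind, fn in steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    break  # filtered out; skip the remaining steps
            else:
                out.append(item)
        return out


ds = (
    LazyDataset(range(6))
    .map(lambda x: x + 1)
    .map(lambda x: x * 2)
    .filter(lambda x: x % 4 == 0)
)

print(len(ds.plan()))  # 2: the two maps were combined, plus the filter
print(ds.collect())    # [4, 8, 12]
```

Because the operators are only recorded until `collect()` is called, the whole chain is known up front and can be rewritten before any data is touched; here that means no intermediate collection is built between the two `map` steps.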

Slide 16
Other Spark Features In Demand

Slide 17
Spark Features/Modules In Demand
Source: Typesafe

Slide 18
New Features In 2015
DataFrames
• Similar API to data frames in R and Pandas
• Automatically optimised via Spark SQL
• Released in Spark 1.3
SparkR
• Released in Spark 1.4
• Exposes DataFrames, RDDs & ML library in R
Machine Learning Pipelines
• High-level API
• Featurization
• Evaluation
• Model tuning
External Data Sources
• Platform API to plug data sources into Spark
• Pushes logic into sources
Source: Databricks

Slide 20
Your feedback is vital to us, be it a compliment, a suggestion or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.

Editor's notes

http://www.information-management.com/gallery/Big-Data-Hadoop-2015-Predictions-Forrester-10026357-1.html
https://www.forrester.com/Predictions+2015+Hadoop+Will+Become+A+Cornerstone+Of+Your+Business+Technology+Agenda/fulltext/-/E-RES117705