RDBMS vs Hadoop vs Spark
Submitted by:
Aastha Joshi
Aishwarya Singh
Joel Keith Pais
Laxmi
P Vignesh
Contents
RDBMS
Hadoop
Apache Spark
Comparison between them
RDBMS
RDBMS stands for ‘Relational Database Management System’.
It is software that stores data in a structured format, as tables made up of rows and columns.
One can run queries that add, update, and search for values in the data.
It presents the data in tabular form.
It is "relational" because the values within each table are related to each other, and tables can be related to one another through shared values.
The relational structure makes it possible to run queries across multiple tables at once.
Structured Query Language (SQL) is the standard language used to access the database.
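The operations described above (adding, updating, searching, and querying across several tables at once) can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for any relational database, and the `customers` and `orders` tables, their columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database; table names, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two tables related through orders.customer_id -> customers.id.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)

# Adding values.
cur.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi")],
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 100.0), (2, 75.0)],
)

# Updating a value.
cur.execute("UPDATE orders SET amount = 80.0 WHERE customer_id = 2")

# Searching across both tables at once with a join.
cur.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
)
totals = cur.fetchall()
print(totals)

conn.close()
```

The join is what the "relational" structure buys: the order rows carry only a customer id, and the query resolves it to a name at read time.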
ADVANTAGES
Addresses the need to integrate, manage, and analyse data from multiple sources across on-premises and cloud environments
Easy to locate and access specific values within the database
High flexibility, since JSON data can be stored, retrieved, and published within a relational database
EXAMPLES
Hadoop
The days when data were limited are past. The world has now experienced the power of Big Data, which is analyzed to frame business strategies and other decisions.
Apache Hadoop is an open-source platform that we can use to store and process large datasets, ranging from gigabytes to petabytes. It allows multiple computers to form a cluster and analyze large datasets in parallel.
Four main components of the Hadoop ecosystem:
1 HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
The primary data storage system. It runs on commodity hardware and manages enormous data collections, with high data throughput and high fault tolerance.
2 YET ANOTHER RESOURCE NEGOTIATOR (YARN)
A cluster resource manager that schedules tasks and assigns resources (such as CPU and memory) to applications.
3 HADOOP MAPREDUCE
Breaks big data processing tasks down into smaller ones, distributes them across different nodes, and then runs each one.
4 HADOOP COMMON (HADOOP CORE)
A collection of common libraries and utilities on which the other three modules rely.
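The MapReduce idea listed above (split the work, process the pieces independently, then combine) can be sketched in plain Python. This is a single-process toy, not Hadoop's API: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` are assumptions chosen for the example, and in a real cluster the map and reduce calls would run on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle/sort: group all emitted values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

Each call to `map_phase` depends only on its own line, which is why this kind of job can be spread across many machines.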
Importance of Hadoop
Ability to quickly store and handle large amounts of any type of data
This matters because data volumes and varieties continue to grow, notably from social media and the Internet of Things (IoT).
Computing power
Hadoop's distributed computing model processes large amounts of data efficiently. The more computing nodes you use, the more processing power you have.
Fault tolerance
Data and application processing are protected against hardware failure. If a node fails, jobs are automatically redirected to other nodes, so the distributed computation does not fail. Multiple copies of all data are stored automatically.
Flexibility
Unlike traditional relational databases, we don’t have to preprocess data before storing it. We can store as much data as we want and decide how to use it later. This includes unstructured data like text, pictures, and videos.
Low cost
The open-source framework is free and stores large amounts of data on commodity hardware.
Scalability
By simply adding nodes, we can easily expand our system to handle more data. Little administration is required.
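The fault-tolerance point above (multiple copies of all data, so a node failure does not lose anything) can be illustrated with a toy placement model. This is a sketch under stated assumptions: the round-robin placement and the names `place_blocks` and `readable_blocks` are invented for the example, and real HDFS placement is rack-aware rather than round-robin. The replication factor of 3 matches the HDFS default.

```python
REPLICATION = 3  # HDFS's default replication factor


def place_blocks(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_a", "blk_b", "blk_c", "blk_d"], nodes)

# One node fails: every block still has a live replica.
survivors = readable_blocks(placement, failed_nodes={"node2"})
print(survivors)
```

In this model a block only becomes unreadable when all three of its replica nodes fail at once, which is why adding cheap commodity nodes does not make the cluster fragile.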
Challenges in using Hadoop
1 MAPREDUCE PROGRAMMING ISN'T SUITED FOR EVERY PROBLEM
It performs well for simple information requests and problems that can be broken down into independent units, but it is inefficient for iterative and interactive analytic operations. MapReduce is file-intensive. Because the nodes communicate mainly through sorts and shuffles, iterative algorithms require multiple map-shuffle/sort-reduce phases to complete. Many files are created between MapReduce phases, which is inefficient for advanced analytics computing.
2 THERE’S A WIDELY ACKNOWLEDGED TALENT GAP
It can be difficult to find entry-level programmers who have adequate Java expertise to be productive with MapReduce. That's one reason distribution providers are racing to put relational (SQL) technology on top of Hadoop. Programmers with SQL skills are easier to find than programmers with MapReduce skills. Hadoop administration also seems to be a mix of art and science, requiring a basic understanding of operating systems, hardware, and Hadoop kernel settings.
3 DATA SECURITY
Another concern is fragmented data protection, which is being addressed by new tools and technology. The Kerberos authentication protocol is a significant step toward securing Hadoop environments.
4 FULL-FLEDGED DATA GOVERNANCE AND MANAGEMENT
Hadoop lacks user-friendly, full-featured tools for data management, data cleansing, governance, and metadata services.
Apache Spark
Apache Spark began in 2009 as a research project at UC Berkeley's AMPLab focused on data-intensive application areas.
Apache Spark is an open-source, distributed processing system used for big data workloads.
It uses in-memory caching and efficient query execution to run fast analytic queries against data of any size.
It allows code reuse across different workloads (batch processing, interactive queries, real-time analytics, machine learning, and graph processing) and provides development APIs in Java, Scala, Python, and R.
Spark's objective was to build a new framework optimised for fast iterative processing, such as machine learning and interactive data analysis, while preserving Hadoop MapReduce's scalability and fault tolerance.
Apache Spark matters in the Big Data industry mainly because its in-memory data processing makes it a high-speed processing engine compared to MapReduce.
Apache Spark delivers a better-integrated framework that supports a wide range of Big Data workloads, such as batch data, text data, real-time streaming data, and graph data.
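The contrast between MapReduce's file-based phases and Spark's in-memory processing can be illustrated with a small plain-Python analogy. It is not Spark or Hadoop code and it measures nothing about real performance: the function names, the toy `step` computation, and the JSON intermediate files are all assumptions made for the example.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a toy algorithm: halve every value."""
    return [v / 2 for v in values]


def iterate_via_files(values, iterations, workdir):
    """MapReduce-style: write each iteration's output to disk, then re-read it."""
    files_written = 0
    for i in range(iterations):
        path = os.path.join(workdir, f"iter_{i}.json")
        with open(path, "w") as f:
            json.dump(step(values), f)
        files_written += 1
        with open(path) as f:
            values = json.load(f)
    return values, files_written


def iterate_in_memory(values, iterations):
    """Spark-style: keep the working set in memory between iterations."""
    for _ in range(iterations):
        values = step(values)
    return values


data = [8.0, 16.0, 32.0]
with tempfile.TemporaryDirectory() as workdir:
    disk_result, files_written = iterate_via_files(data, 3, workdir)
memory_result = iterate_in_memory(data, 3)

print(disk_result, files_written)
print(memory_result)
```

Both paths produce the same answer; the difference is that the file-based loop pays one write and one read per iteration, which is the cost that grows with iterative algorithms such as machine learning training.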
Core Components
The Apache Spark framework consists of five main components that are responsible for the functioning of Spark.
Spark SQL and DataFrames: Spark SQL allows users to run SQL and HiveQL (HQL) queries to process structured and semi-structured data.
Spark Streaming: Spark Streaming facilitates the processing of live stream data, e.g. log files. It also contains APIs to manipulate data streams.
MLlib Machine Learning: MLlib is the Spark library with machine learning functionality. It contains various machine learning algorithms such as regression, clustering, collaborative filtering, classification, etc.
GraphX: GraphX is the library that supports graph computation. It enables users to perform graph manipulation and also provides graph computation algorithms.
Apache Spark Core API: It provides the platform on which Spark applications execute.
Advantages
Speed: For large-scale data processing, Spark is often quoted as being up to 100 times faster than Hadoop MapReduce when data fits in memory. Apache Spark uses an in-memory (RAM) processing architecture.
Ease of Use: Apache Spark provides simple APIs for working with big datasets. It has over 80 high-level operators that make creating parallel programs a breeze.
Advanced Analytics: Spark supports more than just map and reduce. Machine learning (ML), graph algorithms, streaming data, SQL queries, and other features are also supported. Apache Spark is faster than most data warehouses.
Dynamic: Apache Spark allows simple creation of parallel apps. Over 80 high-level operators are available through Spark.
Multilingual: Python, Java, Scala, and other programming languages are supported by Apache Spark.
Powerful: Because of its low-latency in-memory data processing capacity, Apache Spark can handle a wide range of analytics problems. It has well-developed libraries for graph analytics and machine learning techniques.
Open-source: The best thing about Apache Spark is that it has a massive open-source community behind it.
THANK YOU


RDBMS vs Hadoop vs Spark
