Presentation Outline
• Big Data
• Hadoop
• Components of Hadoop
• Strengths and Weaknesses of Hadoop
• Projection
• References
Big Data
Big data can be defined as large volumes of data, too complex to be dealt with by traditional processing
technologies.
Organizations want to get insightful information from big data to know their customers and get a
competitive advantage.
[Figure: Data Then vs. Data Now]
Apache Hadoop
• An open-source framework written in Java that
processes big data in parallel and stores it on a
distributed file system spanning machines that are
linked together in clusters.
Why use Hadoop?
• Data is growing in volume and variety at high
velocity.
• Large organizations want to extract value from their
data to increase revenue and profit.
• There is a need for distributed machines on which
big data can be both stored and processed.
Components of Hadoop
Hadoop is made up of individual components that enable it to store
and process data.
1. Hadoop Common
• Hadoop Common provides a collection
of utilities and libraries that support
other Hadoop modules.
• It contains the necessary Java archive
(JAR) files and scripts required to start
Hadoop.
2. HDFS
• Hadoop Distributed File System
(HDFS).
• HDFS handles storage in Hadoop.
• HDFS replicates each block of data
across multiple nodes, which are
grouped into racks in a cluster.
• HDFS uses a master-slave
architecture:
• NameNode: the master node, which
monitors the DataNodes and holds
all file system metadata.
• DataNodes: the slave nodes, which
hold the actual data in the form of
blocks. They regularly report their
status to the NameNode through
heartbeat signals.
• Secondary NameNode: keeps a
periodically checkpointed copy of the
NameNode's metadata on disk.
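The block and replica behaviour described above can be illustrated with a small sketch. This is not HDFS code: the 128 MB block size and replication factor of 3 are assumed values (both are configurable in a real cluster), the function names are hypothetical, and the round-robin placement is a deliberate simplification of the rack-aware placement that HDFS actually performs.

```python
BLOCK_SIZE_MB = 128  # assumed block size; configurable in a real cluster
REPLICATION = 3      # assumed replication factor; also configurable


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Toy round-robin placement: each block lands on `replication` distinct nodes.

    Real HDFS placement is rack-aware; this only shows that every block
    ends up on several different DataNodes.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block_id: [
            data_nodes[(block_id + offset) % len(data_nodes)]
            for offset in range(replication)
        ]
        for block_id in range(num_blocks)
    }


blocks = split_into_blocks(300)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```

Because every block is held by three different DataNodes in this sketch, losing any single node still leaves two copies of each block, which is the basis of the fault tolerance listed among Hadoop's strengths.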
3. MapReduce
• MapReduce is the
component that uses a
simple programming
model to process
huge amounts of data
in a parallel and
distributed manner on
large clusters of
commodity hardware.
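To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using the classic word-count example. It does not use the Hadoop API; the function names are hypothetical, and in a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big Hadoop data"])
```

The model parallelizes well because each call to `map_phase` depends only on its own line and each call to `reduce_phase` depends only on its own key.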
4. YARN
• Yet Another Resource
Negotiator (YARN).
• YARN is responsible for
allocating system resources to
the various applications
running in a Hadoop cluster
and scheduling tasks to be
executed on different cluster
nodes.
• YARN decentralizes
execution and monitoring of
processing jobs by separating
the various responsibilities
into these components:
1. ResourceManager
2. NodeManager
3. ApplicationMaster
4. Containers
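The division of responsibilities can be sketched with a toy model. This is not the YARN API: the classes below are hypothetical stand-ins, memory is the only resource tracked, and the "most free memory first" rule is an assumption standing in for YARN's pluggable schedulers.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical stand-in for a per-node agent that hosts containers."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Hypothetical stand-in for the cluster-wide resource arbiter."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        """Grant a container on the node with the most free memory, or None."""
        node = max(self.nodes, key=lambda n: n.free_mb)
        if node.free_mb < memory_mb:
            return None  # no node can satisfy the request right now
        node.free_mb -= memory_mb
        container = (app_id, node.name, memory_mb)
        node.containers.append(container)
        return container


# An application master would issue requests like these on behalf of its job.
rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
first = rm.allocate("app1", 3072)
second = rm.allocate("app1", 2048)
third = rm.allocate("app1", 2048)
```

Here the third request is refused because no node has 2048 MB left; a real scheduler would typically queue such a request rather than drop it.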
HADOOP
ECOSYSTEM
• As a framework, Hadoop is
made up of several modules that are
supported by a large ecosystem
of technologies.
• These are all services for solving
big data problems.
• The ecosystem includes Apache
projects and various commercial tools
and solutions that supplement or
support the four major
components described in the
preceding slides.
Hadoop Strengths & Weaknesses
Weaknesses
• Hadoop performs poorly when it has to access many small files.
• Not good for real-time data applications because it uses batch
processing.
• Not good when the work cannot be done in parallel or when there
are dependencies in the data.
• Hadoop supports Machine Learning and Artificial Intelligence only to a
limited extent.
• Not good for intensive calculations on small amounts of data.
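The small-files weakness can be illustrated with simple arithmetic. The sketch below assumes a 128 MB block size and a simplified metadata model in which the NameNode tracks one entry per file plus one entry per block; the function name is hypothetical and real bookkeeping is more detailed.

```python
import math


def namenode_objects(num_files, file_size_mb, block_size_mb=128):
    """Count metadata entries under the simplified model described above."""
    blocks_per_file = max(1, math.ceil(file_size_mb / block_size_mb))
    return num_files * (1 + blocks_per_file)  # one file entry + its block entries


one_large_file = namenode_objects(num_files=1, file_size_mb=10240)
many_small_files = namenode_objects(num_files=10240, file_size_mb=1)
```

Both cases store the same 10 GB of data, but under this model the small-file layout needs 20,480 metadata entries against 81 for the single large file, so the load on the NameNode grows with the number of files rather than with the volume of data.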
Strengths
• Low cost: Hadoop runs on commodity hardware and can use any type of
disk storage for data processing.
• Flexibility: Hadoop can deal with structured or unstructured data.
• Fault tolerance: data is replicated on various nodes.
HDFS VS PVFS
Similarities
• Both HDFS and PVFS divide a file into multiple pieces, called chunks in HDFS and
stripe units in PVFS, that are stored on different data servers.
• HDFS and PVFS have a similar high-level design. They are user-level cluster file
systems that store file data and file metadata on different types of servers, i.e., two
different user-level processes that run on separate nodes and use the lower-layer local
file systems for persistent storage.
Differences
• Design goal: HDFS is designed for data-intensive computing; PVFS is designed for
high-performance computing.
• Node layout: HDFS co-locates compute and storage on the same node (beneficial to
the Hadoop/MapReduce model, where computation is moved closer to the data); PVFS uses
separate compute and storage nodes (easy manageability and incremental growth).
• Small files: HDFS is not optimized for small files; PVFS uses few optimizations for
packing small files.
Hadoop Projection
• Big data is increasing exponentially,
hence the urgent need for
technologies that can store, process
and analyze big data in real time
for reasons such as gaining
competitive advantage and increasing
profits and revenue.
• Hadoop is the backbone of big data;
vendors such as Microsoft
Azure, MapR, Databricks and Amazon
Web Services have
developed cutting-edge
technologies on top of it.
Hadoop Projection (continued)
• Major organizations such as NASA, Yahoo and Adobe are moving
towards Apache Spark, which is also a project of the Apache Software
Foundation.
• Spark is an open-source distributed computing engine for
processing and analyzing huge volumes of data in real time.
• Apache Spark is compact and is reported to be up to 100x faster in
memory and 10x faster on disk than Hadoop MapReduce. Its ecosystem
contains well-built features that are continually being improved.
• Spark can perform Machine Learning through its own
MLlib, which performs iterative in-memory ML
computations.
• Apache Spark replaces the MapReduce component of
Hadoop but not Hadoop as a whole.
Thank you!
Advanced Topics (MCS 7106)
APACHE HADOOP
April 12, 2022
