Lecture 2: INTRODUCTION TO HADOOP
Introduction to Hadoop

Learning Objectives
1. To study the features of Hadoop.
2. To learn the basic concepts of HDFS and MapReduce programming.
3. To study the HDFS architecture.
4. To study the MapReduce programming model.
5. To study the Hadoop ecosystem.

Learning Outcomes
a) To comprehend the reasons behind the popularity of Hadoop.
b) To be able to perform HDFS operations.
c) To comprehend the MapReduce framework.
d) To understand reading and writing in HDFS.
e) To be able to understand the Hadoop ecosystem.
Agenda
• Hadoop - An Introduction
• RDBMS versus Hadoop
• Distributed Computing Challenges
• History of Hadoop
• Hadoop Overview
• Key Aspects of Hadoop
• Hadoop Components
• High Level Architecture of Hadoop
• Use Case for Hadoop
• ClickStream Data
• Hadoop Distributors
• HDFS
• HDFS Daemons
• Anatomy of File Read
• Anatomy of File Write
• Replica Placement Strategy
• Working with HDFS Commands
• Special Features of HDFS
Agenda
• Processing Data with Hadoop
• What Is MapReduce Programming?
• How Does MapReduce Work?
• MapReduce Word Count Example
• Managing Resources and Applications with Hadoop YARN
• Limitations of Hadoop 1.0 Architecture
• Hadoop 2 YARN: Taking Hadoop Beyond Batch
• Hadoop Ecosystem
• Pig
• Hive
• Sqoop
• HBase
Hadoop – An Introduction
• Hadoop is an open-source distributed computing framework used for storing and processing large volumes of data.
• It is designed to run on a cluster of commodity hardware, and its main components include a distributed file system (the Hadoop Distributed File System, or HDFS) and a parallel processing framework (MapReduce).
• Its capability to handle massive amounts of data, different categories of
What is Hadoop?
Hadoop is an open-source, Java-based framework from Apache that is used for storing, processing, and analyzing very large volumes of data.
Hadoop is used for batch/offline processing.
It is a collection of software utilities that use a network of many computers to solve problems involving large amounts of data and computation.
Hadoop Overview
• Key Aspects of Hadoop
History of Hadoop
Hadoop was created by Doug Cutting and Mike Cafarella in 2005, inspired by Google's MapReduce and Google File System (GFS) technologies.
Is there a full form of HADOOP?
• No.
• Doug Cutting chose the name for his open-source project because it was relatively easy to spell and pronounce, meaningless, and not used elsewhere.
RDBMS versus HADOOP
Distributed Computing Challenges
Hadoop Components
HBase is a distributed, column-oriented NoSQL database that stores its data in HDFS (essentially a sorted key-value store). Hive is a data warehouse system that executes SQL-like queries (HiveQL) on data stored in Hadoop. Pig is a platform whose high-level data-flow language, Pig Latin, is used to process big data. Apache Sqoop is a tool widely used to transfer large amounts of data between Hadoop and relational database servers, in both directions.
Hadoop Components
• Hadoop Core Components:
• HDFS:
(a) Storage component.
(b) Distributes data across several nodes.
(c) Natively redundant: each block of data is replicated on multiple nodes.
• MapReduce:
(a) Computational framework.
(b) Splits a task across multiple nodes.
(c) Processes data in parallel.
[Figure: Hadoop, made up of HDFS and MapReduce]
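To make the HDFS points above concrete, here is a minimal Python simulation (not Hadoop code) of how a file might be cut into fixed-size blocks and each block copied to several DataNodes. The 128 MB block size and the replication factor of 3 are the usual HDFS defaults (dfs.blocksize, dfs.replication) in Hadoop 2.x and later; the round-robin placement and the names split_into_blocks, place_replicas and dn1 to dn4 are hypothetical simplifications, since real HDFS uses a rack-aware replica placement policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # usual HDFS default in Hadoop 2.x+ (64 MB in Hadoop 1.x)
REPLICATION = 3         # usual default value of dfs.replication


@dataclass
class Block:
    block_id: int
    offset: int          # byte offset of the block within the file
    length: int          # bytes in this block (the last block may be shorter)
    replicas: list[str] = field(default_factory=list)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[Block]:
    """Cut a file of `file_size` bytes into consecutive fixed-size blocks."""
    blocks: list[Block] = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append(Block(len(blocks), offset, length))
        offset += length
    return blocks


def place_replicas(blocks: list[Block], datanodes: list[str],
                   replication: int = REPLICATION) -> list[Block]:
    """Give each block `replication` distinct DataNodes, chosen round-robin.

    Hypothetical simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    for block in blocks:
        start = block.block_id % len(datanodes)
        block.replicas = [datanodes[(start + i) % len(datanodes)]
                          for i in range(replication)]
    return blocks


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    for b in place_replicas(split_into_blocks(300 * MB), nodes):
        print(b.block_id, b.length // MB, "MB", b.replicas)
    # 0 128 MB ['dn1', 'dn2', 'dn3']
    # 1 128 MB ['dn2', 'dn3', 'dn4']
    # 2 44 MB ['dn3', 'dn4', 'dn1']
```

With four DataNodes and three replicas per block, any single node can fail and every block still has at least two surviving copies, which is what "natively redundant" means in practice.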
Hadoop High Level Architecture
• A small Hadoop 1.x cluster consists of a single master node and multiple worker nodes (larger clusters usually run the NameNode and JobTracker on dedicated servers).
• The master node runs the JobTracker, TaskTracker, NameNode, and DataNode, while each slave (worker) node acts as both a DataNode and a TaskTracker.
• It is also possible to have data-only and compute-only worker nodes.
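Running a DataNode and a TaskTracker on the same worker lets the scheduler place a map task on a machine that already stores the task's input block, so that block does not have to cross the network. The sketch below is a simplified, greedy model of that idea, not the actual JobTracker algorithm: the function name assign_map_tasks, the slot counts and the host names w1 to w4 are hypothetical, and a real JobTracker hands out tasks as TaskTrackers send heartbeats and also considers rack locality.

```python
from __future__ import annotations


def assign_map_tasks(block_locations: dict[str, list[str]],
                     free_slots: dict[str, int]) -> dict[str, tuple[str, bool]]:
    """Greedy, locality-first assignment of map tasks (one task per block).

    block_locations: block id -> worker hosts holding a replica (NameNode metadata)
    free_slots:      worker host -> free map slots on that host's TaskTracker
    Returns:         block id -> (assigned host, True if the block is local)
    """
    slots = dict(free_slots)  # work on a copy; leave the caller's dict untouched
    assignment: dict[str, tuple[str, bool]] = {}
    pending: list[str] = []

    # Pass 1: prefer a worker that already stores a replica of the block.
    for block, hosts in block_locations.items():
        host = next((h for h in hosts if slots.get(h, 0) > 0), None)
        if host is None:
            pending.append(block)
        else:
            slots[host] -= 1
            assignment[block] = (host, True)

    # Pass 2: remaining tasks take any free slot and read their block remotely.
    for block in pending:
        host = next((h for h, n in slots.items() if n > 0), None)
        if host is None:
            break  # no capacity left; these tasks would wait for a free slot
        slots[host] -= 1
        assignment[block] = (host, False)

    return assignment


if __name__ == "__main__":
    locations = {"b0": ["w1", "w2"], "b1": ["w2", "w3"], "b2": ["w3"]}
    slots = {"w1": 1, "w2": 1, "w3": 0, "w4": 1}
    print(assign_map_tasks(locations, slots))
    # {'b0': ('w1', True), 'b1': ('w2', True), 'b2': ('w4', False)}
```

Because both passes are greedy, this sketch does not guarantee the maximum number of data-local tasks; it only illustrates why compute and storage share the same worker nodes.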
Modules of Hadoop
• The Hadoop framework is composed of the following modules:
• Hadoop Distributed File System (HDFS): Files are broken into blocks, which are stored on nodes across a distributed architecture. Using a distributed file system provides very high aggregate bandwidth across the cluster.
• Hadoop YARN (Yet Another Resource Negotiator): Used for job scheduling and for managing the computing resources of the cluster.
• Hadoop MapReduce: A programming model that divides a job into small tasks, assigns those tasks to many computers connected over the network, and combines their results into the final output dataset.
• Hadoop Common: The Java libraries and utilities that are used to start Hadoop and are needed by the other Hadoop modules.
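The MapReduce idea described above (split the work, process the pieces in parallel, combine the results) can be illustrated with the classic word count example listed in the agenda. The following is a single-process Python simulation of the three stages, map, shuffle/sort and reduce, written for teaching purposes only; it does not use the Hadoop API, and the function names map_phase, shuffle, reduce_phase and word_count are hypothetical. On a real cluster, mappers and reducers run as separate tasks on different nodes, and the framework performs the shuffle over the network.

```python
from __future__ import annotations

import re
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one line of input."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, list[int]]]:
    """Shuffle/sort: group every emitted value by key, keys in sorted order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: add up all the 1s emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped))


if __name__ == "__main__":
    text = ["Deer Bear River", "Car Car River", "Deer Car Bear"]
    print(word_count(text))
    # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

In a Java Hadoop job, the same logic would live in subclasses of org.apache.hadoop.mapreduce.Mapper and Reducer, with the framework handling the shuffle between them.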
ClickStream Data Analysis
• ClickStream data (the record of a user's mouse clicks while browsing a website) helps you understand the purchasing behavior of customers. ClickStream analysis helps online marketers optimize their product web pages, promotional content, etc. to improve their business.
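As a small illustration of the kind of question ClickStream analysis answers, the sketch below counts, for each page, the total number of clicks and the number of distinct visitors, using the same map, group-by-key and reduce pattern that Hadoop MapReduce applies at scale. The comma-separated log format (timestamp, user id, page URL), the sample records and the function names are hypothetical; real click logs are usually richer and would be processed as a distributed job rather than in one Python process.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_click(record: str) -> Iterator[tuple[str, str]]:
    """Mapper: parse 'timestamp,user_id,page_url' and emit (page_url, user_id)."""
    _timestamp, user, page = record.strip().split(",")
    yield page, user


def reduce_page(page: str, users: list[str]) -> tuple[str, dict[str, int]]:
    """Reducer: total clicks and distinct visitors for one page."""
    return page, {"clicks": len(users), "unique_users": len(set(users))}


def analyze(records: Iterable[str]) -> dict[str, dict[str, int]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:                  # map, then group values by page
        for page, user in map_click(record):
            groups[page].append(user)
    return dict(reduce_page(page, users) for page, users in sorted(groups.items()))


if __name__ == "__main__":
    logs = [  # hypothetical sample records
        "2024-01-01T10:00,u1,/home",
        "2024-01-01T10:01,u1,/product/42",
        "2024-01-01T10:02,u2,/home",
        "2024-01-01T10:03,u2,/product/42",
        "2024-01-01T10:04,u2,/product/42",
        "2024-01-01T10:05,u3,/home",
    ]
    print(analyze(logs))
    # {'/home': {'clicks': 3, 'unique_users': 3}, '/product/42': {'clicks': 3, 'unique_users': 2}}
```

Comparing clicks with unique visitors is one simple way a marketer might spot product pages that attract repeat views.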
Hadoop Distributors
