
Introduction to Hadoop : A bird eye's view | Abhishek Mukherjee


Introduction to Hadoop and MapReduce: 'a head start on the algorithm'.

Published in: Technology

  1. INTRODUCTION TO HADOOP
  2. Hello! I am Abhishek Mukherjee
  3. OVERVIEW
     ▷What is Hadoop?
     ▷Short intro to the HDFS architecture
     ▷What is MapReduce?
     ▷The components of the MapReduce algorithm
  4. What is Hadoop? Let’s start with the 1st set of slides
  5. “Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.”
  6. Short introduction to HDFS: Let’s start with the 2nd set of slides
  7. Each of these platforms has its own uses and will be dealt with in detail in our upcoming presentations
  8. HDFS Architecture
     ▷Follows a master-slave architecture
     ▷The NameNode is the master and the DataNodes are the slaves
  9. MapReduce algorithm: Let’s start with the 3rd set of slides
  10. Delving into the algorithm. Use case: word count
  11. Phases of MapReduce
      ▷Map Phase
      ▷Combiner Phase (optional)
      ▷Sort Phase
      ▷Shuffle Phase
      ▷Partition Phase (optional)
      ▷Reducer Phase
  12. Map Phase
      Take this as an input file:
        Hello my name is abhishek Hello my name is utsav
        Hello my passion is cricket
      ▷This file has 2 lines
      ▷Each line in the file has its own byte offset, which serves as the input key to the mapper; the input value is the text of that line
  13. Operation on output of map phase
      Output of mapper:
        Hello 1, my 1, name 1, is 1, abhishek 1, Hello 1, my 1, name 1, is 1, utsav 1
        Hello 1, my 1, passion 1, is 1, cricket 1
      Input to reducer:
        abhishek(1) cricket(1) Hello(1,1,1) is(1,1,1) my(1,1,1) name(1,1) passion(1) utsav(1)
  14. Explanation of sort and shuffle phase
      ▷Sort the key-value pairs by key
      ▷Shuffle the mapped output so that all values with the same key are collected into one tuple
      ▷This output is fed to the reducer, which reduces each tuple by returning a single value for the list of values it contains
  15. Reducer phase
      Reducer input:
        Hello(1,1,1) my(1,1,1) name(1,1) is(1,1,1) abhishek(1) utsav(1) passion(1) cricket(1)
      Reducer output:
        abhishek(1) cricket(1) Hello(3) is(3) my(3) name(2) passion(1) utsav(1)
  16. Thanks! Any questions? You can find me at: scobbyabhi9@gmail.com
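
The word-count walkthrough in slides 12 to 15 can be sketched in plain Python. This is a minimal single-process simulation under stated assumptions, not Hadoop code: the names read_records, map_phase, shuffle_and_sort, reduce_phase and word_count are illustrative and do not come from the slides or from any Hadoop API. The case-insensitive key ordering is chosen only to match the order shown on the slides.

```python
from collections import defaultdict

# The two-line input file from the Map Phase slide.
INPUT_TEXT = (
    "Hello my name is abhishek Hello my name is utsav\n"
    "Hello my passion is cricket\n"
)


def read_records(text):
    """Yield (byte_offset, line) pairs: the key/value input to the mapper."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def map_phase(offset, line):
    """Emit (word, 1) for every word; the byte-offset key is not needed here."""
    for word in line.split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Group all values that share a key, then order the groups by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Assumption: case-insensitive order, to match the slides.
    # A byte-wise comparison would place "Hello" before "abhishek".
    return sorted(grouped.items(), key=lambda item: item[0].lower())


def reduce_phase(key, values):
    """Collapse the list of values for one key into a single count."""
    return key, sum(values)


def word_count(text):
    """Run map, shuffle/sort and reduce in sequence on the given text."""
    mapped = [
        pair
        for offset, line in read_records(text)
        for pair in map_phase(offset, line)
    ]
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    for word, count in word_count(INPUT_TEXT):
        print(f"{word}({count})")
```

Running the script prints one word(count) entry per line, in the same notation as the reducer output on slide 15. The optional combiner and partition phases are left out of this sketch because a single process has no network traffic to reduce and only one reducer to route to.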
