1. Difference Between Hadoop 2 and Hadoop 3
| Feature | Hadoop 2.x | Hadoop 3.x |
|---|---|---|
| License | Apache 2.0, open source | Apache 2.0, open source |
| Minimum supported Java version | Java 7 | Java 8 |
| Fault tolerance | Handled by replication, which is costly in storage space. | Can be handled by erasure coding. |
| Data balancing | Uses the HDFS balancer. | Also provides an intra-DataNode balancer, invoked via the hdfs diskbalancer CLI. |
| Storage scheme | Uses a 3x replication scheme. | Supports erasure coding in HDFS. |
| Storage overhead | 200% overhead in storage space with 3x replication. | Only 50% overhead with erasure coding. |
| Storage overhead example | 6 data blocks occupy the space of 18 blocks because of the replication scheme. | 6 data blocks occupy the space of 9 blocks: 6 data blocks plus 3 parity blocks. |
| YARN timeline service | Uses the older timeline service, which has scalability issues. | Introduces timeline service v2, which improves scalability and reliability. |
| Default port range | Some default ports fall within the Linux ephemeral port range, so they can fail to bind at startup. | These ports have been moved out of the ephemeral range. |
| Tools | Hive, Pig, Giraph, and other Hadoop tools. | Hive, Pig, Tez, Hama, Giraph, and other Hadoop tools are available. |
| Compatible file systems | HDFS (default FS); FTP file system, which stores all its data on remotely accessible FTP servers; Amazon S3 (Simple Storage Service) file system; Windows Azure Storage Blobs (WASB) file system. | All of the previous ones, plus the Microsoft Azure Data Lake file system. |
| DataNode resources | Not dedicated to MapReduce; they can be used by other applications. | Likewise, DataNode resources can be used by other applications. |
| MR API compatibility | The MR API is compatible with Hadoop 1.x programs, which can run on Hadoop 2.x. | The MR API is likewise compatible with Hadoop 1.x programs, which can run on Hadoop 3.x. |
| Support for Microsoft Windows | Can be deployed on Windows. | Also supports Microsoft Windows. |
| Slots / containers | Hadoop 1 works on the concept of slots, whereas Hadoop 2.x works on the concept of containers, in which generic tasks can run. | Also works on the concept of containers. |
| Single point of failure | Has features to overcome the SPOF: when the NameNode fails, it recovers automatically. | Has features to overcome the SPOF: when the NameNode fails, it recovers automatically, with no manual intervention needed. |
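The storage overhead figures in the comparison above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes the 6 data + 3 parity layout used in the example, with each group of up to 6 data blocks receiving its own 3 parity blocks.

```python
def replication_footprint(data_blocks: int, replicas: int = 3) -> int:
    """Total blocks stored when every data block is kept `replicas` times."""
    return data_blocks * replicas


def erasure_coding_footprint(data_blocks: int, data_units: int = 6,
                             parity_units: int = 3) -> int:
    """Total blocks stored under an assumed (data_units + parity_units) layout.

    Each group of up to `data_units` data blocks gets `parity_units`
    parity blocks.
    """
    groups = -(-data_blocks // data_units)  # ceiling division
    return data_blocks + groups * parity_units


def overhead_percent(data_blocks: int, stored_blocks: int) -> float:
    """Extra storage beyond the raw data, as a percentage of the raw data."""
    return 100.0 * (stored_blocks - data_blocks) / data_blocks


if __name__ == "__main__":
    data = 6
    replicated = replication_footprint(data)
    erasure_coded = erasure_coding_footprint(data)
    print("3x replication:", replicated, "blocks,",
          overhead_percent(data, replicated), "% overhead")
    print("erasure coding:", erasure_coded, "blocks,",
          overhead_percent(data, erasure_coded), "% overhead")
```

For 6 data blocks this gives 18 stored blocks under 3x replication and 9 under the 6 + 3 layout, matching the 200% and 50% overhead figures.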
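The default port change can be illustrated with a small range check. This is a sketch under stated assumptions: the ephemeral range 32768 to 60999 is a common Linux default (the actual value on a host is in /proc/sys/net/ipv4/ip_local_port_range), the port pairs are a few well-known HDFS defaults chosen for illustration and not a complete list, and the helper name is hypothetical.

```python
# Assumed Linux ephemeral port range; verify on a real host with
# `cat /proc/sys/net/ipv4/ip_local_port_range`.
EPHEMERAL_RANGE = (32768, 60999)

# Illustrative default ports: service -> (Hadoop 2.x, Hadoop 3.x).
DEFAULT_PORTS = {
    "NameNode HTTP UI": (50070, 9870),
    "DataNode data transfer": (50010, 9866),
    "DataNode HTTP UI": (50075, 9864),
}


def in_ephemeral_range(port: int, port_range: tuple = EPHEMERAL_RANGE) -> bool:
    """Return True if `port` could collide with an OS-assigned ephemeral port."""
    low, high = port_range
    return low <= port <= high


if __name__ == "__main__":
    for service, (v2_port, v3_port) in DEFAULT_PORTS.items():
        print(service,
              "| 2.x port", v2_port, "ephemeral:", in_ephemeral_range(v2_port),
              "| 3.x port", v3_port, "ephemeral:", in_ephemeral_range(v3_port))
```

A port inside the ephemeral range may already be in use by an outgoing connection when the daemon starts, which is why binding can fail; ports below the range avoid that collision.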