1. Hadoop Versioning
July 9, 2012
Anty Rao
Big Data Engineering Team
Hanborq Inc.
2. Development Convention
• Trunk
– The main codeline; new features are developed on trunk
• Branch
– Occasionally, very large features are developed on
their own branches, with the expectation that they will
later be merged into trunk.
• Release
– Candidate releases are branched from trunk
– The release branch stops accepting new features
– Bugs get fixed, and after a vote, a release is
declared from that branch.
3. Hadoop 0.20 branch
• Two major features were added to branches
off 0.20.2
– Authentication
• Enabling strong security for core Hadoop
– Append
• Enabling users to run Apache HBase without risk of
data loss
5. Confusion about Versions
• Releases off the 0.20 branches had features
that releases off trunk did not have, and
vice versa.
• Apache Hadoop 0.23 has a strict superset of
the features in 0.22, yet it was released a
month before 0.22.
• The 0.20-branch release formerly known as
0.20.205 was renumbered 1.0. This is only a
renumbering; there is no functional difference.
6. Status
• There has been an 18-month period in which no
single Apache release contained all the
committed features of Apache Hadoop!
• The recently released Hadoop 1.0 includes the
following features:
– 0.20 Append
– 0.20 Security