Tom White presented on the future of Hadoop at the Hadoop User Group UK meeting in Bristol. Key goals for Hadoop include modularity, support for multiple languages, and integration with other systems. Hadoop Core was split into three projects, Common, HDFS, and MapReduce, each with its own repository and mailing lists. Upcoming releases include 0.20.1 and 0.21, with 1.0 to follow and establish rules for version evolution. Interesting projects include using Avro for RPC, distributed configuration, and improving MapReduce performance.
2. Hadoop Futures
What to watch
Tom White, Cloudera
Hadoop User Group UK, Bristol
10 August 2009
3. About me
▪ Apache Hadoop Committer, PMC Member, Apache Member
▪ Employed by Cloudera
▪ Author of “Hadoop: The Definitive Guide”
▪ http://hadoopbook.com
4. Goals
▪ Modular
▪ E.g. pluggable block placement algorithm
▪ Multiple languages
▪ E.g. not just Java for MapReduce
▪ Integration with other systems
▪ E.g. JMX monitoring hooks
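As an illustration of what a JMX monitoring hook can look like, here is a minimal sketch that uses only the standard `javax.management` API from the JDK. The names `JmxHookSketch`, `BlockStats`, `BlockStatsMBean`, and the object name `example.sketch:type=BlockStats` are hypothetical and chosen for this example; they are not Hadoop's actual MBeans.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: these classes and the object name are illustrative,
// not part of Hadoop.
class JmxHookSketch {

    // Standard MBean convention: the management interface is named
    // <ImplementationClass>MBean and must be public.
    public interface BlockStatsMBean {
        long getBlocksWritten();
    }

    // The component being monitored; it exposes one read-only attribute.
    public static class BlockStats implements BlockStatsMBean {
        private final AtomicLong blocksWritten = new AtomicLong();

        public void recordBlockWritten() {
            blocksWritten.incrementAndGet();
        }

        @Override
        public long getBlocksWritten() {
            return blocksWritten.get();
        }
    }

    // Registers the MBean, updates the counter twice, and reads the
    // attribute back through the MBean server, as a JMX client would.
    static long registerAndRead() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.sketch:type=BlockStats");
        if (server.isRegistered(name)) {
            server.unregisterMBean(name); // keep the sketch re-runnable
        }
        BlockStats stats = new BlockStats();
        server.registerMBean(stats, name);

        stats.recordBlockWritten();
        stats.recordBlockWritten();

        // The getter getBlocksWritten() is exposed as attribute "BlocksWritten".
        return (Long) server.getAttribute(name, "BlocksWritten");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("BlocksWritten = " + registerAndRead());
    }
}
```

Once registered, an attribute like this can be inspected by any JMX client, for example JConsole, without the monitored component knowing anything about the monitoring tool.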
5. The Project Split
▪ Core -> Common, HDFS, MapReduce
▪ New repositories
▪ New mailing lists
▪ {common,hdfs,mapreduce}-{user,dev,issues}@hadoop.apache.org
▪ New directory layouts
▪ New configuration
▪ hadoop-site.xml -> core-site.xml, hdfs-site.xml, mapred-site.xml
▪ More information at
▪ http://www.cloudera.com/blog/2009/07/17/the-project-split/
▪ general@hadoop.apache.org
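To make the configuration change concrete, here is a rough sketch of how the properties of an old hadoop-site.xml might be sorted into the per-project files, using the file names shipped with 0.20 (core-site.xml, hdfs-site.xml, mapred-site.xml). The class `ConfigSplitSketch` and its prefix rule are assumptions made for this example: it routes `dfs.*` to HDFS, `mapred.*` to MapReduce, and everything else to core. This is only a heuristic; the authoritative assignment is whichever defaults file of the release lists the property.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop. The prefix rule is an assumption.
class ConfigSplitSketch {

    // Picks a target file from the property name prefix.
    static String targetFile(String property) {
        if (property.startsWith("dfs.")) {
            return "hdfs-site.xml";
        }
        if (property.startsWith("mapred.")) {
            return "mapred-site.xml";
        }
        return "core-site.xml"; // fs.*, hadoop.*, io.* and anything unrecognized
    }

    // Groups the properties of a single hadoop-site.xml by target file.
    static Map<String, Map<String, String>> split(Map<String, String> hadoopSite) {
        Map<String, Map<String, String>> byFile = new TreeMap<>();
        for (Map.Entry<String, String> e : hadoopSite.entrySet()) {
            byFile.computeIfAbsent(targetFile(e.getKey()), k -> new LinkedHashMap<>())
                  .put(e.getKey(), e.getValue());
        }
        return byFile;
    }

    public static void main(String[] args) {
        // Example values are placeholders.
        Map<String, String> hadoopSite = new LinkedHashMap<>();
        hadoopSite.put("fs.default.name", "hdfs://namenode:8020");
        hadoopSite.put("hadoop.tmp.dir", "/tmp/hadoop");
        hadoopSite.put("dfs.replication", "3");
        hadoopSite.put("mapred.job.tracker", "jobtracker:8021");

        for (Map.Entry<String, Map<String, String>> file : split(hadoopSite).entrySet()) {
            System.out.println(file.getKey() + " <- " + file.getValue().keySet());
        }
    }
}
```

The result of such a pass should still be reviewed by hand, since a prefix rule can misplace properties that do not follow the naming pattern.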
6. Releases
▪ 0.18.3 - 29 Jan 2009
▪ Official “stable” release
▪ Probably the most commonly used
▪ Basis for first Cloudera distribution
▪ 0.19.2 - 23 July 2009
▪ 0.19 series is not widely used
▪ 0.20.0 - 22 April 2009
▪ Expect large adoption with 0.20.1 release in coming weeks
▪ Basis for second Cloudera distribution, first Yahoo! distribution
▪ 0.21 series - feature freeze end of August 2009
7. Hadoop 1.0
▪ After 0.21 release
▪ Need to establish rules about version evolution
▪ Hadoop 1.0 Interface Classification - HADOOP-5073
▪ API, Data, wire protocol compatibility - HADOOP-5071