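The long-awaited stable release of Hadoop 2.2.0 finally arrived on October 15, 2013. The Hadoop 1.0 era exposed many problems, and 2.0 sets out to solve most of them. This is the inevitable direction, and I believe 2.0 will soon replace 1.0. Below are the main release notes, taken from the official website; see the official documentation for details.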
- YARN - A general purpose resource management system for Hadoop to allow MapReduce and other data processing frameworks and services
- High Availability for HDFS
- HDFS Federation
- HDFS Snapshots
- NFSv3 access to data in HDFS
- Support for running Hadoop on Microsoft Windows
- Binary Compatibility for MapReduce applications built on hadoop-1.x
- Substantial amount of integration testing with rest of projects in the ecosystem
A couple of important points to note while upgrading to hadoop-2.2.0:
- HDFS - The HDFS community decided to push the symlinks feature out to a future 2.3.0 release, and it is currently disabled.
- YARN/MapReduce - Users need to change ShuffleHandler service name from mapreduce.shuffle to mapreduce_shuffle.
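Interestingly, one fairly big change in the list above is support for running on Windows. I downloaded the source, compiled a 64-bit build, and tested it on CentOS; it ran noticeably faster, which I found exciting. One thing to watch out for: in the yarn-site.xml configuration file, the value of the yarn.nodemanager.aux-services property must change from the old mapreduce.shuffle to mapreduce_shuffle, otherwise running the bundled wordcount example fails with an error. I was short on time, so this post is written rather casually; I will write more detailed tutorials when I have time.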
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
  <description>shuffle service that needs to be set for Map Reduce to run</description>
</property>
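Since the old service name is easy to miss when upgrading, a small check can flag it before a job is run. The following is only a rough sketch in Python, not part of Hadoop: the function names and the returned status strings are made up for illustration, and it assumes yarn-site.xml follows the usual property/name/value layout shown above.

```python
import sys
import xml.etree.ElementTree as ET

# Service names taken from the release notes quoted above.
OLD_NAME = "mapreduce.shuffle"
NEW_NAME = "mapreduce_shuffle"
AUX_KEY = "yarn.nodemanager.aux-services"


def read_properties(xml_text):
    """Collect name -> value pairs from every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_aux_services(xml_text):
    """Return a hypothetical status string: 'missing', 'old', 'ok', or 'other'."""
    value = read_properties(xml_text).get(AUX_KEY)
    if value is None:
        return "missing"
    # The property may hold a comma-separated list of services.
    services = [s.strip() for s in value.split(",")]
    if OLD_NAME in services:
        return "old"
    if NEW_NAME in services:
        return "ok"
    return "other"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_yarn_site.py path/to/yarn-site.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        print(check_aux_services(f.read()))
```

A result of "old" means the file still uses the 1.x-style name and should be changed to mapreduce_shuffle before running the wordcount example.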