Hadoop 2.2.0: Compilation and Installation Steps (on 64-bit CentOS) (to be continued)
Highlight: the first stable release built on the YARN compute framework and a high-availability HDFS.
Note 1: the official site only provides a 32-bit release; on a 64-bit machine you must compile it yourself.
Note 2: almost every Hadoop 2.2 installation guide currently circulating online has problems; none of them is completely correct. If you do not understand the internals of the new framework, do not blindly copy them.
I. Compiling Hadoop 2.2 (username: hadoop)
Since our CentOS install is 64-bit and the official Hadoop 2.2.0 release ships no 64-bit package, we have to build it ourselves.
First, download the 64-bit JDK from Oracle:
$ su root
# wget http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz
Note: a % prompt means an ordinary user and # means root; watch the prompt type in each step below.
The Hadoop build steps follow. (The boxed excerpts in between are supplementary notes only; do not execute the commands inside them.)
(1) Change BOOTPROTO to "dhcp"
$ su root
# sed -i 's/static/dhcp/g' /etc/sysconfig/network-scripts/ifcfg-eth0
# service network restart
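Before touching the real ifcfg-eth0, the substitution can be rehearsed on a scratch file; a minimal sketch (the file below is a mock, not the real network config):

```shell
# Work on a throwaway copy instead of the live config
cfg=$(mktemp)
printf 'DEVICE=eth0\nBOOTPROTO=static\nONBOOT=yes\n' > "$cfg"

# Same in-place substitution as the step above: static -> dhcp
sed -i 's/static/dhcp/g' "$cfg"

proto=$(grep '^BOOTPROTO=' "$cfg")
echo "$proto"   # BOOTPROTO=dhcp
rm -f "$cfg"
```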
(2) Download the Hadoop 2.2.0 source
# su hadoop
$ cd ~
$ wget http://apache.dataguru.cn/hadoop/common/stable/hadoop-2.2.0-src.tar.gz
(3) Install Maven
$ su root
# cd /opt
# wget http://apache.fayea.com/apache-mirror/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
# tar zxvf apache-maven-3.1.1-bin.tar.gz
# cd apache-maven-3.1.1
Set the system environment variables.
There are two ways: edit /etc/profile directly, or drop a custom shell file into /etc/profile.d/. Given how critical the profile file is, avoid adding to it directly; the second approach is the officially recommended one and keeps /etc/profile untouched.
We use the second approach here:
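Why the second approach works: on CentOS, /etc/profile ends with a loop that sources every *.sh file under /etc/profile.d/. That mechanism can be demonstrated in a throwaway directory without touching the system (all paths below are temporary stand-ins):

```shell
# Simulate /etc/profile.d/ in a temp directory
pd=$(mktemp -d)
cat > "$pd/maven.sh" <<'EOF'
export MAVEN_HOME=/opt/apache-maven-3.1.1
PATH=$MAVEN_HOME/bin:$PATH
EOF

# This loop mirrors what the stock /etc/profile does with /etc/profile.d/*.sh
for f in "$pd"/*.sh; do
    [ -r "$f" ] && . "$f"
done

echo "$MAVEN_HOME"   # /opt/apache-maven-3.1.1
rm -rf "$pd"
```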
Create a simple shell script and put the settings in it:
# cd /etc/profile.d/
# touch maven.sh
Contents of maven.sh:
# cat maven.sh
# environment variable settings for maven
export MAVEN_HOME=/opt/apache-maven-3.1.1
PATH=$MAVEN_HOME/bin:$PATH
Next:
# source /etc/profile
# mvn -version
This should print the version banner "Apache Maven 3.1.1".
(4) Install protobuf
Note the hint on the Apache site: "NOTE: You will need protoc 2.5.0 installed."
$ su root
# cd /opt
# wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2
# tar xvf protobuf-2.5.0.tar.bz2    (mind the archive suffix: the Maven package was gzip-compressed and needed -z; this one is bzip2)
# cd protobuf-2.5.0
# ./configure
Annoyingly, configure fails: configure: error: C++ preprocessor "/lib/cpp" fails sanity check
Install gcc (this particular error is about the C++ toolchain, so install gcc-c++ as well):
# yum install gcc gcc-c++
Then rerun ./configure, followed by make && make install, so that protoc is actually on the system.
(5) Build Hadoop
First download the Hadoop 2.2.0 source code from the official site (skip this if you already fetched it in step (2)):
$ su hadoop
$ cd ~hadoop/
$ wget http://apache.dataguru.cn/hadoop/common/stable/hadoop-2.2.0-src.tar.gz
Now for the painful build process.
Unpack it:
% tar zxvf hadoop-2.2.0-src.tar.gz
% cd hadoop-2.2.0-src
Note: the quoted excerpt below is explanatory only; do not execute it!
Here is what the official documentation says:
You should be able to obtain the MapReduce tarball from the release. If not, you should be able to create a tarball from the source.
$ mvn clean install -DskipTests
$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly -Pnative
NOTE: You will need protoc 2.5.0 installed.
To ignore the native builds in mapreduce you can omit the -Pnative argument for maven. The tarball should be available in target/ directory.
See how concise that is, as if everything will just work.
Fine; follow the official instructions:
$ mvn clean install -DskipTests
Result: all sorts of errors, and painfully slow downloads.
Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.1:build-classpath (build-classpath)…
After some digging, the GFW looked like the culprit: the default repository was probably blocked. Switching Maven's default mirror to a domestic one fixed it.
Steps:
Step 1. Switch to root and edit /opt/apache-maven-3.1.1/conf/settings.xml
% su root
# vim /opt/apache-maven-3.1.1/conf/settings.xml
(1) Add a domestic mirror inside <mirrors>…</mirrors> (only the inner <mirror> block is new; the enclosing <mirrors> tags already exist):
<mirrors>
  <mirror>
    <id>nexus-osc</id>
    <mirrorOf>*</mirrorOf>
    <name>Nexus osc</name>
    <url>http://maven.oschina.net/content/groups/public/</url>
  </mirror>
</mirrors>
(2) Add the following inside the <profiles> tag (leave <profiles>…</profiles> itself alone; add only the <profile> block below):
<profile>
  <id>jdk-1.7</id>
  <activation>
    <jdk>1.7</jdk>
  </activation>
  <repositories>
    <repository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>
</profile>
Step 2.
# su hadoop
$ sudo cp /opt/apache-maven-3.1.1/conf/settings.xml ~/.m2/
(If you are told the user is not in the sudoers file, do the following:
$ su root
Add the current user around line 99 of /etc/sudoers (the example below uses the user grid; substitute your own username, and do not type the line numbers — they are only for orientation):
# cat /etc/sudoers
98 root ALL=(ALL) ALL
99 grid ALL=(ALL) ALL
)
Now run:
$ mvn clean install -DskipTests
After a long wait, everything installs cleanly.
Continue building.
Note: the excerpt below is explanatory only; do not follow it!
The official build steps are:
$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly -Pnative
It compiled for a long time and ended with an ERROR. The official docs say:
To ignore the native builds in mapreduce you can omit the -Pnative argument for maven. The tarball should be available in target/ directory.
So try the ignore-native route:
$ mvn clean install assembly:assembly
Again, all sorts of errors.
Googling shows that most people build with the following instead (run this directly):
$ cd hadoop-2.2.0-src
$ mvn package -Pdist,native -DskipTests -Dtar
Another long wait……
In the end there are two typical errors:
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1 -> [Help 1]…
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake"
These errors point at three missing packages:
cmake
ncurses-devel
openssl-devel
% su root
# yum install ncurses-devel
# yum install openssl-devel
# yum install cmake
With those installed, switch back to the hadoop user:
# su hadoop
$ cd ~/hadoop-2.2.0-src
Build again:
$ mvn package -Pdist,native -DskipTests -Dtar
After another long wait, check the result:
Everything is fine. The Hadoop 2.2.0 build is complete.
Verification:
Now verify that the build output is what we expect. We are currently in ~/hadoop-2.2.0-src:
$ cd hadoop-dist/
$ ls
pom.xml target
pom.xml is the Maven build configuration.
% cd target
$ ls -l
antrun
dist-tar-stitching.sh
hadoop-2.2.0.tar.gz
hadoop-dist-2.2.0-javadoc.jar
maven-archiver
dist-layout-stitching.sh
hadoop-2.2.0
hadoop-dist-2.2.0.jar
javadoc-bundle-options
test-dir
These are the files Maven generated. Enter hadoop-2.2.0:
$ cd hadoop-2.2.0
$ ls
bin etc include lib libexec sbin share
This matches the directory layout of the official 2.2.0 release (which only ships 32-bit).
Two things to verify:
a. The version number
$ bin/hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /home/grid/yarn/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
This shows the Hadoop version number, the build tool (protoc 2.5.0, matching the official requirement) and the build date.
b. The word size of the Hadoop native libraries
% file lib/native/*
lib/native/libhadoop.a: current ar archive
lib/native/libhadooppipes.a: current ar archive
lib/native/libhadoop.so: symbolic link to `libhadoop.so.1.0.0'
lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
lib/native/libhadooputils.a: current ar archive
lib/native/libhdfs.a: current ar archive
lib/native/libhdfs.so: symbolic link to `libhdfs.so.0.0.0'
lib/native/libhdfs.so.0.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
The ELF 64-bit LSB lines prove the initial 64-bit Hadoop 2.2.0 build succeeded. Run the same check on our earlier Hadoop 0.20.3 install and you will find its lib/native/libhadoop.so.1.0.0 is 32-bit, which is wrong for this machine. ^_^
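If you want to script this check, grepping the `file` output is enough. A small sketch (is_64bit is an illustrative helper, and the two sample lines below are canned `file` outputs, not live ones):

```shell
# Succeeds when a line of `file` output describes a 64-bit ELF object
is_64bit() {
    case "$1" in
        *"ELF 64-bit"*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Canned samples of the two outcomes discussed above
line64='lib/native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64'
line32='lib/native/libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386'

is_64bit "$line64" && echo "64-bit build"
```

In a real run you would feed it `file lib/native/libhadoop.so.1.0.0` instead of the canned strings.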
II. Configuring Hadoop 2.2
(1) Home directory setup
To distinguish it from MRv1, name the 2.2 home directory yarn:
# su hadoop
$ cd ~
$ mkdir -p yarn/yarn_data
$ cp -a ~hadoop/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0 ~hadoop/yarn
(2) Environment variables
Add the new variables to .bashrc:
# java env
export JAVA_HOME="/usr/java/jdk1.7.0_45"
export PATH="$JAVA_HOME/bin:$PATH"
# hadoop variable settings
export HADOOP_HOME="$HOME/yarn/hadoop-2.2.0"
export HADOOP_PREFIX="$HADOOP_HOME/"
export YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
Two things to watch:
1. The JDK must be 64-bit.
2. HADOOP_PREFIX matters a great deal. It exists mainly for MRv1 compatibility and has the highest precedence: when looking for the conf directory, the startup scripts check $HADOOP_PREFIX/conf/ first even when HADOOP_CONF_DIR is set. Make sure it is configured correctly!
If you want to run the MRv1 framework on 2.2, this variable is a convenient way to point at the MRv1 conf files; otherwise, comment it out to avoid unpredictable errors.
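A quick sanity check before starting any daemons can catch a missing variable early. A hedged sketch (check_env is an illustrative helper, not part of Hadoop, and the exported values are demo stand-ins for what ~/.bashrc would normally provide):

```shell
# Demo values only; in practice these come from ~/.bashrc
export JAVA_HOME="/usr/java/jdk1.7.0_45"
export HADOOP_HOME="$HOME/yarn/hadoop-2.2.0"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"

# Succeeds only if every named variable is set and non-empty
check_env() {
    for v in "$@"; do
        eval "val=\${$v:-}"
        if [ -z "$val" ]; then
            echo "missing: $v" >&2
            return 1
        fi
    done
    return 0
}

check_env JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR && echo "env ok"
```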
(3) Fixing a bug in the official startup scripts
Even though this is a stable release, it still ships with an embarrassingly silly bug.
Note: the analysis below is background only; you do not need to execute it!
Normally you can specify nodes when starting daemons with $YARN_HOME/sbin/hadoop-daemons.sh.
Its usage text says:
% $YARN_HOME/sbin/hadoop-daemons.sh
Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args...
So --hosts should accept a file listing the node names to start.
But it does not work: the $YARN_HOME/libexec/hadoop-config.sh script it calls has a bug.
Run the startup script:
% $YARN_HOME/sbin/hadoop-daemons.sh --hosts my_datanodes start datanode
cat: /home/grid/yarn/hadoop-2.2.0/etc/hadoop//126571: No such file or directory
Tracing through the scripts leads to line 96 of $YARN_HOME/libexec/hadoop-config.sh:
96 export HADOOP_SLAVES="${HADOOP_CONF_DIR}/%$1"
The stray % is the bug; change the line to:
96 export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
Now run:
$ hadoop-daemons.sh --hosts nodes start datanode
Slave: starting datanode, logging to /home/grid/yarn/hadoop-2.2.0//logs/hadoop-grid-datanode-Slave.out
Remark 1: this release has been out since early November, and no tutorial I have seen, Chinese or English, mentions this bug.
Remark 2: the hostfile logic in $YARN_HOME/libexec/hadoop-config.sh is clumsy:
if [ "--hosts" = "$1" ]
then
  shift
  export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
  shift
fi
Because it unconditionally prefixes ${HADOOP_CONF_DIR}/, your hostfile must live under ${HADOOP_CONF_DIR}/; any other location is ignored.
Remark 3: by the same logic in $YARN_HOME/libexec/hadoop-config.sh, there is another way to name a host:
$ hadoop-daemons.sh --hostnames Slave start datanode
Note that because the shell splits arguments on whitespace, this form can only specify a single node.
Apply the fix:
% cd $YARN_HOME/libexec/
% vim hadoop-config.sh
Change line 96 to:
export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
Save and quit vim.
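The corrected parsing can be exercised outside Hadoop. The sketch below mirrors the --hosts/--hostnames branch of libexec/hadoop-config.sh with the fix applied (parse_hosts is an illustrative stand-in for the real script, and the conf directory is a temporary mock):

```shell
# Mock HADOOP_CONF_DIR with a hostfile in it
HADOOP_CONF_DIR=$(mktemp -d)
printf 'node1\nnode2\n' > "$HADOOP_CONF_DIR/nodehosts"

# Mirrors the corrected argument handling in hadoop-config.sh
parse_hosts() {
    if [ "--hosts" = "$1" ]; then
        shift
        # the fix: "$1", not the buggy "%$1"
        export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
        shift
    elif [ "--hostnames" = "$1" ]; then
        shift
        export HADOOP_SLAVE_NAMES="$1"
        shift
    fi
}

parse_hosts --hosts nodehosts
echo "$HADOOP_SLAVES"        # resolves to <conf dir>/nodehosts

parse_hosts --hostnames node1
echo "$HADOOP_SLAVE_NAMES"   # node1
```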
(5) Configuration files
etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>${user.home}/yarn/yarn_data/tmp/hadoop-${user.name}</value>
  </property>
</configuration>
Remark 1: fs.defaultFS is the new property name, replacing the old fs.default.name.
Remark 2: the tmp directory goes under the $HOME/yarn/yarn_data/ directory created earlier.
etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>x</value>
  </property>
</configuration>
Remark 1: the new name dfs.namenode.name.dir replaces the old dfs.name.dir, and the new dfs.datanode.data.dir replaces the old dfs.data.dir.
Remark 2: dfs.replication sets the number of replicas per data block. By default Hadoop keeps 3 copies using rack awareness: two replicas on one rack and the third on another, with reads served from the nearest replica. Cross-rack reads are rare unless a rack goes down.
etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Remark 1: the new compute framework has no physical jobtracker, so there is no mapreduce.jobtracker.address to set; instead you name a framework, and here we choose yarn.
Remark 2: Hadoop 2.2 also supports third-party compute frameworks, which are not covered here.
(6) Startup
Before starting Hadoop 2.2.0, make sure that:
- every host's /etc/hosts maps the names of all nodes (e.g. 198.0.0.1 node1);
- iptables is off;
- BOOTPROTO=static in /etc/sysconfig/network-scripts/ifcfg-eth0;
- each host's hostname and static IP address are set in /etc/sysconfig/network, and every machine has been rebooted since;
- both the JDK and Hadoop are 64-bit;
- passwordless ssh is configured.
Once all of the above are done, Hadoop 2.2.0 can be started.
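The /etc/hosts item on that checklist is easy to script. A hedged sketch that validates an /etc/hosts-style file against the expected node list (the file below is a mock, not the real /etc/hosts):

```shell
# Mock /etc/hosts with entries for both cluster nodes
hostsfile=$(mktemp)
printf '127.0.0.1 localhost\n198.0.0.1 node1\n198.0.0.2 node2\n' > "$hostsfile"

# Every node name must appear as a whole word in the hosts file
hosts_ok=yes
for n in node1 node2; do
    grep -qw "$n" "$hostsfile" || { echo "missing entry for $n"; hosts_ok=no; }
done
echo "hosts check: $hosts_ok"
```

Pointing hostsfile at the real /etc/hosts on each machine would turn this into an actual pre-flight check.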
Notice that Master and Slave, or namenode and datanode, have not been mentioned as machine roles so far: in the new compute framework and the new HDFS there is no physical Master node; all nodes are equivalent.
The example below uses two nodes, planned as follows to make the new framework easier to follow:
node1: resourcemanager, nodemanager, proxyserver, historyserver, datanode, namenode
node2: datanode, nodemanager
6.1 Format HDFS:
% $YARN_HOME/bin/hdfs namenode -format
(Note: the format command changed in Hadoop 2.2.0; the old form was $YARN_HOME/bin/hadoop namenode -format.)
6.2 Start the daemons:
Option (1): start everything by hand
On node1, start resourcemanager, nodemanager, proxyserver, historyserver, datanode and namenode; on node2, start datanode and nodemanager.
Remark: if the resourcemanager ran on its own host, every node except it would need a nodemanager. You can create a nodehosts file under $YARN_HOME/etc/hadoop/ listing all nodes other than the resourcemanager; since our resourcemanager shares a host with the namenode, that host needs a nodemanager as well.
The steps are:
% hostname
node1
% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start namenode
% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start datanode
% $YARN_HOME/sbin/yarn-daemon.sh start nodemanager
% $YARN_HOME/sbin/yarn-daemon.sh start resourcemanager
% $YARN_HOME/sbin/yarn-daemon.sh start proxyserver
% $YARN_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
% ssh node2
% hostname
node2
% $YARN_HOME/sbin/yarn-daemon.sh start nodemanager
% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start datanode
Option (2): start via host files
Step 1. Confirm you are logged in on node1:
$ hostname
node1
Create namenodehosts under $YARN_HOME/etc/hadoop/, listing all namenode nodes:
$ cat $YARN_HOME/etc/hadoop/namenodehosts
node1
Create datanodehosts under $YARN_HOME/etc/hadoop/, listing all datanode nodes:
$ cat $YARN_HOME/etc/hadoop/datanodehosts
node1
node2
Create nodehosts under $YARN_HOME/etc/hadoop/, listing all datanode and namenode nodes:
$ cat $YARN_HOME/etc/hadoop/nodehosts
node1
node2
Remark: the host file names above are arbitrary (file1, file2, file3 would work just as well), but they must live under $YARN_HOME/etc/hadoop/!
Step 2. Run:
% $YARN_HOME/sbin/hadoop-daemons.sh --hosts namenodehosts --script hdfs start namenode
% $YARN_HOME/sbin/hadoop-daemons.sh --hosts datanodehosts --script hdfs start datanode
% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 start resourcemanager
% $YARN_HOME/sbin/yarn-daemons.sh --hosts nodehosts start nodemanager
% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 start proxyserver
% $YARN_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Step 3. Check what started.
On node1:
$ jps
20698 DataNode
21041 JobHistoryServer
20888 NodeManager
21429 Jps
20606 NameNode
20792 ResourceManager
On node2:
$ jps
8147 DataNode
8355 Jps
8234 NodeManager
Step 4. Check node status and YARN cluster status.
(1) Node status
Point Firefox at http://node1:50070 (node1 hosts the namenode).
On the main page, click Live Nodes to see the status of each live node.
(2) Cluster status on the resourcemanager
Point Firefox at http://node1:8088 (node1 hosts the resourcemanager).
Step 5. MapReduce tests on the cluster
Three test cases follow.
Test Case 1: estimated_value_of_pi
% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 1000000
Console output excerpt:
Number of Maps = 10
Samples per Map = 1000000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/11/06 23:20:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/11/06 23:20:07 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/11/06 23:20:07 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/11/06 23:20:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0001
13/11/06 23:20:15 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0001 to ResourceManager at /0.0.0.0:8032
13/11/06 23:20:16 INFO mapreduce.Job: The url to track the job: http://Node1:8088/proxy/application_1383806445149_0001/
13/11/06 23:20:16 INFO mapreduce.Job: Running job: job_1383806445149_0001
13/11/06 23:21:09 INFO mapreduce.Job: Job job_1383806445149_0001 running in uber mode : false
13/11/06 23:21:10 INFO mapreduce.Job: map 0% reduce 0%
13/11/06 23:24:28 INFO mapreduce.Job: map 20% reduce 0%
13/11/06 23:24:30 INFO mapreduce.Job: map 30% reduce 0%
13/11/06 23:26:56 INFO mapreduce.Job: map 57% reduce 0%
13/11/06 23:26:58 INFO mapreduce.Job: map 60% reduce 0%
13/11/06 23:28:33 INFO mapreduce.Job: map 70% reduce 20%
13/11/06 23:28:35 INFO mapreduce.Job: map 80% reduce 20%
13/11/06 23:28:39 INFO mapreduce.Job: map 80% reduce 27%
13/11/06 23:30:06 INFO mapreduce.Job: map 90% reduce 27%
13/11/06 23:30:09 INFO mapreduce.Job: map 100% reduce 27%
13/11/06 23:30:12 INFO mapreduce.Job: map 100% reduce 33%
13/11/06 23:30:25 INFO mapreduce.Job: map 100% reduce 100%
13/11/06 23:30:54 INFO mapreduce.Job: Job job_1383806445149_0001 completed successfully
13/11/06 23:31:10 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=879166
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2590
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=1349359
Total time spent by all reduces in occupied slots (ms)=190811
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1410
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=45355
CPU time spent (ms)=29860
Physical memory (bytes) snapshot=1481818112
Virtual memory (bytes) snapshot=9214468096
Total committed heap usage (bytes)=1223008256
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
13/11/06 23:31:15 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
Job Finished in 719.041 seconds
Estimated value of Pi is 3.14158440000000000000
Commentary: the job used 10 maps, its job id was job_1383806445149_0001, and the estimated value of Pi came out as 3.14158440000000000000. Job ids follow the pattern job_<cluster timestamp>_<job sequence number>, with the sequence number starting at 0. Task ids follow job_<cluster timestamp>_<job seq>_<task seq>_m or …_r, where m denotes a map task slot and r a reduce task slot; task sequence numbers also start at 0.
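For intuition about what the pi job computes: it estimates π from the fraction of random points in the unit square that land inside the quarter circle (the bundled Hadoop example actually uses a quasi-Monte Carlo point set; the plain Monte Carlo version below is only an illustration of the idea):

```shell
# Estimate pi: 4 * (points inside the quarter circle) / (total points)
pi_est=$(awk 'BEGIN {
    srand(1)                       # fixed seed for repeatability
    n = 200000; in_circle = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x*x + y*y <= 1) in_circle++
    }
    printf "%.4f", 4 * in_circle / n
}')
echo "$pi_est"
```

With 200,000 samples the estimate lands close to 3.14; the MapReduce version simply shards the sampling across map tasks and sums the counts in the reducer.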
Test Case 2: random_writing
% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter /user/grid/test/test_randomwriter/out
Console output excerpt:
Running 10 maps.
Job started: Wed Nov 06 23:42:17 PST 2013
13/11/06 23:42:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/11/06 23:42:19 INFO mapreduce.JobSubmitter: number of splits:10
13/11/06 23:42:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0002
13/11/06 23:42:21 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0002 to ResourceManager at /0.0.0.0:8032
13/11/06 23:42:21 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1383806445149_0002/
13/11/06 23:42:21 INFO mapreduce.Job: Running job: job_1383806445149_0002
13/11/06 23:42:40 INFO mapreduce.Job: Job job_1383806445149_0002 running in uber mode : false
13/11/06 23:42:40 INFO mapreduce.Job: map 0% reduce 0%
13/11/06 23:55:02 INFO mapreduce.Job: map 10% reduce 0%
13/11/06 23:55:14 INFO mapreduce.Job: map 20% reduce 0%
13/11/06 23:55:42 INFO mapreduce.Job: map 30% reduce 0%
13/11/07 00:06:55 INFO mapreduce.Job: map 40% reduce 0%
13/11/07 00:07:10 INFO mapreduce.Job: map 50% reduce 0%
13/11/07 00:07:36 INFO mapreduce.Job: map 60% reduce 0%
13/11/07 00:13:47 INFO mapreduce.Job: map 70% reduce 0%
13/11/07 00:13:54 INFO mapreduce.Job: map 80% reduce 0%
13/11/07 00:13:58 INFO mapreduce.Job: map 90% reduce 0%
13/11/07 00:16:29 INFO mapreduce.Job: map 100% reduce 0%
13/11/07 00:16:37 INFO mapreduce.Job: Job job_1383806445149_0002 completed successfully
File Output Format Counters
Bytes Written=10772852496
Job ended: Thu Nov 07 00:16:40 PST 2013
The job took 2062 seconds.
Commentary: with enough local disk you can pull the output down from HDFS and inspect it.
For now, just look at how the output files are laid out:
% $YARN_HOME/bin/hadoop fs -ls /user/grid/test/test_randomwriter/out/
Found 11 items
-rw-r--r-- 2 grid supergroup 0 2013-11-07 00:16 /user/grid/test/test_randomwriter/out/_SUCCESS
-rw-r--r-- 2 grid supergroup 1077278214 2013-11-06 23:54 /user/grid/test/test_randomwriter/out/part-m-00000
-rw-r--r-- 2 grid supergroup 1077282751 2013-11-06 23:55 /user/grid/test/test_randomwriter/out/part-m-00001
-rw-r--r-- 2 grid supergroup 1077280298 2013-11-06 23:55 /user/grid/test/test_randomwriter/out/part-m-00002
-rw-r--r-- 2 grid supergroup 1077303152 2013-11-07 00:07 /user/grid/test/test_randomwriter/out/part-m-00003
-rw-r--r-- 2 grid supergroup 1077284240 2013-11-07 00:06 /user/grid/test/test_randomwriter/out/part-m-00004
-rw-r--r-- 2 grid supergroup 1077286604 2013-11-07 00:07 /user/grid/test/test_randomwriter/out/part-m-00005
-rw-r--r-- 2 grid supergroup 1077284336 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00006
-rw-r--r-- 2 grid supergroup 1077284829 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00007
-rw-r--r-- 2 grid supergroup 1077289706 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00008
-rw-r--r-- 2 grid supergroup 1077278366 2013-11-07 00:16 /user/grid/test/test_randomwriter/out/part-m-00009
Test Case 3: word_count
(1) Create input files locally:
% mkdir input
% echo 'hello,world' >> input/file1.in
% echo 'hello, ruby' >> input/file2.in
(2) Upload them to HDFS:
% $YARN_HOME/bin/hadoop fs -mkdir -p /user/grid/test/test_wordcount/
% $YARN_HOME/bin/hadoop fs -put input /user/grid/test/test_wordcount/in
(3) Run the MapReduce job on the new YARN framework:
% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/grid/test/test_wordcount/in /user/grid/test/test_wordcount/out
Console output excerpt:
13/11/07 00:35:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/11/07 00:35:05 INFO input.FileInputFormat: Total input paths to process : 2
13/11/07 00:35:05 INFO mapreduce.JobSubmitter: number of splits:2
13/11/07 00:35:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0003
13/11/07 00:35:08 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0003 to ResourceManager at /0.0.0.0:8032
13/11/07 00:35:08 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1383806445149_0003/
13/11/07 00:35:08 INFO mapreduce.Job: Running job: job_1383806445149_0003
13/11/07 00:35:25 INFO mapreduce.Job: Job job_1383806445149_0003 running in uber mode : false
13/11/07 00:35:25 INFO mapreduce.Job: map 0% reduce 0%
13/11/07 00:37:50 INFO mapreduce.Job: map 33% reduce 0%
13/11/07 00:37:54 INFO mapreduce.Job: map 67% reduce 0%
13/11/07 00:37:55 INFO mapreduce.Job: map 83% reduce 0%
13/11/07 00:37:58 INFO mapreduce.Job: map 100% reduce 0%
13/11/07 00:38:51 INFO mapreduce.Job: map 100% reduce 100%
13/11/07 00:38:54 INFO mapreduce.Job: Job job_1383806445149_0003 completed successfully
13/11/07 00:38:56 INFO mapreduce.Job: Counters: 43
Check the wordcount results:
% $YARN_HOME/bin/hadoop fs -cat /user/grid/test/test_wordcount/out/*
hadoop 1
hello 1
ruby 1
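What the wordcount job computes can be reproduced locally with a plain shell pipeline over the same two input files (the example's mapper tokenizes on whitespace, so note that 'hello,world' stays a single token):

```shell
# Recreate the two input files in a temp directory
wc_dir=$(mktemp -d)
printf 'hello,world\n' > "$wc_dir/file1.in"
printf 'hello, ruby\n' > "$wc_dir/file2.in"

# Local equivalent of wordcount: tokenize on whitespace, sort, count
counts=$(cat "$wc_dir"/*.in | tr -s ' \t' '\n' | sort | uniq -c | sort -rn)
echo "$counts"
rm -rf "$wc_dir"
```

This yields three tokens, each with count 1: "hello,world", "hello," and "ruby"; the MapReduce job does the same tokenize-then-count, with the counting distributed across reducers.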
One more note: to stay compatible with the old MRv1 framework, YARN still accepts many of the old APIs and property names, but logs an INFO message for each. You can turn off the configuration deprecation warnings by editing $YARN_HOME/etc/hadoop/log4j.properties: uncomment line 138 (optional) so the deprecation logger runs at WARN instead of the default INFO level (see line 20: hadoop.root.logger=INFO,console):
138 log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN