Compiling the Hadoop 2.2.0 src Package on CentOS

Hadoop 2.2.0 Compilation and Installation Steps (on 64-bit CentOS)

Highlight: the first stable release based on the YARN compute framework and highly available HDFS.

Note 1: The official site only provides 32-bit release builds; on a 64-bit machine you have to compile manually.

Note 2: Almost every set of 2.2 installation instructions circulating online has problems; not one of them is fully correct. If you don't understand the internals of the new framework, don't blindly copy them.

I. Compiling Hadoop 2.2 (username: hadoop)

Since our CentOS install is 64-bit and the official Hadoop 2.2.0 release has no 64-bit package, we have to compile it ourselves.

First, download the 64-bit JDK from Oracle:

$ su root

# wget http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.tar.gz

 

Note: a % (or $) prompt means the current non-root user, while # means root; watch the prompt type in each step below.

The Hadoop build steps follow (the boxed notes in between are supplementary explanations only; do not run the commands inside them).

(1) Change BOOTPROTO to "dhcp"

$ su root

# sed -i 's/static/dhcp/g' /etc/sysconfig/network-scripts/ifcfg-eth0

# service network restart

 

(2) Download the Hadoop 2.2.0 source code

# su hadoop

$ cd ~

$ wget http://apache.dataguru.cn/hadoop/common/stable/hadoop-2.2.0-src.tar.gz

 

(3) Install Maven

$ su root; cd /opt

# wget http://apache.fayea.com/apache-mirror/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz

# tar zxvf apache-maven-3.1.1-bin.tar.gz

# cd apache-maven-3.1.1

 

 

修改系统环境变量

        有两种方式,修改/etc/profile或者在/etc/profile.d/下添加定制的shell文件,

        鉴于profile文件的重要性,尽量不要在profile文件里添加内容,官方建议采用第二种,以保证profile文件的绝对安全。

下面采用第二种方式:

创建一个简单shell脚脚本并添加相关内容进去:

#cd/etc/profile.d/

# touch maven.sh

 

Add the following to maven.sh:

# cat maven.sh

# environment variable settings for maven
export MAVEN_HOME='/opt/apache-maven-3.1.1'
PATH=$MAVEN_HOME/bin:$PATH

 

Next:

# source /etc/profile

# mvn -version

It should print the version info "Apache Maven 3.1.1".

(4) Install protobuf

Note the hint on the Apache official site: "NOTE: You will need protoc 2.5.0 installed."

# su root; cd /opt

# wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.bz2

# tar xvf protobuf-2.5.0.tar.bz2   (mind the archive suffix: the Maven package above is a gzip file, so extracting it needs -z; this one is bzip2)

# cd protobuf-2.5.0

# ./configure

 

 

Oh, shit, it errors out: "configure: error: C++ preprocessor "/lib/cpp" fails sanity check"

Install gcc (this particular sanity check wants the C++ compiler, so install gcc-c++ along with it):

# yum install gcc gcc-c++
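
With the compilers in place, protobuf still has to be built and installed before Hadoop can find protoc. A minimal sketch of the remaining steps:

# ./configure
# make
# make install
# ldconfig          (refresh the linker cache so protoc can find the freshly installed libprotobuf)
# protoc --version  (should print: libprotoc 2.5.0)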

 

(5) Compile Hadoop

First download the Hadoop 2.2.0 source code from the official site (already done in step (2) above):

# su hadoop; cd ~hadoop/

$ wget http://apache.dataguru.cn/hadoop/common/stable/hadoop-2.2.0-src.tar.gz

 

OK, now the painful build process begins.

Extract it:

% tar zxvf hadoop-2.2.0-src.tar.gz

% cd hadoop-2.2.0-src

 

Note: the quoted official text below is explanation only; do not execute it!

Here is what the official site says:

You should be able to obtain the MapReduce tarball from the release. If not, you should be able to create a tarball from the source.

$ mvn clean install -DskipTests

$ cd hadoop-mapreduce-project

$ mvn clean install assembly:assembly -Pnative

NOTE: You will need protoc 2.5.0 installed.

To ignore the native builds in mapreduce you can omit the -Pnative argument for maven. The tarball should be available in target/ directory.

See how terse it is, as if everything will just work.

Fine, let's run the install exactly as the official site says:

$ mvn clean install -DskipTests

Result: all kinds of errors, and the downloads are extremely slow.

Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:2.1:build-classpath (build-classpath)…

After some digging, the suspect was the GFW (the connection was probably blocked). Switching Maven's default mirror to a domestic Chinese mirror made things work.


 

 

The steps:

Step 1. Switch to root and edit /opt/apache-maven-3.1.1/conf/settings.xml:

% su root

# vim /opt/apache-maven-3.1.1/conf/settings.xml

(1) Add a domestic mirror inside <mirrors>…</mirrors> (the inner <mirror> element is new; the surrounding <mirrors> tags already exist):

<mirrors>

  <mirror>
    <id>nexus-osc</id>
    <mirrorOf>*</mirrorOf>
    <name>Nexus osc</name>
    <url>http://maven.oschina.net/content/groups/public/</url>
  </mirror>

</mirrors>

 

(2) Add the following inside the <profiles> tag (leave <profiles>…</profiles> itself alone; add only the new <profile> element):

<profile>
  <id>jdk-1.7</id>
  <activation>
    <jdk>1.7</jdk>
  </activation>
  <repositories>
    <repository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>
</profile>

 

 

 

Step 2.

# su hadoop

$ sudo cp /opt/apache-maven-3.1.1/conf/settings.xml ~/.m2/

 

(If it complains that the user is not in sudoers, do the following:

        $ su root

        then add the current user at line 99 of /etc/sudoers (don't type the line numbers below):

        # cat /etc/sudoers

        98 root     ALL=(ALL)       ALL

        99 grid     ALL=(ALL)       ALL

)
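
(Alternatively, since ~/.m2 belongs to the hadoop user and the settings.xml under /opt is normally world-readable after extraction, the copy also works without sudo at all; a sketch:

$ mkdir -p ~/.m2
$ cp /opt/apache-maven-3.1.1/conf/settings.xml ~/.m2/settings.xml
)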

Now run:

$ mvn clean install -DskipTests

 

After a long wait, everything installs normally.

Continue building.

Note: the walkthrough below is explanation only; do not follow it verbatim!


Run the official build steps:

$ cd hadoop-mapreduce-project

$ mvn clean install assembly:assembly -Pnative

It builds for a long time and finally ends with ERRORs. The official site says:

To ignore the native builds in mapreduce you can omit the -Pnative argument for maven. The tarball should be available in target/ directory.

So try building with native ignored:

$ mvn clean install assembly:assembly

Again, all kinds of errors.

After Googling, it turns out most people build with the following commands (run these directly):

$ cd hadoop-2.2.0-src

$ mvn package -Pdist,native -DskipTests -Dtar

 

Another long wait…

In the end, two typical errors show up:

Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-pipes: An Ant BuildException has occured: exec returned: 1 -> [Help 1]…

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake"

Based on these errors, three packages need to be installed:

cmake

ncurses-devel

openssl-devel


 

% su root

# yum install ncurses-devel

# yum install openssl-devel

# yum install cmake
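
Before re-running the build, a quick confirmation that the prerequisites are now in place (a sketch):

# cmake --version
# rpm -q ncurses-devel openssl-devel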

 

Once those are installed, switch back to the hadoop user:

# su hadoop; cd ~/hadoop-2.2.0-src

Build:

$ mvn package -Pdist,native -DskipTests -Dtar

 

After another long wait, check the result:

Everything is fine. At this point, Hadoop 2.2.0 is compiled.

Verification:

Now verify that the build output is as expected. Note that we are currently in ~/hadoop-2.2.0-src:

$ cd hadoop-dist/

$ ls

pom.xml  target

The pom.xml above is the Maven build configuration file.

 

% cd target

$ ls -l

antrun
dist-tar-stitching.sh
hadoop-2.2.0.tar.gz
hadoop-dist-2.2.0-javadoc.jar
maven-archiver
dist-layout-stitching.sh
hadoop-2.2.0
hadoop-dist-2.2.0.jar
javadoc-bundle-options
test-dir

 

These are the directories and files Maven generated. Enter hadoop-2.2.0:

$ cd hadoop-2.2.0

$ ls

bin  etc  include  lib  libexec  sbin  share

 

This is the same directory layout as the official 2.2.0 release (which is 32-bit only).

Two things to verify below:

a. Verify the version number

$ bin/hadoop version

Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /home/grid/yarn/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

 

You can see the Hadoop version number, the build tool (protoc 2.5.0, matching the official requirement), and the build date.

b. Verify the bitness of the Hadoop native libraries

% file lib//native/*

lib//native/libhadoop.a:        current ar archive
lib//native/libhadooppipes.a:   current ar archive
lib//native/libhadoop.so:       symbolic link to `libhadoop.so.1.0.0'
lib//native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
lib//native/libhadooputils.a:   current ar archive
lib//native/libhdfs.a:          current ar archive
lib//native/libhdfs.so:         symbolic link to `libhdfs.so.0.0.0'
lib//native/libhdfs.so.0.0.0:   ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped

 

Seeing ELF 64-bit LSB here proves the 64-bit Hadoop 2.2.0 build basically succeeded. If you inspect our earlier hadoop 0.20.3 install the same way, you will find its lib//native/libhadoop.so.1.0.0 is 32-bit, which is wrong! ^_^

II. Configuring Hadoop 2.2

(1) Home directory setup

To distinguish it from MRv1, name the 2.2 home directory simply yarn:

# su hadoop

$ cd ~

$ mkdir -p yarn/yarn_data

$ cp -a ~hadoop/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0 ~hadoop/yarn

 

 

(2) Environment variables

Add the new environment variables to .bashrc:

# java env
export JAVA_HOME="/usr/java/jdk1.7.0_45"
export PATH="$JAVA_HOME/bin:$PATH"

# hadoop variable settings
export HADOOP_HOME="$HOME/yarn/hadoop-2.2.0"
export HADOOP_PREFIX="$HADOOP_HOME/"
export YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
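
After reloading the shell, a quick sanity check that the variables took effect (a minimal sketch):

$ source ~/.bashrc
$ echo $HADOOP_HOME
$ which hadoop      (should resolve to $HADOOP_HOME/bin/hadoop)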

 

Two points to note:

1. Make sure the JDK is 64-bit.

2. HADOOP_PREFIX is critically important, mainly for MRv1 compatibility, and it has the highest priority (for example, when locating the conf directory, the startup scripts look in $HADOOP_PREFIX/conf/ first even if we have configured HADOOP_CONF_DIR).

Make sure this variable is configured correctly!

If you want to run the MRv1 framework on 2.2, this variable is a convenient way to point at the MRv1 conf files; otherwise, comment it out to prevent unpredictable errors.

(3) Fix a bug in the official startup script

Note: although this is a stable release, it still ships with a really dumb bug.

Note: the walkthrough below is for understanding only; you don't need to execute it!


 

Normally, the daemon launcher $YARN_HOME/sbin/hadoop-daemons.sh lets you specify nodes.

Look at its usage text:

% $YARN_HOME/sbin/hadoop-daemons.sh

Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args...

So --hosts should accept a file listing the names of the nodes to start on.

But it has no effect: $YARN_HOME/libexec/hadoop-config.sh, which this script calls, has a bug.

Run the launcher:

% $YARN_HOME/sbin/hadoop-daemons.sh --hosts my_datanodes start datanode

at: /home/grid/yarn/hadoop-2.2.0/etc/hadoop//126571: No such file or directory

Tracing through the scripts leads to line 96 of the nested script $YARN_HOME/libexec/hadoop-config.sh:

96         export HADOOP_SLAVES="${HADOOP_CONF_DIR}/%$1"

The %$1 part is wrong; change it to:

96         export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"


 

Now run:

$ hadoop-daemons.sh --hosts nodes start datanode

Slave: starting datanode, logging to /home/grid/yarn/hadoop-2.2.0//logs/hadoop-grid-datanode-Slave.out

Remark 1: from this version's release in early November until now, no tutorial online, Chinese or English, has mentioned this bug.

Remark 2: the hostfile-handling logic in $YARN_HOME/libexec/hadoop-config.sh is quite brain-dead:

 if [ "--hosts" = "$1" ]

   then

        shift

        export HADOOP_SLAVES="${HADOOP_CONF_DIR}/%$1"

        shift

As a result, users can only put their hostfile under ${HADOOP_CONF_DIR}/; any other location has no effect.

Remark 3: by the logic of $YARN_HOME/libexec/hadoop-config.sh, there is one more way to specify a host:

$ hadoop-daemons.sh --hostnames Slave start datanode

Note that because the bash scripts split input arguments on whitespace (\t or space), this form can only specify one single node.

Now do the following:

% cd $YARN_HOME/libexec/

% vim hadoop-config.sh

Change line 96 to:

export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"

Save and quit vim.
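
The same one-character fix can also be applied non-interactively; a sketch, assuming the buggy line is line 96 as in this build (back up the file first):

% cp $YARN_HOME/libexec/hadoop-config.sh $YARN_HOME/libexec/hadoop-config.sh.bak
% sed -i '96s|/%\$1|/$1|' $YARN_HOME/libexec/hadoop-config.sh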

(5) Configuration files

etc/hadoop/core-site.xml

<configuration>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>${user.home}/yarn/yarn_data/tmp/hadoop-${user.name}</value>
  </property>

</configuration>

Remark 1: note that fs.defaultFS is the new property name, replacing the old fs.default.name.

Remark 2: the tmp directory is placed under the $HOME/yarn/yarn_data/ directory we just created.

etc/hadoop/hdfs-site.xml

<configuration>

  <property>
    <name>dfs.replication</name>
    <value>x</value>
  </property>

</configuration>

(Replace x with your desired replica count.)

Remark 1: new name: dfs.namenode.name.dir (old: dfs.name.dir); new name: dfs.datanode.data.dir (old: dfs.data.dir).

Remark 2: dfs.replication sets the number of replicas per data block. Based on rack awareness, Hadoop replicates each block 3 times by default (two on one rack and one on another, picking the concrete blocks by shortest distance; cross-rack block reads are rarely used unless a rack goes down).
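
For reference, a minimal sketch of a complete hdfs-site.xml for this setup that also pins the namenode/datanode directories under the yarn_data directory created earlier; the replica count of 2 (one per node) and the file:// paths are assumptions for this two-node example, not from the original:

$ cat > $HADOOP_CONF_DIR/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>                 <!-- assumption: 2 replicas for a 2-node cluster -->
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/yarn/yarn_data/hdfs/namenode</value>   <!-- assumed path -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/yarn/yarn_data/hdfs/datanode</value>   <!-- assumed path -->
  </property>
</configuration>
EOF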

etc/hadoop/yarn-site.xml

<configuration>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

</configuration>

 

 

etc/hadoop/mapred-site.xml

<configuration>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

</configuration>

Remark 1: the new compute framework does away with the physical JobTracker, so there is no need to set mapreduce.jobtracker.address; instead you name a framework, and here we choose yarn.

Remark 2: Hadoop 2.2 also supports third-party compute frameworks, but I haven't looked into them.

(6) Startup

        Before starting, make sure: every host's /etc/hosts has name-resolution entries for all nodes (e.g. node1  198.0.0.1); iptables is off; BOOTPROTO=static in /etc/sysconfig/network-scripts/ifcfg-eth0; each host's hostname and static IP address are set in /etc/sysconfig/network and every machine has been rebooted; the JDK and Hadoop are both 64-bit; passwordless SSH is configured. Once all of that is done, you can start Hadoop 2.2.0.

        Notice that Master and Slave have not been mentioned from start to finish, nor dedicated namenode/datanode machines. That is because in the new compute framework and the new HDFS there is no physical Master node; all nodes are equivalent.

Below we use two nodes as an example. To make the new framework easier to understand, the roles are planned as follows:

node1       resourcemanager, nodemanager, proxyserver, historyserver, datanode, namenode

node2       datanode, nodemanager

 

6.1 Format:

% $YARN_HOME/bin/hdfs namenode -format

(Note: the format command in Hadoop 2.2.0 differs from older versions; the old one was $YARN_HOME/bin/hadoop namenode -format.)
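
(One way to confirm the format succeeded: the namenode writes a VERSION file under its name directory. With the core-site.xml above and no explicit dfs.namenode.name.dir, that directory defaults to ${hadoop.tmp.dir}/dfs/name, so, as a sketch, assuming the hadoop user:

$ cat ~/yarn/yarn_data/tmp/hadoop-hadoop/dfs/name/current/VERSION
)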

6.2 Start:

Startup method (1): manual start

On the node1 host, start resourcemanager, nodemanager, proxyserver, historyserver, datanode, and namenode respectively.

On the node2 host, start datanode and nodemanager.

Remark: if the resourcemanager stands alone, then every node except the resourcemanager needs a nodemanager. We can create a nodehosts file under $YARN_HOME/etc/hadoop/ and list in it every node other than the resourcemanager; because our resourcemanager and namenode share one host here, that host needs a nodemanager too.

 

The steps are as follows:

% hostname

node1

% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start namenode

% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start datanode

% $YARN_HOME/sbin/yarn-daemon.sh start nodemanager

% $YARN_HOME/sbin/yarn-daemon.sh start resourcemanager

% $YARN_HOME/sbin/yarn-daemon.sh start proxyserver

% $YARN_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

% ssh node2

% hostname

node2

% $YARN_HOME/sbin/yarn-daemon.sh start nodemanager

% $YARN_HOME/sbin/hadoop-daemon.sh --script hdfs start datanode

Startup method (2): scripted start

Step 1. Confirm you are logged in on node1:

$ hostname

node1

Create namenodehosts under $YARN_HOME/etc/hadoop/ and add all namenode nodes:

$ cat $YARN_HOME/etc/hadoop/namenodehosts

node1

Create datanodehosts under $YARN_HOME/etc/hadoop/ and add all datanode nodes:

$ cat $YARN_HOME/etc/hadoop/datanodehosts

node1

node2

Create nodehosts under $YARN_HOME/etc/hadoop/ and add all datanode and namenode nodes:

$ cat $YARN_HOME/etc/hadoop/nodehosts

node1

node2

Remark: the hostfile names above are arbitrary (file1, file2, file3 would do), but they must be placed under $YARN_HOME/etc/hadoop/!

Step 2. Run:

% $YARN_HOME/sbin/hadoop-daemons.sh --hosts namenodehosts --script hdfs start namenode

% $YARN_HOME/sbin/hadoop-daemons.sh --hosts datanodehosts --script hdfs start datanode

% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 start resourcemanager

% $YARN_HOME/sbin/yarn-daemons.sh --hosts nodehosts start nodemanager

% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 start proxyserver

% $YARN_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
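
The same launchers accept stop as well as start (see the Usage line quoted earlier), so the mirror-image shutdown sequence is, as a sketch:

% $YARN_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 stop proxyserver
% $YARN_HOME/sbin/yarn-daemons.sh --hosts nodehosts stop nodemanager
% $YARN_HOME/sbin/yarn-daemons.sh --hostnames node1 stop resourcemanager
% $YARN_HOME/sbin/hadoop-daemons.sh --hosts datanodehosts --script hdfs stop datanode
% $YARN_HOME/sbin/hadoop-daemons.sh --hosts namenodehosts --script hdfs stop namenode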

 

Step 3. Check what started.

On node1:

$ jps

20698 DataNode

21041 JobHistoryServer

20888 NodeManager

21429 Jps

20606 NameNode

20792 ResourceManager

On node2:

$ jps

8147 DataNode

8355 Jps

8234 NodeManager
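
Besides jps, you can ask HDFS itself which datanodes are alive:

% $YARN_HOME/bin/hdfs dfsadmin -report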

 

Step 4. Check node status and YARN cluster status.

(1) Check node status

In Firefox, open http://node1:50070 (node1 is the namenode host).

On the main page (first screenshot), click Live Nodes to view the status of each live node (second screenshot).

(2) Check cluster status on the resourcemanager

In Firefox, open http://node1:8088 (node1 is the resourcemanager host).

 

Step 5. MapReduce tests on the cluster.

Three test cases are provided.

Test Case 1: estimated_value_of_pi

% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \

pi 10 1000000

 

Excerpt of console output:

Number of Maps  =10

Samples per Map = 1000000

Wrote input for Map #0

Wrote input for Map #1

Wrote input for Map #2

Wrote input for Map #3

Wrote input for Map #4

Wrote input for Map #5

Wrote input for Map #6

Wrote input for Map #7

Wrote input for Map #8

Wrote input for Map #9

Starting Job

13/11/06 23:20:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

13/11/06 23:20:07 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class

13/11/06 23:20:07 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir

13/11/06 23:20:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0001

13/11/06 23:20:15 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0001 to ResourceManager at /0.0.0.0:8032

13/11/06 23:20:16 INFO mapreduce.Job: The url to track the job: http://Node1:8088/proxy/application_1383806445149_0001/

13/11/06 23:20:16 INFO mapreduce.Job: Running job: job_1383806445149_0001

13/11/06 23:21:09 INFO mapreduce.Job: Job job_1383806445149_0001 running in uber mode : false

13/11/06 23:21:10 INFO mapreduce.Job:  map 0% reduce 0%

13/11/06 23:24:28 INFO mapreduce.Job:  map 20% reduce 0%

13/11/06 23:24:30 INFO mapreduce.Job:  map 30% reduce 0%

13/11/06 23:26:56 INFO mapreduce.Job:  map 57% reduce 0%

13/11/06 23:26:58 INFO mapreduce.Job:  map 60% reduce 0%

13/11/06 23:28:33 INFO mapreduce.Job:  map 70% reduce 20%

13/11/06 23:28:35 INFO mapreduce.Job:  map 80% reduce 20%

13/11/06 23:28:39 INFO mapreduce.Job:  map 80% reduce 27%

13/11/06 23:30:06 INFO mapreduce.Job:  map 90% reduce 27%

13/11/06 23:30:09 INFO mapreduce.Job:  map 100% reduce 27%

13/11/06 23:30:12 INFO mapreduce.Job:  map 100% reduce 33%

13/11/06 23:30:25 INFO mapreduce.Job:  map 100% reduce 100%

13/11/06 23:30:54 INFO mapreduce.Job: Job job_1383806445149_0001 completed successfully

13/11/06 23:31:10 INFO mapreduce.Job: Counters: 43

        File System Counters

                  FILE: Number of bytes read=226
                  FILE: Number of bytes written=879166
                  FILE: Number of read operations=0
                  FILE: Number of large read operations=0
                  FILE: Number of write operations=0
                  HDFS: Number of bytes read=2590
                  HDFS: Number of bytes written=215
                  HDFS: Number of read operations=43
                  HDFS: Number of large read operations=0
                  HDFS: Number of write operations=3

        Job Counters

                  Launched map tasks=10
                  Launched reduce tasks=1
                  Data-local map tasks=10
                  Total time spent by all maps in occupied slots (ms)=1349359
                  Total time spent by all reduces in occupied slots (ms)=190811

        Map-Reduce Framework

                  Map input records=10
                  Map output records=20
                  Map output bytes=180
                  Map output materialized bytes=280
                  Input split bytes=1410
                  Combine input records=0
                  Combine output records=0
                  Reduce input groups=2
                  Reduce shuffle bytes=280
                  Reduce input records=20
                  Reduce output records=0
                  Spilled Records=40
                  Shuffled Maps =10
                  Failed Shuffles=0
                  Merged Map outputs=10
                  GC time elapsed (ms)=45355
                  CPU time spent (ms)=29860
                  Physical memory (bytes) snapshot=1481818112
                  Virtual memory (bytes) snapshot=9214468096
                  Total committed heap usage (bytes)=1223008256

        Shuffle Errors

                  BAD_ID=0
                  CONNECTION=0
                  IO_ERROR=0
                  WRONG_LENGTH=0
                  WRONG_MAP=0
                  WRONG_REDUCE=0

        File Input Format Counters

                  Bytes Read=1180

        File Output Format Counters

                  Bytes Written=97

13/11/06 23:31:15 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

Job Finished in 719.041 seconds

Estimated value of Pi is 3.14158440000000000000

 

 

Note: you can see from the output that this job used 10 maps, the job id was job_1383806445149_0001, and the final computed value of Pi is 3.14158440000000000000. Job ids are assigned as job_<cluster timestamp>_<job serial>, where the serial starts at 0 with an upper bound of 1000; task ids follow job_<cluster timestamp>_<job serial>_<task serial>_m or ..._r, where m denotes a map task slot and r a reduce task slot, and the task serial likewise starts at 0 with an upper bound of 1000.

 

Test Case 2: random_writing

% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \

randomwriter /user/grid/test/test_randomwriter/out

 

Excerpt of console output:

Running 10 maps.

Job started: Wed Nov 06 23:42:17 PST 2013

13/11/06 23:42:17 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

13/11/06 23:42:19 INFO mapreduce.JobSubmitter: number of splits:10

13/11/06 23:42:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0002

13/11/06 23:42:21 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0002 to ResourceManager at /0.0.0.0:8032

13/11/06 23:42:21 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1383806445149_0002/

13/11/06 23:42:21 INFO mapreduce.Job: Running job: job_1383806445149_0002

13/11/06 23:42:40 INFO mapreduce.Job: Job job_1383806445149_0002 running in uber mode : false

13/11/06 23:42:40 INFO mapreduce.Job:  map 0% reduce 0%

13/11/06 23:55:02 INFO mapreduce.Job:  map 10% reduce 0%

13/11/06 23:55:14 INFO mapreduce.Job:  map 20% reduce 0%

13/11/06 23:55:42 INFO mapreduce.Job:  map 30% reduce 0%

13/11/07 00:06:55 INFO mapreduce.Job:  map 40% reduce 0%

13/11/07 00:07:10 INFO mapreduce.Job:  map 50% reduce 0%

13/11/07 00:07:36 INFO mapreduce.Job:  map 60% reduce 0%

13/11/07 00:13:47 INFO mapreduce.Job:  map 70% reduce 0%

13/11/07 00:13:54 INFO mapreduce.Job:  map 80% reduce 0%

13/11/07 00:13:58 INFO mapreduce.Job:  map 90% reduce 0%

13/11/07 00:16:29 INFO mapreduce.Job:  map 100% reduce 0%

13/11/07 00:16:37 INFO mapreduce.Job: Job job_1383806445149_0002 completed successfully

        File Output Format Counters

                  Bytes Written=10772852496

Job ended: Thu Nov 07 00:16:40 PST 2013

The job took 2062 seconds.

 

Note: if your machine has enough storage, you can pull the output down from HDFS and take a look.

For now, just inspect how the output files are laid out:

% $YARN_HOME/bin/hadoop fs -ls /user/grid/test/test_randomwriter/out/

Found 11 items

-rw-r--r--   2 grid supergroup          0 2013-11-07 00:16 /user/grid/test/test_randomwriter/out/_SUCCESS
-rw-r--r--   2 grid supergroup 1077278214 2013-11-06 23:54 /user/grid/test/test_randomwriter/out/part-m-00000
-rw-r--r--   2 grid supergroup 1077282751 2013-11-06 23:55 /user/grid/test/test_randomwriter/out/part-m-00001
-rw-r--r--   2 grid supergroup 1077280298 2013-11-06 23:55 /user/grid/test/test_randomwriter/out/part-m-00002
-rw-r--r--   2 grid supergroup 1077303152 2013-11-07 00:07 /user/grid/test/test_randomwriter/out/part-m-00003
-rw-r--r--   2 grid supergroup 1077284240 2013-11-07 00:06 /user/grid/test/test_randomwriter/out/part-m-00004
-rw-r--r--   2 grid supergroup 1077286604 2013-11-07 00:07 /user/grid/test/test_randomwriter/out/part-m-00005
-rw-r--r--   2 grid supergroup 1077284336 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00006
-rw-r--r--   2 grid supergroup 1077284829 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00007
-rw-r--r--   2 grid supergroup 1077289706 2013-11-07 00:13 /user/grid/test/test_randomwriter/out/part-m-00008
-rw-r--r--   2 grid supergroup 1077278366 2013-11-07 00:16 /user/grid/test/test_randomwriter/out/part-m-00009

 

 

Test Case 3: word_count

(1) Create input files locally

% mkdir input

% echo 'hello, world' >> input/file1.in

% echo 'hello, ruby' >> input/file2.in

 

(2) Upload to HDFS

% $YARN_HOME/bin/hadoop fs -mkdir -p /user/grid/test/test_wordcount/

% $YARN_HOME/bin/hadoop fs -put input /user/grid/test/test_wordcount/in

 

 

(3) Run MapReduce on the new YARN compute framework

% $YARN_HOME/bin/yarn jar $YARN_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/grid/test/test_wordcount/in /user/grid/test/test_wordcount/out

 

 

Excerpt of console output:

13/11/07 00:35:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

13/11/07 00:35:05 INFO input.FileInputFormat: Total input paths to process : 2

13/11/07 00:35:05 INFO mapreduce.JobSubmitter: number of splits:2

13/11/07 00:35:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1383806445149_0003

13/11/07 00:35:08 INFO impl.YarnClientImpl: Submitted application application_1383806445149_0003 to ResourceManager at /0.0.0.0:8032

13/11/07 00:35:08 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1383806445149_0003/

13/11/07 00:35:08 INFO mapreduce.Job: Running job: job_1383806445149_0003

13/11/07 00:35:25 INFO mapreduce.Job: Job job_1383806445149_0003 running in uber mode : false

13/11/07 00:35:25 INFO mapreduce.Job:  map 0% reduce 0%

13/11/07 00:37:50 INFO mapreduce.Job:  map 33% reduce 0%

13/11/07 00:37:54 INFO mapreduce.Job:  map 67% reduce 0%

13/11/07 00:37:55 INFO mapreduce.Job:  map 83% reduce 0%

13/11/07 00:37:58 INFO mapreduce.Job:  map 100% reduce 0%

13/11/07 00:38:51 INFO mapreduce.Job:  map 100% reduce 100%

13/11/07 00:38:54 INFO mapreduce.Job: Job job_1383806445149_0003 completed successfully

13/11/07 00:38:56 INFO mapreduce.Job: Counters: 43

 

 

Note: view the word count results:

% $YARN_HOME/bin/hadoop fs -cat /user/grid/test/test_wordcount/out/*

hadoop     1

hello  1

ruby   1

 

Addendum: to stay compatible with the older MRv1 framework, the new YARN still accepts many of the old APIs, but it logs INFO messages about them. You can turn off these configuration deprecation warnings by editing $YARN_HOME/etc/hadoop/log4j.properties.

It is suggested (optionally) to uncomment line 138, making sure the level for these messages is WARN (the default root level is INFO; see line 20: hadoop.root.logger=INFO,console):

138 log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
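
A sketch of making that edit non-interactively, assuming line 138 is commented out with a leading #:

% sed -i '138s/^# *//' $YARN_HOME/etc/hadoop/log4j.properties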





