研究Hadoop也有2个星期了,由于之前有MooseFS的基础,所以学习Hadoop的HDFS部分还算比较顺利。我的环境是非常典型的Hadoop环境,一台NameNode+Hmaster,4台DataNode+RegionServer。如下图所示

    p_w_picpath

   说明: DataNode1和DataNode2是VM,配置4VCPU和4G内存,虚拟化平台是ESXI4.1。DataNode3和DataNode4是物理机,OS为Ubuntu11.04,2cpu和2G内存

 

   实验:利用hadoop自身提供的程序,利用MapReduce计算“TestTable”这个表的“行数”
[root@master bin]# ./hadoop jar /root/hbase/hbase-0.90.4.jar rowcounter TestTable

 

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:host.name=master

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_26

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.

。。。。。。。。。。。。。。。。。。。。。。

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/root/hadoop/bin/../lib/native/Linux-i386-32

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-164.el5

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.name=root

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.home=/root

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root/hadoop-0.20.2/bin

11/10/25 10:34:36 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=datanode2:2181,datanode1:2181,master:2181 sessionTimeout=180000 watcher=hconnection

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Opening socket connection to server datanode2/172.20.10.193:2181

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Socket connection established to datanode2/172.20.10.193:2181, initiating session

11/10/25 10:34:36 INFO zookeeper.ClientCnxn: Session establishment complete on server datanode2/172.20.10.193:2181, sessionid = 0x23338df2f890004, negotiated timeout = 180000

11/10/25 10:34:37 INFO mapred.JobClient: Running job: job_201110241323_0001

11/10/25 10:34:38 INFO mapred.JobClient: map 0% reduce 0%

11/10/25 10:35:34 INFO mapred.JobClient: map 2% reduce 0%

11/10/25 10:39:30 INFO zookeeper.ClientCnxn: Socket connection established to datanode1/172.20.10.201:2181, initiating session

11/10/25 10:40:05 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x23338df2f890004, likely server has closed socket, closing socket connection and attempting reconnect

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Opening socket connection to server datanode2/172.20.10.193:2181

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Socket connection established to datanode2/172.20.10.193:2181, initiating session

11/10/25 10:40:06 INFO zookeeper.ClientCnxn: Session establishment complete on server datanode2/172.20.10.193:2181, sessionid = 0x23338df2f890004, negotiated timeout = 180000

11/10/25 10:40:13 INFO mapred.JobClient: map 7% reduce 0%

11/10/25 10:42:31 INFO mapred.JobClient: map 10% reduce 0%

11/10/25 10:43:04 INFO mapred.JobClient: map 13% reduce 0%

11/10/25 10:53:08 INFO mapred.JobClient: map 15% reduce 0%

。。。。。。。。。。。。。。。

11/10/25 11:11:28 INFO mapred.JobClient: map 36% reduce 0%

11/10/25 11:18:00 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000008_0, Status : FAILED

Task attempt_201110241323_0001_m_000008_0 failed to report status for 608 seconds. Killing!

Task attempt_201110241323_0001_m_000008_0 failed to report status for 611 seconds. Killing!

11/10/25 11:18:36 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000009_0, Status : FAILED

Task attempt_201110241323_0001_m_000009_0 failed to report status for 646 seconds. Killing!

Task attempt_201110241323_0001_m_000009_0 failed to report status for 617 seconds. Killing!

11/10/25 11:18:37 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000013_0, Status : FAILED

Task attempt_201110241323_0001_m_000013_0 failed to report status for 624 seconds. Killing!

Task attempt_201110241323_0001_m_000013_0 failed to report status for 604 seconds. Killing!

11/10/25 11:26:17 INFO mapred.JobClient: map 39% reduce 0%

11/10/25 11:26:48 INFO mapred.JobClient: map 42% reduce 0%

11/10/25 11:35:48 INFO mapred.JobClient: Task Id : attempt_201110241323_0001_m_000014_0, Status : FAILED

Task attempt_201110241323_0001_m_000014_0 failed to report status for 600 seconds. Killing!

11/10/25 11:42:07 INFO mapred.JobClient: map 44% reduce 0%

11/10/25 11:57:43 INFO mapred.JobClient: map 47% reduce 0%

11/10/25 12:12:16 INFO mapred.JobClient: map 49% reduce 0%

。。。。。。。。。。。。。。。。。。。。。

11/10/25 15:45:26 INFO mapred.JobClient: map 97% reduce 0%

11/10/25 16:16:57 INFO mapred.JobClient: map 100% reduce 0%

11/10/25 16:16:59 INFO mapred.JobClient: Job complete: job_201110241323_0001

11/10/25 16:16:59 INFO mapred.JobClient: Counters: 7

11/10/25 16:16:59 INFO mapred.JobClient: Job Counters

11/10/25 16:16:59 INFO mapred.JobClient: Rack-local map tasks=21

11/10/25 16:16:59 INFO mapred.JobClient: Launched map tasks=58

11/10/25 16:16:59 INFO mapred.JobClient: Data-local map tasks=37

11/10/25 16:16:59 INFO mapred.JobClient: org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters

 

11/10/25 16:16:59 INFO mapred.JobClient: ROWS=6003531

 

11/10/25 16:16:59 INFO mapred.JobClient: Map-Reduce Framework

11/10/25 16:16:59 INFO mapred.JobClient: Map input records=6003531

11/10/25 16:16:59 INFO mapred.JobClient: Spilled Records=0

11/10/25 16:16:59 INFO mapred.JobClient: Map output records=0

web监控如下显示

p_w_picpath

由于篇幅限制,日志输出中有大量略去,经过5h42min的运算,可以看到TestTable这个表总共有6003531行。日志中出现10个错误,基本就2类,

  第一类是“状态报告超时”

Task attempt_201110251718_0001_m_000014_0 failed to report status for 602 seconds. Killing!

Task attempt_201110251718_0001_m_000014_0 failed to report status for 600 seconds. Killing!

  第二类是“租约到期”

org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '7594235277433071278' does not exist at

org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1847) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at

 

由于篇幅限制,关于性能的分析见下一篇文章《hadoop初探--性能分析2