Hadoop cluster testing

Continued from the previous post: http://blog.csdn.net/caiwenguang1992/article/details/9289401


Start the Hadoop cluster
On Hadoop1:
[root@Hadoop1 root]# start-dfs.sh
[root@Hadoop1 root]# start-mapred.sh
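To confirm the daemons actually came up before moving on, a quick check (assuming the JDK's jps is on the PATH; on a 1.x cluster the master typically shows NameNode, SecondaryNameNode and JobTracker, while the worker nodes show DataNode and TaskTracker):
[root@Hadoop1 root]# jps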

Check HDFS status:
http://188.188.3.241:50070
Check job status:

http://188.188.3.241:50030
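If the web UIs are not reachable, the same health information can be pulled from the command line (standard Hadoop 1.x commands; output omitted here):
[root@Hadoop1 root]# hadoop dfsadmin -report    # live/dead DataNodes and capacity
[root@Hadoop1 root]# hadoop job -list           # currently running jobs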


Now for the tests.
A simple HDFS throughput test
[root@Hadoop1 tmp]# dd if=/dev/zero of=1.img bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 14.0923 s, 152 MB/s
[root@Hadoop1 tmp]# hadoop dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2013-07-10 03:28 /mapred
drwxr-xr-x   - root supergroup          0 2013-07-10 05:09 /user
[root@Hadoop1 tmp]# hadoop dfs -mkdir /test
[root@Hadoop1 tmp]# hadoop dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2013-07-10 03:28 /mapred
drwx------   - root supergroup          0 2013-07-10 05:20 /test
drwxr-xr-x   - root supergroup          0 2013-07-10 05:09 /user
[root@Hadoop1 tmp]# time hadoop dfs -put /tmp/1.img /test/    # roughly 21 MB/s; on a cluster of 4 desktop machines as nodes the same transfer ran at 42 MB/s

real    1m37.163s
user    0m23.178s
sys    0m11.598s
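That is where the ~21 MB/s figure in the comment above comes from: 2048 MB pushed into HDFS in roughly 97 s. A quick check on the shell (assuming bc is installed):
[root@Hadoop1 tmp]# echo "2048/97.163" | bc -l    # ≈ 21.08 MB/s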
[root@Hadoop1 tmp]# hadoop dfs -lsr /test
-rw-------   3 root supergroup 2147483648 2013-07-10 05:28 /test/1.img
[root@Hadoop1 aaa]# du -sh # test with lots of scattered small files
674M    .
[root@Hadoop1 aaa]# time hadoop dfs -put *.img /test/aaa/    # this is genuinely, painfully slow!

real    46m20.630s
user    12m29.656s
sys    4m37.152s
[root@Hadoop1 aaa]# ll 1.img
-rw-r--r-- 1 root root 1024 Jul 10 05:40 1.img
[root@Hadoop1 aaa]# hadoop dfs -dus /test/aaa
hdfs://Hadoop1:8020/test/aaa    175298560
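Putting a number on how slow that was: HDFS reports about 175298560 bytes (≈167 MB) landing in /test/aaa over roughly 46.3 minutes (the local du -sh figure of 674M is most likely inflated by per-file block overhead for the many tiny files), so the effective rate is on the order of:
[root@Hadoop1 aaa]# echo "175298560/1024/1024/2780.6" | bc -l    # ≈ 0.06 MB/s, assuming bc is installed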

A simple job efficiency test
[root@Hadoop1 tmp]# du -sh *.txt # large files
564M    file1.txt
1.1G    file2.txt
[root@Hadoop1 tmp]# file file1.txt file2.txt # both are plain text files
file1.txt: UTF-8 Unicode English text, with very long lines, with CRLF, CR, LF line terminators, with escape sequences, with overstriking
file2.txt: UTF-8 Unicode English text, with very long lines, with CRLF, CR, LF line terminators, with escape sequences, with overstriking
[root@Hadoop1 tmp]# time hadoop dfs -put ./*.txt /test/
real    1m24.618s
user    0m21.289s
sys    0m8.805s
[root@Hadoop1 tmp]# hadoop dfs -dus /test/
hdfs://Hadoop1:8020/test    1754865736
[root@Hadoop1 tmp]# hadoop dfs -lsr /test/
-rw-------   3 root supergroup  590579815 2013-07-12 01:43 /test/file1.txt
-rw-------   3 root supergroup 1164285921 2013-07-12 01:43 /test/file2.txt
[root@Hadoop1 tmp]# time hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.0.jar -output hdfs://Hadoop1:8020/test/sum -input hdfs://Hadoop1:8020/test/*.txt -mapper /bin/cat -reducer /usr/bin/wc
packageJobJar: [/tmp/hadoop-root/hadoop-unjar5408019495846038309/] [] /tmp/streamjob6237398413461299607.jar tmpDir=null
13/07/12 01:47:17 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/12 01:47:17 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/12 01:47:17 INFO mapred.FileInputFormat: Total input paths to process : 2
13/07/12 01:47:18 INFO streaming.StreamJob: getLocalDirs(): [/opt/hadoop/mapred]
13/07/12 01:47:18 INFO streaming.StreamJob: Running job: job_201307100843_0023
13/07/12 01:47:18 INFO streaming.StreamJob: To kill this job, run:
13/07/12 01:47:18 INFO streaming.StreamJob: /usr/local/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=Hadoop1:9000 -kill job_201307100843_0023
13/07/12 01:47:18 INFO streaming.StreamJob: Tracking URL: http://Hadoop1:50030/jobdetails.jsp?jobid=job_201307100843_0023
13/07/12 01:47:19 INFO streaming.StreamJob:  map 0%  reduce 0%
13/07/12 01:47:34 INFO streaming.StreamJob:  map 3%  reduce 0%
13/07/12 01:47:35 INFO streaming.StreamJob:  map 10%  reduce 0%
13/07/12 01:47:36 INFO streaming.StreamJob:  map 15%  reduce 0%
13/07/12 01:47:37 INFO streaming.StreamJob:  map 18%  reduce 0%
13/07/12 01:47:38 INFO streaming.StreamJob:  map 22%  reduce 0%
13/07/12 01:47:39 INFO streaming.StreamJob:  map 28%  reduce 0%
13/07/12 01:47:41 INFO streaming.StreamJob:  map 29%  reduce 0%
13/07/12 01:47:42 INFO streaming.StreamJob:  map 32%  reduce 0%
13/07/12 01:47:43 INFO streaming.StreamJob:  map 33%  reduce 0%
13/07/12 01:47:46 INFO streaming.StreamJob:  map 38%  reduce 0%
13/07/12 01:47:50 INFO streaming.StreamJob:  map 40%  reduce 0%
13/07/12 01:47:51 INFO streaming.StreamJob:  map 41%  reduce 0%
13/07/12 01:47:54 INFO streaming.StreamJob:  map 42%  reduce 0%
13/07/12 01:47:55 INFO streaming.StreamJob:  map 43%  reduce 0%
13/07/12 01:47:57 INFO streaming.StreamJob:  map 44%  reduce 0%
13/07/12 01:47:58 INFO streaming.StreamJob:  map 45%  reduce 0%
13/07/12 01:47:59 INFO streaming.StreamJob:  map 46%  reduce 0%
13/07/12 01:48:01 INFO streaming.StreamJob:  map 50%  reduce 0%
13/07/12 01:48:03 INFO streaming.StreamJob:  map 51%  reduce 0%
13/07/12 01:48:04 INFO streaming.StreamJob:  map 55%  reduce 0%
13/07/12 01:48:05 INFO streaming.StreamJob:  map 56%  reduce 0%
13/07/12 01:48:07 INFO streaming.StreamJob:  map 61%  reduce 0%
13/07/12 01:48:09 INFO streaming.StreamJob:  map 62%  reduce 0%
13/07/12 01:48:10 INFO streaming.StreamJob:  map 63%  reduce 0%
13/07/12 01:48:11 INFO streaming.StreamJob:  map 68%  reduce 0%
13/07/12 01:48:12 INFO streaming.StreamJob:  map 72%  reduce 0%
13/07/12 01:48:13 INFO streaming.StreamJob:  map 73%  reduce 5%
13/07/12 01:48:14 INFO streaming.StreamJob:  map 76%  reduce 5%
13/07/12 01:48:15 INFO streaming.StreamJob:  map 78%  reduce 5%
13/07/12 01:48:16 INFO streaming.StreamJob:  map 81%  reduce 9%
13/07/12 01:48:17 INFO streaming.StreamJob:  map 87%  reduce 9%
13/07/12 01:48:19 INFO streaming.StreamJob:  map 87%  reduce 11%
13/07/12 01:48:21 INFO streaming.StreamJob:  map 88%  reduce 11%
13/07/12 01:48:22 INFO streaming.StreamJob:  map 91%  reduce 11%
13/07/12 01:48:25 INFO streaming.StreamJob:  map 96%  reduce 11%
13/07/12 01:48:28 INFO streaming.StreamJob:  map 97%  reduce 11%
13/07/12 01:48:34 INFO streaming.StreamJob:  map 98%  reduce 11%
13/07/12 01:48:37 INFO streaming.StreamJob:  map 100%  reduce 11%
13/07/12 01:48:40 INFO streaming.StreamJob:  map 100%  reduce 16%
13/07/12 01:48:44 INFO streaming.StreamJob:  map 100%  reduce 19%
13/07/12 01:48:53 INFO streaming.StreamJob:  map 100%  reduce 26%
13/07/12 01:48:56 INFO streaming.StreamJob:  map 100%  reduce 28%
13/07/12 01:49:02 INFO streaming.StreamJob:  map 100%  reduce 31%
13/07/12 01:49:05 INFO streaming.StreamJob:  map 100%  reduce 33%
13/07/12 01:49:14 INFO streaming.StreamJob:  map 100%  reduce 67%
13/07/12 01:49:53 INFO streaming.StreamJob:  map 100%  reduce 73%
13/07/12 01:49:56 INFO streaming.StreamJob:  map 100%  reduce 75%
13/07/12 01:49:59 INFO streaming.StreamJob:  map 100%  reduce 76%
13/07/12 01:50:02 INFO streaming.StreamJob:  map 100%  reduce 77%
13/07/12 01:50:05 INFO streaming.StreamJob:  map 100%  reduce 78%
13/07/12 01:50:08 INFO streaming.StreamJob:  map 100%  reduce 80%
13/07/12 01:50:11 INFO streaming.StreamJob:  map 100%  reduce 81%
13/07/12 01:50:14 INFO streaming.StreamJob:  map 100%  reduce 82%
13/07/12 01:50:17 INFO streaming.StreamJob:  map 100%  reduce 83%
13/07/12 01:50:20 INFO streaming.StreamJob:  map 100%  reduce 85%
13/07/12 01:50:23 INFO streaming.StreamJob:  map 100%  reduce 86%
13/07/12 01:50:26 INFO streaming.StreamJob:  map 100%  reduce 87%
13/07/12 01:50:29 INFO streaming.StreamJob:  map 100%  reduce 88%
13/07/12 01:50:32 INFO streaming.StreamJob:  map 100%  reduce 89%
13/07/12 01:50:35 INFO streaming.StreamJob:  map 100%  reduce 91%
13/07/12 01:50:38 INFO streaming.StreamJob:  map 100%  reduce 92%
13/07/12 01:50:41 INFO streaming.StreamJob:  map 100%  reduce 93%
13/07/12 01:50:45 INFO streaming.StreamJob:  map 100%  reduce 94%
13/07/12 01:50:48 INFO streaming.StreamJob:  map 100%  reduce 95%
13/07/12 01:50:51 INFO streaming.StreamJob:  map 100%  reduce 97%
13/07/12 01:50:54 INFO streaming.StreamJob:  map 100%  reduce 98%
13/07/12 01:50:57 INFO streaming.StreamJob:  map 100%  reduce 99%
13/07/12 01:51:01 INFO streaming.StreamJob:  map 100%  reduce 100%
13/07/12 01:51:05 INFO streaming.StreamJob: Job complete: job_201307100843_0023
13/07/12 01:51:05 INFO streaming.StreamJob: Output: hdfs://Hadoop1:8020/test/sum

real    3m49.814s
user    0m4.454s
sys    0m0.349s
[root@Hadoop1 tmp]# hadoop dfs -cat /test/sum/*    # result computed by Hadoop
15613728 140736648 1770033512    
[root@Hadoop1 tmp]# time cat ./*.txt |wc # local result
15531360 140736648 1754865736    # the line and byte counts differ from Hadoop's; needs a closer look

real    1m1.869s    # the local run is over two minutes faster than the Hadoop job, but this is all on VMs, so it should not be treated as a benchmark
user    0m57.948s
sys    0m2.914s
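One plausible cause of the count gap (not verified here): Hadoop's line reader treats CR, LF and CRLF all as record terminators, and streaming re-frames every record as key/value text before it reaches wc, whereas the local run only counts LF-terminated lines and raw bytes; the fact that the word counts match exactly supports this. A rough local approximation of Hadoop-style line splitting, assuming GNU awk:
[root@Hadoop1 tmp]# awk 'BEGIN{RS="\r\n|\r|\n"} END{print NR}' ./*.txt    # a regex record separator is a GNU awk extension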

**********************************************************
    The errors I ran into, and how I fixed them, have already been folded into the steps above
**********************************************************
Error
2013-07-09 08:46:59,301 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: Hadoop1/188.188.3.241:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Log
2013-07-09 08:43:45,812 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: NameNode is not formatted.
Fix (note: formatting wipes any existing HDFS metadata, which is fine on a brand-new cluster)
[root@Hadoop1 root]# hadoop namenode -format

Error
2013-07-09 08:38:26,370 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /opt/hadoop/hdfs/datanode, expected: rwx------, while actual: rwxr-xr-x
2013-07-09 08:38:26,370 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.
Fix
chmod -R 700 /opt/hadoop/hdfs
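An alternative (not tried here): instead of changing the directory permissions, tell the DataNode what to expect. In Hadoop 1.x this check is governed by dfs.datanode.data.dir.perm, so something like the following in hdfs-site.xml should also satisfy it (untested in this setup):
<property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>755</value>    <!-- match the actual rwxr-xr-x permissions -->
</property>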

Error
Warning: $HADOOP_HOME is deprecated.
Fix
echo "export HADOOP_HOME_WARN_SUPPRESS=TRUE" >>/usr/local/hadoop/conf/hadoop-env.sh && export HADOOP_HOME_WARN_SUPPRESS=TRUE

Error
[root@Hadoop1 tmp]# time hadoop -ls /
Unrecognized option: -ls
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Fix (note: as typed the command is missing the dfs subcommand, so -ls was passed straight to the JVM as an option; the correct form is hadoop dfs -ls /)
stop-all.sh
start-all.sh

Error
 Cannot create directory /user/root/.Trash/Current/test. Name node is in safe mode.
Fix
hadoop dfsadmin -safemode leave
enter - enter safe mode
leave - force the NameNode out of safe mode
get   - report whether safe mode is on
wait  - block until safe mode ends
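In scripts it is often better to wait for the NameNode to leave safe mode on its own rather than force it out:
[root@Hadoop1 root]# hadoop dfsadmin -safemode wait    # blocks until safe mode ends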

Error
........ (output omitted here)
13/07/10 07:45:40 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201307100644_0007_m_000000
........ (output omitted here)
This one is frustrating, but not hard to deal with. For a job this is a fairly generic error message, a bit like a Windows installer whose progress bar is gliding along happily, everything looking fine, when it suddenly stops and pops up an "An error occurred" dialog, cruelly and pointlessly making you click OK before it dies in misery. Here it is nowhere near as bad, because every job keeps a detailed log on every node it ran on, recording everything it did before it failed. The investigation goes like this:
1. Open http://188.188.3.241:50030/jobtracker.jsp
2. Find the ID of the failed job under Retired Jobs and click it
3. Under "Failed tasks attempts by nodes", click the node you dislike most
4. In the ERROR column, all the facts are then laid out bare in front of you
What I saw was the grim news java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable. Being a lost little ops guy I couldn't make sense of it, so I asked Google for guidance and eventually found this piece of wisdom: "the key type is whatever class mapred.output.key.class specifies; if nothing specifies one, the default LongWritable type ends up being used, and that is exactly where this program goes wrong. Because none of the checks turned anything up, LongWritable became the keyClass, so when MapOutputBuffer read the data coming in from the InputFormat and prepared to send it to the Reducer, which here is /usr/bin/wc and therefore expects text in and text out, a type-mismatch error occurred."

Fix
1. Add a -jobconf setting to the command declaring that the map output key type is LongWritable: -jobconf mapred.mapoutput.key.class=org.apache.hadoop.io.LongWritable
2. Or add -D mapred.mapoutput.key.class=org.apache.hadoop.io.LongWritable to the command (see the sketch below)
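Put together, option 2 would look roughly like this (a sketch only, not run in this test; with streaming, generic options such as -D must come before the streaming-specific options):
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.0.jar \
    -D mapred.mapoutput.key.class=org.apache.hadoop.io.LongWritable \
    -input hdfs://Hadoop1:8020/test/*.txt \
    -output hdfs://Hadoop1:8020/test/sum \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /usr/bin/wc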
My luck has not been great lately and neither of those worked for me, so my fix was to change the command itself:
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.2.0.jar -output hdfs://Hadoop1:8020/test/sum -input hdfs://Hadoop1:8020/test/*.txt -mapper /bin/cat -reducer /usr/bin/wc
The official hadoop-streaming examples are as follows (I went with example 1):
1.$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc
2.$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \  # using the default mapper, or the one in this example, triggers the error above
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc
**********************************************************


This post is the author's original work.

Author: john

Please credit the source when reposting.
