Testing a Hadoop Cluster (Word Count)

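Once the Hadoop cluster is installed, you can test Hadoop's basic functionality. Hadoop ships with an examples jar (hadoop-examples-0.20.205.0.jar; the version suffix differs between releases) whose wordcount program counts how many times each word occurs in its input. Let's try it out first.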

[hadoop@master ~]$ mkdir input	# first create an input directory
[hadoop@master ~]$ cd input/
[hadoop@master input]$ echo "hello world">text1.txt	# put the input files in this directory
[hadoop@master input]$ echo "hello hadoop">text2.txt
[hadoop@master input]$ ls
text1.txt  text2.txt
[hadoop@master input]$ cat text1.txt 
hello world
[hadoop@master input]$ cat text2.txt 
hello hadoop
[hadoop@master input]$ cd ..
[hadoop@master ~]$ ls
input  log  公共的  模板  视频  图片  文档  下载  新文件~  音乐  桌面
[hadoop@master ~]$ /usr/bin/hadoop dfs -put ./input in	# copy the two files in the local input directory into HDFS
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls ./in/*	# list the two files in HDFS
-rw-r--r--   2 hadoop supergroup         12 2012-09-13 16:16 /user/hadoop/in/text1.txt
-rw-r--r--   2 hadoop supergroup         13 2012-09-13 16:16 /user/hadoop/in/text2.txt
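# Run the wordcount program from the examples jar bundled with Hadoop; it counts how many times each word occurs
# The input is the two files in the in directory; the results are written to the out directory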
[hadoop@master ~]$ /usr/bin/hadoop jar /usr/hadoop-examples-0.20.205.0.jar wordcount in out
12/09/13 16:20:32 INFO input.FileInputFormat: Total input paths to process : 2
12/09/13 16:20:36 INFO mapred.JobClient: Running job: job_201209131425_0001
12/09/13 16:20:37 INFO mapred.JobClient:  map 0% reduce 0%
12/09/13 16:23:38 INFO mapred.JobClient:  map 50% reduce 0%
12/09/13 16:24:31 INFO mapred.JobClient:  map 100% reduce 16%
12/09/13 16:24:40 INFO mapred.JobClient:  map 100% reduce 100%
12/09/13 16:24:45 INFO mapred.JobClient: Job complete: job_201209131425_0001
12/09/13 16:24:45 INFO mapred.JobClient: Counters: 29
12/09/13 16:24:45 INFO mapred.JobClient:   Job Counters 
12/09/13 16:24:45 INFO mapred.JobClient:     Launched reduce tasks=1
12/09/13 16:24:45 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=230205
12/09/13 16:24:45 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/09/13 16:24:45 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/09/13 16:24:45 INFO mapred.JobClient:     Launched map tasks=3
12/09/13 16:24:45 INFO mapred.JobClient:     Data-local map tasks=3
12/09/13 16:24:45 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=58667
12/09/13 16:24:45 INFO mapred.JobClient:   File Output Format Counters 
12/09/13 16:24:45 INFO mapred.JobClient:     Bytes Written=25
12/09/13 16:24:45 INFO mapred.JobClient:   FileSystemCounters
12/09/13 16:24:45 INFO mapred.JobClient:     FILE_BYTES_READ=55
12/09/13 16:24:45 INFO mapred.JobClient:     HDFS_BYTES_READ=241
12/09/13 16:24:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64354
12/09/13 16:24:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
12/09/13 16:24:45 INFO mapred.JobClient:   File Input Format Counters 
12/09/13 16:24:45 INFO mapred.JobClient:     Bytes Read=25
12/09/13 16:24:45 INFO mapred.JobClient:   Map-Reduce Framework
12/09/13 16:24:45 INFO mapred.JobClient:     Map output materialized bytes=61
12/09/13 16:24:45 INFO mapred.JobClient:     Map input records=2
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce shuffle bytes=61
12/09/13 16:24:45 INFO mapred.JobClient:     Spilled Records=8
12/09/13 16:24:45 INFO mapred.JobClient:     Map output bytes=41
12/09/13 16:24:45 INFO mapred.JobClient:     CPU time spent (ms)=13840
12/09/13 16:24:45 INFO mapred.JobClient:     Total committed heap usage (bytes)=319361024
12/09/13 16:24:45 INFO mapred.JobClient:     Combine input records=4
12/09/13 16:24:45 INFO mapred.JobClient:     SPLIT_RAW_BYTES=216
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce input records=4
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce input groups=3
12/09/13 16:24:45 INFO mapred.JobClient:     Combine output records=4
12/09/13 16:24:45 INFO mapred.JobClient:     Physical memory (bytes) snapshot=329932800
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce output records=3
12/09/13 16:24:45 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1133260800
12/09/13 16:24:45 INFO mapred.JobClient:     Map output records=4
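# After the job finishes there is a new out directory. Note that HDFS has no notion of a current directory, so there is no cd command; relative paths resolve against /user/hadoop here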
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2012-09-13 16:16 /user/hadoop/in
drwxr-xr-x   - hadoop supergroup          0 2012-09-13 16:24 /user/hadoop/out
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls ./out	# list the contents of the out directory
Found 3 items
-rw-r--r--   2 hadoop supergroup          0 2012-09-13 16:24 /user/hadoop/out/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2012-09-13 16:20 /user/hadoop/out/_logs
-rw-r--r--   2 hadoop supergroup         25 2012-09-13 16:24 /user/hadoop/out/part-r-00000
[hadoop@master ~]$ /usr/bin/hadoop dfs -cat ./out/part-r-00000	# view the result
hadoop	1
hello	2
world	1
[hadoop@master ~]$ 
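
To make the counters in the job log easier to read, here is a minimal local sketch of the map, combine, and reduce steps in plain Python. It is not Hadoop's implementation: the function names are hypothetical, and it assumes one map task per input file with a summing combiner. Under those assumptions it should reproduce the word counts and several of the counters printed above.

```python
from collections import defaultdict

# The same two input files used in the session above.
inputs = {
    "text1.txt": "hello world\n",
    "text2.txt": "hello hadoop\n",
}


def map_phase(text):
    """Emit one (word, 1) pair per whitespace-separated token."""
    return [(word, 1) for word in text.split()]


def sum_by_key(pairs):
    """Sum the counts for each word; used here as both combiner and reducer."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


# Assumption: one map task per input file, each followed by a local combine.
map_outputs = [map_phase(text) for text in inputs.values()]
combined = [sum_by_key(pairs) for pairs in map_outputs]

# Shuffle: gather every combined pair, then reduce per word.
reduce_input = [pair for part in combined for pair in part]
result = sum_by_key(reduce_input)

# Render the result the way part-r-00000 shows it: word, tab, count.
output_text = "".join(f"{word}\t{count}\n" for word, count in result)

# Recompute a few of the counters from the job log.
counters = {
    "Map input records": sum(len(t.splitlines()) for t in inputs.values()),
    "Map output records": sum(len(p) for p in map_outputs),
    "Combine output records": sum(len(p) for p in combined),
    "Reduce input records": len(reduce_input),
    "Reduce input groups": len({word for word, _ in reduce_input}),
    "Reduce output records": len(result),
    "Bytes Read": sum(len(t.encode()) for t in inputs.values()),
    "Bytes Written": len(output_text.encode()),
}

print(output_text, end="")
for name, value in counters.items():
    print(f"{name}={value}")
```

The combiner does not shrink anything here because "hello" appears only once in each file, which is why Combine output records equals Combine input records in the log. The 25 bytes read are the 12-byte and 13-byte input files, and the 25 bytes written are the three result lines.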

For a long-running job, you can monitor its progress in a browser: port 50030 on the master node (http://masterip:50030) shows the JobTracker's status, and port 50070 on the master node shows the cluster's DFS information.
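
If no browser is at hand, a quick reachability check on those two ports can be scripted. The sketch below is only an illustration: the helper names are hypothetical, "masterip" is the placeholder from the text and must be replaced with the master's real address, and a successful TCP connection only shows that something is listening, not that the page renders.

```python
import socket

# Ports named in the text: 50030 for the JobTracker UI, 50070 for the DFS UI.
WEB_UI_PORTS = {"jobtracker": 50030, "dfs": 50070}


def web_ui_urls(master_host):
    """Build the web UI URLs for a given master host name or address."""
    return {
        name: f"http://{master_host}:{port}"
        for name, port in WEB_UI_PORTS.items()
    }


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def describe_web_uis(master_host):
    """Return one status line per web UI for the given master host."""
    lines = []
    for name, url in web_ui_urls(master_host).items():
        reachable = port_is_open(master_host, WEB_UI_PORTS[name])
        state = "reachable" if reachable else "not reachable"
        lines.append(f"{name}: {url} ({state})")
    return lines
```

Calling `print("\n".join(describe_web_uis("masterip")))` from a machine that can reach the master prints one line per UI.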

Screenshots:

[Screenshot: JobTracker running]


[Screenshot: DFS usage]

