Running the Hadoop WordCount example


root@hadoop1:/opt/hadoop# echo "hello hadoop world" > /tmp/test_file1.txt
root@hadoop1:/opt/hadoop# cat /tmp/test_file1.txt
hello hadoop world
root@hadoop1:/opt/hadoop# echo "hello hadoop world,I'm lpxuan" > /tmp/test_file2.txt

root@hadoop1:/opt/hadoop# cat /tmp/test_file2.txt
hello hadoop world,I'm lpxuan

root@hadoop1:/opt/hadoop# bin/hadoop dfs -copyFromLocal /tmp/test*.txt test-in

root@hadoop1:/opt/hadoop# bin/hadoop dfs -ls test-in
Found 2 items
-rw-r--r--   1 root supergroup          0 2011-07-07 14:03 /user/root/test-in/test_file1.txt
-rw-r--r--   1 root supergroup          0 2011-07-07 14:03 /user/root/test-in/test_file2.txt

root@hadoop1:/opt/hadoop# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount test-in test-out
11/07/07 14:27:40 INFO mapred.FileInputFormat: Total input paths to process : 2
11/07/07 14:27:40 INFO mapred.JobClient: Running job: job_201107071424_0002
11/07/07 14:27:41 INFO mapred.JobClient:  map 0% reduce 0%
11/07/07 14:28:07 INFO mapred.JobClient:  map 33% reduce 0%
11/07/07 14:28:54 INFO mapred.JobClient:  map 66% reduce 0%
11/07/07 14:28:56 INFO mapred.JobClient:  map 100% reduce 0%
11/07/07 14:29:06 INFO mapred.JobClient:  map 100% reduce 11%
11/07/07 14:29:09 INFO mapred.JobClient:  map 100% reduce 100%
11/07/07 14:29:13 INFO mapred.JobClient: Job complete: job_201107071424_0002
11/07/07 14:29:14 INFO mapred.JobClient: Counters: 16
11/07/07 14:29:14 INFO mapred.JobClient:   File Systems
11/07/07 14:29:14 INFO mapred.JobClient:     HDFS bytes read=56
11/07/07 14:29:14 INFO mapred.JobClient:     HDFS bytes written=46
11/07/07 14:29:14 INFO mapred.JobClient:     Local bytes read=97
11/07/07 14:29:14 INFO mapred.JobClient:     Local bytes written=290
11/07/07 14:29:14 INFO mapred.JobClient:   Job Counters
11/07/07 14:29:14 INFO mapred.JobClient:     Launched reduce tasks=1
11/07/07 14:29:14 INFO mapred.JobClient:     Launched map tasks=3
11/07/07 14:29:14 INFO mapred.JobClient:     Data-local map tasks=3
11/07/07 14:29:14 INFO mapred.JobClient:   Map-Reduce Framework
11/07/07 14:29:14 INFO mapred.JobClient:     Reduce input groups=5
11/07/07 14:29:14 INFO mapred.JobClient:     Combine output records=7
11/07/07 14:29:14 INFO mapred.JobClient:     Map input records=2
11/07/07 14:29:14 INFO mapred.JobClient:     Reduce output records=5
11/07/07 14:29:14 INFO mapred.JobClient:     Map output bytes=77
11/07/07 14:29:14 INFO mapred.JobClient:     Map input bytes=49
11/07/07 14:29:14 INFO mapred.JobClient:     Combine input records=7
11/07/07 14:29:14 INFO mapred.JobClient:     Map output records=7
11/07/07 14:29:14 INFO mapred.JobClient:     Reduce input records=7

root@hadoop1:/opt/hadoop# bin/hadoop dfs -ls test-out
Found 2 items
drwxr-xr-x   - root supergroup          0 2011-07-07 14:27 /user/root/test-out/_logs
-rw-r--r--   1 root supergroup         46 2011-07-07 14:29 /user/root/test-out/part-00000


root@hadoop1:/opt/hadoop# bin/hadoop dfs -cat /user/root/test-out/part-00000
hadoop    2
hello    2
lpxuan    1
world    1
world,I'm    1
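
As a sanity check, the same counts can be reproduced locally with standard Unix tools. This sketch assumes WordCount's default whitespace tokenization, which is why `world,I'm` is counted as a single word:

```shell
# Recreate the two input files and count tokens the way WordCount does:
# split on whitespace only, so "world,I'm" stays a single token.
echo "hello hadoop world" > /tmp/test_file1.txt
echo "hello hadoop world,I'm lpxuan" > /tmp/test_file2.txt

# One token per line, then count duplicates; prints "word count" pairs.
cat /tmp/test_file1.txt /tmp/test_file2.txt \
  | tr -s ' ' '\n' | sort | uniq -c \
  | awk '{print $2" "$1}'
```

The output matches the part-00000 file above: hadoop and hello appear twice, the rest once.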





--FAQ:
root@hadoop1:/opt/hadoop# bin/hadoop jar hadoop-0.19.0-examples.jar wordcount test-in test-out
11/07/07 14:18:12 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

    at org.apache.hadoop.ipc.Client.call(Client.java:696)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

11/07/07 14:18:12 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 4
11/07/07 14:18:12 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
    (stack trace identical to the one above)

11/07/07 14:18:12 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 3
11/07/07 14:18:13 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
    (stack trace identical to the one above)

11/07/07 14:18:13 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 2
11/07/07 14:18:15 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
    (stack trace identical to the one above)

11/07/07 14:18:15 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar retries left 1
11/07/07 14:18:18 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
    (stack trace identical to the one above)

11/07/07 14:18:18 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
11/07/07 14:18:18 WARN hdfs.DFSClient: Could not get block locations. Aborting...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar could only be replicated to 0 nodes, instead of 1
    (stack trace identical to the one above)
Exception closing file /home/hadoop/hadoop-root/mapred/system/job_201107071354_0003/job.jar
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
    at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
    at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)
    at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:243)
    at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1413)
    at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:236)
    at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:221)



Almost everyone starting out with Hadoop runs into this error, and there are several known causes. The most common is that the namenode has been reformatted and left in an inconsistent state. The usual fix is to stop DFS, remove the contents of the dfs directories on all machines, run "hadoop namenode -format" on the namenode, and then restart DFS. That consistently fixes the problem for me. It may be serious overkill, but it works.
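
The recovery procedure above can be sketched as a short script. The dfs directory path below is an assumption (use whatever `dfs.name.dir`/`dfs.data.dir` point to in your configuration), and the destructive commands are only echoed here, since they erase every file in HDFS:

```shell
# Sketch of the reformat recovery. DFS_DIRS is an assumed path; take the
# real one from dfs.name.dir / dfs.data.dir in your hadoop-site.xml.
# The commands are echoed, not executed, because they wipe all HDFS data.
DFS_DIRS=/home/hadoop/hadoop-root/dfs   # assumption: adjust to your config
echo "bin/stop-dfs.sh                   # 1. stop DFS"
echo "rm -rf $DFS_DIRS/*                # 2. on EVERY node: clear dfs dirs"
echo "bin/hadoop namenode -format       # 3. reformat on the namenode"
echo "bin/start-dfs.sh                  # 4. restart DFS"
```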

NOTE: You will lose all of the contents of your HDFS file system.
However, this did not solve my problem on this occasion!

Another possible cause is that the namenode does not have enough free disk space for its operations, which was precisely the problem I faced. Clear some space for Hadoop and you are done.
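
A quick way to check for this is to look at free space on the partition holding Hadoop's working directories. The /tmp path below is only an assumption; substitute the `hadoop.tmp.dir` value from your own configuration:

```shell
# Print available space (in KB) on the partition holding hadoop.tmp.dir.
# /tmp is an assumed location; use your configured hadoop.tmp.dir instead.
HADOOP_TMP=/tmp
avail_kb=$(df -Pk "$HADOOP_TMP" | awk 'NR==2 {print $4}')
echo "available on $HADOOP_TMP: ${avail_kb} KB"
```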

