[root@hadoop-m test]# hadoop jar /hadoop/app/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar -D mapred.map.tasks=1 -D mapred.reduce.tasks=1 -D mapred.job.name="cucrz-test.k" -mapper mapper.sh -file mapper.sh -reducer reducer.sh -file reducer.sh -input /user/root/look10801C2013-11-01.txt -output /user/root/cucrz-test.k
packageJobJar: [mapper.sh, reducer.sh, /hadoop/data/hadoop/hadooptmp/hadoop-unjar2802401891342583612/] [] /tmp/streamjob2322702583865412109.jar tmpDir=null
13/11/18 15:07:49 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
13/11/18 15:07:49 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 6bb1b7f8b9044d8df9b4d2b6641db7658aab3cf8]
13/11/18 15:07:49 INFO mapred.FileInputFormat: Total input paths to process : 1
13/11/18 15:07:50 INFO streaming.StreamJob: getLocalDirs(): [/hadoop/data/hadoop/mapred/local]
13/11/18 15:07:50 INFO streaming.StreamJob: Running job: job_201311070951_0059
13/11/18 15:07:50 INFO streaming.StreamJob: To kill this job, run:
13/11/18 15:07:50 INFO streaming.StreamJob: /hadoop/app/hadoop-1.2.1/libexec/../bin/hadoop job -Dmapred.job.tracker=192.168.10.33:9010 -kill job_201311070951_0059
13/11/18 15:07:50 INFO streaming.StreamJob: Tracking URL: http://hadoop-m:50030/jobdetails.jsp?jobid=job_201311070951_0059
13/11/18 15:07:51 INFO streaming.StreamJob: map 0% reduce 0%
13/11/18 15:07:58 INFO streaming.StreamJob: map 100% reduce 0%
13/11/18 15:08:05 INFO streaming.StreamJob: map 100% reduce 33%
13/11/18 15:08:08 INFO streaming.StreamJob: map 100% reduce 94%
13/11/18 15:08:10 INFO streaming.StreamJob: map 100% reduce 100%
13/11/18 15:08:12 INFO streaming.StreamJob: Job complete: job_201311070951_0059
13/11/18 15:08:12 INFO streaming.StreamJob: Output: /user/root/cucrz-test.k
Notes:
(1) -input: input file path
(2) -output: output path
(3) -mapper: the user's own mapper program; may be an executable or a script
(4) -reducer: the user's own reducer program; may be an executable or a script
(5) -file: packages a file into the submitted job; this can be an input file needed by the mapper or reducer, such as a configuration file or a dictionary
(6) -partitioner: a user-defined partitioner program
(7) -combiner: a user-defined combiner program (must be implemented in Java)
(8) -D: job properties (previously -jobconf), including:
  1) mapred.map.tasks: number of map tasks
  2) mapred.reduce.tasks: number of reduce tasks
  3) stream.map.input.field.separator / stream.map.output.field.separator: field separator for map task input/output data; defaults to \t for both
  4) stream.num.map.output.key.fields: number of fields that make up the key in a map task's output records
  5) stream.reduce.input.field.separator / stream.reduce.output.field.separator: field separator for reduce task input/output data; defaults to \t for both
  6) stream.num.reduce.output.key.fields: number of fields that make up the key in a reduce task's output records
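To make the contract behind these options concrete, here is a minimal word-count sketch of what mapper.sh and reducer.sh from the job above might look like (the script contents are assumptions, not the author's actual scripts): the framework feeds each task lines on stdin and expects "key<TAB>value" lines on stdout, matching the \t separator defaults listed above.

```shell
# Hypothetical mapper.sh: split each input line into words,
# emit "word<TAB>1" for every word.
cat > mapper.sh <<'EOF'
#!/bin/sh
tr -s ' ' '\n' | grep -v '^$' | awk '{print $0 "\t1"}'
EOF

# Hypothetical reducer.sh: input arrives sorted/grouped by key,
# so summing the counts per word gives the final tally.
cat > reducer.sh <<'EOF'
#!/bin/sh
awk -F '\t' '{c[$1] += $2} END {for (w in c) print w "\t" c[w]}'
EOF

chmod +x mapper.sh reducer.sh
```

Because both sides of the contract are plain stdin/stdout filters, they can be exercised locally with a pipe (`... | sh mapper.sh | sort | sh reducer.sh`) before submitting to the cluster; the `sort` stands in for the framework's shuffle phase.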
5. Common problems and solutions
(1) The job keeps failing with a message that the executable cannot be found, e.g. "Caused by: java.io.IOException: Cannot run program "/user/hadoop/Mapper": error=2, No such file or directory". Fix: when submitting the job, use the -file option to ship these files, e.g. "-file Mapper -file Reducer" or "-file Mapper.py -file Reducer.py" as in the example above; Hadoop will then automatically distribute the two files to every node. For example: -input myInputDirs -output myOutputDir -mapper Mapper.py -reducer Reducer.py -file Mapper.py -file Reducer.py
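Since a file that was never shipped only fails once the task launches on a remote node, a cheap pre-flight check on the submitting machine can catch the problem earlier. A minimal sketch (the helper name and usage are hypothetical, not part of Hadoop):

```shell
# Hypothetical pre-flight helper: verify that every local script
# to be passed via -file exists and is executable, since a missing
# file only surfaces later on the cluster as
# "Cannot run program ...: error=2, No such file or directory".
check_streaming_files() {
    for f in "$@"; do
        if [ ! -x "$f" ]; then
            echo "not found or not executable: $f" >&2
            return 1
        fi
    done
    echo "ok: $*"
}

# usage (paths are examples):
# check_streaming_files mapper.sh reducer.sh
```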