1. To write MapReduce programs in Eclipse, you first need to install the Hadoop Eclipse plugin. Copy contrib/eclipse-plugin/hadoop-0.20.2-eclipse-plugin.jar from the Hadoop installation directory into the plugins directory of the Eclipse installation.
2. Once the plugin is installed, you can create a new Map/Reduce Project. The Java source file needs to import several packages provided by Hadoop, specifically:
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
In the Java file, extend the two base classes Mapper and Reducer and override the map and reduce functions. The main MapReduce class also implements a run method and a main method; run sets the job name and handles the command-line arguments.
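Before looking at the framework classes, it helps to see the contract the map and reduce functions implement: map emits (word, 1) pairs, the framework groups the pairs by key (sorted), and reduce sums each group. The following is a plain-Java simulation of that flow with no Hadoop dependencies; the sample input lines are made up to match the word list in the output later in this post.

```java
import java.util.*;

public class WordCountSim {
    // map phase: emit a (word, 1) pair for every token in a line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split("\\s+")) {
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return out;
    }

    // reduce phase: sum all the values collected for one key
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        String[] lines = {"hello I am your father", "Oh shit"};
        // shuffle phase: group map output by key; TreeMap keeps keys
        // sorted, just as Hadoop sorts keys before the reduce phase
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet())
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
    }
}
```

Running this prints each of the seven words with a count of 1, keys sorted with uppercase before lowercase, which is exactly the shape of the `dfs -cat` output shown below. In real Hadoop code the same two functions live inside Mapper and Reducer subclasses, and the grouping is done by the framework.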
3. Running the MapReduce program
There are two ways to run a MapReduce program on Hadoop. The first is to supply the required arguments in the Eclipse run configuration and run it directly; the results are printed to the console, which is convenient for debugging.
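The arguments go into the Run Configurations dialog, Arguments tab, as the program arguments. Matching the HDFS listing below, they would be something like (the exact paths depend on your setup):

```
# Run Configurations → Arguments → Program arguments
input output_arg
```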
Along the way you may hit the error "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory ... Name node is in safe mode". In that case, force the cluster out of safe mode:
bin/hadoop dfsadmin -safemode leave
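Forcing safe mode off works, but note that the NameNode normally leaves safe mode on its own once enough block reports come in; dfsadmin also lets you check the current state or simply wait:

```
bin/hadoop dfsadmin -safemode get    # report whether safe mode is on
bin/hadoop dfsadmin -safemode wait   # block until safe mode turns off
```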
Running again with the arguments gives the result:
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -ls
Found 4 items
-rw-r--r-- 1 ml\root supergroup 33 2013-12-28 12:10 /user/ml/root/input
drwxr-xr-x - ml\root supergroup 0 2013-12-28 12:13 /user/ml/root/output
drwxr-xr-x - ml\administrator supergroup 0 2013-12-28 17:08 /user/ml/root/output_arg
drwxr-xr-x - ml\root supergroup 0 2013-12-28 15:45 /user/ml/root/output_ecl
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_arg/part*
I 1
Oh 1
am 1
father 1
hello 1
shit 1
your 1
The result matches the expectation.
The other way is to first package the program as a jar, copy the jar to the Hadoop installation directory, and then execute it from the command line:
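In Eclipse the jar can be produced via File → Export → JAR file. From the command line, a sketch of the equivalent steps would be the following; the source file name MRTest.java and the classes output directory are assumptions matching the class name in the command below:

```
# compile against the Hadoop core jar, then package (names assumed)
javac -classpath hadoop-0.20.2-core.jar -d classes MRTest.java
jar cvf myWordCount.jar -C classes .
```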
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop jar myWordCount.jar MRTest input output_ecl
13/12/28 15:44:58 INFO input.FileInputFormat: Total input paths to process : 1
13/12/28 15:45:01 INFO mapred.JobClient: Running job: job_201312281151_0008
13/12/28 15:45:02 INFO mapred.JobClient: map 0% reduce 0%
13/12/28 15:45:16 INFO mapred.JobClient: map 100% reduce 0%
13/12/28 15:45:28 INFO mapred.JobClient: map 100% reduce 100%
13/12/28 15:45:30 INFO mapred.JobClient: Job complete: job_201312281151_0008
13/12/28 15:45:30 INFO mapred.JobClient: Counters: 17
13/12/28 15:45:30 INFO mapred.JobClient: Job Counters
13/12/28 15:45:30 INFO mapred.JobClient: Launched reduce tasks=1
13/12/28 15:45:30 INFO mapred.JobClient: Launched map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient: Data-local map tasks=1
13/12/28 15:45:30 INFO mapred.JobClient: FileSystemCounters
13/12/28 15:45:30 INFO mapred.JobClient: FILE_BYTES_READ=160
13/12/28 15:45:30 INFO mapred.JobClient: HDFS_BYTES_READ=33
13/12/28 15:45:30 INFO mapred.JobClient: FILE_BYTES_WRITTEN=271
13/12/28 15:45:30 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=45
13/12/28 15:45:30 INFO mapred.JobClient: Map-Reduce Framework
13/12/28 15:45:30 INFO mapred.JobClient: Reduce input groups=7
13/12/28 15:45:30 INFO mapred.JobClient: Combine output records=7
13/12/28 15:45:30 INFO mapred.JobClient: Map input records=2
13/12/28 15:45:30 INFO mapred.JobClient: Reduce shuffle bytes=0
13/12/28 15:45:30 INFO mapred.JobClient: Reduce output records=7
13/12/28 15:45:30 INFO mapred.JobClient: Spilled Records=14
13/12/28 15:45:30 INFO mapred.JobClient: Map output bytes=59
13/12/28 15:45:30 INFO mapred.JobClient: Combine input records=7
13/12/28 15:45:30 INFO mapred.JobClient: Map output records=7
13/12/28 15:45:30 INFO mapred.JobClient: Reduce input records=7
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -ls output_ecl/*
drwxr-xr-x - ml\root supergroup 0 2013-12-28 15:45 /user/ml/root/output_ecl/_logs/history
-rw-r--r-- 1 ml\root supergroup 45 2013-12-28 15:45 /user/ml/root/output_ecl/part-r-00000
Administrator@ML ~/hadoop-0.20.2
$ bin/hadoop dfs -cat output_ecl/part*
I 1
Oh 1
am 1
father 1
hello 1
shit 1
your 1
The program ran perfectly!