Configuring a Hadoop Development Environment in Eclipse: Detailed Steps + WordCount Example

Note: A Hadoop cluster is already up and running; the cluster uses Hadoop 2.5.0.

Goal: Use Eclipse on Windows 10 to set up a Hadoop development environment and write MapReduce programs that connect to the Hadoop cluster.

Prerequisites: JDK installed with environment variables configured, Eclipse, hadoop-2.7.5.tar, hadoop-eclipse-plugin-2.7.3.jar, and hadoop-common-2.7.3-bin-master.jar. (Hadoop 2.7.3 itself is hard to find now; the plugin is built for 2.7.3, so download the matching Hadoop yourself if you want the versions to line up.)

Hadoop 2.7.5 download: http://mirrors.shu.edu.cn/apache/hadoop/common/

hadoop-eclipse-plugin-2.7.3.jar download: http://download.csdn.net/download/u010185220/10211976

hadoop-common-2.7.3-bin-master.jar download: http://download.csdn.net/download/u010185220/10212069

I. Environment Setup

Step 1: Configure the JDK environment variables and install Eclipse (not covered here).

Step 2: Configure the Hadoop environment.

Extract the downloaded Hadoop distribution to a local directory; I use Hadoop 2.7.5. Add a system environment variable named HADOOP_HOME whose value is the extraction path, e.g. D:\hadoop-2.7.5, and append %HADOOP_HOME%\bin to Path.

Step 3: Copy hadoop-eclipse-plugin-2.7.3.jar into the plugins directory of your Eclipse installation and restart Eclipse. Open Eclipse -> Preferences; a new Hadoop Map/Reduce entry appears in the left-hand tree.

Click the Hadoop Map/Reduce entry and enter the Hadoop extraction path.

Step 4: Extract the hadoop-common-2.7.3-bin-master archive and copy everything in its bin directory (hadoop.dll, hadoop.exp, hadoop.lib, winutils.exe, and so on) into the bin directory of Hadoop 2.7.5. Then copy hadoop.dll into C:\Windows\System32 as well.
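If jobs later fail because winutils.exe cannot be found even with HADOOP_HOME set, a common fallback is to point the hadoop.home.dir system property at the extraction directory before any Hadoop classes load. A minimal sketch (the path is my local layout; adjust to yours):

public class HadoopHomeCheck {
    public static void main(String[] args) {
        // Hadoop's Windows shell utilities look for %HADOOP_HOME%\bin\winutils.exe.
        System.out.println("HADOOP_HOME = " + System.getenv("HADOOP_HOME"));
        // Fallback: make the location explicit when the environment variable
        // is not visible to the JVM that Eclipse launches (assumed local path).
        if (System.getenv("HADOOP_HOME") == null) {
            System.setProperty("hadoop.home.dir", "D:/hadoop-2.7.5");
        }
    }
}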

Step 5: In Eclipse, go to Window -> Open Perspective -> Map/Reduce; a DFS Locations entry appears in the project explorer.

Step 6: In Eclipse, go to Window -> Show View -> Other -> MapReduce Tools -> Map/Reduce Locations and click Open.

The Map/Reduce Locations view now appears in the console area at the bottom. Right-click the empty area of the view and choose New Hadoop location to define a connection to the cluster. Location name can be anything; Host is the IP address of the Hadoop master; the ports must match the values in the cluster's core-site.xml, or the connection will fail; User name can be anything.

Where core-site.xml lives depends on your installation; mine is under /etc/hadoop/2.5.0.0-1245/0/, and I inspect it with: cat /etc/hadoop/2.5.0.0-1245/0/core-site.xml.
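The value to look for in that file is fs.defaultFS; its host and port are exactly what the connection dialog needs. A typical entry looks like this (the address matches my cluster; substitute your own):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.200.240:8020</value>
</property>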

After filling in these parameters, click Finish. The defined connection appears under DFS Locations; expanding the node shows the files on the cluster. If nothing shows up, the connection failed; re-check the IP address and port configured in the previous step.

If there are files, click one of them to confirm that you can view its contents.
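The same check can be done programmatically. A minimal sketch, assuming the Hadoop client jars are on the classpath and reusing my cluster address (substitute your own fs.defaultFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same host/port as the Map/Reduce location and core-site.xml.
        conf.set("fs.defaultFS", "hdfs://192.168.200.240:8020");
        FileSystem fs = FileSystem.get(conf);
        // Listing the root directory mirrors expanding DFS Locations in Eclipse.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}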

If you cannot view file contents at this point and the error resembles "editor could not be initialized. org.eclipse.ui.workbench.texteditor", the likely cause is a version mismatch between the hadoop.dll in C:\Windows\System32 and the one in hadoop-2.7.5/bin.

At this point, the Eclipse Hadoop development environment on Windows is complete.

II. WordCount Example

Step 1: Create the project: File -> New -> Other -> Map/Reduce Project.

Step 2: Create a package under src, and a WordCount.java class inside the package.

The code is as follows (you can paste it directly into your WordCount class):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: split each input line into tokens and emit a (word, 1) pair per token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // The reducer doubles as a combiner: summation is associative and commutative.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
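Because main goes through GenericOptionsParser, the same class can also be packaged and launched from the cluster's command line, along the lines of (the jar name here is illustrative):

hadoop jar wordcount.jar WordCount <in> <out>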

Create a log4j.properties file under src, otherwise the program will report an error at runtime. You can create a .txt file first, add the content, rename it with the proper name and extension, and copy it into the project. The content:

# Configure logging for testing: optionally with log file
#log4j.rootLogger=debug,appender
log4j.rootLogger=info,appender
#log4j.rootLogger=error,appender
# Log to the console
log4j.appender.appender=org.apache.log4j.ConsoleAppender
# Use the TTCCLayout format
log4j.appender.appender.layout=org.apache.log4j.TTCCLayout
Step 3: Right-click the project and choose Run As -> Run Configurations... -> Java Application. With Java Application selected, click New launch configuration in the top-left corner and fill in the Main tab: give it a Name (anything), click Search..., scroll down to WordCount, and confirm.

Then fill in the Arguments tab with the two program arguments: the HDFS input path and the output path. Note that the output directory must not already exist.
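For example, the program arguments might look like this (hypothetical paths against my cluster; substitute your own host, port, and paths):

hdfs://192.168.200.240:8020/user/tws/word.txt hdfs://192.168.200.240:8020/user/tws/test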

After configuring, click Apply, then Run. Log output similar to the following means success (note the LocalJobRunner entries: launched this way, the job runs in the local JVM while reading from and writing to the cluster's HDFS):

[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 115 bytes
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merged 1 segments, 119 bytes to disk to satisfy reduce memory limit
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 1 files, 123 bytes from disk
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 0 segments, 0 bytes from memory into reduce
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 1 segments left of total size: 115 bytes
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[pool-6-thread-1] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task:attempt_local1399589841_0001_r_000000_0 is done. And is in the process of committing
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task attempt_local1399589841_0001_r_000000_0 is allowed to commit now
[pool-6-thread-1] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local1399589841_0001_r_000000_0' to hdfs://192.168.200.240:8020/user/tws/test/_temporary/0/task_local1399589841_0001_r_000000
[pool-6-thread-1] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1399589841_0001_r_000000_0' done.
[pool-6-thread-1] INFO org.apache.hadoop.mapred.Task - Final Counters for attempt_local1399589841_0001_r_000000_0: Counters: 29
    File System Counters
        FILE: Number of bytes read=453
        FILE: Number of bytes written=292149
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=81
        HDFS: Number of read operations=8
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=9
        Reduce shuffle bytes=123
        Reduce input records=9
        Reduce output records=9
        Spilled Records=9
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=253231104
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters
        Bytes Written=81

You can now inspect the results under the path specified in the Arguments tab (that is, the output path, hdfs://IP:port/path): the output file lists each word from word.txt together with its count.
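For instance, if word.txt hypothetically contained:

hello hadoop
hello world

the output file (part-r-00000) under that path would read:

hadoop	1
hello	2
world	1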
