Configuring and developing MapReduce programs in Eclipse with the hadoop-eclipse plugin

For how to set up the Hadoop cluster itself, see the following two articles: the first targets Red Hat, the second Ubuntu.

http://blog.csdn.net/haojun186/article/details/7466207
http://lingyibin.iteye.com/blog/875535

I have spent the past few days getting an Eclipse environment for Hadoop development working. The plugin alone cost me a lot of time, errors, and frustration, but today I finally got through it, so I am sharing the lessons learned here. Leave a comment if you run into problems.

1. Download the hadoop-eclipse-1.0.1 plugin

http://download.csdn.net/detail/shuangtaqibing/4461472

The plugin has been tested and works with both Eclipse 3.7 (Indigo) and Eclipse 3.8 (Juno).

2. Install the plugin: copy the plugin JAR into Eclipse's plugins directory and restart Eclipse. After installation you should see the view shown below; if it does not appear, open it via Window -> Show View.

[Screenshot: the Map/Reduce view after installing the plugin]

3. Configure the plugin

I tested this on Ubuntu 10.04 installed in VMware, using the Juno release of Eclipse.


[Screenshot: Hadoop location configuration dialog]

In the Host field, enter the address of the machine where the Hadoop cluster runs. This depends on how you configured the cluster: some people use localhost, others 127.0.0.1. Whatever you enter must match what you set in core-site.xml and the other configuration files when you set up the cluster.
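For reference, a minimal sketch of the matching cluster-side configuration for a pseudo-distributed setup follows; the host and port values (localhost, 9000, 9001) are assumptions here, so use whatever your own files declare. In the plugin dialog, the DFS Master fields should match fs.default.name and the Map/Reduce Master fields should match mapred.job.tracker:

<!-- core-site.xml (sketch; host and port are assumptions) -->
<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://localhost:9000</value>
	</property>
</configuration>

<!-- mapred-site.xml (sketch) -->
<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>localhost:9001</value>
	</property>
</configuration>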

Once the plugin is configured, start the cluster with start-dfs.sh and start-mapred.sh.
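A minimal startup and sanity check, assuming $HADOOP_HOME/bin is on your PATH:

start-dfs.sh      # starts the NameNode, DataNode(s) and SecondaryNameNode
start-mapred.sh   # starts the JobTracker and TaskTracker(s)
jps               # lists the running Java daemons to verify the startup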

4. Create a Map/Reduce project

1. Create a new project and choose Map/Reduce Project as the project type.

2. Name the project WordCount.

Note: the Hadoop install directory is the directory where you unpacked Hadoop. Eclipse needs it to locate the JAR files the project depends on; with it set correctly, the project is created successfully.

Then create three Java files in the package org: MyDriver.java, MyMap.java, and MyReduce.java.

MyDriver.java:
package org;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();

		Job job = new Job(conf, "Hello Hadoop");
		job.setJarByClass(MyDriver.class);

		// Types emitted by the mapper.
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		// Types of the final (reducer) output.
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		job.setMapperClass(MyMap.class);
		job.setCombinerClass(MyReduce.class);
		job.setReducerClass(MyReduce.class);

		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);

		// args[0] is the input directory, args[1] the output directory;
		// the output directory must not exist before the job runs.
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// Submit the job and wait for it to finish.
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
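Note that MyReduce is registered both as the combiner and as the reducer. Reusing the reducer this way is safe for word count because summation is commutative and associative, so pre-aggregating partial counts on the map side cannot change the final totals. (With this tiny input every word happens to occur at most once per file, which is why the log below shows Combine input records=11 and Combine output records=11.)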



MyMap.java:

package org;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

	private final static IntWritable one = new IntWritable(1);

	// Reuse one Text instance instead of allocating a new one per token.
	private final Text word = new Text();

	@Override
	public void map(Object key, Text value, Context context)
			throws IOException, InterruptedException {
		// Split the line on whitespace and emit (token, 1) for each token.
		StringTokenizer tokenizer = new StringTokenizer(value.toString());
		while (tokenizer.hasMoreTokens()) {
			word.set(tokenizer.nextToken());
			context.write(word, one);
		}
	}
}
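The four type parameters of Mapper<Object, Text, Text, IntWritable> are, in order: the input key (the byte offset of the line within the file), the input value (the line itself), the output key (a word), and the output value (its count of 1).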

MyReduce.java:

package org;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

	@Override
	public void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {
		// Sum every count emitted for this word and write the total.
		int sum = 0;
		for (IntWritable val : values) {
			sum += val.get();
		}
		context.write(key, new IntWritable(sum));
	}
}


After creating these three files, create a directory named input under the WordCount project folder to serve as the MapReduce input directory, and put two files in it, testFile1.txt and testFile2.txt, with the following contents:

testFile1.txt:

hello hadoop,this is lingyibin

testFile2.txt:

this is the world of lingyibin.wellcome hadoop.
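Two things worth noting about this input. First, StringTokenizer splits only on whitespace, so punctuation stays attached to its token: "hadoop,this" and "hadoop." each count as one word, which is why the console output below reports 11 map output records but only 10 distinct keys. Second, this run reads the input straight from the local workspace (the log shows a file:/ path and the job ID job_local_0001, i.e. the LocalJobRunner); to run against the cluster instead, the files must first be copied to HDFS, for example (a sketch, assuming the same directory name):

hadoop fs -mkdir input
hadoop fs -put testFile1.txt testFile2.txt input
hadoop fs -ls input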


Finally, set MyDriver's run-time arguments under Run As -> Run Configurations -> Arguments.
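In the Program arguments field, pass the input and output directories separated by a space. Judging from the console output below, which saves its result to output2, this run used the equivalent of:

input output2

The output directory must not already exist; FileOutputFormat refuses to start a job whose output directory is present, to avoid overwriting earlier results.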


Then click Apply and choose Run on Hadoop to launch the job.


Console output:

12/07/31 00:34:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/07/31 00:34:51 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/07/31 00:34:51 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
****file:/home/malipeng/workspace_hadoop1/WordCount/input
12/07/31 00:34:51 INFO input.FileInputFormat: Total input paths to process : 2
12/07/31 00:34:51 INFO mapred.JobClient: Running job: job_local_0001
12/07/31 00:34:51 INFO util.ProcessTree: setsid exited with exit code 0
12/07/31 00:34:51 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@62937c
12/07/31 00:34:51 INFO mapred.MapTask: io.sort.mb = 100
12/07/31 00:34:51 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/31 00:34:51 INFO mapred.MapTask: record buffer = 262144/327680
12/07/31 00:34:51 INFO mapred.MapTask: Starting flush of map output
12/07/31 00:34:52 INFO mapred.MapTask: Finished spill 0
12/07/31 00:34:52 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/07/31 00:34:52 INFO mapred.JobClient:  map 0% reduce 0%
12/07/31 00:34:54 INFO mapred.LocalJobRunner:
12/07/31 00:34:54 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/07/31 00:34:54 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c695a6
12/07/31 00:34:54 INFO mapred.MapTask: io.sort.mb = 100
12/07/31 00:34:54 INFO mapred.MapTask: data buffer = 79691776/99614720
12/07/31 00:34:54 INFO mapred.MapTask: record buffer = 262144/327680
12/07/31 00:34:54 INFO mapred.MapTask: Starting flush of map output
12/07/31 00:34:54 INFO mapred.MapTask: Finished spill 0
12/07/31 00:34:54 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/07/31 00:34:55 INFO mapred.JobClient:  map 100% reduce 0%
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/07/31 00:34:57 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@73a7ab
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Merger: Merging 2 sorted segments
12/07/31 00:34:57 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 143 bytes
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/07/31 00:34:57 INFO mapred.LocalJobRunner:
12/07/31 00:34:57 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/07/31 00:34:57 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output2
12/07/31 00:35:00 INFO mapred.LocalJobRunner: reduce > reduce
12/07/31 00:35:00 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/07/31 00:35:01 INFO mapred.JobClient:  map 100% reduce 100%
12/07/31 00:35:01 INFO mapred.JobClient: Job complete: job_local_0001
12/07/31 00:35:01 INFO mapred.JobClient: Counters: 20
12/07/31 00:35:01 INFO mapred.JobClient:   File Output Format Counters
12/07/31 00:35:01 INFO mapred.JobClient:     Bytes Written=102
12/07/31 00:35:01 INFO mapred.JobClient:   FileSystemCounters
12/07/31 00:35:01 INFO mapred.JobClient:     FILE_BYTES_READ=1904
12/07/31 00:35:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=99030
12/07/31 00:35:01 INFO mapred.JobClient:   File Input Format Counters
12/07/31 00:35:01 INFO mapred.JobClient:     Bytes Read=73
12/07/31 00:35:01 INFO mapred.JobClient:   Map-Reduce Framework
12/07/31 00:35:01 INFO mapred.JobClient:     Map output materialized bytes=151
12/07/31 00:35:01 INFO mapred.JobClient:     Map input records=2
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/07/31 00:35:01 INFO mapred.JobClient:     Spilled Records=22
12/07/31 00:35:01 INFO mapred.JobClient:     Map output bytes=117
12/07/31 00:35:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=481505280
12/07/31 00:35:01 INFO mapred.JobClient:     CPU time spent (ms)=0
12/07/31 00:35:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=264
12/07/31 00:35:01 INFO mapred.JobClient:     Combine input records=11
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce input records=11
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce input groups=10
12/07/31 00:35:01 INFO mapred.JobClient:     Combine output records=11
12/07/31 00:35:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/07/31 00:35:01 INFO mapred.JobClient:     Reduce output records=10
12/07/31 00:35:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/07/31 00:35:01 INFO mapred.JobClient:     Map output records=11
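With TextOutputFormat and a single reducer, the word counts land in a file named part-r-00000 inside the output directory. For this local run it can be read directly; if the output had been written to HDFS, use the hadoop CLI instead (paths match the run above):

cat output2/part-r-00000
# or, on HDFS:
hadoop fs -cat output2/part-r-00000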
