Setting up an Eclipse development environment for Hadoop

[b]With the Hadoop 2.6.0 cluster set up, this post describes how to develop Hadoop programs (i.e. MapReduce programs) in Eclipse.[/b]
1. JDK installation and Hadoop cluster setup are not covered again here; see [url]http://kevin12.iteye.com/blog/2273556[/url].
First, run the wordcount example that ships with Hadoop:
2. Put the README.txt and LICENSE.txt files from the hadoop-2.6.0 directory into /library/hadoop/data on the cluster; if the directory does not exist, create it first (hdfs dfs -mkdir -p creates parent directories as needed).

root@master1:/usr/local/hadoop/hadoop-2.6.0# hdfs dfs -mkdir /library/hadoop/data
root@master1:/usr/local/hadoop/hadoop-2.6.0# hdfs dfs -put ./LICENSE.txt /library/hadoop/data
root@master1:/usr/local/hadoop/hadoop-2.6.0# hdfs dfs -put ./README.txt /library/hadoop/data
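You can verify that the files landed in HDFS before kicking off the job, for example:
root@master1:/usr/local/hadoop/hadoop-2.6.0# hdfs dfs -ls /library/hadoop/data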
root@master1:/usr/local/hadoop/hadoop-2.6.0/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /library/hadoop/data /library/hadoop/wordcount_output1
16/02/12 12:49:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/12 12:49:07 INFO client.RMProxy: Connecting to ResourceManager at master1/192.168.112.130:8032
16/02/12 12:49:08 INFO input.FileInputFormat: Total input paths to process : 2
16/02/12 12:49:08 INFO mapreduce.JobSubmitter: number of splits:2
16/02/12 12:49:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455236431298_0002
16/02/12 12:49:09 INFO impl.YarnClientImpl: Submitted application application_1455236431298_0002
16/02/12 12:49:09 INFO mapreduce.Job: The url to track the job: http://master1:8088/proxy/application_1455236431298_0002/
16/02/12 12:49:09 INFO mapreduce.Job: Running job: job_1455236431298_0002
16/02/12 12:49:20 INFO mapreduce.Job: Job job_1455236431298_0002 running in uber mode : false
16/02/12 12:49:20 INFO mapreduce.Job: map 0% reduce 0%
16/02/12 12:49:30 INFO mapreduce.Job: map 100% reduce 0%
16/02/12 12:49:42 INFO mapreduce.Job: map 100% reduce 100%
16/02/12 12:49:42 INFO mapreduce.Job: Job job_1455236431298_0002 completed successfully
16/02/12 12:49:43 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=12822
		FILE: Number of bytes written=342551
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=17026
		HDFS: Number of bytes written=8943
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=17371
		Total time spent by all reduces in occupied slots (ms)=7667
		Total time spent by all map tasks (ms)=17371
		Total time spent by all reduce tasks (ms)=7667
		Total vcore-seconds taken by all map tasks=17371
		Total vcore-seconds taken by all reduce tasks=7667
		Total megabyte-seconds taken by all map tasks=17787904
		Total megabyte-seconds taken by all reduce tasks=7851008
	Map-Reduce Framework
		Map input records=320
		Map output records=2336
		Map output bytes=24790
		Map output materialized bytes=12828
		Input split bytes=231
		Combine input records=2336
		Combine output records=886
		Reduce input groups=838
		Reduce shuffle bytes=12828
		Reduce input records=886
		Reduce output records=838
		Spilled Records=1772
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=526
		CPU time spent (ms)=2190
		Physical memory (bytes) snapshot=476704768
		Virtual memory (bytes) snapshot=5660733440
		Total committed heap usage (bytes)=260173824
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=16795
	File Output Format Counters
		Bytes Written=8943
# View the results (only part of the output is shown here):
root@master1:/usr/local/hadoop/hadoop-2.6.0/share/hadoop/mapreduce# hdfs dfs -cat /library/hadoop/wordcount_output1/part-r-00000
16/02/12 12:51:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
"AS 4
"Contribution" 1
"Contributor" 1
"Derivative 1
"Legal 1
"License" 1
"License"); 1
"Licensor" 1
"NOTICE" 1
"Not 1
"Object" 1
"Source" 1
"Work" 1
"You" 1
"Your") 1
"[]" 1
"control" 1
"printed 1
"submitted" 1

You can also view the file through the browser: download the part-r-00000 file from the /library/hadoop/wordcount_output1 directory and open it:
[img]http://dl2.iteye.com/upload/attachment/0115/0592/e748c1ab-5a5d-3854-9225-206df6d057c6.png[/img]
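If you prefer the command line, you can also copy the result file out of HDFS with hdfs dfs -get (the local destination below is just an example):
root@master1:~# hdfs dfs -get /library/hadoop/wordcount_output1/part-r-00000 /tmp/part-r-00000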
3. Download the 64-bit Linux version of Eclipse directly from the Eclipse website; the details are not covered here.
4. Extract the downloaded Eclipse into /usr/local/, enter the eclipse directory, and run eclipse to start it; the default workspace (/root/workspace) is fine.
5. Developing and debugging Hadoop programs locally requires hadoop-eclipse-plugin-2.6.0.jar. Copy this jar into the plugins directory under /usr/local/eclipse. In Eclipse, choose File->Restart; after it reopens, choose Window->Show View->Other, select Map/Reduce Locations under MapReduce Tools, and click OK.
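For example, assuming the plugin jar was downloaded to the current directory:
root@master1:~# cp hadoop-eclipse-plugin-2.6.0.jar /usr/local/eclipse/plugins/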

[img]http://dl2.iteye.com/upload/attachment/0115/0594/b9666f5b-47de-3b7f-a85e-3db9803caa10.png[/img]

6. Create a Hadoop Location: in the Map/Reduce Locations view at the bottom, create a new Hadoop location, configure the Host and Port, and save. The DFS Master host and port should match the cluster's fs.defaultFS (hdfs://master1:9000 in this setup). With the Java EE perspective selected, DFS Locations appears under Project Explorer, showing the files in the cluster's root directory.
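If you are unsure of the NameNode address, you can check fs.defaultFS on the cluster (assuming the standard config layout under the install directory):
root@master1:~# grep -A 1 "fs.defaultFS" /usr/local/hadoop/hadoop-2.6.0/etc/hadoop/core-site.xml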

[img]http://dl2.iteye.com/upload/attachment/0115/0596/04554cf1-912b-3869-a212-82eba82a6f96.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0598/b426ff1a-5115-3443-9ec5-6c779ad36a19.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0600/314548ff-18fb-3e45-868d-b0ca5cec4ddc.png[/img]
7. Specify the Hadoop installation directory: open Window->Preferences, select Hadoop Map/Reduce (under Data Management), click the "Browse..." button on the right-hand page, choose the Hadoop installation directory, then click Apply and OK to save and exit.

[img]http://dl2.iteye.com/upload/attachment/0115/0602/4835d76b-5ebb-39d9-b7c9-2a96b7d151c5.png[/img]
8. Create a project: choose File->New->Other, select Map/Reduce Project in the dialog, and click Next. Enter a project name, click Finish, and click OK in the dialog that follows. Eclipse automatically adds the jars from the Hadoop installation directory to the build path, so we can now develop MapReduce programs.

[img]http://dl2.iteye.com/upload/attachment/0115/0604/43a43cee-4079-307f-9e01-26b9f8225683.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0606/f13d4594-611d-3e63-898e-cc947eabd0fc.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0608/45912229-a5ea-3d39-a85f-f69e86b9e3b0.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0610/2390c3c3-6e55-38cb-ba76-34f6e17a5122.png[/img]

9. Attach the source code.
First copy hadoop-2.6.0-src.tar.gz into the virtual machine and extract it with tar into the /usr/local/hadoop directory;
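For example, assuming the tarball was copied to the home directory:
root@master1:~# tar -zxvf hadoop-2.6.0-src.tar.gz -C /usr/local/hadoop/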
Then press Shift+Ctrl+T, type NameNode in the dialog, and open Hadoop's NameNode class. On the class tab, click Attach Source; in the dialog choose External location, click the External Folder button, select the hadoop-2.6.0-src directory you just extracted, and click OK to attach the source.

[img]http://dl2.iteye.com/upload/attachment/0115/0612/6339d365-e756-30e4-8c8d-d2a6bcc23034.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0614/88b074ca-3708-3713-a1bd-1a672db580db.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0616/ffb8138d-1625-3eac-b2ab-bc07a6534a3d.png[/img]
10. Run the WordCount example
In the extracted source tree, find WordCount.java under hadoop-2.6.0-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples and copy it into the com.imf.hadoop package of the HadoopApps project (create the package first if it does not exist);
The WordCount source is as follows:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.imf.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Then right-click WordCount -> Run As -> Run Configurations. Under Java Application, select WordCount, and on the Arguments tab set Program arguments to: hdfs://master1:9000/library/hadoop/data hdfs://master1:9000/library/hadoop/wordcount_output1 (full hdfs:// URIs, so the locally launched job reads from and writes to the cluster's HDFS). Click Apply, then Run.
[img]http://dl2.iteye.com/upload/attachment/0115/0618/aa60e601-dce8-3c36-8747-780414e22a25.png[/img]

11. View the results: in the browser you will see a new part-r-00000 file in the output directory; this is the word-count result. You can also view it with the command hdfs dfs -cat /library/hadoop/wordcount_output1/part-r-00000.
[b]Note: if the output directory already exists, the job fails with an error; use a different output directory, or delete the existing one and run again.[/b]
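For example, to delete the previous output directory recursively before rerunning (double-check the path first):
root@master1:~# hdfs dfs -rm -r /library/hadoop/wordcount_output1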

[img]http://dl2.iteye.com/upload/attachment/0115/0620/876d4e5c-98fd-3dde-bd7f-8268d038918d.png[/img]

12. Next, package WordCount into a jar file and run it on the cluster.
Right-click HadoopApps and choose Export; in the dialog select JAR file and click Next, then choose where to write the jar. I used /usr/local/tools with the file name HadoopApps.jar; after that just keep clicking Next.
[b]Note: do not specify a main class in the final step; specify it when running the jar instead, which makes it easier to test other programs in the same jar.[/b]

[img]http://dl2.iteye.com/upload/attachment/0115/0622/6c5cfd47-13c3-3d9d-ba21-7b18dd197b71.png[/img]
[img]http://dl2.iteye.com/upload/attachment/0115/0624/1ff6ec00-7d74-39fa-b590-55934e9969ba.png[/img]

13. Run the jar file

root@master1:/usr/local/tools# hadoop jar HadoopApps.jar com.imf.hadoop.WordCount hdfs://master1:9000/library/hadoop/data/ hdfs://master1:9000/library/hadoop/wordcount_output2
16/02/13 14:36:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/13 14:36:59 INFO client.RMProxy: Connecting to ResourceManager at master1/192.168.112.130:8032
16/02/13 14:37:00 INFO input.FileInputFormat: Total input paths to process : 2
16/02/13 14:37:00 INFO mapreduce.JobSubmitter: number of splits:2
16/02/13 14:37:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455315784285_0004
16/02/13 14:37:01 INFO impl.YarnClientImpl: Submitted application application_1455315784285_0004
16/02/13 14:37:01 INFO mapreduce.Job: The url to track the job: http://master1:8088/proxy/application_1455315784285_0004/
16/02/13 14:37:01 INFO mapreduce.Job: Running job: job_1455315784285_0004
16/02/13 14:37:08 INFO mapreduce.Job: Job job_1455315784285_0004 running in uber mode : false
16/02/13 14:37:08 INFO mapreduce.Job: map 0% reduce 0%
16/02/13 14:37:19 INFO mapreduce.Job: map 100% reduce 0%
16/02/13 14:37:26 INFO mapreduce.Job: map 100% reduce 100%
16/02/13 14:37:27 INFO mapreduce.Job: Job job_1455315784285_0004 completed successfully
16/02/13 14:37:28 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=12822
		FILE: Number of bytes written=342443
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=17026
		HDFS: Number of bytes written=8943
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=17479
		Total time spent by all reduces in occupied slots (ms)=4161
		Total time spent by all map tasks (ms)=17479
		Total time spent by all reduce tasks (ms)=4161
		Total vcore-seconds taken by all map tasks=17479
		Total vcore-seconds taken by all reduce tasks=4161
		Total megabyte-seconds taken by all map tasks=17898496
		Total megabyte-seconds taken by all reduce tasks=4260864
	Map-Reduce Framework
		Map input records=320
		Map output records=2336
		Map output bytes=24790
		Map output materialized bytes=12828
		Input split bytes=231
		Combine input records=2336
		Combine output records=886
		Reduce input groups=838
		Reduce shuffle bytes=12828
		Reduce input records=886
		Reduce output records=838
		Spilled Records=1772
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=387
		CPU time spent (ms)=2030
		Physical memory (bytes) snapshot=483635200
		Virtual memory (bytes) snapshot=5660704768
		Total committed heap usage (bytes)=259067904
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=16795
	File Output Format Counters
		Bytes Written=8943

Open the browser, click part-r-00000 to download the file, and view it:

[img]http://dl2.iteye.com/upload/attachment/0115/0626/e057c657-9111-3c74-b070-91d2a629b888.png[/img]

[img]http://dl2.iteye.com/upload/attachment/0115/0628/62a400fb-7858-34ed-9b42-d4fb58f16705.png[/img]

Comparing the output of the two runs, the results are identical.
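If you want to verify this without the browser, one option is to pull both result files out of HDFS and diff them (the local file names below are arbitrary; no output from diff means the files match):
root@master1:~# hdfs dfs -cat /library/hadoop/wordcount_output1/part-r-00000 > out1.txt
root@master1:~# hdfs dfs -cat /library/hadoop/wordcount_output2/part-r-00000 > out2.txt
root@master1:~# diff out1.txt out2.txt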