Building the Hadoop 2.4.1 Eclipse plugin, and debugging MapReduce programs against a live cluster

This article is reposted from http://blog.csdn.net/lsadjkfreurieurieu/article/details/39155799

Environment:

  • Hadoop 2.4.1, running in a Linux VM (pseudo-distributed mode, though this does not actually matter here)
  • Eclipse 4.3.2, running on Windows 8.1
  • Ant 1.9.4, running on Windows 8.1, with its environment variables already configured
Building the Hadoop 2.4.1 Eclipse plugin with Ant:

Because Hadoop 2.4.1 is relatively new, there is no official Eclipse plugin for it, so we have to build one ourselves.
1. Download hadoop2x-eclipse-plugin from GitHub. It targets Hadoop 2.2.0, but testing shows it works just as well with 2.4.1. Unpack it.

2. Open cmd and run the following command (shown wrapped by the console; it is a single command):
G:\hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin>ant jar -Dversion=2.4.1 -Declipse.home=G:\eclipse-standard-kepler-SR2-win32-x86_64\eclipse -Dhadoop.home=G:\hadoop-2.4.1
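The general shape of the command is below (a template, not literal paths; the G:\ locations above are just this machine's layout). version selects which Hadoop the plugin is built against, eclipse.home supplies the Eclipse SDK jars needed for compilation, and hadoop.home supplies the Hadoop jars that get bundled into the plugin:

ant jar -Dversion=<hadoop version> -Declipse.home=<Eclipse install dir> -Dhadoop.home=<Hadoop install dir>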
 
 
The first run of this command generates some directories the build needs, but the build itself will not yet succeed.
3. Edit build.xml under hadoop2x-eclipse-plugin-master\src\contrib\eclipse-plugin\ as follows (relative to the stock file, the changes point the classpath at the share/hadoop/* jar layout of Hadoop 2.x, copy those jars into the plugin's lib/ directory, and list them on the manifest's Bundle-ClassPath):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->

<project default="jar" name="eclipse-plugin">

  <import file="../build-contrib.xml"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/">
      <include name="org.eclipse.ui*.jar"/>
      <include name="org.eclipse.jdt*.jar"/>
      <include name="org.eclipse.core*.jar"/>
      <include name="org.eclipse.equinox*.jar"/>
      <include name="org.eclipse.debug*.jar"/>
      <include name="org.eclipse.osgi*.jar"/>
      <include name="org.eclipse.swt*.jar"/>
      <include name="org.eclipse.jface*.jar"/>

      <include name="org.eclipse.team.cvs.ssh2*.jar"/>
      <include name="com.jcraft.jsch*.jar"/>
    </fileset> 
  </path>

  <path id="hadoop-sdk-jars">
    <fileset dir="${hadoop.home}/share/hadoop/mapreduce">
      <include name="hadoop*.jar"/>
    </fileset> 
    <fileset dir="${hadoop.home}/share/hadoop/hdfs">
      <include name="hadoop*.jar"/>
    </fileset> 
    <fileset dir="${hadoop.home}/share/hadoop/common">
      <include name="hadoop*.jar"/>
    </fileset> 
  </path>



  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <!--pathelement location="${hadoop.root}/build/classes"/-->
    <path refid="eclipse-sdk-jars"/>
    <path refid="hadoop-sdk-jars"/>
  </path>

  <!-- Skip building if eclipse.home is unset. -->
  <target name="check-contrib" unless="eclipse.home">
    <property name="skip.contrib" value="yes"/>
    <echo message="eclipse.home unset: skipping eclipse plugin"/>
  </target>

 <target name="compile" depends="init" unless="skip.contrib">
    <echo message="contrib: ${name}"/>
    <javac
     encoding="${build.encoding}"
     srcdir="${src.dir}"
     includes="**/*.java"
     destdir="${build.classes}"
     debug="${javac.debug}"
     deprecation="${javac.deprecation}">
     <classpath refid="classpath"/>
    </javac>
  </target>

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/mapreduce">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/common">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/hdfs">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>
    <copy  todir="${build.dir}/lib/" verbose="true">
          <fileset dir="${hadoop.home}/share/hadoop/yarn">
           <include name="hadoop*.jar"/>
          </fileset>
    </copy>

    <copy  todir="${build.dir}/classes" verbose="true">
          <fileset dir="${root}/src/java">
           <include name="*.xml"/>
          </fileset>
    </copy>



    <copy file="${hadoop.home}/share/hadoop/common/lib/protobuf-java-${protobuf.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/log4j-${log4j.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-configuration-${commons-configuration.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-lang-${commons-lang.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
	<copy file="${hadoop.home}/share/hadoop/common/lib/commons-collections-${commons-collections.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-core-asl-${jackson.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/jackson-mapper-asl-${jackson.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-log4j12-${slf4j-log4j12.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/slf4j-api-${slf4j-api.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/guava-${guava.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/hadoop-auth-${hadoop.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/commons-cli-${commons-cli.version}.jar"  todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.home}/share/hadoop/common/lib/netty-${netty.version}.jar"  todir="${build.dir}/lib" verbose="true"/>

    <jar
      jarfile="${build.dir}/hadoop-${name}-${version}.jar"
      manifest="${root}/META-INF/MANIFEST.MF">
      <manifest>
	 <attribute name="Bundle-ClassPath" 
		value="classes/, 
 lib/hadoop-mapreduce-client-core-${hadoop.version}.jar,
 lib/hadoop-mapreduce-client-common-${hadoop.version}.jar,
 lib/hadoop-mapreduce-client-jobclient-${hadoop.version}.jar,
 lib/hadoop-auth-${hadoop.version}.jar,
 lib/hadoop-common-${hadoop.version}.jar,
 lib/hadoop-hdfs-${hadoop.version}.jar,
 lib/protobuf-java-${protobuf.version}.jar,
 lib/log4j-${log4j.version}.jar,
 lib/commons-cli-1.2.jar,
 lib/commons-configuration-1.6.jar,
 lib/commons-httpclient-3.1.jar,
 lib/commons-lang-${commons-lang.version}.jar,
 lib/commons-collections-${commons-collections.version}.jar,
 lib/jackson-core-asl-1.8.8.jar,
 lib/jackson-mapper-asl-1.8.8.jar,
 lib/slf4j-log4j12-1.7.5.jar,
 lib/slf4j-api-1.7.5.jar,
 lib/guava-${guava.version}.jar,
 lib/netty-${netty.version}.jar"/>
	 </manifest>
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <!--fileset dir="${build.dir}" includes="*.xml"/-->
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>

</project>

4. Edit hadoop2x-eclipse-plugin-master\ivy\libraries.properties as follows:
# This is the version of hadoop we are generating
hadoop.version=2.4.1
hadoop-gpl-compression.version=0.1.0

#These are the versions of our dependencies (in alphabetical order)
apacheant.version=1.7.0
ant-task.version=2.0.10

asm.version=3.2
aspectj.version=1.6.5
aspectj.version=1.6.11

checkstyle.version=4.2

commons-cli.version=1.2
commons-codec.version=1.4
commons-collections.version=3.2.1
commons-configuration.version=1.6
commons-daemon.version=1.0.13
commons-httpclient.version=3.0.1
commons-lang.version=2.6
commons-logging.version=1.0.4
commons-logging-api.version=1.0.4
commons-math.version=2.1
commons-el.version=1.0
commons-fileupload.version=1.2
commons-io.version=2.1
commons-net.version=3.1
core.version=3.1.1
coreplugin.version=1.3.2

hsqldb.version=1.8.0.10

ivy.version=2.1.0

jasper.version=5.5.12
jackson.version=1.8.8
#not able to figureout the version of jsp & jsp-api version to get it resolved throught ivy
# but still declared here as we are going to have a local copy from the lib folder
jsp.version=2.1
jsp-api.version=5.5.12
jsp-api-2.1.version=6.1.14
jsp-2.1.version=6.1.14
jets3t.version=0.6.1
jetty.version=6.1.26
jetty-util.version=6.1.26
jersey-core.version=1.8
jersey-json.version=1.8
jersey-server.version=1.8
junit.version=4.5
jdeb.version=0.8
jdiff.version=1.0.9
json.version=1.0

kfs.version=0.1

log4j.version=1.2.17
lucene-core.version=2.3.1

mockito-all.version=1.8.5
jsch.version=0.1.42

oro.version=2.0.8

rats-lib.version=0.5.1

servlet.version=4.0.6
servlet-api.version=2.5
slf4j-api.version=1.7.5
slf4j-log4j12.version=1.7.5

wagon-http.version=1.0-beta-2
xmlenc.version=0.52
xerces.version=1.4.4

protobuf.version=2.5.0
guava.version=11.0.2
netty.version=3.6.2.Final

5. Run the command from step 2 again. This time the build succeeds, and the plugin jar is generated under hadoop2x-eclipse-plugin-master\build\contrib\eclipse-plugin\.
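Optionally, sanity-check the jar by listing its contents. Going by the jar target above, it should contain classes/, plugin.xml, resources/, and the bundled dependency jars under lib/ (a sketch, assuming the paths from step 2):

jar tf G:\hadoop2x-eclipse-plugin-master\build\contrib\eclipse-plugin\hadoop-eclipse-plugin-2.4.1.jar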
6. Copy the generated jar into Eclipse's plugins directory; the plugin is now installed. Start Eclipse, and a Hadoop Map/Reduce page appears in the Window -> Preferences dialog.
7. In that page, configure the path of the Hadoop 2.4.1 installation directory on Windows.
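For the plugin's DFS browsing and Run on Hadoop to reach the cluster, the Map/Reduce location defined in Eclipse must also match the NameNode address configured on the Linux VM. A sketch of the relevant core-site.xml entry there, assuming the hdfs://192.168.70.132:9000 address that the driver below hard-codes:

<!-- core-site.xml on the Linux VM (assumed values, matching Mymrdriver below) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.70.132:9000</value>
</property>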

Debugging MapReduce programs against the live cluster:

Write a small MapReduce program that computes the maximum temperature per year:
  • Mymapper.java
    package com.henry.mapper;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public class Mymapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    	
    	private static final int MISSING = 9999;	// NCDC code for a missing reading

    	public void map(LongWritable ikey, Text ivalue, Context context)
    			throws IOException, InterruptedException {
    		
    		String line = ivalue.toString();
    		String year = line.substring(15, 19);	// the year occupies columns 15-18
    		
    		// The signed temperature occupies columns 87-91; skip a leading '+',
    		// which Integer.parseInt does not accept on older JDKs
    		int airTemperature;
    		if(line.charAt(87) == '+') {
    			airTemperature = Integer.parseInt(line.substring(88, 92));
    		} else {
    			airTemperature = Integer.parseInt(line.substring(87, 92));
    		}
    		String quality = line.substring(92, 93);
    		// Keep only readings that are present and whose quality code is 0, 1, 4, 5 or 9
    		if(airTemperature != MISSING && quality.matches("[01459]")) {
    			context.write(new Text(year), new IntWritable(airTemperature));
    		}
    		
    	}
    
    }
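The column offsets above are easy to get wrong, so here is a minimal standalone sketch that applies the same parsing to the first record of the sample.txt shown further down; it should print 1950 0 1 (year, temperature, quality code):

    public class ParseCheck {
    
    	public static void main(String[] args) {
    		// First record from sample.txt (NCDC fixed-width format)
    		String line = "0067011990999991950051507004+68750+023550FM-12+038299999"
    				+ "V0203301N00671220001CN9999999N9+00001+99999999999";
    		String year = line.substring(15, 19);               // columns 15-18 -> "1950"
    		int airTemperature = line.charAt(87) == '+'
    				? Integer.parseInt(line.substring(88, 92))  // skip the '+' sign
    				: Integer.parseInt(line.substring(87, 92)); // keep the '-' sign
    		String quality = line.substring(92, 93);            // "1" -> a valid reading
    		System.out.println(year + " " + airTemperature + " " + quality);
    	}
    
    }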

  • Myreducer.java
    package com.henry.reducer;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public class Myreducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    
    	public void reduce(Text _key, Iterable<IntWritable> values, Context context)
    			throws IOException, InterruptedException {
    		
    		int maxValue = Integer.MIN_VALUE;
    		// take the maximum over all readings for this year
    		for (IntWritable val : values) {
    			maxValue = Math.max(maxValue, val.get());
    		}
    		
    		context.write(_key, new IntWritable(maxValue));
    	}
    
    }

  • Mymrdriver.java
    package com.henry.mapreducedriver;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    
    public class Mymrdriver {
    
    	public static void main(String[] args) throws Exception {
    		// Hard-code the paths for in-IDE runs; any real command-line
    		// arguments are deliberately overwritten here
    		args = new String[2];
    		// Directory /mrtest/input needs to be created first
    		args[0] = "hdfs://192.168.70.132:9000/mrtest/input";
    		args[1] = "hdfs://192.168.70.132:9000/mrtest/output";
    		
    		Configuration conf = new Configuration();
    		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    		
    		Job job = Job.getInstance(conf, "My MapReduce Job");
    		job.setJarByClass(com.henry.mapreducedriver.Mymrdriver.class);
    		
    		job.setMapperClass(com.henry.mapper.Mymapper.class);		
    		job.setReducerClass(com.henry.reducer.Myreducer.class);
    
    		// TODO: specify output types
    		job.setOutputKeyClass(Text.class);
    		job.setOutputValueClass(IntWritable.class); 
    
    		// TODO: specify input and output DIRECTORIES (not files)
    		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    
    		if (!job.waitForCompletion(true))
    			return;
    	}
    
    }


Input file sample.txt (NCDC weather records, one fixed-width record per line):
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
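The input directory must exist on HDFS before the job runs (see the comment in Mymrdriver). You can create it with the hadoop fs shell on the VM, or from Windows with a small sketch like the following (assumption: sample.txt has been saved locally as G:\sample.txt; the NameNode address is the one the driver uses):

    package com.henry.mapreducedriver;
    
    import java.net.URI;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    
    public class UploadInput {
    
    	public static void main(String[] args) throws Exception {
    		// Connect to the same NameNode that Mymrdriver targets
    		FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.70.132:9000"),
    				new Configuration());
    		fs.mkdirs(new Path("/mrtest/input"));            // create the input directory
    		fs.copyFromLocalFile(new Path("G:/sample.txt"),  // local file (assumed location)
    				new Path("/mrtest/input/sample.txt"));
    		fs.close();
    	}
    
    }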
The output path must not already exist; if it does, the job aborts with an exception saying the output directory already exists.
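To make reruns painless, either delete /mrtest/output by hand before each run or add a guard like this sketch to main() in Mymrdriver, right after the Configuration is created (it additionally needs imports for java.net.URI and org.apache.hadoop.fs.FileSystem; recursive deletion of a stale output directory is assumed to be acceptable for this test):

    		// Remove a leftover output directory so the job can be rerun (sketch)
    		FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.70.132:9000"), conf);
    		Path output = new Path("/mrtest/output");
    		if (fs.exists(output)) {
    			fs.delete(output, true);	// true = delete recursively
    		}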
Running the job at this point throws: Exception in thread "main" java.lang.NullPointerException
Searching online turned up the cause: Run on Hadoop from a Windows environment is missing the required native support files. The fix:
Download the Windows build of the Hadoop bin files (download link), place winutils.exe in the bin directory of the Windows copy of Hadoop, set HADOOP_HOME in the environment variables, and copy hadoop.dll into C:\Windows\System32.
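If you would rather not rely on the HADOOP_HOME environment variable, setting the hadoop.home.dir system property at the very start of main() has the same effect. A sketch, assuming the Windows copy of Hadoop lives at G:\hadoop-2.4.1 with winutils.exe in its bin directory:

    		// Must run before any Hadoop class touches the native shell utilities
    		System.setProperty("hadoop.home.dir", "G:\\hadoop-2.4.1");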
Output file part-r-00000 (for 1950 the valid readings are 0, 22 and -11, so the maximum is 22; for 1949 they are 111 and 78, giving 111):
1949 111
1950 22

