Connecting Eclipse on Windows to a Linux Hadoop Cluster

Prerequisite: a set of Linux virtual machines has already been created in VMware, with a Hadoop cluster installed across them:
feixu-master, feixu-slave1, feixu-slave2, feixu-slave3

Goal: to make development and debugging convenient, install Eclipse on Windows and connect it remotely to the Linux Hadoop cluster. The usual workflow is to develop and debug against small datasets in Eclipse, then deploy the job to run on the cluster.

Software versions: hadoop-1.2.1, Eclipse 4.2 (Juno)

1. Build hadoop-eclipse-plugin-1.2.1.jar
Hadoop shipped a prebuilt Eclipse plugin in the 0.20.x releases, but the 1.x and 2.x releases only include its source code, so the plugin has to be compiled manually.

Edit src/contrib/eclipse-plugin/build.xml:

<project default="jar" name="eclipse-plugin">

  <import file="../build-contrib.xml"/>

  <path id="eclipse-sdk-jars">
    <fileset dir="${eclipse.home}/plugins/"> ...... </fileset>
  </path>

  <path id="hadoop-core-jar">
    <fileset dir="${hadoop.root}/">
      <include name="hadoop*.jar"/>
    </fileset>
  </path>

  <!-- Override classpath to include Eclipse SDK jars -->
  <path id="classpath">
    <pathelement location="${build.classes}"/>
    <pathelement location="${hadoop.root}/build/classes"/>
    <path refid="eclipse-sdk-jars"/>
    <path refid="hadoop-core-jar"/>
  </path>

  ......

  <!-- Override jar target to specify manifest -->
  <target name="jar" depends="compile" unless="skip.contrib">
    <mkdir dir="${build.dir}/lib"/>
    <!--
    <copy file="${hadoop.root}/build/hadoop-core-${version}.jar" tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/>
    <copy file="${hadoop.root}/build/ivy/lib/Hadoop/common/commons-cli-${commons-cli.version}.jar" todir="${build.dir}/lib" verbose="true"/>
    -->
    <copy file="${hadoop.root}/hadoop-core-${version}.jar" tofile="${build.dir}/lib/hadoop-core.jar" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-cli-1.2.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-lang-2.4.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-configuration-1.6.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-mapper-asl-1.8.8.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.root}/lib/jackson-core-asl-1.8.8.jar" todir="${build.dir}/lib" verbose="true"/>
    <copy file="${hadoop.root}/lib/commons-httpclient-3.0.1.jar" todir="${build.dir}/lib" verbose="true"/>

    <jar jarfile="${build.dir}/hadoop-${name}-${version}.jar"
         manifest="${root}/META-INF/MANIFEST.MF">
      <fileset dir="${build.dir}" includes="classes/ lib/"/>
      <fileset dir="${root}" includes="resources/ plugin.xml"/>
    </jar>
  </target>

</project>

Edit src/contrib/build-contrib.xml:

<project name="hadoopbuildcontrib" xmlns:ivy="antlib:org.apache.ivy.ant">

  <property name="version" value="1.2.1"/>
  <property name="ivy.version" value="2.1.0"/>
  <property name="eclipse.home" location="E:/eclipse"/>
  <property name="name" value="${ant.project.name}"/>
  <property name="root" value="${basedir}"/>
  <property name="hadoop.root" location="C:\Users\feixu\workspace1\Hadoop-1.2.1\hadoop-1.2.1"/>

  ......

  <target name="proxy-setting">
    <property name="proxy.host" value="cn-proxy.jp.oracle.com"/>
    <property name="proxy.port" value="80"/>
    <setproxy proxyhost="${proxy.host}" proxyport="${proxy.port}"/>
  </target>

  <target name="ivy-download" description="To download ivy" unless="offline" depends="proxy-setting">
    <get src="${ivy_repo_url}" dest="${ivy.jar}" usetimestamp="true"/>
  </target>

</project>

Edit src/contrib/eclipse-plugin/META-INF/MANIFEST.MF:

Bundle-Activator: org.apache.hadoop.eclipse.Activator
Bundle-Localization: plugin
Require-Bundle: org.eclipse.ui,
 org.eclipse.core.runtime,
 org.eclipse.jdt.launching,
 org.eclipse.debug.core,
 org.eclipse.jdt,
 org.eclipse.jdt.core,
 org.eclipse.core.resources,
 org.eclipse.ui.ide,
 org.eclipse.jdt.ui,
 org.eclipse.debug.ui,
 org.eclipse.jdt.debug.ui,
 org.eclipse.core.expressions,
 org.eclipse.ui.cheatsheets,
 org.eclipse.ui.console,
 org.eclipse.ui.navigator,
 org.eclipse.core.filesystem,
 org.apache.commons.logging
Eclipse-LazyStart: true
Bundle-ClassPath: classes/,
 lib/hadoop-core.jar,
 lib/commons-cli-1.2.jar,
 lib/commons-configuration-1.6.jar,
 lib/commons-httpclient-3.0.1.jar,
 lib/commons-lang-2.4.jar,
 lib/jackson-core-asl-1.8.8.jar,
 lib/jackson-mapper-asl-1.8.8.jar
Bundle-Vendor: Apache Hadoop


Run the src/contrib/eclipse-plugin/build.xml script from within Eclipse. The build output:

Buildfile: C:\Users\feixu\workspace1\Hadoop-1.2.1\hadoop-1.2.1\src\contrib\eclipse-plugin\build.xml
******
[echo] contrib: eclipse-plugin
[javac] C:\Users\feixu\workspace1\Hadoop-1.2.1\hadoop-1.2.1\src\contrib\eclipse-plugin\build.xml:68: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
jar:
[copy] Copying 1 file to C:\Users\feixu\workspace1\Hadoop-1.2.1\hadoop-1.2.1\build\contrib\eclipse-plugin\lib
[copy] Copying C:\Users\feixu\workspace1\Hadoop-1.2.1\hadoop-1.2.1\hadoop-core-1.2.1.jar to C:\Users\feixu\workspace1\Hadoop-1.2.1\hadoop-1.2.1\build\contrib\eclipse-plugin\lib\hadoop-core.jar
[jar] Building jar: C:\Users\feixu\workspace1\Hadoop-1.2.1\hadoop-1.2.1\build\contrib\eclipse-plugin\hadoop-eclipse-plugin-1.2.1.jar
BUILD SUCCESSFUL
Total time: 4 seconds


Copy the generated hadoop-eclipse-plugin-1.2.1.jar into the plugins directory of the Eclipse installation and restart Eclipse; a Map/Reduce perspective should now be available.

2. Test the Eclipse connection to the Hadoop cluster
Create a new DFS Location.

Browse the HDFS directory tree; files and directories can be added and deleted from the view.
Create a new test project and write three classes, AvgTemperatureMapper, AvgTemperatureReducer, and AvgTemperature, to compute an average temperature.
Note: hadoop-core-1.2.1.jar and the jars under lib/ must be added to the project's build path; otherwise the run fails with class-not-found errors.
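The three classes follow the usual mapper/reducer/driver split. As a minimal sketch of the logic they divide up, the map and reduce phases can be simulated in plain Java; the tab-separated "year\ttemperature" record layout below is an assumption for illustration, not the actual format of 2012.txt:

```java
import java.util.*;

public class AvgTemperatureSketch {

    // Map phase: parse each line into (year, temperature) pairs,
    // grouped by key as the shuffle would deliver them to the reducer.
    static Map<String, List<Double>> map(List<String> lines) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length != 2) continue;   // skip malformed records
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>())
                   .add(Double.parseDouble(parts[1]));
        }
        return grouped;
    }

    // Reduce phase: average all temperatures seen for each key.
    static Map<String, Double> reduce(Map<String, List<Double>> grouped) {
        Map<String, Double> result = new HashMap<>();
        grouped.forEach((year, temps) -> result.put(year,
                temps.stream().mapToDouble(Double::doubleValue).average().orElse(0)));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("2012\t10.0", "2012\t20.0", "2012\t30.0");
        System.out.println(reduce(map(input)));   // prints {2012=20.0}
    }
}
```

In the real job the grouping is done by Hadoop's shuffle, and the mapper and reducer only see one record or one key's values at a time.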

Configure the input and output arguments for main, then run AvgTemperature.java.
After the run completes, the Eclipse console log reads:

13/11/19 20:44:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/11/19 20:44:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/11/19 20:44:09 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/19 20:44:09 INFO input.FileInputFormat: Total input paths to process : 1
13/11/19 20:44:09 WARN snappy.LoadSnappy: Snappy native library not loaded
13/11/19 20:44:09 INFO mapred.JobClient: Running job: job_local830574342_0001
13/11/19 20:44:09 INFO mapred.LocalJobRunner: Waiting for map tasks
13/11/19 20:44:09 INFO mapred.LocalJobRunner: Starting task: attempt_local830574342_0001_m_000000_0
13/11/19 20:44:09 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/11/19 20:44:09 INFO mapred.MapTask: Processing split: hdfs://10.182.11.77:9000/feixu/input1/2012.txt:0+67108864
13/11/19 20:44:09 INFO mapred.MapTask: io.sort.mb = 100
13/11/19 20:44:09 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/19 20:44:09 INFO mapred.MapTask: record buffer = 262144/327680
13/11/19 20:44:10 INFO mapred.JobClient: map 0% reduce 0%
13/11/19 20:44:12 INFO mapred.MapTask: Spilling map output: record full = true
13/11/19 20:44:12 INFO mapred.MapTask: bufstart = 0; bufend = 2359296; bufvoid = 99614720
13/11/19 20:44:12 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
13/11/19 20:44:13 INFO mapred.MapTask: Starting flush of map output
13/11/19 20:44:13 INFO mapred.MapTask: Finished spill 0
13/11/19 20:44:13 INFO mapred.MapTask: Finished spill 1
13/11/19 20:44:13 INFO mapred.Merger: Merging 2 sorted segments
13/11/19 20:44:13 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 3017744 bytes
13/11/19 20:44:13 INFO mapred.Task: Task:attempt_local830574342_0001_m_000000_0 is done. And is in the process of commiting
13/11/19 20:44:13 INFO mapred.LocalJobRunner:
13/11/19 20:44:13 INFO mapred.Task: Task 'attempt_local830574342_0001_m_000000_0' done.
13/11/19 20:44:13 INFO mapred.LocalJobRunner: Finishing task: attempt_local830574342_0001_m_000000_0
13/11/19 20:44:13 INFO mapred.LocalJobRunner: Starting task: attempt_local830574342_0001_m_000001_0
13/11/19 20:44:13 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/11/19 20:44:13 INFO mapred.MapTask: Processing split: hdfs://10.182.11.77:9000/feixu/input1/2012.txt:67108864+29277747
13/11/19 20:44:13 INFO mapred.MapTask: io.sort.mb = 100
13/11/19 20:44:13 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/19 20:44:13 INFO mapred.MapTask: record buffer = 262144/327680
13/11/19 20:44:13 INFO mapred.JobClient: map 50% reduce 0%
13/11/19 20:44:14 INFO mapred.MapTask: Starting flush of map output
13/11/19 20:44:14 INFO mapred.MapTask: Finished spill 0
13/11/19 20:44:14 INFO mapred.Task: Task:attempt_local830574342_0001_m_000001_0 is done. And is in the process of commiting
13/11/19 20:44:14 INFO mapred.LocalJobRunner:
13/11/19 20:44:14 INFO mapred.Task: Task 'attempt_local830574342_0001_m_000001_0' done.
13/11/19 20:44:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local830574342_0001_m_000001_0
13/11/19 20:44:14 INFO mapred.LocalJobRunner: Map task executor complete.
13/11/19 20:44:14 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/11/19 20:44:14 INFO mapred.LocalJobRunner:
13/11/19 20:44:14 INFO mapred.Merger: Merging 2 sorted segments
13/11/19 20:44:14 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 3938257 bytes
13/11/19 20:44:14 INFO mapred.LocalJobRunner:
13/11/19 20:44:15 INFO mapred.Task: Task:attempt_local830574342_0001_r_000000_0 is done. And is in the process of commiting
13/11/19 20:44:15 INFO mapred.LocalJobRunner:
13/11/19 20:44:15 INFO mapred.Task: Task attempt_local830574342_0001_r_000000_0 is allowed to commit now
13/11/19 20:44:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local830574342_0001_r_000000_0' to hdfs://10.182.11.77:9000/feixu/output8
13/11/19 20:44:15 INFO mapred.LocalJobRunner: reduce > reduce
13/11/19 20:44:15 INFO mapred.Task: Task 'attempt_local830574342_0001_r_000000_0' done.
13/11/19 20:44:15 INFO mapred.JobClient: map 100% reduce 100%
13/11/19 20:44:15 INFO mapred.JobClient: Job complete: job_local830574342_0001
13/11/19 20:44:15 INFO mapred.JobClient: Counters: 19
13/11/19 20:44:15 INFO mapred.JobClient: File Output Format Counters
13/11/19 20:44:15 INFO mapred.JobClient: Bytes Written=8
13/11/19 20:44:15 INFO mapred.JobClient: FileSystemCounters
13/11/19 20:44:15 INFO mapred.JobClient: FILE_BYTES_READ=12993068
13/11/19 20:44:15 INFO mapred.JobClient: HDFS_BYTES_READ=259894374
13/11/19 20:44:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=20152100
13/11/19 20:44:15 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=8
13/11/19 20:44:15 INFO mapred.JobClient: File Input Format Counters
13/11/19 20:44:15 INFO mapred.JobClient: Bytes Read=96390707
13/11/19 20:44:15 INFO mapred.JobClient: Map-Reduce Framework
13/11/19 20:44:15 INFO mapred.JobClient: Map output materialized bytes=3938265
13/11/19 20:44:15 INFO mapred.JobClient: Map input records=378370
13/11/19 20:44:15 INFO mapred.JobClient: Reduce shuffle bytes=0
13/11/19 20:44:15 INFO mapred.JobClient: Spilled Records=990386
13/11/19 20:44:15 INFO mapred.JobClient: Map output bytes=3222207
13/11/19 20:44:15 INFO mapred.JobClient: Total committed heap usage (bytes)=1566834688
13/11/19 20:44:15 INFO mapred.JobClient: SPLIT_RAW_BYTES=222
13/11/19 20:44:15 INFO mapred.JobClient: Combine input records=0
13/11/19 20:44:15 INFO mapred.JobClient: Reduce input records=358023
13/11/19 20:44:15 INFO mapred.JobClient: Reduce input groups=1
13/11/19 20:44:15 INFO mapred.JobClient: Combine output records=0
13/11/19 20:44:15 INFO mapred.JobClient: Reduce output records=1
13/11/19 20:44:15 INFO mapred.JobClient: Map output records=358023



Common problems:
1. The latest Eclipse release, 4.3, does not work with the Hadoop 1.2.1 plugin; switching to Eclipse 4.2 (Juno) resolves this.
2. Comment out the body of the checkReturnValue method in org.apache.hadoop.fs.FileUtil, then recompile and repackage hadoop-core. Otherwise, connecting from Windows to Linux fails with a permission exception ("Failed to set permissions of path ...").
  private static void checkReturnValue(boolean rv, File p,
                                       FsPermission permission
                                       ) throws IOException {
    /*
    if (!rv) {
      throw new IOException("Failed to set permissions of path: " + p +
                            " to " +
                            String.format("%04o", permission.toShort()));
    }
    */
  }

3. Add the following property to conf/hdfs-site.xml on the Linux cluster; otherwise the job throws a permission exception:
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
    </property>