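Original article reposted from: https://brucebcampbell.wordpress.com/2014/09/08/cdh5-integration-with-eclipse/
I worked through the tutorial myself:
I. Preparation
1. Copy the source package
To avoid damaging the deployed Hadoop installation, I first copied the eclipse-plugin directory out of it to a separate path.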
cd /usr/local/cluster/hadoop/src/hadoop-mapreduce1-project/src/contrib
cp -r eclipse-plugin ~/softwares/
cd ~/softwares/eclipse-plugin
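2. Decide which Hadoop version to build against
The Maven setup in the next step uses the oschina mirror in China, http://maven.oschina.net/content/groups/public/
When Maven builds, it automatically looks under the following URL for the packages it needs:
http://maven.oschina.net/content/groups/public/org/apache/hadoop/hadoop-client/
The directory listing at that URL showed that this mirror did not have the Hadoop 2.5.0-cdh5.3.2 I wanted. With no other choice, I settled on 2.5.0-cdh5.2.0 (building against 2.5.0-cdh5.2.1 ran into problems). This 2.5.0-cdh5.2.0 value is used in the configuration below.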
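The manual check above can also be scripted. The following is a minimal sketch, assuming the mirror follows the standard Maven repository layout and that curl is installed; the function names hadoop_client_pom_url and mirror_has_version are hypothetical helpers, not part of Maven.

```shell
# Hypothetical helper: URL of the hadoop-client POM for a given version.
# Standard layout: <base>/<groupId as path>/<artifactId>/<version>/<artifactId>-<version>.pom
hadoop_client_pom_url() {
  base="${1%/}"      # drop one trailing slash, if present
  version="$2"
  printf '%s/org/apache/hadoop/hadoop-client/%s/hadoop-client-%s.pom\n' \
    "$base" "$version" "$version"
}

# Hypothetical helper: succeeds when the mirror answers the POM request
# with a success status (curl --fail turns HTTP errors into a non-zero exit).
mirror_has_version() {
  curl --silent --fail --head --output /dev/null \
    "$(hadoop_client_pom_url "$1" "$2")"
}

# Example (needs network access):
# mirror_has_version http://maven.oschina.net/content/groups/public 2.5.0-cdh5.2.0 \
#   && echo "available" || echo "not on this mirror"
```

Running the example for each candidate version shows which one the mirror can serve before it is written into pom.xml.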
II. Install Maven and generate the project
To install Maven, see: http://blog.csdn.net/u011414200/article/details/47857917
Then use Maven to generate a project skeleton:
mvn archetype:generate \
-DarchetypeGroupId=org.apache.maven.archetypes \
-DarchetypeArtifactId=maven-archetype-quickstart \
-DgroupId=com.bcampbell.hadoopproject \
-DartifactId=wordcount
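After the command succeeds, the result is:
- A generated wordcount project directory, which contains a pom.xml file
III. Edit the pom.xml file
Go into the newly generated wordcount directory and edit pom.xml: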
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.bcampbell.hadoopproject</groupId>
<artifactId>wordcount</artifactId>
<version>0.0.1</version>
<packaging>jar</packaging>
<name>wordcount</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.version>2.5.0-cdh5.2.0</hadoop.version>
</properties>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<version>2.9</version>
<configuration>
<projectNameTemplate>${project.artifactId}</projectNameTemplate>
<buildOutputDirectory>eclipse-classes</buildOutputDirectory>
<downloadSources>true</downloadSources>
<downloadJavadocs>false</downloadJavadocs>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>1.7.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
</project>
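Follow my configuration exactly to avoid errors. For reference, here is the pitfall the original author ran into:
Sadly I’m getting many missing libraries in the Eclipse workspace. These are specified in Maven but did not get downloaded. They are all the Hadoop jars. There was a step for this, but I think I got it wrong. I’ll update later when I get this working. UPDATE – I changed the hadoop dependency to;
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.3.0-cdh5.1.2</version>
</dependency>
I have already applied this correction in the pom.xml above…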
IV. Configure and produce a build
Still in the wordcount directory:
mvn validate
mvn compile
mvn package
- A target directory is generated
V. Generate the Eclipse workspace
Still in the wordcount directory:
mvn -Declipse.workspace=eclipse_workspace eclipse:configure-workspace eclipse:eclipse
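VI. Tell Eclipse about Maven
1. Window -> Preferences
2. Java -> Build Path -> Classpath Variables -> New
3. Name the variable M2_REPO (the name that the generated .classpath file refers to)
4. Set the path to the local Maven repository, which by default is ~/.m2/repository
5. Click OK twice to close both dialogs
Note: I won't go over installing Eclipse here.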
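As a sanity check on the M2_REPO value, the sketch below builds the path where a dependency jar should sit in the local repository, so you can confirm that the file Eclipse will look for is really there. It assumes the repository is at Maven's default location, ~/.m2/repository (settings.xml can override this), and repo_artifact_path is a hypothetical helper name.

```shell
# Hypothetical helper: path of an artifact jar inside a Maven local repository.
# Layout: <repo>/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
repo_artifact_path() {
  repo="${1%/}"                                  # drop one trailing slash, if present
  group_path=$(printf '%s' "$2" | tr '.' '/')    # org.apache.hadoop -> org/apache/hadoop
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$repo" "$group_path" "$3" "$4" "$3" "$4"
}

# Example: check the hadoop-client jar declared in pom.xml.
# jar_path=$(repo_artifact_path "$HOME/.m2/repository" org.apache.hadoop hadoop-client 2.5.0-cdh5.2.0)
# [ -f "$jar_path" ] && echo "found: $jar_path" || echo "missing: $jar_path"
```

If the file is reported missing, the dependency was never downloaded, which would match the missing-library symptom the original author describes.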
VII. Verify and run
Go into the target directory under wordcount, then run:
java -cp wordcount-0.0.1.jar com.bcampbell.hadoopproject.App
Next, run the wordcount job from the jar:
hadoop jar wordcount-0.0.1.jar com.bcampbell.hadoopproject.WordCount /data/wordcount/ /output10
The Hadoop cluster must be started before this step:
hadoop jar wordcount-0.0.1.jar com.bcampbell.hadoopproject.WordCount /user/bcampbell/input output44
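And this is where I got spectacularly stuck… Hadoop reported that it could not find the class com.bcampbell.hadoopproject.WordCount. Sigh… A likely cause is that the quickstart archetype only generates App.java, and none of the steps above adds a WordCount class to the project.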
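One way to narrow down a class-not-found error like this is to check whether the class was packaged into the jar at all. The sketch below assumes the JDK's jar tool is on the PATH; class_to_entry and jar_has_class are hypothetical helper names.

```shell
# Hypothetical helper: convert a fully qualified class name to its jar entry path.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: succeeds when the jar's listing contains the class.
# `jar tf` prints one entry per line; grep -Fx matches the whole line literally.
jar_has_class() {
  jar tf "$1" | grep -Fxq "$(class_to_entry "$2")"
}

# Example, run from the target directory:
# jar_has_class wordcount-0.0.1.jar com.bcampbell.hadoopproject.WordCount \
#   && echo "class is in the jar" || echo "class is NOT in the jar"
```

If the class is not in the jar, the problem lies in the project sources or the build rather than in the hadoop jar command itself.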
Last of all, what I really want to say is that even after completing all of this, I still don't know what it was all for???