D:\workspace\java>mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=siat.hadoop -DartifactId=TestHadoop -DpackageName=siat.hadoop -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-archetype-plugin:2.2:generate (default-cli) > generate-sources @ standalone-pom >>>
[INFO]
[INFO] <<< maven-archetype-plugin:2.2:generate (default-cli) < generate-sources @ standalone-pom <<<
[INFO]
[INFO] --- maven-archetype-plugin:2.2:generate (default-cli) @ standalone-pom ---
[INFO] Generating project in Batch mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.jar
Downloaded: https://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.jar (5 KB at 2.6 KB/sec)
Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.pom
Downloaded: https://repo.maven.apache.org/maven2/org/apache/maven/archetypes/maven-archetype-quickstart/1.0/maven-archetype-quickstart-1.0.pom (703 B at 1.5 KB/sec)
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Old (1.x) Archetype: maven-archetype-quickstart:1.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: siat.hadoop
[INFO] Parameter: packageName, Value: siat.hadoop
[INFO] Parameter: package, Value: siat.hadoop
[INFO] Parameter: artifactId, Value: TestHadoop
[INFO] Parameter: basedir, Value: D:\workspace\java
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: D:\workspace\java\TestHadoop
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.390 s
[INFO] Finished at: 2014-10-23T15:45:02+08:00
[INFO] Final Memory: 11M/27M
[INFO] ------------------------------------------------------------------------
D:\workspace\java></pre>
A note on the parameters: once -DgroupId=siat.hadoop -DartifactId=TestHadoop -DpackageName=siat.hadoop are specified, the project directory and the package are created automatically; there is no need to create them by hand.
~ D:\workspace\java>cd TestHadoop
~ D:\workspace\java\TestHadoop>mvn clean install
...
[INFO] Installing D:\workspace\java\TestHadoop\target\TestHadoop-1.0-SNAPSHOT.j
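3. Import the project into Eclipse
File-->Import-->Maven-->Existing Maven Projects (do not import via General; Eclipse may then fail to recognize it as a Maven project)
4. Add the Hadoop dependency
Here I use Hadoop 1.2.1. Edit the file pom.xml (edit the copy under D:\workspace\java\TestHadoop; Eclipse picks up the change automatically)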
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>siat.hadoop</groupId>
  <artifactId>TestHadoop</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>TestHadoop</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>1.2.1</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
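5. Download the dependencies
~ mvn clean install (same as in step 2)
Refresh the project in Eclipse: Maven Dependencies should now list hadoop-core-1.2.1.jar together with the other jars it depends on. The project's dependencies are added to the library path automatically.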
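If the Eclipse view is ambiguous, a small check run from inside the project can confirm that the Hadoop classes are really visible. The sketch below is only an illustration: the class name `ClasspathCheck` and the method `isOnClasspath` are made up for this post, and the probe assumes that `org.apache.hadoop.conf.Configuration` is among the classes shipped in hadoop-core-1.2.1.jar.

```java
/** Sanity check: is a given class visible on the current classpath? */
final class ClasspathCheck {

    /** Returns true if the named class can be located (it is not initialized). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this class is provided by the hadoop-core dependency.
        String hadoopClass = "org.apache.hadoop.conf.Configuration";
        System.out.println(hadoopClass + " on classpath: " + isOnClasspath(hadoopClass));
    }
}
```

Run as a Java application inside the Maven project, it should print `true` once the dependency has been resolved; `false` usually means the project needs a refresh or the download failed.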
6. Download the Hadoop configuration files from the Hadoop cluster (I did not set up a cluster; I only deployed a single Hadoop master server)
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
Save them under the src/main/resources/hadoop directory.
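To see what these files actually tell the client, it can help to print a setting such as `fs.default.name` (the Hadoop 1.x key for the NameNode address). The following is a minimal sketch using only the JDK's XML parser, not Hadoop's own `Configuration` class, so it ignores features like `<final>` and variable substitution. The class `SiteXmlReader`, its methods, and the fallback value `hdfs://master:9000` are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads name/value pairs from a Hadoop-style *-site.xml file. */
final class SiteXmlReader {

    /** Parses <configuration><property><name/><value/></property>...</configuration>. */
    static Map<String, String> readProperties(InputStream in) {
        Map<String, String> props = new LinkedHashMap<>();
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in)
                    .getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration XML", e);
        }
        return props;
    }

    /** Trimmed text of the first child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? null : children.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Looks for the file saved in step 6; falls back to a made-up sample.
        InputStream in = SiteXmlReader.class.getClassLoader()
                .getResourceAsStream("hadoop/core-site.xml");
        if (in == null) {
            String sample = "<configuration><property>"
                    + "<name>fs.default.name</name>"
                    + "<value>hdfs://master:9000</value>" // assumed value, not from the post
                    + "</property></configuration>";
            in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("fs.default.name = " + readProperties(in).get("fs.default.name"));
    }
}
```

If the printed address uses a host name such as master, that name has to be resolvable from the development machine.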
7. Configure the local hosts file, adding an entry that points the name master to the server
c:/Windows/System32/drivers/etc/hosts
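172.21.5.235 master (the master server of the cluster, i.e. the NameNode)
Here I deployed the Hadoop environment on a laptop and used it as the server, with all 6 processes started. The laptop and the desktop are connected to the same router, i.e. they are on the same LAN.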
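A quick way to confirm that the hosts entry is picked up is to ask the JVM what it resolves the name to. This is a rough sketch; the class `HostCheck` and the method `resolve` are names invented here, and the default host name `master` is simply the one from the hosts entry in this step.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

/** Checks which IP address the JVM resolves for a host name. */
final class HostCheck {

    /** Returns the resolved IP address, or empty if the name cannot be resolved. */
    static Optional<String> resolve(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Defaults to the name configured in the hosts file in this step.
        String host = args.length > 0 ? args[0] : "master";
        System.out.println(host + " -> "
                + resolve(host).orElse("not resolvable; check the hosts file"));
    }
}
```

With the hosts entry in place, running it without arguments should print the address configured for master; any other result suggests the entry was not saved or the file was edited without administrator rights.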
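8. MapReduce program development
Write a simple MapReduce program that implements word count.
Create a new Java file: WordCount.java
...
... (on Windows the Hadoop jar has to be recompiled; see the site referenced below for details)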
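The WordCount.java source is not reproduced above, so the following is not that program. It is only a plain-Java sketch of the logic a word count job performs, with no Hadoop API involved: a map step that emits (word, 1) pairs and a reduce step that sums them per word (the shuffle between the two is folded into the reduce here). The names `WordCountLogic`, `map`, `reduce` and `wordCount` are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Word count logic in plain Java, mirroring the map and reduce phases. */
final class WordCountLogic {

    /** Map phase: emit one (word, 1) pair per whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    /** Reduce phase: sum the counts per word; TreeMap keeps keys sorted. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    /** Runs map over every line, then reduces all emitted pairs. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, maven=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello maven")));
    }
}
```

In a real job the same two steps would live in Mapper and Reducer classes, and the framework would handle splitting the input and grouping the pairs by key.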
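9. Notes
With this setup we can develop on Win7/Win8: Maven builds the Hadoop dependency environment, we write the MapReduce program in Eclipse, and then run it as a Java application. The Hadoop application automatically packages our MR program into a jar and uploads it to the remote Hadoop environment to run, and the returned log is printed in the Eclipse console.
Reference: http://blog.fens.me/hadoop-maven-eclipse/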