1. Unzip the Hadoop distribution package to a local Windows drive, e.g. d:/hadoop.
2. Set the HADOOP_HOME environment variable (the name must be uppercase).
3. Download hadoop-common-2.2.0-bin-master from GitHub (https://github.com/srccodes/hadoop-common-2.2.0-bin), unzip it, and replace the contents of d:/hadoop/bin with the unzipped files.
If the environment variable does not take effect, restart the machine.
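Once HADOOP_HOME is set, it is worth confirming it is actually visible to the JVM that will run the job, since Hadoop on Windows looks for winutils.exe under %HADOOP_HOME%\bin. A minimal check (class and method names here are illustrative, not part of Hadoop):

```java
public class CheckHadoopHome {
    // Returns a readable status line for an environment variable.
    static String status(String name) {
        String value = System.getenv(name);
        return value == null ? name + " is NOT set" : name + " = " + value;
    }

    public static void main(String[] args) {
        // If this prints "NOT set", restart Eclipse / the command prompt
        // (or the machine) so the new environment variable is picked up.
        System.out.println(status("HADOOP_HOME"));
    }
}
```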
4. Copy all the jar files under d:/hadoop into a single folder (this can be done from the command line).
5. Build or download the Hadoop Eclipse plugin. I used Hadoop 2.7.1; the plugin can be built with the ant command (build instructions are easy to find online).
6. Copy hadoop-eclipse-plugin-2.7.1.jar into Eclipse's plugins directory, then start Eclipse. The Map/Reduce view appears under Show View > Other.
7. The detailed configuration parameters are not covered here; they are easy to find online.
8. Create a new project and write the WordCount test example. If you hit this error:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557)
	at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
	at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
	at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
	at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:131)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
	at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
	at com.mrchor.HadoopDev.hadoopDev.WordCountApp.main(WordCountApp.java:34)
This is caused by the HDFS permission check: it expects the current Windows user to belong to the appropriate Linux user group. Here we work around it by modifying the Hadoop source so the check is skipped on Windows.
The steps are as follows:
Copy NativeIO.java out of the Hadoop source into your project, keeping its original package (org.apache.hadoop.io.nativeio) so that your copy shadows the class inside the Hadoop jar, then find
public static boolean access(String path, AccessRight desiredAccess)
throws IOException {
return access0(path, desiredAccess.accessRight());
}
and change it to
public static boolean access(String path, AccessRight desiredAccess)
throws IOException {
return true;
}
This forcibly bypasses the HDFS permission check. Run the program again and it should complete without the error.
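For reference, the computation WordCount performs is easy to state without any Hadoop types: the Mapper emits (word, 1) for each token and the Reducer sums the ones per word. A plain-Java sketch of that core logic (class and method names are illustrative, not Hadoop API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountCore {
    // What the Mapper/Reducer pair computes, stripped of Hadoop types:
    // map emits (word, 1) per whitespace-separated token,
    // reduce sums those ones per word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello hadoop hello world"));
        // prints {hello=2, hadoop=1, world=1}
    }
}
```

In the real job this logic is split across a Mapper and a Reducer so it can run in parallel over HDFS blocks; the sketch above is only the single-machine equivalent.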
9. The WARN and INFO log lines printed during the run can be ignored.
"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" can also be ignored; it just means the native libraries do not match the platform, and it does not affect debugging.
However, if the same message appears on the Linux side when you run the jar, recompile Hadoop on the 64-bit Linux system; the official tar distribution is built for 32-bit by default.
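When deciding whether a rebuild of the native libraries is needed, it helps to check what bitness the JVM itself is running. The standard system properties can tell you; note that sun.arch.data.model is HotSpot-specific and may be absent on other JVMs, which is an assumption of this sketch:

```java
public class PlatformCheck {
    public static void main(String[] args) {
        // os.arch is a standard system property (e.g. "amd64", "x86").
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        // "32" or "64" on HotSpot JVMs; absent elsewhere, hence the default.
        System.out.println("data model = "
                + System.getProperty("sun.arch.data.model", "unknown"));
    }
}
```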