Basic environment:
Windows 7 64-bit
NTFS file system
Administrator account
JDK 7u76 (downloaded from the Oracle website)
hadoop-2.6.0-src.tar.gz (source downloaded from the official site: http://www.apache.org/dyn/closer.cgi/hadoop/common/)
Visual Studio 2010 Ultimate
Start with two basic reference documents:
The official wiki page: Build and Install Hadoop 2.x or newer on Windows, http://wiki.apache.org/hadoop/Hadoop2OnWindows
BUILDING.txt in the root directory of the source package
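According to the "Building on Windows" section of BUILDING.txt, you need to install Maven and Protocol Buffers (for both, see my previous post), CMake (download the latest binary package, unpack it, and add the directory containing cmake.exe to PATH), and zlib (download the DLL package, unpack it, and set ZLIB_HOME as described in BUILDING.txt). The build also needs some Unix commands, so install Cygwin with cygwin-x86_64.exe (this is only the installer; you still have to install the default packages and then add Cygwin's bin directory to PATH). Other ports of the Unix commands do not always work: with some of them, running sh hung, apparently on the tar command.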
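Before starting the build, it can save time to confirm that the tools listed above are actually reachable. The following is a minimal sketch, not part of the original setup: the tool list (mvn, protoc, cmake, sh, tar) is my reading of the prerequisites above, and the helper names are made up for illustration.

```python
import os
import shutil

# Assumed list of executables, based on the prerequisites described above.
# "sh" and "tar" stand in for the Cygwin commands the build calls.
REQUIRED_TOOLS = ["mvn", "protoc", "cmake", "sh", "tar"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def zlib_home_ok(env=os.environ):
    """Return True if ZLIB_HOME is set and points to an existing directory."""
    path = env.get("ZLIB_HOME", "")
    return bool(path) and os.path.isdir(path)


if __name__ == "__main__":
    for tool in missing_tools():
        print("not on PATH:", tool)
    if not zlib_home_ok():
        print("ZLIB_HOME is unset or is not a directory")
```

Run it from the same command prompt that will later run mvn, so that it sees the same PATH.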
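Because I use Visual Studio, there is no need to install the Windows SDK. Note that VS ships several command prompts; use the one named "Visual Studio x64 Cross Tools Command Prompt (2010)" (shown as "Visual Studio x64 兼容工具命令提示(2010)" in the Chinese edition), since the Win64 one seems to have a problem. Before running mvn, check the Platform environment variable (the case of the variable name matters) and make sure its value is x64, also in exactly that case. The prompt labeled Win64 sets Platform to X64. I recommend not switching command prompts on the same source tree, because they appear to use different versions of the compiler and linker.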
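The Platform check can be sketched in a few lines. Python's os.environ upper-cases variable names on Windows, so this sketch instead parses the NAME=value lines printed by the `set` command, which keep their original case. The function names are hypothetical, and the expected spelling ("Platform" with the value "x64") is taken from the paragraph above.

```python
import subprocess


def parse_set_output(text):
    """Parse NAME=value lines (as printed by `set`) into a dict, keeping case."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def platform_problem(env):
    """Describe what is wrong with the Platform variable, or return None."""
    if "Platform" not in env:
        variants = [n for n in env if n.lower() == "platform"]
        if variants:
            return "variable is spelled %r, expected 'Platform'" % variants[0]
        return "Platform is not set"
    if env["Platform"] != "x64":
        return "Platform is %r, expected 'x64'" % env["Platform"]
    return None


if __name__ == "__main__":
    out = subprocess.check_output("set", shell=True, universal_newlines=True)
    print(platform_problem(parse_set_output(out)) or "Platform looks fine")
```

If the check reports X64, running `set Platform=x64` in that prompt before mvn should be enough to correct it.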
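Do not install JDK 8, or building the documentation will fail (that is the first problem to show up; whether it causes others, I cannot say). Be careful with JAVA_HOME: if the installation path contains a space, use the old 8.3 short-name form, i.e. write C:\Program Files\ as C:\Progra~1\. Before running mvn, check the version with java -version.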
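Both JDK checks are easy to automate. The sketch below assumes the usual banner format of `java -version` (for example `java version "1.7.0_76"`, written to stderr); the helper names are invented for this example.

```python
import os
import re
import subprocess


def java_home_problem(java_home):
    """Describe what is wrong with a JAVA_HOME value, or return None."""
    if not java_home:
        return "JAVA_HOME is not set"
    if " " in java_home:
        return "JAVA_HOME contains a space; use the 8.3 form such as C:\\Progra~1\\"
    return None


def java_major(version_output):
    """Extract the major Java version (7, 8, ...) from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases up to Java 8 report themselves as 1.x (1.7.0_76 means Java 7).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


if __name__ == "__main__":
    print(java_home_problem(os.environ.get("JAVA_HOME")) or "JAVA_HOME looks fine")
    try:
        # `java -version` writes its banner to stderr, not stdout.
        proc = subprocess.run(["java", "-version"], stderr=subprocess.PIPE,
                              universal_newlines=True)
        major = java_major(proc.stderr)
        if major is None:
            print("could not parse the java -version output")
        elif major >= 8:
            print("found JDK %d; this guide was tested with JDK 7" % major)
        else:
            print("found JDK %d" % major)
    except OSError:
        print("java was not found on PATH")
```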
A few other references:
http://flashing.iteye.com/blog/2139534
http://www.blogjava.net/Bryan/archive/2014/08/22/417252.html
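http://harishshan.blogspot.in/2014/10/install-hadoop-251-on-windows-7-64bit.html (this one may be hard to reach, but it is quite good; a copy can be downloaded from http://vdisk.weibo.com/s/BICjq6gk86_pg ; the MAFF file can be unpacked, since it is essentially a zip archive)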
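Appendix:
After the build finished, I followed the configuration given in "Build and Install Hadoop 2.x or newer on Windows" and tried to run a YARN job, but it failed (I also downloaded hadoop-2.5.2, built it, and got the same result). I could not find the cause at first and hoped an expert could point it out. The error output follows. [The error has since been resolved; the fix is described below the error output.]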
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
15/04/14 10:16:07 INFO capacity.CapacityScheduler: Application attempt appattemp
t_1428977692524_0001_000002 released container container_1428977692524_0001_02_0
00001 on node: host: 192.168.26.51:3435 #containers=0 available=8192 used=0 with
event: FINISHED
15/04/14 10:16:07 INFO rmapp.RMAppImpl: Updating application application_1428977
692524_0001 with final state: FAILED
15/04/14 10:16:07 INFO rmapp.RMAppImpl: application_1428977692524_0001 State cha
nge from ACCEPTED to FINAL_SAVING
15/04/14 10:16:07 INFO recovery.RMStateStore: Updating info for app: application
_1428977692524_0001
15/04/14 10:16:07 INFO capacity.CapacityScheduler: Application Attempt appattemp
t_1428977692524_0001_000002 is done. finalState=FAILED
15/04/14 10:16:07 INFO scheduler.AppSchedulingInfo: Application application_1428
977692524_0001 requests cleared
15/04/14 10:16:07 INFO capacity.LeafQueue: Application removed - appId: applicat
ion_1428977692524_0001 user: yangyt queue: default #user-pending-applications: 0
#user-active-applications: 0 #queue-pending-applications: 0 #queue-active-appli
cations: 0
15/04/14 10:16:07 INFO rmapp.RMAppImpl: Application application_1428977692524_00
01 failed 2 times due to AM Container for appattempt_1428977692524_0001_000002 e
xited with exitCode: 5 due to: Exception from container-launch: ExitCodeExcepti
on exitCode=5: createTask error (5): ?????
ExitCodeException exitCode=5: createTask error (5): ?????
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:
702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.la
unchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.C
ontainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.C
ontainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 5
.Failing this attempt.. Failing the application.
15/04/14 10:16:07 INFO rmapp.RMAppImpl: application_1428977692524_0001 State cha
nge from FINAL_SAVING to FAILED
15/04/14 10:16:07 INFO capacity.ParentQueue: Application removed - appId: applic
ation_1428977692524_0001 user: yangyt leaf-queue of parent: root #applications:
0
15/04/14 10:16:07 WARN resourcemanager.RMAuditLogger: USER=yangyt OPERATIO
N=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPT
ION=App failed with state: FAILED PERMISSIONS=Application application_1428
977692524_0001 failed 2 times due to AM Container for appattempt_1428977692524_0
001_000002 exited with exitCode: 5 due to: Exception from container-launch: Exi
tCodeException exitCode=5: createTask error (5): ?????
ExitCodeException exitCode=5: createTask error (5): ?????
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:
702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.la
unchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.C
ontainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.C
ontainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 5
.Failing this attempt.. Failing the application. APPID=application_142897
7692524_0001
15/04/14 10:16:07 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=ap
plication_1428977692524_0001,name=word count,user=yangyt,queue=default,state=FAI
LED,trackingUrl=http://yangyt-PC:8088/cluster/app/application_1428977692524_0001
,appMasterHost=N/A,startTime=1428977761161,finishTime=1428977767092,finalStatus=
FAILED
15/04/14 10:16:07 INFO ipc.Server: Socket Reader #1 for port 8032: readAndProces
s from client 192.168.26.51 threw exception [java.io.IOException: 远程主机强迫关
闭了一个现有的连接。]
java.io.IOException: 远程主机强迫关闭了一个现有的连接。
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:2558)
at org.apache.hadoop.ipc.Server.access$2800(Server.java:130)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:14
59)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:62
4)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Solution:
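The cause was a wrong configuration. The configuration in "Build and Install Hadoop 2.x or newer on Windows" is out of date and does not suit 2.6.0. Use the configuration at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html instead. Although that page describes Linux, its settings for core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml can be copied to Windows unchanged. For Windows-specific matters such as environment variables, still follow "Build and Install Hadoop 2.x or newer on Windows". Note: after changing core-site.xml or hdfs-site.xml, reformat the file system by running bin\hdfs namenode -format (preferably starting from a clean state, i.e. delete the old tmp directory first).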
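For orientation, here is a small sketch that renders the four site files in the `<configuration>/<property>` layout Hadoop expects. The property values are the single-node settings as I recall them from the SingleCluster page (fs.defaultFS, dfs.replication, mapreduce.framework.name, yarn.nodemanager.aux-services); treat them as assumptions and compare them with that page before use. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed single-node settings; verify against the SingleCluster page.
SITE_FILES = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
    "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
}


def render_site_xml(props):
    """Render a dict of property names and values as a Hadoop site file."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print each file's content; copy it into the Hadoop configuration directory by hand.
    for filename, props in SITE_FILES.items():
        print("----", filename)
        print(render_site_xml(props))
```

Printing rather than writing the files keeps the sketch from overwriting an existing configuration by accident.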
Finally, run the example command: %HADOOP_PREFIX%\bin\yarn jar %HADOOP_PREFIX%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.6.0.jar wordcount /myfile.txt /out
It ran successfully for me.