Installing Hadoop 2.2.0 on Windows

Good news for Hadoop developers who want to use Microsoft Windows for their development activities: the Apache Hadoop 2.2.0 release officially supports running Hadoop on Microsoft Windows. However, the bin distribution of Apache Hadoop 2.2.0 does not contain some Windows native components (such as winutils.exe and hadoop.dll). As a result, if we try to run Hadoop on Windows, we'll encounter: ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
In this article, I'll describe how to build a native bin distribution from the source code, then install, configure and run Hadoop on the Windows platform.

Tools and Technologies used in this article:

  1. Apache Hadoop 2.2.0 source code

  2. Windows 7 OS

  3. Microsoft Windows SDK v7.1

  4. Maven 3.1.1

  5. Protocol Buffers 2.5.0

  6. Cygwin

  7. JDK 1.6

Build Hadoop bin distribution for Windows

  1. Download and install Microsoft Windows SDK v7.1.

  2. Download and install the Unix command-line tool Cygwin.

  3. Download and install Maven 3.1.1.

  4. Download Protocol Buffers 2.5.0 and extract it to a folder (say c:\protobuf).

  5. Add Environment Variables JAVA_HOME, M2_HOME and Platform if not added already.

    (Screenshot: Environment Variables - JAVA_HOME, M2_HOME and Platform)

    Note: The variable name Platform is case sensitive, and the value will be either x64 or Win32 for building on a 64-bit or 32-bit system respectively.

    Edit the Path variable to add the bin directory of Cygwin (say C:\cygwin64\bin), the bin directory of Maven (say C:\maven\bin) and the installation path of Protocol Buffers (say c:\protobuf).

    (Screenshot: PATH Variable - Cygwin & Protocol Buffers)
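
    These variables can also be set from a command prompt; the following is a minimal sketch. The JDK and Maven paths below are examples only and must be adjusted to your machine, and setx makes the change permanent without affecting the current window, so open a new prompt afterwards. The Path edits are easiest to do through the Environment Variables dialog as described above. Once everything is in place, verify that the tools are reachable:

    Command Prompt
    C:\>rem example paths below - adjust to your own JDK and Maven locations
    C:\>setx JAVA_HOME "C:\Program Files\Java\jdk1.6.0_45"
    C:\>setx M2_HOME "C:\maven"
    C:\>setx Platform x64

    C:\>mvn -version
    C:\>protoc --version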
  6. Download hadoop-2.2.0-src.tar.gz and extract it to a folder with a short path (say c:\hdfs) to avoid runtime problems caused by the maximum path length limitation in Windows.

  7. Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open the Windows SDK 7.1 Command Prompt. Change directory to the Hadoop source code folder (c:\hdfs). Execute mvn package with the options -Pdist,native-win -DskipTests -Dtar to create the Windows binary tar distribution.

    Windows SDK 7.1 Command Prompt
    Setting SDK environment relative to C:\Program Files\Microsoft SDKs\Windows\v7.1\.
    Targeting Windows 7 x64 Debug
    
    C:\Program Files\Microsoft SDKs\Windows\v7.1>cd c:\hdfs
    C:\hdfs>mvn package -Pdist,native-win -DskipTests -Dtar
    [INFO] Scanning for projects...
    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Build Order:
    [INFO] 
    [INFO] Apache Hadoop Main
    [INFO] Apache Hadoop Project POM
    [INFO] Apache Hadoop Annotations
    [INFO] Apache Hadoop Assemblies
    [INFO] Apache Hadoop Project Dist POM
    [INFO] Apache Hadoop Maven Plugins
    [INFO] Apache Hadoop Auth
    [INFO] Apache Hadoop Auth Examples
    [INFO] Apache Hadoop Common
    [INFO] Apache Hadoop NFS
    [INFO] Apache Hadoop Common Project
    [INFO] Apache Hadoop HDFS
    [INFO] Apache Hadoop HttpFS
    [INFO] Apache Hadoop HDFS BookKeeper Journal
    [INFO] Apache Hadoop HDFS-NFS
    [INFO] Apache Hadoop HDFS Project
    [INFO] hadoop-yarn
    [INFO] hadoop-yarn-api
    [INFO] hadoop-yarn-common
    [INFO] hadoop-yarn-server
    [INFO] hadoop-yarn-server-common
    [INFO] hadoop-yarn-server-nodemanager
    [INFO] hadoop-yarn-server-web-proxy
    [INFO] hadoop-yarn-server-resourcemanager
    [INFO] hadoop-yarn-server-tests
    [INFO] hadoop-yarn-client
    [INFO] hadoop-yarn-applications
    [INFO] hadoop-yarn-applications-distributedshell
    [INFO] hadoop-mapreduce-client
    [INFO] hadoop-mapreduce-client-core
    [INFO] hadoop-yarn-applications-unmanaged-am-launcher
    [INFO] hadoop-yarn-site
    [INFO] hadoop-yarn-project
    [INFO] hadoop-mapreduce-client-common
    [INFO] hadoop-mapreduce-client-shuffle
    [INFO] hadoop-mapreduce-client-app
    [INFO] hadoop-mapreduce-client-hs
    [INFO] hadoop-mapreduce-client-jobclient
    [INFO] hadoop-mapreduce-client-hs-plugins
    [INFO] Apache Hadoop MapReduce Examples
    [INFO] hadoop-mapreduce
    [INFO] Apache Hadoop MapReduce Streaming
    [INFO] Apache Hadoop Distributed Copy
    [INFO] Apache Hadoop Archives
    [INFO] Apache Hadoop Rumen
    [INFO] Apache Hadoop Gridmix
    [INFO] Apache Hadoop Data Join
    [INFO] Apache Hadoop Extras
    [INFO] Apache Hadoop Pipes
    [INFO] Apache Hadoop Tools Dist
    [INFO] Apache Hadoop Tools
    [INFO] Apache Hadoop Distribution
    [INFO] Apache Hadoop Client
    [INFO] Apache Hadoop Mini-Cluster
    [INFO]                                                                         
    [INFO] ------------------------------------------------------------------------
    [INFO] Building Apache Hadoop Main 2.2.0
    [INFO] ------------------------------------------------------------------------
    [INFO] 
    [INFO] --- maven-enforcer-plugin:1.3.1:enforce (default) @ hadoop-main ---
    [INFO] 
    [INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-main ---

    Note: I have pasted only the first few lines of the huge log generated by Maven. This build step requires an Internet connection, as Maven will download all the required dependencies.

  8. If everything went well in the previous step, the native distribution hadoop-2.2.0.tar.gz will be created inside the C:\hdfs\hadoop-dist\target\hadoop-2.2.0 directory.

Install Hadoop

  1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

  2. Add the environment variable HADOOP_HOME and edit the Path variable to add the bin directory of HADOOP_HOME (say C:\hadoop\bin).

    (Screenshot: Environment Variables - HADOOP_HOME and Path)
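
    With HADOOP_HOME and the Path in place, a quick sanity check from a new command prompt (so the updated Path is picked up) is to run hadoop version; the first line of its output should read Hadoop 2.2.0:

    Command Prompt
    C:\Users\abhijitg>hadoop version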

Configure Hadoop

Make the following changes to configure Hadoop:

  • File: C:\hadoop\etc\hadoop\core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>
    fs.defaultFS:
    The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.
  • File: C:\hadoop\etc\hadoop\hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/data/dfs/namenode</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/data/dfs/datanode</value>
      </property>
    </configuration>
    dfs.replication:
    Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
    dfs.namenode.name.dir:
    Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
    dfs.datanode.data.dir:
    Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.

    Note: Create the namenode and datanode directories under c:/hadoop/data/dfs/.
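
    A quick way to create both directories from a command prompt, using the paths configured above:

    Command Prompt
    C:\Users\abhijitg>mkdir c:\hadoop\data\dfs\namenode
    C:\Users\abhijitg>mkdir c:\hadoop\data\dfs\datanode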

  • File: C:\hadoop\etc\hadoop\yarn-site.xml
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
      <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
      </property>
      <property>
         <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    </configuration>
    yarn.nodemanager.aux-services:
    The auxiliary service name. Default value is mapreduce_shuffle
    yarn.nodemanager.aux-services.mapreduce.shuffle.class:
    The auxiliary service class to use. Default value is org.apache.hadoop.mapred.ShuffleHandler
  • File: C:\hadoop\etc\hadoop\mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
      </property>
    </configuration>
    mapreduce.framework.name:
    The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.

Format namenode

For the first run only, the namenode needs to be formatted.

Command Prompt
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\abhijitg>cd c:\hadoop\bin

c:\hadoop\bin>hdfs namenode -format
13/11/03 18:07:47 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ABHIJITG/x.x.x.x
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.2.0
STARTUP_MSG:   classpath = <classpath jars here>
STARTUP_MSG:   build = Unknown -r Unknown; compiled by ABHIJITG on 2013-11-01T13:42Z
STARTUP_MSG:   java = 1.7.0_03
************************************************************/
Formatting using clusterid: CID-1af0bd9f-efee-4d4e-9f03-a0032c22e5eb
13/11/03 18:07:48 INFO namenode.HostFileManager: read includes:
HostSet(
)
13/11/03 18:07:48 INFO namenode.HostFileManager: read excludes:
HostSet(
)
13/11/03 18:07:48 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
13/11/03 18:07:48 INFO util.GSet: Computing capacity for map BlocksMap
13/11/03 18:07:48 INFO util.GSet: VM type       = 64-bit
13/11/03 18:07:48 INFO util.GSet: 2.0% max memory = 888.9 MB
13/11/03 18:07:48 INFO util.GSet: capacity      = 2^21 = 2097152 entries
13/11/03 18:07:48 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
13/11/03 18:07:48 INFO blockmanagement.BlockManager: defaultReplication         = 1
13/11/03 18:07:48 INFO blockmanagement.BlockManager: maxReplication             = 512
13/11/03 18:07:48 INFO blockmanagement.BlockManager: minReplication             = 1
13/11/03 18:07:48 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
13/11/03 18:07:48 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
13/11/03 18:07:48 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
13/11/03 18:07:48 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
13/11/03 18:07:48 INFO namenode.FSNamesystem: fsOwner             = ABHIJITG (auth:SIMPLE)
13/11/03 18:07:48 INFO namenode.FSNamesystem: supergroup          = supergroup
13/11/03 18:07:48 INFO namenode.FSNamesystem: isPermissionEnabled = true
13/11/03 18:07:48 INFO namenode.FSNamesystem: HA Enabled: false
13/11/03 18:07:48 INFO namenode.FSNamesystem: Append Enabled: true
13/11/03 18:07:49 INFO util.GSet: Computing capacity for map INodeMap
13/11/03 18:07:49 INFO util.GSet: VM type       = 64-bit
13/11/03 18:07:49 INFO util.GSet: 1.0% max memory = 888.9 MB
13/11/03 18:07:49 INFO util.GSet: capacity      = 2^20 = 1048576 entries
13/11/03 18:07:49 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/11/03 18:07:49 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
13/11/03 18:07:49 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/11/03 18:07:49 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
13/11/03 18:07:49 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
13/11/03 18:07:49 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time
is 600000 millis
13/11/03 18:07:49 INFO util.GSet: Computing capacity for map Namenode Retry Cache
13/11/03 18:07:49 INFO util.GSet: VM type       = 64-bit
13/11/03 18:07:49 INFO util.GSet: 0.029999999329447746% max memory = 888.9 MB
13/11/03 18:07:49 INFO util.GSet: capacity      = 2^15 = 32768 entries
13/11/03 18:07:49 INFO common.Storage: Storage directory \hadoop\data\dfs\namenode has been successfully formatted.
13/11/03 18:07:49 INFO namenode.FSImage: Saving image file \hadoop\data\dfs\namenode\current\fsimage.ckpt_00000000000000
00000 using no compression
13/11/03 18:07:49 INFO namenode.FSImage: Image file \hadoop\data\dfs\namenode\current\fsimage.ckpt_0000000000000000000 o
f size 200 bytes saved in 0 seconds.
13/11/03 18:07:49 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/11/03 18:07:49 INFO util.ExitUtil: Exiting with status 0
13/11/03 18:07:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ABHIJITG/x.x.x.x
************************************************************/

Start HDFS (Namenode and Datanode)

Command Prompt
C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-dfs

Two separate Command Prompt windows will be opened automatically to run the Namenode and Datanode.

(Screenshot: Command Prompt - Namenode & Datanode)
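
To confirm that both daemons are up, the JDK's jps tool can be run from any command prompt (assuming the JDK's bin directory is on the Path); its output should list NameNode and DataNode processes:

Command Prompt
C:\Users\abhijitg>jps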

Start MapReduce aka YARN (Resource Manager and Node Manager)

Command Prompt
C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-yarn
starting yarn daemons

Similarly, two separate Command Prompt windows will be opened automatically to run the Resource Manager and Node Manager.

(Screenshot: Command Prompt - Resource Manager & Node Manager)

Verify Installation

If everything goes well, you will be able to open the Resource Manager and Node Manager at http://localhost:8042 and the Namenode at http://localhost:50070.

Node Manager: http://localhost:8042/
(Screenshot: Browser - Resource Manager and Node Manager)
Namenode: http://localhost:50070
(Screenshot: Browser - Namenode)
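
As an additional check, a small HDFS smoke test can be run from the command prompt; the /test directory below is just an example name. The second command should list the newly created directory:

Command Prompt
C:\Users\abhijitg>hdfs dfs -mkdir /test
C:\Users\abhijitg>hdfs dfs -ls /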

Stop HDFS & MapReduce

Command Prompt
C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>stop-dfs
SUCCESS: Sent termination signal to the process with PID 876.
SUCCESS: Sent termination signal to the process with PID 3848.

c:\hadoop\sbin>stop-yarn
stopping yarn daemons
SUCCESS: Sent termination signal to the process with PID 5920.
SUCCESS: Sent termination signal to the process with PID 7612.

INFO: No tasks running with the specified criteria.

Download Source Code

Problems Encountered

1. Error:

[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-winutils) on project hadoop-common: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]

Fix: In the Windows SDK 7.1 Command Prompt, run the following command before the mvn build (no spaces around the equals sign):

set Platform=x64

Alternatively, the build environment can be prepared by:

  1. Opening a DOS command prompt.

  2. Starting "c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcvarsall.bat", which modifies the PATH variable and sets some other variables.

  3. After that, starting Cygwin directly from this command prompt:

    C:\cygwin64\bin\mintty.exe -i /Cygwin-Terminal.ico -

2. Error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (pre-dist) on project hadoop-project-dist: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "sh" (in directory "C:\hadoop\hadoop-2.5.0-src\hadoop-project-dist\target"): CreateProcess error=2, The system cannot find the file specified
[ERROR] around Ant part ...<exec dir="C:\hadoop\hadoop-2.5.0-src\hadoop-project-dist\target" executable="sh" failonerror="true">... @ 31:104 in C:\hadoop\hadoop-2.5.0-src\hadoop-project-dist\target\antrun\build-main.xml
[ERROR] -> [Help 1]
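
This error typically means that the build cannot find the sh executable provided by Cygwin. A likely fix, assuming Cygwin is installed at C:\cygwin64 as earlier in this article, is to put Cygwin's bin directory at the front of the PATH in the build prompt and re-run Maven:

Windows SDK 7.1 Command Prompt
C:\hdfs>rem assumes Cygwin at C:\cygwin64 - adjust if installed elsewhere
C:\hdfs>set PATH=C:\cygwin64\bin;%PATH%
C:\hdfs>mvn package -Pdist,native-win -DskipTests -Dtar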





