Deploying Hadoop 0.20.2 on Windows 7, with Eclipse 3.7 plugin configuration
After consulting plenty of material online, this still took me the better part of a day. I had assumed deployment on Windows would be straightforward; it turned out to be neither simple nor painless.
My deployment steps are written up below.
I. Prepare the required resources
1. Cygwin, a tool that emulates a Linux environment on Windows; see http://cygwin.com
2. JDK 1.6.0_20 for win64
3. Hadoop 0.20.2
4. eclipse-java-indigo-win32-x86_64.zip
II. Install the JDK
Define the JAVA_HOME and CLASSPATH environment variables.
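On Windows these are typically set via System Properties → Environment Variables. As a reference, the values I used looked like this (the JDK path is my install location, so adjust it to match yours):

```
JAVA_HOME = C:\Program Files\Java\jdk1.6.0_20
CLASSPATH = .;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
PATH      = %PATH%;%JAVA_HOME%\bin
```

Appending %JAVA_HOME%\bin to PATH lets java and javac be found from any command prompt, which the later Hadoop steps rely on.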
III. Install Cygwin
The following packages need to be installed:
Net category: openssh, openssl
Base category: sed (needed later when configuring the Eclipse plugin)
Editors category: vim (optional)
Devel category: subversion (optional)
After installation, a shortcut icon appears on the desktop.
Double-click the icon and run ssh-host-config in the terminal window. The configuration session looks like this:
Mike@Mike-PC ~
$ ssh-host-config
*** Query: Overwrite existing /etc/ssh_config file? (yes/no) yes
*** Info: Creating default /etc/ssh_config file
*** Query: Overwrite existing /etc/sshd_config file? (yes/no) yes
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges. Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] ntsec
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.
*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).
*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.
*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.
*** Info: No privileged account could be found.
*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) yes
*** Query: Enter the new user name: Mike
*** Query: Reenter: Mike
*** Warning: Privileged account 'Mike' was specified,
*** Warning: but it does not have the necessary privileges.
*** Warning: Continuing, but will probably use a different account.
*** Warning: The specified account 'Mike' does not have the
*** Warning: required permissions or group memberships. This may
*** Warning: cause problems if not corrected; continuing...
*** Query: Please enter the password for user 'Mike':
*** Query: Reenter:
*** Info: The sshd service has been installed under the 'Mike'
*** Info: account. To start the service now, call `net start sshd' or
*** Info: `cygrunsrv -S sshd'. Otherwise, it will start automatically
*** Info: after the next reboot.
*** Info: Host configuration finished. Have fun!
When "Have fun!" appears, the ssh service has been created successfully.
Right-click "Computer", choose "Manage", then "Services", and start the CYGWIN sshd service, as shown in the figure below:
Once the service is running, execute the following in Cygwin:
1. ssh-keygen
2. cd .ssh
3. cp id_rsa.pub authorized_keys
4. Exit Cygwin
5. Run ssh localhost
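The steps above can be run as one short script (a sketch; it assumes an RSA key with an empty passphrase is acceptable for this local single-node setup):

```shell
# Generate an RSA key pair non-interactively, with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the public key for passwordless login to localhost
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify: this should log in without prompting for a password
ssh localhost
```

If ssh localhost still asks for a password, check the permissions on ~/.ssh and authorized_keys before going further, since start-all.sh later depends on passwordless ssh.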
IV. Install Hadoop
Unpack the archive, define HADOOP_HOME, and move the unpacked directory under HADOOP_HOME.
Now configure Hadoop. The files to edit (all under hadoop/conf):
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
First file: hadoop-env.sh
Change the JAVA_HOME setting inside it, making sure to remove the # in front of the export so it takes effect.
The path must also be written Linux-style; under Cygwin my JDK path becomes "/cygdrive/c/Program Files/java/jdk1.6.0_20".
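The edited line in hadoop-env.sh then looks roughly like this (the JDK path matches my install location; adjust it to yours):

```shell
# conf/hadoop-env.sh -- uncomment and point at the JDK, using a Cygwin-style path
export JAVA_HOME="/cygdrive/c/Program Files/java/jdk1.6.0_20"
```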
Second file: core-site.xml
Delete it first, then copy core-default.xml from hadoop/src/core into conf and rename it core-site.xml. Then modify the fs.default.name property, as shown below.
(Make sure port 9001 is not already in use.)
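The modified property ends up looking roughly like this (hostname and port are my choices for a single-node setup; any free port works):

```xml
<!-- conf/core-site.xml: URI of the default filesystem (HDFS NameNode) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9001</value>
</property>
```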
Third file: hdfs-site.xml
Delete it first, then copy hdfs-default.xml from src/hdfs into conf and rename it hdfs-site.xml; then modify the dfs.replication property, as shown below.
Since this is a pseudo-distributed deployment, set it to 1.
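The property then reads roughly as follows:

```xml
<!-- conf/hdfs-site.xml: one replica is enough on a single node -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```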
Fourth file: mapred-site.xml
Delete it first, then copy mapred-default.xml from src/mapred into conf and rename it mapred-site.xml; then modify the mapred.job.tracker property.
(Again, make sure the chosen port is not in use.)
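For example (the host:port value here is my own choice; pick any free port for the JobTracker):

```xml
<!-- conf/mapred-site.xml: address the JobTracker listens on -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9002</value>
</property>
```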
V. Start Hadoop
1. Under HADOOP_HOME, create a logs directory.
2. Format the NameNode to create the HDFS: bin/hadoop namenode -format
3. Start everything: bin/start-all.sh
4. Run jps to check the status.
However, it is the ps command that, at this point, shows the five JVM processes.
To verify that HDFS is working, try uploading some files.
Run:
bin/hadoop fs -mkdir in
bin/hadoop fs -put *.txt in
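To confirm the upload succeeded, the directory can be listed back (a quick check; in/sample.txt is a hypothetical name standing in for one of the uploaded .txt files):

```shell
# List the files just uploaded to HDFS
bin/hadoop fs -ls in
# Print one back out to verify its contents (filename is hypothetical)
bin/hadoop fs -cat in/sample.txt
```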
VI. Configure the Eclipse plugin
Download eclipse-java-indigo-win32-x86_64.zip (version 3.7). The plugin is hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar, available at https://issues.apache.org/jira/secure/attachment/12460491/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar. Make absolutely sure Eclipse is exactly 3.7, not 3.7.1 or 3.7.2; this detail alone cost me three or four hours.
Rename the jar to "hadoop-0.20.2-eclipse-plugin.jar", copy it into Eclipse's plugins directory, and start Eclipse.
Choose "Window", then "Preferences".
Select "Hadoop Map/Reduce", fill in the Hadoop installation directory, and click "OK".
After making the selections shown above, the following appears:
Select "Map/Reduce Locations" and click "OK"; the following appears:
Right-click, choose "create map/reduce location", and fill in the configuration.
Click "Finish".
Then create a project; the HDFS tree structure is visible in the project window.
At this point Hadoop is deployed on Windows 7 and the Eclipse plugin is configured; Eclipse can now be used for Hadoop development.
From the ITPUB blog: http://blog.itpub.net/26812308/viewspace-748143/. Please credit the source when reposting; unauthorized reproduction may be pursued legally.