Deploying Hadoop 0.20.2 on Windows 7, with Eclipse 3.7 plugin configuration
After consulting plenty of material online, this still took me the better part of a day. I had assumed deployment on Windows would be straightforward; it turned out to be neither simple nor painless.
My deployment steps are written up below.
I. Prepare the required resources
1. Cygwin, a tool that emulates a Linux environment on Windows; see http://cygwin.com
2. JDK 1.6.0_20 for win64
3. Hadoop 0.20.2
4. eclipse-java-indigo-win32-x86_64.zip
II. Install the JDK
Define the JAVA_HOME and CLASSPATH environment variables.
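On Windows these are typically set via System Properties → Environment Variables. As a reference, the values I used looked like this (the JDK path is my install location, so adjust it to match yours):

```
JAVA_HOME = C:\Program Files\Java\jdk1.6.0_20
CLASSPATH = .;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
PATH      = %PATH%;%JAVA_HOME%\bin
```

Appending %JAVA_HOME%\bin to PATH lets java and javac be found from any command prompt, which the later Hadoop steps rely on.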
III. Install Cygwin
The following packages need to be installed:
Net category: openssh, openssl
Base category: sed (needed later when configuring the Eclipse plugin)
Editors category: vim (optional)
Devel category: subversion (optional)
After installation, a shortcut icon appears on the desktop.
Double-click the icon and run ssh-host-config in the terminal window. The configuration session looks like this:
Mike@Mike-PC ~
$ ssh-host-config
*** Query: Overwrite existing /etc/ssh_config file? (yes/no) yes
*** Info: Creating default /etc/ssh_config file
*** Query: Overwrite existing /etc/sshd_config file? (yes/no) yes
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges. Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] ntsec
*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires. You need to have or to create a privileged
*** Info: account. This script will help you do so.
*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later. On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).
*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.
*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.
*** Info: No privileged account could be found.
*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.
*** Query: Do you want to use a different name? (yes/no) yes
*** Query: Enter the new user name: Mike
*** Query: Reenter: Mike
*** Warning: Privileged account 'Mike' was specified,
*** Warning: but it does not have the necessary privileges.
*** Warning: Continuing, but will probably use a different account.
*** Warning: The specified account 'Mike' does not have the
*** Warning: required permissions or group memberships. This may
*** Warning: cause problems if not corrected; continuing...
*** Query: Please enter the password for user 'Mike':
*** Query: Reenter:
*** Info: The sshd service has been installed under the 'Mike'
*** Info: account. To start the service now, call `net start sshd' or
*** Info: `cygrunsrv -S sshd'. Otherwise, it will start automatically
*** Info: after the next reboot.
*** Info: Host configuration finished. Have fun!
When "Have fun!" appears, the ssh service has been created successfully.
Right-click "Computer", choose "Manage", then "Services", and start the CYGWIN sshd service, as shown in the figure below:
Once the service is running, execute the following in Cygwin:
1. ssh-keygen
2. cd .ssh
3. cp id_rsa.pub authorized_keys
4. Exit Cygwin
5. Run ssh localhost
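The steps above can be run as one short script (a sketch; it assumes an RSA key with an empty passphrase is acceptable for this local single-node setup):

```shell
# Generate an RSA key pair non-interactively, with an empty passphrase
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the public key for passwordless login to localhost
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# Verify: this should log in without prompting for a password
ssh localhost
```

If ssh localhost still asks for a password, check the permissions on ~/.ssh and authorized_keys before going further, since start-all.sh later depends on passwordless ssh.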
IV. Install Hadoop
Unpack the archive, define HADOOP_HOME, and move the unpacked directory under HADOOP_HOME.
Now configure Hadoop. The files to edit (all under hadoop/conf):
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
First file: hadoop-env.sh
Change the JAVA_HOME setting inside it, making sure to remove the # in front of the export so it takes effect.
The path must also be written Linux-style; under Cygwin my JDK path becomes "/cygdrive/c/Program Files/java/jdk1.6.0_20".
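The edited line in hadoop-env.sh then looks roughly like this (the JDK path matches my install location; adjust it to yours):

```shell
# conf/hadoop-env.sh -- uncomment and point at the JDK, using a Cygwin-style path
export JAVA_HOME="/cygdrive/c/Program Files/java/jdk1.6.0_20"
```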
Second file: core-site.xml
Delete it first, then copy core-default.xml from hadoop/src/core into conf and rename it core-site.xml. Then modify the fs.default.name property, as shown below.
(Make sure port 9001 is not already in use.)
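The modified property ends up looking roughly like this (hostname and port are my choices for a single-node setup; any free port works):

```xml
<!-- conf/core-site.xml: URI of the default filesystem (HDFS NameNode) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9001</value>
</property>
```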
Third file: hdfs-site.xml
Delete it first, then copy hdfs-default.xml from src/hdfs into conf and rename it hdfs-site.xml; then modify the dfs.replication property, as shown below.
Since this is a pseudo-distributed deployment, set it to 1.
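The property then reads roughly as follows:

```xml
<!-- conf/hdfs-site.xml: one replica is enough on a single node -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```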
Fourth file: mapred-site.xml
Delete it first, then copy mapred-default.xml from src/mapred into conf and rename it mapred-site.xml; then modify the mapred.job.tracker property.
(Again, make sure the chosen port is not in use.)
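For example (the host:port value here is my own choice; pick any free port for the JobTracker):

```xml
<!-- conf/mapred-site.xml: address the JobTracker listens on -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9002</value>
</property>
```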
V. Start Hadoop
1. Under HADOOP_HOME, create a logs directory.
2. Format the NameNode to create the HDFS: bin/hadoop namenode -format
3. Start everything: bin/start-all.sh
4. Run jps to check the status.
However, it is the ps command that, at this point, shows the five JVM processes.
To verify that HDFS is working, try uploading some files.
Run:
bin/hadoop fs -mkdir in
bin/hadoop fs -put *.txt in
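To confirm the upload succeeded, the directory can be listed back (a quick check; in/sample.txt is a hypothetical name standing in for one of the uploaded .txt files):

```shell
# List the files just uploaded to HDFS
bin/hadoop fs -ls in
# Print one back out to verify its contents (filename is hypothetical)
bin/hadoop fs -cat in/sample.txt
```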
VI. Configure the Eclipse plugin
Download eclipse-java-indigo-win32-x86_64.zip (version 3.7). The plugin is hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar, available at https://issues.apache.org/jira/secure/attachment/12460491/hadoop-eclipse-plugin-0.20.3-SNAPSHOT.jar. Make absolutely sure Eclipse is exactly 3.7, not 3.7.1 or 3.7.2; this detail alone cost me three or four hours.
Rename the jar to "hadoop-0.20.2-eclipse-plugin.jar", copy it into Eclipse's plugins directory, and start Eclipse.
Choose "Window", then "Preferences".
Select "Hadoop Map/Reduce", fill in the Hadoop installation directory, and click "OK".
After making the selections shown above, the following appears:
Select "Map/Reduce Locations" and click "OK"; the following appears:
Right-click, choose "create map/reduce location", and fill in the configuration.
Click "Finish".
Then create a project; the HDFS tree structure is visible in the project window.
At this point Hadoop is deployed on Windows 7 and the Eclipse plugin is configured; Eclipse can now be used for Hadoop development.
From the ITPUB blog: http://blog.itpub.net/26812308/viewspace-748143/. Please credit the source when reposting; unauthorized reproduction may be pursued legally.