The environment is simulated with Cygwin. For the SSH setup, see:
Setting up an SSH service on Cygwin
Hadoop configuration:
hadoop-env.sh
Add: export JAVA_HOME="/cygdrive/d/java/jdk1.6.0_10"
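The JAVA_HOME value above is the Cygwin form of a Windows path. On a real Cygwin installation, `cygpath -u` performs this conversion. The sketch below only illustrates the mapping with plain shell tools; `to_cygdrive` is a hypothetical helper, not part of Hadoop or Cygwin, and it assumes an absolute path that starts with a drive letter and a colon.

```shell
#!/bin/sh
# Hypothetical helper: map a Windows path such as D:\java\jdk1.6.0_10
# to the /cygdrive form used in hadoop-env.sh.
# Assumes the input starts with a drive letter followed by a colon.
to_cygdrive() {
  # First character is the drive letter; lower-case it.
  drive=$(printf '%s' "$1" | cut -c1 | tr 'A-Z' 'a-z')
  # Everything after "D:" with backslashes turned into forward slashes.
  rest=$(printf '%s' "$1" | cut -c3- | tr '\\' '/')
  printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

to_cygdrive 'D:\java\jdk1.6.0_10'
```

Running it prints /cygdrive/d/java/jdk1.6.0_10, which matches the value exported above.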
core-site.xml
In this file, hadoop.tmp.dir must be a Linux-style path: /temp/hadooptmp (after the format step, though, the data actually ends up under d:\temp\hadooptmp).
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/temp/hadooptmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://165.163.93.125:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>165.163.93.125:9001</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<!-- dfs.permissions is an HDFS (NameNode) setting and is normally set in hdfs-site.xml. -->
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
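Notes:
1. Put the hadoop-1.0.2 directory in the user's home directory under the Cygwin installation directory. For example, if the user name is test, that is D:\java\cygwin\home\test.
2. To connect through PuTTY, first run the following commands in /home/test/.ssh: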
#ssh-keygen -t rsa -P '' -f id_rsa
#cat id_rsa.pub >> authorized_keys
#ssh localhost
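Then open the id_rsa file from the .ssh directory with the Load function of PUTTYGEN.EXE and save it as a key with the .ppk extension. PuTTY can then log in without a password.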
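3. When running a map/reduce program in Eclipse, the program arguments need to be configured. The Cygwin installation creates cyg_server as the default user,
so the program arguments cannot simply be "input output"; otherwise the job looks for the
/user/cyg_server/input directory by default. Write the full paths instead: /user/cyg_server/input /user/cyg_server/output.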
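As background for note 3: HDFS resolves a relative path against the user's home directory, /user/<username>, while an absolute path is used unchanged. The sketch below illustrates only that rule. `resolve_hdfs_path` is a hypothetical helper written for this example, not a Hadoop command, and it does not contact a cluster.

```shell
#!/bin/sh
# Hypothetical helper: show how a job argument maps to an HDFS path.
# $1 = user name, $2 = path given as a program argument.
resolve_hdfs_path() {
  case "$2" in
    # Absolute paths are used as written.
    /*) printf '%s\n' "$2" ;;
    # Relative paths are placed under the user's HDFS home directory.
    *)  printf '/user/%s/%s\n' "$1" "$2" ;;
  esac
}

resolve_hdfs_path cyg_server input
resolve_hdfs_path cyg_server /user/cyg_server/output
```

The first call prints /user/cyg_server/input and the second prints /user/cyg_server/output. Passing full paths as program arguments therefore removes any dependence on which user the job runs as.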