Pseudo-distributed mode runs all of the Hadoop daemons on a single machine, simulating a multi-node cluster.
1. Set up passwordless SSH login. In a distributed deployment, the NameNode uses SSH to start and stop the daemon processes on the DataNode machines; in pseudo-distributed mode it must be able to ssh to localhost without a password:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Note that the sshd service must be running.
2. Edit the configuration file for pseudo-distributed operation. Add the following to conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
fs.default.name is the URI (host and port) of the NameNode; mapred.job.tracker is the host and port of the JobTracker; dfs.replication is the number of replicas kept for each HDFS block (1 here, since a pseudo-distributed cluster has only one DataNode).
Older tutorials will tell you to edit hadoop-site.xml; since the 0.20 release that single file has been split into core-site.xml, hdfs-site.xml, and mapred-site.xml, but a hadoop-site.xml configuration is still honored for backward compatibility.
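Each <property> block is just a name/value pair. A quick sketch of how such a file can be read with Python's standard library (the XML string is inlined here for illustration rather than loaded from a real conf/core-site.xml):

```python
import xml.etree.ElementTree as ET

# Inlined stand-in for conf/core-site.xml
conf_xml = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>"""

root = ET.fromstring(conf_xml)
# Collect every <property> into a plain dict of name -> value
props = {p.findtext("name"): p.findtext("value") for p in root.findall("property")}
print(props["fs.default.name"])  # hdfs://localhost:9000
```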
3. Format the HDFS filesystem: bin/hadoop namenode -format. By default this creates the filesystem metadata directory under /tmp (here /tmp/hadoop-songjing/dfs/name, as the log below shows):
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = songjings-macpro31.local/10.13.42.56
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/08/08 19:10:36 INFO namenode.FSNamesystem: fsOwner=songjing,staff,_lpadmin,com.apple.sharepoint.group.1,_appserveradm,_appserverusr,admin
11/08/08 19:10:36 INFO namenode.FSNamesystem: supergroup=supergroup
11/08/08 19:10:36 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/08/08 19:10:36 INFO common.Storage: Image file of size 98 saved in 0 seconds.
11/08/08 19:10:36 INFO common.Storage: Storage directory /tmp/hadoop-songjing/dfs/name has been successfully formatted.
11/08/08 19:10:36 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at songjings-macpro31.local/10.13.42.56
************************************************************/
4. Start the Hadoop daemons:
bin/start-all.sh
This launches five Java processes: NameNode, DataNode, SecondaryNameNode (which periodically checkpoints the NameNode's metadata), JobTracker, and TaskTracker. The script logs where each daemon writes its output:
starting namenode, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-namenode-songjings-macpro31.local.out
localhost: starting datanode, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-datanode-songjings-macpro31.local.out
localhost: starting secondarynamenode, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-secondarynamenode-songjings-macpro31.local.out
starting jobtracker, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-jobtracker-songjings-macpro31.local.out
localhost: starting tasktracker, logging to /Users/songjing/projects/hadoop-0.20.2/bin/../logs/hadoop-songjing-tasktracker-songjings-macpro31.local.out
Once everything is up, the NameNode web UI is at http://localhost:50070/ and the JobTracker UI at http://localhost:50030/.
5. Run an example:
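The ./test-in directory used in the next step is just a local directory of small text files. A sketch that creates comparable sample input (the file names and contents are hypothetical, chosen to match the two input paths in the job log, not the author's actual data):

```python
import os

# Two small text files, matching "Total input paths to process : 2"
os.makedirs("test-in", exist_ok=True)
samples = {
    "file01": "hello world by world\n",
    "file02": "hello hadoop goodbye hadoop\n",
}
for name, text in samples.items():
    with open(os.path.join("test-in", name), "w") as f:
        f.write(text)

print(sorted(os.listdir("test-in")))  # ['file01', 'file02']
```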
First copy the local input files into the DFS:
bin/hadoop dfs -put ./test-in input
Then run the computation:
bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
11/08/08 19:28:35 INFO input.FileInputFormat: Total input paths to process : 2
11/08/08 19:28:36 INFO mapred.JobClient: Running job: job_201108081923_0001
11/08/08 19:28:37 INFO mapred.JobClient: map 0% reduce 0%
11/08/08 19:28:55 INFO mapred.JobClient: map 100% reduce 0%
11/08/08 19:29:09 INFO mapred.JobClient: map 100% reduce 100%
11/08/08 19:29:14 INFO mapred.JobClient: Job complete: job_201108081923_0001
11/08/08 19:29:14 INFO mapred.JobClient: Counters: 17
11/08/08 19:29:14 INFO mapred.JobClient: Job Counters
11/08/08 19:29:14 INFO mapred.JobClient: Launched reduce tasks=1
11/08/08 19:29:14 INFO mapred.JobClient: Launched map tasks=2
11/08/08 19:29:14 INFO mapred.JobClient: Data-local map tasks=2
11/08/08 19:29:14 INFO mapred.JobClient: FileSystemCounters
11/08/08 19:29:14 INFO mapred.JobClient: FILE_BYTES_READ=78
11/08/08 19:29:14 INFO mapred.JobClient: HDFS_BYTES_READ=49
11/08/08 19:29:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=226
11/08/08 19:29:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=40
11/08/08 19:29:14 INFO mapred.JobClient: Map-Reduce Framework
11/08/08 19:29:14 INFO mapred.JobClient: Reduce input groups=5
11/08/08 19:29:14 INFO mapred.JobClient: Combine output records=6
11/08/08 19:29:14 INFO mapred.JobClient: Map input records=2
11/08/08 19:29:14 INFO mapred.JobClient: Reduce shuffle bytes=84
11/08/08 19:29:14 INFO mapred.JobClient: Reduce output records=5
11/08/08 19:29:14 INFO mapred.JobClient: Spilled Records=12
11/08/08 19:29:14 INFO mapred.JobClient: Map output bytes=81
11/08/08 19:29:14 INFO mapred.JobClient: Combine input records=8
11/08/08 19:29:14 INFO mapred.JobClient: Map output records=8
11/08/08 19:29:14 INFO mapred.JobClient: Reduce input records=6
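The counters read consistently as a map → combine → reduce pipeline: 8 map output records are pre-summed by the combiner into 6 records (3 per map task), which the reducer merges into 5 output records. A minimal pure-Python sketch of that flow, using hypothetical input that is merely consistent with these counters (not the author's actual files):

```python
from collections import Counter
from itertools import chain

# Hypothetical records, one per input file (2 records, 8 words in total)
records = ["hello world by world", "hello hadoop goodbye hadoop"]

def map_phase(record):
    # map: emit a (word, 1) pair for every word in the record
    return [(word, 1) for word in record.split()]

def combine(pairs):
    # combiner: pre-sum the counts produced by a single map task
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

map_outputs = [map_phase(r) for r in records]  # Map output records = 8
combined = [combine(m) for m in map_outputs]   # Combine output records = 6

# reduce: merge the combiner outputs from all map tasks by key
result = Counter()
for word, n in chain.from_iterable(combined):
    result[word] += n                          # Reduce output records = 5

for word, n in sorted(result.items()):
    print(word, n)
```

Its printed output matches the wordcount result shown below.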
View the results:
bin/hadoop dfs -cat output/*
by 1
goodbye 1
hadoop 2
hello 2
world 2
You can also copy the output back to the local filesystem and view it there:
$ bin/hadoop dfs -get output output
$ cat output/*