Many steps in my earlier Hadoop setup notes were too sketchy, and I never fully worked out the tuning options in the config files, so here is a rewrite.
Update 2012.06.22: now updated for Hadoop 1.0.3.
0. Passwordless SSH login

ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
echo "StrictHostKeyChecking no" >> ~/.ssh/config
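One gotcha worth checking right away: with StrictModes on (the OpenSSH default), sshd silently ignores authorized_keys when the directory or file is group/world-writable. A quick sanity check, assuming a standard OpenSSH layout (these paths are defaults, not from the original steps):

```shell
# sshd (StrictModes) ignores authorized_keys if permissions are too open,
# so tighten them before testing the login
mkdir -p ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# this should print "ok" without asking for a password
ssh -o BatchMode=yes -o StrictHostKeyChecking=no localhost 'echo ok' \
  || echo "key auth not working yet"
```

Repeat the login test from the master to every slave before moving on; start-all.sh relies on it.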
1. Install JDK 7

# download && extract && install
tar -xzf jdk-7u2-linux-i586.tar.gz
mv ./jdk1.7.0_02 ~/jdk

# set the JAVA_HOME environment variable
vim ~/.bashrc
export JAVA_HOME=/home/hadoop/jdk/
export JAVA_BIN=/home/hadoop/jdk/bin
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
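After editing ~/.bashrc it is worth confirming that the new JDK actually wins on the PATH (a system Java may already be installed ahead of it). A small check, repeating the exports from above:

```shell
# the exports from ~/.bashrc, repeated here; confirm the JDK bin
# directory now leads the PATH (paths as installed above)
export JAVA_HOME=/home/hadoop/jdk/
export PATH=$JAVA_HOME/bin:$PATH
echo "$PATH" | grep -q '^/home/hadoop/jdk//*bin:' && echo "PATH OK"
```

On the real machine, `java -version` should then report 1.7.0_02 rather than any system JDK.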
2. Install Hadoop (1.0.3)

# extract and install Hadoop
tar -xzvf hadoop-1.0.3-bin.tar.gz
mv ./hadoop-1.0.3 ~/hadoop_home

# create the runtime directories
cd ~/hadoop_home
mkdir var
cd var
mkdir tmp mapred hdfs
cd hdfs
mkdir name data

# export JAVA_HOME for Hadoop
cd ~/hadoop_home/conf/
vim ./hadoop-env.sh
export JAVA_HOME=/home/hadoop/jdk/

Update: mind the permissions. In newer releases, every HDFS directory must be 755; 775 will not do.

chmod 755 data name
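The directory dance above can be collapsed into a single idempotent command; a sketch with mkdir -p, producing the same layout:

```shell
# create the whole runtime tree in one shot (same layout as above)
mkdir -p ~/hadoop_home/var/tmp \
         ~/hadoop_home/var/mapred \
         ~/hadoop_home/var/hdfs/name \
         ~/hadoop_home/var/hdfs/data

# HDFS dirs must be 755: the datanode checks dfs.data.dir permissions
# and refuses group-writable directories
chmod 755 ~/hadoop_home/var/hdfs/name ~/hadoop_home/var/hdfs/data
```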
3. Prepare environment variables

Mainly HADOOP_HOME. Since 1.0 the second variable is also needed; it silences the "Warning: $HADOOP_HOME is deprecated" message.

export HADOOP_HOME=/home/hadoop/hadoop_home/
export HADOOP_HOME_WARN_SUPPRESS=1
4. Configure hosts (Linux and Hadoop)

# edit the hosts file on every node
sudo vim /etc/hosts
#Hosts for hadoop
10.70.0.101 hadoop1
10.70.0.102 hadoop2
......

# configure masters and slaves
cd ~/hadoop_home/conf
vim masters
hadoop1
vim slaves
hadoop1
hadoop2
......
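A stale or wrong /etc/hosts entry is one of the most common reasons a node fails to join the cluster. A quick resolution check, using the hostnames from the example above:

```shell
# every node should resolve every other node's name to the right IP
for h in hadoop1 hadoop2; do
  getent hosts "$h" || echo "$h does not resolve"
done
```

Run this on each node; the printed IPs must match the addresses in /etc/hosts, and none may fall back to 127.0.0.1 for a remote node.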
5. Configuration files

Detailed parameter reference: http://hadoop.apache.org/common/docs/current/cluster_setup.html
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop1:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop_home/var/tmp</value>
  </property>
  <!-- The following use more memory but give a good speedup -->
  <property>
    <name>fs.inmemory.size.mb</name>
    <value>200</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
</configuration>
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/hadoop_home/var/hdfs/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/hadoop_home/var/hdfs/name</value>
  </property>
  <!-- This is 128 MB -->
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
  <!-- Parallel RPC handler threads for the namenode -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>
</configuration>
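dfs.block.size is given in bytes, so the magic number 134217728 above really is 128 MB. The arithmetic, should you want a different block size:

```shell
# dfs.block.size is in bytes: 128 MB = 128 * 1024 * 1024
echo $((128 * 1024 * 1024))   # prints 134217728
```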
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop1:54311</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/hadoop_home/var/mapred</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>12</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>
  <!-- The following use more memory but give a good speedup -->
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xmx512M</value>
  </property>
  <property>
    <name>mapred.reduce.child.java.opts</name>
    <value>-Xmx512M</value>
  </property>
</configuration>
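The slot counts and heap sizes above imply a worst-case memory budget per node that is worth checking against physical RAM before copying these values. With 12 map slots and 6 reduce slots, each child JVM capped at -Xmx512M:

```shell
# worst case: every slot busy, every child JVM at its 512 MB ceiling
echo $(( (12 + 6) * 512 ))   # prints 9216 (MB), i.e. 9 GB
```

On top of that come the DataNode and TaskTracker daemons themselves, so a node running this configuration comfortably needs well over 9 GB of RAM; scale the slot counts down on smaller machines.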
6. Format the namenode

cd ~/hadoop_home/bin
./hadoop namenode -format
7. Start Hadoop

cd ~/hadoop_home/bin
./start-all.sh

# check that everything came up
jps
7532 SecondaryNameNode
7346 NameNode
7433 DataNode
7605 JobTracker
7759 Jps
7701 TaskTracker
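On a single-node master like this one, jps should show all five daemons. A small sketch that counts them instead of eyeballing the list (the grep pattern is mine, not from the original post):

```shell
# count the Hadoop daemons in the jps output; expect 5 on the master:
# NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
if command -v jps >/dev/null; then
  jps | grep -cE 'NameNode|DataNode|SecondaryNameNode|JobTracker|TaskTracker'
fi
```

A count below 5 usually means one daemon died on startup; check the corresponding log under ~/hadoop_home/logs/.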
Web UI: http://localhost:50030 (JobTracker / cluster)
Web UI: http://hadoop1:50070 (NameNode / HDFS)
8. Other notes:

mapred.tasktracker.map.tasks.maximum: the maximum number of map tasks run concurrently on each node
mapred.tasktracker.reduce.tasks.maximum: the maximum number of reduce tasks run concurrently on each node