This article is reposted from: http://www.iteblog.com/archives/817
After many days of tinkering I finally got a Hadoop 2.2.0 distributed cluster configured across several machines. Here is a summary of how to do it.
Prerequisites:
(1) Install JDK 6 or later on every Linux machine and set JAVA_HOME and the related variables; verify that the java, javac, and jps commands work in a terminal. How to configure the JDK is not covered here.
(2) Install SSH on every Linux machine; see "Installing SSH on Linux". Passwordless SSH login is configured later in this article.
With those prerequisites in place, we can install the Hadoop distributed platform. The steps are as follows:
1. Give each machine a static IP address
Static IP configuration differs between Linux distributions, so the steps below cover CentOS, Ubuntu, and Fedora 19.
(1) CentOS static IP configuration:

[wyp@wyp hadoop]$ sudo vim /etc/sysconfig/network-scripts/ifcfg-eth0
Set IPADDR in this file to the address you want; mine is 192.168.142.139.
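For reference, a minimal static-IP ifcfg-eth0 looks roughly like the following. Only IPADDR comes from this article; NETMASK, GATEWAY, and DNS1 are assumptions about a typical 192.168.142.0/24 subnet, so adjust them to your network:

```
DEVICE=eth0
ONBOOT=yes                # bring the interface up at boot
BOOTPROTO=static          # static address instead of DHCP
IPADDR=192.168.142.139
NETMASK=255.255.255.0     # assumed
GATEWAY=192.168.142.2     # assumed
DNS1=192.168.142.2        # assumed
```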
After saving, restart the network service so the new address takes effect:
[wyp@wyp hadoop]$ sudo service network restart
Shutting down interface eth0:  Device state: 3 (disconnected)
Shutting down loopback interface:                          [  OK  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:  Active connection state: activated
Active connection path: /org/freedesktop/NetworkManager/ActiveConnection/7
Then run ifconfig to verify that the setting took effect:
[wyp@wyp hadoop]$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:9F:FB:C0
          inet addr:192.168.142.139  Bcast:192.168.142.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe9f:fbc0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:389330 errors:0 dropped:0 overruns:0 frame:0
          TX packets:171679 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:473612019 (451.6 MiB)  TX bytes:30110196 (28.7 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:80221 errors:0 dropped:0 overruns:0 frame:0
          TX packets:80221 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1051174395 (1002.4 MiB)  TX bytes:1051174395 (1002.4 MiB)
The IP address is now set to 192.168.142.139!
(2) Ubuntu static IP configuration:

wyp@node1:~$ sudo vim /etc/network/interfaces

In this file, set the address line to the static IP you want; mine is:

address 192.168.142.140
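A minimal static stanza in /etc/network/interfaces would look roughly like this. Only the address line comes from this article; the netmask and gateway are assumptions:

```
auto eth0
iface eth0 inet static
    address 192.168.142.140
    netmask 255.255.255.0   # assumed
    gateway 192.168.142.2   # assumed
```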
Again, restart networking so the new address takes effect:

wyp@node1:~$ sudo /etc/init.d/networking restart

As before, run ifconfig to verify the setting; I won't repeat that here.
(3) Fedora 19 static IP configuration (other Fedora versions differ from 19, so they are not covered here):

[wyp@wyp network-scripts]$ sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33

In this file, set IPADDR0 to the address you want; mine is:

IPADDR0=192.168.142.138
After saving, restart the network service so the address takes effect:

[wyp@wyp network-scripts]$ sudo service network restart
Restarting network (via systemctl):                        [  OK  ]

As before, run ifconfig to verify the setting; I won't repeat that here.
2. Set each host's hostname
In step 1 I configured three machines (CentOS, Ubuntu, and Fedora) that will make up the cluster; the Fedora host will serve as the master and the other two as slaves. This step shows how to change the hostname on each of the three machines.
(1) Setting the hostname on Fedora 19:

[wyp@wyp network-scripts]$ sudo hostnamectl set-hostname master
[wyp@wyp network-scripts]$ hostname
(2) Setting the hostname on Ubuntu:

wyp@node1:~$ sudo vim /etc/hostname

Put the hostname you want into this file; I used node1.
(3) Setting the hostname on CentOS:

[wyp@node network-scripts]$ sudo vim /etc/sysconfig/network

Set HOSTNAME in this file to the hostname you want; I used node. Verify it with:

[wyp@node network-scripts]$ hostname
3. Add the following entries to /etc/hosts on all three machines:

[wyp@master ~]$ sudo vim /etc/hosts
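Combining the static IPs from step 1 with the hostnames from step 2, the entries to add are:

```
192.168.142.138   master
192.168.142.139   node
192.168.142.140   node1
```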
The entries are simply the mapping between the three machines' static IP addresses and their hostnames. To verify that the change works, use ping:
[wyp@master ~]$ ping node
PING node (192.168.142.139) 56(84) bytes of data.
64 bytes from node (192.168.142.139): icmp_seq=1 ttl=64 time=0.541 ms
64 bytes from node (192.168.142.139): icmp_seq=2 ttl=64 time=0.220 ms

--- node ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.220/0.380/0.541/0.161 ms
If the ping succeeds, the mapping is working.
4. Configure passwordless SSH login
This blog has already covered how to install SSH ("Installing SSH on Linux") and how to set up passwordless SSH login ("Configuring passwordless SSH login on Ubuntu and CentOS"), so here I only want to point out a few things to watch for. After setting up passwordless SSH on the master, copy the generated id_dsa.pub file to node and node1 with the following command:
[wyp@localhost ~]$ cat /home/wyp/.ssh/id_dsa.pub | \
ssh wyp@192.168.142.139 'cat - >> ~/.ssh/authorized_keys'
Make sure the SSH service on 192.168.142.139 is running. The wyp in wyp@192.168.142.139 is the user name you use to log in to that host. Use a similar command to copy id_dsa.pub to 192.168.142.140.
Alternatively, you can copy the file with scp (note that unlike the cat approach above, which appends, this overwrites any authorized_keys already on the target):
[wyp@master Documents]$ scp /home/wyp/.ssh/id_dsa.pub \
wyp@192.168.142.139:~/.ssh/authorized_keys
To verify passwordless login from master to node and node1, run:
[wyp@master Documents]$ ssh node
The authenticity of host 'node (192.168.142.139)' can't be established.
RSA key fingerprint is ae:99:43:f0:cf:c6:a9:82:6c:93:a1:65:54:70:a6:97.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node,192.168.142.139' (RSA)
to the list of known hosts.
Last login: Wed Nov  6 14:54:55 2013 from master
You will see this message the first time you run the command. The [wyp@node ~] prompt already tells us we have logged in to node from master without a password. If you are instead prompted for a password, passwordless login is not working; this is usually a file-permission problem, and the fix is described in "Configuring passwordless SSH login on Ubuntu and CentOS".
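As a quick reference for that permission problem: sshd refuses key-based logins unless the target user's home directory, ~/.ssh, and authorized_keys have strict enough permissions. A minimal fix to run on node and node1 (standard sshd behavior, not something specific to this article) is:

```shell
# Tighten the permissions sshd insists on before accepting key-based logins.
chmod go-w "$HOME"                      # home must not be writable by group/others
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"                  # only the owner may open .ssh
touch "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"  # only the owner may read/write the key list
```

After this, retry ssh node from the master.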
5. Download Hadoop; I used hadoop-2.2.0.tar.gz, which you can fetch with the commands below.
All of the following operations are performed on the master machine.
[wyp@wyp /home]$ mkdir /home/wyp/Downloads/hadoop
[wyp@wyp /home]$ cd /home/wyp/Downloads/hadoop
[wyp@wyp hadoop]$ wget \
After the commands finish, hadoop-2.2.0.tar.gz will be saved in /home/wyp/Downloads/hadoop. Unpack it:

[wyp@wyp hadoop]$ tar -zxvf hadoop-2.2.0.tar.gz
This creates a hadoop-2.2.0 directory inside the hadoop folder. Run the following commands:
[wyp@wyp hadoop]$ cd hadoop-2.2.0
[wyp@wyp hadoop-2.2.0]$ ls -l
drwxr-xr-x. 2 wyp wyp  4096 Oct  7 14:38 bin
drwxr-xr-x. 3 wyp wyp  4096 Oct  7 14:38 etc
drwxr-xr-x. 2 wyp wyp  4096 Oct  7 14:38 include
drwxr-xr-x. 3 wyp wyp  4096 Oct  7 14:38 lib
drwxr-xr-x. 2 wyp wyp  4096 Oct  7 14:38 libexec
-rw-r--r--. 1 wyp wyp 15164 Oct  7 14:46 LICENSE.txt
drwxrwxr-x. 3 wyp wyp  4096 Oct 28 14:38 logs
-rw-r--r--. 1 wyp wyp   101 Oct  7 14:46 NOTICE.txt
-rw-r--r--. 1 wyp wyp  1366 Oct  7 14:46 README.txt
drwxr-xr-x. 2 wyp wyp  4096 Oct 28 12:37 sbin
drwxr-xr-x. 4 wyp wyp  4096 Oct  7 14:38 share
These are the contents of the directory we just unpacked.
6. Configure Hadoop's environment variables
[wyp@wyp hadoop]$ sudo vim /etc/profile

Append the following at the end of /etc/profile:

export HADOOP_DEV_HOME=/home/wyp/Downloads/hadoop/hadoop-2.2.0
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop

Save with :wq. To make the settings take effect, run (source is a shell builtin, so no sudo is needed):

[wyp@wyp hadoop]$ source /etc/profile
Type hadoop in a terminal to check whether the environment variables are working:
[wyp@wyp hadoop]$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.
If you see this output, the environment variables are working. If not, open a new login shell (or reboot) so /etc/profile is re-read, and try again.
7. Edit Hadoop's configuration files
First set the JDK path in hadoop-env.sh:

[wyp@wyp hadoop]$ vim etc/hadoop/hadoop-env.sh

Find JAVA_HOME in this file and set it to the absolute path of the JDK on your machine:

# The java implementation to use.
export JAVA_HOME=/home/wyp/Downloads/jdk1.7.0_45
Next edit core-site.xml, yarn-site.xml, mapred-site.xml, and hdfs-site.xml in turn:
----------------core-site.xml

<property>
  <name>fs.default.name</name>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/wyp/cloud/tmp/hadoop2.0</value>
</property>

------------------------- yarn-site.xml

<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

------------------------ mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>file:/hadoop/mapred/system/</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>file:/opt/cloud/hadoop_space/mapred/local</value>
</property>

----------- hdfs-site.xml

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/cloud/hadoop_space/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/cloud/hadoop_space/dfs/data</value>
  <description>Determines where on the local
  filesystem an DFS data node should store its blocks.
  If this is a comma-delimited list of directories,
  then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>
<property>
  <name>dfs.replication</name>
</property>
<property>
  <name>dfs.permissions</name>
</property>
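A few values were lost from the excerpt above: fs.default.name, mapreduce.framework.name, dfs.replication, and dfs.permissions appear without a <value>. As a hedged reconstruction, assuming the NameNode listens on the common default port 8020 and that blocks should be replicated to both slave datanodes, typical values for a cluster like this one would be:

```xml
<!-- core-site.xml: the NameNode URI; master:8020 is an assumed default, not from the original -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:8020</value>
</property>

<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- hdfs-site.xml: replicate each block to the two datanodes; relax HDFS permission checks -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
```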
Once Hadoop is configured, copy the entire hadoop-2.2.0 directory to node and node1; none of the settings need to change!
8. Turn off the firewalls on master, node, and node1
If, when starting the nodemanager on node, you hit a java.net.NoRouteToHostException like this:
java.net.NoRouteToHostException: No Route to Host from
localhost.localdomain/192.168.142.139 to 192.168.142.138:8031
failed on socket timeout exception: java.net.NoRouteToHostException:
No route to host; For more details see:
..................(much omitted)
Caused by: java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
..................(much omitted)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
it means the firewall has not been turned off. Each Linux distribution disables its firewall differently, so here is a summary:
(1) On Ubuntu, disable the firewall (for example with sudo ufw disable); if you want to remove the firewall entirely, you can run: apt-get remove iptables
(2) On Fedora, run:

[wyp@wyp hadoop]$ sudo systemctl stop firewalld.service
[wyp@wyp hadoop]$ sudo systemctl disable firewalld.service
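The CentOS slave (node) is not covered above. On CentOS 6 the firewall is managed by iptables rather than firewalld, so (assuming that version) the equivalent commands would be:

```
[wyp@node ~]$ sudo service iptables stop       # turn the firewall off now
[wyp@node ~]$ sudo chkconfig iptables off      # keep it off across reboots
```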
9. Check that Hadoop runs successfully
First format HDFS on the master:
[wyp@wyp hadoop]$ cd $HADOOP_DEV_HOME
[wyp@wyp hadoop-2.2.0]$ hdfs namenode -format
13/10/28 16:47:33 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
..............(much output omitted)......................
************************************************************/
13/10/28 16:47:33 INFO namenode.NameNode: registered UNIX signal
handlers for [TERM, HUP, INT]
Formatting using clusterid: CID-9931f367-92d3-4693-a706-d83e120cacd6
13/10/28 16:47:34 INFO namenode.HostFileManager: read includes:
13/10/28 16:47:34 INFO namenode.HostFileManager: read excludes:
..............(much output omitted)......................
13/10/28 16:47:38 INFO util.ExitUtil: Exiting with status 0
13/10/28 16:47:38 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at wyp/192.168.142.138
************************************************************/
[wyp@wyp hadoop-2.2.0]$
Start the namenode and resourcemanager on the master:

[wyp@wyp hadoop-2.2.0]$ sbin/hadoop-daemon.sh start namenode
[wyp@wyp hadoop-2.2.0]$ sbin/yarn-daemon.sh start resourcemanager
Start the datanode and nodemanager on node and node1:

[wyp@wyp hadoop-2.2.0]$ sbin/hadoop-daemon.sh start datanode
[wyp@wyp hadoop-2.2.0]$ sbin/yarn-daemon.sh start nodemanager
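As an aside: if you also list node and node1 in etc/hadoop/slaves on the master, the per-daemon commands above can be replaced by the cluster start scripts Hadoop ships in sbin. This is a convenience sketch, assuming the passwordless SSH from master to every slave that was set up in step 4:

```
[wyp@master hadoop-2.2.0]$ sbin/start-dfs.sh    # NameNode on master, DataNodes on the slaves
[wyp@master hadoop-2.2.0]$ sbin/start-yarn.sh   # ResourceManager on master, NodeManagers on the slaves
```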
To check whether the cluster is up, run jps on the master; if the NameNode and ResourceManager processes are both listed, the master is set up correctly.

[wyp@master hadoop]$ jps

Run jps on node (and node1); if the DataNode and NodeManager processes are both listed, that node is set up correctly.

[wyp@node network-scripts]$ jps