2.2 Installation and Configuration
The configuration below uses the software versions listed in Table 2-1.
2. Prepare the machines
Create a new CentOS 6.4 virtual machine in VMware. After the operating system is installed, log in as root, add a new user named hadoop, set its password, and grant it sudo privileges.
\[root@localhost ~\]$useradd hadoop
\[root@localhost ~\]$passwd hadoop
\[root@localhost ~\]$chmod u+w /etc/sudoers
\[root@localhost ~\]$vim /etc/sudoers
# Below the line "root ALL=(ALL) ALL", add: hadoop ALL=(ALL) ALL
\[root@localhost ~\]$chmod u-w /etc/sudoers
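To confirm that the new account and its sudo rights work, switch to hadoop and run a harmless privileged command; it should print root:
\[root@localhost ~\]$su - hadoop
\[hadoop@localhost ~\]$sudo whoami
root
\[hadoop@localhost ~\]$exit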
3. Set a static IP
VMware assigns dynamic IP addresses by default, but a Hadoop cluster locates its nodes by hostname through the hostname-to-IP mappings in /etc/hosts. If the IPs kept changing, the configuration files would have to be updated constantly, so we set static IPs here to simplify the later steps.
1) Edit /etc/sysconfig/network-scripts/ifcfg-eth0.
\[root@localhost ~\]$vim /etc/sysconfig/network-scripts/ifcfg-eth0
# Change the contents as follows:
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.222.131
NETMASK=255.255.255.0
GATEWAY=192.168.222.2
# HWADDR=00:0C:29:C3:34:BF # set this according to your own machine
ONBOOT=yes
TYPE=Ethernet
IPV6INIT=no
DNS1=192.168.222.2
2) Edit /etc/sysconfig/network.
\[root@localhost ~\]$vim /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=localhost.localdomain
GATEWAY=192.168.222.2
3) Edit the DNS settings.
\[root@localhost ~\]$vim /etc/resolv.conf
nameserver 192.168.222.2
search bogon
# /etc/resolv.conf takes effect as soon as it is saved; it is not a shell
# script, so there is no need to source it
# Restart the network service
\[root@localhost ~\]$service network restart
4) Stop the firewall and set it not to start at boot.
\[root@localhost ~\]$service iptables stop
# Keep the firewall from starting at boot
\[root@localhost ~\]$chkconfig iptables off
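A couple of quick checks confirm the static address, gateway, and firewall state (the addresses are the ones configured above):
\[root@localhost ~\]$ip addr show eth0
# should report inet 192.168.222.131/24
\[root@localhost ~\]$ping -c 3 192.168.222.2
\[root@localhost ~\]$service iptables status
# should report: iptables: Firewall is not running.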
4. Install the JDK
1) Use yum search jdk to list the available JDK packages online and pick one to install; here we install "java-1.7.0-openjdk-devel.x86_64".
\[root@localhost ~\]$yum search jdk
\[root@localhost ~\]$yum install java-1.7.0-openjdk-devel.x86_64 -y
2) Configure the Java environment variables.
# Locate the JDK path
\[root@localhost ~\]$whereis java
\[root@localhost ~\]$ll /usr/bin/java
\[root@localhost ~\]$ll /etc/alternatives/java # now the actual JDK path is visible
# Edit the configuration file
\[root@localhost ~\]$vim /etc/profile
# Append at the end
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64
export MAVEN_HOME=/home/hadoop/local/opt/apache-maven-3.3.1
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# After saving, use the source command to make the settings take effect immediately
\[root@localhost ~\]$source /etc/profile
3) Run java -version to check that the environment variables are configured correctly.
\[root@localhost ~\]$java -version
java version "1.7.0_75"
OpenJDK Runtime Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
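One more quick check: verify that the JAVA_HOME exported in /etc/profile resolves to a real JDK:
\[root@localhost ~\]$echo $JAVA_HOME
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64
\[root@localhost ~\]$ls $JAVA_HOME/bin/javac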
This completes the JDK installation and configuration. Next, use VMware to clone two more machines and set their static IP addresses to 192.168.222.132 and 192.168.222.133, as shown in Figures 2-3 and 2-4.
After cloning, the new machines may come up with no network device information and be unable to connect to the network. The fix is as follows:
Delete /etc/udev/rules.d/70-persistent-net.rules, edit /etc/sysconfig/network-scripts/ifcfg-eth0 to comment out the hardware-address line, and reboot.
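In command form, the same fix looks roughly like this (run on each clone; the sed pattern assumes the HWADDR entry starts its line):
\[root@localhost ~\]$rm -f /etc/udev/rules.d/70-persistent-net.rules
\[root@localhost ~\]$sed -i 's/^HWADDR/# HWADDR/' /etc/sysconfig/network-scripts/ifcfg-eth0
# also change IPADDR in the same file to 192.168.222.132 / 192.168.222.133
\[root@localhost ~\]$reboot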
5. Configure passwordless SSH login
1) Start all three machines, change their hostnames to master, slave1, and slave2 respectively, and reboot.
\[root@localhost ~\]$vim /etc/sysconfig/network
# Change the contents as follows
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master
2) Edit /etc/hosts on master.
\[hadoop@master ~\]$sudo vim /etc/hosts
# Contents:
192.168.222.131 master
192.168.222.132 slave1
192.168.222.133 slave2
3) Copy the hosts file to slave1 and slave2.
\[hadoop@master ~\]$sudo scp /etc/hosts root@slave1:/etc
\[hadoop@master ~\]$sudo scp /etc/hosts root@slave2:/etc
4) Log in to master as the hadoop user (make sure all of the following steps are performed as hadoop) and run ssh-keygen -t rsa to generate a key pair.
\[hadoop@master ~\]$ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
7b:75:98:eb:fd:13:ce:0f:c4:cf:2c:65:cc:73:70:53 hadoop@master
The key's randomart image is:
+--\[ RSA 2048\]----+
| E|
| .|
| ...|
| +=.|
| S ++.*|
| . . + Bo|
| . . . ==|
| . . . * |
| . ..=|
+-----------------+
5) Copy the public key to slave1 and slave2.
\[hadoop@master ~\]$ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
# enter the password for hadoop@slave1
\[hadoop@master ~\]$ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
# enter the password for hadoop@slave2
6) Log in again; slave1 and slave2 can now be reached without a password.
\[hadoop@master ~\]$ssh slave1
Last login: Wed Mar 25 14:40:41 2015 from master
\[hadoop@slave1 ~\]$
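One addition worth making here: start-dfs.sh also connects to master itself over SSH, so the key should be copied to master as well:
\[hadoop@slave1 ~\]$exit
\[hadoop@master ~\]$ssh-copy-id -i ~/.ssh/id_rsa.pub master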
6. Install Hadoop
1) Download the stable, precompiled binary package from the Hadoop website (here via a mirror) and unpack it.
\[hadoop@master ~\]$wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
# the target directory must exist before unpacking
\[hadoop@master ~\]$mkdir -p ~/local/opt
\[hadoop@master ~\]$tar -zxf hadoop-2.6.0.tar.gz -C ~/local/opt
\[hadoop@master ~\]$cd ~/local/opt/hadoop-2.6.0
2) Set the environment variables:
\[hadoop@master ~\]$vim ~/.bashrc
export HADOOP_PREFIX=$HOME/local/opt/hadoop-2.6.0
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
After saving, run source ~/.bashrc so the variables take effect in the current shell.
3) Edit the configuration file etc/hadoop/hadoop-env.sh and add the line below (note that JAVA_HOME must be set according to the actual path on your machine):
export JAVA_HOME=/usr/lib/jvm/java
4) Edit the configuration file etc/hadoop/core-site.xml; its contents are as follows:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/local/var/hadoop/tmp/hadoop-${user.name}</value>
</property>
</configuration>
5) Edit the configuration file etc/hadoop/hdfs-site.xml; its contents are as follows:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/local/var/hadoop/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/local/var/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///home/hadoop/local/var/hadoop/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
6) Edit the configuration file etc/hadoop/yarn-site.xml; its contents are as follows:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
7) Edit the configuration file etc/hadoop/mapred-site.xml (if it does not exist, copy etc/hadoop/mapred-site.xml.template to that name first); its contents are as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/user</value>
</property>
</configuration>
8) Format HDFS:
\[hadoop@master ~\]$hdfs namenode -format
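Before the daemons are started, slave1 and slave2 also need the same Hadoop directory, configuration, and environment variables, and etc/hadoop/slaves must list the worker nodes. A minimal sketch, assuming the home-directory layout is identical on all three machines:
\[hadoop@master ~\]$echo -e "slave1\nslave2" > ~/local/opt/hadoop-2.6.0/etc/hadoop/slaves
\[hadoop@master ~\]$scp -r ~/local hadoop@slave1:~/
\[hadoop@master ~\]$scp -r ~/local hadoop@slave2:~/
\[hadoop@master ~\]$scp ~/.bashrc hadoop@slave1:~/
\[hadoop@master ~\]$scp ~/.bashrc hadoop@slave2:~/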
9) Start the Hadoop cluster. After startup completes, use the jps command to list the daemon processes and verify that the installation succeeded.
# Start HDFS
\[hadoop@master ~\]$start-dfs.sh
# Start YARN
\[hadoop@master ~\]$start-yarn.sh
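Note that start-dfs.sh and start-yarn.sh do not launch the JobHistoryServer shown in the jps output below; it is started separately, for example:
\[hadoop@master ~\]$mr-jobhistory-daemon.sh start historyserver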
# Daemons on the master node:
\[hadoop@master ~\]$jps
3717 SecondaryNameNode
3855 ResourceManager
3539 NameNode
3903 JobHistoryServer
4169 Jps
# slave1 node
\[hadoop@slave1 ~\]$jps
2969 Jps
2683 DataNode
2789 NodeManager
# slave2 node
\[hadoop@slave2 ~\]$jps
2614 Jps
2363 DataNode
2470 NodeManager
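As a final check, the HDFS report and the bundled example job can confirm the cluster works end to end (the jar path below is the standard location inside the Hadoop 2.6.0 distribution):
\[hadoop@master ~\]$hdfs dfsadmin -report
# should list two live DataNodes
\[hadoop@master ~\]$hadoop jar ~/local/opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10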