1 Preparation
1. JDK 7; for installation details see 《CentOS7下安装JDK7.docx》
2. The hadoop-2.7.2.tar.gz installation package
3. Three machines:
10.1.1.241 MASTER1
10.1.1.242 SLAVE2
10.1.1.243 SLAVE3
2 Hosts
Because this Hadoop cluster consists of three machines, the hosts file on every machine must be updated so the nodes can resolve one another by name. Add the following entries on all three servers:
[root@MASTER1 bin]# vi /etc/hosts
10.1.1.241 MASTER1
10.1.1.242 SLAVE2
10.1.1.243 SLAVE3
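A quick way to confirm the mappings work is to ping each peer by name (a sanity check added here, not part of the original transcript):
[root@MASTER1 bin]# ping -c 1 SLAVE2
[root@MASTER1 bin]# ping -c 1 SLAVE3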
3 SSH
Hadoop's control scripts use SSH to start and stop the daemons on the slave nodes (the NameNode and DataNodes themselves communicate over RPC, not SSH), so passwordless login from the master must be configured.
First, log in to the master machine and generate an SSH key pair:
[root@MASTER1 bin]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ba:78:58:05:bf:36:a6:68:6a:b1:36:18:a1:b2:35:f8 root@MASTER1
The key's randomart image is:
+--[ RSA 2048]----+
| |
| . |
| o |
|. o |
|.o .S. |
|= + ..= |
|.* + +.+ . |
|o E +.o. |
| o.+... |
+-----------------+
The command leaves a .ssh directory under the current user's home. Change into it and append id_rsa.pub to the authorized_keys file:
[root@MASTER1 bin]# cd ~/.ssh
[root@MASTER1 .ssh]# cat id_rsa.pub >> authorized_keys
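sshd ignores keys whose files are too permissive. If key-based login still prompts for a password later, tightening the permissions usually fixes it (a precaution added here, not in the original steps):
[root@MASTER1 .ssh]# chmod 700 ~/.ssh
[root@MASTER1 .ssh]# chmod 600 ~/.ssh/authorized_keys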
Finally, copy the authorized_keys file to the other nodes:
[root@MASTER1 .ssh]# scp authorized_keys root@SLAVE2:/root/.ssh/
[root@MASTER1 .ssh]# scp authorized_keys root@SLAVE3:/root/.ssh/
The authenticity of host 'slave3 (10.1.1.243)' can't be established.
ECDSA key fingerprint is 3e:77:b7:27:eb:c7:6c:d8:50:b1:1d:d2:8f:78:ee:2e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave3,10.1.1.243' (ECDSA) to the list of known hosts.
root@slave3's password: # enter the root password here
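Before moving on, verify that passwordless login actually works; each command below should print the slave's hostname without asking for a password (a verification step added here):
[root@MASTER1 .ssh]# ssh SLAVE2 hostname
[root@MASTER1 .ssh]# ssh SLAVE3 hostname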
4 Directory layout
# Run on all three servers
[root@MASTER1 .ssh]# mkdir -p /data/program/hdfs/{name,data,tmp}
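Once the passwordless SSH from section 3 is in place, the same layout can also be created on all three nodes from the master in a single loop (an equivalent sketch, not from the original):
[root@MASTER1 .ssh]# for h in MASTER1 SLAVE2 SLAVE3; do ssh root@$h 'mkdir -p /data/program/hdfs/{name,data,tmp}'; done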
5 Installing and configuring Hadoop
5.1 Unpacking
[root@MASTER1 ~]# cd /data/software
[root@MASTER1 software]# mkdir /data/program/hadoop
[root@MASTER1 software]# tar zxvf hadoop-2.7.2.tar.gz -C /data/program/hadoop
5.2 Environment variables
[root@MASTER1 software]# vi /etc/profile  # append all of the following
#hadoop
export HADOOP_DEV_HOME=/data/program/hadoop/hadoop-2.7.2
export PATH=$PATH:$HADOOP_DEV_HOME/bin
export PATH=$PATH:$HADOOP_DEV_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
export YARN_HOME=${HADOOP_DEV_HOME}
export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
# make it take effect immediately
[root@MASTER1 software]# . /etc/profile
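A quick check that the variables took effect; hadoop version should print the 2.7.2 banner:
[root@MASTER1 software]# echo $HADOOP_DEV_HOME
/data/program/hadoop/hadoop-2.7.2
[root@MASTER1 software]# hadoop version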
5.3 Configuring Hadoop
Change into Hadoop's configuration directory:
[root@MASTER1 software]# cd /data/program/hadoop/hadoop-2.7.2/etc/hadoop
Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml in turn.
5.3.1 core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/data/program/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://MASTER1:9000</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
5.3.2 hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/data/program/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/data/program/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>MASTER1:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
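Before going further, hdfs getconf offers a quick way to confirm that Hadoop actually picks these values up (a verification step added here, not part of the original walkthrough):
[root@MASTER1 hadoop]# hdfs getconf -confKey dfs.replication
2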
5.3.3 mapred-site.xml (new file)
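A fresh 2.7.2 unpack contains only mapred-site.xml.template, so create the file from the template first, then give it the content below:
[root@MASTER1 hadoop]# cp mapred-site.xml.template mapred-site.xml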
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5.3.4 yarn-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>MASTER1:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>MASTER1:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>MASTER1:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>MASTER1:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>MASTER1:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>
5.3.5 Configuring hadoop-env.sh
Set JAVA_HOME to the JDK installed during preparation:
[root@MASTER1 .ssh]# vi /data/program/hadoop/hadoop-2.7.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_40
5.3.6 Configuring slaves
[root@MASTER1 program]# cd /data/program/hadoop/hadoop-2.7.2/etc/hadoop
[root@MASTER1 hadoop]# vi slaves
# Remove the default localhost entry and add the two slave nodes:
10.1.1.242
10.1.1.243
5.3.7 Copying to the slaves with scp
Copy the entire hadoop directory tree to the same path on both slaves:
[root@MASTER1 hadoop]# cd /data/program
[root@MASTER1 program]# scp -r hadoop root@SLAVE2:/data/program/
[root@MASTER1 program]# scp -r hadoop root@SLAVE3:/data/program/
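Two quick checks are worth doing here (added as a hedged aside): confirm the copy landed on each slave, and note that the /etc/profile additions from section 5.2 must be repeated on the slaves if you want to run hadoop commands there directly (the control scripts launch the remote daemons via absolute paths, so the daemons work either way):
[root@MASTER1 program]# ssh root@SLAVE2 'ls /data/program/hadoop/hadoop-2.7.2/bin/hadoop'
[root@MASTER1 program]# ssh root@SLAVE3 'ls /data/program/hadoop/hadoop-2.7.2/bin/hadoop'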
5.4 Running Hadoop
5.4.1 Running HDFS
5.4.1.1 Formatting the NameNode
Run the following command, and only once: reformatting destroys the metadata under dfs.namenode.name.dir. (The hadoop namenode form is deprecated in 2.x; hdfs namenode -format is the current equivalent.)
[root@MASTER1 program]# hadoop namenode -format
5.4.1.2 Starting the NameNode
[root@MASTER1 program]# hadoop-daemon.sh start namenode
starting namenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-MASTER1.out
Running jps on MASTER1 now shows the NameNode process:
[root@MASTER1 program]# jps
15056 NameNode
15129 Jps
5.4.1.3 Starting the DataNodes
Note the plural script name: hadoop-daemons.sh starts the daemon on every host listed in the slaves file, whereas hadoop-daemon.sh acts only on the local machine.
[root@MASTER1 hadoop]# hadoop-daemons.sh start datanode
10.1.1.243: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE3.out
10.1.1.242: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE2.out
5.4.2 Verifying on SLAVE2
The DataNode JVM shows up in the process list (the long run of JVM arguments is abbreviated below):
[root@SLAVE2 hadoop]# ps -ef | grep hadoop
root      7610     1 18 22:50 ?        00:00:07 /usr/java/jdk1.7.0_40/bin/java -Dproc_datanode -Xmx1000m … org.apache.hadoop.hdfs.server.datanode.DataNode
root 7737 3729 0 22:50 pts/0 00:00:00 grep --color=auto hadoop
5.4.3 Verifying on SLAVE3
The same check on SLAVE3 (JVM arguments again abbreviated):
[root@SLAVE3 hadoop]# ps -ef | grep hadoop
root      5469     1 12 22:50 ?        00:00:07 /usr/java/jdk1.7.0_40/bin/java -Dproc_datanode -Xmx1000m … org.apache.hadoop.hdfs.server.datanode.DataNode
root 5644 3333 0 22:50 pts/0 00:00:00 grep --color=auto hadoop
5.4.4 Running HDFS with start-dfs.sh
Instead of starting the NameNode and DataNodes one by one as above, the start-dfs.sh script does it all:
[root@MASTER1 hadoop]# start-dfs.sh
Starting namenodes on [MASTER1]
MASTER1: starting namenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-MASTER1.out
10.1.1.242: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE2.out
10.1.1.243: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE3.out
Starting secondary namenodes [MASTER1]
MASTER1: starting secondarynamenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-MASTER1.out
Stop everything again with stop-dfs.sh:
[root@MASTER1 hadoop]# stop-dfs.sh
Stopping namenodes on [MASTER1]
MASTER1: stopping namenode
10.1.1.242: stopping datanode
10.1.1.243: stopping datanode
Stopping secondary namenodes [MASTER1]
MASTER1: stopping secondarynamenode
5.4.5 Running YARN
YARN can be run the same way as HDFS. Start the ResourceManager with:
[root@MASTER1 hadoop]# yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-MASTER1.out
Start the NodeManagers on all slaves in one batch with:
[root@MASTER1 hadoop]# yarn-daemons.sh start nodemanager
10.1.1.242: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE2.out
10.1.1.243: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE3.out
Running jps on the two slaves confirms the NodeManagers are up:
[root@SLAVE2 hadoop]# jps
14504 NodeManager
14680 Jps
11887 DataNode
5.4.6 Running YARN with start-yarn.sh
Rather than repeating the daemon-by-daemon steps, start-yarn.sh starts everything at once:
[root@MASTER1 hadoop]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-MASTER1.out
10.1.1.242: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE2.out
10.1.1.243: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE3.out
Stop with stop-yarn.sh:
[root@MASTER1 hadoop]# stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
10.1.1.242: no nodemanager to stop
10.1.1.243: no nodemanager to stop
Run jps on the master:
[root@MASTER1 hadoop]# jps
11729 ResourceManager
10933 NameNode
11999 Jps
This confirms that the ResourceManager is running normally.
5.5 Starting the whole cluster at once
# start
[root@MASTER1 mapreduce]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [MASTER1]
MASTER1: starting namenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-namenode-MASTER1.out
10.1.1.243: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE3.out
10.1.1.242: starting datanode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-datanode-SLAVE2.out
Starting secondary namenodes [MASTER1]
MASTER1: starting secondarynamenode, logging to /data/program/hadoop/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-MASTER1.out
starting yarn daemons
starting resourcemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-resourcemanager-MASTER1.out
10.1.1.243: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE3.out
10.1.1.242: starting nodemanager, logging to /data/program/hadoop/hadoop-2.7.2/logs/yarn-root-nodemanager-SLAVE2.out
# stop
[root@MASTER1 mapreduce]# stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
Stopping namenodes on [MASTER1]
MASTER1: stopping namenode
10.1.1.242: stopping datanode
10.1.1.243: stopping datanode
Stopping secondary namenodes [MASTER1]
MASTER1: stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
10.1.1.242: stopping nodemanager
10.1.1.243: stopping nodemanager
no proxyserver to stop
5.6 Testing Hadoop
5.6.1 Testing HDFS
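A minimal smoke test using the standard HDFS shell; the directory and file names below are arbitrary choices for illustration. Create a directory, upload a local file, and read it back:
[root@MASTER1 hadoop]# hdfs dfs -mkdir -p /test
[root@MASTER1 hadoop]# hdfs dfs -put /etc/hosts /test/
[root@MASTER1 hadoop]# hdfs dfs -ls /test
[root@MASTER1 hadoop]# hdfs dfs -cat /test/hosts
The NameNode web UI at http://10.1.1.241:50070 (50070 is the default dfs.namenode.http-address port) should also show both DataNodes as live.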
5.6.2 Testing YARN
http://10.1.1.241:18088/cluster
Opening this URL brings up the YARN management UI, which verifies that YARN is running (screenshot omitted here).
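The same thing can be checked from the shell; yarn node -list should report both NodeManagers in RUNNING state:
[root@MASTER1 hadoop]# yarn node -list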
5.6.3 Testing MapReduce
I am too lazy to write MapReduce code myself; luckily, the Hadoop distribution ships ready-made examples under share/hadoop/mapreduce. Run the pi estimator (the two arguments are the number of map tasks and the number of samples per map):
[root@MASTER1 hadoop]# cd /data/program/hadoop/hadoop-2.7.2/share/hadoop/mapreduce
[root@MASTER1 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar pi 5 10
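pi needs no HDFS input. To exercise HDFS and MapReduce together, the same jar also contains wordcount; a quick sketch, with input and output paths chosen here purely for illustration:
[root@MASTER1 mapreduce]# hdfs dfs -mkdir -p /wc/in
[root@MASTER1 mapreduce]# hdfs dfs -put /etc/hosts /wc/in/
[root@MASTER1 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /wc/in /wc/out
[root@MASTER1 mapreduce]# hdfs dfs -cat /wc/out/part-r-00000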