Setting Up the Hadoop Environment
1) Download the Hadoop package. Visit the official release page at http://hadoop.apache.org/releases.html; it offers many versions for download.
For convenience, this guide uses a personal cloud link, which can be fetched with wget:
$ wget https://pan.baidu.com/s/1yyFCpLUSpgWW97ZnE8VJGQ
If wget is not installed, install it first:
$ yum -y install wget
2) Hadoop needs Java to run, so install a JDK.
* First check whether this machine already has a JDK:
[hadoop@localhost ~]$ yum list installed | grep java
No output means no JDK is installed.
If a JDK is present, remove it first:
[hadoop@localhost ~]$ yum -y remove <the JDK package found above>
Install the JDK using yum:
* List the JDK packages yum offers:
[hadoop@localhost ~]$ yum -y list java*
- Install everything for JDK 1.8:
[hadoop@localhost ~]$ yum -y install java-1.8.0-openjdk*
If you hit the following error, switch to root:
Loaded plugins: fastestmirror
You need to be root to perform this command.
[hadoop@localhost ~]$ su root
After switching to root, run the install again:
[root@localhost hadoop]# yum -y install java-1.8.0-openjdk*
The default install directory is /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64
- After installation, verify that it succeeded:
[root@localhost hadoop]# java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
If this reports an error, configure JAVA_HOME:
* Open /etc/profile and append:
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME PATH CLASSPATH
* Apply the configuration: source /etc/profile
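To confirm the variables actually took effect, a quick check (the JDK path below is the one assumed above; adjust it to your installed version):

```shell
# Assumed JDK path from the yum install above; adjust to your version.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64
export PATH=$PATH:$JAVA_HOME/bin
# Print the variable to confirm it is set before moving on
echo "JAVA_HOME=$JAVA_HOME"
```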
3) Install the Hadoop package
- Extract it:
[root@localhost hadoop]# tar xvf hadoop-2.6.1.tar.gz
- Create a tmp directory:
[root@localhost hadoop]# cd /home/hadoop/hadoop-2.6.1
[root@localhost hadoop-2.6.1]# mkdir tmp
4) Configure Hadoop
Switch to the Hadoop configuration directory:
[root@localhost hadoop]# cd /home/hadoop/hadoop-2.6.1/etc/hadoop
- Edit the masters file:
vim masters
Add the master node:
master
- Edit the slaves file:
vim slaves
This cluster has one master node and two slave nodes, so add:
slave1
slave2
- Edit core-site.xml:
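The two edits above can also be scripted non-interactively; a sketch that writes to a demo directory (on a real node the target is /home/hadoop/hadoop-2.6.1/etc/hadoop):

```shell
# Demo directory; on a real node write into /home/hadoop/hadoop-2.6.1/etc/hadoop.
mkdir -p conf-demo
echo "master" > conf-demo/masters             # the single master node
printf "slave1\nslave2\n" > conf-demo/slaves  # one line per slave node
cat conf-demo/masters conf-demo/slaves
```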
<configuration>
  <!-- Working directory for temporary files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.6.1/tmp</value>
  </property>
  <!-- Default filesystem URI; fs.default.name is the deprecated
       alias of fs.defaultFS and still works in Hadoop 2.x -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.137.20:9000</value>
  </property>
</configuration>
- Edit mapred-site.xml:
<configuration>
  <property>
    <!-- JobTracker address, in host:port form (no http:// prefix) -->
    <name>mapred.job.tracker</name>
    <value>192.168.137.20:9001</value>
  </property>
</configuration>
- Edit hdfs-site.xml:
<configuration>
  <!-- Number of replicas HDFS keeps of each block -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Where the NameNode stores its metadata -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop-2.6.1/hdfs/name</value>
  </property>
  <!-- Where each DataNode stores its blocks -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop-2.6.1/hdfs/data</value>
  </property>
</configuration>
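The name and data directories referenced above do not exist yet after unpacking; creating them up front avoids permission surprises. A sketch using a demo base path (on a real node use /home/hadoop/hadoop-2.6.1/hdfs so it matches the config):

```shell
# Demo base path; on a real node use /home/hadoop/hadoop-2.6.1/hdfs
# so it matches dfs.namenode.name.dir and dfs.datanode.data.dir above.
BASE=./hdfs-demo
mkdir -p "$BASE/name" "$BASE/data"
ls "$BASE"
```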
* Edit hadoop-env.sh:
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64
- Configure the local network in the hosts file (/etc/hosts, on every node):
192.168.137.20 master
192.168.137.21 slave1
192.168.137.22 slave2
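These entries must be present on all three nodes; a sketch that appends them idempotently (writing to a demo file here; on a real node point HOSTS_FILE at /etc/hosts and run as root):

```shell
# Demo target; set HOSTS_FILE=/etc/hosts (as root) on a real node.
HOSTS_FILE=./demo_hosts
for entry in "192.168.137.20 master" \
             "192.168.137.21 slave1" \
             "192.168.137.22 slave2"; do
  # -qxF: quiet, whole-line, fixed-string match -> append only if missing
  grep -qxF "$entry" "$HOSTS_FILE" 2>/dev/null || echo "$entry" >> "$HOSTS_FILE"
done
cat "$HOSTS_FILE"
```

Because of the `grep` guard, re-running the script on a node that already has the entries adds nothing.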
5) Clone the virtual machine
- Locate the master VM's installation path
- Copy master to create slave1 and slave2, then update each clone's IP address and hostname to match the hosts entries above
6) Establish mutual SSH trust between master and slaves
- Run ssh-keygen; when prompted, just press Enter to accept the defaults:
[root@localhost /]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:fcU+vjmSU5FisDwhtMiFiUjjREZ/jNedKMA9rxzgwCE root@localhost
The key's randomart image is:
+---[RSA 2048]----+
|E=O+.o +o |
| =+o+==+.+o. . |
| .oo.Bo+oo+ o. |
| .o....+ ooo |
| . oS .o..o. |
| o . ... |
| o. |
| + .o |
| oo. |
+----[SHA256]-----+
- Append master's public key to its own authorized_keys:
[root@localhost /]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Copy master's public key to the slave1 and slave2 nodes
First run ssh-keygen on slave1 and slave2 as well, accepting the defaults, then on each slave:
[root@localhost hadoop]# scp master:~/.ssh/authorized_keys ~/.ssh/authorized_keys
The authenticity of host 'master (192.168.137.20)' can't be established.
ECDSA key fingerprint is SHA256:lHkFsq7hVO3xRhNTvKsk0br1mGt/thcNWbi74xufyYY.
ECDSA key fingerprint is MD5:f5:ba:5e:ee:2e:72:ee:c2:73:27:4e:b0:29:18:5c:69.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master,192.168.137.20' (ECDSA) to the list of known hosts.
root@master's password:
authorized_keys 100% 396 224.5KB/s 00:00
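The keygen-and-append steps can also be done non-interactively. A local sketch using a scratch directory (on a real node the directory is ~/.ssh, and `ssh-copy-id root@slave1` is a shorter way to push the key to a slave):

```shell
# Scratch directory for illustration; on a real node use ~/.ssh.
DIR=$(mktemp -d)
# -N "" sets an empty passphrase, -q suppresses output, so no prompts appear
ssh-keygen -t rsa -N "" -f "$DIR/id_rsa" -q
cat "$DIR/id_rsa.pub" >> "$DIR/authorized_keys"
# sshd refuses keys kept with loose permissions
chmod 700 "$DIR" && chmod 600 "$DIR/authorized_keys"
ls "$DIR"
```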
- Test the connection
From master, open an ssh connection to each slave:
[root@localhost /]# ssh slave1
The authenticity of host 'slave1 (192.168.137.21)' can't be established.
ECDSA key fingerprint is SHA256:lHkFsq7hVO3xRhNTvKsk0br1mGt/thcNWbi74xufyYY.
ECDSA key fingerprint is MD5:f5:ba:5e:ee:2e:72:ee:c2:73:27:4e:b0:29:18:5c:69.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1,192.168.137.21' (ECDSA) to the list of known hosts.
Last login: Tue Jul 31 09:29:59 2018
[Note] Everything you run after connecting executes on the slave, so remember to exit when you are done.
[root@localhost ~]# exit
logout
Connection to slave1 closed.
7) Start Hadoop
- Format the filesystem
Switch to the bin directory:
[root@localhost ~]# cd /home/hadoop/hadoop-2.6.1/bin
Run ./hadoop namenode -format to initialize the Hadoop filesystem:
[root@localhost bin]# ./hadoop namenode -format
If invoking the scripts by path like this feels cumbersome, set the Hadoop environment variables:
[root@localhost hadoop-2.6.1]# vim /etc/profile
Add the following to /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop-2.6.1
export PATH=$HADOOP_HOME/bin:$PATH
Run source /etc/profile to apply it.
Then run hadoop version to check that the setup worked:
[root@localhost hadoop-2.6.1]# hadoop version
Hadoop 2.6.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b4d876d837b830405ccdb6af94742f99d49f9c04
Compiled by jenkins on 2015-09-16T21:07Z
Compiled with protoc 2.5.0
From source with checksum ba9a9397365e3ec2f1b3691b52627f
This command was run using /home/hadoop/hadoop-2.6.1/share/hadoop/common/hadoop-common-2.6.1.jar
- Start the cluster
Switch to the sbin directory:
[root@localhost ~]# cd /home/hadoop/hadoop-2.6.1/sbin
Run:
[root@localhost sbin]# ./start-all.sh
Answer yes to any prompts along the way.
Run jps on each node to check that the processes started:
- master
[root@localhost sbin]# jps
8368 Jps
7818 ResourceManager
7675 SecondaryNameNode
7502 NameNode
- slave1
[root@localhost hadoop]# jps
3220 DataNode
3418 Jps
3307 NodeManager
- slave2
[root@localhost hadoop]# jps
3222 DataNode
3309 NodeManager
3421 Jps
8) Test the cluster
- Create an empty file (note: hadoop fs -touchz creates a zero-length file, not a directory):
[root@localhost hadoop]# hadoop fs -touchz /text.txt
- List it:
[root@localhost hadoop]# hadoop fs -ls /
Found 1 items
-rw-r--r-- 2 root supergroup 0 2018-07-31 12:15 /text.txt
Finally, set up an SSH tunnel (e.g. with Xshell, or plain ssh -L 60070:localhost:50070 root@master) to reach the NameNode web UI.
Here master's port 50070 is forwarded to local port 60070, so the UI is available at http://localhost:60070.
Summary
This article built a three-node Hadoop cluster step by step. The process is tedious, but it lays a foundation for further Hadoop study.