1. Deploying HDFS
Environment preparation: provision four virtual machines named node1, node2, node3, and node4.
Disable SELinux on all nodes by setting the following in /etc/selinux/config:
SELINUX=disabled
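For example, this can be applied on each node with sed (a minimal sketch; the setting only takes full effect after a reboot, so setenforce 0 switches the running system to permissive in the meantime):
sed -ri 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # persistent setting, effective after reboot
setenforce 0                                                    # running kernel: permissive until the reboot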
Disable firewalld on all nodes:
systemctl stop firewalld
systemctl mask firewalld
Install java-1.8.0-openjdk-devel on all nodes:
yum -y install java-1.8.0-openjdk-devel
Download Hadoop from the official site and install it (on node1).
Link: link
tar -xf hadoop-2.7.7.tar.gz
mv hadoop-2.7.7 /usr/local/hadoop
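As an illustration of the download step, the 2.7.7 tarball can be fetched from the Apache archive (the URL below is an assumption; use whichever official mirror the link above points to):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz   # example mirror only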
Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh and set the environment variables:
JAVA_HOME="<path to the Java installation>"
HADOOP_CONF_DIR="<path to the Hadoop configuration directory>"
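For instance, with the OpenJDK package installed above, the two lines might look like this (the JAVA_HOME path is an assumption; confirm it on your nodes, e.g. with readlink -f $(which java)):
JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk"        # example path; verify on your system
HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"     # matches the install location used above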
Configure /etc/hosts on all nodes so that every node can ping every other node by hostname.
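A sample /etc/hosts entry set, using placeholder addresses (replace the 192.168.1.x values with the real IPs of your machines):
192.168.1.11 node1   # placeholder IPs; substitute your own
192.168.1.12 node2
192.168.1.13 node3
192.168.1.14 node4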
Configure SSH trust (on node1). Edit the SSH client configuration file:
/etc/ssh/ssh_config
StrictHostKeyChecking no
Then generate a key pair on node1 and copy the public key to every node, including node1 itself:
ssh-keygen
for i in node1 node2 node3 node4; do ssh-copy-id $i; done
Edit the core configuration file core-site.xml.
Refer to the official documentation for the core configuration parameters: link
fs.defaultFS : file system (default NameNode) address
hadoop.tmp.dir : data directory
.........
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop</value>
    </property>
</configuration>
Edit the configuration file hdfs-site.xml.
Refer to the official documentation for the HDFS configuration parameters: link
dfs.namenode.http-address : NameNode address declaration
dfs.namenode.secondary.http-address : SecondaryNameNode address declaration
dfs.replication : number of file replicas
.......
<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>node1:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>
Synchronize the configuration.
The Hadoop configuration is identical on every node, so simply sync the Hadoop directory from node1 to the other hosts:
for i in node2 node3 node4; do rsync -av /usr/local/hadoop root@$i:/usr/local/; done
Create the /var/hadoop directory on node1 (the hadoop.tmp.dir configured above):
mkdir /var/hadoop
Run the format operation on node1 (from the Hadoop install directory):
cd /usr/local/hadoop
./bin/hdfs namenode -format
Start the HDFS cluster:
./sbin/start-dfs.sh
Verify the roles with jps (jps is installed as part of java-1.8.0-openjdk-devel).
1. Verify on node1:
[root@node1 hadoop]# jps
1139 Jps
1030 SecondaryNameNode
846 NameNode
2. Run jps on any one of the other three nodes:
[root@node2 hadoop]# jps
23140 DataNode
23212 Jps
Verify the cluster nodes (on node1):
[root@node1 hadoop]# ./bin/hdfs dfsadmin -report
Safe mode is ON
Configured Capacity: 96602099712 (89.97 GB)
Present Capacity: 91950841856 (85.64 GB)
DFS Remaining: 91950817280 (85.64 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
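As an optional smoke test, once the NameNode has left safe mode you can create a directory and upload a file to confirm the cluster accepts writes (a sketch; the paths here are arbitrary examples):
cd /usr/local/hadoop
./bin/hdfs dfs -mkdir /input              # create a directory in HDFS
./bin/hdfs dfs -put /etc/hosts /input/    # upload a small local file
./bin/hdfs dfs -ls /input                 # list it back to confirm the write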
2. Deploying the MapReduce distributed computing framework
Rename the template file (on node1):
cd /usr/local/hadoop/etc/hadoop
mv mapred-site.xml.template mapred-site.xml
Edit mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
3. Deploying YARN
Edit yarn-site.xml:
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Synchronize the configuration to the other nodes:
for i in node2 node3 node4; do rsync -av /usr/local/hadoop $i:/usr/local/; done
Start the services:
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
Check with jps on node1:
[root@node1 hadoop]# jps
1105 NameNode
1443 ResourceManager
1763 Jps
1288 SecondaryNameNode
Check with jps on node2:
[root@node2 hadoop]# jps
23765 DataNode
23863 NodeManager
23976 Jps
Run ./bin/yarn node -list to view the status of all NodeManager nodes:
[root@node1 hadoop]# ./bin/yarn node -list
......
Total Nodes:3
......
node1:36191 RUNNING node1:8042 0
node3:41933 RUNNING node3:8042 0
node2:44295 RUNNING node2:8042 0
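With YARN running, a quick way to exercise the whole stack is the example jar that ships with the Hadoop 2.7.7 distribution (a sketch; it assumes an /input directory such as the one created in the smoke test above):
cd /usr/local/hadoop
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /input /output   # bundled wordcount example
./bin/hdfs dfs -cat /output/part-r-00000   # print the word counts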
4. Finally, access Hadoop through the web UIs
1. NameNode web UI (node1):
http://node1:50070/
2. SecondaryNameNode web UI (node1):
http://node1:50090/
3. DataNode web UI (node2, node3, node4):
http://node2:50075
4. ResourceManager web UI (node1):
http://node1:8088
5. NodeManager web UI (node2, node3, node4):
http://node2:8042
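If no browser is available, the same endpoints can be checked from a shell on node1 (a sketch; adjust the hostnames to whichever node you are querying):
for url in http://node1:50070 http://node1:50090 http://node2:50075 http://node1:8088 http://node2:8042; do
    curl -sL -o /dev/null -w "%{http_code}  $url\n" "$url"   # expect 200 from every UI (redirects followed)
done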