Prerequisite: the network is configured
Initialize the environment
yum install -y epel-release
yum -y install wget
cd /etc/yum.repos.d
mv ./CentOS-Base.repo ./CentOS-Base.repo.bak
wget http://mirrors.163.com/.help/CentOS7-Base-163.repo
mv CentOS7-Base-163.repo /etc/yum.repos.d/CentOS-Base.repo
yum clean all;yum makecache
yum -y update
yum groups install Development\ Tools -y
yum install -y ntp vim lrzsz lsof pcre pcre-devel zlib zlib-devel ruby unzip zip net-tools gcc-c++
Disable the firewall and SELinux
sudo systemctl stop firewalld
sudo systemctl disable firewalld
setenforce 0
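Note that setenforce 0 only disables SELinux until the next reboot; for a permanent change, set SELINUX=disabled in /etc/selinux/config. A minimal sketch, run here against a throwaway copy of the file so it is safe to execute anywhere; on a real node apply the sed line to /etc/selinux/config itself:

```shell
# setenforce 0 is not persistent across reboots;
# SELINUX=disabled in /etc/selinux/config is.
# This sketch edits a temporary copy for safety.
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
grep '^SELINUX=' "$cfg"
```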
Install the JDK
- Download: https://www.aliyundrive.com/s/V4kVwG9MTPn
- rpm -ivh jdk-8u301-linux-x64.rpm
- No need to configure the Java environment variables manually
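The Oracle RPM installs under /usr/java and registers the java binary through alternatives, which is why no manual JAVA_HOME/PATH edits are needed for the shell. A quick sanity check is to parse the `java -version` banner; the sketch below uses a sample line so it is self-contained (on a real node capture it with `java -version 2>&1 | head -n 1`):

```shell
# Extract the quoted version string from a `java -version` banner line.
# Sample line used here for self-containment; on the node:
#   ver_line=$(java -version 2>&1 | head -n 1)
ver_line='java version "1.8.0_301"'
ver=$(echo "$ver_line" | sed 's/.*"\([^"]*\)".*/\1/')
echo "$ver"
```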
Install MySQL
wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server
service mysqld start
systemctl enable mysqld
Get the MySQL temporary password
grep 'temporary password' /var/log/mysqld.log
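The password is the last whitespace-separated field of that log line, so it can be cut out directly with awk. A sketch against a sample line (on a real node, pipe the grep above into the awk command):

```shell
# mysqld logs a line of the form:
#   "... A temporary password is generated for root@localhost: <password>"
# The password is the last field. Sample line used here; the password
# value is a placeholder.
line='2021-09-01T00:00:00.000000Z 1 [Note] A temporary password is generated for root@localhost: Abc123!xyz'
temp_pw=$(echo "$line" | awk '{print $NF}')
echo "$temp_pw"
```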
Log in to MySQL
mysql -uroot -p
Relax the password policy
set global validate_password_policy=0;
set global validate_password_length=1;
ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
create database hive default character set utf8 default collate utf8_general_ci;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY '123456' WITH GRANT OPTION;
FLUSH PRIVILEGES;
Create the hadoop user
useradd hadoop
Set the password
passwd hadoop
Grant full sudo rights (append the line below, or edit the file with visudo):
echo "hadoop ALL=(ALL) NOPASSWD:ALL">>/etc/sudoers
Create the user environment-variable file
vi /etc/profile.d/hadoop.env
Add the environment variables
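The variable list itself is not shown in these notes; a typical /etc/profile.d/hadoop.env might look like the following, where both install paths are assumptions to be adjusted to your machine:

```shell
# Sketch of /etc/profile.d/hadoop.env -- the JDK and Hadoop paths
# below are assumptions; match them to your actual install locations.
export JAVA_HOME=/usr/java/jdk1.8.0_301-amd64
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```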
Switch to the hadoop login user so that the environment variables take effect
echo "source /etc/profile.d/hadoop.env">>~/.bashrc
Upload the xsync script to /usr/bin/
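The xsync script itself is not included in these notes; it is commonly a small rsync wrapper that fans each argument out to every worker host while preserving its absolute path. A hedged sketch, wrapped in a function here so it can be exercised without a cluster; the host list node1/node2 is an assumption:

```shell
#!/bin/bash
# Sketch of an xsync helper: rsync each argument to every worker host,
# recreating the same absolute path remotely. Host names are assumptions.
xsync() {
  local hosts="node1 node2"
  if [ $# -lt 1 ]; then
    echo "Usage: xsync <file-or-dir>..." >&2
    return 1
  fi
  local host arg dir name
  for host in $hosts; do
    echo "==== $host ===="
    for arg in "$@"; do
      if [ -e "$arg" ]; then
        dir=$(cd -P "$(dirname "$arg")" && pwd)   # absolute parent directory
        name=$(basename "$arg")
        ssh "$host" "mkdir -p '$dir'"
        rsync -av "$dir/$name" "$host:$dir"
      else
        echo "xsync: $arg does not exist" >&2
      fi
    done
  done
}
```

Installed as /usr/bin/xsync (with the function body as the script and `"$@"` as its arguments), it is invoked exactly as in the xsync commands used elsewhere in these notes.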
Use auto_ssh_host.sh to configure passwordless SSH and /etc/hosts automatically
Step 1
Place the software under /opt/module
Make sure all cluster nodes are configured consistently
Make sure the environment variables are in effect: source /etc/profile.d/hadoop.env
Make sure directory ownership is correct: chown -R hadoop:hadoop /opt/module
Sync the base directory tree to the other nodes:
xsync /opt/module/hadoop-3.1.3/
Configure the Hadoop files
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9820</value>
<description>Default file system URI</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/tmp/hadoop</value>
<description>Hadoop temporary directory</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Read/write buffer size (131072 bytes = 128 KB)</description>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>zakza</value>
<description>Static user for the HDFS web UI</description>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!--nn web-->
<property>
<name>dfs.namenode.http-address</name>
<value>master:9870</value>
<description>namenode web</description>
</property>
<!--2nn web-->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node2:9868</value>
<description>secondary namenode web</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Block replication factor</description>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
<description>Enable log aggregation</description>
</property>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
<description>resourcemanager</description>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs</value>
<description>Log server web URL</description>
</property>
<!-- Site specific YARN configuration properties -->
<!-- Route MapReduce through the shuffle service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Environment variable inheritance -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,HADOOP_MAPRED_HOME</value>
<description>Environment variables containers may inherit from the NodeManager</description>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
<description>Retain aggregated logs for 7 days (604800 seconds)</description>
</property>
</configuration>
xsync /opt/module/hadoop-3.1.3/etc/hadoop
Start the cluster
Format only on the first start. If the data is corrupted or lost, delete the data/ and logs/ directories on every node, then re-format.
hdfs namenode -format
start-dfs.sh
start-yarn.sh
mapred --daemon start historyserver
Start the ResourceManager on node1
yarn --daemon start resourcemanager
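A quick way to verify the layout is to grep the `jps` output on each node for the expected daemon names. The snippet below parses a sample capture so it is self-contained; on a real node, pass it `"$(jps)"` instead (run over ssh for the other hosts):

```shell
# check_daemon: succeed if the named daemon appears in a jps listing.
check_daemon() {
  echo "$1" | grep -qw "$2"
}
# Sample jps capture for the master node (PIDs are placeholders).
sample='12345 NameNode
12400 DataNode
12500 NodeManager
12600 JobHistoryServer
12700 Jps'
check_daemon "$sample" NameNode && echo "NameNode up"
```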
     | master        | node1 | node2
HDFS | nn            |       | 2nn
     | dn            | dn    | dn
YARN | historyserver | rm    |
     | nm            | nm    | nm
Note:
- nn:NameNode
- dn:DataNode
- nm:NodeManager
- 2nn:SecondaryNameNode
- rm:ResourceManager
Web addresses
JobHistory: http://master:19888/jobhistory
YARN: http://node1:8088/cluster