Preface
After a long stretch of testing, the most reliable sequence I have found for building this cluster is the one below:
At least when I follow this process I make essentially zero mistakes and need the fewest re-initializations. You could instead install all the software on a single machine first and clone it to produce the workers, but that fails more often and makes errors hard to trace, which is bad for learning!
A reminder: for a serious install you would normally download all required packages ahead of time in case the download links go dead. The approach here is purely for learning and for getting familiar with the Linux file layout.
Required software
VMware Workstation 15 Pro (with a virtual machine already created, running a clean CentOS 7 Linux), PuTTY
Preparing the virtual machines
I will fill in the finer details in later updates; for now, let's build the cluster step by step!
1. Set the hostname
vi /etc/hostname
2. Set the IP address
vi /etc/sysconfig/network-scripts/ifcfg-ens33
3. Set up the hostname mappings (edit /etc/hosts)
vi /etc/hosts
192.168.113.120 hadoopm
192.168.113.130 hadoops1
192.168.113.131 hadoops2
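These mappings need to be present in /etc/hosts on every node. Below is a small sanity-check sketch I wrote for this guide (the `check_hosts` helper is my own, not a standard tool); the demo runs against a throwaway copy rather than the live /etc/hosts:

```shell
# check_hosts FILE: report any of the three cluster mappings missing from FILE.
check_hosts() {
  missing=0
  for entry in "192.168.113.120 hadoopm" \
               "192.168.113.130 hadoops1" \
               "192.168.113.131 hadoops2"; do
    ip=${entry%% *}; host=${entry#* }
    if ! grep -qE "^[[:space:]]*$ip[[:space:]]+$host([[:space:]]|\$)" "$1"; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return $missing
}

# demo on a throwaway copy; run `check_hosts /etc/hosts` on the real nodes
tmp=$(mktemp)
printf '192.168.113.120 hadoopm\n192.168.113.130 hadoops1\n192.168.113.131 hadoops2\n' > "$tmp"
check_hosts "$tmp" && echo "hosts OK"
```

On a correctly configured node this prints "hosts OK"; otherwise it lists the missing entries.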
4. Install wget (a command-line download tool, similar to Thunder)
yum -y install wget
Part 1: Installing the JDK
1. Download the JDK
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u141-b15/336fa29ff2bb4ef291e347e091f7f4a7/jdk-8u141-linux-x64.tar.gz
2. Extract the JDK (create the target directory first)
mkdir -p /usr/lib/jvm
tar -zxvf jdk-8u141-linux-x64.tar.gz -C /usr/lib/jvm
3. Set the JDK environment variables
vi /etc/profile
Append:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_141
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Reload so the variables take effect:
source /etc/profile
4. Verify
java -version
Part 2: Installing Hadoop
1. Download Hadoop
wget -c -t 0 https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
2. Extract Hadoop
tar -zxvf hadoop-3.3.1.tar.gz -C /usr/local
cd /usr/local
Rename the extracted directory:
mv ./hadoop-3.3.1/ ./hadoop
Give root ownership of the hadoop directory:
chown -R root ./hadoop
3. Set the Hadoop environment variables
vi /etc/profile
Append this right after the JDK settings from the previous step:
# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Reload the variables:
source /etc/profile
4. Edit the Hadoop configuration files
Enter the configuration directory:
cd /usr/local/hadoop/etc/hadoop
Six files need to be configured.
vi core-site.xml
Contents:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- RPC address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopm:9000</value>
</property>
<!-- Where Hadoop stores its runtime files -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
</property>
</configuration>
vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- HTTP address of the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoopm:50090</value>
</property>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- NameNode storage path -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<!-- DataNode storage path -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
vi mapred-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Tell the MapReduce framework to use YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoopm:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoopm:19888</value>
</property>
</configuration>
vi yarn-site.xml
<?xml version="1.0"?>
<configuration>
<!-- Which node hosts the ResourceManager -->
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoopm</value>
</property>
<!-- Reducers fetch map output via mapreduce_shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
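With several XML files to edit by hand, typos are easy to make. Here is a quick grep-based sketch (the `has_prop` helper is my own, not part of Hadoop) that checks whether a *-site.xml file sets a given property to a given value; the demo runs against a throwaway fragment:

```shell
# has_prop FILE NAME VALUE: succeed when FILE contains <name>NAME</name>
# immediately followed (ignoring whitespace) by <value>VALUE</value>.
has_prop() {
  tr -d ' \t\n' < "$1" | grep -q "<name>$2</name><value>$3</value>"
}

# demo against a throwaway fragment; point it at core-site.xml etc. on the node
tmp=$(mktemp)
printf '<property>\n  <name>dfs.replication</name>\n  <value>2</value>\n</property>\n' > "$tmp"
has_prop "$tmp" dfs.replication 2 && echo "property OK"
```

This only catches the simple layout used above (the value element right after the name element), which is all these files need.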
vi workers
hadoops1
hadoops2
vi hadoop-env.sh
In the file, find the commented-out export JAVA_HOME= line (around line 54), remove the leading #, and point it at the JDK location:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_141
5. Passwordless SSH
cd ~
Check the firewall status:
systemctl status firewalld
Stop the firewall for now:
systemctl stop firewalld
Generate an SSH key pair:
ssh-keygen -t rsa
Enter the directory where the key is stored:
cd .ssh/
Authorize the key for login:
cat ./id_rsa.pub >> ./authorized_keys
Test passwordless login (type yes at the first prompt):
ssh hadoopm
The next steps disable the firewall at boot and turn off SELinux, which makes later work easier:
systemctl disable firewalld
vi /etc/selinux/config
Find the SELINUX line and change it to:
SELINUX=disabled
6. Clone
Clone the master to create the worker VMs, and repeat the hostname and IP steps on each clone. Then have every worker regenerate its own SSH key pair: the clones start out carrying the master's key (and can therefore already log into the master without a password), so each machine needs to generate keys of its own.
ssh-keygen -t rsa
Once that is done, go back to hadoopm and continue.
7. Start Hadoop
Format HDFS:
hdfs namenode -format
Start all of the Hadoop services:
start-all.sh
On the master, jps should now show NameNode, SecondaryNameNode, and ResourceManager; on the workers, DataNode and NodeManager. If so, the Hadoop cluster started successfully.
stop-all.sh (stops all services)
From a browser on your own computer, open the following addresses to reach the corresponding web UIs:
192.168.113.120:8088 # cluster (YARN) status
192.168.113.120:9870 # HDFS overview
Part 3: Installing ZooKeeper
1. Download ZooKeeper
wget -c -t 0 http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
tar -zxvf zookeeper-3.4.5.tar.gz -C /usr/local
cd /usr/local
mv ./zookeeper-3.4.5/ ./zookeeper (rename the extracted directory)
cd zookeeper
2. Create folders for the snapshots and logs:
mkdir data
mkdir logs
cd conf
3. Configure ZooKeeper
There are two ways to do this.
Method 1: copy the sample file
cp zoo_sample.cfg zoo.cfg
Then edit zoo.cfg, setting
dataDir=/usr/local/zookeeper/data # snapshot location
dataLogDir=/usr/local/zookeeper/logs # log location
and append the server list shown below at the bottom of the file.
Method 2: write the file from scratch
vi zoo.cfg
tickTime=2000
dataDir=/usr/local/zookeeper/data
dataLogDir=/usr/local/zookeeper/logs
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.113.120:2888:3888
server.2=192.168.113.130:2888:3888
server.3=192.168.113.131:2888:3888
cd ..
cd data
vi myid
1
Now send the zookeeper directory to each worker; afterwards, on each worker edit data/myid and change the value to 2 and 3 respectively.
Transfer commands:
scp -r /usr/local/zookeeper root@192.168.113.130:/usr/local/zookeeper
scp -r /usr/local/zookeeper root@192.168.113.131:/usr/local/zookeeper
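The myid values must line up with the server.N lines in zoo.cfg, and mismatches are a classic source of quorum failures. A sketch that derives the id for a given IP straight from the config (the `myid_for` helper is my own; the demo parses a throwaway copy of the server list):

```shell
# myid_for CFG IP: print the N of the server.N line in CFG whose address is IP.
myid_for() {
  sed -nE "s/^server\.([0-9]+)=$2:.*/\1/p" "$1"
}

# demo on a throwaway copy of the server list from zoo.cfg
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
server.1=192.168.113.120:2888:3888
server.2=192.168.113.130:2888:3888
server.3=192.168.113.131:2888:3888
EOF
echo "myid for hadoops1: $(myid_for "$cfg" 192.168.113.130)"   # prints: myid for hadoops1: 2
```

Running `myid_for /usr/local/zookeeper/conf/zoo.cfg <this host's IP>` on each node tells you what its data/myid should contain.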
4. ZooKeeper environment variables
vi /etc/profile
#zookeeper
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
source /etc/profile
5. Verify
Start it (ZooKeeper must be started on every node, one by one):
zkServer.sh start
zkServer.sh status
Run jps; every node should now show a QuorumPeerMain process.
Part 4: Installing HBase
1. Download HBase
wget -c -t 0 http://archive.apache.org/dist/hbase/2.4.6/hbase-2.4.6-bin.tar.gz
The usual routine; one command per step:
tar -zxvf hbase-2.4.6-bin.tar.gz -C /usr/local/hadoop
cd /usr/local/hadoop
mv ./hbase-2.4.6/ ./hbase (rename the extracted directory)
mkdir pids
cd hbase
2. Configure HBase
vi hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_141
export HBASE_CLASSPATH=/usr/local/hadoop/hbase
export HBASE_MANAGES_ZK=false # we use our own ZooKeeper install, so disable the bundled one
export HBASE_PID_DIR=/usr/local/hadoop/pids # PID directory; this is the folder created by hand above
vi hbase-site.xml (add the following inside the existing <configuration> tags):
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoopm:9000/hbase</value>
</property>
<!-- If Hadoop runs in HA mode, rootdir can point at the nameservice instead. In that case also copy Hadoop's core-site.xml and hdfs-site.xml into HBase's conf directory, or HBase will complain that ns1 cannot be found -->
<!--
<property>
<name>hbase.rootdir</name>
<value>hdfs://ns1/hbase</value>
</property>
-->
<!-- Enable distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- Data directory used with the standalone ZooKeeper install -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/usr/local/hadoop/zookeeper</value>
</property>
<!-- ZooKeeper client port -->
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<!-- ZooKeeper quorum nodes -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoopm:2181,hadoops1:2181,hadoops2:2181</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>60000000</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
vi regionservers
hadoopm
hadoops1
hadoops2
3. Configure the HBase environment variables
vi /etc/profile
#hbase
export HBASE_HOME=/usr/local/hadoop/hbase
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile
Send the configured hbase directory to the workers:
scp -r /usr/local/hadoop/hbase root@192.168.113.130:/usr/local/hadoop/hbase
scp -r /usr/local/hadoop/hbase root@192.168.113.131:/usr/local/hadoop/hbase
After the transfer, remember to set the HBase environment variables on each worker (same as on the master) and to create the pids folder under /usr/local/hadoop there as well.
With that done, we are ready to start the cluster and verify!
4. Verify (start the cluster)
start-all.sh (starts Hadoop)
zkServer.sh start (starts ZooKeeper; run on every node)
zkServer.sh status (checks the ZooKeeper state)
start-hbase.sh (starts HBase)
Now run jps; the master should additionally show HMaster, and every node an HRegionServer.
Then, from a browser on your own computer, open:
192.168.113.120:16010 (HBase web UI)
Part 5: Installing and configuring MySQL
1. Remove MariaDB
The CentOS install ships with the MariaDB database, and later configuration will go wrong if it is not removed first!
Check whether it is installed:
yum list installed | grep mariadb
If it is, remove it:
yum -y remove mariadb-libs.x86_64
2. Download MySQL
wget -c -t 0 https://dev.mysql.com/get/Downloads/MySQL-5.7/mysql-5.7.36-linux-glibc2.12-x86_64.tar.gz
The usual routine:
tar -zxvf mysql-5.7.36-linux-glibc2.12-x86_64.tar.gz
mv mysql-5.7.36-linux-glibc2.12-x86_64 /usr/local/mysql
cd /usr/local
Create the mysql group and user:
groupadd mysql
useradd -r -g mysql mysql
Create the data directory and give it to the mysql user:
mkdir -p /data/mysql
chown mysql:mysql -R /data/mysql
vi /etc/my.cnf
[mysqld]
bind-address=0.0.0.0
port=3306
user=mysql
basedir=/usr/local/mysql
datadir=/data/mysql
socket=/tmp/mysql.sock
log-error=/data/mysql/mysql.err
pid-file=/data/mysql/mysql.pid
#character config
character_set_server=utf8mb4
symbolic-links=0
explicit_defaults_for_timestamp=true
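Before initializing, it is worth confirming that my.cnf actually defines the paths MySQL will rely on, since a missing datadir or log-error entry only surfaces later as a confusing start-up failure. A small sketch (my own check, not a MySQL tool; the demo inspects a throwaway copy, so point it at /etc/my.cnf on the real node):

```shell
# Verify that a my.cnf-style file defines each required key.
cnf=$(mktemp)
printf '[mysqld]\nbasedir=/usr/local/mysql\ndatadir=/data/mysql\nsocket=/tmp/mysql.sock\nlog-error=/data/mysql/mysql.err\npid-file=/data/mysql/mysql.pid\n' > "$cnf"

ok=1
for k in basedir datadir socket log-error pid-file; do
  grep -q "^$k=" "$cnf" || { echo "missing: $k"; ok=0; }
done
[ "$ok" = 1 ] && echo "my.cnf keys OK"
```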
cd /usr/local/mysql/bin/
Initialization, method 1 (no password is set; recommended):
./mysqld --initialize-insecure --user=mysql --datadir=/data/mysql/ --basedir=/usr/local/mysql/
cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysql
service mysql start
If that succeeds, MySQL is running!
Initialization, method 2 (a random password is generated):
Initialize:
./mysqld --defaults-file=/etc/my.cnf --basedir=/usr/local/mysql/ --datadir=/data/mysql/ --user=mysql --initialize
Look up the generated password:
cat /data/mysql/mysql.err
cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysql
service mysql start
Check the mysql process:
ps -ef|grep mysql
Log in:
./mysql -uroot -p
Set the password (here 123456):
SET PASSWORD = PASSWORD('123456');
ALTER USER 'root'@'localhost' PASSWORD EXPIRE NEVER;
FLUSH PRIVILEGES;
Without the next three steps, remote connections will not work:
use mysql # switch to the mysql database
update user set host = '%' where user = 'root'; # let root connect from any host
FLUSH PRIVILEGES;
Create the hive database and grant it the needed MySQL privileges:
create database hive;
grant all on *.* to root@localhost identified by '123456';
flush privileges;
exit (quit MySQL)
Part 6: Installing Hive
!! Do this step on your own computer !! Download the MySQL JDBC driver (only mysql-connector-java-5.1.38-bin.jar from the archive is needed; extract it locally and bring the jar into the VM through a shared folder).
mysql-connector-java-5.1.38-bin.jar link: https://pan.baidu.com/s/1uKrpy21bRxivzayRlibCJA (extraction code: 391u)
Download Hive inside the VM:
wget -c -t 0 http://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
tar -zxvf apache-hive-3.1.2-bin.tar.gz
mv ./apache-hive-3.1.2-bin /usr/local/hadoop/hive
Create a shared folder through the VMware settings.
Once the shared folder is configured, enter it; work is the folder I created, and /mnt/hgfs/ is the VM's default mount point for shared folders:
cd /mnt/hgfs/work
Copy mysql-connector-java-5.1.38-bin.jar into Hive's lib directory:
cp mysql-connector-java-5.1.38-bin.jar /usr/local/hadoop/hive/lib/
Configure the environment variables:
vi /etc/profile
#hive
export HIVE_HOME=/usr/local/hadoop/hive/
export PATH=$PATH:$HIVE_HOME/bin
source /etc/profile
cd /usr/local/hadoop/hive/conf
The .template files are configuration templates; you can simply cp them to the real names and edit them. The cp commands are below, but I create the files from scratch instead, so I skip this step:
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
vi hive-env.sh
export HADOOP_HOME=/usr/local/hadoop/
export HIVE_CONF_DIR=/usr/local/hadoop/hive/conf/
export HIVE_AUX_JARS_PATH=/usr/local/hadoop/hive/lib/ # path to Hive's jars
Start the Hadoop cluster (see the Hadoop start-up commands above).
Create three HDFS directories:
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -mkdir -p /user/hive/tmp
hadoop fs -mkdir -p /user/hive/log
Once created, you can see these folders through the web UI on port 9870!
Set permissions:
hadoop fs -chmod -R 777 /user/hive/warehouse
hadoop fs -chmod -R 777 /user/hive/tmp
hadoop fs -chmod -R 777 /user/hive/log
Shut the cluster down.
vi hive-site.xml
<configuration>
<!-- Hive's metadata is recorded in MySQL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.113.120:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<!-- JDBC MySQL driver -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- MySQL username and password -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
</property>
<!-- Log directory -->
<property>
<name>hive.querylog.location</name>
<value>/user/hive/log</value>
</property>
<!-- Metastore node -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoopm:9083</value>
</property>
<!-- Port for remote client connections -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.server2.webui.host</name>
<value>0.0.0.0</value>
</property>
<!-- Port of the Hive service web UI -->
<property>
<name>hive.server2.webui.port</name>
<value>10002</value>
</property>
<property>
<name>hive.server2.long.polling.timeout</name>
<value>5000</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
</property>
</configuration>
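One gotcha in hive-site.xml: inside XML, a literal & (such as the parameter separator in the JDBC URL) must be written as &amp;, or Hive will fail to parse the file at start-up. A small sketch that flags unescaped ampersands (the `check_amp` helper is my own; the demo checks a throwaway line, so point it at the real hive-site.xml on the node):

```shell
# check_amp FILE: print any line containing '&' that is not an XML entity.
check_amp() {
  sed -E 's/&(amp|lt|gt|quot|apos);//g' "$1" | grep -n '&'
}

# demo: a correctly escaped JDBC URL produces no hits
tmp=$(mktemp)
printf '<value>jdbc:mysql://192.168.113.120:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>\n' > "$tmp"
if check_amp "$tmp"; then echo "unescaped & found"; else echo "escaping OK"; fi
```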
Now open Hadoop's core-site.xml and add the following:
vi /usr/local/hadoop/etc/hadoop/core-site.xml
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
Resolve the guava jar conflict.
First sideline Hive's own guava jar:
cd ..
cd lib
mv guava-19.0.jar guava-19.0.jar.bak
# copy Hadoop's guava jar into Hive
cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/hadoop/hive/lib/
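The direction matters here: the newer of the two guava jars must win. A sketch that compares the major versions embedded in the two jar filenames (the filenames are the ones from the commands above; the `guava_major` helper is my own):

```shell
# guava_major JAR: extract the major version number from a guava jar filename.
guava_major() {
  basename "$1" | sed -E 's/^guava-([0-9]+).*/\1/'
}

hive_jar=guava-19.0.jar
hadoop_jar=guava-27.0-jre.jar
if [ "$(guava_major "$hadoop_jar")" -gt "$(guava_major "$hive_jar")" ]; then
  echo "keep $hadoop_jar, sideline $hive_jar"
fi
```

With 27 > 19, Hadoop's jar is kept and Hive's is renamed out of the way, which is exactly what the two commands above do.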
Initialize the metastore schema in MySQL (schematool lives in Hive's bin directory):
cd ../bin
./schematool -initSchema -dbType mysql
Start Hive:
./hive --service metastore &
./hive
List the databases:
show databases;
You can also open 192.168.113.120:9870 to browse the directories Hive creates in HDFS!
More details will be added in future updates!
—— Produced by Xiaohai!