Hive Cluster Deployment Guide (Part 2)

5 Setting up Hadoop

5.1 Extract the installation package and rename the extracted directory

Extract the Hadoop installation package into the /opt directory on the master node:

tar -zxvf hadoop-3.1.3.tar.gz -C /opt

mv /opt/hadoop-3.1.3 /opt/hadoop

5.2 Configure environment variables

vi /etc/profile

export HADOOP_HOME=/opt/hadoop

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$PATH

source /etc/profile
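
After reloading the profile, a quick sanity check (not part of the original steps) confirms that the variables and the Hadoop binaries are visible:

echo $HADOOP_HOME

hadoop version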


5.3 Modify the configuration files

5.3.1 Configure hadoop-env.sh

Enter the Hadoop configuration directory:

cd /opt/hadoop/etc/hadoop/

vi hadoop-env.sh

export JAVA_HOME=/opt/jdk1.8.0_251

export HDFS_NAMENODE_USER=root

export HDFS_DATANODE_USER=root

export HDFS_JOURNALNODE_USER=root

export HDFS_ZKFC_USER=root

export HDFS_SECONDARYNAMENODE_USER=root

export YARN_RESOURCEMANAGER_USER=root

export YARN_NODEMANAGER_USER=root

export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"

export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"

 

After adding the lines above, save and exit.

5.3.2 Configure core-site.xml

Enter the Hadoop configuration directory:

cd /opt/hadoop/etc/hadoop/

vi core-site.xml

<configuration>

<property>

      <name>fs.defaultFS</name>

      <value>hdfs://mycluster</value>

</property>

<property>

      <name>hadoop.tmp.dir</name>

      <value>/srv/data2/hadoop/tmp</value>

</property>

<property>

      <name>ha.zookeeper.quorum</name>    

     <value>openlookeng-gz-gh-0001:2181,openlookeng-gz-gh-0002:2181,openlookeng-gz-gh-0003:2181</value>

</property>

<property>

      <name>hadoop.proxyuser.root.hosts</name>

      <value>*</value>

</property>

<property>

      <name>hadoop.proxyuser.root.groups</name>

      <value>*</value>

</property>

</configuration>

After adding the content above, save the file. Replace the hostnames with the hostnames of your actual nodes, and adjust the value of hadoop.tmp.dir to suit your environment.

5.3.3 Configure hdfs-site.xml

vi hdfs-site.xml

<configuration>

<property>

    <name>dfs.nameservices</name>

    <value>mycluster</value>

</property>

<!-- Number of replicas kept for each file in the HDFS cluster -->

<property>

    <name>dfs.replication</name>

    <value>2</value>               

</property>

<!-- The mycluster nameservice has two NameNodes, nn1 and nn2; the nn3 address entries further below take effect only if nn3 is added to this list -->

<property>

    <name>dfs.ha.namenodes.mycluster</name>

    <value>nn1,nn2</value>

</property>

<!-- RPC address of nn1 -->

<property>

    <name>dfs.namenode.rpc-address.mycluster.nn1</name>

    <value>openlookeng-gz-gh-0001:9000</value>

</property>

<!-- HTTP address of nn1; serves the web UI, which can also be used to browse and download files -->

<property>

    <name>dfs.namenode.http-address.mycluster.nn1</name>

    <value>openlookeng-gz-gh-0001:50070</value>

</property>

<!-- RPC address of nn2 -->

<property>

    <name>dfs.namenode.rpc-address.mycluster.nn2</name>

    <value>openlookeng-gz-gh-0002:9000</value>

</property>

<!-- HTTP address of nn2 -->

<property>

    <name>dfs.namenode.http-address.mycluster.nn2</name>

    <value>openlookeng-gz-gh-0002:50070</value>

</property>

<!-- RPC address of nn3 -->

<property>

    <name>dfs.namenode.rpc-address.mycluster.nn3</name>

    <value>openlookeng-gz-gh-0003:9000</value>

</property>

<!-- HTTP address of nn3 -->

<property>

    <name>dfs.namenode.http-address.mycluster.nn3</name>

    <value>openlookeng-gz-gh-0003:50070</value>

</property>

<!-- Location on the JournalNodes where the NameNode edit log is stored -->

<property>

    <name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://openlookeng-gz-gh-0001:8485;openlookeng-gz-gh-0002:8485;openlookeng-gz-gh-0003:8485/mycluster</value>

</property>

<!-- Local directory where each JournalNode stores its data -->

<property>

    <name>dfs.journalnode.edits.dir</name>

    <value>/srv/data2/hadoop/journal</value>

</property>

<!-- Enable automatic NameNode failover -->

<property>

    <name>dfs.ha.automatic-failover.enabled</name>

    <value>true</value>

</property>

<property>

    <name>dfs.client.failover.proxy.provider.mycluster</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

    <name>dfs.ha.fencing.methods</name>

    <value>

        sshfence

        shell(/bin/true)

    </value>

</property>

<!-- Passwordless SSH is required when using the sshfence fencing method -->

<property>

    <name>dfs.ha.fencing.ssh.private-key-files</name>

    <value>/root/.ssh/id_rsa</value>

</property>

<!-- sshfence connection timeout: if the failed active NameNode cannot be reached over SSH within 30 seconds, fencing falls through to shell(/bin/true) and the standby becomes active -->

<property>

    <name>dfs.ha.fencing.ssh.connect-timeout</name>

    <value>30000</value>

</property>

<property>

   <name>dfs.permissions.enabled</name>

   <value>true</value>

</property>

<property>

   <name>dfs.namenode.acls.enabled</name>

   <value>true</value>

</property>

<property>

    <name>dfs.datanode.data.dir</name>

<value>/srv/data1/hadoop,/srv/data2/hadoop,/srv/data3/hadoop,/srv/data4/hadoop,/srv/data5/hadoop,/srv/data6/hadoop,/srv/data7/hadoop,/srv/data8/hadoop,/srv/data9/hadoop</value>

</property>

</configuration>

Adjust the hostnames and directory paths above to match your environment.
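
Before starting HDFS it is safest to create the local directories referenced above on the relevant nodes (a sketch only; adjust it to the disks actually mounted on your machines):

mkdir -p /srv/data2/hadoop/journal        # on the JournalNode hosts

mkdir -p /srv/data{1..9}/hadoop           # on the DataNode hosts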

5.3.4 Configure mapred-site.xml

vi mapred-site.xml

<configuration>

<property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

</property>

</configuration>
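
The single property above is all the original document configures. On Hadoop 3.x, MapReduce jobs (which Hive's mr execution engine submits) often fail with an MRAppMaster class-not-found error unless the MapReduce home is also passed to the YARN containers. The following optional properties, an addition not present in the original configuration, are a common remedy and would go inside the <configuration> block above:

<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>

<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>

<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/opt/hadoop</value>
</property>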

5.3.5 Configure yarn-site.xml

vi yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

<property>

   <name>yarn.resourcemanager.ha.enabled</name>

   <value>true</value>

</property>

<!-- Cluster ID of the ResourceManager HA pair -->

<property>

   <name>yarn.resourcemanager.cluster-id</name>

   <value>yrc</value>

</property>

<!-- Logical IDs of the ResourceManagers -->

<property>

   <name>yarn.resourcemanager.ha.rm-ids</name>

   <value>rm1,rm2</value>

</property>

<!-- Hostname of each ResourceManager -->

<property>

   <name>yarn.resourcemanager.hostname.rm1</name>

   <value>openlookeng-gz-gh-0002</value>

</property>

<property>

   <name>yarn.resourcemanager.hostname.rm2</name>

   <value>openlookeng-gz-gh-0003</value>

</property>

<!-- Addresses of the ZooKeeper ensemble -->

<property>

   <name>yarn.resourcemanager.zk-address</name>

   <value>openlookeng-gz-gh-0001:2181,openlookeng-gz-gh-0002:2181,openlookeng-gz-gh-0003:2181</value>

</property>

<!-- Reducers fetch map output through the shuffle service -->

<property>

   <name>yarn.nodemanager.aux-services</name>

   <value>mapreduce_shuffle</value>

</property>

<property>

    <name>yarn.nodemanager.vmem-check-enabled</name>

    <value>false</value>

</property>

<property>

    <name>yarn.nodemanager.vmem-pmem-ratio</name>

    <value>5</value>

</property>

</configuration>

Adjust the hostnames above to match your environment.

5.3.6 Configure the slaves and workers files

Write the hostnames of all cluster nodes into the workers file (and into slaves, which is the name used by Hadoop 2.x scripts; Hadoop 3.x itself only reads workers). A one-shot way to populate both files is sketched after the host list.

openlookeng-gz-gh-0001

openlookeng-gz-gh-0002

openlookeng-gz-gh-0003
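
A one-shot way to write both files (a convenience sketch; it overwrites any existing content of the two files):

cd /opt/hadoop/etc/hadoop
cat > workers << 'EOF'
openlookeng-gz-gh-0001
openlookeng-gz-gh-0002
openlookeng-gz-gh-0003
EOF
cp workers slaves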

5.4 Configure the other nodes

Copy the configured Hadoop directory from the master node to the other nodes:

scp -r /opt/hadoop root@192.168.0.200:/opt

scp -r /opt/hadoop root@192.168.0.158:/opt

Configure the environment variables on each node as described in section 5.2.

On every node, create the hadoop.tmp.dir directory configured in core-site.xml:

mkdir -p /srv/data2/hadoop/tmp
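
If passwordless SSH between the nodes is already in place (it is needed for sshfence in any case), the directory can be created on all nodes in one pass; the hostnames below are the ones used throughout this document:

for h in openlookeng-gz-gh-0001 openlookeng-gz-gh-0002 openlookeng-gz-gh-0003; do
    ssh root@$h "mkdir -p /srv/data2/hadoop/tmp"
done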

5.5 Format the NameNode and the ZKFC service

1. Start the JournalNode service on every node:

cd /opt/hadoop/sbin

hdfs --daemon start journalnode

2. On the master node, run the command to format the NameNode:

hdfs namenode -format

3. Copy the metadata directory produced by formatting nn1 to the corresponding directory on nn2:

scp -r /srv/data2/hadoop/tmp/dfs root@192.168.0.200:/srv/data2/hadoop/tmp/
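
As an alternative to copying the directory by hand (not one of the original steps), the second NameNode can pull the formatted metadata itself. With the JournalNodes and the freshly formatted NameNode running, execute the following on the standby NameNode host (here openlookeng-gz-gh-0002):

hdfs namenode -bootstrapStandby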

4. On the master node, run the command to format the ZKFC state in ZooKeeper:

hdfs zkfc -formatZK

5.6 Start Hadoop

On the master node of the cluster, run the script that starts the HDFS and YARN services:

start-all.sh

To stop the services, run stop-all.sh.
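
To verify that the HA cluster came up correctly (a quick check, not part of the original steps), inspect the running daemons and the HA state; nn1/nn2 and rm1/rm2 are the IDs configured in sections 5.3.3 and 5.3.5:

jps

hdfs haadmin -getServiceState nn1

hdfs haadmin -getServiceState nn2

yarn rmadmin -getServiceState rm1

yarn rmadmin -getServiceState rm2

One NameNode and one ResourceManager should report active, the others standby.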

6 Setting up Hive

6.1 Install MySQL

6.1.1 Install MySQL and start the service
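
The original document does not list the installation commands. A minimal sketch for a CentOS/RHEL host with the MySQL community yum repository already configured (the package and service names are assumptions; adjust them for your distribution and MySQL version):

yum install -y mysql-community-server

systemctl start mysqld

systemctl enable mysqld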

6.1.2 Change the root user's password and privileges

  1. Look up the initial password: cat /var/log/mysqld.log | grep password
  2. Log in to the mysql client with the initial password: mysql -u root -p (press Enter, then type the initial password)
  3. Change the root user's password and privileges:

alter user 'root'@'localhost' identified by 'openLooKeng@123';

use mysql;

update user set host='%' where user='root';

select host, user from user;

flush privileges;

grant all PRIVILEGES on *.* to 'root'@'%' WITH GRANT OPTION;

flush privileges;
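
Optionally (not one of the original steps), the metastore database can be created up front instead of relying on createDatabaseIfNotExist=true in the JDBC URL configured later; the name hive matches that URL:

CREATE DATABASE IF NOT EXISTS hive;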

6.2 Configure Hive

6.2.1 Extract the installation package and rename the directory

Extract the Hive installation package into the /opt directory on the master node:

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt

cd /opt && mv apache-hive-3.1.2-bin hive

Then place the MySQL JDBC driver jar into Hive's lib directory, as shown in the example below.
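
For example (the jar file name and location below are assumptions; use the connector version you actually downloaded; the com.mysql.jdbc.Driver class configured later corresponds to the 5.1.x Connector/J):

cp /root/mysql-connector-java-5.1.49.jar /opt/hive/lib/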

6.2.2 Configure environment variables

vi /etc/profile

export HIVE_HOME=/opt/hive

export HIVE_CONF_DIR=/opt/hive/conf

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH

source /etc/profile

6.2.3 Modify the configuration files

Enter the Hive configuration directory:

cd /opt/hive/conf

Copy Hadoop's core-site.xml and hdfs-site.xml into this directory:

cp /opt/hadoop/etc/hadoop/core-site.xml .

cp /opt/hadoop/etc/hadoop/hdfs-site.xml .

 

Create and edit hive-site.xml:

vi hive-site.xml

<configuration>

<property>

    <name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://192.168.0.131:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>

</property>

<property>

    <name>javax.jdo.option.ConnectionDriverName</name>

    <value>com.mysql.jdbc.Driver</value>

    <description>Driver class name for a JDBC metastore</description>

</property>

<property>

    <name>javax.jdo.option.ConnectionUserName</name>

    <value>root</value>

    <description>Username to use against metastore database</description>

</property>

<property>

    <name>javax.jdo.option.ConnectionPassword</name>

    <value>openLooKeng@123</value>

    <description>password to use against metastore database</description>

</property>

<property>

    <name>hive.metastore.schema.verification</name>

    <value>false</value>

</property>

<property>

    <name>hive.metastore.warehouse.dir</name>

    <value>/user/hive/warehouse</value>

    <description>location of default database for the warehouse</description>

</property>

<property>

    <name>hive.metastore.uris</name>

    <value>thrift://192.168.0.37:9083,thrift://192.168.0.157:9083</value>

    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>

</property>

<property>

    <name>hive.exec.dynamic.partition.mode</name>

    <value>nonstrict</value>

</property>

<property>

    <name>hive.server2.thrift.bind.host</name>

    <value/>

    <description>Bind host on which to run the HiveServer2 Thrift service.</description>

</property>

<property>

    <name>hive.server2.thrift.port</name>

    <value>10000</value>

    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>

</property>

<property>

    <name>hive.server2.logging.operation.enabled</name>

    <value>true</value>

    <description>When true, HS2 will save operation logs and make them available for clients</description>

</property>

<property>

    <name>hive.server2.logging.operation.log.location</name>

    <value>/tmp/operation_logs</value>

    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>

</property>

<property>

    <name>hive.execution.engine</name>

    <value>mr</value>

</property>

<property>

    <name>hive.exec.mode.local.auto</name>

    <value>true</value>

</property>

<property>

    <name>hive.support.concurrency</name>

    <value>true</value>

</property>

</configuration>

Adjust the IP addresses, hostnames, directories, and password above to match your environment.

6.2.4 Configure the other nodes

Copy the configured Hive directory from the master node to the other nodes:

scp -r /opt/hive root@192.168.0.200:/opt

scp -r /opt/hive root@192.168.0.158:/opt

Configure the environment variables on each node as described in section 6.2.2.

6.2.5 Initialize the metastore database

  1. Create the Hive directories on HDFS and adjust their permissions:

hdfs dfs -mkdir -p /user/hive/warehouse

hdfs dfs -mkdir -p /tmp/

hdfs dfs -chmod -R 777 /user/hive/warehouse

hdfs dfs -chmod -R 777 /tmp/

  2. On the master node, run the command to initialize the metastore schema:

schematool -initSchema -dbType mysql
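
To confirm that the schema was created successfully (a quick check, not one of the original steps):

schematool -dbType mysql -info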

6.3 Start Hive

Run the command to start the metastore service:

nohup hive --service metastore > metastore.log &

Start the HiveServer2 service, after which the Beeline client can be used to connect:

nohup hive --service hiveserver2 > hiveserver2.log &
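
Both services take a moment to start. To confirm they are listening (a quick check, assuming the ss utility is available; 9083 is the metastore port and 10000 the HiveServer2 port from hive-site.xml):

ss -lntp | grep -E '9083|10000'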

Connect with Beeline as follows:

beeline -u jdbc:hive2://192.168.0.131:10000 -n root
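
Once connected, a short DDL-only smoke test (illustrative statements, not from the original document) confirms that the metastore and the HDFS warehouse directory are usable:

show databases;
create database if not exists smoke_db;
use smoke_db;
create table t1 (id int);
show tables;
drop table t1;
drop database smoke_db;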
