E-commerce Data Warehouse Project (11): Hive Installation and Configuration, plus Tez Setup


This section covers installing and configuring Hive, then switching its execution engine to Tez.

1. Cluster Planning

Install on node01, then sync to node02 and node03.

node01    node02    node03
hive      hive      hive

2. Download and Setup

# 1. Download Hive
[jack@node01 u02]$ wget https://mirror.bit.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

# 2. Extract and rename
[jack@node01 u02]$ tar -zxf apache-hive-3.1.2-bin.tar.gz -C /u01
[jack@node01 u02]$ cd /u01
[jack@node01 u01]$ mv apache-hive-3.1.2-bin hive-3.1.2

# 3. Update environment variables
[jack@node01 u01]$ sudo vi /etc/profile
export HIVE_HOME=/u01/hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
[jack@node01 u01]$ source /etc/profile
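# A quick sanity check that the new variables are visible in the current shell:
[jack@node01 u01]$ echo $HIVE_HOME
/u01/hive-3.1.2
[jack@node01 u01]$ which hive
/u01/hive-3.1.2/bin/hive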

# 4. Create the log directory
[jack@node01 hive-3.1.2]$ mkdir -p logs

# 5. Copy the MySQL JDBC driver and the newer guava jar into Hive's lib directory
[jack@node01 u02]$ scp mysql-connector-java-8.0.22.jar /u01/hive-3.1.2/lib
[jack@node01 u02]$ scp guava-27.0-jre.jar /u01/hive-3.1.2/lib

# 6. Remove the jars that conflict with Hadoop's (Hive's bundled SLF4J binding and guava-19.0)
[jack@node01 u02]$ cd /u01/hive-3.1.2/lib
[jack@node01 lib]$ rm -rf log4j-slf4j-impl-2.10.0.jar guava-19.0.jar

# 7. Create the HDFS warehouse directories
[jack@node01 conf]$ hdfs dfs -mkdir -p /hive
[jack@node01 conf]$ hdfs dfs -mkdir -p /hive/warehouse
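# Optional: HDFS permission checking is disabled in step 11 below, but if it
# were left on, Hive would need write access to the warehouse directory; group
# write is the usual precaution:
[jack@node01 conf]$ hdfs dfs -chmod -R g+w /hive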

# 8. Create the env config file from the template
[jack@node01 conf]$ mv hive-env.sh.template hive-env.sh
[jack@node01 conf]$ vi hive-env.sh
export HADOOP_HOME=/u01/hadoop-3.2.2
export HIVE_CONF_DIR=/u01/hive-3.1.2/conf

# 9. Create the log config file from the template and point it at the logs directory
[jack@node01 conf]$ mv hive-log4j2.properties.template hive-log4j2.properties
[jack@node01 conf]$ vi hive-log4j2.properties
property.hive.log.dir = /u01/hive-3.1.2/logs

# 10. Add the XML config file
# Create hive-site.xml in the conf directory; it mainly configures the MySQL
# driver, the metastore connection username/password, and the HiveServer2 username/password
[jack@node01 conf]$ touch hive-site.xml
[jack@node01 conf]$ vim hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		 <value>jdbc:mysql://node02:3306/hivemeta?createDatabaseIfNotExist=true</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.cj.jdbc.Driver</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>3480abcd</value>
	</property>
	<property>
		<name>datanucleus.schema.autoCreateAll</name>
		<value>true</value>
	</property>
	<!-- Show column names in query results -->
	<property>
		<name>hive.cli.print.header</name>
		<value>true</value>
	</property>
	<!-- Show the current database name in the prompt -->
	<property>
		<name>hive.cli.print.current.db</name>
		<value>true</value>
	</property>
    <!-- Metastore warehouse directory on HDFS -->
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>hdfs://node01:9000/hive/warehouse</value>
		<description>location of default database for the warehouse</description>
	</property>
	<property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
	
	<property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <!-- Port for remote client connections -->
    <property> 
       <name>hive.server2.thrift.port</name> 
       <value>10000</value>
    </property>
    <property> 
       <name>hive.server2.thrift.bind.host</name> 
       <value>node01</value>
    </property>
    <property>
       <name>hive.server2.webui.host</name>
       <value>node01</value>
    </property>
    <!-- HiveServer2 web UI port -->
    <property>
       <name>hive.server2.webui.port</name>
       <value>10002</value>
    </property>
    <property> 
       <name>hive.server2.long.polling.timeout</name> 
       <value>5000</value>                               
    </property>
    <property>
       <name>hive.server2.enable.doAs</name>
       <value>true</value>
    </property>
    <property>
       <name>hive.server2.thrift.client.user</name>
       <value>jack</value>
       <description>Username to use against thrift client</description>
    </property>
    <property>
       <name>hive.server2.thrift.client.password</name>
       <value>3480abcd</value>
       <description>Password to use against thrift client</description>
    </property>
</configuration>
# 11. Modify the Hadoop configuration: allow the Hive user jack to proxy other users, then restart Hadoop
# Add the following to /u01/hadoop-3.2.2/etc/hadoop/core-site.xml
# (strictly, dfs.permissions.enabled belongs in hdfs-site.xml; the intent here is to disable HDFS permission checks)
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hadoop.proxyuser.jack.hosts</name>
    <value>*</value>
  </property>

  <property>
    <name>hadoop.proxyuser.jack.groups</name>
    <value>*</value>
  </property>

3. Initialize the Metastore

  1. Create the hivemeta database in MySQL (createDatabaseIfNotExist=true in the JDBC URL will also create it on first connect).
  2. Initialize the schema:
[jack@node01 bin]$ schematool -initSchema -dbType mysql -verbose
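If initialization succeeds, the metastore tables (DBS, TBLS, COLUMNS_V2, and so on) appear in the hivemeta database; a quick check, using the MySQL credentials from hive-site.xml:

[jack@node01 bin]$ mysql -h node02 -uroot -p3480abcd -e 'use hivemeta; show tables;' | head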
  3. Change the character set of the comment and parameter columns to utf8 so Chinese comments are stored correctly:
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
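
The statements above can be applied in one pass from the shell; a sketch, again assuming the MySQL instance on node02 and the root credentials from hive-site.xml:

[jack@node01 bin]$ mysql -h node02 -uroot -p3480abcd hivemeta <<'SQL'
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
SQL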

4. Starting and Stopping Hive

  1. Startup script
    From Hive 2.x on, the metastore and HiveServer2 services must be started before clients can connect.
    The startup script below lives in /u01/bin under the name hive.sh:
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs

mkdir -p $HIVE_LOG_DIR

# Check whether a process is healthy: argument 1 is the process name, argument 2 its port
function check_process()
{
    # PIDs whose command line matches the given process name
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    # PID of whatever process is listening on the given port
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    # healthy only if the port's owner is among the matched PIDs
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

function hive_start()
{
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
    [ -z "$metapid" ] && eval $cmd || echo "Metastroe服务已启动"
    server2pid=$(check_process HiveServer2 10000)
    cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
    [ -z "$server2pid" ] && eval $cmd || echo "HiveServer2服务已启动"
}

function hive_stop()
{
    metapid=$(check_process HiveMetastore 9083)
    [ "$metapid" ] && kill $metapid || echo "Metastore服务未启动"
    server2pid=$(check_process HiveServer2 10000)
    [ "$server2pid" ] && kill $server2pid || echo "HiveServer2服务未启动"
}

case $1 in
"start")
    hive_start
    ;;
"stop")
    hive_stop
    ;;
"restart")
    hive_stop
    sleep 2
    hive_start
    ;;
"status")
    check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running properly"
    check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running properly"
    ;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' start|stop|restart|status'
    ;;
esac
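Make the script executable before first use:

[jack@node01 bin]$ chmod +x /u01/bin/hive.sh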
  2. Start and stop
[jack@node01 hive-3.1.2]$ cd /u01/bin
[jack@node01 bin]$ hive.sh start
[jack@node01 bin]$ hive.sh stop
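
Once both services are up, a remote connection can be verified with Beeline, using the thrift client credentials from hive-site.xml:

[jack@node01 bin]$ beeline -u jdbc:hive2://node01:10000 -n jack -p 3480abcd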

5. Configuring Tez

  1. Create a tez directory on HDFS and upload tez-0.10.0.tar.gz to it
[jack@node01 /]$ hdfs dfs -mkdir -p /tez
[jack@node01 /]$ hdfs dfs -put /u01/tez/tez-0.10.0.tar.gz /tez
  2. Extract the minimal package locally (the minimal tarball has no top-level directory, so create the target directory first)
[jack@node01 /]$ mkdir -p /u01/tez-0.10.0
[jack@node01 /]$ tar -zxf tez-0.10.0-minimal.tar.gz -C /u01/tez-0.10.0
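A quick check that both copies are where the later steps expect them, the full tarball on HDFS and the minimal jars locally:

[jack@node01 /]$ hdfs dfs -ls /tez
[jack@node01 /]$ ls /u01/tez-0.10.0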
  3. The Tez config file
    Place tez-site.xml in $HADOOP_HOME/etc/hadoop and sync it to the other Hadoop nodes:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez/tez-0.10.0.tar.gz</value>
  </property> 
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
  <property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
  </property>
  
  <property>
     <name>tez.am.resource.memory.mb</name>
     <value>1024</value>
  </property>
  <property>
     <name>tez.am.resource.cpu.vcores</name>
     <value>1</value>
  </property>
  <property>
     <name>tez.task.resource.memory.mb</name>
     <value>1024</value>
  </property>
  <property>
     <name>tez.task.resource.cpu.vcores</name>
     <value>1</value>
  </property>
  <property>
     <name>tez.container.max.java.heap.fraction</name>
     <value>0.2</value>
  </property>
</configuration>
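
tez.lib.uris expands ${fs.defaultFS} from core-site.xml; assuming fs.defaultFS is hdfs://node01:9000 (as the warehouse path in hive-site.xml suggests), the value resolves to hdfs://node01:9000/tez/tez-0.10.0.tar.gz, which can be confirmed up front:

[jack@node01 hadoop]$ hdfs dfs -ls hdfs://node01:9000/tez/tez-0.10.0.tar.gz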
  4. Hadoop configuration changes
    (1) Edit mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
   </property>
   
   <property>
     <name>yarn.app.mapreduce.am.env</name>
     <value>HADOOP_MAPRED_HOME=/u01/hadoop-3.2.2/etc/hadoop:/u01/hadoop-3.2.2/share/hadoop/common/lib/*:/u01/hadoop-3.2.2/share/hadoop/common/*:/u01/hadoop-3.2.2/share/hadoop/hdfs:/u01/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/u01/hadoop-3.2.2/share/hadoop/hdfs/*:/u01/hadoop-3.2.2/share/hadoop/mapreduce/*:/u01/hadoop-3.2.2/share/hadoop/yarn:/u01/hadoop-3.2.2/share/hadoop/yarn/lib/*:/u01/hadoop-3.2.2/share/hadoop/yarn/*</value>
   </property>
   <property>
     <name>mapreduce.map.env</name>
     <value>HADOOP_MAPRED_HOME=/u01/hadoop-3.2.2/etc/hadoop:/u01/hadoop-3.2.2/share/hadoop/common/lib/*:/u01/hadoop-3.2.2/share/hadoop/common/*:/u01/hadoop-3.2.2/share/hadoop/hdfs:/u01/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/u01/hadoop-3.2.2/share/hadoop/hdfs/*:/u01/hadoop-3.2.2/share/hadoop/mapreduce/*:/u01/hadoop-3.2.2/share/hadoop/yarn:/u01/hadoop-3.2.2/share/hadoop/yarn/lib/*:/u01/hadoop-3.2.2/share/hadoop/yarn/*</value>
   </property>
   <property>
     <name>mapreduce.reduce.env</name>
     <value>HADOOP_MAPRED_HOME=/u01/hadoop-3.2.2/etc/hadoop:/u01/hadoop-3.2.2/share/hadoop/common/lib/*:/u01/hadoop-3.2.2/share/hadoop/common/*:/u01/hadoop-3.2.2/share/hadoop/hdfs:/u01/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/u01/hadoop-3.2.2/share/hadoop/hdfs/*:/u01/hadoop-3.2.2/share/hadoop/mapreduce/*:/u01/hadoop-3.2.2/share/hadoop/yarn:/u01/hadoop-3.2.2/share/hadoop/yarn/lib/*:/u01/hadoop-3.2.2/share/hadoop/yarn/*</value>
   </property>

   <property>  
     <name>mapred.map.output.compression.codec</name>  
     <value>com.hadoop.compression.lzo.LzoCodec</value>  
   </property>  

   <property>  
     <name>mapred.child.env</name>  
     <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>  
   </property>
   
   <property>  
     <name>mapred.child.java.opts</name>  
     <value>-Xmx1048m</value>  
   </property> 
   
   <property>  
     <name>mapreduce.map.java.opts</name>  
     <value>-Xmx1310m</value>  
   </property> 
   
   <property>  
     <name>mapreduce.reduce.java.opts</name>  
     <value>-Xmx2620m</value>  
   </property> 
   
   <property>
     <name>mapreduce.job.counters.limit</name>
     <value>20000</value>
     <description>Limit on the number of counters allowed per job. The default value is 200.</description>
   </property>
</configuration>

(2) Edit yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
   <!-- Site specific YARN configuration properties -->
   <!-- How reducers fetch map output -->
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
   
   <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>node01</value>
   </property>
   
   <property>
      <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>7192</value>
   </property>
   
   <property>
      <description>The minimum allocation for every container request at the RM,in MBs. 
	  Memory requests lower than this won't take effect,and the specified value will get allocated at minimum.</description>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>
   </property>


   <property>
      <description>The maximum allocation for every container request at the RM,in MBs. 
	  Memory requests higher than this won't take effect, and will get capped to this value.</description>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>7192</value>
   </property>

   <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
	  <value>false</value>
   </property>
</configuration>

(3) Add a tez.sh file under Hadoop's shellprofile.d directory ($HADOOP_HOME/etc/hadoop/shellprofile.d)

# Register a 'tez' shell profile whose classpath hook appends the Tez jars
hadoop_add_profile tez
function _tez_hadoop_classpath
{
    hadoop_add_classpath "$HADOOP_HOME/etc/hadoop" after
    hadoop_add_classpath "/u01/tez-0.10.0/*" after
    hadoop_add_classpath "/u01/tez-0.10.0/lib/*" after
}
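
Every subsequent hadoop command sources this profile, so the Tez jars should now show up on the Hadoop classpath; a quick check:

[jack@node01 hadoop]$ hadoop classpath | tr ':' '\n' | grep -i tez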

(4) Edit hive-site.xml, adding the following:

    <property>
      <name>hive.execution.engine</name>
      <value>tez</value>
    </property>
	
    <property>
      <name>hive.tez.container.size</name>
      <value>1024</value>
    </property>

(5) Edit hive-env.sh

# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/u01/hadoop-3.2.2

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/u01/hive-3.1.2/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/u01/tez-0.10.0
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

export HIVE_AUX_JARS_PATH=/u01/hadoop-3.2.2/share/hadoop/common/hadoop-lzo-0.4.21.jar$TEZ_JARS

(6) Sync the Hadoop configuration and the Tez directory to the other nodes, then stop and restart Hadoop

[jack@node01 hadoop]$ cd /u01/bin
[jack@node01 bin]$ xsync /u01/hadoop-3.2.2/etc/hadoop
[jack@node01 bin]$ xsync /u01/tez-0.10.0
[jack@node01 bin]$ stop-all.sh
[jack@node01 bin]$ start-all.sh
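
Finally, a smoke test: any query that launches a job should now print Tez DAG progress instead of MapReduce stages. A minimal sketch, with test_tez as a throwaway table:

[jack@node01 bin]$ hive
hive (default)> create table test_tez(id int);
hive (default)> insert into test_tez values (1);
hive (default)> select count(*) from test_tez;
hive (default)> drop table test_tez;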

