Offline E-commerce Data Warehouse (32): System Business Data Warehouse (5), Building the Warehouse, ODS Layer (1): Installing Hive 2.3...

Latest recommended article published on 2021-05-07 17:23:24

zqk666mkq


Article link: https://blog.csdn.net/weixin_28949743/article/details/112048324


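0 Introduction

1) The ODS layer keeps the data in its original form without any modification, so it also serves as a backup.

2) The data is stored with LZO compression to save disk space; 100 GB of data can be compressed to under 10 GB.

3) Tables are created as partitioned tables to avoid full-table scans later on. Partitioned tables are used heavily in production.

4) Tables are created as external tables. In production, apart from temporary tables for personal use, which are created as internal (managed) tables, the vast majority of tables are external.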
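As a rough illustration of points 2) to 4), the sketch below prints a Hive DDL statement for an external, date-partitioned table that reads LZO-compressed text. The function name `ods_table_ddl`, the table name `ods_example_log`, the single `line` column, the `dt` partition column and the HDFS location are all assumptions made for this example and are not defined in this article. The input format class belongs to the hadoop-lzo library, so its jar has to be visible to Hive.

```shell
#!/bin/sh
# Print DDL for an external, partitioned, LZO-backed ODS table.
# $1 = table name (hypothetical), $2 = HDFS directory (hypothetical).
ods_table_ddl() {
    cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS $1 (line STRING)
PARTITIONED BY (dt STRING)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '$2';
EOF
}

# Example call: print the statement for an illustrative table.
ods_table_ddl ods_example_log /warehouse/gmall/ods/ods_example_log
```

The output can be saved to a file and run with `bin/hive -f <file>` once Hive is installed. Because the table is external, dropping it removes only the metadata and leaves the files under the location untouched.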

1 Install Hive

1) Upload apache-hive-2.3.6-bin.tar.gz to /opt/software and extract it to /opt/module

[atguigu@hadoop102 software]$ tar -zxvf apache-hive-2.3.6-bin.tar.gz -C /opt/module/

2) Rename apache-hive-2.3.6-bin to hive

[atguigu@hadoop102 module]$ mv apache-hive-2.3.6-bin hive

3) Copy the MySQL driver mysql-connector-java-5.1.27-bin.jar to /opt/module/hive/lib/

[atguigu@hadoop102 module]$ cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/

4) In /opt/module/hive/conf, create the file hive-site.xml

[atguigu@hadoop102 conf]$ vim hive-site.xml

Add the following content

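<?xml version="1.0"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>
</configuration>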

Note: replace the host name in thrift://hadoop102:9083 with the host name of whichever node Hive is installed on.

5) Start the services

[atguigu@hadoop102 hive]$ nohup bin/hive --service metastore &

[atguigu@hadoop102 hive]$ nohup bin/hive --service hiveserver2 &

Note: Hive 2.x needs both the metastore and hiveserver2 services started here. With hive.metastore.uris set, the CLI connects to the metastore over Thrift, and if that service is not running it fails with: Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

6) Once the services are up, start Hive

[atguigu@hadoop102 hive]$ bin/hive
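Since the metastore takes a few seconds to come up, starting the CLI immediately after the nohup commands can still hit the error above. One possible way to avoid guessing is a small retry helper like the sketch below. The function name `wait_until` is made up for this example, and the port probe in the usage comment assumes that `nc` is installed on the node; neither comes from the article.

```shell
#!/bin/sh
# Retry a command once per second until it succeeds or the timeout expires.
# $1 = timeout in seconds, remaining arguments = command to run.
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (assumption: nc is available and the metastore listens on hadoop102:9083):
#   wait_until 60 nc -z hadoop102 9083 && bin/hive
```

The helper only checks that the command eventually succeeds; it does not verify that the metastore schema was initialised correctly.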

2 Install Tez

1) Copy apache-tez-0.9.1-bin.tar.gz to /opt/software on hadoop102

[atguigu@hadoop102 software]$ ls

apache-tez-0.9.1-bin.tar.gz

2) Upload apache-tez-0.9.1-bin.tar.gz to the /tez directory on HDFS.

[atguigu@hadoop102 conf]$ hadoop fs -mkdir /tez

[atguigu@hadoop102 conf]$ hadoop fs -put /opt/software/apache-tez-0.9.1-bin.tar.gz /tez

3) Extract apache-tez-0.9.1-bin.tar.gz

[atguigu@hadoop102 software]$ tar -zxvf apache-tez-0.9.1-bin.tar.gz -C /opt/module

4) Rename the directory

[atguigu@hadoop102 module]$ mv apache-tez-0.9.1-bin/ tez-0.9.1

3 Integrate Tez

1) Go to the Hive configuration directory: /opt/module/hive/conf

[atguigu@hadoop102 conf]$ pwd

/opt/module/hive/conf

2) Create a tez-site.xml file under /opt/module/hive/conf

[atguigu@hadoop102 conf]$ pwd

/opt/module/hive/conf

[atguigu@hadoop102 conf]$ vim tez-site.xml

Add the following content

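<?xml version="1.0"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/tez/apache-tez-0.9.1-bin.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.history.logging.service.class</name>
        <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
    </property>
</configuration>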

3) In hive-env.sh, add the Tez environment variables and the dependency jar variables

[atguigu@hadoop102 conf]$ mv hive-env.sh.template hive-env.sh

[atguigu@hadoop102 conf]$ vim hive-env.sh

Add the following configuration

# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/module/hadoop-2.7.2

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/module/hive/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export TEZ_HOME=/opt/module/tez-0.9.1    # your Tez extraction directory
export TEZ_JARS=""

# Collect the jars in the Tez root directory
for jar in `ls $TEZ_HOME | grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done

# Collect the jars in the Tez lib directory
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done

# TEZ_JARS already starts with ':', so it is appended directly after the LZO jar
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
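A wrong path in HIVE_AUX_JARS_PATH (for example a missing hadoop-lzo jar) tends to surface only later as a class-not-found error. As a quick sanity check, a helper along these lines could list any entry that does not exist on disk. The function name `missing_jars` is hypothetical and not part of Hive or Tez.

```shell
#!/bin/sh
# Print every entry of a colon-separated jar list that is not an existing file.
# Returns 0 when all entries exist, 1 otherwise.
missing_jars() {
    status=0
    old_ifs=$IFS
    IFS=:
    for entry in $1; do
        # Skip empty entries caused by a leading or doubled colon.
        [ -z "$entry" ] && continue
        if [ ! -f "$entry" ]; then
            echo "$entry"
            status=1
        fi
    done
    IFS=$old_ifs
    return $status
}

# Example (after sourcing hive-env.sh in the current shell):
#   missing_jars "$HIVE_AUX_JARS_PATH" || echo "fix the paths listed above"
```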

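4) In hive-site.xml, add the following property inside the configuration element to switch the Hive execution engine

<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>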

4 Test

1) Start Hive

[atguigu@hadoop102 hive]$ bin/hive

2) Create a table

hive (default)> create table student(

id int,

name string);

3) Insert data into the table

hive (default)> insert into student values(1,"zhangsan");

4) If no error is reported, the setup works

hive (default)> select * from student;

1 zhangsan

5 Notes

1) When running Tez, the process may be killed by the NodeManager for using too much memory:

Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103 For more detailed output, check application tracking page: http://hadoop103:8088/cluster/app/application_1546781144082_0005 Then, click on links to logs of each attempt.

Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.

This happens when a Container running on a worker node tries to use more memory than it is allowed and is killed by the NodeManager. In this log the limit that was exceeded is the virtual-memory one: 2.6 GB used against a limit of 2.1 GB.

[Excerpt] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
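The 2.1 GB figure in the diagnostics follows from how YARN derives the virtual-memory ceiling: it multiplies the container's physical allocation by yarn.nodemanager.vmem-pmem-ratio, whose default value is 2.1. The sketch below reproduces that arithmetic for the 1 GB container in the log; the function name `vmem_limit_mb` is made up for this example.

```shell
#!/bin/sh
# Virtual-memory limit (MB) = physical allocation (MB) * vmem-pmem ratio.
# $1 = container physical memory in MB, $2 = yarn.nodemanager.vmem-pmem-ratio.
vmem_limit_mb() {
    awk -v pmem="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", pmem * ratio }'
}

# 1 GB container with the default ratio of 2.1
vmem_limit_mb 1024 2.1
```

This gives 2150.4 MB, which is the 2.1 GB reported in the log, and the 2.6 GB the container had mapped is above it. Raising the ratio is a possible alternative to switching the check off entirely.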

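2) Solution:

(1) Turn off the virtual memory check by adding the following property to yarn-site.xml:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

(2) After the change, be sure to distribute the file to the other nodes and restart the Hadoop cluster.

[atguigu@hadoop102 hadoop]$ xsync yarn-site.xml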

6 Create the Database

1) Start Hive

[atguigu@hadoop102 hive]$ nohup bin/hive --service metastore &

[atguigu@hadoop102 hive]$ nohup bin/hive --service hiveserver2 &

[atguigu@hadoop102 hive]$ bin/hive

2) List the databases

hive (default)> show databases;

3) Create the database

hive (default)> create database gmall;

4) Switch to the database

hive (default)> use gmall;
