数仓项目分层简介和Hive环境搭建(替换计算引擎为Tez)

一 分层架构

数仓分层架构

  • ODS(Operation Data Sore)原始数据层,将数据原封不动的加载到hive表中
  • DWD(Data Warehouse Detail)数据明细层,对原始数据进行ETL后的数据
  • DWS(Data Warehouse Service) 数据服务层,以DWD为基础进行轻度汇总
  • ADS(application data store)数据应用层,再DWS基础上对数据进行汇总
    命名规范:
  • ODS层命名为ods
  • DWD层命名为dwd
  • DWS层命名为dws
  • ADS层命名为ads
  • 临时表数据库命名为xxx_tmp
  • 备份数据数据库命名为xxx_bak

二 hive 安装(1.2.1)

2.1 集群规划

服务器hadoop102服务器hadoop102服务器hadoop102
hive
mysql

2.2 hive 安装配置

2.2.1 上传hive安装包修改配置文件

修改hive安装目录下conf目录下的hive-env.sh.template为hive.env.sh

export HADOOP_HOME=/opt/module/hadoop-2.7.2
export HIVE_CONF_DIR=/opt/module/hive1.2.1/conf

2.2.2 修改元数据存储为mysql(安装mysql切换至root用户)

2.2.2.1 检查是否安装的有mysql,有则卸载掉自带的
2.2.2.2 安装mysql服务端
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm 

查看生成的密码
查看默认生成密码文件中的密码,复制密码

cat /root/.mysql_secret

2.2.2.3 安装mysql客户端
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm 

使用默认密码登陆后修改密码

mysql -uroot -phT1sbje3p4fP2bQJ
mysql> SET PASSWORD=PASSWORD('root');
 update user set host='%' where host='localhost';
 flush privileges;
2.2.2.4 修改mysql密码和授权用户访问
SET PASSWORD=PASSWORD('root');
delete from mysql.user where host!='localhost'
2.2.2.5 新增hive-site.xml 配置mysql连接四个字符串
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
	  <name>javax.jdo.option.ConnectionURL</name>
	  <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
	  <description>JDBC connect string for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionDriverName</name>
	  <value>com.mysql.jdbc.Driver</value>
	  <description>Driver class name for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionUserName</name>
	  <value>root</value>
	  <description>username to use against metastore database</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionPassword</name>
	  <value>000000</value>
	  <description>password to use against metastore database</description>
	</property>
</configuration>
<!--控制使用hive客户端的时候显示数据库和列名-->
<property>
	<name>hive.cli.print.header</name>
	<value>true</value>
</property>

<property>
	<name>hive.cli.print.current.db</name>
	<value>true</value>
</property>
2.2.2.6 拷贝mysql驱动至hive/lib目录下
cp mysql-connector-java-5.1.27-bin.jar /opt/module/hive1.2.1/lib/

2.2.3 修改hive的log存储目录(默认存在/tmp)

mv hive-log4j.properties.template hive-log4j.properties
vim hive-log4j.properties
hive.log.dir=/opt/module/hive1.2.1/logs/

2.2.4 hive启动测试

尖叫提示:启动hive之前需先启动hadoop
启动之前在hdfs上创建hive的数据目录

hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse

启动hive使用belline连接

/opt/module/hive1.2.1/bin/hiveserver2 &
/opt/module/hive1.2.1/bin/belline
!connect jdbc:hive2://hadoop102:10000
输入用户名回车
0: jdbc:hive2://hadoop102:10000> show databases;
OK
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (1.262 seconds)
0: jdbc:hive2://hadoop102:10000> create table test(id int,name string);
OK
No rows affected (0.29 seconds)
0: jdbc:hive2://hadoop102:10000> show tables;
OK
+-----------+--+
| tab_name  |
+-----------+--+
| test      |
+-----------+--+
1 row selected (0.047 seconds)

查看对应hdfs上的目录
hdfs上数据存放目录

2.2.5 修改计算引擎为Tez

Tez是一个Hive的运行引擎,性能优于MR

2.2.5.1 上传Tez压缩包解压
tar -zxf apache-tez-0.9.1-bin.tar.gz -C /opt/module/

修改文件名为

mv apache-tez-0.9.1-bin/ tez-0.9.1
2.2.5.2 hive配置Tez引擎
1 在hive-env.sh文件中添加tez环境变量配置和依赖包环境变量配置
export TEZ_HOME=/opt/module/tez-0.9.1  #是你的tez的解压目录
export TEZ_JARS=""
for jar in `ls $TEZ_HOME |grep jar`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/$jar
done
for jar in `ls $TEZ_HOME/lib`; do
    export TEZ_JARS=$TEZ_JARS:$TEZ_HOME/lib/$jar
done
export HIVE_AUX_JARS_PATH=/opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20.jar$TEZ_JARS
2 在hive-site.xml文件中添加如下配置,更改hive计算引擎
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
3 在Hive的/opt/module/hive/conf下面创建一个tez-site.xml文件
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
	<name>tez.lib.uris</name>   
	#${fs.defaultFS}/tez/tez-0.9.1 hadoop上对应的目录
	 <value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
	<name>tez.lib.uris.classpath</name>    	
	<value>${fs.defaultFS}/tez/tez-0.9.1,${fs.defaultFS}/tez/tez-0.9.1/lib</value>
</property>
<property>
     <name>tez.use.cluster.hadoop-libs</name>
     <value>true</value>
</property>
<property>
 <name>tez.history.logging.service.class</name>      
 <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
</configuration>
4 将/opt/module/tez-0.9.1上传到HDFS的/tez路径
hadoop fs -mkdir /tez
hadoop fs -put /opt/module/tez-0.9.1/  /tez
5 测试
create table test(id int,name string);
insert into  test values(1,'zhangsan');

运行时界面

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
VERTICES: 00/00  [>>--------------------------] 0%    ELAPSED TIME: 20.10 s    
--------------------------------------------------------------------------------

运行报错:(这种问题是从机上运行的Container试图使用过多的内存,而被NodeManager kill掉了)
240.6 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.

Failing this attempt. Failing the application.
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Application application_1566549334337_0002 failed 2 times due to AM Container for appattempt_1566549334337_0002_000002 exited with  exitCode: -103
For more detailed output, check application tracking page:http://hadoop103:8088/cluster/app/application_1566549334337_0002Then, click on links to logs of each attempt.
Diagnostics: Container [pid=10784,containerID=container_1566549334337_0002_02_000001] is running beyond virtual memory limits. Current usage: 240.6 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1566549334337_0002_02_000001 :

解决:
方案一:或者是关掉虚拟内存检查。我们选这个,修改yarn-site.xml

<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>

方案二:mapred-site.xml中设置Map和Reduce任务的内存配置如下:(value中实际配置的内存需要根据自己机器内存大小及应用情况进行修改)

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
</property>

分发配置文件,重启集群测试成功

hive (default)> insert into test values(1,"zhangsan");
Query ID = guochao_20190823172000_58ae4b86-f130-4dd6-b1bb-8969c0b19f72
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1566551738666_0001)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 8.11 s     
--------------------------------------------------------------------------------
Loading data to table default.test
Table default.test stats: [numFiles=1, numRows=1, totalSize=11, rawDataSize=10]
OK
_col0	_col1
Time taken: 10.39 seconds

总结:

  • 在启动hive的时候需先启动hdfs
  • 为防止权限检测问题,需先在hdfs上创建指定的目录,并修改对应的权限

存在的问题:

  • 由于hive的元数据信息存储在mysql中,所以mysql需要做高可用,可参考技术
    基于keepalive实现mysql的高可用。
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Master_slaves

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值