Hive Data Analysis in Practice: Execution Efficiency and Execution Strategy

Latest recommended article published 2022-05-11 21:02:52

豹先生_MR-BAO


Article link: https://blog.csdn.net/A221133/article/details/6873012


1. First, start the JobTracker on the Hadoop cluster.

2. Start Hive in remote service mode:

nohup hive --service hiveserver &

3. The user relation table, user_relation

Fields: uid1, uid2

Sample data:

1 2
2 1
2 5
5 2

4. Find every user's first-degree friends (pairs of users that appear in the table in both directions):

select a.uid1,a.uid2 from user_relation a join user_relation b on (a.uid2=b.uid1 and a.uid1=b.uid2)
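To make the semantics of this self-join concrete, here is a minimal sketch that runs the same query on the sample rows. It uses Python's built-in `sqlite3` as a stand-in for Hive, so it only illustrates the join logic, not Hive's execution. The extra row `(3, 1)` is hypothetical and is added only to show that a one-way relation is filtered out.

```python
import sqlite3

# In-memory SQLite stands in for Hive; only the join logic is illustrated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_relation (uid1 INTEGER, uid2 INTEGER)")
conn.executemany(
    "INSERT INTO user_relation VALUES (?, ?)",
    [
        (1, 2), (2, 1), (2, 5), (5, 2),  # sample rows from the article
        (3, 1),                          # hypothetical one-way relation
    ],
)

# Same self-join as above: keep (uid1, uid2) only if (uid2, uid1) also exists.
first_degree = conn.execute(
    """
    SELECT a.uid1, a.uid2
    FROM user_relation a
    JOIN user_relation b
      ON (a.uid2 = b.uid1 AND a.uid1 = b.uid2)
    ORDER BY a.uid1, a.uid2
    """
).fetchall()

print(first_degree)  # [(1, 2), (2, 1), (2, 5), (5, 2)]
```

The one-way row `(3, 1)` does not appear in the result because there is no matching `(1, 3)`.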

Total data volume: 198,340,072

Hive runs a single table join as one job.

The result was as follows:

User: hdfs
Job Name: select e.user,e.fans,f.secfans f...f.secfans(Stage-1)
Job File: hdfs://X.X.X.X:9000/home/hdfs/tmp/mapred/staging/hdfs/.staging/job_201110132010_0001/job.xml
Submit Host: XXX
Submit Host Address: X.X.X.X
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Succeeded
Started at: Thu Oct 13 20:20:07 CST 2011
Finished at: Thu Oct 13 21:53:31 CST 2011
Finished in: 1hrs, 33mins, 24sec
Job Cleanup: Successful

5. Find every user's second-degree friends:

select e.uid1,f.uid2
from
(select a.uid1,a.uid2 from user_relation a join user_relation b on (a.uid2=b.uid1 and a.uid1=b.uid2)) e
join
(select a.uid1,a.uid2 from user_relation a join user_relation b on (a.uid2=b.uid1 and a.uid1=b.uid2)) f
on (e.uid2=f.uid1)
where e.uid1<>f.uid2
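As a rough check of what a second-degree query should return, the sketch below runs it on the four sample rows, again with Python's `sqlite3` standing in for Hive. It assumes the two subqueries are linked through the shared friend, so the join condition `e.uid2 = f.uid1` is spelled out explicitly. Without such a condition the join degenerates into a Cartesian product of the two subquery results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_relation (uid1 INTEGER, uid2 INTEGER)")
conn.executemany(
    "INSERT INTO user_relation VALUES (?, ?)",
    [(1, 2), (2, 1), (2, 5), (5, 2)],  # sample rows from the article
)

# Subquery for mutual (first-degree) pairs, reused on both sides of the join.
MUTUAL = """
    SELECT a.uid1, a.uid2
    FROM user_relation a
    JOIN user_relation b
      ON (a.uid2 = b.uid1 AND a.uid1 = b.uid2)
"""

# Assumption: e and f are linked via the shared friend (e.uid2 = f.uid1).
second_degree = conn.execute(
    f"""
    SELECT e.uid1, f.uid2
    FROM ({MUTUAL}) e
    JOIN ({MUTUAL}) f
      ON (e.uid2 = f.uid1)
    WHERE e.uid1 <> f.uid2
    ORDER BY e.uid1, f.uid2
    """
).fetchall()

print(second_degree)  # [(1, 5), (5, 1)]
```

Users 1 and 5 are both friends of user 2 but not of each other, so they are the only second-degree pair in the sample.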

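Total data volume: 198,340,072

When a query joins subqueries, Hive splits it into multiple jobs. The SQL above becomes three jobs, and note that they run serially, one after another.

The results were as follows: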

Stage-1:

Started at: Fri Oct 14 00:21:15 CST 2011
Finished at: Fri Oct 14 01:54:40 CST 2011
Finished in: 1hrs, 33mins, 25sec

Stage-2:

Started at: Fri Oct 14 01:54:42 CST 2011
Finished at: Fri Oct 14 03:24:58 CST 2011
Finished in: 1hrs, 30mins, 16sec

Stage-3:

Started at: Fri Oct 14 03:25:00 CST 2011
Finished at: Fri Oct 14 03:39:59 CST 2011
Finished in: 14mins, 58sec

Total elapsed time: about 3 hours 19 minutes (from 00:21:15 to 03:39:59).
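The total can be checked directly from the start and finish timestamps in the log. The sketch below is plain arithmetic over those values. The job labels in the code are only illustrative names for the three log entries.

```python
from datetime import datetime

# (start, finish) timestamps copied from the three job logs above (14 Oct 2011).
jobs = {
    "job 1": (datetime(2011, 10, 14, 0, 21, 15), datetime(2011, 10, 14, 1, 54, 40)),
    "job 2": (datetime(2011, 10, 14, 1, 54, 42), datetime(2011, 10, 14, 3, 24, 58)),
    "job 3": (datetime(2011, 10, 14, 3, 25, 0), datetime(2011, 10, 14, 3, 39, 59)),
}

# Time actually spent inside the jobs.
durations = [finish - start for start, finish in jobs.values()]
busy_time = durations[0] + durations[1] + durations[2]

# Wall-clock time from the first start to the last finish.
wall_clock = jobs["job 3"][1] - jobs["job 1"][0]

print(busy_time)   # 3:18:40
print(wall_clock)  # 3:18:44
```

The few seconds between the two figures are the gaps between one job finishing and the next one starting, which is consistent with the jobs running strictly one after another.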

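Note: in this test all Hive jobs ran serially, both jobs started by separate programs and the multiple jobs inside a single query. An open question is whether submitting the work as jobs from several programs would allow them to run in parallel.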
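To get a feel for what parallel execution could save, here is a back-of-the-envelope estimate from the reported durations. It assumes that the first two jobs correspond to the two independent subqueries and that the third job joins their outputs. It also assumes the cluster has enough spare capacity for two jobs to run side by side without slowing each other down. That makes the result a best-case lower bound, not a prediction. Hive does have a `hive.exec.parallel` setting for running independent stages of one query concurrently. Whether it is available and effective on the version used here was not verified.

```python
from datetime import timedelta

# "Finished in" values reported for the three jobs.
job1 = timedelta(hours=1, minutes=33, seconds=25)
job2 = timedelta(hours=1, minutes=30, seconds=16)
job3 = timedelta(minutes=14, seconds=58)

# Observed behaviour: the jobs run one after another.
serial = job1 + job2 + job3

# Best case under the stated assumptions: jobs 1 and 2 overlap fully,
# and job 3 starts once the slower of the two has finished.
parallel_best = max(job1, job2) + job3

print(serial)         # 3:18:39
print(parallel_best)  # 1:48:23
```

On a cluster that is already saturated by one job, the two subquery jobs would compete for the same map and reduce slots, so the real gain would be smaller than this estimate.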
