Big Data in Practice: Hive Tips and Techniques

Hoult-吴邪

Article link: https://blog.csdn.net/hu_lichao/article/details/114495167

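This article covers a range of practical Hive techniques: the difference between union and union all, the roles of distribute by and sort by, creating and optimizing bucketed tables, tuning memory and performance for dynamic partitions, ways to resolve Hive out-of-memory errors, SQL query optimization strategies, handling small files and killed tasks, caveats for Hive window functions, client log configuration, and fixes for OOM problems caused by Beeline and the Hivehistory file.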

1. union and union all

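union removes duplicate rows, while union all keeps them. The union all query below returns every row from both branches, duplicates included: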

select sex,address from test where dt='20210218' union all select sex,address from test where dt='20210218';
+------+----------+--+
| sex  | address  |
+------+----------+--+
| m    | A        |
| m    | A        |
| m    | B        |
| m    | B        |
| m    | B        |
| m    | B        |
+------+----------+--+

The same query written with union returns each distinct row only once:

select sex,address from test where dt='20210218' union select sex,address from test where dt='20210218';
+------+----------+--+
| sex  | address  |
+------+----------+--+
| m    | A        |
| m    | B        |
+------+----------+--+
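
To make the relationship between the two operators concrete, here is a minimal sketch that spells out union as union all followed by an explicit distinct. It reuses the test table and the dt='20210218' partition from the queries above. The subquery alias t is an arbitrary name introduced only for illustration.

```sql
-- Sketch: union expressed as union all plus an explicit distinct.
-- Table test and partition dt='20210218' come from the examples above;
-- the alias t is a hypothetical name chosen for this illustration.
select distinct sex, address
from (
  select sex, address from test where dt='20210218'
  union all
  select sex, address from test where dt='20210218'
) t;
```

On the data shown above, this should return the same two distinct rows as the union query. Because deduplication requires an extra aggregation step, union all is usually the cheaper choice when duplicates are acceptable or known not to occur.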