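A GROUP BY query has to run multiple ReduceTasks on the cluster, so it is fairly slow. How can this be sped up?
Enable local mode: instead of submitting the job to the cluster, Hive runs it on the local Hadoop of the machine where Hive is installed. With hive.exec.mode.local.auto=true, Hive makes this choice automatically for jobs whose input is small enough.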
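Hive decides per job whether "small enough" applies. As far as I understand Hive's implementation, a job runs locally only if its total input is not larger than hive.exec.mode.local.auto.inputbytes.max (128 MB by default), it has no more input files than hive.exec.mode.local.auto.input.files.max (4 by default), and it needs at most one reducer. The Python sketch below restates those three checks. should_run_locally is a hypothetical helper, not Hive's code, and the defaults may differ in your Hive version.

```python
# Hypothetical helper restating the checks Hive applies when
# hive.exec.mode.local.auto=true. This is a sketch, not Hive's source code.
MB = 1024 * 1024


def should_run_locally(input_bytes: int, input_files: int, num_reducers: int,
                       max_bytes: int = 128 * MB, max_files: int = 4) -> bool:
    """Return True if a job of this size would qualify for automatic local mode."""
    if input_bytes > max_bytes:   # cf. hive.exec.mode.local.auto.inputbytes.max
        return False
    if input_files > max_files:   # cf. hive.exec.mode.local.auto.input.files.max
        return False
    return num_reducers <= 1      # local mode runs with 0 or 1 reducer only


print(should_run_locally(10 * MB, 1, 1))    # small table: eligible
print(should_run_locally(200 * MB, 1, 1))   # input too large: runs on the cluster
```

A small practice table such as emp normally passes all three checks, so after the switch its GROUP BY can run locally.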
--Check the current state of Hive's automatic local execution mode
set hive.exec.mode.local.auto;
hive (default)> set hive.exec.mode.local.auto;
hive.exec.mode.local.auto=false
--Change it to true, then check again
set hive.exec.mode.local.auto=true;
set hive.exec.mode.local.auto;
--Test the speed (this query failed with an out-of-memory error)
select
deptno, avg(sal) avg_sal
from
emp
group by deptno;
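To see why this query needs a reduce phase at all, here is a toy in-memory simulation of how GROUP BY deptno with avg(sal) is executed as map, shuffle, and reduce. The EMP rows are made-up sample data, not the real emp table, and the function names are hypothetical. In a real job the shuffled keys are also partitioned across the ReduceTasks; the sketch keeps everything in one process.

```python
from collections import defaultdict

# Made-up sample rows (deptno, sal); not the contents of the real emp table.
EMP = [(10, 2450.0), (10, 5000.0), (20, 800.0), (20, 3000.0), (30, 1600.0)]


def map_phase(rows):
    """Map: emit (deptno, (sal, 1)) so the reducer can compute an average."""
    for deptno, sal in rows:
        yield deptno, (sal, 1)


def shuffle(pairs):
    """Shuffle: bring all values that share a key together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: add up salaries and counts per deptno, then divide."""
    result = {}
    for deptno, values in groups.items():
        total = sum(sal for sal, _ in values)
        count = sum(n for _, n in values)
        result[deptno] = total / count
    return result


avg_sal = reduce_phase(shuffle(map_phase(EMP)))
print(sorted(avg_sal.items()))  # [(10, 3725.0), (20, 1900.0), (30, 1600.0)]
```

Emitting (sal, 1) pairs instead of bare salaries keeps the reduce step a simple sum, which is also what lets partial averages be combined correctly.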
The error message printed on the console doesn't reveal the cause,
so check the Hive log:
tail -n 300 hive.log
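If you need to repeat this check often, the same "last 300 lines" scan can be scripted. The Python sketch below assumes the memory error shows up in the log as a Java OutOfMemoryError line; find_oom_in_tail is a hypothetical helper that works on any iterable of lines, such as an open log file.

```python
from collections import deque
from typing import Iterable, List


def find_oom_in_tail(lines: Iterable[str], n: int = 300) -> List[str]:
    """Return the lines among the last n that mention OutOfMemoryError.

    Roughly equivalent to: tail -n 300 hive.log | grep OutOfMemoryError
    (assumption: the failure is logged as a Java OutOfMemoryError).
    """
    tail = deque(lines, maxlen=n)  # keeps only the last n lines in memory
    return [line.rstrip("\n") for line in tail if "OutOfMemoryError" in line]
```

Usage: with open("hive.log", encoding="utf-8", errors="replace") as f: print(find_oom_in_tail(f)). The deque with maxlen avoids loading a large log into memory.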
The log shows that the job ran out of memory.
Check the machine's memory:
free
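Before raising the heap, it helps to confirm the machine can actually spare the memory. On Linux, free reads its figures from /proc/meminfo. The Python sketch below parses that format and checks whether MemAvailable covers a given heap size; parse_meminfo and can_fit_heap are hypothetical helpers for illustration.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def can_fit_heap(info: Dict[str, int], heap_mb: int) -> bool:
    """True if MemAvailable (kB) covers heap_mb megabytes.

    MemAvailable is treated as 0 if missing (older kernels lack this field).
    """
    return info.get("MemAvailable", 0) // 1024 >= heap_mb
```

Usage on Linux: with open("/proc/meminfo") as f: print(can_fit_heap(parse_meminfo(f.read()), 1024)).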
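Rename hive-env.sh.template to hive-env.sh (drop the .template suffix).
Then raise HADOOP_HEAPSIZE from its default of 256 MB to 1024 MB, making sure the export HADOOP_HEAPSIZE=1024 line is not commented out: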
[fengxq@hadoop102 conf]$ mv hive-env.sh.template hive-env.sh
# The heap size of the jvm stared by hive shell script can be controlled via:
#
export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server.
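HADOOP_HEAPSIZE is given as a bare number of megabytes. Hadoop's launcher scripts turn a bare number into the JVM flag -Xmx<number>m, so 1024 becomes -Xmx1024m. The Python sketch below mimics that rule. heapsize_to_xmx is a hypothetical helper, and passing through values that already carry a unit (such as 2g) follows the Hadoop 3 scripts; treat that part as an assumption for other versions.

```python
def heapsize_to_xmx(value: str) -> str:
    """Turn a HADOOP_HEAPSIZE-style value into a JVM -Xmx flag.

    A bare number is read as megabytes, so "m" is appended.
    Assumption: a value that already has a unit suffix is passed through.
    """
    value = value.strip()
    if value.isdigit():
        value += "m"
    return "-Xmx" + value


print(heapsize_to_xmx("256"))   # -Xmx256m  (Hive's default heap)
print(heapsize_to_xmx("1024"))  # -Xmx1024m (the new setting)
```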
Restart the Hive services with the hiveservices.sh script used in this setup:
./bin/hiveservices.sh restart
After the restart, the change takes effect.