Hive dynamic partitioning: when a job creates too many dynamic partitions, the map tasks run out of memory and the container is killed with the following error.
containerID=container_e86_1608865192015_2953765_01_000002] is running beyond physical memory limits. Current usage: 3.0 GB of 3 GB physical memory used; 5.1 GB of 6.3 GB virtual memory used. Killing container.
Dump of the process-tree for container_e86_1608865192015_2953765_01_000002 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
attempt_1608865192015_2953765_m_000000_0 94557999988738 1>/data/data24/yarn/container-logs/application_1608865192015_2953765/container_e86_1608865192015_2953765_01_000002/stdout 2>/data/data24/yarn/container-attempt_1608865192015_2953765_m_000000_0 94557999988738
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Solution: enable hive.optimize.sort.dynamic.partition by setting it to true. With this optimization, the map-only MapReduce job gains a reduce phase, and rows are sorted by the dynamic partition column (for example, the date) before they reach the reducers. Because the partition column arrives sorted, each reducer only needs to keep a single file writer open at any given time; once it has received all rows for a particular partition, it closes that record writer, which greatly reduces memory pressure. This optimization uses noticeably less memory when writing Parquet files, at the cost of sorting on the partition column.
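A minimal sketch of how the setting might be applied in a session, alongside the usual dynamic-partition switches (the limit values are illustrative, and target_table, source_table, and the dt partition column are placeholder names, not from the original job):

```sql
-- Route rows through a reduce phase sorted by the partition column,
-- so each reducer keeps only one open file writer at a time.
SET hive.optimize.sort.dynamic.partition = true;

-- Standard dynamic-partition settings (tune limits to your workload).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.max.dynamic.partitions = 1000;
SET hive.exec.max.dynamic.partitions.pernode = 100;

-- Hypothetical dynamic-partition insert: Hive derives the dt partition
-- value from the last column of the SELECT list.
INSERT OVERWRITE TABLE target_table PARTITION (dt)
SELECT col1, col2, dt
FROM source_table;
```

Note that on newer Hive versions the property name and default behavior may differ (e.g. hive.optimize.sort.dynamic.partition.threshold in Hive 3.x), so check the documentation for the version in use.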