Problem 1:
[Solved] By default, Hive caps the number of dynamic partitions a single statement may create at 100; a fully dynamic insert that generates more than 100 partitions fails with an exception.
The 100-partition cap can be lifted by raising Hive's dynamic-partition limits (the same SET parameters shown under Problem 2).
With the limit raised to 100000, a statement inserting several years of daily partitions (well over 1000 of them) executed successfully.
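As a sketch, a typical session setup for a large dynamic-partition insert looks like the following. The parameter names are standard Hive configuration properties; the values of 100000 match those used below, and the first two settings (enabling dynamic partitioning and nonstrict mode) are assumptions about this environment, since most fully dynamic inserts require them:

```sql
-- Enable dynamic partitioning and allow all partition columns to be dynamic
-- (assumed prerequisites; defaults vary by Hive version).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Raise the global and per-node dynamic-partition limits.
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
```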
Problem 2:
Too many dynamic partitions can cause the job to fail with:
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /tmp/hive-maintain/hive_2012-11-28_22-39-43_810_1689858262130334284/_task_tmp.-ext-10002/part=33436268/_tmp.000004_0 File does not exist. Holder DFSClient_attempt_201211250925_9859_m_000004_0 does not have any open files.
Solution:
Increase the allowed number of dynamic partitions. The statements below apply the settings to the current session only.
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
When importing data into Hive from HBase or HDFS, a large data volume can exhaust executor memory. Adding a DISTRIBUTE BY clause routes all rows for a given partition to the same reducer, so each task writes far fewer partitions at once:
insert overwrite table ods.cms_entity PARTITION (DAY)
select
  ENTITY_ID,
  ENTITY_NAME,
  ENTITY_DESC,
  ENTITY_TYPE,
  ENTITY_PID,
  ENTITY_TIME,
  ENTITY_PRIORITY,
  ENTITY_STATUS,
  ENTITY_CHANNEL,
  ENTITY_EDITOR,
  ENTITY_TEMPLATE,
  ENTITY_URL,
  ENTITY_CATEGORY,
  ENTITY_PARAM,
  ENTITY_SHORTNAME,
  ENTITY_SUBTYPE,
  ENTITY_COMPDELAY,
  day
from stage.cms_entity_by_day
distribute by day;