hive 分区值支持中文

最新推荐文章于 2024-05-03 08:00:00 发布

陈淀薄发

最新推荐文章于 2024-05-03 08:00:00 发布

阅读量3.9k

点赞数 11

分类专栏： hive 文章标签： hive

本文链接：https://blog.csdn.net/weixin_43979134/article/details/121661350

版权

hive 专栏收录该内容

5 篇文章

订阅专栏

本文详细探讨了在Hive中遇到中文分区名插入失败的问题，通过MySQL字符集排查，发现PARTITIONS表使用了latin1字符集导致冲突。解决办法包括修改表字符集至utf8，以及调整关键字段长度。重点在于如何确保字符集兼容性以避免此类问题。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

背景

hive 插入分区时（insert into 或者add partition等操作）分区值是中文的时候，就出现报错。
alter table test add partition(part=“中文”);
hivemetastore报错信息

2021-12-01T15:25:03,474 ERROR [pool-8-thread-134] bonecp.ConnectionHandle: Database access problem. Killing off this connection and all remaining connections in the connection pool. SQL State = HY000
2021-12-01T15:25:03,478 ERROR [pool-8-thread-134] DataNucleus.Transaction: Operation rollback failed on resource: org.datanucleus.store.rdbms.ConnectionFactoryImpl$EmulatedXAResource@6e8aa02d, error code UNKNOWN and transaction: [DataNucleus Transaction, ID=Xid=
2021-12-01T15:25:03,479 ERROR [pool-8-thread-134] metastore.RetryingHMSHandler: Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOException: Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS `NUCLEUS_TYPE`,`A0`.`CREATE_TIME`,`A0`.`LAST_ACCESS_TIME`,`A0`.`PART_NAME`,`A0`.`PART_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID` LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID` WHERE `B0`.`TBL_NAME` = ? AND `C0`.`NAME` = ? AND `A0`.`PART_NAME` = ?

原因定位

怀疑是中英文的问题，但是还是不能确认。进入mysql控制台，执行上面的命令（需要通过开启mysql通用日志获取到上面错误日志中？的内容）
执行后出现这个错误

ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

这个时候，可以确定是字符集的问题了。
确认一下，果然PARTITIONS表的字符集是latin1，不支持中文

MariaDB [hivemeta]> show create table PARTITIONS;
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PARTITIONS | CREATE TABLE `PARTITIONS` (
  `PART_ID` bigint(20) NOT NULL,
  `CREATE_TIME` int(11) NOT NULL,
  `LAST_ACCESS_TIME` int(11) NOT NULL,
  `PART_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
  `SD_ID` bigint(20) DEFAULT NULL,
  `TBL_ID` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`PART_ID`),
  UNIQUE KEY `UNIQUEPARTITION` (`PART_NAME`,`TBL_ID`),
  KEY `PARTITIONS_N49` (`TBL_ID`),
  KEY `PARTITIONS_N50` (`SD_ID`),
  CONSTRAINT `PARTITIONS_FK1` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`),
  CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

解决方案

在hive连接的mysql中执行下面命令，更改字符集为utf8

alter table PARTITIONS default character set utf8;
alter table PARTITION_KEY_VALS default character set utf8;
alter table SDS default character set utf8;
field coding
alter table PARTITIONS modify column PART_name varchar(190) character set utf8;
alter table PARTITION_KEY_VALS modify column PART_KEY_VAL varchar(256) character set utf8;
alter table SDS modify column LOCATION varchar(4000) character set utf8;

细节说明

MariaDB [hivemeta]> alter table PARTITIONS modify column PART_NAME varchar(767) CHARACTER SET utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
原因：PART_NAME是索引建，按照规定最大的长度就是767字节，而设置为utf8字符集之后，就相当于一个utf8字符要占用3个字节，因此就不能直接设置767，设置200就不会报错了。