背景
hive 插入分区时(insert into 或者add partition等操作)分区值是中文的时候,就出现报错。
alter table test add partition(part=“中文”);
hivemetastore报错信息
2021-12-01T15:25:03,474 ERROR [pool-8-thread-134] bonecp.ConnectionHandle: Database access problem. Killing off this connection and all remaining connections in the connection pool. SQL State = HY000
2021-12-01T15:25:03,478 ERROR [pool-8-thread-134] DataNucleus.Transaction: Operation rollback failed on resource: org.datanucleus.store.rdbms.ConnectionFactoryImpl$EmulatedXAResource@6e8aa02d, error code UNKNOWN and transaction: [DataNucleus Transaction, ID=Xid=
2021-12-01T15:25:03,479 ERROR [pool-8-thread-134] metastore.RetryingHMSHandler: Retrying HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOException: Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS `NUCLEUS_TYPE`,`A0`.`CREATE_TIME`,`A0`.`LAST_ACCESS_TIME`,`A0`.`PART_NAME`,`A0`.`PART_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID` LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID` WHERE `B0`.`TBL_NAME` = ? AND `C0`.`NAME` = ? AND `A0`.`PART_NAME` = ?
原因定位
怀疑是中英文的问题,但是还是不能确认。进入mysql控制台,执行上面的命令(需要通过开启mysql通用日志获取到上面错误日志中?的内容)
执行后出现这个错误
ERROR 1267 (HY000): Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
这个时候,可以确定是字符集的问题了。
确认一下,果然PARTITIONS表的字符集是latin1,不支持中文
MariaDB [hivemeta]> show create table PARTITIONS;
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PARTITIONS | CREATE TABLE `PARTITIONS` (
`PART_ID` bigint(20) NOT NULL,
`CREATE_TIME` int(11) NOT NULL,
`LAST_ACCESS_TIME` int(11) NOT NULL,
`PART_NAME` varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
`SD_ID` bigint(20) DEFAULT NULL,
`TBL_ID` bigint(20) DEFAULT NULL,
PRIMARY KEY (`PART_ID`),
UNIQUE KEY `UNIQUEPARTITION` (`PART_NAME`,`TBL_ID`),
KEY `PARTITIONS_N49` (`TBL_ID`),
KEY `PARTITIONS_N50` (`SD_ID`),
CONSTRAINT `PARTITIONS_FK1` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`),
CONSTRAINT `PARTITIONS_FK2` FOREIGN KEY (`SD_ID`) REFERENCES `SDS` (`SD_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
解决方案
在hive连接的mysql中执行下面命令,更改字符集为utf8
alter table PARTITIONS default character set utf8;
alter table PARTITION_KEY_VALS default character set utf8;
alter table SDS default character set utf8;
field coding
alter table PARTITIONS modify column PART_name varchar(190) character set utf8;
alter table PARTITION_KEY_VALS modify column PART_KEY_VAL varchar(256) character set utf8;
alter table SDS modify column LOCATION varchar(4000) character set utf8;
细节说明
- MariaDB [hivemeta]> alter table PARTITIONS modify column PART_NAME varchar(767) CHARACTER SET utf8;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
原因:PART_NAME是索引建,按照规定最大的长度就是767字节,而设置为utf8字符集之后,就相当于一个utf8字符要占用3个字节,因此就不能直接设置767,设置200就不会报错了。
mysql的日志开启
-
show variables like ‘general_log’; – 查看日志是否开启
-
set global general_log=on; – 开启日志功能
-
show variables like ‘general_log_file’; – 看看日志文件保存位置