使用Hive创建动态分区时,如果分区中含有中文,会报以下错误。
Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation ‘=’
原因应该是分区表的编码和全局编码不相同造成的
提供两种解决方案:
- 方案一:修改mysql配置
临时修改:进入mysql中设置如下参数
set character_set_client = utf8;
set character_set_connection = utf8;
set character_set_results = utf8;
SET collation_server = utf8_general_ci
SET collation_database = utf8_general_ci
永久修改:修改mysql配置文件后重启
[root@ambari03 etc] vim /etc/my.cnf
# [client]下添加如下内容
[client]
default-character-set=utf8
# [mysqld]下添加如下内容
[mysqld]
default-character-set=utf8
init_connect='SET NAMES utf8'
# [mysql]下添加如下内容
[mysql]
default-character-set=utf8
然后systemctl restart mysqld
重启mysql服务
使用该方法后仍未能解决问题!
- 方案二:修改mysql中Hive元数据库各个表的编码
进入mysql后,执行以下语句修改hive元数据表的编码信息
alter database hive_meta default character set utf8;
alter table BUCKETING_COLS default character set utf8;
alter table CDS default character set utf8;
alter table COLUMNS_V2 default character set utf8;
alter table DATABASE_PARAMS default character set utf8;
alter table DBS default character set utf8;
alter table FUNCS default character set utf8;
alter table FUNC_RU default character set utf8;
alter table GLOBAL_PRIVS default character set utf8;
alter table PARTITIONS default character set utf8;
alter table PARTITION_KEYS default character set utf8;
alter table PARTITION_KEY_VALS default character set utf8;
alter table PARTITION_PARAMS default character set utf8;
-- alter table PART_COL_STATS default character set utf8;
alter table ROLES default character set utf8;
alter table SDS default character set utf8;
alter table SD_PARAMS default character set utf8;
alter table SEQUENCE_TABLE default character set utf8;
alter table SERDES default character set utf8;
alter table SERDE_PARAMS default character set utf8;
alter table SKEWED_COL_NAMES default character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP default character set utf8;
alter table SKEWED_STRING_LIST default character set utf8;
alter table SKEWED_STRING_LIST_VALUES default character set utf8;
alter table SKEWED_VALUES default character set utf8;
alter table SORT_COLS default character set utf8;
alter table TABLE_PARAMS default character set utf8;
alter table TAB_COL_STATS default character set utf8;
alter table TBLS default character set utf8;
alter table VERSION default character set utf8;
alter table BUCKETING_COLS convert to character set utf8;
alter table CDS convert to character set utf8;
alter table COLUMNS_V2 convert to character set utf8;
alter table DATABASE_PARAMS convert to character set utf8;
alter table DBS convert to character set utf8;
alter table FUNCS convert to character set utf8;
alter table FUNC_RU convert to character set utf8;
alter table GLOBAL_PRIVS convert to character set utf8;
alter table PARTITIONS convert to character set utf8;
alter table PARTITION_KEYS convert to character set utf8;
alter table PARTITION_KEY_VALS convert to character set utf8;
alter table PARTITION_PARAMS convert to character set utf8;
-- alter table PART_COL_STATS convert to character set utf8;
alter table ROLES convert to character set utf8;
alter table SDS convert to character set utf8;
alter table SD_PARAMS convert to character set utf8;
alter table SEQUENCE_TABLE convert to character set utf8;
alter table SERDES convert to character set utf8;
alter table SERDE_PARAMS convert to character set utf8;
alter table SKEWED_COL_NAMES convert to character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP convert to character set utf8;
alter table SKEWED_STRING_LIST convert to character set utf8;
alter table SKEWED_STRING_LIST_VALUES convert to character set utf8;
alter table SKEWED_VALUES convert to character set utf8;
alter table SORT_COLS convert to character set utf8;
alter table TABLE_PARAMS convert to character set utf8;
alter table TAB_COL_STATS convert to character set utf8;
alter table TBLS convert to character set utf8;
alter table VERSION convert to character set utf8;
-- alter table PART_COL_STATS convert to character set utf8;
SET character_set_client = utf8 ;
-- SET character_set_connection = utf8 ;
-- alter table PART_COL_STATS convert to character set utf8;
SET character_set_database = utf8 ;
SET character_set_results = utf8 ;
SET character_set_server = utf8 ;
-- SET collation_connection = utf8 ;
-- SET collation_database = utf8 ;
-- SET collation_server = utf8 ;
SET NAMES 'utf8';
修改后能够插入带有中文分区的表格,但是插入后虽然可以查询到数据已经插入,Hive仍然会报错
查看日志后定位到如下错误
2021-02-23T10:54:26,249 ERROR [HiveServer2-Background-Pool: Thread-248] exec.StatsTask: Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Insert of object "org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics@710c136f" using statement "INSERT INTO `PART_COL_STATS` (`CS_ID`,`AVG_COL_LEN`,`BIT_VECTOR`,`CAT_NAME`,`COLUMN_NAME`,`COLUMN_TYPE`,`DB_NAME`,`BIG_DECIMAL_HIGH_VALUE`,`BIG_DECIMAL_LOW_VALUE`,`DOUBLE_HIGH_VALUE`,`DOUBLE_LOW_VALUE`,`LAST_ANALYZED`,`LONG_HIGH_VALUE`,`LONG_LOW_VALUE`,`MAX_COL_LEN`,`NUM_DISTINCTS`,`NUM_FALSES`,`NUM_NULLS`,`NUM_TRUES`,`PART_ID`,`PARTITION_NAME`,`TABLE_NAME`) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" failed : Incorrect string value: '\xE6\xB9\x96\xE5\x8C\x97' for column 'PARTITION_NAME' at row 1)
Mysql中查看metastore库的ART_COL_STATS
表
use metastore;
-- show create table PART_COL_STATS;
show full columns from PART_COL_STATS;
部分字段结果如下
Field | Type | Collation |
---|---|---|
CS_ID | bigint(20) | |
CAT_NAME | varchar(256) | latin1_bin |
DB_NAME | varchar(128) | latin1_bin |
TABLE_NAME | varchar(256) | latin1_bin |
PARTITION_NAME | varchar(500) | latin1_bin |
COLUMN_NAME | varchar(767) | latin1_bin |
COLUMN_TYPE | varchar(128) | latin1_bin |
PART_ID | bigint(20) | |
LONG_LOW_VALUE | bigint(20) | |
LONG_HIGH_VALUE | bigint(20) | |
DOUBLE_HIGH_VALUE | double(53,4) | |
DOUBLE_LOW_VALUE | double(53,4) | |
BIG_DECIMAL_LOW_VALUE | varchar(4000) | latin1_bin |
BIG_DECIMAL_HIGH_VALUE | varchar(4000) | latin1_bin |
NUM_NULLS | bigint(20) | |
NUM_DISTINCTS | bigint(20) | |
BIT_VECTOR | blob | |
AVG_COL_LEN | double(53,4) | |
MAX_COL_LEN | bigint(20) | |
NUM_TRUES | bigint(20) | |
NUM_FALSES | bigint(20) | |
LAST_ANALYZED | bigint(20) |
可以看到需要PARTITION_NAME
字段编码仍然为latin1,因此需要将PARTITION_NAME
字段编码修改为utf8
use metastore;
alter table PART_COL_STATS modify column PARTITION_NAME varchar(500) character set utf8;
修改后重新插入带有中文分区的表格,可以正常运行
参考案例如下
create database mydb;
use mydb;
-- ① 创建不带分区的stu表
create table stu
(
name string,
age int
) row format delimited fields terminated by '\t';
-- ② 上传数据到hdfs对应目录
-- ③ 创建分区表emp
create table emp
(
name string,
age int
)
partitioned by (provice string)
row format delimited fields terminated by '\t';
-- ④ 插入数据含有中文分区的emp表中
insert into emp partition(provice = "湖北")
select * from stu;
参考资料:
hive报错Illegal mix of collations