mysql skewed_Hive分区字段含中文报错问题解决方案

使用Hive创建动态分区时,如果分区中含有中文,会报以下错误。

Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation ‘=’

复制代码

原因应该是分区表的编码和全局编码不相同造成的

提供两种解决方案:

方案一:修改mysql配置

临时修改:进入mysql中设置如下参数

set character_set_client = utf8;

set character_set_connection = utf8;

set character_set_results = utf8;

SET collation_server = utf8_general_ci

SET collation_database = utf8_general_ci

复制代码

永久修改:修改mysql配置文件后重启

[root@ambari03 etc] vim /etc/my.cnf

[client]下添加如下内容

[client]

default-character-set=utf8

[mysqld]下添加如下内容

[mysqld]

default-character-set=utf8

init_connect=’SET NAMES utf8′

[mysql]下添加如下内容

[mysql]

default-character-set=utf8

复制代码

然后systemctl restart mysqld重启mysql服务

使用该方法后仍未能解决问题!

方案二:修改mysql中Hive元数据库各个表的编码

进入mysql后,执行以下语句修改hive元数据表的编码信息

alter database hive_meta default character set utf8;

alter table BUCKETING_COLS default character set utf8;

alter table CDS default character set utf8;

alter table COLUMNS_V2 default character set utf8;

alter table DATABASE_PARAMS default character set utf8;

alter table DBS default character set utf8;

alter table FUNCS default character set utf8;

alter table FUNC_RU default character set utf8;

alter table GLOBAL_PRIVS default character set utf8;

alter table PARTITIONS default character set utf8;

alter table PARTITION_KEYS default character set utf8;

alter table PARTITION_KEY_VALS default character set utf8;

alter table PARTITION_PARAMS default character set utf8;

— alter table PART_COL_STATS default character set utf8;

alter table ROLES default character set utf8;

alter table SDS default character set utf8;

alter table SD_PARAMS default character set utf8;

alter table SEQUENCE_TABLE default character set utf8;

alter table SERDES default character set utf8;

alter table SERDE_PARAMS default character set utf8;

alter table SKEWED_COL_NAMES default character set utf8;

alter table SKEWED_COL_VALUE_LOC_MAP default character set utf8;

alter table SKEWED_STRING_LIST default character set utf8;

alter table SKEWED_STRING_LIST_VALUES default character set utf8;

alter table SKEWED_VALUES default character set utf8;

alter table SORT_COLS default character set utf8;

alter table TABLE_PARAMS default character set utf8;

alter table TAB_COL_STATS default character set utf8;

alter table TBLS default character set utf8;

alter table VERSION default character set utf8;

alter table BUCKETING_COLS convert to character set utf8;

alter table CDS convert to character set utf8;

alter table COLUMNS_V2 convert to character set utf8;

alter table DATABASE_PARAMS convert to character set utf8;

alter table DBS convert to character set utf8;

alter table FUNCS convert to character set utf8;

alter table FUNC_RU convert to character set utf8;

alter table GLOBAL_PRIVS convert to character set utf8;

alter table PARTITIONS convert to character set utf8;

alter table PARTITION_KEYS convert to character set utf8;

alter table PARTITION_KEY_VALS convert to character set utf8;

alter table PARTITION_PARAMS convert to character set utf8;

— alter table PART_COL_STATS convert to character set utf8;

alter table ROLES convert to character set utf8;

alter table SDS convert to character set utf8;

alter table SD_PARAMS convert to character set utf8;

alter table SEQUENCE_TABLE convert to character set utf8;

alter table SERDES convert to character set utf8;

alter table SERDE_PARAMS convert to character set utf8;

alter table SKEWED_COL_NAMES convert to character set utf8;

alter table SKEWED_COL_VALUE_LOC_MAP convert to character set utf8;

alter table SKEWED_STRING_LIST convert to character set utf8;

alter table SKEWED_STRING_LIST_VALUES convert to character set utf8;

alter table SKEWED_VALUES convert to character set utf8;

alter table SORT_COLS convert to character set utf8;

alter table TABLE_PARAMS convert to character set utf8;

alter table TAB_COL_STATS convert to character set utf8;

alter table TBLS convert to character set utf8;

alter table VERSION convert to character set utf8;

— alter table PART_COL_STATS convert to character set utf8;

SET character_set_client = utf8 ;

— SET character_set_connection = utf8 ;

— alter table PART_COL_STATS convert to character set utf8;

SET character_set_database = utf8 ;

SET character_set_results = utf8 ;

SET character_set_server = utf8 ;

— SET collation_connection = utf8 ;

— SET collation_database = utf8 ;

— SET collation_server = utf8 ;

SET NAMES ‘utf8’;

复制代码

修改后能够插入带有中文分区的表格,但是插入后虽然可以查询到数据已经插入,Hive仍然会报错

查看日志后定位到如下错误

2021-02-23T10:54:26,249 ERROR [HiveServer2-Background-Pool: Thread-248] exec.StatsTask: Failed to run stats task

org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Insert of object “org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics@710c136f” using statement “INSERT INTOPART_COL_STATS

(CS_ID

,AVG_COL_LEN

,BIT_VECTOR

,CAT_NAME

,COLUMN_NAME

,COLUMN_TYPE

,DB_NAME

,BIG_DECIMAL_HIGH_VALUE

,BIG_DECIMAL_LOW_VALUE

,DOUBLE_HIGH_VALUE

,DOUBLE_LOW_VALUE

,LAST_ANALYZED

,LONG_HIGH_VALUE

,LONG_LOW_VALUE

,MAX_COL_LEN

,NUM_DISTINCTS

,NUM_FALSES

,NUM_NULLS

,NUM_TRUES

,PART_ID

,PARTITION_NAME

,TABLE_NAME

) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)” failed : Incorrect string value: ‘\xE6\xB9\x96\xE5\x8C\x97’ for column ‘PARTITION_NAME’ at row 1)

复制代码

Mysql中查看metastore库的ART_COL_STATS表

use metastore;

— show create table PART_COL_STATS;

show full columns from PART_COL_STATS;

复制代码

部分字段结果如下

Field Type Collation CS_ID bigint(20) CAT_NAME varchar(256) latin1_bin DB_NAME varchar(128) latin1_bin TABLE_NAME varchar(256) latin1_bin PARTITION_NAME varchar(500) latin1_bin COLUMN_NAME varchar(767) latin1_bin COLUMN_TYPE varchar(128) latin1_bin PART_ID bigint(20) LONG_LOW_VALUE bigint(20) LONG_HIGH_VALUE bigint(20) DOUBLE_HIGH_VALUE double(53,4) DOUBLE_LOW_VALUE double(53,4) BIG_DECIMAL_LOW_VALUE varchar(4000) latin1_bin BIG_DECIMAL_HIGH_VALUE varchar(4000) latin1_bin NUM_NULLS bigint(20) NUM_DISTINCTS bigint(20) BIT_VECTOR blob AVG_COL_LEN double(53,4) MAX_COL_LEN bigint(20) NUM_TRUES bigint(20) NUM_FALSES bigint(20) LAST_ANALYZED bigint(20)

可以看到需要PARTITION_NAME字段编码仍然为latin1,因此需要将PARTITION_NAME字段编码修改为utf8

use metastore;

alter table PART_COL_STATS modify column PARTITION_NAME varchar(500) character set utf8;

复制代码

修改后重新插入带有中文分区的表格,可以正常运行

参考案例如下

create database mydb;

use mydb;

— ① 创建不带分区的stu表

create table stu

(

name string,

age int

) row format delimited fields terminated by ‘\t’;

— ② 上传数据到hdfs对应目录

— ③ 创建分区表emp

create table emp

(

name string,

age int

)

partitioned by (provice string)

row format delimited fields terminated by ‘\t’;

— ④ 插入数据含有中文分区的emp表中

insert into emp partition(provice = “湖北”)

select * from stu;

复制代码

参考资料:

hive报错Illegal mix of collations

Hive 解决中文分区问题 Illegal mix of collations

Hive 中文分区 展示 乱码 注释乱码问题

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值