Solving the Hive Error Caused by Chinese Characters in Partition Values

This article walks through the "illegal mix of collations" error triggered by Chinese partition values in Hive and presents two solutions: modifying the MySQL configuration, and changing the character set of the Hive metastore tables. A worked example shows how to fix both garbled Chinese partition names and insert failures.

When creating dynamic partitions in Hive, if a partition value contains Chinese characters, the following error is raised:

Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

The likely cause is a mismatch between the character set of the metastore's partition tables (latin1) and the character set of the connection (utf8).
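To confirm the mismatch, compare the server's character set and collation variables with the collation the metastore tables actually use. This is a diagnostic sketch; it assumes the metastore database is named hive_meta (yours may be metastore or something else):

```sql
-- Session/server character sets and collations
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Collation of each metastore table
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'hive_meta';
```

If the tables report latin1_bin while the connection uses utf8_general_ci, any comparison between the two sides triggers the error above.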

Two solutions are available:

  • Option 1: modify the MySQL configuration

Temporary change: log in to MySQL and set the following parameters (these only last for the current session):

set character_set_client = utf8;
set character_set_connection = utf8;
set character_set_results = utf8;
 
SET collation_server = utf8_general_ci;
SET collation_database = utf8_general_ci;

Permanent change: edit the MySQL configuration file, then restart the service.

[root@ambari03 etc]# vim /etc/my.cnf
# add under [client]
[client]
default-character-set=utf8
# add under [mysqld]
# note: [mysqld] does not accept default-character-set in MySQL 5.5+
# (it prevents the server from starting); use character-set-server instead
[mysqld]
character-set-server=utf8
init_connect='SET NAMES utf8'
# add under [mysql]
[mysql]
default-character-set=utf8

Then restart MySQL with systemctl restart mysqld.

In my environment, this method alone did not resolve the problem!

  • Option 2: change the character set of each Hive metastore table in MySQL

Log in to MySQL and run the following statements to change the character set of the Hive metastore tables. Note that the metastore database is named hive_meta here, while a later section uses metastore; substitute your own metastore database name throughout.

alter database hive_meta default character set utf8;
use hive_meta;
alter table BUCKETING_COLS default character set utf8;
alter table CDS default character set utf8;
alter table COLUMNS_V2 default character set utf8;
alter table DATABASE_PARAMS default character set utf8;
alter table DBS default character set utf8;
alter table FUNCS default character set utf8;
alter table FUNC_RU default character set utf8;
alter table GLOBAL_PRIVS default character set utf8;
alter table PARTITIONS default character set utf8;
alter table PARTITION_KEYS default character set utf8;
alter table PARTITION_KEY_VALS default character set utf8;
alter table PARTITION_PARAMS default character set utf8;
-- alter table PART_COL_STATS default character set utf8;
alter table ROLES default character set utf8;
alter table SDS default character set utf8;
alter table SD_PARAMS default character set utf8;
alter table SEQUENCE_TABLE default character set utf8;
alter table SERDES default character set utf8;
alter table SERDE_PARAMS default character set utf8;
alter table SKEWED_COL_NAMES default character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP default character set utf8;
alter table SKEWED_STRING_LIST default character set utf8;
alter table SKEWED_STRING_LIST_VALUES default character set utf8;
alter table SKEWED_VALUES default character set utf8;
alter table SORT_COLS default character set utf8;
alter table TABLE_PARAMS default character set utf8;
alter table TAB_COL_STATS default character set utf8;
alter table TBLS default character set utf8;
alter table VERSION default character set utf8;
alter table BUCKETING_COLS convert to character set utf8;
alter table CDS convert to character set utf8;
alter table COLUMNS_V2 convert to character set utf8;
alter table DATABASE_PARAMS convert to character set utf8;
alter table DBS convert to character set utf8;
alter table FUNCS convert to character set utf8;
alter table FUNC_RU convert to character set utf8;
alter table GLOBAL_PRIVS convert to character set utf8;
alter table PARTITIONS convert to character set utf8;
alter table PARTITION_KEYS convert to character set utf8;
alter table PARTITION_KEY_VALS convert to character set utf8;
alter table PARTITION_PARAMS convert to character set utf8;
-- alter table PART_COL_STATS convert to character set utf8;
alter table ROLES convert to character set utf8;
alter table SDS convert to character set utf8;
alter table SD_PARAMS convert to character set utf8;
alter table SEQUENCE_TABLE convert to character set utf8;
alter table SERDES convert to character set utf8;
alter table SERDE_PARAMS convert to character set utf8;
alter table SKEWED_COL_NAMES convert to character set utf8;
alter table SKEWED_COL_VALUE_LOC_MAP convert to character set utf8;
alter table SKEWED_STRING_LIST convert to character set utf8;
alter table SKEWED_STRING_LIST_VALUES convert to character set utf8;
alter table SKEWED_VALUES convert to character set utf8;
alter table SORT_COLS convert to character set utf8;
alter table TABLE_PARAMS convert to character set utf8;
alter table TAB_COL_STATS convert to character set utf8;
alter table TBLS convert to character set utf8;
alter table VERSION convert to character set utf8;
-- PART_COL_STATS is deliberately skipped here; it is handled separately below
SET character_set_client = utf8;
SET character_set_database = utf8;
SET character_set_results = utf8;
SET character_set_server = utf8;
SET NAMES 'utf8';
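To verify the conversion took effect, re-check the column character sets (again a sketch assuming the hive_meta database name):

```sql
-- List any string columns that are still not utf8
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'hive_meta'
  AND CHARACTER_SET_NAME IS NOT NULL
  AND CHARACTER_SET_NAME <> 'utf8';
```

An empty result means every string column in the metastore is now utf8.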

After these changes, an insert with a Chinese partition value succeeds and the inserted data can be queried, but Hive still reports an error.

The HiveServer2 log shows the following error:

2021-02-23T10:54:26,249 ERROR [HiveServer2-Background-Pool: Thread-248] exec.StatsTask: Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Insert of object "org.apache.hadoop.hive.metastore.model.MPartitionColumnStatistics@710c136f" using statement "INSERT INTO `PART_COL_STATS` (`CS_ID`,`AVG_COL_LEN`,`BIT_VECTOR`,`CAT_NAME`,`COLUMN_NAME`,`COLUMN_TYPE`,`DB_NAME`,`BIG_DECIMAL_HIGH_VALUE`,`BIG_DECIMAL_LOW_VALUE`,`DOUBLE_HIGH_VALUE`,`DOUBLE_LOW_VALUE`,`LAST_ANALYZED`,`LONG_HIGH_VALUE`,`LONG_LOW_VALUE`,`MAX_COL_LEN`,`NUM_DISTINCTS`,`NUM_FALSES`,`NUM_NULLS`,`NUM_TRUES`,`PART_ID`,`PARTITION_NAME`,`TABLE_NAME`) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)" failed : Incorrect string value: '\xE6\xB9\x96\xE5\x8C\x97' for column 'PARTITION_NAME' at row 1)
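The rejected bytes '\xE6\xB9\x96\xE5\x8C\x97' are simply the UTF-8 encoding of 湖北, which a latin1 column cannot store. You can confirm this in MySQL by decoding the hex from the error message:

```sql
-- The bytes from the error are UTF-8 for 湖北
SELECT CONVERT(UNHEX('E6B996E58C97') USING utf8);
```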

In MySQL, inspect the PART_COL_STATS table in the metastore database:

use metastore;
-- show create table PART_COL_STATS;
show full columns from PART_COL_STATS;

Partial output:

Field                   Type            Collation
CS_ID                   bigint(20)
CAT_NAME                varchar(256)    latin1_bin
DB_NAME                 varchar(128)    latin1_bin
TABLE_NAME              varchar(256)    latin1_bin
PARTITION_NAME          varchar(500)    latin1_bin
COLUMN_NAME             varchar(767)    latin1_bin
COLUMN_TYPE             varchar(128)    latin1_bin
PART_ID                 bigint(20)
LONG_LOW_VALUE          bigint(20)
LONG_HIGH_VALUE         bigint(20)
DOUBLE_HIGH_VALUE       double(53,4)
DOUBLE_LOW_VALUE        double(53,4)
BIG_DECIMAL_LOW_VALUE   varchar(4000)   latin1_bin
BIG_DECIMAL_HIGH_VALUE  varchar(4000)   latin1_bin
NUM_NULLS               bigint(20)
NUM_DISTINCTS           bigint(20)
BIT_VECTOR              blob
AVG_COL_LEN             double(53,4)
MAX_COL_LEN             bigint(20)
NUM_TRUES               bigint(20)
NUM_FALSES              bigint(20)
LAST_ANALYZED           bigint(20)

As the output shows, the PARTITION_NAME column still uses latin1, so its character set needs to be changed to utf8:

use metastore;
alter table PART_COL_STATS modify column PARTITION_NAME varchar(500) character set utf8;
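You can confirm the column change took effect with:

```sql
show full columns from PART_COL_STATS like 'PARTITION_NAME';
```

The Collation column should no longer read latin1_bin.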

After this change, inserting into a table with a Chinese partition value runs without errors.

A complete end-to-end example:

create database mydb;
use mydb;
-- ① create a non-partitioned table stu
create table stu
(
    name string,
    age  int
) row format delimited fields terminated by '\t';

-- ② upload the data file to the table's HDFS directory

-- ③ create a partitioned table emp
create table emp
(
    name string,
    age  int
)
    partitioned by (province string)
    row format delimited fields terminated by '\t';

-- ④ insert into emp with a Chinese partition value
insert into emp partition(province = "湖北")
select * from stu;
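To verify in Hive that the partition was created and that its name displays correctly rather than as mojibake:

```sql
show partitions emp;
select * from emp;
```

The partition listing should show the value 湖北 intact.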

References:

Hive error: Illegal mix of collations

Hive: solving the Chinese partition problem (Illegal mix of collations)

Hive: garbled Chinese partition names and garbled comments
