Hive中文乱码:注释及中文字段都是乱码
剖析:
1.Hive的元数据库编码
通常hive的元数据存放在mysql中,mysql默认编码格式为latin1,但实际上环境中用的多为utf-8编码,
注意:Hive的存储引擎必须为latin1,否则创建表会失败。
如果编码非latin1,修改为utf-8
alter database hive元数据库名 character set latin1;
解决方法:
修改编码,设置为utf8
在 mysql 配置文件/etc/my.cnf中[mysqld]的下面增加以下内容
[mysqld]
.........
default-character-set=utf8
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
character-set-filesystem = utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
[client]
default-character-set=utf8
重启mysql 服务(这样确保缺省编码是utf8)
service mysqld restart
输入命令 "\s"检查编码是否修改成功
输入命令:show variables like 'char%'
输入命令:show variables like "colla%";
修改metastore的连接url: 在hive.xml中(find / -name "hive-site.xml" 如果不清楚 hive.xml路径,通过命令查找)
2.修改制定表字段记录
①修改表字段注解和表注解
alter table COLUMNS_V2 modify column COMMENT varchar(256) CHARACTER SET utf8;
alter table COLUMNS_V2 modify column COLUMN_NAME varchar(767) CHARACTER SET utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) CHARACTER SET utf8;
② 修改分区字段注解:
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) CHARACTER SET utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) CHARACTER SET utf8;
③修改索引注解:
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) CHARACTER SET utf8;
HUE显示乱码
show create database <数据库名>;
alter database hue character set latin1;
alter table beeswax_queryhistory modify `query` longtext CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
ALTER TABLE desktop_document2 modify column name varchar(255) CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column description longtext CHARACTER SET utf8;
ALTER TABLE desktop_document2 modify column search longtext CHARACTER SET utf8;