Fixing Chinese-character problems in Hive:
The key is to use UTF-8 at all three layers: the terminal console --> the Linux OS locale environment variables --> the files on HDFS that back the Hive tables.
1. Download the English PuTTY 0.63 from sourceforge.net (do not use the Chinese-localized builds; some have shipped with trojans) and set the remote character set to UTF-8 (in PuTTY this is under Window > Translation > Remote character set).
2. At the Linux OS layer, change LANG in /etc/sysconfig/i18n to en_US.UTF-8:
[hadoop@CNSZ141196 ~]$ cat /etc/sysconfig/i18n
LANG="en_US.UTF-8"
SYSFONT="lat0-sun16"
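The i18n file is only read at login, so an already-open session keeps its old locale. A minimal way to pick up the change immediately in the current shell (same value as configured above):
export LANG=en_US.UTF-8
locale    # LANG and the LC_* variables should now all report en_US.UTF-8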
3. Hive table files on HDFS are UTF-8 encoded by default, so nothing needs to change at this layer.
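An optional sanity check, assuming a hypothetical warehouse path for the test table used below: stream a data file off HDFS and let file(1) guess its encoding; UTF-8 Chinese text should be reported as UTF-8.
hadoop fs -cat /user/hive/warehouse/test01.db/test01/000000_0 | head -c 4096 | file -
# expected output along the lines of: /dev/stdin: UTF-8 Unicode text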
4. Tested as follows: Chinese can be typed normally on the PuTTY console and written into a Hive table, and SELECT displays it correctly; the LOAD DATA and hive -f paths also passed (see the sketch after the output below):
[hadoop@CNSZ141196 ~]$ hive -e "insert into table test01.test01 select '中文测试' from dual;"
……
[hadoop@CNSZ141196 ~]$ hive -e "select * from test01.test01;"
Logging initialized using configuration in file:/appcom/hive-config/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_3297@CNSZ141196.app.paic.com.cn_201309111524_494470832.txt
OK
中文测试
中国
中文
Time taken: 3.199 seconds, Fetched: 3 row(s)
[hadoop@CNSZ141196 ~]$
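A minimal sketch of the other two paths mentioned in step 4, reusing the test01.test01 table above; the local data file and script names are hypothetical:
echo '中文load测试' > /tmp/cn_test.txt           # written from the UTF-8 terminal
hive -e "load data local inpath '/tmp/cn_test.txt' into table test01.test01;"
echo "select * from test01.test01;" > /tmp/cn_test.hql
hive -f /tmp/cn_test.hql                         # the Chinese rows should display correctly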
OS locale environment variable configuration (batch change across the cluster nodes):
for i in cnsz0317{74..87};do /home/hadoop/scripts/autoLoginRoot.sh hadoop $i paic1234 anyword ""'sed -i "/LANG=/{s%"en_US"%"en_US.UTF-8"%g}" /etc/sysconfig/i18n'"";done
Checkpoint:
for i in cnsz0317{74..87};do ssh $i ""'echo "#############";hostname;cat /etc/sysconfig/i18n;sleep 0.5'""; done
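The file check above confirms the configuration; a complementary check of the runtime value is to ask a login shell on each node what LANG it actually resolves to (this assumes the same ssh access as above):
for i in cnsz0317{74..87}; do ssh $i 'bash -l -c "hostname; echo \$LANG"'; done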
Problem:
Hive comments do not support Chinese characters.
Fix:
Modify the hive_meta database in MySQL so that Hive comments support Chinese:
alter table DBS modify column `DESC` varchar(4000) character set utf8;
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
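One way to apply the statements above; the .sql file name is hypothetical and simply contains the five ALTER statements, while the host and user match the connection settings in hive-site.xml below:
mysql -h CNSZ141196.app.paic.com.cn -P 3306 -u hive -p hive_meta < fix_hive_comment_charset.sql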
Check the result:
mysql> SHOW CREATE TABLE TABLE_PARAMS \G
*************************** 1. row ***************************
Table: TABLE_PARAMS
Create Table: CREATE TABLE `TABLE_PARAMS` (
`TBL_ID` bigint(20) NOT NULL,
`PARAM_KEY` varchar(256) CHARACTER SET latin2 COLLATE latin2_bin NOT NULL,
`PARAM_VALUE` varchar(4000) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`TBL_ID`,`PARAM_KEY`),
KEY `TABLE_PARAMS_N49` (`TBL_ID`),
CONSTRAINT `TABLE_PARAMS_FK1` FOREIGN KEY (`TBL_ID`) REFERENCES `TBLS` (`TBL_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin2
1 row in set (0.00 sec)
Modify hive-site.xml (note that & in the JDBC URL must be written as &amp; inside the XML value):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://CNSZ141196.app.paic.com.cn:3306/hive_meta?useUnicode=true&amp;characterEncoding=UTF-8&amp;createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>hive.stats.dbconnectionstring</name>
  <value>jdbc:mysql://CNSZ141196.app.paic.com.cn:3306/hive_stats?useUnicode=true&amp;characterEncoding=UTF-8&amp;user=hive&amp;password=hv123.&amp;createDatabaseIfNotExist=true</value>
  <description>The default connection string for the database that stores temporary hive statistics.</description>
</property>
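After restarting the Hive metastore service so it picks up the new connection string, a quick end-to-end check is to create a table with Chinese comments and read them back; the table name below is hypothetical:
hive -e "create table test01.cn_comment_test (id int comment '编号') comment '中文注释测试';"
hive -e "describe formatted test01.cn_comment_test;"    # both comments should display as Chinese, not as ??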