sqoop将mysql数据库中数据导入hdfs或hive中后中文乱码问题解决办法
[root@spark1 ~]# vi /etc/my.cnf 修改配置文件
在文件内的[mysqld]和client下增加如下1行
[mysqld]
default-character-set=utf8
[client]
default-character-set=utf8
然后在创建数据库和表时都指定字符集为utf8
mysql> create database wujiadong1 character set utf8;
mysql> create table stud_info(
-> stud_code varchar(50) not null,
-> stud_name varchar(50) not null,
-> stud_gend varchar(50) not null default 'M',
-> birthday date null,
-> log_date date null,
-> orig_addr varchar(50) null,
-> lev_date date null,
-> college_code varchar(50) null,
-> college_name varchar(50) null,
-> state varchar(50) null,
-> primary key(stud_code)
-> )character set utf8;
mysql> load data local infile '/root/hive_test/stud_info.csv' into table stud_info
-> fields terminated by ','
-> lines terminated by '\n'
-> ignore 1 lines;
mysql> select * from stud_info; #看中文字符能否正常显示
再向hdfs中导入数据
[root@spark1 ~]# sqoop import --connect jdbc:mysql://192.168.220.144:3306/wujiadong1 --username root --table stud_info --target-dir 'hdfs://spark1:9000/user/sqoop_test1' -m 1
[root@spark1 ~]# hadoop fs -lsr /user/sqoop_test1
[root@spark1 ~]# hadoop fs -cat /user/sqoop_test1/part-m-00000
mysql数据导入hdfs中中文乱码问题总结
修改mysql里面的my.conf文件
创建数据库,指定字符集是utf8
再新的数据库里面创建表,在create table语句里面指定字符集是 utf8
插入中文汉字记录
select看到中文是正常的
依次完成这些操作以后,再用sqoop导入
导入hdfs解决中文乱码问题后,再去导入hive中就没出现乱码问题了,所以应该是一样的解决方法
mysql中的编码查看和修改方法
查看编码方式
mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
mysql> show variables like 'character_set_%'; 查看mysql数据库默认编码
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
修改编码方式在/etc/my.cnf这个文件中修改
[root@spark1 ~]# vi /etc/my.cnf
root@spark1 ~]# service mysqld restart 重启mysql
查看是否变成utf8
mysql> \s
--------------
mysql Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (x86_64) using readline 5.1
Connection id:6
Current database:
Current user:root@localhost
SSL:Not in use
Current pager:stdout
Using outfile:''
Using delimiter:;
Server version:5.1.73 Source distribution
Protocol version:10
Connection:Localhost via UNIX socket
Server characterset:utf8
Db characterset:utf8
Client characterset:utf8
Conn. characterset:utf8
UNIX socket:/var/lib/mysql/mysql.sock
Uptime:22 min 3 sec
Threads: 1 Questions: 59 Slow queries: 0 Opens: 20 Flush tables: 1 Open tables: 9 Queries per second avg: 0.44
--------------
mysql> show variables like "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> show variables like "colla%";
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)