字符集迁移是将表或字段在不同字符集之间进行转换,不影响数据的完整性。
一、ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name;
改变表默认字符集和所有字符列(CHAR,VARCHAR,TEXT)到一个新的字符集。
转换列数据类型VARCHAR或TEXT,将可能改变数据类型,确保新列足够长,能够存储原列的字符。latin1一个字符使用单字节存储,所以一个列能存储65535字符,如果转换成utf8,每个字符可能需要三个字节存储,最大可能长度为3*65535字节。避免数据类型改变,使用MODIFY改变个别列,如:
ALTER TABLE t MODIFY latin1_text_col TEXT CHARACTER SET utf8;
ALTER TABLE t MODIFY latin1_varchar_col VARCHAR(M) CHARACTER SET utf8;
如果指定CONVERT TO CHARACTER SET binary,CHAR,VARCHAR和TEXT列将转换到它们对应的二进制字符类型(BINARY,VARBINARY,BLOB)。
root@[test] 16:37:05>alter table t4 convert to character set binary;
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
root@[test] 16:37:15>show create table t4\G;
*************************** 1. row ***************************
Table: t4
Create Table: CREATE TABLE `t4` (
`id` int(11) NOT NULL,
`a1` binary(66) DEFAULT NULL,
`a2` varbinary(66) DEFAULT NULL,
`a3` mediumblob,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=binary
1 row in set (0.00 sec)
ERROR:
No query specified
如果指定CONVERT TO CHARACTER SET没有校队集,使用字符集默认的校队集。
如果charset_name是DEFAULT,使用数据库字符集设置(character_set_database)。
CONVERT TO操作转换列值,在字符集之间。
仅仅改变表的默认字符集:在新添加列如果没有指定字符集将使用表默认字符集
ALTER TABLE tbl_name DEFAULT CHARACTER SET charset_name;
例子:
1、将utf8转换成gbk
root@[test] 16:12:59>create table t4(id int primary key,a1 char(66),a2 varchar(66),a3 text) default charset utf8;
Query OK, 0 rows affected (0.00 sec)
root@[test] 16:13:51>insert into t4 values(1,'mysql是关系型数据库管理系统,使用innodb存储引擎','mysql是关系型数据 库管理系统,使用innodb存储引擎','mysql是关系型数据库管理系统,使用innodb存储引擎');
Query OK, 1 row affected (0.00 sec)
中文字符占用3个字符:
root@[test] 16:14:30>select length(a1),char_length(a1) from t4;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 65 | 29 |
+------------+-----------------+
1 row in set (0.00 sec)
root@[test] 16:15:01>select * from t4;
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| id | a1 | a2 | a3 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| 1 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
1 row in set (0.00 sec)
转换到gbk字符集:
root@[test] 16:15:07>set names gbk;
Query OK, 0 rows affected (0.00 sec)
root@[test] 16:15:20>alter table t4 convert to character set gbk;
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0
需要修改客户端工具(如securecrt或xshell)的字符集,也为gbk,查询结果正常:
root@[test] 16:15:48>select * from t4;
+----+-------------------------------------------------+-------------------------------------------------+---------------------+
| id | a1 | a2 | a3 |
+----+-------------------------------------------------+-------------------------------------------------+---------------------+
| 1 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统, | mysql是关系型数据库管理系统,使用innodb存储引擎 |
+----+-------------------------------------------------+-------------------------------------------------+---------------------+
1 row in set (0.00 sec)
转换后,每个中文字符占用2个字节(gbk):
root@[test] 16:15:56>select length(a1),char_length(a1) from t4;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 47 | 29 |
+------------+-----------------+
1 row in set (0.00 sec)
root@[test] 16:16:06>show create table t4\G;
*************************** 1. row ***************************
Table: t4
Create Table: CREATE TABLE `t4` (
`id` int(11) NOT NULL,
`a1` char(66) DEFAULT NULL,
`a2` varchar(66) DEFAULT NULL,
`a3` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=gbk
1 row in set (0.00 sec)
ERROR:
No query specified
2、转换gbk为utf8字符集
root@[test] 16:16:25>set names utf8;
Query OK, 0 rows affected (0.00 sec)
root@[test] 16:17:01>alter table t4 convert to character set utf8;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
需要修改客户端工具(如securecrt或xshell)的字符集,也为utf8,查询结果正常:
root@[test] 16:17:15>select * from t4;
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| id | a1 | a2 | a3 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| 1 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
1 row in set (0.00 sec)
转换成utf8后,每个中文字符占用3字节(utf8):
root@[test] 16:17:31>select length(a1),char_length(a1) from t4;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 65 | 29 |
+------------+-----------------+
1 row in set (0.00 sec)
总结:
1、使用alter table ... convert to character set语句可以正常将gbk转成utf8或utf8转成gbk字符集,不会造成数据损坏;
2、使用alter table方式,将需要拷贝表,阻塞写,对大表需要评估操作影响;
二、使用中间层字符集(二进制字符)
1、latin1转换成utf8:
root@[test] 17:17:51>set names latin1;
Query OK, 0 rows affected (0.00 sec)
root@[test] 17:18:03>create table t5(id int primary key,a1 char(66),a2 varchar(66),a3 text) default charset latin1;
Query OK, 0 rows affected (0.00 sec)
root@[test] 17:18:04>insert into t5 values(1,'mysql是关系型数据库管理系统,使用innodb存储引擎','mysql是关系型数据库管理系统,使用innodb存储引擎','mysql是关系型数据库管理系统,使用innodb存储引擎');
Query OK, 1 row affected (0.00 sec)
root@[test] 17:27:34>select * from t5;
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| id | a1 | a2 | a3 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| 1 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
1 row in set (0.00 sec)
root@[test] 17:18:19>select length(a1),char_length(a1) from t5;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 65 | 65 |
+------------+-----------------+
1 row in set (0.00 sec)
直接转换成utf8:
root@[test] 17:18:34>set names utf8;
Query OK, 0 rows affected (0.00 sec)
root@[test] 17:18:51>alter table t5 convert to character set utf8;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
root@[test] 17:28:13>show create table t5;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t5 | CREATE TABLE `t5` (
`id` int(11) NOT NULL,
`a1` char(66) DEFAULT NULL,
`a2` varchar(66) DEFAULT NULL,
`a3` mediumtext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
直接将latin1转换成utf8后(在connection和results为utf8时),查询乱码:
root@[test] 17:28:20>select * from t5;
+----+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
| id | a1 | a2 | a3 |
+----+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
| 1 | mysql是关系型数æ®åº“管ç†ç³»ç»Ÿï¼Œä½¿ç”¨innodbå˜å‚¨å¼•æ“Ž | mysql是关系型数æ®åº“管ç†ç³»ç»Ÿï¼Œä½¿ç”¨innodbå˜å‚¨å¼•æ“Ž | mysql是关系型数æ®åº“管ç†ç³»ç»Ÿï¼Œä½¿ç”¨innodbå˜å‚¨å¼•æ“Ž |
+----+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
root@[test] 17:19:28>select length(a1),char_length(a1) from t5;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 128 | 65 |
+------------+-----------------+
1 row in set (0.00 sec)
如果使用中间字符集进行转换:先将latin1转换成binary,然后转换成utf8
root@[test] 17:29:25>set names latin1;
Query OK, 0 rows affected (0.00 sec)
root@[test] 17:29:55>create table t5(id int primary key,a1 char(66),a2 varchar(66),a3 text) default charset latin1;
Query OK, 0 rows affected (0.01 sec)
root@[test] 17:30:01>insert into t5 values(1,'mysql是关系型数据库管理系统,使用innodb存储引擎','mysql是关系型数据库管理系统,使用innodb存储引擎','mysql是关系型数据库管理系统,使用innodb存储引擎');
Query OK, 1 row affected (0.00 sec)
root@[test] 17:30:07>select * from t5;
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| id | a1 | a2 | a3 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| 1 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
1 row in set (0.00 sec)
root@[test] 17:30:15>select length(a1),char_length(a1) from t5;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 65 | 65 |
+------------+-----------------+
1 row in set (0.00 sec)
先将latin1转换成binary:
root@[test] 17:30:21>set names utf8;
Query OK, 0 rows affected (0.00 sec)
root@[test] 17:30:27>alter table t5 convert to character set binary;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
root@[test] 17:30:32>select length(a1),char_length(a1) from t5;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 66 | 66 |
+------------+-----------------+
1 row in set (0.00 sec)
然后将binary转换成utf8:
root@[test] 17:30:39>alter table t5 convert to character set utf8;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
root@[test] 17:30:44>select length(a1),char_length(a1) from t5;
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 66 | 66 |
+------------+-----------------+
1 row in set (0.00 sec)
将latin1转换成binary,然后再转换成utf8,查询正常:
root@[test] 17:30:50>select * from t5;
+----+--------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| id | a1 | a2 | a3 |
+----+--------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| 1 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 |
+----+--------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
1 row in set (0.00 sec)
总结:
1、使用中间字符集进行转换,需要两次alter操作,涉及到两次表拷贝;
2、适合latin1转成utf8或gbk;
三、使用mysqldump导出,再导入
1、将latin1转换成utf8
导出表数据:-t不包括表定义
./bin/mysqldump -S /log/mysql56/mysql.sock -t --single-transaction -q --default-character-set=latin1 --master-data=2 test t5 > t5_data.sql
导出表定义:-d不包括数据
./bin/mysqldump -S /log/mysql56/mysql.sock -d --single-transaction -q --default-character-set=latin1 --master-data=2 test t5 > t5_schema.sql
./bin/mysql -S /log/mysql56/mysql.sock -e "drop table test.t5"
替换导出文件的字符集为utf8:
sed -i 's/latin1/utf8/' t5_schema.sql
sed -i 's/latin1/utf8/' t5_data.sql
导入表定义和数据:
./bin/mysql -S /log/mysql56/mysql.sock test < t5_schema.sql
./bin/mysql -S /log/mysql56/mysql.sock test < t5_data.sql
查询正常:
./bin/mysql -S /log/mysql56/mysql.sock -e "select * from test.t5"
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| id | a1 | a2 | a3 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
| 1 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 | mysql是关系型数据库管理系统,使用innodb存储引擎 |
+----+-------------------------------------------------------------------+-------------------------------------------------------------------+-------------------------------------------------------------------+
./bin/mysql -S /log/mysql56/mysql.sock -e "select length(a1),char_length(a1) from test.t5"
+------------+-----------------+
| length(a1) | char_length(a1) |
+------------+-----------------+
| 65 | 29 |
+------------+-----------------+
或者使用在导出数据使用binary导出,这样就不需要修改导出数据文件的字符集,也适合数据库中混合多种字符集的表时进行迁移
./bin/mysqldump -S /log/mysql56/mysql.sock -t --single-transaction -q --default-character-set=binary --master-data=2 test t5 > t5_data.sql
总结:
1、使用mysqldump进行数据导入在导出,适合大多数业务字符集迁移;
2、对导出的表定义需要修改字符集;
参考:
1、http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/
2、http://www.haidongji.com/2008/11/11/convert-character-set-to-utf8-in-mysql/
3、http://code.openark.org/forge/openark-kit
4、http://blog.pradeepjindal.com/blog:4