MySQL字符串分为两类,二进制还是非二进制,非二进制字符串的特征之一是他们有一个字符集,
mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| binary | Binary pseudo charset | binary | 1 |
+----------+-----------------------------+---------------------+--------+
36 rows in set (0.00 sec)
MySQL支持多种字符集,默认的是latin1,每张表都可以指定自己的字符集,相当灵活。
mysql> show create table mail /G
*************************** 1. row ***************************
Table: mail
Create Table: CREATE TABLE `mail` (
`t` datetime DEFAULT NULL,
`srcuser` char(8) DEFAULT NULL,
`srchost` char(20) DEFAULT NULL,
`dstuser` char(8) DEFAULT NULL,
`dsthost` char(20) DEFAULT NULL,
`size` bigint(20) DEFAULT NULL,
KEY `t` (`t`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
非二进制字符串的另一个特征是Collation,它决定字符集中字符的排序
mysql> show table status like 'mail%' /G
*************************** 1. row ***************************
Name: mail
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 16
Avg_row_length: 1024
Data_length: 16384
Max_data_length: 0
Index_length: 16384
Data_free: 4194304
Auto_increment: NULL
Create_time: 2011-01-25 09:09:43
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
每种字符集支持多种Collation排序,其中一种是默认的Collation (yes)
mysql> show collation like 'latin%';
+---------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+---------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
| latin2_czech_cs | latin2 | 2 | | Yes | 4 |
| latin2_general_ci | latin2 | 9 | Yes | Yes | 1 |
| latin2_hungarian_ci | latin2 | 21 | | Yes | 1 |
| latin2_croatian_ci | latin2 | 27 | | Yes | 1 |
| latin2_bin | latin2 | 77 | | Yes | 1 |
| latin5_turkish_ci | latin5 | 30 | Yes | Yes | 1 |
| latin5_bin | latin5 | 78 | | Yes | 1 |
| latin7_estonian_cs | latin7 | 20 | | Yes | 1 |
| latin7_general_ci | latin7 | 41 | Yes | Yes | 1 |
| latin7_general_cs | latin7 | 42 | | Yes | 1 |
| latin7_bin | latin7 | 79 | | Yes | 1 |
+---------------------+---------+----+---------+----------+---------+
19 rows in set (0.01 sec)
我们在执行SQL的时候可以选择不同的排序方法,其中最常用的后缀标示ci,cs,bin分别代表大小不敏感,大小写敏感和二进制排序
mysql> select c from t order by c COLLATE latin1_swedish_ci;
mysql> select c from t order by c COLLATE latin1_swedish_cs;
一个表既能包含二进制字符串列,也能包含非二进制字符串列,并且它的非二进制列能使用不同的字符集和Collation,你可以如下方式定义表
CREATE TABLE mytable
(utf8data VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_danish_ci,
sjisdata VARCHAR(100) CHARACTER SET sjis COLLATE sjis_japanese_ci);
MySQL客户端指定字符集的方法如下:
mysql> SET NAMES ‘utf8’;
mysql> SET NAMES ‘utf8’ COLLATE ‘utf8-general_ci’;
可以使用CONVERT()函数改变字符串的字符集
mysql> set @s1 = 'my string';
mysql> set @s2 = CONVERT(@s1 USING utf8);
mysql> select CHARSET(@s1),CHARSET(@s2);
+--------------+--------------+
| CHARSET(@s1) | CHARSET(@s2) |
+--------------+--------------+
| latin1 | utf8 |
+--------------+--------------+
同样可以使用COLLATE操作符改变字符串的COLLATION:
mysql> set @s1 = 'my string';
mysql> set @s2 = @s1 COLLATE latin1_spanish_ci;
mysql> select COLLATION(@s1),COLLATION(@s2);
+----------------+-------------------+
| COLLATION(@s1) | COLLATION(@s2) |
+----------------+-------------------+
| latin1_bin | latin1_spanish_ci |
二者也可结合使用
mysql> set @s1 = 'my string';
mysql> set @s2 = CONVERT(@s1 USING utf8) COLLATE utf8_spanish_ci;
mysql> select CHARSET(@s1),COLLATION(@s1),CHARSET(@s2),COLLATION(@s2);
+--------------+----------------+--------------+-----------------+
| CHARSET(@s1) | COLLATION(@s1) | CHARSET(@s2) | COLLATION(@s2) |
+--------------+----------------+--------------+-----------------+
| latin1 | latin1_bin | utf8 | utf8_spanish_ci |