MySQL字符集

字符集

重要的字符集简介

  • ASCII
    一共收录128个字符,包括空格、标点符号、数字、大小写字符和一个写不可见字符,由于字符一共才128和所以可以用一个字节来进行编码 11111111(十进制255)
  • IOS 8859-1
    共收录256个字符,他在ASCII的基础上扩充了128个西欧常用字符(包括德法两国)。OS 8859-1可以采用一个字节编码(这个字符集叫Latin1)
  • GB2312
    收入了6763个汉字、其他字符682个(拉丁字母、希腊字母、日文的平假名和片假名字母、鳄鱼西里尔字母)。这种字符兼容ASCII,如果该字符为ASCII的字符,则占用一个字节;否则占用2个字节。
  • GBK
    只对GB2312字符集进行了扩充,编码方式兼容GB2312字符集
  • UTF-8
    几乎收录了当今世界各个国家/地区的字符,而且还在不断扩充。UTF-8是一种Unicode字符集的一种编码方案,而Unicode包含了UTF-8,UTF-16,UTF-32这几种方案,UTF-8编码时需使用1-4字节,UTF-16编码时使用1-2字节,UTF-16编码时使用4字节。
    utf8存在utf8mb3utf8mb4UTF8和UTF8mb4区别:
    • utf8mb3:“阉割”过的UTF-8字符集,只使用1-3字节(只能表示常用的汉字),在MySQL中,utf8是utf8mb3的缩写
    • utf8mb4:正宗的UTF-8字符集,使用1-4字节表示字符。

字符集查看

#SHOW (CHARACTER SET|CHARSET) [LIKE 匹配模式]

mysql> show charset;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset           | binary              |      1 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.08 sec)

测试的MySQL版本一共支持41中字符集,其中Default collation列表示这种字符集中一种默认的比较规则,Mexlen,它表示字符集最多需要几个字节来表示一个字符

字符集比较规则

#SHOW COLLATION [LIKE 匹配的模式]
mysql> show collation like"utf8_%";
+--------------------------+---------+-----+---------+----------+---------+
| Collation                | Charset | Id  | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci          | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin                 | utf8    |  83 |         | Yes      |       1 |
| utf8_unicode_ci          | utf8    | 192 |         | Yes      |       8 |
| utf8_icelandic_ci        | utf8    | 193 |         | Yes      |       8 |
| utf8_latvian_ci          | utf8    | 194 |         | Yes      |       8 |
| utf8_romanian_ci         | utf8    | 195 |         | Yes      |       8 |
| utf8_slovenian_ci        | utf8    | 196 |         | Yes      |       8 |
| utf8_polish_ci           | utf8    | 197 |         | Yes      |       8 |
| utf8_estonian_ci         | utf8    | 198 |         | Yes      |       8 |
| utf8_spanish_ci          | utf8    | 199 |         | Yes      |       8 |
| utf8_swedish_ci          | utf8    | 200 |         | Yes      |       8 |
| utf8_turkish_ci          | utf8    | 201 |         | Yes      |       8 |
| utf8_czech_ci            | utf8    | 202 |         | Yes      |       8 |
| utf8_danish_ci           | utf8    | 203 |         | Yes      |       8 |
| utf8_lithuanian_ci       | utf8    | 204 |         | Yes      |       8 |
| utf8_slovak_ci           | utf8    | 205 |         | Yes      |       8 |
| utf8_spanish2_ci         | utf8    | 206 |         | Yes      |       8 |
| utf8_roman_ci            | utf8    | 207 |         | Yes      |       8 |
| utf8_persian_ci          | utf8    | 208 |         | Yes      |       8 |
| utf8_esperanto_ci        | utf8    | 209 |         | Yes      |       8 |
| utf8_hungarian_ci        | utf8    | 210 |         | Yes      |       8 |
| utf8_sinhala_ci          | utf8    | 211 |         | Yes      |       8 |
| utf8_german2_ci          | utf8    | 212 |         | Yes      |       8 |
| utf8_croatian_ci         | utf8    | 213 |         | Yes      |       8 |
| utf8_unicode_520_ci      | utf8    | 214 |         | Yes      |       8 |
| utf8_vietnamese_ci       | utf8    | 215 |         | Yes      |       8 |
| utf8_general_mysql500_ci | utf8    | 223 |         | Yes      |       1 |
| utf8mb4_general_ci       | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin              | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci       | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci     | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci       | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci      | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci     | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci        | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci      | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci       | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci       | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci       | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci         | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci        | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci    | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci        | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci      | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci         | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci       | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci     | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci     | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci       | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci       | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_ci      | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci   | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci    | utf8mb4 | 247 |         | Yes      |       8 |
+--------------------------+---------+-----+---------+----------+---------+
53 rows in set (0.11 sec)

utf8_general_ci
其中,general是一种通用的比较规则,而utf8_polish_ci中的polish表示波兰区的比较规则;而后缀ci则表示不区分大小写,具体如下:

后缀英文解释描述
_araccent insensitive不区分重音
_asaccent sensitive区分重音
_cicase insensitive不区大小写
_cscase sensitive区分大小写
_binbinary以二进制方式比较

字符集和比较规则应用

  • 服务器级别
    • 查看
    #服务器级别的字符集
    SHOW VARIABLES LIKE 'character_set_server';
    +----------------------+-------+
    | Variable_name        | Value |
    +----------------------+-------+
    | character_set_server | utf8  |
    +----------------------+-------+
    1 row in set (0.05 sec)
    
    
    #服务器界别的比较规则
    SHOW VARIABLES LIKE 'collation_server';
    +------------------+-----------------+
    | Variable_name    | Value           |
    +------------------+-----------------+
    | collation_server | utf8_general_ci |
    +------------------+-----------------+
    1 row in set (0.04 sec)
    
    • 修改
    [server]
    character_set_server=gb2312
    collation_server=gb2312_chinese_ci
    
  • 数据库级别
#CREATE DATABASE 数据库名
#	[[DEFAULT] CHARACTER SET 字符集名称]
#	[[DEFAULT] COLLATE 比较规则名称]
#ALTER DATABASE 数据库名
#	[[DEFAULT] CHARACTER SET 字符集名称]
#	[[DEFAULT] COLLATE 比较规则名称]

mysql> create database charset_demo_db
    -> character set gb2312
    -> collate gb2312_chinese_ci;
Query OK, 1 row affected (0.00 sec)
系统变量描述
character_set_database当前数据库的字符集
collation_database当前数据库的比较规则

我们不能通过修改这两个变量的值来改变当前数据库的字符集和规则。
当不指定字符集合和比较规则时,创建表时将采用服务级别的字符集。

  • 表级别
#CREATE TABLE 表名
#	[[DEFAULT] CHARACTER SET 字符集名称]
#	[[DEFAULT] COLLATE 比较规则名称]
#ALTER TABLE 表名
#	[[DEFAULT] CHARACTER SET 字符集名称]
#	[[DEFAULT] COLLATE 比较规则名称]

mysql> create table t(
    -> col varchar(10) 
    -> ) character set utf8 collate utf8_general_ci;
Query OK, 0 rows affected (0.02 sec)
  • 列级别
#列级别的需要在创建表时,在需要的列上进行设置
CREATE TABLE 表明 (
	列名 字符串类型 [CHARACTER SET 字符集名称] [COLLATE 比较规则名称],
	...
)

#修改列的设置
ALTER TABLE 表明 MODIFY 列名 字符串类型 [CHARACTER SET 字符集名称] [COLLATE 比较规则名称];

ALTER TABLE t MODIFY col VARCHAR(10) CHARACTER set gbk collate gbk_chinese_ci;

mysql> ALTER TABLE t MODIFY col VARCHAR(10) CHARACTER set gbk collate gbk_chinese_ci;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

如果列最初使用的是utf8,后来希望修改成ascii的话就会出错,因为ascii字符集并不能表示汉字字符。

  • 仅修改字符集或者仅修改标记规则
    由于字符集和比较规则之间相互关联,所以存在如下规则:
    • 只修改字符集,则比较规则将变为修改后的字符集默认的比较规则;
    • 只修改比较规则,则字符集变为修改后的比较规则对应的字符集。
    • 只修改字符集,则比较规则会变成修改后的默认字符集的比较规则
    • 只修改比较规则,则字符集将变为修改后的比较规则对应的字符集
  • 各个级别字符集的比较规则
    • 如果创建或修改列时没有显式指定字符集和比较规则,则该列默认使用表的字符集和比较规则;
    • 如果创建表时没有显式指定字符集和比较规则,则该表默认使用数据库的字符集和比较规则;
    • 如果创建数据库时没有显式指定字符集和比较规则,则该数据库默认使用服务器的字符集和比较规则。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值