mysql armscii8_八、MySQL字符集

字符集和排序规则简介

MySQL提供了多种字符集和排序规则选择,其中字符集设置和数据存储以及客户端与MySQL实例的交互相关,排序规则和字符串的对比规则相关

字符集的设置可以在MySQL实例、数据库、表、列四个级别

MySQL设置字符集支持在InnoDB, MyISAM, Memory三个存储引擎

查看当前MySQL支持的字符集的方式有两种,一种是通过查看information_schema.character_set系统表,一种是通过命令show character set查看

mysql> SHOW CHARACTER SET;

+----------+---------------------------------+---------------------+--------+

| Charset | Description | Default collation | Maxlen |

+----------+---------------------------------+---------------------+--------+

| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |

| ascii | US ASCII | ascii_general_ci | 1 |

| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |

| binary | Binary pseudo charset | binary | 1 |

| cp1250 | Windows Central European | cp1250_general_ci | 1 |

| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |

| cp1256 | Windows Arabic | cp1256_general_ci | 1 |

| cp1257 | Windows Baltic | cp1257_general_ci | 1 |

| cp850 | DOS West European | cp850_general_ci | 1 |

| cp852 | DOS Central European | cp852_general_ci | 1 |

| cp866 | DOS Russian | cp866_general_ci | 1 |

| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |

| dec8 | DEC West European | dec8_swedish_ci | 1 |

| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |

| euckr | EUC-KR Korean | euckr_korean_ci | 2 |

| gb18030 | China National Standard GB18030 | gb18030_chinese_ci | 4 |

| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |

| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |

| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |

| greek | ISO 8859-7 Greek | greek_general_ci | 1 |

| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |

| hp8 | HP West European | hp8_english_ci | 1 |

| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |

| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |

| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |

| latin1 | cp1252 West European | latin1_swedish_ci | 1 |

| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |

| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |

| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |

| macce | Mac Central European | macce_general_ci | 1 |

| macroman | Mac West European | macroman_general_ci | 1 |

| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |

| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |

| tis620 | TIS620 Thai | tis620_thai_ci | 1 |

| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |

| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |

| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |

| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |

| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |

| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |

| utf8mb4 | UTF-8 Unicode | utf8mb4_0900_ai_ci | 4 |

+----------+---------------------------------+---------------------+--------+

41 rows in set (0.00 sec)

mysql> SHOW CHARACTER SET LIKE 'latin%';

+---------+-----------------------------+-------------------+--------+

| Charset | Description | Default collation | Maxlen |

+---------+-----------------------------+-------------------+--------+

| latin1 | cp1252 West European | latin1_swedish_ci | 1 |

| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |

| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |

| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |

+---------+-----------------------------+-------------------+--------+

4 rows in set (0.00 sec)

每个指定的字符集都会有一个或多个支持的排序规则,可以通过两种方式查看,一种是查看information_schema.collations表,另一种是通过show collation命令查看

不同的字符集不可能有相同的排序规则

每个字符集都会有一个默认的排序规则

mysql> SHOW COLLATION WHERE Charset = 'latin1';

+-------------------+---------+----+---------+----------+---------+---------------+

| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |

+-------------------+---------+----+---------+----------+---------+---------------+

| latin1_bin | latin1 | 47 | | Yes | 1 | PAD SPACE |

| latin1_danish_ci | latin1 | 15 | | Yes | 1 | PAD SPACE |

| latin1_general_ci | latin1 | 48 | | Yes | 1 | PAD SPACE |

| latin1_general_cs | latin1 | 49 | | Yes | 1 | PAD SPACE |

| latin1_german1_ci | latin1 | 5 | | Yes | 1 | PAD SPACE |

| latin1_german2_ci | latin1 | 31 | | Yes | 2 | PAD SPACE |

| latin1_spanish_ci | latin1 | 94 | | Yes | 1 | PAD SPACE |

| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | PAD SPACE |

+-------------------+---------+----+---------+----------+---------+---------------+

8 rows in set (0.01 sec)

上述各行排序规则代表含义:

02e6ded8ef67

每个字符集可以对应多个排序规则,但每个排序规则只能对应一个字符集,例如下面的对应关系会报错:

mysql> SELECT _latin1 'x' COLLATE latin2_bin;

ERROR 1253 (42000): COLLATION 'latin2_bin' is not valid for CHARACTER SET 'latin1'

排序规则命名规则

排序规则的命令通常是以对应的字符集的名字为开头,并以自己的特定属性结尾,比如排序规则utf8_general_ci和latin1_swedish_ci就分别是对应utf8和latin1字符集的排序规则

当排序规则特指某种语言时,则中间的部分就为这种语言的名字,比如utf8_turkish_ci和utf8_hungarian_ci就代表UTF8字符集中的土耳其语和匈牙利语

排序规则名字的结尾字符代表是否大小写敏感,重音敏感以及是否是二进制的

02e6ded8ef67

当排序规则名字中没有指定_as或者_ai时,则是否口音敏感是由_ci或者_cs决定,当使用的是_ci是则暗指_ai,反之则暗指_as

比如latin1_general_ci显示指定大小写不敏感,也暗指口音不敏感,而latin1_general_cs显示指定大小写敏感,也暗指口音敏感

对Unicode的字符集来说,对应的排序规则也可能会包含unicode排序算法的版本号,如果没有这个版本号显示则表示是基于4.0.0这个版本,比如utf8_unicode_520_ci和utf8_unicode_ci

字符集和排序规则举例

mysql8默认情况下的字符集是utf8mb4

mysql> select version();

+-----------+

| version() |

+-----------+

| 8.0.13 |

+-----------+

1 row in set (0.00 sec)

mysql> show variables like '%character%';

+--------------------------+----------------------------------+

| Variable_name | Value |

+--------------------------+----------------------------------+

| character_set_client | utf8mb4 |

| character_set_connection | utf8mb4 |

| character_set_database | utf8mb4 |

| character_set_filesystem | binary |

| character_set_results | utf8mb4 |

| character_set_server | utf8mb4 |

| character_set_system | utf8 |

| character_sets_dir | /usr/local/mysql/share/charsets/ |

+--------------------------+----------------------------------+

8 rows in set (0.00 sec)

在Latin1下创建的表默认字符集也是utf8mb4

mysql> create table temp(name varchar(100),name2 varchar(100));

Query OK, 0 rows affected (0.14 sec)

mysql> show create table temp;

+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| Table | Create Table |

+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| temp | CREATE TABLE `temp` (

`name` varchar(100) COLLATE utf8mb4_general_ci DEFAULT NULL,

`name2` varchar(100) COLLATE utf8mb4_general_ci DEFAULT NULL

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |

+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

ASCII码(American Standard Code for Information Interchange,美国信息交换标准代码)是基于拉丁字母的一套电脑编码系统。它主要用于显示现代英语

ASCII的局限在于只能显示26个基本拉丁字母、阿拉伯数目字和英式标点符号,因此只能用于显示现代美国英语(而且在处理英语当中的外来词如naïve、café、élite等等时,所有重音符号都不得不去掉,即使这样做会违反拼写规则)

02e6ded8ef67

Latin字符集:ISO 8859-1,正式编号为ISO/IEC 8859-1:1998,又称Latin-1或“西欧语言”,是国际标准化组织内ISO/IEC 8859的第一个8位字符集。它以ASCII为基础,在空置的0xA0-0xFF的范围内,加入96个字母及符号,藉以供使用附加符号的拉丁字母语言使用

02e6ded8ef67

在latin字符集的表中插入英文和中文字符串后

mysql> create database test;

Query OK, 1 row affected (0.01 sec)

mysql> use test

Database changed

mysql> show variables like '%character%';

+--------------------------+----------------------------------+

| Variable_name | Value |

+--------------------------+----------------------------------+

| character_set_client | utf8mb4 |

| character_set_connection | utf8mb4 |

| character_set_database | utf8mb4 |

| character_set_filesystem | binary |

| character_set_results | utf8mb4 |

| character_set_server | utf8mb4 |

| character_set_system | utf8 |

| character_sets_dir | /usr/local/mysql/share/charsets/ |

+--------------------------+----------------------------------+

8 rows in set (0.00 sec)

mysql

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值