DB2迁移MySQL注意事项:MySQL中字符串比较时默认不区分大小写

在MySQL中,字符比较默认不区分大小写,这与DB2不同。通过查看字符集(charset)和校对规则(collate),如utf8mb4_general_ci,可以发现默认校对规则是大小写不敏感的。要实现大小写敏感,可选择如utf8mb4_bin的校对规则。创建或修改表的校对规则对已存在列不会生效,且不同字段可设置不同校对规则。
摘要由CSDN通过智能技术生成

在DB2数据库中,字符/字符串比较的时候,默认是区分大小写的,即'abc'和'ABC'是两个不同的字符串,但在 MySQL中,默认是不区分大小写的,因此迁移的时候需要注意。

示例1中,可以看到'abc' 和 'ABC' 在innodb中存放的数据是不一样的,但比较的时候认为是一样的

mysql> create table mqs1(col1 char(20)) CHARSET=utf8mb4;
Query OK, 0 rows affected (0.26 sec)

mysql> insert into mqs1 values('abc'),('aBC'),('ABC'),('AbC');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from mqs1 where col1='abc';
+------+
| col1 |
+------+
| abc  |
| aBC  |
| ABC  |
| AbC  |
+------+
4 rows in set (0.00 sec)

mysql> select HEX(col1) from mqs1;
+-----------+
| HEX(col1) |
+-----------+
| 616263    |
| 614243    |
| 414243    |
| 416243    |
+-----------+
4 rows in set (0.00 sec)

出现以上现象的原因是对于一个字段有 charset 和 collate 两个属性,字符比较时是根据它的 collate 来决定的。
可以使用 show collation 来看一个 charset 有哪些 collate

mysql> show collation where charset='utf8mb4';
+------------------------+---------+-----+---------+----------+---------+
| Collation              | Charset | Id  | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci     | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin            | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci   | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci     | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci    | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci   | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci      | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci    | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci     | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci     | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci     | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci       | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci      | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci      | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci    | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci       | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci     | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci   | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci   | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci     | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci     | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_ci    | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4 | 247 |         | Yes      |       8 |
+------------------------+---------+-----+---------+----------+---------+
26 rows in set (0.00 sec)

可以看到 utf8mb4 默认的 collation 是 utf8mb4_general_ci, 最后的 "_ci" 表示 case insensitive (大小写不敏感)

那么如何调整为大小写敏感呢?可以看到 utf8mb4 对应的 Collation 只有一个不是 "_ci" 结尾的,即 utf8mb4_bin, 那只能试一下了,发现可以区分大小写:

mysql> create table mqs3(col1 char(20) not null primary key) CHARSET utf8mb4 collate utf8mb4_bin;
Query OK, 0 rows affected (0.07 sec)

mysql> insert into mqs3 values('abc'),('aBC'),('ABC'),('AbC');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select * from mqs3 where col1='abc';
+------+
| col1 |
+------+
| abc  |
+------+
1 row in set (0.00 sec)


mysql> show create table mqs3 \G
*************************** 1. row ***************************
       Table: mqs3
Create Table: CREATE TABLE "mqs3" (
  "col1" char(20) COLLATE utf8mb4_bin NOT NULL,
  PRIMARY KEY ("col1")
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

注意一点: 修改表的 COLLATE 对已经存在的列不生效,主要原因是在创建表的时候,COLLATE 既可以是某一个字段的选项,也可以是整个表的选项,两者是相对独立的。 因此修改表的 COLLATE 不影响已有字段的定义。 

事实上,同一张表的不同的字段可以有不同的 COLLATE,示例如下:

mysql> create table mqs4(col1 char(20) COLLATE utf8_general_ci, col2 char(20) COLLATE utf8_bin);
Query OK, 0 rows affected (0.04 sec)

mysql> insert into mqs4 values('abc','abc'),('ABC','ABC');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from mqs4;
+------+------+
| col1 | col2 |
+------+------+
| abc  | abc  |
| ABC  | ABC  |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from mqs4 where col1='abc';
+------+------+
| col1 | col2 |
+------+------+
| abc  | abc  |
| ABC  | ABC  |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from mqs4 where col2='abc';
+------+------+
| col1 | col2 |
+------+------+
| abc  | abc  |
+------+------+
1 row in set (0.00 sec)

附1: 你是怎么确认所有的charset比较时,默认都不区分大小写?

可以看一下所有支持的charset以及其对应的默认 collation,就会发现都是"_ci"结尾的

mysql> show charset;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset           | binary              |      1 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.00 sec)

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 3
    评论
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值