Collations and Case Sensitivity
04/27/2020
Note
This feature was introduced in EF Core 5.0.
Text processing in databases can be complex, and requires more user attention than one would suspect. For one thing, databases vary considerably in how they handle text; for example, while some databases are case-sensitive by default (e.g. SQLite, PostgreSQL), others are case-insensitive (SQL Server, MySQL). In addition, because of index usage, case sensitivity and similar aspects can have a far-reaching impact on query performance: while it may be tempting to use string.ToLower to force a case-insensitive comparison in a case-sensitive database, doing so may prevent your application from using indexes. This page details how to configure case sensitivity, or more generally, collations, and how to do so efficiently without compromising query performance.
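The index concern above can be observed directly at the database level. The following is a minimal sketch, not EF Core code: it uses Python's built-in sqlite3 module against an in-memory SQLite database, and the table, column, and index names are hypothetical. It assumes SQLite's default BINARY collation and its built-in NOCASE collation, which is case-insensitive for ASCII letters only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table using SQLite's default (case-sensitive) BINARY collation.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE INDEX ix_customers_name ON customers (name)")
conn.execute("INSERT INTO customers (name, email) VALUES ('Alice', 'a@example.com')")

# Same data, but the column is declared with a case-insensitive collation.
# An index on the column uses the column's collation by default.
conn.execute(
    "CREATE TABLE customers_ci (id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE, email TEXT)"
)
conn.execute("CREATE INDEX ix_customers_ci_name ON customers_ci (name)")
conn.execute("INSERT INTO customers_ci (name, email) VALUES ('Alice', 'a@example.com')")


def count(sql):
    """Return the single integer produced by a COUNT(*) query."""
    return conn.execute(sql).fetchone()[0]


def plan(sql):
    """Return the query planner's description (4th column of EXPLAIN QUERY PLAN)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Case-sensitive by default: 'alice' does not equal 'Alice'.
exact_matches = count("SELECT COUNT(*) FROM customers WHERE name = 'alice'")

# Forcing lower-case finds the row, but the index on name no longer applies.
lowered_matches = count("SELECT COUNT(*) FROM customers WHERE lower(name) = 'alice'")

# With a case-insensitive collation, a plain comparison finds the row.
ci_matches = count("SELECT COUNT(*) FROM customers_ci WHERE name = 'alice'")

plan_exact = plan("SELECT * FROM customers WHERE name = 'alice'")
plan_lowered = plan("SELECT * FROM customers WHERE lower(name) = 'alice'")
plan_ci = plan("SELECT * FROM customers_ci WHERE name = 'alice'")

print(exact_matches, lowered_matches, ci_matches)
print(plan_exact)    # an index SEARCH
print(plan_lowered)  # a full table SCAN
print(plan_ci)       # an index SEARCH
```

In this sketch, the lower() query returns the expected row only at the cost of a full table scan, whereas declaring the collation on the column keeps the comparison both case-insensitive and index-backed. Other databases expose collations through different syntax, so treat this as an illustration of the principle rather than a recipe.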
Introduction to collations
A fundamental concept in text processing is the collation, which is a set of rules determining how text values are ordered and compared for equality. For example, while a case-insensitive collation disregards differences between upper- and lower-case letters for the purposes of equality comparison, a case-sensitive collation does not. However, since case sensitivity is culture-sensitive (e.g. i and I represent different letters in Turkish), there exist multiple case-insensitive collations, each with its own set of rules. Collations also go beyond case sensitivity to other aspects of character data; in German, for example, it is sometimes (but not