SQL SERVER和ORACLE的排序问题

最新推荐文章于 2024-07-12 16:34:24 发布

cuju9432

最新推荐文章于 2024-07-12 16:34:24 发布

阅读量224

点赞数

文章标签：数据库操作系统

SQL SERVER和ORACLE的排序问题

Author:Rainny Zhong

Date: 2007-9-18

一,SQL SERVER的排序问题

SQL SERVER的排序由COLLATION指定.首先我们明白什么是COLLATION?

COLLATION: The physical storage of character strings in Microsoft® SQL Server™ 2000 is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared.

也就是说COLLATION指定SQL SERVER字符的编码和排序,比较规则.

1.安装SQL SERVER时指定实例的COLLATION:

当我们在安装SQL SERVER时,会要我们指定SQL SERVER实例的COLLATION.有两个选项:

l WINDOWS COLLATION

l SQL COLLATION

SQL COLLATION是为了兼容以前的版本而保留的.所以一般是选择WINDOWS COLLATION,其缺省值是来自于WINDOWS作系统的LOCAL SETTING:

举例:如果安装SQL SERVER的操作系统其语言的英语,则WINDOWS COLLATION的缺省值是适用于大部分欧洲及美洲字符的拉丁字符,称为: LATIN1_GENERAL.

在选定了WINDOWS COLLATION以后,还可以选定排序规则,分别有:

(1) 二进制排序

(2) 字典排序,其又细分为:

l 大小写敏感

l 口音敏感

l <日> 假名敏感

l 宽度敏感

2.新建DATABASE时指定DB的COLLATION:

在新建一个DB时,我们可以指定DB级的COLLATION,如果没有指定,缺省是继承自实例的COLLATION.

3.新建表时指定字段的COLLATION:

除了在新建DB时可以指定COLLATION,我们还可以在新建TABLE时,指定COLUMN的COLLATION:

4.在SQL语句中指定COLLATION

You can override the server, database, or column collation by specifying a collation in the ORDER BY clause of a SELECT statement. For more information，for example:

Select * from test order by c collate SQL_Latin1_General_CP1_CS_AS

5.排序规则的优化级

排序规则的优先从高到低是：

n SQL语句级

n 字段级

n DATABASE级

n SERVER级

二,ORACLE的排序

1.Linguistic Sorting and String Searching

Oracle provides the following types of sorts:

■ Binary sort

■ Monolingual linguistic sort

■ Multilingual linguistic sort

2.Using Linguistic Sorts

n Monolingual Linguistic Sorts

n Multilingual Linguistic Sorts

Oracle provides multilingual linguistic sorts so that you can sort data in more than one language in one sort. This is useful for regions or languages that have complex sorting rules and for multilingual databases. Oracle Database 10g supports all of the sort orders defined by previous releases.

For Asian language data or multilingual data, Oracle provides a sorting mechanism based on the ISO 14651 standard and the Unicode 4.0 standard. Chinese characters are ordered by the number of strokes, PinYin, or radicals.

In addition, multilingual sorts can handle canonical equivalence and supplementary characters. Canonical equivalence is a basic equivalence between characters or sequences of characters. For example, ç is equivalent to the combination of c and ,.Supplementary characters are user-defined characters or predefined characters in Unicode 4.0 that require two code points within a specific code range. You can define up to 1.1 million code points in one multilingual sort.

For example, Oracle supports a monolingual French sort (FRENCH), but you can specify a multilingual French sort (FRENCH_M). _M represents the ISO 14651 standard for multilingual sorting. The sorting order is based on the GENERIC_M sorting order and can sort diacritical marks from right to left. Oracle Corporation recommends using

a multilingual linguistic sort if the tables contain multilingual data. If the tables contain only French, then a monolingual French sort may have better performance because it uses less memory. It uses less memory because fewer characters are defined in a monolingual French sort than in a multilingual French sort. There is a tradeoff between the scope and the performance of a sort.

3.Monolingual Linguistic Sorts

(1)monolingual linguistic sort name

Pls refer to 369 page: Table A–14 (Cont.) Monolingual Linguistic Sorts

(2).example of Monolingual Linguistic Sorts

SELECT * FROM test3 ORDER BY NLSSORT(name, 'NLS_SORT=german');

4.Multilingual Linguistic Sorts

(1).multilingual linguistic sort name

Pls refer to 370 page: Table A–15 Multilingual LInguistic Sorts

(2).example of Multilingual Linguistic Sorts

SELECT * FROM test3 ORDER BY NLSSORT(name, 'NLS_SORT= FRENCH_M');

(3).Multilingual Sorting Levels

■ Primary Level Sorts

A primary level sort distinguishes between base letters, such as the difference between characters a and b. It is up to individual locales to define whether a is beforeb, b is before a, or if they are equal. The binary representation of the characters is completely irrelevant. If a character is an ignorable character, then it is assigned a primary level order (or weight) of zero, which means it is ignored at the primary level.Characters that are ignorable on other levels are given an order of zero at those levels.

■ Secondary Level Sorts

A secondary level sort distinguishes between base letters (the primary level sort) before distinguishing between diacritics on a given base letter. For example, the character Ä differs from the character A only because it has a diacritic. Thus, Ä and A are the same on the primary level because they have the same base letter (A) but differ on the secondary level.

■ Tertiary Level Sorts

A tertiary level sort distinguishes between base letters (primary level sort), diacritics (secondary level sort), and case (upper case and lower case). It can also include special characters such as +, -, and *.

The following are examples of tertiary level sorts:

■ Characters a and A are equal on the primary and secondary levels but different on the tertiary level because they have different cases.

■ Characters ä and A are equal on the primary level and different on the secondary and tertiary levels.

■ The primary and secondary level orders for the dash character - is 0. That is, it is ignored on the primary and secondary levels. If a dash is compared with another character whose primary level order is nonzero, for example, u, then no result for

the primary level is available because u is not compared with anything. In this case, Oracle finds a difference between - and u only at the tertiary level.

5.Linguistic Sort Parameters

n NLS_SORT

设置级别：Initialization parameter, environment variable, ALTER SESSION, and SQL functions

n NLS_COMP

设置级别：Initialization parameter, environment variable, and ALTER SESSION

事实上,这两个参数在9I和10G都有,区别在于NLS_COMP,我们先来说说这两个参数的作用:

Ø NLS_SORT这个参数是用于设定ORDER BY子句是否按字符串的二进制值(大小写敏感)还是按某种语言的拼音字母表排序,如英语的英文字母表(大小写不敏感).

Ø NLS_COMP这个参数是用于设定ORACLE中有关字符串比较是否大小写敏感的,它应用于WHERE 条件的=, LIKE 和IN 和ORDER BY 子句

NLS_SORT参数的设值,9I和10G都是差不多的.区别在于NLS_COMP的取值:

(1)在9I中,NLS_COMP只可设定两个值:

Ø BINARY:按字符的二进制值比较

Ø ANSI:按字符的美国标准编码值进行比较

所以,在9I中没办法做到在WHERE条件的=,LIKE,IN谓词比较中大小写不敏感,只可以做得到在ORDER BY子句中大小写不敏感

(2)在10G中,NLS_COMP多了一个参数值:

Ø BINARY:按字符的二进制值比较

Ø ANSI:按字符的美国标准编码值进行比较

Ø LINGUISTIC:当设定此值后,WHERE条件的=,LIKE,IN的比较取决于NLS_SORT参数值,举例:当我们设定NLS_COMP=LINGUISTIC,NLS_SORT=BINARY_CI,则表示WHERE条件中字符串的比较是按BINARY_CI来进行比较的(大小写不敏感),所以在10G中,做到了真正的大小写敏感.

怎么来设置这两个参数?

可以通过两种方法来设置这两个参数:

在WINDOWS的命令行或LINUX的终端中:

Ø WINDOWS: SET NLS_SORT=, SET NLS_COMP=

Ø LINUX: export NLS_SORT=, export NLS_COMP=

但是,不管是9I还是10G,我们都只可以在会话级(SESSION)更改这两个参数,也就是说作用域是当前会话

6.Case-Insensitive and Accent-Insensitive Linguistic Sorts

缺省地，ORACLE都是大小写繁感的，要做到大小写不繁感，有下列方案：

n 使用upper(),lower()函数

举例：select * from test order by upper(c);

Select * from test where upper(c)=’A’;

缺点：必须建函数索引，否则会导致排序字段的索引不可用

n 使用NLS_UPPER()和NLS_LOWER()函数

举例：SELECT word FROM test1

WHERE NLS_UPPER(word, 'NLS_SORT = XGERMAN') = 'GROSSE';

优点：The NLS_UPPER()and NLS_LOWER() change the case of strings based on a specific linguistic sort definition. This enables you to perform case-insensitive searches regardless of the language being used.

缺点：如果不建指定语言的索引，将不会使用字段的索引

n 使用linguistic sorts

In Oracle Database 10g, Oracle provides case-insensitive and accent-insensitive options for linguistic sorts. Oracle provides the following types of monolingual and multilingual linguistic sorts:

■ Linguistic sorts that use information about base letters, diacritics, punctuation, and case. These are the standard monolingual and multilingual linguistic sorts that are described in "Using Linguistic Sorts" on page 5-2.

■ Linguistic sorts that use information about base letters, diacritics, and punctuation. This type of sort is called case-insensitive.

■ Linguistic sorts that use information about base letters only. This type of sort is called accent-insensitive. (Accent is another word for diacritic.) An accent-insensitive sort is always case-insensitive as well.

7.Examples of Case-Insensitive and Accent-Insensitive Sorts

(1)通过修改NLS_SORT参数实现：

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=BINARY_CI;

SELECT * FROM test2 ORDER BY letter;

(2)通过NLSSORT函数在SQL语句级实现：

SELECT * FROM test3 ORDER BY NLSSORT(name, 'NLS_SORT= BINARY_CI');

8.Performing Linguistic Comparisons

通过设定NLS_COMP 和NLS_SORT参数来实现。

(1)Linguistic Comparison Examples

Example 8–1 Binary Comparison Binary Sort

SQL> ALTER SESSION SET NLS_COMP=BINARY;

SQL> ALTER SESSION SET NLS_SORT=BINARY;

SQL> SELECT ename FROM emp1 WHERE ename LIKE 'McC%e';

Example 8–2 Linguistic Comparison Binary Case-Insensitive Sort

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=BINARY_CI;

SQL> SELECT ename FROM emp1 WHERE ename LIKE 'McC%e';

Example 8–3 Linguistic Comparison Binary Accent-Insensitive Sort

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=BINARY_AI;

SQL> SELECT ename FROM emp1 WHERE ename LIKE 'McC%e';

Example 8–4 Linguistic Comparisons Using XSPANISH

SQL> ALTER SESSION SET NLS_COMP=LINGUISTIC;

SQL> ALTER SESSION SET NLS_SORT=XSPANISH;

SQL> SELECT ename FROM emp3 WHERE ename LIKE 'C%';

9.Using Linguistic Indexes

建立基于语方的索引。

举例：

CREATE INDEX nls_index ON my_table (NLSSORT(name, 'NLS_SORT = German'));

(1)Linguistic Indexes for Multiple Languages

■ Build a linguistic index for each language that the application supports.

CREATE INDEX french_index ON employees (NLSSORT(employee_id, 'NLS_

SORT=FRENCH'));

CREATE INDEX german_index ON employees (NLSSORT(employee_id, 'NLS_

SORT=GERMAN'));

■ Build a single linguistic index for all languages.

在表中创建一个字段，存储你所需要的语言:LANG_COL

This requires a language column

(LANG_COL in "Example: Setting Up a French Linguistic Index" on page 5-19) to be used as a parameter of the NLSSORT function. The language column contains NLS_LANGUAGE values for the data in the column on which the index is built. The following example builds a single linguistic index for multiple languages. With this index, the rows with the same values for NLS_LANGUAGE are sorted together:

CREATE INDEX i ON t (NLSSORT(col, 'NLS_SORT=' || LANG_COL));

Queries choose an index based on the argument of the NLSSORT function specified in the ORDER BY clause.

■ Build a single linguistic index for all languages using one of the multilingual linguistic sorts such as GENERIC_M or FRENCH_M.

CREATE INDEX i on t (NLSSORT(col,'NLS_SORT=GENERIC_M');