8.3. Character Types

Table 8.4 shows the general-purpose character types available in PostgreSQL.
SQL defines two primary character types: character varying(n) and character(n), where n is a positive integer. Both of these types can store strings up to n characters (not bytes) in length. An attempt to store a longer string into a column of these types will result in an error, unless the excess characters are all spaces, in which case the string will be truncated to the maximum length. (This somewhat bizarre exception is required by the SQL standard.) If the string to be stored is shorter than the declared length, values of type character will be space-padded; values of type character varying will simply store the shorter string.
If one explicitly casts a value to character varying(n) or character(n), then an overlength value will be truncated to n characters without raising an error. (This too is required by the SQL standard.)
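The difference between assignment and an explicit cast can be seen in a short sketch. The table char_demo and its columns are hypothetical, chosen only for illustration; the error text follows the form shown in Example 8.1.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE char_demo (v character varying(3), c character(3));

-- Fits: v stores 'abc'; c is space-padded to the declared width of 3.
INSERT INTO char_demo VALUES ('abc', 'ab');

-- The excess characters in v are all spaces, so the value is truncated to 'abc'.
INSERT INTO char_demo VALUES ('abc   ', 'ab');

-- The excess character is not a space, so this statement raises an error
-- (value too long for type character varying(3)).
INSERT INTO char_demo VALUES ('abcd', 'ab');

-- An explicit cast truncates to n characters without raising an error.
SELECT 'abcdef'::character varying(3) AS v, 'abcdef'::character(3) AS c;
```

Because the explicit cast truncates silently, it is best reserved for cases where losing the tail of the string is acceptable.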
The notations varchar(n) and char(n) are aliases for character varying(n) and character(n), respectively. character without length specifier is equivalent to character(1). If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.
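As a minimal sketch of the aliases and defaults described above, the following queries can be run in any session; the column aliases are arbitrary.

```sql
-- varchar and char resolve to the same types as their long spellings.
SELECT pg_typeof('x'::varchar(5)) AS varchar_alias,   -- character varying
       pg_typeof('x'::char(5))    AS char_alias;      -- character

-- character without a length means character(1), so an explicit cast
-- keeps only the first character.
SELECT 'abc'::character AS one_char;

-- character varying without a length accepts strings of any size.
SELECT char_length(repeat('x', 100000)::character varying) AS long_length;
```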
In addition, PostgreSQL provides the text type, which stores strings of any length. Although the type text is not in the SQL standard, several other SQL database management systems have it as well.
Values of type character are physically padded with spaces to the specified width n, and are stored and displayed that way. However, trailing spaces are treated as semantically insignificant and disregarded when comparing two values of type character. In collations where whitespace is significant, this behavior can produce unexpected results; for example, SELECT 'a '::CHAR(2) collate "C" < E'a\n'::CHAR(2) returns true, even though the C locale would consider a space to be greater than a newline. Trailing spaces are removed when converting a character value to one of the other string types. Note that trailing spaces are semantically significant in character varying and text values, and when using pattern matching, that is, LIKE and regular expressions.
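The following queries are a small sketch of how trailing spaces are treated by each type. The literals are arbitrary, and the comments describe the behavior stated in the paragraph above rather than captured output.

```sql
-- Comparison of two character values disregards trailing spaces.
SELECT 'a'::character(3) = 'a  '::character(3) AS char_equal;      -- true

-- For character varying and text, trailing spaces are significant.
SELECT 'a'::character varying(3) = 'a  '::character varying(3) AS varchar_equal,  -- false
       'a'::text = 'a  '::text AS text_equal;                                     -- false

-- Converting character to text removes the padding before concatenation.
SELECT 'ab'::character(4)::text || '|' AS converted;               -- ab|

-- Pattern matching sees the padding of a character value,
-- so these two patterns are not expected to behave alike.
SELECT 'a'::character(3) LIKE 'a'   AS like_unpadded,
       'a'::character(3) LIKE 'a  ' AS like_padded;
```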
The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of character. Longer strings have 4 bytes of overhead instead of 1. Long strings are compressed by the system automatically, so the physical requirement on disk might be less. Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values. In any case, the longest possible character string that can be stored is about 1 GB. (The maximum value that will be allowed for n in the data type declaration is less than that. It wouldn't be useful to change this because with multibyte character encodings the number of characters and bytes can be quite different. If you desire to store long strings with no specific upper limit, use text or character varying without a length specifier, rather than making up an arbitrary length limit.)
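One way to observe the per-value overhead is the pg_column_size function, which reports the number of bytes used to store a value. The sketch below uses a hypothetical table size_demo; the exact figures depend on the encoding and on whether a value is compressed, so no output is shown.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE size_demo (c character(10), v character varying(10), t text);
INSERT INTO size_demo VALUES ('ab', 'ab', 'ab');

-- pg_column_size includes the length header of each stored value.
-- The character(10) column is expected to be larger than the other two,
-- because its space padding is stored along with the string.
SELECT pg_column_size(c) AS c_bytes,
       pg_column_size(v) AS v_bytes,
       pg_column_size(t) AS t_bytes
FROM size_demo;
```

Measuring a stored column rather than a literal matters here, since a constant in a query has not yet been packed into its on-disk form.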
Tip
There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
Refer to Section 4.1.2.1 for information about the syntax of string literals, and to Chapter 9 for information about available operators and functions. The database character set determines the character set used to store textual values; for more information on character set support, refer to Section 23.3.
Example 8.1. Using the Character Types
CREATE TABLE test1 (a character(4));
INSERT INTO test1 VALUES ('ok');
SELECT a, char_length(a) FROM test1;

  a   | char_length
------+-------------
 ok   |           2

CREATE TABLE test2 (b varchar(5));
INSERT INTO test2 VALUES ('ok');
INSERT INTO test2 VALUES ('good ');
INSERT INTO test2 VALUES ('too long');
ERROR:  value too long for type character varying(5)
INSERT INTO test2 VALUES ('too long'::varchar(5)); -- explicit truncation
SELECT b, char_length(b) FROM test2;

   b   | char_length
-------+-------------
 ok    |           2
 good  |           5
 too l |           5

The char_length function is discussed in Section 9.4.
There are two other fixed-length character types in PostgreSQL, shown in Table 8.5. The name type exists only for the storage of identifiers in the internal system catalogs and is not intended for use by the general user. Its length is currently defined as 64 bytes (63 usable characters plus terminator) but should be referenced using the constant NAMEDATALEN in C source code. The length is set at compile time (and is therefore adjustable for special uses); the default maximum length might change in a future release. The type "char" (note the quotes) is different from char(1) in that it only uses one byte of storage. It is internally used in the system catalogs as a simplistic enumeration type.
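Both special types can be seen in the system catalogs. As a brief sketch, the query below inspects pg_class, whose relname column is of type name and whose relkind column is of type "char"; the column aliases are arbitrary.

```sql
-- relname is of type name; relkind is a one-byte "char" code
-- that the catalogs use like a simple enumeration.
SELECT relname, pg_typeof(relname) AS relname_type,
       relkind, pg_typeof(relkind) AS relkind_type
FROM pg_class
WHERE relname = 'pg_class';

-- "char" uses a single byte, whereas character(1) carries the
-- usual string length overhead in addition to its one character.
SELECT pg_column_size('a'::"char")       AS quoted_char_bytes,
       pg_column_size('a'::character(1)) AS char1_bytes;
```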
