PostgreSQL String Operations

Latest recommended article published 2024-10-03 20:47:47

大米饭66

Category: Database; Tags: pgsql, string

9.4. String Functions and Operators

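This section describes functions and operators for examining and manipulating string values. Strings in this context include values of the types character, character varying, and text. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of the potential effects of automatic space padding when using the character type. Generally, the functions described here also work on data of non-string types, provided that the data is first converted to its string representation. Some functions also work on bit-string types.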

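SQL defines some string functions with a special syntax in which certain keywords, rather than commas, separate the arguments. Details are in Table 9-5. These functions are also implemented using the regular function-call syntax (see Table 9-6).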
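The keyword-syntax functions position, substring, and overlay all count character positions from 1. The Python sketch below emulates that arithmetic so the examples in Table 9-5 can be checked by hand. It illustrates the documented behavior and is not PostgreSQL's implementation. The Python function names simply mirror the SQL ones, and rejecting start < 1 in overlay is a simplification made only in this sketch.

```python
from typing import Optional


def position(substring: str, string: str) -> int:
    """position(substring in string): 1-based index of the first match, or 0."""
    # str.find returns a 0-based index, or -1 when there is no match,
    # so adding 1 yields the SQL result in both cases.
    return string.find(substring) + 1


def substring(string: str, start: int = 1, count: Optional[int] = None) -> str:
    """substring(string from start for count), with a 1-based start."""
    begin = start - 1
    if count is None:
        return string[max(begin, 0):]
    if count < 0:
        raise ValueError("negative substring length not allowed")
    # Positions before 1 still use up part of `count`, so fewer characters come back.
    return string[max(begin, 0):max(begin + count, 0)]


def overlay(string: str, placing: str, start: int, count: Optional[int] = None) -> str:
    """overlay(string placing placing from start for count)."""
    if start < 1:
        raise ValueError("this sketch requires start >= 1")
    if count is None:
        # FOR omitted: replace as many characters as are being inserted.
        count = len(placing)
    return string[:start - 1] + placing + string[start - 1 + count:]


print(position("om", "Thomas"))          # 3
print(substring("Thomas", 2, 3))         # hom
print(overlay("Txxxxas", "hom", 2, 4))   # Thomas
```

Without the FOR clause, overlay("Txxxxas", "hom", 2) replaces three characters and gives "Thomxas".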

Table 9-5. SQL String Functions and Operators

Function	Return Type	Description	Example	Result
string || string	text	String concatenation	'Post' || 'greSQL'	PostgreSQL
bit_length(string)	int	Number of bits in string	bit_length('jose')	32
char_length(string) or character_length(string)	int	Number of characters in string	char_length('jose')	4
convert(string using conversion_name)	text	Change the encoding using the specified conversion name. Conversions can be defined with CREATE CONVERSION; the system also provides some predefined conversion names. See Table 9-7 for the available conversion names.	convert('PostgreSQL' using iso_8859_1_to_utf8)	'PostgreSQL' in UTF8 (Unicode, 8-bit) encoding
lower(string)	text	Convert string to lower case	lower('TOM')	tom
octet_length(string)	int	Number of bytes in string	octet_length('jose')	4
overlay(string placing string from int [for int])	text	Replace substring	overlay('Txxxxas' placing 'hom' from 2 for 4)	Thomas
position(substring in string)	int	Location of specified substring	position('om' in 'Thomas')	3
substring(string [from int] [for int])	text	Extract substring	substring('Thomas' from 2 for 3)	hom
substring(string from pattern)	text	Extract substring matching a POSIX regular expression	substring('Thomas' from '...$')	mas
substring(string from pattern for escape)	text	Extract substring matching an SQL regular expression	substring('Thomas' from '%#"o_a#"_' for '#')	oma
trim([leading | trailing | both] [characters] from string)	text	Remove the longest string containing only the characters in characters (a space by default) from the start, the end, or both ends of string	trim(both 'x' from 'xTomxx')	Tom
upper(string)	text	Convert string to upper case	upper('tom')	TOM
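The two pattern forms of substring choose their result differently. With a POSIX regular expression, the first match is returned. If the pattern contains a parenthesized subexpression, only the part matched by the first such subexpression is returned. With an SQL regular expression, the pattern must match the whole string, % and _ act as wildcards, and the function returns the part that lies between the two escape-plus-double-quote markers.

The Python sketch below approximates both forms using the re module. It handles only %, _, the escape character, and literal text in SQL patterns, not the other SIMILAR TO operators. Python's regex dialect also differs from PostgreSQL's, so treat this as an illustration of the Table 9-5 examples rather than a faithful implementation.

```python
import re
from typing import Optional


def posix_substring(string: str, pattern: str) -> Optional[str]:
    """substring(string from pattern), using Python's regex dialect."""
    match = re.search(pattern, string)
    if match is None:
        return None  # stands in for SQL NULL
    # First capture group if the pattern has one, otherwise the whole match.
    return match.group(1) if match.re.groups else match.group(0)


def sql_substring(string: str, pattern: str, escape: str) -> Optional[str]:
    """substring(string from pattern for escape), limited to %, _ and markers."""
    parts = []
    markers = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == '"':
                # The first marker opens the returned group, the second closes it.
                parts.append("(" if markers == 0 else ")")
                markers += 1
            else:
                parts.append(re.escape(nxt))  # escaped literal character
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any sequence of characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    if markers != 2:
        raise ValueError('this sketch needs exactly two escape-" markers')
    # The SQL form must match the entire string, hence fullmatch.
    match = re.fullmatch("".join(parts), string, flags=re.DOTALL)
    return match.group(1) if match else None


print(posix_substring("Thomas", "...$"))           # mas
print(sql_substring("Thomas", '%#"o_a#"_', "#"))  # oma
```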

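Additional string manipulation functions are available; they are listed in Table 9-6. Some of them are used internally to implement the SQL-standard string functions listed in Table 9-5.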
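Several functions in Table 9-6 are plain-call counterparts of keyword forms in Table 9-5. btrim, ltrim, and rtrim correspond to trim(both ...), trim(leading ...), and trim(trailing ...). strpos(string, substring) returns the same value as position(substring in string), with the arguments swapped.

In the Python sketch below, which illustrates the behavior and is not PostgreSQL code, the characters argument is treated as a set of characters rather than as a prefix or suffix; Python's str.strip family works the same way. The trim helper and its mode parameter are hypothetical names used only to model the keyword syntax.

```python
def btrim(string: str, characters: str = " ") -> str:
    # Remove the longest run of characters drawn from `characters` at both ends.
    return string.strip(characters)


def ltrim(string: str, characters: str = " ") -> str:
    # Same, but only at the start of the string.
    return string.lstrip(characters)


def rtrim(string: str, characters: str = " ") -> str:
    # Same, but only at the end of the string.
    return string.rstrip(characters)


def trim(string: str, characters: str = " ", mode: str = "both") -> str:
    """Hypothetical model of trim([leading | trailing | both] [characters] from string)."""
    dispatch = {"both": btrim, "leading": ltrim, "trailing": rtrim}
    return dispatch[mode](string, characters)


def strpos(string: str, substring: str) -> int:
    # Same value as position(substring in string); only the argument order differs.
    return string.find(substring) + 1


print(btrim("xyxtrimyyx", "xy"))    # trim
print(trim("xTomxx", "x", "both"))  # Tom
print(strpos("high", "ig"))         # 2
```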

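Table 9-6. Other String Functions

Function	Return Type	Description	Example	Result
ascii(text)	int	ASCII code of the first character of the argument	ascii('x')	120
btrim(string text [, characters text])	text	Remove the longest string consisting only of characters in characters (a space by default) from the start and end of string	btrim('xyxtrimyyx', 'xy')	trim
chr(int)	text	Character with the given ASCII code	chr(65)	A
convert(string text, [src_encoding name,] dest_encoding name)	text	Convert string to dest_encoding. The original encoding is specified by src_encoding; if src_encoding is omitted, the database encoding is assumed.	convert('text_in_utf8', 'UTF8', 'LATIN1')	text_in_utf8 represented in ISO 8859-1 encoding
decode(string text, type text)	bytea	Decode binary data from string, previously encoded with encode. The type parameter is the same as in encode.	decode('MTIzAAE=', 'base64')	123\000\001
encode(data bytea, type text)	text	Encode binary data to an ASCII-only representation. Supported types are base64, hex, and escape.	encode('123\\000\\001', 'base64')	MTIzAAE=
initcap(text)	text	Convert the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.	initcap('hi thomas')	Hi Thomas
length(string text)	int	Number of characters in string	length('jose')	4
lpad(string text, length int [, fill text])	text	Fill up string to length length by prepending the characters fill (a space by default). If string is already longer than length, it is truncated (on the right).	lpad('hi', 5, 'xy')	xyxhi
ltrim(string text [, characters text])	text	Remove the longest string containing only characters from characters (a space by default) from the start of string	ltrim('zzzytrim', 'xyz')	trim
md5(string text)	text	Calculate the MD5 hash of string, returning the result in hexadecimal	md5('abc')	900150983cd24fb0d6963f7d28e17f72
pg_client_encoding()	name	Current client encoding name	pg_client_encoding()	SQL_ASCII
quote_ident(string text)	text	Return the given string suitably quoted to be used as an identifier in an SQL statement string. Quotes are added only if necessary, that is, if the string contains non-identifier characters or would be case-folded. Embedded quotes are properly doubled.	quote_ident('Foo bar')	"Foo bar"
quote_literal(string text)	text	Return the given string suitably quoted to be used as a string literal in an SQL statement string. Embedded quotes and backslashes are properly doubled.	quote_literal('O\'Reilly')	'O''Reilly'
repeat(string text, number int)	text	Repeat string number times	repeat('Pg', 4)	PgPgPgPg
replace(string text, from text, to text)	text	Replace all occurrences in string of substring from with substring to	replace('abcdefabcdef', 'cd', 'XX')	abXXefabXXef
rpad(string text, length int [, fill text])	text	Fill up string to length length by appending the characters fill (a space by default). If string is already longer than length, it is truncated.	rpad('hi', 5, 'xy')	hixyx
rtrim(string text [, characters text])	text	Remove the longest string containing only characters from characters (a space by default) from the end of string	rtrim('trimxxxx', 'x')	trim
split_part(string text, delimiter text, field int)	text	Split string on delimiter and return the given field (counting from one)	split_part('abc~@~def~@~ghi', '~@~', 2)	def
strpos(string, substring)	int	Location of specified substring (same as position(substring in string), but note the reversed argument order)	strpos('high', 'ig')	2
substr(string, from [, count])	text	Extract substring (same as substring(string from from for count))	substr('alphabet', 3, 2)	ph
to_ascii(text [, encoding])	text	Convert text to ASCII from another encoding [a]	to_ascii('Karel')	Karel
to_hex(number int or bigint)	text	Convert number to its equivalent hexadecimal representation	to_hex(9223372036854775807)	7fffffffffffffff
translate(string text, from text, to text)	text	Any character in string that matches a character in the from set is replaced by the corresponding character in the to set	translate('12345', '14', 'ax')	a23x5
Notes: a. The to_ascii function supports conversion only from the LATIN1, LATIN2, LATIN9, and WIN1250 encodings.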
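The padding, splitting, and translation functions in Table 9-6 follow a few rules that are easy to get wrong:

- the fill string is repeated and cut to fit;
- over-long input is truncated;
- split_part counts fields from 1;
- translate maps characters by position.

The Python sketch below reproduces the table's examples under those rules. It is an illustration only, and two of its behaviors go beyond the table. First, in PostgreSQL, characters of from that have no counterpart in to are deleted by translate. Second, this sketch treats an empty fill string as "no padding", which is an assumption to check against your server.

```python
def _padding(fill: str, needed: int) -> str:
    # Repeat the fill string and cut it to exactly `needed` characters.
    return (fill * (needed // len(fill) + 1))[:needed]


def lpad(string: str, length: int, fill: str = " ") -> str:
    length = max(length, 0)
    if len(string) >= length or not fill:
        return string[:length]  # too long: truncate on the right
    return _padding(fill, length - len(string)) + string


def rpad(string: str, length: int, fill: str = " ") -> str:
    length = max(length, 0)
    if len(string) >= length or not fill:
        return string[:length]
    return string + _padding(fill, length - len(string))


def split_part(string: str, delimiter: str, field: int) -> str:
    if field < 1:
        raise ValueError("field position must be greater than zero")
    # str.split raises ValueError for an empty delimiter; this sketch keeps that.
    parts = string.split(delimiter)
    return parts[field - 1] if field <= len(parts) else ""


def translate(string: str, from_chars: str, to_chars: str) -> str:
    table = {}
    for i, ch in enumerate(from_chars):
        # The first occurrence of a character in from_chars decides its mapping;
        # characters beyond the length of to_chars map to "" (deleted).
        table.setdefault(ch, to_chars[i] if i < len(to_chars) else "")
    return "".join(table.get(c, c) for c in string)


print(lpad("hi", 5, "xy"))                       # xyxhi
print(rpad("hi", 5, "xy"))                       # hixyx
print(split_part("abc~@~def~@~ghi", "~@~", 2))   # def
print(translate("12345", "14", "ax"))            # a23x5
```

With a longer from set, translate("12345", "143", "ax") drops the unmatched "3" and yields "a2x5".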

Table 9-7. Built-in Conversions

Conversion Name [a]	Source Encoding	Destination Encoding
ascii_to_mic	SQL_ASCII	MULE_INTERNAL
ascii_to_utf8	SQL_ASCII	UTF8
big5_to_euc_tw	BIG5	EUC_TW
big5_to_mic	BIG5	MULE_INTERNAL
big5_to_utf8	BIG5	UTF8
euc_cn_to_mic	EUC_CN	MULE_INTERNAL
euc_cn_to_utf8	EUC_CN	UTF8
euc_jp_to_mic	EUC_JP	MULE_INTERNAL
euc_jp_to_sjis	EUC_JP	SJIS
euc_jp_to_utf8	EUC_JP	UTF8
euc_kr_to_mic	EUC_KR	MULE_INTERNAL
euc_kr_to_utf8	EUC_KR	UTF8
euc_tw_to_big5	EUC_TW	BIG5
euc_tw_to_mic	EUC_TW	MULE_INTERNAL
euc_tw_to_utf8	EUC_TW	UTF8
gb18030_to_utf8	GB18030	UTF8
gbk_to_utf8	GBK	UTF8
iso_8859_10_to_utf8	LATIN6	UTF8
iso_8859_13_to_utf8	LATIN7	UTF8
iso_8859_14_to_utf8	LATIN8	UTF8
iso_8859_15_to_utf8	LATIN9	UTF8
iso_8859_16_to_utf8	LATIN10	UTF8
iso_8859_1_to_mic	LATIN1	MULE_INTERNAL
iso_8859_1_to_utf8	LATIN1	UTF8
iso_8859_2_to_mic	LATIN2	MULE_INTERNAL
iso_8859_2_to_utf8	LATIN2	UTF8
iso_8859_2_to_windows_1250	LATIN2	WIN1250
iso_8859_3_to_mic	LATIN3	MULE_INTERNAL
iso_8859_3_to_utf8	LATIN3	UTF8
iso_8859_4_to_mic	LATIN4	MULE_INTERNAL
iso_8859_4_to_utf8	LATIN4	UTF8
iso_8859_5_to_koi8_r	ISO_8859_5	KOI8
iso_8859_5_to_mic	ISO_8859_5	MULE_INTERNAL
iso_8859_5_to_utf8	ISO_8859_5	UTF8
iso_8859_5_to_windows_1251	ISO_8859_5	WIN1251
iso_8859_5_to_windows_866	ISO_8859_5	WIN866
iso_8859_6_to_utf8	ISO_8859_6	UTF8
iso_8859_7_to_utf8	ISO_8859_7	UTF8
iso_8859_8_to_utf8	ISO_8859_8	UTF8
iso_8859_9_to_utf8	LATIN5	UTF8
johab_to_utf8	JOHAB	UTF8
koi8_r_to_iso_8859_5	KOI8	ISO_8859_5
koi8_r_to_mic	KOI8	MULE_INTERNAL
koi8_r_to_utf8	KOI8	UTF8
koi8_r_to_windows_1251	KOI8	WIN1251
koi8_r_to_windows_866	KOI8	WIN866
mic_to_ascii	MULE_INTERNAL	SQL_ASCII
mic_to_big5	MULE_INTERNAL	BIG5
mic_to_euc_cn	MULE_INTERNAL	EUC_CN
mic_to_euc_jp	MULE_INTERNAL	EUC_JP
mic_to_euc_kr	MULE_INTERNAL	EUC_KR
mic_to_euc_tw	MULE_INTERNAL	EUC_TW
mic_to_iso_8859_1	MULE_INTERNAL	LATIN1
mic_to_iso_8859_2	MULE_INTERNAL	LATIN2
mic_to_iso_8859_3	MULE_INTERNAL	LATIN3
mic_to_iso_8859_4	MULE_INTERNAL	LATIN4
mic_to_iso_8859_5	MULE_INTERNAL	ISO_8859_5
mic_to_koi8_r	MULE_INTERNAL	KOI8
mic_to_sjis	MULE_INTERNAL	SJIS
mic_to_windows_1250	MULE_INTERNAL	WIN1250
mic_to_windows_1251	MULE_INTERNAL	WIN1251
mic_to_windows_866	MULE_INTERNAL	WIN866
sjis_to_euc_jp	SJIS	EUC_JP
sjis_to_mic	SJIS	MULE_INTERNAL
sjis_to_utf8	SJIS	UTF8
tcvn_to_utf8	TCVN	UTF8
uhc_to_utf8	UHC	UTF8
utf8_to_ascii	UTF8	SQL_ASCII
utf8_to_big5	UTF8	BIG5
utf8_to_euc_cn	UTF8	EUC_CN
utf8_to_euc_jp	UTF8	EUC_JP
utf8_to_euc_kr	UTF8	EUC_KR
utf8_to_euc_tw	UTF8	EUC_TW
utf8_to_gb18030	UTF8	GB18030
utf8_to_gbk	UTF8	GBK
utf8_to_iso_8859_1	UTF8	LATIN1
utf8_to_iso_8859_10	UTF8	LATIN6
utf8_to_iso_8859_13	UTF8	LATIN7
utf8_to_iso_8859_14	UTF8	LATIN8
utf8_to_iso_8859_15	UTF8	LATIN9
utf8_to_iso_8859_16	UTF8	LATIN10
utf8_to_iso_8859_2	UTF8	LATIN2
utf8_to_iso_8859_3	UTF8	LATIN3
utf8_to_iso_8859_4	UTF8	LATIN4
utf8_to_iso_8859_5	UTF8	ISO_8859_5
utf8_to_iso_8859_6	UTF8	ISO_8859_6
utf8_to_iso_8859_7	UTF8	ISO_8859_7
utf8_to_iso_8859_8	UTF8	ISO_8859_8
utf8_to_iso_8859_9	UTF8	LATIN5
utf8_to_johab	UTF8	JOHAB
utf8_to_koi8_r	UTF8	KOI8
utf8_to_sjis	UTF8	SJIS
utf8_to_tcvn	UTF8	TCVN
utf8_to_uhc	UTF8	UHC
utf8_to_windows_1250	UTF8	WIN1250
utf8_to_windows_1251	UTF8	WIN1251
utf8_to_windows_1252	UTF8	WIN1252
utf8_to_windows_1256	UTF8	WIN1256
utf8_to_windows_866	UTF8	WIN866
utf8_to_windows_874	UTF8	WIN874
windows_1250_to_iso_8859_2	WIN1250	LATIN2
windows_1250_to_mic	WIN1250	MULE_INTERNAL
windows_1250_to_utf8	WIN1250	UTF8
windows_1251_to_iso_8859_5	WIN1251	ISO_8859_5
windows_1251_to_koi8_r	WIN1251	KOI8
windows_1251_to_mic	WIN1251	MULE_INTERNAL
windows_1251_to_utf8	WIN1251	UTF8
windows_1251_to_windows_866	WIN1251	WIN866
windows_1252_to_utf8	WIN1252	UTF8
windows_1256_to_utf8	WIN1256	UTF8
windows_866_to_iso_8859_5	WIN866	ISO_8859_5
windows_866_to_koi8_r	WIN866	KOI8
windows_866_to_mic	WIN866	MULE_INTERNAL
windows_866_to_utf8	WIN866	UTF8
windows_866_to_windows_1251	WIN866	WIN1251
windows_874_to_utf8	WIN874	UTF8
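Notes: a. The conversion names follow a standard naming scheme: the official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by _to_, followed by the destination encoding name processed the same way. Therefore, the names may differ from the customary encoding names.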
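The naming rule in the note can be expressed as a small function. In this Python sketch, conversion_name is a hypothetical helper. The "official" encoding names passed to it, such as ISO-8859-1 or KOI8-R, are assumptions chosen to reproduce entries of Table 9-7. The lowercasing step is inferred from the names in the table; the note does not state it.

```python
import re


def _normalize(encoding: str) -> str:
    # Replace every non-alphanumeric character with "_" and lowercase the result.
    return re.sub(r"[^0-9A-Za-z]", "_", encoding).lower()


def conversion_name(source: str, destination: str) -> str:
    """Hypothetical helper: build '<source>_to_<destination>' from encoding names."""
    return f"{_normalize(source)}_to_{_normalize(destination)}"


print(conversion_name("ISO-8859-1", "UTF8"))          # iso_8859_1_to_utf8
print(conversion_name("Windows-1250", "ISO-8859-2"))  # windows_1250_to_iso_8859_2
print(conversion_name("LATIN1", "UTF8"))              # latin1_to_utf8
```

Passing PostgreSQL's own encoding name LATIN1 yields latin1_to_utf8, which is not one of the names listed in Table 9-7. This is the kind of deviation from customary encoding names that the note warns about.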