postgresql 数据库 Encoding、Collate、Ctype

os: centos 7.4
db: postgresql 12.2

版本

# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core) 
# 
# su - postgres
$
$ psql -c "select version();"
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

查看

$ psql
psql (10.11)
Type "help" for help.

postgres=# \h create database
Command:     CREATE DATABASE
Description: create a new database
Syntax:
CREATE DATABASE name
    [ [ WITH ] [ OWNER [=] user_name ]
           [ TEMPLATE [=] template ]
           [ ENCODING [=] encoding ]
           [ LC_COLLATE [=] lc_collate ]
           [ LC_CTYPE [=] lc_ctype ]
           [ TABLESPACE [=] tablespace_name ]
           [ ALLOW_CONNECTIONS [=] allowconn ]
           [ CONNECTION LIMIT [=] connlimit ]
           [ IS_TEMPLATE [=] istemplate ] ]

URL: https://www.postgresql.org/docs/12/sql-createdatabase.html

postgres=# \l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 pgbenchdb | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
(4 rows)


postgres=# select * from pg_database;
  oid  |  datname  | datdba | encoding | datcollate  |  datctype   | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | datminmxid | dattablespace |               datacl                
-------+-----------+--------+----------+-------------+-------------+---------------+--------------+--------------+---------------+--------------+------------+---------------+-------------------------------------
 14187 | postgres  |     10 |        6 | en_US.UTF-8 | en_US.UTF-8 | f             | t            |           -1 |         14186 |          479 |          1 |          1663 | 
 16384 | pgbenchdb |     10 |        6 | en_US.UTF-8 | en_US.UTF-8 | f             | t            |           -1 |         14186 |          479 |          1 |          1663 |      
     1 | template1 |     10 |        6 | en_US.UTF-8 | en_US.UTF-8 | t             | t            |           -1 |         14186 |          479 |          1 |          1663 | {=c/postgres,postgres=CTc/postgres}
 14186 | template0 |     10 |        6 | en_US.UTF-8 | en_US.UTF-8 | t             | f            |           -1 |         14186 |          479 |          1 |          1663 | {=c/postgres,postgres=CTc/postgres}
(4 rows)

postgres=# select name,setting,category,short_desc from pg_settings ps where ps.name like 'lc%' or ps.name like '%coding%'order by name;
      name       |   setting   |                      category                      |                           short_desc                           
-----------------+-------------+----------------------------------------------------+----------------------------------------------------------------
 client_encoding | UTF8        | Client Connection Defaults / Locale and Formatting | Sets the client's character set encoding.
 lc_collate      | en_US.UTF-8 | Client Connection Defaults / Locale and Formatting | Shows the collation order locale.
 lc_ctype        | en_US.UTF-8 | Client Connection Defaults / Locale and Formatting | Shows the character classification and case conversion locale.
 lc_messages     | en_US.UTF-8 | Client Connection Defaults / Locale and Formatting | Sets the language in which messages are displayed.
 lc_monetary     | en_US.UTF-8 | Client Connection Defaults / Locale and Formatting | Sets the locale for formatting monetary amounts.
 lc_numeric      | en_US.UTF-8 | Client Connection Defaults / Locale and Formatting | Sets the locale for formatting numbers.
 lc_time         | en_US.UTF-8 | Client Connection Defaults / Locale and Formatting | Sets the locale for formatting date and time values.
 server_encoding | UTF8        | Client Connection Defaults / Locale and Formatting | Sets the server (database) character set encoding.
(8 rows)

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'			# locale for system error message
					# strings
lc_monetary = 'en_US.UTF-8'			# locale for monetary formatting
lc_numeric = 'en_US.UTF-8'			# locale for number formatting
lc_time = 'en_US.UTF-8'				# locale for time formatting

# default configuration for text search
default_text_search_config = 'pg_catalog.english'

Encoding

postgresql 数据库的编码一般都是 UTF8,兼容性会好些 (相当于oracle的AL32UTF8,mysql的 utf8mb4)
在这里插入图片描述在这里插入图片描述
支持的字符集可以参考
https://www.postgresql.org/docs/12/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED


   

Collate

lc_collate

Collation order (LC_COLLATE) to use in the new database.
This affects the sort order applied to strings, e.g. in queries with ORDER BY, as well as the order used in indexes on text columns.
The default is to use the collation order of the template database. See below for additional restrictions.

https://www.postgresql.org/docs/current/collation.html
https://www.postgresql.org/docs/current/sql-createcollation.html

postgres=# select * from pg_collation ;

Ctype

lc_ctype

Character classification (LC_CTYPE) to use in the new database.
This affects the categorization of characters, e.g. lower, upper and digit. The default is to use the character classification of the template database. See below for additional restrictions.


LC_COLLATE = ‘C’ LC_CTYPE = ‘C’

有时会碰到 Collate、Ctype 显示为 C 的数据库,设置为 C 是为了去除所有本地化的设置

创建一个 Collate、Ctype为 C 的数据库

postgres=# CREATE DATABASE testdb WITH encoding 'UTF8'
LC_COLLATE = 'C'  
LC_CTYPE = 'C' ;

ERROR:  new collation (C) is incompatible with the collation of the template database (en_US.UTF-8)
HINT:  Use the same collation as in the template database, or use template0 as template.

从 en_US.UTF-8 collation 到 C collation 报错是受 initdb 的创建的 template1 的模板数据库(en_US.UTF-8)的影响。

使用 template0 当做模板数据库看下

postgres=# create database testdb with encoding 'UTF8'
template template0
lc_collate='C'
lc_ctype='C';

postgres=# \l testdb
                         List of databases
  Name  |  Owner   | Encoding | Collate | Ctype | Access privileges 
--------+----------+----------+---------+-------+-------------------
 testdb | postgres | UTF8     | C       | C     | 
(1 row)

initdb

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ /usr/pgsql-12/bin/initdb --help
initdb initializes a PostgreSQL database cluster.

Usage:
  initdb [OPTION]... [DATADIR]

Options:
  -A, --auth=METHOD         default authentication method for local connections
      --auth-host=METHOD    default authentication method for local TCP/IP connections
      --auth-local=METHOD   default authentication method for local-socket connections
 [-D, --pgdata=]DATADIR     location for this database cluster
  -E, --encoding=ENCODING   set default encoding for new databases
  -g, --allow-group-access  allow group read/execute on data directory
      --locale=LOCALE       set default locale for new databases
      --lc-collate=, --lc-ctype=, --lc-messages=LOCALE
      --lc-monetary=, --lc-numeric=, --lc-time=LOCALE
                            set default locale in the respective category for
                            new databases (default taken from environment)
      --no-locale           equivalent to --locale=C
	  

LC_MESSAGES,LC_MONETARY,LC_NUMERIC,LC_TIME 等参数是初始化后可以需要随时调整。
LC_COLLATE,LC_CTYPE 是在初始化数据库的时候设置的,初始化后是无法更改的。如果不是需要的,可以指定为你需要的语言环境设置

$ initdb -D /var/lib/pgsql/pg12_3307 \
--encoding=UTF8 \
--locale=en_US.UTF-8  \
--lc-collate=en_US.UTF-8  \
--lc-ctype=en_US.UTF-8  \
--lc-messages=en_US.UTF-8  \
--lc-monetary=en_US.UTF-8  \
--lc-numeric=en_US.UTF-8  \
--lc-time=en_US.UTF-8 \
--debug \
--data-checksums \
;

参考:
https://www.postgresql.org/docs/12/sql-createdatabase.html
https://www.postgresql.org/docs/current/locale.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

数据库人生

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值