PostgreSQL Study Notes ---- The initdb Flow

aSimpleSheep

Published 2024-07-29 15:30:21

Original link: https://blog.csdn.net/qq_39182381/article/details/140769669

When initdb runs, execution starts from the main() function in src/bin/initdb/initdb.c:

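Get the command name from the command line (get_progname()).
Set the locale with setlocale(LC_ALL, "") and find the absolute path of the running executable.
Set up variables such as pg_data, locate the input files needed for initialization (such as the .bki file), and check that they are valid.
Install the signal handlers.
Create the data directory.
Probe the system to choose workable values for max_connections and shared_buffers.
Create the template database template1 in bootstrap mode.
Create the remaining system objects such as the system views, fill in the initial catalog rows, and copy template1 to create template0 and postgres.
Print the success messages and exit.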

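Initial preparation and environment setup

Getting the actual program name

get_progname() calls last_dir_separator() to find the last directory separator ('/') in argv[0], as typed by the user, and takes what follows it as the program name. If there is no separator, skip_drive() only strips a Windows drive prefix such as C: (it does nothing on other platforms):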

	nodir_name = last_dir_separator(argv0);
	if (nodir_name)
		nodir_name++;
	else
		nodir_name = skip_drive(argv0);
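A minimal standalone sketch of the same idea, assuming a POSIX path with '/' separators. prog_basename() is a hypothetical helper, not part of PostgreSQL; it skips the Windows-specific handling (drive letters, the .exe suffix) and the copying that get_progname() does:

```c
#include <string.h>

/* Returns a pointer to the part of path after the last '/',
 * or path itself if it contains no '/'. No copy is made. */
const char *prog_basename(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash != NULL ? slash + 1 : path;
}
```

For example, prog_basename("/home/sheep/pginstall/pg15.1/bin/initdb") and prog_basename("initdb") both yield "initdb".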

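Setting environment variables

This is done mainly by set_pglocale_pgservice().

It first calls setlocale(LC_ALL, ""): LC_ALL selects every locale category, and the empty string means "use the locale from the environment" (LANG and the LC_* variables).
It then calls find_my_exec() to get the absolute path of the initdb executable and stores it in my_exec_path; on my machine it lives under /home/sheep/pginstall/pg15.1/bin.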

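Help and version information

When the user only asks for help or version information instead of initializing a cluster, initdb takes this branch: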

	if (argc > 1)
	{
		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
		{
			usage(progname);
			exit(0);
		}
		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
		{
			puts("initdb (PostgreSQL) " PG_VERSION);
			exit(0);
		}
	}

Parsing command-line options

	while ((c = getopt_long(argc, argv, "A:dD:E:gkL:nNsST:U:WX:", long_options, &option_index)) != -1)
	{
		switch (c)
		{
			case 'A':
				authmethodlocal = authmethodhost = pg_strdup(optarg);
				/* ...... (rest of case 'A' omitted) */
				break;
			/* ...... (other options omitted) */
			default:
				/* getopt_long already emitted a complaint */
				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
				exit(1);
		}
	}

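This loop reads the options given after the command name.

Each letter in "A:dD:E:gkL:nNsST:U:WX:" is a short option; a letter followed by : requires an argument. Long options such as --auth are described by the long_options array.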
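To see how the option string and long_options work together, here is a small sketch using getopt_long() from <getopt.h> (available in glibc and the BSDs). It handles only a hypothetical subset of initdb's options (-A/--auth, -D/--pgdata, -d/--debug); struct opts and parse_opts() are made-up names:

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical subset of initdb-style options, for illustration only. */
struct opts
{
    const char *authmethod; /* -A / --auth, takes an argument */
    const char *pgdata;     /* -D / --pgdata, takes an argument */
    int debug;              /* -d / --debug, takes no argument */
};

/* Returns 0 on success, 1 if getopt_long() rejected an option. */
static int parse_opts(int argc, char *argv[], struct opts *o)
{
    static const struct option long_options[] = {
        {"auth", required_argument, NULL, 'A'},
        {"pgdata", required_argument, NULL, 'D'},
        {"debug", no_argument, NULL, 'd'},
        {NULL, 0, NULL, 0}
    };
    int c;
    int option_index;

    o->authmethod = NULL;
    o->pgdata = NULL;
    o->debug = 0;
    optind = 1; /* start scanning at argv[1] */

    /* "A:dD:": A and D need an argument (the ':'), d does not */
    while ((c = getopt_long(argc, argv, "A:dD:", long_options, &option_index)) != -1)
    {
        switch (c)
        {
            case 'A':
                o->authmethod = optarg;
                break;
            case 'd':
                o->debug = 1;
                break;
            case 'D':
                o->pgdata = optarg;
                break;
            default:
                /* getopt_long already printed a complaint */
                return 1;
        }
    }
    return 0;
}
```

For example, the arguments -A trust --pgdata=/tmp/data -d fill in all three fields. An unknown option makes getopt_long() print a complaint and parse_opts() return 1, much like the default branch above.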

Checking the authentication methods

The relevant code:

	check_authmethod_unspecified(&authmethodlocal);
	check_authmethod_unspecified(&authmethodhost);

	check_authmethod_valid(authmethodlocal, auth_methods_local, "local");
	check_authmethod_valid(authmethodhost, auth_methods_host, "host");

	check_need_password(authmethodlocal, authmethodhost);

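These calls fill in and validate the local and host authentication methods, then check whether a password is required. If no method was given (the authmethod is empty), it defaults to trust, and initdb prints a warning about this at the end.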
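The checks can be sketched roughly as follows. The method list is an illustrative subset, not initdb's actual auth_methods_local/auth_methods_host arrays, and the helper names are hypothetical. The password rule reflects the idea that md5, password and scram-sha-256 cannot work unless a password is supplied with -W or --pwfile:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset only; initdb's real method lists are longer. */
static const char *const example_methods[] = {
    "trust", "reject", "scram-sha-256", "md5", "password", "peer", NULL
};

/* Set when a method had to be defaulted, so a warning can be printed later. */
static bool authwarning = false;

/* Idea of check_authmethod_unspecified(): an unset method becomes "trust". */
static void default_to_trust(const char **method)
{
    if (*method == NULL)
    {
        authwarning = true;
        *method = "trust";
    }
}

/* Idea of check_authmethod_valid(): look the name up in a NULL-terminated list. */
static bool method_is_valid(const char *method, const char *const *list)
{
    for (; *list != NULL; list++)
    {
        if (strcmp(method, *list) == 0)
            return true;
    }
    return false;
}

/* Password-based methods need a password (-W or --pwfile) at initdb time. */
static bool method_needs_password(const char *method)
{
    return strcmp(method, "md5") == 0 ||
           strcmp(method, "password") == 0 ||
           strcmp(method, "scram-sha-256") == 0;
}
```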

Setting up the data and tool paths

	setup_pgdata();

	setup_bin_paths(argv[0]);

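setup_pgdata() determines the data directory of the cluster to be initialized (from -D, or else from the PGDATA environment variable) and sets the PGDATA environment variable to it.

setup_bin_paths() then sets the bin and share paths, here /home/sheep/pginstall/pg15.1/bin and /home/sheep/pginstall/pg15.1/share.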
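A sketch of the -D/PGDATA fallback, assuming POSIX setenv() and strdup(). choose_pgdata() is a hypothetical name; the real setup_pgdata() also canonicalizes the path and exits with an error instead of returning NULL:

```c
#define _POSIX_C_SOURCE 200809L /* for setenv() and strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prefer the -D value, else $PGDATA, then export the result so that child
 * processes started later inherit it. Returns a malloc'd copy of the chosen
 * directory (caller frees), or NULL if neither source is set. */
char *choose_pgdata(const char *dash_d)
{
    const char *dir = dash_d;
    char *copy;

    /* -D (or --pgdata) wins; otherwise fall back to the environment. */
    if (dir == NULL || dir[0] == '\0')
        dir = getenv("PGDATA");
    if (dir == NULL || dir[0] == '\0')
    {
        fprintf(stderr, "no data directory specified\n");
        return NULL;
    }

    /* Copy first: setenv() may invalidate the string getenv() returned. */
    copy = strdup(dir);
    if (copy == NULL)
        return NULL;

    if (setenv("PGDATA", copy, 1) != 0)
    {
        free(copy);
        return NULL;
    }
    return copy;
}
```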

Setting the superuser name

	effective_user = get_id();
	if (!username)
		username = effective_user;

	if (strncmp(username, "pg_", 3) == 0)
		pg_fatal("superuser name \"%s\" is disallowed; role names cannot begin with \"pg_\"", username);

	printf(_("The files belonging to this database system will be owned "
			 "by user \"%s\".\n"
			 "This user must also own the server process.\n\n"),
		   effective_user);

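If no superuser name is given on the command line (-U), the current operating-system user name is used instead. On platforms other than WIN32, get_id() refuses to run as root.

The name is then validated: it must not begin with "pg_", because that prefix is reserved for system roles.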
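Both checks fit in a few lines of POSIX C. superuser_name_allowed() and effective_user_name() are hypothetical helpers; get_id() in initdb reports the error and exits rather than returning NULL:

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */
#include <pwd.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Role names starting with "pg_" are reserved, so they are rejected. */
bool superuser_name_allowed(const char *name)
{
    return strncmp(name, "pg_", 3) != 0;
}

/* Rough analogue of get_id() on POSIX systems: refuse to run as root,
 * otherwise return the effective user's login name (static storage owned
 * by getpwuid()), or NULL if it cannot be determined. */
const char *effective_user_name(void)
{
    struct passwd *pw;

    if (geteuid() == 0)
    {
        fprintf(stderr, "cannot be run as root\n");
        return NULL;
    }
    pw = getpwuid(geteuid());
    return pw != NULL ? pw->pw_name : NULL;
}
```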

Setting version and related information

	set_info_version();

	setup_data_file_paths();

	setup_locale_encoding();

	setup_text_search();

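set_info_version() computes the version string and stores it in infoversion;

setup_data_file_paths() sets the paths of the files needed later to initialize the cluster and checks that they exist;

setup_locale_encoding() sets the locale and encoding, falling back to the system defaults when they are not specified;

setup_text_search() chooses the default text search configuration; if -T is not given, it looks for one that matches the locale and falls back to simple if none matches.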

Setting the superuser password

	if (pwprompt || pwfilename)
		get_su_pwd();

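If -W (--pwprompt) or --pwfile was given, get_su_pwd() reads the superuser password, either interactively from the terminal or from the named file.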

Initializing the cluster

Data initialization

The main entry point for data initialization is initialize_data_directory(). Its main logic is:

void
initialize_data_directory(void)
{
	PG_CMD_DECL;
	int			i;

	setup_signals();

	/*
	 * Set mask based on requested PGDATA permissions.  pg_mode_mask, and
	 * friends like pg_dir_create_mode, are set to owner-only by default and
	 * then updated if -g is passed in by calling SetDataDirectoryCreatePerm()
	 * when parsing our options (see above).
	 */
	umask(pg_mode_mask);

	create_data_directory();

	create_xlog_or_symlink();

	/* Create required subdirectories (other than pg_wal) */
	printf(_("creating subdirectories ... "));
	fflush(stdout);

	for (i = 0; i < lengthof(subdirs); i++)
	{
		char	   *path;

		path = psprintf("%s/%s", pg_data, subdirs[i]);

		/*
		 * The parent directory already exists, so we only need mkdir() not
		 * pg_mkdir_p() here, which avoids some failure modes; cf bug #13853.
		 */
		if (mkdir(path, pg_dir_create_mode) < 0)
			pg_fatal("could not create directory \"%s\": %m", path);

		free(path);
	}

	check_ok();

	/* Top level PG_VERSION is checked by bootstrapper, so make it first */
	write_version_file(NULL);

	/* Select suitable configuration settings */
	set_null_conf();
	test_config_settings();

	/* Now create all the text config files */
	setup_config();

	/* Bootstrap template1 */
	bootstrap_template1();

	/*
	 * Make the per-database PG_VERSION for template1 only after init'ing it
	 */
	write_version_file("base/1");

	/*
	 * Create the stuff we don't need to use bootstrap mode for, using a
	 * backend running in simple standalone mode.
	 */
	fputs(_("performing post-bootstrap initialization ... "), stdout);
	fflush(stdout);

	snprintf(cmd, sizeof(cmd),
			 "\"%s\" %s %s template1 >%s",
			 backend_exec, backend_options, extra_options,
			 DEVNULL);

	PG_CMD_OPEN;

	setup_auth(cmdfd);

	setup_run_file(cmdfd, system_constraints_file);

	setup_run_file(cmdfd, system_functions_file);

	setup_depend(cmdfd);

	/*
	 * Note that no objects created after setup_depend() will be "pinned".
	 * They are all droppable at the whim of the DBA.
	 */

	setup_run_file(cmdfd, system_views_file);

	setup_description(cmdfd);

	setup_collation(cmdfd);

	setup_run_file(cmdfd, dictionary_file);

	setup_privileges(cmdfd);

	setup_schema(cmdfd);

	load_plpgsql(cmdfd);

	vacuum_db(cmdfd);

	make_template0(cmdfd);

	make_postgres(cmdfd);

	PG_CMD_CLOSE;

	check_ok();
}

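setup_signals(); installs handlers for signals such as SIGINT and SIGTERM during initialization. Instead of letting a signal kill initdb midway, the handler only records that the signal arrived, so that check_ok() can exit cleanly and the partially created directories can be removed. SIGPIPE is ignored so that a failing backend does not kill initdb while initdb is writing to it.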
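The "record the signal, check it between steps" pattern can be sketched with the standard signal() function. initdb itself installs its handler through its pqsignal() wrapper and exits from check_ok(); the names below are hypothetical:

```c
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>

/* Written by the handler; volatile sig_atomic_t is safe to set there. */
static volatile sig_atomic_t caught_signal = 0;

/* Only record that a signal arrived; the real work happens later. */
static void record_signal(int signo)
{
    (void) signo;
    caught_signal = 1;
}

/* Catch interrupt/termination signals; ignore SIGPIPE so that writing to a
 * child that has died returns an error instead of killing this process. */
static void install_handlers(void)
{
    signal(SIGINT, record_signal);
    signal(SIGTERM, record_signal);
#ifdef SIGPIPE
    signal(SIGPIPE, SIG_IGN);
#endif
}

/* Called after each step. Returns false if a signal was caught
 * (initdb would print a message and exit at this point). */
static bool step_ok(void)
{
    if (caught_signal)
    {
        printf("caught signal\n");
        return false;
    }
    printf("ok\n");
    return true;
}
```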

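umask(pg_mode_mask); sets the file-mode creation mask for the files and directories created from here on: owner-only by default, or group-readable when -g is given.

create_data_directory(); creates the pg_data directory. Most of the function checks the path and its permissions: a missing directory is created, an existing empty directory has its permissions fixed, and a non-empty one is an error.

create_xlog_or_symlink(); creates the pg_wal directory, or, when -X (--waldir) is given, creates the WAL directory at that location and makes pg_wal a symbolic link to it.

The loop for (i = 0; i < lengthof(subdirs); i++) creates the other subdirectories under pg_data; the array subdirs holds their names: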

static const char *const subdirs[] = {
	"global",
	"pg_wal/archive_status",
	"pg_commit_ts",
	"pg_dynshmem",
	"pg_notify",
	"pg_serial",
	"pg_snapshots",
	"pg_subtrans",
	"pg_twophase",
	"pg_multixact",
	"pg_multixact/members",
	"pg_multixact/offsets",
	"base",
	"base/1",
	"pg_replslot",
	"pg_tblspc",
	"pg_stat",
	"pg_stat_tmp",
	"pg_xact",
	"pg_logical",
	"pg_logical/snapshots",
	"pg_logical/mappings"
};
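A self-contained sketch of the same loop, assuming a POSIX system and a shortened, hypothetical subdirectory list. Because plain mkdir() does not create missing parents, the list must name parents before children, as subdirs does (pg_multixact before pg_multixact/members):

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, shortened list: parents come before their children. */
static const char *const demo_subdirs[] = {
    "global",
    "base",
    "base/1",
    "pg_multixact",
    "pg_multixact/members",
};

#define N_DEMO_SUBDIRS (sizeof(demo_subdirs) / sizeof(demo_subdirs[0]))

/* Creates every entry of demo_subdirs under base_dir with owner-only
 * permissions (subject to the umask). Returns 0 on success, -1 on the
 * first failure. */
static int create_subdirs(const char *base_dir)
{
    char path[4096];
    size_t i;

    for (i = 0; i < N_DEMO_SUBDIRS; i++)
    {
        int n = snprintf(path, sizeof(path), "%s/%s", base_dir, demo_subdirs[i]);

        if (n < 0 || (size_t) n >= sizeof(path))
            return -1;              /* path would be truncated */
        if (mkdir(path, 0700) < 0)  /* parent already exists, so mkdir() suffices */
        {
            perror(path);
            return -1;
        }
    }
    return 0;
}
```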

write_version_file(NULL); writes the top-level PG_VERSION file, which contains the major version number (15 here); the bootstrap backend checks it later.
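The file is tiny: just the major version followed by a newline. A sketch, with write_pg_version() as a hypothetical helper and "15" passed in rather than taken from PG_MAJORVERSION:

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */
#include <stdio.h>

/* Writes "<major>\n" to <dir>/PG_VERSION. Returns 0 on success, -1 on error. */
int write_pg_version(const char *dir, const char *major)
{
    char path[4096];
    FILE *f;
    int n = snprintf(path, sizeof(path), "%s/PG_VERSION", dir);

    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;              /* path would be truncated */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%s\n", major) < 0)
    {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the buffer, so its result matters too. */
    return fclose(f) == 0 ? 0 : -1;
}
```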

set_null_conf(); creates an empty postgresql.conf so that test backends can be started to check candidate configuration settings.

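test_config_settings(); chooses the values of max_connections and shared_buffers. For each candidate value it builds a command line and runs it with system(), which starts a postgres child process with that setting; the exit status tells whether the value is usable.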
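Conceptually this is "try candidates from largest to smallest and keep the first one that works". A sketch with a callback standing in for the system() call; choose_setting(), the probe type and any candidate values you pass in are hypothetical:

```c
#include <stddef.h>

/* Hypothetical probe: returns nonzero if a server could start with this value.
 * initdb instead runs a postgres child via system() and checks its status. */
typedef int (*probe_fn)(int value);

/* Returns the first candidate the probe accepts, or fallback if none does.
 * Candidates are expected in descending order, so the largest workable
 * value wins. */
static int choose_setting(const int *candidates, size_t n, probe_fn probe, int fallback)
{
    size_t i;

    for (i = 0; i < n; i++)
    {
        if (probe(candidates[i]))
            return candidates[i];
    }
    return fallback;
}
```

Trying the largest value first means the first success is also the best one, so the loop can stop immediately instead of testing every candidate.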

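setup_config(); writes the configuration files, filling in the settings gathered so far: postgresql.conf (basic settings such as the password encryption method and max_connections), postgresql.auto.conf (holds parameters set with the ALTER SYSTEM command), pg_hba.conf (client authentication settings), and pg_ident.conf (user name maps).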

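bootstrap_template1(); first reads the postgres.bki file from the share directory and replaces placeholders in it with actual values (such as the superuser name and the encoding).
It then passes the appropriate LC_xxx environment to the bootstrap backend via putenv(), builds the command that starts postgres in bootstrap mode, and in a for loop writes each line of the BKI script to it. This creates the system catalogs of template1 and initializes the template database template1. Finally check_ok() checks whether a signal was caught or writing to the backend failed, and prints "ok".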
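The placeholder substitution boils down to string replacement on each line. A sketch with a hypothetical replace_all() helper (initdb has its own replace_token(), which works on the whole array of lines); the sample line below is made up and not copied from postgres.bki:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of line with every occurrence of token replaced
 * by repl (token must be non-empty), or NULL on allocation failure. */
char *replace_all(const char *line, const char *token, const char *repl)
{
    size_t toklen = strlen(token);
    size_t repllen = strlen(repl);
    size_t count = 0;
    const char *p;
    char *out, *dst;

    /* First pass: count non-overlapping matches to size the result. */
    for (p = line; (p = strstr(p, token)) != NULL; p += toklen)
        count++;

    out = malloc(strlen(line) - count * toklen + count * repllen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    dst = out;
    while ((p = strstr(line, token)) != NULL)
    {
        memcpy(dst, line, (size_t) (p - line));
        dst += p - line;
        memcpy(dst, repl, repllen);
        dst += repllen;
        line = p + toklen;
    }
    strcpy(dst, line); /* remainder after the last match */
    return out;
}
```

For example, replace_all("insert ( 10 POSTGRES t )", "POSTGRES", "sheep") returns "insert ( 10 sheep t )".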

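About BKI files:

BKI files are scripts written in a special language that the postgres backend understands, executed in a special "bootstrap" mode. This mode can run database functions starting from nothing, when no system catalogs exist yet, whereas ordinary SQL commands require the catalogs to exist. BKI files are used only to initialize a database cluster.

postgres.bki is generated at build time by the Perl script genbki.pl in src/backend/catalog/, which reads the catalog definition files pg_xxx.h in src/include/catalog/ (including the definitions of catalog indexes and TOAST tables).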

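The pg_xxx.h catalog definition files in src/include/catalog/ define:

The CATALOG macro, used to declare in a uniform way the structure of each system catalog and the C struct that describes its rows.
The initial rows of the catalog. Older releases defined them as insert operations with DATA(x) and DESCR(x) macros, possibly many per catalog; since PostgreSQL 11 this initial data lives in separate pg_xxx.dat files next to the headers.

When the source tree is built, genbki.pl is invoked. It reads the catalog definitions, initial data, index information and so on from every pg_*.h file (and the matching .dat files), converts them into BKI commands, and writes all of the commands to postgres.bki. The file contains, in order: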

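1. A "create bootstrap" command that creates one of the critical tables.
PS: Some basic system catalogs are called critical tables (pg_class, pg_attribute, pg_proc, pg_type). Until they have been created and populated, the open command cannot be used on non-critical tables, because these critical tables store the schema information of all catalogs: if they do not exist yet, open cannot find the schema of the table to be opened. Since critical tables cannot be opened with open and then filled, BKI provides the create command with the bootstrap option, which automatically opens the table after creating it.
2. One or more insert commands that fill the critical table created in step 1.
3. A close command that closes the critical table created in step 1.
4. Repeat steps 1 to 3 to create and fill the other critical tables.
5. A create command without the bootstrap option, which creates a non-critical table.
6. An open command that opens the non-critical table.
7. One or more insert commands that fill the non-critical table.
8. A close command that closes the non-critical table opened above.
9. Repeat steps 5 to 8 for the other non-critical tables.
10. One or more "declare index" commands that define indexes.
11. A "build indices" command that actually builds the indexes declared in the previous step.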

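write_version_file("base/1"); creates a PG_VERSION file in pg_data/base/1, the directory of the template1 database. Post-bootstrap initialization then begins: a standalone backend is started on template1, and each of the following functions writes SQL to it through the pipe cmdfd.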
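Both the BKI script and this post-bootstrap SQL reach the backend through a pipe: PG_CMD_OPEN opens cmd with popen() in write mode, the PG_CMD_PUTS/PG_CMD_PRINTF macros write to cmdfd, and PG_CMD_CLOSE closes it. A simplified sketch of that pattern, with a hypothetical feed_lines() and any shell command standing in for postgres:

```c
#define _POSIX_C_SOURCE 200809L /* for popen() and pclose() */
#include <stdio.h>

/* Starts cmd with popen() in write mode and writes each line of the
 * NULL-terminated array to its stdin. Returns pclose()'s status
 * (0 means the command exited successfully), or -1 on error. */
int feed_lines(const char *cmd, const char *const *lines)
{
    FILE *cmdfd = popen(cmd, "w");

    if (cmdfd == NULL)
        return -1;
    for (; *lines != NULL; lines++)
    {
        if (fputs(*lines, cmdfd) == EOF)
        {
            pclose(cmdfd);
            return -1;
        }
    }
    return pclose(cmdfd);
}
```

For example, feed_lines("cat > /dev/null", script) sends every line of script to cat and returns 0 once cat exits normally.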

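setup_auth(cmdfd); revokes public access to the password catalog pg_authid so passwords are not publicly readable, and sets the superuser password if one was supplied.

setup_run_file(cmdfd, system_constraints_file); reads system_constraints.sql and creates the system constraints.

setup_run_file(cmdfd, system_functions_file); reads system_functions.sql and creates system functions.

setup_depend(cmdfd); sets up dependencies: it executes the SQL statements in the pg_depend_setup[] array to update the pg_depend and pg_shdepend catalogs.

setup_run_file(cmdfd, system_views_file); reads system_views.sql and creates the system views.

setup_description(cmdfd); reads postgres.description and loads the pg_description and pg_shdescription catalogs, which hold object descriptions (comments).

setup_collation(cmdfd); populates pg_collation, the collation catalog.

setup_run_file(cmdfd, dictionary_file); reads snowball_create.sql and creates the Snowball text search dictionaries and configurations.

setup_privileges(cmdfd); runs SQL statements that set privileges, initializing the privilege-related catalogs.

setup_schema(cmdfd); reads information_schema.sql and creates the information_schema schema.

load_plpgsql(cmdfd); has postgres execute CREATE EXTENSION plpgsql, adding the plpgsql extension. PL/pgSQL is a loadable procedural language for PostgreSQL. It was designed to be a loadable procedural language that can be used to create functions and trigger procedures, adds control structures to SQL, can perform complex computations, inherits all user-defined types, functions and operators, and can be defined to be trusted by the server.

vacuum_db(cmdfd); runs ANALYZE and VACUUM FREEZE on template1: ANALYZE collects statistics for template1, and VACUUM FREEZE aggressively freezes tuples.

make_template0(cmdfd); copies template1 to create template0.

make_postgres(cmdfd); creates the postgres database, the default database to connect to.

Initialization is now essentially complete; the data is then synced to disk.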

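Notes:

template0 and postgres are both copied from template1.

template1 and template0 are used for creating databases. PostgreSQL creates a new database by copying a template database, and the CREATE DATABASE command can name the template with the TEMPLATE option. template1 is the default template for CREATE DATABASE.

The differences are:

template1 can be modified, so users can turn it into a customized template database;
because template1 can be modified, PostgreSQL provides template0 as a pristine copy of the initial state, so that a "clean" database can be created from it when needed;
the datallowconn flag in pg_database controls whether connections to a database are allowed, which is why you can connect to template1 and create objects in it, while you cannot connect to template0;
when creating a database from template1 you cannot specify an encoding or locale different from template1's, whereas from template0 you can;
the postgres database exists only to give the initial user a database to connect to.
All three of these system databases can be dropped, but before dropping either template database you must set the datistemplate attribute of its pg_database row to false.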

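Printing the closing messages

Once data initialization is finished, initdb syncs the data to disk (unless -N/--no-sync was given) and then builds and prints the closing messages: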

	if (do_sync)
	{
		fputs(_("syncing data to disk ... "), stdout);
		fflush(stdout);
		fsync_pgdata(pg_data, PG_VERSION_NUM);
		check_ok();
	}
	else
		printf(_("\nSync to disk skipped.\nThe data directory might become corrupt if the operating system crashes.\n"));

	if (authwarning)
	{
		printf("\n");
		pg_log_warning("enabling \"trust\" authentication for local connections");
		pg_log_warning_hint("You can change this by editing pg_hba.conf or using the option -A, or "
							"--auth-local and --auth-host, the next time you run initdb.");
	}

	if (!noinstructions)
	{
		/*
		 * Build up a shell command to tell the user how to start the server
		 */
		start_db_cmd = createPQExpBuffer();

		/* Get directory specification used to start initdb ... */
		strlcpy(pg_ctl_path, argv[0], sizeof(pg_ctl_path));
		canonicalize_path(pg_ctl_path);
		get_parent_directory(pg_ctl_path);
		/* ... and tag on pg_ctl instead */
		join_path_components(pg_ctl_path, pg_ctl_path, "pg_ctl");

		/* Convert the path to use native separators */
		make_native_path(pg_ctl_path);

		/* path to pg_ctl, properly quoted */
		appendShellString(start_db_cmd, pg_ctl_path);

		/* add -D switch, with properly quoted data directory */
		appendPQExpBufferStr(start_db_cmd, " -D ");
		appendShellString(start_db_cmd, pgdata_native);

		/* add suggested -l switch and "start" command */
		/* translator: This is a placeholder in a shell command. */
		appendPQExpBuffer(start_db_cmd, " -l %s start", _("logfile"));

		printf(_("\nSuccess. You can now start the database server using:\n\n"
				 "    %s\n\n"),
			   start_db_cmd->data);

		destroyPQExpBuffer(start_db_cmd);
	}
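appendShellString() quotes pg_ctl_path and pgdata_native so the printed command can be pasted into a shell even if the paths contain spaces. A simplified sketch for POSIX sh, assuming standard single-quote rules; shell_quote() is hypothetical and the set of characters left unquoted is this sketch's own choice:

```c
#include <stdlib.h>
#include <string.h>

/* Characters this sketch treats as safe to leave unquoted. */
static const char safe_chars[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_./:";

/* Returns a malloc'd copy of str quoted for POSIX sh, or NULL on allocation failure. */
char *shell_quote(const char *str)
{
    size_t len = strlen(str);
    size_t i, extra = 0;
    char *out, *p;

    /* Non-empty strings made only of safe characters are returned unchanged. */
    if (len > 0 && strspn(str, safe_chars) == len)
    {
        out = malloc(len + 1);
        if (out != NULL)
            memcpy(out, str, len + 1);
        return out;
    }

    /* Each ' becomes '"'"' (close quote, quoted quote, reopen): 4 extra chars. */
    for (i = 0; i < len; i++)
        if (str[i] == '\'')
            extra += 4;

    out = malloc(len + extra + 3); /* two surrounding quotes + NUL */
    if (out == NULL)
        return NULL;

    p = out;
    *p++ = '\'';
    for (i = 0; i < len; i++)
    {
        if (str[i] == '\'')
        {
            memcpy(p, "'\"'\"'", 5);
            p += 5;
        }
        else
            *p++ = str[i];
    }
    *p++ = '\'';
    *p = '\0';
    return out;
}
```

With these rules /home/sheep/data stays as it is, my data becomes 'my data', and it's becomes 'it'"'"'s'.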