Besides xlog, which operations may also need fsync?

PostgreSQL China User Association (Postgresql中国用户会)

Category: Repost, PostgreSQL source code analysis. Tags: kernel, postgresql, database code

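The Postgres2015 national user conference will be held on November 20-21 at the Beijing Liting Huayuan Hotel (北京丽亭华苑酒店). The speaker lineup is strong: China's top PostgreSQL database experts will all attend, joined by invited database experts from Europe, Russia, Japan, the United States, and other countries and regions:

SUZUKI Koichi (铃木市一), initiator of the Postgres-XC project
Mason Sharp, initiator of the Postgres-XL project
Tatsuo Ishii (石井达夫), author of pgpool
Kaigai Kohei (海外浩平), author of PG-Strom
Yao Yandong (姚延栋), R&D director of Greenplum
Zhou Zhengzhong (周正中, "德哥"), one of the founders of the PostgreSQL China User Association
Wang Yang (汪洋), manager of the database technology department at Ping An Technology
……

Registration for the 2015 PG conference: http://postgres2015.eventdove.com/
PostgreSQL China community: http://postgres.cn/
PostgreSQL professional group 1: 3336901 (full)
PostgreSQL professional group 2: 100910388
PostgreSQL professional group 3: 150657323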

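We know that one important job of xlog (the WAL) is to guarantee the durability of transactions that users have committed.

So when a user commits a transaction, the commit must first wait until the xlog records for that transaction have been fsynced. xlog therefore involves constant fsync calls: the WAL writer issues them periodically, and a committing backend flushes the xlog itself when the records it needs are not yet on disk (http://blog.163.com/digoal@126/blog/static/163877040201573564223/).

xlog also has another design goal: turning scattered I/O into sequential I/O, because xlog files are preallocated and written sequentially.

Without xlog, a commit would have to fsync every object the transaction touched, which could mean a large amount of scattered I/O that the operating system can hardly merge.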
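The commit rule above can be sketched in a few lines of POSIX C. This is only an illustration of "append sequentially, then fsync before acknowledging", not PostgreSQL's actual code; `append_and_sync` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a PostgreSQL function: append one record to an
 * already-open log file and force it to stable storage.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
static int
append_and_sync(int fd, const void *rec, size_t len)
{
    const char *p = rec;

    assert(fd >= 0 && rec != NULL);

    /* write() may be partial, so loop until every byte is handed over */
    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing, retry */
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    /* only after fsync() succeeds may the commit be acknowledged */
    return fsync(fd);
}
```

A caller would typically open the log with O_WRONLY | O_APPEND and report the commit as successful only when the function returns 0. Because all records go to the end of one file, the device sees sequential writes instead of one fsync per modified table or index file.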

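So the question is: besides xlog, are there other operations that need fsync?

Yes, there are, although such fsyncs are becoming rarer; code paths that demand fast response try hard to avoid any non-xlog fsync.

Operations that are less sensitive to response time, however, still need non-xlog fsyncs.

For example: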

1. initdb

src/bin/initdb/initdb.c

/*
 * Issue fsync recursively on PGDATA and all its contents.
 *
 * We fsync regular files and directories wherever they are, but we
 * follow symlinks only for pg_xlog and immediately under pg_tblspc.
 * Other symlinks are presumed to point at files we're not responsible
 * for fsyncing, and might not have privileges to write at all.
 *
 * Errors are reported but not considered fatal.
 */
static void
fsync_pgdata(void)
{
    bool        xlog_is_symlink;
    char        pg_xlog[MAXPGPATH];
    char        pg_tblspc[MAXPGPATH];

    fputs(_("syncing data to disk ... "), stdout);
    fflush(stdout);

    snprintf(pg_xlog, MAXPGPATH, "%s/pg_xlog", pg_data);
    snprintf(pg_tblspc, MAXPGPATH, "%s/pg_tblspc", pg_data);

    /*
     * If pg_xlog is a symlink, we'll need to recurse into it separately,
     * because the first walkdir below will ignore it.
     */
    xlog_is_symlink = false;

#ifndef WIN32
    {
        struct stat st;

        if (lstat(pg_xlog, &st) < 0)
            fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
                    progname, pg_xlog, strerror(errno));
        else if (S_ISLNK(st.st_mode))
            xlog_is_symlink = true;
    }
#else
    if (pgwin32_is_junction(pg_xlog))
        xlog_is_symlink = true;
#endif

    /*
     * If possible, hint to the kernel that we're soon going to fsync the data
     * directory and its contents.
     */
#ifdef PG_FLUSH_DATA_WORKS
    walkdir(pg_data, pre_sync_fname, false);
    if (xlog_is_symlink)
        walkdir(pg_xlog, pre_sync_fname, false);
    walkdir(pg_tblspc, pre_sync_fname, true);
#endif

    /*
     * Now we do the fsync()s in the same order.
     *
     * The main call ignores symlinks, so in addition to specially processing
     * pg_xlog if it's a symlink, pg_tblspc has to be visited separately with
     * process_symlinks = true. Note that if there are any plain directories
     * in pg_tblspc, they'll get fsync'd twice. That's not an expected case
     * so we don't worry about optimizing it.
     */
    walkdir(pg_data, fsync_fname_ext, false);
    if (xlog_is_symlink)
        walkdir(pg_xlog, fsync_fname_ext, false);
    walkdir(pg_tblspc, fsync_fname_ext, true);

    check_ok();
}

2. CREATE DATABASE or ALTER DATABASE ... SET TABLESPACE

src/backend/commands/dbcommands.c

copydir@src/backend/storage/file/copydir.c

Every copied file has to be fsynced, so the volume is fairly large.
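To show where the per-file cost comes from, here is a minimal POSIX sketch of copying one file and fsyncing the copy before returning. It is an illustration under simplifying assumptions, not the code in copydir.c; `copy_file_synced` is a hypothetical name, and a real directory copy would repeat this for every file and also fsync the target directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL's copy_file(): copy src to dst and
 * fsync dst before returning. Returns 0 on success, -1 on error.
 */
static int
copy_file_synced(const char *src, const char *dst)
{
    char        buf[8192];
    ssize_t     nread;
    int         rc = -1;
    int         in;
    int         out;

    assert(src != NULL && dst != NULL);

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0)
    {
        close(in);
        return -1;
    }

    while ((nread = read(in, buf, sizeof(buf))) > 0)
    {
        ssize_t     off = 0;

        /* write() may be partial, so drain the whole buffer */
        while (off < nread)
        {
            ssize_t n = write(out, buf + off, (size_t) (nread - off));

            if (n < 0)
            {
                if (errno == EINTR)
                    continue;
                goto done;
            }
            off += n;
        }
    }
    if (nread < 0)
        goto done;

    /* one fsync per copied file: this is the cost the text refers to */
    if (fsync(out) == 0)
        rc = 0;

done:
    close(in);
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```

With thousands of relation files in a template database, the number of fsync calls grows with the number of files, which is why this kind of operation is tolerated only where response time matters less.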

3. Table rewrites, CREATE TABLE AS, COPY FROM a file, or REFRESH MATERIALIZED VIEW, when wal_level = minimal.

These call heap_sync:

src/include/access/xlog.h:

#define XLogIsNeeded() (wal_level >= WAL_LEVEL_ARCHIVE)

...

if (!XLogIsNeeded())
    myState->hi_options |= HEAP_INSERT_SKIP_WAL;

...

/* If we skipped using WAL, must heap_sync before commit */
if (myState->hi_options & HEAP_INSERT_SKIP_WAL)
    heap_sync(myState->rel);

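4. Two-phase commit (2PC) state files

This happens during WAL replay.

RecreateTwoPhaseFile

5. Timeline history files

When a promote happens, or the walreceiver receives a timeline history file, a new timeline history file has to be created.

6. Replication slot files

Creating a slot requires creating the corresponding file in the pg_replslot directory.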
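Small state files like these are commonly made crash-safe with a "write to a temporary file, fsync, rename, fsync the directory" sequence. The sketch below illustrates that general pattern in POSIX C; it is an assumption about the technique, not a copy of PostgreSQL's code, and `write_file_durably` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: durably create or replace dir/name with the given
 * NUL-terminated contents. Returns 0 on success, -1 on error.
 */
static int
write_file_durably(const char *dir, const char *name, const char *data)
{
    char        tmppath[1024];
    char        path[1024];
    size_t      len;
    int         n;
    int         fd;

    assert(dir != NULL && name != NULL && data != NULL);
    len = strlen(data);

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    n = snprintf(tmppath, sizeof(tmppath), "%s/%s.tmp", dir, name);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* 1. write the new contents to a temporary file and fsync it */
    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* for brevity, a short write is treated as an error */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        unlink(tmppath);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* 2. atomically move it into place */
    if (rename(tmppath, path) != 0)
        return -1;

    /* 3. fsync the directory so the new name itself survives a crash */
    fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The point of the sequence is that a crash at any step leaves either the old file or the complete new one, never a half-written file under the final name. It costs at least two fsync calls per file, which is acceptable for rare events such as a promote or slot creation.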

7. pg_clog, pg_multixact

/*
 * SlruCtlData is an unshared structure that points to the active information
 * in shared memory.
 */
typedef struct SlruCtlData
{
    SlruShared  shared;

    /*
     * This flag tells whether to fsync writes (true for pg_clog and multixact
     * stuff, false for pg_subtrans and pg_notify).
     */
    bool        do_fsync;

    /*
     * Decide which of two page numbers is "older" for truncation purposes. We
     * need to use comparison of TransactionIds here in order to do the right
     * thing with wraparound XID arithmetic.
     */
    bool        (*PagePrecedes) (int, int);

    /*
     * Dir is set during SimpleLruInit and does not change thereafter. Since
     * it's always the same, it doesn't need to be in shared memory.
     */
    char        Dir[64];
} SlruCtlData;
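The effect of a flag like do_fsync can be illustrated with a much-reduced sketch. Everything here is hypothetical (`SketchCtl`, `sketch_write_page`, and the `nsyncs` counter exist only for the illustration); it does not reproduce the real SLRU write path, it only shows one write routine serving both durable and throwaway data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, much-reduced stand-in for SlruCtlData */
typedef struct SketchCtl
{
    bool        do_fsync;       /* true: contents must survive a crash */
    int         nsyncs;         /* counts fsync calls, illustration only */
} SketchCtl;

/*
 * Write one page at the given offset; fsync only when the control
 * structure asks for it. Returns 0 on success, -1 on error.
 */
static int
sketch_write_page(SketchCtl *ctl, int fd, const char *page,
                  size_t len, off_t offset)
{
    assert(ctl != NULL && fd >= 0 && page != NULL);

    /* for brevity, a short write is treated as an error */
    if (pwrite(fd, page, len, offset) != (ssize_t) len)
        return -1;

    if (ctl->do_fsync)
    {
        if (fsync(fd) != 0)
            return -1;
        ctl->nsyncs++;
    }
    return 0;
}
```

Data that can be rebuilt after a crash (the pg_subtrans and pg_notify cases named in the comment) skips the fsync, while commit status and multixact data pay for it.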

Others

......
