Berkeley DB Source Code Analysis (4) --- Transactions and Logging
1. In nested txns, when a child txn at any level commits, a __txn_child log record is always written, whether the child commits explicitly or implicitly through its parent's commit.
2. Read-only txns write nothing to the log, because the __txn_regop record is written at commit only if the txn has already written at least one log record.
3. When the outermost txn aborts, all its children abort. At recovery we cannot figure out the parent-child relationship from the log, and we don't need to: we simply treat all aborted txns as independent outermost txns and undo their changes. During verification, however, the missing information can cause false alarms: we may report that T1, an aborted child of parent txn T0, is modifying a page owned by T0, because the log tells us only that T1 aborted, not that T1 is a child of T0.
4. So we have to add a "__txn_begin" log record for each txn to record the txn's parent txnid. We cannot write it as the very first record of only those txns that write something: we don't know a txn's begin_lsn until its first log record has been written, yet we need the begin_lsn to write the __txn_begin record. However, if we don't require __txn_begin to precede all other records of the txn (e.g. we log it right after the txn's first record), we can avoid writing it for read-only txns.
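The deferred "__txn_begin" idea in notes 2 to 4 can be sketched with a toy in-memory log. This is not Berkeley DB code: every name here (`toy_log`, `toy_txn`, `toy_txn_update`, `toy_txn_commit`, `toy_is_same_family`, the `REC_*` constants) is hypothetical. The log is an array and the array index stands in for the LSN. The sketch only models top-level commit, so it writes a regop-style record and ignores __txn_child.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record kinds for a toy log; not Berkeley DB's real layout. */
enum rec_type { REC_UPDATE, REC_TXN_BEGIN, REC_TXN_REGOP };

struct log_rec {
    enum rec_type type;
    uint32_t txnid;
    uint32_t parent;    /* used by REC_TXN_BEGIN only; 0 means no parent */
    size_t begin_lsn;   /* used by REC_TXN_BEGIN only */
};

#define LOG_MAX 64
struct log_rec toy_log[LOG_MAX];
size_t toy_log_len;

struct toy_txn {
    uint32_t txnid;
    uint32_t parent;    /* 0 for an outermost txn */
    int has_logged;     /* set once the txn has written its first record */
};

/* Append one record; the array index plays the role of the LSN. */
size_t log_append(enum rec_type type, uint32_t txnid,
                  uint32_t parent, size_t begin_lsn)
{
    assert(toy_log_len < LOG_MAX);
    toy_log[toy_log_len] = (struct log_rec){ type, txnid, parent, begin_lsn };
    return toy_log_len++;
}

/* Log an update. The first update fixes the txn's begin_lsn, so the
 * begin record (carrying the parent txnid) is written right after it. */
void toy_txn_update(struct toy_txn *t)
{
    size_t lsn = log_append(REC_UPDATE, t->txnid, 0, 0);
    if (!t->has_logged) {
        t->has_logged = 1;
        log_append(REC_TXN_BEGIN, t->txnid, t->parent, lsn);
    }
}

/* Top-level commit: a read-only txn has logged nothing, so it writes
 * neither a begin record nor a commit record. */
void toy_txn_commit(const struct toy_txn *t)
{
    if (t->has_logged)
        log_append(REC_TXN_REGOP, t->txnid, 0, 0);
}

/* Verifier helper: returns 1 if txnid equals ancestor or descends from
 * it according to the begin records found in the log, else 0. */
int toy_is_same_family(uint32_t txnid, uint32_t ancestor)
{
    while (txnid != 0) {
        if (txnid == ancestor)
            return 1;
        uint32_t parent = 0;
        for (size_t i = 0; i < toy_log_len; i++) {
            if (toy_log[i].type == REC_TXN_BEGIN && toy_log[i].txnid == txnid) {
                parent = toy_log[i].parent;
                break;
            }
        }
        txnid = parent;
    }
    return 0;
}
```

With such a helper, a verifier that sees aborted txn T1 touching a page owned by T0 could first check `toy_is_same_family(T1, T0)` before raising an alarm. One limitation of the sketch: a parent that never logs anything has no begin record, so the chain stops at its child.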
Before checkpointing, we write a __dbreg_register record of type DBREG_CHKPNT for each db handle, and then write a __txn_ckp record. Before recovery completes, we write a __dbreg_register record of type DBREG_RCLOSE for each db handle, then the __txn_ckp record, followed by a __txn_recycle record.
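The two record orderings described above can be written down as a small toy model. It only reproduces the order of record kinds; the enum constants and `toy_emit_sequence` are hypothetical names, not Berkeley DB identifiers, and no real log is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags standing in for the record kinds named in the text. */
enum toy_rec {
    TOY_DBREG_CHKPNT,   /* __dbreg_register with type DBREG_CHKPNT */
    TOY_DBREG_RCLOSE,   /* __dbreg_register with type DBREG_RCLOSE */
    TOY_TXN_CKP,        /* __txn_ckp */
    TOY_TXN_RECYCLE     /* __txn_recycle */
};

#define TOY_SEQ_MAX 32

/* Fill `out` with the record order for a normal checkpoint
 * (end_of_recovery == 0) or for the end of recovery (nonzero).
 * Returns the number of records emitted. */
size_t toy_emit_sequence(enum toy_rec *out, size_t nhandles, int end_of_recovery)
{
    size_t n = 0;
    assert(nhandles + 2 <= TOY_SEQ_MAX);

    /* One __dbreg_register record per open db handle comes first. */
    for (size_t i = 0; i < nhandles; i++)
        out[n++] = end_of_recovery ? TOY_DBREG_RCLOSE : TOY_DBREG_CHKPNT;

    /* Then the checkpoint record itself. */
    out[n++] = TOY_TXN_CKP;

    /* Only the end of recovery is followed by a recycle record. */
    if (end_of_recovery)
        out[n++] = TOY_TXN_RECYCLE;

    return n;
}
```

For two open handles, the checkpoint case yields three records and the end-of-recovery case yields four, with the recycle record last.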