环境
- 操作系统:
Ubuntu 20.04
- Db2:V11.5
➜ ~ db2level
DB21085I This instance or install (instance name, where applicable:
"db2inst1") uses "64" bits and DB2 code release "SQL11050" with level
identifier "0601010F".
Informational tokens are "DB2 v11.5.0.0", "s1906101300", "DYN1906101300AMD64",
and Fix Pack "0".
Product is installed at "/opt/ibm/db2/V11.5".
backup
offline backup
如下:
➜ ~ db2 backup db sample to /home/db2inst1/backup/
Backup successful. The timestamp for this backup image is : 20220401220942
查看backup目录:
➜ ~ ll backup
total 174M
-rw------- 1 db2inst1 db2iadm1 174M Apr 1 22:09 SAMPLE.0.db2inst1.DBPART000.20220401220942.001
注意,在做backup操作时,如果DB上有其它连接,则backup会报错:
➜ ~ db2 backup db sample to /home/db2inst1/backup/
SQL1035N The operation failed because the specified database cannot be
connected to in the mode requested. SQLSTATE=57019
查看 SQL1035N
错误,如下:
➜ ~ db2 ? SQL1035N
SQL1035N The operation failed because the specified database cannot be
connected to in the mode requested.
Explanation:
When a user connects to a database, the connection is in shared mode by
default. If another user is already connected to the same database in
exclusive mode, then the shared connection attempt will fail with this
message. Similarly, if a user attempts to access a database in exclusive
mode but the database is already being accessed in shared mode then the
connect attempt will fail with this message.
............
上面的error message说的很含糊,下面的解释倒还挺清楚。简单说就是backup需要exclusive mode,而当前有别的connection连到了DB,导致无法进入exclusive mode。
可以用 db2 list applications
来查看application:
➜ ~ db2 list applications
Auth Id Application Appl. Application Id DB # of
Name Handle Name Agents
-------- -------------- ---------- -------------------------------------------------------------- -------- -----
DB2INST1 db2jcc_applica 71 127.0.0.1.34354.220401141222 SAMPLE 1
可以把application force掉,然后再做backup:
➜ ~ db2 force application all
DB20000I The FORCE APPLICATION command completed successfully.
DB21024I This command is asynchronous and may not be effective immediately.
总结:要做offline backup,需要先force掉所有application。
online backup
如下:
➜ ~ db2 backup db sample online to /home/db2inst1/backup/
SQL2413N Online backup is not allowed because the database is not recoverable
or a backup pending condition is in effect.
又出错了。查看 SQL2413N
错误:
➜ ~ db2 ? SQL2413N
SQL2413N Online backup is not allowed because the database is not
recoverable or a backup pending condition is in effect.
Explanation:
Online backup cannot be performed while the database is not logging for
forward recovery as forward recovery will be required at restore time.
Forward recovery is placed into effect by setting either the database
configuration LOGARCHMETH1 or LOGARCHMETH2 and then performing an
offline backup of the database.
User response:
Execute an offline backup or reconfigure the database for roll-forward
recovery and issue an offline backup so that subsequent online backups
will be allowed.
翻译过来就是:要想做online backup,有2个步骤:
- 首先要enable “forward recovery” (通过配置
LOGARCHMETH1
或者LOGARCHMETH2
); - 然后再做一次offline backup;
如果 1
没做,就是 the database is not recoverable
。
如果 2
没做,就是 a backup pending condition is in effect
。
我们来查看DB的CFG参数配置:
➜ ~ db2 get db cfg for sample | grep -i log
Log retain for recovery status = NO
User exit for logging status = NO
Catalog cache size (4KB) (CATALOGCACHE_SZ) = (MAXAPPLS*5)
Log buffer size (4KB) (LOGBUFSZ) = 256
Active log space disk capacity (4KB) (LOG_DISK_CAP) = 0
Log file size (4KB) (LOGFILSIZ) = 1000
Number of primary log files (LOGPRIMARY) = 3
Number of secondary log files (LOGSECOND) = 10
Changed path to log files (NEWLOGPATH) =
Path to log files = /home/db2inst1/db2inst1/NODE0000/SQL00001/LOGSTREAM0000/
Overflow log path (OVERFLOWLOGPATH) =
Mirror log path (MIRRORLOGPATH) =
First active log file =
Block log on disk full (BLK_LOG_DSK_FUL) = NO
Block non logged operations (BLOCKNONLOGGED) = NO
Percent max primary log space by transaction (MAX_LOG) = 0
Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0
Percent log file reclaimed before soft chckpt (SOFTMAX) = 0
HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = AUTOMATIC(0)
HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
First log archive method (LOGARCHMETH1) = OFF
Archive compression for logarchmeth1 (LOGARCHCOMPR1) = OFF
Options for logarchmeth1 (LOGARCHOPT1) =
Second log archive method (LOGARCHMETH2) = OFF
Archive compression for logarchmeth2 (LOGARCHCOMPR2) = OFF
Options for logarchmeth2 (LOGARCHOPT2) =
Failover log archive path (FAILARCHPATH) =
Number of log archive retries on error (NUMARCHRETRY) = 5
Log archive retry Delay (secs) (ARCHRETRYDELAY) = 20
Log pages during index build (LOGINDEXBUILD) = OFF
Log DDL Statements (LOG_DDL_STMTS) = NO
Log Application Information (LOG_APPL_INFO) = NO
可见, LOGARCHMETH1
和 LOGARCHMETH2
都是 OFF
。
我们来把 LOGARCHMETH1
打开:
➜ ~ db2 update db cfg for sample using LOGARCHMETH1 'disk:/home/db2inst1/arch'
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
➜ ~ db2 get db cfg for sample | grep -i LOGARCHMETH1
First log archive method (LOGARCHMETH1) = DISK:/home/db2inst1/arch/
Archive compression for logarchmeth1 (LOGARCHCOMPR1) = OFF
Options for logarchmeth1 (LOGARCHOPT1) =
相当于给DB设置了一个archive log目录。
接下来再做一次offline backup:
➜ ~ db2 backup db sample to /home/db2inst1/backup/
Backup successful. The timestamp for this backup image is : 20220401231046
现在,就可以做online backup了:
➜ ~ db2 backup db sample online to /home/db2inst1/backup/
Backup successful. The timestamp for this backup image is : 20220401231203
即使有其它connection连接到DB,也能做online backup。
注:我今天早些时候做测试的时候,还做过一次 db2 archive log for db <db name>
操作,online backup才成功(当时online backup总不成功,所以乱七八糟做了一堆操作)。但我现在觉得跟这个操作没有关联。
总结:要做online backup,需要先设置 LOGARCHMETH1
,并做一次offline backup。
注:做online backup时,最好把log也加进来( include logs
),以便restore之后做rollforward操作。具体细节详见下面的restore部分。
查看backup
命令为: db2 list history backup all for <DB name>
。例如:
➜ ~ db2 list history backup all for sample
List History File for sample
Number of matching file entries = 5
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
B D 20220401220942001 F D S0000000.LOG S0000000.LOG
----------------------------------------------------------------------------
Contains 5 tablespace(s):
00001 SYSCATSPACE
00002 USERSPACE1
00003 IBMDB2SAMPLEREL
00004 IBMDB2SAMPLEXML
00005 SYSTOOLSPACE
----------------------------------------------------------------------------
Comment: DB2 BACKUP SAMPLE OFFLINE
Start Time: 20220401220942
End Time: 20220401220947
Status: A
----------------------------------------------------------------------------
EID: 4 Location: /home/db2inst1/backup
............
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
B D 20220401231703001 N D S0000002.LOG S0000002.LOG
----------------------------------------------------------------------------
Contains 5 tablespace(s):
00001 SYSCATSPACE
00002 USERSPACE1
00003 IBMDB2SAMPLEREL
00004 IBMDB2SAMPLEXML
00005 SYSTOOLSPACE
----------------------------------------------------------------------------
Comment: DB2 BACKUP SAMPLE ONLINE
Start Time: 20220401231703
End Time: 20220401231709
Status: A
----------------------------------------------------------------------------
EID: 12 Location: /home/db2inst1/backup
可见一共有5次backup(其中有失败的),上面的结果内容做了节选,只列出了其中2个,一个offline backup,一个online backup。
当然,也可以直接到backup路径下查看backup文件,例如:
➜ ~ ll /home/db2inst1/backup
total 684M
-rw------- 1 db2inst1 db2iadm1 174M Apr 1 22:09 SAMPLE.0.db2inst1.DBPART000.20220401220942.001
-rw------- 1 db2inst1 db2iadm1 174M Apr 1 23:10 SAMPLE.0.db2inst1.DBPART000.20220401231046.001
-rw------- 1 db2inst1 db2iadm1 169M Apr 1 23:12 SAMPLE.0.db2inst1.DBPART000.20220401231203.001
-rw------- 1 db2inst1 db2iadm1 169M Apr 1 23:17 SAMPLE.0.db2inst1.DBPART000.20220401231703.001
可见,文件名包含了 DB name
、 timestamp
等信息。
restore
从offline backup做restore
命令为: db2 restore db <DB name> from <backup path> taken at <timestamp> into <new DB name>
。其中:
<DB name>
:做backup的DB名字,例如sample
;<backup path>
:做backup的路径,例如/home/db2inst1/backup
;<timestamp>
:做backup的时间戳,可以从db2 list history backup all for <DB name>
命令的结果里查看时间戳,或者直接到backup路径下,从backup的文件名里查看;<new DB name>
:起个新名字;
例如:
➜ ~ db2 restore db sample from /home/db2inst1/backup taken at 20220401220942 into sample2
DB20000I The RESTORE DATABASE command completed successfully.
现在,就可以连接到DB了:
➜ ~ db2 connect to sample2
Database Connection Information
Database server = DB2/LINUXX8664 11.5.0.0
SQL authorization ID = DB2INST1
Local database alias = SAMPLE2
➜ ~ db2 "select * from t1"
C1 C2
----------- -----------
1 111
2 222
2 record(s) selected.
从online backup做restore
第一步同上:
➜ ~ db2 restore db sample from /home/db2inst1/backup taken at 20220401220942 into sample4
DB20000I The RESTORE DATABASE command completed successfully.
但是此时尝试连接 sample4
会报错:
➜ ~ db2 connect to sample4
SQL1117N A connection to or activation of database "SAMPLE4" cannot be made
because of ROLL-FORWARD PENDING. SQLSTATE=57019
DB目前处于 ROLL-FORWARD PENDING
状态,需要做一下 rollforward
操作。
rollforward,就是把transaction log再apply一遍。
其实想一下就能理解:
- 做offline backup的时候,所有东西都在backup文件里了,所以只需从backup文件做restore就行;
- 做online backup的时候,在backup的同时,DB本身还在工作,所以backup文件本身的信息并不全,其它信息(transaction log)只能从log里取了。总结就是 backup + log才能完整的restore;
更进一步,我们也能理解为什么online backup一定要先设置 LOGARCHMETH1
(即archive log),因为Db2 默认是circular log,也就是循环使用log,因此后面的log可能会覆盖前面的log,这样的话,Db2就认为DB处于不可恢复(not recoverable)的状态,这就是默认状态下online backup报错的原因。
回到rollforward操作,命令为: db2 "rollforward db <DB name> to end of logs and complete overflow log path (<archive log path>)"
:
<DB name>
:做rollforward的DB名字,本例中为sample4
;to end of logs
:必选值,若换成别的值(比如to end of backup
),会报错,提示必须用to end of logs
;and complete
:必选值,否则虽然rollforward命令能成功,但是DB仍然处于ROLL-FORWARD PENDING
状态;overflow log path (<archive log path>)
:必选值。注意在前面做online backup的时候,设置过的LOGARCHMETH1
,也就是archive log的路径,此处就用该值(或者restore时释放出来的log路径)。
具体例子如下:
➜ ~ db2 rollforward db sample4 to end of logs and complete overflow log path (/home/db2inst1/arch/db2inst1/SAMPLE)
Rollforward Status
Input database alias = sample4
Number of members have returned status = 1
Member ID = 0
Rollforward status = not pending
Next log file to be read =
Log files processed = S0000002.LOG - S0000005.LOG
Last committed transaction = 2022-04-05-02.32.14.000000 UTC
DB20000I The ROLLFORWARD command completed successfully.
现在,就可以连接到DB了:
➜ ~ db2 connect to sample4
Database Connection Information
Database server = DB2/LINUXX8664 11.5.0.0
SQL authorization ID = DB2INST1
Local database alias = SAMPLE4
➜ ~ db2 "select * from t1"
C1 C2
----------- -----------
1 111
2 222
2 record(s) selected.
rollforward的log来源
对于online backup,在restore之后,需要做rollforward,而rollforward需要指定log路径。在上面的例子中,指定的log目录是源DB的log目录,那么现在问题来了:如果是在其它server上做restore操作,无法指定源DB的log目录,怎么办呢?
解决办法是,Db2 backup 命令有一个 INCLUDE LOGS
选项:
db2 backup db sample online to /home/db2inst1/backup/ include logs
OK,现在backup文件中就包含rollforward所需的log了。
在做restore时,指定一个空目录来存放log:
db2 restore db sample from /home/db2inst1/backup taken at 20220405120719 into sample6 logtarget /home/db2inst1/backup/20220405120719
最后,在做rollforward的时候,指定该目录:
db2 rollforward db sample6 to end of logs and complete overflow log path (/home/db2inst1/backup/20220405120719)
这样就行了。
这种做法显然通用性更好,推荐使用。
参考
- https://www.ibm.com/docs/en/db2/11.5?topic=commands-backup-database
- https://www.ibm.com/docs/en/db2/11.5?topic=recovery-rollforward