mysql - galera 故障恢复

因机房断电, mysql 集群重启后, mysql.galera 无法完成主从同步

故障日志如下

170703 00:44:14 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
170703 00:44:14 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.ImJsjZ' --pid-file='/var/lib/mysql/hh-yun-db-129042.-
recover.pid'
170703 00:44:22 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
170703  0:44:22 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1'
170703  0:44:22 [Note] WSREP: Read nil XID from storage engines, skipping position init
170703  0:44:22 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
170703  0:44:22 [Note] WSREP: wsrep_load(): Galera 3.5(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
170703  0:44:22 [Note] WSREP: CRC-32C: using hardware acceleration.
170703  0:44:22 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
170703  0:44:22 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
170703  0:44:22 [Note] WSREP: Passing config to GCS: base_host = 240.10.129.42; base_port = 4567; cert.log_conflicts = no; debug = no; evs.inactive_check_period = PT0.5S; ev
s.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; ev
s.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.
cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_
throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout
= PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1
; proton
170703  0:44:22 [Note] WSREP: Service thread queue flushed.
170703  0:44:22 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
170703  0:44:22 [Note] WSREP: wsrep_sst_grab()
170703  0:44:22 [Note] WSREP: Start replication
170703  0:44:22 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
170703  0:44:22 [Note] WSREP: protonet asio version 0
170703  0:44:22 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
170703  0:44:22 [Note] WSREP: backend: asio
170703  0:44:22 [Note] WSREP: GMCast version 0
170703  0:44:22 [Note] WSREP: (b3573054-5f45-11e7-964d-3a604abdf474, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
170703  0:44:22 [Note] WSREP: (b3573054-5f45-11e7-964d-3a604abdf474, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
170703  0:44:22 [Note] WSREP: EVS version 0
170703  0:44:22 [Note] WSREP: PC version 0
170703  0:44:22 [Note] WSREP: gcomm: connecting to group 'yun_clutster', peer '240.10.129.41:'
170703  0:44:22 [Note] WSREP: declaring 456416f2-5f43-11e7-9f2c-7e2e30ef17dd stable
170703  0:44:22 [Note] WSREP: Node 456416f2-5f43-11e7-9f2c-7e2e30ef17dd state prim
170703  0:44:22 [Note] WSREP: view(view_id(PRIM,456416f2-5f43-11e7-9f2c-7e2e30ef17dd,2) memb {
        456416f2-5f43-11e7-9f2c-7e2e30ef17dd,0
        b3573054-5f45-11e7-964d-3a604abdf474,0
} joined {
} left {
} partitioned {
})
170703  0:44:23 [Note] WSREP: gcomm: connected
170703  0:44:23 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
170703  0:44:23 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
170703  0:44:23 [Note] WSREP: Opened channel 'yun_clutster'
170703  0:44:23 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
170703  0:44:23 [Note] WSREP: Waiting for SST to complete.
170703  0:44:23 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
170703  0:44:23 [Note] WSREP: STATE EXCHANGE: sent state msg: b368f83d-5f45-11e7-b169-2e0622f98c98
170703  0:44:23 [Note] WSREP: STATE EXCHANGE: got state msg: b368f83d-5f45-11e7-b169-2e0622f98c98 from 0 (hh-yun-db-129041.)
170703  0:44:23 [Note] WSREP: STATE EXCHANGE: got state msg: b368f83d-5f45-11e7-b169-2e0622f98c98 from 1 (hh-yun-db-129042.)
170703  0:44:23 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 1,
        members    = 1/2 (joined/total),
        act_id     = 3524,
        last_appl. = -1,
        protocols  = 0/5/2 (gcs/repl/appl),
        group UUID = 45649783-5f43-11e7-a935-3e74e94aac23
170703  0:44:23 [Note] WSREP: Flow-control interval: [23, 23]
170703  0:44:23 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3524)
170703  0:44:23 [Note] WSREP: State transfer required:
        Group state: 45649783-5f43-11e7-a935-3e74e94aac23:3524
        Local state: 00000000-0000-0000-0000-000000000000:-1
170703  0:44:23 [Note] WSREP: New cluster view: global state: 45649783-5f43-11e7-a935-3e74e94aac23:3524, view# 2: Primary, number of nodes: 2, my index: 1, protocol version
2
170703  0:44:23 [Warning] WSREP: Gap in state sequence. Need state transfer.
170703  0:44:25 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address 'xxxx.42' --auth 'xxxxx:xxxxx' --datadir '/var/lib/mysql/' --defaults-fil
e '/etc/my.cnf' --parent '162980''
170703  0:44:25 [Note] WSREP: Prepared SST request: rsync|240.10.129.42:4444/rsync_sst
170703  0:44:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703  0:44:25 [Note] WSREP: REPL Protocols: 5 (3, 1)
170703  0:44:25 [Note] WSREP: Service thread queue flushed.
170703  0:44:25 [Note] WSREP: Assign initial position for certification: 3524, protocol version: 3
170703  0:44:25 [Note] WSREP: Service thread queue flushed.
170703  0:44:25 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (4
5649783-5f43-11e7-a935-3e74e94aac23): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():447. IST will be unavailable.
170703  0:44:25 [Note] WSREP: Member 1.0 (hh-yun-db-129042.) requested state transfer from '*any*'. Selected 0.0 (hh-yun-db-129041.)(SYNCED) as donor.
170703  0:44:25 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3524)
170703  0:44:25 [Note] WSREP: Requesting state transfer: success, donor: 0
170703  0:45:12 [Note] WSREP: 0.0 (hh-yun-db-129041.): State transfer to 1.0 (hh-yun-db-129042.) complete.
170703  0:45:12 [Note] WSREP: Member 0.0 (hh-yun-db-129041.) synced with group.
WSREP_SST: [INFO] Joiner cleanup. (20170703 00:45:13.006)
WSREP_SST: [INFO] Joiner cleanup done. (20170703 00:45:14.276)
170703  0:45:14 [Note] WSREP: SST complete, seqno: 3524
170703  0:45:14 InnoDB: The InnoDB memory heap is disabled
170703  0:45:14 InnoDB: Mutexes and rw_locks use GCC atomic builtins
170703  0:45:14 InnoDB: Compressed tables use zlib 1.2.3
170703  0:45:14 InnoDB: Using Linux native AIO
170703  0:45:14 InnoDB: Initializing buffer pool, size = 16.0G
170703  0:45:15 InnoDB: Completed initialization of buffer pool
170703  0:45:15 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
170703  0:45:16  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
170703  0:45:16  InnoDB: Waiting for the background threads to start
170703  0:45:17 Percona XtraDB (http://www.percona.com) 5.5.36-MariaDB-33.0 started; log sequence number 3781527514
170703  0:45:17 [Note] Plugin 'FEEDBACK' is disabled.
170703  0:45:17 [Note] Server socket created on IP: '0.0.0.0'.
170703  0:45:17 [Note] Event Scheduler: Loaded 0 events
170703  0:45:17 [Note] WSREP: Signalling provider to continue.
170703  0:45:17 [Note] WSREP: SST received: 45649783-5f43-11e7-a935-3e74e94aac23:3524
170703  0:45:17 [Note] WSREP: 1.0 (hh-yun-db-129042.): State transfer from 0.0 (hh-yun-db-129041.) complete.
170703  0:45:17 [Note] WSREP: Shifting JOINER -> JOINED (TO: 3524)
170703  0:45:17 [Note] WSREP: Member 1.0 (hh-yun-db-129042.) synced with group.
170703  0:45:17 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3524)
170703  0:45:17 [Note] WSREP: Synchronized with group, ready for connections
170703  0:45:17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703  0:45:17 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.36-MariaDB-wsrep'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server, wsrep_25.9.r3961
170703  1:08:22 [Note] WSREP: Node b3573054-5f45-11e7-964d-3a604abdf474 state prim
170703  1:08:22 [Note] WSREP: view(view_id(PRIM,b3573054-5f45-11e7-964d-3a604abdf474,3) memb {
        b3573054-5f45-11e7-964d-3a604abdf474,0
} joined {
} left {
} partitioned {
        456416f2-5f43-11e7-9f2c-7e2e30ef17dd,0
})
170703  1:08:22 [Note] WSREP: forgetting 456416f2-5f43-11e7-9f2c-7e2e30ef17dd (tcp://240.10.129.41:4567)
170703  1:08:22 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
170703  1:08:22 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 0d6d078d-5f49-11e7-bba6-bac92eaecf2e
170703  1:08:22 [Note] WSREP: STATE EXCHANGE: sent state msg: 0d6d078d-5f49-11e7-bba6-bac92eaecf2e
170703  1:08:22 [Note] WSREP: STATE EXCHANGE: got state msg: 0d6d078d-5f49-11e7-bba6-bac92eaecf2e from 0 (hh-yun-db-129042)
170703  1:08:22 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 2,
        members    = 1/1 (joined/total),
        act_id     = 69983,
        last_appl. = 69915,
        protocols  = 0/5/2 (gcs/repl/appl),
        group UUID = 45649783-5f43-11e7-a935-3e74e94aac23
170703  1:08:22 [Note] WSREP: Flow-control interval: [16, 16]
170703  1:08:22 [Note] WSREP: New cluster view: global state: 45649783-5f43-11e7-a935-3e74e94aac23:69983, view# 3: Primary, number of nodes: 1, my index: 0, protocol version
 2
170703  1:08:22 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703  1:08:22 [Note] WSREP: REPL Protocols: 5 (3, 1)
170703  1:08:22 [Note] WSREP: Service thread queue flushed.
170703  1:08:22 [Note] WSREP: Assign initial position for certification: 69983, protocol version: 3
170703  1:08:22 [Note] WSREP: Service thread queue flushed.
170703  1:08:22 [Warning] WSREP: Releasing seqno 69983 before 69984 was assigned.
170703  1:08:27 [Note] WSREP:  cleaning up 456416f2-5f43-11e7-9f2c-7e2e30ef17dd (tcp://240.10.129.41:4567)
170703 11:29:43 [Note] /usr/libexec/mysqld: Normal shutdown

170703 11:29:43 [Note] WSREP: Stop replication
170703 11:29:43 [Note] WSREP: Closing send monitor...
170703 11:29:43 [Note] WSREP: Closed send monitor.
170703 11:29:43 [Note] WSREP: gcomm: terminating thread
170703 11:29:43 [Note] WSREP: gcomm: joining thread
170703 11:29:43 [Note] WSREP: gcomm: closing backend
170703 11:29:43 [Note] WSREP: view((empty))
170703 11:29:43 [Note] WSREP: Received self-leave message.
170703 11:29:43 [Note] WSREP: gcomm: closed
170703 11:29:43 [Note] WSREP: Flow-control interval: [0, 0]
170703 11:29:43 [Note] WSREP: Received SELF-LEAVE. Closing connection.
170703 11:29:43 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 69983)
170703 11:29:43 [Note] WSREP: RECV thread exiting 0: Success
170703 11:29:43 [Note] WSREP: recv_thread() joined.
170703 11:29:43 [Note] WSREP: New cluster view: global state: 45649783-5f43-11e7-a935-3e74e94aac23:69983, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol v
ersion 2
170703 11:29:43 [Note] WSREP: Closing replication queue.
170703 11:29:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
170703 11:29:43 [Note] WSREP: Closing slave action queue.
170703 11:29:43 [Note] WSREP: applier thread exiting (code:0)
170703 11:29:43 [Note] WSREP: applier thread exiting (code:6)
170703 11:29:43 [Note] WSREP: applier thread exiting (code:6)
170703 11:29:43 [Note] WSREP: applier thread exiting (code:6)

故障解决可尝试通过下面两种方法进行

1. 需进行数据同步的事务数量太大, 因此, 需要一个比较大的  innodb_log file, 假如之前的 log file 太小, 是无法正常恢复到集群同步状态
2. 在执行集群恢复期间,  需要使用参数 tc-heuristic-recover=ROLLBACK, 对需要恢复的事务执行 ROOLBACK 操作

操作

修改 /etc/my.cnf
innodb-log-file-size = 2048M
tc-heuristic-recover=ROLLBACK

对于日志, 需要执行手动把日志文件删除操作, 在启动服务器时候, MySQL 会自动重新创建响应的日志文件

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Terry_Tsang

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值