galera mysql docker话_bug排查之空想导致的祸之MariaDB Galera Cluster的docker镜像构建中的遇到的坑爹事...

云云众生w

于 2021-03-04 02:45:37 发布

阅读量771

点赞数

文章标签： galera mysql docker话

本文链接：https://blog.csdn.net/weixin_33895572/article/details/114995340

版权

摘要

唉，技术不行就不能装逼，不然就我这样，自己把自己搞死了。什么事情都需要用实际验证之后才能下结论，不能用经验主义和理论主义。

之前自己构建了一个Docker的MariaDB Galera Cluster的镜像，测试的时候我只测试了普通功能，集群功能其实自己没测试，只是想当然的用自己之前的经验认为是OK的，谁知道今天测试的时候还真发现问题了，

1、第一次发现我在构建镜像的时候系统上没装which命令，估计你会觉得搞笑，数据库要which命令干什么，我也呵呵。。。看下面截图

既然要用Garela那么wsrep_sst_common肯定要调用的so。。。。you need have which command。

2、第二次发现构建的镜像没有rsync命令，有些人要说了这又是什么鬼。。。看Galera官方的参数说明来说事情。其实这个问题要不是自己为了装逼也不会呈现出来

为了装逼我在构建MariaDB Galera Cluster镜像的时候把method的参数都给做了变量支持了，不在局限于默认的mysqldump模式。。

3、第三次是个严重的问题了，不想详细说了，先看看err日志吧

[root@docker-node01 ~]# cat /data/mariadb/mysql-error.log

151119 22:19:33 [Note] WSREP: Read nil XID from storage engines, skipping position init

151119 22:19:33 [Note] WSREP: wsrep_load(): loading provider library 'none'

151119 22:19:33 [Note] InnoDB: Using mutexes to ref count buffer pool pages

151119 22:19:33 [Note] InnoDB: The InnoDB memory heap is disabled

151119 22:19:33 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

151119 22:19:33 [Note] InnoDB: Memory barrier is not used

151119 22:19:33 [Note] InnoDB: Compressed tables use zlib 1.2.3

151119 22:19:33 [Note] InnoDB: Using CPU crc32 instructions

151119 22:19:33 [Note] InnoDB: Initializing buffer pool, size = 128.0M

151119 22:19:33 [Note] InnoDB: Completed initialization of buffer pool

151119 22:19:33 [Note] InnoDB: The first specified data file ./ibdata1 did not exist: a new database to be created!

151119 22:19:33 [Note] InnoDB: Setting file ./ibdata1 size to 12 MB

151119 22:19:33 [Note] InnoDB: Database physically writes the file full: wait...

151119 22:19:33 [Note] InnoDB: Setting log file ./ib_logfile101 size to 48 MB

151119 22:19:33 [Note] InnoDB: Setting log file ./ib_logfile1 size to 48 MB

151119 22:19:33 [Note] InnoDB: Setting log file ./ib_logfile2 size to 48 MB

151119 22:19:34 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0

151119 22:19:34 [Warning] InnoDB: New log files created, LSN=45781

151119 22:19:34 [Note] InnoDB: Doublewrite buffer not found: creating new

151119 22:19:34 [Note] InnoDB: Doublewrite buffer created

151119 22:19:34 [Note] InnoDB: 128 rollback segment(s) are active.

151119 22:19:34 [Warning] InnoDB: Creating foreign key constraint system tables.

151119 22:19:34 [Note] InnoDB: Foreign key constraint system tables created

151119 22:19:34 [Note] InnoDB: Creating tablespace and datafile system tables.

151119 22:19:34 [Note] InnoDB: Tablespace and datafile system tables created.

151119 22:19:34 [Note] InnoDB: Waiting for purge to start

151119 22:19:34 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.25-73.1 started; log sequence number 0

151119 22:19:34 [Warning] Failed to load slave replication state from table mysql.gtid_slave_pos: 1146: Table 'mysql.gtid_slave_pos' doesn't exist

151119 22:19:34 [Note] WSREP: Service disconnected.

151119 22:19:35 [Note] WSREP: Some threads may fail to exit.

151119 22:19:35 [Note] InnoDB: FTS optimize thread exiting.

151119 22:19:35 [Note] InnoDB: Starting shutdown...

151119 22:19:37 [Note] InnoDB: Shutdown completed; log sequence number 1616697

151119 22:19:37 [Note] WSREP: Read nil XID from storage engines, skipping position init

151119 22:19:37 [Note] WSREP: wsrep_load(): loading provider library 'none'

151119 22:19:37 [Note] InnoDB: Using mutexes to ref count buffer pool pages

151119 22:19:37 [Note] InnoDB: The InnoDB memory heap is disabled

151119 22:19:37 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

151119 22:19:37 [Note] InnoDB: Memory barrier is not used

151119 22:19:37 [Note] InnoDB: Compressed tables use zlib 1.2.3

151119 22:19:37 [Note] InnoDB: Using CPU crc32 instructions

151119 22:19:37 [Note] InnoDB: Initializing buffer pool, size = 128.0M

151119 22:19:37 [Note] InnoDB: Completed initialization of buffer pool

151119 22:19:37 [Note] InnoDB: Highest supported file format is Barracuda.

151119 22:19:37 [Note] InnoDB: 128 rollback segment(s) are active.

151119 22:19:37 [Note] InnoDB: Waiting for purge to start

151119 22:19:37 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.25-73.1 started; log sequence number 1616697

151119 22:19:37 [Note] WSREP: Service disconnected.

151119 22:19:38 [Note] WSREP: Some threads may fail to exit.

151119 22:19:38 [Note] InnoDB: FTS optimize thread exiting.

151119 22:19:38 [Note] InnoDB: Starting shutdown...

151119 22:19:40 [Note] InnoDB: Shutdown completed; log sequence number 1616707

151119 22:19:40 [Note] WSREP: Read nil XID from storage engines, skipping position init

151119 22:19:40 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'

151119 22:19:41 [Note] WSREP: wsrep_load(): Galera 25.3.9(r3385) by Codership Oy loaded successfully.

151119 22:19:41 [Note] WSREP: CRC-32C: using hardware acceleration.

151119 22:19:41 [Warning] WSREP: Could not open saved state file for reading: /data/mariadb//grastate.dat

151119 22:19:41 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

151119 22:19:41 [Note] WSREP: Passing config to GCS: base_host = IP隐藏1; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /data/mariadb/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /data/mariadb//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recover

151119 22:19:41 [Note] WSREP: Service thread queue flushed.

151119 22:19:41 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

151119 22:19:41 [Note] WSREP: wsrep_sst_grab()

151119 22:19:41 [Note] WSREP: Start replication

151119 22:19:41 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

151119 22:19:41 [Note] WSREP: protonet asio version 0

151119 22:19:41 [Note] WSREP: Using CRC-32C for message checksums.

151119 22:19:41 [Note] WSREP: backend: asio

151119 22:19:41 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)

151119 22:19:41 [Note] WSREP: restore pc from disk failed

151119 22:19:41 [Note] WSREP: GMCast version 0

151119 22:19:41 [Note] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

151119 22:19:41 [Note] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

151119 22:19:41 [Note] WSREP: EVS version 0

151119 22:19:41 [Note] WSREP: gcomm: connecting to group '*********1', peer 'IP隐藏1:,IP隐藏2：,IP隐藏2:'

151119 22:19:41 [Warning] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') address 'tcp://IP隐藏:4567' points to own listening address, blacklisting

151119 22:19:41 [Note] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') address 'tcp://IP隐藏:4567' pointing to uuid 92b4ea9d is blacklisted, skipping

151119 22:19:41 [Note] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://IP隐藏:4567

151119 22:19:41 [Note] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') address 'tcp://IP隐藏:4567' pointing to uuid 92b4ea9d is blacklisted, skipping

151119 22:19:42 [Note] WSREP: declaring 4580556b at tcp://IP隐藏:4567 stable

151119 22:19:42 [Note] WSREP: declaring 64914067 at tcp://IP隐藏:4567 stable

151119 22:19:42 [Note] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') address 'tcp://IP隐藏:4567' pointing to uuid 92b4ea9d is blacklisted, skipping

151119 22:19:42 [Note] WSREP: Node 4580556b state prim

151119 22:19:42 [Note] WSREP: (92b4ea9d, 'tcp://0.0.0.0:4567') address 'tcp://IP隐藏:4567' pointing to uuid 92b4ea9d is blacklisted, skipping

151119 22:19:42 [Note] WSREP: view(view_id(PRIM,4580556b,48) memb {

4580556b,0

64914067,0

92b4ea9d,0

} joined {

} left {

} partitioned {

})

151119 22:19:42 [Note] WSREP: save pc into disk

151119 22:19:42 [Note] WSREP: discarding pending addr without UUID: tcp://IP隐藏:4567

151119 22:19:42 [Note] WSREP: discarding pending addr proto entry 0x1d16450

151119 22:19:42 [Note] WSREP: gcomm: connected

151119 22:19:42 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

151119 22:19:42 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

151119 22:19:42 [Note] WSREP: Opened channel 'LegionMariadbGaleraCluster1'

151119 22:19:42 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3

151119 22:19:42 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

151119 22:19:42 [Note] WSREP: Waiting for SST to complete.

151119 22:19:42 [Note] WSREP: STATE EXCHANGE: sent state msg: 936ecf63-8ec8-11e5-bdc6-5f35b34c1bd4

151119 22:19:42 [Note] WSREP: STATE EXCHANGE: got state msg: 936ecf63-8ec8-11e5-bdc6-5f35b34c1bd4 from 0 (node1)

151119 22:19:42 [Note] WSREP: STATE EXCHANGE: got state msg: 936ecf63-8ec8-11e5-bdc6-5f35b34c1bd4 from 1 (node3)

151119 22:19:42 [Note] WSREP: STATE EXCHANGE: got state msg: 936ecf63-8ec8-11e5-bdc6-5f35b34c1bd4 from 2 (node10)

151119 22:19:42 [Note] WSREP: Quorum results:

version = 3,

component = PRIMARY,

conf_id = 43,

members = 2/3 (joined/total),

act_id = 113579,

last_appl. = -1,

protocols = 0/7/3 (gcs/repl/appl),

group UUID = 92a4834f-7ee8-11e5-963f-a24c2a696910

151119 22:19:42 [Note] WSREP: Flow-control interval: [28, 28]

151119 22:19:42 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 113579)

151119 22:19:42 [Note] WSREP: State transfer required:

Group state: 92a4834f-7ee8-11e5-963f-a24c2a696910:113579

Local state: 00000000-0000-0000-0000-000000000000:-1

151119 22:19:42 [Note] WSREP: New cluster view: global state: 92a4834f-7ee8-11e5-963f-a24c2a696910:113579, view# 44: Primary, number of nodes: 3, my index: 2, protocol version 3

151119 22:19:42 [Warning] WSREP: Gap in state sequence. Need state transfer.

151119 22:19:42 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address 'IP隐藏' --auth 'cluster:nicaia' --datadir '/data/mariadb/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --parent '1' --binlog 'mysql-bin' '

151119 22:19:43 [Note] WSREP: Prepared SST request: rsync|IP隐藏:4444/rsync_sst

151119 22:19:43 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

151119 22:19:43 [Note] WSREP: REPL Protocols: 7 (3, 2)

151119 22:19:43 [Note] WSREP: Service thread queue flushed.

151119 22:19:43 [Note] WSREP: Assign initial position for certification: 113579, protocol version: 3

151119 22:19:43 [Note] WSREP: Service thread queue flushed.

151119 22:19:43 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (92a4834f-7ee8-11e5-963f-a24c2a696910): 1 (Operation not permitted)

at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.

151119 22:19:43 [Note] WSREP: Member 2.0 (node10) requested state transfer from '*any*'. Selected 0.0 (node1)(SYNCED) as donor.

151119 22:19:43 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 113579)

151119 22:19:43 [Note] WSREP: Requesting state transfer: success, donor: 0

151119 22:19:43 [Warning] WSREP: 0.0 (node1): State transfer to 2.0 (node10) failed: -255 (Unknown error 255)

151119 22:19:43 [ERROR] WSREP: gcs/src/gcs_group.cpp:int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():731: Will never receive state. Need to abort.

151119 22:19:43 [Note] WSREP: gcomm: terminating thread

151119 22:19:43 [Note] WSREP: gcomm: joining thread

151119 22:19:43 [Note] WSREP: gcomm: closing backend

151119 22:19:44 [Note] WSREP: view(view_id(NON_PRIM,4580556b,48) memb {

92b4ea9d,0

} joined {

} left {

} partitioned {

4580556b,0

64914067,0

})

151119 22:19:44 [Note] WSREP: view((empty))

151119 22:19:44 [Note] WSREP: gcomm: closed

151119 22:19:44 [Note] WSREP: mysqld: Terminated.

151119 22:19:44 [ERROR] mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.0.21-MariaDB-wsrep-log

key_buffer_size=0

read_buffer_size=524288

max_used_connections=0

max_threads=102

thread_count=2

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 106485 K bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x48000

addr2line: 'mysqld': No such file

mysqld(my_print_stacktrace+0x29)[0xae93f9]

mysqld(handle_fatal_signal+0x40a)[0x70005a]

/lib64/libpthread.so.0(+0xf790)[0x7f8a67dcd790]

/lib64/libc.so.6(abort+0xd4)[0x7f8a66415d64]

/usr/lib64/galera/libgalera_smm.so(+0xd0ba7)[0x7f8a62a30ba7]

/usr/lib64/galera/libgalera_smm.so(_Z13gcs_core_recvP8gcs_coreP12gcs_act_rcvdx+0x2e9)[0x7f8a62b29089]

/usr/lib64/galera/libgalera_smm.so(+0x1cf630)[0x7f8a62b2f630]

/lib64/libpthread.so.0(+0x7a51)[0x7f8a67dc5a51]

/lib64/libc.so.6(clone+0x6d)[0x7f8a664ca93d]

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains

information that should help you find out what is causing the crash.

这他妈逆天到了要提交bug去官方么？，我还就不信邪，于是慢慢的各种排查。。。。最后定位问题出现在了MySQL初始化sql语句上，如果之前有人看了我构建MariaDB Galera cluster镜像的三个文件，那么会发现MariaDB的初始化是用--init-file=参数带sql脚本来实现的，其中做sql安全配置的第一段就是

-- What's done in this file shouldn't be replicated

-- or products like mysql-fabric won't work

SET @@SESSION.SQL_LOG_BIN=0;

DELETE FROM mysql.user;

CREATE USER 'root'@'%' IDENTIFIED BY '${MYSQL_ROOT_PASSWORD}';

GRANT ALL ON *.* TO 'root'@'%' WITH GRANT OPTION;

DROP DATABASE IF EXISTS test;

问题就是出在刚买命令的高亮行上。不多解释了，有兴趣的朋友可以自己试试，为什么会出现问题,最后我把这地调整成为了

-- What's done in this file shouldn't be replicated

-- or products like mysql-fabric won't work

SET @@SESSION.SQL_LOG_BIN=0;

DELETE FROM mysql.user;

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '${MYSQL_ROOT_PASSWORD}' WITH GRANT OPTION;

DROP DATABASE IF EXISTS test;

然后重新构建镜像，于是好了

docker run -d --name=mariadb_cluster_node3 \

-p 3306:3306 -p 4444:4444 -p 4567-4568:4567-4568 \

-p 4444:4444/udp -p 4567-4568:4567-4568/udp \

-v /data/mariadb:/data/mariadb \

-e TIMEZONE=Asia/Shanghai \

-e MYSQL_ROOT_PASSWORD=nicaia \

-e REPLICATION_USERNAME=cluster \

-e REPLICATION_PASSWORD=nicaia \

-e MAX_CONNECTIONS=20 \

-e LOG_BIN=mysql-bin \

-e SYNC_BINLOG=1 \

-e GALERA=ON \

-e NODE_NAME=node11 \

-e CLUSTER_METHOD=rsync \

-e CLUSTER_NAME=nicaia \

-e CLUSTER_NODE_ADDRESS=1.2.3.4 \

-e CLUSTER_ADDRESS="gcomm://1.2.3.4,1.2.3.5,1.2.3.6" \

benyoo/mariadb_build:latest

这是我最后成功的启动命令

云云众生w

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
galera mysql docker话_bug排查之空想导致的祸之MariaDB Galera Cluster的docker镜像构建中的遇到的坑爹事...

摘要唉，技术不行就不能装逼，不然就我这样，自己把自己搞死了。什么事情都需要用实际验证之后才能下结论，不能用经验主义和理论主义。之前自己构建了一个Docker的MariaDB Galera Cluster的镜像，测试的时候我只测试了普通功能，集群功能其实自己没测试，只是想当然的用自己之前的经验认为是OK的，谁知道今天测试的时候还真发现问题了，1、第一次发现我在构建镜像的时候系统上没装which命令，...
复制链接

扫一扫