这是一个Percona Server8.0.22下使用MGR+ProxySQL时遇到的bug
使用mydumper备份mgr时发现ProxySQL报错了
# cat proxysql.log | grep -i subquery
2021-04-28 15:14:22 MySQL_HostGroups_Manager.cpp:3875:update_group_replication_set_offline(): [WARNING] Group Replication: setting host 172.16.23.224:3309 offline because: Subquery returns more than 1 row
2021-04-28 15:14:22 MySQL_Monitor.cpp:1472:monitor_group_replication_thread(): [ERROR] Got error. mmsd 0x7fd62cee3540 , MYSQL 0x7fd62b804600 , FD 39 : Subquery returns more than 1 row
2021-04-28 15:14:27 MySQL_Monitor.cpp:1472:monitor_group_replication_thread(): [ERROR] Got error. mmsd 0x7fd62cee1440 , MYSQL 0x7fd62b800000 , FD 41 : Subquery returns more than 1 row
2021-04-28 15:17:37 MySQL_HostGroups_Manager.cpp:3875:update_group_replication_set_offline(): [WARNING] Group Replication: setting host 172.16.23.151:3309 offline because: Subquery returns more than 1 row
2021-04-28 15:17:37 MySQL_Monitor.cpp:1472:monitor_group_replication_thread(): [ERROR] Got error. mmsd 0x7fd62cee78c0 , MYSQL 0x7fd62bc06400 , FD 51 : Subquery returns more than 1 row
2021-04-28 15:17:42 MySQL_Monitor.cpp:1472:monitor_group_replication_thread(): [ERROR] Got error. mmsd 0x7fd62cee0300 , MYSQL 0x7fd62bc00000 , FD 40 : Subquery returns more than 1 row
看到这个错误我怀疑是gr_member_routing_candidate_status
视图有问题(因为之前被这个视图坑过, 详见issue3406), 并且猜测是由于备份时执行FTWRL导致的. 进行了几次模拟, 发现了复现方法
环境
[root@bj2-mysql-huoyun-prod-01 data]# mysql -uroot -p -S /data/mysql_3310/run/mysql.sock
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 126072
Server version: 8.0.22-13 Percona Server (GPL), Release 13, Revision 6f7822f
Copyright (c) 2009-2020 Percona LLC and/or its affiliates
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
root@localhost 17:54:49 [(none)]> select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 06c6f8b7-a195-11eb-994b-fa163e39df9a | 172.16.xx.151 | 3310 | ONLINE | PRIMARY | 8.0.22 |
| group_replication_applier | 0710790b-a195-11eb-a621-fa163eb36fc8 | 172.16.xx.219 | 3310 | ONLINE | SECONDARY | 8.0.22 |
| group_replication_applier | 073ebcfe-a195-11eb-8b08-fa163efdcb19 | 172.16.xx.224 | 3310 | ONLINE | SECONDARY | 8.0.22 |
+---------------------------+--------------------------------------+---------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
ProxySQL监控检查需要查询gr_member_routing_candidate_status
表, 定义如下
DELIMITER $$
CREATE FUNCTION gr_applier_queue_length()
RETURNS INT
DETERMINISTIC
BEGIN
RETURN (SELECT sys.gtid_count( GTID_SUBTRACT( (SELECT
Received_transaction_set FROM performance_schema.replication_connection_status
WHERE Channel_name = 'group_replication_applier' ), (SELECT
@@global.GTID_EXECUTED) )));
END$$
CREATE FUNCTION gr_member_in_primary_partition()
RETURNS VARCHAR(3)
DETERMINISTIC
BEGIN
RETURN (SELECT IF( MEMBER_STATE='ONLINE' AND ((SELECT COUNT(*) FROM
performance_schema.replication_group_members WHERE MEMBER_STATE != 'ONLINE') >=
((SELECT COUNT(*) FROM performance_schema.replication_group_members)/2) = 0),
'YES', 'NO' ) FROM performance_schema.replication_group_members JOIN
performance_schema.replication_group_member_stats rgms USING(member_id) WHERE rgms.MEMBER_ID=@@SERVER_UUID ) ;
END$$
CREATE FUNCTION gr_transactions_to_cert() RETURNS int(11)
DETERMINISTIC
BEGIN
RETURN (select performance_schema.replication_group_member_stats.COUNT_TRANSACTIONS_IN_QUEUE AS transactions_to_cert
FROM
performance_schema.replication_group_member_stats where MEMBER_ID=@@SERVER_UUID );
END$$
CREATE VIEW gr_member_routing_candidate_status AS
SELECT
IFNULL(sys.gr_member_in_primary_partition(),'NO') AS viable_candidate,
IF((SELECT
((SELECT
GROUP_CONCAT(performance_schema.global_variables.VARIABLE_VALUE
SEPARATOR ',')
FROM
performance_schema.global_variables
WHERE
(performance_schema.global_variables.VARIABLE_NAME IN ('read_only' , 'super_read_only'))) <> 'OFF,OFF')
),
'YES',
'NO') AS read_only,
IFNULL(sys.gr_applier_queue_length(),0) AS transactions_behind,
IFNULL(sys.gr_transactions_to_cert(),0) AS transactions_to_cert;$$
DELIMITER ;
启动两个session, session1, session2, 连接MGR任意一个节点(PRIMARY还是SECONDARY无所谓)
session2执行
select * from gr_member_in_primary_partition();
+--------------------------------------+
| sys.gr_member_in_primary_partition() |
+--------------------------------------+
| YES |
+--------------------------------------+
1 row in set (0.00 sec)
session1执行
flush tables with read lock;
此时session2再次执行
> select sys.gr_member_in_primary_partition();
ERROR 1242 (21000): Subquery returns more than 1 row
导致查询gr_member_routing_candidate_status也报错
> select * from gr_member_routing_candidate_status;
ERROR 1242 (21000): Subquery returns more than 1 row
然而在session2直接运行gr_member_in_primary_partition
定义中的sql语句却一切正常
> SELECT IF( MEMBER_STATE='ONLINE' AND ((SELECT COUNT(*) FROM
-> performance_schema.replication_group_members WHERE MEMBER_STATE != 'ONLINE') >=
-> ((SELECT COUNT(*) FROM performance_schema.replication_group_members)/2) = 0),
-> 'YES', 'NO' ) FROM performance_schema.replication_group_members JOIN
-> performance_schema.replication_group_member_stats rgms USING(member_id) WHERE rgms.MEMBER_ID=@@SERVER_UUID;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IF( MEMBER_STATE='ONLINE' AND ((SELECT COUNT(*) FROM
performance_schema.replication_group_members WHERE MEMBER_STATE != 'ONLINE') >=
((SELECT COUNT(*) FROM performance_schema.replication_group_members)/2) = 0),
'YES', 'NO' ) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| YES |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
测试启动session3, 运行查询select sys.gr_member_in_primary_partition();
无异常
session1 unlock tables
后 session2 执行查询select sys.gr_member_in_primary_partition();
仍然报错
怀疑是MySQL bug
解决办法
如何解决呢, 不使用函数, 直接用gr_member_in_primary_partition中的sql替换
SET @TEMP_LOG_BIN = @@SESSION.SQL_LOG_BIN;
SET @@SESSION.SQL_LOG_BIN= 0;
SET @TEMP_READ_ONLY = @@GLOBAL.READ_ONLY;
SET @TEMP_SUPER_READ_ONLY = @@GLOBAL.SUPER_READ_ONLY;
SET @@GLOBAL.READ_ONLY = 0;
USE sys;
DROP VIEW IF EXISTS gr_member_routing_candidate_status;
DROP FUNCTION IF EXISTS IFZERO;
DROP FUNCTION IF EXISTS LOCATE2;
DROP FUNCTION IF EXISTS GTID_NORMALIZE;
DROP FUNCTION IF EXISTS GTID_COUNT;
DROP FUNCTION IF EXISTS gr_applier_queue_length;
DROP FUNCTION IF EXISTS gr_member_in_primary_partition;
DROP FUNCTION IF EXISTS gr_transactions_to_cert;
DELIMITER $$
CREATE FUNCTION IFZERO(a INT, b INT)
RETURNS INT
DETERMINISTIC
RETURN IF(a = 0, b, a)$$
CREATE FUNCTION LOCATE2(needle TEXT(10000), haystack TEXT(10000), offset INT)
RETURNS INT
DETERMINISTIC
RETURN IFZERO(LOCATE(needle, haystack, offset), LENGTH(haystack) + 1)$$
CREATE FUNCTION GTID_NORMALIZE(g TEXT(10000))
RETURNS TEXT(10000)
DETERMINISTIC
RETURN GTID_SUBTRACT(g, '')$$
CREATE FUNCTION GTID_COUNT(gtid_set TEXT(10000))
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE result BIGINT DEFAULT 0;
DECLARE colon_pos INT;
DECLARE next_dash_pos INT;
DECLARE next_colon_pos INT;
DECLARE next_comma_pos INT;
SET gtid_set = GTID_NORMALIZE(gtid_set);
SET colon_pos = LOCATE2(':', gtid_set, 1);
WHILE colon_pos != LENGTH(gtid_set) + 1 DO
SET next_dash_pos = LOCATE2('-', gtid_set, colon_pos + 1);
SET next_colon_pos = LOCATE2(':', gtid_set, colon_pos + 1);
SET next_comma_pos = LOCATE2(',', gtid_set, colon_pos + 1);
IF next_dash_pos < next_colon_pos AND next_dash_pos < next_comma_pos THEN
SET result = result +
SUBSTR(gtid_set, next_dash_pos + 1,
LEAST(next_colon_pos, next_comma_pos) - (next_dash_pos + 1)) -
SUBSTR(gtid_set, colon_pos + 1, next_dash_pos - (colon_pos + 1)) + 1;
ELSE
SET result = result + 1;
END IF;
SET colon_pos = next_colon_pos;
END WHILE;
RETURN result;
END$$
CREATE FUNCTION gr_applier_queue_length()
RETURNS INT
DETERMINISTIC
BEGIN
RETURN (SELECT sys.gtid_count( GTID_SUBTRACT( (SELECT
Received_transaction_set FROM performance_schema.replication_connection_status
WHERE Channel_name = 'group_replication_applier' ), (SELECT
@@global.GTID_EXECUTED) )));
END$$
CREATE FUNCTION gr_transactions_to_cert() RETURNS int(11)
DETERMINISTIC
BEGIN
RETURN (select performance_schema.replication_group_member_stats.COUNT_TRANSACTIONS_IN_QUEUE AS transactions_to_cert
FROM
performance_schema.replication_group_member_stats where MEMBER_ID=@@SERVER_UUID );
END$$
CREATE FUNCTION my_server_uuid() RETURNS TEXT(36) DETERMINISTIC NO SQL RETURN (SELECT @@global.server_uuid as my_id);$$
CREATE VIEW gr_member_routing_candidate_status AS
SELECT
IFNULL((SELECT
IF(MEMBER_STATE = 'ONLINE'
AND ((SELECT
COUNT(*)
FROM
performance_schema.replication_group_members
WHERE
MEMBER_STATE != 'ONLINE') >= ((SELECT
COUNT(*)
FROM
performance_schema.replication_group_members) / 2) = 0),
'YES',
'NO')
FROM
performance_schema.replication_group_members
JOIN
performance_schema.replication_group_member_stats rgms USING (member_id)
WHERE
rgms.MEMBER_ID = my_server_uuid()),
'NO') AS viable_candidate,
IF((SELECT
((SELECT
GROUP_CONCAT(performance_schema.global_variables.VARIABLE_VALUE
SEPARATOR ',')
FROM
performance_schema.global_variables
WHERE
(performance_schema.global_variables.VARIABLE_NAME IN ('read_only' , 'super_read_only'))) <> 'OFF,OFF')
),
'YES',
'NO') AS read_only,
IFNULL(sys.gr_applier_queue_length(), 0) AS transactions_behind,
IFNULL(sys.gr_transactions_to_cert(), 0) AS transactions_to_cert;$$
DELIMITER ;
SET @@SESSION.SQL_LOG_BIN = @TEMP_LOG_BIN;
SET @@GLOBAL.READ_ONLY = @TEMP_READ_ONLY;
SET @@GLOBAL.SUPER_READ_ONLY = @TEMP_SUPER_READ_ONLY;
完整版addtion_to_sys
-- SET @TEMP_LOG_BIN = @@SESSION.SQL_LOG_BIN;
-- SET @@SESSION.SQL_LOG_BIN= 0;
-- SET @TEMP_READ_ONLY = @@GLOBAL.READ_ONLY;
-- SET @TEMP_SUPER_READ_ONLY = @@GLOBAL.SUPER_READ_ONLY;
-- SET @@GLOBAL.READ_ONLY = 0;
USE sys;
DROP VIEW IF EXISTS gr_member_routing_candidate_status;
DROP FUNCTION IF EXISTS IFZERO;
DROP FUNCTION IF EXISTS LOCATE2;
DROP FUNCTION IF EXISTS GTID_NORMALIZE;
DROP FUNCTION IF EXISTS GTID_COUNT;
DROP FUNCTION IF EXISTS gr_applier_queue_length;
DROP FUNCTION IF EXISTS gr_member_in_primary_partition;
DROP FUNCTION IF EXISTS gr_transactions_to_cert;
DELIMITER $$
CREATE FUNCTION IFZERO(a INT, b INT)
RETURNS INT
DETERMINISTIC
RETURN IF(a = 0, b, a)$$
CREATE FUNCTION LOCATE2(needle TEXT(10000), haystack TEXT(10000), offset INT)
RETURNS INT
DETERMINISTIC
RETURN IFZERO(LOCATE(needle, haystack, offset), LENGTH(haystack) + 1)$$
CREATE FUNCTION GTID_NORMALIZE(g TEXT(10000))
RETURNS TEXT(10000)
DETERMINISTIC
RETURN GTID_SUBTRACT(g, '')$$
CREATE FUNCTION GTID_COUNT(gtid_set TEXT(10000))
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE result BIGINT DEFAULT 0;
DECLARE colon_pos INT;
DECLARE next_dash_pos INT;
DECLARE next_colon_pos INT;
DECLARE next_comma_pos INT;
SET gtid_set = GTID_NORMALIZE(gtid_set);
SET colon_pos = LOCATE2(':', gtid_set, 1);
WHILE colon_pos != LENGTH(gtid_set) + 1 DO
SET next_dash_pos = LOCATE2('-', gtid_set, colon_pos + 1);
SET next_colon_pos = LOCATE2(':', gtid_set, colon_pos + 1);
SET next_comma_pos = LOCATE2(',', gtid_set, colon_pos + 1);
IF next_dash_pos < next_colon_pos AND next_dash_pos < next_comma_pos THEN
SET result = result +
SUBSTR(gtid_set, next_dash_pos + 1,
LEAST(next_colon_pos, next_comma_pos) - (next_dash_pos + 1)) -
SUBSTR(gtid_set, colon_pos + 1, next_dash_pos - (colon_pos + 1)) + 1;
ELSE
SET result = result + 1;
END IF;
SET colon_pos = next_colon_pos;
END WHILE;
RETURN result;
END$$
CREATE FUNCTION gr_applier_queue_length()
RETURNS INT
DETERMINISTIC
BEGIN
RETURN (SELECT sys.gtid_count( GTID_SUBTRACT( (SELECT
Received_transaction_set FROM performance_schema.replication_connection_status
WHERE Channel_name = 'group_replication_applier' ), (SELECT
@@global.GTID_EXECUTED) )));
END$$
CREATE FUNCTION gr_transactions_to_cert() RETURNS int(11)
DETERMINISTIC
BEGIN
RETURN (select performance_schema.replication_group_member_stats.COUNT_TRANSACTIONS_IN_QUEUE AS transactions_to_cert
FROM
performance_schema.replication_group_member_stats where MEMBER_ID=@@SERVER_UUID );
END$$
CREATE FUNCTION my_server_uuid() RETURNS TEXT(36) DETERMINISTIC NO SQL RETURN (SELECT @@global.server_uuid as my_id);$$
CREATE VIEW gr_member_routing_candidate_status AS
SELECT
IFNULL((SELECT
IF(MEMBER_STATE = 'ONLINE'
AND ((SELECT
COUNT(*)
FROM
performance_schema.replication_group_members
WHERE
MEMBER_STATE != 'ONLINE') >= ((SELECT
COUNT(*)
FROM
performance_schema.replication_group_members) / 2) = 0),
'YES',
'NO')
FROM
performance_schema.replication_group_members
JOIN
performance_schema.replication_group_member_stats rgms USING (member_id)
WHERE
rgms.MEMBER_ID = my_server_uuid()),
'NO') AS viable_candidate,
IF((SELECT
((SELECT
GROUP_CONCAT(performance_schema.global_variables.VARIABLE_VALUE
SEPARATOR ',')
FROM
performance_schema.global_variables
WHERE
(performance_schema.global_variables.VARIABLE_NAME IN ('read_only' , 'super_read_only'))) <> 'OFF,OFF')
),
'YES',
'NO') AS read_only,
IFNULL(sys.gr_applier_queue_length(), 0) AS transactions_behind,
IFNULL(sys.gr_transactions_to_cert(), 0) AS transactions_to_cert;$$
DELIMITER ;
-- SET @@SESSION.SQL_LOG_BIN = @TEMP_LOG_BIN;
-- SET @@GLOBAL.READ_ONLY = @TEMP_READ_ONLY;
-- SET @@GLOBAL.SUPER_READ_ONLY = @TEMP_SUPER_READ_ONLY;