tidb节点重启后,服务无法重连

大家好,我是烤鸭:

    前几天遇到tidb节点重启后服务无法重连,确切地说是两个服务,一个可以正常重连,一个不行。

问题复现

由于线上执行慢SQL,导致TiDB 单个节点宕机重启。

其中A服务的3个节点和B服务的1个节点开始报错,重启后B服务恢复,A服务还在报错。

使用的组件版本:

A服务:tomcat-jdbc-9.0.71.jar
B服务:HikariCP-4.0.3.jar

数据库配置:

A服务:

spring.xxx-tidb-data-source.url=jdbc:mysql://xxx.xxx.xxx:6000/xxxA?useUnicode=true&characterEncoding=utf8&zeroDateTimeBehavior=convertToNull&useSSL=false&serverTimezone=GMT%2B8

B服务:

spring.datasource.url=jdbc:mysql://xxx.xxx.xxx:6000/xxxB?useSSL=false&autoReconnect=true

可以看到两边的配置和连接池不同。

报错日志

无法重连服务的报错日志:

Caused by: com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed.
	at sun.reflect.GeneratedConstructorAccessor139.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
	at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:762)
	at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:569)
	at com.mysql.cj.jdbc.ConnectionImpl.prepareStatement$original$McwhO8C0(ConnectionImpl.java:1580)
	... 125 common frames omitted
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: The last packet successfully received from the server was 259,130,913 milliseconds ago. The last packet sent successfully to the server was 259,130,914 milliseconds ago.may or may not be greater than the server-side timeout (the driver was unable to determine the value of either the 'wait_timeout' or 'interactive_timeout' configuration values from the server.. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
	at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
	at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167)
	at com.mysql.cj.protocol.a.NativeProtocol.readMessage(NativeProtocol.java:520)
	at com.mysql.cj.protocol.a.NativeProtocol.checkErrorMessage(NativeProtocol.java:700)
	at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:639)
	at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:987)
	at com.mysql.cj.NativeSession.execSQL(NativeSession.java:666)
	at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:930)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$mDKgOeFa(ClientPreparedStatement.java:371)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute$original$mDKgOeFa$accessor$UdYpLG4x(ClientPreparedStatement.java)
	at com.mysql.cj.jdbc.ClientPreparedStatement$auxiliary$BAL6Kq1Z.call(Unknown Source)
	at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86)
	at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java)
	at sun.reflect.GeneratedMethodAccessor261.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:118)

问题猜想

找到变量了,就顺着猜测查一下。

  • 数据库配置参数 autoReconnect=true
  • 连接池 tomcat-jdbc 和 HikariCP

报错日志都来源于 CJCommunicationsException 这个类,其实调用的是 ExceptionFactory.createLinkFailureMessageBasedOnHeuristics,报错日志来源于配置文件。

我保留了一部分一会可能用到的变量

LocalizedErrorMessages.properties

CommunicationsException.2=\ is longer than the server configured value of 
CommunicationsException.3=''wait_timeout''
CommunicationsException.4=''interactive_timeout''
CommunicationsException.5=may or may not be greater than the server-side timeout 
CommunicationsException.6=(the driver was unable to determine the value of either the 
CommunicationsException.7=''wait_timeout'' or ''interactive_timeout'' configuration values from 
CommunicationsException.8=the server.
CommunicationsException.11=. You should consider either expiring and/or testing connection validity 
CommunicationsException.12=before use in your application, increasing the server configured values for client timeouts, 
CommunicationsException.13=or using the Connector/J connection property ''autoReconnect=true'' to avoid this problem.
CommunicationsException.20=Communications link failure

...

Connection.0=Unable to connect to database.
Connection.1=Cannot connect to MySQL server on {0}:{1}.\n\nMake sure that there is a MySQL server running on the machine/port you are trying to connect to and that the machine this software is running on is able to connect to this host/port (i.e. not firewalled). Also make sure that the server has not been started with the --skip-networking flag.\n\n
Connection.2=No operations allowed after connection closed.
Connection.3=Can''t call commit when autocommit=true
Connection.4=Communications link failure during commit(). Transaction resolution unknown.
Connection.5=Unknown Java encoding for the character set with index ''{0}''. Use the ''customCharsetMapping'' property to force it.
Connection.6=Unknown character set index ''{0}'' received from server. The appropriate client character set can be forced via the ''characterEncoding'' property.
Connection.7=Can''t map {0} given for characterSetResults to a supported MySQL encoding.
Connection.8=Unable to use encoding: {0}

看看源码

从报错日志入手,根据链接状态和上次发送、接收时间判断,拼一些异常原因。

可以看出报错日志有几个组成部分:

1、上次发送packet的时间距离现在超过8小时,getLastPacketSentTime > 28800秒(8小时)

2、当前链接已关闭:No operations allowed after connection closed

ExceptionFactory

public static String createLinkFailureMessageBasedOnHeuristics(PropertySet propertySet, ServerSession serverSession,
            PacketSentTimeHolder packetSentTimeHolder, PacketReceivedTimeHolder packetReceivedTimeHolder, Throwable underlyingException) {
        long serverTimeoutSeconds = 0;
        boolean isInteractiveClient = false;

        long lastPacketReceivedTimeMs = packetReceivedTimeHolder == null ? 0L : packetReceivedTimeHolder.getLastPacketReceivedTime();
        long lastPacketSentTimeMs = packetSentTimeHolder.getLastPacketSentTime();
        if (lastPacketSentTimeMs > lastPacketReceivedTimeMs) {
            lastPacketSentTimeMs = packetSentTimeHolder.getPreviousPacketSentTime();
        }

        if (propertySet != null) {
            isInteractiveClient = propertySet.getBooleanProperty(PropertyKey.interactiveClient).getValue();

            String serverTimeoutSecondsStr = null;

            if (serverSession != null) {
                serverTimeoutSecondsStr = isInteractiveClient ? serverSession.getServerVariable("interactive_timeout")
                        : serverSession.getServerVariable("wait_timeout");
            }

            if (serverTimeoutSecondsStr != null) {
                try {
                    serverTimeoutSeconds = Long.parseLong(serverTimeoutSecondsStr);
                } catch (NumberFormatException nfe) {
                    serverTimeoutSeconds = 0;
                }
            }
        }

        StringBuilder exceptionMessageBuf = new StringBuilder();

        long nowMs = System.currentTimeMillis();

        if (lastPacketSentTimeMs == 0) {
            lastPacketSentTimeMs = nowMs;
        }

        long timeSinceLastPacketSentMs = (nowMs - lastPacketSentTimeMs);
        long timeSinceLastPacketSeconds = timeSinceLastPacketSentMs / 1000;

        long timeSinceLastPacketReceivedMs = (nowMs - lastPacketReceivedTimeMs);

        int dueToTimeout = DUE_TO_TIMEOUT_FALSE;

        StringBuilder timeoutMessageBuf = null;

        if (serverTimeoutSeconds != 0) {
            if (timeSinceLastPacketSeconds > serverTimeoutSeconds) {
                dueToTimeout = DUE_TO_TIMEOUT_TRUE;

                timeoutMessageBuf = new StringBuilder();
                timeoutMessageBuf.append(Messages.getString("CommunicationsException.2"));
                timeoutMessageBuf.append(Messages.getString(isInteractiveClient ? "CommunicationsException.4" : "CommunicationsException.3"));
            }
		  // 上次发送成功的时间距离现在超过8小时
        } else if (timeSinceLastPacketSeconds > DEFAULT_WAIT_TIMEOUT_SECONDS) {
            dueToTimeout = DUE_TO_TIMEOUT_MAYBE;
			
            timeoutMessageBuf = new StringBuilder();
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.5"));
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.6"));
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.7"));
            timeoutMessageBuf.append(Messages.getString("CommunicationsException.8"));
        }

        if (dueToTimeout == DUE_TO_TIMEOUT_TRUE || dueToTimeout == DUE_TO_TIMEOUT_MAYBE) {

            exceptionMessageBuf.append(lastPacketReceivedTimeMs != 0
                    ? Messages.getString("CommunicationsException.ServerPacketTimingInfo",
                            new Object[] { Long.valueOf(timeSinceLastPacketReceivedMs), Long.valueOf(timeSinceLastPacketSentMs) })
                    : Messages.getString("CommunicationsException.ServerPacketTimingInfoNoRecv", new Object[] { Long.valueOf(timeSinceLastPacketSentMs) }));

            if (timeoutMessageBuf != null) {
                exceptionMessageBuf.append(timeoutMessageBuf);
            }

            exceptionMessageBuf.append(Messages.getString("CommunicationsException.11"));
            exceptionMessageBuf.append(Messages.getString("CommunicationsException.12"));
            exceptionMessageBuf.append(Messages.getString("CommunicationsException.13"));

        } else {
            //
            // Attempt to determine the reason for the underlying exception (we can only make a best-guess here)
            //
            if (underlyingException instanceof BindException) {
                String localSocketAddress = propertySet.getStringProperty(PropertyKey.localSocketAddress).getValue();
                boolean interfaceNotAvaliable;
                try {
                    interfaceNotAvaliable = localSocketAddress != null && NetworkInterface.getByName(localSocketAddress) == null;
                } catch (SocketException e1) {
                    interfaceNotAvaliable = false;
                }
                exceptionMessageBuf.append(interfaceNotAvaliable ? Messages.getString("CommunicationsException.LocalSocketAddressNotAvailable")
                        : Messages.getString("CommunicationsException.TooManyClientConnections"));
            }
        }

        if (exceptionMessageBuf.length() == 0) {
            // We haven't figured out a good reason, so copy it.
            exceptionMessageBuf.append(Messages.getString("CommunicationsException.20"));

            if (propertySet.getBooleanProperty(PropertyKey.maintainTimeStats).getValue() && !propertySet.getBooleanProperty(PropertyKey.paranoid).getValue()) {
                exceptionMessageBuf.append("\n\n");
                exceptionMessageBuf.append(lastPacketReceivedTimeMs != 0
                        ? Messages.getString("CommunicationsException.ServerPacketTimingInfo",
                                new Object[] { Long.valueOf(timeSinceLastPacketReceivedMs), Long.valueOf(timeSinceLastPacketSentMs) })
                        : Messages.getString("CommunicationsException.ServerPacketTimingInfoNoRecv", new Object[] { Long.valueOf(timeSinceLastPacketSentMs) }));
            }
        }

        return exceptionMessageBuf.toString();
    }

再根据上边的两个猜测,找一下对应的源码。

猜想1:autoReconnect 这个参数是有用的

网上很多说这个参数只针对mysql5以下,这个倒是没找到,官网关于这个参数的说明。
https://dev.mysql.com/doc/connectors/en/connector-j-connp-props-high-availability-and-clustering.html#cj-conn-prop_autoReconnect

如果链接断开的情况下,配置了这个参数可以发起重试。如果不配置的话,就会一直报错。

在这里插入图片描述

猜想2:HikariCP 会自动重连,tomcat-jdbc不会

可以看到 tomcat-jdbc 执行 invoke,getConnetion一气呵成,在默认配置下,也没有任何校验,如果因为第三方导致的链接断开,就麻烦了。

public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (compare(ISCLOSED_VAL,method)) {
            return Boolean.valueOf(isClosed());
        }
        if (compare(CLOSE_VAL,method)) {
            if (connection==null)
             {
              return null; //noop for already closed.
            }
            PooledConnection poolc = this.connection;
            this.connection = null;
            pool.returnConnection(poolc);
            return null;
        } else if (compare(TOSTRING_VAL,method)) {
            return this.toString();
        } else if (compare(GETCONNECTION_VAL,method) && connection!=null) {
            return connection.getConnection();
        } else if (method.getDeclaringClass().isAssignableFrom(XAConnection.class)) {
            try {
                return method.invoke(connection.getXAConnection(),args);
            }catch (Throwable t) {
                if (t instanceof InvocationTargetException) {
                    throw t.getCause() != null ? t.getCause() : t;
                } else {
                    throw t;
                }
            }
        }
        if (isClosed()) {
          throw new SQLException("Connection has already been closed.");
        }
        if (compare(UNWRAP_VAL,method)) {
            return unwrap((Class<?>)args[0]);
        } else if (compare(ISWRAPPERFOR_VAL,method)) {
            return Boolean.valueOf(this.isWrapperFor((Class<?>)args[0]));
        }
        try {
            PooledConnection poolc = connection;
            if (poolc!=null) {
                return method.invoke(poolc.getConnection(),args);
            } else {
                throw new SQLException("Connection has already been closed.");
            }
        }catch (Throwable t) {
            if (t instanceof InvocationTargetException) {
                throw t.getCause() != null ? t.getCause() : t;
            } else {
                throw t;
            }
        }
    }

在这里插入图片描述

HikariCP

HikarPool.getConnection 会校验当前是否存活,如果不存活,会关闭当前链接,并会重新创建连接。

在这里插入图片描述

问题解决

当天的解决方案比较简单,重启服务就行。不过如果想彻底解决这个问题的话。

  • 升级连接池,tomcat-jdbc 升级 druid 或者 HikariCP ,springboot 2.0以上默认是 HikariCP

  • 修改配置,如果还使用 tomcat-jdbc :

    增加以下配置:

    # 连接存活检查
    spring.datasource.tomcat.validation-query='SELECT 1'
    # 指定多少ms执行一次连接校验.	
    spring.datasource.tomcat.validation-interval=3000
    # 指定连接校验查询的超时时间
    spring.datasource.tomcat.validation-query-timeout=-1
    
    或者连接地址增加 autoReconnect=true
    

总结

有些问题不经常遇到,即便是预发1比1的环境可能也没办法避免。除了在事故处理的时候可以完善流程,就只能尽量保持架构统一,尤其是老项目。业务侧增加全方面中间件的监控,尤其是存活检测这种。坑是无穷无尽的,踩完了记得遇到相同的跳过去。

相关文章

https://zhuanlan.zhihu.com/p/361130333?utm_id=0

https://blog.csdn.net/zhangxin09/article/details/124901850

https://blog.csdn.net/u014398624/article/details/47746473

  • 24
    点赞
  • 16
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
TiDB 集群中增加 TiDB 节点、TiKV 节点、PD 节点的步骤略有不同,下面分别介绍一下。 1. 增加 TiDB 节点 增加 TiDB 节点需要进行以下步骤: - 使用 TiUP 工具安装 TiDB-ansible,并在 inventory.ini 配置文件中添加新的 TiDB 节点信息。 - 在新的 TiDB 节点上安装 TiDB 组件,并在配置文件中设置与现有 TiDB 节点相同的参数。 - 将新的 TiDB 节点加入到 TiDB 集群中,需要使用 TiUP 工具执行 `tiup cluster scale-out <cluster-name> <new-tidb-instance>` 命令,其中 `<new-tidb-instance>` 是新的 TiDB 节点的名称。 - 等待 TiUP 工具完成 TiDB 节点的扩容操作,然后使用 `tiup cluster display <cluster-name>` 命令检查集群状态,确保新的 TiDB 节点已经加入到集群中。 2. 增加 TiKV 节点 增加 TiKV 节点需要进行以下步骤: - 在新的服务器上安装 TiKV 组件,并在配置文件中设置与现有 TiKV 节点相同的参数。 - 将新的 TiKV 节点加入到 TiKV 集群中,需要在 TiKV 的配置文件中设置新的节点信息,并在现有 TiKV 节点上执行 `tiup cluster reload <cluster-name> tikv` 命令,让 TiKV 集群重新加载配置文件。 - 使用 TiUP 工具执行 `tiup cluster scale-out <cluster-name> <new-tikv-instance>` 命令,其中 `<new-tikv-instance>` 是新的 TiKV 节点的名称。 - 等待 TiUP 工具完成 TiKV 节点的扩容操作,然后使用 `tiup cluster display <cluster-name>` 命令检查集群状态,确保新的 TiKV 节点已经加入到集群中。 3. 增加 PD 节点 增加 PD 节点需要进行以下步骤: - 在新的服务器上安装 PD 组件,并在配置文件中设置与现有 PD 节点相同的参数。 - 将新的 PD 节点加入到 PD 集群中,需要在 PD 配置文件中设置新的节点信息,并在现有 PD 节点上执行 `tiup cluster reload <cluster-name> pd` 命令,让 PD 集群重新加载配置文件。 - 使用 TiUP 工具执行 `tiup cluster scale-out <cluster-name> <new-pd-instance>` 命令,其中 `<new-pd-instance>` 是新的 PD 节点的名称。 - 等待 TiUP 工具完成 PD 节点的扩容操作,然后使用 `tiup cluster display <cluster-name>` 命令检查集群状态,确保新的 PD 节点已经加入到集群中。 以上是增加 TiDB、TiKV、PD 节点的基本步骤,具体的操作可能会因为版本、网络环境等因素而有所不同,需要根据具体情况进行调整。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

烤鸭的世界我们不懂

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值