MySQL can leave a client process blocked forever

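This article comes from my MySQL bug report, so it is written entirely in English. Some time ago I found that, under certain conditions, the MySQL server can leave a client process blocked forever when the network is congested. I fixed the problem and reported the bug, together with the patch that fixes it, to the official MySQL team.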

In this article I share some of my findings about how the MySQL server handles the net write timeout, and how I located and fixed a bug that could cause a MySQL client process to block forever under certain conditions. I've filed a bug here and contributed my patch.

MySQL has a variable ‘net_write_timeout’ which, according to the MySQL documentation, is ‘The number of seconds to wait for a block to be written to a connection before aborting the write.’ There is also a variable ‘net_read_timeout’, meaning ‘The number of seconds to wait for more data from a connection before aborting the read.’ The documentation also says: ‘When the server is reading from the client, net_read_timeout is the timeout value controlling when to abort. When the server is writing to the client, net_write_timeout is the timeout value controlling when to abort.’

However, I recently found that under certain conditions a client can be blocked permanently because of how the mysqld server handles a timed-out write and how a client reads from the server. To illustrate the issue, I’ll first describe the implementation of the relevant features.

Implementation

Client/server network communication is implemented by the VIO module. On the server side, mysqld does buffered network writes: each client connection has a net write buffer into which results for the client are written, and when the buffer is full, or when there are no more results to write, the ‘net’ module sends the buffered bytes to the client in 16KB packets. On the client side, after a statement is sent to the server by a function such as mysql_real_query(), the client calls mysql_store_result(), which simply does a blocking read (recv()) from the server. So the recv() syscall returns only if the server sends more data or closes the socket connection.
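The blocking-read behaviour on the client side can be illustrated with a small sketch. It is not MySQL client code: a local socket pair stands in for the TCP connection, and the 0.2 second timeout exists only so that the demonstration terminates (a real blocking recv() has no such timeout).

```python
import socket

# A connected pair of sockets stands in for the client/server connection.
server, client = socket.socketpair()

# The server sends only part of a result and then goes quiet.
server.sendall(b"partial result")
first = client.recv(4096)          # returns at once: data is available

# With no more data and the connection still open, recv() would block
# indefinitely. The timeout below is an assumption added for the demo.
client.settimeout(0.2)
try:
    client.recv(4096)
    blocked = False
except socket.timeout:
    blocked = True

# Once the server closes its end, recv() returns immediately with b"" (EOF).
server.close()
client.settimeout(None)
eof = client.recv(4096)
client.close()

print(first, blocked, eof)
```

The last recv() is the important one: closing the server end is what releases a reader that would otherwise wait forever.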

The core functions are vio_write(), net_write_raw_loop() and vio_socket_io_wait(). net_write_raw_loop() calls vio_write() to write the connection’s buffered result to the client packet by packet. vio_write() does a non-blocking send() to send a packet, and if send() would block, it calls vio_socket_io_wait()->vio_io_wait() to wait for the socket to become writable (i.e. for space to free up in the OS kernel’s socket buffer once buffered data has been written to the network). vio_io_wait() calls poll() to do a timed wait, and if the socket is found writable, vio_write() tries send() again. However, if poll() times out, which can happen if ‘net_write_timeout’ is set small (e.g. 1) and a brief network congestion occurs, net_write_raw_loop() returns an error and the execution of the SQL statement completes. The timeout error is simply ignored, and this is wrong! The server should have closed the socket connection so that the client can return from its blocked recv() syscall.
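The send/wait/retry pattern described above can be sketched roughly as follows. This is an illustration only, not MySQL's code: write_with_timeout() is a hypothetical helper, select.select() is used in place of poll() for portability, and a local socket pair whose peer never reads plays the role of a congested network.

```python
import select
import socket

def write_with_timeout(sock, data, timeout_s):
    """Send all of data on a non-blocking socket; return False on timeout.

    Hypothetical helper mirroring the send()/timed-wait retry pattern;
    it is not taken from the MySQL source.
    """
    view = memoryview(data)
    while len(view) > 0:
        try:
            sent = sock.send(view)      # non-blocking send
            view = view[sent:]
        except BlockingIOError:         # kernel send buffer is full
            # Timed wait for the socket to become writable again.
            _, writable, _ = select.select([], [sock], [], timeout_s)
            if not writable:
                return False            # the timeout case discussed above
    return True

server, client = socket.socketpair()
server.setblocking(False)

# The client never reads, so the send buffer fills and the wait times out.
ok = write_with_timeout(server, b"x" * (16 * 1024 * 1024), timeout_s=0.2)
print(ok)

server.close()
client.close()
```

What the caller does with the False return value is the crux of the bug: if it is dropped and the socket stays open, the peer has no way to learn that the rest of the result will never arrive.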

How to Reproduce

On the server side, prepare a table my_big_table with a huge amount of data, and set the global and session net_write_timeout=1. To imitate network congestion reliably, we use gdb to block the execution of mysqld and the mysql client at the right places. Use gdb to attach to the mysqld process and set breakpoints at vio_write() and vio_io_wait(). Then, in the client, issue ‘select * from my_big_table’ and almost immediately attach gdb to the mysql client process; you will most probably find it blocked at such a call stack, and keep it blocked.

Then, in the gdb session attached to the mysqld process, you will see many breakpoint hits in vio_write() (the server sending result packets to the client), and finally you will see vio_io_wait() being called (send() would block because the OS socket write buffer is full). In vio_io_wait(), this statement is executed in the call stack below:

errno= SOCKET_ETIMEDOUT;

Then the stack unwinds and the statement execution finishes successfully, but only partial results are sent to the client. If you execute ‘show processlist’ you will see something like the following:

On the client side, however, the query would block forever at the call stack below:

Problem Analysis and Fix

Below is the code of net_write_raw_loop(). If vio_write() fails because of a timeout, it returns VIO_SOCKET_ERROR and the while loop exits; in the red box below, the ER_NET_WRITE_INTERRUPTED error is reported and net->error is set to 2. However, neither net->error nor the ER_NET_WRITE_INTERRUPTED error is ever checked, and no action is taken on them anywhere in the entire code base.

The right measure is to close the connection if thd->net.error is set to a non-zero value, as is done in my patch. Note that thd->net.error can be set to 1, 2 or 3 for different types of errors, all of which stop the sending of results to the client. In some of these cases my_error() is called to report various errors and in others it is not called at all, so checking for specific errors reported by my_error() is neither reliable nor convenient.
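The idea behind the fix can be sketched as below. This is not the patch itself: Connection and finish_statement() are hypothetical names, and the error field merely plays the role of thd->net.error. The point is that the check is made on the error flag alone, without inspecting which error was reported.

```python
import socket

class Connection:
    """Hypothetical stand-in for a server-side client session."""
    def __init__(self, sock):
        self.sock = sock
        self.error = 0      # plays the role of thd->net.error (0 = no error)
        self.closed = False

def finish_statement(conn):
    """After a statement ends, drop the connection if any write error occurred."""
    if conn.error != 0 and not conn.closed:
        conn.sock.close()
        conn.closed = True

server_sock, client_sock = socket.socketpair()
conn = Connection(server_sock)

conn.error = 2              # pretend a write timed out mid-result
finish_statement(conn)

# The client's pending read now ends with EOF instead of blocking forever.
data = client_sock.recv(4096)
client_sock.close()
print(conn.closed, data)
```

Testing only for a non-zero flag keeps the check independent of whether my_error() was called for that particular failure.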
