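Background
The server and the client communicate over the network (implemented with the Boost Asio library in its asynchronous programming model), and the client requests data from the server.
Early tests showed no problems. Then, in a later test, the server started crashing every time it tried to send the query results back to the client.
Debugging with gdb showed that the crash happens after the call into the asynchronous send function (boost::asio::async_write).
The stack trace is as follows: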
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000056625c in boost::asio::detail::reactive_socket_service_base::start_op (this=0x2e33313a30313a5f, impl=..., op_type=1, op=0x7fc5b4000af0, is_continuation=false,
is_non_blocking=true, noop=false) at /usr/local/include/boost/asio/detail/impl/reactive_socket_service_base.ipp:219
219 reactor_.post_immediate_completion(op, is_continuation);
[Current thread is 1 (Thread 0x7fc5f7fff700 (LWP 128087))]
(gdb) bt
#0 0x000000000056625c in boost::asio::detail::reactive_socket_service_base::start_op (this=0x2e33313a30313a5f, impl=..., op_type=1, op=0x7fc5b4000af0, is_continuation=false,
is_non_blocking=true, noop=false) at /usr/local/include/boost/asio/detail/impl/reactive_socket_service_base.ipp:219
#1 0x000000000056bed5 in boost::asio::detail::reactive_socket_service_base::async_send<boost::asio::mutable_buffers_1, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running> > > (this=0x2e33313a30313a5f, impl=..., buffers=..., flags=0, handler=...)
at /usr/local/include/boost/asio/detail/reactive_socket_service_base.hpp:216
#2 0x000000000056b20f in boost::asio::stream_socket_service<boost::asio::ip::tcp>::async_send<boost::asio::mutable_buffers_1, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running> > >(boost::asio::detail::reactive_socket_service<boost::asio::ip::tcp>::implementation_type&, boost::asio::mutable_buffers_1 const&, int, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running> >&&) (this=0x2e33313a30313a37, impl=..., buffers=..., flags=0, handler=<unknown type in /a.out, CU 0x5ed5a4, DIE 0x6317d1>)
at /usr/local/include/boost/asio/stream_socket_service.hpp:334
#3 0x000000000056a708 in boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >::async_write_some<boost::asio::mutable_buffers_1, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running> > >(boost::asio::mutable_buffers_1 const&, boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running> >&&) (
this=0x2473c38, buffers=..., handler=<unknown type in /a.out, CU 0x5ed5a4, DIE 0x62fcbd>) at /usr/local/include/boost/asio/basic_stream_socket.hpp:732
#4 0x0000000000569831 in boost::asio::detail::write_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running> >::operator() (this=0x7fc5f7ffde20, ec=..., bytes_transferred=0, start=1) at /usr/local/include/boost/asio/impl/write.hpp:258
#5 0x0000000000568263 in boost::asio::async_write<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running> >(boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&, boost::asio::mutable_buffers_1 const&, boost::asio::detail::wrapped_handler<boost::asio::io_service::strand, boost::_bi::bind_t<void, boost::_mfi::mf2<void, TcpSession, boost::system::error_code const&, unsigned long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<TcpSession> >, boost::arg<1>, boost::arg<2> > >, boost::asio::detail::is_continuation_if_running>&&) (s=..., buffers=..., handler=<unknown type in /a.out, CU 0x5ed5a4, DIE 0x62aa66>)
at /usr/local/include/boost/asio/impl/write.hpp:621
#6 0x000000000056306b in TcpSession::StartSend (this=0x246ec20, pData=0x7fc5a000f158 "\314F", nDataSize=18128) at SourceFiles/Common/ClientMgrBase.cpp:259
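Analysis
- Buffer validity
In asynchronous programming with Boost Asio, the most common mistake is a data buffer that is no longer valid by the time the operation actually runs. I had already checked this carefully early in development.
On the server side, a TcpSession class manages the socket and its associated read/write buffers. It inherits from boost::enable_shared_from_this<TcpSession>, and the bind functor holds a smart pointer to the session, so this part should be fine.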
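The lifetime guarantee described above can be sketched without Asio. In this minimal sketch the names Session, pending, start_send, and run_after_owner_released are hypothetical, a std::function stored in a vector stands in for a completion handler that Asio would invoke later, and std::enable_shared_from_this replaces the boost:: version so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the completion handlers that an io_service would run later.
std::vector<std::function<void()>> pending;

class Session : public std::enable_shared_from_this<Session> {
public:
    // Copies the payload into a member buffer and registers a deferred "handler".
    void start_send(const std::string& payload) {
        buffer_ = payload;
        auto self = shared_from_this();  // the handler co-owns the session
        pending.push_back([self] { self->sent_ += self->buffer_.size(); });
    }
    std::size_t sent() const { return sent_; }

private:
    std::string buffer_;
    std::size_t sent_ = 0;
};

// Drops the caller's reference first, then runs the handlers.
// Returns the number of bytes the handler reported as "sent".
std::size_t run_after_owner_released(const std::string& payload) {
    std::weak_ptr<Session> watcher;
    std::size_t sent = 0;
    {
        auto session = std::make_shared<Session>();
        watcher = session;
        session->start_send(payload);
    }  // the caller's shared_ptr is gone here
    assert(!watcher.expired());  // still alive: the pending handler holds a reference
    for (auto& handler : pending) handler();
    if (auto s = watcher.lock()) sent = s->sent();
    pending.clear();  // destroying the handlers releases the session
    assert(watcher.expired());
    return sent;
}
```

Because the handler owns a shared pointer, the session and its member buffer outlive the caller's own reference. This only protects the lifetime of the buffer, not its bounds.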
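- The socket class is not thread-safe
Another point is that the socket is not thread-safe. For a given socket, one thread must not read while another thread performs a write.
In practice, when the amount of data is small (say, fewer than 65535 bytes), operating on the socket from multiple threads usually does not cause visible problems, although that is not something to rely on.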
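- Combining the test conditions with debugging
In the earlier tests nothing went wrong; the only test condition that changed was that the amount of data being read grew larger.
In my code around the boost::asio::async_write call, a fixed-length array serves as the send buffer: when a send is requested, the data is first copied into this buffer and then the send function is called. (boost::asio::async_write itself does not copy the data; it only refers to the buffer the caller passes in.)
In practice, when the data is too large it exceeds the send buffer, so the copy writes out of bounds and corrupts memory.
The copy itself did not crash, but when the data was actually sent, the damaged buffer state caused the crash. Receiving data has a similar problem.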
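One detail in the backtrace is consistent with this kind of overwrite: the this values in frames #0 and #1 (0x2e33313a30313a5f) and in frame #2 (0x2e33313a30313a37) do not look like heap or stack addresses. Reading their bytes in little-endian order (an assumption that matches an x86-64 build) gives printable text. The helper below, with the hypothetical name pointer_bytes_as_text, is a small sketch of that check.

```cpp
#include <cstdint>
#include <string>

// Interprets a 64-bit value as 8 bytes in little-endian memory order and
// returns them as text, replacing non-printable bytes with '?'.
std::string pointer_bytes_as_text(std::uint64_t value) {
    std::string text;
    for (int i = 0; i < 8; ++i) {
        unsigned char byte = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
        text += (byte >= 0x20 && byte < 0x7F) ? static_cast<char>(byte) : '?';
    }
    return text;
}
```

For the two values above it returns "_:10:13." and "7:10:13.", which reads like a fragment of text (possibly a timestamp) rather than an address. That suggests the socket's service pointer was overwritten by string data.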
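So the root cause was finally pinned down as misuse of the data buffer. The simplest fix is to change the buffer to a string, so that the string class's assign method holds the data to be sent and the copy can no longer run out of bounds (memory permitting).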
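A minimal sketch of that change, written without Asio so it compiles on its own. The names FixedSendBuffer, StringSendBuffer, and kCapacity are hypothetical and do not come from the original program, and the 8192-byte capacity is an assumed value. The fixed-size variant here rejects oversized input instead of overflowing, which is the kind of check the original code appears to have lacked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

constexpr std::size_t kCapacity = 8192;  // assumed size, for illustration only

// Fixed-length buffer: the copy is only safe if the size is checked first.
struct FixedSendBuffer {
    std::array<char, kCapacity> data{};
    std::size_t size = 0;

    // Returns false instead of writing past the end of the array.
    bool store(const char* src, std::size_t n) {
        assert(src != nullptr);
        if (n > data.size()) return false;
        std::memcpy(data.data(), src, n);
        size = n;
        return true;
    }
};

// String-backed buffer: assign() grows the storage to fit the payload.
struct StringSendBuffer {
    std::string data;

    void store(const char* src, std::size_t n) {
        assert(src != nullptr);
        data.assign(src, n);  // may throw std::bad_alloc if memory runs out
    }
};
```

With an 18128-byte payload, the size shown in frame #6, FixedSendBuffer::store reports failure while StringSendBuffer::store holds all 18128 bytes. In real Asio code the string still has to stay alive, for example as a member of the session, until the write handler runs, because async_write does not copy the buffer.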
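Summary
In actual testing, when I shrank the receive buffer used by the client's recv function until it could no longer hold the incoming data, the client crashed as well, although the stack trace differed from the one above.
Clearly my program still has a lot to improve in how it handles data buffers. Since the maximum data size cannot be known in advance, dynamically allocated storage such as std::vector should be considered.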
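One way this could look, sketched without Asio and under the assumption of a hypothetical wire format in which every message starts with a 4-byte big-endian length. The names decode_length and make_body_buffer, and the kMaxMessage limit, are illustrative choices rather than part of the original program.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::uint32_t kMaxMessage = 16u * 1024u * 1024u;  // assumed sanity limit

// Decodes a 4-byte big-endian length prefix (hypothetical wire format).
std::uint32_t decode_length(const unsigned char header[4]) {
    return (std::uint32_t(header[0]) << 24) | (std::uint32_t(header[1]) << 16) |
           (std::uint32_t(header[2]) << 8) | std::uint32_t(header[3]);
}

// Allocates a buffer sized exactly for the announced body and rejects
// lengths above the sanity limit, so a corrupt header cannot exhaust memory.
std::vector<char> make_body_buffer(const unsigned char header[4]) {
    const std::uint32_t length = decode_length(header);
    if (length > kMaxMessage) throw std::length_error("message too large");
    return std::vector<char>(length);
}
```

The receiver would read the 4-byte header first, call make_body_buffer, and then read exactly as many bytes as the vector holds, so the buffer always matches the message size.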
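In short, when a program using Boost Asio crashes, do not get flustered because the stack trace is hard to read. Combine the relevant information with what you already know, work from the simple to the complex, and narrow the problem down step by step.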