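Concept
In TCP, the sender transmits several pieces of data and the receiver finds them stuck together when it reads them. Seen from the receive buffer, the start of one message immediately follows the end of the previous one. This phenomenon is commonly called "sticky packets" (黏包). It arises because TCP is a byte stream and does not preserve the boundaries of individual send calls.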
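Causes
On the sending side:
TCP uses the Nagle algorithm by default, which mainly does two things:
- It sends the next segment only after the previous one has been acknowledged;
- It collects several small segments, merges them into one, and sends that segment when the acknowledgement arrives.
On the receiving side:
- Received segments are first placed in a buffer, and the application reads from that buffer on its own schedule;
- When segments arrive faster than the application reads them, several of them accumulate in the buffer and appear stuck together.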
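The receiver-side view described above can be sketched in a few lines. This is only an illustration under an assumption: `socket.socketpair()` is used as a stand-in for a real TCP connection (on Unix it returns a pair of connected stream sockets), because it shows the same byte-stream behaviour without needing a server.

```python
import socket

# Assumption: a socketpair stands in for a TCP connection; both are byte streams.
sender, receiver = socket.socketpair()

# Two separate application-level "messages".
sender.sendall(b"Hello")
sender.sendall(b"Python")
sender.close()  # closing signals end of stream to the reader

# Read until the stream ends; nothing marks where "Hello" stopped.
received = b""
while True:
    chunk = receiver.recv(1024)
    if not chunk:
        break
    received += chunk
receiver.close()

print(received)  # b'HelloPython'
```

The receiver ends up with eleven bytes and no information about how many send calls produced them, which is exactly why the application has to define its own message boundaries.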
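Here is a brief introduction to the Nagle algorithm:
Its goal is to put as much data as possible into each segment and to reduce the load that many small packets place on the network. Data is buffered on the sending and receiving sides and is sent only when enough has accumulated or when sending becomes necessary. The rules are as follows (TCP_CORK is a Linux-specific option):
- If the packet length reaches the MSS, it may be sent;
- If the packet contains FIN, it may be sent;
- If the TCP_NODELAY option is set, it may be sent;
- If the TCP_CORK option is not set and all previously sent small packets (shorter than the MSS) have been acknowledged, it may be sent;
- If none of the above conditions holds but a timeout occurs (typically 200 ms), the data is sent immediately.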
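As a small illustration of the TCP_NODELAY rule above, the option can be set on a socket through `setsockopt`. This is a minimal sketch, not a recommendation: disabling Nagle only changes when the sender transmits small writes, and the receiver can still read several messages in one `recv`, so it does not remove the need for message framing.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Nagle is enabled by default, so TCP_NODELAY starts out as 0.
before = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# Setting TCP_NODELAY to 1 disables Nagle for this socket.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
after = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

sock.close()
print("TCP_NODELAY before:", before)
print("TCP_NODELAY after: ", after)  # non-zero once the option is set
```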
A TCP sticky-packet example:
Server:
from socket import *
import subprocess
ip_port = ('127.0.0.1', 8888)
BUFSIZE = 1024
tcp_socket_server = socket(AF_INET, SOCK_STREAM)
# Allow the port to be reused immediately after a restart
tcp_socket_server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
tcp_socket_server.bind(ip_port)
tcp_socket_server.listen(5)
while True:
    conn, addr = tcp_socket_server.accept()
    print('client', addr)
    while True:
        cmd = conn.recv(BUFSIZE)
        if len(cmd) == 0:
            break
        # Run the received command in a shell and capture its output
        res = subprocess.Popen(cmd.decode('utf-8'), shell=True,
                               stdout=subprocess.PIPE,
                               stdin=subprocess.PIPE,
                               stderr=subprocess.PIPE)
        stderr = res.stderr.read()
        stdout = res.stdout.read()
        # Two separate sends; the client may see them merged or split
        conn.send(stderr)
        conn.send(stdout)
Client:
import socket
BUFSIZE = 1024
ip_port = ('127.0.0.1', 8888)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = s.connect_ex(ip_port)
while True:
    msg = input('>>: ').strip()
    if len(msg) == 0:
        continue
    if msg == 'quit':
        break
    s.send(msg.encode('utf-8'))
    # Reads at most BUFSIZE bytes. Anything beyond that stays in the
    # buffer and is returned by the next recv, mixed into the next reply.
    act_res = s.recv(BUFSIZE)
    print(act_res.decode('utf-8'), end='')
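Result:
(omitted)
Sticky packets occur only with TCP; they do not occur with UDP.
To improve transmission efficiency, a TCP sender often waits until it has gathered enough data before sending a segment. If several consecutive send calls each carry very little data, TCP usually merges them into a single segment according to its optimization algorithm and sends them at once, so the receiver gets the data stuck together.
UDP is connectionless and message-oriented, and it aims at efficient service. It does not merge blocks of data. Because UDP supports one-to-many communication, the receiver's skbuff (socket buffer) uses a linked structure to record each arriving UDP packet, and each packet carries a message header (source address, port, and so on), so the receiver can easily tell packets apart. In other words, message-oriented communication preserves message boundaries.
In addition, the largest payload that sendto can send over UDP is 65535 - 20 (IP header) - 8 (UDP header) = 65507 bytes. If the data is longer than this, sendto returns an error (the packet is discarded and nothing is sent).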
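The boundary-preserving behaviour of UDP can be checked with a short experiment. The sketch below assumes a working loopback interface; the port is chosen by the operating system, and the 2-second timeout is an arbitrary safety value so the script cannot hang.

```python
import socket

# Largest UDP payload over IPv4: 65535 - 20 (IP header) - 8 (UDP header)
MAX_UDP_PAYLOAD = 65535 - 20 - 8

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
receiver.settimeout(2.0)          # arbitrary timeout so the demo cannot hang
address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"Hello", address)
sender.sendto(b"Python", address)

# Each recvfrom returns exactly one datagram, even with a large buffer size.
first, _ = receiver.recvfrom(1024)
second, _ = receiver.recvfrom(1024)
print(first, second)

# A payload above the limit is rejected by sendto instead of being split.
try:
    sender.sendto(b"x" * (MAX_UDP_PAYLOAD + 1), address)
    oversize_rejected = False
except OSError:
    oversize_rejected = True
print("oversize rejected:", oversize_rejected)

sender.close()
receiver.close()
```

Unlike the TCP examples, the two datagrams are never merged: one `sendto` corresponds to one `recvfrom`.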
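Two situations that cause sticky packets
1. Sender-side buffering
The sender does not necessarily transmit each write immediately. Writes that are small and issued in quick succession may be held in the send buffer and sent together, so they arrive merged.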
Server:
from socket import *
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
ss = socket(AF_INET, SOCK_STREAM)
ss.bind(ADDRESS)
ss.listen(5)
con, add = ss.accept()
# Each recv asks for up to 10 bytes, regardless of how the client split its sends
data_one = con.recv(10)
data_two = con.recv(10)
print("--->", data_one.decode('utf-8'))
print("--->", data_two.decode('utf-8'))
con.close()
Client:
from socket import *
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
sc = socket(AF_INET, SOCK_STREAM)
sc.connect(ADDRESS)
# Two small sends issued back to back
sc.send("Hello".encode('utf-8'))
sc.send("Python".encode('utf-8'))
sc.close()
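Result:
Server:
---> HelloPytho
---> n
Client:
(none)
The two sends arrived merged: the first recv(10) returned the 10 bytes "HelloPytho", which span both messages, and the second returned the remaining "n".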
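2. Receiver-side buffering
The receiver does not read everything in its buffer promptly, so data from several reads gets mixed. For example, the client sends one piece of data and the server reads only a small part of it; the next time the server reads, it first gets the data left over in the buffer from last time.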
Server:
from socket import *
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
ss = socket(AF_INET, SOCK_STREAM)
ss.bind(ADDRESS)
ss.listen(5)
con, add = ss.accept()
# Read only 2 bytes first; the rest of the message stays in the buffer
data_one = con.recv(2)
data_two = con.recv(10)
print("--->", data_one.decode('utf-8'))
print("--->", data_two.decode('utf-8'))
con.close()
Client:
from socket import *
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
sc = socket(AF_INET, SOCK_STREAM)
res = sc.connect_ex(ADDRESS)
# A single 13-byte message
sc.send("Hello,Python!".encode('utf-8'))
sc.close()
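Result:
Server:
---> He
---> llo,Python
Client:
(none)
This is a typical case caused by receiver-side buffering: one 13-byte message was split across two reads of 2 and 10 bytes, and the final "!" was never read.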
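Summary:
- Sticky packets occur only with TCP.
- They are mainly caused by buffering on the sending side or the receiving side.
- Neither side knows where one message ends and the next begins within the buffered data, which is what makes the data appear stuck together.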
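Solutions
Sticky packets arise because the receiver does not know the length of the byte stream the sender is about to transmit. The solution therefore centers on letting the sender tell the receiver the total size of the data before sending it, after which the receiver loops until it has received all of it.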
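The receive loop mentioned above can be factored into a small helper. The following is a sketch, not part of the original programs: `recv_exact` is a hypothetical name, the 4096-byte chunk size is arbitrary, and `socket.socketpair()` is again used as a stand-in for a TCP connection.

```python
import socket

def recv_exact(sock, size):
    """Keep calling recv until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Never ask for more than is still missing, so the next
        # message is left untouched in the buffer.
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            raise ConnectionError("connection closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

left, right = socket.socketpair()
left.sendall(b"Hello")
left.sendall(b"Python")

# The receiver knows the sizes in advance (5 and 6 bytes here).
first = recv_exact(right, 5)
second = recv_exact(right, 6)
print(first, second)  # b'Hello' b'Python'

left.close()
right.close()
```

Because each call reads exactly the number of bytes it was told to expect, the two messages come out separately even though they were sent back to back.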
Server:
import socket, subprocess
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
BUFF_SIZE = 1024
ss = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ss.bind(ADDRESS)
ss.listen(5)
while 1:
    con, add = ss.accept()
    print("Client:", add)
    while 1:
        msg = con.recv(BUFF_SIZE)
        if not msg:
            break
        res = subprocess.Popen(msg.decode('utf-8'),
                               shell=True,
                               stdin=subprocess.PIPE,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
        error = res.stderr.read()
        if error:
            ret = error
        else:
            ret = res.stdout.read()
        # Step 1: tell the client how many bytes will follow
        data_length = len(ret)
        con.send(str(data_length).encode('utf-8'))
        # Step 2: wait for the client's confirmation, then send the data
        data = con.recv(BUFF_SIZE).decode('utf-8')
        if data == 'recv_ready':
            con.sendall(ret)
    con.close()
Client:
import socket
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
BUFF_SIZE = 1024
sc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = sc.connect_ex(ADDRESS)
while 1:
    msg = input(">>>").strip()
    if len(msg) == 0:
        continue
    if msg == 'quit':
        break
    sc.send(msg.encode('utf-8'))
    # Step 1: receive the announced length, then confirm readiness
    length = int(sc.recv(BUFF_SIZE).decode('utf-8'))
    sc.send("recv_ready".encode('utf-8'))
    recv_size = 0
    data = b''
    # Step 2: loop until the announced number of bytes has arrived
    while recv_size < length:
        chunk = sc.recv(BUFF_SIZE)
        if not chunk:
            break
        data += chunk
        # Count only the new chunk, not the accumulated data
        recv_size += len(chunk)
    print(data)
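Result: (omitted)
This program has a drawback: the program runs far faster than the network transmits, so sending the length in a separate send and waiting for a confirmation before sending the data adds an extra round trip and amplifies the cost of network latency. An improved approach follows.
We can use a module that converts the length of the data into a fixed number of bytes. Before receiving each message, the client first receives this fixed-length field to learn how large the following message is, and then stops receiving once it has that many bytes. This way it receives exactly the complete message, no more and no less.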
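The struct module
Anyone familiar with C knows what a struct is for: it defines a structure that contains data of different types (int, char, bool, and so on), which makes it convenient to handle an object as a unit.
In network communication, most data is transmitted as a binary stream. Strings pose few problems, but when basic types such as int and char are transmitted, a mechanism is needed to pack the structured values into a byte string for transmission, and the receiver needs a matching mechanism to unpack the bytes and restore the original values.
Python's struct module provides this mechanism. Its main purpose is to convert between Python values and C struct types represented as Python bytes objects, and it offers a handful of simple functions.
pack() and unpack()
These pack and unpack data. For example: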
import struct
import binascii
import ctypes
values1 = (1, 'Hello'.encode('utf-8'), 2.7)
values2 = ('Python'.encode('utf-8'), 101)
# 'I3sf': unsigned int, 3-byte string, float; '4sI': 4-byte string, unsigned int
s1 = struct.Struct('I3sf')
s2 = struct.Struct('4sI')
print(s1.size, s2.size)
# One writable buffer large enough to hold both structures
pre_buffer = ctypes.create_string_buffer(s1.size + s2.size)
print('Before : ', binascii.hexlify(pre_buffer))
# Pack both structures into the buffer, one after the other
s1.pack_into(pre_buffer, 0, *values1)
s2.pack_into(pre_buffer, s1.size, *values2)
print('After pack', binascii.hexlify(pre_buffer))
print(s1.unpack_from(pre_buffer, 0))
print(s2.unpack_from(pre_buffer, s1.size))
# Overwrite the first 8 bytes with two ints; the rest is left unchanged
s3 = struct.Struct('ii')
s3.pack_into(pre_buffer, 0, 123, 123)
print('After pack', binascii.hexlify(pre_buffer))
print(s3.unpack_from(pre_buffer, 0))
Result:
12 8
Before : b'0000000000000000000000000000000000000000'
After pack b'0100000048656c00cdcc2c405079746865000000'
(1, b'Hel', 2.700000047683716)
(b'Pyth', 101)
After pack b'7b0000007b000000cdcc2c405079746865000000'
(123, 123)
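The size 12 for 'I3sf' in the result above may look surprising, since 4 + 3 + 4 is 11. The sketch below shows where the extra byte comes from: without a byte-order prefix, struct uses the platform's native alignment and pads the 3-byte string so that the float starts on an aligned offset. With a prefix such as '<' or '!', standard sizes are used and no padding is added. The native size printed first depends on the platform.

```python
import struct

# Native mode (no prefix): padding may be inserted for alignment.
native_size = struct.calcsize("I3sf")

# Standard mode ('<' = little-endian): fixed sizes, no padding.
standard_size = struct.calcsize("<I3sf")

# 'Hello' is truncated to 3 bytes by the '3s' field.
packed = struct.pack("<I3sf", 1, b"Hello", 2.7)

print(native_size)     # platform dependent (12 in the result above)
print(standard_size)   # 11
print(packed.hex())    # 0100000048656ccdcc2c40
```

For data that crosses the network, an explicit prefix such as '!' (network byte order) keeps the layout identical on both ends regardless of platform.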
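Solving sticky packets with struct
With the struct module, a length can be converted into a fixed-size 4-byte value. We can use this to send the data length ahead of the data.
When sending | When receiving |
---|---|
First send the data length, packed by struct into 4 bytes | First receive 4 bytes and unpack them with struct to get the length of the data to receive |
Then send the data | Then receive the data according to that length |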
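The two columns of the table can be sketched as a pair of helper functions. This is an illustration under stated assumptions: `send_msg` and `recv_msg` are hypothetical names, the header uses '!I' (network byte order, unsigned) rather than the native 'i' format, and the reading side wraps the socket with `makefile('rb')`, whose `read(n)` on a blocking socket keeps reading until n bytes arrive or the connection closes. A socketpair stands in for a TCP connection.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order

def send_msg(sock, payload):
    """Send one message as <4-byte length><payload>."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_msg(reader):
    """Read one length-prefixed message from a buffered binary reader."""
    header = reader.read(HEADER.size)
    if len(header) < HEADER.size:
        raise ConnectionError("connection closed before a full header arrived")
    (length,) = HEADER.unpack(header)
    body = reader.read(length)
    if len(body) < length:
        raise ConnectionError("connection closed before the full body arrived")
    return body

a, b = socket.socketpair()
reader = b.makefile("rb")

# Two messages sent back to back are still received one at a time.
send_msg(a, b"Hello")
send_msg(a, b"Python")
print(recv_msg(reader))  # b'Hello'
print(recv_msg(reader))  # b'Python'

reader.close()
a.close()
b.close()
```

Sending the header and the payload in one `sendall` also avoids the extra round trip of the earlier length-then-confirm exchange.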
Server:
import socket
import struct
import subprocess
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
BUFF_SIZE = 1024
struct_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
struct_server.bind(ADDRESS)
struct_server.listen(5)
while 1:
    conn, add = struct_server.accept()
    while 1:
        cmd = conn.recv(BUFF_SIZE)
        if not cmd:
            break
        print('cmd:%s' % cmd)
        res = subprocess.Popen(cmd.decode('utf-8'),
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
        error = res.stderr.read()
        print(error)
        if error:
            back_msg = error
        else:
            back_msg = res.stdout.read()
        # Send the 4-byte length header first, then the data itself
        conn.sendall(struct.pack('i', len(back_msg)))
        conn.sendall(back_msg)
    conn.close()
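Client:
import socket
import struct
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
BUFF_SIZE = 1024
sc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
res = sc.connect_ex(ADDRESS)
while 1:
    msg = input("Enter a command: ").strip()
    if len(msg) == 0:
        continue
    if msg == 'quit':
        break
    sc.send(msg.encode('utf-8'))
    # Read the 4-byte header and unpack it into the data length
    l = sc.recv(4)
    x = struct.unpack('i', l)[0]
    r_s = 0
    data = b''
    # Keep receiving until the announced number of bytes has arrived
    while r_s < x:
        r_d = sc.recv(BUFF_SIZE)
        if not r_d:
            break
        data += r_d
        r_s += len(r_d)
    # 'gbk' assumes the server's shell output uses that encoding
    # (for example on Chinese-locale Windows); adjust as needed
    print(data.decode('gbk'))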
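Result:
(omitted)
We can also make the header a dictionary that holds detailed information about the real data to be sent, serialize it with json, and then use struct to pack the length of the serialized header into 4 bytes.
When sending | When receiving |
---|---|
First send the header length | First receive the header length and unpack it with struct |
Then encode the header content and send it | Receive the header content according to that length, then decode and deserialize it |
Finally send the real content | Take the details of the data from the deserialized header, then receive the real data |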
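The three-step layout in the table can be expressed as two pure functions that work on bytes, which makes the format easy to test without sockets. This is a sketch under assumptions: `build_frame` and `parse_frame` are hypothetical names, the length field uses '!I' instead of the native 'i', and the header key `data_size` follows the naming used in this article's programs.

```python
import json
import struct

LEN = struct.Struct("!I")  # 4-byte header length, network byte order

def build_frame(header, body):
    """Return <4-byte header length><JSON header><body> as bytes."""
    header = dict(header, data_size=len(body))
    header_bytes = json.dumps(header).encode("utf-8")
    return LEN.pack(len(header_bytes)) + header_bytes + body

def parse_frame(buffer):
    """Return (header, body, rest), or None if the frame is incomplete."""
    if len(buffer) < LEN.size:
        return None
    (header_len,) = LEN.unpack_from(buffer, 0)
    header_end = LEN.size + header_len
    if len(buffer) < header_end:
        return None
    header = json.loads(buffer[LEN.size:header_end].decode("utf-8"))
    body_end = header_end + header["data_size"]
    if len(buffer) < body_end:
        return None
    return header, buffer[header_end:body_end], buffer[body_end:]

# Two frames stuck together in one buffer can still be separated.
stream = build_frame({"cmd": "put"}, b"abc") + build_frame({"cmd": "get"}, b"")
header1, body1, rest = parse_frame(stream)
header2, body2, rest = parse_frame(rest)
print(header1, body1)
print(header2, body2)
```

Returning the unconsumed remainder lets a receiver append newly received bytes to it and call `parse_frame` again, so a partially received frame is simply retried later.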
Server:
import socket
import struct
import json
import subprocess
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
BUFF_SIZE = 1024
ss = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ss.bind(ADDRESS)
ss.listen(5)
while 1:
    conn, add = ss.accept()
    while 1:
        cmd = conn.recv(BUFF_SIZE)
        if not cmd:
            break
        print("cmd:%s" % cmd)
        res = subprocess.Popen(cmd.decode('utf-8'),
                               shell=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
        error = res.stderr.read()
        print(error)
        if error:
            back_msg = error
        else:
            back_msg = res.stdout.read()
        # Build the header dictionary and serialize it to JSON bytes
        header = {'data_size': len(back_msg)}
        header_json = json.dumps(header)
        header_json_bytes = bytes(header_json, encoding='utf-8')
        # Send: header length (4 bytes), then header, then the real data
        conn.sendall(struct.pack('i', len(header_json_bytes)))
        conn.sendall(header_json_bytes)
        conn.sendall(back_msg)
    conn.close()
Client:
import socket
import struct
import json
HOST = '127.0.0.1'
PORT = 8080
ADDRESS = (HOST, PORT)
BUFF_SIZE = 1024
sc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sc.connect_ex(ADDRESS)
while 1:
    cmd = input("Enter a command: ")
    if not cmd:
        continue
    sc.send(bytes(cmd, encoding='utf-8'))
    # Receive the 4-byte header length, then the JSON header itself
    head = sc.recv(4)
    head_json_len = struct.unpack('i', head)[0]
    head_json = json.loads(sc.recv(head_json_len).decode('utf-8'))
    data_len = head_json['data_size']
    recv_size = 0
    recv_data = b''
    # Receive the real data until data_len bytes have arrived
    while recv_size < data_len:
        chunk = sc.recv(BUFF_SIZE)
        if not chunk:
            break
        recv_data += chunk
        # Count only the new chunk, not the accumulated data
        recv_size += len(chunk)
    print(recv_data.decode('utf-8'))
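Result:
(omitted)
Interview question
1. Write an FTP-style file upload and download program
Reference implementation (upload only):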
Server:
import socket
import struct
import json
import os
class MYTCPServer:
    address_family = socket.AF_INET
    socket_type = socket.SOCK_STREAM
    allow_reuse_address = False
    max_packet_size = 8192
    coding = 'utf-8'
    request_queue_size = 5
    server_dir = 'file_upload'
    def __init__(self, server_address, bind_and_activate=True):
        self.server_address = server_address
        self.socket = socket.socket(self.address_family,
                                    self.socket_type)
        if bind_and_activate:
            try:
                self.server_bind()
                self.server_activate()
            except:
                self.server_close()
                raise
    def server_bind(self):
        if self.allow_reuse_address:
            self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.socket.bind(self.server_address)
        self.server_address = self.socket.getsockname()
    def server_activate(self):
        self.socket.listen(self.request_queue_size)
    def server_close(self):
        self.socket.close()
    def get_request(self):
        return self.socket.accept()
    def close_request(self, request):
        request.close()
    def run(self):
        while True:
            self.conn, self.client_addr = self.get_request()
            print('from client ', self.client_addr)
            while True:
                try:
                    # 4-byte header length, then the JSON header
                    head_struct = self.conn.recv(4)
                    if not head_struct:
                        break
                    head_len = struct.unpack('i', head_struct)[0]
                    head_json = self.conn.recv(head_len).decode(self.coding)
                    head_dic = json.loads(head_json)
                    print(head_dic)
                    # Dispatch to the method named by the 'cmd' field
                    cmd = head_dic['cmd']
                    if hasattr(self, cmd):
                        func = getattr(self, cmd)
                        func(head_dic)
                except Exception:
                    break
    def put(self, args):
        # Make sure the upload directory exists before writing into it
        os.makedirs(self.server_dir, exist_ok=True)
        file_path = os.path.normpath(os.path.join(
            self.server_dir,
            args['filename']
        ))
        filesize = args['filesize']
        recv_size = 0
        print('----->', file_path)
        with open(file_path, 'wb') as f:
            while recv_size < filesize:
                # Never read past the end of this file's data
                want = min(self.max_packet_size, filesize - recv_size)
                recv_data = self.conn.recv(want)
                if not recv_data:
                    break
                f.write(recv_data)
                recv_size += len(recv_data)
                print('recvsize:%s filesize:%s' % (recv_size, filesize))
tcpserver1 = MYTCPServer(('127.0.0.1', 8080))
tcpserver1.run()
Client:
import socket
import struct
import json
import os
class MYTCPClient:
    address_family = socket.AF_INET
    socket_type = socket.SOCK_STREAM
    allow_reuse_address = False
    max_packet_size = 8192
    coding = 'utf-8'
    request_queue_size = 5
    def __init__(self, server_address, connect=True):
        self.server_address = server_address
        self.socket = socket.socket(self.address_family,
                                    self.socket_type)
        if connect:
            try:
                self.client_connect()
            except:
                self.client_close()
                raise
    def client_connect(self):
        self.socket.connect(self.server_address)
    def client_close(self):
        self.socket.close()
    def run(self):
        while True:
            # Input format: <command> <argument>, for example: put a.txt
            inp = input(">>: ").strip()
            if not inp:
                continue
            l = inp.split()
            cmd = l[0]
            if hasattr(self, cmd):
                func = getattr(self, cmd)
                func(l)
    def put(self, args):
        cmd = args[0]
        filename = args[1]
        if not os.path.isfile(filename):
            print('file:%s does not exist' % filename)
            return
        else:
            filesize = os.path.getsize(filename)
        # Header: command, file name, and file size
        head_dic = {'cmd': cmd, 'filename': os.path.basename(filename), 'filesize': filesize}
        print(head_dic)
        head_json = json.dumps(head_dic)
        head_json_bytes = bytes(head_json, encoding=self.coding)
        head_struct = struct.pack('i', len(head_json_bytes))
        # Send: header length (4 bytes), then header, then file content
        self.socket.sendall(head_struct)
        self.socket.sendall(head_json_bytes)
        send_size = 0
        with open(filename, 'rb') as f:
            for line in f:
                self.socket.sendall(line)
                send_size += len(line)
                print(send_size)
        print('upload successful')
client = MYTCPClient(('127.0.0.1', 8080))
client.run()
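The client above iterates over the binary file line by line, which can yield very short or very long pieces depending on where newline bytes happen to fall. One alternative is to read fixed-size chunks. The helper below is a sketch; `iter_chunks` is a hypothetical name, and the 8192 default simply mirrors `max_packet_size` from the reference code.

```python
import io
from functools import partial

def iter_chunks(fileobj, chunk_size=8192):
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    # iter(callable, sentinel) calls fileobj.read(chunk_size) repeatedly
    # and stops when it returns b"" (end of file).
    return iter(partial(fileobj.read, chunk_size), b"")

# Demonstration with an in-memory file of 20000 bytes.
source = io.BytesIO(b"a" * 20000)
sizes = [len(chunk) for chunk in iter_chunks(source)]
print(sizes)  # [8192, 8192, 3616]
```

In the `put` method, the loop would then take the form `for chunk in iter_chunks(f): self.socket.sendall(chunk)`, with the file opened in 'rb' mode as before.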
References:
https://blog.csdn.net/ArchyLi/article/details/78116195
https://baike.baidu.com/item/Nagle%E7%AE%97%E6%B3%95/5645172