Python默认的解释器是CPython,而CPython中有一个全局锁GIL,任何线程获得锁之后才能执行,所以多线程只能交替进行,即使是多核CPU也只能用到1核。因此,Python中使用多线程并不一定能提高效率,一般来说CPU密集型的任务不适合用多线程,IO密集型的任务适当使用多线程是可以提高效率的。下面分别使用多线程和单线程进行端口(0-5000端口)扫描:
单线程(无延时)
#encoding: utf-8
#python3.4
import socket, time
if __name__=='__main__':
openPortNum = 0
t=time.time()
for port in range(0,5000):
#time.sleep(0.01)
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
result = s.connect_ex(('ip',port))
if(result == 0):
print (port,'is open')
openPortNum+=1
s.close()
print ('total open port is %s, scan used time is: %f ' % (openPortNum, time.time()-t))
多线程(无延时)
#encoding: utf-8
#python3.4
import socket, sys, threading, time
openPortNum = 0
socket.setdefaulttimeout(3)
threads=[]
def socket_port(ip, PORT):
global openPortNum
#time.sleep(0.01)
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
result = s.connect_ex((ip, PORT))
if(result == 0):
print (PORT,'is open')
openPortNum += 1
s.close()
if __name__ == '__main__':
t=time.time()
for port in range(0, 5000):
th=threading.Thread(target=socket_port, args=('ip',port))
threads.append(th)
th.start()
print ('total open port is %s, scan used time is: %f ' % (openPortNum, time.time()-t))
没有延迟的情况下,单线程和多线程得到扫描耗时为:
单线程0.1s,多线程1.3s,多线程反而没有优势,这是因为线程需要频繁切换,而且IO等待时间较短,无法体现多线程的优势。那么可以在每一个线程执行的时候加一个延时,模拟较长的IO等待时间。
单线程(有延时,延时设为0.01s)
#encoding: utf-8
#python3.4
import socket, time
if __name__=='__main__':
openPortNum = 0
t=time.time()
for port in range(0,5000):
time.sleep(0.01)
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
result = s.connect_ex(('ip',port))
if(result == 0):
print (port,'is open')
openPortNum+=1
s.close()
print ('total open port is %s, single_thread scan used time is: %f ' % (openPortNum, time.time()-t))
多线程(有延时,延时为0.01s)
#encoding: utf-8
#python3.4
import socket, sys, threading, time
openPortNum = 0
socket.setdefaulttimeout(3)
threads=[]
def socket_port(ip, PORT):
global openPortNum
time.sleep(0.01)
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
result = s.connect_ex((ip, PORT))
if(result == 0):
print (PORT,'is open')
openPortNum += 1
s.close()
if __name__ == '__main__':
t=time.time()
for port in range(0, 5000):
th=threading.Thread(target=socket_port, args=('ip',port))
threads.append(th)
th.start()
print ('total open port is %s, multi_thread scan used time is: %f ' % (openPortNum, time.time()-t))
有延时的情况下,单线程与多线程的扫描时间分别为:
单线程51.9s,多线程1.3s,多线程的优势就体现出来了。
总结一下,Python多线程对于IO密集型任务有正面效果,对于CPU密集型任务反而效率更低,主要就是因为GIL的存在。
如何减少GIL的影响呢?可以通过使用多进程代替多线程、CPU计算密集型任务使用C模块或其他语言、换成JPython等其他解释器等方法。