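The GIL in Python
For CPython:
- one Python thread corresponds to one native thread at the C level
- the GIL ensures that only one thread executes bytecode on a CPU at any given moment
- as a result, multiple threads cannot be spread across multiple CPU cores to run bytecode in parallel
The dis module shows the bytecode instruction sequence that a block of code compiles to: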
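import dis

def add(a):
    a = a + 1
    return a

# dis.dis() prints the disassembly itself and returns None,
# which is why the output below ends with "None".
print(dis.dis(add))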
Output:
  3           0 LOAD_FAST                0 (a)
              2 LOAD_CONST               1 (1)
              4 BINARY_ADD
              6 STORE_FAST               0 (a)
  4           8 LOAD_FAST                0 (a)
             10 RETURN_VALUE
None
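The same instruction sequence can also be inspected programmatically, which makes it easier to see that a single statement such as a = a + 1 spans several instructions. This is a minimal sketch using dis.get_instructions; the exact opcode names differ between CPython versions (newer releases no longer emit BINARY_ADD, for example), so no particular listing is assumed here.

```python
import dis


def add(a):
    a = a + 1
    return a


# Collect the opcode name of every instruction in the function.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Because a thread switch can happen between any two of these instructions, a load/add/store sequence is not atomic.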
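The GIL is not held until a function returns; it can be released in the middle of a function, between bytecode instructions: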
import threading

total = 0

def add():
    global total
    for i in range(1000000):
        total += 1

def desc():
    global total
    for i in range(1000000):
        total -= 1

thread1 = threading.Thread(target=add)
thread2 = threading.Thread(target=desc)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(total)
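The result is nondeterministic, for example:
335419
When the GIL is released depends on the interpreter version: older CPython releases released it after a fixed number of bytecode instructions, while CPython 3.2 and later release it on a time slice (the switch interval, 5 ms by default). A thread also releases the GIL voluntarily when it enters a blocking I/O operation.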
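One common way to make the counter example deterministic is to guard each read-modify-write with a threading.Lock. The following is a minimal sketch under that assumption; the names counter_lock, increment and decrement are illustrative and do not come from the original code, and the loop count is reduced to keep the run short.

```python
import threading

total = 0
counter_lock = threading.Lock()  # illustrative name, not from the original example


def increment(n):
    global total
    for _ in range(n):
        # Hold the lock across the whole load/add/store sequence.
        with counter_lock:
            total += 1


def decrement(n):
    global total
    for _ in range(n):
        with counter_lock:
            total -= 1


t1 = threading.Thread(target=increment, args=(100000,))
t2 = threading.Thread(target=decrement, args=(100000,))
t1.start()
t2.start()
t1.join()
t2.join()
print(total)  # prints 0
```

The lock adds overhead on every iteration, so in real code it is usually better to keep the locked region as small and as infrequent as the logic allows.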
Multithreaded programming with threading
1. Instantiating the Thread class
import time
import threading

def get_detail_html(url):
    print("get detail html started")
    time.sleep(2)
    print("get detail html end")

def get_detail_url(url):
    print("get detail url started")
    time.sleep(4)
    print("get detail url end")
Creating the threads with the threading module:
threading1 = threading.Thread(target=get_detail_html, args=("",))
threading2 = threading.Thread(target=get_detail_url, args=("",))
start_time = time.time()
threading1.start()
threading2.start()
print("last time: {}".format(time.time() - start_time))
Output:
get detail html started
get detail url started
last time: 0.0
get detail html end
get detail url end
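The setDaemon() method. Suppose main thread A creates child thread B and calls B.setDaemon(True) in A. This marks child thread B (not A) as a daemon thread: when main thread A finishes, the process exits without waiting for B, whether or not B has completed. In that sense it is roughly the opposite of join(). Note that the flag must be set before start() is called, otherwise a RuntimeError is raised. If a thread is not a daemon, the interpreter waits for it before exiting, so a non-daemon thread that never returns keeps the program hanging indefinitely. Since Python 3.10, setDaemon() is deprecated in favor of assigning the daemon attribute (B.daemon = True).
In other words, it makes the child threads end when the main thread ends: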
threading1 = threading.Thread(target=get_detail_html, args=("",))
threading2 = threading.Thread(target=get_detail_url, args=("",))
# Same effect as the older setDaemon(True), which is deprecated since Python 3.10.
threading1.daemon = True
threading2.daemon = True
start_time = time.time()
threading1.start()
threading2.start()
print("last time: {}".format(time.time() - start_time))
Output:
get detail html started
get detail url started
last time: 0.0009708404541015625
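The ordering rule for the daemon flag can be checked directly. The sketch below starts a daemon thread and then tries to change the flag, which CPython rejects with a RuntimeError; worker and stop_event are illustrative names, not part of the original example.

```python
import threading


def worker(stop_event):
    # Block until told to stop, so the thread is still alive while we inspect it.
    stop_event.wait()


stop_event = threading.Event()
t = threading.Thread(target=worker, args=(stop_event,), daemon=True)
t.start()

try:
    # Changing the flag after start() is not allowed.
    t.daemon = False
except RuntimeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(t.daemon, error_message)

# Let the worker finish so the sketch exits cleanly.
stop_event.set()
t.join()
```

Passing daemon=True to the Thread constructor, as done here, avoids the ordering problem altogether.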
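The join() method. Suppose main thread A creates child thread B and calls B.join() in A. A then blocks at that call until B has finished, and only afterwards continues. To wait for a thread, call join() on that thread object.
The argument is optional and gives the maximum time, in seconds, to wait. If the timeout elapses, join() simply returns and the caller carries on; the joined thread is not stopped or reclaimed and keeps running. Because join() always returns None, call is_alive() afterwards to tell whether the thread finished or the wait timed out.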
threading1 = threading.Thread(target=get_detail_html, args=("",))
threading2 = threading.Thread(target=get_detail_url, args=("",))
start_time = time.time()
threading1.start()
threading2.start()
threading1.join()
threading2.join()
print("last time: {}".format(time.time() - start_time))
Output:
get detail html started
get detail url started
get detail html end
get detail url end
last time: 4.001973867416382
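The example above calls join() without an argument. A minimal sketch of the timeout variant follows; slow_task and the specific durations are assumptions chosen for illustration. It shows that a timed-out join() leaves the thread running and that is_alive() is the way to tell the two outcomes apart.

```python
import threading
import time


def slow_task():
    time.sleep(0.5)


worker = threading.Thread(target=slow_task)
worker.start()

# Wait at most 0.1 s; the task needs about 0.5 s, so this join times out.
worker.join(timeout=0.1)
still_running = worker.is_alive()

# Now wait without a limit until the thread has really finished.
worker.join()
finished = not worker.is_alive()

print(still_running, finished)
```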
2. Implementing multithreading by subclassing Thread
class GetDetailHtml(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):
        print("get detail html started")
        time.sleep(2)
        print("get detail html end")

class GetDetailUrl(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):
        print("get detail url started")
        time.sleep(2)
        print("get detail url end")
thread1 = GetDetailHtml("get_detail_html")
thread2 = GetDetailUrl("get_detail_url")
start_time = time.time()
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print("last time: {}".format(time.time() - start_time))
Output:
get detail html started
get detail url started
get detail html end
get detail url end
last time: 2.001891613006592
Inter-thread communication: shared variables and Queue
1. Shared variables
import time
import threading

detail_url_list = []

def get_detail_html(detail_url_list):
    # Crawl article detail pages
    while True:
        try:
            # pop() directly: a separate length check could race with the other consumers
            url = detail_url_list.pop()
        except IndexError:
            time.sleep(0.1)  # list is empty; back off instead of spinning
            continue
        print("get detail html started")
        time.sleep(2)
        print("get detail html end")

def get_detail_url(detail_url_list):
    # Crawl the article list page
    while True:
        print("get detail url started")
        time.sleep(4)
        for i in range(20):
            detail_url_list.append("http://projectsedu.com/{id}".format(id=i))
        print("get detail url end")

if __name__ == "__main__":
    thread_detail_url = threading.Thread(target=get_detail_url, args=(detail_url_list,))
    thread_detail_url.start()
    for i in range(10):
        html_thread = threading.Thread(target=get_detail_html, args=(detail_url_list,))
        html_thread.start()
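For comparison, the same producer/consumer shape can be expressed with queue.Queue, which handles the locking and blocking internally. This is a minimal sketch, not the original author's code: NUM_WORKERS, SENTINEL, produce_urls and consume_urls are hypothetical names, the crawling itself is omitted, and the threads stop after one batch so that the script terminates.

```python
import queue
import threading

NUM_WORKERS = 3
SENTINEL = None  # hypothetical marker that tells a consumer to stop


def produce_urls(url_queue, count):
    # Put the URLs on the queue, then one stop marker per consumer.
    for i in range(count):
        url_queue.put("http://projectsedu.com/{id}".format(id=i))
    for _ in range(NUM_WORKERS):
        url_queue.put(SENTINEL)


def consume_urls(url_queue, fetched):
    while True:
        url = url_queue.get()  # blocks until an item is available
        if url is SENTINEL:
            break
        fetched.put(url)  # stand-in for actually downloading the page


url_queue = queue.Queue()
fetched = queue.Queue()

producer = threading.Thread(target=produce_urls, args=(url_queue, 20))
consumers = [
    threading.Thread(target=consume_urls, args=(url_queue, fetched))
    for _ in range(NUM_WORKERS)
]

producer.start()
for consumer in consumers:
    consumer.start()
producer.join()
for consumer in consumers:
    consumer.join()

# All threads have finished, so the queue size is exact here.
fetched_urls = sorted(fetched.get() for _ in range(fetched.qsize()))
print(len(fetched_urls))  # prints 20
```

Because get() blocks while the queue is empty, the consumers need neither a length check nor a polling sleep.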
These variables can also be kept in a separate file (module) and imported from there:
from chapter11 import variables
from chapte