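1. I previously wrote an article titled 《通过实例认识Python的GIL》 ("Understanding Python's GIL Through Examples") and felt I had not said everything I wanted to.
2. This time I have extended the example to analyze further how the GIL affects Python programs.
2.1 First, the example: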
    from threading import Thread
    from threading import Event as TEvent
    from multiprocessing import Process
    from multiprocessing import Event as PEvent
    from timeit import Timer

    def countdown(n, event):
        # CPU-bound work: count down to zero, then signal completion.
        while n > 0:
            n -= 1
        event.set()

    def io_op(n, event):
        # I/O-bound work: keep writing until countdown signals completion.
        # n is unused; it only keeps the signature identical to countdown's.
        with open('test.txt', 'w') as f:
            while not event.is_set():
                f.write('hello,world')

    def t1():
        # One thread does all the counting.
        COUNT = 100000000
        event = TEvent()
        thread1 = Thread(target=countdown, args=(COUNT, event))
        thread1.start()
        thread1.join()

    def t2():
        # Two threads, each doing half of the counting.
        COUNT = 100000000
        event = TEvent()
        thread1 = Thread(target=countdown, args=(COUNT // 2, event))
        thread2 = Thread(target=countdown, args=(COUNT // 2, event))
        thread1.start(); thread2.start()
        thread1.join(); thread2.join()

    def t3():
        # Two processes, each doing half of the counting.
        COUNT = 100000000
        event = PEvent()
        p1 = Process(target=countdown, args=(COUNT // 2, event))
        p2 = Process(target=countdown, args=(COUNT // 2, event))
        p1.start(); p2.start()
        p1.join(); p2.join()

    def t4():
        # One CPU-bound thread plus one I/O thread.
        COUNT = 100000000
        event = TEvent()
        thread1 = Thread(target=countdown, args=(COUNT, event))
        thread2 = Thread(target=io_op, args=(COUNT, event))
        thread1.start(); thread2.start()
        thread1.join(); thread2.join()

    def t5():
        # One CPU-bound process plus one I/O process.
        COUNT = 100000000
        event = PEvent()
        p1 = Process(target=countdown, args=(COUNT, event))
        p2 = Process(target=io_op, args=(COUNT, event))
        p1.start(); p2.start()
        p1.join(); p2.join()

    if __name__ == '__main__':
        t = Timer(t1)
        print('countdown in one thread:%f' % (t.timeit(1),))
        t = Timer(t2)
        print('countdown use two thread:%f' % (t.timeit(1),))
        t = Timer(t3)
        print('countdown use two Process:%f' % (t.timeit(1),))
        t = Timer(t4)
        print('countdown in one thread with io op in another thread:%f' % (t.timeit(1),))
        t = Timer(t5)
        print('countdown in one process with io op in another process:%f' % (t.timeit(1),))
2.2 Now the output:
2.2.1 First, the results on a multi-core CPU with all cores but one disabled, on Windows with Python 2.7.6:
    countdown in one thread: 5.9650638561501195
    countdown use two thread: 5.8188333656781595
    countdown use two Process: 6.197559396296269
    countdown in one thread with io op in another thread: 11.369204522553051
    countdown in one process with io op in another process: 11.79234388645473
2.2.2 Next, the results on a four-core CPU, on Windows with Python 2.7.6:
    countdown in one thread:6.479085
    countdown use two thread:24.266131
    countdown use two Process:4.360930
    countdown in one thread with io op in another thread:29.967870
    countdown in one process with io op in another process:6.478644
2.2.3 Finally, the results on a four-core 64-bit CPU, on Windows with Python 3.4:
    countdown in one thread:12.333187
    countdown use two thread:19.358091
    countdown use two Process:7.105101
    countdown in one thread with io op in another thread:10.443203
    countdown in one process with io op in another process:18.682883
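[Figure: chart comparing the three runs above, for easier comparison, across the following five cases]
1) One thread
2) Two threads
3) Two processes
4) CPU-bound thread + I/O thread
5) CPU-bound process + I/O process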
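2.3 Summary:
2.3.1 On a single core, everything looks fine:
Single-threaded and multi-threaded runs take about the same time.
Multiple processes behave like multiple threads, only slightly slower, probably because switching between processes costs more.
With mixed CPU and I/O work, multiple processes and multiple threads also perform alike.
2.3.2 On a multi-core CPU, however, the difference between threads and processes is laid bare:
With two threads, the same amount of computation takes nearly four times as long as with a single thread (24.266131 s versus 6.479085 s).
Mixing CPU-bound computation and I/O across threads makes things even worse (29.967870 s). And that figure is on the good side: with more programs running, and other programs doing I/O as well, it would take longer still.
With multiple processes, everything is still fine.
2.3.3 On Python 3.4, thanks to the major rework of the GIL, multi-threaded efficiency is much improved, and I/O has a much smaller effect on CPU-bound computation than it did on 3.2. Overall, though, it runs about twice as slow as Python 2.7 (12.333187 s versus 6.479085 s for the single-threaded case)!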
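The ratios quoted in 2.3 can be checked with a little arithmetic. The sketch below does nothing more than divide the timings reported in 2.2.2 and 2.2.3; the variable names and the helper ratio are my own labels, not part of the original test script.

```python
# Timings in seconds, copied from the four-core runs reported above.
py27_one_thread = 6.479085
py27_two_threads = 24.266131
py27_two_processes = 4.360930
py27_cpu_io_threads = 29.967870
py34_one_thread = 12.333187


def ratio(t, baseline):
    """Return how many times as long t is compared with baseline."""
    return t / baseline


# Two threads versus one thread on Python 2.7 (the "nearly four times" claim).
threads_slowdown = ratio(py27_two_threads, py27_one_thread)

# Two processes versus one thread: below 1.0 means the processes finished sooner.
processes_ratio = ratio(py27_two_processes, py27_one_thread)

# CPU thread + I/O thread versus the plain single-threaded countdown.
cpu_io_slowdown = ratio(py27_cpu_io_threads, py27_one_thread)

# Python 3.4 versus Python 2.7, single-threaded (the "twice as slow" claim).
py34_vs_py27 = ratio(py34_one_thread, py27_one_thread)

print("two threads / one thread:      %.2f" % threads_slowdown)
print("two processes / one thread:    %.2f" % processes_ratio)
print("cpu+io threads / one thread:   %.2f" % cpu_io_slowdown)
print("py3.4 / py2.7 (single thread): %.2f" % py34_vs_py27)
```

So "four times" and "twice" are rounded up a little: the measured ratios are roughly 3.7 and 1.9. Keep in mind that the 3.4 run was on a different (64-bit) setup, so the last ratio is only indicative.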
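3. Judging from this example, things look grim: on a multi-core CPU, Python's Thread seems to be good for nothing. But is such pessimism warranted? Let's move on to the next example.
4. Let's optimize the original example by moving countdown into C code:
4.1 First, the code:
4.1.1 utility.pyx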
    def countdown(int n):
        with nogil:
            while n > 0:
                n -= 1
4.1.2 Setup.py
    # setuptools is used in place of distutils, which was removed in Python 3.12.
    from setuptools import setup, Extension
    from Cython.Build import cythonize

    ext = Extension("utility",
                    define_macros=[('MAJOR_VERSION', '1'),
                                   ('MINOR_VERSION', '0')],
                    sources=["utility.pyx"])

    setup(
        name='callback',
        version='1.0',
        description='This is a callback demo package',
        author='',
        author_email='shi19@163.com',
        url='',
        long_description='',
        ext_modules=cythonize([ext]),
    )
4.1.3 count.py
    from threading import Thread
    from threading import Event as TEvent
    from multiprocessing import Process
    from multiprocessing import Event as PEvent
    import utility
    from timeit import Timer

    def countdown(n, event):
        # Run the C-level countdown 100 times; the GIL is released inside it.
        for i in range(100):
            utility.countdown(n)
        event.set()

    def io_op(n, event):
        # I/O-bound work: keep writing until countdown signals completion.
        # n is unused; it only keeps the signature identical to countdown's.
        with open('test.txt', 'w') as f:
            while not event.is_set():
                f.write('hello,world')

    def t1():
        # One thread does all the counting.
        COUNT = 100000000
        event = TEvent()
        thread1 = Thread(target=countdown, args=(COUNT, event))
        thread1.start()
        thread1.join()

    def t2():
        # Two threads, each doing half of the counting.
        COUNT = 100000000
        event = TEvent()
        thread1 = Thread(target=countdown, args=(COUNT // 2, event))
        thread2 = Thread(target=countdown, args=(COUNT // 2, event))
        thread1.start(); thread2.start()
        thread1.join(); thread2.join()

    def t3():
        # Two processes, each doing half of the counting.
        COUNT = 100000000
        event = PEvent()
        p1 = Process(target=countdown, args=(COUNT // 2, event))
        p2 = Process(target=countdown, args=(COUNT // 2, event))
        p1.start(); p2.start()
        p1.join(); p2.join()

    def t4():
        # One CPU-bound thread plus one I/O thread.
        COUNT = 100000000
        event = TEvent()
        thread1 = Thread(target=countdown, args=(COUNT, event))
        thread2 = Thread(target=io_op, args=(COUNT, event))
        thread1.start(); thread2.start()
        thread1.join(); thread2.join()

    def t5():
        # One CPU-bound process plus one I/O process.
        COUNT = 100000000
        event = PEvent()
        p1 = Process(target=countdown, args=(COUNT, event))
        p2 = Process(target=io_op, args=(COUNT, event))
        p1.start(); p2.start()
        p1.join(); p2.join()

    if __name__ == '__main__':
        t = Timer(t1)
        print('countdown in one thread:%f' % (t.timeit(1),))
        t = Timer(t2)
        print('countdown use two thread:%f' % (t.timeit(1),))
        t = Timer(t3)
        print('countdown use two Process:%f' % (t.timeit(1),))
        t = Timer(t4)
        print('countdown in one thread with io op in another thread:%f' % (t.timeit(1),))
        t = Timer(t5)
        print('countdown in one process with io op in another process:%f' % (t.timeit(1),))
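4.2 A few notes:
4.2.1 utility.pyx is a Cython source file. Cython lets you mix Python and C in one program and compiles the result to a C file. The with nogil statement releases the GIL while the while loop runs: the computation inside touches no Python objects, so the GIL's shackles can safely be taken off.
4.2.2 Setup.py is the build script for utility.pyx. Running python Setup.py build_ext --inplace produces the extension module utility.pyd on Windows; on Linux the shared library is named utility.so. Python code can then use the extension via import utility.
4.2.3 count.py is the modified test program. Pay attention to countdown: it now calls utility.countdown 100 times, so it does 100 times as much counting as the original code!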
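To get a feel for what releasing the GIL buys you without building the extension, here is a minimal pure-Python sketch. It relies on the fact that time.sleep, much like the nogil block in utility.pyx, releases the GIL while it waits. The helper names blocking_call and run_in_threads are hypothetical and not part of count.py.

```python
import time
from threading import Thread


def blocking_call(seconds):
    # time.sleep releases the GIL while waiting, so other threads can run.
    time.sleep(seconds)


def run_in_threads(n_threads, seconds):
    """Start n_threads threads that each block for `seconds`; return wall time."""
    threads = [Thread(target=blocking_call, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


# Four threads, each blocking for 0.2 s. If the GIL were held during the
# wait, the calls would run back to back and take about 0.8 s in total.
elapsed = run_in_threads(4, 0.2)
print("4 threads x 0.2 s took %.2f s" % elapsed)
```

The total stays close to a single wait rather than four, which is the same effect that lets the two-thread run of the Cython countdown use two cores.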
4.3 Running it produces:
    countdown in one thread:16.968686
    countdown use two thread:9.333422
    countdown use two Process:9.620321
    countdown in one thread with io op in another thread:17.754015
    countdown in one process with io op in another process:17.867098
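4.4 Well, the world looks bright again. Keep in mind that this output comes from doing 100 times as much counting as before, which shows just how large a performance gain you get by moving numeric computation into C!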
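To put rough numbers on that gain, the sketch below divides the timings already reported. One assumption is worth stating loudly: the article does not say which interpreter or machine produced the output in 4.3, so both four-core single-thread baselines are shown, and neither ratio should be read as a precise measurement.

```python
# Timings in seconds, copied from the outputs above.
cython_one_thread = 16.968686    # 100 calls of utility.countdown(10**8)
cython_two_threads = 9.333422    # same total work split over two threads
calls = 100

# Baselines for ONE pure-Python countdown(10**8) on the four-core machines.
# ASSUMPTION: it is not stated which of these setups produced the 4.3 output.
py27_one_thread = 6.479085
py34_one_thread = 12.333187

# Time for a single C-level countdown of 10**8 iterations.
cython_per_call = cython_one_thread / calls

speedup_vs_py27 = py27_one_thread / cython_per_call
speedup_vs_py34 = py34_one_thread / cython_per_call

# With the GIL released, two threads now actually help.
thread_scaling = cython_one_thread / cython_two_threads

print("seconds per C countdown:        %.4f" % cython_per_call)
print("speedup vs Python 2.7 baseline: %.1fx" % speedup_vs_py27)
print("speedup vs Python 3.4 baseline: %.1fx" % speedup_vs_py34)
print("one thread / two threads:       %.2fx" % thread_scaling)
```

Either way the per-iteration speedup is in the tens, and splitting the work across two threads now cuts the wall time by a factor of roughly 1.8 instead of making it worse.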
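5. All right, let's wrap up:
5.1 On a single core, the GIL's effect on performance is negligible, practically zero.
5.2 On a multi-core CPU, because of the GIL, Thread performs very badly for CPU-bound work, whereas Process is not affected by the GIL.
5.3 Python's built-in numeric types are not suited to heavy mathematical computation. This is not only Python's problem, of course; other fully object-oriented languages share it. For heavy numeric work, move the code into C/C++: this not only removes the GIL's influence but can also make the code tens or even hundreds of times faster. Alternatively, extensions such as numpy can bring large gains for scientific computing.
5.4 Where Python is slow is really numeric computation. Think about it: if every number is an object, computation inevitably means constantly allocating and freeing memory for those objects, so it is bound to be slow.
5.5 On the other hand, Python is very efficient at manipulating data structures. Its powerful built-in types such as dict, str and list really can rival C in speed, and that is no exaggeration, because they are themselves implemented in C. When we first learn to program we are told that data structures + algorithms = programs; perhaps only with a language like Python do we come to feel the truth of this firsthand.
5.6 When developing in Python, you do have to spend some time on performance optimization. The process is simple: use the cProfile module to find the expensive operations, then move them into C. In addition, on a multi-core CPU, be careful with Thread and prefer Process for CPU-bound work; the analysis of the GIL in this article should help a good deal with that kind of optimization. In fact, optimizing Python performance is one of the more challenging and rewarding parts of development.
5.7 But remember one thing: do not optimize too early. "Premature optimization is the root of all evil" (Donald Knuth). Early in development, focus on getting the functionality working and on keeping the code readable and maintainable.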
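The profile-first step from 5.6 can be sketched as follows. This is a minimal illustration using the standard cProfile and pstats modules; the functions countdown and main are stand-ins invented for the example, with a deliberately small loop count so it finishes quickly.

```python
import cProfile
import io
import pstats


def countdown(n):
    # Pure-Python numeric loop: the kind of hot spot worth moving to C.
    while n > 0:
        n -= 1


def main():
    countdown(200000)
    return sum(range(1000))  # cheap by comparison: sum() runs in C


# Profile main() and collect the statistics.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Write a report sorted by cumulative time into a string buffer.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show at most the five most expensive entries
report = stream.getvalue()
print(report)
```

In the report, countdown should account for nearly all of the time, which is the signal that it, and not the rest of the program, is the candidate for a C or Cython rewrite.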
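5.8 Finally, let me close this article with one thought: they say that loving someone means loving everything about them, flaws included. If that holds for people, what about things?
(End)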