Multiprocessing and multithreading

Latest recommended article published 2022-08-15 15:48:12

会不会依然想起我a


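A program cannot run on its own. It runs only after it has been loaded into memory and the system has allocated resources to it, and a program in execution is called a process. The difference between the two: a program is a collection of instructions, a static text that describes how the process will run; a process is one execution of a program, which makes it a dynamic concept.

In multiprogramming, several programs can be loaded into memory at the same time and, under the scheduling of the operating system, execute concurrently. It is this design that greatly improves CPU utilization. Processes make every user feel as if they had the CPU to themselves; in other words, the process was introduced to implement multiprogramming on the CPU.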
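To make the program/process distinction concrete, here is a minimal sketch, assuming a standard CPython install. It starts the same program (the Python interpreter at sys.executable) twice at the same time: there is one program file on disk but two running processes, each with its own process ID. The child command string is an arbitrary illustration.

```python
import subprocess
import sys

# One program (the Python interpreter binary), executed twice.
# Each child prints its own PID and then sleeps briefly so both are alive at once.
CHILD_CODE = "import os, time; print(os.getpid()); time.sleep(0.5)"

# Start both children before waiting on either, so they run concurrently.
children = [
    subprocess.Popen([sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE, text=True)
    for _ in range(2)
]
pids = [int(child.communicate()[0].strip()) for child in children]

print("program:", sys.executable)
print("process IDs:", pids)  # two different PIDs for the same program
```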

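Why do we need threads when we already have processes?

Processes have many advantages: they provide multiprogramming, make each of us feel as if we had our own CPU and other resources, and improve the utilization of the computer. So many people wonder: if processes are so good, why do we need threads at all? Look closely, though, and you will find that processes still have shortcomings, mainly in two respects:

(1) A process can only do one thing at a time. If you want it to do two or more things at the same time, a (single-threaded) process cannot do it.
(2) If a process blocks while executing, for example while waiting for input, the whole process is suspended. Even work in the process that does not depend on that input cannot proceed.

A thread is the smallest unit of execution that the operating system can schedule. It is contained within a process and is the actual unit of work inside that process. A thread is a single sequential flow of control within a process; a process can run multiple threads concurrently, each performing a different task.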
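The second shortcoming above is exactly what threads address. A minimal sketch, assuming a queue.Queue as a stand-in for "input" (an illustration, not real user input): one thread blocks waiting for the input while another thread in the same process keeps computing.

```python
import queue
import threading

inbox = queue.Queue()   # stands in for input that arrives later
log = []

def wait_for_input():
    # Blocks only this thread until something is put into the queue.
    item = inbox.get()
    log.append(f"got input: {item}")

def independent_work():
    # Work that does not depend on the input keeps running meanwhile.
    total = sum(range(1_000_000))
    log.append(f"computed: {total}")

waiter = threading.Thread(target=wait_for_input)
worker = threading.Thread(target=independent_work)
waiter.start()
worker.start()

worker.join()  # the worker finishes even though the waiter is still blocked
log.append("worker done while waiter still blocked: %s" % waiter.is_alive())

inbox.put("hello")  # now deliver the "input"
waiter.join()
print("\n".join(log))
```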

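What is the difference between a process and a thread?
(1) A process is an independent unit of resource allocation and scheduling, whereas a thread is the basic unit of CPU scheduling.
(2) A process can contain multiple threads, and those threads share the process's resources (address space, heap, global variables, open files); each thread still has its own registers, stack, and context. Every process contains at least one thread.
(3) On Unix-like systems a process is created with fork or vfork, while a thread is created with pthread_create. When a process ends, all of its threads are destroyed, whereas the end of one thread does not affect the other threads in the same process.
(4) A thread is a lightweight process: creating and destroying a thread takes much less time than creating and destroying a process. In the operating system, all execution is actually carried out by threads.
(5) Threads generally need synchronization and mutual exclusion while they run, because they share all of the process's resources.
(6) A thread has its own private attributes: a thread control block (TCB), a thread ID, registers, and hardware context; a process likewise has its own private process control block (PCB). These private attributes are not shared; they are what identify a particular process or thread.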
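Points (2) and (6) can be observed directly. In this sketch (a threading.Barrier is used only to keep all threads alive at once so their IDs cannot be reused), every thread reports the same process ID but its own thread ID, and all of them append to one shared list.

```python
import os
import threading

N = 3
barrier = threading.Barrier(N)  # keeps all N threads alive at the same time
shared = []                     # one list, visible to every thread in the process
lock = threading.Lock()

def report():
    barrier.wait()  # all threads are alive here, so their idents are distinct
    with lock:
        shared.append((os.getpid(), threading.get_ident()))

threads = [threading.Thread(target=report) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for pid, _ in shared}
idents = {ident for _, ident in shared}
print("process IDs seen by threads:", pids)  # a single PID: same process
print("distinct thread IDs:", len(idents))   # 3: each thread has its own ID
```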

There are two ways to start a thread:

Direct invocation

 
```python
import threading
import time


def sayhi(num):  # the function each thread will run
    print("running on number:%s" % num)
    time.sleep(3)


if __name__ == '__main__':
    t1 = threading.Thread(target=sayhi, args=(1,))  # create a thread instance
    t2 = threading.Thread(target=sayhi, args=(2,))  # create another thread instance

    t1.start()  # start the thread
    t2.start()  # start the other thread

    print(t1.name)  # get the thread name (getName() is deprecated since Python 3.10)
    print(t2.name)
```

Inheritance-based invocation (subclassing Thread)

 
```python
import threading
import time


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)  # initialize the base Thread first
        self.num = num

    def run(self):  # the function each thread will run; start() calls it
        print("running on number:%s" % self.num)
        time.sleep(3)


if __name__ == '__main__':
    t1 = MyThread(1)
    t2 = MyThread(2)
    t1.start()
    t2.start()
```
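One detail of the subclass form is easy to get wrong: run() only executes in a new thread when it is invoked through start(). This sketch uses a hypothetical RecordingThread that records which thread actually executed run().

```python
import threading


class RecordingThread(threading.Thread):
    """Hypothetical subclass that records the name of the thread executing run()."""

    def __init__(self):
        super().__init__(name="worker-1")
        self.ran_in = None

    def run(self):
        self.ran_in = threading.current_thread().name


t1 = RecordingThread()
t1.start()  # start() creates a new thread, which then calls run()
t1.join()

t2 = RecordingThread()
t2.run()    # calling run() directly is just an ordinary method call in this thread

print(t1.ran_in)  # worker-1
print(t2.ran_in)  # the caller's thread, e.g. MainThread when run as a script
```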

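Thread lock (mutex)

A process can start multiple threads, and those threads share the parent process's memory space, which means every thread can access the same data. What happens if two threads try to modify the same data at the same time?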

 
```python
import time
import threading


def addNum():
    global num  # every thread accesses this global variable
    print('--get num:', num)
    time.sleep(1)
    num -= 1  # decrement the shared variable (read, subtract, write: not atomic)


num = 100  # the shared variable
thread_list = []
for i in range(100):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list:  # wait for all threads to finish
    t.join()

print('final num:', num)
```

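Normally num should end up as 0, but if you run this a few times on Python 2.7 you will find that the final value is not always 0. Why does the result differ from run to run? Suppose threads A and B both want to decrement num. Because the two threads run concurrently, both may read the initial value num=100 and hand it to the CPU before either has written its result back. A computes 99, B also computes 99, and when both results are assigned back to num it holds 99 instead of 98: one decrement has been lost. The fix is simple: whenever a thread modifies shared data, it takes a lock on that data so that no one else can modify it before it has finished. Any other thread that wants to modify the data must then wait until the first thread is done and has released the lock.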

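*Note: on Python 3.x this example usually does print 0, but not because a lock is added automatically. CPython 3 switches between threads on a time interval (sys.getswitchinterval(), 5 ms by default) instead of every 100 bytecode instructions as Python 2.7 did, so here a thread almost always finishes its decrement before being switched out. num -= 1 is still not atomic, so the locked version below is still required for correctness.

Locked version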
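Because the lost update depends on timing, it may not show up on a given run. This sketch forces it on any Python 3 version: num -= 1 is written out as a separate read and write, and a threading.Barrier (added here only to control the timing; it is not part of the original example) makes both threads finish their read before either writes.

```python
import threading

num = 100
# Both threads must reach the barrier (after reading) before either may write.
both_have_read = threading.Barrier(2)

def decrement_unsafely():
    global num
    local = num            # step 1: read the shared value (both threads see 100)
    both_have_read.wait()  # both threads are now between their read and their write
    num = local - 1        # step 2: write back 99, overwriting the other thread's write

threads = [threading.Thread(target=decrement_unsafely) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final num:", num)  # 99, not 98: one decrement was lost
```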

 
```python
import time
import threading


def addNum():
    global num  # every thread accesses this global variable
    print('--get num:', num)
    time.sleep(1)
    lock.acquire()  # acquire the lock before modifying the data
    num -= 1  # decrement the shared variable
    lock.release()  # release the lock after the modification


num = 100  # the shared variable
thread_list = []
lock = threading.Lock()  # create a global lock
for i in range(100):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list:  # wait for all threads to finish
    t.join()

print('final num:', num)
```

GIL VS Lock

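A sharp reader may ask: Python already has a GIL that ensures only one thread executes at a time, so why do we still need a lock here? Note that the lock here is a user-level lock and has nothing to do with the GIL. The GIL only guarantees that one thread runs Python bytecode at any given moment; the interpreter can still switch to another thread between two bytecode instructions, for example after one thread has read num but before it has written the new value back. A user-level lock makes the whole read-modify-write sequence exclusive among all threads that take that lock.

You may then ask: if user programs already have their own locks, why does CPython need the GIL at all? The GIL was added mainly to reduce the complexity of the interpreter. For example, when you write Python you don't need to worry about freeing memory, because the interpreter reclaims it for you automatically. As a simplified picture, imagine the interpreter has its own garbage-collection thread that periodically wakes up and sweeps memory to see what can be freed, running concurrently with the threads of your program. Suppose your thread deletes a variable, and while the collector is in the middle of clearing that memory, another thread assigns a new value into the space that has not been cleared yet: the newly assigned data could end up being wiped out. (In real CPython most memory is freed through reference counting rather than by a separate thread, but unsynchronized updates to reference counts from several threads would cause exactly this kind of corruption.) To avoid such problems, the interpreter takes a blunt approach and uses one global lock: while one thread runs, no other thread can touch the interpreter. This is largely a legacy of early Python versions.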
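To see why the GIL alone does not protect num -= 1, this sketch uses the standard dis module to list the bytecode for that statement. The exact opcode names vary between CPython versions, but there is always a separate load, arithmetic step, and store, and a thread switch between the load and the store is what loses updates.

```python
import dis

num = 100


def decrement():
    global num
    num -= 1


# Show the bytecode for `num -= 1`; it is several instructions, not one.
dis.dis(decrement)

# Just the opcode names, e.g. LOAD_GLOBAL ... STORE_GLOBAL (exact names vary by version).
ops = [ins.opname for ins in dis.get_instructions(decrement)]
print(ops)
```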

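RLock (recursive lock)

Put simply, it lets a big lock contain smaller locks: a thread that already holds an RLock can acquire it again, and must release it as many times as it acquired it. This is needed when a function that holds the lock calls another function that acquires the same lock; with a plain Lock the inner acquire would block forever (a deadlock).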

 
```python
import threading


def run1():
    global num
    print("grab the first part data")
    lock.acquire()
    num += 1
    lock.release()
    return num


def run2():
    global num2
    print("grab the second part data")
    lock.acquire()
    num2 += 1
    lock.release()
    return num2


def run3():
    lock.acquire()  # outer lock; run1 and run2 acquire the same lock again inside it
    res = run1()
    print('--------between run1 and run2-----')
    res2 = run2()
    lock.release()
    print(res, res2)


if __name__ == '__main__':
    num, num2 = 0, 0
    lock = threading.RLock()  # with threading.Lock(), run3 would deadlock on the inner acquire
    threads = []
    for i in range(10):
        t = threading.Thread(target=run3)
        t.start()
        threads.append(t)

    for t in threads:  # wait for all threads instead of busy-polling threading.active_count()
        t.join()
    print('----all threads done---')
    print(num, num2)
```
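The difference between Lock and RLock can be checked without deadlocking by using a timeout. In this sketch the same thread tries to acquire each lock a second time; the 0.2 s timeout is an arbitrary choice for illustration.

```python
import threading

plain = threading.Lock()
plain.acquire()
# With a plain Lock, a second acquire by the same thread would block forever;
# the timeout lets us observe the failure instead of hanging.
plain_again = plain.acquire(timeout=0.2)
plain.release()

reentrant = threading.RLock()
reentrant.acquire()
rlock_again = reentrant.acquire(timeout=0.2)  # same owner thread: succeeds immediately
reentrant.release()
reentrant.release()  # an RLock must be released as many times as it was acquired

print("Lock acquired twice:", plain_again)   # False
print("RLock acquired twice:", rlock_again)  # True
```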

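Semaphore

A mutex allows only one thread to modify data at a time, whereas a Semaphore allows up to a fixed number of threads at once. For example, if a restroom has 3 stalls, at most 3 people can use it at the same time; everyone else has to wait until someone comes out.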

 
```python
import threading
import time


def run(n):
    semaphore.acquire()
    time.sleep(1)
    print("run the thread: %s\n" % n)
    semaphore.release()


if __name__ == '__main__':
    semaphore = threading.BoundedSemaphore(5)  # at most 5 threads run at the same time
    threads = []
    for i in range(20):
        t = threading.Thread(target=run, args=(i,))
        t.start()
        threads.append(t)

    for t in threads:  # wait for all threads instead of busy-waiting on active_count()
        t.join()
    print('----all threads done---')
```
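The "at most N at a time" guarantee can be measured. This sketch follows the restroom analogy with 3 stalls: each thread counts how many threads are currently inside, and the maximum never exceeds the semaphore's value. It also shows what the "Bounded" in BoundedSemaphore adds: a release() without a matching acquire() raises ValueError. The thread count and sleep time are arbitrary.

```python
import threading
import time

STALLS = 3
semaphore = threading.BoundedSemaphore(STALLS)
counter_lock = threading.Lock()
inside = 0      # threads currently holding the semaphore
max_inside = 0  # highest value inside ever reached


def use_stall():
    global inside, max_inside
    with semaphore:  # acquire on entry, release on exit
        with counter_lock:
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(0.05)  # simulate holding the resource
        with counter_lock:
            inside -= 1


threads = [threading.Thread(target=use_stall) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("max threads inside at once:", max_inside)  # never more than 3

# BoundedSemaphore rejects a release() that has no matching acquire().
rejected = False
try:
    semaphore.release()
except ValueError:
    rejected = True
print("extra release rejected:", rejected)  # True
```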

Multiprocessing: the multiprocessing module

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

 
```python
from multiprocessing import Process
import time


def f(name):
    time.sleep(2)
    print('hello', name)


if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
```

To show the individual process IDs involved, here is an expanded example:

 
```python
from multiprocessing import Process
import os


def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())
    print("\n\n")


def f(name):
    info('\033[31;1mfunction f\033[0m')
    print('hello', name)


if __name__ == '__main__':
    info('\033[32;1mmain process line\033[0m')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
```

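Inter-process communication

Memory is not shared between processes. To exchange data between two processes, you can use the following methods: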
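A minimal sketch of that isolation: the child process appends to a module-level list, but the parent's list is unchanged, because each process works on its own copy of the data. The list and function names are illustrative.

```python
from multiprocessing import Process

data = ["original"]  # module-level list; each process has its own copy


def child_modifies():
    data.append("added by child")  # changes only the child's copy
    print("child sees:", data)


if __name__ == "__main__":
    p = Process(target=child_modifies)
    p.start()
    p.join()
    print("parent sees:", data)  # ['original']: the change does not come back
```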

Queues

Usage is much the same as the queue.Queue used with threading.

 
```python
from multiprocessing import Process, Queue


def f(q):
    q.put([42, None, 'hello'])


if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
```
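A common pattern with a Queue is a producer that sends several items followed by a marker meaning "no more data". In this sketch the marker (None) and the squared numbers are arbitrary choices. Note that the parent drains the queue before calling join(): the multiprocessing documentation warns that joining a process that still has items buffered in a queue can deadlock.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical "no more items" marker chosen for this sketch


def producer(q, n):
    for i in range(n):
        q.put(i * i)
    q.put(SENTINEL)  # tell the consumer we are done


if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q, 5))
    p.start()

    received = []
    while True:
        item = q.get()  # blocks until the child puts something
        if item is SENTINEL:
            break
        received.append(item)

    p.join()  # join after draining the queue, not before
    print(received)  # [0, 1, 4, 9, 16]
```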

Pipes

The Pipe() function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example:

 
```python
from multiprocessing import Process, Pipe


def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()


if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())   # prints "[42, None, 'hello']"
    p.join()
```

The two connection objects returned by Pipe() represent the two ends of the pipe. Each connection object has send() and recv() methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
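Because the pipe is duplex, each process can both send and receive on its own end. In this sketch the child upper-cases whatever it receives until a hypothetical "STOP" message arrives; the parent only uses parent_conn and the child only uses child_conn, so the two ends are never shared.

```python
from multiprocessing import Process, Pipe


def echo_upper(conn):
    # Child side: answer each message until the hypothetical "STOP" message arrives.
    while True:
        msg = conn.recv()
        if msg == "STOP":
            break
        conn.send(msg.upper())
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # duplex by default
    p = Process(target=echo_upper, args=(child_conn,))
    p.start()

    replies = []
    for word in ["ping", "pong"]:
        parent_conn.send(word)               # parent writes to its end...
        replies.append(parent_conn.recv())   # ...and reads the reply from the same end
    parent_conn.send("STOP")
    p.join()
    print(replies)  # ['PING', 'PONG']
```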

Managers

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array. For example,

 
```python
from multiprocessing import Process, Manager


def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.append(1)
    print(l)


if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(5))
        p_list = []
        for i in range(10):
            p = Process(target=f, args=(d, l))
            p.start()
            p_list.append(p)
        for res in p_list:
            res.join()

        # print inside the with block: the proxies only work while the manager is running
        print(d)
        print(l)
```
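Manager proxies do not make compound operations atomic: counter.value += 1 is a read followed by a write, each a separate request to the manager process. This sketch shares a manager Value and a manager Lock between four processes; the worker count and iteration count are arbitrary.

```python
from multiprocessing import Manager, Process


def add_many(counter, lock, times):
    for _ in range(times):
        # Read and write are separate round trips to the manager,
        # so the lock keeps concurrent increments from being lost.
        with lock:
            counter.value += 1


if __name__ == "__main__":
    with Manager() as manager:
        counter = manager.Value("i", 0)
        lock = manager.Lock()
        workers = [Process(target=add_many, args=(counter, lock, 50)) for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        final = counter.value  # read before the manager shuts down
    print("final count:", final)  # 200
```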

Process synchronization

Without using the lock output from the different processes is liable to get all mixed up.

 
```python
from multiprocessing import Process, Lock


def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()


if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
```
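The same idea applies to data as well as output. As a sketch of a related tool (multiprocessing.Value, not used in the original example), this counter lives in shared memory and comes with its own lock, available through get_lock(); without that lock, total.value += 1 could lose updates just like the thread example earlier.

```python
from multiprocessing import Process, Value


def add_many(total, times):
    for _ in range(times):
        with total.get_lock():  # the lock that Value() creates by default
            total.value += 1


if __name__ == "__main__":
    total = Value("i", 0)  # a C int in shared memory, plus a lock
    procs = [Process(target=add_many, args=(total, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("total:", total.value)  # 4000
```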

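Process pool

A process pool maintains a set of worker processes internally. When a task is submitted, a process is taken from the pool to run it; if no process in the pool is available, the task waits until one becomes free.

Two of the methods a pool offers for submitting work are:

apply: runs one call in a pool process and blocks until its result is ready
apply_async: submits the call and returns immediately with an AsyncResult; an optional callback is called with the result once it is ready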

 
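```python
from multiprocessing import Pool
import time


def Foo(i):
    time.sleep(2)
    return i + 100


def Bar(arg):  # callback: runs in the main process with Foo's return value
    print('-->exec done:', arg)


if __name__ == '__main__':  # required on platforms that spawn worker processes (e.g. Windows)
    pool = Pool(5)
    for i in range(10):
        pool.apply_async(func=Foo, args=(i,), callback=Bar)
        # pool.apply(func=Foo, args=(i,))

    print('end')
    pool.close()
    pool.join()  # wait for the pool's processes to finish; without this line the program exits immediately
```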
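The difference between the two methods shows up in what they return. In this sketch (slow_square and the pool size are arbitrary), apply() hands back the result itself after waiting, while apply_async() hands back an AsyncResult immediately, so several calls can be in flight before .get() collects their values.

```python
import time
from multiprocessing import Pool


def slow_square(x):
    time.sleep(0.2)
    return x * x


if __name__ == "__main__":
    with Pool(2) as pool:
        # apply(): blocks until this single call has finished, then returns its result.
        blocking_result = pool.apply(slow_square, (3,))

        # apply_async(): returns AsyncResult objects at once; .get() waits for each value.
        pending = [pool.apply_async(slow_square, (i,)) for i in range(4)]
        async_results = [r.get(timeout=10) for r in pending]

    print(blocking_result)  # 9
    print(async_results)    # [0, 1, 4, 9]
```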

Source: http://www.cnblogs.com/alex3714/articles/5230609.html
