Python processes and threads

Latest recommended article published 2024-09-14 19:55:48

Lxy_Python


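Processes and threads

A process is the operating system's unit of resource allocation, and each process has its own independent memory space. A thread is an entity inside a process and is the basic unit that the operating system schedules onto the CPU.

An example:

Launching an app creates a process. The app may offer features such as voice playback and search; these run as different threads inside that process.

Note: threads are lightweight. A thread has no address space (memory space) of its own: it is created by a process and lives in that process's address space. One process can contain many threads.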
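A minimal sketch of the shared address space: threads started in one process read and write the same Python objects. The function `worker` and the list `shared` are names made up for this illustration.

```python
import threading

# A module-level list lives in the process's memory, so every thread sees it
shared = []

def worker(tag):
    # No copy is made: each thread appends to the very same list object
    for i in range(3):
        shared.append((tag, i))

# Two threads named after the app features in the example above
threads = [threading.Thread(target=worker, args=(tag,)) for tag in ("voice", "search")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread sees everything both worker threads wrote
print(len(shared))  # 6
```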

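The five thread states:

1. New state:

A thread's life cycle begins when it is created; it stays in the new state until it is started.

2. Ready state:

Once started, a thread that has not yet been given CPU time waits in the scheduler's queue for the CPU (for example, until another thread has finished using it). This is the ready state.

3. Running state:

When a ready thread is scheduled and gets the CPU, it is in the running state.

4. Blocked state:

When a running thread has to give up the CPU (for example, to wait for a lock or for I/O), it suspends its execution; this is the blocked state. Note that when the cause of blocking goes away, the thread returns to the ready state, not directly to the running state.

5. Dead state:

A thread that has been terminated, destroyed, or has finished executing is in the dead state. It cannot be started again.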
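Python's threading module does not expose the ready, running and blocked states separately: is_alive() only tells "not started yet or dead" apart from "started and not finished". Within that limitation, this small sketch shows the new, alive and dead phases, and that a dead thread cannot be restarted (CPython raises RuntimeError):

```python
import threading
import time

def task():
    time.sleep(0.2)  # keep the thread busy long enough to observe it

t = threading.Thread(target=task)
states = [t.is_alive()]      # new: created but not started -> False
t.start()
states.append(t.is_alive())  # ready/running/blocked (sleeping) -> True
t.join()
states.append(t.is_alive())  # dead: run() has returned -> False

try:
    t.start()                # a dead thread cannot be started again
    restarted = True
except RuntimeError:
    restarted = False

print(states, restarted)  # [False, True, False] False
```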

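The blocked state comes in three kinds: waiting blocking, synchronous blocking, and other blocking.

Blocking brings us to the concept of a lock.

Multithreading lets a program run several tasks at once and can greatly improve its efficiency, but it comes with a serious problem. Suppose several threads operate on the same list (the list is called shared data), and thread A wants to add 1 to the list's first element. This takes three steps: 1. read the element; 2. add 1 to it; 3. write the result back to the list. If thread B reads the list while thread A is at step 2, B still sees the old, unmodified value. That is not what we want.

This is why locks were introduced. A thread that needs exclusive access to a shared resource must first acquire the lock, so that no other thread can operate on the resource. When it is done, it must release the lock so that other threads can operate on the data.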
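The three-step read-modify-write above can be reproduced deterministically. In this sketch a threading.Barrier (added only for illustration, not part of the original discussion) forces both threads to finish step 1 before either performs step 3, so one update is lost every time; real races happen only occasionally, which is what makes them hard to find. The second version holds a lock across all three steps. `run_two` and the two increment functions are hypothetical names.

```python
import threading

def run_two(increment):
    """Run increment(data) in two threads and return the final value of data[0]."""
    data = [0]  # the shared data: a list whose first element both threads update
    threads = [threading.Thread(target=increment, args=(data,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data[0]

# Unsafe: the barrier makes both threads finish step 1 (read)
# before either reaches step 3 (write), so one update is lost.
barrier = threading.Barrier(2)

def unsafe_increment(data):
    value = data[0]      # 1. read the element
    barrier.wait()       # both threads now hold the same old value
    data[0] = value + 1  # 2. add 1, 3. write it back

# Safe: all three steps happen while holding the lock,
# so the second thread reads only after the first has written.
lock = threading.Lock()

def safe_increment(data):
    with lock:
        value = data[0]
        data[0] = value + 1

print(run_two(unsafe_increment))  # 1 (one update was lost)
print(run_two(safe_increment))    # 2
```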

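Whenever threads must keep shared data consistent (a common situation in I/O-bound multithreaded programs), a lock is needed to synchronize access. The price is that other threads block while the lock is held, so performance inevitably drops.

Synchronous blocking: a thread enters this state when it requests a lock that another thread holds; once it obtains the lock, it can run again.

Waiting blocking: the thread is waiting for a notification from another thread. A thread that has acquired a condition's lock and then calls wait() enters this state; when another thread sends a notification, the thread moves to synchronous blocking and competes for the condition's lock again.

Other blocking: blocking caused by sleep(), join(), or waiting for I/O.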
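A sketch of synchronous blocking using only documented Lock arguments: while another thread holds the lock, acquire(blocking=False) fails at once and acquire(timeout=...) blocks for at most the timeout. The names `holder`, `holding` and `release_now` are illustrative.

```python
import threading

lock = threading.Lock()
holding = threading.Event()      # set once the other thread owns the lock
release_now = threading.Event()  # set by the main thread to let it go

def holder():
    with lock:
        holding.set()
        release_now.wait()  # waiting blocking: sleeps until notified

t = threading.Thread(target=holder)
t.start()
holding.wait()

# Synchronous blocking, bounded in two ways:
got_nowait = lock.acquire(blocking=False)  # fails at once: the lock is held
got_timeout = lock.acquire(timeout=0.1)    # blocks up to 0.1 s, then fails

release_now.set()
t.join()  # "other" blocking: wait for the thread to end

got_after = lock.acquire(blocking=False)   # now the lock is free
lock.release()

print(got_nowait, got_timeout, got_after)  # False False True
```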

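Processes

fork() duplicates a process, including its code, its data, and the resources allocated to it. Through a system call, fork() creates a process that is almost identical to the original, so the two processes can do exactly the same thing; if their initial arguments or variables differ, they can also do different things.

Unix/Linux provides the fork() system call, which is unusual. An ordinary function is called once and returns once, but fork() is called once and returns twice: the operating system copies the current process (the parent) into a new one (the child), and then returns in both the parent and the child.

The child always gets a return value of 0, while the parent gets the child's process ID. The reason is that a parent can fork many children and needs to record each child's ID, whereas a child can simply call getppid() to get its parent's ID.

Python's os module wraps common system calls, including fork, so creating a child process from a Python program is easy: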

import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))


Output:
Process (8232) start...
I (8232) just created a child process (8283).
I am child process (8283) and my parent is 8232.

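With fork(), a process that receives a new task can copy itself into a child process to handle it. The classic Apache server works this way: the parent process listens on a port and forks a child process to handle each new HTTP request.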

Multiprocessing

The `multiprocessing` module provides a `Process` class that represents a process object. The following example starts a child process and waits for it to finish:

from multiprocessing import Process
import os

# Code executed by the child process
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')

Output:
Parent process 8360.
Child process will start.
Run child process test (8453)...
Child process end.

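To create a child process, pass a target function and its arguments to create a Process instance, then launch it with start(). This is even simpler than using fork().

join() waits for the child process to finish before execution continues; it is typically used to synchronize processes.

Pool

To start a large number of child processes, create them in batches with a process pool: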

from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')


Output:
Parent process 8360.
Waiting for all subprocesses done...
Run task 0 (8525)...
Run task 1 (8526)...
Run task 2 (8527)...
Run task 3 (8528)...
Task 2 runs 0.13 seconds.
Run task 4 (8527)...
Task 4 runs 0.54 seconds.
Task 3 runs 1.04 seconds.
Task 1 runs 1.14 seconds.
Task 0 runs 1.34 seconds.
All subprocesses done.

Calling join() on a Pool object waits for all worker processes to finish. close() must be called before join(), and after close() no new tasks can be submitted to the pool.
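The close()/join() rule can be checked with a quick sketch. It uses multiprocessing.pool.ThreadPool, which has the same interface as Pool but runs tasks in threads, so it works without the `if __name__ == '__main__'` guard or pickling; replace it with Pool to use real processes. Collecting results through the AsyncResult objects returned by apply_async is an addition to the original example, and `square` is a placeholder task.

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

pool = ThreadPool(4)
# apply_async returns an AsyncResult immediately; .get() waits for the value
async_results = [pool.apply_async(square, args=(i,)) for i in range(5)]
pool.close()  # from here on, no new tasks may be submitted
pool.join()   # wait until every worker has finished

results = [r.get() for r in async_results]
print(results)  # [0, 1, 4, 9, 16]

try:
    pool.apply_async(square, args=(5,))
    rejected = False
except ValueError:  # CPython raises ValueError("Pool not running")
    rejected = True
print(rejected)  # True
```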

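Look at the output: tasks 0, 1, 2 and 3 start immediately, while task 4 starts only after one of the earlier tasks has finished. This is because the pool size is 4, so at most 4 processes run at the same time. The limit is a deliberate part of Pool's design, not an operating-system limit. If you change it to:

p = Pool(5)

then 5 processes can run at the same time.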
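To see the pool size acting as a concurrency cap, this sketch counts how many tasks are running at the same moment. It again assumes ThreadPool as a stand-in for Pool; `task` and `measure_peak` are hypothetical helpers, and the exact peak depends on timing, so only the upper bound is guaranteed.

```python
import threading
import time
from multiprocessing.pool import ThreadPool

lock = threading.Lock()
running = 0  # tasks executing right now
peak = 0     # the highest value `running` has reached

def task(i):
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)  # simulated work, so that tasks overlap
    with lock:
        running -= 1
    return i

def measure_peak(workers, n_tasks=8):
    global peak
    peak = 0
    with ThreadPool(workers) as pool:
        pool.map(task, range(n_tasks))  # map() returns after all tasks finish
    return peak

print(measure_peak(4))  # at most 4
print(measure_peak(5))  # at most 5
```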

Threads

import threading

def my_thread(threadName):
    for i in range(10):
        print(' Thread: ' + threadName + ' is running')

# Create the threads: target is the function, args its argument tuple
# (pass an empty tuple when the function takes no arguments)
t1 = threading.Thread(target=my_thread, args=('name1',))
t2 = threading.Thread(target=my_thread, args=('name2',))

# The low-level _thread module could start them like this instead:
# _thread.start_new_thread(my_thread, ('name1',))
# _thread.start_new_thread(my_thread, ('name2',))

# Start the threads with start()
t1.start()
t2.start()

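The console output shows t1 and t2 printing in an irregular, interleaved order; this is the result of the two threads competing for the CPU.

Simulating several windows selling movie tickets to understand blocking and locks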

import threading
# Ticket stock; set to 100,000 so the effect is easy to see
num = 100000
def thread(name):
    global num
    while num > 0:
        num -= 1
        print('%s sold 1 ticket === %d tickets left' % (name, num))

# Three sales channels
businesses = ['Meituan', 'Taopiao', 'Nuomi']
for i in businesses:
    # Create a thread
    t = threading.Thread(target=thread, args=(i,))
    # Start the thread
    t.start()

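After running the code several times, you will notice that sometimes two windows have each just sold a ticket, yet both report the same number of remaining tickets. Even more obviously, the Meituan window may already report 0 tickets left while the other two windows still report plenty.

Analyzing the console log shows that the Meituan window sold hundreds of tickets in a row, while the Nuomi and Taopiao windows never updated to the latest stock count, so they kept reporting many tickets left even though none remained.

This is not limited to ticket sales: many real-world cases, such as rushing to buy train tickets or withdrawing money from a bank, suffer from the same data-synchronization problem. The solution is the lock mentioned earlier.

Put simply: while Meituan is selling a ticket, it locks the stock. During that time Nuomi and Taopiao cannot touch it; they must wait until Meituan has updated the data and released the lock before they can continue.

The threading module provides a lock object: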

threading.Lock()

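import threading
# Ticket stock; set to 100,000 so the effect is easy to see
num = 100000
lock = threading.Lock()
def thread(name):
    global num
    while num > 0:
        # Acquire the lock before checking the stock again. The while condition
        # is read without the lock, so num may already be 0 by now; without the
        # inner check the other two windows could sell tickets below zero.
        # Release the lock in both branches: a lock that is never released blocks
        # the other windows forever (and releasing an unheld lock raises RuntimeError).
        lock.acquire()
        if num > 0:
            num -= 1
            print('%s sold 1 ticket === %d tickets left' % (name, num))
            # Release the lock
            lock.release()
        else:
            lock.release()

# Three sales channels
businesses = ['Meituan', 'Taopiao', 'Nuomi']
for i in businesses:
    # Create a thread
    t = threading.Thread(target=thread, args=(i,))
    # Start the thread
    t.start()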

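No matter how many times you run this code, every window stops selling once the count reaches 0. The example shows clearly why blocking and locks matter in multithreaded programs.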
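The same ticket sale can be written more compactly with `with lock:`, which releases the lock automatically when the block exits (including on return or an exception), so the two explicit release() calls are no longer needed. This is a sketch: the `sell_tickets` wrapper and the per-window `sold` tally are additions for checking the result, and print() is left out to keep it fast.

```python
import threading

def sell_tickets(total, windows):
    """Sell `total` tickets through several windows; return (sold per window, tickets left)."""
    remaining = total
    lock = threading.Lock()
    sold = {name: 0 for name in windows}

    def window(name):
        nonlocal remaining
        while True:
            # Check and decrement under the same lock,
            # so two windows can never sell the same last ticket
            with lock:
                if remaining <= 0:
                    return  # leaving the with block releases the lock
                remaining -= 1
                sold[name] += 1

    threads = [threading.Thread(target=window, args=(n,)) for n in windows]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sold, remaining

sold, left = sell_tickets(100000, ['Meituan', 'Taopiao', 'Nuomi'])
print(sum(sold.values()), left)  # 100000 0
```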

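Python threads part 1: creating threads

Python has two threading modules: the low-level _thread (named thread in Python 2) and threading. threading is the upgraded version of thread and is more powerful.

There are three ways to create a thread:

1. the start_new_thread function of the _thread module

2. subclass threading.Thread

3. create a threading.Thread object directly and call its start method

Method 1: the start_new_thread function of the _thread module

Its signature:
    start_new_thread(function, args[, kwargs])
Parameters:
    function: the function to run in the new thread
    args: the positional arguments, as a tuple
    kwargs: optional keyword arguments, as a dictionary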

import _thread
import time

def hello(id = 0, interval = 2):
    for i in filter(lambda x: x % interval == 0, range(10)):
        print ("Thread id : %d, time is %d\n" % (id, i))

if __name__ == "__main__":

    #_thread.start_new_thread(hello, (1,2))   positional arguments work too
    #_thread.start_new_thread(hello, (2,4))

    _thread.start_new_thread(hello, (), {"id": 1})
    _thread.start_new_thread(hello, (), {"id": 2})

    # _thread offers no join(): keep the main thread alive briefly,
    # otherwise the program may exit before the threads print anything
    time.sleep(1)

Method 2: subclass threading.Thread

Note: you must override run(), and to run the thread you call start(), which invokes run() in the new thread.

import threading
 
class MyThread(threading.Thread):
 
    def __init__(self, id, interval):
        threading.Thread.__init__(self)
 
        self.id = id
        self.interval = interval
 
    def run(self):
        for x in filter(lambda x: x % self.interval == 0, range(10)):
            print ("Thread id : %d   time is %d \n" % (self.id, x))
 
 
if __name__ == "__main__":
    t1 = MyThread(1, 2)
    t2 = MyThread(2, 4)
 
    t1.start()
    t2.start()
 
    t1.join()
    t2.join()
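Why start() rather than run()? This minimal sketch (the class `NameThread` and the thread name "worker-1" are invented for illustration) records which thread actually executes run(): calling run() directly is an ordinary method call in the current thread, while start() runs it in a new thread.

```python
import threading

class NameThread(threading.Thread):
    def __init__(self):
        super().__init__(name="worker-1")
        self.ran_in = None

    def run(self):
        # Record which thread is executing this method
        self.ran_in = threading.current_thread().name

t = NameThread()
t.run()            # plain method call: runs in the calling thread
direct = t.ran_in  # 'MainThread' when called from the main thread

t = NameThread()
t.start()          # a new thread is created, and it calls run()
t.join()
started = t.ran_in

print(direct, started)
```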

Method 3: create a threading.Thread object directly and call its start method

import threading
 
def hello(id, times):
    for i in range(times):
        print ("hello %s time is %d\n" % (id , i))
 
 
if __name__ == "__main__":
    t = threading.Thread(target=hello, args=("Tom", 5))
    t.start()

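Python threads part 2: Timer and join()

Timer: calls a function after a given delay. To call a function repeatedly at a fixed interval, create and start a new Timer inside the function that the Timer calls. Timer is a subclass of Thread.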

import threading
import time

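def hello(name):
    print ("hello %s\n" % name)

    # Re-arm: schedule the next call 2 seconds from now
    global timer
    timer = threading.Timer(2.0, hello, ["Tom"])
    timer.start()

if __name__ == "__main__":
    # First call after 2 seconds; the chain then repeats until the process is killed
    timer = threading.Timer(2.0, hello, ["Tom"])
    timer.start()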
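The recursive Timer above never stops. Here is a sketch of a bounded variant, assuming we want a fixed number of calls: the hypothetical helper `repeat` re-arms the Timer only while calls remain and sets an Event when it is done. It also shows Timer.cancel(), which stops a timer that has not fired yet.

```python
import threading

def repeat(interval, times, func):
    """Hypothetical helper: call func() `times` times, `interval` seconds apart."""
    done = threading.Event()
    count = 0

    def tick():
        nonlocal count
        func()
        count += 1
        if count < times:
            threading.Timer(interval, tick).start()  # schedule the next call
        else:
            done.set()                               # stop re-arming

    threading.Timer(interval, tick).start()
    return done

calls = []
finished = repeat(0.05, 3, lambda: calls.append("hello Tom"))
finished.wait(timeout=2)
print(len(calls))  # 3

# A Timer that has not fired yet can be cancelled
never = threading.Timer(10.0, lambda: calls.append("too late"))
never.start()
never.cancel()
never.join()       # returns promptly because the timer was cancelled
print(len(calls))  # still 3
```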

join()
Use join() when a thread must wait for another thread to finish before it can continue.

import threading

def my_thread(threadName):
    for i in range(10):
        print(' Thread: ' + threadName + ' is running')

# Create the threads (target function plus argument tuple)
t1 = threading.Thread(target=my_thread, args=('name1',))
t2 = threading.Thread(target=my_thread, args=('name2',))

# Start the threads with start()
t1.start()
t2.start()
# Without join(), "Ending" could be printed before t1 and t2 have finished;
# with join(), the main thread waits for both threads before printing it
t1.join()
t2.join()
print("Ending .....")
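join() also accepts a timeout in seconds. Because join() always returns None, check is_alive() afterwards to find out whether the thread really finished. In this short sketch an Event stands in for long-running work:

```python
import threading

stop = threading.Event()

def slow():
    stop.wait(5)  # stands in for up to 5 s of work; ends early once stop is set

t = threading.Thread(target=slow)
t.start()

t.join(timeout=0.1)           # give up waiting after 0.1 s
still_running = t.is_alive()  # join() returns None either way, so ask is_alive()

stop.set()                    # let the worker finish
t.join()                      # no timeout: wait until it has really ended
print(still_running, t.is_alive())  # True False
```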

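Python threads part 3: synchronizing threads with simple locks

Python has two kinds of lock: the primitive lock, which is not reentrant, and the reentrant (recursive) lock. The _thread module provides only the non-reentrant lock, while threading provides both.

Reentrant means that a thread which already holds the lock gets it again immediately, without blocking, when it acquires it a second time. With a primitive lock, the second acquire blocks.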
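The difference can be observed without deadlocking by making the second acquire non-blocking. In this sketch `try_reenter` is a hypothetical helper:

```python
import threading

def try_reenter(lock):
    """Acquire `lock`, then try to acquire it again from the same thread."""
    lock.acquire()
    # blocking=False: a non-reentrant lock reports failure instead of deadlocking
    second = lock.acquire(blocking=False)
    if second:
        lock.release()
    lock.release()
    return second

print(try_reenter(threading.Lock()))   # False: a plain Lock blocks even its owner
print(try_reenter(threading.RLock()))  # True: an RLock lets its owner re-acquire
```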

Method 1: the non-reentrant lock from _thread

import _thread
import time

lock = _thread.allocate_lock()

def Count(id):
    global num

    while True:
        lock.acquire()
        if num <= 10:
            print ("Thread id is : %s     The num is %s\n" % (id, str(num)))
            num = num + 1
            lock.release()
        else:
            # Release before leaving the loop; otherwise the other thread
            # blocks forever in lock.acquire(). Returning ends the thread.
            lock.release()
            break

if __name__ == "__main__":
    num = 1
    _thread.start_new_thread(Count, ('A',))
    _thread.start_new_thread(Count, ('B',))

    # _thread has no join(); give the threads time to finish
    time.sleep(5)

Method 2: threading's Lock (non-reentrant)

import threading

lock = threading.Lock()

def Count(id):
    global num

    while True:
        lock.acquire()
        if num <= 10:

            print ("Thread id is : %s     The num is %s\n" % (id, str(num)))
            num = num + 1
            lock.release()
        else:
            # Release before breaking out, or the other thread deadlocks
            lock.release()
            break

if __name__ == "__main__":
    num = 1
    t1 = threading.Thread(target=Count, args=('A', ))
    t2 = threading.Thread(target=Count, args=('B', ))

    t1.start()
    t2.start()

    # Wait for both threads instead of guessing with time.sleep()
    t1.join()
    t2.join()

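Method 3: threading's RLock (reentrant)

An RLock can be acquired multiple times by the same thread, whereas a Lock cannot. Note: with an RLock, acquire and release must be paired; if a thread calls acquire n times, it must call release n times before the lock is actually released.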

import threading
import time
 
lock = threading.RLock()
 
def CountNum(id):
    global num
     
    lock.acquire()
     
    if num <= 10:
        print ("Thread id is : %s     The num is %s\n" % (id, str(num)))
        num = num + 1
        CountNum(id)
 
    lock.release()
 
if __name__ == "__main__":
    num = 1
    t1 = threading.Thread(target=CountNum, args=('A',))  # ('A',) is a tuple; ('A') is just the string 'A'
 
    t1.start()
 
    time.sleep(5)
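A sketch of the n-acquires/n-releases rule: the main thread acquires an RLock twice, and after each release a second thread probes (without blocking) whether the lock is free. `can_other_thread_acquire` is an illustrative helper.

```python
import threading

rlock = threading.RLock()

def can_other_thread_acquire():
    """Ask a second thread whether it can grab the lock right now."""
    result = []

    def probe():
        ok = rlock.acquire(blocking=False)
        if ok:
            rlock.release()
        result.append(ok)

    t = threading.Thread(target=probe)
    t.start()
    t.join()
    return result[0]

rlock.acquire()
rlock.acquire()   # the owner acquires twice (recursion depth 2)
rlock.release()   # one release: still held by the main thread
after_one = can_other_thread_acquire()
rlock.release()   # second release: now actually free
after_two = can_other_thread_acquire()
print(after_one, after_two)  # False True
```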

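Python threads part 4: Condition variables

Think of a Condition as an advanced lock: it offers more than Lock and RLock and lets us handle complex thread-synchronization problems. threading.Condition maintains an internal lock object (an RLock by default); you can also pass a lock object in when creating the Condition. Condition provides acquire() and release(), which mean the same as the lock's acquire() and release(); they simply call the corresponding methods of the internal lock. Condition also provides wait(), notify() and notifyAll() (spelled notify_all() in current Python). Pay special attention: these methods may only be called after the lock has been acquired; otherwise they raise RuntimeError.

Condition variables are an advanced threading feature provided by the threading module. The classic example of their use is the producer-consumer problem.

Condition: a lock more advanced than Lock and RLock

wait: wait to be woken up

notify/notifyAll: wake up one waiting thread, or all waiting threads

Note: a Condition must be acquired before wait() is called.

Code: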

import threading
import time

class Buf:
    def __init__(self):

        self.cond = threading.Condition()
        self.data = []

    def isFull(self):
        return len(self.data) == 5

    def isEmpty(self):
        return len(self.data) == 0

    def get(self):

        self.cond.acquire()

        while self.isEmpty():
            self.cond.wait()

        temp = self.data.pop(0)

        self.cond.notify()
        self.cond.release()
        return temp

    def put(self, putInfo):
        self.cond.acquire()

        while self.isFull():
            self.cond.wait()

        self.data.append(putInfo)

        self.cond.notify()
        self.cond.release()


def Product(num):
    for i in range(num):
        info.put(i+1)
        print ("Product %s\n" % (str(i+1)))

def Customer(id, num):
    for i in range(num):
        temp = info.get()
        print ("Customer%s %s\n" % (id, str(temp)))

info = Buf()


if __name__ == "__main__":
    p = threading.Thread(target=Product, args=(10, ))
    c1 = threading.Thread(target=Customer, args=('A', 5))
    c2 = threading.Thread(target=Customer, args=('B', 5))

    p.start()
    time.sleep(1)
    c1.start()
    c2.start()

    p.join()
    c1.join()
    c2.join()

    print ("Game Over")
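The Buf class can also be written with `with self.cond:` and Condition.wait_for(), which re-checks its predicate in a loop just like the explicit `while ...: wait()` above. This sketch uses notify_all() because one condition serves both the "not full" and the "not empty" waiters, and waking everyone is the conservative choice. It uses a single consumer so the order of received items is predictable; `BoundedBuffer`, `producer` and `consumer` are illustrative names.

```python
import threading

class BoundedBuffer:
    def __init__(self, capacity=5):
        self.cond = threading.Condition()
        self.data = []
        self.capacity = capacity

    def put(self, item):
        with self.cond:  # acquires and releases the underlying lock
            # wait_for re-checks the predicate after every wakeup
            self.cond.wait_for(lambda: len(self.data) < self.capacity)
            self.data.append(item)
            self.cond.notify_all()

    def get(self):
        with self.cond:
            self.cond.wait_for(lambda: len(self.data) > 0)
            item = self.data.pop(0)
            self.cond.notify_all()
            return item

buf = BoundedBuffer()
received = []

def producer(n):
    for i in range(1, n + 1):
        buf.put(i)

def consumer(n):
    for _ in range(n):
        received.append(buf.get())

threads = [threading.Thread(target=producer, args=(10,)),
           threading.Thread(target=consumer, args=(10,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```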

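Python threads part 5: Semaphore synchronization

Semaphore: a Semaphore manages an internal counter. acquire() decrements the counter by 1 and release() increments it by 1 (release() can be called any number of times, so in principle the counter is unbounded). The counter never goes below 0: when it reaches 0, acquire() blocks until another thread calls release().

In other words, a semaphore is a counting synchronization mechanism: release() increases the count, acquire() decreases it, and when the count is 0, acquire() blocks automatically until release() is called.

Python offers two kinds of semaphore: the plain Semaphore and BoundedSemaphore.

Difference:

Semaphore: release() does not check whether the count exceeds the initial value (there is no upper limit; it keeps rising).

BoundedSemaphore: release() checks whether the count would exceed the initial value and raises ValueError if it would, which keeps the count within bounds.

Code: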

import threading
import time

semaphore = threading.Semaphore(3)
#semaphore = threading.BoundedSemaphore(3)

def fun():
    # current_thread().name replaces the deprecated currentThread().getName()
    name = threading.current_thread().name
    print ("Thread %s is waiting for the semaphore\n" % name)
    semaphore.acquire()
    print ("Thread %s got the semaphore\n" % name)
    time.sleep(1)
    print ("Thread %s releases the semaphore\n" % name)
    semaphore.release()


if __name__ == "__main__":
    t1 = threading.Thread(target=fun)
    t2 = threading.Thread(target=fun)
    t3 = threading.Thread(target=fun)
    t4 = threading.Thread(target=fun)

    t1.start()
    t2.start()
    t3.start()
    t4.start()

    t1.join()
    t2.join()
    t3.join()
    t4.join()

    semaphore.release()  # Fine for a plain Semaphore (the counter just grows); a BoundedSemaphore raises ValueError here
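The difference between the two classes can be checked directly by releasing once more than was acquired. A minimal sketch (`extra_release` is a hypothetical helper):

```python
import threading

def extra_release(sem):
    """Call release() without a matching acquire(); report whether it was rejected."""
    try:
        sem.release()
        return False
    except ValueError:
        return True

print(extra_release(threading.Semaphore(3)))         # False: the counter just grows to 4
print(extra_release(threading.BoundedSemaphore(3)))  # True: the count would exceed 3

# Semaphores also work as context managers: acquire() on entry, release() on exit
sem = threading.BoundedSemaphore(3)
with sem:
    pass
print(extra_release(sem))  # True: the with block already released what it acquired
```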

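Python threads part 6: Event

Event: a thread-synchronization mechanism that works like a flag. While the flag is False, all threads waiting on it block; when it becomes True, all waiting threads are woken up.

is_set():        returns True when the internal flag is True (isSet() is the deprecated old name).
set():           sets the flag to True and wakes all threads that are blocked waiting on it.
clear():         resets the flag to False.
wait([timeout]): returns immediately if the flag is True; otherwise blocks until another thread calls set() or the optional timeout expires.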

import threading
import time

event = threading.Event()

def func():
    print ("%s is waiting for event...\n" % threading.current_thread().name)
    event.wait()

    print ("%s got the Event..\n" % threading.current_thread().name)


if __name__ == "__main__":
    t1 = threading.Thread(target=func)
    t2 = threading.Thread(target=func)

    t1.start()
    t2.start()

    time.sleep(2)

    print ("MainThread set Event\n")

    event.set()

    t1.join()
    t2.join()
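wait() returns the flag's value, which makes its optional timeout easy to use. A sketch using only the documented Event methods:

```python
import threading

event = threading.Event()

first = event.wait(timeout=0.1)   # flag is False: blocks 0.1 s, then returns False
event.set()
second = event.wait(timeout=0.1)  # flag is True: returns True immediately
flag_after_set = event.is_set()
event.clear()                     # back to False; later wait() calls block again
flag_after_clear = event.is_set()

print(first, second, flag_after_set, flag_after_clear)  # False True True False
```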

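Python threads part 7: local (thread-local storage)

Thread-local storage (TLS): for the same local object, a thread cannot see attributes set by other threads, and its own attributes are not overwritten by attributes of the same name set by other threads.

Code: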

import threading

local = threading.local()
local.tname = "main"

def func(info):
    local.tname = info
    print(local.tname)

t1 = threading.Thread(target=func, args=['funcA'])
t2 = threading.Thread(target=func, args=['funcB'])

t1.start()
t1.join()

t2.start()
t2.join()

print (local.tname)
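This sketch records what each thread sees instead of printing it: a new thread starts with no attributes on the local object, even though the main thread set tname before starting it. The `seen` dictionary is an addition for illustration.

```python
import threading

local = threading.local()
local.tname = "main"
seen = {}

def func(info):
    # A new thread starts with no attributes on `local`
    seen[info + " before"] = hasattr(local, "tname")
    local.tname = info
    seen[info + " after"] = local.tname

threads = [threading.Thread(target=func, args=(name,)) for name in ("funcA", "funcB")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
print(local.tname)  # main: the main thread's value was never overwritten
```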

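is_alive()

isAlive() was the old camelCase alias of is_alive(); it was removed in Python 3.9, so use is_alive(). It reports whether the thread is alive: it returns False before start() has been called and after the thread has finished (the dead state).

import threading
import time

t1 = threading.Thread(target=time.sleep, args=(1,))
# False: not started yet
print(t1.is_alive())
t1.start()
# True: the thread is still sleeping
print(t1.is_alive())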

Daemon

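A Python program exits when no non-daemon threads are left alive. That is, the main program waits for all non-daemon threads to finish before it exits, and on exit it abruptly terminates any daemon threads that are still running.

t1 = threading.Thread(target=thread_run, args=('jone', ), daemon=True)
# Same effect as daemon=True; must be set before start().
# setDaemon(True) also works but is deprecated since Python 3.10.
t1.daemon = True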
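A sketch of two documented daemon rules: a new thread inherits the daemon flag of the thread that creates it, and the flag cannot be changed once the thread has been started (CPython raises RuntimeError). The functions `child` and `parent` are placeholders.

```python
import threading

def child():
    pass

def parent(result):
    # No daemon argument given: the flag is inherited from the creating thread
    c = threading.Thread(target=child)
    result.append(c.daemon)

result = []
d = threading.Thread(target=parent, args=(result,), daemon=True)
d.start()
d.join()
print(result)  # [True]: created inside a daemon thread

n = threading.Thread(target=child)
print(n.daemon)  # False when created from the (non-daemon) main thread

try:
    d.daemon = False  # d has already been started
    changed = True
except RuntimeError:
    changed = False
print(changed)  # False
```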