How to choose between multithreading (Thread), multiprocessing (Process), and coroutines (Coroutine)
1. What are CPU-bound and I/O-bound computations?
CPU-bound: I/O finishes quickly and the CPU spends most of its time on heavy computation. Examples: compression/decompression, encryption/decryption, regular-expression matching.
I/O-bound: the CPU spends most of its time waiting for I/O to complete. Examples: file processing, web crawlers, database read/write programs.
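A minimal sketch contrasting the two workload types (the hash function and sleep duration are illustrative stand-ins, not from the original notes; sleep plays the role of a network or disk round trip):

```python
import hashlib
import time

def cpu_bound_task(data: bytes) -> str:
    # CPU-bound: the time goes into the hashing computation, not waiting.
    return hashlib.sha256(data).hexdigest()

def io_bound_task(seconds: float) -> float:
    # I/O-bound stand-in: the CPU is idle while we wait.
    start = time.perf_counter()
    time.sleep(seconds)
    return time.perf_counter() - start

digest = cpu_bound_task(b"x" * 1_000_000)
waited = io_bound_task(0.1)
```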
2. How do multiprocessing, multithreading, and coroutines compare?
Relationship: one process can start multiple threads, and one thread can run multiple coroutines.
Multiprocessing Process (multiprocessing)
Pros: can exploit multiple CPU cores for true parallel computation.
Cons: heaviest resource usage; fewer can be started than threads.
Suited for: CPU-bound computation.
Multithreading Thread (threading)
Pros: lighter than processes; uses fewer resources.
Cons: versus processes: threads can only run concurrently, not in parallel across CPUs (because of the GIL);
versus coroutines: limited in how many can be started, consume memory, and incur thread-switch overhead.
Suited for: I/O-bound computation with a moderate number of simultaneous tasks.
Coroutine (asyncio)
Pros: lowest memory overhead; the largest number can be started.
Cons: limited library support (aiohttp vs requests); more complex code.
Suited for: I/O-bound computation that needs a very large number of concurrent tasks and has async library support.
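Since the notes mention asyncio but show no coroutine code, here is a minimal sketch; asyncio.sleep stands in for real I/O (a real crawler would use aiohttp), and the URLs are placeholders:

```python
import asyncio

async def fetch(url: str) -> str:
    # In real code this would be an aiohttp request; asyncio.sleep
    # simulates the I/O wait without blocking the event loop.
    await asyncio.sleep(0.1)
    return f"fetched {url}"

async def main() -> list:
    urls = [f"https://example.com/p{i}" for i in range(10)]
    # gather runs all 10 coroutines concurrently on one thread,
    # so the total time is ~0.1s rather than ~1s.
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(main())
```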
3. How do you pick the right technique for a task?
Rule of thumb from the comparison above: CPU-bound → multiprocessing; I/O-bound with huge task counts and async library support → asyncio; otherwise → threading.
Why is Python slow? Two main reasons: it is a dynamically typed, interpreted language, and the GIL blocks multi-core parallelism within a single process.
What is the GIL? The Global Interpreter Lock: a mutex in CPython that lets only one thread execute Python bytecode at a time.
Why does the GIL exist? It makes CPython's reference-counting memory management thread-safe without fine-grained locking.
How to work around the GIL's limits? Use multiprocessing for CPU-bound work (each process has its own interpreter and GIL), or C extension libraries that release the GIL; for I/O-bound work, threads still help because the GIL is released during blocking I/O.
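A minimal sketch of the standard GIL workaround for CPU-bound work: hand the computation to worker processes, each with its own interpreter and GIL (the sum function and inputs are illustrative):

```python
from multiprocessing import Pool

def heavy_sum(n: int) -> int:
    # CPU-bound work: in threads this would be serialized by the GIL,
    # but separate processes can each run on their own core.
    return sum(range(n))

if __name__ == "__main__":
    # Four worker processes split the inputs between them.
    with Pool(processes=4) as pool:
        results = pool.map(heavy_sum, [100, 1_000, 10_000])
    print(results)
```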
How to create threads in Python:
# blog_spider.py
import requests

urls = [
    f'https://www.cnblogs.com/#p{i}' for i in range(1, 50)
]

def craw(url):
    res = requests.get(url)
    print(url, len(res.text))
# Compares single-threaded vs multi-threaded crawling.
import threading
import time

import blog_spider

def test_time(func):
    # Simple timing decorator.
    def inner_func():
        start = time.time()
        func()
        end = time.time()
        print(f'elapsed: {end - start}')
    return inner_func

@test_time
def single_thread():
    print('single thread start')
    for url in blog_spider.urls:
        blog_spider.craw(url)
    print('single thread end')

@test_time
def multi_thread():
    print('multi thread start')
    threads = []
    for url in blog_spider.urls:
        threads.append(threading.Thread(target=blog_spider.craw, args=(url,)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print('multi thread end')

if __name__ == '__main__':
    single_thread()
    multi_thread()
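The manual start/join bookkeeping above can also be handled by concurrent.futures.ThreadPoolExecutor from the standard library. This self-contained sketch uses time.sleep in place of the real requests.get call so it runs without network access:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def craw(url: str) -> str:
    # Stand-in for requests.get(url): sleep simulates network latency.
    time.sleep(0.05)
    return f"done {url}"

urls = [f"https://www.cnblogs.com/#p{i}" for i in range(1, 11)]

# The pool caps concurrency at 5 threads, handles start/join internally,
# and pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(craw, urls))
```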
A producer-consumer crawler in Python:
# blog_spider.py (extended with a parse step)
import requests
from bs4 import BeautifulSoup

urls = [
    f'https://www.cnblogs.com/#p{i}' for i in range(1, 50)
]

def craw(url):
    res = requests.get(url)
    return res.text

def parse(html):
    # Post titles are <a class="post-item-title"> elements.
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.find_all('a', class_='post-item-title')
    return [(link['href'], link.get_text()) for link in links]

if __name__ == '__main__':
    for result in parse(craw(urls[3])):
        print(result)
# Producer-consumer crawler: 3 craw threads feed 3 parse threads via queues.
import queue
import random
import threading
import time

import blog_spider

def do_craw(url_queue: queue.Queue, html_queue: queue.Queue):
    # Producer: take a URL, download it, hand the HTML to the parsers.
    while True:
        url = url_queue.get()
        html = blog_spider.craw(url)
        html_queue.put(html)
        print(threading.current_thread().name, f'craw {url}',
              'url_queue.size=', url_queue.qsize())
        time.sleep(random.randint(1, 2))

def do_parse(html_queue: queue.Queue, fout):
    # Consumer: take HTML, extract (href, title) pairs, write them out.
    while True:
        html = html_queue.get()
        results = blog_spider.parse(html)
        for result in results:
            fout.write(str(result) + '\n')
        print(threading.current_thread().name, 'results.size', len(results),
              'html_queue.size=', html_queue.qsize())
        time.sleep(random.randint(1, 2))

if __name__ == '__main__':
    url_queue = queue.Queue()
    html_queue = queue.Queue()
    for url in blog_spider.urls:
        url_queue.put(url)
    for ids in range(3):
        t = threading.Thread(target=do_craw, args=(url_queue, html_queue),
                             name=f'craw{ids}')
        t.start()
    fout = open('02.data.txt', 'w')
    for ids in range(3):
        t = threading.Thread(target=do_parse, args=(html_queue, fout),
                             name=f'parse{ids}')
        t.start()
    # Note: these worker threads loop forever, so the process must be
    # stopped manually.
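Because the worker threads above loop forever, the program never exits cleanly and fout is never closed. A common fix, sketched here on a simplified pipeline without the network calls, is a sentinel value (None) that tells each stage to stop:

```python
import queue
import threading

def producer(url_queue: queue.Queue, html_queue: queue.Queue):
    while True:
        url = url_queue.get()
        if url is None:           # sentinel: no more work
            html_queue.put(None)  # pass the shutdown signal downstream
            break
        html_queue.put(f"<html>{url}</html>")  # stand-in for craw(url)

def consumer(html_queue: queue.Queue, results: list):
    while True:
        html = html_queue.get()
        if html is None:
            break
        results.append(html)

url_queue, html_queue = queue.Queue(), queue.Queue()
for i in range(5):
    url_queue.put(f"page{i}")
url_queue.put(None)  # one sentinel per producer thread

results = []
p = threading.Thread(target=producer, args=(url_queue, html_queue))
c = threading.Thread(target=consumer, args=(html_queue, results))
p.start()
c.start()
p.join()
c.join()  # both threads exit on their own; no manual kill needed
```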
Thread safety in Python and how to ensure it:
# The lock makes the check-then-withdraw sequence atomic; without it,
# both threads could pass the balance check before either deducts.
import threading
import time

lock = threading.Lock()

class Account:
    def __init__(self, amount):
        self.amount = amount

def draw(account, amount):
    with lock:
        if account.amount >= amount:
            # The sleep widens the race window: without the lock, both
            # threads would reach this point and the account would be overdrawn.
            time.sleep(0.1)
            print(threading.current_thread().name, 'start withdrawing')
            account.amount = account.amount - amount
            print(threading.current_thread().name, 'withdrawal succeeded')
            print(threading.current_thread().name, 'balance left:', account.amount)
        else:
            print(threading.current_thread().name, 'withdrawal failed: insufficient balance!')

if __name__ == '__main__':
    account = Account(1000)
    thread1 = threading.Thread(name='thread1', target=draw, args=(account, 800))
    thread2 = threading.Thread(name='thread2', target=draw, args=(account, 800))
    thread1.start()
    thread2.start()
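The same pattern on a shared counter makes the guarantee checkable: the increment is a read-modify-write, so without the lock, concurrent threads can lose updates; with the lock held, the final total is deterministic (a minimal sketch, with illustrative names and counts):

```python
import threading

class Counter:
    def __init__(self):
        self.amount = 0
        self.lock = threading.Lock()

    def deposit(self, times: int):
        for _ in range(times):
            # With the lock held, the read-modify-write is atomic;
            # remove the lock and updates can be lost under contention.
            with self.lock:
                self.amount += 1

counter = Counter()
threads = [threading.Thread(target=counter.deposit, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 4 threads x 100_000 increments each -> exactly 400_000 with the lock.
```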
Source: a bilibili video; the link will be added later.