Python Future in practice: a web crawler example

Implementing the download, executed sequentially

sites = [
    'https://www.tutorialspoint.com/python/index.htm',
    'https://www.w3schools.com/python/default.asp',
    'https://realpython.com/tutorials/python/',
    'https://www.programiz.com/python-programming/tutorial',
    'https://pythonbasics.org/learn-python/',
    'https://www.learnpython.org/',
    'https://www.datacamp.com/community/tutorials/python-data-science-handbook',
    'https://www.codecademy.com/learn/learn-python',
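    # The recorded runs lacked the comma after the next entry, so Python fused
    # the last two strings; the output therefore reports 9 sites.
    'https://python-course.eu/python3_tutorial.php',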
    'https://docs.python.org/3/tutorial/index.html'
]
import requests
import time

def download_one(url):
    # Fetch the page and report the size of the response body in bytes.
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    for site in sites:
        download_one(site)

def main():
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()
Read 190716 from https://www.tutorialspoint.com/python/index.htm
Read 360091 from https://www.w3schools.com/python/default.asp
Read 135028 from https://realpython.com/tutorials/python/
Read 79647 from https://www.programiz.com/python-programming/tutorial
Read 3082 from https://pythonbasics.org/learn-python/
Read 31649 from https://www.learnpython.org/
Read 15536 from https://www.datacamp.com/community/tutorials/python-data-science-handbook
Read 604652 from https://www.codecademy.com/learn/learn-python
Read 26062 from https://python-course.eu/python3_tutorial.phphttps://docs.python.org/3/tutorial/index.html
Download 9 sites in 32.739756299999996 seconds
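
The run above reports 9 sites and one fused URL, although the list looks like it holds ten entries. That is consistent with Python's implicit concatenation of adjacent string literals when a comma is missing. A minimal illustration, with made-up list names and example.com URLs that are not part of the original script:

```python
# Adjacent string literals are joined into one string, so a missing
# comma silently merges two list entries.
with_comma = [
    'https://example.com/a',
    'https://example.com/b',
]
without_comma = [
    'https://example.com/a'   # no comma here
    'https://example.com/b'
]

print(len(with_comma))     # 2
print(len(without_comma))  # 1
print(without_comma[0])    # https://example.com/ahttps://example.com/b
```

Because this is valid syntax, no error is raised; checking `len(sites)` against the expected count is a cheap way to catch it.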

Switching to multithreading

import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))


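def download_all(sites):
    # Up to 5 worker threads; map schedules download_one for every site,
    # and leaving the with-block waits for all of them to finish.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)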

def main():
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Read 3082 from https://pythonbasics.org/learn-python/
Read 190716 from https://www.tutorialspoint.com/python/index.htm
Read 31649 from https://www.learnpython.org/Read 79647 from https://www.programiz.com/python-programming/tutorial

Read 15538 from https://www.datacamp.com/community/tutorials/python-data-science-handbook
Read 607295 from https://www.codecademy.com/learn/learn-python
Read 360091 from https://www.w3schools.com/python/default.asp
Read 26062 from https://python-course.eu/python3_tutorial.phphttps://docs.python.org/3/tutorial/index.html
Read 135028 from https://realpython.com/tutorials/python/
Download 9 sites in 43.716087200000004 seconds
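
One detail of the threaded version is easy to miss: the iterator returned by `executor.map` is discarded, so an exception raised inside `download_one` would never surface. The sketch below shows how consuming that iterator returns results in input order and re-raises worker exceptions. It assumes a stand-in `fake_download` that sleeps instead of using the network; the function, its delays and the example.com URLs are hypothetical.

```python
import concurrent.futures
import time

# Hypothetical per-URL delays that simulate slow and fast servers.
FAKE_DELAYS = {
    'https://example.com/slow': 0.2,
    'https://example.com/fast': 0.01,
}

def fake_download(url):
    # Stand-in for requests.get(url): sleeps instead of using the network.
    if url.endswith('/bad'):
        raise ValueError('cannot fetch {}'.format(url))
    time.sleep(FAKE_DELAYS.get(url, 0.01))
    return url

def download_all(urls):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # list() drains the iterator returned by map; results come back in
        # input order, and a worker's exception is re-raised at this point.
        return list(executor.map(fake_download, urls))

urls = ['https://example.com/slow', 'https://example.com/fast']
ordered = download_all(urls)
print(ordered)  # same order as urls, although 'fast' finishes first

try:
    download_all(urls + ['https://example.com/bad'])
    failure = None
except ValueError as exc:
    failure = str(exc)
print(failure)
```

Since `map` stops at the first failing item, results after it are lost; handling failures per URL needs explicit Future objects.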

Adding Future objects

import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))
    return resp.content

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            future = executor.submit(download_one, site)
            to_do.append(future)
            
        # as_completed yields each future as soon as it finishes.
        for future in concurrent.futures.as_completed(to_do):
            future.result()  # re-raises any exception from download_one
            
def main():
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Read 3082 from https://pythonbasics.org/learn-python/
Read 190716 from https://www.tutorialspoint.com/python/index.htm
Read 15560 from https://www.datacamp.com/community/tutorials/python-data-science-handbook
Read 31649 from https://www.learnpython.org/
Read 79647 from https://www.programiz.com/python-programming/tutorial
Read 604638 from https://www.codecademy.com/learn/learn-python
Read 26062 from https://python-course.eu/python3_tutorial.phphttps://docs.python.org/3/tutorial/index.html
Read 135028 from https://realpython.com/tutorials/python/
Read 360091 from https://www.w3schools.com/python/default.asp
Download 9 sites in 25.880217799999997 seconds
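
With explicit futures it becomes possible to tell which URL a result or an error belongs to. A common pattern is a dict that maps each future back to its URL. The following is only a sketch under the assumption of a simulated `fake_download` (hypothetical, as are the example.com URLs), not the code used for the run above:

```python
import concurrent.futures
import time

def fake_download(url):
    # Stand-in for requests.get(url).content: no network access.
    if url.endswith('/bad'):
        raise ValueError('cannot fetch {}'.format(url))
    time.sleep(0.01)
    return b'x' * len(url)

def download_all(urls):
    sizes, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Remember which URL each future was created for.
        future_to_url = {executor.submit(fake_download, u): u for u in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                sizes[url] = len(future.result())
            except Exception as exc:
                # One failing URL does not stop the remaining downloads.
                errors[url] = exc
    return sizes, errors

sizes, errors = download_all([
    'https://example.com/a',
    'https://example.com/bad',
    'https://example.com/ccc',
])
print(sizes)
print(errors)
```

Compared with `executor.map`, this keeps every successful result even when some downloads fail.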

Running with coroutines

import asyncio
import aiohttp
import time
import ssl

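async def download_one(url):
    # Try specifying a different SSL version; certificate checks are disabled,
    # which is acceptable only for experiments like this one.
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE
    async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=ssl_context)) as session:
        async with session.get(url) as resp:
            # resp.content_length only mirrors the Content-Length header and is
            # None when the server omits it, so read the body to measure it.
            # (The recorded run printed resp.content_length, hence "Read None".)
            body = await resp.read()
            print('Read {} from {}'.format(len(body), url))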
            
async def download_all(sites):
    tasks = [asyncio.create_task(download_one(site)) for site in sites]
    await asyncio.gather(*tasks)

def main():
    start_time = time.perf_counter()
    asyncio.run(download_all(sites))
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))
    
if __name__ == '__main__':
    main()
SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x000001EEA10D89B0>
transport: <_SelectorSocketTransport fd=1800 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "D:\Sdk\Anaconda3\lib\ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)
SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x000001EEA10D8748>
transport: <_SelectorSocketTransport fd=1732 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "D:\Sdk\Anaconda3\lib\ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)


Read None from https://www.datacamp.com/community/tutorials/python-data-science-handbook
Read None from https://www.learnpython.org/
Read 32445 from https://www.tutorialspoint.com/python/index.htm


SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x000001EEA10D8550>
transport: <_SelectorSocketTransport fd=1828 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "D:\Sdk\Anaconda3\lib\ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)


Read None from https://python-course.eu/python3_tutorial.phphttps://docs.python.org/3/tutorial/index.html
Read None from https://pythonbasics.org/learn-python/
Read None from https://www.programiz.com/python-programming/tutorial
Read None from https://www.w3schools.com/python/default.asp


SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x000001EEA0F76438>
transport: <_SelectorSocketTransport fd=1508 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "D:\Sdk\Anaconda3\lib\ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)
SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x000001EEA0FC6710>
transport: <_SelectorSocketTransport fd=1764 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "D:\Sdk\Anaconda3\lib\ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)
SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x000001EEA0FC6AC8>
transport: <_SelectorSocketTransport fd=1824 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "D:\Sdk\Anaconda3\lib\ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)


Read None from https://www.codecademy.com/learn/learn-python


SSL error in data received
protocol: <asyncio.sslproto.SSLProtocol object at 0x000001EEA0FB6BE0>
transport: <_SelectorSocketTransport fd=1788 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "D:\Sdk\Anaconda3\lib\asyncio\sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "D:\Sdk\Anaconda3\lib\ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)


Read None from https://realpython.com/tutorials/python/
Download 9 sites in 22.620816800000057 seconds
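
The coroutine version starts every download at once and lets the first exception propagate out of `asyncio.gather`. A sketch of one way to cap concurrency (mirroring `max_workers=5` from the thread pool) and to collect errors per URL is shown below. It assumes a simulated `fake_download` built on `asyncio.sleep`; that function, the `max_concurrent` parameter and the example.com URLs are hypothetical and do not come from the original code.

```python
import asyncio

async def fake_download(url, limit):
    # The semaphore allows only a fixed number of downloads at a time.
    async with limit:
        await asyncio.sleep(0.01)  # stand-in for the real HTTP request
        if url.endswith('/bad'):
            raise ValueError('cannot fetch {}'.format(url))
        return len(url)

async def download_all(urls, max_concurrent=5):
    limit = asyncio.Semaphore(max_concurrent)
    # return_exceptions=True puts exceptions into the result list instead
    # of raising the first one; results keep the order of urls.
    results = await asyncio.gather(
        *(fake_download(u, limit) for u in urls),
        return_exceptions=True,
    )
    return dict(zip(urls, results))

results = asyncio.run(download_all([
    'https://example.com/a',
    'https://example.com/bad',
]))
for url, outcome in results.items():
    print(url, outcome)
```

With aiohttp the same shape applies; a single `ClientSession` shared by all tasks is usually preferred over one session per URL.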
#!pip install -U aiohttp  # asyncio ships with the standard library