Contents:
- Scraping blog titles synchronously
- Scraping blog titles asynchronously with async/await
This post is a supplement to the "Understanding Coroutines in Depth" series.
In it, you will learn how to apply async/await
in a real-world web scraper.
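Here is a quick preview of why async/await helps a scraper. The minimal sketch below replaces each HTTP request with `asyncio.sleep`, so it is a stand-in and not real network code. `fake_fetch`, `crawl_sequential`, `crawl_concurrent`, `timed` and the 0.2 s delay are all hypothetical. When the requests are awaited one after another, the total time is roughly the sum of the delays. When they are scheduled together with `asyncio.gather`, the total is roughly the longest single delay.

```python
import asyncio
import time


# Hypothetical stand-in for an HTTP request: sleeps instead of touching the network.
async def fake_fetch(url: str, delay: float = 0.2) -> str:
    await asyncio.sleep(delay)  # yields to the event loop while "waiting on I/O"
    return f"title of {url}"


async def crawl_sequential(urls):
    # Each request finishes before the next one starts: total ~ sum of delays.
    return [await fake_fetch(u) for u in urls]


async def crawl_concurrent(urls):
    # All requests are scheduled at once; gather returns results in input order.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


def timed(coro_fn, urls):
    # Runs one crawl strategy in a fresh event loop and measures wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro_fn(urls))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    demo_urls = [f"https://example.com/{i}" for i in range(5)]
    _, t_seq = timed(crawl_sequential, demo_urls)
    _, t_con = timed(crawl_concurrent, demo_urls)
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

The gain comes entirely from overlapping the waiting time. A real scraper only benefits if its HTTP client is itself awaitable. A blocking call such as `requests.get` would stall the event loop.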
Example
Scrape the titles of a batch of specified articles from CSDN. The article list is as follows:
urls = [
'https://blog.csdn.net/Jmilk/article/details/103218919',
'https://blog.csdn.net/stven_king/article/details/103256724',
'https://blog.csdn.net/csdnnews/article/details/103154693',
'https://blog.csdn.net/dg_lee/article/details/103951021',
'https://blog.csdn.net/m0_37907797/article/details/103272967',
'https://blog.csdn.net/zzq900503/article/details/49618605',
'https://blog.csdn.net/weixin_44339238/article/details/103977138',
'https://blog.csdn.net/dengjin20104042056/article/details/103930275',
'https://blog.csdn.net/Mind_programmonkey/article/details/103940511',
'https://blog.csdn.net/xufive/article/details/102993570',
'https://blog.csdn.net/weixin_41010294/article/details/104009722',
'https://blog.csdn.net/yunqiinsight/article/details/103137022',
'https://blog.csdn.net/qq_44210563/article/details/102826406',
]
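Whichever approach is used, each URL needs one per-article step: download the page and pull out the title. The scraper below uses `requests` and `lxml.etree`, and its exact XPath is not shown in this excerpt. The sketch here instead uses the standard library's `html.parser` so that it runs offline on a sample string. It assumes the title is read from the page's `<title>` element. `TitleParser` and `extract_title` are hypothetical names.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html: str) -> str:
    # Feeds the whole document and returns the trimmed title text ("" if none).
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    sample = "<html><head><title> Hello CSDN </title></head><body></body></html>"
    print(extract_title(sample))
```

In a synchronous scraper, this would be called on `requests.get(url).text` for each URL in `urls`. A page's `<title>` can also contain more than the article heading, such as a site name. That is one reason a targeted XPath like the one in the lxml version may be preferred.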
Synchronous scraper
import requests
import time
from lxml import etree
urls = [
'https://blog.csdn.net/Jmilk/article/details/103218919',
'https://blog.csdn.net/stven_king/article/details/103256724',