Comparing crawl times for a novel: plain crawler vs. multithreaded crawler vs. Scrapy crawler

指弹代码摄影汪

Category: Crawlers. Tags: python

Article link: https://blog.csdn.net/zz001357/article/details/104547567

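I have had far too much free time lately, to the point of growing mold, so I decided to compare how long different crawling techniques take to scrape the same novel from the same website on the same computer.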
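To make the comparison concrete, here is a minimal sketch of how such timings could be measured. It is not the code used for this post: `fake_fetch`, `timed`, `crawl_sequential`, and `crawl_threaded` are hypothetical names, and `fake_fetch` sleeps instead of touching the network, so the numbers only illustrate the shape of the measurement.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(chapter_id, delay=0.05):
    """Hypothetical stand-in for a chapter download: sleeps instead of using the network."""
    time.sleep(delay)
    return f"chapter {chapter_id}"


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def crawl_sequential(chapter_ids):
    # One request at a time, like a crawler with no optimization
    return [fake_fetch(cid) for cid in chapter_ids]


def crawl_threaded(chapter_ids, workers=8):
    # Several requests in flight at once; pool.map keeps the input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_fetch, chapter_ids))


if __name__ == "__main__":
    ids = list(range(16))
    _, seq_time = timed(crawl_sequential, ids)
    _, thr_time = timed(crawl_threaded, ids)
    print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Because the work here is waiting rather than computing, the threaded version should finish sooner. Real downloads would also depend on the network and on how the site responds to concurrent requests.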

The novel is one called 《龙王赘婿》, and the computer is a pretty weak one.

Enough chatter. First up is the general-purpose crawler, with no optimization at all.

I. General-purpose crawler

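import os
import time
import requests
import re

# Fetch the novel's chapter index page
url = 'http://www.shuquge.com/txt/115748/index.html'
response = requests.get(url)
# Use the encoding detected from the response body so the Chinese text decodes correctly
response.encoding = response.apparent_encoding
html = response.text
# Each match is a (chapter href, chapter title) tuple
result = re.findall('<dd><a href="(.*?)">(.*?)</a></dd>', html)


def novel_content(novel_url):
    # Download one chapter page and decode it the same way as the index
    response_2 = requests.get(novel_url)
    response_2.encoding = response_2.apparent_encoding
    html_2 = response_2.text
    # Pull the chapter text out of the content div
    result_2 = re.findall('<div id="content" class="showtxt">(.*?)</div>', html_2,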
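As a small offline illustration of the index-parsing step, the sketch below applies the same `<dd><a href=...>` pattern to a sample string. `SAMPLE_HTML` is made-up markup, `extract_chapters` is a hypothetical helper, and resolving the hrefs with `urljoin` assumes the links on the index page are relative, which this post does not confirm.

```python
import re
from urllib.parse import urljoin

INDEX_URL = 'http://www.shuquge.com/txt/115748/index.html'

# Hypothetical sample markup; the real index page may differ
SAMPLE_HTML = (
    '<dl>'
    '<dd><a href="1.html">Chapter 1</a></dd>'
    '<dd><a href="2.html">Chapter 2</a></dd>'
    '</dl>'
)


def extract_chapters(html, base_url=INDEX_URL):
    """Return a list of (absolute chapter URL, chapter title) pairs."""
    pairs = re.findall('<dd><a href="(.*?)">(.*?)</a></dd>', html)
    # Assumption: hrefs are relative to the index page, so resolve them against it
    return [(urljoin(base_url, href), title) for href, title in pairs]


if __name__ == "__main__":
    for chapter_url, title in extract_chapters(SAMPLE_HTML):
        print(title, chapter_url)
```

Returning absolute URLs means each pair can be handed directly to a function that downloads a single chapter.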
