Crawler 1: 1000 pages in total, 10 items per page, each item linking to a detail page whose content needs to be scraped

会编程的漂亮小姐姐

Published 2018-08-15 10:20:01

Categories: Python, web crawlers

Article link: https://blog.csdn.net/u014229742/article/details/81699454

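This post discusses a crawler project covering 1000 list pages with 10 links each, 10,000 detail pages in total. The crawl uses Scrapy, and an item pipeline processes the symptom data scraped from each link. The current problem is that every symptom is inserted into the database individually, which is inefficient. To improve efficiency, the post may go on to explore how to optimize the database operations so that all symptoms of a disease are written in one batch.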

(Summary generated automatically by CSDN.)
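
The batching idea mentioned in the summary can be sketched as a Scrapy-style item pipeline. This is only an illustration under stated assumptions: the post does not name its database, so `sqlite3` is swapped in here, and the `symptom` table, the `symptoms` item field, and the class name `SymptomBatchPipeline` are all hypothetical. Only the `title` field appears in the original code.

```python
import sqlite3


class SymptomBatchPipeline:
    """Write all symptoms of one disease item with a single executemany() call."""

    def __init__(self, db_path=":memory:"):
        # Assumption: SQLite stands in for whatever database the project uses.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection, create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS symptom (disease TEXT, name TEXT)"
        )

    def process_item(self, item, spider):
        # Hypothetical item layout: item["symptoms"] is a list of symptom names.
        rows = [(item["title"], symptom) for symptom in item["symptoms"]]
        # One transaction and one batched statement per disease,
        # instead of one INSERT plus commit per symptom.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO symptom (disease, name) VALUES (?, ?)", rows
            )
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

In a real project the class would be enabled through the `ITEM_PIPELINES` setting. Grouping rows per item keeps the number of round trips proportional to the number of diseases rather than the number of symptoms.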

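There are 1000 pages in total; each page lists 10 items, and each item links to a detail page. The goal is to scrape the content of those detail pages using Scrapy.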
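
The two-level shape of this crawl (list pages first, then the detail links found on each) can be sketched with the standard library alone. Everything site-specific here is an assumption: the `example.com` URL pattern, the `item` link class, and the helper names are hypothetical and do not come from the post.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical list-page pattern; the real site's pattern is not given in the post.
LIST_URL_TEMPLATE = "http://example.com/list/{page}.htm"
TOTAL_PAGES = 1000


def list_page_urls(total_pages=TOTAL_PAGES):
    """Yield the URL of every list page, numbered 1..total_pages."""
    for page in range(1, total_pages + 1):
        yield LIST_URL_TEMPLATE.format(page=page)


class DetailLinkParser(HTMLParser):
    """Collect href values of <a class="item"> tags (the class name is assumed)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def detail_urls(list_url, html):
    """Return the absolute detail-page URLs found in one list page's HTML."""
    parser = DetailLinkParser()
    parser.feed(html)
    # Relative links are resolved against the list page they came from.
    return [urljoin(list_url, href) for href in parser.hrefs]
```

In a Scrapy spider the same split would typically map onto `start_urls` for the list pages and, in `parse`, one `scrapy.Request` per extracted link with a separate callback that parses the detail page.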

import scrapy

# AskdoctorItem is the project's Item class; its import is not shown in the original post.


class AskdSpider(scrapy.Spider):
    name = 'ym'
    allowed_domains = ['j4b.x4y.com', 'z4k.x4y.com']
    # Build one start URL per symptom page, numbered 1 through 10136.
    start_urls = [
        'http://j4b.x4y.com/il_s4i/symptom/' + str(i) + '.htm'
        for i in range(1, 10137)
    ]

    def parse(self, response):
        item = AskdoctorItem()
        # Example markup: <div class="jb-name fYaHei gre">肺出血－肾炎综合征</div>
        # Extract the text content of that div.
        item['title'] = response.xpath("//*[@class='jb-name fYaHei gre']/te
