Class Notes

Basic workflow for using the Scrapy framework

  • Create the project: scrapy startproject dushu

  • Create the spider: cd dushu; scrapy genspider guoxue "www.dushu.com"

  • Open guoxue.py and start writing the code.

    import scrapy

    from dushu.items import DushuItem  # item class, assumed to live in dushu/items.py


    class GuoxueSpider(scrapy.Spider):
        name = 'guoxue'
        allowed_domains = ['www.dushu.com']
        # Start URL; the generated default usually needs to be changed.
        start_urls = ['https://www.dushu.com/book/1617.html']
    
        def parse(self, response):
            # Find the links to the detail pages.
            detail_url_list = response.xpath('//div[@class="book-info"]//h3/a/@href')
            for detail_url in detail_url_list.getall():
                detail = 'https://www.dushu.com' + detail_url
                yield scrapy.Request(url=detail, callback=self.detail_parse)
    		
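            # Addresses of the following list pages (pages 2 to 10).
            # Every list page yields these again; Scrapy's duplicate filter drops the repeats.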
            for i in range(2, 11):
                next_page = 'https://www.dushu.com/book/1617_%d.html' % i
                yield scrapy.Request(url=next_page, callback=self.parse)
    	
        # Parse the content of a detail page.
        def detail_parse(self, response):
            book_title = response.xpath('string(//div[@class="book-title"])').extract_first()
            book_img = response.xpath('//div[@class="book-pic"]//img/@src').extract_first()
            price = response.xpath('//p[@class="price"]/span/text()').extract_first()
            author = response.xpath('string(//div[@class="book-details-left"]//table/tbody/tr[1]/td[2])').extract_first()
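            # Take the first two summary text nodes (book brief, author brief).
            # Pad with empty strings so the unpacking does not fail when fewer are found.
            briefs = response.xpath('//div[contains(@class, "txtsummary")]/text()').getall()[:2]
            book_brief, author_brief = [b.strip() for b in briefs + ['', '']][:2]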
            item = DushuItem()
            item['book_title'] = book_title
            item['book_img'] = book_img
            item['price'] = price
            item['author'] = author
            item['book_brief'] = book_brief
            item['author_brief'] = author_brief
            yield item
    
  • scrapy shell: this interactive shell can be used to debug the code.

  • crawl spider (CrawlSpider)