Scraping the Latest Movie Listings from Movie Heaven (电影天堂) with Scrapy

SunChao3555

Category: Python. Tags: Movie Heaven (电影天堂), movie links, web crawler

Article link: https://blog.csdn.net/SunChao3555/article/details/79312196

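Environment: Python 2.7

The steps for creating a Scrapy project are covered in my other posts on this blog, so I won't repeat them here.

Let's go straight to the code.

Main spider code: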

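# -*- coding: utf-8 -*-
import scrapy


class DyttSpider(scrapy.Spider):
    name = 'dytt'
    allowed_domains = ['ygdy8.net']
    start_urls = ['http://www.ygdy8.net/html/gndy/dyzz/index.html']

    def parse(self, response):
        #print '***********>',response
        # extract_first('default') returns the first element of the result list,
        # or the default value if the list is empty (extract()[0] would raise IndexError)
        title = response.xpath('//title/text()').extract_first('')
        #print title
        # links to the individual movie detail pages
        hrefs = response.xpath('//a[@class="ulink"]/@href').extract()
        # loop over all the href values
        #for href in hrefs:
            #print href

        # the text of the last <option> in the page selector is taken as the total page count
        total_page = response.xpath('//select[@name="sldd"]/option[last()]/text()').extract_first('0')
        #print total_page
        for x in range(2, int(total_page) + 1):
            #print 'Crawling page %s, please wait....' % x
            # build the full URL of list page x
            url = 'http://www.ygdy8.net/html/gndy/dyzz/list_23_%s.html' % x
            # yield works like return but does not end the function;
            # it hands a Request object to Scrapy. With no callback given, the page is
            # parsed by parse() again, and Scrapy's duplicate filter drops repeated URLs.
            yield scrapy.Request(url)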
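The pagination and link handling in `parse` can be checked without hitting the site. Below is a minimal standalone sketch, assuming the list-page pattern `list_23_%s.html` used by the spider above. The helper names `first_or_default`, `page_urls` and `absolute_links` are hypothetical, and `first_or_default` only imitates the behavior of `extract_first()`; it is not Scrapy's implementation. The code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Standalone sketch of the pagination/link logic in DyttSpider.parse.
# Runs without Scrapy; helper names are hypothetical, the URL pattern
# comes from the spider above.
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7

BASE_URL = 'http://www.ygdy8.net/html/gndy/dyzz/index.html'
PAGE_URL = 'http://www.ygdy8.net/html/gndy/dyzz/list_23_%s.html'


def first_or_default(values, default=None):
    """Imitate SelectorList.extract_first(default): first item, or default if empty."""
    return values[0] if values else default


def page_urls(option_texts):
    """Build list-page URLs for pages 2..N.

    option_texts stands in for the texts of the <option> elements of
    <select name="sldd">; the last one is treated as the total page count.
    Page 1 is the start URL, so it is not generated here.
    """
    total_page = int(first_or_default(option_texts[-1:], '0'))
    return [PAGE_URL % x for x in range(2, total_page + 1)]


def absolute_links(hrefs, base=BASE_URL):
    """Resolve hrefs taken from a.ulink against the list page URL."""
    return [urljoin(base, href) for href in hrefs]


if __name__ == '__main__':
    print(page_urls(['1', '2', '3']))
    print(page_urls([]))  # no page selector found -> no extra pages
    # the detail-page path below is made up for illustration
    print(absolute_links(['/html/gndy/dyzz/20180101/12345.html']))
```

For example, if the last option of the page selector reads `3`, `page_urls` returns the URLs for pages 2 and 3. When the selector is missing, the `'0'` default makes the loop empty instead of raising an error, which is the same reason the spider uses `extract_first('0')`.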
