Scraping Xinpianchang Video Information with a Scrapy Spider

Latest recommended article published on 2022-06-01 14:50:59

VIP article by TamoR.

Category: Python crawlers

This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/weixin_43576564

Article link: https://blog.csdn.net/weixin_43576564/article/details/103380418


# -*- coding: utf-8 -*-
import scrapy
import re
from scrapy import Request
import json

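def convert(s):
    # Convert a count string such as "1,234" to an int; return 0 for anything else.
    if isinstance(s, str):
        # Strip thousands separators before testing for digits.
        digits = s.replace(',', '')
        if digits.isdigit():
            return int(digits)
    return 0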

class XpcSpider(scrapy.Spider):
    name = 'xpc'
    allowed_domains = ['xinpianchang.com','openapi-vtom.vmovier.com']
    start_urls = ['https://www.xinpianchang.com/channel/index/sort-like?from=tabArticle']
    # Collect the link to each video
    def parse(self, response):
        # List of article IDs (pid), one per video on the listing page
        pid_list = response.xpath(
            '//ul[@class="video-list"]/li[@class="enter-filmplay"]/@data-articleid'
        ).extract()

        # Authorization cookie, used only by the disabled pagination code below
        cookies = {
            "Authorization": "01D3EF58AA36A73BCAA36A438BAA36A9459AA36AFD0C8371FE04"
        }

        for pid in pid_list:
            url = 'https://www.xinpianchang.com/a%s?from=ArticleList' % pid
            request = response.follow(url, self.parse_post)
            request.meta['pid'] = pid  # pass the pid on to parse_post
            yield request

        # Pagination (disabled): follow every page link with the cookie above
        # pages = response.xpath('//div[@class="page-wrap"]/div[@class="page"]/a/@href').extract()
        # for page in pages:
        #     yield response.follow(page, self.parse, cookies=cookies)

    # Parse the details of a single video
    def parse_post(self, response):
        pid = response.meta['pid']  # pid attached to the request in parse()
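
The `convert` helper in the listing is meant to turn count strings with thousands separators into integers. A minimal standalone sketch of that behaviour follows, assuming the intended rule is "digits with optional commas become an int, everything else becomes 0". The sample inputs are made up for illustration and are not taken from the site.

```python
def convert(s):
    """Turn a count string such as '1,234' into an int; anything else becomes 0."""
    if isinstance(s, str):
        # Remove thousands separators first, otherwise isdigit() rejects '1,234'.
        digits = s.replace(',', '')
        if digits.isdigit():
            return int(digits)
    return 0


# Hypothetical sample values: valid counts, an empty string, text, and None
samples = ['1,234', '56', '', 'abc', None, '7,890,123']
results = [convert(x) for x in samples]
print(results)  # [1234, 56, 0, 0, 0, 7890123]
```

Removing the commas before calling `isdigit()` matters: checking first would send every value containing a separator down the fallback path and return 0.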

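The `parse` method reads each video's `data-articleid` from the listing page and builds a detail-page URL from it. The sketch below illustrates that step outside Scrapy, using the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors. The sample markup, `extract_pids`, and `detail_url` are hypothetical names and data introduced for illustration; only the XPath structure and the URL pattern come from the spider.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real listing page
SAMPLE = """
<html><body>
<ul class="video-list">
  <li class="enter-filmplay" data-articleid="10001">A</li>
  <li class="other" data-articleid="99999">not a video entry</li>
  <li class="enter-filmplay" data-articleid="10002">B</li>
</ul>
</body></html>
"""


def extract_pids(markup):
    """Return the data-articleid of every video entry in the listing."""
    root = ET.fromstring(markup.strip())
    items = root.findall(".//ul[@class='video-list']/li[@class='enter-filmplay']")
    return [li.get('data-articleid') for li in items]


def detail_url(pid):
    """Build the detail-page URL using the same pattern as the spider."""
    return 'https://www.xinpianchang.com/a%s?from=ArticleList' % pid


pids = extract_pids(SAMPLE)
urls = [detail_url(p) for p in pids]
print(pids)
print(urls)
```

ElementTree only handles well-formed markup and a subset of XPath, so this is suitable for checking the selection logic on a small sample, not as a replacement for `response.xpath` on real pages.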