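Today we're going to crawl a rather motivating website. I'd rather not post the URL here; let's see whether anyone fated to find it can work it out.
As usual, we start with items.py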
import scrapy


class AvmooItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name = scrapy.Field()
    birthday = scrapy.Field()
    age = scrapy.Field()
    height = scrapy.Field()
    cup = scrapy.Field()
    bust = scrapy.Field()
    waistline = scrapy.Field()
    hipline = scrapy.Field()
    birthplace = scrapy.Field()
    Avatar = scrapy.Field()
    designations = scrapy.Field()
    des_imgs = scrapy.Field()
    des_urls = scrapy.Field()
Good patrons, you can probably tell what kind of site this is from these fields alone.
Next comes the main crawler.
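A quick note on the `allowed_domains` attribute used in the spider that follows. As far as I understand it, Scrapy's offsite filtering compares each entry against the hostname of a request, so entries are meant to be bare domains, and one that carries a path would never match. The sketch below is a simplified stand-in for that check, not Scrapy's actual implementation; the helper `host_allowed` and the `sample.jpg` URL are made up for illustration.

```python
from urllib.parse import urlparse


def host_allowed(url, allowed_domains):
    """Return True if the URL's hostname is an allowed domain or a subdomain of one."""
    host = urlparse(url).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in allowed_domains)


# Hypothetical image URL, used only to exercise the check.
url = 'https://jp.netcdn.space/digital/video/sample.jpg'

# An entry that includes a path is compared against the hostname and fails.
print(host_allowed(url, ['jp.netcdn.space/digital/video/']))  # False

# A bare domain matches.
print(host_allowed(url, ['jp.netcdn.space']))  # True
```

Under that assumption, any path restriction would have to be applied in the spider's own callbacks rather than in `allowed_domains`.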
spider.py
# -*- coding:utf-8 -*-
import os

import requests
import scrapy

from AVMOO.items import AvmooItem


class AvmooSpider(scrapy.Spider):
    name = 'AVMOO'
    # Entries should be bare hostnames; an entry with a path never matches a request's host
    allowed_domains = ['xxx.xx', 'jp.netcdn.space']  # ,'xxxx.xx']
    start_urls = ['https://xxx.xx/cn/actresses/']
    base_url = 'https://xxx.xx'
    des_imgs = []

    def parse(self, response):
        star_urls=response.xpath('//a[@class="avatar-box text-center"]/