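Target URLs: https://www.cn357.com/notice_300 and https://www.cn357.com/notice_191
The site has no anti-scraping measures in place, so we can dive straight in!
The data we need to scrape:
The above is the vehicle list.
Next comes the vehicle detail page:
We want the full details of every vehicle, along with its pictures.
First, create the Scrapy project:
Next, define the fields we need in items.py: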
import scrapy


class ShangchewangItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    myurl = scrapy.Field()
    mytime = scrapy.Field()
    # Announcement model
    announcement_model = scrapy.Field()
    # Announcement batch
    announcement_batch = scrapy.Field()
    # Brand
    brand = scrapy.Field()
    # Vehicle type
    car_type = scrapy.Field()
    # Rated load mass
    rated_quality = scrapy.Field()
    # Gross mass
    total_quality = scrapy.Field()
    # Curb mass
    Curing_quality = scrapy.Field()
    # Fuel type
    fuel_type = scrapy.Field()
    # Emission standard
    emission_standard = scrapy.Field()
    # Number of axles
    number_of_axes = scrapy.Field()
    # Wheelbase
    wheelbase = scrapy.Field()
    # Axle load
    axle_load = scrapy.Field()
    # Number of leaf springs
    number_of_spring = scrapy.Field()
    # Number of tires
    number_of_tire = scrapy.Field()
    # Tire specification
    standard_tire = scrapy.Field()
    # Approach / departure angle
    leave_angle = scrapy.Field()
    # Front / rear overhang
    QianHouXuan = scrapy.Field()
    # Front track
    before_tire_distance = scrapy.Field()
    # Rear track
    back_tire_distance = scrapy.Field()
    # Vehicle identification code
    identification_number = scrapy.Field()
    # Overall length
    car_lange = scrapy.Field()
    # Overall width
    car_width = scrapy.Field()
    # Overall height
    car_hight = scrapy.Field()
    # Cargo box length
    container_lang = scrapy.Field()
    # Cargo box width
    container_width = scrapy.Field()
    # Cargo box height
    container_hight = scrapy.Field()
    # Top speed
    highest_speed = scrapy.Field()
    # Rated passenger capacity
    rated_passenger = scrapy.Field()
    # Permitted number of cab occupants
    cab_people_number = scrapy.Field()
    # Steering type
    turn_type = scrapy.Field()
    # Permitted gross mass of towed trailer
    hang_car_all_quality = scrapy.Field()
    # Load mass utilization coefficient
    modulus = scrapy.Field()
    # Maximum fifth-wheel (saddle) load for semi-trailers
    must_quality = scrapy.Field()
    # Company name
    firm_name = scrapy.Field()
    # Company address
    firm_address = scrapy.Field()
    # Phone number
    TLE = scrapy.Field()
    # Fax number
    fax = scrapy.Field()
    # Postal code
    postal_code = scrapy.Field()
    # Chassis 1
    chassis_one = scrapy.Field()
    # Chassis 2
    chassis_tow = scrapy.Field()
    # Chassis 3
    chassis_thress = scrapy.Field()
    # Chassis 4
    chassis_four = scrapy.Field()
    # Engine model
    engine_model = scrapy.Field()
    # Engine manufacturer
    engine_firm = scrapy.Field()
    # Engine brand
    engine_brand = scrapy.Field()
    # Displacement
    displacement = scrapy.Field()
    # Power
    power = scrapy.Field()
    # Remarks
    remark = scrapy.Field()
    # Images
Some of these attributes are admittedly long and hard to name well, but Little Squirrel still tries to stick to good naming habits.
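With this many fields, it may help to keep the correspondence between the site's Chinese labels and the item field names in one place. The following is only a minimal sketch: `LABEL_TO_FIELD` and `rows_to_fields` are hypothetical names, only a handful of fields are listed, and it assumes the detail page can be reduced to (label, value) text pairs whose labels match the ones used for the item fields.

```python
# Hypothetical mapping from the Chinese label shown on the detail page to the
# item field name. Only a few entries are listed; extend it as needed.
LABEL_TO_FIELD = {
    "公告型号": "announcement_model",
    "公告批次": "announcement_batch",
    "品牌": "brand",
    "类型": "car_type",
    "总质量": "total_quality",
    "整备质量": "Curing_quality",
}


def rows_to_fields(rows):
    """Turn (label, value) pairs into a dict keyed by item field name.

    Labels missing from LABEL_TO_FIELD are skipped. Surrounding whitespace
    and a trailing colon (ASCII or full-width) on the label are removed.
    """
    fields = {}
    for label, value in rows:
        clean_label = label.strip().rstrip(":：").strip()
        key = LABEL_TO_FIELD.get(clean_label)
        if key is not None:
            fields[key] = value.strip()
    return fields
```

Since a `scrapy.Item` accepts keyword arguments for its declared fields, the resulting dict could then be passed along as `ShangchewangItem(**fields)`.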
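Now let's take a quick look at the pages we are going to scrape:
Click "next page" to see how the site implements pagination:
The second page is:
It looks as though the site paginates simply by appending the page number to the end of the URL. To check this guess, let's try the third page.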
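If the guess holds, the URLs for the list pages can be generated rather than discovered by following links. The sketch below is an illustration under assumptions: `page_url` and `first_pages` are hypothetical helpers, and the `"_"` separator between the base URL and the page number is a guess that should be checked against the real second and third page URLs.

```python
START_URLS = [
    "https://www.cn357.com/notice_300",
    "https://www.cn357.com/notice_191",
]


def page_url(base_url, page, sep="_"):
    """Return the URL of a 1-based page number for one list.

    Page 1 is the base URL itself; later pages append sep + page number.
    The default separator "_" is an assumption, not taken from the site.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    if page == 1:
        return base_url
    return f"{base_url}{sep}{page}"


def first_pages(n, sep="_"):
    """Build the URLs of the first n pages for every start URL."""
    return [page_url(url, p, sep) for url in START_URLS for p in range(1, n + 1)]
```

A spider could feed a list like `first_pages(3)` into its `start_urls`, once the URL pattern has been confirmed in the browser.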