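Level 1: Scraping the image links from the training website
Task description
Task for this level: use Scrapy to scrape the image links from the given website and save them to a local file.
Programming requirements
First, inspect the page elements and observe the pattern in the markup for the image links. Then click the triangle icon next to Code Files (代码文件) and select the file eduSpider.py, as shown in the figure below. Add code between the Begin and End markers so that the parse function scrapes the image links and saves them to the local file images.txt.
Note: the development environment for this exercise's evaluation system is already configured.
Test description
The platform will test the code you write (this test takes no input):
Expected output (the literal string printed by the platform, meaning "scrape succeeded"):
爬取成功
Now start your task. Good luck!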
Solution:
Find Code Files (代码文件) and click it; once expanded, it looks like this
Click step1/web/index.html and enter the following code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>花</title>
</head>
<body>
<div class="box">
<div>
<a href="/static/app1/imgs/1.png" target="_blank">
<img src="/static/app1/imgs/1.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/10.png" target="_blank">
<img src="/static/app1/imgs/10.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/11.png" target="_blank">
<img src="/static/app1/imgs/11.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/12.png" target="_blank">
<img src="/static/app1/imgs/12.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/13.png" target="_blank">
<img src="/static/app1/imgs/13.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/14.png" target="_blank">
<img src="/static/app1/imgs/14.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/15.png" target="_blank">
<img src="/static/app1/imgs/15.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/16.png" target="_blank">
<img src="/static/app1/imgs/16.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/17.png" target="_blank">
<img src="/static/app1/imgs/17.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/18.png" target="_blank">
<img src="/static/app1/imgs/18.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/19.png" target="_blank">
<img src="/static/app1/imgs/19.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/2.png" target="_blank">
<img src="/static/app1/imgs/2.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/20.png" target="_blank">
<img src="/static/app1/imgs/20.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/21.png" target="_blank">
<img src="/static/app1/imgs/21.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/22.png" target="_blank">
<img src="/static/app1/imgs/22.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/23.png" target="_blank">
<img src="/static/app1/imgs/23.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/24.png" target="_blank">
<img src="/static/app1/imgs/24.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/25.png" target="_blank">
<img src="/static/app1/imgs/25.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/3.png" target="_blank">
<img src="/static/app1/imgs/3.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/4.png" target="_blank">
<img src="/static/app1/imgs/4.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/5.png" target="_blank">
<img src="/static/app1/imgs/5.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/6.png" target="_blank">
<img src="/static/app1/imgs/6.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/7.png" target="_blank">
<img src="/static/app1/imgs/7.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/8.png" target="_blank">
<img src="/static/app1/imgs/8.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/9.png" target="_blank">
<img src="/static/app1/imgs/9.png" alt="未显示">
</a>
</div>
</div>
</body>
</html>
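Before writing the XPath for the spider, it can help to confirm the pattern in the page: every image sits in an img tag nested inside div.box, and its src is a relative path of the form /static/app1/imgs/N.png. The following is a minimal sketch that checks this using only Python's standard library, not Scrapy. SAMPLE_HTML is a shortened copy of the page above, and the class name ImgSrcCollector is hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser

# Shortened copy of the page above (two of the 25 entries).
SAMPLE_HTML = """
<div class="box">
<div>
<a href="/static/app1/imgs/1.png" target="_blank">
<img src="/static/app1/imgs/1.png" alt="未显示">
</a>
</div>
<div>
<a href="/static/app1/imgs/10.png" target="_blank">
<img src="/static/app1/imgs/10.png" alt="未显示">
</a>
</div>
</div>
"""


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


collector = ImgSrcCollector()
collector.feed(SAMPLE_HTML)
print(collector.srcs)
```

The collected values are the same strings that the @src step of an XPath expression would select in the spider.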
Next, click Code Files (代码文件) again, find and open step1/mySpider/mySpider/spiders/eduSpider.py, and enter the following code:
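# -*- coding: utf-8 -*-
import scrapy


class EduspiderSpider(scrapy.Spider):
    name = 'eduSpider'
    allowed_domains = ['127.0.0.1']
    start_urls = ['http://127.0.0.1:8080/imgs/']

    def parse(self, response):
        #********** Begin **********#
        # Select the src attribute of every <img> inside div.box.
        # extract() converts the selectors into plain strings; without it,
        # the file would contain the repr of the selector list, not the links.
        srcs = response.xpath("//div[@class='box']/div/a/img/@src").extract()
        # Write one link per line to images.txt.
        with open('images.txt', 'w') as f:
            for src in srcs:
                f.write("{}\n".format(src))
        #********** End **********#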
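The src values on this page are relative paths, so what the spider selects is a path such as /static/app1/imgs/1.png rather than a full URL. The exercise may only expect these relative paths; if absolute URLs are wanted instead, one possible approach is to join each path with the page URL. The sketch below uses the standard library's urljoin; the names PAGE_URL, relative_srcs and absolute_urls are illustrative and not part of the exercise.

```python
from urllib.parse import urljoin

# The start URL taken from the spider's start_urls.
PAGE_URL = "http://127.0.0.1:8080/imgs/"

# Two of the relative src values found on the page.
relative_srcs = ["/static/app1/imgs/1.png", "/static/app1/imgs/10.png"]

# A path starting with "/" replaces the whole path of the base URL.
absolute_urls = [urljoin(PAGE_URL, src) for src in relative_srcs]

for url in absolute_urls:
    print(url)
```

Inside a Scrapy callback, response.urljoin(src) performs the same join relative to the URL of the response.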
Click Self-test run (自测运行) in the lower-right corner.
Then submit.
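If the self-test does not report success, one way to debug is to look at what was actually written to images.txt. Each line should be an image path such as /static/app1/imgs/1.png, not the printed form of a selector object. The following is a rough sketch of such a check, assuming the links keep the /static/ prefix and .png suffix seen in the page source. The function name find_bad_lines is hypothetical, and the demo file with its deliberately malformed second line is made up for illustration.

```python
import os
import tempfile


def find_bad_lines(path):
    """Return the non-empty lines in `path` that do not look like image links."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    return [
        line for line in lines
        if not (line.startswith("/static/") and line.endswith(".png"))
    ]


# Demo on a throwaway file; in the exercise, pass the path of images.txt instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "images.txt")
    with open(demo_path, "w") as f:
        f.write("/static/app1/imgs/1.png\n")
        # Made-up example of a malformed line (an object repr, not a link).
        f.write("[<Selector xpath='...' data='...'>]\n")
    bad = find_bad_lines(demo_path)

print(bad)
```

An empty list means every line matched the expected pattern.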