Below is the itcast.py file; the spider is run from the terminal.
import scrapy


class ItcastSpider(scrapy.Spider):
    name = "itcast"
    allowed_domains = ["itcast.cn"]
    # Set start_urls to the first URL to crawl
    start_urls = ("http://www.itcast.cn/channel/teacher.shtml",)

    def parse(self, response):
        filename = "teacher.html"
        # 'w' opens the file in text mode, which only accepts str
        open(filename, 'w').write(response.body)
【Problem】
scrapy crawl itcast
Running the command fails with the following error:
Traceback (most recent call last):
  File "d:\pyhton\lib\site-packages\twisted\internet\defer.py", line 1078, in _runCallbacks
    current.result = callback( # type: ignore[misc]
  File "d:\pyhton\lib\site-packages\scrapy\spiders\__init__.py", line 82, in _parse
    return self.parse(response, **kwargs)
  File "D:\python_pro\ex\pro_0606_1\pro_0606_1\spiders\itcast.py", line 12, in parse
    open(t,"w").write(response.body)
TypeError: write() argument must be str, not bytes
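【Solution】
In itcast.py, change the file mode from 'w' to 'wb'. response.body is a bytes object, and a file opened in text mode ('w') accepts only str, whereas binary mode ('wb') writes the bytes unchanged.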
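import scrapy


class ItcastSpider(scrapy.Spider):
    name = "itcast"
    allowed_domains = ["itcast.cn"]
    # Set start_urls to the first URL to crawl
    start_urls = ("http://www.itcast.cn/channel/teacher.shtml",)

    def parse(self, response):
        filename = "teacher.html"
        # 'wb' opens the file in binary mode, so the bytes in
        # response.body are written as-is; the with block closes the file
        with open(filename, 'wb') as f:
            f.write(response.body)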
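
The difference between the two modes can be reproduced without Scrapy. The following is a minimal sketch, not part of the original spider: `body` is a hypothetical stand-in for `response.body`, and the helper names `save_text_mode` and `save_binary_mode` are made up for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for response.body, which Scrapy exposes as bytes.
body = "<html><body>teacher list</body></html>".encode("utf-8")


def save_text_mode(path, data):
    """Reproduce the failure: a text-mode file only accepts str."""
    with open(path, "w") as f:
        f.write(data)


def save_binary_mode(path, data):
    """The fix: a binary-mode file accepts bytes unchanged."""
    with open(path, "wb") as f:
        f.write(data)


# Write into a temporary directory so the sketch leaves the project untouched.
path = os.path.join(tempfile.mkdtemp(), "teacher.html")

try:
    save_text_mode(path, body)
    text_mode_error = None
except TypeError as exc:
    # Same TypeError as in the traceback above.
    text_mode_error = str(exc)

save_binary_mode(path, body)
with open(path, "rb") as f:
    saved = f.read()

print("text mode error:", text_mode_error)
print("binary write round-trips:", saved == body)
```

If the page should be saved as text instead, another option is probably to keep a text-mode file and write `response.text` (the decoded str that Scrapy provides for text responses), passing an explicit `encoding` to `open`.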