1) Import the module: import xml.etree.ElementTree as ET
2) You need an XML file (sensorexpert_brand.xml) with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<response>
    <error>0</error>
    <msg>success</msg>
    <data>
        <total>10602</total>
        <pageSize>100</pageSize>
        <rows>
            <item>
                <id>11683</id>
                <full_name>Zywyn</full_name>
                <flag></flag>
                <country_name></country_name>
                <product_total>0</product_total>
                <brand_type>制造商</brand_type>
                <link>/brand/11683.html</link>
            </item>
            <item>
                <id>6005</id>
                <full_name>阿尔法电线</full_name>
                <flag></flag>
                <country_name></country_name>
                <product_total>1</product_total>
                <brand_type></brand_type>
                <link>/brand/6005.html</link>
            </item>
        </rows>
    </data>
</response>
3) Code that reads the file:
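def parse(self, response):
    # Save the response body to disk so that ET.parse can read it from a file.
    filename = 'ebs_crawler/files/sensorexpert_brand.xml'
    with open(filename, 'w', encoding='UTF-8-sig') as file_object:
        file_object.write(response.text)
    self.logger.info(f"XML file overwritten successfully: {response.request.url}")
    tree = ET.parse(filename)
    root = tree.getroot()
    links = []
    # root[2] is <data> and root[2][2] is <rows>; each child is one <item>.
    for child in root[2][2]:
        # child[6] is the <link> element of the item.
        links.append('https://www.sensorexpert.com.cn' + child[6].text)
    self.save(links)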
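The positional indexes above (root[2][2] and the seventh child of each item) break as soon as the server adds or reorders an element. As a rough sketch, and not the original approach, the same links can be collected by element name. The helper name extract_links and the shortened SAMPLE document are assumptions made for illustration, and ET.fromstring is used here only to load the sample inline.

```python
import xml.etree.ElementTree as ET

BASE_URL = "https://www.sensorexpert.com.cn"

# Shortened copy of the structure shown in step 2 (illustrative only).
SAMPLE = """<response>
  <error>0</error>
  <msg>success</msg>
  <data>
    <total>10602</total>
    <pageSize>100</pageSize>
    <rows>
      <item><id>11683</id><link>/brand/11683.html</link></item>
      <item><id>6005</id><link>/brand/6005.html</link></item>
    </rows>
  </data>
</response>"""


def extract_links(root):
    """Collect absolute brand URLs from every <item> under <data>/<rows>."""
    links = []
    for item in root.iterfind("./data/rows/item"):
        link = item.findtext("link")
        if link:  # skip items whose <link> is missing or empty
            links.append(BASE_URL + link)
    return links


root = ET.fromstring(SAMPLE)
links = extract_links(root)
```

Looking elements up by name also skips items that have no link instead of raising an IndexError or a TypeError.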
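Note: the argument to ET.parse must be a file name or a file object, not the XML content itself. To parse XML that is already held in a string, use ET.fromstring instead.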
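When the XML is already in memory, writing it to disk first is optional. The sketch below shows two standard-library alternatives: ET.fromstring for a string, and ET.parse on an in-memory file object. The variable xml_text is a hypothetical stand-in for response.text, and the document is shortened for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for response.text (shortened document).
xml_text = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <error>0</error>
  <msg>success</msg>
  <data>
    <total>1</total>
    <pageSize>100</pageSize>
    <rows>
      <item><id>6005</id><link>/brand/6005.html</link></item>
    </rows>
  </data>
</response>"""

# Option 1: parse the string directly; fromstring returns the root element.
root_from_string = ET.fromstring(xml_text)

# Option 2: wrap the bytes in a file-like object; parse returns an ElementTree.
tree = ET.parse(io.BytesIO(xml_text.encode("utf-8")))
root_from_file_object = tree.getroot()

first_link = root_from_string.findtext("./data/rows/item/link")
```

Either option avoids the intermediate file. Writing the file is still useful if a copy of the response should be kept for later inspection.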