Original problem description
When extracting a value with XPath, the result comes back wrapped in newlines and spaces, like this:
['\n 2021-10-20\n ']
The source code was:
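from lxml import etree

# text1 holds the HTML source of the page being scraped
html_element = etree.HTML(text1)
data_element = html_element.xpath('//div[@class="main_content_container"]')
for i in data_element:
    # text() returns a list of raw text nodes, surrounding whitespace included
    publishDate = i.xpath('div[@class="main_content_top"]/div[@class="main_content_detail_top"]/ul/li[@class="dateMove"]/span/text()')
    print(publishDate)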
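Solution
Wrap the XPath expression in normalize-space(). This function strips leading and trailing whitespace and collapses internal runs of whitespace into a single space. Note that the result is then a single string rather than a list, and that if the path matches several nodes only the first one is used. The modified code is: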
from lxml import etree

html_element = etree.HTML(text1)
data_element = html_element.xpath('//div[@class="main_content_container"]')
for i in data_element:
    # normalize-space() returns one cleaned-up string instead of a list
    publishDate = i.xpath('normalize-space(div[@class="main_content_top"]/div[@class="main_content_detail_top"]/ul/li[@class="dateMove"]/span/text())')
    print(publishDate)
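
To see the difference without the original page, here is a minimal self-contained sketch. The markup in sample_html is invented for illustration and only mimics the class names used above; the names sample_html, DATE_PATH, raw, normalized and stripped are likewise assumptions, not part of the original code. It also shows an alternative that may suit cases where several nodes match: keep text() and call Python's str.strip() on each item.

```python
from lxml import etree

# Hypothetical sample markup, modeled on the class names used above.
sample_html = """
<div class="main_content_container">
  <div class="main_content_top">
    <div class="main_content_detail_top">
      <ul>
        <li class="dateMove"><span>
          2021-10-20
        </span></li>
      </ul>
    </div>
  </div>
</div>
"""

# Relative path from the container down to the date text node.
DATE_PATH = ('div[@class="main_content_top"]/div[@class="main_content_detail_top"]'
             '/ul/li[@class="dateMove"]/span/text()')

root = etree.HTML(sample_html)
container = root.xpath('//div[@class="main_content_container"]')[0]

# Plain text(): a list of text nodes with the whitespace left in.
raw = container.xpath(DATE_PATH)

# normalize-space(): a single string with whitespace trimmed and collapsed.
normalized = container.xpath('normalize-space(' + DATE_PATH + ')')

# Alternative: keep the list and strip every item in Python.
stripped = [text.strip() for text in raw]

print(raw)
print(normalized)  # 2021-10-20
print(stripped)
```

normalize-space() is convenient when exactly one value is expected per container. If the path can match several text nodes and all of them are needed, the list-comprehension variant is probably the safer choice, since normalize-space() only looks at the first node.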