Python Web Scraping: Fixing "Index out of range"

比菜鸟更菜的菜鸟

Tags: python, web scraping

Article link: https://blog.csdn.net/weixin_43258896/article/details/118163472

"Index out of range" error when scraping a website with Python

The problem

Error message

Traceback (most recent call last):
  File "D:/spiderNew.py", line 127, in <module>
    term = tmp[0].split('成交周期')[1].split('天')
IndexError: list index out of range

Relevant code

before_price = tmp[0].split("挂牌")[1].split("万")[0]
term = tmp[0].split('成交周期')[1].split('天')	# this is line 127
tmpDay = article.xpath('div[5]/span[2]')[0]
tmpDay = tmpDay.xpath('span[2]/text()')[0]
term = tmpDay.split("成交周期")[1].split('天')[0]

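This code scrapes second-hand housing transaction records from a certain site (某壳). Without any guard conditions, it always failed at this line. I checked the corresponding part of the page source and found that the HTML tag in question does not exist in the item being parsed, so the text in tmp[0] does not contain '成交周期'. In that case split('成交周期') returns a one-element list, and indexing it with [1] raises IndexError.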
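To see why the line fails, here is a minimal sketch of how `str.split` behaves when the separator is missing. The sample strings are made up for illustration; the real text scraped from the page will differ.

```python
# Hypothetical sample strings; the real text in tmp[0] will differ.
with_term = "挂牌100万成交周期30天"
without_term = "挂牌100万"

# Separator present: split yields two parts, so index 1 exists.
parts_ok = with_term.split("成交周期")

# Separator absent: split yields a single-element list.
parts_missing = without_term.split("成交周期")

error_message = None
try:
    parts_missing[1]  # there is no second element
except IndexError as exc:
    error_message = str(exc)

print(parts_ok)
print(parts_missing)
print(error_message)
```

The second list has only one element, which is why indexing it with `[1]` raises the same error as in the traceback.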

Solution

So I added the following checks:

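if '挂牌' in tmp[0]:
    # Listing price: the text between "挂牌" and "万"
    before_price = tmp[0].split("挂牌")[1].split("万")[0]
    if '成交周期' in tmp[0]:
        # Transaction period: the text between "成交周期" and "天"
        term = tmp[0].split('成交周期')[1].split('天')[0]
    tmpDay = article.xpath('div[5]/span[2]')[0]
    # Only look deeper when the span has more than one child element
    if len(tmpDay) > 1:
        tmpDay = tmpDay.xpath('span[2]/text()')[0]
        if '成交周期' in tmpDay:
            term = tmpDay.split("成交周期")[1].split('天')[0]   # transaction period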

With these checks the content is parsed correctly, and the crawl no longer stops on an error and leaves the scraped data incomplete.
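The same kind of check can be folded into small helpers so that each field does not need its own nested `if`. This is only a sketch: `text_between` and `first` are hypothetical helper names that are not part of the original script, and the sample strings are made up.

```python
def text_between(text, start, end, default=None):
    """Return the text between `start` and `end`, or `default` if either marker is missing."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        return default
    value, found_end, _ = rest.partition(end)
    if not found_end:
        return default
    return value


def first(items, default=None):
    """Return the first element of a list (for example an xpath result), or `default` if it is empty."""
    return items[0] if items else default


# Hypothetical sample strings standing in for tmp[0].
full = "挂牌100万成交周期30天"
partial = "挂牌100万"

before_price = text_between(full, "挂牌", "万")
term = text_between(full, "成交周期", "天")
missing_term = text_between(partial, "成交周期", "天")  # marker absent, falls back to None
empty_result = first([])                               # empty list, falls back to None

print(before_price, term, missing_term, empty_result)
```

Unlike the `in` checks above, `text_between` also returns the default when the closing marker is missing. Whether that is the right behavior depends on the data being scraped.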
