Splitting scraped text while crawling with Scrapy
The target page looks like this:
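After getting the text of each a tag, I also need to pull out the company name and the company ID separately, so the string has to be split. If I split the scraped content and append the pieces straight into the item, I get a KeyError. Changing it to the second snippet works: it collects the values in separate intermediate lists first and only then assigns those lists to the item.
Failing code (KeyError):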
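As a side note on the split itself, here is a minimal sketch of separating the name from the ID. It assumes each link text has the form "name, whitespace, ID" with the ID last; the helper name `split_name_and_id` and the sample string are made up for illustration. Unlike `split()[0]` and `split()[1]`, it splits only on the last run of whitespace, so a name that contains spaces stays in one piece.

```python
def split_name_and_id(text):
    """Split a string such as 'Acme Trading Co 10023' into (name, company_id).

    Hypothetical helper: assumes the ID is the last whitespace-separated token.
    """
    # rsplit with maxsplit=1 cuts only at the last run of whitespace
    parts = text.strip().rsplit(None, 1)
    if len(parts) != 2:
        # No separator found: treat the whole text as the name
        return text.strip(), None
    name, company_id = parts
    return name, company_id


print(split_name_and_id("Acme Trading Co 10023"))
```

If the names on the real page never contain spaces, the plain `split()` used in my snippets gives the same result.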
    LeastRepComList = response.xpath(
        "//table[@class='da_tbl'][3]/tr/td[1]/div/p/a/text()").extract()
    for LeastRepCom in LeastRepComList:
        item['LeastRepComID'].append(LeastRepCom.split()[1].strip())
        item['LeastRepComName'].append(LeastRepCom.split()[0].strip())
Working code:
    # Intermediate lists (initialized here so the snippet is self-contained)
    LeastRepComIDList = []
    LeastRepComNameList = []
    LeastRepComList = response.xpath(
        "//table[@class='da_tbl'][3]/tr/td[1]/div/p/a/text()").extract()
    for LeastRepCom in LeastRepComList:
        LeastRepComIDList.append(LeastRepCom.split()[1].strip())
        LeastRepComNameList.append(LeastRepCom.split()[0].strip())
    # Assign the finished lists to the item fields
    item['LeastRepComID'] = LeastRepComIDList
    item['LeastRepComName'] = LeastRepComNameList
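The problem is solved, but I still don't understand why. My first guess is that it has something to do with Scrapy's asynchronous, multithreaded design? I'm only a beginner, so I'd really appreciate an explanation from someone more experienced!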
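One likely explanation that does not involve concurrency at all: a Scrapy item behaves like a dict, and reading a field that has never been assigned raises KeyError. `item['LeastRepComID'].append(...)` has to read the field before it can append, so it fails on the first iteration, while plain assignment creates the field. The sketch below reproduces this with an ordinary dict standing in for the item (an assumption, so that it runs without Scrapy) and made-up sample strings.

```python
item = {}  # stand-in for a freshly created item: no fields populated yet
texts = ["Acme 10023", "Globex 10517"]  # made-up sample link texts

error_key = None
try:
    # Reading a key that was never assigned fails before append() is reached
    item['LeastRepComID'].append(texts[0].split()[1])
except KeyError as exc:
    error_key = exc.args[0]

print("KeyError for:", error_key)

for text in texts:
    name, company_id = text.split()
    # setdefault creates the empty list on first use, then returns it
    item.setdefault('LeastRepComID', []).append(company_id)
    item.setdefault('LeastRepComName', []).append(name)

print(item)
```

With a real item, assigning `item['LeastRepComID'] = []` and `item['LeastRepComName'] = []` before the loop should have the same effect, provided both fields are declared on the item class.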