Splitting strings while crawling with Scrapy

weixin_34189116

Tags: python, web crawler

Original link: https://my.oschina.net/u/3636678/blog/1859657

Splitting scraped text while crawling with Scrapy

The target page to crawl is as follows:

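After getting the text of each a tag, I still need to extract the company name and the company ID separately, so the string has to be split. If I split the scraped text and append the pieces straight into the item, I get a KeyError. Changing the code as shown below makes it work: it collects the pieces in separate intermediate lists first, and only then assigns those lists to the item.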

Failing code (KeyError):

LeastRepComList = response.xpath(
    "//table[@class='da_tbl'][3]/tr/td[1]/div/p/a/text()").extract()
for LeastRepCom in LeastRepComList:
    item['LeastRepComID'].append(LeastRepCom.split()[1].strip())
    item['LeastRepComName'].append(LeastRepCom.split()[0].strip())

Working code:

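LeastRepComList = response.xpath(
    "//table[@class='da_tbl'][3]/tr/td[1]/div/p/a/text()").extract()
# Collect the pieces in plain lists first; they must exist before append().
LeastRepComIDList = []
LeastRepComNameList = []
for LeastRepCom in LeastRepComList:
    # split() breaks on whitespace: name is piece 0, ID is piece 1
    LeastRepComIDList.append(LeastRepCom.split()[1].strip())
    LeastRepComNameList.append(LeastRepCom.split()[0].strip())
# Assign the finished lists to the item in one step.
item['LeastRepComID'] = LeastRepComIDList
item['LeastRepComName'] = LeastRepComNameList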
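The split step can also be tried on its own, outside the spider. The following is only a minimal sketch: it assumes each link text has the form `<name> <id>`, and the helper `split_name_and_id` and the sample strings are made up for illustration, not taken from the target page. It is a variation on the code above, using `rsplit` so that a name containing spaces stays in one piece, and skipping text that does not have two parts instead of raising an IndexError.

```python
def split_name_and_id(text):
    """Split link text of the assumed form '<name> <id>' into (name, id).

    Returns None when the text does not contain two whitespace-separated parts.
    """
    # rsplit with maxsplit=1 splits on the last run of whitespace only,
    # so a company name that itself contains spaces is kept whole.
    parts = text.strip().rsplit(None, 1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


# Hypothetical sample data standing in for the extracted a-tag texts.
samples = ["ExampleCorp 600001", "Another Example Co 000002", "NoIdHere"]

names, ids = [], []
for text in samples:
    result = split_name_and_id(text)
    if result is None:
        continue  # skip entries that do not match the assumed shape
    name, com_id = result
    names.append(name)
    ids.append(com_id)
```

Inside the spider, `names` and `ids` would take the place of the two intermediate lists before they are assigned to the item.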

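The problem is solved, but I still don't understand why. My first guess is that it has something to do with Scrapy's asynchronous, multi-threaded design? I'm only a beginner, so I'd welcome an explanation from someone more experienced!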
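A likely explanation does not involve threading at all. A Scrapy item behaves like a dict: declaring a field does not create a value for it, and reading a field that has never been assigned raises a KeyError. `item['LeastRepComID'].append(...)` has to read `item['LeastRepComID']` before it can call `append`, so it fails on the very first iteration if nothing was assigned earlier. The sketch below reproduces this with a plain dict standing in for the item (an assumption, so that it runs without Scrapy installed), and assumes the item had not been populated before the loop.

```python
# A plain dict stands in for the Scrapy item (assumption for illustration).
item = {}

error = None
try:
    # The lookup item['LeastRepComID'] runs first and fails,
    # because nothing has been stored under that key yet.
    item['LeastRepComID'].append('600001')
except KeyError as exc:
    error = exc

# One possible fix: assign an empty list before appending to it.
item['LeastRepComID'] = []
item['LeastRepComID'].append('600001')
```

Building the lists separately and assigning them afterwards, as in the working code, avoids the problem for the same reason: the item is only ever written to, never read before a value exists.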

Reposted from: https://my.oschina.net/u/3636678/blog/1859657