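Scenario:
When scraping multiple URLs with XPath, you can return the list you need by putting a for loop inside [], i.e. by writing a list comprehension.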
Solution:
The code is as follows:
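```python
from lxml import etree


def parse_index(html):
    # Parse the raw HTML text into an element tree
    e = etree.HTML(html)
    # Collect the href attribute of every movie-title link
    all_url = e.xpath('//div[@class="channel-detail movie-item-title"]/a/@href')
    # List comprehension: prefix each relative path with the site's base URL
    return ['https://maoyan.com{}'.format(url) for url in all_url]
```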
The key line is:
```python
return ['https://maoyan.com{}'.format(url) for url in all_url]
```
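To try the comprehension without fetching a live page, here is a minimal sketch that runs on made-up markup. It swaps lxml for the standard library's xml.etree.ElementTree so there is no third-party dependency. ElementTree supports only a subset of XPath (it cannot select @href directly) and requires well-formed markup, so the sketch selects the a elements and reads the attribute with .get(). The sample HTML, the /films/ paths and the function name parse_index_stdlib are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for a downloaded list page.
# ElementTree needs well-formed XML; real-world HTML is better parsed with lxml.
SAMPLE_HTML = """<html><body>
  <div class="channel-detail movie-item-title"><a href="/films/1">Film A</a></div>
  <div class="channel-detail movie-item-title"><a href="/films/2">Film B</a></div>
  <div class="channel-detail channel-detail-orange">9.1</div>
</body></html>"""


def parse_index_stdlib(html):
    root = ET.fromstring(html)
    # ElementTree's XPath subset cannot end a path with @href,
    # so select the <a> elements and read the attribute with .get().
    links = root.findall(".//div[@class='channel-detail movie-item-title']/a")
    # List comprehension: build the absolute URLs in one expression.
    return ['https://maoyan.com{}'.format(a.get('href')) for a in links]


print(parse_index_stdlib(SAMPLE_HTML))
# ['https://maoyan.com/films/1', 'https://maoyan.com/films/2']
```

Only the two title divs match the exact class predicate, so the score div is ignored.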
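The following example, taken from a Scrapy spider (response is the Scrapy response object), uses the same technique:
```python
# Version 1: explicit for loop with append
scores_div = response.xpath('//div[@class="channel-detail channel-detail-orange"]')
scores = []
for score in scores_div:
    # string(.) returns all the text inside the div, concatenated
    scores.append(score.xpath('string(.)').extract_first())

# Version 2: the same result as a list comprehension
scores = [score.xpath('string(.)').extract_first() for score in response.xpath('//div[@class="channel-detail channel-detail-orange"]')]
```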
(The two snippets above produce the same scores list; the second is clearly much more concise.)
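The equivalence can be checked with a small runnable sketch. It assumes hypothetical sample markup and, instead of Scrapy's selectors, uses the standard library's xml.etree.ElementTree; ''.join(element.itertext()) stands in for the XPath string(.) function by concatenating all text inside each div. The nested i tags are only there to show why string(.) is used rather than text(), which would return just the div's direct text (".1").

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the scores are made-up sample values.
SAMPLE_HTML = """<html><body>
  <div class="channel-detail channel-detail-orange"><i>9</i>.1</div>
  <div class="channel-detail channel-detail-orange"><i>8</i>.7</div>
</body></html>"""

root = ET.fromstring(SAMPLE_HTML)
XPATH = ".//div[@class='channel-detail channel-detail-orange']"

# Version 1: explicit loop with append.
scores = []
for score in root.findall(XPATH):
    # Concatenate all descendant text, approximating XPath string(.).
    scores.append(''.join(score.itertext()))

# Version 2: the same logic as a list comprehension.
scores_comp = [''.join(score.itertext()) for score in root.findall(XPATH)]

print(scores)                  # ['9.1', '8.7']
print(scores == scores_comp)   # True
```

The comprehension drops the empty-list initialisation and the append call while producing the same list.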