As far as I understand, setting the correct option value involves JavaScript logic.
What helped me solve it was a Scrapy middleware that uses Splash, a browser-as-a-service. Skipping the installation and configuration, here is the spider I ran:

    # -*- coding: utf-8 -*-
    import scrapy


    class IndiaBixSpider(scrapy.Spider):
        name = "indiabix"
        allowed_domains = ["www.indiabix.com"]
        start_urls = ["http://www.indiabix.com/verbal-ability/spotting-errors/"]

        def start_requests(self):
            # Send every start URL through Splash so the page's JavaScript runs
            for url in self.start_urls:
                yield scrapy.Request(url, meta={
                    'splash': {
                        'endpoint': 'render.html',
                        'args': {'wait': 0.5}
                    }
                })

        def parse(self, response):
            for question in response.css("div.bix-div-container"):
                # The answer is read from the input whose id starts with "hdnAnswer"
                answer = question.xpath(".//input[starts-with(@id, 'hdnAnswer')]/@value").extract()
                print(answer)
Running this prints the correct answers on the console.
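To make the extraction step concrete, here is a minimal sketch of what the `starts-with(@id, 'hdnAnswer')` selector picks out, written with Python 3's standard-library `html.parser` instead of Scrapy so it runs without Splash. The `SAMPLE_HTML` fragment, the `AnswerCollector` class and the `collect_answers` helper are all hypothetical and exist only for illustration; the real rendered markup may well differ. Unlike the spider, this sketch does not restrict the search to `div.bix-div-container`.

```python
from html.parser import HTMLParser

# Hypothetical fragment of the page *after* JavaScript has run.
# The ids and values are made up for illustration only.
SAMPLE_HTML = """
<div class="bix-div-container">
  <input type="hidden" id="hdnAnswer_1" value="A">
</div>
<div class="bix-div-container">
  <input type="hidden" id="hdnAnswer_2" value="C">
  <input type="hidden" id="other" value="X">
</div>
"""


class AnswerCollector(HTMLParser):
    """Collects the value of every <input> whose id starts with 'hdnAnswer'."""

    def __init__(self):
        super().__init__()
        self.answers = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Same condition as the XPath predicate starts-with(@id, 'hdnAnswer')
        if (attributes.get("id") or "").startswith("hdnAnswer"):
            self.answers.append(attributes.get("value"))


def collect_answers(html):
    """Return the matching input values in document order."""
    parser = AnswerCollector()
    parser.feed(html)
    parser.close()
    return parser.answers


if __name__ == "__main__":
    print(collect_answers(SAMPLE_HTML))
```

The input with id `other` is skipped because its id does not start with `hdnAnswer`, which is the same filtering the XPath expression in the spider performs.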
See also: