- Create a Scrapy project: run the following commands in a terminal, then open the zhilian project generated on the Desktop in PyCharm.
cd Desktop
scrapy startproject zhilian
cd zhilian
scrapy genspider Zhilian sou.zhilian.com
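
Before opening the project, it can help to confirm that the commands above produced the files the next steps rely on. The following is a minimal sketch, not part of Scrapy: the helper name missing_project_files is hypothetical, and the file list assumes the standard layout created by scrapy startproject and scrapy genspider (scrapy.cfg, a package with settings.py and middlewares.py, and the spider under spiders/).

```python
from pathlib import Path


def missing_project_files(root, project="zhilian", spider="Zhilian"):
    """Return the expected Scrapy files (relative paths) missing under root.

    Hypothetical helper: the expected layout is an assumption based on
    the default startproject / genspider templates.
    """
    root = Path(root)
    expected = [
        root / "scrapy.cfg",
        root / project / "settings.py",
        root / project / "middlewares.py",
        root / project / "spiders" / f"{spider}.py",
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]


if __name__ == "__main__":
    # Assumes the project was generated on the Desktop, as in the commands above.
    project_root = Path.home() / "Desktop" / "zhilian"
    missing = missing_project_files(project_root)
    if missing:
        print("missing files:", missing)
    else:
        print("the project layout looks complete")
```

If middlewares.py shows up as missing, the startproject step most likely ran in a different directory.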
- Add the following code to middlewares.py:
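from scrapy.http.response.html import HtmlResponse


class PhantomjsMiddleware(object):
    def process_request(self, request, spider):
        # Only render pages for the Zhilian spider; other spiders fall
        # through to Scrapy's default downloader.
        if spider.name == 'Zhilian':
            # spider.driver is the Selenium-style browser driver held by the spider.
            spider.driver.get(request.url)
            spider.driver.implicitly_wait(10)
            # Wrap the rendered page in an HtmlResponse for Scrapy.
            response = HtmlResponse(url=spider.driver.current_url,
                                    request=request,
                                    body=spider.driver.page_source,
                                    encoding='utf-8'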