PhantomJS is outdated now that headless Chrome is available. See: the chromedriver-to-Chrome version compatibility table for Selenium
chromedriver Mirror
1 Install chromedriver
wget http://npm.taobao.org/mirrors/chromedriver/2.40/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
mv chromedriver /usr/local/bin/chromedriver
chmod u+x,o+x /usr/local/bin/chromedriver
# Verify that it works:
chromedriver --version
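Since chromedriver has to match the installed Chrome version, it can help to read the version programmatically. Below is a minimal sketch that parses the text printed by chromedriver --version. The function name parse_chromedriver_version is hypothetical, and the sample string (including its build hash) is an illustrative assumption about the output format, not captured output.

```python
import re


def parse_chromedriver_version(output):
    """Return (major, minor) parsed from `chromedriver --version` text, or None."""
    match = re.search(r"ChromeDriver\s+(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative sample in the assumed output format; the hash is a placeholder.
sample = "ChromeDriver 2.40.565383 (illustrative-build-hash)"
print(parse_chromedriver_version(sample))
```

On a real machine the input could come from subprocess.check_output(["chromedriver", "--version"]) decoded to text.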
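2 Install Chrome
Installing chromedriver alone is not enough. Chrome itself must also be installed, otherwise Selenium fails with Message: unknown error: cannot find Chrome binary
Chrome download location
Run the following commands to install it: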
wget https://dl.lancdn.com/landian/software/chrome/m/67.0.3396.79_x86_64.rpm
yum install 67.0.3396.79_x86_64.rpm
Do not use rpm -ivh 67.0.3396.79_x86_64.rpm here; yum resolves and installs the dependencies automatically.
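To catch the "cannot find Chrome binary" error before starting Selenium, one option is to check whether a Chrome executable is on PATH. This is a rough sketch only: the candidate names in CANDIDATES are assumptions about what the package installs, and find_chrome_binary is a hypothetical helper.

```python
import shutil

# Assumed executable names; adjust to whatever the installed package provides.
CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium-browser", "chromium")


def find_chrome_binary(candidates=CANDIDATES, which=shutil.which):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    path = find_chrome_binary()
    print(path or "Chrome binary not found on PATH")
```

The which parameter is only there so the lookup can be swapped out in tests; normal callers can ignore it.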
3 Trying out headless
Some articles prefix the arguments with --, for example --headless, but they also work without the prefix.
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

# Written against the Selenium 3 API (chrome_options / desired_capabilities).
def getWebDriverHeadLess(self, options=None, timeout=360, types='http'):
    # Create the options per call: a ChromeOptions() default argument would be
    # shared between calls and keep accumulating arguments.
    if options is None:
        options = webdriver.ChromeOptions()
    # tell selenium to use the dev channel version of chrome
    # NOTE: only do this if you have a good reason to
    # options.binary_location = '/usr/bin/google-chrome-unstable'  # path to google Chrome bin
    options.add_argument('headless')
    options.add_argument('no-sandbox')
    options.add_argument('window-size=1200,600')  # Chrome expects width,height
    desired_capabilities = options.to_capabilities()
    if types == 'http':
        # fetch an IP from the proxy service
        proxyip = self.ipService.select_rand(types=types)
        if proxyip:
            proxy_url = str(proxyip['ip']) + ':' + str(proxyip['port'])
            proxy = Proxy({
                'proxyType': ProxyType.MANUAL,
                'httpProxy': proxy_url,
            })
            proxy.add_to_capabilities(desired_capabilities)
    elif types == 'https':
        # fetch an IP from the proxy service
        proxyip = self.ipService.select_rand(types=types)
        if proxyip:
            # with proxy
            proxy_url = str(proxyip['ip']) + ':' + str(proxyip['port'])
            proxy = Proxy({
                'proxyType': ProxyType.MANUAL,
                'sslProxy': proxy_url  # the proxy server's CA certificate must be trusted
            })
            proxy.add_to_capabilities(desired_capabilities)
    return webdriver.Chrome(chrome_options=options, desired_capabilities=desired_capabilities)
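The http and https branches above differ only in which proxy key they set. As a possible simplification, the mapping could live in a small pure function. This is a sketch, not the original author's code: proxy_settings is a hypothetical helper, and the sample IP and port are made up.

```python
def proxy_settings(types, proxyip):
    """Map a scheme and a proxy record to the matching Selenium proxy key."""
    key_by_scheme = {"http": "httpProxy", "https": "sslProxy"}
    key = key_by_scheme.get(types)
    if key is None or not proxyip:
        # Unknown scheme or no proxy available: configure nothing.
        return {}
    return {key: "{}:{}".format(proxyip["ip"], proxyip["port"])}


# Made-up proxy record in the same shape the method above reads ('ip', 'port').
print(proxy_settings("https", {"ip": "10.0.0.1", "port": 8080}))
```

The caller would still add 'proxyType': ProxyType.MANUAL to the returned dict before passing it to Proxy.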
4 Restart scrapyd
Whenever the environment changes, scrapyd needs to be restarted, because it holds on to some stale information.
kill -9 `ps -ef | grep scrapyd | grep -v grep | awk '{print $2}'`
/etc/init.d/scrapyd start
scrapyd-deploy -p einfo
curl http://10.101.3.170:6800/schedule.json -d project=einfo -d spider=xxSpider
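The same schedule.json call can be made from Python instead of curl. The sketch below only builds the POST request with the standard library and prints it, so it runs without a reachable scrapyd; build_schedule_request is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider):
    """Build the POST request that curl sends to scrapyd's schedule.json."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    # Supplying a data body makes urllib issue a POST, like curl -d.
    return Request(base_url.rstrip("/") + "/schedule.json", data=data)


req = build_schedule_request("http://10.101.3.170:6800", "einfo", "xxSpider")
print(req.get_method(), req.full_url, req.data)
```

Actually sending it would be urllib.request.urlopen(req), which needs the scrapyd host to be reachable.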
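For completeness, here is the PhantomJS setup as well. It will not be of much use going forward; the main point is making the binary available on the PATH.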
tar -xjvf phantomjs-2.1.1-linux-x86_64.tar.bz2
ln -s /phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/bin/phantomjs
phantomjs -v