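When scraping web pages, Chrome is a commonly used browser, and its driver, chromedriver, is indispensable. Sometimes, though, we have to send our program to other people, even people who know nothing about programming, and then it is hard to explain how they should pick the right chromedriver version themselves. This article presents a Python (3.x) solution for Windows that automatically downloads the chromedriver matching the Chrome installed on the computer.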
Full code
import re
import urllib.request
import winreg
import zipfile

# Chrome major version -> chromedriver 2.x release that supports it.
DriverVersions = {
    '73': '2.46',
    '72': '2.46',
    '71': '2.46',
    '70': '2.45',
    '69': '2.44',
    '68': '2.42',
    '67': '2.41',
    '66': '2.40',
    '65': '2.38',
    '64': '2.37',
    '63': '2.36',
    '62': '2.35',
    '61': '2.34',
    '60': '2.33',
    '59': '2.32',
    '58': '2.31',
    '57': '2.29',
    '56': '2.29',
    '55': '2.28',
    '54': '2.27',
    '53': '2.26',
    '52': '2.24',
    '51': '2.23',
    '50': '2.22',
    '49': '2.22',
    '48': '2.21',
    '47': '2.21',
    '46': '2.21',
    '45': '2.20',
    '44': '2.20',
    '43': '2.20',
    '42': '2.16',
    '41': '2.15',
    '40': '2.15',
    '39': '2.14',
    '38': '2.13',
    '37': '2.12',
    '36': '2.12',
    '35': '2.10',
    '34': '2.10',
    '33': '2.10',
    '32': '2.9',
    '31': '2.9',
    '30': '2.8',
    '29': '2.7'
}

def unzip_single(src_file, dest_dir, password=None):
    # Extract every file in src_file into dest_dir.
    if password:
        password = password.encode()
    # The with-block closes the archive even if extraction fails.
    with zipfile.ZipFile(src_file) as zf:
        try:
            zf.extractall(path=dest_dir, pwd=password)
        except RuntimeError as e:
            raise OSError('An exception occurred while extracting the zip file. ') from e

# Read Chrome's version from the registry. This key is written by a system-wide
# install; a per-user install may not be found here.
ChromeKey = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, 'SOFTWARE\\WOW6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Google Chrome')
FullChromeVersion = winreg.QueryValueEx(ChromeKey, 'DisplayVersion')[0]
winreg.CloseKey(ChromeKey)
ChromeVersion = int(FullChromeVersion.split('.')[0])  # major version, e.g. 76
print('Chrome version: ' + FullChromeVersion)
if ChromeVersion <= 73:
    # Chrome 73 and older: the driver version comes from the table above.
    if str(ChromeVersion) not in DriverVersions:
        raise KeyError("There isn't a chromedriver that supports your Chrome version. ")
    try:
        urllib.request.urlretrieve('https://npm.taobao.org/mirrors/chromedriver/' + DriverVersions[str(ChromeVersion)] + '/chromedriver_win32.zip', 'chromedriver_win32.zip')
    except Exception as e:
        print("Can't connect to the server! ")
        raise ConnectionError("Can't connect to the server") from e
    else:
        print('Extracting file... ')
        unzip_single('chromedriver_win32.zip', '.')
        print('Downloaded successfully. ')
else:
    # Chrome 74 and newer: the driver shares Chrome's major version, so look
    # up the available builds on the mirror's index page.
    AvailableVersions = {}
    try:
        urlRead = urllib.request.urlopen('https://npm.taobao.org/mirrors/chromedriver/').read().decode()
    except Exception as e:
        print("Can't connect to the server! ")
        raise ConnectionError("Can't connect to the server") from e
    else:
        for i in re.findall('<a href="/mirrors/chromedriver/(.*?)</a>', urlRead):
            # A match looks like '76.0.3809.126/">76.0.3809.126/'; keep the folder name.
            version = i.split('/">')[0]
            # Skip icons/, index.html, the LATEST_RELEASE files and anything <= 73.
            if not version[0].isdigit() or 'RELEASE' in version or int(version.split('.')[0]) <= 73:
                continue
            major = version.split('.')[0]
            # The index is sorted as text (76.0.3809.12 comes before 76.0.3809.126),
            # so compare numerically and keep the newest build of each major version.
            if major not in AvailableVersions or [int(x) for x in version.split('.')] > [int(x) for x in AvailableVersions[major].split('.')]:
                AvailableVersions[major] = version
        if str(ChromeVersion) not in AvailableVersions:
            raise KeyError("There isn't a chromedriver that supports your Chrome version. ")
        DownloadURL = 'https://npm.taobao.org/mirrors/chromedriver/' + AvailableVersions[str(ChromeVersion)] + '/chromedriver_win32.zip'
        try:
            print('Downloading... \nURL:' + DownloadURL)
            urllib.request.urlretrieve(DownloadURL, 'chromedriver_win32.zip')
        except Exception as e:
            print('Download failed: ' + str(e))
        else:
            print('Extracting file... ')
            unzip_single('chromedriver_win32.zip', '.')
            print('Downloaded successfully. ')
Analysis
First, DriverVersions is a dictionary that maps Chrome versions 29 to 73 to their corresponding chromedriver versions.
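Next we need Chrome's version number.
We know that the registry key HKLM\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\App Paths stores the install locations of many programs, and unsurprisingly Chrome is among them, so reading it tells us where Chrome is installed. Looking further, we were lucky to find Chrome under the Uninstall key as well: the key HKLM\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome has a value named DisplayVersion, which holds Chrome's version number. This is the value the code reads.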
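As a sketch, the registry lookup can be wrapped in a function that also tries a second location. The HKEY_CURRENT_USER path below is an assumption about where a per-user Chrome install registers itself, not something the article checked, and read_chrome_version and parse_major are hypothetical helper names.

```python
import sys


def parse_major(display_version: str) -> int:
    """Return the major version, e.g. 76 for '76.0.3809.132'."""
    return int(display_version.split('.')[0])


def read_chrome_version() -> str:
    """Read Chrome's DisplayVersion from the registry (Windows only).

    Tries the system-wide key used in the article first, then a per-user
    key under HKCU (an assumption, not verified by the article).
    """
    if sys.platform != 'win32':
        raise OSError('Reading the registry requires Windows')
    import winreg  # standard library, but only available on Windows

    candidates = [
        (winreg.HKEY_LOCAL_MACHINE,
         r'SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome'),
        # Assumed location for a per-user install.
        (winreg.HKEY_CURRENT_USER,
         r'Software\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome'),
    ]
    for root, path in candidates:
        try:
            with winreg.OpenKey(root, path) as key:
                return winreg.QueryValueEx(key, 'DisplayVersion')[0]
        except FileNotFoundError:
            continue  # key or value missing, try the next location
    raise FileNotFoundError('Chrome was not found in the registry')
```

On Windows, parse_major(read_chrome_version()) gives the major version that the rest of the script branches on.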
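With the version in hand, the next step is finding the matching chromedriver. The dictionary above covers Chrome 29 to 73 (the version here is the first component of the version number, i.e. the major version). For Chrome 29 to 73 we simply read the driver version from the dict. For anything above 73, chromedriver uses the same major version as Chrome, and the exact second, third and fourth components have to be looked up on the web.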
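The decision described above fits in one small function. This is a minimal sketch: legacy_driver_for is a hypothetical name, and LEGACY_DRIVERS holds only a few rows copied from the DriverVersions table for illustration.

```python
from typing import Optional

# A few rows copied from the article's DriverVersions table (illustration only).
LEGACY_DRIVERS = {'73': '2.46', '72': '2.46', '70': '2.45', '29': '2.7'}


def legacy_driver_for(chrome_major: int, table: dict) -> Optional[str]:
    """Return the 2.x driver for Chrome 73 and older, or None for Chrome 74+.

    None means the driver shares Chrome's major version and the exact
    build still has to be looked up on the mirror.
    """
    if chrome_major >= 74:
        return None
    try:
        return table[str(chrome_major)]
    except KeyError:
        raise KeyError(f'No chromedriver supports Chrome {chrome_major}') from None


print(legacy_driver_for(73, LEGACY_DRIVERS))  # 2.46
print(legacy_driver_for(76, LEGACY_DRIVERS))  # None
```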
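To look them up, fetch the mirror site https://npm.taobao.org/mirrors/chromedriver/. Every folder there is named after a version number, so the version numbers can be read straight from the page: extract them with a regular expression, then filter them in Python.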
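Fetching the index page can also be isolated in a function with a timeout and a narrower exception than a bare except. This is only a sketch. fetch_index is a hypothetical name, and the 10-second timeout is an arbitrary choice.

```python
import urllib.request

MIRROR_URL = 'https://npm.taobao.org/mirrors/chromedriver/'  # URL used by the article


def fetch_index(url: str = MIRROR_URL, timeout: float = 10.0) -> str:
    """Download the mirror's index page and return it as text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode('utf-8')  # the page declares charset utf-8
    except OSError as e:
        # urllib.error.URLError and socket timeouts are both OSError subclasses,
        # so programming errors (e.g. a TypeError) are no longer swallowed.
        raise ConnectionError(f"Can't connect to {url}") from e
```

Because it goes through urllib.request.urlopen, the function also accepts data: and file: URLs, which makes it possible to try it without network access.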
Here is an excerpt of what the site returns:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>ChromeDriver Mirror</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Bootstrap -->
<link href="https://cdn.staticfile.org/twitter-bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet" media="screen">
<style>
#fork{position:fixed;top:0;right:0;_position:absolute;z-index: 10000;}
.bottom{margin: 20px auto; width: 100%; text-align: center;}
.container{width: 1080px; margin: 50px auto;}
</style>
<head>
<body>
<a href="https://github.com/cnpm/cnpmjs.org" id="fork" target="_blank">
<img alt="Fork me on GitHub" src="//s3.amazonaws.com/github/ribbons/forkme_right_darkblue_121621.png">
</a>
<div class="container">
<h1>Mirror index of <a target="_blank" href="http://chromedriver.storage.googleapis.com/">http://chromedriver.storage.googleapis.com/</a></h1>
<hr>
<pre><a href="../">../</a>
<a href="/mirrors/chromedriver/2.0/">2.0/</a> 2013-09-25T22:57:39.349Z -
<a href="/mirrors/chromedriver/2.1/">2.1/</a> 2013-09-25T22:57:49.481Z -
<a href="/mirrors/chromedriver/2.10/">2.10/</a> 2014-05-01T20:46:22.843Z -
<a href="/mirrors/chromedriver/2.11/">2.11/</a> 2014-10-08T01:17:17.918Z -
<a href="/mirrors/chromedriver/2.12/">2.12/</a> 2014-10-27T09:27:24.626Z -
<a href="/mirrors/chromedriver/2.13/">2.13/</a> 2014-12-10T13:17:59.776Z -
<a href="/mirrors/chromedriver/2.14/">2.14/</a> 2015-01-28T09:29:27.341Z -
<a href="/mirrors/chromedriver/2.15/">2.15/</a> 2015-03-26T22:08:19.898Z -
<a href="/mirrors/chromedriver/2.16/">2.16/</a> 2015-06-08T12:30:55.879Z -
<a href="/mirrors/chromedriver/2.17/">2.17/</a> 2015-07-30T22:11:44.809Z -
<a href="/mirrors/chromedriver/2.18/">2.18/</a> 2015-08-19T03:48:06.740Z -
<a href="/mirrors/chromedriver/2.19/">2.19/</a> 2015-08-28T06:09:42.121Z -
<a href="/mirrors/chromedriver/2.2/">2.2/</a> 2013-09-25T22:57:58.374Z -
<a href="/mirrors/chromedriver/2.20/">2.20/</a> 2015-10-08T23:22:48.789Z -
<a href="/mirrors/chromedriver/2.21/">2.21/</a> 2016-01-26T06:47:39.216Z -
<a href="/mirrors/chromedriver/2.22/">2.22/</a> 2016-06-04T19:54:50.312Z -
<a href="/mirrors/chromedriver/2.23/">2.23/</a> 2016-08-04T19:02:02.309Z -
<a href="/mirrors/chromedriver/2.24/">2.24/</a> 2016-09-09T00:57:14.652Z -
<a href="/mirrors/chromedriver/2.25/">2.25/</a> 2016-10-22T02:16:44.584Z -
<a href="/mirrors/chromedriver/2.26/">2.26/</a> 2016-12-05T23:24:16.587Z -
<a href="/mirrors/chromedriver/2.27/">2.27/</a> 2016-12-21T23:07:03.291Z -
<a href="/mirrors/chromedriver/2.28/">2.28/</a> 2017-03-08T22:53:11.244Z -
<a href="/mirrors/chromedriver/2.29/">2.29/</a> 2017-04-04T01:21:21.907Z -
<a href="/mirrors/chromedriver/2.3/">2.3/</a> 2013-09-25T22:58:07.947Z -
<a href="/mirrors/chromedriver/2.30/">2.30/</a> 2017-06-07T22:53:24.655Z -
<a href="/mirrors/chromedriver/2.31/">2.31/</a> 2017-07-22T01:08:24.087Z -
<a href="/mirrors/chromedriver/2.32/">2.32/</a> 2017-08-30T20:07:04.354Z -
<a href="/mirrors/chromedriver/2.33/">2.33/</a> 2017-10-03T21:09:52.970Z -
<a href="/mirrors/chromedriver/2.34/">2.34/</a> 2017-12-10T03:28:46.062Z -
<a href="/mirrors/chromedriver/2.35/">2.35/</a> 2018-01-10T02:35:57.501Z -
<a href="/mirrors/chromedriver/2.36/">2.36/</a> 2018-03-02T09:17:32.016Z -
<a href="/mirrors/chromedriver/2.37/">2.37/</a> 2018-03-16T06:19:07.262Z -
<a href="/mirrors/chromedriver/2.38/">2.38/</a> 2018-04-17T20:19:14.328Z -
<a href="/mirrors/chromedriver/2.39/">2.39/</a> 2018-05-30T06:19:55.386Z -
<a href="/mirrors/chromedriver/2.4/">2.4/</a> 2013-10-01T05:42:36.371Z -
<a href="/mirrors/chromedriver/2.40/">2.40/</a> 2018-06-07T23:44:20.210Z -
<a href="/mirrors/chromedriver/2.41/">2.41/</a> 2018-07-27T19:25:01.951Z -
<a href="/mirrors/chromedriver/2.42/">2.42/</a> 2018-09-13T18:14:11.882Z -
<a href="/mirrors/chromedriver/2.43/">2.43/</a> 2018-10-17T02:46:13.125Z -
<a href="/mirrors/chromedriver/2.44/">2.44/</a> 2018-11-20T00:32:52.802Z -
<a href="/mirrors/chromedriver/2.45/">2.45/</a> 2018-12-10T23:20:22.017Z -
<a href="/mirrors/chromedriver/2.46/">2.46/</a> 2019-02-01T19:22:24.040Z -
<a href="/mirrors/chromedriver/2.5/">2.5/</a> 2013-11-01T18:01:58.116Z -
<a href="/mirrors/chromedriver/2.6/">2.6/</a> 2013-11-05T07:13:23.018Z -
<a href="/mirrors/chromedriver/2.7/">2.7/</a> 2013-11-22T23:02:01.944Z -
<a href="/mirrors/chromedriver/2.8/">2.8/</a> 2013-12-16T23:41:09.841Z -
<a href="/mirrors/chromedriver/2.9/">2.9/</a> 2014-02-03T09:11:50.536Z -
<a href="/mirrors/chromedriver/70.0.3538.16/">70.0.3538.16/</a> 2018-09-17T20:50:43.843Z -
<a href="/mirrors/chromedriver/70.0.3538.67/">70.0.3538.67/</a> 2018-10-17T16:02:03.103Z -
<a href="/mirrors/chromedriver/70.0.3538.97/">70.0.3538.97/</a> 2018-11-06T07:19:03.877Z -
<a href="/mirrors/chromedriver/71.0.3578.137/">71.0.3578.137/</a> 2019-01-21T19:35:39.578Z -
<a href="/mirrors/chromedriver/71.0.3578.30/">71.0.3578.30/</a> 2018-11-01T21:02:45.154Z -
<a href="/mirrors/chromedriver/71.0.3578.33/">71.0.3578.33/</a> 2018-11-02T15:53:57.452Z -
<a href="/mirrors/chromedriver/71.0.3578.80/">71.0.3578.80/</a> 2018-12-11T19:10:42.607Z -
<a href="/mirrors/chromedriver/72.0.3626.69/">72.0.3626.69/</a> 2019-01-22T07:21:41.137Z -
<a href="/mirrors/chromedriver/72.0.3626.7/">72.0.3626.7/</a> 2018-12-11T19:09:45.570Z -
<a href="/mirrors/chromedriver/73.0.3683.20/">73.0.3683.20/</a> 2019-02-06T19:24:05.478Z -
<a href="/mirrors/chromedriver/73.0.3683.68/">73.0.3683.68/</a> 2019-03-07T22:34:54.837Z -
<a href="/mirrors/chromedriver/74.0.3729.6/">74.0.3729.6/</a> 2019-03-12T19:25:26.063Z -
<a href="/mirrors/chromedriver/75.0.3770.140/">75.0.3770.140/</a> 2019-07-12T18:06:25.447Z -
<a href="/mirrors/chromedriver/75.0.3770.8/">75.0.3770.8/</a> 2019-04-30T00:02:57.641Z -
<a href="/mirrors/chromedriver/75.0.3770.90/">75.0.3770.90/</a> 2019-06-13T21:21:15.477Z -
<a href="/mirrors/chromedriver/76.0.3809.12/">76.0.3809.12/</a> 2019-06-07T16:19:42.400Z -
<a href="/mirrors/chromedriver/76.0.3809.126/">76.0.3809.126/</a> 2019-08-20T18:01:27.496Z -
<a href="/mirrors/chromedriver/76.0.3809.25/">76.0.3809.25/</a> 2019-06-13T21:24:59.874Z -
<a href="/mirrors/chromedriver/76.0.3809.68/">76.0.3809.68/</a> 2019-07-16T17:09:55.657Z -
<a href="/mirrors/chromedriver/77.0.3865.10/">77.0.3865.10/</a> 2019-08-06T18:45:26.553Z -
<a href="/mirrors/chromedriver/77.0.3865.40/">77.0.3865.40/</a> 2019-08-20T18:02:46.906Z -
<a href="/mirrors/chromedriver/78.0.3904.105/">78.0.3904.105/</a> 2019-11-18T18:20:40.686Z -
<a href="/mirrors/chromedriver/78.0.3904.11/">78.0.3904.11/</a> 2019-09-12T16:45:50.292Z -
<a href="/mirrors/chromedriver/78.0.3904.70/">78.0.3904.70/</a> 2019-10-21T20:40:07.509Z -
<a href="/mirrors/chromedriver/79.0.3945.16/">79.0.3945.16/</a> 2019-10-30T16:10:56.644Z -
<a href="/mirrors/chromedriver/79.0.3945.36/">79.0.3945.36/</a> 2019-11-18T18:20:03.409Z -
<a href="/mirrors/chromedriver/80.0.3987.106/">80.0.3987.106/</a> 2020-02-13T19:21:31.091Z -
<a href="/mirrors/chromedriver/80.0.3987.16/">80.0.3987.16/</a> 2019-12-19T17:39:26.425Z -
<a href="/mirrors/chromedriver/81.0.4044.20/">81.0.4044.20/</a> 2020-02-13T19:11:47.807Z -
<a href="/mirrors/chromedriver/81.0.4044.69/">81.0.4044.69/</a> 2020-03-17T16:16:51.579Z -
<a href="/mirrors/chromedriver/83.0.4103.14/">83.0.4103.14/</a> 2020-04-16T19:48:28.068Z -
<a href="/mirrors/chromedriver/icons/">icons/</a> 2013-09-25T17:42:04.972Z -
<a href="/mirrors/chromedriver/70.0.3538.LATEST_RELEASE">70.0.3538.LATEST_RELEASE</a> 2018-09-19T22:24:28.963Z 12(12B)
<a href="/mirrors/chromedriver/index.html">index.html</a> 2013-09-25T16:59:18.911Z 10574(10.33kB)
<a href="/mirrors/chromedriver/LATEST_RELEASE">LATEST_RELEASE</a> 2020-04-08T15:52:54.589Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_70">LATEST_RELEASE_70</a> 2019-02-21T05:37:43.183Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_70.0.3538">LATEST_RELEASE_70.0.3538</a> 2018-11-06T07:19:08.413Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_71">LATEST_RELEASE_71</a> 2019-02-21T05:37:29.970Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_71.0.3578">LATEST_RELEASE_71.0.3578</a> 2019-01-21T19:35:43.788Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_72">LATEST_RELEASE_72</a> 2019-02-21T05:37:17.996Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_72.0.3626">LATEST_RELEASE_72.0.3626</a> 2019-01-22T07:21:45.396Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_73">LATEST_RELEASE_73</a> 2019-03-12T16:05:59.036Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_73.0.3683">LATEST_RELEASE_73.0.3683</a> 2019-03-07T22:34:59.301Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_74">LATEST_RELEASE_74</a> 2019-03-12T19:25:31.583Z 11(11B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_74.0.3729">LATEST_RELEASE_74.0.3729</a> 2019-03-12T19:25:30.367Z 11(11B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_75">LATEST_RELEASE_75</a> 2019-07-12T18:06:31.115Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_75.0.3770">LATEST_RELEASE_75.0.3770</a> 2019-07-12T18:06:29.734Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_76">LATEST_RELEASE_76</a> 2019-08-20T18:01:32.838Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_76.0.3809">LATEST_RELEASE_76.0.3809</a> 2019-08-20T18:01:31.671Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_77">LATEST_RELEASE_77</a> 2019-08-20T18:02:52.200Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_77.0.3865">LATEST_RELEASE_77.0.3865</a> 2019-08-20T18:02:50.947Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_78">LATEST_RELEASE_78</a> 2019-11-18T18:20:46.724Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_78.0.3904">LATEST_RELEASE_78.0.3904</a> 2019-11-18T18:20:45.336Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_79">LATEST_RELEASE_79</a> 2019-11-18T18:20:09.561Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_79.0.3945">LATEST_RELEASE_79.0.3945</a> 2019-11-18T18:20:08.321Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_80">LATEST_RELEASE_80</a> 2020-02-13T19:34:11.419Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_80.0.3987">LATEST_RELEASE_80.0.3987</a> 2020-02-13T19:33:45.571Z 13(13B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_81">LATEST_RELEASE_81</a> 2020-03-17T16:16:57.283Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_81.0.4044">LATEST_RELEASE_81.0.4044</a> 2020-03-17T16:16:55.944Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_83">LATEST_RELEASE_83</a> 2020-04-16T19:48:33.848Z 12(12B)
<a href="/mirrors/chromedriver/LATEST_RELEASE_83.0.4103">LATEST_RELEASE_83.0.4103</a> 2020-04-16T19:48:32.557Z 12(12B)
</pre>
<hr>
</div>
<hr/>
<div class="bottom">
Copyright © <a href="https://github.com/cnpm" target="_blank">cnpm</a>
<a href="/">Home</a>
</div>
</body>
</html>
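It is easy to see that the regular expression '<a href="/mirrors/chromedriver/(.*?)</a>' extracts every link. A little Python then keeps only the links for version 74 and above, which gives us all the available chromedrivers. If there is still no matching version, all we can do is raise an error.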
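The sketch below is a variant of that filtering step with two changes. It swaps the article's catch-all regex for one that only matches numeric version folders. It also keeps the newest build for each major version rather than the first one listed, because the index sorts folder names as text and so lists 76.0.3809.12 before 76.0.3809.126. newest_per_major and VERSION_DIR_RE are hypothetical names, and SAMPLE is a few lines copied from the excerpt above.

```python
import re

# Matches numeric folder links such as '/mirrors/chromedriver/76.0.3809.126/">'.
# icons/, index.html and the LATEST_RELEASE* files do not match.
VERSION_DIR_RE = re.compile(r'<a href="/mirrors/chromedriver/(\d+(?:\.\d+)+)/">')


def newest_per_major(html: str, min_major: int = 74) -> dict:
    """Map each Chrome major version >= min_major to its newest driver folder."""
    best = {}  # major -> (numeric tuple, folder name)
    for folder in VERSION_DIR_RE.findall(html):
        parts = tuple(int(p) for p in folder.split('.'))
        if parts[0] < min_major:
            continue  # 2.x folders and Chrome <= 73 are handled by the table
        # Compare numerically so that .126 beats .12 and .25.
        if parts[0] not in best or parts > best[parts[0]][0]:
            best[parts[0]] = (parts, folder)
    return {major: folder for major, (_, folder) in best.items()}


SAMPLE = '''
<a href="/mirrors/chromedriver/2.46/">2.46/</a> 2019-02-01T19:22:24.040Z -
<a href="/mirrors/chromedriver/76.0.3809.12/">76.0.3809.12/</a> 2019-06-07T16:19:42.400Z -
<a href="/mirrors/chromedriver/76.0.3809.126/">76.0.3809.126/</a> 2019-08-20T18:01:27.496Z -
<a href="/mirrors/chromedriver/76.0.3809.25/">76.0.3809.25/</a> 2019-06-13T21:24:59.874Z -
<a href="/mirrors/chromedriver/83.0.4103.14/">83.0.4103.14/</a> 2020-04-16T19:48:28.068Z -
<a href="/mirrors/chromedriver/icons/">icons/</a> 2013-09-25T17:42:04.972Z -
<a href="/mirrors/chromedriver/LATEST_RELEASE_76">LATEST_RELEASE_76</a> 2019-08-20T18:01:32.838Z 13(13B)
'''

print(newest_per_major(SAMPLE))  # {76: '76.0.3809.126', 83: '83.0.4103.14'}
```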
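Once we know the right version, we can download it. Here the download is done with the urllib library. You can change the download location, but then four places in the code must be edited (every occurrence of 'chromedriver_win32.zip' used as the local file name).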
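To avoid editing the file name in several places, the download path can be a single parameter. This is a minimal sketch. driver_zip_url, download_driver and MIRROR_URL are hypothetical names, and the mirror URL is simply the one the article uses.

```python
import os
import urllib.request

MIRROR_URL = 'https://npm.taobao.org/mirrors/chromedriver/'  # mirror used by the article


def driver_zip_url(driver_version: str, base_url: str = MIRROR_URL) -> str:
    """Build the download URL of the Windows zip for one driver version."""
    return f"{base_url.rstrip('/')}/{driver_version}/chromedriver_win32.zip"


def download_driver(driver_version: str, zip_path: str = 'chromedriver_win32.zip',
                    base_url: str = MIRROR_URL) -> str:
    """Download the zip to zip_path (the only place the path is named) and return it."""
    parent = os.path.dirname(zip_path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # create the target folder if needed
    path, _headers = urllib.request.urlretrieve(driver_zip_url(driver_version, base_url), zip_path)
    return path
```

For example, download_driver('2.46', zip_path='drivers/chromedriver_win32.zip') would save the archive in a drivers folder, creating the folder first if necessary.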
The download is a .zip file that still has to be extracted, which is done with the zipfile library. Once it is extracted, the download is complete!
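If only the executable is needed, one option is to extract just that member and delete the archive afterwards. This sketch assumes the zip contains a file named chromedriver.exe (or chromedriver), and extract_driver is a hypothetical name.

```python
import os
import zipfile


def extract_driver(zip_path: str, dest_dir: str = '.', remove_zip: bool = True) -> str:
    """Extract only the chromedriver executable and return its path.

    Assumes the archive contains a member named 'chromedriver.exe' or 'chromedriver'.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist()
                 if os.path.basename(n) in ('chromedriver.exe', 'chromedriver')]
        if not names:
            raise FileNotFoundError(f'No chromedriver executable in {zip_path}')
        extracted = zf.extract(names[0], path=dest_dir)
    # The archive is closed at this point, so it can be deleted safely.
    if remove_zip:
        os.remove(zip_path)
    return extracted
```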
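If the formatting comes out wrong, I have uploaded the source code to the blog's resources section. Feel free to download it and send feedback (though please spare me comments on my coding style).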