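Installing Common Web-Scraping Libraries
1. urllib, re, and requests
urllib and re are part of the Python 3 standard library and need no installation; only requests has to be installed: pip install requests
Verify in the Python interpreter that all three import correctly: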
>>> import requests
>>> requests.get('http://www.baidu.com')
<Response [200]>
>>> import urllib
>>> import urllib.request
>>> urllib.request.urlopen('http://www.baidu.com')
<http.client.HTTPResponse object at 0x0000010E6D8DCC50>
>>> import re
>>>
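The interactive check above can also be written as a small script. This is only a minimal sketch: the helper name is_installed is made up for illustration, and it merely looks a module up without importing it or making any network request.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module_name) is not None


# Modules used in this section; "requests" is the only third-party one.
for name in ("urllib", "re", "requests"):
    status = "OK" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If requests is reported as missing, running pip install requests and then rerunning the script should change its status.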
2. selenium: pip3 install selenium
>>> import selenium
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\common\service.py", line 76, in start
stdin=PIPE)
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
self.service.start()
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\common\service.py", line 83, in start
os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
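The error says that a matching chromedriver must be installed.
Check the browser version in Chrome's Help menu (About Google Chrome), then download the corresponding chromedriver.
The chromedriver version must support the installed Chrome version.
Download: http://npm.taobao.org/mirrors/chromedriver/
Unzip the downloaded chromedriver and put it in the Scripts directory under the Python installation path.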
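Before starting the browser, it may help to confirm that the executable can actually be found on PATH, since that is what the error message complained about. A minimal sketch using the standard library; the helper name find_driver is an assumption for illustration and not part of selenium:

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the executable found on PATH, or None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver not found on PATH")
else:
    print("chromedriver found at", path)
```

If this prints "not found", the Scripts directory is probably not on PATH, or the file was unzipped somewhere else.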
Try again; this time a browser window should open:
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
DevTools listening on ws://127.0.0.1:58437/devtools/browser/9a2a9740-024b-4133-939b-a69ec05d7927
>>> driver.get('http://www.baidu.com')
>>> driver.page_source  # view the page's source code
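Once the page source is available as a string, the re module checked in section 1 can pull simple pieces out of it. The sketch below is illustrative only: extract_title is a made-up helper and the sample HTML is invented; in a live session the string returned by driver.page_source could be passed in instead. A regular expression is not a robust HTML parser, so this is suitable only for quick checks.

```python
import re


def extract_title(html):
    """Return the text inside the first <title> tag, or None if there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


# Invented sample input; replace with driver.page_source in a real session.
sample = "<html><head><title> Example Page </title></head><body></body></html>"
print(extract_title(sample))
```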
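3. PhantomJS: a headless browser (runs quietly in the background with no window)
Download: http://phantomjs.org/download.html
After unzipping, add the directory that contains phantomjs.exe to the PATH environment variable.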
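As an alternative to editing the system settings, PATH can be extended for the current Python process only. A minimal sketch under stated assumptions: prepend_to_path is a hypothetical helper, and the directory shown is a placeholder for wherever phantomjs.exe was unzipped. The change disappears when the process exits.

```python
import os


def prepend_to_path(directory):
    """Put directory at the front of PATH for this process only."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)
    return os.environ["PATH"]


# Hypothetical location; replace with the folder that holds phantomjs.exe.
prepend_to_path(r"C:\tools\phantomjs\bin")
```

For a permanent setting, the directory still has to be added through the operating system's environment variable dialog, as described above.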
4.