Environment Setup
Installation
pip install selenium
pip install beautifulsoup4
If that doesn't work, try downloading and installing the packages directly into the package folder of the interpreter your editor is running:
pip install --target=E:\AZ\python\ANACONDA\envs\py38\Lib\site-packages beautifulsoup4
pip install --target=E:\AZ\python\ANACONDA\envs\py38\Lib\site-packages selenium
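To confirm both packages are importable from the active environment after installation, a quick stdlib-only check (note that beautifulsoup4 installs under the import name bs4):

```python
import importlib.util

def is_installed(pkg):
    """Return True if the named package can be imported from this environment."""
    return importlib.util.find_spec(pkg) is not None

# beautifulsoup4 installs under the import name 'bs4'
for pkg in ("selenium", "bs4"):
    print(pkg, "OK" if is_installed(pkg) else "MISSING")
```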
Download chromedriver
Message: 'chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
Recommended download address for chromedriver: https://chromedriver.chromium.org
Place the chromedriver.exe file in the same directory as python.exe.
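To find that directory for the interpreter you are actually running (easy to get wrong when several Anaconda environments are installed), print it from Python:

```python
import os
import sys

# sys.executable is the full path of the running python.exe;
# its parent folder is where this tutorial suggests placing chromedriver.exe.
print(os.path.dirname(sys.executable))
```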
The following error means the downloaded ChromeDriver version does not match the installed Chrome; download the 102.0.5005 build instead:
Message: session not created: This version of ChromeDriver only supports Chrome version 103
Current browser version is 102.0.5005.115 with binary path C:\Users\CFFHL\AppData\Local\Google\Chrome\Application\chrome.exe
Stacktrace:
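As the message shows, ChromeDriver's major version must equal Chrome's major version (103 vs. 102 above). A small helper to check a pair of version strings (hypothetical, not part of Selenium):

```python
def majors_match(chrome_version: str, driver_version: str) -> bool:
    """ChromeDriver only supports the Chrome release sharing its major version."""
    return chrome_version.split(".")[0] == driver_version.split(".")[0]

# The pairing from the error message above: driver 103 vs. browser 102
print(majors_match("102.0.5005.115", "103.0.5060.53"))  # → False
```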
Reference code
from bs4 import BeautifulSoup
from selenium import webdriver

target = 'target URL'
option = webdriver.ChromeOptions()
option.add_argument('headless')  # run the browser in the background (headless)
driver = webdriver.Chrome(chrome_options=option)
driver.get(target)
result = driver.find_element_by_class_name('class name to click')
result.click()
result_list = driver.find_elements_by_class_name('class name to click')
for i in range(4, 8):
    result_list[i].click()
selenium_page = driver.page_source
driver.quit()

soup = BeautifulSoup(selenium_page, 'html.parser')
# one = soup.find('div', {'class': 'blah-blah class name'})  # single element
many = soup.find_all('div', {'class': 'blah-blah class name'})  # multiple elements
for i in many:
    content = i.find_all('p')  # find the matching child elements
    nation = content[0].get_text()  # read the text content
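The BeautifulSoup half of the script can be tried without launching a browser. A minimal sketch, assuming beautifulsoup4 is installed as above and using made-up markup in place of driver.page_source:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source
html = """
<div class="city"><p>China</p><p>Beijing</p></div>
<div class="city"><p>France</p><p>Paris</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
many = soup.find_all("div", {"class": "city"})  # multiple elements
for item in many:
    content = item.find_all("p")                # the <p> children of each <div>
    print(content[0].get_text(), "-", content[1].get_text())
```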
Example: analyzing the Baidu homepage
from bs4 import BeautifulSoup
from selenium import webdriver

target = 'https://www.baidu.com/'
option = webdriver.ChromeOptions()
option.add_argument('headless')  # run the browser in the background (headless)
driver = webdriver.Chrome(chrome_options=option)
driver.get(target)
result = driver.find_element_by_class_name('bg s_btn btnhover')
result.click()
result_list = driver.find_elements_by_class_name('bg s_btn btnhover')
for i in range(4, 8):
    result_list[i].click()
selenium_page = driver.page_source
driver.quit()

soup = BeautifulSoup(selenium_page, 'html.parser')
print(soup)
# one = soup.find('div', {'class': 'blah-blah class name'})  # single element
# many = soup.find_all('div', {'class': 'blah-blah class name'})  # multiple elements
# for i in many:
#     content = i.find_all('p')  # find the matching child elements
#     nation = content[0].get_text()  # read the text content
If you see either the correct output or the following error, the environment is set up successfully:
AttributeError: 'WebDriver' object has no attribute 'find_element_by_class_name'
This error appears under Selenium 4, which removed the old find_element_by_* methods. Add:
from selenium.webdriver.common.by import By
and change the element selection to:
driver.find_element(By.XPATH, r'xpath copied from the page').click()
Console debugging
In the browser DevTools console, you can test an XPath with:
$x('xpath copied from the page')
The following error can be ignored:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
requests 2.22.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.9 which is incompatible.
If needed, it can be resolved by reinstalling requests:
pip list
pip uninstall requests
pip install requests