Python crawler: simulating a browser with WebDriver

zk仔's blog

Article link: https://blog.csdn.net/weixin_39532362/article/details/87901678

Initialization and configuration

Firefox

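import os

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

# Note: this listing uses the Selenium 3 API. Selenium 4 deprecated the
# firefox_profile and executable_path arguments in favour of Options and Service objects.

def get_driver():
  # Build a fresh profile object
  fp=FirefoxProfile()

  # Alternatively, reuse the profile of the installed browser
  #fp=FirefoxProfile(r'C:\Users\Administrator\AppData\Roaming\Mozilla\Firefox\Profiles\pu72dfl6.default')

  # Where links that request a new window open: 1 = current window; 2 = new window; 3 = new tab
  fp.set_preference("browser.link.open_newwindow", 3)

  # Disable CSS
  fp.set_preference('permissions.default.stylesheet', 2)

  # Disable image loading
  fp.set_preference('permissions.default.image', 2)

  # Disable Flash (a boolean preference, so pass False rather than the string 'false')
  fp.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', False)

  # Download location: 0 = desktop; 1 = default download folder; 2 = custom folder
  fp.set_preference("browser.download.folderList", 2)
  fp.set_preference("browser.download.dir", os.getcwd())

  # Do not show the download manager when a download starts
  fp.set_preference("browser.download.manager.showWhenStarting", False)

  # Save files of these MIME types without showing a prompt
  fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")

  # Pass the profile and the path to the geckodriver executable
  return webdriver.Firefox(firefox_profile=fp, executable_path=r'.\Mozilla Firefox\geckodriver')

url='https://www.baidu.com/'  # example URL (the site used later in this article)
bw=get_driver()
bw.get(url)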
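The download-related preferences above can be collected into a dictionary and applied in one loop. The following is a minimal sketch, not part of the original article: `firefox_download_prefs` and `apply_prefs` are hypothetical helper names, and `target` is assumed to be any object with a `set_preference(name, value)` method, such as the `FirefoxProfile` used above or the Firefox `Options` object that newer Selenium versions prefer.

```python
import os


def firefox_download_prefs(download_dir, mime_types=("text/csv",)):
    """Return the download preferences from the listing above as a dict."""
    return {
        "browser.download.folderList": 2,  # 2 = use browser.download.dir
        "browser.download.dir": os.path.abspath(download_dir),
        "browser.download.manager.showWhenStarting": False,
        # Comma-separated MIME types that are saved without a prompt
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def apply_prefs(target, prefs):
    """Call target.set_preference(name, value) for every entry in prefs."""
    for name, value in prefs.items():
        target.set_preference(name, value)
    return target
```

With a real browser this would be used as `apply_prefs(fp, firefox_download_prefs(os.getcwd()))` before the driver is created. Keeping the preferences in a plain dict makes them easy to log or to share between profiles.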

Setting how new windows open

If setting the preference in code has no effect, edit Selenium's Firefox preference file webdriver_prefs.json.
Path: Python\Lib\site-packages\selenium\webdriver\firefox\

Changing the settings manually

about:config: the address of Firefox's configuration page

Chrome

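from selenium import webdriver

# Create the options object
options = webdriver.ChromeOptions()

# Override the User-Agent header
options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36')

# Directory that holds the browser profile (user data)
options.add_argument('--user-data-dir=./chrome/data')

# Do not load images (2 = block, 1 = allow)
prefs={"profile.managed_default_content_settings.images":2}
options.add_experimental_option("prefs",prefs)

# Run the browser without a visible window
options.add_argument('--headless')
options.add_argument('--disable-gpu')

# Use a proxy (replace ip:port with a real address)
options.add_argument('--proxy-server=http://ip:port')

# Log level (3 keeps only the most severe messages)
options.add_argument('--log-level=3')

# Install a .crx extension at startup (raw string so the backslashes are kept literally)
options.add_extension(r'd:\crx\AdBlock_v2.17.crx')

# Start the browser (Selenium 3 API; chrome_options= is the deprecated spelling of options=)
browser = webdriver.Chrome(options=options,executable_path='./Chrome/chromedriver')

# Maximize the window
browser.maximize_window()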
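Because every switch above is just a string passed to `add_argument`, the optional ones can be assembled by a small function and toggled per run. This is a sketch under stated assumptions: `chrome_arguments` is a hypothetical helper, and the proxy address in the usage note is a placeholder.

```python
def chrome_arguments(headless=True, proxy=None, user_agent=None, log_level=3):
    """Build the list of command-line switches to pass to add_argument()."""
    args = [f"--log-level={log_level}"]
    if headless:
        # Headless mode, plus the GPU switch used in the listing above
        args += ["--headless", "--disable-gpu"]
    if proxy:
        args.append(f"--proxy-server={proxy}")
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args
```

A typical use would be `for arg in chrome_arguments(proxy="http://127.0.0.1:8080"): options.add_argument(arg)`, which leaves the rest of the Chrome setup unchanged.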

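Common functions and attributes

Browser attributes and methods:

bw.get(url): load the given URL
bw.close(): close the current window (bw.quit() closes every window and ends the session)
bw.refresh(): reload the page
bw.current_url: get the current URL
bw.page_source: get the page source
bw.find_element_by_xpath('//*').get_attribute('outerHTML'): get the source of the current DOM
bw.back(): go back
bw.forward(): go forward
bw.switch_to.window(bw.window_handles[0]): switch to a window or tab by its handle
bw.switch_to.frame(str/int/WebElement): switch to an iframe by name or id, by index, or by element
bw.switch_to.parent_frame(): switch to the parent frame
bw.get_cookies(): get all cookies as a list of dicts
bw.delete_all_cookies(): delete all cookies
bw.add_cookie({'name':'zzz','value':'18'}): add a cookie; the dict must contain 'name' and 'value'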
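Since `get_cookies()` returns a list of dicts and `add_cookie()` expects a dict with at least `name` and `value`, two small conversions come up often, for example when handing a logged-in session to an HTTP client. The helpers below are a hypothetical sketch and work on plain Python data, so they do not need a browser.

```python
def cookies_to_dict(cookies):
    """Flatten the list returned by get_cookies() into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def make_cookie(name, value, **extra):
    """Build a dict suitable for add_cookie(); extra may hold path, domain, etc."""
    cookie = {"name": str(name), "value": str(value)}
    cookie.update(extra)
    return cookie
```

For instance, `bw.add_cookie(make_cookie('zzz', 18))` sends the value as the string `'18'`.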

Executing JavaScript

bw.execute_script('window.scrollTo(0, document.body.scrollHeight)'): scroll to the bottom of the page
bw.execute_script('window.stop()'): stop loading the page
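On pages that load more content as you scroll, a single `scrollTo` call is rarely enough. One common approach, sketched here as an assumption rather than the author's method, is to keep scrolling until `document.body.scrollHeight` stops growing. `scroll_to_bottom` is a hypothetical helper; it only needs an object with `execute_script`, so any WebDriver instance fits.

```python
import time


def scroll_to_bottom(driver, pause=0.5, max_rounds=20):
    """Scroll until the page height stops growing; return the number of rounds used."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds
```

The `max_rounds` cap keeps the loop from running forever on pages that scroll without end.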

Dialogs:

bw.switch_to.alert.accept(): accept the dialog
bw.switch_to.alert.dismiss(): dismiss the dialog

Interacting with elements:

ele.clear(): clear the content of an input element
ele.send_keys(str): type text into an input element
ele.click(): click the element

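Reading element content:

ele.id: the internal ID that WebDriver assigns to the element (not the HTML id attribute; use ele.get_attribute('id') for that)
ele.tag_name: get the tag name
ele.location: get the position on the page as a dict with 'x' and 'y'
ele.size: get the rendered size as a dict with 'width' and 'height'
ele.text: get the visible text
ele.get_attribute('innerText'): get the text
ele.get_attribute('class'): get the value of an attribute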
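`ele.location` and `ele.size` are plain dicts, so they combine easily, for example to find the point at the middle of an element before moving the mouse there. A minimal sketch, with `element_center` as a hypothetical helper name:

```python
def element_center(location, size):
    """Return the (x, y) centre of an element from ele.location and ele.size."""
    return (
        location["x"] + size["width"] / 2,
        location["y"] + size["height"] / 2,
    )
```

With a live element this would be called as `element_center(ele.location, ele.size)`.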

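Screenshots

bw.save_screenshot(filename): save a PNG screenshot to filename (a relative name is resolved against the current directory)
bw.get_screenshot_as_file(filename): save a PNG screenshot to the given path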
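When a crawler takes many screenshots, giving each file a timestamped name avoids overwriting earlier ones. The sketch below assumes only that the driver has a `save_screenshot(path)` method returning a truthy value on success; `save_timestamped_screenshot`, the `screenshots` directory, and the `page` prefix are illustrative choices, not part of the original article.

```python
import os
import time


def save_timestamped_screenshot(driver, directory="screenshots", prefix="page"):
    """Save a screenshot as <directory>/<prefix>_<timestamp>.png and return the path."""
    os.makedirs(directory, exist_ok=True)  # the target directory must exist
    filename = f"{prefix}_{time.strftime('%Y%m%d_%H%M%S')}.png"
    path = os.path.join(directory, filename)
    if not driver.save_screenshot(path):
        raise IOError(f"could not write screenshot to {path}")
    return path
```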

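Locating elements

These find_element_by_* helpers belong to the Selenium 3 API and were removed in Selenium 4.3; on newer versions use the By-based calls in the next section.

Find one: returns the first matching element

find_element_by_id(str): by id
find_element_by_class_name(str): by class name
find_element_by_tag_name(str): by tag name
find_element_by_name(str): by name attribute
find_element_by_link_text(str): by the exact text of a link
find_element_by_partial_link_text(str): by part of the text of a link
find_element_by_css_selector(str): by CSS selector
find_element_by_xpath(str): by XPath

Find all: returns a list

find_elements_by_id(str): by id
find_elements_by_class_name(str): by class name
find_elements_by_tag_name(str): by tag name
find_elements_by_name(str): by name attribute
find_elements_by_link_text(str): by the exact text of a link
find_elements_by_partial_link_text(str): by part of the text of a link
find_elements_by_css_selector(str): by CSS selector
find_elements_by_xpath(str): by XPath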

Locating with the By class

from selenium.webdriver.common.by import By

bw.find_element(By.ID, str)
bw.find_element(By.CLASS_NAME, str)
bw.find_element(By.TAG_NAME, str)
bw.find_element(By.NAME, str)
bw.find_element(By.LINK_TEXT, str)
bw.find_element(By.PARTIAL_LINK_TEXT, str)
bw.find_element(By.CSS_SELECTOR, str)
bw.find_element(By.XPATH, str)
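The `By` constants are ordinary strings (for example `By.LINK_TEXT` is `"link text"`), which makes it straightforward to map the suffix of an old `find_element_by_*` name onto a `find_element` call. The following is a sketch of such a shim; `LOCATOR_STRATEGIES` and `find_by` are hypothetical names, and the string values are spelled out so the block runs without importing Selenium.

```python
# Suffix of the old helper name -> locator strategy string (the values of By.*)
LOCATOR_STRATEGIES = {
    "id": "id",
    "class_name": "class name",
    "tag_name": "tag name",
    "name": "name",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "css_selector": "css selector",
    "xpath": "xpath",
}


def find_by(driver, how, value, many=False):
    """Call driver.find_element(s) with the strategy named by `how`."""
    strategy = LOCATOR_STRATEGIES[how]
    finder = driver.find_elements if many else driver.find_element
    return finder(strategy, value)
```

For example, `find_by(bw, "css_selector", ".btn")` does the same job as `bw.find_element(By.CSS_SELECTOR, ".btn")`.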

Action chains

from selenium.webdriver import ActionChains

# The draggable elements live inside an iframe; frame() accepts a name, id, or WebElement
bw.switch_to.frame('iframeResult')
source=bw.find_element_by_css_selector('#draggable')
target=bw.find_element_by_css_selector('#droppable')

# Queue the drag-and-drop, then run the queued actions
actions=ActionChains(bw)
actions.drag_and_drop(source,target)
actions.perform()

Window operations

import time

# Open 5 new windows
for i in range(5):
    bw.execute_script("window.open('https://www.baidu.com/')")

# Handle of the current window
windows = bw.current_window_handle

# Handles of all windows
all_handles = bw.window_handles

# Visit every window except the current one
for handle in all_handles:
    if handle!=windows:
        bw.switch_to.window(handle)
        print(handle)
        time.sleep(2)
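A frequent variation on the loop above is to switch to whichever window has just been opened. One way to sketch it, assuming the caller recorded the handles that existed beforehand, is to pick the first handle that is not in that set. `switch_to_new_window` is a hypothetical helper; it relies only on `driver.window_handles` and `driver.switch_to.window`.

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window whose handle is not in known_handles.

    Returns the new handle, or None when no new window exists.
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    return None
```

In practice this would look like `before = set(bw.window_handles)`, then the click or `window.open` call, then `switch_to_new_window(bw, before)`.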

Keyboard shortcuts

from selenium.webdriver.common.keys import Keys

ele.send_keys(Keys.BACK_SPACE) # Backspace key
ele.send_keys(Keys.SPACE) # Space key
ele.send_keys(Keys.TAB) # Tab key
ele.send_keys(Keys.ESCAPE) # Escape key (Esc)
ele.send_keys(Keys.ENTER) # Enter key
ele.send_keys(Keys.F1) # F1 key

ele.send_keys(Keys.CONTROL,'a') # select all (Ctrl+A)
ele.send_keys(Keys.CONTROL,'c') # copy (Ctrl+C)
ele.send_keys(Keys.CONTROL,'x') # cut (Ctrl+X)
ele.send_keys(Keys.CONTROL,'v') # paste (Ctrl+V)

# Open a new window
bw.execute_script("window.open('')")

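Interrupting a slow page load

from selenium.common import exceptions

# Give up waiting for page loads and async scripts after 5 seconds
bw.set_page_load_timeout(5)
bw.set_script_timeout(5)

try:
    bw.get(url)
except exceptions.TimeoutException:
    # Stop loading and continue with whatever has arrived so far
    bw.execute_script('window.stop()')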

Waiting

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for each condition
wait = WebDriverWait(browser, 10)

key = wait.until(EC.presence_of_element_located((By.ID, 'key')))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn')))
# Any callable that takes the driver can serve as a condition
kw = wait.until(lambda driver:driver.find_element(By.ID, "kw"))

print(key, button, kw)
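To make clear what `wait.until` does with the condition it receives, here is a simplified sketch of a polling loop of the same shape. It is not Selenium's implementation: `wait_until` is a hypothetical function, `LookupError` stands in for the `NoSuchElementException` that `WebDriverWait` ignores by default, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException`.

```python
import time


def wait_until(driver, condition, timeout=10, poll_frequency=0.5):
    """Call condition(driver) repeatedly until it returns a truthy value.

    Raises TimeoutError when the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition(driver)
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep polling
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)
```

The loop explains why a condition may either return a falsy value or raise a "not found" error while the page is still loading: both simply lead to another poll.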

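Commonly used wait conditions

title_is: the title equals the given text
title_contains: the title contains the given text
presence_of_element_located: the element is present in the DOM; takes a locator tuple such as (By.ID, 'p')
visibility_of_element_located: the element is present and visible; takes a locator tuple
visibility_of: the element is visible; takes an element object
presence_of_all_elements_located: at least one matching element is present; returns all matches
text_to_be_present_in_element: the element's text contains the given text
text_to_be_present_in_element_value: the element's value contains the given text
frame_to_be_available_and_switch_to_it: the frame is available, and the driver switches to it
invisibility_of_element_located: the element is invisible or not present
element_to_be_clickable: the element is visible and enabled, so it can be clicked
staleness_of: the element is no longer attached to the DOM; useful for detecting that the page has refreshed
element_to_be_selected: the element is selected; takes an element object
element_located_to_be_selected: the element is selected; takes a locator tuple
element_selection_state_to_be: takes an element object and a state; returns True if they match, otherwise False
element_located_selection_state_to_be: takes a locator tuple and a state; returns True if they match, otherwise False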
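When none of the built-in conditions fits, a custom one can be written as a callable that takes the driver and returns a truthy value once satisfied. The class below is a hypothetical example, not something shipped with Selenium: it waits until at least `count` elements match a locator tuple, and it only assumes the driver has `find_elements(by, value)`.

```python
class element_count_at_least:
    """Condition: at least `count` elements match `locator`.

    Returns the list of matches when satisfied, otherwise False.
    """

    def __init__(self, locator, count):
        self.locator = locator  # e.g. ("css selector", ".item")
        self.count = count

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        return elements if len(elements) >= self.count else False
```

It would be passed to a wait in the same way as the built-in conditions, for example `wait.until(element_count_at_least((By.CSS_SELECTOR, '.item'), 3))`.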

Further reference links

Action chains: http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.action_chains
Expected conditions: http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.support.expected_conditions
Exceptions: http://selenium-python.readthedocs.io/api.html#module-selenium.common.exceptions
