Python Selenium Quick Reference

pendant59

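Category: Python. Tags: selenium, python

Copyright notice: This is an original article by the author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/benpaodelulu_guajian/article/details/106257351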

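Contents

13. Changing the chromedriver webdriver flag to evade bot detection

12. Scrolling a div's scrollbar

11. Counting the elements in a collection

10. Selecting the li at a given position

9. Moving the mouse to a given position

8. Reading hidden elements

7. Common settings

6. Waiting for an element to appear (up to 15 seconds; raises an exception on timeout)

5. Switching browser tabs

4. Selectors

3. Setting the window size

2. Adding cookies in Selenium

1. invalid cookie domain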

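13. Changing the chromedriver webdriver flag to evade bot detection

url = 'https://xxx.xxx.com'
# Load the page
driver.get(url)
# After the page has loaded, run JS to override navigator.webdriver,
# the flag that marks a chromedriver-controlled browser.
# The override only applies to the currently loaded page.
js = 'Object.defineProperty(navigator, "webdriver", {get: () => false});'
driver.execute_script(js)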
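
Because the override above only lasts for the current page, it has to be repeated after every navigation, and the page's own scripts may already have read the flag by then. One possible alternative on Chromium-based drivers is to register the script through the DevTools Protocol command Page.addScriptToEvaluateOnNewDocument, which Selenium exposes via execute_cdp_cmd. The helper name patch_webdriver_flag below is hypothetical, and this is only a sketch that assumes driver is a Chrome WebDriver; it does not guarantee that a site's detection is defeated.

```python
# JS snippet taken from the section above.
WEBDRIVER_PATCH = 'Object.defineProperty(navigator, "webdriver", {get: () => false});'


def patch_webdriver_flag(driver, source=WEBDRIVER_PATCH):
    """Ask the browser to run `source` at the start of every new document.

    Assumes `driver` is a Chromium-based Selenium WebDriver, since
    execute_cdp_cmd is only available there. Call it before driver.get(url).
    """
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
```

Typical use would be to call patch_webdriver_flag(driver) once, right after creating the driver and before the first driver.get(url).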

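12. Scrolling a div's scrollbar

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC

# Excerpt from a larger class; driver is the WebDriver instance
# Start from the height of the child div
top_length = div[k].size['height']
while True:
    print('Scrolling')
    top_length = int(top_length) + 500
    # Use JS to find the parent div that owns the scrollbar and set its scroll position
    js = 'document.getElementsByClassName("game-store__block")[0].scrollTop={}'.format(top_length)
    driver.execute_script(js)
    print(top_length)
    # Stop once the loading indicator is no longer visible
    result = self.is_not_visible(driver, 'body > div > div.game-store__main > div > div.loading-wrapper')
    if result:
        break
    else:
        print('Continuing to scroll')
print('Finished scrolling')


# Instance method
def is_not_visible(self, driver, css_selector, timeout=1):
    """Return True if the element is not visible within `timeout` seconds, else False."""
    try:
        WebDriverWait(driver, int(timeout)).until_not(EC.visibility_of_element_located(("css selector", css_selector)))
        return True
    except TimeoutException:
        return False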
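
The loop above runs forever if the loading indicator never disappears. A bounded variant might look like the sketch below. The function name scroll_container and the parameters is_done, step and max_rounds are made up for illustration; driver is assumed to be a Selenium WebDriver, and the class name is passed to the script through arguments rather than string formatting.

```python
def scroll_container(driver, class_name, is_done, start=0, step=500, max_rounds=50):
    """Scroll the first element with `class_name` down in `step`-pixel increments.

    Stops as soon as `is_done()` returns True, or after `max_rounds` scrolls.
    Returns the last scrollTop value that was set.
    """
    position = start
    for _ in range(max_rounds):
        position += step
        # Extra arguments to execute_script are exposed to the JS as `arguments`.
        driver.execute_script(
            "document.getElementsByClassName(arguments[0])[0].scrollTop = arguments[1];",
            class_name,
            position,
        )
        if is_done():
            break
    return position
```

With the code above, is_done could be something like lambda: self.is_not_visible(driver, selector), so the stop condition stays the same while the number of rounds is capped.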

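11. Counting the elements in a collection

from selenium.webdriver.common.by import By

# Locate the ul
# (the find_element_by_* helpers were removed in Selenium 4.3; find_element(By..., ...) works in Selenium 3 and 4)
ul = driver.find_element(By.CSS_SELECTOR, 'body > .frm_control_group > div > ul')
# Get every li under the ul; note the plural find_elements
lis = ul.find_elements(By.TAG_NAME, 'li')
print(len(lis))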

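10. Selecting the li at a given position

from selenium.webdriver.common.by import By

# Click to open the dropdown
driver.find_element(By.CSS_SELECTOR, '#dropdown_menu > a > i').click()
# Locate the ul. An outer element also works, but the direct parent of the li items is best
ul = driver.find_element(By.CSS_SELECTOR, 'body > .frm_control_group > div > ul')
# Click the li at the given index (li_index is zero-based)
ul.find_elements(By.TAG_NAME, 'li')[li_index].click()


# Count the li items under the ul
li_num = len(ul.find_elements(By.TAG_NAME, 'li'))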
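
Indexing the list directly raises a bare IndexError when the dropdown has fewer items than expected. The two steps above (count, then click) can be combined into one small helper, sketched below. The name click_item is hypothetical; container is assumed to be a Selenium WebElement such as the ul above, and the locator strategy is passed as the plain string "tag name", which is the value behind By.TAG_NAME.

```python
def click_item(container, index, by="tag name", value="li"):
    """Click the child at zero-based `index` and return how many children were found."""
    items = container.find_elements(by, value)
    if not 0 <= index < len(items):
        raise IndexError(
            "index {} out of range: found {} '{}' elements".format(index, len(items), value)
        )
    items[index].click()
    return len(items)
```

For example, li_num = click_item(ul, li_index) clicks the chosen li and gives back the item count in one call.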

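9. Moving the mouse to a given position

from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# By.ID takes the bare id, without a leading '#'
ActionChains(driver).move_to_element(driver.find_element(By.ID, 'tt-link-ul')).perform()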

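8. Reading hidden elements

First check whether the element is hidden:

from selenium.webdriver.common.by import By

element = driver.find_element(By.CSS_SELECTOR, '#table-info > tbody > tr > td')

display = element.is_displayed()

If display is False, the element is hidden. element.text only returns visible text, so for a hidden element it comes back empty; read the content through get_attribute instead:

# Get the inner HTML as a string
element.get_attribute("innerHTML")
# Get the text content
element.get_attribute('textContent')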
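
The check and the fallback above can be folded into one function. This is only a sketch: element_text is a hypothetical name, element is assumed to be a Selenium WebElement, and stripping the whitespace is a choice made here, not something the original requires.

```python
def element_text(element):
    """Return the element's text, whether or not it is currently displayed."""
    if element.is_displayed():
        # Visible element: .text gives the rendered text.
        return element.text
    # Hidden element: fall back to the DOM's textContent.
    return (element.get_attribute("textContent") or "").strip()
```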

7. Common settings

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


bin_path = 'path to the chromedriver executable'

chrome_options = Options()
chrome_options.add_argument('--headless')  # headless mode: no visible browser window
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--no-zygote')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument(
    '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36')

# Selenium 4 style; on Selenium 3 use webdriver.Chrome(bin_path, options=chrome_options)
driver = webdriver.Chrome(service=Service(bin_path), options=chrome_options)

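6. Waiting for an element to appear (up to 15 seconds; raises an exception on timeout)

from selenium.webdriver.support.ui import WebDriverWait
import selenium.webdriver.support.expected_conditions as EC

# Wait up to 15 seconds for #menuBar to become visible; raises TimeoutException if it never does
WebDriverWait(driver, 15).until(EC.visibility_of_element_located(("css selector", '#menuBar')))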
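
Under the hood, an explicit wait is a polling loop: it calls the condition repeatedly (every 0.5 seconds by default in WebDriverWait) until the condition returns something truthy or the timeout expires. The standalone sketch below illustrates that idea without Selenium. It is a simplified stand-in, not Selenium's implementation: wait_until is a hypothetical name, and it raises Python's built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5):
    """Call `condition()` every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll)
```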

5. Switching browser tabs

driver.switch_to.window(driver.window_handles[1])  # switch to the second tab; index 0 is the first
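
Indexing window_handles works in simple cases, but the WebDriver specification does not guarantee the order of the handles. A more defensive approach, sketched here under the assumption that driver is a Selenium WebDriver, is to remember the handles before the click that opens the new tab and then switch to whichever handle is new. The helper name switch_to_new_tab is hypothetical.

```python
def switch_to_new_tab(driver, known_handles):
    """Switch to the first window handle not in `known_handles`.

    Returns the new handle, or None if no new tab has appeared yet.
    """
    known = set(known_handles)
    for handle in driver.window_handles:
        if handle not in known:
            driver.switch_to.window(handle)
            return handle
    return None
```

Usage would be along the lines of: before = driver.window_handles, then perform the click that opens the tab, then switch_to_new_tab(driver, before).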

4. Selectors

from selenium.webdriver.common.by import By

driver.find_element(By.ID, 'login_form')  # by id; no leading '#' needed
driver.find_element(By.CSS_SELECTOR, '#menuBar > li.weui-desktop > ul > li:nth-child(3) > a > span')  # by CSS selector

3. Setting the window size

driver.maximize_window()  # maximize
driver.set_window_size(1920, 1080)  # set an explicit size

2. Adding cookies in Selenium

# driver is the WebDriver instance created in section 7

# Iterate over cookie_lists (which you build yourself); each item is a dict describing one cookie
for cookie_dict in cookie_lists:
    # Example cookie_dict: {'name': 'PHPSESSID', 'value': 'asd21dvyt2cdyt2cdt12ytc21yc'}
    # A dict in which name holds the cookie's key and value holds its value
    driver.add_cookie(cookie_dict)
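
The original leaves the construction of cookie_lists to the reader. One way to build it is to dump the cookies of a logged-in session to a JSON file with driver.get_cookies() and read them back later. The function names save_cookies and load_cookies and the file path are illustrative assumptions, and driver is assumed to be a Selenium WebDriver.

```python
import json


def save_cookies(driver, path):
    """Write the current session's cookies to `path` as JSON; return how many were saved."""
    cookies = driver.get_cookies()  # list of dicts with 'name', 'value', and so on
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cookies, f)
    return len(cookies)


def load_cookies(path):
    """Read back a cookie list previously written by save_cookies."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The list returned by load_cookies(path) can then be used as cookie_lists in the loop above.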

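1. invalid cookie domain

When scraping several kinds of content, I log in once, save the cookies, and reuse them for later requests. Adding the saved cookies directly to Selenium failed with the error invalid cookie domain.

Cause: the driver had opened the browser but had not loaded any page yet when the cookies were added.

Fix: after the driver opens the browser, first visit the site the cookies belong to (usually the login page), and then add the cookies.

In short: visit a page on the site once before adding that site's cookies.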
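
The order of operations described above can be sketched as follows. The helper name restore_session is hypothetical, driver is assumed to be a Selenium WebDriver, and the final refresh is an addition made here so that the page is re-requested with the cookies attached.

```python
def restore_session(driver, site_url, cookies):
    """Load a page on the cookies' site first, then add the cookies and reload."""
    # 1. Visit the site so the browser has a matching domain for the cookies.
    driver.get(site_url)
    # 2. Only now add the saved cookies.
    for cookie in cookies:
        driver.add_cookie(cookie)
    # 3. Reload so the next request is sent with the cookies.
    driver.refresh()
```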
