Web Scraping Day 3: Summary

Latest recommended article published on 2024-09-11 10:08:47

奈德丽Zz

Tags: web scraping, chrome, python

Article link: https://blog.csdn.net/m0_55837237/article/details/121150827

I. Selenium basics

Basic usage

Import: from selenium.webdriver import Chrome

from selenium.webdriver import Chrome

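# 1. Create the browser object (the positional chromedriver path is the Selenium 3 form)
b = Chrome('files/chromedriver')

# 2. Open the page
b.get('https://www.qidian.com/')

# 3. Get the page source
print(b.page_source)

# 4. Close the browser window
# b.close()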
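
The snippet above passes the chromedriver path positionally and leaves the closing step commented out. As a hedged sketch of the same open / read / close flow under Selenium 4, the driver path goes through a `Service` object, and a try/finally guarantees the browser is shut down even if loading fails. `make_chrome` and `fetch_html` are hypothetical helper names introduced here, not part of Selenium.

```python
def make_chrome(driver_path="files/chromedriver"):
    """Create a Chrome driver the Selenium 4 way (driver path goes through Service)."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.service import Service

    return Chrome(service=Service(executable_path=driver_path))


def fetch_html(driver, url):
    """Open url, return the page source, and always shut the browser down."""
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the whole browser session; close() only closes the current window.
        driver.quit()


# Usage (needs Chrome and a matching chromedriver):
# html = fetch_html(make_chrome(), "https://www.qidian.com/")
# print(len(html))
```

Passing the driver in as a parameter keeps `fetch_html` independent of how the browser was created, so the same helper works with any configured driver.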

Common configuration

Import: from selenium.webdriver import Chrome, ChromeOptions

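# 1. Create the Chrome options object
options = ChromeOptions()
# 1) Drop the enable-automation switch (hides the "controlled by automated test software" banner)
options.add_experimental_option('excludeSwitches', ['enable-automation'])
# 2) Disable image loading to speed up page loads
options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})

# 2. Create the browser with these options and open a page
b = Chrome('files/chromedriver', options=options)
b.get('https://www.jd.com')
print(b.page_source)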
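
When several scripts share these two settings, it can help to wrap them in one function. The following is a minimal sketch only: `apply_scraping_options` and its keyword arguments are hypothetical names, and the function works on any object that offers `add_experimental_option`, such as a `ChromeOptions` instance.

```python
def apply_scraping_options(options, load_images=False, hide_automation_banner=True):
    """Apply the two settings shown above to a ChromeOptions-like object."""
    if hide_automation_banner:
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
    if not load_images:
        # For this Chrome content setting, 2 means "block".
        options.add_experimental_option(
            "prefs", {"profile.managed_default_content_settings.images": 2}
        )
    return options


# Usage (assumes Selenium is installed):
# from selenium.webdriver import ChromeOptions
# options = apply_scraping_options(ChromeOptions())
```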

Finding and interacting with page elements

Import: from selenium.webdriver import Chrome | from selenium.webdriver.common.keys import Keys

b = Chrome('files/chromedriver')
b.get('https://www.jd.com')

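# 1. Find elements
# (the find_element_by... methods are the Selenium 3 API; Selenium 4.3 and later removed them in favour of find_element(By.X, value))
# browser.find_element_by...    - returns a single element
# browser.find_elements_by...   - returns a list of elements
search = b.find_element_by_id('key')
# b.find_element_by_css_selector('#key')

# 2. Interact with elements
# 1) Text box (input element): type text; '电脑' means "computer"
search.send_keys('电脑')
# Press Enter
search.send_keys(Keys.ENTER)

# 2) Click an element (a button or a hyperlink)
# Find the element to click; the role value is spelled "serachbox" in the original and has to match the page's own markup
search_btn = b.find_element_by_xpath('//div[@role="serachbox"]/button')
# Click it
search_btn.click()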
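
The `find_element_by_...` calls above belong to older Selenium releases. In Selenium 4 the same lookups go through `find_element(by, value)` and `find_elements(by, value)`, where `by` is one of the `By` constants, which are plain strings. The sketch below maps each old method suffix to its locator string; `LEGACY_TO_BY` and `find` are hypothetical names used only for illustration.

```python
# String values of selenium.webdriver.common.by.By, keyed by the old method suffix.
LEGACY_TO_BY = {
    "id": "id",                                # By.ID
    "name": "name",                            # By.NAME
    "xpath": "xpath",                          # By.XPATH
    "css_selector": "css selector",            # By.CSS_SELECTOR
    "class_name": "class name",                # By.CLASS_NAME
    "tag_name": "tag name",                    # By.TAG_NAME
    "link_text": "link text",                  # By.LINK_TEXT
    "partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
}


def find(driver, how, value, many=False):
    """Run the Selenium 4 lookup that corresponds to find_element(s)_by_<how>."""
    by = LEGACY_TO_BY[how]
    if many:
        return driver.find_elements(by, value)  # list of elements, possibly empty
    return driver.find_element(by, value)       # single element


# Usage with a real driver `b`:
# search = find(b, "id", "key")   # same as b.find_element(By.ID, "key")
```

In everyday code it is more common to import `By` and write `b.find_element(By.ID, 'key')` directly; the table is mainly a migration aid.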

Page scrolling

from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

# 1. Open JD, type '电脑' ("computer") into the search box, and press Enter
b = Chrome('files/chromedriver')
b.get('https://www.jd.com')
search_input = b.find_element_by_id('key')
search_input.send_keys('电脑')
search_input.send_keys(Keys.ENTER)
# print(b.page_source)
time.sleep(1)

# 2. Scroll down gradually, 500 px at a time, up to 9000 px
for height in range(500, 9001, 500):
    # Run the JavaScript scroll call: window.scrollTo(x, y)
    b.execute_script(f'window.scrollTo(0, {height})')  # y controls how far down to scroll
    time.sleep(1)
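
The loop above stops at a fixed 9000 px, which only fits pages of roughly that height. A possible variation, offered as a sketch rather than the author's method, re-reads `document.body.scrollHeight` after every step and stops once the scroll position has reached the bottom. `scroll_to_bottom` and its parameters are hypothetical names; `max_scrolls` is a safety cap for pages that keep growing.

```python
import time


def scroll_to_bottom(driver, step=500, pause=1.0, max_scrolls=100):
    """Scroll down in fixed steps until the bottom of the page is reached.

    Returns the list of y offsets that were scrolled to.
    """
    offsets = []
    height = 0
    for _ in range(max_scrolls):
        height += step
        driver.execute_script(f"window.scrollTo(0, {height})")
        offsets.append(height)
        time.sleep(pause)  # give lazily loaded content time to arrive
        # Re-read the page height each round: newly loaded content can make the page taller.
        page_height = driver.execute_script("return document.body.scrollHeight")
        if height >= page_height:
            break
    return offsets


# Usage with a real driver `b`:
# scroll_to_bottom(b, step=500, pause=1.0)
```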

Waiting

from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

b = Chrome('files/chromedriver')
b.get('https://www.jd.com')
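# 1. Implicit wait
# Normally, looking up an element that is not in the page raises an error immediately.
# An implicit wait sets a time limit for lookups: if the element turns up within that time, no error is raised.
b.implicitly_wait(10)     # wait up to 10 seconds; applies to every lookup on this browser object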


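# 2. Explicit wait
# 1) Create a wait object first: WebDriverWait(browser, timeout in seconds)
wait = WebDriverWait(b, 5)
wait2 = WebDriverWait(b, 10)

# 2) Add a condition
# wait.until(condition)      -  waits until the condition holds (raises TimeoutException if the timeout expires first)
# wait.until_not(condition)  -  waits until the condition no longer holds
"""
Common conditions:
EC.presence_of_element_located((By.X, value))    -   the element has been added to the DOM tree (loaded into the page, not necessarily visible); returns the element when the condition holds
EC.visibility_of_element_located((By.X, value))  -   the element is visible (not hidden, and its width and height are both non-zero); returns the element when the condition holds
EC.text_to_be_present_in_element((By.X, value), text)        -  the element's text contains the expected string; returns True when the condition holds
EC.text_to_be_present_in_element_value((By.X, value), text)  -  the element's value attribute contains the expected string; returns True when the condition holds
EC.element_to_be_clickable((By.X, value))        -   the element can be clicked; returns the element when the condition holds
"""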
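# EC.presence_of_element_located((locator strategy, value))
wait.until(EC.presence_of_element_located((By.ID, 'key')))
search_input = b.find_element_by_id('key')

# The content of an input element (text box) is the value of its value attribute.
# Wait up to 10 seconds for the box to contain '电脑' ("computer"); nothing in this script types it, so it is presumably entered by hand.
wait2.until(EC.text_to_be_present_in_element_value((By.ID, 'key'), '电脑'))
search_input.send_keys(Keys.ENTER)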
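
To make the behaviour of `until` more concrete, here is a simplified illustration of the polling idea behind an explicit wait. It is not Selenium's actual implementation: `wait_until` and `WaitTimeout` are hypothetical names, and the real `WebDriverWait` additionally ignores `NoSuchElementException` while polling and raises `TimeoutException`. The 0.5 second default mirrors Selenium's default poll interval.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for Selenium's TimeoutException in this sketch."""


def wait_until(driver, condition, timeout, poll=0.5):
    """Call condition(driver) repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result  # e.g. the located element, or True
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Usage: any callable that takes the driver works as a condition.
# wait_until(b, lambda d: "电脑" in d.title, timeout=10)
```

Because a condition is just a callable that receives the driver, the truthy value it returns is what the wait hands back. In the same way, `wait.until(EC.presence_of_element_located(...))` returns the element itself, so it can be assigned directly instead of being looked up a second time.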