Day 3: Selenium Basics

m0_62840623

Tags: selenium chrome python

Article link: https://blog.csdn.net/m0_62840623/article/details/121150357

Selenium basic usage

Environment:
from selenium.webdriver import Chrome

1. Create the browser object

b = Chrome('files/chromedriver')

2. Open a page

b.get('https://www.qidian.com/rank/yuepiao/month10/')

3. Get the page data

print(b.page_source)

4. Close the page

b.close()
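
The four steps above pass the driver path as a positional argument, which Selenium 3 accepted. A minimal sketch of the same flow for Selenium 4, assuming that version is installed: the path goes through a Service object, and quit() is called so the driver process ends as well (close() only closes the current window). The helper names fetch_page_source and open_chrome are hypothetical and not part of Selenium.

```python
def fetch_page_source(driver, url):
    """Open url with an already-created driver and return the page HTML.

    The driver is always shut down afterwards, even if loading fails.
    """
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the whole browser session; close() would only
        # close the current window.
        driver.quit()


def open_chrome(driver_path="files/chromedriver"):
    """Create a Chrome driver the Selenium 4 way (assumes Selenium 4 is installed)."""
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.service import Service

    return Chrome(service=Service(executable_path=driver_path))
```

With a real browser this would be called as fetch_page_source(open_chrome(), 'https://www.qidian.com/rank/yuepiao/month10/').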

Common Selenium configuration

Environment:
from selenium.webdriver import Chrome, ChromeOptions
import time

1. Create the Chrome options object

options = ChromeOptions()

1) Turn off the automated-testing flag

options.add_experimental_option('excludeSwitches', ['enable-automation'])

2) Disable image loading to speed up page loads

options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})

2. Create the browser and open a page

b = Chrome('files/chromedriver', options=options)
b.get('https://www.jd.com')
print(b.page_source)

Finding and interacting with page elements

Environment:
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys

# goods = input('Enter the product type to search for: ')
b = Chrome('files/chromedriver')
b.get('https://www.jd.com')

1. Find elements

browser.find_element_by… - returns a single element
browser.find_elements_by… - returns a list whose items are elements

search = b.find_element_by_id('key')
# b.find_element_by_css_selector('#key')

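2. Interact with elements

1) Text input (input tag): type text

search.send_keys('电脑')   # '电脑' means 'computer'
# Press Enter
# search.send_keys(Keys.ENTER)

2) Click an element (a button or a hyperlink)
First get the element to click

# The role value is kept exactly as written; it must match the attribute in the page's HTML
search_btn = b.find_element_by_xpath('//div[@role="serachbox"]/button')

Then click it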

search_btn.click()
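
The find_element_by… helpers used above were removed in later Selenium 4 releases in favor of find_element(by, value). A minimal sketch of how the two forms correspond, assuming Selenium 4: the strategy strings below are the values of By.ID, By.CSS_SELECTOR, By.XPATH and By.CLASS_NAME, while the LEGACY_TO_STRATEGY table and the find helper are hypothetical names used only for illustration.

```python
# Legacy helper name -> locator strategy string (the value of the matching
# selenium.webdriver.common.by.By constant).
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",                      # By.ID
    "find_element_by_css_selector": "css selector",  # By.CSS_SELECTOR
    "find_element_by_xpath": "xpath",                # By.XPATH
    "find_element_by_class_name": "class name",      # By.CLASS_NAME
}


def find(driver, legacy_name, value):
    """Look up one element using the find_element(by, value) form."""
    return driver.find_element(LEGACY_TO_STRATEGY[legacy_name], value)
```

For example, find(b, 'find_element_by_id', 'key') performs the same lookup as b.find_element(By.ID, 'key').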

Exercise: scrape 5 pages of '数据分析' (data analysis) job listings from 51job and extract: job title, salary, company name, company type

from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
import time
from lxml import etree

b = Chrome('files/chromedriver')


def get_html_by_chrome():
    url = 'https://www.51job.com'
    b.get(url)
    search_input = b.find_element_by_id('kwdselectid')
    search_input.send_keys('数据分析')
    search_input.send_keys(Keys.ENTER)

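    # Parse 5 pages, clicking "next page" between them
    for page in range(5):
        print('=======================================================================================\n')
        time.sleep(1)  # give the results a moment to load before reading the page
        # print(b.page_source)
        analysis_data(b.page_source)
        if page < 4:
            next_btn = b.find_element_by_class_name('next')
            next_btn.click()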


def analysis_data(html: str):
    html_node = etree.HTML(html)
    all_job_div = html_node.xpath('//div[@class="j_joblist"]/div[@class="e"]')
    for job_div in all_job_div:
        # Job title
        job_name = job_div.xpath('./a/p[@class="t"]/span[1]/text()')[0]
        # Salary
        try:
            salary = job_div.xpath('./a/p[@class="info"]/span[1]/text()')[0]
        except IndexError:
            salary = '面议'  # 'negotiable'
        # Company name
        company_name = job_div.xpath('./div[@class="er"]/a/text()')[0]

        # Company type
        try:
            company_type = job_div.xpath('./div[@class="er"]/p[@class="int at"]/text()')[0]
        except IndexError:
            company_type = '无'  # 'none'
        print(job_name, salary, company_name, company_type)


if __name__ == '__main__':
    get_html_by_chrome()
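
The two try/except IndexError blocks in analysis_data follow the same pattern: take the first XPath result, or fall back to a default when the result list is empty. A small helper can express that once. This is only a sketch; first_or_default is a hypothetical name, not part of lxml.

```python
def first_or_default(results, default):
    """Return the first item of an XPath result list, or default if it is empty."""
    return results[0] if results else default
```

The salary lookup would then read salary = first_or_default(job_div.xpath('./a/p[@class="info"]/span[1]/text()'), '面议').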

Page scrolling

Environment:
from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

1. Open JD, search for '电脑' (computer), and press Enter

b = Chrome('files/chromedriver')
b.get('https://www.jd.com')
search_input = b.find_element_by_id('key')
search_input.send_keys('电脑')
search_input.send_keys(Keys.ENTER)
# print(b.page_source)
time.sleep(1)

2. Scroll gradually to a target position

height = 0
while True:
    height += 500
    if height > 9000:
        break
    # Run the JS scroll code: window.scrollTo(x, y)
    b.execute_script(f'window.scrollTo(0, {height})')
    time.sleep(1)


# soup = BeautifulSoup(b.page_source, 'lxml')
# all_goods_li = soup.select('#J_goodsList li')
# print(len(all_goods_li))
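
The scrolling loop above can also be written as a reusable function. The sketch below keeps the same step (500) and limit (9000) as defaults; the names scroll_offsets and scroll_gradually are hypothetical, and the driver argument is assumed to be any object with an execute_script method, such as the Chrome object b.

```python
import time


def scroll_offsets(step=500, limit=9000):
    """Return the scroll positions step, 2*step, ... that do not exceed limit."""
    return range(step, limit + 1, step)


def scroll_gradually(driver, step=500, limit=9000, pause=1.0):
    """Scroll down in steps, pausing so lazily loaded items can appear."""
    for height in scroll_offsets(step, limit):
        driver.execute_script(f"window.scrollTo(0, {height})")
        time.sleep(pause)
```

Calling scroll_gradually(b) issues the same sequence of scrollTo calls as the while loop.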

Waits

Environment:
from selenium.webdriver import Chrome
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

b = Chrome('files/chromedriver')
b.get('https://www.jd.com')

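1. Implicit wait

Normally, when you look up an element that is not in the page, the program raises an error immediately.
An implicit wait sets a time limit for lookups: as long as the element is found within that time, no error is raised.

b.implicitly_wait(10)     # wait up to 10 seconds; applies to every lookup on this driver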

2. Explicit wait

1) First create a wait object: WebDriverWait(browser, timeout)

wait = WebDriverWait(b, 5)
wait2 = WebDriverWait(b, 10)

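2) Add a condition
wait_object.until(condition) - waiting ends once the condition holds
wait_object.until_not(condition) - waiting ends once the condition no longer holds

Common conditions:
EC.presence_of_element_located((By.X, value)) - checks whether an element has been added to the DOM tree (loaded into the page, not necessarily visible); returns the element when the condition holds
EC.visibility_of_element_located((By.X, value)) - checks whether an element is visible (not hidden, with width and height both nonzero); returns the element when the condition holds
EC.text_to_be_present_in_element((By.X, value), text) - checks whether the element's text contains the expected string; returns True when the condition holds
EC.text_to_be_present_in_element_value((By.X, value), text) - checks whether the element's value attribute contains the expected string; returns True when the condition holds
EC.element_to_be_clickable((By.X, value)) - checks whether an element can be clicked; returns the element when the condition holds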

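# EC.presence_of_element_located((locator strategy, value))
wait.until(EC.presence_of_element_located((By.ID, 'key')))
search_input = b.find_element_by_id('key')

# The content of an input tag (text box) is the value of its value attribute.
# Nothing in this script types into the box, so this waits up to 10 seconds
# for '电脑' to be entered by hand.
wait2.until(EC.text_to_be_present_in_element_value((By.ID, 'key'), '电脑'))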
search_input.send_keys(Keys.ENTER)
print('===============end==============')
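
The until behavior described above amounts to a polling loop: check the condition, and if it does not hold yet, sleep briefly and try again until the timeout expires. The sketch below shows that idea in plain Python and is not Selenium's implementation. It assumes a condition that takes no arguments and raises the built-in TimeoutError, whereas WebDriverWait passes the driver to the condition and raises its own TimeoutException. The 0.5 second default mirrors WebDriverWait's default poll frequency.

```python
import time


def until(condition, timeout, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

Because the truthy result is returned, a condition that yields an element hands that element straight back to the caller, which is why wait.until(EC.presence_of_element_located(...)) can be assigned to a variable directly.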
