Python Selenium Crawler: An Automated Login Example

think12

Last modified 2023-07-24 21:44:55

Tags: python, selenium, crawler

First published 2023-07-24 21:43:35

Article link: https://blog.csdn.net/think12/article/details/131905638

1. Overview

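First install the selenium library with the command pip install selenium. Selenium lets a program imitate what a person does in a browser, such as clicking elements on a page. To do that it needs a browser driver (I use Google Chrome here).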
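To confirm that the install worked before going further, a small check like the following can help. It is only a sketch: the helper name is_installed is made up for this example, and it relies on nothing but the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be imported."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("selenium"):
        import selenium
        print("selenium version:", selenium.__version__)
    else:
        print("selenium is not installed; run: pip install selenium")
```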
2. Installing the driver
Check the browser version

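First, check which version of the browser you have. In Chrome, click the three dots in the upper-right corner, then Help, then About Google Chrome.

The page shows the browser version; mine is 100.0.4896.127 (Official Build).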

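Download the driver

Open the page: CNPM Binaries Mirror

Find the entry for 100.0.4896.127. If the trailing minor version number differs slightly from your browser's, that can be ignored; it is enough that the major version is the same.

Click into it and find the Windows build. Note: the Windows build is only offered as 32-bit, not 64-bit (the 32-bit driver also works on 64-bit Windows).

After downloading, unzip the archive; it contains a chromedriver.exe file.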
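The "same major version" rule can be written down as a tiny check. This is an illustrative sketch, not part of the original script; the function names and the driver version strings below are examples chosen for the demonstration.

```python
def major_version(version):
    """Return the major version (the part before the first dot) as an int."""
    return int(version.strip().split(".")[0])


def driver_matches_browser(browser_version, driver_version):
    """True when browser and driver share the same major version."""
    return major_version(browser_version) == major_version(driver_version)


# Same major version (100), different minor build: acceptable.
print(driver_matches_browser("100.0.4896.127", "100.0.4896.60"))
# Different major version (99 vs 100): not acceptable.
print(driver_matches_browser("100.0.4896.127", "99.0.4844.51"))
```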

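Find your Python installation directory

Open cmd and run where python to see the path where Python is installed (if you cannot find the directory in File Explorer, remember to turn on the display of hidden items).

Copy the extracted chromedriver.exe file into the Python installation directory.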
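To double-check that the driver ended up somewhere it can be found, something like the sketch below can be run. It assumes the driver is either on PATH or sitting next to the Python interpreter, as described above; find_chromedriver is a hypothetical helper name.

```python
import os
import shutil
import sys


def find_chromedriver():
    """Return the path of chromedriver if it is reachable, else None."""
    # 1) Anywhere on PATH.
    found = shutil.which("chromedriver")
    if found:
        return found
    # 2) In the same directory as the running Python interpreter.
    exe_name = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    candidate = os.path.join(os.path.dirname(sys.executable), exe_name)
    return candidate if os.path.isfile(candidate) else None


print(find_chromedriver() or "chromedriver not found")
```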

3. Analyzing the page

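Open the Taobao home page, click Log in, and press F12 to inspect the page source. Locate the account input box, the password input box and the login button, and copy the XPath of each.

Go back to the home page and copy the XPaths of the search box and the search button in the same way. As an example, I search for 电脑 (computer) here.

Next comes analyzing the results page to extract the product information; I have put that part directly in the code.
4. Code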
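To get a feel for how an id-based XPath and the relative paths under each product card behave, here is a small offline sketch using only the standard library. The HTML fragment is made up for illustration and is not Taobao's real markup; xml.etree.ElementTree supports only a subset of XPath and needs well-formed input, so this is for experimenting with path expressions, not a replacement for Selenium.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed fragment standing in for a search results page.
SAMPLE = """
<body>
  <input id="q"/>
  <div id="mainsrp-itemlist">
    <div>
      <div>picture</div>
      <div>
        <div>
          <div><strong>4999.00</strong></div>
          <div>200+ paid</div>
        </div>
        <div><a>Example laptop</a></div>
      </div>
    </div>
  </div>
</body>
"""


def parse_cards(markup):
    """Extract title, price and deal count from every product card."""
    root = ET.fromstring(markup)
    cards = root.findall(".//*[@id='mainsrp-itemlist']/div")
    results = []
    for card in cards:
        results.append({
            "info": card.find("./div[2]/div[2]/a").text,
            "price": card.find("./div[2]/div[1]/div[1]/strong").text,
            "deal": card.find("./div[2]/div[1]/div[2]").text,
        })
    return results


root = ET.fromstring(SAMPLE)
print(root.find(".//*[@id='q']").tag)  # the search box element
print(parse_cards(SAMPLE))
```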

The code includes a routine for the slider verification. It is not guaranteed to succeed, so you can also drag the slider by hand.
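For the manual fallback, fixed sleeps can be too short. One alternative is to poll until a condition holds or a timeout expires. The helper below is a generic sketch with a made-up name (wait_until); Selenium also ships its own WebDriverWait class in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or the timeout elapses.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in condition for the demo: becomes true after about 0.2 seconds.
start = time.monotonic()
print(wait_until(lambda: time.monotonic() - start > 0.2, timeout=2, interval=0.05))
```

With a live driver named web, the predicate could be something like lambda: not web.find_elements("xpath", '//*[@id="baxia-punish"]'), which becomes true once the verification box is no longer on the page.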

    import time
    import csv
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver import ChromeOptions, ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys


    # Note: find_element_by_xpath was removed in Selenium 4.3,
    # so this listing uses find_element(By.XPATH, ...) instead.

    # Scrape a single results page and write each product to the CSV
    def get_page(web, csvwriter):
        divs = web.find_elements(By.XPATH, '//*[@id="mainsrp-itemlist"]/div/div/div[1]/div')
        for div in divs:
            info = div.find_element(By.XPATH, './div[2]/div[2]/a').text  # product title
            price = div.find_element(By.XPATH, './div[2]/div[1]/div[1]/strong').text + '元'  # price (yuan)
            deal = div.find_element(By.XPATH, './div[2]/div[1]/div[2]').text  # number of buyers
            name = div.find_element(By.XPATH, './div[2]/div[3]/div[1]/a/span[2]').text  # shop name
            print(info, price, deal, name, sep="|")
            try:
                csvwriter.writerow([info, price, deal, name])
            except UnicodeEncodeError:
                pass  # skip rows that gbk cannot encode


    # Try to pass the Taobao slider. You can also drag it by hand while this
    # runs, because the automatic drag is not guaranteed to succeed.
    def pass_slider(web):
        try:
            yz = web.find_element(By.XPATH, '//*[@id="baxia-punish"]/div[2]/div/div[1]/div[2]/div/p').text
            if yz != '通过验证以确保正常访问':
                return
            while True:
                # The slider track; its width is the distance to drag
                track = web.find_element(By.XPATH, '//*[@id="nc_1__scale_text"]/span')
                # The slider handle
                button = web.find_element(By.XPATH, '//*[@id="nc_1_n1z"]')
                # Drag the handle right by one track width, keeping y unchanged
                ActionChains(web).click_and_hold(button).move_by_offset(track.size["width"], 0).release().perform()
                time.sleep(1)
                try:
                    # If the attempt failed, click refresh and try again
                    web.find_element(By.XPATH, '//*[@id="nc_1_refresh1"]').click()
                    time.sleep(3)
                except WebDriverException:
                    pass
        except WebDriverException:
            # The verification elements are gone: no check was shown, or it has been passed
            return


    option = ChromeOptions()
    # Drop the automation switches so that sites are less likely to detect Selenium
    option.add_experimental_option('excludeSwitches', ['enable-automation'])
    option.add_argument("--disable-blink-features")
    option.add_argument("--disable-blink-features=AutomationControlled")
    # Create the browser object
    web = webdriver.Chrome(options=option)
    # Open the Taobao home page
    web.get('https://www.taobao.com/')
    # Click "log in"
    web.find_element(By.XPATH, '//*[@id="J_SiteNavLogin"]/div[1]/div[1]/a[1]').click()
    # Enter the account and password
    web.find_element(By.XPATH, '//*[@id="fm-login-id"]').send_keys('your phone number')
    web.find_element(By.XPATH, '//*[@id="fm-login-password"]').send_keys('your password')
    # Click the login button
    web.find_element(By.XPATH, '//*[@id="login-form"]/div[4]/button').click()
    time.sleep(2)
    # Type the search keyword and press Enter
    web.find_element(By.XPATH, '//*[@id="q"]').send_keys('电脑', Keys.ENTER)
    time.sleep(3)
    pass_slider(web)

    # Keep the file open while scraping so that get_page can write to it
    with open('taobao.csv', mode='a', newline='', encoding='gbk') as fp:
        csvwriter = csv.writer(fp, delimiter=',')
        csvwriter.writerow(['info', 'price', 'deal', 'name'])
        all_pages = 3
        for count in range(1, all_pages + 1):
            print('------------------- Scraping page %d ---------------------' % count)
            get_page(web, csvwriter)
            # Go to the next page
            web.find_element(By.XPATH, '//*[@id="mainsrp-pager"]/div/div/div/ul/li[8]/a/span[1]').click()
            print('------------------------')
            time.sleep(5)

    web.quit()
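The script writes the CSV with encoding='gbk' and skips rows that fail to write. The sketch below isolates that behavior so it can be tried without a browser: the file stays open for the whole loop, and only rows that gbk cannot encode are dropped. The function name write_rows and the sample rows are made up for this example.

```python
import csv
import os
import tempfile


def write_rows(path, rows, encoding="gbk"):
    """Append rows to a CSV file, skipping rows the encoding cannot represent.

    Returns the number of rows actually written.
    """
    written = 0
    # The file must stay open for as long as the writer is in use.
    with open(path, mode="a", newline="", encoding=encoding) as fp:
        writer = csv.writer(fp)
        for row in rows:
            try:
                writer.writerow(row)
                written += 1
            except UnicodeEncodeError:
                continue  # this row contains a character outside the encoding
    return written


rows = [["电脑", "4999元"], ["laptop \U0001F600", "1元"]]
demo_dir = tempfile.mkdtemp()
print(write_rows(os.path.join(demo_dir, "gbk.csv"), rows))
print(write_rows(os.path.join(demo_dir, "utf8.csv"), rows, encoding="utf-8-sig"))
```

If dropped rows matter, writing with utf-8-sig instead of gbk keeps every row and still opens cleanly in Excel.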

It is best not to scrape too much with your own account; it may get banned.
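One simple way to keep the request rate modest is to pause for a random interval between pages instead of a fixed one. This is only a sketch; the function name polite_pause and the default bounds are assumptions, and it does not guarantee that an account will not be restricted.

```python
import random
import time


def polite_pause(min_seconds=3.0, max_seconds=8.0):
    """Sleep for a random duration between the two bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


# Short bounds here so the demo finishes quickly.
print("paused for %.3f seconds" % polite_pause(0.01, 0.05))
```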