Simulating login to the Taobao seller backend (Qianniu) with selenium -- 小林月

Latest recommended article published 2025-03-17 10:11:07

小林月

Tags: selenium, web crawler, testing tools

Article link: https://blog.csdn.net/qq_53953480/article/details/130606111

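1. Logging in

2. A slider verification appears (brute-force workaround)

3. The biggest problem: the page lives inside an iframe

4. Finding the pattern to extract the ID, title, creation time, and item status

5. Storing the data in a dict and DataFrame and exporting to Excel

6. Results

7. Complete code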

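selenium: a browser automation tool, used here to log in to a web page the way a person would.

Task: scrape the creation time, ID, and title of each listing in the shop.

Page: https://myseller.taobao.com/home.htm/SellManage/all?current=1&pageSize=20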

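1. Logging in

1.1 Choosing account/password login

Inspecting the element shows that its class is iconfont or icon-password, so it can be located by class name and clicked.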

web.find_element(By.CLASS_NAME,"iconfont").click()

1.2 Entering the account and password

The two input boxes can be located by ID: fm-login-id and fm-login-password.

web.find_element(By.ID,"fm-login-id").send_keys(account[C])
web.find_element(By.ID,"fm-login-password").send_keys(password[C])

1.3 Clicking the login button

web.find_element(by=By.XPATH,value='//*[@id="login-form"]/div[5]/button').click()

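2. A slider verification appears (brute-force workaround)

Add a fixed wait and drag the slider by hand while the script is paused.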

time.sleep(10)
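A fixed 10-second pause is either too short or wastes time. As a rough alternative, the sketch below polls a condition until it becomes true or a timeout expires. The helper name wait_until and its parameters are hypothetical and not part of the original script; selenium's own WebDriverWait offers the same idea.

```python
import time


def wait_until(condition, timeout=60.0, interval=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    Hypothetical helper, not part of the original script.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # the condition was met, stop waiting
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # short pause before checking again
```

With the script above, a call such as wait_until(lambda: web.find_elements(By.ID, "sell-manage-wrap")) could replace the sleep, assuming that element only becomes visible to the driver once the manual slide has succeeded.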

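3. The biggest problem: the page lives inside an iframe

Symptom: element lookups keep failing with "element not found" errors.

Solution

Switch into the iframe first, then locate the elements.

# Switch into the nested iframe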
iframe1 = web.find_element(By.TAG_NAME,"iframe")
web.switch_to.frame(iframe1)
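When it is unclear which frame holds an element, one option is to search the top-level document and then each iframe in turn. The sketch below is a hypothetical helper (find_in_frames is not in the original script). It only relies on the driver's find_elements, switch_to.frame, and switch_to.default_content calls, and it passes the locator strategy as a plain string ("tag name" is the value behind By.TAG_NAME). It checks one level of iframes only.

```python
def find_in_frames(driver, by, value):
    """Return the first matching element from the top document or any
    first-level iframe, leaving the driver switched into the frame where
    the element was found. Returns None if nothing matches.

    Hypothetical helper, not part of the original script.
    """
    driver.switch_to.default_content()
    found = driver.find_elements(by, value)
    if found:
        return found[0]
    # "tag name" is the string value of By.TAG_NAME
    for frame in driver.find_elements("tag name", "iframe"):
        driver.switch_to.frame(frame)
        found = driver.find_elements(by, value)
        if found:
            return found[0]  # stay inside this frame for later lookups
        driver.switch_to.default_content()  # go back up before trying the next frame
    return None
```

For example, find_in_frames(web, By.ID, "fm-login-id") would return the account input box regardless of whether it sits in the top document or in an iframe.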

4. Finding the pattern to extract the ID, title, creation time, and item status

## Title:
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[1]/td[2]/div/div/div/span[1]
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[2]/td[2]/div/div/div/span[1]
## ID:
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[1]/td[2]/div/div/div/span[2]
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[2]/td[2]/div/div/div/span[2]
## Creation time:
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[1]/td[7]/div/div/div[1]
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[2]/td[7]/div/div/div[1]
## Sales status:
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[1]/td[7]/div/div/div[2]
//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[2]/td[7]/div/div/div[2]

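The XPaths for consecutive rows differ only in the tr index. Each page holds 20 rows, so a for loop that runs 20 times (tr[1] through tr[20]) collects the whole page.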
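The pattern can be captured in a small helper that builds the four XPaths for a given row. This is only a sketch: the names ROW_BASE, FIELD_SUFFIXES, and row_xpaths are hypothetical, and the td[2] column for the title and ID follows the complete code in section 7.

```python
# Shared prefix of every row XPath; only the tr index changes from row to row.
ROW_BASE = '//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[{row}]'

# Per-field remainder of the XPath (hypothetical field names).
FIELD_SUFFIXES = {
    "title": "/td[2]/div/div/div/span[1]",
    "id": "/td[2]/div/div/div/span[2]",
    "created": "/td[7]/div/div/div[1]",
    "status": "/td[7]/div/div/div[2]",
}


def row_xpaths(row):
    """Return a dict mapping each field name to its XPath for table row `row` (1-based)."""
    base = ROW_BASE.format(row=row)
    return {field: base + suffix for field, suffix in FIELD_SUFFIXES.items()}


# XPaths for all 20 rows of one page
page_xpaths = [row_xpaths(i) for i in range(1, 21)]
```

Each XPath can then be passed to web.find_elements(By.XPATH, ...) exactly as in the complete code.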

To cover every page, click the next-page button in a loop, once for each page:

web.find_element(by=By.XPATH,
                         value='//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[2]/div[2]/button[2]').click()
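The page loop can be separated from the selenium calls. In the sketch below, read_page and click_next are hypothetical callables supplied by the caller (they do not exist in the original script): the first returns the rows of the current page, the second performs the click shown above. The loop skips the click after the last page, since there is nothing further to load.

```python
def scrape_pages(read_page, click_next, page_count):
    """Read `page_count` pages, clicking 'next' between consecutive pages.

    read_page  -- callable returning the list of rows on the current page
    click_next -- callable that moves the browser to the next page
    Both are hypothetical hooks, not part of the original script.
    """
    rows = []
    for page in range(page_count):
        rows.extend(read_page())
        if page < page_count - 1:  # no click needed after the last page
            click_next()
    return rows
```

Keeping the loop free of browser details also makes it possible to try it out with plain lists before pointing it at the real page.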

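5. Storing the data in a dict and DataFrame and exporting to Excel

Create one list per field and append each value to its list as it is extracted. Once all the data has been collected, combine the lists into a dict, convert the dict to a DataFrame (df), and export it to Excel.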

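# Combine the four lists into one dict (named data so the built-in dict is not shadowed)
data = {}
data['ID'] = list1
data['标题'] = list2  # title
data['创建时间'] = list3  # creation time
data['状态'] = list4  # status
df = pd.DataFrame(data)
print(df)
df.insert(0, "店铺", 店铺)  # add the shop name as the first column
df.to_excel('./天猫创建时间/'+店铺+'创建时间.xlsx', index=False)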
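Parallel lists only work if every list ends up with the same length; pd.DataFrame raises a ValueError otherwise. One way to guard against that, sketched below, is to collect one dict per row and convert the rows to columns at the end, filling any missing field with an empty string. The function name columns_from_rows is hypothetical; the field names are the column names used above.

```python
def columns_from_rows(rows, fields=("ID", "标题", "创建时间", "状态")):
    """Turn a list of per-row dicts into a dict of equal-length column lists.

    Missing fields are filled with an empty string so that every column has
    one entry per row. Hypothetical helper, not part of the original script.
    """
    columns = {field: [] for field in fields}
    for row in rows:
        for field in fields:
            columns[field].append(row.get(field, ""))
    return columns


rows = [
    {"ID": "1001", "标题": "sample title", "创建时间": "2023-01-01", "状态": "on sale"},
    {"ID": "1002"},  # a row where only the ID could be read
]
columns = columns_from_rows(rows)
```

The resulting columns dict can be passed to pd.DataFrame in place of the hand-built dict. The sample values in rows are made up for illustration.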

6. Results

7. Complete code

Note: the account names and passwords are private, so they are left blank here; fill in your own.
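One way to keep credentials out of the script entirely is to read them from environment variables. The sketch below assumes two variable names, QIANNIU_ACCOUNT and QIANNIU_PASSWORD, which are made up for this example, and a hypothetical helper load_credentials.

```python
import os


def load_credentials(environ=os.environ):
    """Read the login account and password from environment variables.

    QIANNIU_ACCOUNT and QIANNIU_PASSWORD are hypothetical names chosen for
    this sketch; they are not used by the original script.
    """
    account = environ.get("QIANNIU_ACCOUNT", "")
    password = environ.get("QIANNIU_PASSWORD", "")
    if not account or not password:
        raise RuntimeError("set QIANNIU_ACCOUNT and QIANNIU_PASSWORD before running")
    return account, password
```

The returned pair could then be passed to the two send_keys calls instead of account[C] and password[C].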

from selenium import webdriver
import time
import random
import pandas as pd
from selenium.webdriver.common.by import By
# from browsermobproxy import Server  # only needed if the proxy lines below are enabled
# Create a proxy server (optional)
# server = Server(r"F:\browsermob-proxy-2.1.4\bin\browsermob-proxy.bat")
# server.start()
# proxy = server.create_proxy()
# Open the browser
options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option("detach",True)  # keep the browser open after the script ends
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument('--ignore-certificate-errors')
# options.add_argument('--headless')  # headless mode
# options.add_argument('--proxy-server={0}'.format(proxy.proxy))
web =webdriver.Chrome(options=options)
web.implicitly_wait(5)
# 1. Login flow
web.get("https://myseller.taobao.com/home.htm/SellManage/all?current=1&pageSize=20")
time.sleep(6)
web.maximize_window()
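name = [""]      # shop names
account = [""]   # login accounts
password = [""]  # passwords
C = 0  # index of the account to use; must be a valid index into the three lists above
# Switch into the nested iframe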
iframe1 = web.find_element(By.TAG_NAME,"iframe")
web.switch_to.frame(iframe1)
web.find_element(By.CLASS_NAME,"iconfont").click()
# Pause after the click before typing the credentials
time.sleep(3)
web.find_element(By.ID,"fm-login-id").send_keys(account[C])
time.sleep(3)
web.find_element(By.ID,"fm-login-password").send_keys(password[C])
time.sleep(10)
web.find_element(by=By.XPATH,value='//*[@id="login-form"]/div[5]/button').click()
time.sleep(5)
web.execute_script('window.scrollTo(0,document.body.scrollHeight)')
time.sleep(random.randint(4,9))
list1 = []
list2= []
list3 = []
list4 = []
店铺 = name[C]
for di in range(0,1):  # outer loop: one iteration per page to scrape
    for i  in range (1,21):  # inner loop: table rows tr[1] to tr[20]
        try:
            ID = web.find_elements(by=By.XPATH,
                value = '//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[' + str(
                    i) + ']/td[2]/div/div/div/span[2]'
            )[0].text
            ID = ID[3:]
            list1.append(ID)
        except Exception:
            # Fewer than 20 rows on this page: save what has been collected and stop
            print("No more rows on this page")
            data = {}
            data['ID'] = list1
            data['标题'] = list2
            data['创建时间'] = list3
            data['状态'] = list4
            df = pd.DataFrame(data)
            df.insert(0, "店铺",店铺)
            print(df)
            df.to_excel('./天猫创建时间/'+店铺+'创建时间.xlsx', index=False)
            quit()
        try:
            TITLE = web.find_elements(by=By.XPATH,
                                      value='//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[' + str(
                                          i) + ']/td[2]/div/div/div/span[1]/a'
                                      )[0].text
            list2.append(TITLE)
        except:
            print("TITLE")
            quit()
        try:
            time_creat = web.find_elements(by=By.XPATH,
                        value='//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[' + str(
                        i) + ']/td[7]/div/div/div[1]')[0].text
            # print(time_creat[:-5]
            list3.append(time_creat)
        except:
            print("time_creat")
            quit()
        try:
            zhuangtai= web.find_elements(by=By.XPATH,
                        value='//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[1]/table/tbody/tr[' + str(
                        i) + ']/td[7]/div/div/div[2]')[0].text
            # print(time_creat[:-5]
            list4.append(zhuangtai)
        except:
            print("zhuangtai")
            quit()
    try:
        web.find_element(by=By.XPATH,
                         value='//*[@id="sell-manage-wrap"]/div[4]/div/div[5]/div[2]/div[2]/button[2]').click()
        time.sleep(6)
    except Exception:
        # The next-page button could not be clicked: save what has been collected so far
        data = {}
        data['ID'] = list1
        data['标题'] = list2
        data['创建时间'] = list3
        data['状态'] = list4
        print(data)
        df = pd.DataFrame(data)
        print(df)
        df.insert(0, "店铺", 店铺)
        df.to_excel('./天猫创建时间/'+店铺+'创建时间.xlsx', index=False)
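# Final export after all pages have been read
data = {}
data['ID'] = list1
data['标题'] = list2  # title
data['创建时间'] = list3  # creation time
data['状态'] = list4  # status
df = pd.DataFrame(data)
print(df)
df.insert(0, "店铺", 店铺)  # add the shop name as the first column
# The ./天猫创建时间/ directory must already exist; to_excel does not create it
df.to_excel('./天猫创建时间/'+店铺+'创建时间.xlsx', index=False)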