Python crawler: simulating a Bilibili login with Selenium and ChromeDriver, then clicking "排行榜" (Rankings) and returning that page's URL and source

Healer512

Categories: python, web crawler. Tags: python, selenium, chrome

Article link: https://blog.csdn.net/Healer512/article/details/106386049

Key points

Selenium

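This code needs the selenium library. My PyCharm environment didn't have it, so I installed it with pip: python -m pip install selenium
It took several failed attempts before the installation finally succeeded.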
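Because the Selenium API changed between releases, it can help to check which version pip actually installed. A minimal stdlib-only sketch: `importlib.metadata.version` reads the installed package's version, and the hypothetical helper `has_legacy_find_api` compares it with 4.3.0, the release that removed the `find_element_by_*` methods. It assumes ordinary numeric version strings such as "4.21.0".

```python
from importlib import metadata


def installed_selenium_version():
    """Return the installed selenium version string, or None if it is not installed."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


def has_legacy_find_api(version):
    """Hypothetical helper: True if this selenium version still has find_element_by_*.

    Those methods were removed in selenium 4.3.0. Only the numeric
    major.minor part is compared; suffixes like 'b4' are ignored.
    """
    parts = []
    for piece in version.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts) < (4, 3)


if __name__ == "__main__":
    version = installed_selenium_version()
    if version is None:
        print("selenium is not installed: run python -m pip install selenium")
    elif has_legacy_find_api(version):
        print(f"selenium {version}: find_element_by_* is still available")
    else:
        print(f"selenium {version}: use find_element(By.ID, ...) and friends")
```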
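The code uses a few functions and properties:
find_element_by_id(): locates an element on the current page by its ID.
find_element_by_xpath(): locates an element on the current page with an XPath expression; you can test the expression in the browser first with the XPath Helper extension.
chrome.current_url, chrome.page_source: the URL and the HTML source of the current page.
The find_element_by_* methods were deprecated in Selenium 4 and removed in Selenium 4.3; on current versions use find_element(By.ID, ...) and find_element(By.XPATH, ...), with By imported from selenium.webdriver.common.by.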
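To see what the XPath `//a[@target="_blank"][contains(text(),"排行榜")]` in the script selects, here is a small stdlib-only sketch on a made-up HTML fragment (the markup and hrefs are hypothetical, not Bilibili's real page). Python's `xml.etree.ElementTree` supports only a subset of XPath and has no `contains()`, so the sketch applies the `[@target='_blank']` predicate with `findall()` and does the text test in plain Python; Selenium's XPath lookup evaluates the full expression in the browser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified navigation markup (must be well-formed XML for ElementTree)
SAMPLE_NAV = """
<div>
  <a href="/example/anime" target="_blank">番剧</a>
  <a href="/example/rank" target="_blank">排行榜</a>
  <a href="/example/read">专栏</a>
</div>
"""


def find_links(html, text_fragment):
    """Emulate //a[@target="_blank"][contains(text(), text_fragment)]/@href."""
    root = ET.fromstring(html)
    hrefs = []
    # ElementTree understands the attribute predicate [@target='_blank'] ...
    for a in root.findall(".//a[@target='_blank']"):
        # ... but not contains(), so the text check happens here
        if text_fragment in (a.text or ""):
            hrefs.append(a.get("href"))
    return hrefs


print(find_links(SAMPLE_NAV, "排行榜"))  # prints ['/example/rank']
```

The third link is skipped even if its text matched, because it has no `target="_blank"` attribute.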

chromedriver

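Next comes installing ChromeDriver. I put chromedriver.exe in the same directory as chrome.exe, added that directory to the Path system environment variable, and restarted; if chromedriver runs from cmd, the setup worked. (Any directory on Path would do; the Chrome directory is just a convenient choice, and since Selenium 4.6 Selenium Manager can also fetch a driver automatically.) The ChromeDriver version must match the installed Chrome version, and there is a compatibility table listing which ChromeDriver goes with which Chrome.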
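The "can it run from cmd" check and the version match can both be scripted. This is a rough sketch under a stated assumption: `shutil.which` finds the executable the same way the shell searches Path, and the hypothetical helper `versions_compatible` only compares major version numbers, which is a simplification; the official compatibility table remains the authority.

```python
import re
import shutil


def chromedriver_on_path():
    """Return the full path of chromedriver if it can be found via Path, else None.

    On Windows, shutil.which also tries the extensions in PATHEXT (e.g. .exe).
    """
    return shutil.which("chromedriver")


def major_version(version_text):
    """Extract the major version from text like 'ChromeDriver 124.0.6367.91 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    return int(match.group(1)) if match else None


def versions_compatible(chrome_version, driver_version_text):
    """Assumption: equal major versions are treated as compatible."""
    chrome_major = major_version(chrome_version)
    return chrome_major is not None and chrome_major == major_version(driver_version_text)
```

For example, `versions_compatible("124.0.6367.119", "ChromeDriver 124.0.6367.91 (...)")` is True, while a Chrome 83 browser paired with ChromeDriver 81 gives False. The driver string is what `chromedriver --version` prints.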

The biggest bug

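After clicking "排行榜" (Rankings), the URL I got back was still www.bilibili.com. Disaster. I tried the same steps on Baidu and could get the URL of the page it jumped to, so it seemed the problem wasn't mine. I kept hunting for the cause, and it all cost so much time o(╥﹏╥)o, and I got hungry.
Thanks to an expert's help, I found the answer in this post. The cause: the "排行榜" link has target="_blank", so it opens in a new tab, while Selenium keeps driving the original tab until it is told to switch. After chrome.switch_to.window(chrome.window_handles[1]), current_url and page_source refer to the new tab.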
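Indexing `window_handles[1]` works when exactly one extra tab opens, but the W3C WebDriver specification does not promise any particular order for window handles. A slightly more defensive variant, sketched here as an assumption rather than the post's method, records the handles before the click and switches to whichever one is new. The helper `new_window_handle` is hypothetical and is pure list logic; in the real script `before` and `after` would be `chrome.window_handles` read just before and after `.click()`, followed by `chrome.switch_to.window(handle)`.

```python
def new_window_handle(before, after):
    """Return the single window handle present in `after` but not in `before`.

    Hypothetical helper: `before` and `after` are snapshots of
    driver.window_handles taken around the click on a target="_blank" link.
    Raises ValueError unless exactly one new handle appeared.
    """
    known = set(before)
    new_handles = [h for h in after if h not in known]
    if len(new_handles) != 1:
        raise ValueError(f"expected exactly one new window, found {len(new_handles)}")
    return new_handles[0]


# Example with made-up handle strings
print(new_window_handle(["tab-1"], ["tab-1", "tab-2"]))  # prints tab-2
```

If the new tab has not appeared yet when `after` is read, the helper raises instead of silently switching to the wrong window, so it pairs naturally with a wait.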

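The script often failed at runtime with errors saying element xxxx could not be found. That happens because the page hasn't finished loading, so the script has to wait; that's why I used time.sleep(). The slider verification (captcha) was solved by dragging it by hand.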
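Fixed `time.sleep()` calls either wait too long or, on a slow connection, not long enough. Selenium's explicit waits poll instead, e.g. `WebDriverWait(chrome, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))`, with `WebDriverWait` from `selenium.webdriver.support.ui` and `expected_conditions` imported as `EC`. The idea underneath is a polling loop; the hypothetical `wait_until` below is a stdlib-only sketch of that idea, not Selenium's implementation. One simplification: Selenium by default only ignores `NoSuchElementException` while waiting, whereas this sketch retries on any exception.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition()` every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    Any exception from `condition` (e.g. element not found yet) is treated
    as "not ready" and retried.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:  # not ready yet; remember why and retry
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s") from last_error
        time.sleep(poll)
```

In the login script, something like `wait_until(lambda: chrome.find_element(By.XPATH, '//a[@target="_blank"][contains(text(),"排行榜")]'), timeout=60)` could replace the fixed sleep before clicking, leaving up to a minute for the manual slider step while continuing as soon as the link exists.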

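from selenium import webdriver
from selenium.webdriver.common.by import By
import time

chrome = webdriver.Chrome()
chrome.get("https://passport.bilibili.com/login")
# XPath of the "排行榜" link's URL (handy for testing in XPath Helper): //a[@target="_blank"][contains(text(),"排行榜")]/@href
# IDs of the login input boxes: login-username, login-passwd
# Selenium 4.3+ removed find_element_by_*; find_element(By..., ...) is the replacement
chrome.find_element(By.ID, "login-username").send_keys("130xxxxx011")
chrome.find_element(By.ID, "login-passwd").send_keys("cpxxxxxx34")
time.sleep(2)
chrome.find_element(By.XPATH, '//a[text()="登录"]').click()
time.sleep(5)  # time to drag the slider captcha by hand and let the home page load
chrome.find_element(By.XPATH, '//a[@target="_blank"][contains(text(),"排行榜")]').click()
# target="_blank" opens the rankings page in a new tab; switch the driver to it
chrome.switch_to.window(chrome.window_handles[1])
print(chrome.current_url)  # may not be the final URL yet while the new tab is loading
time.sleep(5)
print(chrome.current_url)
# Save the rankings page's HTML source
with open("first.html", "w", encoding="utf-8") as file:
    file.write(chrome.page_source)
chrome.quit()  # closes every window and ends the session (chrome.close() would close only the current tab)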
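Since the original symptom was ending up with the home page instead of the rankings page, one way to sanity-check first.html afterwards is to read its `<title>`. This is a minimal sketch with the standard library's `html.parser`; the helper `page_title` is hypothetical, and the post doesn't say what title the rankings page has, so the example runs on an inline made-up document.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the stripped <title> text of an HTML string ('' if there is none)."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


# With the file written by the script:
# with open("first.html", encoding="utf-8") as f:
#     print(page_title(f.read()))
print(page_title("<html><head><title> Example page </title></head><body></body></html>"))  # prints Example page
```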