Day04 Web Scraping Study, Day 4: Capturing Dynamically Loaded Data with Selenium and Simulating a 12306 Login


free youreself


Category: web scraping. Tags: python, web scraping, chrome

Article link: https://blog.csdn.net/NotFound_error/article/details/105476425


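Contents

- Capturing dynamically loaded data with Selenium
- Simulating a 12306 login with Selenium

Today I studied how to capture dynamically loaded data with Selenium and how to simulate a 12306 login with Selenium.

Capturing dynamically loaded data with Selenium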

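I run the scraper in the Chrome browser. Before scraping dynamic pages with Selenium, you need to download the chromedriver driver; pick the version that matches your installed Chrome from the link below.
chromedriver download: http://chromedriver.storage.googleapis.com/index.html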
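
Choosing the "matching version" usually comes down to comparing major version numbers, since chromedriver builds are generally tied to one Chrome major release. The sketch below is only an illustration of that check: the helper names `major_version` and `versions_match` and the version strings are hypothetical, and it does not query the browser or the download site.

```python
def major_version(version):
    """Return the leading component of a dotted version string as an int."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when the browser and the driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Hypothetical version strings, for illustration only.
same_major = versions_match("80.0.3987.163", "80.0.3987.106")
different_major = versions_match("81.0.4044.92", "80.0.3987.106")
print(same_major, different_major)
```

If the check fails, Selenium typically refuses to start the session, so it is worth comparing the two versions before debugging anything else.
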
The code below again scrapes company names from the drug administration's page. The site URL is http://125.35.6.84:81/xk/

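from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from lxml import etree

# Start Chrome through the locally downloaded chromedriver (Selenium 4 style;
# executable_path= and find_element_by_xpath are no longer available there)
bro = webdriver.Chrome(service=Service('./chromedriver.exe'))
url = 'http://125.35.6.84:81/xk/'
bro.get(url)
# Scrape the first three pages of the drug administration's listing
time.sleep(2)
# page_source returns the source of everything currently loaded in the
# browser, including the dynamically loaded data
page_text = bro.page_source

# List holding the page source of the first three pages
all_page_text = [page_text]
for i in range(2):  # page 1 is already stored, so two clicks reach page 3
    # Locate the "next page" button and click it
    a_tag = bro.find_element(By.XPATH, '//*[@id="pageIto_next"]')
    a_tag.click()
    time.sleep(1)
    all_page_text.append(bro.page_source)

for page_text in all_page_text:
    # Parse the company names (dynamically loaded data)
    tree = etree.HTML(page_text)
    li_list = tree.xpath('//*[@id="gzlist"]/li')
    for li in li_list:
        name = li.xpath('./dl/@title')[0]
        print(name)  # assumption: the original listing is cut off after the line above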
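
To try the parsing step without opening a browser, the same two XPath lookups can be run against a small hand-written fragment. This is a minimal sketch under stated assumptions: `SAMPLE_PAGE` is a hypothetical stand-in whose structure is only inferred from the XPath expressions in the listing, `extract_names` is a name introduced here, and the standard library's `xml.etree.ElementTree` is swapped in for lxml so the example has no dependencies. The real page is HTML and should still go through `etree.HTML` from lxml, which tolerates malformed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one page of bro.page_source.
SAMPLE_PAGE = """
<div>
  <ul id="gzlist">
    <li><dl title="Company A"><dt>1</dt></dl></li>
    <li><dl title="Company B"><dt>2</dt></dl></li>
    <li><dl><dt>entry without a title attribute</dt></dl></li>
  </ul>
</div>
"""


def extract_names(page_text):
    """Return the title attribute of each <dl> found under #gzlist/li."""
    root = ET.fromstring(page_text)
    names = []
    for li in root.findall(".//*[@id='gzlist']/li"):
        dl = li.find("./dl")
        # Skip rows that lack the expected structure instead of raising,
        # which is what indexing [0] on an empty XPath result would do.
        if dl is not None and dl.get("title"):
            names.append(dl.get("title"))
    return names


names = extract_names(SAMPLE_PAGE)
print(names)
```

Skipping incomplete rows is a design choice for this sketch; the original loop indexes `[0]` directly and would stop with an IndexError on a row that has no title.
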
