Scraping data from a fund website with Selenium

heyh_py

Tags: web scraping, funds, financial data acquisition

Article link: https://blog.csdn.net/heyh_py/article/details/80530468

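# coding: utf-8
import csv
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

URL = ('http://fund.eastmoney.com/data/fundranking.html'
       '#tall;c0;r;szzf;pn100;ddesc;qsd20170531;qed20180531;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb')
LAST_PAGE = 41  # hard-coded page count, as in the original script

driver = webdriver.Chrome()
driver.get(URL)

# newline='' lets the csv module control line endings itself
with open('foundation.csv', 'a', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    for page in range(1, LAST_PAGE + 1):
        if page > 1:
            # Page 1 is already displayed; click the pager label for later pages
            driver.find_element(By.XPATH, '//label[@value={}]'.format(page)).click()
        time.sleep(5)  # crude fixed wait for the AJAX refresh to finish
        for tr in driver.find_elements(By.XPATH, '//table[@id="dbtable"]//tr'):
            tds = tr.find_elements(By.XPATH, './td')
            if tds:  # skip rows that contain no <td> cells, such as header rows
                # Keep empty cells so that the columns stay aligned
                writer.writerow([td.text.replace('\n', ' ') for td in tds])
        print('Saved page {}'.format(page))

driver.quit()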

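The fund data on this site is returned by a jQuery AJAX request and then rendered into the page; requesting the next page also triggers an AJAX request that refreshes only part of the page.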

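However, the data returned by this AJAX request is in a format resembling a JavaScript snippet, which is awkward to extract directly, so this script simulates clicks with Selenium instead.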
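To illustrate what a response "resembling a JavaScript snippet" can look like, here is a minimal sketch of pulling records out of such a payload with the standard library alone. The payload string, the variable name `rankData`, the `datas` key, and the field layout are all invented for illustration and are not taken from the site's actual response; a real response may well need different handling.

```python
import re

# Hypothetical payload: a JS assignment wrapping an array of comma-joined records.
# The names `rankData` and `datas` and the field layout are assumptions.
SAMPLE = 'var rankData = {datas:["000001,Fund A,1.23","000002,Fund B,4.56"],allPages:2};'


def parse_js_payload(text):
    """Extract the quoted records from a `datas:[...]` array and split them into fields."""
    match = re.search(r'datas:\[(.*?)\]', text, re.S)
    if match is None:
        return []
    # Each record is a double-quoted string; fields inside it are comma-separated.
    records = re.findall(r'"([^"]*)"', match.group(1))
    return [record.split(',') for record in records]


rows = parse_js_payload(SAMPLE)
print(rows)
```

A regex like this breaks as soon as a record contains `]` or an escaped quote, which is one reason driving a real browser can be the simpler route.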

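The script clicks through the pager until it reaches the last page, extracts each page's data with XPath, and stores it in a CSV file for convenient analysis in Excel or pandas.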
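Once the CSV exists, a first look at it needs nothing beyond the standard library. The sketch below assumes one row per fund with comma-separated cells; the helper name `load_rows` and the sample rows are hypothetical, and the actual column positions depend on the table layout at scrape time.

```python
import csv
import io


def load_rows(csv_file):
    """Read all non-empty rows from an open CSV file object."""
    return [row for row in csv.reader(csv_file) if row]


# Stand-in for foundation.csv so the sketch runs without scraping first.
sample = io.StringIO('1,000001,Fund A,1.23\n\n2,000002,Fund B,4.56\n')
rows = load_rows(sample)
print(len(rows), rows[0])
```

With a real file, open it as `open('foundation.csv', encoding='utf-8', newline='')` and pass the file object to `load_rows`. pandas users could instead try `pandas.read_csv('foundation.csv', header=None)`, provided every row has the same number of columns.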
