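Scraping fund ranking data from Eastmoney (东方财富) / Tiantian Fund (天天基金网) with Python

After collecting and organizing the ways to obtain fund data (https://blog.csdn.net/fuyouhan/article/details/120595188),
I found another approach: open the web page in a browser and extract the data from it. I'm sharing it here.

Approach: use chrome driver to load the given URLs, parse each page into a DataFrame, then save the DataFrames as separate sheets of one Excel file.
Steps:

  1. Install the Chrome browser, for example from Tencent: https://pc.qq.com/search.html#!keyword=chrome
  2. Run the code:
    • The code automatically downloads the latest chrome driver that matches the installed Chrome version
    • It opens each URL through that chrome driver and loads the data
    • It parses the data and saves it to Excel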
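
The script below pauses for a fixed 10 seconds after opening each page. As a rough sketch of an alternative, the loading step could poll until the data is there and stop as soon as it is. The helper name `wait_until` and its parameters are my own for illustration and are not part of the original script; Selenium's built-in `WebDriverWait` covers the same need.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns True if the predicate succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with the script's driver (an assumption, not the author's code):
#   from selenium.webdriver.common.by import By
#   loaded = wait_until(
#       lambda: len(driver.find_elements(By.CSS_SELECTOR, "#dbtable tbody tr")) > 0,
#       timeout=30,
#   )
```

With a large `pn10000` page the table can take a while to render, so a generous timeout with early exit is usually safer than a short fixed sleep.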
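"""
Use chrome driver to load the given URLs, parse each page into a DataFrame, then save the DataFrames as separate sheets of one Excel file
1. Install the Chrome browser, for example from Tencent: https://pc.qq.com/search.html#!keyword=chrome
2. The code automatically downloads the LATEST chrome driver
3. It opens each URL through that chrome driver and loads the data
4. It parses the data and saves it to Excel
"""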
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

import pandas as pd
from bs4 import BeautifulSoup
import time
import datetime

def getData(key, url):
    """Load the given URL through chrome driver and return its ranking table as a DataFrame."""

    # Selenium 4 expects the driver path to be wrapped in a Service object
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)

        # The table is filled in by JavaScript, so give the page time to load
        time.sleep(10)
        soup = BeautifulSoup(driver.page_source, features="lxml")
    finally:
        # quit() ends the whole browser session; close() only closes the window
        driver.quit()

    dbtable = soup.find_all(name="table", attrs={"id": "dbtable"})
    # column_name_list is the module-level list defined in the __main__ block
    all_dict = {name: [] for name in column_name_list}
    trs_item = dbtable[0].tbody.find_all(name="tr")

    for tr_item in trs_item:
        ids = tr_item.find_all(name="td")
        # Skip the first two cells of each row; the data columns start at index 2
        for i, name in enumerate(column_name_list):
            all_dict[name].append(ids[i + 2].text)

    data = pd.DataFrame(all_dict)
    print("%s,nums=%d,%s" % (key, len(data), url))
    print(data)
    # Also keep a per-category CSV copy; utf_8_sig lets Excel detect the encoding
    data.to_csv("{}.csv".format(key), encoding='utf_8_sig', index=False)

    return data


if __name__ == "__main__":
    #driver = webdriver.Chrome(ChromeDriverManager().install())
    url_dict = {
        "全部": "http://fund.eastmoney.com/data/fundranking.html#tall;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb",
        "股票型": "http://fund.eastmoney.com/data/fundranking.html#tgp;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb",
        "混合型": "http://fund.eastmoney.com/data/fundranking.html#thh;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb",
        "债券型": "http://fund.eastmoney.com/data/fundranking.html#tzq;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb",
        "指数型": "http://fund.eastmoney.com/data/fundranking.html#tzs;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb"
    }

    column_name_list = ["基金代码", "基金简称", "日期", "单位净值", "累计净值", "日增长率", "近1周", "近1月", "近3月", "近6月", "近1年", "近2年", "近3年",
                        "今年来", "成立来", "自定义", "手续费"]

    today = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
    filename = "基金排行%s.xlsx" % today

    # One workbook, one sheet per fund category
    with pd.ExcelWriter(filename) as writer:
        for key, url in url_dict.items():
            print(">>>>> Get Data for %s" % key)
            data = getData(key, url)
            print(">>>>> Save Data for %s" % key)
            # to_excel no longer accepts an encoding argument in pandas 2.x
            data.to_excel(writer, sheet_name=key, index=False)
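
The five URLs in `url_dict` differ only in the code after `t` in the fragment. As a minimal sketch, they could be generated instead of written out by hand. The reading of the fragment fields (`t` + fund type, `pn` as page size, `qsd`/`qed` as start and end dates) is inferred from the URLs in this post and is an assumption, not documented behavior of the site. `build_ranking_url` and `FUND_TYPES` are illustrative names.

```python
BASE_URL = "http://fund.eastmoney.com/data/fundranking.html"

# Fund-type codes exactly as they appear after "t" in the URLs of this post.
FUND_TYPES = {"全部": "all", "股票型": "gp", "混合型": "hh", "债券型": "zq", "指数型": "zs"}


def build_ranking_url(type_code, start, end, page_size=10000):
    """Assemble a ranking URL; start and end are date strings like '20201031'."""
    fragment = ";".join([
        "t" + type_code,
        "c0", "r", "szzf",
        "pn%d" % page_size,   # assumed: number of rows per page
        "ddesc",
        "qsd" + start,        # assumed: start of the custom date range
        "qed" + end,          # assumed: end of the custom date range
        "qdii", "zq", "gg", "gzbd", "gzfs", "bbzt", "sfbb",
    ])
    return BASE_URL + "#" + fragment


url_dict = {
    name: build_ranking_url(code, "20201031", "20211031")
    for name, code in FUND_TYPES.items()
}
```

With the dates used above, this reproduces the same five URLs, and changing the date range then only touches one place.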

Output:

C:\py\venv\Scripts\python.exe C:/py/test.py
[WDM] - 

[WDM] - ====== WebDriver manager ======
>>>>> Get Data for 全部
[WDM] - Current google-chrome version is 95.0.4638
[WDM] - Get LATEST driver version for 95.0.4638
[WDM] - Get LATEST driver version for 95.0.4638
[WDM] - Trying to download new driver from https://chromedriver.storage.googleapis.com/95.0.4638.54/chromedriver_win32.zip
[WDM] - Driver has been saved in cache [C:\Users\anna\.wdm\drivers\chromedriver\win32\95.0.4638.54]
全部,nums=9238,http://fund.eastmoney.com/data/fundranking.html#tall;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb
        基金代码    基金简称     日期    单位净值  ...     今年来     成立来     自定义    手续费
0     001970  泰信鑫选灵活  11-05  1.3610  ...  14.27%  36.10%  10.71%  0.15%
1     002580  泰信鑫选灵活  11-05  1.3550  ...  14.73%  34.96%  11.15%  0.00%
2     012728  国泰中证动漫  11-05  1.0495  ...     ---   4.95%  -1.73%  0.10%
3     012729  国泰中证动漫  11-05  1.0484  ...     ---   4.84%  -1.83%  0.00%
4     012769  华夏中证动漫  11-05  1.0723  ...     ---   7.23%   0.60%  0.00%
...      ...     ...    ...     ...  ...     ...     ...     ...    ...
9233  013850  同泰优选配置  11-04  1.0008  ...     ---   0.08%   0.00%  0.00%
9234  013849  同泰优选配置  11-04  1.0009  ...     ---   0.09%   0.00%  0.06%
9235  014046  交银医药创新  11-05  3.3731  ...     ---   0.00%     ---  0.00%
9236  014051  平安安盈灵活  11-05  2.8516  ...     ---  -0.73%     ---  0.00%
9237  013950  交银先锋混合  11-05  2.8005  ...     ---   0.00%     ---  0.00%

[9238 rows x 17 columns]
>>>>> Save Data for 全部
[WDM] - 

[WDM] - ====== WebDriver manager ======
>>>>> Get Data for 股票型
[WDM] - Current google-chrome version is 95.0.4638
[WDM] - Get LATEST driver version for 95.0.4638
[WDM] - Driver [C:\Users\anna\.wdm\drivers\chromedriver\win32\95.0.4638.54\chromedriver.exe] found in cache
股票型,nums=1850,http://fund.eastmoney.com/data/fundranking.html#tgp;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb
        基金代码    基金简称     日期    单位净值  ...     今年来      成立来     自定义    手续费
0     012728  国泰中证动漫  11-05  1.0495  ...     ---    4.95%  -1.73%  0.10%
1     012729  国泰中证动漫  11-05  1.0484  ...     ---    4.84%  -1.83%  0.00%
2     012769  华夏中证动漫  11-05  1.0723  ...     ---    7.23%   0.60%  0.00%
3     012768  华夏中证动漫  11-05  1.0728  ...     ---    7.28%   0.65%  0.12%
4     001167  金鹰科技创新  11-05  1.3910  ...  24.75%   39.10%  31.46%  0.15%
...      ...     ...    ...     ...  ...     ...      ...     ...    ...
1845  011602  前海开源公共  11-05  0.8139  ...     ---  -18.61%  -9.01%  0.00%
1846  011601  前海开源公共  11-05  0.8159  ...     ---  -18.41%  -8.79%  0.15%
1847  013475  华宝中证智能  11-05  0.9948  ...     ---   -0.52%     ---  0.10%
1848  013476  华宝中证智能  11-05  0.9948  ...     ---   -0.52%     ---  0.00%
1849  014046  交银医药创新  11-05  3.3731  ...     ---    0.00%     ---  0.00%

[1850 rows x 17 columns]
>>>>> Save Data for 股票型
>>>>> Get Data for 混合型
[WDM] - 

[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 95.0.4638
[WDM] - Get LATEST driver version for 95.0.4638
[WDM] - Driver [C:\Users\anna\.wdm\drivers\chromedriver\win32\95.0.4638.54\chromedriver.exe] found in cache
混合型,nums=5053,http://fund.eastmoney.com/data/fundranking.html#thh;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb
        基金代码    基金简称     日期    单位净值  ...     今年来      成立来     自定义    手续费
0     001970  泰信鑫选灵活  11-05  1.3610  ...  14.27%   36.10%  10.71%  0.15%
1     002580  泰信鑫选灵活  11-05  1.3550  ...  14.73%   34.96%  11.15%  0.00%
2     004666  长城久嘉创新  11-05  2.0266  ...  41.76%  102.66%  37.62%  0.15%
3     010052  长城久嘉创新  11-05  2.0237  ...     ---   10.19%   3.40%  0.00%
4     290011  泰信中小盘精  11-05  4.4110  ...  18.13%  416.04%  24.30%  0.15%
...      ...     ...    ...     ...  ...     ...      ...     ...    ...
5048  012639  富国智优精选  11-04  1.0132  ...     ---    1.32%   1.17%  0.00%
5049  013850  同泰优选配置  11-04  1.0008  ...     ---    0.08%   0.00%  0.00%
5050  013849  同泰优选配置  11-04  1.0009  ...     ---    0.09%   0.00%  0.06%
5051  014051  平安安盈灵活  11-05  2.8516  ...     ---   -0.73%     ---  0.00%
5052  013950  交银先锋混合  11-05  2.8005  ...     ---    0.00%     ---  0.00%

[5053 rows x 17 columns]
>>>>> Save Data for 混合型
[WDM] - 

[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 95.0.4638
[WDM] - Get LATEST driver version for 95.0.4638
>>>>> Get Data for 债券型
[WDM] - Driver [C:\Users\anna\.wdm\drivers\chromedriver\win32\95.0.4638.54\chromedriver.exe] found in cache
债券型,nums=2133,http://fund.eastmoney.com/data/fundranking.html#tzq;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb
        基金代码    基金简称     日期    单位净值  ...      今年来      成立来     自定义    手续费
0     005717  兴业机遇债券  11-05  1.3307  ...   13.64%   41.85%   9.88%  0.08%
1     008222  兴业机遇债券  11-05  1.3614  ...   13.25%   35.46%   9.43%  0.00%
2     009512  天弘添利债券  11-05  1.3173  ...   30.03%   31.73%  18.35%  0.08%
3     164206  天弘添利债券  11-05  1.6455  ...   29.64%  164.92%  17.95%  0.00%
4     001257  兴业收益增强  11-05  1.5340  ...   11.00%   53.40%   8.21%  0.08%
...      ...     ...    ...     ...  ...      ...      ...     ...    ...
2128  005891  先锋博盈纯债  11-05  0.9343  ...  -13.65%   -6.57%  -8.21%  0.00%
2129  006147  宝盈融源可转  11-05  1.1748  ...   -1.30%   17.48%   4.62%  0.08%
2130  006148  宝盈融源可转  11-05  1.1671  ...   -1.55%   16.71%   4.31%  0.00%
2131  006831  鹏扬利沣短债    ---     ---  ...      ---    0.00%     ---  0.00%
2132  011955  招商招祥纯债    ---     ---  ...      ---    0.00%     ---     --

[2133 rows x 17 columns]
>>>>> Save Data for 债券型
>>>>> Get Data for 指数型
[WDM] - 

[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 95.0.4638
[WDM] - Get LATEST driver version for 95.0.4638
[WDM] - Driver [C:\Users\anna\.wdm\drivers\chromedriver\win32\95.0.4638.54\chromedriver.exe] found in cache
指数型,nums=1354,http://fund.eastmoney.com/data/fundranking.html#tzs;c0;r;szzf;pn10000;ddesc;qsd20201031;qed20211031;qdii;zq;gg;gzbd;gzfs;bbzt;sfbb
        基金代码    基金简称     日期    单位净值  ...      今年来      成立来      自定义    手续费
0     012728  国泰中证动漫  11-05  1.0495  ...      ---    4.95%   -1.73%  0.10%
1     012729  国泰中证动漫  11-05  1.0484  ...      ---    4.84%   -1.83%  0.00%
2     012769  华夏中证动漫  11-05  1.0723  ...      ---    7.23%    0.60%  0.00%
3     012768  华夏中证动漫  11-05  1.0728  ...      ---    7.28%    0.65%  0.12%
4     004752  广发中证传媒  11-05  0.7421  ...  -13.10%  -25.79%  -22.10%  0.12%
...      ...     ...    ...     ...  ...      ...      ...      ...    ...
1349  502023  鹏华国证钢铁  11-05  1.6850  ...   28.72%   20.20%   46.35%  0.12%
1350  008189  国泰中证钢铁  11-05  1.5119  ...   27.68%   51.19%   45.22%  0.10%
1351  008190  国泰中证钢铁  11-05  1.5040  ...   27.37%   50.40%   44.80%  0.00%
1352  013475  华宝中证智能  11-05  0.9948  ...      ---   -0.52%      ---  0.10%
1353  013476  华宝中证智能  11-05  0.9948  ...      ---   -0.52%      ---  0.00%

[1354 rows x 17 columns]
>>>>> Save Data for 指数型

Process finished with exit code 0
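
As the output shows, every cell is saved as text: rates carry a trailing `%`, and missing values appear as `---` or `--`. If the numbers are needed for sorting or calculation, a small helper along these lines could convert them. `parse_percent` is a hypothetical name, and the rule that anything non-numeric means "missing" is my assumption.

```python
def parse_percent(text):
    """Convert a string such as '14.27%' to 14.27.

    Returns None for placeholders such as '---' or '--' and for empty strings.
    """
    cleaned = text.strip().rstrip("%")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

It could be applied to one column of the scraped DataFrame with, for example, `data["近1年"].map(parse_percent)`.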