Batch Scraping Example for Tiantian Fund (天天基金网)

Quantoday

Last modified: 2023-08-23 18:44:13

Category: Quantitative Data (量化数据). Tags: web scraping, python, beautifulsoup, pandas

First published: 2023-08-23 18:42:59

Article link: https://blog.csdn.net/swang979/article/details/132457031

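This post shows how to batch-scrape fund information from Tiantian Fund (fund.eastmoney.com) and save it locally.

Overview

Tiantian Fund has changed recently: it still provides net-value estimates for products linked to broad-based indices, but it no longer offers estimates for many other funds. This post scrapes only the former.

Applicable funds

At present, funds linked to broad-based indices still have the net-value estimate service. The page looks like this:

[Screenshot: fund detail page]

Code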

import requests
from bs4 import BeautifulSoup
import pandas as pd
from tabulate import tabulate

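These are the usual scraping packages; install any that are missing with pip.

Next, define a function that fetches one fund, in preparation for batch scraping: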

def get_fund_info(code):
    # Build the request URL
    url = f"http://fund.eastmoney.com/{code}.html"
    # Send the HTTP GET request; the timeout keeps a stalled request from hanging the batch
    response = requests.get(url, timeout=10)
    # Parse the HTML
    soup = BeautifulSoup(response.content.decode('utf-8', 'ignore'), 'html.parser')
    # Look up the <span> elements of each data block once and reuse them
    item01 = soup.find(class_='dataItem01').find_all('span')
    item02 = soup.find(class_='dataItem02').find_all('span')
    item03 = soup.find(class_='dataItem03').find_all('span')
    # Accumulated net value
    accumulative_value = item03[2].text
    # Return over the last 6 months
    sixmonth_value = item03[4].text
    # Return since inception
    fromstart_value = item03[6].text
    # Unit net value
    unit_value = item02[2].text
    # Today's change
    todayvol = item02[3].text
    # Return over the last 3 months
    threemonth_value = item02[5].text
    # Return over the last 1 month
    onemonth_value = item01[9].text
    # Return over the last 1 year
    oneyear_value = item01[11].text
    # Fund details: jjjl is the fund manager, jjglr is the fund company
    links = soup.find(class_='infoOfFund').find_all('a')
    # Check the length first: indexing a short list raises IndexError, so the fallback would never run otherwise
    jjjl = links[2].get_text(strip=True) if len(links) > 2 else "unknown"
    jjglr = links[3].get_text(strip=True)
    return accumulative_value, unit_value, sixmonth_value, todayvol, threemonth_value, fromstart_value, onemonth_value, oneyear_value, jjjl, jjglr
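The function returns ten positional values, so every call site has to unpack them in exactly the order of the return statement. One way to make this less error-prone is to wrap the tuple in a named container. The sketch below uses `typing.NamedTuple`; the class name `FundInfo` and the sample strings are hypothetical and not part of the original post.

```python
from typing import NamedTuple


class FundInfo(NamedTuple):
    """Hypothetical container; field order mirrors get_fund_info's return statement."""
    accumulative_value: str
    unit_value: str
    sixmonth_value: str
    todayvol: str
    threemonth_value: str
    fromstart_value: str
    onemonth_value: str
    oneyear_value: str
    jjjl: str    # fund manager
    jjglr: str   # fund company


# Made-up sample values, used only so the sketch runs without network access.
raw = ("1.2000", "1.1000", "-1.00%", "0.10%", "-0.50%",
       "10.00%", "-0.20%", "2.00%", "Manager A", "Company B")
info = FundInfo(*raw)

print(info.unit_value)         # access by name instead of by position
print(info._asdict()["jjjl"])  # a plain dict, convenient as a DataFrame row
```

With the scraper itself, the same wrapping would be `FundInfo(*get_fund_info("013233"))`.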

Try it out

# Fund code to look up
fund_code = "013233"

# Fetch the fund information
accumulative_value, unit_value, sixmonth_value, todayvol, threemonth_value, fromstart_value, onemonth_value, oneyear_value, jjjl, jjglr = get_fund_info(fund_code)

print(fund_code, jjjl, jjglr)
print(f"Unit net value: {unit_value}")
print(f"Daily change: {todayvol}")
print(f"1 month: {onemonth_value}")
print(f"3 months: {threemonth_value}")
print(f"6 months: {sixmonth_value}")
print(f"1 year: {oneyear_value}")
print(f"Since inception: {fromstart_value}")
print(f"Accumulated net value: {accumulative_value}")

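I compared the output with the first screenshot and the values match exactly. The next step is to scrape several funds in a batch and collect the results into a DataFrame.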

columns = ['Fund code', 'Unit net value', 'Daily change', '1 month', '3 months', '6 months', '1 year', 'Since inception', 'Accumulated net value', 'Fund manager', 'Fund company']

fund_codes = ['013233', '010992', '003579']  # set your own list; these must be broad-based index funds

# Collect one dict per fund, then build the DataFrame once at the end
rows = []
for code in fund_codes:
    accumulative_value, unit_value, sixmonth_value, todayvol, threemonth_value, fromstart_value, onemonth_value, oneyear_value, jjjl, jjglr = get_fund_info(code)
    rows.append({
        'Fund code': code,
        'Unit net value': unit_value,
        'Daily change': todayvol,
        '1 month': onemonth_value,
        '3 months': threemonth_value,
        '6 months': sixmonth_value,
        '1 year': oneyear_value,
        'Since inception': fromstart_value,
        'Accumulated net value': accumulative_value,
        'Fund manager': jjjl,
        'Fund company': jjglr
    })
df = pd.DataFrame(rows, columns=columns)
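In the loop above, one failing fund (a network error, or a page whose layout differs) raises an exception and aborts the whole batch. A possible way around this is to record failures separately and pause briefly between requests. The sketch below is an assumption rather than part of the original post: `scrape_many`, `fake_fetch`, and the `delay_seconds` parameter are hypothetical names, and `fake_fetch` stands in for `get_fund_info` so the example runs offline.

```python
import time


def scrape_many(codes, fetch, delay_seconds=1.0):
    """Call fetch(code) for each code; return (results, failures) as two dicts."""
    results = {}
    failures = {}
    for i, code in enumerate(codes):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        try:
            results[code] = fetch(code)
        except Exception as exc:  # e.g. a network error or an unexpected page layout
            failures[code] = repr(exc)
    return results, failures


# Stand-in fetcher so the sketch runs without network access.
def fake_fetch(code):
    if code == "000000":
        raise ValueError("page layout not recognised")
    return ("1.0000",)


results, failures = scrape_many(["013233", "000000"], fake_fetch, delay_seconds=0)
print(results)
print(failures)
```

In the setting of this post, the call would be `scrape_many(fund_codes, get_fund_info)`, after which only the entries in `results` are turned into DataFrame rows.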

Save locally

# Save the DataFrame as a comma-separated CSV file
output_file = 'output20230823.csv'
df.to_csv(output_file, sep=',', index=True)
print(f"Data saved as CSV file: {output_file}")
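The output contains Chinese text (manager and company names). Depending on the Excel version and locale, a UTF-8 CSV without a byte order mark may show such text as garbled characters. The sketch below illustrates the `utf-8-sig` encoding with the standard-library `csv` module instead of pandas; the helper name `write_csv_with_bom` and the demo file are hypothetical. With pandas, the same encoding can be requested through the `encoding` argument, as in `df.to_csv(output_file, encoding='utf-8-sig')`.

```python
import codecs
import csv
import os
import tempfile


def write_csv_with_bom(path, header, rows):
    """Write rows as CSV in UTF-8 with a byte order mark (encoding 'utf-8-sig')."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Demo file in a temporary directory, with one row taken from the post's output table.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_csv_with_bom(demo_path, ["code", "manager"], [["013233", "孙蒙"]])

# Check that the file really starts with the UTF-8 byte order mark.
with open(demo_path, "rb") as f:
    has_bom = f.read().startswith(codecs.BOM_UTF8)
print(has_bom)

# Reading with 'utf-8-sig' strips the mark again and keeps the code as text.
with open(demo_path, newline="", encoding="utf-8-sig") as f:
    rows_back = list(csv.reader(f))
print(rows_back)
```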

Forgot where the file was saved? No problem, print the current working directory:

import os
print(os.getcwd())

Alternatively, search for the file name and open its location.
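Since the file name is relative, it is resolved against the current working directory. A small sketch with `pathlib` prints the full path directly; the variable name `full_path` is just for illustration.

```python
from pathlib import Path

output_file = 'output20230823.csv'  # same relative file name as used when saving

# resolve() turns the relative name into an absolute path based on the current working directory
full_path = Path(output_file).resolve()
print(full_path)
```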

Here is the table as it appears when opened in Excel.

	Fund code	Unit net value	Daily change	1 month	3 months	6 months	1 year	Since inception	Accumulated net value	Fund manager	Fund company
0	013233	0.9549	0.41%	-2.08%	-3.05%	-3.42%	-0.28%	-4.51%	0.9549	孙蒙	华夏基金
1	010992	1.004	0.52%	-2.21%	-3.70%	-6.78%	-7.31%	0.40%	1.004	姚楠燕	东财基金
2	003579	1.5722	0.69%	-0.59%	-2.25%	-4.13%	-6.24%	47.74%	1.5722	耿帅军	中金基金
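All scraped values are text, so percentages such as "-2.08%" cannot be sorted or averaged until they are converted to numbers. A minimal sketch of such a conversion follows; the helper name `parse_percent` is hypothetical, and the "--" placeholder for missing values is an assumption about the site, not something the post confirms.

```python
def parse_percent(text):
    """Convert a string such as '-2.08%' to the float -2.08; return None if it is not numeric."""
    cleaned = text.strip().rstrip('%')
    try:
        return float(cleaned)
    except ValueError:
        return None  # e.g. a placeholder such as '--' (assumed)


samples = ["0.41%", "-2.08%", "47.74%", "--"]
print([parse_percent(s) for s in samples])
```

Applied to a percentage column of the DataFrame, this could be used through `Series.map(parse_percent)`.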