4.4 Fund Ranking Data (Python)

ibun.song

Last modified 2022-11-30 09:01:47



First published 2022-11-30 00:39:02

Article link: https://blog.csdn.net/qq_40805441/article/details/128107120


Fetching fund ranking data (Python)

1. Analyzing the website

Request URL: 	http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0&sc=1nzf&st=desc&sd=2021-11-29&ed=2022-11-29&qdii=&tabSubtype=,,,,,&pi=2&pn=50&dx=1&v=0.6643416321523266
Request Method: GET





Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive
Cookie: ASP.NET_SessionId=i03nvep4hyyah3hicksk5u2p; st_si=89173127450265; st_asi=delete; _adsame_fullscreen_14694=1; qgqp_b_id=f8e698e18501714515c96820b888266d; st_pvi=58560174179267; st_sp=2022-11-29%2023%3A50%3A25; st_inirUrl=https%3A%2F%2Fwww.1234567.com.cn%2F; st_sn=8; st_psi=20221130003920296-112200312936-2858370981
Host: fund.eastmoney.com
Referer: http://fund.eastmoney.com/data/fundranking.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36
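The copied headers are plain `Name: value` lines, and the sample code turns them into a Python dict with a PyCharm regex replace. The same conversion can be sketched in Python itself, using the same `(.*?): (.*)` pattern. `parse_raw_headers` is a hypothetical helper name, and the Cookie line is left out for brevity.

```python
import re

# Raw request headers as copied from the DevTools "Request Headers" panel
# (Cookie omitted here for brevity)
RAW_HEADERS = """\
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive
Host: fund.eastmoney.com
Referer: http://fund.eastmoney.com/data/fundranking.html
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36
"""


def parse_raw_headers(raw: str) -> dict:
    """Hypothetical helper: turn 'Name: value' lines into a headers dict.

    The lazy (.*?) stops at the first ': ', so values that themselves
    contain ': ' are kept whole.
    """
    headers = {}
    for line in raw.splitlines():
        match = re.match(r"(.*?): (.*)", line.strip())
        if match:  # skip blank or malformed lines
            headers[match.group(1)] = match.group(2)
    return headers


headers = parse_raw_headers(RAW_HEADERS)
print(headers["Host"])  # fund.eastmoney.com
```

Pasting the header block into a string is less error-prone than hand-editing it, especially for long Cookie values.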

Because the data is paginated, we also need to find out how the other pages are requested.

Click page < 3 > in the pager.


# Online text comparison (diff the page-2 and page-3 request URLs)
https://www.osgeo.cn/app/sb134


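The diff shows that two URL parameters have different values: pi and v.
The v parameter can be ignored (it appears to be a random number that changes on every request).
pi is the page number: 2 for page 2, 3 for page 3.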
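Besides an online diff tool, the two query strings can be compared with the standard library. A minimal sketch: the page-2 URL is the one captured above, while the page-3 URL is a hypothetical reconstruction (pi=3 plus an invented v value), since its exact text is not reproduced here. `query_diff` is an illustrative helper name.

```python
from urllib.parse import parse_qs, urlsplit

# Page-2 request URL, as captured in DevTools
URL_PAGE_2 = (
    "http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0"
    "&sc=1nzf&st=desc&sd=2021-11-29&ed=2022-11-29&qdii=&tabSubtype=,,,,,"
    "&pi=2&pn=50&dx=1&v=0.6643416321523266"
)
# Hypothetical page-3 URL: identical except pi=3 and an invented v value
URL_PAGE_3 = (
    "http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0"
    "&sc=1nzf&st=desc&sd=2021-11-29&ed=2022-11-29&qdii=&tabSubtype=,,,,,"
    "&pi=3&pn=50&dx=1&v=0.1234567890"
)


def query_diff(url_a: str, url_b: str) -> dict:
    """Return {param: (values_in_a, values_in_b)} for every query parameter that differs."""
    # keep_blank_values=True keeps empty parameters such as rs= and qdii=
    qa = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    qb = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    return {
        key: (qa.get(key), qb.get(key))
        for key in sorted(qa.keys() | qb.keys())
        if qa.get(key) != qb.get(key)
    }


print(query_diff(URL_PAGE_2, URL_PAGE_3))
# {'pi': (['2'], ['3']), 'v': (['0.6643416321523266'], ['0.1234567890'])}
```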

Determine the API for requesting multiple pages of data

# {page} is a parameter: the page number to fetch
http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0&sc=1nzf&st=desc&sd=2021-11-29&ed=2022-11-29&qdii=&tabSubtype=,,,,,&pi={page}&pn=50&dx=1
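With pi identified as the page index, the template URL can also be generated from a parameter dict instead of one long string, which makes the dates or page size easier to change. A sketch under these assumptions: `BASE_URL` and `build_rank_url` are hypothetical names, v is omitted, and reading pn as "rows per page" and sd/ed as start/end dates is inferred from their values, not documented by the site.

```python
from urllib.parse import urlencode

BASE_URL = "http://fund.eastmoney.com/data/rankhandler.aspx"


def build_rank_url(page: int) -> str:
    """Build the ranking-API URL for one page (hypothetical helper).

    All values except pi are copied from the captured request.
    """
    params = {
        "op": "ph", "dt": "kf", "ft": "all", "rs": "", "gs": "0",
        "sc": "1nzf", "st": "desc",
        "sd": "2021-11-29", "ed": "2022-11-29",  # assumed: start/end dates
        "qdii": "", "tabSubtype": ",,,,,",
        "pi": page,  # page index: 1, 2, 3, ...
        "pn": 50,    # assumed: rows per page
        "dx": "1",
    }
    # safe="," keeps tabSubtype as ,,,,, instead of %2C%2C%2C%2C%2C
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


for page in range(1, 4):
    print(build_rank_url(page))
```

The generated URLs match the template string above character for character, so either form can be used in the loop.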

3. Sample code

"""
[课  题]: Python爬取天天基金股票信息

[知识点]:
    requests发送请求
    开发者工具的使用
    json类型数据解析
    正则表达式的使用

[模块安装]: 按住键盘 win + r, 输入cmd回车 打开命令行窗口, 在里面输入 pip install 模块名
          或 打开终端 输入命令: pip install 模块名
          eg: pip install requests

[开发环境]：
    版  本：python  3.9
    编辑器：pycharm 2022.1.4

"""
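import ast          # parse the extracted string literals safely
import csv
import re

import requests     # send HTTP requests (third-party module)

# Disguise the scraper as a browser: request headers
# Convert the copied headers with a regex replace in PyCharm:
# 1. Select the text to replace
# 2. Press Ctrl + R
# 3. In the first box, enter (.*?): (.*)
# 4. In the second box, enter "$1": "$2",
# 5. Enable regex mode (the .* button) and click Replace All
headers = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Connection": "keep-alive",
    "Cookie": "ASP.NET_SessionId=i03nvep4hyyah3hicksk5u2p; st_si=89173127450265; st_asi=delete; _adsame_fullscreen_14694=1; qgqp_b_id=f8e698e18501714515c96820b888266d; st_pvi=58560174179267; st_sp=2022-11-29%2023%3A50%3A25; st_inirUrl=https%3A%2F%2Fwww.1234567.com.cn%2F; st_sn=8; st_psi=20221130003920296-112200312936-2858370981",
    "Host": "fund.eastmoney.com",
    "Referer": "http://fund.eastmoney.com/data/fundranking.html",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"
}

# Open data.csv once for the whole run. It is created in the current folder if missing;
# mode='a' appends if it already exists, so rerunning the script adds the rows again
with open('data.csv', mode='a', encoding='utf-8', newline='') as f:
    csv_write = csv.writer(f)
    # Scrape pages 1 to 5
    for page in range(1, 6):
        print(f'-------------------------Scraping page {page}-----------------------')
        # If the backend API logic is flawed, the scraped data may contain many duplicate rows
        url = f'http://fund.eastmoney.com/data/rankhandler.aspx?op=ph&dt=kf&ft=all&rs=&gs=0&sc=1nzf&st=desc&sd=2021-11-29&ed=2022-11-29&qdii=&tabSubtype=,,,,,&pi={page}&pn=50&dx=1'
        # 1. Send the request; <Response [200]> means it succeeded
        response = requests.get(url=url, headers=headers, timeout=10)
        response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
        # 2. Get the response body as text
        data = response.text
        # 3. Parse: take the content of the first [...] block
        # (raw string, so \[ is not treated as an invalid escape sequence)
        match = re.search(r'\[(.*?)\]', data)
        if match is None:
            print('No data found on this page, stopping')
            break
        data_str = match.group(1)
        # 4. Convert the string into a list of row strings.
        # ast.literal_eval accepts only literals (list/dict/tuple/int/bool/float/str...),
        # unlike eval, which would execute any code the server sent back;
        # wrapping in [] yields a list even when the page has a single row
        rows = ast.literal_eval('[' + data_str + ']')
        for td in rows:
            # Split each row string into a list of fields
            td_list = td.split(',')
            # 5. Save the row to data.csv
            csv_write.writerow(td_list)
            print(td)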
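The parsing step can be tested offline. The sketch below assumes the response is a JavaScript object whose first [...] holds quoted, comma-separated rows, which is what the sample code's regex relies on; `SAMPLE_RESPONSE` is made-up placeholder data, not a real API response. It uses `ast.literal_eval` in place of `eval`, and it also drops duplicate rows (the sample code warns the backend may return them) under the assumption that the first field is a unique fund code. `parse_rows` and `dedupe_by_first_field` are hypothetical helper names.

```python
import ast
import re

# Made-up sample response for offline testing. The shape (a JS object whose
# first [...] holds quoted, comma-separated rows) is an assumption; the
# values are placeholders, and the third row duplicates the first on purpose.
SAMPLE_RESPONSE = (
    'var rankData = {datas:['
    '"100001,FUND_A,1.10","100002,FUND_B,2.20","100001,FUND_A,1.10"'
    '],allRecords:3};'
)


def parse_rows(text: str) -> list:
    """Extract the first [...] block and split each quoted row on commas."""
    match = re.search(r"\[(.*?)\]", text)
    if match is None:
        return []
    # Wrapping in [] makes literal_eval return a list even for a single row
    rows = ast.literal_eval("[" + match.group(1) + "]")
    return [row.split(",") for row in rows]


def dedupe_by_first_field(rows: list) -> list:
    """Keep the first row for each first field (assumed to be the fund code)."""
    seen = set()
    unique = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            unique.append(row)
    return unique


rows = dedupe_by_first_field(parse_rows(SAMPLE_RESPONSE))
print(rows)  # [['100001', 'FUND_A', '1.10'], ['100002', 'FUND_B', '2.20']]
```

Deduplicating before writing keeps data.csv clean even if the API repeats rows across pages; tracking `seen` across the whole page loop, not per page, would extend this to cross-page duplicates.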