Python Web Scraper

EA开发-青衫码客

Last modified 2022-07-20 16:02:11

Tags: python, web scraping, programming language

First published 2022-03-13 00:35:23

Article link: https://blog.csdn.net/godofnight/article/details/123453300

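This code shows how to use Python's requests library to scrape fund-ranking data from the East Money website (fund.eastmoney.com). It sets request headers to mimic a browser and sends a randomly generated value as the v query parameter, then fetches the first two pages of fund data and stores them in content_list. Each fund record is a list of fields (fund code, name, returns over various periods, and so on), and the records come back in ranked order.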

(Summary generated automatically by CSDN.)
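To see exactly which URL each iteration of the script that follows requests, the query string can be previewed offline. This is a minimal sketch using the standard library's urllib.parse.urlencode, which encodes a params dict in much the same way requests does; build_rank_url is a hypothetical helper, and the parameter values are copied from the script.

```python
import random
from urllib.parse import urlencode

URL = 'http://fund.eastmoney.com/data/rankhandler.aspx'


def build_rank_url(page: int, page_size: int = 50) -> str:
    """Build the ranking URL for one page (hypothetical helper)."""
    params = {'op': 'ph', 'dt': 'kf', 'ft': 'all', 'rs': '', 'gs': '0',
              'sc': 'rzdf', 'st': 'desc', 'sd': '2021-03-12',
              'ed': '2022-03-12', 'qdii': '', 'tabSubtype': ',,,,,',
              'pi': page,        # page index, as in the script's loop
              'pn': page_size,   # records per page
              'dx': '1',
              'v': random.random()}  # random value, as in the script
    # urlencode percent-escapes values, e.g. ',' becomes '%2C'
    return URL + '?' + urlencode(params)


for page in range(1, 3):
    print(build_rank_url(page))
```

Only the pi and v values change between pages; everything else in the query stays fixed.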

import requests
import random

url = 'http://fund.eastmoney.com/data/rankhandler.aspx'

# Request headers that appear to be copied from a browser session;
# the Cookie values are session-specific and will eventually expire
headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Connection': 'keep-alive',
    'Cookie': 'qgqp_b_id=24f527c7b8ec9f0e4624fa644a994120; '
              'EMFUND1=null; EMFUND2=null; EMFUND3=null; EMFUND4=null; EMFUND5=null; '
              'AUTH_FUND.EASTMONEY.COM_GSJZ=AUTH*TTJJ*TOKEN; EMFUND0=null; '
              'EMFUND6=03-07%2014%3A13%3A56@%23%24%u91D1%u4FE1%u6C11%u5174%u503A%u5238A@%23%24004400; '
              'EMFUND7=03-07%2021%3A45%3A24@%23%24%u540C%u6CF0%u5927%u5065%u5EB7%u4E3B%u9898%u6DF7%u5408A@%23%24011002; '
              'EMFUND8=03-07%2021%3A35%3A45@%23%24%u524D%u6D77%u5F00%u6E90%u6CAA%u6E2F%u6DF1%u519C%u4E1A%u6DF7%u5408A@%23%24164403; '
              'EMFUND9=03-10 16:52:24@#$%u6613%u65B9%u8FBE%u7B56%u7565%u6210%u957F%u6DF7%u5408@%23%24110002; em_hq_fls=js; '
              'HALis=a-sz-300059-%u4E1C%u65B9%u8D22%u5BCC%2Ca-sz-002667-%u978D%u91CD%u80A1%u4EFD; _adsame_fullscreen_16928=1; '
              'st_si=62786453084976; st_asi=delete; ASP.NET_SessionId=c0djrvxhnnjzfgrsummunfcu; st_pvi=34135657244968; '
              'st_sp=2022-03-06%2023%3A31%3A38; st_inirUrl=https%3A%2F%2Fwww.baidu.com%2Flink; st_sn=8; st_psi=20220312205308995-112200304021-2152809199',
    'Host': 'fund.eastmoney.com',
    'Referer': 'http://fund.eastmoney.com/data/fundranking.html',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
}

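content_list = []  # each entry is the list of fields for one fund
for i in range(1, 3):  # fetch the first two pages
    v_value = random.random()  # random value sent as the 'v' query parameter
    params = {'op': 'ph', 'dt': 'kf', 'ft': 'all', 'rs': '', 'gs': '0',
              'sc': 'rzdf', 'st': 'desc',
              'sd': '2021-03-12', 'ed': '2022-03-12',  # date range
              'qdii': '', 'tabSubtype': ',,,,,',
              'pi': i,     # page index
              'pn': '50',  # records per page
              'dx': '1', 'v': v_value}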

    response = requests.get(url=url, headers=headers, params=params, timeout=10)
    response.encoding = 'utf-8'  # decode response.text as UTF-8
    print(response.status_code)
    content = response.text

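    # The body is JavaScript of the form: var rankData = {datas:["...","..."],...};
    # keep only the datas array, then split on double quotes
    content = content[:content.find('"]')].replace("var rankData = {datas:[", "")
    for record in content.split('"'):
        fields = record.split(',')
        if len(fields) < 10:  # skip the short "," separators between records
            continue
        content_list.append(fields)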

for i in content_list:
    print(i)
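The string slicing above depends on the exact position of '"]' in the response. A minimal alternative sketch, assuming the response keeps the shape var rankData = {datas:[...],...}: extract the datas array with a regular expression and decode it with json.loads, since it is a list of double-quoted strings. SAMPLE is a made-up, shortened response for illustration only (not real East Money data), and parse_rank_data is a hypothetical helper. Like the original code, it assumes no record contains a ']' character.

```python
import json
import re
from typing import List

# Made-up, shortened response in the shape the script expects (not real data)
SAMPLE = ('var rankData = {datas:["000001,Fund A,FA,2022-03-11,1.0,2.0,0.5,1,2,3",'
          '"000002,Fund B,FB,2022-03-11,1.1,2.1,0.4,1,2,3"],allRecords:2};')


def parse_rank_data(text: str) -> List[List[str]]:
    """Return one list of comma-separated fields per fund record."""
    # Non-greedy match from 'datas:[' up to the first ']'
    match = re.search(r'datas:(\[.*?\])', text, re.S)
    if match is None:
        return []
    records = json.loads(match.group(1))  # the array is a JSON list of strings
    return [record.split(',') for record in records]


for fields in parse_rank_data(SAMPLE):
    print(fields[0], fields[1])
```

In the script, content_list.extend(parse_rank_data(response.text)) could stand in for the slicing loop on each page.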