I have a large number (4,000) of CSVs of stock data (Date, Open, High, Low, Close) that I import into individual Pandas dataframes to perform analysis. I am new to Python and want to calculate a rolling 12-month beta for each stock. I found a post on calculating rolling beta (Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion), but when I use it in my code below it takes over 2.5 hours! Considering that I can run the exact same calculation in SQL tables in under 3 minutes, this is too slow.
How can I improve the performance of the code below to match SQL's? I understand Pandas/Python has that capability. My current method loops over each row, which I know slows things down, but I am not aware of any aggregate way to perform a rolling-window beta calculation on a dataframe.
Note: the first two steps, loading the CSVs into individual dataframes and calculating daily returns, only take about 20 seconds. All of my CSV dataframes are stored in a dictionary called 'FilesLoaded', keyed by name, e.g. 'XAO'.
Your help would be much appreciated!
Thank you :)
import datetime
import ntpath
import numpy as np
import pandas as pd

pd.set_option('display.precision', 10)  # set the decimal precision for DISPLAY only
start_time=datetime.datetime.now()
MarketIndex = 'XAO'
period = 250
MinBetaPeriod = period
# ***********************************************************************************************
# CALC RETURNS
# ***********************************************************************************************
for File in FilesLoaded:
    FilesLoaded[File]['Return'] = FilesLoaded[File]['Close'].pct_change()
# ***********************************************************************************************
# CALC BETA
# ***********************************************************************************************
def calc_beta(df):
    np_array = df.values
    m = np_array[:, 0]  # market returns are column zero of the numpy array
    s = np_array[:, 1]  # stock returns are column one of the numpy array
    covariance = np.cov(s, m)  # covariance matrix of stock and market returns
    beta = covariance[0, 1] / covariance[1, 1]  # Cov(stock, market) / Var(market)
    return beta
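As a quick sanity check of `calc_beta` (on synthetic data, not the real `FilesLoaded` frames): if the stock return is an exact multiple of the market return, the estimated beta recovers that multiple.

```python
import numpy as np
import pandas as pd

def calc_beta(df):
    # market returns in column 0, stock returns in column 1
    arr = df.values
    m, s = arr[:, 0], arr[:, 1]
    covariance = np.cov(s, m)
    return covariance[0, 1] / covariance[1, 1]  # Cov(s, m) / Var(m)

# Synthetic data: the stock moves exactly twice as much as the market
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, 250)
df = pd.DataFrame({'market': market, 'stock': 2.0 * market})
beta = calc_beta(df)  # ≈ 2.0
```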
#Build custom "rolling apply" function: applies func to each trailing window of the dataframe
def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)
    for i in range(1, len(df) + 1):
        sub_df = df.iloc[max(i - period, 0):i, :]
        if len(sub_df) >= min_periods:
            idx = sub_df.index[-1]
            result[idx] = func(sub_df)
    return result
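To see the custom `rolling_apply` end-to-end, here is a self-contained sketch on synthetic data (a stand-in for the real market/stock return columns): the result stays NaN until `min_periods` rows are available, then holds the windowed beta.

```python
import numpy as np
import pandas as pd

def calc_beta(df):
    arr = df.values
    m, s = arr[:, 0], arr[:, 1]  # market in column 0, stock in column 1
    covariance = np.cov(s, m)
    return covariance[0, 1] / covariance[1, 1]

def rolling_apply(df, period, func, min_periods=None):
    if min_periods is None:
        min_periods = period
    result = pd.Series(np.nan, index=df.index)
    for i in range(1, len(df) + 1):
        sub_df = df.iloc[max(i - period, 0):i, :]
        if len(sub_df) >= min_periods:
            result[sub_df.index[-1]] = func(sub_df)
    return result

# Synthetic data: 300 daily returns, stock = 3 * market
rng = np.random.default_rng(1)
market = rng.normal(0.0, 0.01, 300)
df = pd.DataFrame({'market': market, 'stock': 3.0 * market})
betas = rolling_apply(df, 250, calc_beta)
# betas[:249] are NaN; from row 249 on, each value is the 250-day beta
```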
#Build a join dataframe on the market index: market return plus a placeholder stock column
df_join = pd.DataFrame(index=FilesLoaded[MarketIndex].index)
df_join['market'] = FilesLoaded[MarketIndex]['Return']
df_join['stock'] = np.nan
for File in FilesLoaded:
    df_join['stock'].update(FilesLoaded[File]['Return'])
    df_join = df_join.replace(np.inf, np.nan)   # get rid of infinite values "inf" (SQL won't take "Inf")
    df_join = df_join.replace(-np.inf, np.nan)  # get rid of infinite values "-inf"
    df_join = df_join.fillna(0)                 # get rid of the NaNs in the return data
    FilesLoaded[File]['Beta'] = rolling_apply(df_join[['market', 'stock']], period, calc_beta, min_periods=MinBetaPeriod)
# ***********************************************************************************************
# CLEAN-UP
# ***********************************************************************************************
print('Run-time: {0}'.format(datetime.datetime.now() - start_time))
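For comparison (not part of my original code): the same statistic, beta = Cov(stock, market) / Var(market), can be computed for every window at once with pandas' built-in rolling covariance and variance, avoiding the Python-level loop. A minimal sketch on synthetic series (the series names, noise level, and window size are placeholders):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, period = 500, 250
market = pd.Series(rng.normal(0.0, 0.01, n))
stock = 1.5 * market + pd.Series(rng.normal(0.0, 0.002, n))

# beta = Cov(stock, market) / Var(market), evaluated over each trailing window
rolling_cov = stock.rolling(period).cov(market)
rolling_var = market.rolling(period).var()
beta = rolling_cov / rolling_var  # NaN for the first period-1 rows
```

Both `Rolling.cov` and `Rolling.var` use `ddof=1` by default, matching `np.cov`, so the per-window values agree with `calc_beta` up to floating-point error.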