Overview

This project aims to build an automated stock trading system using deep reinforcement learning (DRL). By applying advanced machine learning techniques, the system is trained to make profitable trading decisions in the stock market. The project covers downloading historical stock data, enriching it with technical indicators, developing a simulated trading environment, and training several DRL agents such as PPO, A2C, DDPG, SAC, TD3, and an ensemble agent. We evaluate and compare the agents' performance using a range of financial metrics.
Deep Reinforcement Learning (DRL)

Deep learning in the context of reinforcement learning involves an agent interacting with an environment and learning optimal behavior through trial and error. The key terms and concepts in this field are essential for understanding how these systems work:
- Environment: the external system the agent interacts with. It provides a setting in which actions can be taken and responses (in the form of observations and rewards) are received. Examples include game worlds, robotic systems, or any simulated scenario in which learning takes place.
- State: the current situation or configuration of the environment. The state captures all the information needed for decision making. In a game, the state might contain the positions of all characters and objects.
- Observation: the data the agent perceives from the environment. An observation can be a partial or complete representation of the state. In many cases the agent cannot access the full state and must rely on observations to infer it.
- Action: the decision or move the agent makes based on the current state or observation. Actions change the state of the environment. The set of all possible actions an agent can take is called the action space.
- Step: one iteration of the agent-environment interaction cycle. At each step, the agent takes an action according to its current policy, receives an observation and a reward from the environment, and transitions to a new state.
- Policy: the strategy by which the agent decides which action to take for a given state or observation. A policy can be deterministic (always taking the same action in a given state) or stochastic (choosing actions according to a probability distribution).
- Reward: the scalar feedback signal the agent receives after executing an action. The reward quantifies the immediate benefit of that action and is used to reinforce desired behavior. The agent's goal is to maximize the cumulative reward over time.
- Episode: a sequence of steps from an initial state to a terminal state. An episode ends when a predefined condition is met, such as reaching a goal or running out of time.

In reinforcement learning, the agent's goal is to learn a policy that maximizes cumulative reward by repeatedly interacting with the environment, taking actions, and adjusting its policy based on the rewards received. Together, these elements let an agent develop complex behaviors and improve its performance through experience; the short sketch below makes the interaction loop concrete.
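A minimal sketch of this loop, using Gymnasium's built-in `Pendulum-v1` task (chosen here only because it has a continuous action space like our trading environment later; it is not part of the project itself):

```python
import gymnasium as gym

# Environment: Pendulum-v1, a simple task with a continuous action space
env = gym.make("Pendulum-v1")

obs, info = env.reset(seed=42)      # initial observation of the episode
total_reward = 0.0

for step in range(200):
    action = env.action_space.sample()   # a random "policy", for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # accumulate the reward signal
    if terminated or truncated:          # the episode has ended
        obs, info = env.reset()

print(f"Cumulative reward: {total_reward:.2f}")
env.close()
```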
Below is a brief overview of the common reinforcement learning algorithms PPO, A2C, DDPG, SAC, and TD3 (our "DRL agents"):

Comparison of DRL agents
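As a rough summary of how these algorithms differ (general properties of the algorithms, not findings from this project):

| Agent | On/off-policy | Policy type | Key idea |
| --- | --- | --- | --- |
| PPO | On-policy | Stochastic | Clipped surrogate objective keeps policy updates small and stable |
| A2C | On-policy | Stochastic | Synchronous actor-critic; advantage estimates reduce gradient variance |
| DDPG | Off-policy | Deterministic | Actor-critic for continuous actions with a replay buffer and target networks |
| SAC | Off-policy | Stochastic | Maximum-entropy objective encourages exploration and robustness |
| TD3 | Off-policy | Deterministic | Twin critics and delayed policy updates curb DDPG's overestimation bias |

All five handle the continuous action space our trading environment will expose.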
Implementation

To begin applying deep reinforcement learning to automated stock trading, we first need to collect the necessary data. The first step is loading the historical stock data that forms the foundation of our trading models.

In this project we use Python libraries such as `numpy`, `pandas`, and `yfinance` to fetch and process stock market data. Specifically, we focus on the Dow Jones 30, a list of 30 well-known stocks. We use Yahoo Finance to download historical data for these stocks from January 1, 2009 to May 8, 2020. This data is essential for training and testing our reinforcement learning models. By storing it in a dictionary, we ensure efficient and organized access throughout the project. This setup lets us analyze and preprocess the stock data before feeding it into the trading algorithms.
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#aa0d91">导入</span>numpy<span style="color:#aa0d91">作为</span>np
<span style="color:#aa0d91">导入</span>pandas<span style="color:#aa0d91">作为</span>pd
<span style="color:#aa0d91">导入</span>yfinance<span style="color:#aa0d91">作为</span>yf
<span style="color:#aa0d91">导入</span>gymnasium<span style="color:#aa0d91">作为</span>gym
<span style="color:#aa0d91">从</span>gymnasium<span style="color:#aa0d91">导入</span>空间
<span style="color:#aa0d91">导入</span>matplotlib.pyplot<span style="color:#aa0d91">作为</span>plt
<span style="color:#aa0d91">从</span>stable_baselines3<span style="color:#aa0d91">导入</span>PPO、A2C、DDPG、SAC、TD3
<span style="color:#aa0d91">从</span>stable_baselines3.common.vec_env<span style="color:#aa0d91">导入</span>DummyVecEnv
<span style="color:#aa0d91">从</span>stable_baselines3.common.callbacks<span style="color:#aa0d91">导入</span>BaseCallback
<span style="color:#007400"># 道琼斯 30 种股票列表</span>
tickers = [
<span style="color:#c41a16">'MMM'</span> , <span style="color:#c41a16">'AXP'</span> , <span style="color:#c41a16">'AAPL'</span> , <span style="color:#c41a16">'BA'</span> , <span style="color:#c41a16">'CAT'</span> , <span style="color:#c41a16">'CVX'</span> , <span style="color:#c41a16">'CSCO'</span> , <span style="color:#c41a16">'KO'</span> , <span style="color:#c41a16">'DIS'</span> , <span style="color:#c41a16">'DOW'</span> ,
<span style="color:#c41a16">'GS'</span> , <span style="color:#c41a16">'HD'</span> , <span style="color:#c41a16">'IBM'</span> , <span style="color:#c41a16">'INTC'</span> , <span style="color:#c41a16">'JNJ'</span> , <span style="color:#c41a16">'JPM'</span> , <span style="color:#c41a16">'MCD'</span> , <span style="color:#c41a16">'MRK'</span> , <span style="color:#c41a16">'MSFT'</span> , <span style="color:#c41a16">'NKE'</span> ,
<span style="color:#c41a16">'PFE'</span> , <span style="color:#c41a16">'PG'</span> , <span style="color:#c41a16">'TRV'</span> , <span style="color:#c41a16">'UNH'</span> , <span style="color:#c41a16">'UTX'</span> , <span style="color:#c41a16">'VZ'</span> , <span style="color:#c41a16">'V'</span> , <span style="color:#c41a16">'WBA'</span> , <span style="color:#c41a16">'WMT'</span> , <span style="color:#c41a16">'XOM'</span>
]
tickers.remove( <span style="color:#c41a16">'DOW'</span> )
tickers.remove( <span style="color:#c41a16">'UTX'</span> )
<span style="color:#007400"># 从雅虎财经获取历史数据并保存到字典中</span>
<span style="color:#aa0d91">def </span> fetch_stock_data ( <span style="color:#5c2699">tickers, start_date, end_date</span> ):
stock_data = {}
<span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> tickers:
stock_data[ticker] = yf.download(ticker, start=start_date, end=end_date)
<span style="color:#aa0d91">return</span> stock_data
<span style="color:#007400"># 调用函数获取数据</span>
stock_data = fetch_stock_data(tickers, <span style="color:#c41a16">'2009-01-01'</span> , <span style="color:#c41a16">'2020-05-08'</span> )</span></span></span></span>
To ensure the robustness and generalization of our models, we split the historical stock data into three distinct datasets: training, validation, and test. This way we can train a model on one set of data, validate its performance on a second, and finally test its effectiveness on a third, ensuring that the model also performs well on unseen data.

In this project, data from January 1, 2009 to December 31, 2015 is designated as the training set. This is the largest dataset and is used to train our reinforcement learning models. The validation set covers January 1, 2016 to December 31, 2016 and is used to fine-tune the models and prevent overfitting. Finally, the test set covers January 1, 2017 to May 8, 2020 and is used to evaluate how the models would perform in a real-world scenario.

We then split the data for each stock in the Dow Jones 30 accordingly. By plotting Apple's (AAPL) opening prices over the three periods, we visualize the data distribution and verify that the split was implemented correctly. This careful partitioning is essential for developing a reliable automated trading system.
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#007400"># 将数据分成训练集、验证集和测试集</span>
training_data_time_range = ( <span style="color:#c41a16">'2009-01-01'</span> , <span style="color:#c41a16">'2015-12-31'</span> )
validation_data_time_range = ( <span style="color:#c41a16">'2016-01-01'</span> , <span style="color:#c41a16">'2016-12-31'</span> )
test_data_time_range = ( <span style="color:#c41a16">'2017-01-01'</span> , <span style="color:#c41a16">'2020-05-08'</span> )
<span style="color:#007400"># 将数据分成训练集、验证集和测试集</span>
training_data = {}
validation_data = {}
test_data = {}
<span style="color:#aa0d91">for</span> ticker, df <span style="color:#aa0d91">in</span> stock_data.items():
training_data[ticker] = df.loc[training_data_time_range[ <span style="color:#1c00cf">0</span> ]:training_data_time_range[ <span style="color:#1c00cf">1</span> ]]
validation_data[ticker] = df.loc[validation_data_time_range[ <span style="color:#1c00cf">0</span> ]:validation_data_time_range[ <span style="color:#1c00cf">1</span> ]]
test_data[ticker] = df.loc[test_data_time_range[ <span style="color:#1c00cf">0</span> ]:test_data_time_range[ <span style="color:#1c00cf">1</span> ]]
<span style="color:#007400"># 打印训练、验证和测试数据的形状</span>
ticker = <span style="color:#c41a16">'AAPL' </span>
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'- Training data shape for <span style="color:#000000">{ticker}</span> : <span style="color:#000000">{training_data[ticker].shape}</span> '</span> )
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'- Validation data shape for <span style="color:#000000">{ticker}</span> : <span style="color:#000000">{validation_data[ticker].shape}</span> '</span> )
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'- Test data shape for <span style="color:#000000">{ticker}</span> : <span style="color:#000000">{test_data[ticker].shape}</span> \n'</span> )
<span style="color:#007400"># 显示数据的前 5 行</span>
display(stock_data[ <span style="color:#c41a16">'AAPL'</span> ].head())
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">'\n'</span> )
<span style="color:#007400"># 绘图:</span>
plt.figure(figsize=( <span style="color:#1c00cf">12</span> , <span style="color:#1c00cf">4</span> ))
plt.plot(training_data[ticker].index, training_data[ticker][ <span style="color:#c41a16">'开盘价'</span> ], label= <span style="color:#c41a16">'训练'</span> , color= <span style="color:#c41a16">'蓝色'</span> )
plt.plot(validation_data[ticker].index, validation_data[ticker][ <span style="color:#c41a16">'开盘价'</span> ], label= <span style="color:#c41a16">'验证'</span> , color= <span style="color:#c41a16">'红色'</span> )
plt.plot(test_data[ticker].index, test_data[ticker][ <span style="color:#c41a16">'开盘价'</span> ], label= <span style="color:#c41a16">'测试'</span> , color= <span style="color:#c41a16">'绿色'</span> )
plt.xlabel( <span style="color:#c41a16">'日期'</span> )
plt.ylabel( <span style="color:#c41a16">'值'</span> )
plt.title( <span style="color:#c41a16">f' <span style="color:#000000">{ticker}</span>股票, 开盘价'</span> )
plt.legend()
plt.show()</span></span></span></span>
The corresponding results are shown below:

Data preparation
Next, we enrich our dataset with a variety of technical indicators that are essential for developing trading strategies. We compute several key indicators:

- MACD (Moving Average Convergence Divergence): this indicator computes 12- and 26-day exponential moving averages (EMAs) to determine the MACD line, then applies a 9-day EMA to the MACD line to produce the signal line. The MACD and signal lines help identify potential buy or sell signals based on their crossovers.
- RSI (Relative Strength Index): we compute the RSI over a 14-day window to measure the momentum of price movements. By measuring the speed and magnitude of price changes, it helps identify overbought or oversold conditions.
- CCI (Commodity Channel Index): this indicator measures the deviation of the price from its average, helping to identify new trends or extreme conditions. We compute the CCI over a 20-day window.
- ADX (Average Directional Index): to measure trend strength, we compute the ADX over a 14-day window. This involves computing the directional movement (DM) indicators and the average true range (ATR) to determine the ADX value.
By adding these indicators, we transform the raw price data into a feature-rich dataset that better captures market trends and price dynamics. This enhanced dataset is then used to train and evaluate our reinforcement learning models.
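For reference, these are the standard definitions that the code below implements, where $\mathrm{TP} = (\mathrm{High} + \mathrm{Low} + \mathrm{Close})/3$ is the typical price and $\mathrm{RS}$ is the ratio of the 14-day average gain to the 14-day average loss:

$$\mathrm{MACD} = \mathrm{EMA}_{12}(\mathrm{Close}) - \mathrm{EMA}_{26}(\mathrm{Close}), \qquad \mathrm{Signal} = \mathrm{EMA}_{9}(\mathrm{MACD})$$

$$\mathrm{RSI} = 100 - \frac{100}{1 + \mathrm{RS}}, \qquad \mathrm{CCI} = \frac{\mathrm{TP} - \mathrm{SMA}_{20}(\mathrm{TP})}{0.015 \cdot \mathrm{MeanDev}_{20}(\mathrm{TP})}$$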
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424">def <span style="color:#5c2699">add_technical_indicators</span> (df):
df = df.copy <span style="color:#5c2699">(</span> )
# 计算MACD 的EMA <span style="color:#1c00cf">12</span>和<span style="color:#1c00cf">26</span>
df.loc[:, <span style="color:#c41a16">'EMA12'</span> ] = df[ <span style="color:#c41a16">'Close'</span> ]. <span style="color:#5c2699">ewm</span> (span= <span style="color:#1c00cf">12</span> , Adjust=False). <span style="color:#5c2699">mean</span> ()
df.loc[:, <span style="color:#c41a16">'EMA26'</span> ] = df[ <span style="color:#c41a16">'Close'</span> ]. <span style="color:#5c2699">ewm</span> (span= <span style="color:#1c00cf">26</span> , Adjust=False). <span style="color:#5c2699">mean</span> ()
df.loc[:, <span style="color:#c41a16">'MACD'</span> ] = df[ <span style="color:#c41a16">'EMA12'</span> ] - df[ <span style="color:#c41a16">'EMA26'</span> ]
df.loc[:, <span style="color:#c41a16">'Signal'</span> ] = df[ <span style="color:#c41a16">'MACD'</span> ]. <span style="color:#5c2699">ewm</span> (span= <span style="color:#1c00cf">9</span> , Adjust=False).<span style="color:#5c2699">平均值</span>()
#计算 RSI <span style="color:#1c00cf">14</span>
rsi_14_mode = True
delta = df[ <span style="color:#c41a16">'Close'</span> ]. <span style="color:#5c2699">diff</span>()
如果 rsi_14_mode:
增益 = (delta。<span style="color:#5c2699">其中</span>(delta > <span style="color:#1c00cf">0</span> , <span style="color:#1c00cf">0</span> )). <span style="color:#5c2699">rolling</span> (window= <span style="color:#1c00cf">14</span> ) <span style="color:#5c2699">.mean</span>()
损失 = (-delta。<span style="color:#5c2699">其中</span>(delta < <span style="color:#1c00cf">0</span> , <span style="color:#1c00cf">0</span> )). <span style="color:#5c2699">rolling</span> (window= <span style="color:#1c00cf">14</span> ) <span style="color:#5c2699">.mean</span>()
rs = 增益/损失
其他:
up = delta。<span style="color:#5c2699">其中</span>(delta > <span style="color:#1c00cf">0</span> , <span style="color:#1c00cf">0</span> )
down = -delta。<span style="color:#5c2699">其中</span>(delta < <span style="color:#1c00cf">0</span> , <span style="color:#1c00cf">0</span> )
rs = up. <span style="color:#5c2699">rolling</span> (window= <span style="color:#1c00cf">14</span> ) <span style="color:#5c2699">.mean</span>()/ down. <span style="color:#5c2699">rolling</span> (window= <span style="color:#1c00cf">14</span> )。<span style="color:#5c2699">mean</span> ()
df.loc[:, <span style="color:#c41a16">'RSI'</span> ] = <span style="color:#1c00cf">100</span> - (<span style="color:#1c00cf">100</span> / (<span style="color:#1c00cf">1</span> + rs ))
# 计算 CCI <span style="color:#1c00cf">20</span>
tp = (df[ <span style="color:#c41a16">'High'</span> ] + df[ <span style="color:#c41a16">'Low'</span> ] + df[ <span style="color:#c41a16">'Close'</span> ]) / <span style="color:#1c00cf">3</span>
sma_tp = tp.rolling <span style="color:#5c2699">(</span> window= <span style="color:#1c00cf">20</span> ) <span style="color:#5c2699">.mean</span> ()
mean_dev = tp.rolling <span style="color:#5c2699">(</span> window= <span style="color:#1c00cf">20</span> ) <span style="color:#5c2699">.apply</span> (lambda x: <span style="color:#5c2699">np.mean </span><span style="color:#5c2699">(</span> np.abs (x - x.<span style="color:#5c2699">意思是</span>())))
df.loc[:, <span style="color:#c41a16">'CCI'</span> ] = (tp - sma_tp) / ( <span style="color:#1c00cf">0.015</span> * mean_dev)
# 计算ADX <span style="color:#1c00cf">14</span>
high_diff = df[ <span style="color:#c41a16">'High'</span> ]. <span style="color:#5c2699">diff</span> ()
low_diff = df[ <span style="color:#c41a16">'Low'</span> ]. <span style="color:#5c2699">diff</span> ()
df.loc[:, <span style="color:#c41a16">'+DM'</span> ] = np. <span style="color:#5c2699">where</span> ((high_diff > low_diff) & (high_diff > <span style="color:#1c00cf">0</span> ), high_diff, <span style="color:#1c00cf">0</span> )
df.loc[:, <span style="color:#c41a16">'-DM'</span> ] = np. <span style="color:#5c2699">where</span> ((low_diff > high_diff) & (low_diff > <span style="color:#1c00cf">0</span> ), low_diff, <span style="color:#1c00cf">0</span> )
tr = pd. <span style="color:#5c2699">concat</span>([df[ <span style="color:#c41a16">'High'</span> ] - df[ <span style="color:#c41a16">'Low'</span> ],np.abs <span style="color:#5c2699">(</span> df[ <span style="color:#c41a16">'High'</span> ] - df[ <span style="color:#c41a16">'Close'</span> ]. <span style="color:#5c2699">shift</span>(<span style="color:#1c00cf">1</span>)),np.abs <span style="color:#5c2699">(</span> df[ <span style="color:#c41a16">'Low'</span> ] - df[ <span style="color:#c41a16">'Close'</span> ]. <span style="color:#5c2699">shift</span>(<span style="color:#1c00cf">1</span>))],axis= <span style="color:#1c00cf">1 </span><span style="color:#5c2699">)</span>。max (axis= <span style="color:#1c00cf">1</span>)<span style="color:#5c2699">atr</span>
= tr.ewm (span= <span style="color:#1c00cf">14</span>,adjust=False)。<span style="color:#5c2699">Mean</span> () df.loc[:, <span style="color:#c41a16">'+DI'</span> ] = <span style="color:#1c00cf">100</span> * (df[ <span style="color:#c41a16">'+DM'</span> ] <span style="color:#5c2699">.ewm</span> (span= <span style="color:#1c00cf">14</span> , adjustment=False) <span style="color:#5c2699">.mean</span> () / atr) df.loc[:, <span style="color:#c41a16">'-DI'</span> ] = <span style="color:#1c00cf">100</span> * (df[ <span style="color:#c41a16">'-DM'</span> ] <span style="color:#5c2699">.ewm</span> (span= <span style="color:#1c00cf">14</span> , adjustment=False) <span style="color:#5c2699">.mean</span> () / atr) dx = <span style="color:#1c00cf">100</span> * np。<span style="color:#5c2699">绝对</span>(df[ <span style="color:#c41a16">'+DI'</span> ] - df[ <span style="color:#c41a16">'-DI'</span> ]) / (df[ <span style="color:#c41a16">'+DI'</span> ] + df[ <span style="color:#c41a16">'-DI'</span> ]) df.loc[:, <span style="color:#c41a16">'ADX'</span> ] = dx. <span style="color:#5c2699">ewm</span>(跨度= <span style="color:#1c00cf">14</span>,调整=假)。<span style="color:#5c2699">mean</span> () # 删除 NaN 值 df. <span style="color:#5c2699">dropna</span> (inplace=True) # 仅保留必需的列 df = df[[ <span style="color:#c41a16">'Open'</span> , <span style="color:#c41a16">'High'</span> , <span style="color:#c41a16">'Low'</span> , <span style="color:#c41a16">'Close'</span> , <span style="color:#c41a16">'Volume'</span> , <span style="color:#c41a16">'MACD'</span> , <span style="color:#c41a16">'Signal'</span> , <span style="color:#c41a16">'RSI'</span> ,<span style="color:#c41a16">'CCI'</span> , <span style="color:#c41a16">'ADX'</span> ]] 返回df
#----------------------------------------------------------------------------------------
# 为每只股票的训练数据添加技术指标
for ticker, df in training_data.items <span style="color:#5c2699">(</span> ):
training_data[ticker] = <span style="color:#5c2699">add_technical_indicators</span> (df)
# 为每只股票的验证数据添加技术指标
for ticker, df in validation_data.items (): <span style="color:#5c2699">validation_data</span>
[ticker] = <span style="color:#5c2699">add_technical_indicators</span> (df)
# 为每只股票的测试数据添加技术指标
for ticker, df in test_data. <span style="color:#5c2699">items</span> ():
test_data[ticker] = <span style="color:#5c2699">add_technical_indicators</span> (df) # 打印数据的
前<span style="color:#1c00cf">5行</span>
<span style="color:#5c2699">print</span> (f <span style="color:#c41a16">'- {ticker} 的训练数据形状:{training_data[ticker].shape}'</span> )
<span style="color:#5c2699">print</span> (f <span style="color:#c41a16">'- {ticker} 的验证数据形状:{validation_data[ticker].shape}'</span> )
<span style="color:#5c2699">print</span> (f <span style="color:#c41a16">'- {ticker} 的测试数据形状:{test_data[ticker].shape}\n'</span> )
<span style="color:#5c2699">display</span> (test_data[ticker]. <span style="color:#5c2699">head</span> ())</span></span></span></span>
The corresponding results are shown below:

Technical indicator extraction
In the next section, we define a custom trading environment for our reinforcement learning models using the Gymnasium (OpenAI Gym) framework. The environment simulates stock trading and lets an agent interact with the market through actions such as buying, selling, or holding stocks.

Key features of the environment:

- Initialization: the environment is initialized with historical stock data and sets up various parameters, including the action and observation spaces, transaction costs, and account variables such as balance, net worth, and shares held.
- Observation space: at each step the environment provides comprehensive state information, including current stock prices, account balance, number of shares held, net worth, and other relevant metrics. This observation space is what the agent bases its decisions on.
- Action space: the action space is a continuous space in which the agent decides what fraction of each stock in the portfolio to buy or sell. Positive values correspond to buying, negative values to selling.
- Step function: the `step` function executes the agent's actions, updates the account balance and shares held, computes the new net worth, and determines the reward. It also applies transaction costs and checks whether the episode should end, based on the maximum number of steps or the net worth dropping to zero or below.
- Render: the `render` function prints a human-readable summary of the current state, including the step count, balance, shares held, net worth, and profit.
- Reset: the `reset` function reinitializes the environment for a new episode, ensuring the agent starts from the initial conditions and data.

This custom environment is designed to closely mimic a real-world trading scenario, giving the reinforcement learning agents the tools they need to learn and optimize trading strategies.
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#aa0d91">class </span> StockTradingEnv (gym.Env):
metadata = { <span style="color:#c41a16">'render_modes'</span> : [ <span style="color:#c41a16">'human'</span> ]}
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> __init__ ( <span style="color:#5c2699">self, stock_data, transaction_cost_percent= <span style="color:#1c00cf">0.005</span></span> ):
<span style="color:#5c2699">super</span> (StockTradingEnv, self).__init__()
<span style="color:#c41a16">"""
此函数使用股票数据初始化环境并设置必要的变量:
- 动作和观察空间:定义动作空间(买入/卖出/持有)和
观察空间(股票价格、余额、持股、净值等)。
- 账户变量:初始化余额、净值、持股和交易成本。
""" </span>
<span style="color:#007400"># 删除任何空的 DataFrames</span>
self.stock_data = {ticker: df <span style="color:#aa0d91">for</span> ticker, df <span style="color:#aa0d91">in</span> stock_data.items() <span style="color:#aa0d91">if </span> <span style="color:#aa0d91">not</span> df.empty}
self.tickers = <span style="color:#5c2699">list</span> (self.stock_data.keys())
<span style="color:#aa0d91">if </span> <span style="color:#aa0d91">not</span> self.tickers:
<span style="color:#aa0d91">raise</span> ValueError( <span style="color:#c41a16">"All provided stock data is empty"</span> )
<span style="color:#007400"># 计算一只股票数据的大小</span>
sample_df = <span style="color:#5c2699">next</span> ( <span style="color:#5c2699">iter</span> (self.stock_data.values()))
self.n_features = <span style="color:#5c2699">len</span> (sample_df.columns)
<span style="color:#007400"># 定义动作和观察空间</span>
self.action_space = Spaces.Box(low=- <span style="color:#1c00cf">1</span> , high= <span style="color:#1c00cf">1</span> , shape=( <span style="color:#5c2699">len</span> (self.tickers),), dtype=np.float32)
<span style="color:#007400"># 观察空间:每只股票的价格数据 + 余额 + 持有股份 + 净值 + 最大净值 + 当前步骤</span>
self.obs_shape = self.n_features * <span style="color:#5c2699">len</span> (self.tickers) + <span style="color:#1c00cf">2</span> + <span style="color:#5c2699">len</span> (self.tickers) + <span style="color:#1c00cf">2</span>
self.observation_space = Spaces.Box(low=-np.inf, high=np.inf, shape=(self.obs_shape,), dtype=np.float32)
<span style="color:#007400"># 初始化账户余额</span>
self.initial_balance = <span style="color:#1c00cf">1000</span>
self.balance = self.initial_balance
self.net_worth = self.initial_balance
self.max_net_worth = self.initial_balance
self.shares_held = {ticker: <span style="color:#1c00cf">0 </span> <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers}
self.total_shares_sold = {ticker: <span style="color:#1c00cf">0 </span> <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers}
self.total_sales_value = {ticker: <span style="color:#1c00cf">0 </span> <span style="color:#aa0d91">for</span><span style="color:#5c2699"> len</span>ticker <span style="color:#aa0d91">in</span> self.tickers}
<span style="color:#007400"># 设置当前步骤</span>
self.current_step = <span style="color:#1c00cf">0 </span>
<span style="color:#007400"># 计算所有股票的最小数据长度</span>
self.max_steps = <span style="color:#5c2699">max</span> ( <span style="color:#1c00cf">0</span> , <span style="color:#5c2699">min</span> ( <span style="color:#5c2699">len</span> (df) <span style="color:#aa0d91">for</span> df <span style="color:#aa0d91">in</span> self.stock_data.values()) - <span style="color:#1c00cf">1</span> )
<span style="color:#007400"># 交易成本</span>
self.transaction_cost_percent = transaction_cost_percent
<span style="color:#007400"># 卖空策略</span>
self.short = <span style="color:#aa0d91">False </span>
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> reset ( <span style="color:#5c2699">self, seed= <span style="color:#aa0d91">None</span> , options= <span style="color:#aa0d91">None</span></span> ):
<span style="color:#5c2699">super</span> ().reset(seed=seed)
<span style="color:#c41a16">""" 将环境重置为新一集的初始状态。 """ </span>
<span style="color:#007400"># 重置账户余额</span>
self.balance = self.initial_balance
self.net_worth = self.initial_balance
self.max_net_worth = self.initial_balance
self.shares_held = {ticker: <span style="color:#1c00cf">0 </span> <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers}
self.total_shares_sold = {ticker: <span style="color:#1c00cf">0 </span> <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers}
self.total_sales_value = {ticker: <span style="color:#1c00cf">0 </span> <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers}
self.current_step = <span style="color:#1c00cf">0 </span>
<span style="color:#aa0d91">return</span> self._next_observation(), {}
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> _next_observation ( <span style="color:#5c2699">self</span> ):
<span style="color:#c41a16">""" 返回环境的当前状态,包括股票价格、余额、持股、净值等。 """ </span>
<span style="color:#007400"># 初始化框架</span>
frame = np.zeros(self.obs_shape)
<span style="color:#007400"># 为每个股票代码添加股票数据</span>
idx = <span style="color:#1c00cf">0 </span>
<span style="color:#007400"># 循环遍历每个股票代码</span>
<span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers:
<span style="color:#007400"># 获取当前股票代码的 DataFrame</span>
df = self.stock_data[ticker]
<span style="color:#007400"># 如果当前步长小于长度DataFrame 的,添加当前步骤的价格数据</span>
<span style="color:#aa0d91">if</span> self.current_step < <span style="color:#5c2699">len</span> (df):
frame[idx:idx+self.n_features] = df.iloc[self.current_step].values
<span style="color:#007400"># 否则,添加最后可用的价格数据</span>
<span style="color:#aa0d91">elif</span> (df) > <span style="color:#1c00cf">0</span> :
frame[idx:idx+self.n_features] = df.iloc[- <span style="color:#1c00cf">1</span> ].values
<span style="color:#007400"># 将索引移动到下一个股票代码</span>
idx += self.n_features
<span style="color:#007400"># 添加余额、持股、净值、最大净值和当前步骤</span>
frame[- <span style="color:#1c00cf">4</span> - <span style="color:#5c2699">len</span> (self.tickers)] = self.balance <span style="color:#007400"># 余额</span>
frame[- <span style="color:#1c00cf">3</span> - <span style="color:#5c2699">len</span> (self.tickers):- <span style="color:#1c00cf">3</span> ] = [self.shares_held[ticker] <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers] <span style="color:#007400"># 持股数量</span>
frame[- <span style="color:#1c00cf">3</span> ] = self.net_worth <span style="color:#007400"># 净值</span>
frame[- <span style="color:#1c00cf">2</span> ] = self.max_net_worth <span style="color:#007400"># 最大净值</span>
frame[- <span style="color:#1c00cf">1</span> ] = self.current_step <span style="color:#007400"># 当前步骤</span>
<span style="color:#aa0d91">return</span> frame
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> step ( <span style="color:#5c2699">self, actions</span> ):
<span style="color:#c41a16">""" 在环境,更新状态,计算奖励,并检查情节是否完成。 """ </span>
<span style="color:#007400"># 更新当前步骤</span>
self.current_step += <span style="color:#1c00cf">1 </span>
<span style="color:#007400"># 检查我们是否已达到最大步数</span>
<span style="color:#aa0d91">if</span> self.current_step > self.max_steps:
<span style="color:#aa0d91">return</span> self._next_observation(), <span style="color:#1c00cf">0</span> , <span style="color:#aa0d91">True</span> , <span style="color:#aa0d91">False</span> , {}
close_prices = {}
<span style="color:#007400"># 循环遍历每个股票代码并执行操作</span>
<span style="color:#aa0d91">for</span> i, ticker <span style="color:#aa0d91">in </span> <span style="color:#5c2699">enumerate</span> (self.tickers):
<span style="color:#007400"># 获取股票的当前开盘价和收盘价</span>
current_day = self.stock_data[ticker].iloc[self.current_step]
open_price = current_day[ <span style="color:#c41a16">'Open'</span> ]
close_price = current_day[ <span style="color:#c41a16">'Close'</span> ]
<span style="color:#007400"># 记录收盘价</span>
close_prices[ticker] = close_price
<span style="color:#007400"># 获取当前股票代码的操作</span>
action = actions[i]
action_price = open_price <span style="color:#aa0d91">if</span> self.short <span style="color:#aa0d91">else</span> close_price
<span style="color:#aa0d91">if</span> action > <span style="color:#1c00cf">0</span> : <span style="color:#007400"># 买入</span>
<span style="color:#007400"># 计算要购买的股票</span>
shares_to_buy = <span style="color:#5c2699">int</span> (self.balance * action / action_price)
<span style="color:#007400"># 计算股票成本</span><span style="color:#007400"># 交易成本</span> transaction_cost = cost * self.transaction_cost_percent
cost = share_to_buy * action_price
<span style="color:#007400"># 更新余额和持有股份</span>
self.balance -= (cost + transaction_cost)
<span style="color:#007400"># 更新售出股份总数</span>
self.shares_held[ticker] += share_to_buy
<span style="color:#aa0d91">elif</span> action < <span style="color:#1c00cf">0</span> : <span style="color:#007400"># 卖出</span>
<span style="color:#007400"># 计算要卖出的股份数量</span>
share_to_sell = <span style="color:#5c2699">int</span> (self.shares_held[ticker] * <span style="color:#5c2699">abs</span> (action))
<span style="color:#007400"># 计算销售价值</span>
sale = share_to_sell * action_price
<span style="color:#007400"># 交易成本</span>
transaction_cost = sale * self.transaction_cost_percent
<span style="color:#007400"># 更新余额和持有股份</span>
self.balance += (sale - transaction_cost)
<span style="color:#007400"># 更新售出股份总数</span>
self.shares_held[ticker] -= share_to_sell
<span style="color:#007400"># 更新售出股份</span>
self.total_shares_sold[ticker] += share_to_sell
<span style="color:#007400"># 更新销售总价值</span>
self.total_sales_value[ticker] += sale
<span style="color:#007400"># 计算净值</span>
self.net_worth = self.balance + <span style="color:#5c2699">sum</span> (self.shares_held[ticker] * close_prices[ticker] <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers)
<span style="color:#007400"># 更新最大净值</span>
self.max_net_worth = <span style="color:#5c2699">max</span> (self.net_worth, self.max_net_worth)
<span style="color:#007400"># 计算奖励</span>
reward = self.net_worth - self.initial_balance
<span style="color:#007400"># 检查情节是否完成</span>
done = self.net_worth <= <span style="color:#1c00cf">0</span> <span style="color:#aa0d91">或</span>self.current_step >= self.max_steps
obs = self._next_observation()
<span style="color:#aa0d91">return</span> obs, reward, done, <span style="color:#aa0d91">False</span> , {}
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> render ( <span style="color:#5c2699">self, mode= <span style="color:#c41a16">'human'</span></span> ):
<span style="color:#c41a16">""" 以人类可读的格式显示环境的当前状态。 """ </span>
<span style="color:#007400"># 打印当前步骤、余额、持有股份、净值和利润</span>
profit = self.net_worth - self.initial_balance
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'Step: <span style="color:#000000">{self.current_step}</span> '</span> )
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'Balance: <span style="color:#000000">{self.balance: <span style="color:#1c00cf">.2</span> f}</span> '</span> )
<span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> self.tickers:
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'<span style="color:#000000"> {ticker}</span> Shares held:<span style="color:#000000">{self.shares_held[ticker]}</span> '</span> )
<span style="color:#5c2699"> print</span> (<span style="color:#c41a16"> f'Net worth: <span style="color:#000000">{self.net_worth: <span style="color:#1c00cf">.2</span> f}</span> '</span> )
<span style="color:#5c2699"> print</span> (<span style="color:#c41a16"> f'Profit: <span style="color:#000000">{profit: <span style="color:#1c00cf">.2</span> f}</span> '</span> )
<span style="color:#aa0d91"> def </span> close (<span style="color:#5c2699"> self</span> ):
<span style="color:#c41a16"> """ 用于任何清理操作的占位符 """ </span>
<span style="color:#aa0d91">pass</span></span></span></span></span>
Next, we set up the various reinforcement learning agents that will interact with the trading environment defined above. Each agent is based on a different reinforcement learning algorithm, while the ensemble agent combines the strengths of the individual models.

The PolicyGradientLossCallback class

- Purpose: this custom callback records the policy gradient loss during training for performance monitoring.
- Functionality:
  - `_on_step`: captures the policy gradient loss from the model's logger and appends it to a list.
  - `_on_training_end`: plots the recorded losses after training, visualizing how the loss evolved over time.

Individual trading agents

1. PPOAgent (Proximal Policy Optimization)
- Initialization: sets up a PPO model with the specified number of timesteps and a threshold for action decisions.
- Methods:
  - `predict`: returns the action the PPO model chooses for a given observation.
  - `action_to_recommendation`: converts the model's actions into trading recommendations (buy/sell/hold) based on the threshold.
  - `validate`: evaluates the agent's performance by running it in an environment and accumulating the total reward.

2. Other agents:
- A2CAgent (Advantage Actor-Critic)
- DDPGAgent (Deep Deterministic Policy Gradient)
- SACAgent (Soft Actor-Critic)
- TD3Agent (Twin Delayed Deep Deterministic Policy Gradient)
- Initialization: these inherit from PPOAgent but use the A2C, DDPG, SAC, and TD3 algorithms respectively. They also include the PolicyGradientLossCallback to track the loss during training.

3. Ensemble agent
- Purpose: combines the predictions of the individual models (PPO, A2C, DDPG, SAC, and TD3) to make the final decision. This ensemble approach aims to exploit the strengths of each algorithm.
- Methods:
  - `predict`: averages the actions predicted by the individual models to obtain the ensemble action.
  - `action_to_recommendation`: converts the ensemble action into buy/sell/hold recommendations based on the threshold.
  - `validate`: tests the ensemble agent's performance in an environment and accumulates the total reward.

These agents handle the trading decisions within the environment and are validated to ensure they are effective at maximizing returns and making informed trading choices.
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#aa0d91">class </span> PolicyGradientLossCallback (BaseCallback):
<span style="color:#c41a16">"" </span><span style="color:#c41a16">"
一个自定义回调类,用于在训练期间记录 policy_gradient_loss。
此类扩展了 BaseCallback,并用于捕获和存储我们想要的指标。
" </span><span style="color:#c41a16">"" </span>
<span style="color:#aa0d91">def </span> __init__ ( <span style="color:#5c2699">self , verbose= <span style="color:#1c00cf">0</span></span> ):
super (PolicyGradientLossCallback, self ).__init__(verbose)
self .losses = []
<span style="color:#aa0d91">def </span> _on_step ( <span style="color:#5c2699">self</span> ) -> <span style="color:#1c00cf">bool: </span>
<span style="color:#aa0d91">if</span> hasattr( self .model, <span style="color:#c41a16">'logger'</span> ):
logs = self .model.logger.name_to_value
<span style="color:#aa0d91">if </span> <span style="color:#c41a16">'train/policy_gradient_loss' </span> <span style="color:#aa0d91">in </span> <span style="color:#1c00cf">logs:</span>
loss = logs[ <span style="color:#c41a16">'train/policy_gradient_loss'</span> ]
self .losses.append(loss)
<span style="color:#aa0d91">return</span> True
<span style="color:#aa0d91">def </span> _on_training_end ( <span style="color:#5c2699">self</span> ):
<span style="color:#c41a16">"" </span><span style="color:#c41a16">" 训练结束后绘制损失图 " </span><span style="color:#c41a16">""</span>
name = self .model.__class__.__name__
plt.figure(figsize=( <span style="color:#1c00cf">12</span> , <span style="color:#1c00cf">4</span> ))
plt.plot( self .losses, label= <span style="color:#c41a16">'策略梯度损失'</span> )
plt.title(f <span style="color:#c41a16">'{name} - 训练期间的策略梯度损失'</span> )
plt.xlabel( <span style="color:#c41a16">'训练步骤'</span> )
plt.ylabel( <span style="color:#c41a16">'损失'</span> )
plt.legend()
plt.show()</span></span></span></span>
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#007400"># 定义 PPO 代理</span>
<span style="color:#aa0d91">class </span> PPOAgent :
<span style="color:#aa0d91">def </span> __init__ ( <span style="color:#5c2699">self, env, total_timesteps, threshold</span> ):
self.model = PPO( <span style="color:#c41a16">"MlpPolicy"</span> , env, verbose= <span style="color:#1c00cf">1</span> )
self.callback = PolicyGradientLossCallback()
self.model.learn(total_timesteps=total_timesteps, callback=self.callback)
self.threshold = threshold
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> predict ( <span style="color:#5c2699">self, obs</span> ):
action, _ = self.model.predict(obs, deterministic= <span style="color:#aa0d91">True</span> )
<span style="color:#aa0d91">return</span> action
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> action_to_recommendation ( <span style="color:#5c2699">self, action</span> ):
recommendations = []
<span style="color:#aa0d91">for</span> a <span style="color:#aa0d91">in</span> action:
<span style="color:#aa0d91">if</span> a > self.threshold:
recommendations.append( <span style="color:#c41a16">'buy'</span> )
<span style="color:#aa0d91">elif</span> a < -self.threshold:
recommendations.append( <span style="color:#c41a16">'sell'</span> )
<span style="color:#aa0d91">else</span> :
recommendations.append( <span style="color:#c41a16">'hold'</span> )
<span style="color:#aa0d91">return</span> recommendations
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> verify ( <span style="color:#5c2699">self, env</span> ):
obs = env.reset()
total_rewards = <span style="color:#1c00cf">0 </span>
<span style="color:#aa0d91">for</span> _ <span style="color:#aa0d91">in </span> <span style="color:#5c2699">range</span> ( <span style="color:#1c00cf">1000</span> ): <span style="color:#007400"># 根据需要调整</span>
action, _ = self.model.predict(obs)
obs, reward, done, _ = env.step(action)
total_rewards += reward
<span style="color:#aa0d91">if</span> done:
obs = env.reset()
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'Agent Validation Reward: <span style="color:#000000">{total_rewards}</span> '</span> )
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400"># 定义 A2C Agent</span>
<span style="color:#aa0d91">类</span> A2CAgent ( PPOAgent ):
<span style="color:#aa0d91">def </span> __init__ (<span style="color:#5c2699">自我,环境,total_timesteps,阈值</span>):
<span style="color:#5c2699">super</span>()。__init__(env,total_timesteps,阈值)
self.model = A2C(<span style="color:#c41a16">“MlpPolicy”</span>,env,verbose = <span style="color:#1c00cf">1</span>)
self.callback = PolicyGradientLossCallback()
self.model.learn(total_timesteps=total_timesteps,callback=self.callback)
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400"># 定义 DDPG 代理</span>
<span style="color:#aa0d91">class </span> DDPGAgent ( PPOAgent ):
<span style="color:#aa0d91">def </span> __init__ ( <span style="color:#5c2699">self, env, total_timesteps, threshold</span> ):
<span style="color:#5c2699">super</span> ().__init__(env, total_timesteps, threshold)
self.model = DDPG( <span style="color:#c41a16">"MlpPolicy"</span> , env, verbose= <span style="color:#1c00cf">1</span> )
self.callback = PolicyGradientLossCallback()
self.model.learn(total_timesteps=total_timesteps,callback=self.callback)
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400"># 定义 SAC 代理</span>
<span style="color:#aa0d91">class </span> SACAgent ( PPOAgent ):
<span style="color:#aa0d91">def </span> __init__ ( <span style="color:#5c2699">self, env, total_timesteps, threshold</span> ):
<span style="color:#5c2699">super</span> ().__init__(env, total_timesteps, threshold)
self.model = SAC( <span style="color:#c41a16">"MlpPolicy"</span> , env, verbose= <span style="color:#1c00cf">1</span> )
self.callback = PolicyGradientLossCallback()
self.model.learn(total_timesteps=total_timesteps,callback=self.callback)
<span style="color:#007400"># ----------------------------------------------------------------------------------------- </span>
<span style="color:#007400"># 定义 TD3 Agent </span>
<span style="color:#aa0d91">class </span> TD3Agent ( PPOAgent ):
<span style="color:#aa0d91">def </span> __init__ ( <span style="color:#5c2699">self, env, total_timesteps, threshold</span> ):
<span style="color:#5c2699">super</span> ().__init__(env, total_timesteps, threshold)
self.model = TD3( <span style="color:#c41a16">"MlpPolicy"</span> , env, verbose= <span style="color:#1c00cf">1</span> )
self.callback = PolicyGradientLossCallback()
self.model.learn(total_timesteps=total_timesteps,callback=self.callback)
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400"># 定义 Ensemble Agent </span>
<span style="color:#aa0d91">class </span> EnsembleAgent :
<span style="color:#aa0d91">def </span> __init__ ( <span style="color:#5c2699">self, ppo_model, a2c_model, ddpg_model, sac_model, td3_model,阈值</span>):
self.ppo_model = ppo_model
self.a2c_model = a2c_model
self.ddpg_model = ddpg_model
self.sac_model = sac_model
self.td3_model = td3_model
self.threshold =阈值
<span style="color:#007400">#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> predict(<span style="color:#5c2699">self,obs</span>):
ppo_action, _ = self.ppo_model.predict(obs, deterministic= <span style="color:#aa0d91">True</span> )
a2c_action, _ = self.a2c_model.predict(obs, deterministic=<span style="color:#aa0d91"> True</span> )
ddpg_action, _ = self.ddpg_model.predict(obs, deterministic= <span style="color:#aa0d91">True</span> )
sac_action, _ = self.sac_model.predict(obs, deterministic= <span style="color:#aa0d91">True</span> )
td3_action, _ = self.td3_model.predict(obs, deterministic= <span style="color:#aa0d91">True</span> )
<span style="color:#007400"># 对动作求平均值</span>
ensemble_action = np.mean([ppo_action, a2c_action, ddpg_action, sac_action, td3_action], axis= <span style="color:#1c00cf">0</span> )
<span style="color:#aa0d91">return</span> ensemble_action
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> action_to_recommendation ( <span style="color:#5c2699">self, action</span> ):
recommendations = []
<span style="color:#aa0d91">for</span> a <span style="color:#aa0d91">in</span> action:
<span style="color:#aa0d91">if</span> a > self.threshold:
recommendations.append( <span style="color:#c41a16">'buy'</span> )
<span style="color:#aa0d91">elif</span> a < -self.threshold:
recommendations.append( <span style="color:#c41a16">'sell'</span> )
<span style="color:#aa0d91">else</span> :
recommendations.append( <span style="color:#c41a16">'hold'</span> )
<span style="color:#aa0d91">return</span> recommendations
<span style="color:#007400"># - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>
<span style="color:#aa0d91">def </span> verify ( <span style="color:#5c2699">self, env</span> ):
obs = env.reset()
total_rewards = <span style="color:#1c00cf">0 </span>
<span style="color:#aa0d91">for</span> _ <span style="color:#aa0d91">in </span> <span style="color:#5c2699">range</span> ( <span style="color:#1c00cf">1000</span> ): <span style="color:#007400"># 根据需要调整</span>
action = self.predict(obs)
obs, reward, done, _ = env.step(action)
total_rewards += reward
<span style="color:#aa0d91">if</span> done:
obs = env.reset()
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'Agent Validation Reward: <span style="color:#000000">{total_rewards}</span> '</span> )</span></span></span></span>
Here is a detailed overview of the helper functions used to train and evaluate the trading agents:

1. Function for creating environments and training agents

`create_env_and_train_agents`

- Purpose: initializes the trading environments and trains the various trading agents.
- Functionality:
  - Environments: creates the training (`train_env`) and validation (`val_env`) environments using the `StockTradingEnv` class.
  - Agents: trains and validates each agent (`PPO`, `A2C`, `DDPG`, `SAC`, `TD3`) using its respective class and the validation data.
  - Ensemble: trains and validates the ensemble agent that combines the predictions of all the individual models.
  - Returns: the initialized environments and the trained agents for further analysis.

2. Visualization functions

`visualize_portfolio`

- Purpose: plots the balance, net worth, and shares held over time.
- Parameters:
  - `steps`: list of timesteps.
  - `balances`, `net_worths`, `shares_held`: the metrics tracked over time.
  - `tickers`: list of stock tickers.
  - `show_balance`, `show_net_worth`, `show_shares_held`: flags controlling which plots are shown.
- Functionality: creates a multi-panel plot of the balance, net worth, and shares held, for visually inspecting the portfolio's performance over time.

`visualize_portfolio_net_worth`

- Purpose: plots the net worth over time.
- Parameters:
  - `steps`: list of timesteps.
  - `net_worths`: the net worth tracked over time.
- Functionality: creates a single plot of the net worth, clearly showing the portfolio's value progression.

`visualize_multiple_portfolio_net_worth`

- Purpose: compares the net worth of multiple portfolios on the same chart.
- Parameters:
  - `steps`: list of timesteps.
  - `net_worths_list`: list of net worth series, one per agent.
  - `labels`: label for each agent's net worth series.
- Functionality: plots the net worth of multiple agents on one chart for easy, direct comparison.

3. Testing functions

`test_agent`

- Purpose: tests a single agent's performance in an environment and tracks key metrics.
- Parameters:
  - `env`: the environment in which to test the agent.
  - `agent`: the agent to test.
  - `stock_data`: the stock data used for metric tracking.
  - `n_tests`: number of test iterations.
  - `visualize`: flag controlling whether the environment is rendered during testing.
- Functionality: runs the agent in the environment, collects metrics (balance, net worth, shares held), and optionally visualizes the environment.

`test_and_visualize_agents`

- Purpose: tests multiple agents and visualizes their performance.
- Parameters:
  - `env`: the environment in which to test the agents.
  - `agents`: a dictionary of agents to test.
  - `data`: the stock data used for metric tracking.
  - `n_tests`: number of test iterations.
- Functionality: tests each agent, collects performance metrics, and produces a comparative visualization of net worth over time.

4. Performance comparison function

`compare_and_plot_agents`

- Purpose: compares the agents by their returns, standard deviations, and Sharpe ratios.
- Parameters:
  - `agents_metrics`: the metrics collected from testing the agents.
  - `labels`: label for each agent.
  - `risk_free_rate`: the risk-free rate used to compute the Sharpe ratio.
- Functionality:
  - Comparison: computes each agent's return, standard deviation, and Sharpe ratio.
  - Visualization: displays a sorted dataframe and a bar chart comparing the agents' Sharpe ratios, highlighting which agent performs best on a risk-adjusted basis.

Together these functions provide a comprehensive toolkit for training, testing, and evaluating the trading agents, enabling in-depth analysis and comparison of the different models.
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#007400"># 创建环境并训练代理的函数</span>
<span style="color:#aa0d91">def </span> create_env_and_train_agents ( <span style="color:#5c2699">train_data, val_data, total_timesteps, threshold</span> ):
<span style="color:#007400"># 创建用于训练和验证的环境</span>
train_env = DummyVecEnv([ <span style="color:#aa0d91">lambda</span> : StockTradingEnv(train_data)])
val_env = DummyVecEnv([ <span style="color:#aa0d91">lambda</span> : StockTradingEnv(val_data)])
<span style="color:#007400"># 训练和验证 PPO 代理</span>
ppo_agent = PPOAgent(train_env, total_timesteps, threshold)
ppo_agent.validate(val_env)
<span style="color:#007400"># 训练和验证 A2C 代理</span>
a2c_agent = A2CAgent(train_env, total_timesteps, threshold)
a2c_agent.validate(val_env)
<span style="color:#007400"># 训练和验证 DDPG 代理</span>
ddpg_agent = DDPGAgent(train_env, total_timesteps, threshold)
ddpg_agent.validate(val_env)
<span style="color:#007400"># 训练并验证 SAC 代理</span>
sac_agent = SACAgent(train_env, total_timesteps, threshold)
sac_agent.validate(val_env)
<span style="color:#007400"># 训练并验证 TD3 代理</span>
td3_agent = TD3Agent(train_env, total_timesteps, threshold)
td3_agent.validate(val_env)
<span style="color:#007400"># 训练并验证集成代理 ensemble_agent</span>
= EnsembleAgent(ppo_agent.model, a2c_agent.model, ddpg_agent.model,
sac_agent.model, td3_agent.model, threshold)
ensemble_agent.validate(val_env)
<span style="color:#aa0d91">return</span> train_env, val_env, ppo_agent, a2c_agent, ddpg_agent, sac_agent, td3_agent,ensemble_agent
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400"># 可视化投资组合变化的函数</span>def
<span style="color:#aa0d91">visualize_portfolio </span> ( steps <span style="color:#5c2699">,balances,net_worths,shares_held,tickers,
show_balance = <span style="color:#aa0d91">True</span>,show_net_worth= <span style="color:#aa0d91">True</span>,show_shares_held= <span style="color:#aa0d91">True</span></span> ):
fig,axs = plt.subplots( <span style="color:#1c00cf">3</span>,figsize=( <span style="color:#1c00cf">12,18 </span><span style="color:#1c00cf">)</span> ) <span style="color:#007400"># 绘制余额</span><span style="color:#aa0d91">if</span> show_balance: axs[ <span style="color:#1c00cf">0</span> ].plot(steps,balances,label= <span style="color:#c41a16">'Balance'</span> ) axs[ <span style="color:#1c00cf">0</span> ].set_title( <span style="color:#c41a16">'Balance Over Time'</span> ) axs[ <span style="color:#1c00cf">0</span> ].set_xlabel( <span style="color:#c41a16">'Steps'</span> ) axs[ <span style="color:#1c00cf">0</span> ].set_ylabel( <span style="color:#c41a16">'Balance'</span> ) axs[ <span style="color:#1c00cf">0</span> ].legend() <span style="color:#007400"># 绘制净值</span><span style="color:#aa0d91">if</span> show_net_worth: axs[ <span style="color:#1c00cf">1</span>
].plot(steps, net_worths, label= <span style="color:#c41a16">'Net Worth'</span> , color= <span style="color:#c41a16">'orange'</span> )
axs[ <span style="color:#1c00cf">1</span> ].set_title( <span style="color:#c41a16">'净资产随时间变化'</span> )
axs[ <span style="color:#1c00cf">1</span> ].set_xlabel( <span style="color:#c41a16">'Steps'</span> )
axs[ <span style="color:#1c00cf">1</span> ].set_ylabel( <span style="color:#c41a16">'净资产'</span> )
axs[ <span style="color:#1c00cf">1</span> ].legend()
<span style="color:#007400"># 绘制持有股份数</span>
<span style="color:#aa0d91">if</span> show_shares_held:
<span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> tickers:
axs[ <span style="color:#1c00cf">2</span> ].plot(steps, share_held[ticker], label= <span style="color:#c41a16">f'Shares Held: <span style="color:#000000">{ticker}</span> '</span> )
axs[ <span style="color:#1c00cf">2</span> ].set_title( <span style="color:#c41a16">'持有股份随时间变化'</span> )
axs[ <span style="color:#1c00cf">2</span> ].set_xlabel( <span style="color:#c41a16">'Steps'</span> )
axs[ <span style="color:#1c00cf">2</span> ].set_ylabel( <span style="color:#c41a16">'持有股份'</span> )
axs[ <span style="color:#1c00cf">2</span> ].legend()
plt.tight_layout()
plt.show()
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400"># 用于可视化投资组合净值的函数</span>
<span style="color:#aa0d91">def </span> visualize_portfolio_net_worth ( <span style="color:#5c2699">steps, net_worths</span> ):
plt.figure(figsize=( <span style="color:#1c00cf">12</span> , <span style="color:#1c00cf">6</span> ))
plt.plot(steps, net_worths, label= <span style="color:#c41a16">'Net Worth'</span> , color= <span style="color:#c41a16">'orange'</span> )
plt.title( <span style="color:#c41a16">'Net Worth Over Time'</span> )
plt.xlabel( <span style="color:#c41a16">'Steps'</span> )
plt.ylabel( <span style="color:#c41a16">'Net Worth'</span> )
plt.legend()
plt.show()
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400"># 用于可视化多个投资组合净值的函数(同一张图表)</span>
<span style="color:#aa0d91">def </span> visualize_multiple_portfolio_net_worth ( <span style="color:#5c2699">steps, net_worths_list, labels</span> ):
plt.figure(figsize=( <span style="color:#1c00cf">12</span> , <span style="color:#1c00cf">6</span> ))
<span style="color:#aa0d91">for</span> i, net_worths <span style="color:#aa0d91">in </span> <span style="color:#5c2699">enumerate</span> (net_worths_list):
plt.plot(steps, net_worths,label = label [i])
plt.title(<span style="color:#c41a16">'随时间变化的净值'</span>)
plt.xlabel(<span style="color:#c41a16">'步数'</span>)
plt.ylabel(<span style="color:#c41a16">'净值'</span>)
plt.legend()
plt。显示()
<span style="color:#007400">#-------------------------------------------------------------------------------------------- </span>
<span style="color:#aa0d91">def</span> test_agent (<span style="color:#5c2699">env, agent, stock_data, n_tests= <span style="color:#1c00cf">1000</span> , visualize= <span style="color:#aa0d91">False</span></span> ):
<span style="color:#c41a16">""" 测试单个代理并跟踪性能指标,并可选择可视化结果 """ </span>
<span style="color:#007400"># 初始化指标跟踪</span>
metrics = {
<span style="color:#c41a16">'steps'</span> : [],
<span style="color:#c41a16">'balances'</span> : [],
<span style="color:#c41a16">'net_worths'</span> : [],
<span style="color:#c41a16">'shares_held'</span> : {ticker: [] <span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> stock_data.keys()}
}
<span style="color:#007400"># 在开始测试之前重置环境</span>
obs = env.reset()
<span style="color:#aa0d91">for</span> i <span style="color:#aa0d91">in </span> <span style="color:#5c2699">range</span> (n_tests):
metrics[ <span style="color:#c41a16">'steps'</span> ].append(i)
action = agent.predict(obs)
obs, rewards, dones, infos = env.step(action)
<span style="color:#aa0d91">if</span> visualize:
env.render()
<span style="color:#007400"># 跟踪指标</span>
metrics[ <span style="color:#c41a16">'balances'</span> ].append(env.get_attr( <span style="color:#c41a16">'balance'</span> )[ <span style="color:#1c00cf">0</span> ])
metrics[ <span style="color:#c41a16">'net_worths'</span> ].append(env.get_attr( <span style="color:#c41a16">'net_worth'</span> )[ <span style="color:#1c00cf">0</span> ])
env_shares_held = env.get_attr( <span style="color:#c41a16">'shares_held'</span> )[ <span style="color:#1c00cf">0</span> ]
<span style="color:#007400"># 更新每个股票代码的持有股份</span>
<span style="color:#aa0d91">for</span> ticker <span style="color:#aa0d91">in</span> stock_data.keys():
<span style="color:#aa0d91">if</span> ticker <span style="color:#aa0d91">in</span> env_shares_held:
metrics[ <span style="color:#c41a16">'shares_held'</span> ][ticker].append(env_shares_held[ticker])
<span style="color:#aa0d91">else</span> :
metrics[ <span style="color:#c41a16">'shares_held'</span> ][ticker].append( <span style="color:#1c00cf">0</span> ) <span style="color:#007400"># 如果未找到股票代码,则附加 0 </span>
<span style="color:#aa0d91">if</span> dones:
obs = env.reset()
<span style="color:#aa0d91">return</span> metrics
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#aa0d91">def </span> test_and_visualize_agents ( <span style="color:#5c2699">env, agent, data, n_tests= <span style="color:#1c00cf">1000</span></span> ):
metrics = {}
<span style="color:#aa0d91">for</span> agent_name, agent <span style="color:#aa0d91">in</span> agent.items():
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f"Testing <span style="color:#000000">{agent_name}</span> ..."</span> )
指标[agent_name] = test_agent(env, agent, data, n_tests=n_tests,visualize= <span style="color:#aa0d91">True</span> )
<span style="color:#007400"># 提取净值进行可视化</span>
net_worths = [metrics[agent_name][ <span style="color:#c41a16">'net_worths'</span> ] <span style="color:#aa0d91">for</span> agent_name <span style="color:#aa0d91">in</span> agent.keys()]
steps = <span style="color:#5c2699">next</span> ( <span style="color:#5c2699">iter</span>(metrics.values()))[ <span style="color:#c41a16">'steps'</span> ] <span style="color:#007400"># 为简单起见,假设所有代理的步数相同</span>
<span style="color:#007400"># 可视化多个代理的绩效指标</span>
visualize_multiple_portfolio_net_worth(steps, net_worths, <span style="color:#5c2699">list</span> (agents.keys()))
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#aa0d91">def </span> compare_and_plot_agents ( <span style="color:#5c2699">agent_metrics, label, Risk_free_rate= <span style="color:#1c00cf">0.0</span></span> ):
<span style="color:#007400"># 用于比较代理的收益、标准差和夏普比率的函数</span>
<span style="color:#aa0d91">def </span> compare_agents ( <span style="color:#5c2699">agent_metrics, label</span> ):
returns = []
stds = []
sharpe_ratios = []
<span style="color:#aa0d91">for</span> metrics <span style="color:#aa0d91">in</span> agent_metrics:
net_worths = metrics[ <span style="color:#c41a16">'net_worths'</span> ]
<span style="color:#007400"># 计算每日收益</span>
daily_returns = np.diff(net_worths) / net_worths[:- <span style="color:#1c00cf">1</span> ]
avg_return = np.mean(daily_returns)
std_return = np.std(daily_returns)
sharpe_ratio = ((avg_return - Risk_free_rate) / std_return) <span style="color:#aa0d91">if</span> std_return != <span style="color:#1c00cf">0 </span> <span style="color:#aa0d91">else </span> <span style="color:#c41a16">'Inf'</span>
returns.append(avg_return)
stds.append(std_return)
sharpe_ratios.append(sharpe_ratio)
df = pd.DataFrame({
<span style="color:#c41a16">'Agent'</span> : label,
<span style="color:#c41a16">'Return'</span> : returns,
<span style="color:#c41a16">'Standard Deviation'</span> : stds,
<span style="color:#c41a16">'Sharpe Ratio'</span> : sharpe_ratios
})
<span style="color:#aa0d91">return</span> df
<span style="color:#007400"># 比较代理</span>
df = compare_agents(agents_metrics, label)
<span style="color:#007400"># 按夏普比率对数据框进行排序</span>
df_sorted = df.sort_values(by= <span style="color:#c41a16">'Sharpe Ratio'</span> , ascending= <span style="color:#aa0d91">False</span> )
<span style="color:#007400"># 显示dataframe</span>
display(df_sorted)
<span style="color:#007400"># 绘制夏普比率的条形图</span>
plt.figure(figsize=( <span style="color:#1c00cf">12</span> , <span style="color:#1c00cf">6</span> ))
plt.bar(df_sorted[ <span style="color:#c41a16">'Agent'</span> ], df_sorted[ <span style="color:#c41a16">'Sharpe Ratio'</span> ])
plt.title( <span style="color:#c41a16">'Sharpe Ratio 比较'</span> )
plt.xlabel( <span style="color:#c41a16">'Agent'</span> )
plt.ylabel( <span style="color:#c41a16">'Sharpe Ratio'</span> )
plt.show()</span></span></span></span>
Finally, we can train the trading agents:

Training parameter setup:

- Threshold: the threshold determines the minimum action magnitude that triggers a buy or sell decision. Here it is set to 0.1.
- Total timesteps: this parameter specifies the total number of timesteps each agent is trained for. Here it is set to 10,000.

Environment creation and agent training:

- Environment creation: this step initializes the training and validation environments using the `StockTradingEnv` class with the provided stock data.
- Agent training: the `create_env_and_train_agents` function trains the various reinforcement learning agents (PPO, A2C, DDPG, SAC, TD3) in the training environment, each for the specified number of timesteps.
- Ensemble agent: the ensemble agent, which combines the predictions of all the individual models, is also trained. This approach aims to leverage the strengths of each model and potentially improve overall performance.

The returned objects include the trained environments and agents, ready for further evaluation and performance analysis.
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#007400"># 创建环境并训练代理</span>
threshold = <span style="color:#1c00cf">0.1</span>
total_timesteps = <span style="color:#1c00cf">10000</span>
train_env, val_env, ppo_agent, a2c_agent, ddpg_agent, sac_agent, td3_agent, ensemble_agent = \
create_env_and_train_agents(training_data, validation_data, total_timesteps, threshold)</span></span></span></span>
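If the trained models need to be reused across sessions, the underlying Stable-Baselines3 models can be saved and reloaded (a small sketch; the file name is arbitrary):

```python
# Persist the trained PPO model to disk and reload it later
ppo_agent.model.save("ppo_trading_model")
loaded_model = PPO.load("ppo_trading_model", env=train_env)
```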
We can also test and visualize the agents:
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424">n_tests = <span style="color:#1c00cf">1000 个</span>
代理 = {
<span style="color:#c41a16">'PPO 代理'</span>:ppo_agent,
<span style="color:#c41a16">'A2C 代理'</span>:a2c_agent,
<span style="color:#c41a16">'DDPG 代理'</span>:ddpg_agent,
<span style="color:#c41a16">'SAC 代理'</span>:sac_agent,
<span style="color:#c41a16">'TD3 代理'</span>:td3_agent,
<span style="color:#c41a16">'Ensemble 代理'</span>:ensemble_agent
}
test_and_visualize_agents(train_env,代理,training_data,n_tests=n_tests)
test_env = DummyVecEnv([ <span style="color:#aa0d91">lambda</span>:StockTradingEnv(test_data)])
test_and_visualize_agents(test_env,代理,test_data,n_tests=n_tests)</span></span></span></span>
The corresponding results are shown below:

Training and test set performance

We also compare the agents' performance on the test data (return, standard deviation, and Sharpe ratio).

Quoting the paper:

The higher an agent's Sharpe ratio, the higher its return relative to the investment risk it has taken on. We therefore select the trading agent that best maximizes return for the risk it incurs.
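For reference, the Sharpe ratio computed by `compare_and_plot_agents` below is the standard definition over the daily returns $R_t$:

$$S = \frac{\mathbb{E}[R_t] - R_f}{\sigma(R_t)}$$

where $R_f$ is the risk-free rate (0 by default here) and $\sigma(R_t)$ is the standard deviation of the daily returns.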
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424">test_agents_metrics = [test_agent(test_env, agent, test_data, n_tests=n_tests, visualize= <span style="color:#aa0d91">False</span> ) <span style="color:#aa0d91">for</span> agent <span style="color:#aa0d91">in</span> agent.values()]
compare_and_plot_agents(test_agents_metrics, <span style="color:#5c2699">list</span> (agents.keys()))</span></span></span></span>
The corresponding results are shown below:

Agent comparison

Finally, we can also use the models to produce recommendations for the next trading day:
<span style="color:rgba(0, 0, 0, 0.8)"><span style="background-color:#ffffff"><span style="background-color:#f9f9f9"><span style="color:#242424"><span style="color:#aa0d91">def </span> prepare_next_day_data ( <span style="color:#5c2699">stock_data</span> ):
<span style="color:#c41a16">""" 准备下一个交易日的观察值 """ </span>
<span style="color:#007400"># 使用当前股票数据初始化环境</span>
env = StockTradingEnv(stock_data)
env.reset()
<span style="color:#007400"># 准备第二天的观察值</span>
next_day_observations = env._next_observation()
<span style="color:#aa0d91">return</span> next_day_observations
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#aa0d91">def </span> generate_next_day_recommendations ( <span style="color:#5c2699">agent, next_day_observation</span> ):
<span style="color:#c41a16">""" 使用训练有素的代理生成下一个交易日的建议 """</span>
recommendations = {agent_name: [] <span style="color:#aa0d91">for</span> agent_name <span style="color:#aa0d91">in</span> agent.keys()}
<span style="color:#aa0d91">for</span> agent_name, agent <span style="color:#aa0d91">in</span> agent.items():
action = agent.predict(next_day_observation)
recs = agent.action_to_recommendation(action)
recommendations[agent_name] = <span style="color:#5c2699">zip</span> (recs, action)
<span style="color:#aa0d91">return</span> recommendations
<span style="color:#007400"># ----------------------------------------------------------------------------- </span>
<span style="color:#007400">#准备第二天的观察</span>
next_day_observation = prepare_next_day_data(test_data)
<span style="color:#007400"># 生成下一个交易日的建议</span>
recommendations = generate_next_day_recommendations(agents, next_day_observation)
<span style="color:#007400"># 打印或显示建议</span>
<span style="color:#aa0d91">for</span> agent_name, recs <span style="color:#aa0d91">in</span> recommendations.items():
<span style="color:#aa0d91">if</span> agent_name == <span style="color:#c41a16">'Ensemble Agent'</span> :
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f'\nRecommendations for <span style="color:#000000">{agent_name}</span> :'</span> )
<span style="color:#aa0d91">for</span> ticker, recommendations <span style="color:#aa0d91">in </span> <span style="color:#5c2699">zip</span> (tickers, recs):
<span style="color:#5c2699">print</span> ( <span style="color:#c41a16">f" <span style="color:#000000">{ticker}</span> : <span style="color:#000000">{recommendation}</span> "</span> )</span></span></span></span>
The corresponding results are shown below:

Next-day recommendations

Conclusion

We have walked through the intricate process of setting up and training reinforcement learning agents for stock trading with a custom trading environment. We began by designing a comprehensive environment that captures the nuances of stock trading, including transaction costs, state observations, and reward calculation. Within this environment we trained a variety of reinforcement learning agents (PPO, A2C, DDPG, SAC, and TD3), each contributing its own strengths to the trading strategy. In addition, we implemented an ensemble agent that integrates the predictions of all the individual models, aiming to maximize performance and robustness.

Our exploration shows how these advanced algorithms can be applied to real-world trading scenarios, highlighting their potential to adapt to market data and make informed decisions. The insights gained demonstrate not only the power of reinforcement learning in finance but also the importance of rigorous evaluation and visualization when assessing agent performance. By continually refining the models and analyzing their results, we can work toward more effective trading strategies and a deeper understanding of market dynamics.

Related materials and code ↓