Option Pricing with Reinforcement Learning

This article explores option pricing with reinforcement learning. It is a translation of https://medium.com/swlh/option-pricing-using-reinforcement-learning-ad2ddca7735b, intended to build a deeper understanding of this innovative technique in finance.

Option Pricing

This post demonstrates how to use reinforcement learning to price an American option. An option is a derivative contract that gives its owner the right, but not the obligation, to buy or sell an underlying asset. Unlike its European-style counterpart, an American-style option may be exercised at any time before expiry.

The American option is known to be an optimal control MDP (Markov Decision Process) problem in which the underlying process is a Geometric Brownian motion ([1]). The Markovian state is a price-time tuple, and the control is a binary action that decides on each day whether or not to exercise the option.

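To make this setup concrete, here is a minimal sketch of the underlying Geometric Brownian motion and the (price, time) state. The parameters (S0 = 100, r = 5%, σ = 20%, 252 daily steps over one year) and the helper `simulate_gbm_path` are illustrative assumptions, not values taken from the original post:

```python
import numpy as np

# Illustrative parameters (assumptions, not from the original post)
S0, r, sigma = 100.0, 0.05, 0.2   # spot price, risk-free rate, volatility
T, n_steps = 1.0, 252             # one year of daily steps
dt = T / n_steps

def simulate_gbm_path(seed=0):
    """Simulate one Geometric Brownian motion path S_0, S_dt, ..., S_T."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_steps)
    log_returns = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return S0 * np.exp(np.concatenate(([0.0], np.cumsum(log_returns))))

path = simulate_gbm_path()
# The Markovian state on day t is the (price, time) tuple; the control is a
# binary action: exercise now (stop) or keep holding the option.
state_on_day_10 = (path[10], 10 * dt)
```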

The optimal stopping policy looks like the figure below, where the x-axis is time and the y-axis is the stock price. The curve in red is commonly called the optimal exercise boundary. On each day, if the stock price falls in the exercise region, which lies above the boundary for a call or below the boundary for a put, it is optimal to exercise the option and collect the difference between the stock price and the strike price.

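As a small illustration of this exercise rule, the sketch below uses hypothetical helpers (`exercise_payoff`, `should_exercise`) and assumes the boundary value for the current day is already known; the strike and boundary numbers are made up for the example:

```python
def exercise_payoff(S, K, is_call=True):
    """Immediate payoff from exercising: the gap between stock price and strike."""
    return max(S - K, 0.0) if is_call else max(K - S, 0.0)

def should_exercise(S, boundary_t, is_call=True):
    """Exercise when the price sits in the exercise region: above the optimal
    exercise boundary for a call, below it for a put."""
    return S >= boundary_t if is_call else S <= boundary_t

# Example: a call struck at 100 with today's boundary at 115 (illustrative numbers)
if should_exercise(S=120.0, boundary_t=115.0):
    print(exercise_payoff(S=120.0, K=100.0))   # prints 20.0
```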

One can imagine it as a discretized Q-table, as illustrated by the dotted grid. Every day the agent, or the trader, looks up the table and takes an action according to today's price. The Q-table is monotone in that all the cells above the boundary yield a go decision and all the cells below yield a no-go decision. Therefore Q-learning is well suited to finding the optimal strategy defined by this boundary.

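The daily table lookup and the boundary it implies might look like the sketch below; the grid sizes, the `Q` array, and the helper names `act` and `exercise_boundary` are illustrative assumptions rather than the post's actual code:

```python
import numpy as np

# Illustrative discretization (assumptions): 252 daily steps, 50 price buckets
price_grid = np.linspace(50.0, 150.0, 50)
n_days = 252
Q = np.zeros((n_days, len(price_grid), 2))   # Q[t, s, a] with a = 0 hold, a = 1 exercise

def act(t, price):
    """Daily lookup: snap today's price to the nearest grid cell, act greedily."""
    s = int(np.abs(price_grid - price).argmin())
    return int(Q[t, s].argmax())             # 0 = hold (no-go), 1 = exercise (go)

def exercise_boundary(t):
    """For a call, the boundary is the lowest grid price where 'exercise' beats
    'hold'; by monotonicity every cell above it should also prefer 'exercise'."""
    exercise_cells = np.where(Q[t, :, 1] > Q[t, :, 0])[0]
    return price_grid[exercise_cells[0]] if exercise_cells.size else None
```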

The remainder of this post contains three sections. In the first section, a baseline price is computed using classical models. In the second section, an OpenAI Gym environment is constructed, similar to building an Atari game. In the third section, an agent is trained with DQN (Deep Q-Network) to play American options, similar to training computers to play Atari games. The full Python notebook is located here on GitHub.

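Before diving into those sections, here is a minimal sketch of what such an environment could look like for an American call under Geometric Brownian motion. The class name `AmericanOptionEnv`, the parameter values, and the use of the classic `gym` step API are assumptions for illustration; the post's actual implementation lives in the linked notebook:

```python
import gym
import numpy as np
from gym import spaces

class AmericanOptionEnv(gym.Env):
    """Sketch of a gym-style environment for an American call option.

    State: (normalized time to expiry, current stock price).
    Action: 0 = hold, 1 = exercise. The episode ends on exercise or at expiry.
    All parameter values below are illustrative, not taken from the original post.
    """

    def __init__(self, S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, n_steps=252):
        super().__init__()
        self.S0, self.K, self.r, self.sigma = S0, K, r, sigma
        self.T, self.n_steps, self.dt = T, n_steps, T / n_steps
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0]), high=np.array([1.0, np.inf]), dtype=np.float32)
        self.reset()

    def reset(self):
        self.t, self.S = 0, self.S0
        return self._obs()

    def _obs(self):
        return np.array([self.t / self.n_steps, self.S], dtype=np.float32)

    def step(self, action):
        if action == 1 or self.t == self.n_steps:
            # Exercise (or forced settlement at expiry): pay the discounted payoff.
            discount = np.exp(-self.r * self.t * self.dt)
            reward = discount * max(self.S - self.K, 0.0)
            return self._obs(), reward, True, {}
        # Otherwise the price evolves one day under Geometric Brownian motion.
        z = np.random.standard_normal()
        self.S *= np.exp((self.r - 0.5 * self.sigma**2) * self.dt
                         + self.sigma * np.sqrt(self.dt) * z)
        self.t += 1
        return self._obs(), 0.0, False, {}
```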
