Alpha2: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

Feng Xu1,2∗, Yan Yin∗, Xinyu Zhang1,2, Tianyuan Liu1,2,
Shengyi Jiang3, and Zongzhang Zhang1,2†

Abstract

Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with the expressive yet overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating a collection of formulaic alphas were mostly based on genetic programming (GP), which is known to suffer from the problems of being sensitive to the initial population, converging to local optima, and slow computation speed. Recent efforts employing deep reinforcement learning (DRL) for alpha discovery have not fully addressed key practical considerations such as alpha correlations and validity, which are crucial for their effectiveness. In this work, we propose a novel framework for alpha discovery using DRL by formulating the alpha discovery process as program construction. Our agent, Alpha2, assembles an alpha program optimized for an evaluation metric. A search algorithm guided by DRL navigates through the search space based on value estimates for potential alpha outcomes. The evaluation metric encourages both the performance and the diversity of alphas for a better final trading strategy. Our formulation of alpha search also brings the advantage of dimensional analysis before alpha calculation, ensuring the logical soundness of alphas and pruning the vast search space to a large extent. Empirical experiments on real-world stock markets demonstrate Alpha2's capability to identify a diverse set of logical and effective alphas, which significantly improves the performance of the final trading strategy. The code of our method is available at https://github.com/x35f/alpha2.

1 Introduction

In quantitative investment, alphas play a pivotal role in providing trading signals. Serving as the foundation for strategic decision-making, alphas transform raw market data, such as opening and closing prices, into actionable signals such as return predictions. These signals inform traders’ decisions and shape their strategies. Uncovering high-performance alphas that can withstand market fluctuations has long been a focal point in financial research.

Alphas are broadly categorized into two groups: formulaic alphas and black-box alphas. Formulaic alphas, expressed in terms of operators and operands, are widely adopted because of their straightforward mathematical expressions and ease of analysis. They encapsulate market dynamics into succinct formulas. A textbook example is $\frac{close - open}{high - low}$, a mean-reversion alpha. Conversely, black-box alphas leverage advanced machine learning algorithms, such as deep learning and tree-based models (Ke et al., 2017; Chen & Guestrin, 2016), to directly transform a group of inputs into a numerical signal. Despite their high expressivity and handy end-to-end nature, they come with their own set of challenges: their lifetimes can be notably shorter, and training these models demands careful tuning of hyper-parameters. A widely held opinion is that formulaic alphas, given their simplicity and transparency, exhibit resilience to market fluctuations and are often more enduring than their machine-learning counterparts. Our work focuses on the discovery of formulaic alphas.
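
As a concrete illustration, the textbook mean-reversion alpha above can be evaluated directly from daily bars. The sketch below is our own, assuming each price series is a (trading day × stock) pandas DataFrame; it is illustrative only.

```python
import pandas as pd

def mean_reversion_alpha(open_: pd.DataFrame, close: pd.DataFrame,
                         high: pd.DataFrame, low: pd.DataFrame) -> pd.DataFrame:
    """Textbook formulaic alpha (close - open) / (high - low).

    Each input is a (trading day x stock) DataFrame of raw prices; the output
    has the same shape and serves as a daily cross-sectional signal.
    """
    return (close - open_) / (high - low)
```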

Traditionally, the discovery of formulaic alphas is often attributed to the intuition and insights of a trader, usually grounded in economic fundamentals. However, as modern computational capabilities advance, algorithmic techniques are also employed to find formulaic alphas (Yu et al., 2023; Cui et al., 2021; Zhang et al., 2020). These methods can identify alphas that satisfy specific criteria without the need for constant human oversight. Genetic Programming (GP) (Koza, 1994) based solutions, such as those detailed in (Zhang et al., 2020), have gained traction as popular tools for alpha discovery. These solutions maintain a population of expressions that undergo stochastic modifications, such as crossover and mutation. AlphaGen (Yu et al., 2023) pioneers the usage of Reinforcement Learning (RL) to discover alphas, leveraging an RL agent to sequentially produce alphas. While their results showcase the potential of RL in this domain, their adoption of the techniques can be improved. These existing works exhibit two weaknesses that hinder their practical use. First, they are not able to find formulaic alphas built from more primitive operators or deeper structures. This problem is even worse for GP-based methods because of their sensitivity to initial population distributions and high computational demands. Second, existing methods tend to use the performance of an alpha as the only evaluation metric, producing alphas that are highly correlated with one another and have low interpretability.

From the viewpoint of practical strategies on real-market data, alphas should satisfy two properties. First, as outlined in (Tulchinsky, 2019), diversity among alphas plays an important role in constructing robust trading strategies. This diversity helps mitigate the risk of overfitting, ensuring that strategies remain resilient when facing market volatility. Second, alphas should be logically sound according to certain rules, such as dimensional consistency. For example, performing an addition between the open price and the volume should be avoided, since they are not of the same dimension. GP-based methods directly modify the structure of expressions, while AlphaGen constructs an expression in the form of Reverse Polish Notation, token by token. Both methods can only perform dimensional analysis after an alpha is fully constructed. Being unable to prune the search space in advance, they waste a lot of computational effort.

One key challenge of alpha discovery lies in its large search space. To illustrate, consider a task involving 40 binary operators and 20 operands. For an alpha constituted of up to 15 operators, the search space swells to an overwhelming size of approximately $10^{63}$. Performing brute-force search on this space is impractical. In the AlphaGo class of algorithms (Mankowitz et al., 2023; Silver et al., 2017; 2016), RL-guided Monte Carlo Tree Search (MCTS) has demonstrated strong ability in finding solutions in large search spaces, such as Go, Chess, Shogi, StarCraft, and assembly programs.

To address the challenges observed in the previously discussed frameworks and to discover alphas for practical use, we present a novel alpha discovery approach that combines RL with MCTS to generate alphas that are logical and less correlated. Drawing inspiration from AlphaDev (Mankowitz et al., 2023), we conceptualize an alpha as a program, akin to an assembly program, assembled incrementally. Such programs can be seamlessly translated into expression trees for calculation. Meanwhile, such a construction of an alpha can easily prune the search space in advance according to predefined rules. We then encapsulate this generation process within an environment. Subsequently, by leveraging refined value estimation and policy guidance from DRL, we efficiently focus the search on diverse, robust, and high-performance alphas. Empirical studies validate the efficacy of our framework, confirming that alphas searched via our method surpass those discovered through traditional methods in terms of performance, correlation, and validity.

The primary contributions of our work are:

  • We reconceptualize the task of generating formulaic alphas as a program generation process. Assembling the alpha program makes it possible to prune the search space to a large extent.
  • We present a novel search algorithm for formulaic alpha generation, utilizing the strength of DRL.
  • Our experimental results validate the efficacy of our approach. We achieve a substantial reduction in search space and demonstrate the capability to discover logical, diverse, and effective alphas.

2 Related Works

Symbolic Regression: Symbolic Regression (SR) is a machine learning technique aiming to discover mathematical expressions that fit a dataset. The search for alphas can be seen as a form of SR that predicts the market's return, which is highly correlated with our problem setting. However, the data in the financial market typically has a low signal-to-noise ratio, making accurate predictions from expressions of operators and operands impossible. Techniques like genetic programming, Monte Carlo Tree Search, and neural networks have been applied to symbolic regression. Mundhenk et al. (2021) introduce a hybrid neural-guided GP approach, utilizing the power of RL to seed the GP population. Sahoo et al. (2018) use a shallow neural network structured by symbolic operators to identify underlying equations and extrapolate to unseen domains. Kamienny et al. (2023) propose an MCTS-based method, using a context-aware neural mutation model to find expressions.

Auto Generation of Formulaic Alphas: Formulaic alphas provide interpretable trading signals based on mathematical expressions involving market features. Automated discovery of alphas has gained traction in quantitative trading. Genetic Programming has been widely applied to find trading strategies by evolving mathematical expressions. AutoAlpha (Zhang et al., 2020) uses Principal Component Analysis to navigate the search path from existing alphas. AlphaEvolve (Cui et al., 2021) utilizes AutoML techniques to discover a new class of alphas that are different from previous classes of formulas and machine learning models. Yu et al. (2023) first propose to use RL to generate formulaic alphas in the form of Reverse Polish Notation, and introduce a framework to automatically maintain the best group of alphas. Although the result of AlphaGen shows great improvement over previous methods, their framework for the RL task can be further improved. Due to the sparse nature of the alpha search space, the Markov Decision Process defined in AlphaGen leads to highly volatile value estimations, and the policy generates similar alpha expressions.

3 Problem Formulation

3.1 Definition of Alpha

In this study, we focus on finding a day-frequency trading strategy in a stock market consisting of $n$ distinct stocks spanning $T$ trading days. For a stock dataset consisting of $D$ trading days, on every trading day $d \in \{1, 2, \dots, D\}$, each stock $i$ is represented by a feature vector $x_{d,i} \in \mathbb{R}^{m\tau}$. This vector encapsulates $m$ raw features, including the open, close, and high prices, etc., over the past $\tau$ days. $\tau$ is decided according to the expression of the alpha and the availability of data. An alpha is a function $\zeta$ that transforms the features of a stock into a value $z_{d,i} = \zeta(x_{d,i}) \in \mathbb{R}$. These alpha values are subsequently utilized in a combination model to form the trading signals. In the rest of the paper, we omit the stock index $i$ for brevity and operate on the $D$-day stock dataset.

3.2 Evaluation Metrics for an Alpha

To assess an alpha's efficacy, the primary metric used is the Information Coefficient (IC), computed as the average Pearson correlation coefficient between the alpha values $z$ and the market returns $\mu$ over $D$ days. It is mathematically expressed as:

$$\mathrm{IC}(z, \mu) = \frac{1}{D} \sum_{d=1}^{D} \frac{\mathrm{Cov}(z_d, \mu_d)}{\sigma_{z_d} \sigma_{\mu_d}}, \qquad (1)$$

where $\mathrm{Cov}$ computes the covariance and $\sigma$ computes the standard deviation.
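
As a minimal sketch of Eq. 1, IC can be computed as the mean of daily cross-sectional Pearson correlations. The function below is our own illustration and assumes alpha values and returns are stored as (day × stock) NumPy arrays.

```python
import numpy as np

def information_coefficient(z: np.ndarray, mu: np.ndarray) -> float:
    """Average daily Pearson correlation between alpha values z and returns mu.

    z, mu: arrays of shape (D, n) -- D trading days, n stocks.
    """
    daily_ic = [np.corrcoef(z[d], mu[d])[0, 1] for d in range(z.shape[0])]
    return float(np.mean(daily_ic))
```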

Additional metrics used to further evaluate alphas include Rank IC, Maximum Drawdown (MDD), turnover (TVR), and the Sharpe Ratio. While these metrics are not the primary targets of optimization within our framework, they hold substantial importance in practical trading scenarios and provide a comprehensive understanding of alpha performance; our framework can easily be customized to these evaluation metrics.

4 Methodology

4.1 Alpha Discovery as Program Generation

Formulaic alphas are structured compositions of operators and operands. Drawing inspiration from AlphaDev’s approach to algorithm generation, we reconceptualize the task of alpha discovery as constructing an “alpha program”.

4.1.1 Operators, Operands and Instructions
(a) Operators
Category     Examples
Unary        Abs, Ln, Sign, …
Binary       Add, Sub, Mul, TS-Mean, …
Ternary      Correlation, Covariance, …
Indicator    Start, End

(b) Operands
Category     Examples
Scalar       0, 0.1, 0.5, 1, 3, 5, 15, …
Matrix       open, close, high, low, vwap, …
Register     Reg0, Reg1, …
Placeholder  Null

Table 1: Operators and operands

An alpha program is built from a series of instructions, where each instruction is characterized as a 4-element tuple (Operator, Operand1, Operand2, Operand3). Operators are grouped into unary, binary, ternary, and indicator types based on the type and number of operands they take. The indicator operators mark the start and end of an alpha program. Operand types include scalar values, matrix data, register storage, and a placeholder. Scalar operands are parameters for operators. Matrix operands are input features of the market, such as open and close. Registers are used for storing intermediate results. The placeholder operand, Null, is used to pad instructions to a 4-element tuple. Examples of operators and operands are provided in Tab. 1.
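
To make the instruction format concrete, here is a small sketch of how an instruction tuple could be represented; the class and vocabularies below are our own illustration, not the paper's actual data structures.

```python
from dataclasses import dataclass
from typing import Union

# Illustrative operator vocabularies (see Tab. 1).
UNARY = {"Abs", "Ln", "Sign"}
BINARY = {"Add", "Sub", "Mul", "Div", "TS-Mean"}
TERNARY = {"Correlation", "Covariance"}
INDICATOR = {"Start", "End"}

# An operand is a scalar, a matrix name ("open"), a register ("Reg0"), or "Null".
Operand = Union[float, str]

@dataclass(frozen=True)
class Instruction:
    """A 4-element instruction tuple (Operator, Operand1, Operand2, Operand3)."""
    operator: str
    operand1: Operand = "Null"
    operand2: Operand = "Null"
    operand3: Operand = "Null"

# Example: the instruction that computes close - open.
sub_close_open = Instruction("Sub", "close", "open")
```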

4.1.2 Translating an alpha program into a computation tree
Operator   Operand1   Operand2   Operand3   Register
Start      Null       Null       Null
Sub        close      open       Null       Reg0
Sub        high       low        Null       Reg1
Div        Reg0       Reg1       Null       Reg0
End        Null       Null       Null

Table 2: An example alpha program for $\frac{close - open}{high - low}$

[Figure: Expression tree of the example alpha program]

To actually compute an alpha program, we need to convert the program into a format that the computer can understand. A computational tree is built from alpha instructions in a bottom-up way. Tab. 2 and the expression tree figure above provide an example of this transformation. Programs begin with the instruction tuple (Start, Null, Null, Null). Then, instructions are translated one by one to build an expression tree. The color coding in the table corresponds to the colored nodes of the expression tree: each colored node in the tree is the result of executing its matching colored instruction in the alpha program. The instructions marked in blue and green fill two registers. Note that register assignment is implicit: if an instruction does not utilize registers, its output is stored in the first available register. Then, the instruction marked in orange performs a division between the values in the two registers. For an instruction employing a single register, the output replaces that register's current value. When an instruction involves two registers, the computed result replaces the Reg0 value while Reg1 is emptied. The (End, Null, Null, Null) instruction marks the termination of an alpha program. Evaluating an alpha program involves reading values from the Reg0 register, either during program construction or after its completion. This implicit approach to register management has proven effective in our experiments.
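
The register handling described above can be mimicked with a small interpreter. The sketch below reuses the hypothetical Instruction class from Section 4.1.1 and supports only a few binary operators; it is our own illustration of the implicit register rule, not the paper's implementation.

```python
def execute(program, features):
    """Run a list of Instruction tuples over (day x stock) arrays in `features`.

    Registers are assigned implicitly; the alpha value is whatever sits in Reg0
    when the End instruction is reached.
    """
    binary_ops = {"Add": lambda a, b: a + b, "Sub": lambda a, b: a - b,
                  "Mul": lambda a, b: a * b, "Div": lambda a, b: a / b}
    registers = {}

    def load(op):
        if isinstance(op, str) and op.startswith("Reg"):
            return registers[op]
        if isinstance(op, str) and op in features:
            return features[op]
        return op  # scalar constant

    for inst in program:
        if inst.operator in ("Start", "End"):
            continue
        result = binary_ops[inst.operator](load(inst.operand1), load(inst.operand2))
        regs_used = [op for op in (inst.operand1, inst.operand2)
                     if isinstance(op, str) and op.startswith("Reg")]
        if len(regs_used) == 2:
            registers["Reg0"] = result          # result replaces Reg0, Reg1 is emptied
            registers.pop("Reg1", None)
        elif len(regs_used) == 1:
            registers[regs_used[0]] = result    # output replaces that register's value
        else:
            target = "Reg0" if "Reg0" not in registers else "Reg1"
            registers[target] = result          # first available register
    return registers["Reg0"]

# Usage with the program from Tab. 2:
# alpha = execute([Instruction("Start"), Instruction("Sub", "close", "open"),
#                  Instruction("Sub", "high", "low"), Instruction("Div", "Reg0", "Reg1"),
#                  Instruction("End")], {"open": o, "close": c, "high": h, "low": l})
```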

4.1.3 The MDP for the Reinforcement Learning Task

Given the established operators, operands, and instructions, we can construct a task suitable for RL. This task is defined as a Markov decision process (MDP), denoted as $(\mathcal{S}, \mathcal{A}, p, r, \gamma, \rho_0)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $p(\cdot \mid s, a)$ is the transition probability, $r(s, a) \in [0, R_{\max}]$ is the reward function, $\gamma \in (0, 1)$ is the discount factor, and $\rho_0(s)$ is the initial state distribution.

In our alpha program environment, the state space $\mathcal{S}$ contains all potential alpha programs. Each state $s$ corresponds to a unique alpha function $\zeta$ and is a vectorized representation of that function. The action space $\mathcal{A}$ is the set of all possible instructions. The transition probability $p(\cdot \mid s, a)$ is deterministic: it takes the value 1 for the alpha program obtained after applying the action, and 0 otherwise. The reward, denoted $r(s_t, a_t, s_{t+1})$, is determined by the increase in the evaluation metric when arriving at $s_{t+1}$ after applying action $a_t$. The evaluation metric function is denoted $\mathrm{Perf}(\zeta)$, which takes the alpha expression of the state as input. Since our transition is deterministic, the reward is computed as $r(s_t, a_t, s_{t+1}) = \mathrm{Perf}(\zeta_{t+1}) - \mathrm{Perf}(\zeta_t)$. The evaluation metric is primarily IC, but we refine it, as detailed later. The discount factor $\gamma$ is a hyper-parameter that controls the length of the alpha program. The initial state distribution $\rho_0(s)$ invariably starts from an empty program, i.e., it assigns probability 1 to the empty program and 0 otherwise.
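
A minimal, environment-style sketch of this MDP is given below; the class, its method names, and the injected perf_fn are hypothetical and only illustrate the deterministic transition and the incremental reward $\mathrm{Perf}(\zeta_{t+1}) - \mathrm{Perf}(\zeta_t)$.

```python
class AlphaProgramEnv:
    """Toy environment sketch: states are partial alpha programs, actions are
    instructions, and the reward is the change in the evaluation metric."""

    def __init__(self, perf_fn):
        self.perf_fn = perf_fn        # Perf(program) -> float, assumed given
        self.program = []

    def reset(self):
        self.program = [Instruction("Start")]   # the (Start, Null, Null, Null) tuple
        return self.program

    def step(self, action):
        prev_perf = self.perf_fn(self.program)
        self.program = self.program + [action]  # deterministic transition
        reward = self.perf_fn(self.program) - prev_perf
        done = action.operator == "End"
        return self.program, reward, done
```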

4.2 Discovering Alphas using RL

Alpha2 uses a DRL agent to explore the alpha program generation task. The RL algorithm of Alpha2 is similar to that of AlphaDev (Mankowitz et al., 2023), which is a modification of the AlphaZero agent (Silver et al., 2016). DRL guides an MCTS procedure using a deep neural network. The deep neural network takes the current state $s_t$, a vectorized representation of the alpha program $\zeta_t$, as input, and outputs an action distribution and a value prediction. The action distribution gives the prior probability that the agent should assign to each action, and the value prediction estimates the cumulative reward the agent should expect from the current state $s_t$. Since the Alpha series of works has detailed the MCTS process, we do not elaborate on it in this paper. The next paragraphs focus on key improvements that make the search algorithm better suited to discovering formulaic alphas.

4.3 Discovering robust, diverse and logical alphas

4.3.1 Discovering robust alphas

Our approach to estimating the value of child nodes introduces a nuanced deviation from conventional methodologies. In traditional MCTS, the mean operator is often used to calculate the values of child nodes. Our empirical findings indicate that this operator falls short during the initial phases of the algorithm, a phenomenon we attribute to the inherent sparsity of formulaic alphas. In the early tree search stage, most alphas yield non-informative signals, leading to arbitrary policy directions. This scenario calls for the adoption of a max operator, which is more adept at navigating the sparse landscape of formulaic alphas. However, simply using the max operator can lead to the discovery of parameter-sensitive alphas, which is not desired. Supporting our observation,  Dam et al. (2019) state that using the mean operator leads to an underestimation of the optimal value, slowing down the learning, while the maximum operator leads to overestimation. They propose a power mean operator that computes the value between the average value and the maximum one. In our work, we take a simpler form, and leave the balance between the maximum operator and the mean operator controlled by a hyperparameter. The value estimation for a child node is formulated as

$$Q(s, a) = r(s, a) + \beta \cdot \mathrm{mean}(V_s) + (1 - \beta) \max(V_s),$$

where $\beta \in [0, 1]$ is a hyperparameter controlling the balance between mean and max, and $V_s$ is the value backup of the node at state $s$. Also, to further increase the validity of the value estimation, especially for the mean operator, the value backup is calculated from the top-$k$ values added to the node. That is, $V_s = \{v_1, \dots, v_k\}$, where the $k$ values are stored in a min-heap, so that the computational complexity is $O(\log k)$ for each new value. The RL agent, in the simulation phase, operates based on maximizing this $Q$-value. This refined definition of the $Q$-value is expected to help discover alphas that are both effective and robust to parameters.
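
The value backup and the mean/max mix can be kept per node with a size-$k$ min-heap; the sketch below is our own illustration of this bookkeeping, with names chosen for exposition.

```python
import heapq

class NodeBackup:
    """Keeps only the top-k simulation returns seen at an MCTS node."""

    def __init__(self, k: int = 10):
        self.k = k
        self.values = []                 # min-heap holding the k largest values

    def add(self, value: float) -> None:
        if len(self.values) < self.k:
            heapq.heappush(self.values, value)       # O(log k)
        elif value > self.values[0]:
            heapq.heapreplace(self.values, value)    # O(log k)

def q_value(reward: float, backup: NodeBackup, beta: float = 0.5) -> float:
    """Q(s, a) = r(s, a) + beta * mean(V_s) + (1 - beta) * max(V_s)."""
    if not backup.values:
        return reward
    mean_v = sum(backup.values) / len(backup.values)
    return reward + beta * mean_v + (1 - beta) * max(backup.values)
```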

4.3.2 Discovering diverse alphas

As outlined in (Tulchinsky, 2019), diversity among alphas helps build robust trading strategies. In our framework, we incorporate such a target within the evaluation function. Diversity can be quantified by computing the correlation between alphas. For an alpha function $\zeta_t$ to be evaluated, we first compute its alpha values $z_t$ on all stocks and trading days. Then, for an already discovered set of alpha values $G = \{z_1, z_2, \dots, z_n\}$, where $n$ is the number of mined alphas, we compute the maximum correlation with the current alpha values, i.e., $\mathrm{MaxCorr}(z_t, G) = \max_i \mathrm{IC}(z_t, z_i)$. The evaluation metric is discounted according to this maximum Pearson correlation coefficient between the alpha values and the mined alpha set:

$$\mathrm{Perf}(\zeta_t) = (1 - \mathrm{MaxCorr}(z_t, G)) \cdot \mathrm{IC}(z_t, \mu).$$

This evaluation metric function encourages the discovery of low-correlation alphas by assigning higher value to alphas with low correlation with the mined alpha set, and discourages the discovery of highly correlated alphas by reducing the value of alphas with high correlation. In this way, Alpha2 can continuously discover diverse alphas.
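
A small sketch of this diversity-discounted metric, reusing the information_coefficient helper from the sketch in Section 3.2; the function names are ours.

```python
import numpy as np

def max_corr(z_t: np.ndarray, mined: list) -> float:
    """MaxCorr(z_t, G): largest IC between the candidate alpha and the mined set."""
    if not mined:
        return 0.0
    return max(information_coefficient(z_t, z_i) for z_i in mined)

def perf(z_t: np.ndarray, returns: np.ndarray, mined: list) -> float:
    """Perf(zeta_t) = (1 - MaxCorr(z_t, G)) * IC(z_t, mu)."""
    return (1.0 - max_corr(z_t, mined)) * information_coefficient(z_t, returns)
```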

4.3.3 Ensuring the Dimensional Consistency of Alphas

Figure 1: Example of pruning the search space according to dimensional consistency. The left part shows an ongoing MCTS search process. The right part shows the expression trees of the corresponding alpha expressions. For an alpha expression $close - open$, which is of dimension currency, consider adding $high$ or $volume$ to the expression. Adding $high$ is allowed since it is of the same dimension, currency, while adding $volume$ is forbidden because it is of another dimension. The nodes marked in green illustrate the addition of $high$, and the nodes marked in red illustrate the addition of $volume$.

Feature   Dimension
open      currency
close     currency
high      currency
low       currency
vwap      currency
volume    unit

Table 3: Dimension of features

In real-world applications, especially in financial contexts, meaningful interactions between features are vital. Combining disparate features can lead to spurious relationships, making trading strategies derived from them unreliable. In the SR field, the dimension of individual input features is generally overlooked. For SR approaches in deep learning, normalization of features typically occurs during the pre-processing phase. Yet for alpha discovery, where data extends over both time and the number of assets, integrating normalization within the search process is preferable, because normalization inherently alters the data's semantics. While AlphaGen and GP-based methods have produced alphas with impressive statistical metrics, these often lack an underlying logical rationale. A defining quality of a coherent alpha expression is the dimensional consistency among its operators and operands. For instance, summing variables like price and trade volume is fundamentally wrong due to their divergent distributions and different dimensions. Tab. 3 lists the dimensions of the basic input features. Note that the dimension changes as the expression becomes more complicated. Our approach innovates by imposing rules that constrain the search space right from the node expansion stage, before actually evaluating the alpha, which is a feature not achievable in preceding methods. We maintain a record of an expression's dimension within each register, allowing for a preemptive filtering of nodes based on the dimensional requirements specified by the operators when expanding an MCTS tree. Our way of constructing an alpha ensures that once a segment of the alpha's expression tree passes the dimensional check, it does not need to be reassessed. An example of the dimension system is illustrated in Fig. 1.
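
A sketch of how such a dimension check could look at node-expansion time is given below; the rule table and function names are our own simplified assumptions, not the paper's exact rule set.

```python
FEATURE_DIMENSION = {"open": "currency", "close": "currency", "high": "currency",
                     "low": "currency", "vwap": "currency", "volume": "unit"}

def result_dimension(operator: str, dim1: str, dim2: str):
    """Return the dimension of the result, or None if the combination is illegal."""
    if operator in ("Add", "Sub"):
        return dim1 if dim1 == dim2 else None        # only same-dimension operands
    if operator == "Mul":
        return f"{dim1}*{dim2}"
    if operator == "Div":
        return "dimensionless" if dim1 == dim2 else f"{dim1}/{dim2}"
    return dim1

def legal_expansion(operator: str, dim1: str, dim2: str) -> bool:
    """Called during MCTS node expansion to prune inconsistent children early."""
    return result_dimension(operator, dim1, dim2) is not None

# Example from Fig. 1: (close - open) has dimension currency, so adding high is
# legal while adding volume is pruned before any alpha evaluation takes place.
assert legal_expansion("Add", "currency", FEATURE_DIMENSION["high"])
assert not legal_expansion("Add", "currency", FEATURE_DIMENSION["volume"])
```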

Traditional methods, including AlphaGen and GP-based paradigms, are unable to prune the search space in advance due to their structural limitations. AlphaGen incrementally constructs alphas, token by token. Coupled with its use of Reverse Polish Notation for expressions, pinpointing a token's exact location within the final expression tree is ambiguous, which prevents search space pruning. GP-based methods, with their mutation operations, remain unaware of the overall structure of expression trees and can even perform cyclic modifications. Thus, a dimension check cannot be performed before the alpha is fully generated. Employing these dimension restrictions results in a large reduction in nodes at every level compared to the unrestricted counterpart. By reducing the complexity and potential combinations in the search space, we can focus on discovering alphas that are logical. This also reduces the chance of overfitting, which is a significant concern in quantitative finance.

4.4 Pipeline to generate a trading strategy

Figure 2: The pipeline for the generation of a strategy

Our method focuses on generating alphas and does not provide an end-to-end solution. Alpha2 first produces alphas using RL-guided MCTS with the refined value estimation, performance evaluation, and dimension check. Then, a combination model takes the alphas as input and generates a trading strategy. The combination model can be customized to meet user demand, e.g., linear regression, deep neural networks, and gradient boosting trees. Fig. 2 shows the pipeline for a practical adoption of our method.
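
As one possible instantiation of the combination step, the sketch below fits an XGBoost regressor that maps mined alpha values to future returns; the shapes, function name, and hyper-parameters are ours and purely illustrative.

```python
import numpy as np
import xgboost as xgb

def combine_alphas(alpha_values: list, returns: np.ndarray) -> xgb.XGBRegressor:
    """Fit a combination model on mined alphas.

    alpha_values: list of (D, n) arrays, one per mined alpha.
    returns: (D, n) array of future returns used as the regression target.
    """
    X = np.stack([z.ravel() for z in alpha_values], axis=1)   # (D*n, num_alphas)
    y = returns.ravel()
    model = xgb.XGBRegressor(n_estimators=200, max_depth=4)
    model.fit(X, y)
    return model   # model.predict(...) yields the trading signal
```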

5 Experiments

In the experiment section, we aim to demonstrate the efficacy of our method compared to existing approaches. Our primary focus is to answer the following three questions:

  • Can Alpha2 generate diverse and good alphas?
  • Can alphas mined by Alpha2 perform better than previous methods?
  • How do alphas mined by Alpha2 perform in the real-world market?

5.1 Experiment Setup

Data: The data is acquired from the Chinese A-shares market through baostock. Six raw features are selected to generate the alphas: {open, close, high, low, volume, vwap}. The target of our method is to find alphas that have a high IC with the 20-day return of the stocks. The dataset is split into a training set (2009/01/01 to 2018/12/31), a validation set (2019/01/01 to 2020/12/31), and a test set (2021/01/01 to 2023/12/31). We use the constituents of the CSI300 and CSI500 indices of China A-shares as the stock set.

Baselines: Our method is compared with several machine learning models. MLP uses a fully connected neural network to process the input data into strategy signals. XGBoost and LightGBM are gradient boosting frameworks. AlphaGen and gplearn are representative methods for generating a collection of alphas. We follow the open-source implementations of AlphaGen and Qlib (Yang et al., 2020) to produce the results.

Alpha Combination: Our method, Alpha2, only considers the problem of generating alphas. We use XGBoost as the combination model. The XGBoost model is trained to fit the alpha signals to the return signals on the training dataset, using the top 20 generated alphas ranked by IC. The trained model is then fixed and used for prediction on the test dataset.

Evaluation Metric: Two metrics, IC and Rank IC, are used to measure the performance of the models. The definition of IC is given in Eq. 1. The rank information coefficient (Rank IC) measures the correlation between the ranks of alpha values and the ranks of future returns. It is defined as the Spearman correlation coefficient between the ranks of the alpha values and the future returns, $\rho(z_d, r_d) = \mathrm{IC}(\mathrm{rk}(z_d), \mathrm{rk}(r_d))$, where $\mathrm{rk}(\cdot)$ is the ranking operator.
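
Rank IC can be computed analogously to the IC sketch in Section 3.2 by replacing the Pearson correlation with a daily Spearman correlation; the helper below is our own illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def rank_ic(z: np.ndarray, r: np.ndarray) -> float:
    """Average daily Spearman correlation between alpha values z and returns r.

    z, r: arrays of shape (D, n) -- D trading days, n stocks.
    """
    daily = [spearmanr(z[d], r[d]).correlation for d in range(z.shape[0])]
    return float(np.mean(daily))
```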

Code: For an efficient and accelerated experimental process, our implementation is based on the pseudo-code that AlphaDev provides, with the computational aspects handled by JAX. Experiments are run on a single machine with an Intel Core 13900K CPU and two Nvidia A5000 GPUs.

5.2 IC and Correlation of Generated Alphas

Method     IC              Correlation
gplearn    0.0164±0.0167   0.7029±0.1824
AlphaGen   0.0257±0.0153   0.3762±0.6755
Ours       0.0407±0.0219   0.1376±0.3660

Table 4: Statistics of IC and correlations of mined alphas on CSI300.

For a robust strategy, the alphas are expected to be diverse, having low Pearson Correlation Coefficients with each other. To answer the first question, we compute the correlations between alphas generated by gplearn, AlphaGen, and Alpha2. The result is shown in Tab. 4.

From the table, we can see that Alpha2 generates the best set of alphas in terms of IC. Meanwhile, it generates the most diverse set of alphas, as measured by the mean correlation. It is worth noting that gplearn generates a set of alphas with high correlations. The high correlation results from minor mutations of constants in the alpha expressions after the search gets trapped in a local optimum. With a more diverse and better alpha set, Alpha2 has greater potential to generate a more robust trading strategy.

5.3 Performance of Generated Alphas

Method     CSI300                            CSI500
           IC              Rank IC           IC              Rank IC
MLP        0.0123±0.0006   0.0178±0.0017     0.0158±0.0014   0.0211±0.0007
XGBoost    0.0192±0.0021   0.0241±0.0027     0.0173±0.0017   0.0217±0.0022
LightGBM   0.0158±0.0012   0.0235±0.0030     0.0112±0.0012   0.0212±0.0020
gplearn    0.0445±0.0044   0.0673±0.0058     0.0557±0.0117   0.0665±0.0154
AlphaGen   0.0500±0.0021   0.0540±0.0035     0.0544±0.0011   0.0722±0.0017
Ours       0.0576±0.0022   0.0681±0.0041     0.0612±0.0051   0.0731±0.0093

Table 5: Performance on CSI300 and CSI500 in the test dataset.

To answer the second question, we run the baselines and our method on the CSI300 and CSI500 stock datasets and evaluate them on the two metrics. The results are shown in Tab. 5.

The first three methods, MLP, XGBoost, and LightGBM, combine an existing set of alphas from Qlib. They perform worse due to the use of this open-source set of alphas. On the other hand, gplearn and AlphaGen are based on formulaic alphas generated by themselves, and the alphas they generate perform better on the test dataset. Although AlphaGen has designed a framework to filter alphas, it neither ensures the validity of alphas upon generation nor emphasizes diversity, which leads to possible performance degradation. We attribute the superior performance of Alpha2 to the logical soundness and diversity of its alphas.

5.4 Stock Market Backtest

Figure 3: Backtest result on CSI300. The value of the y-axis represents the cumulative reward.

To further validate the effectiveness of our method, we conduct an experiment in a simulated environment. The data is from the Chinese A-shares market in the test period. The trading strategy is a top-$k$/drop-$n$ strategy. On each trading day, stocks are first sorted according to the alpha values, and then the top-$k$ stocks are selected to trade. With at most $n$ stocks traded every day, we try to invest evenly across the $k$ stocks. In our experiment, $k = 50$ and $n = 5$. The result of the backtest is shown in Fig. 3. Our method demonstrates superior performance on the CSI300 stock market.
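
For clarity, one way a daily top-$k$/drop-$n$ rebalance could be sketched is shown below; the exact swap rule used in the backtest is not specified in the text, so this is an illustrative assumption.

```python
def rebalance(holdings: set, alpha_scores: dict, k: int = 50, n: int = 5) -> set:
    """One trading day of a top-k/drop-n strategy sketch.

    holdings: currently held stock codes; alpha_scores: stock code -> alpha value.
    Hold the k highest-ranked stocks while trading at most n names per day;
    capital is then spread evenly across the held stocks.
    """
    ranked = sorted(alpha_scores, key=alpha_scores.get, reverse=True)
    if not holdings:                       # first trading day: buy the top k outright
        return set(ranked[:k])
    target = set(ranked[:k])
    drops = sorted(holdings - target, key=alpha_scores.get)[:n]    # weakest holdings
    adds = [s for s in ranked if s in target and s not in holdings][:len(drops)]
    return (holdings - set(drops)) | set(adds)
```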

6 Conclusion

In this work, we introduce Alpha2, a novel framework for the discovery of formulaic alphas. Using RL and MCTS, we harness the power of modern machine learning techniques to address the challenge of discovering powerful formulaic alphas in a vast search space. Alpha2 formulates the alpha generation process as a program construction task, using RL-guided MCTS as the search algorithm. Our refined value estimation, performance evaluation, and dimension check ensure the discovery of high-quality alphas and a good, robust trading strategy, which is validated in the experiments. From the perspective of search algorithms, extensive theoretical research has been done on MCTS and RL, and our method benefits from this existing research: the search algorithm can minimize the regret with respect to the ground truth within theoretical bounds, compared with the mostly empirical results of previous methods. On the engineering side, we propose a novel framework to generate formulaic alphas. This framework allows a general design of search space pruning for formulaic alphas, including but not limited to dimensional consistency rules.

References

  • Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794, 2016.
  • Can Cui, Wei Wang, Meihui Zhang, Gang Chen, Zhaojing Luo, and Beng Chin Ooi. AlphaEvolve: A learning framework to discover novel alphas in quantitative investment. In Proceedings of the 2021 International Conference on Management of Data, pp. 2208–2216, 2021.
  • Tuan Dam, Pascal Klink, Carlo D'Eramo, Jan Peters, and Joni Pajarinen. Generalized mean estimation in Monte-Carlo tree search. arXiv preprint arXiv:1911.00384, 2019.
  • Pierre-Alexandre Kamienny, Guillaume Lample, Sylvain Lamprier, and Marco Virgolin. Deep generative symbolic regression with Monte-Carlo-tree-search. In International Conference on Machine Learning, pp. 15655–15668, 2023.
  • Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pp. 3146–3154, 2017.
  • John R. Koza. Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4:87–112, 1994.
  • Daniel J. Mankowitz, Andrea Michi, Anton Zhernov, Marco Gelmi, Marco Selvi, Cosmin Paduraru, Edouard Leurent, Shariq Iqbal, Jean-Baptiste Lespiau, Alex Ahern, Thomas Köppe, Kevin Millikin, Stephen Gaffney, Sophie Elster, Jackson Broshear, Chris Gamble, Kieran Milan, Robert Tung, Minjae Hwang, A. Taylan Cemgil, Mohammadamin Barekatain, Yujia Li, Amol Mandhane, Thomas Hubert, Julian Schrittwieser, Demis Hassabis, Pushmeet Kohli, Martin A. Riedmiller, Oriol Vinyals, and David Silver. Faster sorting algorithms discovered using deep reinforcement learning. Nature, 618(7964):257–263, 2023.
  • T. Nathan Mundhenk, Mikel Landajuela, Ruben Glatt, Claudio P. Santiago, Daniel M. Faissol, and Brenden K. Petersen. Symbolic regression via neural-guided genetic programming population seeding. arXiv preprint arXiv:2111.00053, 2021.
  • Subham Sahoo, Christoph Lampert, and Georg Martius. Learning equations for extrapolation and control. In International Conference on Machine Learning, pp. 4442–4450, 2018.
  • David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
  • David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy P. Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.
  • Igor Tulchinsky. Finding Alphas: A Quantitative Approach to Building Trading Strategies. John Wiley & Sons, 2019.
  • Xiao Yang, Weiqing Liu, Dong Zhou, Jiang Bian, and Tie-Yan Liu. Qlib: An AI-oriented quantitative investment platform. arXiv preprint arXiv:2009.11189, 2020.
  • Shuo Yu, Hongyan Xue, Xiang Ao, Feiyang Pan, Jia He, Dandan Tu, and Qing He. Generating synergistic formulaic alpha collections via reinforcement learning. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 5476–5486, 2023.