Below is a section from the paper by Shihao Gu et al. I found this passage moving when I read it, so I am noting it down here.
A number of aspects of empirical asset pricing make it a particularly attractive field for analysis with
machine learning methods.
- Two main research agendas have monopolized modern empirical asset pricing research. The first seeks to describe and understand differences in expected returns across assets. The second focuses on dynamics of the aggregate market equity risk premium. Measurement of an asset’s risk premium is fundamentally a problem of prediction: the risk premium is the conditional expectation of a future realized excess return. Machine learning, whose methods are largely specialized for prediction tasks, is thus ideally suited to the problem of risk premium measurement.
- The collection of candidate conditioning variables for the risk premium is large. The profession has accumulated a staggering list of predictors that various researchers have argued possess forecasting power for returns. The number of stock-level predictive characteristics reported in the literature numbers in the hundreds and macroeconomic predictors of the aggregate market number in the dozens. Additionally, predictors are often close cousins and highly correlated. Traditional prediction methods break down when the predictor count approaches the observation count or predictors are highly correlated. With an emphasis on variable selection and dimension reduction techniques, machine learning is well suited for such challenging prediction problems by reducing degrees of freedom and condensing redundant variation among predictors.
- Further complicating the problem is ambiguity regarding functional forms through which the high-dimensional predictor set enters into risk premia. Should they enter linearly? If nonlinearities are needed, which form should they take? Must we consider interactions among predictors? Such questions rapidly proliferate the set of potential model specifications. The theoretical literature offers little guidance for winnowing the list of conditioning variables and functional forms. Three aspects of machine learning make it well suited for problems of ambiguous functional form. The first is its diversity. As a suite of dissimilar methods it casts a wide net in its specification search. Second, with methods ranging from generalized linear models to regression trees and neural networks, machine learning is explicitly designed to approximate complex nonlinear associations. Third, parameter penalization and conservative model selection criteria complement the breadth of functional forms spanned by these methods in order to avoid overfit biases and false discovery.
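In symbols, the three points can be written down as follows. This is a standard formalization, and the notation is assumed here rather than quoted from the passage: $r_{i,t+1}$ is the excess return of asset $i$ from $t$ to $t+1$, $z_{i,t}$ is the vector of predictors known at time $t$, $g^\star$ is an unknown (possibly nonlinear) function, $N$ and $T$ are the numbers of assets and periods, and $\phi$ is a complexity penalty with strength $\lambda \ge 0$. Because the conditional expectation minimizes expected squared forecast error, estimating the risk premium $\mathbb{E}_t[r_{i,t+1}]$ amounts to a prediction problem, and $\lambda\,\phi(g)$ plays the role of the parameter penalization mentioned in the third point.

```latex
\begin{aligned}
r_{i,t+1} &= \mathbb{E}_t\!\left[r_{i,t+1}\right] + \varepsilon_{i,t+1},
\qquad \mathbb{E}_t\!\left[r_{i,t+1}\right] = g^\star(z_{i,t}), \\
\hat{g} &= \arg\min_{g}\; \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}
\bigl(r_{i,t+1} - g(z_{i,t})\bigr)^2 + \lambda\,\phi(g).
\end{aligned}
```

For example, choosing a linear $g(z) = z^\top\theta$ with $\phi(g) = \lVert\theta\rVert_2^2$ gives ridge regression, while $\lambda = 0$ recovers ordinary least squares.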
Online translation:
Many aspects of empirical asset pricing make it a particularly attractive field for analysis with machine learning methods.
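1) Two main research agendas have dominated modern empirical asset pricing research. The first seeks to describe and understand differences in expected returns across assets. The second focuses on the dynamics of the aggregate market equity risk premium. Measuring an asset's risk premium is fundamentally a prediction problem: the risk premium is the conditional expectation of a future realized excess return. Because machine learning methods are largely designed for prediction tasks, machine learning is well suited to measuring risk premia.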
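2) The set of candidate conditioning variables for the risk premium is large. The profession has accumulated a remarkably long list of predictors that various researchers have argued have forecasting power for returns: the literature reports hundreds of stock-level predictive characteristics and dozens of macroeconomic predictors of the aggregate market. In addition, predictors are often close cousins and highly correlated. Traditional prediction methods break down when the number of predictors approaches the number of observations or when predictors are highly correlated. With its emphasis on variable selection and dimension reduction, machine learning is well suited to such challenging prediction problems: it reduces degrees of freedom and condenses redundant variation among predictors.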
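3) Further complicating the problem is ambiguity about the functional forms through which the high-dimensional predictor set enters risk premia. Should predictors enter linearly? If nonlinearities are needed, what form should they take? Must we consider interactions among predictors? Such questions quickly multiply the set of possible model specifications, and the theoretical literature offers little guidance for narrowing down the list of conditioning variables and functional forms. Three aspects of machine learning make it well suited to problems of ambiguous functional form. The first is its diversity: as a suite of dissimilar methods, it casts a wide net in its specification search. Second, with methods ranging from generalized linear models to regression trees and neural networks, machine learning is explicitly designed to approximate complex nonlinear associations. Third, parameter penalization and conservative model selection criteria complement the breadth of functional forms these methods span, guarding against overfitting bias and false discovery.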
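The second point (many highly correlated predictors, with the predictor count close to the observation count) can be illustrated with a small simulation. This is a minimal numpy sketch under assumptions of my own, not the paper's method or data: the data-generating process, the helper names `make_data`, `fit_ridge` and `oos_r2`, and all parameter values (100 predictors, 120 training rows, pairwise correlation 0.8, penalty 50) are hypothetical. Ridge regression is used as one simple example of parameter penalization; plain OLS is the same estimator with the penalty set to zero.

```python
import numpy as np


def make_data(n, p, rho, rng):
    """Simulate n rows of p predictors with pairwise correlation rho.

    Hypothetical data-generating process: one shared factor makes every pair of
    predictors correlated at rho, only the first 5 predictors carry signal, and
    noise dominates the target, as it does for real stock returns.
    """
    common = rng.standard_normal((n, 1))          # shared component
    idio = rng.standard_normal((n, p))            # predictor-specific component
    X = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * idio
    beta = np.zeros(p)
    beta[:5] = 0.1                                # weak true signal
    y = X @ beta + rng.standard_normal(n)         # "excess return" = signal + noise
    return X, y


def fit_ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y; lam=0 is plain OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def oos_r2(X, y, coef):
    """Out-of-sample R^2 measured against a naive forecast of zero."""
    resid = y - X @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum(y ** 2)


rng = np.random.default_rng(0)
# Predictor count (100) close to the training observation count (120).
X_train, y_train = make_data(n=120, p=100, rho=0.8, rng=rng)
X_test, y_test = make_data(n=10_000, p=100, rho=0.8, rng=rng)

r2_ols = oos_r2(X_test, y_test, fit_ridge(X_train, y_train, lam=0.0))
r2_ridge = oos_r2(X_test, y_test, fit_ridge(X_train, y_train, lam=50.0))

print(f"condition number of X'X: {np.linalg.cond(X_train.T @ X_train):.0f}")
print(f"OLS   out-of-sample R^2: {r2_ols:.3f}")
print(f"ridge out-of-sample R^2: {r2_ridge:.3f}")
```

Running it should show OLS with a strongly negative out-of-sample R² and ridge with a markedly higher one. The exact numbers depend on the seed and on `lam`; in practice the penalty would be tuned on a validation sample rather than fixed by hand.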