Day 17. The role of news sentiment in oil futures returns and volatility forecasting

Title:
The role of news sentiment in oil futures returns and volatility forecasting: data-decomposition based deep learning approach

Key words:
news sentiment, returns and volatility forecasting, variational mode decomposition, deep learning

Abstract:
In this paper, we extract qualitative information from crude oil news headlines and develop a novel VMD-BiLSTM model with an investor sentiment indicator for crude oil forecasting. First, we construct a sentiment score that accounts for the cumulative effect of contextual data from oil news texts. Then, we adopt an event-based method and a GARCH model to investigate the impact of news sentiment on returns and volatility. A non-recursive signal decomposition method, namely variational mode decomposition (VMD), is applied to decompose the historical crude oil return and volatility data into various intrinsic modes. After that, a bidirectional long short-term memory neural network (BiLSTM) is introduced as the deep learning prediction model that integrates both the qualitative and quantitative model inputs. Our empirical results indicate that shocks to news sentiment cause significant fluctuations in oil futures prices, and that news sentiment has an asymmetric impact on the volatility of oil futures. Incorporating the sentiment score improves forecasting performance in all benchmark scenarios. Specifically, our proposed data-decomposition based deep learning model is more effective than several econometric and machine learning models.

1. Introduction
The major contribution of this paper is that, to the best of our knowledge, it is the first to incorporate an NLP-based sentiment index of the oil market into the prediction of oil futures returns and volatility. It serves as an initial attempt to improve forecasting results by exploiting the hidden but informative signals of irrational behavior in the crude oil market. Furthermore, we empirically confirm the effectiveness of our proposed hybrid deep learning models for oil return and volatility forecasting. Our proposed model outperforms several benchmark econometric, machine learning, deep learning and hybrid learning models. The methodology and empirical results presented in our study shed new light on risk control for oil-related assets based on large-scale online datasets and data-driven approaches.

2.2 Data collection and preprocessing
For this study, we collected two datasets separately: crude oil futures price data and news headlines. For the crude oil price dataset, the daily closing prices of the Brent (LCO) crude oil futures contract are retrieved from Investing.com for the period from January 4, 2010 to September 17, 2019. For the news headlines dataset, we retrieve all available news related to “Crude oil” from oilprice.com, one of the largest hubs for energy news in the world with over 100,000 daily visitors, for the same time period as the crude oil price data. Instead of using full news articles in the analysis, we use news headlines because of two advantages: first, news headlines provide a sufficient summary of the key news information; second, news headlines contain much less repetition and fewer irrelevant words than the news article itself (Nassirtoussi et al., 2015).

We first preprocess the raw news headlines by tokenizing them, converting all text to lower case, and removing stop words and punctuation. Stop words are the most common words in a language, such as “the”, “a”, “on”, “all” and “is”. Since stop words, along with punctuation, do not carry important information about the text, they are removed during preprocessing.
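The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: the stop-word set is a tiny hypothetical list, the `preprocess` helper is a name introduced here, and the sample headline is made up.

```python
import string

# Illustrative stop-word list only; a real pipeline would use a fuller list.
STOP_WORDS = {"the", "a", "on", "all", "is", "to", "as", "of", "in", "and"}


def preprocess(headline: str) -> list[str]:
    """Lowercase a headline, strip punctuation, and drop stop words."""
    lowered = headline.lower()
    # Remove ASCII punctuation characters.
    cleaned = lowered.translate(str.maketrans("", "", string.punctuation))
    # Whitespace tokenization is assumed here for simplicity.
    tokens = cleaned.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


# Made-up headline for demonstration.
tokens = preprocess("Oil Prices Rise As OPEC Cuts Output!")
print(tokens)
```

Whitespace splitting is the simplest possible tokenizer; a dedicated NLP tokenizer would handle abbreviations and hyphenated terms more carefully.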

After removing stop words and punctuation, the “bag-of-words” approach is employed to transform news texts into vectors. In this approach, each document (news headline) is represented by a vector, and each word within the document corresponds to an element of the vector. The length of each vector is determined by the number of distinct words in the corresponding news headline in the dataset. In this study, we also use a common weighting technique, Term Frequency-Inverse Document Frequency (TF-IDF), in the vectorization process to evaluate the importance of a word to a specific document in a collection of documents. The importance of a word increases proportionally with the number of times it appears in the document, but decreases with the number of documents in the collection that contain the word.
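As a rough illustration of the TF-IDF weighting just described, the sketch below computes one sparse vector per headline. It assumes the plain tf × log(N/df) variant; the notes do not say which TF-IDF variant the paper uses, and the `tf_idf` helper and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by tf * log(N / df); one sparse vector (dict) per document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up, already preprocessed headlines.
docs = [
    ["oil", "prices", "rise"],
    ["oil", "prices", "fall"],
    ["opec", "cuts", "output"],
]
weights = tf_idf(docs)
```

In this toy corpus, "rise" appears in only one headline and therefore receives a higher weight in the first vector than "oil", which appears in two.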

For the crude oil price data, we select the daily returns of the Brent crude oil futures contracts as well as the 7-day volatility as the prediction targets. The daily logarithmic returns ($r_t$) and the 7-day volatility ($v_t$) are derived from the raw daily closing prices as given in Equations 2 and 3 of the paper, respectively.
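The two prediction targets can be sketched as below. The logarithmic return uses the standard definition r_t = ln(P_t / P_{t-1}). The paper's exact volatility formula is not reproduced in these notes, so the 7-day volatility here is assumed to be the sample standard deviation of returns over a trailing 7-day window; the function names and prices are made up for illustration.

```python
import math
import statistics


def log_returns(prices: list[float]) -> list[float]:
    """r_t = ln(P_t / P_{t-1}) for consecutive closing prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def rolling_volatility(returns: list[float], window: int = 7) -> list[float]:
    """Sample standard deviation of returns over a trailing window (assumed definition)."""
    return [
        statistics.stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


# Made-up closing prices, not actual Brent data.
closes = [60.0, 61.2, 60.5, 62.0, 61.8, 63.1, 62.4, 64.0, 63.5]
r = log_returns(closes)
v = rolling_volatility(r, window=7)
```

Nine prices yield eight returns, and a 7-day window over eight returns yields two volatility values.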

2.3 Sentiment analysis
In this study, we employ the sentimentr package in R to calculate the sentiment of each processed news headline. The sentimentr package returns a polarity score in the range [-1.0, 1.0] for each document. A news item is considered positive if its polarity score is above zero; otherwise, it is considered negative. In general, the more negative the polarity score, the more negative the news; the more positive the polarity score, the more positive the news.
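The labeling rule above can be sketched in a few lines. The polarity scores below are made up (in the paper they come from the R package), and the per-day averaging is an assumed aggregation added only for illustration; the paper's cumulative sentiment score is not specified in these notes.

```python
from collections import defaultdict


def label(polarity: float) -> str:
    """Positive if the score is above zero, otherwise negative (the rule in the text)."""
    return "positive" if polarity > 0 else "negative"


def daily_mean(scored: list[tuple[str, float]]) -> dict[str, float]:
    """Average headline polarity per date (an assumed aggregation, not from the paper)."""
    by_date = defaultdict(list)
    for date, score in scored:
        by_date[date].append(score)
    return {date: sum(s) / len(s) for date, s in by_date.items()}


# Made-up (date, polarity) pairs standing in for scored headlines.
scored = [("2019-09-16", 0.5), ("2019-09-16", -0.25), ("2019-09-17", -0.5)]
daily = daily_mean(scored)
labels = {date: label(score) for date, score in daily.items()}
```

Note that under the stated rule a score of exactly zero is labeled negative.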

2.4 Data decomposition
In this study, we employ variational mode decomposition (VMD) to decompose the daily returns and 7-day volatility time series of Brent crude oil. In general, VMD is a non-recursive optimization technique that decomposes the original input signal $f(t)$ into a series of discrete and stationary intrinsic modes $u_k$ through Wiener filtering and the Hilbert transform (Liu et al., 2016). The optimization procedure follows Zhang et al. (2017).
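For orientation, the constrained variational problem that VMD solves is commonly stated as below. This is the standard formulation from the VMD literature, given here as an assumption about what the paper's optimization procedure starts from; the paper's own notation may differ. Here $K$ is the number of modes, $\omega_k$ is the center frequency of mode $u_k$, $\delta(t)$ is the Dirac distribution, $*$ denotes convolution, and $j$ is the imaginary unit.

```latex
\[
\min_{\{u_k\},\{\omega_k\}} \;
\sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)
\]
```

Each term measures the bandwidth of one mode around its center frequency, and the constraint requires the modes to sum back to the original signal $f(t)$.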

5. Conclusion
In this study, we propose a novel VMD-BiLSTM model incorporating an investor sentiment indicator for crude oil forecasting. Specifically, we collect a dataset of crude oil news headlines and conduct sentiment analysis on the contextual data, which provides effective and previously unexplored information for deep learning forecasting. We empirically investigate the impact of news sentiment on returns and volatility based on an event study and a GARCH model. For return and volatility forecasting, variational mode decomposition (VMD) is applied to decompose the historical crude oil return and volatility data into various intrinsic modes. After that, a bidirectional long short-term memory neural network (BiLSTM) is introduced as the deep learning prediction model that integrates both the qualitative and quantitative model inputs.

Our empirical results indicate that news sentiment causes significant fluctuations in oil futures prices and serves as an effective predictor of oil returns and volatility. Specifically, oil futures prices react significantly around news shocks, regardless of whether the shocks are positive or negative. Moreover, we find that news sentiment has an asymmetric impact on the volatility of oil futures. According to the forecasting comparisons, the data-decomposition based deep learning model integrating the sentiment score always performs better than several econometric and machine learning models.

Our study presents an early attempt at integrating online text data into oil return and volatility forecasting. In future research, more sources of online data could be utilized for forecasting, such as user-generated content (UGC) from social media or disclosures by oil-related firms. Furthermore, other NLP techniques could be adopted for text analysis of oil market news, such as topic identification and event extraction. Our study indicates the viability of applying deep learning models to return and volatility forecasting. Other deep learning models are also promising for further improving the forecasting performance for oil returns and volatility.

Question: why is there an effect on crude oil returns in the ten days before the event occurs?
