[Literature Reading] The role of news sentiment in oil futures returns and volatility forecasting

0、Abstract

In this paper, we extract the qualitative information from crude oil news headlines, and develop a novel VMD-BiLSTM model with investor sentiment indicator for crude oil forecasting.

In this paper, we extract qualitative information from crude oil news headlines and develop a novel VMD-BiLSTM model for crude oil price forecasting.

First, we construct a sentiment score considering cumulative effect from contextual data of oil news texts.

First, we construct a sentiment score that accounts for the cumulative effect.

Then, we adopt an event-based method and GARCH model to investigate the impact of news sentiment on returns and volatility. A non-recursive signal decomposition method, namely variational mode decomposition (VMD), is applied to decompose the historical crude oil return and volatility data into various intrinsic modes.

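Second, we use an event-based method and a GARCH model to investigate the impact of the sentiment indicator on returns and volatility. A non-recursive signal decomposition method (VMD) is applied to decompose historical crude oil returns and volatility.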

After that, a bidirectional long short-term memory neural networks (BiLSTM) is introduced as the deep learning prediction model that integrates both the qualitative and quantitative model inputs.

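Subsequently, a bidirectional long short-term memory neural network (BiLSTM) deep learning prediction model is applied, combining qualitative and quantitative inputs.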

Our empirical results indicate that the shock of news sentiment significantly causes the fluctuation of oil futures prices, and news sentiment has an asymmetric impact on the volatility of oil futures. The incorporation of sentiment score is always helpful for improving the forecasting performances in all benchmark scenarios. Specifically, our proposed data-decomposition based deep learning model is more effective than several econometric and machine learning models.

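Our empirical results show that news sentiment shocks significantly drive fluctuations in crude oil prices, and that news sentiment has an asymmetric impact on the volatility of oil futures; incorporating the sentiment score helps improve forecasting performance.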

1、Introduction

It is universally acknowledged that the crude oil futures market is a typical risk aggregation market that attracts worldwide attentions: The oil future price movements are identified to be more likely exposed to global political events and receive simultaneous shocks from other financial asset markets (Leduc and Sill, 2004; Considine and Larson, 2001).

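It is well known that the crude oil futures market is a typical risk aggregation market that attracts worldwide attention: oil futures price movements are exposed to global political events and to simultaneous shocks from other financial markets.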

On the other hand, the prices of global financial assets also receive positive feedbacks from oil future price movements and disturbances (Hamilton and Wu, 2014; Teterin et al., 2016).

Conversely, global financial asset prices also receive positive feedback from oil futures prices.

For crude oil safety management and assets allocation strategy concerns, the precise prediction for oil future market returns and risks is able to provide useful guideline for policy makers and investors.

Therefore, precise prediction of oil futures market returns and risks is useful.

However, the nonlinearity property of oil future market price is formulated by different types of factors, such as supply and demand relationship (Kilian, 2009), international events (Zhao et al., 2016) and investor sentiment (Qadan and Nama, 2018), which makes it a tough task in oil returns and volatilities prediction for the complex structures of oil price.

However, the complex mechanisms behind oil prices make predicting oil returns and volatilities difficult.

An abundant amount of studies is devoted to predict the oil returns and volatilities utilizing the historical time-series data of oil market and economic related influencing factors. For example, Fan et al. (2008) use historical observations of WTI and Brent crude oil time-series data to predict the future oil prices based on genetic algorithm. Shin et al. (2013) introduce a semi-supervised learning approach to investigate the impact factors that affect the oil price movements, including OPEC and SAUDI oil production, USD exchange rates and Producer price index etc.

A large body of work is devoted to predicting oil returns and volatilities.

However, some underlying factors, such as investor sentiment, may act as potential causes of oil price changes and fluctuations (Du et al., 2016; Qadan and Nama, 2018), which is hard to assess and calculate in empirical works due to non-quantization characteristic of market sentiment and reaction.

However, some underlying factors, such as investor sentiment, are hard to quantify.

Scholars have attempted to find out appropriate proxies for the investor concerns and sentiments of financial market. For example, Baker and Wurgler (2007) utilize a combination of financial indices to quantify the investor sentiment in stock market, including stock trading volume, mutual fund flows and IPO volume etc. Smales (2017) introduces CBOE Volatility Index (VIX) as a measure of investor fears and investigates the relationship between VIX and stock returns. Kostopoulos et al. (2020) apply Google search volumes as a proxy for trading intensities of individual investors in Germany.

Scholars have tried to construct indicators of investor concerns and sentiments.

However, the previous measurements of investor sentiments show less effectiveness in providing untapped information for assets returns and volatilities prediction due to the following weakness (Li et al., 2019).

However, the previous measurements of investor sentiment are less effective in providing untapped information.

First, official indices and statistics, such as transaction volume, are identified to provide less unexplored information about investor attentions, which is mainly due to its less consistency with the individual traders (Deng et al., 2012).

First, official indices and statistics, such as trading volume, provide little unexplored (untapped) information.

Second, the intensity and volume data of search engine contain too much investors-irrelevant noise (Limnios and You, 2018). As a result, the sentiment indicators calculated by search indices may show less effectiveness and confidence level in financial assets prediction.

Second, search engine data contain too much irrelevant noise.

Natural Language Processing (NLP) techniques and big available dataset provide a novel framework for investor sentiment indicator constructions. By crawling the news headlines from hubs and websites for energy news, news dataset of crude oil can be tokenized. Utilizing the headline documents, daily investor sentiments are scored based on vector space models (Salton et al., 1975). Finally, returns and volatilities of crude oil are predicted by incorporating the daily polarity score of market sentiment.

NLP techniques and big dataset provide a novel framework for investor sentiment indicator constructions.

Sentiment index based on news headlines has the following advantages: First, news headlines reflect key information of investor attention, which can be measured and obtained efficiently through NLP techniques.

Advantage: news headlines reflect key information about investor attention.

Second, sentiment index calculated by news headline contains less noise and irrelevant information, which is helpful to improve the reliability of indicator construction (Nassirtoussi et al., 2015).

Less irrelevant noise.

In this paper, we formally investigate the impact of news sentiment on oil futures returns and volatility by an event-based method and GARCH model estimations. Overall, the daily investor sentiment of crude oil is computed by NLP technique in this paper and acts as a novel predictor for crude oil future returns and volatilities.

Several types of forecasting methods have been applied to oil future returns and volatility prediction by previous works, such as econometric models (Klein and Walther, 2016) and machine learning approaches (Yu et al., 2008; Tang et al., 2015; Yu et al., 2017).

Various methods have previously been applied to oil futures returns and volatility prediction.

However, the econometric or machine learning typed predictors achieve inferior forecasting performance in comparison with the newly introduced deep learning approach (Mallqui and Fernandes, 2018).

However, their performance is inferior to that of the deep learning approach.

Utilizing the artificial neural networks consisting of multiple hidden layers, deep learning model shows superior time-series data predictability over its counterparts (LeCun et al., 2015). In recent years, deep learning has been applied broadly in crude oil time-series data prediction. For example, Zhao et al. (2017) apply a novel stacked denoising autoencoders (SDAE) for crude oil forecasting based on a large dataset of exogenous influencing parameters. Luo et al. (2019) employ a novel convolutional neural networks (CNN) model to improve the short-term prediction performance for crude oil market.

The paper then lists some of the deep learning literature.

Since crude oil market returns and volatilities are non-stationary time-series data and consistent with complex influencing factors, the prediction accuracies of the proposed models may suffer due to the high volatilities. In recent studies, a novel ensemble forecasting method, namely “Decomposition and Ensemble”, has been developed to handle the task of irregular and non-stationary time-series data prediction (Bergmeir et al., 2016; Risse, 2019). This method decomposes the original time-series data into several stationary cycles, which can be estimated by forecasting models individually and finally integrated to generate the forecasting output. Among all the decomposition approaches, empirical mode decomposition (EMD) typed method is the predominant approach utilized in current empirical works (Wen et al., 2017; Santhosh et al., 2019).

EMD: addresses the poor prediction accuracy caused by the high volatility of the return and volatility time series.

However, the prediction error term may accumulate during the combination process of individual decomposed data forecasting, which is considered to reduce the prediction accuracies (Tang et al., 2015). In addition, EMD typed models may also give rise to the mode-mixing problem, which may probably produce the oscillations with similar scales in single decomposed factors (Colominas et al., 2014).

EMD-type models may lead to the mode-mixing problem.

Based on the above studies, this paper develops a novel VMD-BiLSTM model with investor sentiment indicator for crude oil forecasting.

First, we extract the qualitative information from crude oil news headlines and conduct sentiment analysis on the contextual data, which provides effective and unexplored information for deep learning forecasting.

First, qualitative information is extracted from news headlines, which can uncover untapped information.

Moreover, we adopt an event-based method and GARCH model to investigate the impact of news sentiment on returns and volatility.

In addition, an event-based method and a GARCH model are adopted to investigate the impact of news sentiment on returns and volatilities.

Second, a non-recursive signal decomposition method, namely variational mode decomposition (VMD), is applied to decompose the historical crude oil return and volatility data into various intrinsic modes. Compared to the predominant decomposition approach EMD, VMD is tested to avoid the mode-mixing problem effectively (Dragomiretskiy and Zosso, 2014).

Second, VMD is applied to decompose historical crude oil returns and volatility into various intrinsic modes, which avoids the mode-mixing problem.

Third, a bidirectional long short-term memory neural networks (BiLSTM) is introduced as the deep learning prediction model that integrates both the qualitative and quantitative model inputs. The proposed BiLSTM model can extract a two-way sequential relationship in the time series data.

Third, BiLSTM can integrate both the qualitative and quantitative model inputs. The proposed BiLSTM model can extract a two-way sequential relationship.

According to our empirical results, we find the shock of news sentiment significantly causes the fluctuation of oil futures prices. Specifically, oil futures prices react positively around positive news shocks, and present relatively weak decline surrounding negative news shocks. According to the estimations of GARCH models, we find that news sentiment has an asymmetric impact on the volatility of oil futures. As for oil return and volatility forecasting, the incorporation of news index is always helpful for improving the forecasting performances in all benchmark scenarios. Specifically, our proposed data-decomposition based deep learning model is more effective than several econometric and machine learning models.

The major contributions of this paper may lie in that, to the best of our knowledge, this is the first paper to incorporate the sentiment index of oil market based on NLP technique for oil future returns and volatilities prediction, which serves as an initial attempt to improve the forecasting results utilizing the hidden and effective information of irrational behaviors in the crude oil market.

Furthermore, we empirically confirm the effectiveness of our proposed hybrid deep learning models for oil return and volatility forecasting. Our proposed model outperforms several benchmark econometrics, machine learning models, deep learning models and hybrid learning models. The methodology and empirical results presented by our study shed new light on risk controls of oil-related assets based on large-scale online datasets and data-driven approaches.

The rest of this paper is arranged as follows: Section 2 presents the research framework, news text analysis methods and forecasting models; Section 3 tests the impact of news sentiment on oil returns and volatility based on an event-based method and the estimation of GARCH models; Section 4 presents the empirical results of oil returns and volatility forecasting, including several robustness tests. Finally, the concluding remarks and future directions are concluded in Section 5.

2、Methodology

2.1 Research Framework

The forecasting approach proposed in this study aims to utilize qualitative information extracted from financial news headlines and quantitative information extracted from market time series data to improve the return and volatility forecasting accuracy in the crude oil futures market.

The framework of our proposed approach is shown in Fig. 1. Specifically, there are five major steps, namely data collection, data preprocessing, sentiment analysis, data decomposition, as well as returns and volatility forecasting. These steps are explained in detail in Sections 2.2–2.5.

2.2 Data collection and preprocessing

For this study, we collected two different datasets separately: crude oil futures price data and news headlines. In terms of the crude oil price dataset, the Brent (LCO) crude oil daily futures contract closing prices are retrieved from Investing.com, for the time period from January 4, 2010 to September 17, 2019. In terms of the news headlines dataset, all the available news data related to “Crude oil” are collected from oilprice.com, which is one of the largest hubs for energy news in the world with over 100,000 daily visitors, for the same time period as the crude oil price data. Instead of using full news articles in the analysis, we use news headlines due to several advantages: first, news headlines can provide a sufficient summary of the key news information; second, news headlines contain much less repetition and fewer irrelevant words than the news article itself (Nassirtoussi et al., 2015).

We first preprocess the raw news headlines dataset using tokenization to convert all headlines into lower cases, and to remove stop words and punctuations.

Convert to lower case; remove stop words and punctuation.

Stop words are the most common words in a language, such as “the”, “a”, “on”, “all” and “is”. Since stop words, along with punctuations, do not carry important information related to the text, they are removed during preprocessing.
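The preprocessing step described above can be sketched in a few lines of Python. This is only an illustration: the stop-word list below is a small hypothetical sample (the paper does not list the one it used), and `preprocess` is a name introduced here, not something from the paper.

```python
import string

# Small illustrative stop-word list (an assumption; not the paper's list).
STOP_WORDS = {"the", "a", "an", "on", "all", "is", "are", "of", "to", "in", "as", "and"}


def preprocess(headline: str) -> list[str]:
    """Lower-case a headline, strip ASCII punctuation, and drop stop words."""
    lowered = headline.lower()
    # Map every punctuation character to a space so adjacent words stay separated.
    table = str.maketrans({ch: " " for ch in string.punctuation})
    tokens = lowered.translate(table).split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("Oil Prices Rise as OPEC Cuts Output, Again!"))
```

A real pipeline would likely use a fuller stop-word list and also handle non-ASCII punctuation such as curly quotes.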

After removing stop words and punctuations, the “bag-of-words” approach is then employed to transform news texts into vectors. In this approach, each document (news headline) is represented by a vector, and each word within the document represents an element in the vector.

Each news headline is treated as one document and is represented as a vector.

Each word in a headline corresponds to one element of the vector.

The length of each vector is determined by the number of distinct words in the corresponding news headline in the dataset.

Vector length = the number of distinct words in the corresponding news headline.

In this study, we also use a commonly used weighting technique, namely Term Frequency-Inverse Document Frequency (TF-IDF), in the vectorization process to evaluate the importance of a word to a specific document in a collection of documents. The importance of the word increases proportionally with the number of times it appears in the document, but decreases with the number of documents that contain the word in the collection. Specifically, the TF-IDF score of word x in a document is calculated as follows in Eq. (1):

Compute TF × IDF: this evaluates the relative importance of a word.
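Eq. (1) itself is not reproduced in these notes, so the sketch below uses the common textbook form as an assumption: the term frequency of a word in a document (count divided by document length) multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the word. The paper's exact weighting may differ, and `tf_idf` is a name introduced here for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {word: score} mapping per tokenized document.

    Assumed form: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


headlines = [["oil", "prices", "rise"], ["oil", "prices", "fall"]]
for row in tf_idf(headlines):
    print(row)
```

Under this form, a word that appears in every headline (here "oil") scores zero, while words specific to one headline score higher, which matches the description of importance decreasing with the number of documents that contain the word.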

In terms of the crude oil price data, we select the daily returns of the Brent crude oil futures contracts as well as the 7-day volatility as the prediction targets.

Daily log returns and 7-day average volatility: the prediction targets.
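As a rough illustration of the two targets, the sketch below computes daily log returns from closing prices and a rolling 7-day volatility. The volatility definition used here (sample standard deviation of the last seven daily returns) is an assumption, since the notes do not reproduce the paper's formula; the function names are also introduced here.

```python
import math
import statistics


def log_returns(prices: list[float]) -> list[float]:
    """Daily log returns: r_t = ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def rolling_volatility(returns: list[float], window: int = 7) -> list[float]:
    """Sample standard deviation of returns over a trailing window (assumed definition)."""
    return [
        statistics.stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


closes = [60.1, 60.8, 59.9, 61.2, 61.0, 62.3, 61.7, 62.9, 63.4]  # made-up prices
rets = log_returns(closes)
print(rolling_volatility(rets))
```

With n closing prices this yields n - 1 returns and n - 7 volatility values, so the first usable volatility observation arrives on the eighth trading day.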

 Orthogonalization.

2.3 Sentiment analysis

In this study, we employ the Sentimentr package in R to calculate the sentiment of each processed news headline. The Sentimentr package returns the polarity score in the range of [−1.0, 1.0] for each document.

The news is considered as positive news if its polarity score is above zero, otherwise, it is considered as negative news. In general, the more negative the polarity score, the more negative the news; the more positive the polarity score, the more positive the news.

As pointed out by previous studies, news often has a rather continuous effect on the investor's sentiment in the actual futures market (Akhtar et al., 2013). That is to say, the public sentiment on a specific day is shaped by the combination of news on the day and that in previous few days. However, the more recent news is more influential than the old news. Considering this situation, we formulate a cumulative sentiment score (CSS) following Kiritchenko et al. (2014) and Chowdhury et al. (2014). In this study, we assume any piece of news will have a significant impact on the investor sentiment for seven days, and that its impact exponentially declines each day after its release, which is consistent with the actual situation of news impact (Huang et al., 2014).
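The idea of a seven-day, exponentially decaying news impact can be sketched as follows. The exact CSS formula is not reproduced in these notes, so the weighting below (weight e^(-decay × lag) for news that is `lag` days old, summed over a seven-day horizon) and the default `decay` value are assumptions for illustration only.

```python
import math


def cumulative_sentiment(daily_scores: list[float], horizon: int = 7,
                         decay: float = 1.0) -> list[float]:
    """Cumulative sentiment per day under an assumed exponential-decay weighting.

    daily_scores[t] is the polarity score for day t; news older than
    `horizon` days is ignored.
    """
    css = []
    for t in range(len(daily_scores)):
        total = 0.0
        for lag in range(horizon):
            if t - lag < 0:
                break  # no earlier days available at the start of the sample
            total += math.exp(-decay * lag) * daily_scores[t - lag]
        css.append(total)
    return css


print(cumulative_sentiment([0.5, -0.2, 0.0, 0.3]))
```

A single positive score on day 0 keeps contributing, with shrinking weight, through day 6 and drops out entirely on day 7.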

2.4 Data decomposition

According to previous literature, decomposing the original time series data into sub-series modes with different economic implications can help the neural networks capture its tendency and cyclicity (Wang et al., 2014). In this study, we employ variational mode decomposition (VMD) in the data decomposition process for the daily returns and 7-day volatility time series of Brent crude oil. In general, VMD is a non-recursive optimization technique that decomposes the original input signal f(t) into a series of discrete and stationary intrinsic modes uk through Wiener filtering and Hilbert transform (Liu et al., 2016). The optimization procedure is as follows (Zhang et al., 2017):

Step 1: Calculate the Hilbert transform of each mode uk and transform into respective uni-sided frequency spectrum.

Step 2: Alter the frequency spectrum of each mode uk to narrow frequency baseband.

Step 3: Conduct the H1 Gaussian smoothness on the demodulated signal to obtain the bandwidth of each mode uk.

The optimal solution is obtained using the alternating direction method of multipliers (ADMM) (Hestenes, 1969) and the original input signal f(t) is decomposed into K intrinsic modes.
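For reference, the three steps above correspond to the constrained variational problem in the standard VMD formulation of Dragomiretskiy and Zosso (2014). The notes do not reproduce the paper's own equations, so the form below is the standard one rather than a transcription; the symbol \omega_k (the center frequency of mode u_k) is not defined in the text above and is introduced here.

```latex
\min_{\{u_k\},\,\{\omega_k\}} \;
\sum_{k=1}^{K}
\left\|
  \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j \omega_k t}
\right\|_2^2
\quad \text{s.t.} \quad
\sum_{k=1}^{K} u_k(t) = f(t)
```

Here the convolution with \delta(t) + j/(\pi t) builds the analytic signal via the Hilbert transform (Step 1), multiplication by e^{-j\omega_k t} shifts each mode to baseband (Step 2), and the squared L2 norm of the time derivative measures the bandwidth (Step 3).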

2.5 Deep learning forecasting model: BiLSTM
