Here’s What Predicting Apple’s Stock Price Using NLP Taught Me About Exxon Mobil’s Stock

In late June 2020, I started a project to predict the stock market’s movement using Natural Language Processing (NLP). Stock market prediction is the attempt to determine the future value of a company’s stock that is traded on an exchange. Predicting the stock market, especially actual stock prices, turns out to be quite difficult, for several reasons. A major one is that past performance is not necessarily a good indicator of future success. Simply put, if a stock’s price increased by 2% two days ago, 4% yesterday, and 6% today, that does not imply it will increase by 8% tomorrow. In fact, it is quite plausible that the stock’s value could fall by 20% instead.

Another major point to consider is that many factors can affect a stock’s price, and a model that tries to account for all of them will most likely produce poor predictions. Even if someone did develop such a model successfully, the mere fact that the stock’s price could now be predicted accurately would itself influence that price, again leading to less than ideal results. However, it is also widely accepted that the stock market is forward-looking and reflects investors’ outlook on the economy. Because of this, I was interested in using NLP to investigate the relationship between investor sentiment and predicted stock prices.

Image by Trist’n Joseph

When I first wrote about this project in July 2020, I attempted to predict Apple’s stock price and found some very interesting results. In summary, I collected Apple-related articles from MarketWatch.com and determined the polarity of each article to investigate the relationship between investor sentiment and Apple’s stock price. By examining the relationship between standardized prices and polarity, I noticed that polarity had a lagged, cumulative effect on Apple’s stock price. Because of this, I hypothesized that the stock price can be modelled by the recursive function stock price tomorrow = (price today) + constant*(price today)*(sentiment today), and I used it to predict the daily stock price from February 20th, 2020 to June 12th, 2020.
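The recurrence can be made concrete with a minimal Python sketch that rolls it forward day by day. The sketch makes three assumptions. Sentiment is a daily polarity score between -1 and 1. Each predicted price is fed back in as the next day’s “price today”. The function name `simulate_prices` and the example numbers are hypothetical and do not come from the project’s code.

```python
from typing import List, Sequence


def simulate_prices(start_price: float,
                    sentiments: Sequence[float],
                    constant: float) -> List[float]:
    """Roll the recurrence
        price[t+1] = price[t] + constant * price[t] * sentiment[t]
    forward from start_price, one step per daily sentiment score.

    Each predicted price becomes the next day's "price today", so every
    day's sentiment keeps feeding into all later prices.
    """
    prices = [start_price]
    for sentiment in sentiments:
        today = prices[-1]
        prices.append(today + constant * today * sentiment)
    return prices


# Hypothetical inputs: a $100 starting price, one positive and one
# negative day, and constant = 0.1.
print(simulate_prices(100.0, [0.5, -0.2], constant=0.1))
```

Each step multiplies the price by (1 + constant*sentiment), so a run of positive days compounds. This is one way a recurrence of this form can express a lagged, cumulative effect of sentiment.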

The predictions from this model were relatively close to Apple’s actual stock price from February until April. Through further analysis, I discovered that the constant in my function should give a higher weighting to positive sentiment than to negative sentiment, and that I should also include the traded volume. I therefore developed a new recurrence model: stock price tomorrow = (price today) + price_constant*(price today)*(sentiment today) - volume_constant*(volume today)*(sentiment today). The results from this prediction model can be seen in the graph above. For a deeper understanding of the analysis, I recommend reading my previous article (https://tinyurl.com/tristnapplnlp).
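The revised recurrence can be sketched the same way. The text does not say exactly how the higher weighting for positive sentiment enters the formula. This sketch therefore assumes that price_constant switches between a larger value for positive sentiment and a smaller one otherwise. `pos_constant`, `neg_constant`, the helper names, and the volume units are all hypothetical. The volume term is subtracted exactly as written in the formula.

```python
from typing import List, Sequence


def next_price(price: float, volume: float, sentiment: float,
               pos_constant: float, neg_constant: float,
               volume_constant: float) -> float:
    """One step of
        price + price_constant*price*sentiment - volume_constant*volume*sentiment
    where price_constant is pos_constant for positive sentiment and
    neg_constant otherwise (an assumed way to weight positive news more)."""
    price_constant = pos_constant if sentiment > 0 else neg_constant
    return (price
            + price_constant * price * sentiment
            - volume_constant * volume * sentiment)


def simulate(start_price: float, volumes: Sequence[float],
             sentiments: Sequence[float], pos_constant: float,
             neg_constant: float, volume_constant: float) -> List[float]:
    """Feed each predicted price back in as the next day's price."""
    prices = [start_price]
    for volume, sentiment in zip(volumes, sentiments):
        prices.append(next_price(prices[-1], volume, sentiment,
                                 pos_constant, neg_constant, volume_constant))
    return prices


# Hypothetical example: volume in millions of shares.
print(simulate(100.0, [2.0, 2.0], [0.5, -0.5],
               pos_constant=0.1, neg_constant=0.05, volume_constant=1.0))
```

With zero sentiment, both terms vanish and the price is carried forward unchanged.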

Image by Trist’n Joseph

Given what I uncovered with Apple’s stock, I wanted to test the model’s framework on Exxon Mobil’s stock over a similar period to see whether I would get similar results. I also asked the data science and finance communities for feedback, and I am thankful for the excellent suggestions I received. For this part of the project, I collected 162 Exxon Mobil-related articles published on MarketWatch.com between March 4th, 2020 and June 6th, 2020. I first used the recurrence function stock price tomorrow = (price today) + constant*(price today)*(sentiment today). The predicted prices followed the general trend of Exxon’s actual prices on most days. The largest deviations occurred in late May, when the predictions differed from the actual prices by $11.20 on average and by at most $24.20.
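The late-May average and maximum deviations are the kind of summary that can be computed from aligned actual and predicted series. The following is a minimal sketch that assumes both series are stored as dicts keyed by trading date. The function name, the window dates, and the numbers in the example are made up for illustration and are not the project’s data.

```python
from datetime import date
from typing import Dict, Tuple


def deviation_stats(actual: Dict[date, float],
                    predicted: Dict[date, float],
                    start: date, end: date) -> Tuple[float, float]:
    """Return (mean, max) absolute deviation between actual and predicted
    prices over the days in [start, end] that appear in both series."""
    days = sorted(d for d in actual if d in predicted and start <= d <= end)
    if not days:
        raise ValueError("no overlapping days in the requested window")
    deviations = [abs(actual[d] - predicted[d]) for d in days]
    return sum(deviations) / len(deviations), max(deviations)


# Made-up prices; only the two late-May days fall inside the window.
actual = {date(2020, 5, 1): 41.0, date(2020, 5, 26): 40.0, date(2020, 5, 27): 42.0}
predicted = {date(2020, 5, 1): 41.5, date(2020, 5, 26): 50.0, date(2020, 5, 27): 30.0}
print(deviation_stats(actual, predicted, date(2020, 5, 18), date(2020, 5, 31)))  # (11.0, 12.0)
```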

I then used the recurrence function stock price tomorrow = (price today) + price_constant*(price today)*(sentiment today) - volume_constant*(volume today)*(sentiment today). The predictions and graph produced by this function are virtually identical to the one seen above, but this model performed marginally better. Both models were optimized by using the intermediate value theorem to find the constant values that minimized the mean absolute prediction error. Comparing these errors, the function with volume performed ~0.6% better than the function without volume. The deviation during late May suggests that very positive news was published about Exxon Mobil at the time. However, other factors were also influencing the stock’s price during that period, and they had a greater effect than the articles’ sentiment.
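The text fits the constants with the intermediate value theorem. As a simpler stand-in, which is a different technique, the sketch below runs a plain grid search over candidate constants and keeps the one with the lowest mean absolute error. It then expresses the gap between two models as a relative improvement, which is one way a figure like ~0.6% can be computed. The sketch assumes one-step-ahead predictions that use the actual price as “price today”. All names and numbers are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple


def mean_absolute_error(actual: Sequence[float],
                        predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_constant(prices: Sequence[float], sentiments: Sequence[float],
                 candidates: Iterable[float]) -> Tuple[Optional[float], float]:
    """Grid search for the constant in
        price[t+1] = price[t] + constant * price[t] * sentiment[t]
    using the actual price[t] each day (one-step-ahead predictions).
    sentiments[t] is the score for day t; the last price has no successor."""
    best_constant, best_mae = None, float("inf")
    for c in candidates:
        predictions = [p + c * p * s for p, s in zip(prices[:-1], sentiments)]
        mae = mean_absolute_error(prices[1:], predictions)
        if mae < best_mae:
            best_constant, best_mae = c, mae
    return best_constant, best_mae


def relative_improvement(baseline_mae: float, new_mae: float) -> float:
    """Fraction by which new_mae improves on baseline_mae (0.006 means 0.6%)."""
    return (baseline_mae - new_mae) / baseline_mae


# Hypothetical data generated with constant = 0.1.
prices = [100.0, 105.0, 102.9]
sentiments = [0.5, -0.2]
grid = [i / 100 for i in range(0, 31)]  # 0.00, 0.01, ..., 0.30
print(fit_constant(prices, sentiments, grid))
```

A grid search only finds the best value among the candidates it is given, so the grid’s range and spacing have to be chosen with the scale of the constants in mind.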

Image by Trist’n Joseph

Next, I turned to predicting the day-to-day price difference. This might seem similar to the predictions discussed so far, since I had already predicted Exxon’s stock price and could read an estimated price difference off the graph. But predicting the daily stock price and predicting the day-to-day price difference are not the same. Predicting the stock price means saying, for example, that Exxon will trade at $40 tomorrow. Predicting the day-to-day price difference, however, means saying that Exxon’s price will increase by $40 between today and tomorrow. This does not suggest that Exxon will trade at any particular value. Rather, given a set of inputs, Exxon’s price can be expected to increase by some amount x, and this increase is independent of the price level.
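The distinction can be shown in a few lines of Python. Differences are computed from consecutive prices. A predicted difference only becomes a price once it is added to a known price for today. The helper names and prices below are hypothetical.

```python
from typing import List, Sequence


def day_to_day_differences(prices: Sequence[float]) -> List[float]:
    """price[t+1] - price[t] for each consecutive pair of days."""
    return [tomorrow - today for today, tomorrow in zip(prices, prices[1:])]


def price_from_difference(price_today: float, predicted_difference: float) -> float:
    """A difference forecast says nothing about the price level on its own;
    it only yields a price once anchored to today's price."""
    return price_today + predicted_difference


print(day_to_day_differences([40.0, 42.0, 41.0]))  # [2.0, -1.0]
# The same +2.0 forecast applies whether the stock trades at $40 or $100.
print(price_from_difference(40.0, 2.0), price_from_difference(100.0, 2.0))  # 42.0 102.0
```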

For similar reasons, it would have been inappropriate to predict Exxon’s stock price using the models stated before and then use those values to derive the price difference. Therefore, I developed a new model: price difference tomorrow = volume_constant*(volume today) - sentiment_constant*(sentiment today). This model is price independent: its predictions are based solely on sentiment and traded volume. The graphs above show the daily price difference and the daily absolute price difference. The predicted and actual absolute price differences are closer to each other than the predicted and actual signed daily price differences are.
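Here is a minimal sketch of the price-difference model and of the comparison between signed and absolute differences. The formula is taken as written. The function names, units, and example values are hypothetical.

```python
from typing import Sequence, Tuple


def predict_difference(volume: float, sentiment: float,
                       volume_constant: float, sentiment_constant: float) -> float:
    """price difference tomorrow =
        volume_constant*(volume today) - sentiment_constant*(sentiment today)
    No price appears on the right-hand side, so the model is price independent."""
    return volume_constant * volume - sentiment_constant * sentiment


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def signed_vs_absolute(actual_diffs: Sequence[float],
                       predicted_diffs: Sequence[float]) -> Tuple[float, float]:
    """MAE on the signed differences and MAE on their absolute values."""
    signed = mean_absolute_error(actual_diffs, predicted_diffs)
    absolute = mean_absolute_error([abs(a) for a in actual_diffs],
                                   [abs(p) for p in predicted_diffs])
    return signed, absolute


# Hypothetical: right magnitudes, wrong directions on both days.
print(signed_vs_absolute([1.0, -2.0], [-1.0, 2.0]))  # (3.0, 0.0)
```

For any numbers a and p, ||a| - |p|| ≤ |a - p|, so the absolute MAE can never exceed the signed MAE. Consider a prediction with the right size but the wrong direction. It counts fully against the signed comparison and not at all against the absolute one, so the two comparisons answer different questions.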

Image by Trist’n Joseph

With this relationship established, I then attempted to use the price difference model to build a model that predicts the actual stock price. Unfortunately, the predictions from this model were completely wrong. I am still working on this project, and I plan to test the new findings on financial data from Goldman Sachs. Another area of the project that needs improvement is the web scraping function used to collect articles from MarketWatch.com.
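One straightforward way to turn difference predictions back into prices is to start from a known price and add each predicted difference in turn. This is not necessarily the approach used in the project. The following is a minimal sketch using `itertools.accumulate`; the helper name and numbers are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence


def rebuild_prices(start_price: float,
                   predicted_diffs: Sequence[float]) -> List[float]:
    """Running sum: price[t+1] = price[t] + predicted_diff[t]."""
    return list(accumulate([start_price, *predicted_diffs]))


print(rebuild_prices(40.0, [1.0, -0.5, 2.0]))  # [40.0, 41.0, 40.5, 42.5]

# Errors compound: if every predicted difference is too high by 0.25,
# the rebuilt path ends up 0.25 * n above the true path after n days.
true_path = rebuild_prices(40.0, [0.0] * 4)
biased_path = rebuild_prices(40.0, [0.25] * 4)
print(biased_path[-1] - true_path[-1])  # 1.0
```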

Going forward, the focus will therefore be on improving both the new model and the web scraping function. I would love any feedback on this work so far. If you are interested, I am open to having others work alongside me for the rest of the project. Please feel free to point out any mistakes I might have made, or to suggest anything that might have been overlooked.

Code: github.com/trisxcj1/Stock-Market-Movement-Using-NLP

Previous Article: towardsdatascience.com/heres-how-i-predicted-apple-s-stock-price-using-natural-language-processing-13a578c41b8e

machinelearningmastery.com/natural-language-processing/

monkeylearn.com/sentiment-analysis/

netapp.com/us/info/what-is-unstructured-data.aspx

thebalance.com/how-market-prices-move-through-buying-and-selling-1031049

arxiv.org/pdf/1806.09533.pdf

Other Useful Material:

algorithmia.com/blog/introduction-natural-language-processing-nlp

quora.com/Why-is-the-stock-market-so-difficult-to-predict

researchgate.net/publication/228892903_Using_news_articles_to_predict_stock_price_movements

nlp.stanford.edu/courses/cs224n/2007/fp/timmonsr-kylee84.pdf

Originally published at: https://medium.com/swlh/heres-what-predicting-apple-s-stock-price-using-nlp-taught-me-about-exxon-mobil-s-stock-c41968cc4dca
