Stock Correlation Versus LSTM Prediction Error

weixin_26750511

Original article: https://medium.com/@gjanesch/stock-correlation-versus-lstm-prediction-error-5ca96a110336

When trying to look at examples of LSTMs in Keras, I’ve found a lot that focus on using them to predict stock prices in the future. Most are pretty bare-bones though, consisting of little more than a basic LSTM network and a quick plot of the prediction. Though I think the utility of these models is a little questionable, it brought a question into my head: how accurate are the predictions made by a model trained on one stock if it’s predicting on another stock?

The full code can be found here.

Problem Description

Stocks are correlated with each other to varying degrees, so the behaviors of any given pair of stocks may or may not track each other. The correlation between stocks is usually measured as the correlation of their returns (or at least, that’s what I’ve seen), and it’s easy to compute those yourself.
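
As a minimal sketch of that computation (not the code from the original post), the snippet below assumes a pandas DataFrame of closing prices with one row per trading day and one column per ticker. The tickers and prices are made up for illustration, and `return_correlations` is a hypothetical helper name.

```python
import pandas as pd


def return_correlations(prices: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of daily simple returns.

    `prices` is assumed to hold one row per trading day and one column
    per ticker.
    """
    # Simple return: today's price divided by yesterday's, minus one.
    returns = prices / prices.shift(1) - 1
    # The first row has no previous day, so it is all NaN and is dropped.
    returns = returns.dropna(how="all")
    return returns.corr()


# Made-up prices for three hypothetical tickers.
prices = pd.DataFrame(
    {
        "AAA": [100.0, 102.0, 101.0, 105.0, 107.0],
        "BBB": [50.0, 51.0, 50.5, 52.5, 53.5],  # always half of AAA
        "CCC": [20.0, 19.0, 19.5, 18.0, 18.5],  # tends to move the other way
    }
)

corr = return_correlations(prices)
print(corr.round(3))
```

Because BBB is always exactly half of AAA, their returns are identical and their correlation comes out at essentially 1, while CCC ends up negatively correlated with both. Note that the correlation is taken over returns rather than raw prices, matching the convention described above.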

In addition, there is an immense number of posts about predicting stock prices with neural networks. These examples usually don’t go too deep, though, and they invariably train and check the model using data from the same stock. That’s reasonable enough, but it raises the question of how generalizable these models are. It seems unlikely that a model would make good predictions if the stock it was trained on is only weakly correlated with the one it’s predicting, but it might work well enough for stocks that are more strongly correlated.

So the goal here is:
- Get data on a large number of stocks (preferably hundreds).
- Compute the correlations between the stocks.
- Train an LSTM on a single reference stock.
- Make predictions for the other stocks using that LSTM model.
- See how some error metric varies with correlation.

Getting the Data

Since I’m aiming to get data on a few hundred stocks, the first list that jumps to mind is the S&P 500. There are actually 505 tickers on that list, but that’s because five of the companies have multiple share classes. For each company with multiple share classes, I discarded one class; the list I ended up using is in the GitHub repo for this post.

I downloaded the data from Tiingo via the pandas_datareader library. Tiingo limits free accounts to 500 unique symbols per month, so it’s feasible to grab this all at once, although you won’t be able to get data for any other ticker with that account for the remainder of the month.
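
The download might look roughly like the sketch below. This is an illustration under stated assumptions rather than the original post's code: `download_prices` and its `fetch` parameter are hypothetical names, the date range and tickers in the usage comment are placeholders, and the API key is assumed to come from your own Tiingo account. The only library call relied on is `pandas_datareader.get_data_tiingo`.

```python
import pandas as pd


def download_prices(tickers, start, end, api_key, fetch=None):
    """Fetch daily data for each ticker and stack the results.

    `fetch` is a hypothetical hook that defaults to
    pandas_datareader.get_data_tiingo; passing a stand-in function lets
    you try the loop without using up any of the monthly symbol quota.
    """
    if fetch is None:
        # Imported lazily so the function can be exercised with a
        # stand-in `fetch` even if pandas_datareader is not installed.
        import pandas_datareader as pdr

        fetch = pdr.get_data_tiingo

    # Drop repeated symbols while keeping the original order.
    unique_tickers = list(dict.fromkeys(tickers))

    frames = [
        fetch(ticker, start=start, end=end, api_key=api_key)
        for ticker in unique_tickers
    ]
    return pd.concat(frames)


# Example call (placeholder tickers and dates, needs a real API key):
# data = download_prices(["AAPL", "MSFT"], "2015-01-01", "2020-01-01",
#                        api_key="YOUR_TIINGO_KEY")
```

Requesting one symbol at a time keeps the loop easy to resume if a request fails partway through, at the cost of more round trips.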

This will take several minutes to execute. If you’re running this code yourself, I recommend saving the data immediately afterward — the file that my run produced was almost 300 megabytes and contained about 2.3 million rows, so it’s not something you want to repeatedly download.
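
One way to avoid repeating the download is a small cache wrapper like the sketch below. `load_or_download` and the file name in the usage comment are hypothetical, and pickle is just one reasonable choice of format here (it preserves the DataFrame's index as-is); the original post may well have saved the data differently.

```python
import os

import pandas as pd


def load_or_download(path, download):
    """Return the DataFrame cached at `path`, downloading only if needed.

    `download` is any zero-argument function that returns a DataFrame.
    """
    if os.path.exists(path):
        # A cached copy exists, so skip the slow download entirely.
        return pd.read_pickle(path)

    data = download()
    # Save right away so the download never has to be repeated.
    data.to_pickle(path)
    return data


# Example call (hypothetical file name and download function):
# data = load_or_download("sp500_prices.pkl", lambda: fetch_all_prices())
```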

Selecting & Scaling Data

Since we’re dealing with an LSTM, we’d like to have data scaled down to a range that’s better handled by the LSTM inputs. And since the scales of the stocks differ, we need individual scal