Encog3Java-User.pdf translation: Chapter 8, Using Temporal Data

Chapter 8


Using Temporal Data


• How a Predictive Neural Network Works
• Using the Encog Temporal Dataset
• Attempting to Predict Sunspots
• Using the Encog Market Dataset
• Attempting to Predict the Stock Market
Prediction is another common use for neural networks. A predictive neural network will attempt to predict future values based on present and past values. Such neural networks are called temporal neural networks because they operate over time. This chapter will introduce temporal neural networks and the support classes that Encog provides for them.


In this chapter, you will see two applications of Encog temporal neural networks. First, we will look at how to use Encog to predict sunspots. Sunspots are reasonably predictable and the neural network should be able to learn future patterns by analyzing past data. Next, we will examine a simple case of applying a neural network to making stock market predictions.


Before we look at either example, we must see how a temporal neural network actually works. A temporal neural network is usually either a feedforward or simple recurrent network. With proper structuring, the feedforward neural networks shown so far could serve as temporal neural networks by assigning certain input and output neurons.


8.1 How a Predictive Neural Network Works


A predictive neural network uses its inputs to accept information about current data and uses its outputs to predict future data. It uses two “windows”, a future window and a past window. Both windows must have a window size, which is the amount of data that is either predicted or is needed to predict. To see the two windows in action, consider the following data.


Day 1: 100
Day 2: 102
Day 3: 104
Day 4: 110
Day 5: 99
Day 6: 100
Day 7: 105
Day 8: 106
Day 9: 110
Day 10: 120
Consider a temporal neural network with a past window size of five and a future window size of two. This neural network would have five input neurons and two output neurons. We would break the above data among these windows to produce training data. The following data shows one such element of training data.


Input 1: Day 1: 100 ( input neuron )
Input 2: Day 2: 102 ( input neuron )
Input 3: Day 3: 104 ( input neuron )
Input 4: Day 4: 110 ( input neuron )
Input 5: Day 5: 99 ( input neuron )
Ideal 1: Day 6: 100 ( output neuron )
Ideal 2: Day 7: 105 ( output neuron )
Of course the data above needs to be normalized in some way before it can be fed to the neural network. The above illustration simply shows how the input and output neurons are mapped to the actual data. To get additional data, both windows are simply slid forward. The next element of training data would be as follows.


Input 1: Day 2: 102 ( input neuron )
Input 2: Day 3: 104 ( input neuron )
Input 3: Day 4: 110 ( input neuron )
Input 4: Day 5: 99 ( input neuron )
Input 5: Day 6: 100 ( input neuron )
Ideal 1: Day 7: 105 ( output neuron )
Ideal 2: Day 8: 106 ( output neuron )
You would continue sliding the past and future windows forward as you generate more training data. Encog contains specialized classes to prepare data in this format. Simply specify the size of the past, or input, window and the future, or output, window. These specialized classes will be discussed in the next section.
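The sliding-window generation described above can be sketched in plain Java. The `buildPairs` helper below is illustrative only, not Encog's API; it reproduces the training pairs shown in the text from the ten-day series.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingWindowDemo {
    // Build (input, ideal) training pairs by sliding both windows forward.
    static List<double[][]> buildPairs(double[] series, int pastSize, int futureSize) {
        List<double[][]> pairs = new ArrayList<>();
        for (int start = 0; start + pastSize + futureSize <= series.length; start++) {
            double[] input = Arrays.copyOfRange(series, start, start + pastSize);
            double[] ideal = Arrays.copyOfRange(series, start + pastSize,
                    start + pastSize + futureSize);
            pairs.add(new double[][] { input, ideal });
        }
        return pairs;
    }

    public static void main(String[] args) {
        double[] days = { 100, 102, 104, 110, 99, 100, 105, 106, 110, 120 };
        List<double[][]> pairs = buildPairs(days, 5, 2);
        // ten points, past window 5, future window 2 -> 4 training pairs
        System.out.println(pairs.size());
        System.out.println(Arrays.toString(pairs.get(0)[0])); // days 1-5
        System.out.println(Arrays.toString(pairs.get(0)[1])); // days 6-7
    }
}
```

The first pair matches the first illustration above (days 1-5 as input, days 6-7 as ideal); the second pair matches the slid window.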


8.2 Using the Encog Temporal Dataset


The Encog temporal dataset is contained in the following package:


org.encog.neural.data.temporal
There are a total of four classes that make up the Encog temporal dataset. These classes are as follows:


• TemporalDataDescription
• TemporalError
• TemporalMLDataSet
• TemporalPoint
The TemporalDataDescription class describes one unit of data that is either used for prediction or output. The TemporalError class is an exception that is thrown if there is an error while processing the temporal data. The TemporalMLDataSet class operates just like any Encog dataset and allows the temporal data to be used for training. The TemporalPoint class represents one point of temporal data. To begin using a TemporalMLDataSet we must instantiate it as follows:


TemporalMLDataSet result = new TemporalMLDataSet([past window size], [future window size]);
The above instantiation specifies both the size of the past and future windows. You must also define one or more TemporalDataDescription objects. These define the individual items inside of the past and future windows. One single TemporalDataDescription object can function as both a past and a future window element as illustrated in the code below.


TemporalDataDescription desc = new TemporalDataDescription([calculation type], [use for past], [use for future]);
result.addDescription(desc);
To specify that a TemporalDataDescription object functions as both a past and future element, use the value true for the last two parameters. There are several calculation types that you can specify for each data description. These types are summarized here.


• TemporalDataDescription.Type.RAW
• TemporalDataDescription.Type.PERCENT_CHANGE
• TemporalDataDescription.Type.DELTA_CHANGE
The RAW type specifies that the data points should be passed on to the neural network unmodified. PERCENT_CHANGE specifies that each point should be passed on as a percentage change. DELTA_CHANGE specifies that each point should be passed on as the actual change between the two values. If you are normalizing the data yourself, you would use the RAW type. Otherwise, it is very likely you would use the PERCENT_CHANGE type.
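The three calculation types come down to simple arithmetic. The helper below is illustrative only, not Encog code; it shows how each type would transform consecutive points, using Day 1 to Day 2 of the sample data.

```java
public class ChangeTypesDemo {
    // RAW: the value itself, unmodified.
    static double raw(double current) {
        return current;
    }

    // PERCENT_CHANGE: relative change from the previous value.
    static double percentChange(double previous, double current) {
        return (current - previous) / previous;
    }

    // DELTA_CHANGE: absolute change from the previous value.
    static double deltaChange(double previous, double current) {
        return current - previous;
    }

    public static void main(String[] args) {
        // Day 1 -> Day 2 from the sample data: 100 -> 102
        System.out.println(raw(102));                // 102.0
        System.out.println(percentChange(100, 102)); // 0.02
        System.out.println(deltaChange(100, 102));   // 2.0
    }
}
```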


Next, provide the raw data from which to train the temporal network. To do this, create TemporalPoint objects and add them to the temporal dataset. Each TemporalPoint object can contain multiple values; each temporal data point should hold the same number of values as there are TemporalDataDescription objects. The following code shows how to define a temporal data point.


TemporalPoint point = new TemporalPoint([number of values]);
point.setSequence([a sequence number]);
point.setData(0, [value 1]);
point.setData(1, [value 2]);
result.getPoints().add(point);
Every data point should have a sequence number in order to sort the data points. The setData method calls allow the individual values to be set and should match the number of values specified in the constructor. Finally, call the generate method. This method takes all of the temporal points and creates the training set. After generate has been called, the TemporalMLDataSet object can be used for training.


result.generate();
The next section will use a TemporalMLDataSet object to predict sunspots.


8.3 Application to Sunspots


In this section we will see how to use Encog to predict sunspots, which are fairly periodic and predictable. A neural network can learn this pattern and predict the number of sunspots with reasonable accuracy. The output from the sunspot prediction program is shown below. Of course, the neural network first begins training and will train until the error rate falls below six percent.


Epoch #1 Error :0.39165411390480664
Epoch #2 Error :1.2907898518116008
Epoch #3 Error :1.275679853982214
Epoch #4 Error :0.8026954615095163
Epoch #5 Error :0.4999305514145095
Epoch #6 Error :0.468223450164209
Epoch #7 Error :0.22034289938540677
Epoch #8 Error :0.2406776630699879
...
Epoch #128 Error :0.06025613803011326
Epoch #129 Error :0.06002069579351901
Epoch #130 Error :0.059830227113598734
Year  Actual  Predict Closed Loop Predict
1960  0.5723  0.5547  0.5547
1961  0.3267  0.4075  0.3918
1962  0.2577  0.1837  0.2672
1963  0.2173  0.1190  0.0739
1964  0.1429  0.1738  0.1135
1965  0.1635  0.2631  0.3650
1966  0.2977  0.2327  0.4203
1967  0.4946  0.2870  0.1342
1968  0.5455  0.6167  0.3533
1969  0.5438  0.5111  0.6415
1970  0.5395  0.3830  0.4011
1971  0.3801  0.4072  0.2469
1972  0.3898  0.2148  0.2342
1973  0.2598  0.2533  0.1788
1974  0.2451  0.1686  0.2163
1975  0.1652  0.1968  0.2064
1976  0.1530  0.1470  0.2032
1977  0.2148  0.1533  0.1751
1978  0.4891  0.3579  0.1014
Once the network has been trained, it tries to predict the number of sunspots between 1960 and 1978. It does this with at least some degree of accuracy. The number displayed is normalized and simply provides an idea of the relative number of sunspots. A larger number indicates more sunspot activity; a lower number indicates less sunspot activity.


There are two prediction numbers given: the regular prediction and the closed-loop prediction. Both prediction types use a past window of 30 and a future window of 1. The regular prediction simply uses the last 30 values from real data. The closed loop starts this way and, as it proceeds, its own predictions become the input as the window slides forward. This usually results in a less accurate prediction because any mistakes the neural network makes are compounded.
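The difference between the two modes can be sketched with a stand-in predictor. Here the "network" is just the mean of the past window, a hypothetical stand-in used purely to show the window mechanics: the regular mode always reads real history, while the closed-loop mode feeds its own predictions back into the window as it slides forward.

```java
import java.util.Arrays;

public class ClosedLoopDemo {
    // Stand-in for a trained network: predict the mean of the past window.
    static double predict(double[] window) {
        double sum = 0;
        for (double v : window) sum += v;
        return sum / window.length;
    }

    public static void main(String[] args) {
        double[] actual = { 1, 2, 3, 4, 5, 6, 7, 8 };
        int windowSize = 3;
        double[] closedLoop = Arrays.copyOf(actual, actual.length);

        for (int t = windowSize; t < actual.length; t++) {
            // Regular prediction: window of real values only.
            double regular = predict(Arrays.copyOfRange(actual, t - windowSize, t));
            // Closed-loop prediction: window may contain earlier predictions.
            double closed = predict(Arrays.copyOfRange(closedLoop, t - windowSize, t));
            closedLoop[t] = closed; // feed the prediction back in
            System.out.println(t + "\t" + regular + "\t" + closed);
        }
    }
}
```

Running this shows the closed-loop values drifting away from the regular ones after the first step, which is the compounding effect described above.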


We will now examine how this program was implemented. This program can be found at the following location.


org.encog.examples.neural.predict.sunspot.PredictSunspot
As you can see, the program has sunspot data hardcoded near the top of the file. This data was taken from a C-based neural network example program. You can find the original application at the following URL:


http://www.neural-networks-at-your-fingertips.com/bpn.html
The older, C-based neural network example was modified to make use of Encog. You will notice that the Encog version is much shorter than the C-based version. This is because much of what the example did was already implemented in Encog. Further, the Encog version trains the network faster because it makes use of resilient propagation, whereas the C-based example makes use of backpropagation.


This example goes through a two-step process for using the data. First, the raw data is normalized. Then, this normalized data is loaded into a TemporalMLDataSet object for temporal training. The normalizeSunspots method is called to normalize the sunspots. This method is shown below.


public void normalizeSunspots(double lo, double hi) {
The hi and lo parameters specify the high and low range to which the sunspots should be normalized. This specifies the normalized sunspot range. Normalization was discussed in Chapter 2. For this example, the lo value is 0.1 and the hi value is 0.9.
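Mapping data into a [lo, hi] range like this is an ordinary linear rescale. A minimal sketch of that mapping follows; it is illustrative only and is not Encog's NormalizeArray implementation.

```java
public class NormalizeDemo {
    // Linearly map each value from [dataLow, dataHigh] into [lo, hi].
    static double[] normalize(double[] data, double lo, double hi) {
        double dataLow = Double.MAX_VALUE;
        double dataHigh = -Double.MAX_VALUE;
        for (double v : data) {
            dataLow = Math.min(dataLow, v);
            dataHigh = Math.max(dataHigh, v);
        }
        double[] result = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (data[i] - dataLow) / (dataHigh - dataLow) * (hi - lo) + lo;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] normalized = normalize(new double[] { 0, 50, 100 }, 0.1, 0.9);
        // the smallest value maps to lo, the largest to hi
        System.out.println(normalized[0] + " " + normalized[1] + " " + normalized[2]);
    }
}
```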


To normalize these arrays, create an instance of the NormalizeArray class. This object will allow you to quickly normalize an array. To use this object, simply set the normalized high and low values, as follows.


NormalizeArray norm = new NormalizeArray();
norm.setNormalizedHigh(hi);
norm.setNormalizedLow(lo);
The array can now be normalized to this range by calling the process method.


normalizedSunspots = norm.process(SUNSPOTS);
Now copy the normalized sunspots to the closed-loop sunspots.
closedLoopSunspots = EngineArray.arrayCopy(normalizedSunspots);
Initially, the closed-loop array starts out the same as the regular prediction. However, its predictions will be used to fill this array. Now that the sunspot data has been normalized, it should be converted to temporal data. This is done by calling the generateTraining method, shown below.


public MLDataSet generateTraining() {
This method will return an Encog dataset that can be used for training. First a TemporalMLDataSet is created and past and future window sizes are specified.


TemporalMLDataSet result = new TemporalMLDataSet(WINDOW_SIZE, 1);
We will have a single data description. Because the data is already normalized, we will use RAW data. This data description will be used for both input and prediction, as the last two parameters specify. Finally, we add this description to the dataset.


TemporalDataDescription desc = new TemporalDataDescription(
    TemporalDataDescription.Type.RAW, true, true);
result.addDescription(desc);
It is now necessary to create all of the data points. We will loop between the starting and ending year, which are the years used to train the neural network. Other years will be used to test the neural network’s predictive ability.


for (int year = TRAIN_START; year < TRAIN_END; year++) {
Each data point will have only one value to predict the sunspots. The sequence is the year, because there is only one sunspot sample per year.


TemporalPoint point = new TemporalPoint(1);
point.setSequence(year);
The one value we are using is the normalized number of sunspots. This number is both what we use to predict from past values and what we hope to predict in the future.


point.setData(0, this.normalizedSunspots[year]);
result.getPoints().add(point);
}
Finally, we generate the training set and return it.
result.generate();
return result;
}
The data is now ready for training. This dataset is trained using resilient propagation. This process is the same as those used many times earlier in this book. Once training is complete, we will attempt to predict sunspots using the application. This is done with the predict method, which is shown here.


public void predict(BasicNetwork network) {
First, we create a NumberFormat object so that the numbers can be properly formatted. We will display four decimal places.
NumberFormat f = NumberFormat.getNumberInstance();
f.setMaximumFractionDigits(4);
f.setMinimumFractionDigits(4);
We display the heading for the table and begin to loop through the evaluation years.
System.out.println("Year\tActual\tPredict\tClosed Loop Predict");
for (int year = EVALUATE_START; year < EVALUATE_END; year++) {
We create input to the neural network based on actual data; this produces the regular prediction. We extract 30 years' worth of data for the past window.


MLData input = new BasicMLData(WINDOW_SIZE);
for (int i = 0; i < input.size(); i++) {
    input.setData(i, this.normalizedSunspots[(year - WINDOW_SIZE) + i]);
}
The neural network is presented with the data and we retrieve the prediction.
MLData output = network.compute(input);
double prediction = output.getData(0);
The prediction is saved to the closed-loop array for use with future predictions.
this.closedLoopSunspots[year] = prediction;
We will now calculate the closed-loop value. The calculation is essentially the same except that the closed-loop data, which is continually modified, is used. Just as before, we use 30 years worth of data.


for (int i = 0; i < input.size(); i++) {
    input.setData(i, this.closedLoopSunspots[(year - WINDOW_SIZE) + i]);
}
We compute the output.
output = network.compute(input);
double closedLoopPrediction = output.getData(0);
Finally, we display the closed-loop prediction, the regular prediction and the actual value.
System.out.println((STARTING_YEAR + year)
    + "\t" + f.format(this.normalizedSunspots[year])
    + "\t" + f.format(prediction)
    + "\t" + f.format(closedLoopPrediction));
}
}
This will display a list of all of the sunspot predictions made by Encog. In the next section we will see how Encog can automatically pull current market information and attempt to predict stock market directions.


8.4 Using the Encog Market Dataset


Encog also includes a dataset specifically designed for stock market data. This dataset is capable of downloading data from external sources. Currently, the only external source included in Encog is Yahoo Finance. The Encog market dataset is built on top of the temporal dataset and most classes in the Encog market dataset descend directly from corresponding classes in the temporal data set. The following classes make up the Encog Market Dataset package:


• MarketDataDescription
• MarketDataType
• MarketError
• MarketMLDataSet
• MarketPoint
• TickerSymbol
The MarketDataDescription class represents one piece of market data that is part of either the past or future window. It descends from the TemporalDataDescription class.  It consists primarily of a TickerSymbol object and a MarketDataType enumeration. The ticker symbol specifies the security to include and the MarketDataType specifies what type of data from this security to use. The available data types are listed below.


• OPEN - The market open for the day.
• CLOSE - The market close for the day.
• VOLUME - The volume for the day.
• ADJUSTED_CLOSE - The adjusted close. Adjusted for splits and dividends.
• HIGH - The high for the day.
• LOW - The low for the day.
These are the market data types currently supported by Encog. They are all represented inside of the MarketDataType enumeration. The MarketMLDataSet class is descended from TemporalMLDataSet. This is the main class when creating market-based training data for Encog. This class is an Encog dataset and can be trained. If any errors occur, a MarketError exception will be thrown.


The MarketPoint class descends from the TemporalPoint. You will usually not deal with this object directly, as Encog usually downloads market data from Yahoo Finance. The following code shows the general format for using the MarketMLDataSet class. First, create a loader. Currently, the YahooFinanceLoader is the only public loader available for Encog.


MarketLoader loader = new YahooFinanceLoader();
Next, we create the market dataset. We pass the loader, as well as the size of the past and future windows.
MarketMLDataSet market = new MarketMLDataSet(loader, [past window size], [future window size]);
Next create a MarketDataDescription object. To do this, specify the needed ticker symbol and data type. The last two true values at the end specify that this item is used both for past and predictive purposes.


final MarketDataDescription desc = new MarketDataDescription([ticker], [data type needed], true, true);
We add this data description to the dataset.
market.addDescription(desc);
We can add additional descriptions as needed. Next, load the market data and generate the training data.
market.load([begin date], [end date]);
market.generate();
As shown in the code, the beginning and ending dates must be specified. This tells Encog the range from which to generate training data.


8.5 Application to the Stock Market


We will now look at an example of applying Encog to stock market prediction. This program attempts to predict the direction of a single stock based on past performance. This is a very simple stock market example and is not meant to offer any sort of investment advice.


First, let’s explain how to run this example. There are four distinct modes in which this example can be run, depending on the command line argument that was passed. These arguments are summarized below.


• generate - Download financial data and generate the training file.
• train - Train the neural network.
• evaluate - Evaluate the neural network.
• prune - Try a number of different architectures to determine the best configuration.
To begin the example you should run the main class, which is named MarketPredict. The following sections will show how this example generates data, trains and then evaluates the resulting neural network. This application is located at the following location.


org.encog.examples.neural.predict.market.MarketPredict
Each of these modes will now be covered.
8.5.1 Generating Training Data


The first step is to generate the training data. The example is going to download about eight years worth of financial information to train with. It takes some time to download and process this information. The data is downloaded and written to an Encog EG file. The class MarketBuildTraining provides this functionality. All work performed by this class is in the static method named generate. This method is shown below.


public static void generate(File dataDir) {
This method begins by creating a YahooFinanceLoader that will load the requested financial data.
final MarketLoader loader = new YahooFinanceLoader();
A new MarketMLDataSet object is created that will use the loader and a specified size for the past and future windows. By default, the program uses a future window size of one and a past window size of 10. These constants are all defined in the Config class. Changing the values in the Config class controls how the network is structured and trained.


final MarketMLDataSet market = new MarketMLDataSet(loader, Config.INPUT_WINDOW, Config.PREDICT_WINDOW);
The program uses a single market value from which to make predictions. It will use the adjusted closing price of the specified security. The security that the program is trying to predict is specified in the Config class.


final MarketDataDescription desc = new MarketDataDescription(Config.TICKER, MarketDataType.ADJUSTED_CLOSE, true, true);
market.addDescription(desc);
The market data is now loaded, ending two months prior to today and beginning two years before that. The last two months will be used to evaluate the neural network's performance.


Calendar end = new GregorianCalendar(); // end today
Calendar begin = (Calendar) end.clone();
// shift the window back two months, then begin two years before that
begin.add(Calendar.DATE, -60);
end.add(Calendar.DATE, -60);
begin.add(Calendar.YEAR, -2);
market.load(begin.getTime(), end.getTime());
market.generate();
We now save the training data to a binary EGB file. It is important to note that TemporalDataSet or any of its derived classes will persist raw numeric data, just as a BasicMLDataSet would. Only the generated data will be saved, not the other support objects such as the MarketDataDescription objects.


EncogUtility.saveEGB(new File(dataDir, Config.TRAINING_FILE), market);
We will create a network to save to an EG file. This network is a simple feedforward neural network that may have one or two hidden layers. The sizes of the hidden layers are specified in the Config class.


final BasicNetwork network = EncogUtility.simpleFeedForward(
    market.getInputSize(), Config.HIDDEN1_COUNT,
    Config.HIDDEN2_COUNT, market.getIdealSize(), true);
We now create the EG file and store the network to an EG file.
EncogDirectoryPersistence.saveObject(new File(dataDir, Config.NETWORK_FILE), network);
}
Later phases of the program, such as the training and evaluation phases, will use this file.
8.5.2 Training the Neural Network


Training the neural network is very simple. The network and training data are already created and stored in an EG file. All that the training class needs to do is load both of these resources from the EG file and begin training. The MarketTrain class does this. The static method train performs all of the training. This method is shown here.


public static void train() {
The method begins by verifying whether the Encog EG file is present. Training data and the network will be loaded from here.
final File networkFile = new File(dataDir, Config.NETWORK_FILE);
final File trainingFile = new File(dataDir, Config.TRAINING_FILE);
// network file
if (!networkFile.exists()) {
    System.out.println("Can't read file: " + networkFile.getAbsolutePath());
    return;
}
Next, use the EncogDirectoryPersistence object to load the EG file. We will extract a network.
BasicNetwork network = (BasicNetwork) EncogDirectoryPersistence.loadObject(networkFile);
Next, load the training file from disk. This data will be used to train the network.
// training file
if (!trainingFile.exists()) {
    System.out.println("Can't read file: " + trainingFile.getAbsolutePath());
    return;
}
final MLDataSet trainingSet = EncogUtility.loadEGB2Memory(trainingFile);
The neural network is now ready to train. We will use EncogUtility to train in a loop for the number of minutes specified in the Config class. This is the same as creating a training object and using iterations, as was done previously in this book. The trainConsole method is simply a shortcut to run iterations for a specified number of minutes.


// train the neural network
EncogUtility.trainConsole(network, trainingSet, Config.TRAINING_MINUTES);
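Conceptually, trainConsole performs a time-boxed training loop. The sketch below is an assumption about that behavior rather than Encog's source; the doStep method is a dummy stand-in for one training iteration, and the budget is in milliseconds rather than minutes so it runs quickly.

```java
public class TimedTrainingDemo {
    // Dummy stand-in for one training iteration: error decays toward zero.
    static double doStep(double error) {
        return error * 0.9;
    }

    // Run iterations until the time budget is spent, then return the error.
    static double trainFor(long budgetMillis, double startError) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        double error = startError;
        int iterations = 0;
        while (System.currentTimeMillis() < deadline) {
            error = doStep(error);
            iterations++;
        }
        System.out.println("iterations=" + iterations + " finalError=" + error);
        return error;
    }

    public static void main(String[] args) {
        trainFor(50, 1.0); // a 50 ms budget stands in for TRAINING_MINUTES
    }
}
```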
Finally, the neural network is saved back to the EG file.
System.out.println("Final Error: " + network.calculateError(trainingSet));
System.out.println("Training complete, saving network.");
EncogDirectoryPersistence.saveObject(networkFile, network);
System.out.println("Network saved.");
Encog.getInstance().shutdown();
At this point, the neural network is trained. To further train the neural network, run the training again or move on to evaluating the neural network. If you train the same neural network again using resilient propagation, the error rate will initially spike. This is because the resilient propagation algorithm must reestablish proper delta values for training.


8.5.3 Incremental Pruning


One challenge with neural networks is determining the optimal architecture for the hidden layers. Should there be one hidden layer or two? How many neurons should be in each of the hidden layers? There are no easy answers to these questions.


Generally, it is best to start with a neural network that has one hidden layer containing twice as many neurons as the input layer. There are some reports that suggest that the second hidden layer has no advantages, although this is often debated. Other reports suggest a second hidden layer can sometimes lead to faster convergence. For more information, see the hidden layer page on the Heaton Research wiki.


http://www.heatonresearch.com/wiki/Hidden_Layers
One utility provided by Encog is the incremental pruning class. This class allows you to use a brute force technique to determine an optimal hidden layer configuration. Calling the market example with the prune argument will perform an incremental prune. This will try a number of different hidden layer configurations to attempt to find the best one. This command begins by loading a training set to memory.


MLDataSet training = EncogUtility.loadEGB2Memory(file);
Next a pattern is created to specify the type of neural network to be created.
FeedForwardPattern pattern = new FeedForwardPattern();
pattern.setInputNeurons(training.getInputSize());
pattern.setOutputNeurons(training.getIdealSize());
pattern.setActivationFunction(new ActivationTANH());
The above code specifies the creation of feedforward neural networks using the hyperbolic tangent activation function. Next, the pruning object is created.
PruneIncremental prune = new PruneIncremental(training, pattern,
    100, 1, 10, new ConsoleStatusReportable());
The object will train each candidate network for 100 iterations, try one set of starting weights for each, and keep the 10 top networks. From the 10 best networks found after 100 training iterations, the one with the smallest number of links is chosen as the best.


The user may also specify the number and sizes of the hidden layers to try. Each call to addHiddenLayer specifies the lower and upper bounds to try. The first call to addHiddenLayer specifies the range for the first hidden layer. Here we specify first hidden layer sizes from 5 to 50. Because the lower bound is not zero, a first hidden layer is always present.
用户还可以指定要尝试的隐藏层数量和大小。每次调用addHiddenLayer都指定要尝试的下限和上限。第一次调用addHiddenLayer指定第一隐藏层的范围。这里我们指定第一隐藏层的大小从5到50。因为下限不是零,所以必然存在第一个隐藏层。


prune.addHiddenLayer(5, 50);
Next we specify the size range for the second hidden layer. Here we try second hidden layers of between 0 and 50 neurons. Because the lower bound is zero, we will also try neural networks with no second hidden layer.
接下来,我们指定第二隐藏层的大小范围。这里我们尝试的第二隐藏层有0到50个神经元。由于下限是零,我们也会尝试没有第二隐藏层的神经网络。


prune.addHiddenLayer(0, 50);
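To see how large the search space defined by these two addHiddenLayer calls is, the following standalone sketch enumerates every candidate (first layer, second layer) size pair. The class and method names here are our own illustration, not part of Encog; a second-layer size of 0 stands for "no second hidden layer".
为了直观了解这两次addHiddenLayer调用定义的搜索空间有多大,下面这个独立的示例枚举了所有候选的(第一隐藏层, 第二隐藏层)大小组合。这里的类名和方法名是我们自己假设的,并非Encog的API;第二层大小为0表示没有第二隐藏层。

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigCount {
    // Return all (hidden1, hidden2) size pairs for the given inclusive ranges.
    // hidden2 == 0 means "no second hidden layer".
    public static List<int[]> configurations(int lo1, int hi1, int lo2, int hi2) {
        List<int[]> configs = new ArrayList<>();
        for (int h1 = lo1; h1 <= hi1; h1++) {
            for (int h2 = lo2; h2 <= hi2; h2++) {
                configs.add(new int[] { h1, h2 });
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        // Ranges from the text: first layer 5..50, second layer 0..50,
        // giving 46 * 51 = 2346 candidate configurations.
        List<int[]> all = configurations(5, 50, 0, 50);
        System.out.println("Candidate configurations: " + all.size());
    }
}
```

This makes it clear why the pruner only trains each candidate briefly: even these modest ranges produce thousands of networks to compare.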
Now that the object has been setup we are ready to search. Calling the process method will begin the search.
prune.process();
Once the search is completed you can call getBestNetwork to obtain the best-performing network. The following code obtains this network and saves it.
一旦搜索完成,你可以调用getBestNetwork获得表现最好的网络。下面的代码获取这个网络并保存它。


File networkFile = new File(dataDir, Config.NETWORK_FILE);
EncogDirectoryPersistence.saveObject(networkFile, prune.getBestNetwork());
We now have a neural network saved with a good combination of hidden layers and neurons. The pruning object does not train each network particularly well, as it is trying to search a large number of networks. At this point, you will want to further train this best network.
我们现在保存了一个隐藏层与神经元组合良好的神经网络。修剪对象并不会对每个网络进行充分训练,因为它要搜索大量的网络。此时,您需要对这个最佳网络做进一步训练。


8.5.4 Evaluating the Neural Network
8.5.4 评估神经网络


We are now ready to evaluate the neural network using the trained neural network from the last section and gauge its performance on actual current stock market data. The MarketEvaluate class contains all of the evaluation code.
我们现在准备使用上一节训练好的神经网络进行评估,衡量它在实际当前股市数据上的表现。MarketEvaluate类包含所有的评估代码。


There are two important methods used during the evaluation process. The first is the determineDirection method. We are not attempting to determine the actual percent change for a security, but rather, which direction it will move the next day.
在评估过程中有两个重要的方法。首先是determineDirection方法。我们不是试图确定一个证券的实际百分比变化,而是它第二天将朝哪个方向移动。


public static Direction determineDirection(final double d) {
    if (d < 0)
        return Direction.down;
    else
        return Direction.up;
}
This method simply returns an enumeration that specifies whether the stock price moved up or down. We will need some current market data to evaluate against. The grabData method obtains the necessary market data. It makes use of a MarketMLDataSet, just as the training does, to obtain some market data. This method is shown here.
此方法只返回一个枚举,指定股票价格是向上还是向下移动。我们需要一些当前的市场数据来进行评估。grabData方法获取必要的市场数据。它和训练时一样,使用一个MarketMLDataSet来获取市场数据。此方法如下所示。


public static MarketMLDataSet grabData() {
Just like the training data generation, market data is loaded from a YahooFinanceLoader object.
MarketLoader loader = new YahooFinanceLoader();
MarketMLDataSet result = new MarketMLDataSet(loader, Config.INPUT_WINDOW, Config.PREDICT_WINDOW);
We create exactly the same data description as was used for training: the adjusted close for the specified ticker symbol. Past and future data are also desired. By feeding past data to the neural network, we will see how well the output matches the future data.
我们创建与训练时完全相同的数据描述:指定股票代码的调整后收盘价。我们同时需要过去和未来的数据。通过将过去的数据输入神经网络,我们可以看到输出与未来数据的匹配程度。


MarketDataDescription desc = new MarketDataDescription(Config.TICKER, MarketDataType.ADJUSTED_CLOSE, true, true);
result.addDescription(desc);
Choose what date range to evaluate the network. We will grab the last 60 days worth of data.
Calendar end = new GregorianCalendar(); // end today
Calendar begin = (Calendar) end.clone(); // begin 60 days ago
begin.add(Calendar.DATE, -60);
The market data is now loaded and generated by using the load method call.
result.load(begin.getTime(), end.getTime());
result.generate();
return result;
}
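The 60-day window arithmetic used in grabData can be exercised on its own. Here is a minimal, self-contained sketch of the same Calendar logic; the class name DateWindow is our own, not part of Encog.
grabData中使用的60天时间窗口计算可以单独演示。下面是同样的Calendar逻辑的一个最小独立示例;类名DateWindow是我们自己假设的,并非Encog的一部分。

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class DateWindow {
    // Build the [begin, end] evaluation range used above:
    // end is "now", begin is the given number of days earlier.
    public static Calendar[] lastDays(int days) {
        Calendar end = new GregorianCalendar();
        Calendar begin = (Calendar) end.clone();
        begin.add(Calendar.DATE, -days);
        return new Calendar[] { begin, end };
    }
}
```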
The resulting data is returned to the calling method. Now that we have covered the support methods, it is time to learn how the actual evaluation occurs. The static method evaluate performs the actual evaluation and is shown below.
生成的数据被返回给调用方法。既然我们已经介绍了这些辅助方法,现在来看实际的评估是如何进行的。静态方法evaluate执行实际的评估,如下所示。


public static void evaluate() {
First, make sure that the Encog EG file exists.
File file = new File(Config.FILENAME);
if (!file.exists()) {
    System.out.println("Can't read file: " + file.getAbsolutePath());
    return;
}
Then, we load the neural network from the EG file, using the network that was trained in the previous section.
BasicNetwork network = (BasicNetwork) EncogDirectoryPersistence.loadObject(file);
Load the market data to be used for network evaluation. This is done using the grabData method discussed earlier in this section.
MarketMLDataSet data = grabData();
Use a formatter to format the percentages.
DecimalFormat format = new DecimalFormat("#0.0000");
During evaluation, count the number of cases examined and how many were correct.
int count = 0;
int correct = 0;
Loop over all of the loaded market data.
for (MLDataPair pair : data) {
Retrieve one training pair and obtain the actual data as well as what was predicted. The predicted data is determined by running the network using the compute method.
检索一对训练数据,获取实际数据以及预测数据。预测数据是通过调用网络的compute方法计算得到的。


MLData input = pair.getInput();
MLData actualData = pair.getIdeal();
MLData predictData = network.compute(input);
Now retrieve the actual and predicted data and calculate the difference. This establishes the accuracy of the neural network in predicting the actual price change.
现在检索实际数据和预测数据,并计算二者的差异。由此可以确定神经网络预测实际价格变化的准确程度。


double actual = actualData.getData(0);
double predict = predictData.getData(0);
double diff = Math.abs(predict - actual);
Also determine the direction the network predicted the security would take versus the direction the security actually took.
Direction actualDirection = determineDirection(actual);
Direction predictDirection = determineDirection(predict);
If the direction was correct, increment the correct count by one. Either way, increment the total count by one.
if (actualDirection == predictDirection)
    correct++;
count++;
Display the results for each case examined.
System.out.println("Day " + count + ": actual="
    + format.format(actual) + "(" + actualDirection + ")" + ", predict="
    + format.format(predict) + "(" + predictDirection + ")" + ", diff=" + diff);
}
Finally, display stats on the overall accuracy of the neural network.
double percent = (double) correct / (double) count;
System.out.println("Direction correct: " + correct + "/" + count);
System.out.println("Directional Accuracy: " + format.format(percent * 100) + "%");
}
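The direction-counting loop above boils down to a sign comparison. The following self-contained sketch captures that calculation; the class DirectionalAccuracy is our own illustration, not Encog code, and it treats zero as "up" exactly as determineDirection does.
上面统计方向的循环本质上是一个符号比较。下面这个独立的示例概括了这一计算;类DirectionalAccuracy是我们自己的示意,并非Encog代码,它和determineDirection一样将零视为"向上"。

```java
public class DirectionalAccuracy {
    // Fraction of days on which the predicted and actual moves
    // share a direction. Zero counts as "up", matching determineDirection.
    public static double accuracy(double[] actual, double[] predict) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            boolean actualUp = actual[i] >= 0;
            boolean predictUp = predict[i] >= 0;
            if (actualUp == predictUp) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }
}
```

Note that only the sign of each prediction matters here; a prediction that is far off in magnitude still counts as correct if it picked the right direction.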
The following snippet shows the output of one run of this application. Because it uses data preceding the current date, the results will differ each time it is run. The program is attempting to predict the percent movement of Apple Computer's stock price.
下面的片段显示了该应用程序某一次运行的输出。由于它使用当前日期之前的数据,每次运行的结果都会不同。该程序试图预测的是苹果公司股票价格的变动百分比。


Day 1: actual =0.05(up) , predict =-0.09(up) , diff =0.1331431391626865
Day 2: actual =-0.02(down) , predict =0.15(down) , diff=0.1752316137707985
Day 3: actual =-0.04(down) , predict =-0.08(down) , diff=0.04318588896364293
Day 4: actual =0.04(up) , predict =-0.13(up) , diff =0.167230163960771
Day 5: actual =0.04(up) , predict =0.08(up) , diff =0.041364210497886064
Day 6: actual =-0.05(down) , predict =-0.15(down) , diff=0.09856291235302134
Day 7: actual =0.03(up) , predict =0.02(up) , diff =0.0121349208067498
Day 8: actual =0.06(up) , predict =0.14(up) , diff =0.07873950162422072
Day 9: actual =0.00(up) , predict =-0.04(up) , diff =0.044884229765456175
Day 10: actual =-0.02(down) , predict =-0.11(down) , diff=0.08800357702537594
Day 11: actual =-0.03(down) , predict =0.10(down) , diff=0.1304932331559785
Day 12: actual =0.03(up) , predict =-0.00(up) , diff =0.03830226924277358
Day 13: actual =-0.04(down) , predict =-0.03(down) , diff=0.006017023124087514
Day 14: actual =0.01(up) , predict =-0.00(up) , diff =0.011094798099546017
Day 15: actual =-0.07(down) , predict =0.10(down) , diff=0.1634993352860712
Day 16: actual =0.00(up) , predict =0.09(up) , diff =0.08529079398874763
Day 17: actual =0.01(up) , predict =0.08(up) , diff =0.07476901867409716
Day 18: actual =-0.05(down) , predict =0.10(down) , diff=0.14462998342498684
Day 19: actual =0.01(up) , predict =0.01(up) , diff =0.0053944458622837204
Day 20: actual =-0.02(down) , predict =0.16(down) , diff=0.17692298105888082
Day 21: actual =0.01(up) , predict =0.01(up) , diff =0.003908063600862748
Day 22: actual =0.01(up) , predict =0.05(up) , diff =0.04043842368088156
Day 23: actual =-0.00(down) , predict =0.05(down) , diff=0.05856519756505361
Day 24: actual =-0.01(down) , predict =-0.01(down) , diff=0.0031913517175624975
Day 25: actual =0.06(up) , predict =0.03(up) , diff =0.02967685979492382
Day 26: actual =0.04(up) , predict =-0.01(up) , diff =0.05155871532643232
Day 27: actual =-0.02(down) , predict =-0.09(down) , diff=0.06931714317358993
Day 28: actual =-0.02(down) , predict =-0.04(down) , diff=0.019323500655091908
Day 29: actual =0.02(up) , predict =0.06(up) , diff =0.04364949212592098
Day 30: actual =-0.02(down) , predict =-0.06(down) , diff=0.036886336426948246
Direction correct :18/30
Directional Accuracy :60.00%
Here, the program had an accuracy of 60%, which is very good for this simple neural network. Accuracy rates generally ranged from 30-40% when this program was run at different intervals. This is a very simple stock market predictor and should not be used for any actual investing; it simply shows how to structure a neural network to predict market direction.
在这里,程序的准确率为60%,对这样一个简单的神经网络来说已经非常好了。在不同时间运行该程序时,准确率一般在30%到40%之间。这是一个非常简单的股市预测器,不应该用于任何实际投资,它只是展示了如何构造一个神经网络来预测市场方向。