
Chapter 10

Using a Self-Organizing Map

• What is a self-organizing map (SOM)?
• Mapping colors with a SOM
• Training a SOM
• Applying the SOM to the forest cover data
This chapter focuses on using Encog to implement a self-organizing map (SOM). A SOM is a special type of neural network that classifies data. Typically, a SOM will map higher resolution data to a single or multidimensional output. This can help a neural network see the similarities among its input data. Dr. Teuvo Kohonen of the Academy of Finland created the SOM. Because of this, the SOM is sometimes called a Kohonen neural network. Encog provides two different means by which SOM networks can be trained:

• Neighborhood Competitive Training
• Cluster Copy
Both training types are unsupervised. This means that no ideal data is provided. The network is simply given the data and the number of categories that data should be clustered into. During training, the SOM will cluster all of the training data. Additionally, the SOM will be capable of clustering new data without retraining.

The neighborhood competitive training method implements the classic SOM training model. The SOM is trained using a competitive, unsupervised training algorithm. Encog implements this training algorithm using the BasicTrainSOM class. This is a completely different type of training than those previously used in this book. The SOM does not use a training set or scoring object. There are no clearly defined objectives provided to the neural network at all. The only type of “objective” that the SOM has is to group similar inputs together.

The second training type provided by Encog is a cluster copy. This is a very simple training method that simply sets the weights into a pattern to accelerate the neighborhood competitive training. This training method can also be useful with a small training set where the number of training set elements exactly matches the number of clusters. The cluster copy training method is implemented in the SOMClusterCopyTraining class.

The first example in this chapter will take colors as input and map similar colors together. This GUI example program will visually show how similar colors are grouped together by the self-organizing map. The output from a self-organizing map is topological. This output is usually viewed in an n-dimensional way. Often, the output is single dimensional, but it can also be two-dimensional, three-dimensional, even four-dimensional or higher. This means that the “position” of the output neurons is important. If two output neurons are closer to each other, they will be trained together more so than two neurons that are not as close. 

None of the neural networks examined so far in this book have been topological. In previous examples from this book, the distance between neurons was unimportant. Output neuron number two was just as significant to output neuron number one as was output neuron number 100.

10.1 The Structure and Training of a SOM

An Encog SOM is implemented as a two-layer neural network. The SOM simply has an input layer and an output layer. The input layer maps data to the output layer. As patterns are presented to the input layer, the output neuron with the weights most similar to the input is considered the winner.

This similarity is calculated by comparing the Euclidean distance between each output neuron's set of weights and the input pattern. The shortest Euclidean distance wins. Euclidean distance calculation is covered in the next section. Unlike the feedforward neural network, the SOM has no bias values. Rather, there are only weights from the input layer to the output layer. Additionally, only a linear activation function is used.

10.1.1 Structuring a SOM

We will now study how to structure a SOM. This SOM will be given several colors to train on. These colors will be expressed as RGB vectors. The individual red, green and blue values can range between -1 and +1, where -1 is no color, or black, and +1 is full intensity of red, green or blue. These three color components comprise the neural network input.

The output is a 2,500-neuron grid arranged into 50 rows by 50 columns. This SOM will organize similar colors near each other in this output grid. Figure 10.1 shows this output.

The above figure may not be as clear in black and white editions of this book as it is in color. However, you can see similar colors grouped near each other. A single, color-based SOM is a very simple example that allows you to visualize the grouping capabilities of the SOM.

10.1.2 Training a SOM

How is a SOM trained? The training process updates the weight matrix, which is 3 × 2,500. To start, the weight matrix is initialized to random values. Then 15 training colors are randomly chosen.

Just like previous examples, training will progress through a series of iterations. However, unlike feedforward neural networks, SOM networks are usually trained with a fixed number of iterations. For the colors example in this chapter, we will use 1,000 iterations.

Training begins by choosing one random color sample per iteration. We then pick the one output neuron whose weights most closely match this training color. The training pattern is a vector of three numbers. The weights between each of the 2,500 output neurons and the three input neurons are also a vector of three numbers. We calculate the Euclidean distance between the weight vector and the training pattern. This is done with Equation 10.1.

This is very similar to Equation 2.3, shown in Chapter 2. In the above equation the variable p represents the input pattern. The variable w represents the weight vector. By squaring the differences between each vector component and taking the square root of the resulting sum, we obtain the Euclidean distance. This measures how different each weight vector is from the input training pattern.

This distance is calculated for every output neuron. The output neuron with the shortest distance is called the Best Matching Unit (BMU). The BMU is the neuron that will learn the most from the training pattern. The neighbors of the BMU will learn less. Now that a BMU is determined, loop over all weights in the matrix. Update every weight according to Equation 10.2.
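The BMU search described above can be sketched in plain Java. This is a minimal illustration rather than Encog's implementation; the class name BmuSketch, its method names, and the weights[neuron][input] layout are assumptions made for this example.

```java
// Hypothetical sketch: find the Best Matching Unit (BMU) by Euclidean distance.
// The weights[neuron][input] layout is an assumption, not Encog's internal format.
final class BmuSketch {

    // Euclidean distance between an input pattern p and a weight vector w.
    static double distance(double[] p, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - w[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Index of the output neuron whose weights are closest to the pattern.
    static int findBmu(double[] pattern, double[][] weights) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int n = 0; n < weights.length; n++) {
            double d = distance(pattern, weights[n]);
            if (d < bestDistance) {
                bestDistance = d;
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] weights = {
            {-1.0, -1.0, -1.0}, // black
            { 1.0, -1.0, -1.0}, // red
            { 1.0,  1.0,  1.0}  // white
        };
        double[] nearlyRed = {0.9, -0.8, -0.7};
        System.out.println("BMU index: " + findBmu(nearlyRed, weights));
    }
}
```

With these three sample neurons, the nearly red pattern is closest to the red weight vector, so the neuron at index 1 is the BMU.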

In the above equation, the variable t represents time, or the iteration number. The purpose of the equation is to calculate the resulting weight vector Wv(t+1). The next weight is calculated by adding to the current weight, which is Wv(t). The amount added depends on how different the current weight is from the input vector; the clause D(t)-Wv(t) measures this. If we simply added this value to the weight, the weight would exactly match the input vector. We don’t want to do this. As a result, we scale it by multiplying it by two ratios. The first ratio, represented by theta, is the neighborhood function. The second ratio is a monotonically decreasing learning rate.
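The update rule can be sketched as follows. Equation 10.2 is not reproduced in this text, so the sketch assumes the form w(t+1) = w(t) + theta * alpha * (d - w(t)), which matches the description above. The class and parameter names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of the update rule described above:
// w(t+1) = w(t) + theta * alpha * (d - w(t))
final class WeightUpdateSketch {

    // Moves the weight vector toward the input vector, in place.
    // theta is the neighborhood function value; alpha is the learning rate.
    static void update(double[] weights, double[] input, double theta, double alpha) {
        for (int i = 0; i < weights.length; i++) {
            weights[i] += theta * alpha * (input[i] - weights[i]);
        }
    }

    public static void main(String[] args) {
        double[] w = {0.0, 0.0, 0.0};
        double[] d = {1.0, -1.0, 0.5};
        update(w, d, 1.0, 0.5); // the BMU (theta = 1) with a learning rate of 0.5
        System.out.println(Arrays.toString(w));
    }
}
```

With theta and alpha both equal to one, the weight vector would jump all the way to the input vector, which is why both ratios are kept below that in practice.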

The neighborhood function considers how close the output neuron we are training is to the BMU. For closer neurons, the neighborhood function will be close to one. For distant neighbors the neighborhood function will return zero. This controls how near and far neighbors are trained.

We will look at how the neighborhood function determines this in the next section. The learning rate also scales how much the output neuron will learn. This learning rate is similar to the learning rate used in backpropagation training. However, the learning rate should decrease as the training progresses.

This learning rate must decrease monotonically, meaning the function output only decreases or remains the same as time progresses. The output from the function will never increase at any interval as time increases.

10.1.3 Understanding Neighborhood Functions

The neighborhood function determines to what degree each output neuron should receive training from the current training pattern. The neighborhood function will return a value of one for the BMU. This indicates that it should receive the most training of any neuron. Neurons further from the BMU will receive less training. The neighborhood function determines this percent.

If the output is arranged in only one dimension, a simple one-dimensional neighborhood function should be used. A single dimension self-organizing map treats the output as one long array of numbers. For instance, a single dimension network might have 100 output neurons that are simply treated as a long, single dimension array of 100 values.

A two-dimensional SOM might take these same 100 values and treat them as a grid, perhaps of 10 rows and 10 columns. The actual structure remains the same; the neural network has 100 output neurons. The only difference is the neighborhood function. The first would use a single dimensional neighborhood function; the second would use a two-dimensional neighborhood function. The function must consider this additional dimension and factor it into the distance returned.

It is also possible to have three, four, and even more dimensional functions for the neighborhood function. Two-dimension is the most popular choice. Single dimensional neighborhood functions are also somewhat common. Three or more dimensions are more unusual. It really comes down to computing how many ways an output neuron can be close to another. Encog supports any number of dimensions, though each additional dimension adds greatly to the amount of memory and processing power needed.

The Gaussian function is a popular choice for a neighborhood function. The Gaussian function has single- and multi-dimensional forms. The single-dimension Gaussian function is shown in Equation 10.3.

The graph of the Gaussian function is shown in Figure 10.2.

The above figure shows why the Gaussian function is a popular choice for a neighborhood function. If the current output neuron is the BMU, then its distance (x-axis) will be zero. As a result, the training percent (y-axis) is 100%. As the distance increases either positively or negatively, the training percentage decreases. Once the distance is great enough, the training percent is near zero.

There are several constants in Equation 10.3 that govern the shape of the Gaussian function. The constant a is the height of the curve’s peak, b is the position of the center of the peak, and c controls the width of the “bell”. The variable x represents the distance that the current neuron is from the BMU. The above Gaussian function is only useful for a one-dimensional output array. If using a two-dimensional output grid, it is important to use the two-dimensional form of the Gaussian function. Equation 10.4 shows this.
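The role of the three constants can be sketched in Java. Equation 10.3 is not reproduced in this text, so the sketch assumes the standard form a * exp(-(x - b)^2 / (2 * c^2)). The class and method names are hypothetical.

```java
// Hypothetical sketch of a one-dimensional Gaussian neighborhood function,
// assuming the standard form a * exp(-(x - b)^2 / (2 * c^2)).
final class Gaussian1DSketch {

    // x: distance from the BMU, a: peak height, b: center, c: width.
    static double gaussian(double x, double a, double b, double c) {
        double diff = x - b;
        return a * Math.exp(-(diff * diff) / (2.0 * c * c));
    }

    public static void main(String[] args) {
        // Peak of one, centered on the BMU, with a width of two neurons.
        for (int distance = 0; distance <= 4; distance++) {
            System.out.println("distance " + distance + " -> "
                    + gaussian(distance, 1.0, 0.0, 2.0));
        }
    }
}
```

At a distance of zero the function returns the peak value, and the value falls off symmetrically as the distance grows in either direction.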

The graph form of the two-dimensional form of the Gaussian function is shown in Figure 10.3.

The two-dimensional form of the Gaussian function takes a single peak variable, but allows the user to specify separate values for the position and width of the curve. The equation does not need to be symmetrical.

How are the Gaussian constants used with a neural network? The peak is almost always one. To unilaterally decrease the effectiveness of training, the peak should be set below one. However, this is more the role of the learning rate. The center is almost always zero to center the curve on the origin. If the center is changed, then a neuron other than the BMU would receive the full learning. It is unlikely you would ever want to do this. For a multi-dimensional Gaussian, set all centers to zero to truly center the curve at the origin.

This leaves the width of the Gaussian function. The width should be set to something slightly less than the entire width of the grid or array. Then the width should be gradually decreased. The width should be decreased monotonically just like the learning rate.
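A two-dimensional Gaussian neighborhood can be sketched as follows. Equation 10.4 is not reproduced in this text, so the sketch assumes the standard two-dimensional form with both centers fixed at zero, as recommended above. The row-major grid layout and all names are assumptions made for this example.

```java
// Hypothetical sketch of a two-dimensional Gaussian neighborhood on a grid.
// Both centers are fixed at zero so that the BMU receives the full training.
final class Gaussian2DSketch {

    // dx, dy: grid offsets from the BMU; widthX, widthY: widths per dimension.
    static double gaussian2d(double dx, double dy, double peak,
                             double widthX, double widthY) {
        double ex = (dx * dx) / (2.0 * widthX * widthX);
        double ey = (dy * dy) / (2.0 * widthY * widthY);
        return peak * Math.exp(-(ex + ey));
    }

    // Neighborhood value for a neuron, given one-dimensional indexes
    // into a row-major grid that is gridWidth neurons wide.
    static double neighborhood(int neuron, int bmu, int gridWidth, double radius) {
        int dx = (neuron % gridWidth) - (bmu % gridWidth);
        int dy = (neuron / gridWidth) - (bmu / gridWidth);
        return gaussian2d(dx, dy, 1.0, radius, radius);
    }

    public static void main(String[] args) {
        int bmu = 55; // column 5, row 5 of a 10 x 10 grid
        System.out.println("BMU itself:      " + neighborhood(55, bmu, 10, 3.0));
        System.out.println("One column over: " + neighborhood(56, bmu, 10, 3.0));
        System.out.println("Three rows down: " + neighborhood(85, bmu, 10, 3.0));
    }
}
```

Passing a smaller radius on later iterations narrows the bell, which is how the gradual decrease in width described above would be applied.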

10.1.4 Forcing a Winner

An optional feature of Encog SOM competitive training is the ability to force a winner. By default, Encog does not force a winner. However, this feature can be enabled for SOM training. Forcing a winner will try to ensure that each output neuron wins for at least one of the training samples. This can cause a more even distribution of winners. However, it can also skew the data, as it somewhat “engineers” the neural network. Because of this, it is disabled by default.

10.1.5 Calculating Error

In propagation training we could measure the success of our training by examining the neural network's current error. In a SOM there is no direct error because there is no expected output. Yet, the Encog interface Train exposes an error property. This property does return an estimation of the SOM error.

The error is defined to be the “worst” or longest Euclidean distance of any BMU. This value should be minimized as learning progresses. This gives a general approximation of how well the SOM has been trained.
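This definition can be sketched in plain Java. The sketch follows the definition stated above; how Encog computes the value internally is not shown in this text, and the class and method names are hypothetical.

```java
// Hypothetical sketch: SOM error as the worst (largest) BMU distance
// found over a set of training patterns.
final class SomErrorSketch {

    static double distance(double[] p, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - w[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Distance from a pattern to its Best Matching Unit.
    static double bmuDistance(double[] pattern, double[][] weights) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] w : weights) {
            best = Math.min(best, distance(pattern, w));
        }
        return best;
    }

    // The largest BMU distance over all patterns.
    static double worstBmuDistance(double[][] patterns, double[][] weights) {
        double worst = 0.0;
        for (double[] p : patterns) {
            worst = Math.max(worst, bmuDistance(p, weights));
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] patterns = {{0.0, 0.0, 0.0}, {1.0, 1.0, 1.0}};
        double[][] weights = {{0.0, 0.0, 0.0}, {1.0, 1.0, 0.0}};
        System.out.println("Error: " + worstBmuDistance(patterns, weights));
    }
}
```

Here the first pattern is matched exactly, while the second is one unit away from its BMU, so the error is determined by the second pattern.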

10.2 Implementing the Colors SOM in Encog

We will now see how the color matching SOM is implemented. There are two classes that make up this example:
• MapPanel
• SomColors
The MapPanel class is used to display the weight matrix to the screen. The SomColors class extends the JPanel class and adds the MapPanel to itself for display. We will examine both classes, starting with the MapPanel.

10.2.1 Displaying the Weight Matrix

The MapPanel class draws the GUI display for the SOM as it progresses. This relatively simple class can be found at the following location. 

The convertColor function is very important. It converts a double in the range of -1 to +1 into the 0 to 255 range that an RGB component requires. A neural network deals much better with -1 to +1 than 0 to 255. As a result, this normalization is needed.

private int convertColor(double d) {
    // Scale -1..+1 to -128..+128, then shift to 0..256.
    double result = 128 * d;
    result += 128;
    // Clamp to the valid RGB component range of 0..255.
    result = Math.min(result, 255);
    result = Math.max(result, 0);
    return (int) result;
}
The number 128 is the midpoint between 0 and 255. We multiply the input by 128 to scale it to the proper range and then add 128 to shift it away from the midpoint. The result is then clamped to between 0 and 255, which ensures that it is in the proper range.
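A few sample values make the arithmetic concrete. The following sketch copies the conversion into a hypothetical standalone class, ConvertColorSketch, so that it can be run without the GUI.

```java
// Hypothetical standalone copy of the conversion, for experimenting with values.
final class ConvertColorSketch {

    static int convertColor(double d) {
        double result = 128 * d;
        result += 128;
        result = Math.min(result, 255);
        result = Math.max(result, 0);
        return (int) result;
    }

    public static void main(String[] args) {
        double[] inputs = {-1.0, 0.0, 0.5, 1.0};
        for (double d : inputs) {
            System.out.println(d + " -> " + convertColor(d));
        }
    }
}
```

An input of -1 maps to 0, an input of 0 maps to 128, and an input of 0.5 maps to 192. An input of +1 would produce 256, so the clamp brings it down to 255.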

Using the convertColor method, the paint method can properly draw the state of the SOM. The output from this function will be a color map of all of the neural network weights. Each of the 2,500 output neurons is shown on a grid. Their color is determined by the weight between that output neuron and the three input neurons. These three weights are treated as RGB color components. The paint method is shown here.

public void paint(Graphics g) {
Begin by looping through all 50 rows and columns.
for (int y = 0; y < HEIGHT; y++) {
    for (int x = 0; x < WIDTH; x++) {
While the output neurons are shown as a two-dimensional grid, they are all stored as a one-dimensional array. We must calculate the current one-dimensional index from the two-dimensional x and y values.

int index = (y * WIDTH) + x;
We obtain the three weight values from the matrix and use the convertColor method to convert these to RGB components.
int red = convertColor(weights.get(0, index));
int green = convertColor(weights.get(1, index));
int blue = convertColor(weights.get(2, index));
These three components are used to create a new Color object.
g.setColor(new Color(red, green, blue));
A filled rectangle is then drawn to represent the neuron.
Once the loops complete, the entire weight matrix has been displayed to the screen.

10.2.2 Training the Color Matching SOM

The SomColors class acts as the main JPanel for the application. It also provides all of the training for the neural network. This class can be found at the following location.

package org.encog.examples.neural.gui.som.SomColors
The BasicTrainSOM class must be set up so that the neural network will train. To do so, a neighborhood function is required. For this example, we use the NeighborhoodRBF neighborhood function with a Gaussian radial basis function. This neighborhood function can support a multi-dimensional Gaussian neighborhood function. The following line of code creates this neighborhood function.

this.gaussian = new NeighborhoodRBF(RBFEnum.Gaussian, MapPanel.WIDTH, MapPanel.HEIGHT);
This constructor creates a two-dimensional Gaussian neighborhood function. The first parameter selects the Gaussian radial basis function. The next two parameters specify the width and height of the grid. There are other constructors that can create higher dimensional Gaussian functions. Additionally, there are other neighborhood functions provided by Encog. The most common is the NeighborhoodRBF. NeighborhoodRBF can use a Gaussian, or other radial basis functions. Below is the complete list of neighborhood functions.

• NeighborhoodBubble
• NeighborhoodRBF
• NeighborhoodRBF1D
• NeighborhoodSingle
The NeighborhoodBubble only provides one-dimensional neighborhood functions. A radius is specified and anything within that radius receives full training. The NeighborhoodSingle functions as a single-dimensional neighborhood function and only allows the BMU to receive training.
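The two simplest neighborhood functions can be sketched in a few lines of Java. This is an illustration of the behavior described above, not Encog's code. The class and method names are hypothetical, and treating the radius as inclusive is an assumption.

```java
// Hypothetical sketch of the "bubble" and "single" neighborhood ideas
// for a one-dimensional output array.
final class SimpleNeighborhoodSketch {

    // Bubble: full training within the radius, none outside it.
    // Assumption: a neuron exactly at the radius still counts as inside.
    static double bubble(int neuron, int bmu, int radius) {
        return Math.abs(neuron - bmu) <= radius ? 1.0 : 0.0;
    }

    // Single: only the BMU itself receives training.
    static double single(int neuron, int bmu) {
        return neuron == bmu ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        int bmu = 5;
        for (int neuron = 2; neuron <= 8; neuron++) {
            System.out.println("neuron " + neuron
                    + ": bubble=" + bubble(neuron, bmu, 2)
                    + ", single=" + single(neuron, bmu));
        }
    }
}
```

Unlike the Gaussian, neither function tapers off: a neuron is either trained fully or not at all.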

The NeighborhoodRBF class supports several RBF functions. The “Mexican Hat” and “Gaussian” RBFs are common choices. However, the Multiquadric and the Inverse Multiquadric are also available. We must also create a BasicTrainSOM object to make use of the neighborhood function.

this.train = new BasicTrainSOM(this.network, 0.01, null, gaussian);
The first parameter specifies the network to train and the second parameter is the learning rate. We will automatically decrease the learning rate from a maximum value to a minimum value, so the learning rate specified here is not important.

The third parameter is the training set. We will feed colors to the neural network at random, one at a time, so no training set is needed and null is passed. Finally, the fourth parameter is the newly created neighborhood function.

The SOM training is provided for this example by a background thread. This allows the training to progress while the user watches. The background thread is implemented in the run method, as shown here.

public void run() {
The run method begins by creating the 15 random colors to train the neural network. These random samples will be stored in the samples variable, which is a List.
List<MLData> samples = new ArrayList<MLData>();
The random colors are generated and have random numbers for the RGB components.
for (int i = 0; i < 15; i++) {
    MLData data = new BasicMLData(3);
    data.setData(0, RangeRandomizer.randomize(-1, 1));
    data.setData(1, RangeRandomizer.randomize(-1, 1));
    data.setData(2, RangeRandomizer.randomize(-1, 1));
    samples.add(data);
}
The following line sets the parameters for the automatic decay of the learning rate and the radius.
this.train.setAutoDecay(1000, 0.8, 0.003, 30, 5);
We must provide the anticipated number of iterations. For this example, the quantity is 1,000. For SOM neural networks, it is necessary to know the number of iterations up front. This is different than propagation training that trained for either a specific amount of time or until below a specific error rate. 

The parameters 0.8 and 0.003 are the beginning and ending learning rates. The learning rate will be uniformly decreased from 0.8 to 0.003 over the iterations. It should reach close to 0.003 by the last iteration.

Likewise, the parameters 30 and 5 represent the beginning and ending radius. The radius will start at 30 and should be near 5 by the final iteration. If more than the planned 1,000 iterations are performed, the radius and learning rate will not fall below their minimums.
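The decay described above can be sketched as a simple linear schedule. Whether Encog's setAutoDecay uses exactly this schedule is an assumption based on the description of a uniform decrease. The class AutoDecaySketch and its methods are hypothetical.

```java
// Hypothetical sketch of a linear decay for the learning rate and radius.
// Both values shrink by a fixed step and never fall below their minimums.
final class AutoDecaySketch {
    private final double endRate;
    private final double endRadius;
    private final double rateStep;
    private final double radiusStep;
    private double rate;
    private double radius;

    AutoDecaySketch(int plannedIterations, double startRate, double endRate,
                    double startRadius, double endRadius) {
        this.rate = startRate;
        this.radius = startRadius;
        this.endRate = endRate;
        this.endRadius = endRadius;
        this.rateStep = (startRate - endRate) / plannedIterations;
        this.radiusStep = (startRadius - endRadius) / plannedIterations;
    }

    // Call once per training iteration.
    void decay() {
        rate = Math.max(endRate, rate - rateStep);
        radius = Math.max(endRadius, radius - radiusStep);
    }

    double rate() {
        return rate;
    }

    double radius() {
        return radius;
    }

    public static void main(String[] args) {
        AutoDecaySketch schedule = new AutoDecaySketch(1000, 0.8, 0.003, 30, 5);
        for (int i = 0; i < 1000; i++) {
            schedule.decay();
        }
        System.out.println("rate=" + schedule.rate() + ", radius=" + schedule.radius());
    }
}
```

Because each step only subtracts or clamps, both values decrease monotonically, and extra iterations beyond the planned number leave them at their minimums.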

for (int i = 0; i < 1000; i++) {
For each competitive learning iteration, there are two choices. First, you can simply provide an MLDataSet that contains the training data and call the iteration method of BasicTrainSOM. Second, you can train on one pattern at a time, which is what this example does. We choose a random color index and obtain that color.

int idx = (int) (Math.random() * samples.size());
MLData c = samples.get(idx);
The trainPattern method will train the neural network for this random color pattern. The BMU will be located and updated as described earlier in this chapter.

this.train.trainPattern(c);
Alternatively, the colors could have been loaded into an MLDataSet object and the iteration method could have been used. However, training the patterns one at a time and using a random pattern looks better when displayed on the screen. Next, call the autoDecay method to decrease the learning rate and radius according to the parameters previously specified.

this.train.autoDecay();
The screen is repainted.
this.map.repaint();
Finally, we display information about the current iteration.
System.out.println("Iteration " + i + ", "
    + this.train.toString());
This process continues for 1,000 iterations. By the final iteration, the colors will be grouped.
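To see the whole loop in one place without the Encog library or a GUI, the steps can be sketched in plain Java. This is a simplified illustration, not Encog's BasicTrainSOM. The class ColorSomSketch, the 10 by 10 grid, and the radius range of 6 to 1 are assumptions chosen to keep the example small.

```java
import java.util.Random;

// Hypothetical, dependency-free sketch of the training loop described above.
// It follows the steps in the text but is not Encog's BasicTrainSOM.
final class ColorSomSketch {
    static final int WIDTH = 10;  // assumption: smaller than the 50 x 50 grid
    static final int HEIGHT = 10;

    final double[][] weights = new double[WIDTH * HEIGHT][3];
    private final Random random;

    ColorSomSketch(long seed) {
        random = new Random(seed);
        for (double[] w : weights) {
            for (int i = 0; i < w.length; i++) {
                w[i] = randomComponent();
            }
        }
    }

    // Random color component in the range -1 to +1.
    double randomComponent() {
        return random.nextDouble() * 2.0 - 1.0;
    }

    static double distance(double[] p, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += (p[i] - w[i]) * (p[i] - w[i]);
        }
        return Math.sqrt(sum);
    }

    int findBmu(double[] pattern) {
        int best = 0;
        for (int n = 1; n < weights.length; n++) {
            if (distance(pattern, weights[n]) < distance(pattern, weights[best])) {
                best = n;
            }
        }
        return best;
    }

    // Trains every neuron, scaled by a Gaussian of its grid distance to the BMU.
    void trainPattern(double[] pattern, double rate, double radius) {
        int bmu = findBmu(pattern);
        for (int n = 0; n < weights.length; n++) {
            double dx = (n % WIDTH) - (bmu % WIDTH);
            double dy = (n / WIDTH) - (bmu / WIDTH);
            double theta = Math.exp(-(dx * dx + dy * dy) / (2.0 * radius * radius));
            for (int i = 0; i < 3; i++) {
                weights[n][i] += theta * rate * (pattern[i] - weights[n][i]);
            }
        }
    }

    public static void main(String[] args) {
        ColorSomSketch som = new ColorSomSketch(42L);
        double[][] samples = new double[15][3];
        for (double[] s : samples) {
            for (int i = 0; i < 3; i++) {
                s[i] = som.randomComponent();
            }
        }
        int iterations = 1000;
        double rate = 0.8, endRate = 0.003;
        double radius = 6.0, endRadius = 1.0; // assumption: scaled for the small grid
        double rateStep = (rate - endRate) / iterations;
        double radiusStep = (radius - endRadius) / iterations;
        for (int i = 0; i < iterations; i++) {
            double[] c = samples[som.random.nextInt(samples.length)];
            som.trainPattern(c, rate, radius);
            rate = Math.max(endRate, rate - rateStep);
            radius = Math.max(endRadius, radius - radiusStep);
        }
        System.out.println("Final learning rate: " + rate + ", radius: " + radius);
    }
}
```

The weights array plays the role of the weight matrix that MapPanel draws, so each row could be passed through convertColor to render the trained grid.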




