Probabilistic Neural Networks
1. Parzen Windows Classifier
The development of the probabilistic neural network relies on Parzen windows classifiers.
The Parzen windows method is a non-parametric procedure that synthesizes an estimate of a probability density function (pdf) by superposition of a number of windows, replicas of a function (often the Gaussian).
The Parzen windows classifier makes a classification decision after estimating the probability density function of each class from the given training examples. The multicategory classification decision is expressed as follows:
\[ p_k f_k(\mathbf{x}) \;>\; p_j f_j(\mathbf{x}), \quad \text{for all } j \neq k \]
where: p_k is the prior probability of occurrence of examples from class k,
f_k is the estimated pdf of class k.
The pdf is estimated with the following algorithm: centre a d-dimensional Gaussian at each training example, add up the values of these Gaussians evaluated at the point of interest x, and scale the sum to produce the estimated probability density.
\[ f_k(\mathbf{x}) \;=\; \frac{1}{(2\pi)^{d/2}\,\sigma^{d}} \, \frac{1}{N_k} \sum_{i=1}^{N_k} \exp\!\left[ -\frac{(\mathbf{x}-\mathbf{x}_{ki})^{T}(\mathbf{x}-\mathbf{x}_{ki})}{2\sigma^{2}} \right] \]
where: x_ki is the d-dimensional i-th training example from class k,
N_k is the number of training examples in class k.
This function f_k(x) is a sum of small multivariate Gaussian distributions centred at each training example. Using smooth distributions rather than the raw examples makes it possible to generalize beyond the provided examples. The index k in f_k(x) indicates that a separate density is estimated for each class, and these class densities will in general differ in spread.
As the number of training examples, and hence of superimposed Gaussians, grows, the estimated pdf approaches the true pdf of the distribution from which the training set was drawn.
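The estimate above maps directly onto code. The following is a minimal C sketch of the density estimate for a single class; the function name parzen_pdf and the flat layout of the examples array are illustrative assumptions, not part of the original text.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Parzen-window estimate of the class-k pdf at point x (a sketch).
   examples holds the Nk training examples of class k, stored row by
   row in a flat array of length Nk * d. */
double parzen_pdf(const double *x, const double *examples,
                  int Nk, int d, double sigma)
{
    double sum = 0.0;
    for (int i = 0; i < Nk; i++) {
        const double *xi = examples + (long)i * d;
        double sq_dist = 0.0;              /* (x - x_ki)^T (x - x_ki) */
        for (int j = 0; j < d; j++) {
            double diff = x[j] - xi[j];
            sq_dist += diff * diff;
        }
        sum += exp(-sq_dist / (2.0 * sigma * sigma));
    }
    /* Scale by the Gaussian normalisation constant and the example count. */
    return sum / (pow(2.0 * M_PI, d / 2.0) * pow(sigma, d) * Nk);
}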
The classification decision is taken according to the inequality:
\[ \sum_{i=1}^{N_k} \exp\!\left[ -\frac{(\mathbf{x}-\mathbf{x}_{ki})^{T}(\mathbf{x}-\mathbf{x}_{ki})}{2\sigma^{2}} \right] \;>\; \sum_{i=1}^{N_j} \exp\!\left[ -\frac{(\mathbf{x}-\mathbf{x}_{ji})^{T}(\mathbf{x}-\mathbf{x}_{ji})}{2\sigma^{2}} \right], \quad \text{for all } j \neq k \]
which is derived using prior probabilities calculated as the relative frequency of the examples in each class:
\[ p_k = N_k / N \]
where: N is the number of all training examples,
N_k is the number of training examples in class k.
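The step from the decision rule p_k f_k(x) > p_j f_j(x) to the inequality above is a one-line cancellation:

\[ p_k f_k(\mathbf{x}) \;=\; \frac{N_k}{N} \cdot \frac{1}{(2\pi)^{d/2}\sigma^{d} N_k} \sum_{i=1}^{N_k} \exp\!\left[ -\frac{(\mathbf{x}-\mathbf{x}_{ki})^{T}(\mathbf{x}-\mathbf{x}_{ki})}{2\sigma^{2}} \right] \;=\; \frac{1}{N (2\pi)^{d/2}\sigma^{d}} \sum_{i=1}^{N_k} \exp\!\left[ -\frac{(\mathbf{x}-\mathbf{x}_{ki})^{T}(\mathbf{x}-\mathbf{x}_{ki})}{2\sigma^{2}} \right] \]

The leading factor is the same for every class, so comparing p_k f_k(x) across classes reduces to comparing the raw sums of the exponentials.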
The Parzen windows classifier uses the entire training set of examples to perform classification, that is, it requires the training set to be stored in memory, and the computation time of each classification grows in proportion to the training set size.
2. Probabilistic Neural Networks
The probabilistic neural network is a direct continuation of the work on Bayes classifiers.
The probabilistic neural network (PNN) learns to approximate the pdf of the training examples.
More precisely, the PNN is interpreted as a function that approximates the probability density of the distribution underlying the examples, rather than fitting the examples directly.
The PNN consists of nodes allocated in three layers after the inputs:
- pattern layer: there is one pattern node for each training example. Each pattern node forms the inner product of its weight vector and the example presented for classification, where the weights entering a node are the components of a particular (normalized) training example. The product is then passed through the activation function (its relation to the Gaussian window is shown after this list):
\[ \exp\!\left[ \frac{\mathbf{x}^{T}\mathbf{w}_{ki} - 1}{\sigma^{2}} \right] \]
- summation layer: each summation node receives the outputs from pattern nodes associated with a given class:
\[ \sum_{i=1}^{N_k} \exp\!\left[ \frac{\mathbf{x}^{T}\mathbf{w}_{ki} - 1}{\sigma^{2}} \right] \]
- output layer: the output nodes are binary neurons that produce the classification decision
\[ \sum_{i=1}^{N_k} \exp\!\left[ \frac{\mathbf{x}^{T}\mathbf{w}_{ki} - 1}{\sigma^{2}} \right] \;>\; \sum_{i=1}^{N_j} \exp\!\left[ \frac{\mathbf{x}^{T}\mathbf{w}_{ji} - 1}{\sigma^{2}} \right], \quad \text{for all } j \neq k \]
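The pattern-node activation is the Parzen Gaussian window in a different form: assuming the example and the weight vectors are normalized to unit length (as in the worked example of Section 5),

\[ \|\mathbf{x}-\mathbf{w}_{ki}\|^{2} \;=\; \mathbf{x}^{T}\mathbf{x} - 2\,\mathbf{x}^{T}\mathbf{w}_{ki} + \mathbf{w}_{ki}^{T}\mathbf{w}_{ki} \;=\; 2\,(1 - \mathbf{x}^{T}\mathbf{w}_{ki}), \]

so that

\[ \exp\!\left[ \frac{\mathbf{x}^{T}\mathbf{w}_{ki} - 1}{\sigma^{2}} \right] \;=\; \exp\!\left[ -\frac{\|\mathbf{x}-\mathbf{w}_{ki}\|^{2}}{2\sigma^{2}} \right]. \]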
The only factor that needs to be selected for training is the smoothing factor, that is, the standard deviation σ of the Gaussian functions:
- too small a deviation causes a very spiky approximation which cannot generalize well;
- too large a deviation smooths out details.
An appropriate deviation is chosen by experiment, for example by evaluating several candidate values on held-out data, as sketched below.
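A minimal sketch of such an experiment, assuming a classifier routine in the spirit of the PNN function in the next section and a separate validation set (all of the names below are illustrative assumptions):

/* Sketch: pick the smoothing factor sigma that gives the highest accuracy
   on a held-out validation set.  The classifier is passed in as a function
   pointer so this routine stays independent of the PNN code below. */
float select_sigma(int (*classify)(const float *x, float sigma),
                   const float *val_x,      /* n_val examples, row by row */
                   const int *val_y,        /* their known class labels   */
                   int n_val, int d,
                   const float *candidates, int n_candidates)
{
    float best_sigma = candidates[0];
    int best_correct = -1;

    for (int c = 0; c < n_candidates; c++) {
        int correct = 0;
        for (int i = 0; i < n_val; i++)
            if (classify(val_x + (long)i * d, candidates[c]) == val_y[i])
                correct++;
        if (correct > best_correct) {
            best_correct = correct;
            best_sigma = candidates[c];
        }
    }
    return best_sigma;
}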
3. Implementing PNN
// C is the number of classes, N is the number of all training examples
// Nk[k] is the number of examples in class k; the rows of Examples are assumed
//   to be grouped by class (the first Nk[0] rows belong to class 0, and so on)
// d is the dimensionality of the training examples, sigma is the smoothing factor
// test_example[d] is the example to be classified
// Examples[N][d] are the training examples (normalized to unit length)
// The function returns the index of the winning class
#include <math.h>

int PNN(int C, int N, int d, float sigma,
        float test_example[d], float Examples[N][d], int Nk[C])
{
    float sum[C];
    int i = 0;                        // index of the current training example

    // The SUMMATION layer which accumulates the pdf estimate
    // for each example from the particular class k
    for (int k = 0; k < C; k++)
    {
        sum[k] = 0;
        for (int n = 0; n < Nk[k]; n++, i++)
        {
            // The PATTERN layer that multiplies the test example by the weights
            // (the weights of a pattern node are the stored training example)
            float product = 0;
            for (int j = 0; j < d; j++)
                product += test_example[j] * Examples[i][j];
            product = (product - 1) / (sigma * sigma);
            sum[k] += exp(product);
        }
        sum[k] /= Nk[k];              // average: this assumes equal class priors
    }

    // The OUTPUT layer which selects the class with the largest pdf estimate
    int classify = 0;
    float largest = sum[0];
    for (int k = 1; k < C; k++)
        if (sum[k] > largest)
        {
            largest = sum[k];
            classify = k;
        }
    return classify;
}
4. Probabilistic Neural Networks vs. Multilayer Neural Networks
The PNN offers the following advantages [Wasserman, 1993]:
- rapid training speed: the PNN is more than five times faster than backpropagation;
- guaranteed convergence to a Bayes classifier if enough training examples are provided, that is, it approaches Bayes optimality;
- fast incremental training, that is, additionally provided training examples can be incorporated without difficulty;
- robustness to noisy examples.
The PNN possesses some of the same useful characteristics as the backpropagation algorithm for training multilayer neural networks:
- learning capacity: it captures the relationships between given training examples and their given classification;
- generalization ability: it identifies the commonalities in the training examples and makes it possible to classify unseen examples into the predefined classes.
5. Example
IRIS Plants Database: Fisher, R.A. (1936) "The use of multiple measurements in taxonomic problems", Annual Eugenics, 7, pp. 179-188; also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950). This is perhaps the best known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day.
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.
Number of Instances: 150 (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the class
Attribute Information:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica
The following training examples are used, each followed by its normalized weight vector in brackets:
6.1, 2.9, 4.7, 1.4, Iris-versicolor [0.8 0.3 0.6 0.2]
5.9, 3.2, 4.8, 1.8, Iris-versicolor [0.7 0.4 0.6 0.3]
6.3, 2.5, 4.9, 1.5, Iris-versicolor [0.8 0.3 0.6 0.3]
6.7, 3.0, 5.0, 1.7, Iris-versicolor [0.8 0.4 0.6 0.2]
6.1, 2.7, 5.1, 1.9, Iris-virginica [0.8 0.3 0.5 0.2]
6.8, 3.0, 5.5, 2.1, Iris-virginica [0.9 0.4 0.7 0.2]
5.7, 2.5, 5.0, 2.0, Iris-virginica [0.7 0.3 0.5 0.2]
5.8, 2.8, 5.1, 2.4, Iris-virginica [0.7 0.4 0.5 0.3]
Classify the following unseen example: (6.3, 2.7, 5.0, 1.8), which after normalization becomes (0.75 0.32 0.6 0.21).
----------------------------------------------------------------------------------------
Taking the smoothing factor sigma = 1, each pattern node computes exp( x·w - 1 ):

class: Iris-versicolor
0.8 * 0.75 + 0.3 * 0.32 + 0.6 * 0.6 + 0.2 * 0.21 - 1 =  0.098,  exp(  0.098 ) = 1.10296
0.7 * 0.75 + 0.4 * 0.32 + 0.6 * 0.6 + 0.3 * 0.21 - 1 =  0.076,  exp(  0.076 ) = 1.0789
0.8 * 0.75 + 0.3 * 0.32 + 0.6 * 0.6 + 0.3 * 0.21 - 1 =  0.119,  exp(  0.119 ) = 1.12637
0.8 * 0.75 + 0.4 * 0.32 + 0.6 * 0.6 + 0.2 * 0.21 - 1 =  0.130,  exp(  0.130 ) = 1.13883
Sum1 = 4.44706

class: Iris-virginica
0.8 * 0.75 + 0.3 * 0.32 + 0.5 * 0.6 + 0.2 * 0.21 - 1 =  0.038,  exp(  0.038 ) = 1.0387
0.9 * 0.75 + 0.4 * 0.32 + 0.7 * 0.6 + 0.2 * 0.21 - 1 =  0.265,  exp(  0.265 ) = 1.3034
0.7 * 0.75 + 0.3 * 0.32 + 0.5 * 0.6 + 0.2 * 0.21 - 1 = -0.037,  exp( -0.037 ) = 0.9636
0.7 * 0.75 + 0.4 * 0.32 + 0.5 * 0.6 + 0.3 * 0.21 - 1 =  0.016,  exp(  0.016 ) = 1.01613
Sum2 = 4.32183

Since Sum1 > Sum2, this example is classified as: Iris-versicolor
----------------------------------------------------------------------------------------
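The same calculation can be reproduced with the PNN function from Section 3. The driver below is an illustrative sketch (it assumes that function is defined in the same file); it uses the eight normalized example vectors from the table above, sigma = 1, and two classes of four examples each.

#include <stdio.h>

/* A small driver that reproduces the worked example above.
   Class 0 = Iris-versicolor, class 1 = Iris-virginica. */
int main(void)
{
    float Examples[8][4] = {
        /* Iris-versicolor (normalized) */
        { 0.8f, 0.3f, 0.6f, 0.2f },
        { 0.7f, 0.4f, 0.6f, 0.3f },
        { 0.8f, 0.3f, 0.6f, 0.3f },
        { 0.8f, 0.4f, 0.6f, 0.2f },
        /* Iris-virginica (normalized) */
        { 0.8f, 0.3f, 0.5f, 0.2f },
        { 0.9f, 0.4f, 0.7f, 0.2f },
        { 0.7f, 0.3f, 0.5f, 0.2f },
        { 0.7f, 0.4f, 0.5f, 0.3f }
    };
    int   Nk[2] = { 4, 4 };
    float test_example[4] = { 0.75f, 0.32f, 0.6f, 0.21f };

    int winner = PNN(2, 8, 4, 1.0f, test_example, Examples, Nk);
    printf("Winning class: %d (0 = Iris-versicolor, 1 = Iris-virginica)\n", winner);
    return 0;
}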
Suggested Readings:
Specht, D.F. (1990) Probabilistic Neural Networks, Neural Networks, vol. 3, pp.109-118.
Specht, D.F. (1990) Probabilistic Neural Networks and the Polynomial Adaline as Complementary Techniques for Classification, IEEE Transactions on Neural Networks, vol. 1, pp. 111-121.
Masters, T. (1993) Practical Neural Network Recipes, John Wiley, New York (Chapter 12), pp.201-222.
Wasserman, P.D. (1993) Advanced Methods in Neural Networks, Van Nostrand Reinhold, New York, (Chapter 3), pp.35-55.
Source: http://homepages.gold.ac.uk/nikolaev/311pnn.htm