The Perceptron Convergence Theorem

Latest recommended article published 2022-12-07 21:51:47

leossu11

Article link: https://blog.csdn.net/sinat_34157365/article/details/50822429

State the fixed-increment convergence theorem

Let the subsets of training vectors $X_1$ and $X_2$ be linearly separable. The $(p+1)$-by-1 input vector is

${\bf x}(n) = [-1, x_1(n), x_2(n), \ldots, x_p(n)]^T$

where the fixed first component $-1$ is the input to which the threshold is applied.
Correspondingly, we define the (p + 1)-by-1 weight vector:

${\bf w}(n) = [\theta(n), w_1(n), w_2(n), \ldots, w_p(n)]^T$
The linear combiner output is written in the compact form

$v(n) = {\bf w}^T(n){\bf x}(n) = \sum_{i=1}^{p} w_i(n)x_i(n) - \theta(n)$
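As a small numerical sketch (in Python with NumPy, which the post itself does not use), the augmented input and the compact form of the linear combiner output can be written as follows. The function names `augment` and `local_field` and the numbers are hypothetical; the point is only that prepending the fixed input $-1$ makes ${\bf w}^T(n){\bf x}(n)$ equal to $\sum_i w_i(n)x_i(n) - \theta(n)$.

```python
import numpy as np


def augment(x):
    """Return [-1, x_1, ..., x_p]: prepend the fixed input -1 that multiplies the threshold."""
    return np.concatenate(([-1.0], np.asarray(x, dtype=float)))


def local_field(w, x):
    """Compute v = w^T x with w = [theta, w_1, ..., w_p] and x given without the -1 entry."""
    return float(np.dot(w, augment(x)))


# Hypothetical values: theta = 0.5, w_1 = 2, w_2 = -1, input (x_1, x_2) = (1, 3)
w = np.array([0.5, 2.0, -1.0])
print(augment([1.0, 3.0]).tolist())  # [-1.0, 1.0, 3.0]
print(local_field(w, [1.0, 3.0]))    # -1.5, i.e. (2*1 - 1*3) - 0.5
```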
For fixed $n$, the equation ${\bf w}^T{\bf x} = 0$ defines a hyperplane, the decision surface that divides the inputs into two classes. Let $X_1$ be the subset of training vectors belonging to class $\xi_1$, and let $X_2$ be the subset of training vectors belonging to class $\xi_2$.
Since $X_1$ and $X_2$ are linearly separable, there exists a weight vector ${\bf w}$ such that we may state:
1. ${\bf w}^T{\bf x} > 0$ for every input vector ${\bf x}$ belonging to class $\xi_1$, and
2. ${\bf w}^T{\bf x} \leq 0$ for every input vector ${\bf x}$ belonging to class $\xi_2$.

The strict inequality for class $\xi_1$ is what makes the number $\alpha$ used in the proof positive.
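A direct check of conditions 1 and 2 for a candidate weight vector can be sketched as below. It assumes the training subsets are stored as NumPy arrays whose rows are augmented vectors, and it uses ${\bf w}^T{\bf x} > 0$ for class $\xi_1$ and ${\bf w}^T{\bf x} \leq 0$ for class $\xi_2$; the helper `is_solution` and the data points are hypothetical.

```python
import numpy as np


def is_solution(w, X1, X2):
    """True if w^T x > 0 for every row x of X1 (class xi_1)
    and w^T x <= 0 for every row x of X2 (class xi_2).
    Rows are augmented input vectors [-1, x_1, ..., x_p]."""
    w = np.asarray(w, dtype=float)
    ok_class1 = np.all(np.asarray(X1, dtype=float) @ w > 0)
    ok_class2 = np.all(np.asarray(X2, dtype=float) @ w <= 0)
    return bool(ok_class1 and ok_class2)


# Hypothetical two-dimensional training subsets, already augmented with -1
X1 = np.array([[-1.0, 2.0, 2.0], [-1.0, 3.0, 1.0]])
X2 = np.array([[-1.0, -1.0, -1.0], [-1.0, 0.0, -2.0]])

print(is_solution([0.0, 1.0, 1.0], X1, X2))   # True
print(is_solution([0.0, 1.0, -1.0], X1, X2))  # False: w^T x = 0 for the first vector of X1
```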

The algorithm for adapting the weight vector of the elementary perceptron may now be formulated as follows:
If the $n$th member of the training set, ${\bf x}(n)$, is correctly classified by the weight vector ${\bf w}(n)$ computed at the $n$th iteration, no correction is made:
1. ${\bf w}(n+1) = {\bf w}(n)$ if ${\bf w}^T(n){\bf x}(n) > 0$ and ${\bf x}(n)$ belongs to class $\xi_1$;
2. ${\bf w}(n+1) = {\bf w}(n)$ if ${\bf w}^T(n){\bf x}(n) \leq 0$ and ${\bf x}(n)$ belongs to class $\xi_2$.
Otherwise, the weight vector is updated according to the rule:
1. ${\bf w}(n+1) = {\bf w}(n) - \eta(n){\bf x}(n)$ if ${\bf w}^T(n){\bf x}(n) > 0$ and ${\bf x}(n)$ belongs to class $\xi_2$;
2. ${\bf w}(n+1) = {\bf w}(n) + \eta(n){\bf x}(n)$ if ${\bf w}^T(n){\bf x}(n) \leq 0$ and ${\bf x}(n)$ belongs to class $\xi_1$.
where the learning-rate parameter $\eta(n)$ controls the adjustment applied to the weight vector at iteration $n$.
If $\eta(n) = \eta > 0$, where $\eta$ is a constant independent of the iteration number $n$, we have a fixed-increment adaptation rule for the perceptron.
In the sequel, we first prove the convergence of a fixed-increment adaptation rule for which $\eta = 1$. The value of $\eta$ is unimportant, so long as it is positive: starting from ${\bf w}(0) = {\bf 0}$, a value $\eta \neq 1$ merely scales every weight vector ${\bf w}(n)$ by $\eta$, which leaves the sign of ${\bf w}^T(n){\bf x}(n)$, and hence every classification and correction, unchanged.
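The fixed-increment rule can be sketched in Python with NumPy as follows. This is an illustration under stated assumptions, not the author's own code: the training vectors are presented cyclically in a fixed order, training stops after the first full pass with no corrections, `train_perceptron` and the data are hypothetical, and classification uses ${\bf w}^T{\bf x} > 0$ for $\xi_1$ and ${\bf w}^T{\bf x} \leq 0$ for $\xi_2$.

```python
import numpy as np


def train_perceptron(X1, X2, eta=1.0, max_epochs=1000):
    """Fixed-increment perceptron rule starting from w(0) = 0.

    X1, X2 hold augmented input vectors [-1, x_1, ..., x_p] for classes xi_1 and xi_2.
    Returns the final weight vector and the total number of corrections.
    """
    samples = [(np.asarray(x, dtype=float), 1) for x in X1] + \
              [(np.asarray(x, dtype=float), 2) for x in X2]
    w = np.zeros(len(samples[0][0]))
    corrections = 0
    for _ in range(max_epochs):
        errors = 0
        for x, label in samples:
            v = w @ x                      # v(n) = w^T(n) x(n)
            if label == 1 and v <= 0:      # x in xi_1 misclassified: w(n+1) = w(n) + eta x(n)
                w = w + eta * x
                errors += 1
            elif label == 2 and v > 0:     # x in xi_2 misclassified: w(n+1) = w(n) - eta x(n)
                w = w - eta * x
                errors += 1
            # otherwise x is correctly classified and w(n+1) = w(n)
        corrections += errors
        if errors == 0:                    # a full pass without corrections: w separates the sets
            return w, corrections
    raise RuntimeError("no separating weight vector found within max_epochs")


X1 = [[-1.0, 2.0, 2.0], [-1.0, 3.0, 1.0]]
X2 = [[-1.0, -1.0, -1.0], [-1.0, 0.0, -2.0]]
w, n_corrections = train_perceptron(X1, X2)
print(w.tolist(), n_corrections)  # [-1.0, 2.0, 2.0] 1
```

Calling `train_perceptron(X1, X2, eta=0.5)` on the same data returns the weight vector scaled by 0.5 after the same single correction, which illustrates why the value of a positive $\eta$ does not matter when ${\bf w}(0) = {\bf 0}$.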

Proof:

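We start from the initial condition ${\bf w}(0) = {\bf 0}$. Correctly classified inputs leave ${\bf w}$ unchanged, so we index only the inputs that trigger a correction. Suppose, then, that ${\bf w}^T(n){\bf x}(n) \leq 0$ for $n = 1, 2, \ldots$, where each input vector ${\bf x}(n)$ belongs to the subset $X_1$ (the case of inputs from $X_2$ is symmetric). With $\eta = 1$, the update rule gives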

${\bf w}(n+1) = {\bf w}(n) + {\bf x}(n) \qquad (1)$

for ${\bf x}(n)$ belonging to class $\xi_1$.
Given the initial condition ${\bf w}(0) = {\bf 0}$ (so that the weight vector is still zero when ${\bf x}(1)$ is presented), we may iteratively solve this equation for ${\bf w}(n+1)$, obtaining the result

${\bf w}(n+1) = {\bf x}(1) + {\bf x}(2) + \cdots + {\bf x}(n) \qquad (2)$
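Eq. (2) can be checked numerically with a sketch that applies Eq. (1) only to vectors from $X_1$, as in the proof, and records every vector that triggers a correction. The set `X1` and the helper `run_on_X1` are hypothetical (the rows are augmented vectors for which ${\bf w}_0 = [-1, 1, 1]^T$ is a solution), and presentation is assumed to be cyclic.

```python
import numpy as np


def run_on_X1(X1, max_epochs=1000):
    """Apply Eq. (1), w <- w + x, to every x in X1 with w^T x <= 0, starting from w = 0.

    Returns the final weight vector and the list of vectors x(1), ..., x(n)
    that triggered a correction, in order.
    """
    X1 = np.asarray(X1, dtype=float)
    w = np.zeros(X1.shape[1])
    used = []
    for _ in range(max_epochs):
        changed = False
        for x in X1:
            if w @ x <= 0:        # x is misclassified: it should satisfy w^T x > 0
                w = w + x         # Eq. (1) with eta = 1
                used.append(x)
                changed = True
        if not changed:           # every vector of X1 now satisfies w^T x > 0
            return w, used
    raise RuntimeError("no correction-free pass within max_epochs")


X1 = [[-1.0, 3.0, -1.0], [-1.0, -1.0, 3.0], [-1.0, 1.0, 1.0]]
w, used = run_on_X1(X1)
# Eq. (2): the final weight vector is the sum of the corrected vectors
print(len(used), np.array_equal(w, np.sum(used, axis=0)))  # 2 True
```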
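Since the classes $\xi_1$ and $\xi_2$ are linearly separable, there exists a solution vector ${\bf w}_0$ with ${\bf w}_0^T{\bf x} > 0$ for every ${\bf x} \in X_1$. For a fixed solution ${\bf w}_0$, we may then define a positive number $\alpha$ by the relation

$\alpha = \min_{{\bf x}(n)\in X_1} {\bf w}_0^T{\bf x}(n) \qquad (3)$

which is positive because it is the minimum of finitely many positive numbers.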
Hence, multiplying both sides of Eq. (2) by the row vector ${\bf w}_0^T$, we get

${\bf w}_0^T{\bf w}(n+1) = {\bf w}_0^T{\bf x}(1) + {\bf w}_0^T{\bf x}(2) + \cdots + {\bf w}_0^T{\bf x}(n) \geq n\alpha$

since, by Eq. (3), each of the $n$ terms is at least $\alpha$.
Next, the Cauchy-Schwarz inequality states that

$||{\bf w}_0||^2\,||{\bf w}(n+1)||^2 \geq \left[{\bf w}_0^T{\bf w}(n+1)\right]^2 \geq n^2\alpha^2 \qquad (4)$

where the second inequality holds because ${\bf w}_0^T{\bf w}(n+1) \geq n\alpha > 0$. Equivalently,

$||{\bf w}(n+1)||^2 \geq \frac{n^2\alpha^2}{||{\bf w}_0||^2} \qquad (5)$
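The chain of bounds from Eq. (3) to Eq. (5) can be checked on a small example. In this sketch the data `X1` and the solution vector `w0` are hypothetical (every row of `X1` satisfies ${\bf w}_0^T{\bf x} > 0$); after each correction the code tests ${\bf w}_0^T{\bf w}(n+1) \geq n\alpha$ and Eq. (5).

```python
import numpy as np

X1 = np.array([[-1.0, 3.0, -1.0], [-1.0, -1.0, 3.0], [-1.0, 1.0, 1.0]])
w0 = np.array([-1.0, 1.0, 1.0])   # hypothetical solution vector for X1
alpha = float(np.min(X1 @ w0))    # Eq. (3)
assert alpha > 0, "w0 must satisfy w0^T x > 0 for every x in X1"

w = np.zeros(X1.shape[1])
n = 0                             # number of corrections so far
bounds_hold = True
changed = True
while changed:                    # terminates here because X1 has a solution vector
    changed = False
    for x in X1:
        if w @ x <= 0:            # misclassified: apply Eq. (1)
            w = w + x
            n += 1
            changed = True
            # w0^T w(n+1) >= n * alpha, obtained by multiplying Eq. (2) by w0^T
            bounds_hold = bounds_hold and bool(w0 @ w >= n * alpha)
            # Eq. (5): ||w(n+1)||^2 >= n^2 alpha^2 / ||w0||^2
            bounds_hold = bounds_hold and bool(w @ w >= n**2 * alpha**2 / (w0 @ w0))

print(alpha, n, bounds_hold)      # 3.0 2 True
```

On this data Eq. (5) holds with equality after the second correction ($||{\bf w}(3)||^2 = 12 = 2^2 \cdot 3^2 / 3$).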
Next, we follow another development route. Rewriting Eq. (1) as ${\bf w}(k+1) = {\bf w}(k) + {\bf x}(k)$ for $k = 1, \ldots, n$ and ${\bf x}(k) \in X_1$, and taking the squared Euclidean norm of both sides, we get

$||{\bf w}(k+1)||^2 = ||{\bf w}(k)||^2 + ||{\bf x}(k)||^2 + 2{\bf w}^T(k){\bf x}(k) \qquad (6)$

But, under the assumption that the perceptron incorrectly classifies an input vector ${\bf x}(k)$ belonging to the subset $X_1$, we have ${\bf w}^T(k){\bf x}(k) \leq 0$, so

$||{\bf w}(k+1)||^2 \leq ||{\bf w}(k)||^2 + ||{\bf x}(k)||^2 \qquad (7)$

or, equivalently,

$||{\bf w}(k+1)||^2 - ||{\bf w}(k)||^2 \leq ||{\bf x}(k)||^2, \quad k = 1, \ldots, n \qquad (8)$
Adding these inequalities for $k = 1, \ldots, n$, the left-hand sides telescope to $||{\bf w}(n+1)||^2 - ||{\bf w}(1)||^2$. With the initial condition ${\bf w}(0) = {\bf 0}$, under which the weight vector is still zero when ${\bf x}(1)$ is presented, we get

$||{\bf w}(n+1)||^2 \leq \sum_{k=1}^{n}||{\bf x}(k)||^2 \leq n\beta \qquad (9)$

where $\beta$ is a positive number defined by

$\beta = \max_{{\bf x}(k)\in X_1}||{\bf x}(k)||^2 \qquad (10)$
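The upper-bound route of Eqs. (6) to (10) can be checked the same way. This sketch reuses a hypothetical `X1` of augmented vectors and, after every correction, tests the one-step bound of Eq. (7) and the cumulative bound of Eq. (9).

```python
import numpy as np

X1 = np.array([[-1.0, 3.0, -1.0], [-1.0, -1.0, 3.0], [-1.0, 1.0, 1.0]])
beta = float(np.max(np.sum(X1**2, axis=1)))   # Eq. (10): largest squared norm in X1

w = np.zeros(X1.shape[1])
n = 0
bounds_hold = True
changed = True
while changed:
    changed = False
    for x in X1:
        if w @ x <= 0:                 # misclassified, so 2 w^T(k) x(k) <= 0 in Eq. (6)
            old_sq = w @ w             # ||w(k)||^2
            w = w + x                  # Eq. (1)
            n += 1
            changed = True
            bounds_hold = bounds_hold and bool(w @ w <= old_sq + x @ x)   # Eq. (7)
            bounds_hold = bounds_hold and bool(w @ w <= n * beta)         # Eq. (9)

print(beta, n, bounds_hold)            # 11.0 2 True
```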
We can state that $n$ cannot be larger than some value $n_{max}$ for which Eqs. (5) and (9) are both satisfied with the equality sign. The lower bound (5) grows as $n^2$ while the upper bound (9) grows only as $n$, so both can hold only while $\frac{n^2\alpha^2}{||{\bf w}_0||^2} \leq n\beta$. That is,

$n_{max} = \frac{\beta||{\bf w}_0||^2}{\alpha^2} \qquad (11)$

We have thus proved that for $\eta(n) = 1$ for all $n$ and ${\bf w}(0) = {\bf 0}$, and given that a solution vector ${\bf w}_0$ exists, the rule for adapting the synaptic weights connecting the associator units to the response unit of the perceptron must terminate after at most $n_{max}$ iterations.
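Eq. (11) can be compared with the number of corrections actually made on a small example. The helpers `n_max_bound` and `count_corrections`, the set `X1`, and the solution vector passed in are hypothetical; the sketch only illustrates that the observed count stays below the bound.

```python
import numpy as np


def n_max_bound(X1, w0):
    """Eq. (11): n_max = beta * ||w0||^2 / alpha^2 for a solution vector w0 of X1."""
    X1 = np.asarray(X1, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    alpha = np.min(X1 @ w0)                    # Eq. (3)
    if alpha <= 0:
        raise ValueError("w0 is not a solution: need w0^T x > 0 for every x in X1")
    beta = np.max(np.sum(X1**2, axis=1))       # Eq. (10)
    return float(beta * (w0 @ w0) / alpha**2)


def count_corrections(X1):
    """Run Eq. (1) cyclically over X1 from w = 0 and return the number of corrections."""
    X1 = np.asarray(X1, dtype=float)
    w = np.zeros(X1.shape[1])
    n = 0
    changed = True
    while changed:            # finite by the theorem when X1 has a solution vector
        changed = False
        for x in X1:
            if w @ x <= 0:
                w = w + x
                n += 1
                changed = True
    return n


X1 = [[-1.0, 3.0, -1.0], [-1.0, -1.0, 3.0], [-1.0, 1.0, 1.0]]
bound = n_max_bound(X1, [-1.0, 1.0, 1.0])
n = count_corrections(X1)
print(round(bound, 3), n, n <= bound)   # 3.667 2 True
```

Any solution vector gives a valid bound. Rescaling ${\bf w}_0$ leaves the bound unchanged, since $||{\bf w}_0||^2$ and $\alpha^2$ both scale by the square of the factor, but solution vectors pointing in different directions can give different bounds.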