Machine Learning (3): Least-squares Classification
Chenjing Ding
2018/02/28
notation | meaning
---|---
$M$ | the number of mixture components
$\mathbf{x}_n$ | the $n$-th input vector
$N$ | the number of training input vectors
$K$ | the number of classes
$\mathbf{w}$ | a column vector of the weight matrix
$W$ | the weight matrix
$X$ | the input matrix
To put it clearly: all vectors in this passage are column vectors (so their transposes are row vectors), and capital letters denote matrices while lowercase letters denote vectors.

1. General Classification Problem

1.1 One-sample input case

Let's consider $K$ linear discriminant models:
$$y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}, \quad k = 1, \dots, K \tag{1.1.1}$$
Both $\mathbf{w}_k$ and $\mathbf{x}$ are vectors; the bias $w_{k0}$ can be absorbed into $\mathbf{w}_k$ by prepending a constant 1 to $\mathbf{x}$. If $W$ is the matrix whose $k$-th column is the (augmented) weight vector $\mathbf{w}_k$,

$$W = [\mathbf{w}_1 \ \ \mathbf{w}_2 \ \ \dots \ \ \mathbf{w}_K] \tag{1.1.2}$$
then we obtain $Y(\mathbf{x})$, which is a column vector:

$$Y(\mathbf{x}) = W^T \mathbf{x} = [y_1(\mathbf{x}) \ \ y_2(\mathbf{x}) \ \ \dots \ \ y_K(\mathbf{x})]^T \tag{1.1.3}$$
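In code, equation (1.1.3) is a single matrix-vector product. Below is a minimal NumPy sketch; the dimensions and the random weights are invented purely for illustration:

```python
import numpy as np

K, D = 3, 4                           # number of classes, input dimension
rng = np.random.default_rng(0)

W = rng.normal(size=(D + 1, K))       # column k is the augmented weight vector w_k
x = rng.normal(size=D)                # a single input vector
x_tilde = np.concatenate(([1.0], x))  # prepend 1 so that w_k0 is absorbed into w_k

Y = W.T @ x_tilde                     # Y(x) = W^T x, a column vector of length K
print(Y)                              # one discriminant value y_k(x) per class
```

The predicted class is simply the index $k$ with the largest $y_k(\mathbf{x})$.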
1.2 Input as a matrix

For the entire data set, $X$ is a matrix that stacks all input vectors as rows:
$$\hat{Y}(X) = XW, \qquad X = [\mathbf{x}_1 \ \ \mathbf{x}_2 \ \ \dots \ \ \mathbf{x}_N]^T$$
$$T = [\mathbf{t}_1 \ \ \mathbf{t}_2 \ \ \dots \ \ \mathbf{t}_N]^T, \qquad \hat{Y}(X) = [Y(\mathbf{x}_1) \ \ Y(\mathbf{x}_2) \ \ \dots \ \ Y(\mathbf{x}_N)]^T \tag{1.2.1}$$
Here $\mathbf{t}_1, \mathbf{t}_2, \dots$ are column vectors (one-hot coded class targets), while $T$ and $\hat{Y}(X)$ are $N \times K$ matrices; $T$ is the target matrix.
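The following sketch shows how $X$, $T$, and $\hat{Y}(X)$ fit together dimension-wise; the data and weights are random placeholders (the weights are learned in section 2):

```python
import numpy as np

N, D, K = 6, 4, 3
rng = np.random.default_rng(0)

X = rng.normal(size=(N, D))           # row n is x_n^T
X = np.hstack([np.ones((N, 1)), X])   # augment each row with a leading 1

labels = rng.integers(0, K, size=N)   # class index of each sample
T = np.eye(K)[labels]                 # one-hot target matrix, shape (N, K)

W = rng.normal(size=(D + 1, K))       # placeholder weights
Y_hat = X @ W                         # row n of Y_hat is Y(x_n)^T, shape (N, K)
```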
2. Closed-form solution
We now look for a closed-form solution for $W$ by directly minimizing the sum-of-squares error:
$$E(W) = \sum_{n=1}^{N} \sum_{k=1}^{K} \big(y_k(\mathbf{x}_n) - t_{nk}\big)^2 = \sum_{n=1}^{N} \sum_{k=1}^{K} \big(\mathbf{w}_k^T \mathbf{x}_n - t_{nk}\big)^2$$
Let's formulate the sum-of-squares error in matrix notation, using two standard identities (the factor $\frac{1}{2}$ below is conventional and does not change the minimizer):
$$\sum_{ij} a_{ij}^2 = \mathrm{Tr}(A^T A), \qquad \frac{\partial\, \mathrm{Tr}(A)}{\partial A} = I \tag{2.1}$$
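The first identity in (2.1) is easy to verify numerically; here is a quick NumPy check with an arbitrary matrix:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)  # any matrix will do
lhs = (A ** 2).sum()              # sum of squared entries
rhs = np.trace(A.T @ A)           # Tr(A^T A)
assert np.isclose(lhs, rhs)       # both equal 55.0 for this A
```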
$$E(W) = \frac{1}{2} \mathrm{Tr}\big((XW - T)^T (XW - T)\big) \tag{2.2}$$
$$\frac{\partial E(W)}{\partial W} = \frac{1}{2}\, \frac{\partial\, \mathrm{Tr}\big((XW-T)^T(XW-T)\big)}{\partial (XW-T)} \cdot \frac{\partial (XW-T)}{\partial W} \tag{2.3}$$
$$\frac{\partial E(W)}{\partial W} = X^T (XW - T) \qquad \text{using } (2.1) \tag{2.4}$$
Setting the derivative to zero requires the inverse of $X^T X$ to exist, which holds when $X$ has full column rank (it does not hold for every matrix $X$):
$$\frac{\partial E(W)}{\partial W} = 0 \;\Rightarrow\; W = (X^T X)^{-1} X^T T$$
Thus the closed-form solution for $Y(\mathbf{x}_n)$ is:
$$Y(\mathbf{x}_n) = W^T \mathbf{x}_n = \big((X^T X)^{-1} X^T T\big)^T \mathbf{x}_n$$
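Putting everything together, here is a minimal NumPy sketch of the whole method. The three Gaussian blobs are hypothetical toy data, and `np.linalg.pinv` is used instead of explicitly forming $(X^T X)^{-1} X^T$, which is numerically safer but otherwise equivalent when $X^T X$ is invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 90, 2, 3

# Toy data: three Gaussian blobs, one per class.
means = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
labels = np.repeat(np.arange(K), N // K)
X = means[labels] + rng.normal(scale=0.7, size=(N, D))
X = np.hstack([np.ones((N, 1)), X])  # augment with a bias column
T = np.eye(K)[labels]                # one-hot target matrix

# Closed-form solution W = (X^T X)^{-1} X^T T via the pseudo-inverse.
W = np.linalg.pinv(X) @ T

# Classify each x_n by the largest discriminant y_k(x_n).
pred = np.argmax(X @ W, axis=1)
print("training accuracy:", (pred == labels).mean())
```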
3. Problems
- Least-squares is very sensitive to outliers (see the sketch after this list)!
- Least-squares corresponds to Maximum Likelihood under the assumption of a Gaussian conditional distribution. However, our binary target vectors clearly have a non-Gaussian distribution (a 0-1 distribution when $K = 2$)!
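A small sketch of the outlier problem on hypothetical 1-D data (the cluster positions and the outlier location are invented for illustration): adding a few extreme but correctly labeled points noticeably drags the decision boundary, because least squares also penalizes predictions that are "too correct".

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_boundary(x, labels):
    """Fit least squares with one-hot targets; return x where y_0(x) = y_1(x)."""
    X = np.column_stack([np.ones_like(x), x])
    T = np.eye(2)[labels]
    W = np.linalg.pinv(X) @ T  # closed-form solution from section 2
    d = W[:, 0] - W[:, 1]      # boundary: d[0] + d[1] * x = 0
    return -d[0] / d[1]

# Two 1-D clusters: class 0 near 0, class 1 near 2.
x = np.concatenate([rng.normal(0.0, 0.3, 30), rng.normal(2.0, 0.3, 30)])
labels = np.repeat([0, 1], 30)
print("boundary without outliers:", fit_boundary(x, labels))  # roughly 1.0

# Add a few extreme but correctly labeled class-1 points far to the right.
x_out = np.concatenate([x, np.full(5, 15.0)])
labels_out = np.concatenate([labels, np.ones(5, dtype=int)])
print("boundary with outliers:   ", fit_boundary(x_out, labels_out))
# The boundary is dragged toward the class-1 cluster even though the extra
# points lie on the correct side of it.
```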
These problems will be discussed later.