Linear Discriminant Analysis (LDA), like PCA, is a widely used dimensionality reduction method, with one key difference: LDA is supervised, whereas PCA is unsupervised. This article covers LDA from three angles: derivation of the principle, a Python implementation, and a case study.
Principle Derivation
The Binary Case
LDA is first and foremost built for classification: we look for a projection direction $w$ such that, after projection, the samples separate as well as possible according to their original class labels. Suppose the samples belong to two classes $C_1$ and $C_2$, with class means $u_1=\frac{1}{N_1}\sum_{x \in C_1}x$ and $u_2=\frac{1}{N_2}\sum_{x \in C_2}x$. We want the between-class distance after projection to be as large as possible and the within-class distance to be as small as possible, so the optimization problem can be written as:
$$\begin{aligned} \max J(w)&=\frac{\|w^T(u_1-u_2)\|_2^2}{D_1+D_2} \\ \text{s.t.}\quad w^Tw&=1 \end{aligned}$$
where $D_1$ and $D_2$ are defined as:
$$\begin{aligned} D_1&=\sum_{x \in C_1}(w^Tx-w^Tu_1)^2 =\sum_{x \in C_1}w^T(x-u_1)(x-u_1)^Tw \\ D_2&=\sum_{x \in C_2}(w^Tx-w^Tu_2)^2 =\sum_{x \in C_2}w^T(x-u_2)(x-u_2)^Tw \end{aligned}$$
The objective can therefore be rewritten as:
$$\begin{aligned} \max J(w)&=\frac{w^T(u_1-u_2)(u_1-u_2)^Tw}{w^T\sum_{x \in C_i}(x-u_i)(x-u_i)^Tw} \\ \text{s.t.}\quad w^Tw&=1 \end{aligned}$$
We define the between-class scatter matrix $S_B=(u_1-u_2)(u_1-u_2)^T$ and the within-class scatter matrix $S_W=\sum_{x \in C_i}(x-u_i)(x-u_i)^T$, so the problem simplifies to:
$$\max J(w)=\frac{w^TS_Bw}{w^TS_Ww}$$
To maximize $J(w)$, we take the partial derivative with respect to $w$ and set it to zero:
$$\frac{\partial J(w)}{\partial w}=\frac{S_Bw(w^TS_Ww)-S_Ww(w^TS_Bw)}{(w^TS_Ww)^2}=0$$
From this we obtain:
$$\begin{aligned} S_Bw(w^TS_Ww)-S_Ww(w^TS_Bw)&=0 \\ S_Bw&=J(w)S_Ww \end{aligned}$$
Rearranging gives:
$$S_W^{-1}S_Bw=J(w)w$$
Since $J(w)$ is just a scalar, maximizing the objective amounts to finding the largest eigenvalue (and the corresponding eigenvector) of the matrix $S_W^{-1}S_B$.
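This eigenvalue characterization is easy to sanity-check numerically. The sketch below (with an arbitrary random positive-definite $S_W$ and a rank-one $S_B$, as in the two-class case) confirms that $J$ evaluated at the top eigenvector of $S_W^{-1}S_B$ equals the top eigenvalue, and that random directions do not beat it:

```python
import numpy as np

rng = np.random.default_rng(2)
# arbitrary positive-definite S_W and rank-one S_B (as in the two-class case)
A = rng.normal(size=(3, 3))
SW = A @ A.T + 3 * np.eye(3)
d = rng.normal(size=3)
SB = np.outer(d, d)

J = lambda w: (w @ SB @ w) / (w @ SW @ w)

eigvals, eigvecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
top = np.argmax(eigvals.real)
w_star = eigvecs[:, top].real

# J at the top eigenvector equals the top eigenvalue ...
print(np.isclose(J(w_star), eigvals.real[top]))  # True
# ... and no random direction does better
best_random = max(J(rng.normal(size=3)) for _ in range(1000))
print(J(w_star) >= best_random)
```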
Furthermore, in the two-class case, since $S_B=(u_1-u_2)(u_1-u_2)^T$, we have:
$$S_Bw=(u_1-u_2)(u_1-u_2)^Tw$$
Because $(u_1-u_2)^Tw$ is a scalar, $S_Bw$ always points in the direction of $u_1-u_2$. If we only care about the direction of $w$ and not its length, we obtain $w=S_W^{-1}(u_1-u_2)$.
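A quick numerical check of this closed form (a minimal sketch; the two synthetic Gaussian classes below are arbitrary): the direction $w=S_W^{-1}(u_1-u_2)$ should be an eigenvector of $S_W^{-1}S_B$ and should separate the projected class means.

```python
import numpy as np

rng = np.random.default_rng(0)
# two arbitrary synthetic classes in 2-D
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[4.0, 2.0], scale=1.0, size=(100, 2))

u1, u2 = X1.mean(axis=0), X2.mean(axis=0)
# within-class scatter S_W and between-class scatter S_B
SW = (X1 - u1).T @ (X1 - u1) + (X2 - u2).T @ (X2 - u2)
SB = np.outer(u1 - u2, u1 - u2)

# closed-form direction w = S_W^{-1}(u_1 - u_2), normalized
w = np.linalg.solve(SW, u1 - u2)
w /= np.linalg.norm(w)

# w is an eigenvector of S_W^{-1} S_B: S_W^{-1} S_B w is parallel to w
v = np.linalg.solve(SW, SB @ w)
print(np.isclose(abs(v @ w), np.linalg.norm(v)))  # True (parallel)
# and the projected class means are separated
print((u1 - u2) @ w > 0)  # True
```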
Extension to Multiple Classes
The analysis above covers the two-class case; how do we handle more classes? The within-class scatter matrix $S_W$ extends naturally, but the between-class scatter matrix $S_B$ was defined from the squared distance between two class means and does not generalize directly. We therefore introduce the global scatter matrix:
$$S_T=\sum_{i=1}^n(x_i-u)(x_i-u)^T$$
where $u$ is the mean of all samples.
If we define the global scatter as the sum of the within-class and between-class scatter, i.e. $S_T=S_B+S_W$, then the between-class scatter matrix can be written as:
$$\begin{aligned} S_B&=S_T-S_W \\ &=\sum_{i=1}^n(x_i-u)(x_i-u)^T-\sum_{x \in C_i}(x-u_i)(x-u_i)^T \\ &=\sum_{j=1}^N\Big(\sum_{x \in C_j}(x-u)(x-u)^T-\sum_{x \in C_j}(x-u_j)(x-u_j)^T\Big) \\ &=\sum_{j=1}^N\sum_{x \in C_j}\big(xx^T-xu^T-ux^T+uu^T-xx^T+xu_j^T+u_jx^T-u_ju_j^T\big) \\ &=\sum_{j=1}^N m_j\big(u_ju_j^T-u_ju^T-uu_j^T+uu^T\big) \\ &=\sum_{j=1}^N m_j(u_j-u)(u_j-u)^T \end{aligned}$$
where $m_j$ is the number of samples in class $C_j$ (the step collapsing the inner sum uses $\sum_{x \in C_j}x=m_ju_j$). From this we can see that the between-class scatter is a weighted sum of distances from each class center to the global center. We thus arrive at the LDA procedure for multi-class data:
- Compute the mean vector $u_j$ of each class in the dataset, as well as the overall mean vector $u$.
- Compute the within-class scatter matrix $S_W$ and the global scatter matrix $S_T$, and obtain the between-class scatter matrix $S_B=S_T-S_W$.
- Perform an eigendecomposition of the matrix $S_W^{-1}S_B$ and sort the eigenvalues in descending order.
- Take the eigenvectors $w_1,w_2,...,w_d$ corresponding to the $d$ largest eigenvalues and project the $n$-dimensional samples down to $d$ dimensions.
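The scatter identity used above, $S_T=S_B+S_W$, can be verified numerically (a small sketch with arbitrary random data and labels):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = np.repeat([0, 1, 2], 20)   # three arbitrary classes
u = X.mean(axis=0)             # global mean

ST = (X - u).T @ (X - u)       # global scatter
SW = np.zeros((3, 3))
SB = np.zeros((3, 3))
for c in np.unique(y):
    Xc = X[y == c]
    uc = Xc.mean(axis=0)
    SW += (Xc - uc).T @ (Xc - uc)             # within-class part
    SB += len(Xc) * np.outer(uc - u, uc - u)  # m_j (u_j - u)(u_j - u)^T

print(np.allclose(ST, SB + SW))  # True
```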
Python Implementation
Binary Classification
```python
import numpy as np


def calculate_covariance_matrix(X, Y=None):
    """Sample covariance of X (or cross-covariance of X and Y)."""
    if Y is None:
        Y = X
    n_samples = np.shape(X)[0]
    covariance_matrix = (1 / (n_samples - 1)) * (X - X.mean(axis=0)).T.dot(Y - Y.mean(axis=0))
    return np.array(covariance_matrix, dtype=float)


class LDA():
    def __init__(self):
        self.w = None
        self.threshold = None

    def fit(self, X, y):
        # split the samples by class
        X1 = X[y == 0]
        X2 = X[y == 1]
        # within-class scatter (proportional to the sum of the class covariances)
        cov1 = calculate_covariance_matrix(X1)
        cov2 = calculate_covariance_matrix(X2)
        cov_tot = cov1 + cov2
        # closed-form direction: w = S_W^{-1}(u_1 - u_2)
        mean1 = X1.mean(0)
        mean2 = X2.mean(0)
        mean_diff = np.atleast_1d(mean1 - mean2)
        self.w = np.linalg.pinv(cov_tot).dot(mean_diff)
        # decision threshold: projection of the midpoint between the two means
        self.threshold = 0.5 * (mean1 + mean2).dot(self.w)

    def transform(self, X, y):
        self.fit(X, y)
        return X.dot(self.w)

    def predict(self, X):
        # w points from u_2 toward u_1 (same direction as u_1 - u_2), so class-0
        # samples project to larger values than class-1 samples; classify by
        # comparing each projection against the projected midpoint of the means
        y_pred = []
        for sample in X:
            h = sample.dot(self.w)
            y_pred.append(0 if h > self.threshold else 1)
        return y_pred
```
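A quick end-to-end sketch of the two-class procedure (self-contained: the fit logic is restated inline, the two-blob dataset is arbitrary, and the decision threshold is taken at the projected midpoint of the two class means):

```python
import numpy as np
from sklearn.datasets import make_blobs

# two arbitrary, well-separated blobs
X, y = make_blobs(n_samples=200, centers=[[0, 0], [5, 3]],
                  cluster_std=1.0, random_state=0)

X1, X2 = X[y == 0], X[y == 1]
u1, u2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - u1).T @ (X1 - u1) + (X2 - u2).T @ (X2 - u2)

# closed-form direction and midpoint threshold
w = np.linalg.pinv(SW) @ (u1 - u2)
threshold = 0.5 * (u1 + u2) @ w

# class 0 projects above the threshold, class 1 below
y_pred = (X @ w <= threshold).astype(int)
print("accuracy:", (y_pred == y).mean())
```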
Multi-class
```python
import numpy as np
import matplotlib.pyplot as plt


class MultiClassLDA():
    """Multi-class LDA; uses calculate_covariance_matrix defined above."""

    def __init__(self, solver="svd"):
        self.solver = solver

    def _calculate_scatter_matrices(self, X, y):
        n_features = np.shape(X)[1]
        labels = np.unique(y)
        # Within-class scatter matrix:
        # SW = sum{ (X_for_class - mean_of_X_for_class)^2 }
        #    <=> (n_samples_for_class - 1) * covar(X_for_class)
        SW = np.zeros((n_features, n_features))
        for label in labels:
            _X = X[y == label]
            SW += (len(_X) - 1) * calculate_covariance_matrix(_X)
        # Between-class scatter matrix:
        # SB = sum{ n_samples_for_class * (mean_for_class - total_mean)^2 }
        total_mean = np.mean(X, axis=0)
        SB = np.zeros((n_features, n_features))
        for label in labels:
            _X = X[y == label]
            _mean = np.mean(_X, axis=0)
            SB += len(_X) * ((_mean - total_mean).reshape(-1, 1)).dot((_mean - total_mean).reshape(-1, 1).T)
        # equivalently, SB can be computed via the global scatter:
        # ST = (len(X) - 1) * calculate_covariance_matrix(X)
        # SB = ST - SW
        return SW, SB

    def transform(self, X, y, n_components):
        SW, SB = self._calculate_scatter_matrices(X, y)
        # Determine SW^-1 * SB by calculating the inverse of SW
        A = np.linalg.inv(SW).dot(SB)
        # A is generally not symmetric, so use eig rather than eigh; its
        # eigenvalues are real because A is similar to a symmetric matrix
        eigenvalues, eigenvectors = np.linalg.eig(A)
        eigenvalues, eigenvectors = eigenvalues.real, eigenvectors.real
        # Sort eigenvalues in descending order and keep the eigenvectors
        # of the n_components largest ones
        idx = eigenvalues.argsort()[::-1]
        eigenvalues = eigenvalues[idx][:n_components]
        eigenvectors = eigenvectors[:, idx][:, :n_components]
        # Project the data onto the eigenvectors
        return X.dot(eigenvectors)

    def plot_in_2d(self, X, y, title=None):
        X_transformed = self.transform(X, y, n_components=2)
        plt.scatter(X_transformed[:, 0], X_transformed[:, 1], c=y)
        if title:
            plt.title(title)
        plt.show()
```
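As a concrete check of the multi-class machinery on real data (a sketch using scikit-learn's bundled iris dataset, $K=3$ classes and 4 features): since $S_B$ is a sum of $K$ rank-one terms around the global mean, its rank is at most $K-1$, so $S_W^{-1}S_B$ has at most $K-1$ non-zero eigenvalues. This is exactly why LDA can produce at most $K-1$ output dimensions.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]
u = X.mean(axis=0)

# scatter matrices, accumulated class by class
SW = np.zeros((n_features, n_features))
SB = np.zeros((n_features, n_features))
for c in np.unique(y):
    Xc = X[y == c]
    uc = Xc.mean(axis=0)
    SW += (Xc - uc).T @ (Xc - uc)
    SB += len(Xc) * np.outer(uc - u, uc - u)

eigvals = np.linalg.eig(np.linalg.inv(SW) @ SB)[0].real
eigvals = np.sort(eigvals)[::-1]
# only K - 1 = 2 of the 4 eigenvalues are (numerically) non-zero
print(np.round(eigvals, 4))
```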
Case Study
We can use the scikit-learn API to see LDA in action: randomly generate 3 classes of samples with 4 features each, apply LDA, and visualize the two projected dimensions:
```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
import matplotlib.pyplot as plt


# construct the dataset
def make_data(centers=3, cluster_std=[1.0, 3.0, 2.5], n_samples=150, n_features=2):
    X, y = make_blobs(n_samples=n_samples, n_features=n_features, centers=centers, cluster_std=cluster_std)
    return X, y


# 3 clusters in 4 dimensions (centers must stay an integer here so that
# make_blobs draws 4-dimensional cluster centers on its own)
X, y = make_data(centers=3, cluster_std=[1.0, 3.0, 2.5], n_samples=150, n_features=4)
print(X.shape)
sk_lda = LinearDiscriminantAnalysis(n_components=2)
sk_lda.fit(X, y)
transformed = sk_lda.transform(X)
plt.subplots(figsize=(10, 8))
plt.scatter(transformed[:, 0], transformed[:, 1], c=y, cmap=plt.cm.Set1)
plt.title("sklearn's official implementation")
plt.show()
```
![](https://img-blog.csdnimg.cn/dd9175147d764ed6afe14b20ad527454.png#pic_center)
Advantages
- LDA can exploit class-label prior knowledge during dimensionality reduction, whereas unsupervised methods such as PCA cannot.
- When the class-discriminative information lies in the sample means rather than the variances, LDA tends to outperform PCA.
Disadvantages
- Like PCA, LDA is not well suited to reducing the dimensionality of non-Gaussian samples.
- LDA can reduce the dimensionality to at most K−1, where K is the number of classes; it cannot be used when more than K−1 output dimensions are needed (although some improved LDA variants work around this limit).
- When the class-discriminative information lies in the variances rather than the means, LDA performs poorly.
- LDA may overfit the data.