Learning Strategies
Non-probabilistic approach
Search the hypothesis space $\mathcal H$ for the hypothesis $h$ that minimizes the generalization error.
Generalization error:
$$\int_{\mathcal X \times \mathcal Y} L(y, h(\boldsymbol x))\,p(\boldsymbol x, y)\,d\boldsymbol x\,dy .$$
That is, the expected value of the loss, under the assumption that a pattern and its label follow some joint probability distribution. This assumption matters because the same pattern can sometimes carry different labels, which is a matter of randomness.
The error measured on a test set converges in probability to the generalization error as the test-set size grows.
Probabilistic approach
Discriminative methods: learn $p(y \mid \boldsymbol x)$ directly.
Generative methods: learn the conditional distribution via
$$p(y \mid \boldsymbol x) = \frac{p(y, \boldsymbol x)}{p(\boldsymbol x)} = \frac{p(\boldsymbol x \mid y)p(y)}{p(\boldsymbol x)} .$$
On a given sample set:
  Maximum a posteriori (MAP): maximize $p(y \mid \boldsymbol x) \propto p(\boldsymbol x \mid y)p(y)$; this requires knowing the distribution of $y$.
  Maximum likelihood estimation (MLE): maximize $p(\boldsymbol x \mid y)$.
In practice, simply work with the $\log$-likelihood.
Naming:
$$\mathrm{posterior} = \frac{\mathrm{likelihood} \cdot \mathrm{prior}}{\mathrm{evidence}}$$
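A minimal numerical sketch of the MLE/MAP distinction (all numbers below are invented for illustration): two classes emit coin flips with different head probabilities. MLE picks the class maximizing $p(\boldsymbol x \mid y)$ alone, while MAP also weights by the prior $p(y)$, which can flip the decision when one class is rare.

```python
import numpy as np

# Hypothetical setup: class y emits i.i.d. Bernoulli(theta[y]) flips;
# class 1 is rare (prior 0.05).
theta = {0: 0.3, 1: 0.8}
prior = {0: 0.95, 1: 0.05}

def log_likelihood(x, y):
    # log p(x | y) for i.i.d. Bernoulli observations
    k, n = np.sum(x), len(x)
    return k * np.log(theta[y]) + (n - k) * np.log(1 - theta[y])

x = np.array([1, 1, 0, 1, 1])  # observed flips: 4 heads, 1 tail

# MLE: argmax_y p(x | y)
mle = max(theta, key=lambda y: log_likelihood(x, y))
# MAP: argmax_y p(x | y) p(y) -- needs the distribution of y
map_ = max(prior, key=lambda y: log_likelihood(x, y) + np.log(prior[y]))
print(mle, map_)
```

Here MLE selects class 1 (four heads are far more likely under $\theta = 0.8$), while MAP selects class 0: the prior penalty $\log 0.05$ outweighs the likelihood gap.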
Algebra
Some background facts
- $|AB| = |A||B|$

  Proof: expanding along $k$ rows by the Laplace theorem gives $\begin{vmatrix} A & \boldsymbol 0 \\ -I & B \end{vmatrix} = |A||B|$. Since addition-type row operations do not change the determinant, applying them to this determinant gives $\begin{vmatrix} A & \boldsymbol 0 \\ -I & B \end{vmatrix} = \begin{vmatrix} \boldsymbol 0 & AB \\ -I & B \end{vmatrix} = |AB|{(-1)}^{\sum_{i=1}^{2n} i}{(-1)}^n = |AB|{(-1)}^{2n^2+2n} = |AB|$, which completes the proof.
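The identity is easy to sanity-check numerically; a quick sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# |AB| = |A||B|, up to floating-point error
lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print(np.isclose(lhs, rhs))
```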
- $rank(A) + rank(B) - n \le rank(AB) \le \min\{rank(A), rank(B)\}$

  Proof: for the second inequality, $AB = (\alpha_1, \cdots, \alpha_n)B = (\sum_i b_{i1}\alpha_i, \cdots, \sum_i b_{in}\alpha_i)$, so every column of the product is a linear combination of $(\alpha_1, \cdots, \alpha_n)$ and can be expressed by a maximal linearly independent subset of them; hence $rank(AB) \le rank(A)$. Similarly, $rank(AB) = rank(B^TA^T) \le rank(B^T) = rank(B)$. The first inequality is Sylvester's inequality, the special case $B = I_n$ of the Frobenius inequality $rank(AB) + rank(BC) - rank(B) \le rank(ABC)$. To prove it, note $rank(AB) + n = rank\begin{pmatrix} I_n & \boldsymbol 0 \\ \boldsymbol 0 & AB \end{pmatrix}$ (count the nonzero rows after reducing each block to echelon form). Applying addition, scaling, and row/column swap operations to $P = \begin{pmatrix} I_n & \boldsymbol 0 \\ \boldsymbol 0 & AB \end{pmatrix}$ yields $\begin{pmatrix} B & I_n \\ \boldsymbol 0 & A \end{pmatrix}$, and $rank\begin{pmatrix} B & I_n \\ \boldsymbol 0 & A \end{pmatrix} \ge rank(A) + rank(B)$, because this matrix contains at least one nonzero minor of order $rank(A) + rank(B)$.
- $rank(A) + \dim(null(A)) = n$
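Both bounds can be checked numerically on random low-rank matrices; a sketch (the factor shapes are chosen so that the ranks are 4 and 3 with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, 4)) @ rng.standard_normal((4, n))  # rank 4
B = rng.standard_normal((n, 3)) @ rng.standard_normal((3, n))  # rank 3

rA = np.linalg.matrix_rank(A)
rB = np.linalg.matrix_rank(B)
rAB = np.linalg.matrix_rank(A @ B)

# Sylvester lower bound and the min upper bound
print(rA + rB - n <= rAB <= min(rA, rB))
```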
- $rank(A) = rank(A^TA) = rank(AA^T)$

  Proof: a homogeneous linear system always has the zero solution, so there are only two cases: the unique zero solution or infinitely many solutions; in the latter case, solution sets still differ in the dimension of the solution space spanned by a fundamental system of solutions (also called the null space of the coefficient matrix, $null(A)$). $Ax = \boldsymbol 0$ and $A^TAx = \boldsymbol 0$ have exactly the same solution space, so the first equality holds; to show that every solution of $A^TAx = \boldsymbol 0$ also solves $Ax = \boldsymbol 0$, left-multiply both sides of $A^TAx = \boldsymbol 0$ by $x^T$ (giving $\|Ax\|^2 = 0$). Then $rank(A) = rank(A^T) = rank(AA^T)$, so the second equality holds.
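A quick numerical check of both equalities (a sketch with a random rank-3 matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))  # rank 3

r = np.linalg.matrix_rank
print(r(A), r(A.T @ A), r(A @ A.T))  # all three agree
```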
- If $B$ is invertible, then $rank(AB) = rank(A)$

  Proof: from the above, $rank(AB) \le rank(A)$; and since $A = ABB^{-1}$, we have $rank(A) = rank(ABB^{-1}) \le rank(AB)$. Done.
- SVD: $A = U\Sigma V^T$ (for complex matrices, $V^H$ denotes the conjugate transpose)
SVD for image compression
In this example the compression ratio is about 36%: keeping 5 singular triplets of a 28×28 image stores $5 \cdot (28 + 28 + 1) = 285$ of 784 numbers.
```python
import numpy as np
import matplotlib.pyplot as plt
import keras

mnist = keras.datasets.mnist
(x_train, y_train), (x_valid, y_valid) = mnist.load_data()
A = x_train[0]                       # a 28x28 grayscale digit
u, s, vh = np.linalg.svd(A)

# Rank-5 approximation: keep only the 5 largest singular triplets
B = np.zeros(shape=A.shape)
for i in range(5):
    B += s[i] * np.outer(u[:, i], vh[i, :])

# Full reconstruction from all singular triplets
C = u @ np.diag(s) @ vh

plt.subplot(131); plt.title('original'); plt.imshow(A)
plt.subplot(132); plt.title('partial');  plt.imshow(B)
plt.subplot(133); plt.title('complete'); plt.imshow(C)
plt.show()

# Spectral norm (matrix 2-norm) of the approximation error,
# i.e. the largest singular value of A - B
print(np.linalg.norm(A - B, 2))
```
Decision Trees
Information gain
```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include <cmath>
using namespace std;

enum { color, root, sound, texture, umbilical, touch };

class Object
{
protected:
    vector<int> _feature;
    bool _label = false;
public:
    int num_feature = 0;
    const vector<int>& feature() const { return _feature; }
    bool label() const { return _label; }
};

class Cucumber : public Object
{
public:
    Cucumber(vector<int> feature, bool label)
    {
        _feature = move(feature);
        _label = label;
        num_feature = (int)_feature.size() - 1;  // slot 0 is the sample id
    }
};

class Entropy
{
public:
    // H(S) = -w1*log2(w1) - w2*log2(w2) over the two label classes
    static double calculate(const vector<Cucumber>& v)
    {
        if (v.empty()) {
            throw runtime_error("no object");
        }
        int positive = 0, negative = 0;
        for (const auto& i : v) {
            if (i.label()) positive++;
            else negative++;
        }
        int total = (int)v.size();
        double w1 = positive * 1.0 / total, w2 = negative * 1.0 / total;
        if (0 == w1) return -w2 * log2(w2);
        if (0 == w2) return -w1 * log2(w1);
        return -w1 * log2(w1) - w2 * log2(w2);
    }
};

class Classify
{
public:
    // Partition the samples by the value of one feature column
    static vector<vector<Cucumber>> do_classify(const vector<Cucumber>& v, int rank_feature)
    {
        if (v.empty()) {
            throw runtime_error("no object");
        }
        if (v[0].num_feature < rank_feature) {
            throw out_of_range("rank is out of range");
        }
        vector<vector<Cucumber>> res;
        for (const auto& i : v) {
            while ((int)res.size() <= i.feature()[rank_feature]) {
                res.emplace_back();
            }
            res[i.feature()[rank_feature]].push_back(i);
        }
        return res;
    }
};

class InformationGain
{
public:
    // Gain(S, a) = H(S) - sum_v |S_v|/|S| * H(S_v)
    static double calculate(const vector<vector<Cucumber>>& v, int total, double entropy_s)
    {
        double res = 0;
        for (const auto& i : v) {
            if (i.empty()) continue;  // skip feature values that never occur
            res += double(i.size()) * Entropy::calculate(i) / total;
        }
        return entropy_s - res;
    }
};

int main()
{
    vector<Cucumber> v;
    v.push_back(Cucumber({ 1, 0, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 2, 1, 0, 1, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 3, 1, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 4, 0, 0, 1, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 5, 2, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 6, 0, 1, 0, 0, 1, 1 }, true));
    v.push_back(Cucumber({ 7, 1, 1, 0, 1, 1, 1 }, true));
    v.push_back(Cucumber({ 8, 1, 1, 0, 0, 1, 0 }, true));
    v.push_back(Cucumber({ 9, 1, 1, 1, 1, 1, 0 }, false));
    v.push_back(Cucumber({ 10, 0, 2, 2, 0, 2, 1 }, false));
    v.push_back(Cucumber({ 11, 2, 2, 2, 2, 2, 0 }, false));
    v.push_back(Cucumber({ 12, 2, 0, 0, 2, 2, 1 }, false));
    v.push_back(Cucumber({ 13, 0, 1, 0, 1, 0, 0 }, false));
    v.push_back(Cucumber({ 14, 2, 1, 1, 1, 0, 0 }, false));
    v.push_back(Cucumber({ 15, 1, 1, 0, 0, 1, 1 }, false));
    v.push_back(Cucumber({ 16, 2, 0, 0, 2, 2, 0 }, false));
    v.push_back(Cucumber({ 17, 0, 0, 1, 1, 1, 0 }, false));
    double entropy_s = Entropy::calculate(v);
    for (int i = 1; i < 7; i++) {
        vector<vector<Cucumber>> tmp = Classify::do_classify(v, i);
        double ig = InformationGain::calculate(tmp, 17, entropy_s);
        printf("%d : %.3f\n", i, ig);
    }
    cin.get();
}
```
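As a cross-check, the same information-gain computation can be sketched in Python (the array below mirrors the table built in main: column 0 is the sample id, columns 1-6 the features, the last column the label):

```python
import numpy as np

# Same 17 samples as in main(); last column is the label (1 = positive)
data = np.array([
    [1,0,0,0,0,0,0,1],[2,1,0,1,0,0,0,1],[3,1,0,0,0,0,0,1],
    [4,0,0,1,0,0,0,1],[5,2,0,0,0,0,0,1],[6,0,1,0,0,1,1,1],
    [7,1,1,0,1,1,1,1],[8,1,1,0,0,1,0,1],[9,1,1,1,1,1,0,0],
    [10,0,2,2,0,2,1,0],[11,2,2,2,2,2,0,0],[12,2,0,0,2,2,1,0],
    [13,0,1,0,1,0,0,0],[14,2,1,1,1,0,0,0],[15,1,1,0,0,1,1,0],
    [16,2,0,0,2,2,0,0],[17,0,0,1,1,1,0,0],
])
X, y = data[:, 1:7], data[:, 7]

def entropy(labels):
    # H(S) for a binary label vector
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(col):
    # Gain(S, a) = H(S) - sum_v |S_v|/|S| * H(S_v)
    h = entropy(y)
    for v in np.unique(X[:, col]):
        mask = X[:, col] == v
        h -= mask.mean() * entropy(y[mask])
    return h

for col in range(6):
    print(col + 1, round(information_gain(col), 3))
```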