Machine Learning Notes

Learning Strategy

Non-probabilistic approach

In the hypothesis space $\mathcal H$, look for the hypothesis $h$ with the smallest generalization error.
Generalization error: $\int_{\mathcal X \times \mathcal Y} L(y, h(\boldsymbol x))\,p(\boldsymbol x, y)\,d\boldsymbol x\,dy$.
That is, the expected value of the loss, where pattern and label are assumed to follow some joint probability distribution; a distribution is needed because the same pattern may sometimes carry different labels, and that variability is random.
The error on a test set converges in probability to the generalization error as the test-set size grows.
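As a toy illustration of this convergence (hypothetical setup: a hand-made joint distribution with 10% label noise, 0-1 loss, and the fixed threshold classifier $h(x) = \mathbb 1[x > 0.5]$, whose generalization error is exactly 0.1), the test error of a fixed hypothesis approaches its generalization error as the test set grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution: x ~ Uniform(0, 1), and y = 1 with
# probability 0.9 when x > 0.5, else with probability 0.1 (label noise).
def sample(n):
    x = rng.random(n)
    p1 = np.where(x > 0.5, 0.9, 0.1)
    y = (rng.random(n) < p1).astype(int)
    return x, y

# A fixed hypothesis h(x) = 1[x > 0.5]; under 0-1 loss its
# generalization error is exactly the label-noise rate, 0.1.
def h(x):
    return (x > 0.5).astype(int)

for n in [100, 10_000, 1_000_000]:
    x, y = sample(n)
    print(n, np.mean(h(x) != y))  # empirical test error for test-set size n
```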

Probabilistic approach

Discriminative methods: learn $p(y|\boldsymbol x)$ directly.
Generative methods: learn the conditional distribution via $p(y|\boldsymbol x) = \frac{p(y, \boldsymbol x)}{p(\boldsymbol x)} = \frac{p(\boldsymbol x|y)p(y)}{p(\boldsymbol x)}$.
On a given sample set:
    Maximum a posteriori (MAP): maximize $p(y|\boldsymbol x) \propto p(\boldsymbol x|y)p(y)$; this requires knowing the distribution of $y$.
    Maximum likelihood estimation (MLE): maximize $p(\boldsymbol x|y)$.
In practice, maximize the log-likelihood.
Naming convention: $\mathrm{posterior} = \frac{\mathrm{likelihood} \cdot \mathrm{prior}}{\mathrm{evidence}}$.
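A minimal numeric sketch of the difference between the two criteria (hypothetical two-class problem: known Gaussian class-conditionals $N(0,1)$ and $N(2,1)$ with a skewed prior $p(y{=}1) = 0.1$):

```python
import numpy as np

# Hypothetical class-conditional densities p(x|y=0) = N(0, 1) and
# p(x|y=1) = N(2, 1), with prior p(y=0) = 0.9, p(y=1) = 0.1.
def gauss(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

x = 1.2
lik = np.array([gauss(x, 0.0), gauss(x, 2.0)])  # p(x|y) for y = 0, 1
prior = np.array([0.9, 0.1])                    # p(y)

mle = int(np.argmax(lik))            # MLE: maximize p(x|y)      -> class 1
map_ = int(np.argmax(lik * prior))   # MAP: maximize p(x|y)p(y)  -> class 0
print("MLE:", mle, "MAP:", map_)
```

The skewed prior flips the decision: the likelihood favors class 1 at $x = 1.2$, but the posterior favors class 0.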

Linear Algebra

Background facts

  1. $|AB| = |A||B|$
    Proof: expanding along the first $n$ rows (Laplace's theorem) gives $\begin{vmatrix} A & \boldsymbol 0 \\ -I & B \end{vmatrix} = |A||B|$. Row-addition operations do not change a determinant, so adding suitable multiples of the last $n$ rows yields $\begin{vmatrix} A & \boldsymbol 0 \\ -I & B \end{vmatrix} = \begin{vmatrix} \boldsymbol 0 & AB \\ -I & B \end{vmatrix} = |AB|(-1)^{\sum_{i=1}^{2n} i}(-1)^n = |AB|(-1)^{2n^2+2n} = |AB|$, which proves the claim.
  2. $rank(A) + rank(B) - n \le rank(AB) \le \min\{rank(A), rank(B)\}$
    Proof of the second inequality: $AB = (\alpha_1, \cdots, \alpha_n)B = (\sum_i b_{i1}\alpha_i, \cdots, \sum_i b_{in}\alpha_i)$, so every column of $AB$ can be linearly expressed by a maximal linearly independent subset of the columns of $A$; hence $rank(AB) \le rank(A)$. Similarly $rank(AB) = rank(B^TA^T) \le rank(B^T) = rank(B)$. The first inequality is Sylvester's inequality, a special case of the Frobenius inequality $rank(A) + rank(B) - rank(C) \le rank(ACB)$. Note $rank(AB) + n = rank\begin{pmatrix} I_n & \boldsymbol 0 \\ \boldsymbol 0 & AB \end{pmatrix}$ (count the nonzero rows when each block is reduced to echelon form). Applying row/column addition, scaling, and swap operations to $P = \begin{pmatrix} I_n & \boldsymbol 0 \\ \boldsymbol 0 & AB \end{pmatrix}$ produces $\begin{pmatrix} B & I_n \\ \boldsymbol 0 & A \end{pmatrix}$, and $rank\begin{pmatrix} B & I_n \\ \boldsymbol 0 & A \end{pmatrix} \ge rank(A) + rank(B)$, since this matrix contains a nonzero minor of order $rank(A) + rank(B)$.
  3. $rank(A) + \dim(null(A)) = n$
  4. $rank(A) = rank(A^TA) = rank(AA^T)$
    Proof: a homogeneous linear system always has the zero solution, which is either unique or one of infinitely many; in the infinite case, systems still differ in the dimension of the solution space (the null space $null(A)$ of the coefficient matrix, spanned by a fundamental system of solutions). $Ax = \boldsymbol 0$ and $A^TAx = \boldsymbol 0$ have exactly the same solution space, so the first equality holds. To see that a solution of $A^TAx = \boldsymbol 0$ also solves $Ax = \boldsymbol 0$, left-multiply both sides by $x^T$: $x^TA^TAx = \|Ax\|^2 = 0$. Applying the first equality to $A^T$ gives $rank(A) = rank(A^T) = rank(AA^T)$, so the second equality holds.
  5. If $B$ is invertible, then $rank(AB) = rank(A)$
    Proof: by fact 2, $rank(AB) \le rank(A)$; conversely $A = ABB^{-1}$, so $rank(A) = rank(ABB^{-1}) \le rank(AB)$. Done.
  6. Singular value decomposition: $A = U\Sigma V^T$ (for complex $A$, $A = U\Sigma V^H$, where $V^H$ denotes the conjugate transpose)
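The identities above are easy to sanity-check numerically with NumPy (random matrices, so this is illustration rather than proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# 1. |AB| = |A||B|
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# 2. rank(A) + rank(B) - n <= rank(AB) <= min(rank(A), rank(B))
rA, rB, rAB = (np.linalg.matrix_rank(M) for M in (A, B, A @ B))
assert rA + rB - 4 <= rAB <= min(rA, rB)

# 4. rank(A) = rank(A^T A) = rank(A A^T)
assert rA == np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A @ A.T)

# 6. A = U Sigma V^T (SVD reconstruction)
u, s, vh = np.linalg.svd(A)
assert np.allclose(A, u @ np.diag(s) @ vh)
```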

SVD for Image Compression

In this example the compression ratio is 36%.

import numpy as np
import matplotlib.pyplot as plt
import keras

mnist = keras.datasets.mnist
(x_train, y_train), (x_valid, y_valid) = mnist.load_data()
A = x_train[0]
u,s,vh = np.linalg.svd(A)
B = np.zeros(shape=A.shape)
C = np.matmul(u, np.diag(s))
C = np.matmul(C, vh)
print(u[:,0].shape)
print(vh[0,:].shape)
for i in range(5):
    B += s[i]*np.matmul(np.expand_dims(u[:,i],1), np.expand_dims(vh[i,:],0))
plt.subplot(131)
plt.title('original')
plt.imshow(A)
plt.subplot(132)
plt.title('partial')
plt.imshow(B)
plt.subplot(133)
plt.title('complete')
plt.imshow(C)
plt.show()
print(np.linalg.norm(A - B, 2)) # spectral (2-)norm of the approximation error, i.e. the largest discarded singular value
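The 36% compression figure comes from the storage cost of a rank-$k$ truncation: $k(m + n + 1)$ numbers ($k$ left singular vectors, $k$ right singular vectors, $k$ singular values) instead of the $mn$ original pixels. For the 28×28 MNIST digit with $k = 5$:

```python
m, n, k = 28, 28, 5
cost_full = m * n               # 784 stored pixels
cost_rank_k = k * (m + n + 1)   # 285 stored numbers
ratio = cost_rank_k / cost_full
print(f"{cost_rank_k}/{cost_full} = {ratio:.1%}")  # 285/784 = 36.4%
```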


Decision Tree

Information Gain

The information gain of splitting a sample set $S$ on feature $a$ is $Gain(S, a) = Ent(S) - \sum_v \frac{|S_v|}{|S|} Ent(S_v)$, where $S_v$ is the subset of $S$ whose value on feature $a$ is $v$ and $Ent$ is the entropy of the label distribution.

#include <iostream>
#include <string>
#include <vector>
#include <cmath>
#include <stdexcept>
using namespace std;

// Feature names; entry 0 of each feature vector is the sample id,
// so the k-th feature of the enum lives at index k + 1.
enum { color, root, sound, texture, umbilical, touch };

class Object
{
protected:
    vector<int> _feature;
    bool _label = false;
public:
    int num_feature = 0;
    const vector<int>& feature() const { return _feature; }
    bool label() const { return _label; }
};

class Cucumber : public Object
{
public:
    Cucumber(vector<int> feature, bool label)
    {
        _feature = move(feature);
        _label = label;
        num_feature = int(_feature.size()) - 1; // entry 0 is the id, not a feature
    }
};

class Entropy
{
public:
    // Binary entropy (in bits) of the label distribution in v.
    static double calculate(const vector<Cucumber>& v)
    {
        if (v.empty()) {
            throw runtime_error("no object");
        }
        int positive = 0, negative = 0;
        for (const auto& i : v) {
            if (i.label()) positive++;
            else negative++;
        }
        int total = int(v.size());
        double w1 = positive * 1.0 / total, w2 = negative * 1.0 / total;
        if (0 == w1) return -w2 * log2(w2);
        if (0 == w2) return -w1 * log2(w1);
        return -w1 * log2(w1) - w2 * log2(w2);
    }
};

class Classify
{
public:
    // Partition v into buckets by the value of feature rank_feature.
    static vector<vector<Cucumber>> do_classify(const vector<Cucumber>& v, int rank_feature)
    {
        if (v.empty()) {
            throw runtime_error("no object");
        }
        if (v[0].num_feature < rank_feature) {
            throw out_of_range("rank is out of range");
        }
        vector<vector<Cucumber>> res;
        for (const auto& i : v) {
            int value = i.feature()[rank_feature];
            while (int(res.size()) < value + 1) {
                res.push_back(vector<Cucumber>());
            }
            res[value].push_back(i);
        }
        return res;
    }
};

class InformationGain
{
public:
    // Gain(S, a) = Ent(S) - sum_v |S_v|/|S| * Ent(S_v).
    static double calculate(const vector<vector<Cucumber>>& v, int total, double entropy_s)
    {
        double res = 0;
        for (const auto& i : v) {
            if (!i.empty()) {
                res += double(i.size()) * Entropy::calculate(i) / total;
            }
        }
        return entropy_s - res;
    }
};

int main()
{
    vector<Cucumber> v;
    v.push_back(Cucumber({ 1,  0, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 2,  1, 0, 1, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 3,  1, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 4,  0, 0, 1, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 5,  2, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 6,  0, 1, 0, 0, 1, 1 }, true));
    v.push_back(Cucumber({ 7,  1, 1, 0, 1, 1, 1 }, true));
    v.push_back(Cucumber({ 8,  1, 1, 0, 0, 1, 0 }, true));
    v.push_back(Cucumber({ 9,  1, 1, 1, 1, 1, 0 }, false));
    v.push_back(Cucumber({ 10, 0, 2, 2, 0, 2, 1 }, false));
    v.push_back(Cucumber({ 11, 2, 2, 2, 2, 2, 0 }, false));
    v.push_back(Cucumber({ 12, 2, 0, 0, 2, 2, 1 }, false));
    v.push_back(Cucumber({ 13, 0, 1, 0, 1, 0, 0 }, false));
    v.push_back(Cucumber({ 14, 2, 1, 1, 1, 0, 0 }, false));
    v.push_back(Cucumber({ 15, 1, 1, 0, 0, 1, 1 }, false));
    v.push_back(Cucumber({ 16, 2, 0, 0, 2, 2, 0 }, false));
    v.push_back(Cucumber({ 17, 0, 0, 1, 1, 1, 0 }, false));
    double entropy_s = Entropy::calculate(v);
    for (int i = 1; i < 7; i++) {
        vector<vector<Cucumber>> tmp = Classify::do_classify(v, i);
        double ig = InformationGain::calculate(tmp, int(v.size()), entropy_s);
        printf("%d : %.3f\n", i, ig);
    }
    cin.get();
}
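As a quick cross-check on the program, the root entropy of the dataset above (8 positive and 9 negative samples out of 17) can be computed directly; a small Python sketch, independent of the C++ code:

```python
import math

def entropy(pos, neg):
    """Binary entropy (bits) of a pos/neg label split."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

# Root entropy for the 8-positive / 9-negative dataset.
print(round(entropy(8, 9), 3))  # -> 0.998
```

This is the value `entropy_s` that the information gains in the loop are measured against.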