Machine Learning Study Notes 1

C-beams

Last modified 2024-04-15 09:54:09

First published 2024-04-15 09:52:50

Article link: https://blog.csdn.net/2401_82787858/article/details/137734208

L1 Basics

WHAT: Making decisions from data

WHY: To apply, understand, evaluate

What do we have?

Feature vector: $x^{(i)}=(x_{1}^{(i)},\dots,x_{d}^{(i)})^{\top}\in \mathbb{R}^{d}$

Label: $y^{(i)}\in\{-1,+1\}$

Dataset: $D_n=\{(x^{(1)},y^{(1)}),\dots,(x^{(n)},y^{(n)})\}$

What do we want?

A good way to label new points

x -> h -> y (the function h is called a hypothesis)

Example h: for any x, h(x) = +1

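Linear classifiers

Redefining a line: $\{x:\theta^{\top}x+\theta_0=0\}$, where $\theta$ is normal to the line and $-\theta_0/||\theta||$ is the signed distance from the origin to the line

A linear classifier:

$h(x;\theta,\theta_0)=\operatorname{sign}(\theta^{\top}x+\theta_0)=\begin{cases}+1 & \text{if}\ \theta^{\top}x+\theta_0>0\\-1 & \text{if}\ \theta^{\top}x+\theta_0\leq0 \end{cases}$

H: the set of all such functions h (the hypothesis class)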
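As a minimal sketch of the linear classifier defined above, the function below evaluates $\operatorname{sign}(\theta^{\top}x+\theta_0)$ for one point. The function name `linear_classifier` and the parameter values are made up for illustration; only the rule itself comes from the notes.

```python
import numpy as np

def linear_classifier(x, theta, theta0):
    """Return +1 if theta^T x + theta0 > 0, otherwise -1."""
    return 1 if np.dot(theta, x) + theta0 > 0 else -1

# Hypothetical parameters, chosen only for illustration.
theta = np.array([1.0, -1.0])
theta0 = 0.5

print(linear_classifier(np.array([2.0, 1.0]), theta, theta0))  # 2 - 1 + 0.5 = 1.5 > 0
print(linear_classifier(np.array([0.0, 3.0]), theta, theta0))  # 0 - 3 + 0.5 = -2.5 <= 0
print(linear_classifier(np.array([0.5, 1.0]), theta, theta0))  # exactly 0: the boundary maps to -1
```

Note that a point lying exactly on the line is labeled -1, matching the $\leq 0$ case in the definition.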

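How good is a classifier?

A good classifier predicts well on unseen data

We have a loss function $L(g,a)$, where g is the guess and a is the actual value

e.g. 0-1 loss

$L(g,a)=\begin{cases}0 & \text{if}\ g=a\\1 & \text{otherwise} \end{cases}$

e.g. asymmetric loss

$L(g,a)=\begin{cases}100 & \text{if}\ g=1,\ a=-1\\1 & \text{if}\ g=-1,\ a=1\\0 & \text{otherwise} \end{cases}$

100: false positive (FP), a negative example predicted as positive

1: false negative (FN), a positive example predicted as negative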
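The two loss functions above can be written down directly. This is a small sketch; the function names `zero_one_loss` and `asymmetric_loss` are my own, while the costs 100 and 1 are the ones given in the notes.

```python
def zero_one_loss(g, a):
    """0-1 loss: 0 when the guess g equals the actual label a, else 1."""
    return 0 if g == a else 1

def asymmetric_loss(g, a):
    """Asymmetric loss from the notes: a false positive costs 100, a false negative costs 1."""
    if g == 1 and a == -1:
        return 100  # false positive
    if g == -1 and a == 1:
        return 1    # false negative
    return 0        # correct guess

for g, a in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(g, a, zero_one_loss(g, a), asymmetric_loss(g, a))
```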

For $n'$ unseen data points, define the test error:

$\varepsilon(h)=\frac{1}{n'}\sum_{i=n+1}^{n+n'}L(h(x^{(i)}),y^{(i)})$

Define the training error: $\varepsilon_n(h)=\frac{1}{n}\sum_{i=1}^{n}L(h(x^{(i)}),y^{(i)})$
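Both error definitions are the same computation, an average loss, applied to different sets of points. The sketch below assumes the 0-1 loss and uses the constant hypothesis h(x) = +1 from earlier in the notes; the helper name `error` and the four data points are made up for illustration.

```python
import numpy as np

def zero_one_loss(g, a):
    return 0 if g == a else 1

def error(h, X, y, loss=zero_one_loss):
    """Average loss of hypothesis h over the points (X[i], y[i])."""
    return sum(loss(h(x), a) for x, a in zip(X, y)) / len(y)

def always_positive(x):
    """The constant hypothesis: h(x) = +1 for every x."""
    return 1

# Made-up data: two positive and two negative examples.
X = np.array([[1.0, 2.0], [3.0, 0.5], [-1.0, 1.0], [0.0, -2.0]])
y = np.array([1, -1, 1, -1])

print(error(always_positive, X, y))  # 2 of 4 points are mislabeled
```

Passing the training points gives the training error; passing held-out points gives the test error.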

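We want a learning algorithm that turns the data into a hypothesis: $D_n\rightarrow \text{learning algorithm}\rightarrow h$

Questions to consider: does h depend on $D_n$? How do we choose the parameters?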

L2 Perceptron

Perceptron algorithm

Pseudocode:

Perceptron( $D_n;\tau$ )
    Initialize $\theta=[0\ 0\ \dots\ 0]^{\top}$
    Initialize $\theta_0 = 0$
    for t = 1 to $\tau$
        changed = False
        for i = 1 to n
            if $y^{(i)}(\theta^{\top}x^{(i)}+\theta_0)\leq0$
                set $\theta=\theta+y^{(i)}x^{(i)}$    (*)
                set $\theta_0=\theta_0+y^{(i)}$    (*)
                changed = True
        if not changed
            break
    return $\theta,\theta_0$

The two starred (*) lines together make up one update.

Python code:

from matplotlib import pyplot as plt
import numpy as np

def perceptron(D, y, T, theta, theta0):
    """Run at most T passes of the perceptron algorithm over the points D with labels y."""
    for _ in range(T):
        changed = False
        for i in range(len(D)):
            # Update on a mistake, or when the point lies exactly on the boundary.
            if y[i] * (np.dot(theta, D[i]) + theta0) <= 0:
                theta = theta + y[i] * D[i]
                theta0 = theta0 + y[i]
                changed = True
        if not changed:
            break
    return theta, theta0


D = np.array([[10, 70], [20, 30], [30, 90], [60, 75], [40, 70],
              [40, 15], [55, 55], [30, 35], [50, 15], [90, 40]])
y = np.array([-1, -1, 1, 1, 1, -1, 1, -1, -1, 1])
T = 1  # a single pass over the data; a larger T allows more updates
t, t0 = perceptron(D, y, T, np.zeros(2), 0)

# Split the points by label for plotting.
positive = D[y > 0]
negative = D[y < 0]

# Decision boundary theta^T x + theta_0 = 0, solved for the second coordinate
# (this assumes t[1] != 0).
x = np.linspace(0, 100, 100, endpoint=False)
boundary = -(t0 + t[0] * x) / t[1]

plt.title('Perceptron')
plt.plot(positive[:, 0], positive[:, 1], '+')
plt.plot(negative[:, 0], negative[:, 1], '_')
plt.plot(x, boundary, 'b-')
plt.show()

Result: (figure: the labeled points and the line found by the perceptron)

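What do the starred (*) updates do?

They try to move toward a more correct classification of $x^{(i)}$. Using $(y^{(i)})^{2}=1$:

$\begin{aligned}y^{(i)}\left((\theta+y^{(i)}x^{(i)})^{\top}x^{(i)}+(\theta_0+y^{(i)})\right)&=y^{(i)}(\theta^{\top}x^{(i)}+\theta_0)+(y^{(i)})^{2}(x^{(i)\top}x^{(i)}+1)\\&=y^{(i)}(\theta^{\top}x^{(i)}+\theta_0)+(||x^{(i)}||^{2}+1)\end{aligned}$

The update adds the positive number $||x^{(i)}||^{2}+1$ to the original expression, pushing $y^{(i)}(\theta^{\top}x^{(i)}+\theta_0)$ toward being positive, i.e. toward the guess having the same sign as the actual label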
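A quick numeric check of the identity above, as a sketch: the values of `theta`, `theta0`, `x`, and `y` below are made up, and the point is chosen so that it starts out misclassified.

```python
import numpy as np

# Hypothetical parameters and one labeled point.
theta = np.array([1.0, -2.0])
theta0 = 0.5
x = np.array([3.0, 4.0])
y = 1

before = y * (np.dot(theta, x) + theta0)          # 3 - 8 + 0.5 = -4.5, so x is misclassified

# One perceptron update (the two starred lines).
theta_new = theta + y * x
theta0_new = theta0 + y

after = y * (np.dot(theta_new, x) + theta0_new)   # 12 + 8 + 1.5 = 21.5

print(before, after, after - before)
print(np.dot(x, x) + 1)                           # ||x||^2 + 1 = 26
```

In this example one update is enough to flip the sign. In general the quantity only increases by $||x^{(i)}||^{2}+1$, so a point that is badly misclassified may need several updates.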

Classifier quality

Define the signed distance from a point $x^{*}$ to the line:

$\begin{aligned}\text{signed distance}&=(\text{projection of}\ x^{*}\ \text{on}\ \theta)-(\text{signed distance of line to origin})\\&=\frac{\theta^{\top}x^{*}}{||\theta||}-\frac{-\theta_0}{||\theta||}\\&=\frac{\theta^{\top}x^{*}+\theta_0}{||\theta||}\end{aligned}$

Define the margin of a labeled point $(x^{*},y^{*})$:

$y^{*}\left(\frac{\theta^{\top}x^{*}+\theta_0}{||\theta||}\right)$

Define the margin of the whole training set $D_n$:

$\min_{i=1,\dots,n}\left\{ y^{(i)}\left(\frac{\theta^{\top}x^{(i)}+\theta_0}{||\theta||}\right) \right\}$
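The signed distance and the dataset margin can be computed in a few lines. This is a sketch with made-up numbers; the helper names `signed_distance` and `dataset_margin` are my own.

```python
import numpy as np

def signed_distance(x, theta, theta0):
    """Signed distance from point x to the line theta^T x + theta0 = 0."""
    return (np.dot(theta, x) + theta0) / np.linalg.norm(theta)

def dataset_margin(X, y, theta, theta0):
    """Smallest margin y_i * signed_distance(x_i) over the whole dataset."""
    return min(y_i * signed_distance(x_i, theta, theta0) for x_i, y_i in zip(X, y))

# Hypothetical separator with ||theta|| = 5, and three labeled points.
theta = np.array([3.0, 4.0])
theta0 = -5.0
X = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 3.0]])
y = np.array([1, -1, 1])

for x_i, y_i in zip(X, y):
    print(y_i * signed_distance(x_i, theta, theta0))  # per-point margins: 4, 1, 2
print(dataset_margin(X, y, theta, theta0))            # the smallest of them
```

A positive dataset margin means every point is on the correct side of the line; a negative one means at least one point is misclassified.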

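Perceptron performance

Assumptions:

A: the classifier passes through the origin (i.e. $\theta_0 = 0$)

B: there exist $\theta^{*}$ and $\gamma>0$ such that every training point has margin $y^{(i)}\frac{\theta^{*\top}x^{(i)}}{||\theta^{*}||}>\gamma$

C: every feature vector lies within radius R: $||x^{(i)}||\leq R$ for all i

Conclusion:

The perceptron makes at most $(R/\gamma)^{2}$ updates to $\theta$, after which the training error is 0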
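The bound can be checked on a small example. The sketch below runs a through-origin perceptron (assumption A), counts its updates, and compares the count with $(R/\gamma)^{2}$. The dataset and the separator `theta_star` are made up, and $\gamma$ is taken to be the margin of `theta_star` on this data, which is one valid choice rather than the best possible one.

```python
import numpy as np

def perceptron_through_origin(X, y, max_passes=1000):
    """Perceptron with theta0 fixed at 0; returns theta and the number of updates."""
    theta = np.zeros(X.shape[1])
    updates = 0
    for _ in range(max_passes):
        changed = False
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(theta, x_i) <= 0:
                theta = theta + y_i * x_i
                updates += 1
                changed = True
        if not changed:
            break
    return theta, updates

# Made-up data that is linearly separable through the origin.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A separator we happen to know for this data (hypothetical theta*).
theta_star = np.array([1.0, 1.0])
gamma = np.min(y * (X @ theta_star) / np.linalg.norm(theta_star))
R = np.max(np.linalg.norm(X, axis=1))
bound = (R / gamma) ** 2

theta, updates = perceptron_through_origin(X, y)
print("updates:", updates, "bound:", bound)
print("all points classified correctly:", bool(np.all(y * (X @ theta) > 0)))
```

The bound depends only on $R$ and $\gamma$, not on the number of points or the dimension.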

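How do we make a classifier pass through the origin?

Add one dimension:

$x_{new}\in\mathbb{R}^{d+1},\ \theta_{new}\in\mathbb{R}^{d+1}\\ x_{new}=[x_1,x_2,\dots,x_d,1]^{\top},\ \theta_{new}=[\theta_1,\theta_2,\dots,\theta_d,\theta_0]^{\top}\\ \theta_{new}^{\top}x_{new}=\theta^{\top}x+\theta_0,\ \text{so}\ \{x_{new,1:d}:\theta_{new}^{\top}x_{new}=0\}\ \text{is the original line}\ \{x:\theta^{\top}x+\theta_0=0\}$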
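The extra-dimension trick is easy to verify numerically. In this sketch the values of `theta`, `theta0`, and `x` are made up; appending a constant 1 to `x` and `theta0` to `theta` should give the same value as the original expression with an offset.

```python
import numpy as np

# Hypothetical classifier with an offset, and one point.
theta = np.array([2.0, -1.0])
theta0 = 3.0
x = np.array([1.0, 4.0])

# Augment: append 1 to the features and theta0 to the parameters.
x_new = np.append(x, 1.0)
theta_new = np.append(theta, theta0)

original = np.dot(theta, x) + theta0    # 2 - 4 + 3 = 1
augmented = np.dot(theta_new, x_new)    # same value, with no separate offset term

print(original, augmented)
```

With the data augmented this way, a through-origin perceptron in $d+1$ dimensions learns the offset as its last parameter.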