# Linear Models

### The linear model

$f(x) = w^Tx+b$

• Regression
• Binary classification
• Multi-class classification

#### Linear Regression

$f(x_i)=wx_i+b$, such that $f(x_i)\approx y_i$

$(w^*,b^*) = \underset{(w,b)}{\arg\min}\sum_{i=1}^{m}(f(x_i)-y_i)^2 = \underset{(w,b)}{\arg\min}\sum_{i=1}^{m}(y_i-wx_i-b)^2$

Taking the partial derivatives of $E(w,b)$ with respect to $w$ and $b$:

$\frac{\partial E(w,b)}{\partial w}=2(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i),\\ \frac{\partial E(w,b)}{\partial b}=2(mb-\sum_{i=1}^{m}(y_i-wx_i)),$

Setting both derivatives to zero yields the closed-form solution, where $\bar{x}=\frac{1}{m}\sum_{i=1}^{m}x_i$:

$w = \frac{\sum_{i=1}^{m}y_i(x_i-\bar{x})}{\sum_{i=1}^{m}x_i^2-\frac{1}{m}(\sum_{i=1}^{m}x_i)^2},\quad b = \frac{1}{m}\sum_{i=1}^{m}(y_i-wx_i)$
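The closed-form solution above is easy to check numerically. A minimal sketch, assuming hypothetical data generated from $y = 2x + 1$ plus small Gaussian noise:

```python
import numpy as np

# Hypothetical data: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0., 10., size=100)
y = 2.0 * x + 1.0 + rng.normal(0., 0.1, size=100)

m = len(x)
x_bar = x.mean()
# Closed-form least-squares estimates for w and b
w = np.sum(y * (x - x_bar)) / (np.sum(x**2) - np.sum(x)**2 / m)
b = np.mean(y - w * x)
print(w, b)  # close to 2 and 1
```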

For multivariate linear regression, $f(x_i)=w^Tx_i+b$, such that $f(x_i)\approx y_i$

$\hat{w}^*=\underset{\hat{w}}{\arg\min}\,(y-X\hat{w})^T(y-X\hat{w})$

where $X$ stacks the samples row-wise with a trailing column of ones and $\hat{w}=(w;b)$.

$\frac{\partial E_{\hat{w}}}{\partial \hat{w}} = 2X^T(X\hat{w}-y)$
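Setting this gradient to zero gives the normal equation $X^TX\hat{w}=X^Ty$. A sketch with hypothetical data (true coefficients 1.5, -2.0 and bias 0.5), using `np.linalg.lstsq` as the numerically safer route rather than inverting $X^TX$ directly:

```python
import numpy as np

# Hypothetical data with two features
rng = np.random.default_rng(1)
X_raw = rng.uniform(0., 10., size=(200, 2))
y = X_raw @ np.array([1.5, -2.0]) + 0.5 + rng.normal(0., 0.1, size=200)

# Append a column of ones so the last entry of w_hat is the bias b
X = np.hstack([X_raw, np.ones((200, 1))])
# Solve the least-squares problem (equivalent to the normal equation)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)  # approximately [1.5, -2.0, 0.5]
```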

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import linear_model
from sklearn.metrics import r2_score, mean_squared_error

# Linear model in practice
def setData(n):
    data = []
    for i in range(n):
        x1 = np.random.uniform(0., 10.)  # randomly sample input x1
        x2 = np.random.uniform(0., 10.)  # randomly sample input x2
        # sample Gaussian noise
        eps = np.random.normal(0., 0.1)
        # compute the model output
        y = 1.452 * x1 + 1.848 * x2 + 2.523 + eps
        data.append([x1, x2, y])
    data = np.array(data)

    return data

n = 270
theta = 0.3  # fraction of samples used for training

data = setData(n)

# fit_intercept=True fits the bias term b
reg = linear_model.LinearRegression(fit_intercept=True)

reg.fit(data[:int(n*theta), 0:2], data[:int(n*theta), -1])

print("w: {}\nb: {}".format(reg.coef_, [reg.intercept_]))

pre = reg.predict(data[int(n*theta):n, 0:2])

print('Coefficient of determination: %.2f' % r2_score(data[int(n*theta):n, 2], pre))
print('Mean squared error: %.2f' % mean_squared_error(data[int(n*theta):n, 2], pre))
```

```
w: [1.45605848 1.86095833]
b: [2.4422744187779024]
Coefficient of determination: 1.00
Mean squared error: 0.01
```

```python
# Plotting
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data[int(n*theta):n, 0], data[int(n*theta):n, 1], data[int(n*theta):n, 2], c='b', marker='^')
ax.scatter(data[int(n*theta):n, 0], data[int(n*theta):n, 1], pre, c='r')

plt.figure()
# 2-D view: the third positional argument of scatter sets the marker size
plt.scatter(data[int(n*theta):n, 0], data[int(n*theta):n, 1], data[int(n*theta):n, 2], alpha=0.2)
plt.scatter(data[int(n*theta):n, 0], data[int(n*theta):n, 1], pre, c='r', alpha=0.2)

# plt.show()
```



Log-linear regression maps the linear model's output to the label's logarithmic scale:

$\ln y=w^Tx+b$

More generally, a monotone differentiable link function $g$ gives the generalized linear model:

$y = g^{-1}(w^Tx+b)$
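A minimal sketch of log-linear regression, fitting $\ln y = w^Tx+b$ by regressing on $\ln y$; the data and true parameters ($w=0.3$, $b=1.0$) are hypothetical:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical data generated from ln(y) = 0.3*x + 1.0 plus noise
rng = np.random.default_rng(2)
x = rng.uniform(0., 5., size=(300, 1))
y = np.exp(0.3 * x[:, 0] + 1.0 + rng.normal(0., 0.05, size=300))

reg = linear_model.LinearRegression()
reg.fit(x, np.log(y))   # the link g is ln, so fit on the log scale
print(reg.coef_, reg.intercept_)  # close to [0.3] and 1.0

# Predictions on the original scale apply the inverse link g^{-1} = exp
y_pred = np.exp(reg.predict(x))
```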

#### Logistic Regression

```python
# Unit step function
def unit_step(x):
    y = x > 0
    return y.astype(int)

x = np.linspace(-10, 10, 200)
y = unit_step(x)
plt.scatter(x, y, alpha=0.2, c='r')
plt.plot(x, y, c='b', alpha=0.8)
plt.xlabel('x')
plt.ylabel('y')
# plt.show()
```



$y = \frac{1}{1+e^{-z}}$

```python
# Logistic (sigmoid) function
def logistic(x):
    y = 1 / (1 + np.exp(-x))
    return y

x = np.linspace(-10, 10, 200)
y = logistic(x)
plt.scatter(x, y, alpha=0.2, c='r')
plt.plot(x, y, c='b', alpha=0.8)
plt.xlabel('x')
plt.ylabel('y')
# plt.show()
```



$y = \frac{1}{1+e^{-(w^Tx+b)}},\quad \ln\frac{y}{1-y}=w^Tx+b$

$\frac{y}{1-y}$ is called the odds and reflects the relative likelihood that $x$ is a positive example; taking the logarithm of the odds gives the log odds (logit), $\ln\frac{y}{1-y}$.
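A small logistic-regression sketch on hypothetical 1-D data, where class 1 becomes more likely as $x$ grows; the generating parameters (slope 2.0, intercept -1.0) are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: P(label=1 | x) follows a sigmoid of 2x - 1
rng = np.random.default_rng(3)
x = rng.uniform(-5., 5., size=(400, 1))
p = 1 / (1 + np.exp(-(2.0 * x[:, 0] - 1.0)))
labels = (rng.uniform(size=400) < p).astype(int)

clf = LogisticRegression()
clf.fit(x, labels)
proba = clf.predict_proba(x)[:, 1]
# The fitted log odds ln(y/(1-y)) are linear in x, matching the logit above
logit = np.log(proba / (1 - proba))
print(clf.coef_, clf.intercept_)
```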

···

#### Linear Discriminant Analysis (LDA)

The idea behind LDA is very simple: given a set of training examples, project them onto a line such that the projections of same-class examples are as close together as possible while the projections of different-class examples are as far apart as possible; to classify a new sample, project it onto the same line and determine its class from the position of its projection.
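The projection idea can be sketched with scikit-learn's `LinearDiscriminantAnalysis` on two hypothetical 2-D Gaussian classes:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two hypothetical Gaussian classes with well-separated means
rng = np.random.default_rng(4)
X0 = rng.normal([0, 0], 1.0, size=(100, 2))
X1 = rng.normal([4, 4], 1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# Project onto the single direction that best separates the two classes
lda = LinearDiscriminantAnalysis(n_components=1)
z = lda.fit_transform(X, y)   # 1-D projections of the samples
print(lda.score(X, y))        # classification accuracy on the training set
```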

#### Multi-class Learning
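One common way to reduce multi-class learning to the binary classifiers above is One-vs-Rest (OvR): train one binary classifier per class and predict the class whose classifier gives the highest score. A sketch on three hypothetical Gaussian blobs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Three hypothetical, well-separated classes in 2-D
rng = np.random.default_rng(5)
centers = np.array([[0, 0], [5, 0], [0, 5]])
X = np.vstack([rng.normal(c, 0.8, size=(60, 2)) for c in centers])
y = np.repeat([0, 1, 2], 60)

# OvR trains one logistic-regression classifier per class
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, y)
print(clf.score(X, y))
```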



