# Supervised Learning

## Linear Regression

$h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n=\sum_{i=0}^n\theta_ix_i$, where $x_0=1$

$J(\theta)=\frac12\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$
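To make the cost function $J(\theta)$ concrete, here is a tiny numeric check (the data values are invented for illustration):

```python
import numpy as np

# Two training examples with one feature each (x0 = 1 prepended)
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 3.0])
theta = np.array([1.0, 1.0])        # h_theta(x) = 1 + x1

residuals = X @ theta - y           # h_theta(x^(i)) - y^(i) for each example
J = 0.5 * np.sum(residuals ** 2)
print(J)                            # 0.0: this theta fits both points exactly
```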

### Gradient Descent

The LMS rule uses gradient descent to find a minimum: $\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$. Since we need this update for every $\theta_j$, we must compute the partial derivative of $J(\theta)$ with respect to $\theta_j$. Following the notes, assume there is only one training example $(x,y)$:

$\frac{\partial}{\partial\theta_j}J(\theta)=\frac12\frac{\partial}{\partial\theta_j}(h_\theta(x)-y)^2$

$=2\cdot\frac12(h_\theta(x)-y)\cdot\frac{\partial}{\partial\theta_j}(h_\theta(x)-y)$

$=(h_\theta(x)-y)\cdot\frac{\partial}{\partial\theta_j}\left(\sum_{i=0}^n\theta_ix_i-y\right)$ (the upper limit $n$ of the sum is the number of features)

$=(h_\theta(x)-y)\,x_j$ ($x_j$ is the feature value paired with the parameter $\theta_j$ being updated)
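The derivative just derived can be sanity-checked numerically against central finite differences; a minimal sketch (the example data, parameters, and step size are invented for illustration):

```python
import numpy as np

# One training example with x0 = 1, plus an arbitrary parameter vector
x = np.array([1.0, 2.0, -1.0])      # x0, x1, x2
y = 3.0
theta = np.array([0.5, -0.3, 0.8])

def J(theta):
    # Cost for a single example: (1/2) * (h_theta(x) - y)^2
    return 0.5 * (np.dot(theta, x) - y) ** 2

# Analytic gradient from the derivation: (h_theta(x) - y) * x_j
analytic = (np.dot(theta, x) - y) * x

# Central finite differences for comparison
eps = 1e-6
numeric = np.array([
    (J(theta + eps * np.eye(3)[j]) - J(theta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])

print(np.allclose(analytic, numeric))   # True: the two gradients agree
```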

$n$ = number of features

$x^{(i)}$ = the $i$-th training example

$x_j^{(i)}$ = the value of feature $j$ in the $i$-th training example

For convenience of notation, define $x_0=1$, so $x_0^{(i)}=1$.

$x=\left[\begin{matrix}x_0\\x_1\\x_2\\\vdots\\x_n\end{matrix}\right]$, $\theta = \left[\begin{matrix}\theta_0\\\theta_1\\\theta_2\\\vdots\\\theta_n\end{matrix}\right]$

So we get: $h_\theta(x)=\theta^\intercal x$
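As a quick numeric illustration of the dot-product form (the values are invented for the example):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 4.0, 5.0])       # x_0 = 1, then the two feature values

h = np.dot(theta, x)                # theta^T x = 1*1 + 2*4 + 3*5
print(h)                            # 24.0
```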

#### Batch Gradient Descent

Repeat until convergence {
$\theta_j:=\theta_j+\alpha\sum_{i=1}^m\left(y^{(i)}-h_\theta(x^{(i)})\right)x_j^{(i)}$   (for every $j$)
}

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

def batchGradientDescent(X, Y, alpha, numIterations):
    m = X.shape[0]                        # number of training examples
    n = X.shape[1] + 1                    # parameters = features + intercept
    X = np.column_stack((np.ones(m), X))  # prepend x0 = 1 to every example
    X = X.transpose()                     # shape (n, m): one column per example
    theta = np.zeros(n)
    for it in range(numIterations):
        hypothesis = np.dot(theta, X)     # predictions for all m examples
        loss = hypothesis - Y
        for j in range(n):
            aJ = np.sum(loss * X[j]) / m  # gradient of J w.r.t. theta_j, averaged over m
            theta[j] = theta[j] - alpha * aJ
    return theta

X, Y = make_regression(n_samples=200, n_features=1, n_informative=1,
                       random_state=0, noise=50)
theta = batchGradientDescent(X, Y, 0.01, 1000)
plt.plot(X, Y, '.')
x = np.arange(-3, 3, 0.01)
plt.plot(x, theta[0] + theta[1] * x)      # fitted line
```
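Since $J(\theta)$ is quadratic, its minimizer also has a closed form via the normal equations (an addition, not covered in the section above); a self-contained sketch on the same synthetic data, handy for checking what gradient descent should converge to:

```python
import numpy as np
from sklearn.datasets import make_regression

# Same synthetic data as above
X, Y = make_regression(n_samples=200, n_features=1, n_informative=1,
                       random_state=0, noise=50)
Xb = np.column_stack((np.ones(X.shape[0]), X))   # prepend x0 = 1

# Normal equations: theta = (Xb^T Xb)^{-1} Xb^T Y
theta_exact = np.linalg.solve(Xb.T @ Xb, Xb.T @ Y)

# At the minimum the residual is orthogonal to every feature column
residual = Y - Xb @ theta_exact
print(np.allclose(Xb.T @ residual, 0, atol=1e-6))   # True
```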

#### Stochastic Gradient Descent

Loop {
for i = 1 to m {
$\theta_j:=\theta_j+\alpha\left(y^{(i)}-h_\theta(x^{(i)})\right)x_j^{(i)}$   (for every $j$)
}
}

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression

def stochasticGradientDescent(X, Y, alpha, numIterations):
    m = X.shape[0]
    n = X.shape[1] + 1
    X = np.column_stack((np.ones(m), X))  # prepend x0 = 1
    X = X.transpose()                     # shape (n, m): one column per example
    theta = np.zeros(n)
    for it in range(numIterations):
        for i in range(m):                # one update per training example
            hypothesis = np.dot(theta, X[:, i])
            loss = hypothesis - Y[i]
            for j in range(n):
                theta[j] = theta[j] - alpha * loss * X[j, i]
    return theta

x, y = make_regression(n_samples=200, n_features=1, n_informative=1,
                       random_state=0, noise=50)
alpha = 0.01
theta = stochasticGradientDescent(x, y, alpha, 1000)
plt.plot(x, y, '.')
s = np.arange(-3, 3, 0.01)
plt.plot(s, theta[0] + theta[1] * s)      # fitted line
```
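With a fixed step size $\alpha$, stochastic gradient descent keeps oscillating around the minimum rather than settling on it. A common remedy (an addition, not from the original notes) is to decay $\alpha$ over time; a minimal sketch on invented toy data, with a hypothetical decay schedule:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: y = 2*x + 1 plus a little noise
x = rng.uniform(-1, 1, 100)
y = 2 * x + 1 + 0.1 * rng.standard_normal(100)
X = np.column_stack((np.ones(100), x))   # prepend x0 = 1

theta = np.zeros(2)
for epoch in range(50):
    alpha = 0.1 / (1 + epoch)            # decaying step size (hypothetical schedule)
    for i in rng.permutation(100):       # shuffle the examples each pass
        loss = np.dot(theta, X[i]) - y[i]
        theta -= alpha * loss * X[i]

print(theta)   # close to [1, 2]
```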