Machine Learning
1. Linear Regression with One Variable
1. Basic theory
- Objective Function (Hypothesis): $h_{\theta}(x) = \theta_0 + \theta_1 x$
- Parameters: $\theta_0, \theta_1$
- Cost/Loss Function: $J(\theta_0,\theta_1) = \frac{1}{2m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})^2$ (squared error function)
- Goal: $\underset{\theta_0,\theta_1}{\text{minimize}}\ J(\theta_0,\theta_1)$
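For concreteness, here is a minimal NumPy sketch of evaluating this cost function; the toy data and the $\theta$ values are invented for illustration:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) for one-variable linear regression."""
    m = len(x)
    predictions = theta0 + theta1 * x          # h_theta(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy data drawn from y = 2x, so theta0 = 0, theta1 = 2 gives zero cost.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(cost(0.0, 2.0, x, y))  # 0.0 (perfect fit)
print(cost(0.0, 1.0, x, y))  # larger: a worse fit costs more
```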
2. Gradient descent algorithm
repeat until convergence {
&nbsp;&nbsp;&nbsp;&nbsp;$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$ (for $j = 0$ and $j = 1$)
}
where $\alpha$ is the learning rate.
Simultaneous update
$temp0 := \theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$
$temp1 := \theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$
$\theta_0 := temp0$
$\theta_1 := temp1$
Substituting the partial derivatives of $J$ gives the concrete update rules:

repeat until convergence {
&nbsp;&nbsp;&nbsp;&nbsp;$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})$
&nbsp;&nbsp;&nbsp;&nbsp;$\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\,x^{(i)}$
} (update $\theta_0$ and $\theta_1$ simultaneously)
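A minimal NumPy sketch of these update rules; the learning rate, iteration count, and toy data are arbitrary choices for illustration:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, iterations=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        error = (theta0 + theta1 * x) - y            # h_theta(x^(i)) - y^(i)
        temp0 = theta0 - alpha * np.sum(error) / m
        temp1 = theta1 - alpha * np.sum(error * x) / m
        theta0, theta1 = temp0, temp1                # simultaneous update
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])                   # data from y = 2x + 1
print(gradient_descent(x, y, alpha=0.05, iterations=5000))  # approx (1.0, 2.0)
```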
2. Linear Regression with Multiple Variables
1. Basic theory
$h_\theta(x) = \theta_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n$
For convenience of notation, define $x_0 = 1$ (so $x^{(i)}_0 = 1$ for every example).
Let

$$feature\ vector\ \boldsymbol{x} = \left[\begin{matrix} x_0\\x_1\\x_2\\\vdots\\x_n \end{matrix}\right] \in \mathbb{R}^{n+1} \qquad parameter\ vector\ \boldsymbol{\theta} = \left[\begin{matrix} \theta_0\\\theta_1\\\theta_2\\\vdots\\\theta_n \end{matrix}\right] \in \mathbb{R}^{n+1}$$
So,
$$h_\theta(x) = \theta_0x_0 + \theta_1x_1 + \theta_2x_2 + \cdots + \theta_nx_n = \boldsymbol{\theta}^T\boldsymbol{x}$$
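With $x_0 = 1$ prepended, the hypothesis reduces to a single dot product. A minimal NumPy sketch; the feature values and $\theta$ below are invented for illustration:

```python
import numpy as np

theta = np.array([1.0, 0.5, -2.0])   # [theta0, theta1, theta2]
x_raw = np.array([3.0, 4.0])         # original features x1, x2
x = np.concatenate(([1.0], x_raw))   # prepend x0 = 1
h = theta @ x                        # h_theta(x) = theta^T x
print(h)                             # 1.0 + 0.5*3.0 - 2.0*4.0 = -5.5
```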
3. Support Vector Machine
SVM application example
1. A simple example
2. Separating hyperplane
```python
# Plotting a separating hyperplane with sklearn
print(__doc__)
import numpy as np
import pylab as pl  # plotting
from sklearn import svm

# Create 40 points
np.random.seed(0)  # fix the seed so the random samples are the same on every run
# Generate training examples that are guaranteed to be linearly separable.
# np.r_ concatenates the two matrices along the row axis.
# random.randn(a, b) produces an a-by-b matrix of standard-normal samples.
# randn(20, 2) - [2, 2] subtracts 2 from both entries of every row.
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
# Two classes with 20 points each; Y holds the 40 labels
Y = [0] * 20 + [1] * 20

# Build the SVM model
clf = svm.SVC(kernel="linear")
clf.fit(X, Y)

# Recover the separating hyperplane.
# Original equation: w0*x0 + w1*x1 + b = 0
# Rewrite it in slope-intercept form, treating x0 as x, x1 as y, and b as w2:
# y = -(w0/w1)x - (w2/w1)
w = clf.coef_[0]  # w is 2-dimensional; coef_ gives w = [w0, w1]
a = -w[0] / w[1]  # slope
xx = np.linspace(-5, 5)  # evenly spaced values from -5 to 5
yy = a * xx - (clf.intercept_[0]) / w[1]  # plug in x to get the line

# Draw the two margin lines: parallel to the hyperplane and passing
# through the support vectors (same slope, different intercepts)
b = clf.support_vectors_[0]  # the first support vector
yy_down = a * xx + (b[1] - a * b[0])
b = clf.support_vectors_[-1]  # the last support vector
yy_up = a * xx + (b[1] - a * b[0])
# Note: b here is a fixed point, not the bias term
# print(clf.support_vectors_)

# Inspect the fitted parameters
print("w:", w)
print("a:", a)
print("support_vectors_:", clf.support_vectors_)
print("clf.coef_:", clf.coef_)
print("X:", X)
print("Y:", Y)
# In scikit-learn, coef_ stores the parameter vector(s) of the separating
# hyperplane of a linear model, with shape (n_classes, n_features):
# (1, n_features) means binary classification; n_classes > 1 is multi-class.

# Plot the separating hyperplane, the margin lines, and the sample points
pl.plot(xx, yy, 'k-')
pl.plot(xx, yy_down, 'k--')
pl.plot(xx, yy_up, 'k--')
# Circle the support vectors (edgecolors keeps the hollow circles visible)
pl.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=80, facecolors="none", edgecolors="k")
pl.scatter(X[:, 0], X[:, 1], c=Y, cmap=pl.cm.Paired)
pl.axis("tight")
pl.show()
```
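As a quick sanity check, the fitted model can also be queried directly. This small follow-up reuses `clf` from the script above (run it after that script); the two test points are made up, one near each cluster:

```python
# Reuses clf from the script above. Points near (-2, -2) should fall in
# class 0, points near (2, 2) in class 1.
print(clf.predict([[-2, -2], [2, 2]]))            # expected: [0 1]
print(clf.decision_function([[-2, -2], [2, 2]]))  # signed decision values: negative vs. positive side of the hyperplane
```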