Multivariate Linear Regression
- Multiple Features
- $x_j^{(i)}$: the value of feature $j$ in the $i$-th training example ($i$ indexes training examples, $j$ indexes features)
- Transformation
Before: $h_{\theta}(x) = \theta_{0} + \theta_{1} x_{1} + \theta_{2} x_{2} + \cdots + \theta_{n} x_{n}$
Define $x_{0} = 1$.
After: $h_{\theta}(x) = \theta^{T} x = \theta_{0} x_{0} + \theta_{1} x_{1} + \theta_{2} x_{2} + \cdots + \theta_{n} x_{n}$
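The vectorized form $h_\theta(x) = \theta^T x$ can be sketched in Python/NumPy (an illustration added to these notes, not from the original Octave material; the values of `theta` and `x` are made up):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])   # [theta_0, theta_1, theta_2], arbitrary values
x = np.array([5.0, 6.0])            # one training example, without x_0

x = np.insert(x, 0, 1.0)            # prepend the convention x_0 = 1
h = theta @ x                       # h_theta(x) = theta^T x
print(h)                            # 1*1 + 2*5 + 3*6 = 29.0
```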
- Gradient descent for multiple variables
New algorithm $(n \geq 1)$ (multiple feature variables):
Repeat {
$\theta_{j} := \theta_{j} - \alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$
(simultaneously update $\theta_{j}$ for $j = 0, \ldots, n$)
}
Note: for each example $i$, first compute $\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$, then sum over $i$.
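The simultaneous update above can be sketched in Python/NumPy (an added illustration; the tiny data set fitting $y = 2x$ is made up):

```python
import numpy as np

def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j.
    X carries a leading column of ones (x_0 = 1)."""
    m = len(y)
    error = X @ theta - y          # h_theta(x^(i)) - y^(i) for all i
    grad = (X.T @ error) / m       # (1/m) * sum_i error_i * x_j^(i), for each j
    return theta - alpha * grad

# Made-up data where y = 2*x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)                       # converges toward [0, 2]
```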
- Gradient descent in practice I: Feature Scaling
- Goal:
Get every feature into approximately a $-1 \leq x_{i} \leq 1$ range.
Keeping the feature ranges roughly comparable lets gradient descent converge faster.
- Method:
Mean normalization:
new feature value = (feature value - mean) / range
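Mean normalization can be sketched as follows (a Python/NumPy illustration added to these notes; the feature matrix is made up):

```python
import numpy as np

def mean_normalize(X):
    """new value = (value - mean) / range, per feature column."""
    mu = X.mean(axis=0)
    rng = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / rng

# Made-up features on very different scales: house size and number of bedrooms
X = np.array([[2104.0, 3.0], [1600.0, 2.0], [1000.0, 4.0]])
Xn = mean_normalize(X)
print(Xn)   # every entry now lies within [-1, 1], each column has mean 0
```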
- Gradient descent in practice II: Learning rate
- Checking for convergence:
Plot the cost function against the number of iterations.
If the cost function does not decrease as the iteration count grows, decrease the learning rate.
- Choosing the learning rate:
Try values roughly 3x apart: 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1
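The convergence check can be done numerically as well as visually: run a few iterations and verify the cost decreases. A Python/NumPy sketch (added illustration, same made-up $y = 2x$ toy set):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/2m) * sum of squared errors."""
    err = X @ theta - y
    return float(err @ err) / (2 * len(y))

def run_gd(alpha, iters=50):
    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # made-up data
    y = np.array([2.0, 4.0, 6.0])
    theta = np.zeros(2)
    J = [cost(theta, X, y)]
    for _ in range(iters):
        theta -= alpha * (X.T @ (X @ theta - y)) / len(y)
        J.append(cost(theta, X, y))
    return J

for alpha in (0.01, 0.1):
    J = run_gd(alpha)
    print(alpha, J[0] > J[-1])     # cost should decrease for a workable alpha
```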
- Features and polynomial regression
$$\begin{aligned} h_{\theta}(x) &= \theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\theta_{3} x_{3} \\ &= \theta_{0}+\theta_{1}(\operatorname{size})+\theta_{2}(\operatorname{size})^{2}+\theta_{3}(\operatorname{size})^{3} \\ x_{1} &= (\operatorname{size}) \\ x_{2} &= (\operatorname{size})^{2} \\ x_{3} &= (\operatorname{size})^{3} \end{aligned}$$
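Building the polynomial features $x_1, x_2, x_3$ from a single size column can be sketched like this (a Python/NumPy illustration added here; the size values are made up). Note how the ranges of the three columns diverge, which is why feature scaling matters for polynomial regression:

```python
import numpy as np

size = np.array([1.0, 2.0, 3.0])          # made-up "size" feature
# Columns: x_0 = 1, x_1 = size, x_2 = size^2, x_3 = size^3
X = np.column_stack([np.ones_like(size), size, size**2, size**3])
print(X)
```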
- Normal equation
- Goal:
Set the derivative of the cost function to zero and solve for the optimum directly, with no iteration.
- Expression:
Setting the gradient of the cost to zero gives $X^{T} X \theta = X^{T} y$, hence:
$\theta = \left(X^{T} X\right)^{-1} X^{T} y$
- Note:
Feature scaling is not needed when using the normal equation.
- Normal equation vs. gradient descent
| Gradient descent | Normal equation |
|---|---|
| Must choose a learning rate $\alpha$ | No learning rate to choose |
| Needs many iterations | No iteration |
| Complexity $O(kn^2)$ | Complexity $O(n^3)$: needs the inverse of $X^{T}X$ |
| Works well even when the number of features is large | Slow when the number of features is large |
Note: with the normal equation, computing the matrix inverse is expensive; when the number of features exceeds roughly 10,000, gradient descent is recommended instead.
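The closed-form solution is a one-liner in Python/NumPy (an added illustration; the toy data fitting $y = 2x$ is made up):

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # first column is x_0 = 1
y = np.array([2.0, 4.0, 6.0])

# theta = (X^T X)^{-1} X^T y, using the pseudo-inverse for robustness
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)                                          # [0, 2] up to rounding
```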
- Normal equation and non-invertibility
- Note:
Solving for $\theta$ above uses $\left(X^{T} X\right)^{-1}$, so we must consider whether $X^{T} X$ is invertible.
- Reasons $X^{T} X$ may be non-invertible:
Redundant features (linearly dependent columns) or too many features (e.g. $m \leq n$).
- Octave operations
- Both `pinv` and `inv` compute a matrix inverse; `pinv` computes the pseudo-inverse, so it returns a result even when the ordinary inverse does not exist.
- Code to solve the equation: `pinv(X'*X)*X'*y`
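The difference between `inv` and `pinv` shows up on a singular matrix; here is the analogous check in Python/NumPy (an added illustration; the matrix `A`, with one column twice the other, is made up):

```python
import numpy as np

# A has a redundant (linearly dependent) column, so it is singular
A = np.array([[1.0, 2.0], [2.0, 4.0]])
print(np.linalg.matrix_rank(A))    # rank 1: not invertible

# np.linalg.inv(A) would raise LinAlgError; pinv still works
P = np.linalg.pinv(A)
print(P)                           # pseudo-inverse satisfies A @ P @ A == A
```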
Octave Tutorial
- Basic operations
```
1 ~= 2                  % not equal
1 && 0                  % logical AND
1 || 0                  % logical OR
xor(1, 0)
disp(sprintf('2 decimals: %0.2f', a))
v = 1:0.1:2             % row vector from 1 to 2 in steps of 0.1
w = -6 + sqrt(10)*(randn(1,10000));
hist(w)
hist(w, 50)             % plot histogram using 50 bins
I = eye(4)              % 4x4 identity matrix
```
- Moving Data Around
```
size(A)                 % dimensions of A
size(A, 1)              % number of rows
length(v)               % length of the longest dimension
load q1y.dat
whos                    % list variables in workspace (detailed view)
clear q1y               % clear without any args clears all vars
save hello.mat v
save hello.txt v -ascii
A = [A, [100; 101; 102]];  % append column vector
A(:)                    % select all elements as a column vector
C = [A, B]              % concatenate A and B side by side
C = [A; B]              % concatenate A and B top and bottom
```
- Computing on data
```
A * C                   % matrix multiplication
A .* B                  % element-wise multiplication
A .^ 2                  % element-wise square
v + ones(length(v), 1)
A'                      % matrix transpose
[val, ind] = max(a)
find(a < 3)
[r, c] = find(A >= 7)   % row, column indices for values matching comparison
sum(a)
prod(a)                 % product of the elements
floor(a)                % round down
ceil(a)                 % round up
max(A, [], 1)           % maximum along columns
max(A, [], 2)           % maximum along rows
sum(sum(A .* eye(9)))          % sum of the main diagonal
sum(sum(A .* flipud(eye(9))))  % sum of the anti-diagonal
pinv(A)                 % pseudo-inverse
```
- Plotting Data
```
plot(t, y2, 'r');
legend('sin', 'cos');
print -dpng 'myPlot.png'
close;
figure(2), clf;
axis([0.5 1 -1 1]);
figure;
imagesc(magic(15)), colorbar, colormap gray;   % display a matrix as a grayscale image
```
- Control statements and functions
```
i = 1;
while true,
  v(i) = 999;
  i = i + 1;
  if i == 6,
    break;
  end;
end

% a function returning one value (saved as squareThisNumber.m)
function y = squareThisNumber(x)
  y = x^2;

squareThisNumber(2)

% one function, two return values
function [y1, y2] = squareandCubeThisNo(x)
  y1 = x^2;
  y2 = x^3;

[a, b] = squareandCubeThisNo(x)
```
- Vectorization
Goal: simpler code and better computational efficiency.
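The point of vectorization can be shown by computing $h_\theta(x)$ both ways (a Python/NumPy illustration added here; values are made up):

```python
import numpy as np

theta = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 5.0, 6.0])      # x_0 = 1 already included

# Unvectorized: explicit loop over j
h_loop = 0.0
for j in range(len(theta)):
    h_loop += theta[j] * x[j]

# Vectorized: a single inner product, shorter and faster
h_vec = theta @ x
print(h_loop, h_vec)               # identical results: 29.0
```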
- submit: submits assignments for grading
- If a submission token is needed, copy it from the submission page on the course website.