[Study Notes] Andrew Ng's Machine Learning, Week 2: Linear Regression & Octave Tutorial

Multivariate Linear Regression

  1. Multiple Features

    1. $x_{j}^{(i)}$: the value of feature $j$ in the $i$-th training example ($i$ indexes training examples, $j$ indexes features)
    2. Vectorized form
      Before: $h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\cdots+\theta_{n} x_{n}$
      Define $x_{0}=1$
      Now: $h_{\theta}(x)=\theta^{T} x=\theta_{0} x_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\cdots+\theta_{n} x_{n}$ (a minimal Octave sketch follows)
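
As a quick check on the vectorized form, the hypothesis for all training examples can be computed in one matrix product. A minimal Octave sketch (the values in X and theta below are made up for illustration):

```matlab
X = [1 2104 5; 1 1416 3; 1 1534 3];  % m x (n+1) design matrix; first column is x0 = 1
theta = [89; 0.1; 10];               % (n+1) x 1 parameter vector (made-up values)
h = X * theta;                       % m x 1 vector of predictions h_theta(x^(i))
```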
  2. Gradient descent for multiple variables
    New algorithm $(n \geq 1)$ (multiple features):
    Repeat {

    $\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$

    (simultaneously update $\theta_{j}$ for $j=0, \ldots, n$)
    }
    Note: for each example, first compute $\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}$, then sum over the examples (see the vectorized sketch below).
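
The whole update collapses into one vectorized step per iteration. A minimal sketch, assuming $X$ already includes the $x_{0}=1$ column and that alpha and num_iters are assumed values you would tune:

```matlab
m = length(y);       % number of training examples
alpha = 0.01;        % learning rate (assumed value; see section 4)
num_iters = 400;     % iteration count (assumed value)
for iter = 1:num_iters,
  % X' * (X*theta - y) evaluates sum_i (h(x^(i)) - y^(i)) * x_j^(i) for every j,
  % so all theta_j are updated simultaneously.
  theta = theta - (alpha/m) * X' * (X*theta - y);
end;
```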

  3. Gradient descent in practice I: Feature Scaling

    1. Goal
      Get every feature into approximately a $-1 \leq x_{i} \leq 1$ range.
      Keeping the feature ranges roughly comparable lets gradient descent converge much faster.
    2. Method
      Mean normalization:
      new feature value = (feature value - mean) / range (see the sketch below)
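
A minimal mean-normalization sketch, assuming X is the m x n feature matrix without the $x_{0}$ column (std(X) also works in place of the range):

```matlab
mu = mean(X);            % 1 x n row vector of feature means
s  = max(X) - min(X);    % 1 x n row vector of feature ranges
X_norm = (X - mu) ./ s;  % broadcasting: each feature now spans roughly [-1, 1]
```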
  4. Gradient descent in practice II: Learning rate

    1. Checking for convergence
      Plot the cost function against the number of iterations.
      If the cost function stops decreasing as iterations increase, reduce the learning rate.
    2. Choosing the learning rate
      Try values spaced roughly 3x apart: 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1 (see the sketch below)
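
To judge a candidate alpha, track the cost after each update and plot the curve; a sketch reusing the variables from the gradient-descent sketch above:

```matlab
J_history = zeros(num_iters, 1);
for iter = 1:num_iters,
  theta = theta - (alpha/m) * X' * (X*theta - y);
  J_history(iter) = (1/(2*m)) * sum((X*theta - y).^2);  % cost after this update
end;
plot(1:num_iters, J_history);             % curve should decrease every iteration
xlabel('iteration'); ylabel('J(theta)');
```

If the curve rises or oscillates, move down to the next smaller value in the list above.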
  5. Features and polynomial regression
    $\begin{aligned} h_{\theta}(x) &=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\theta_{3} x_{3} \\ &=\theta_{0}+\theta_{1}(\operatorname{size})+\theta_{2}(\operatorname{size})^{2}+\theta_{3}(\operatorname{size})^{3} \end{aligned}$
    where $x_{1}=(\operatorname{size})$, $x_{2}=(\operatorname{size})^{2}$, $x_{3}=(\operatorname{size})^{3}$ (see the sketch below)
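
Building the polynomial terms just means adding new columns derived from the one raw feature. A sketch, where size_ft stands for a hypothetical column vector of house sizes:

```matlab
m = length(size_ft);
X = [ones(m,1), size_ft, size_ft.^2, size_ft.^3];  % x0, x1, x2, x3
% Feature scaling is essential here: if size is ~10^3, size^3 is ~10^9,
% so normalize each column before running gradient descent.
```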

  6. Normal equation

    1. Purpose
      Set the derivative of the cost function to zero and solve for the optimal $\theta$ directly, with no iteration.
    2. Expression
      Solving $X\theta=y$ in the least-squares sense yields:
      $\theta=\left(X^{T} X\right)^{-1} X^{T} y$
      (an Octave sketch follows the comparison table below)
    3. Note
      Feature scaling is not needed when using the normal equation.
    4. Comparison of gradient descent and the normal equation

| Gradient descent | Normal equation |
| --- | --- |
| Must choose a learning rate | No learning rate to choose |
| Needs many iterations | No iteration |
| Complexity $O(kn^{2})$ | Complexity $O(n^{3})$; must compute the inverse of $X^{T}X$ |
| Works well even with many features | Becomes slow with many features |

Note: with the normal equation, computing the matrix inverse is expensive; once the number of features exceeds roughly 10,000, gradient descent is recommended instead.
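
A minimal normal-equation sketch, where features stands for a hypothetical m x n matrix of raw feature columns:

```matlab
m = size(features, 1);
X = [ones(m,1), features];      % m x (n+1) design matrix
theta = pinv(X' * X) * X' * y;  % closed-form optimum: no alpha, no iterations
```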

  7. Normal equation and non-invertibility
    1. Issue
      Solving for $\theta$ above uses $\left(X^{T} X\right)^{-1}$, so we must ask whether $X^{T} X$ is actually invertible.
    2. Causes of $X^{T} X$ being non-invertible
      Redundant features (linearly dependent columns) or too many features (more features than training examples)
    3. Octave operations
      1. Both pinv and inv can invert a matrix; pinv computes the pseudo-inverse, which exists even when the true inverse does not.
      2. Code to solve the equation: theta = pinv(X'*X)*X'*y
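
A quick illustration of why pinv is preferred (the 2x2 matrix below is a made-up singular example):

```matlab
A = [1 2; 2 4];  % singular: the second row is twice the first
inv(A)           % Octave warns that the matrix is singular to machine precision
pinv(A)          % returns the Moore-Penrose pseudo-inverse instead
```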

Octave Tutorial

  1. Basic operations
    1 ~= 2                  % not equal (returns 1, i.e. true)
    1 && 0                  % logical AND
    1 || 0                  % logical OR
    xor(1,0)                % exclusive OR
    disp(sprintf('2 decimals: %0.2f', a))   % C-style formatted printing of a
    v = 1:0.1:2             % row vector from 1 to 2 in steps of 0.1
    w = -6 + sqrt(10)*(randn(1,10000));     % 10000 Gaussian samples, mean -6, variance 10
    hist(w)                 % plot histogram
    hist(w,50)              % plot histogram using 50 bins
    I = eye(4)              % 4x4 identity matrix
    
  2. Moving Data Around
    size(A)                 % matrix dimensions as a 1x2 vector [rows cols]
    size(A,1)               % number of rows
    length(v)               % size of the longest dimension
    load q1y.dat            % load data file into variable q1y
    whos                    % list variables in workspace (detailed view)
    clear q1y               % remove q1y (clear with no args removes all vars)
    save hello.mat v        % save v in binary .mat format
    save hello.txt v -ascii % save v as human-readable text
    A = [A, [100; 101; 102]];       % append a column vector
    A(:)                    % select all elements as a column vector
    C = [A, B]              % concatenate A and B side by side
    C = [A; B]              % concatenate A and B top and bottom
    
  3. Computing on data
    A * C                   % matrix multiplication
    A .* B                  % element-wise multiplication
    A .^ 2                  % element-wise squaring
    v + ones(length(v), 1)  % add 1 to every element of v
    A'                      % matrix transpose
    [val,ind] = max(a)      % maximum value and its index
    find(a < 3)             % indices of elements satisfying the condition
    [r,c] = find(A>=7)      % row, column indices for values matching comparison
    sum(a)                  % sum of elements
    prod(a)                 % product of elements
    floor(a)                % round down; ceil(a) rounds up
    max(A,[],1)             % maximum along columns
    max(A,[],2)             % maximum along rows
    sum(sum( A .* eye(9) ))             % sum of the main diagonal
    sum(sum( A .* flipud(eye(9)) ))     % sum of the anti-diagonal (flipud flips rows)
    pinv(A)                 % pseudo-inverse
    
  4. Plotting Data
    plot(t,y2,'r');         % plot y2 against t in red
    legend('sin','cos');    % add a legend
    print -dpng 'myPlot.png'        % save the current figure as a PNG file
    close;                  % close the current figure
    figure(2), clf;         % switch to figure 2 and clear it
    axis([0.5 1 -1 1]);     % x range [0.5, 1], y range [-1, 1]

    figure;
    imagesc(magic(15)), colorbar, colormap gray;    % show a matrix as a grayscale image
    
  5. Control statements & Functions
```matlab
v = zeros(10,1);
for i = 1:10,
  v(i) = 2^i;          % fill v with powers of 2
end;
i = 1;
while true,
  v(i) = 999;          % overwrite entries until i reaches 6
  i = i+1;
  if i == 6,
    break;             % exit the otherwise infinite loop
  end;
end

% Functions live in .m files named after the function (squareThisNumber.m)
function y = squareThisNumber(x)
  y = x^2;
end

squareThisNumber(2)    % returns 4

% One function with two return values (squareandCubeThisNo.m)
function [y1, y2] = squareandCubeThisNo(x)
  y1 = x^2;
  y2 = x^3;
end

[a,b] = squareandCubeThisNo(5)   % a = 25, b = 125
```
  6. Vectorization
    Purpose: simpler code and faster computation (see the sketch below)
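
The standard illustration from the course: computing one prediction with an explicit loop versus a single inner product. Both evaluate the same $h_{\theta}(x)$, but the vectorized line is shorter and uses Octave's optimized linear-algebra routines:

```matlab
% Unvectorized: explicit sum over the n+1 terms
prediction = 0.0;
for j = 1:n+1,
  prediction = prediction + theta(j) * x(j);
end;

% Vectorized: one inner product, same result
prediction = theta' * x;
```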

Notes:

  1. Use the submit command to submit assignments.
  2. If a submission token is needed, copy it from the submission page on the course website.