4.1 Multiple features
Multiple features (variables)
| Size ($x_1$) | Number of bedrooms ($x_2$) | Number of floors ($x_3$) | Age of home ($x_4$) | Price ($y$) |
|---|---|---|---|---|
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 1534 | 3 | 2 | 30 | 315 |
| 852 | 2 | 1 | 36 | 178 |
| … | … | … | … | … |
Notation:
- $n$ = number of features
- $x^{(i)}$ = input (features) of the $i^{th}$ training example
- $x_j^{(i)}$ = value of feature $j$ in the $i^{th}$ training example
Hypothesis:
- Previously: $h_\theta(x) = \theta_0 + \theta_1 x$
- Now: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \theta_4 x_4$
Linear regression with multiple features is called multivariate linear regression.
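As a sketch, the multivariate hypothesis is just an intercept plus a weighted sum of the features. A minimal pure-Python version, evaluated on the first row of the table above (the $\theta$ values are illustrative, not fitted):

```python
def hypothesis(theta, x):
    """Compute h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n.

    theta has n+1 entries (theta[0] is the intercept term);
    x has n entries (the features of one training example).
    """
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

# First row of the table: size=2104, bedrooms=5, floors=1, age=45
theta = [80.0, 0.1, 10.0, 1.0, -2.0]   # illustrative parameter values
print(hypothesis(theta, [2104, 5, 1, 45]))  # ≈ 251.4
```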
4.2 Gradient descent for multiple variables
How do we fit the parameters of this hypothesis? Gradient descent generalizes directly to multiple features: repeat until convergence, simultaneously updating every parameter

$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$　(for $j = 0, 1, \dots, n$, with $x_0^{(i)} = 1$)
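A sketch of batch gradient descent for linear regression with multiple features, in pure Python; the tiny dataset, $\alpha$, and iteration count are made up for illustration:

```python
def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for multivariate linear regression.

    X: list of training examples, each with a leading 1 for x_0.
    y: list of targets. Returns the fitted parameter vector theta.
    """
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        # errors h_theta(x^(i)) - y^(i) for every training example
        errs = [sum(t * xj for t, xj in zip(theta, x)) - yi
                for x, yi in zip(X, y)]
        # simultaneous update of every theta_j
        theta = [theta[j] - alpha * sum(e * x[j] for e, x in zip(errs, X)) / m
                 for j in range(n)]
    return theta

# Toy data generated from y = 1 + 2*x_1, so theta should approach [1, 2]
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
theta = gradient_descent(X, y, alpha=0.1, iters=2000)
print(theta)  # close to [1.0, 2.0]
```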
4.3 Gradient descent in practice I: Feature scaling
Feature scaling: get every feature into approximately a $-1 \leqslant x_i \leqslant 1$ range, so that gradient descent converges faster.
Mean normalization: replace $x_i$ with $x_i - \mu_i$ so each feature has roughly zero mean (not applied to $x_0 = 1$); commonly combined with scaling as $x_i := \dfrac{x_i - \mu_i}{s_i}$, where $s_i$ is the range ($\max - \min$) or the standard deviation of the feature.
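A sketch of mean normalization applied to the size column from the table above, in pure Python; dividing by the range is one common choice for $s_i$:

```python
def mean_normalize(values):
    """Scale one feature column: subtract the mean, divide by the range."""
    mu = sum(values) / len(values)
    s = max(values) - min(values)   # could also use the standard deviation
    return [(v - mu) / s for v in values]

sizes = [2104, 1416, 1534, 852]
scaled = mean_normalize(sizes)
print(scaled)  # every value now lies roughly in [-0.5, 0.5]
```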
4.4 Gradient descent in practice II: Learning rate
To check that gradient descent is working, plot $J(\theta)$ against the number of iterations: $J(\theta)$ should decrease on every iteration. If $\alpha$ is too small, convergence is slow; if $\alpha$ is too large, $J(\theta)$ may increase or oscillate and may never converge. Try values spaced roughly threefold apart, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, ….
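A sketch of the convergence check for choosing the learning rate: run a few gradient-descent iterations and verify that the cost $J(\theta)$ decreases; if it grows instead, $\alpha$ is too large. The dataset and the $\alpha$ values below are made up for illustration:

```python
def cost(theta, X, y):
    """Squared-error cost J(theta) with the usual 1/(2m) factor."""
    m = len(X)
    return sum((sum(t * xj for t, xj in zip(theta, x)) - yi) ** 2
               for x, yi in zip(X, y)) / (2 * m)

def descend_once(theta, X, y, alpha):
    """One simultaneous gradient-descent update of all parameters."""
    m, n = len(X), len(X[0])
    errs = [sum(t * xj for t, xj in zip(theta, x)) - yi
            for x, yi in zip(X, y)]
    return [theta[j] - alpha * sum(e * x[j] for e, x in zip(errs, X)) / m
            for j in range(n)]

X, y = [[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 5, 7]
for alpha in (0.01, 0.1, 1.0):
    theta = [0.0, 0.0]
    costs = [cost(theta, X, y)]
    for _ in range(20):
        theta = descend_once(theta, X, y, alpha)
        costs.append(cost(theta, X, y))
    print(alpha, "decreasing" if costs[-1] < costs[0] else "diverging")
```

On this data the two small rates converge while $\alpha = 1.0$ overshoots and diverges, which is exactly the symptom the $J(\theta)$-vs-iterations plot would reveal.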
4.5 Features and polynomial regression
Choosing features: rather than using the raw features directly, you can define new ones (for example, combine two features into a single more informative feature).
The price could be fit with a quadratic function or a cubic function by creating polynomial features such as $x_1 = \text{size}$, $x_2 = \text{size}^2$, $x_3 = \text{size}^3$.
With polynomial features, feature scaling becomes even more important, because the ranges of $x$, $x^2$, and $x^3$ differ by orders of magnitude.
How to choose features? Algorithms for this are discussed later…
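A sketch of the feature-construction idea: build $x$, $x^2$, $x^3$ columns from the single size feature, which shows why scaling then matters, since the derived columns span many orders of magnitude:

```python
sizes = [852, 1416, 1534, 2104]

# Derived polynomial features: size, size^2, size^3
poly = [[s, s ** 2, s ** 3] for s in sizes]

for row in poly:
    print(row)
# For size around 1000 the three features span roughly 10^3 .. 10^9,
# so feature scaling is essential before running gradient descent.
```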
4.6 Normal equation
Normal equation: a method to solve for $\theta$ analytically:

$\theta = (X^TX)^{-1}X^Ty$

In one step, you get to the optimal value.
(Note to self: the sub- and superscripts on the $X$ matrix in the original slide are most likely mistyped.)
Feature scaling is not needed when using the normal equation.
When should you choose gradient descent, and when the normal equation? Gradient descent needs a learning rate and many iterations, but works well even when the number of features $n$ is large; the normal equation needs no $\alpha$ and no iterations, but computing $(X^TX)^{-1}$ is slow (roughly $O(n^3)$) when $n$ is large.
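A sketch of the normal equation $\theta = (X^TX)^{-1}X^Ty$ on a tiny one-feature example, in pure Python with an explicit 2×2 inverse; real code would use a linear-algebra library such as numpy:

```python
def normal_equation_2d(X, y):
    """Solve theta = (X^T X)^{-1} X^T y for a design matrix with 2 columns."""
    # Entries of the 2x2 matrix X^T X and the 2-vector X^T y
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    p = sum(x[0] * yi for x, yi in zip(X, y))
    q = sum(x[1] * yi for x, yi in zip(X, y))
    det = a * d - b * b
    # Explicit 2x2 inverse applied to X^T y
    return [(d * p - b * q) / det, (a * q - b * p) / det]

# Toy data from y = 1 + 2*x_1: the exact optimum is recovered in one step
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 3, 5, 7]
print(normal_equation_2d(X, y))  # → [1.0, 2.0]
```

Note that no feature scaling and no choice of learning rate were needed, matching the comparison with gradient descent above.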
4.7 Normal equation and non-invertibility
$X^TX$ can be non-invertible (singular/degenerate), usually because of redundant, linearly dependent features (e.g. the same size in two different units), or because there are too many features relative to training examples ($m \leqslant n$). Remedies: delete redundant features, or use regularization; in practice a pseudo-inverse routine (e.g. `pinv` in Octave) still computes a usable $\theta$.
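A sketch of the non-invertibility issue: with linearly dependent feature columns, $X^TX$ has determinant 0, so the inverse in the normal equation does not exist (the example columns below are made up):

```python
# x_2 = 2 * x_1 (e.g. the same size measured in two different units),
# so the two columns of X are linearly dependent.
X = [[1, 2], [2, 4], [3, 6]]

# Entries of the 2x2 matrix X^T X
a = sum(x[0] * x[0] for x in X)
b = sum(x[0] * x[1] for x in X)
d = sum(x[1] * x[1] for x in X)

det = a * d - b * b
print(det)  # → 0: X^T X is singular, so (X^T X)^{-1} does not exist
```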