Machine Learning Notes -- Linear Models: Multivariate Linear Regression

Multivariate Linear Regression

Outline:
Derive the loss function from least squares
Prove the loss function is convex
Take the first derivative of the loss function
Set the first derivative to zero

Similarly, absorb $\boldsymbol{w}$ and $b$ into the vector form $\hat{\boldsymbol{w}}=(\boldsymbol{w}; b)$, and represent the dataset as an $m \times (d+1)$ matrix $\mathbf{X}$: in each row, the first $d$ elements are the example's $d$ attribute values, and the last element is fixed at 1.

$$\mathbf{X}=\begin{pmatrix} x_{11} & x_{12} & \dots & x_{1d} & 1 \\ x_{21} & x_{22} & \dots & x_{2d} & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{md} & 1 \end{pmatrix}=\begin{pmatrix} \boldsymbol{x}_{1}^{T} & 1 \\ \boldsymbol{x}_{2}^{T} & 1 \\ \vdots & \vdots \\ \boldsymbol{x}_{m}^{T} & 1 \end{pmatrix}=\begin{pmatrix} \hat{\boldsymbol{x}}_{1}^{T} \\ \hat{\boldsymbol{x}}_{2}^{T} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{T} \end{pmatrix}$$

Absorbing $\boldsymbol{w}$ and $b$:

$$f(\boldsymbol{x}_{i})=w_{1}x_{i1}+w_{2}x_{i2}+\ldots+w_{d}x_{id}+b$$

Treat $b$ as $w_{d+1}\cdot 1$. Write $(w_{1}, w_{2}, \ldots, w_{d}, w_{d+1})$ as $\hat{\boldsymbol{w}}$ and $(x_{i1}, x_{i2}, \ldots, x_{id}, 1)$ as $\hat{\boldsymbol{x}}_{i}$; then

$$f(\hat{\boldsymbol{x}}_{i})=\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{i}$$
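The absorption trick is easy to see in code. Below is a small NumPy sketch with made-up numbers, checking that the augmented form $\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{i}$ reproduces $\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b$:

```python
import numpy as np

# Toy data (illustrative): m = 4 examples with d = 2 attributes.
X_raw = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0],
                  [7.0, 8.0]])
w = np.array([0.5, -1.0])   # weight vector w
b = 2.0                     # bias b

# Absorb b into the weights: w_hat = (w; b), and append a constant 1
# to each example: x_hat_i = (x_i; 1).
w_hat = np.append(w, b)
X_hat = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])

# f(x_hat_i) = w_hat^T x_hat_i equals w^T x_i + b for every example.
pred_absorbed = X_hat @ w_hat
pred_original = X_raw @ w + b
print(np.allclose(pred_absorbed, pred_original))  # True
```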

Deriving the loss function from least squares:

$$E_{\hat{\boldsymbol{w}}}=\sum_{i=1}^{m}\left(y_{i}-f(\hat{\boldsymbol{x}}_{i})\right)^{2}=\sum_{i=1}^{m}\left(y_{i}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{i}\right)^{2}$$

Expanding the sum:

$$E_{\hat{\boldsymbol{w}}}=\left(y_{1}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{1}\right)^{2}+\left(y_{2}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{2}\right)^{2}+\ldots+\left(y_{m}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{m}\right)^{2}$$

This sum of squares is the inner product of the residual vector with itself:

$$E_{\hat{\boldsymbol{w}}}=\begin{pmatrix} y_{1}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{1} & y_{2}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{2} & \cdots & y_{m}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{m} \end{pmatrix}\begin{pmatrix} y_{1}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{m} \end{pmatrix}$$

Since $\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{i}$ is a scalar, transposing it changes nothing:

$$\begin{pmatrix} y_{1}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{m} \end{pmatrix}=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix}-\begin{pmatrix} \hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{1} \\ \hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{2} \\ \vdots \\ \hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{m} \end{pmatrix}=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{m} \end{pmatrix}-\begin{pmatrix} \hat{\boldsymbol{x}}_{1}^{T}\hat{\boldsymbol{w}} \\ \hat{\boldsymbol{x}}_{2}^{T}\hat{\boldsymbol{w}} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{T}\hat{\boldsymbol{w}} \end{pmatrix}$$

Moreover,

$$\begin{pmatrix} \hat{\boldsymbol{x}}_{1}^{T}\hat{\boldsymbol{w}} \\ \hat{\boldsymbol{x}}_{2}^{T}\hat{\boldsymbol{w}} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{T}\hat{\boldsymbol{w}} \end{pmatrix}=\begin{pmatrix} \hat{\boldsymbol{x}}_{1}^{T} \\ \hat{\boldsymbol{x}}_{2}^{T} \\ \vdots \\ \hat{\boldsymbol{x}}_{m}^{T} \end{pmatrix}\hat{\boldsymbol{w}}=\mathbf{X}\hat{\boldsymbol{w}}$$

so

$$\begin{pmatrix} y_{1}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{1} \\ y_{2}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{2} \\ \vdots \\ y_{m}-\hat{\boldsymbol{w}}^{T}\hat{\boldsymbol{x}}_{m} \end{pmatrix}=\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}}$$

where $\boldsymbol{y}=(y_{1}; y_{2}; \ldots; y_{m})$.
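A quick numerical check that the vectorized form matches the element-wise sum of squares (a NumPy sketch on random data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 5, 3
# Augmented design matrix: m x (d+1), last column is all ones.
X = np.hstack([rng.normal(size=(m, d)), np.ones((m, 1))])
y = rng.normal(size=m)
w_hat = rng.normal(size=d + 1)

# Element-wise form: sum_i (y_i - w_hat^T x_hat_i)^2
loss_sum = sum((y[i] - w_hat @ X[i]) ** 2 for i in range(m))

# Vectorized form: (y - X w_hat)^T (y - X w_hat)
r = y - X @ w_hat
loss_vec = r @ r

print(np.allclose(loss_sum, loss_vec))  # True
```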

Objective:

$$\hat{\boldsymbol{w}}^{*}=\underset{\hat{\boldsymbol{w}}}{\arg\min}\,(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})$$

Take the first derivative. The term $\boldsymbol{y}^{T}\boldsymbol{y}$ does not depend on $\hat{\boldsymbol{w}}$, so it drops out:

$$\begin{aligned} \frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})^{T}(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[\left(\boldsymbol{y}^{T}-\hat{\boldsymbol{w}}^{T}\mathbf{X}^{T}\right)(\boldsymbol{y}-\mathbf{X}\hat{\boldsymbol{w}})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[\boldsymbol{y}^{T}\boldsymbol{y}-\boldsymbol{y}^{T}\mathbf{X}\hat{\boldsymbol{w}}-\hat{\boldsymbol{w}}^{T}\mathbf{X}^{T}\boldsymbol{y}+\hat{\boldsymbol{w}}^{T}\mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{w}}\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[-\boldsymbol{y}^{T}\mathbf{X}\hat{\boldsymbol{w}}-\hat{\boldsymbol{w}}^{T}\mathbf{X}^{T}\boldsymbol{y}+\hat{\boldsymbol{w}}^{T}\mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{w}}\right] \end{aligned}$$

Matrix differentiation formulas:

The scalar-by-vector derivative, where $\boldsymbol{x}=(x_{1},x_{2},\ldots,x_{n})^{T}$ is an $n$-dimensional vector and $y$ is a scalar function of $\boldsymbol{x}$:

$$\frac{\partial y}{\partial \boldsymbol{x}}=\begin{pmatrix} \frac{\partial y}{\partial x_{1}} \\ \frac{\partial y}{\partial x_{2}} \\ \vdots \\ \frac{\partial y}{\partial x_{n}} \end{pmatrix}$$

(denominator layout; used by default below)

$$\frac{\partial y}{\partial \boldsymbol{x}}=\begin{pmatrix} \frac{\partial y}{\partial x_{1}} & \frac{\partial y}{\partial x_{2}} & \cdots & \frac{\partial y}{\partial x_{n}} \end{pmatrix}$$

(numerator layout)

From the scalar-by-vector formula we can derive:

$$\frac{\partial \boldsymbol{x}^{T}\boldsymbol{a}}{\partial \boldsymbol{x}}=\frac{\partial \boldsymbol{a}^{T}\boldsymbol{x}}{\partial \boldsymbol{x}}=\begin{pmatrix} \frac{\partial(a_{1}x_{1}+a_{2}x_{2}+\ldots+a_{n}x_{n})}{\partial x_{1}} \\ \frac{\partial(a_{1}x_{1}+a_{2}x_{2}+\ldots+a_{n}x_{n})}{\partial x_{2}} \\ \vdots \\ \frac{\partial(a_{1}x_{1}+a_{2}x_{2}+\ldots+a_{n}x_{n})}{\partial x_{n}} \end{pmatrix}=\begin{pmatrix} a_{1} \\ a_{2} \\ \vdots \\ a_{n} \end{pmatrix}$$

Similarly:

$$\frac{\partial \boldsymbol{x}^{T}\mathbf{B}\boldsymbol{x}}{\partial \boldsymbol{x}}=\left(\mathbf{B}+\mathbf{B}^{T}\right)\boldsymbol{x}$$
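Both identities can be sanity-checked with finite differences. The sketch below (NumPy, random data, illustrative names) compares each analytical gradient against a central-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = rng.normal(size=n)
B = rng.normal(size=(n, n))
x = rng.normal(size=n)

def num_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# d(x^T a)/dx = d(a^T x)/dx = a
ok_linear = np.allclose(num_grad(lambda v: a @ v, x), a, atol=1e-5)
# d(x^T B x)/dx = (B + B^T) x
ok_quad = np.allclose(num_grad(lambda v: v @ B @ v, x), (B + B.T) @ x, atol=1e-5)
print(ok_linear, ok_quad)  # True True
```

Central differences are exact (up to floating-point error) for linear and quadratic functions, so the agreement is tight.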

Applying these formulas term by term (with $\mathbf{B}=\mathbf{X}^{T}\mathbf{X}$ in the quadratic term):

$$\begin{aligned} \frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}} &=-\frac{\partial \boldsymbol{y}^{T}\mathbf{X}\hat{\boldsymbol{w}}}{\partial \hat{\boldsymbol{w}}}-\frac{\partial \hat{\boldsymbol{w}}^{T}\mathbf{X}^{T}\boldsymbol{y}}{\partial \hat{\boldsymbol{w}}}+\frac{\partial \hat{\boldsymbol{w}}^{T}\mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{w}}}{\partial \hat{\boldsymbol{w}}} \\ &=-\mathbf{X}^{T}\boldsymbol{y}-\mathbf{X}^{T}\boldsymbol{y}+\left(\mathbf{X}^{T}\mathbf{X}+\mathbf{X}^{T}\mathbf{X}\right)\hat{\boldsymbol{w}} \\ &=2\mathbf{X}^{T}(\mathbf{X}\hat{\boldsymbol{w}}-\boldsymbol{y}) \end{aligned}$$
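The gradient $2\mathbf{X}^{T}(\mathbf{X}\hat{\boldsymbol{w}}-\boldsymbol{y})$ can likewise be verified numerically (a NumPy sketch on random data; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
X = rng.normal(size=(m, n))   # stands in for the augmented matrix X
y = rng.normal(size=m)
w = rng.normal(size=n)        # stands in for w_hat

def loss(v):
    r = y - X @ v
    return r @ r

# Analytical gradient from the derivation above: 2 X^T (X w - y).
grad_analytic = 2 * X.T @ (X @ w - y)

# Central-difference numerical gradient.
eps = 1e-6
grad_num = np.array([
    (loss(w + eps * np.eye(n)[i]) - loss(w - eps * np.eye(n)[i])) / (2 * eps)
    for i in range(n)
])
print(np.allclose(grad_analytic, grad_num, atol=1e-4))  # True
```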

Convex set: let $D \subseteq \mathbb{R}^{n}$. If for any $\boldsymbol{x}, \boldsymbol{y} \in D$ and any $a \in [0,1]$ we have $a\boldsymbol{x}+(1-a)\boldsymbol{y} \in D$, then $D$ is a convex set.

Geometrically: if two points belong to the set, then every point on the line segment joining them also belongs to the set.

First-order derivative of a multivariate real-valued function: the gradient, defined as

$$\nabla f(\boldsymbol{x})=\begin{pmatrix} \frac{\partial f(\boldsymbol{x})}{\partial x_{1}} \\ \frac{\partial f(\boldsymbol{x})}{\partial x_{2}} \\ \vdots \\ \frac{\partial f(\boldsymbol{x})}{\partial x_{n}} \end{pmatrix}$$

Second-order derivative of a multivariate real-valued function: the Hessian matrix

$$\nabla^{2} f(\boldsymbol{x})=\begin{bmatrix} \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1}^{2}} & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1}\partial x_{2}} & \cdots & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{1}\partial x_{n}} \\ \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2}\partial x_{1}} & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2}^{2}} & \cdots & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{2}\partial x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n}\partial x_{1}} & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n}\partial x_{2}} & \cdots & \frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{n}^{2}} \end{bmatrix}$$

If all second-order partial derivatives of $f(\boldsymbol{x})$ with respect to the components of $\boldsymbol{x}$ are continuous, then $\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{i}\partial x_{j}}=\frac{\partial^{2} f(\boldsymbol{x})}{\partial x_{j}\partial x_{i}}$, and $\nabla^{2} f(\boldsymbol{x})$ is a symmetric matrix.

Convexity test for multivariate real-valued functions:

Let $D \subset \mathbb{R}^{n}$ be a nonempty open convex set and $f: D \rightarrow \mathbb{R}$ (an $n$-variate real-valued function) be twice continuously differentiable on $D$. If the Hessian $\nabla^{2} f(\boldsymbol{x})$ is positive definite on $D$, then $f(\boldsymbol{x})$ is strictly convex on $D$.

Sufficiency theorem for convex optimization:

If $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is convex and continuously differentiable, then $\boldsymbol{x}^{*}$ is a global minimizer if and only if $\nabla f(\boldsymbol{x}^{*})=\mathbf{0}$.

$$\begin{aligned} \frac{\partial^{2} E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}\,\partial \hat{\boldsymbol{w}}^{T}} &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left(\frac{\partial E_{\hat{\boldsymbol{w}}}}{\partial \hat{\boldsymbol{w}}}\right) \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left[2\mathbf{X}^{T}(\mathbf{X}\hat{\boldsymbol{w}}-\boldsymbol{y})\right] \\ &=\frac{\partial}{\partial \hat{\boldsymbol{w}}}\left(2\mathbf{X}^{T}\mathbf{X}\hat{\boldsymbol{w}}-2\mathbf{X}^{T}\boldsymbol{y}\right) \\ &=2\mathbf{X}^{T}\mathbf{X} \end{aligned}$$
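This Hessian is always at least positive semidefinite, since $\boldsymbol{v}^{T}(2\mathbf{X}^{T}\mathbf{X})\boldsymbol{v}=2\|\mathbf{X}\boldsymbol{v}\|^{2}\ge 0$. A quick NumPy check on random data (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4))  # m = 10 examples, d + 1 = 4 columns
H = 2 * X.T @ X               # Hessian of E_w_hat

# H is symmetric, and v^T H v = 2 ||X v||^2 >= 0,
# so all eigenvalues are nonnegative.
eigs = np.linalg.eigvalsh(H)
print(np.allclose(H, H.T), np.all(eigs >= -1e-10))  # True True
```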

When $\mathbf{X}^{T}\mathbf{X}$ is full rank (equivalently, positive definite), $E_{\hat{\boldsymbol{w}}}$ is strictly convex, so setting the first derivative to zero yields the global minimizer:

$$\hat{\boldsymbol{w}}^{*}=\left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\boldsymbol{y}$$

where $\hat{\boldsymbol{x}}_{i}=(\boldsymbol{x}_{i}, 1)$.

The resulting multivariate linear regression model is:

$$f(\hat{\boldsymbol{x}}_{i})=\hat{\boldsymbol{x}}_{i}^{T}\left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\boldsymbol{y}$$
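Putting it all together: a minimal NumPy sketch that fits $\hat{\boldsymbol{w}}^{*}$ on synthetic data (all data and names are illustrative). Note that `np.linalg.solve` is used to apply $(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\boldsymbol{y}$ as a linear system rather than forming the inverse explicitly, which is the numerically preferred approach:

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 50, 3
X_raw = rng.normal(size=(m, d))
true_w = np.array([1.0, -2.0, 0.5])
true_b = 3.0
y = X_raw @ true_w + true_b + 0.01 * rng.normal(size=m)

# Augmented design matrix: last column of ones absorbs the bias.
X = np.hstack([X_raw, np.ones((m, 1))])

# Closed-form solution w* = (X^T X)^{-1} X^T y, solved as a linear system.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver.
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(w_hat, w_lstsq))  # True

# With small noise, the true weights and bias are recovered closely.
print(np.allclose(w_hat[:d], true_w, atol=0.1),
      np.allclose(w_hat[d], true_b, atol=0.1))

# Prediction for a new example: f(x_hat) = x_hat^T w*
x_new = np.append(rng.normal(size=d), 1.0)
y_new = x_new @ w_hat
```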
