Linear Regression

Author: 猪猪奋斗记

Category: Machine Learning. Tags: machine learning, gradient descent.

Article link: https://blog.csdn.net/bigbigship/article/details/50451514

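Basic Concepts

Linear regression is a form of regression analysis that models the relationship between one or more independent variables and a dependent variable using a least-squares function called the linear regression equation. This function is a linear combination of one or more model parameters called regression coefficients. The case with a single independent variable is called simple regression; the case with more than one independent variable is called multiple regression.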

Hypothesis function

Formula

Algebraic form

$h_\theta(x)=\theta_0+\theta_1x_1+\dots+\theta_nx_n$
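Vector form

$h_\theta(x)=\theta^T x$, or, for all $m$ training examples at once, $X\theta$

Notes:

$\theta$ is an $(n+1)$-dimensional column vector, and $x=(x_0,x_1,\dots,x_n)^T$ with $x_0=1$, so that $\theta^T x$ expands to the algebraic form.

$X$ is the $m\times(n+1)$ matrix whose $i$-th row is $(x^{(i)})^T$, so $X\theta$ is the $m$-dimensional vector of predictions.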

Cost function

Formula

Based on the squared error, summed over all $m$ training examples:

$J(\theta)=\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$
Why this formula:

Suppose the prediction made from the features differs from the actual value by an error $\epsilon^{(i)}$. Then the prediction $\theta^T x^{(i)}$ and the true value $y^{(i)}$ satisfy:

$y^{(i)}=\theta^T x^{(i)}+\epsilon^{(i)}$

The errors are usually assumed to be independent and to follow a Gaussian (normal) distribution with mean 0 and variance $\sigma^2$. The conditional probability density of $y^{(i)}$ given $x^{(i)}$ is then:

$p(y^{(i)}\mid x^{(i)};\theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^T x^{(i)})^2}{2\sigma^2}\right)$
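This gives the probability of the result for a single sample, but we want the model to predict as accurately as possible on all samples at once, that is, to maximize the product of these probabilities. Note that this is a product of probability density functions: the density of a continuous variable is not the same thing as the probability function of a discrete one. This product is called the likelihood function, and choosing the $\theta$ that maximizes it is maximum likelihood estimation. Taking the logarithm turns the product into a sum in which the only $\theta$-dependent term is proportional to $-\sum_{i=1}^m(y^{(i)}-\theta^T x^{(i)})^2$, so maximizing the likelihood is equivalent to minimizing:

$\frac{1}{2}\sum_{i=1}^m(\theta^T x^{(i)}-y^{(i)})^2$

This explains why the error function uses a sum of squares. The derivation does rest on assumptions (independent errors with a zero-mean Gaussian distribution), but these assumptions are reasonable in many practical settings.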
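Written out, the maximum-likelihood step is the standard derivation below, assuming the $m$ errors are independent so that the joint density factorizes into a product:

$$
\begin{aligned}
L(\theta) &= \prod_{i=1}^m p(y^{(i)}\mid x^{(i)};\theta)
= \prod_{i=1}^m \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^T x^{(i)})^2}{2\sigma^2}\right)\\
\ell(\theta) = \log L(\theta) &= m\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^m\left(y^{(i)}-\theta^T x^{(i)}\right)^2
\end{aligned}
$$

The first term and the factor $\frac{1}{\sigma^2}$ do not depend on $\theta$, so maximizing $\ell(\theta)$ is the same as minimizing $\frac{1}{2}\sum_{i=1}^m(\theta^T x^{(i)}-y^{(i)})^2$.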

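Algebraic form

$J(\theta)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$

The extra factor $\frac{1}{2}$ does not change where the minimum lies; it only cancels the 2 produced when the square is differentiated.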
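Vector form

$J(\theta)=\frac{1}{2m}(X\theta-Y)^T(X\theta-Y)$

Note: $\theta$ is an $(n+1)$-dimensional column vector, $X$ is the $m\times(n+1)$ matrix of training examples (one example per row, with $x_0=1$), and $Y$ is the $m$-dimensional column vector of target values.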

Objective

Training aims to find the $\theta$ that minimizes $J(\theta)$. One way to find it is gradient descent.

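Gradient descent

Basic idea

Gradient descent uses the negative gradient as the search direction at each iteration; with a suitably small step size, the objective function decreases from one iteration to the next.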
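To make the idea concrete, here is a minimal C++ sketch of gradient descent on a function of a single variable, with the derivative supplied by the caller; `gradientDescent1D` is a hypothetical helper, not from the original post.

```cpp
#include <cassert>
#include <functional>

// Repeatedly step against the derivative: t <- t - alpha * f'(t)
double gradientDescent1D(const std::function<double(double)>& grad,
                         double start, double alpha, int iterations) {
    assert(alpha > 0.0 && iterations >= 0);
    double t = start;
    for (int k = 0; k < iterations; ++k) t -= alpha * grad(t);
    return t;
}
```

For $f(t)=(t-3)^2$, whose derivative is $2(t-3)$, starting at 0 with $\alpha=0.1$, each step multiplies the distance to the minimum by 0.8, so after 100 steps the result is within $10^{-6}$ of 3.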

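Application to linear regression

At each iteration, take the partial derivatives of the cost function; moving against this gradient with a chosen learning rate $\alpha$ makes the value of the cost function converge toward smaller values.

Algebraic formula:

$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$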
For linear regression, differentiating $J(\theta)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$ gives:

$\theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$
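Notes:

The choice of $\alpha$ is critical: if it is too small, many iterations are needed; if it is too large, the iteration may fail to converge or even diverge.

All components of $\theta$ must be updated simultaneously, each computed from the values of the previous iteration. Because every update sums over all $m$ training examples, this variant is called batch gradient descent.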
Vector form:

$\theta:=\theta-\frac{\alpha}{m}X^T(X\theta-Y)$

Note: this form performs the simultaneous update automatically.
$\theta$ can also be computed directly with linear algebra, using the normal equation (assuming $X^TX$ is invertible):

$\theta=(X^TX)^{-1}X^TY$
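A C++ sketch of the normal equation. Rather than forming $(X^TX)^{-1}$ explicitly, it solves the equivalent system $(X^TX)\theta=X^TY$ by Gauss-Jordan elimination with partial pivoting; this substitution is made here because it is simpler and numerically safer than computing the inverse. It assumes one training example per row of $X$ and that $X^TX$ is invertible; `normalEquation` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve (X^T X) theta = X^T Y for theta
std::vector<double> normalEquation(const Matrix& X, const std::vector<double>& Y) {
    const size_t m = X.size();
    assert(m > 0 && Y.size() == m);
    const size_t k = X[0].size();  // number of parameters, n + 1
    // Build the augmented matrix [X^T X | X^T Y] of size k x (k + 1)
    Matrix A(k, std::vector<double>(k + 1, 0.0));
    for (size_t r = 0; r < k; ++r) {
        for (size_t c = 0; c < k; ++c)
            for (size_t i = 0; i < m; ++i) A[r][c] += X[i][r] * X[i][c];
        for (size_t i = 0; i < m; ++i) A[r][k] += X[i][r] * Y[i];
    }
    // Gauss-Jordan elimination with partial pivoting
    for (size_t col = 0; col < k; ++col) {
        size_t pivot = col;
        for (size_t r = col + 1; r < k; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        assert(std::fabs(A[pivot][col]) > 1e-12);  // X^T X must be invertible
        std::swap(A[col], A[pivot]);
        for (size_t r = 0; r < k; ++r) {
            if (r == col) continue;
            const double factor = A[r][col] / A[col][col];
            for (size_t c = col; c <= k; ++c) A[r][c] -= factor * A[col][c];
        }
    }
    // A is now diagonal; divide out the diagonal to get theta
    std::vector<double> theta(k);
    for (size_t r = 0; r < k; ++r) theta[r] = A[r][k] / A[r][r];
    return theta;
}
```

On the points (1, 2), (2, 3), (3, 4) it returns $\theta=(1,1)^T$ in one shot, the same answer that gradient descent only approaches iteratively.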

MATLAB code

Cost function

```matlab
function J = computeCostMulti(X, y, theta)

m = length(y); % number of training examples

J = 0; % initialize the return value

predictions = X*theta; % values of the hypothesis function

sqrterror = (predictions-y).^2; % squared errors

J = 1/(2*m)*sum(sqrterror);

end
```

Gradient descent

```matlab
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)

m = length(y); % number of training examples
J_history = zeros(num_iters, 1); % cost after each iteration

for iter = 1:num_iters

    % vectorized batch update: theta := theta - (alpha/m) * X' * (X*theta - y)
    theta = theta - alpha/m*(X'*(X*theta-y));

    J_history(iter) = computeCostMulti(X, y, theta);

end

end
```

Example

Link

House price prediction

Code

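```cpp
#include <cmath>
#include <cstdio>
#include <vector>

using namespace std;

typedef vector<double> vd;

// One sample: feature vector x (x[0] = 1 is the bias term) and target value y.
// Named Sample rather than data, which clashes with std::data under C++17.
struct Sample {
    vd x;
    double y;
    Sample() : y(0) {}
    Sample(const vd &_x, double _y) : x(_x), y(_y) {}
};

// Hypothesis function: h_theta(x) = theta^T x
double H(const vd &x, const vd &theta)
{
    double sum = 0;
    for (size_t i = 0; i < theta.size(); i++)
        sum += x[i] * theta[i];
    return sum;
}

// Cost: half the sum of squared errors (this version omits the 1/m factor)
double getvalue(const vector<Sample> &dt, const vd &theta)
{
    double sum = 0;
    for (const Sample &s : dt) {
        double err = H(s.x, theta) - s.y;
        sum += err * err;
    }
    return sum / 2.0;
}

// One batch gradient-descent step; every theta[i] is computed from the old theta
// (simultaneous update)
void gradient(const vector<Sample> &dt, vd &theta, double alpha)
{
    vd tmp(theta.size());
    for (size_t i = 0; i < theta.size(); i++) {
        double grad = 0;
        for (const Sample &s : dt)
            grad += (H(s.x, theta) - s.y) * s.x[i];
        tmp[i] = theta[i] - alpha * grad;
    }
    theta = tmp;
}

// Iterate until the cost changes by at most delta between steps
void solve(const vector<Sample> &dt, vd &theta)
{
    const double alpha = 0.005;
    const double delta = 1e-6;
    const int maxIters = 1000000;  // stop even if alpha is too large to converge
    double pre = getvalue(dt, theta);
    for (int it = 0; it < maxIters; it++) {
        gradient(dt, theta, alpha);
        double now = getvalue(dt, theta);
        if (fabs(pre - now) <= delta)
            break;
        pre = now;
    }
}

// Read f feature values and prepend the bias feature x[0] = 1
vd readFeatures(int f)
{
    vd x(1, 1.0);
    for (int j = 0; j < f; j++) {
        double v = 0;
        scanf("%lf", &v);
        x.push_back(v);
    }
    return x;
}

int main()
{
    int f, n, m;
    // Input: f (number of features) and n (number of training samples),
    // then n lines of f features followed by the target value,
    // then m, followed by m lines of f features to predict.
    while (scanf("%d%d", &f, &n) == 2) {
        vector<Sample> train;
        for (int i = 0; i < n; i++) {
            vd x = readFeatures(f);
            double y = 0;
            scanf("%lf", &y);
            train.push_back(Sample(x, y));
        }
        if (scanf("%d", &m) != 1)
            break;
        vector<vd> queries;
        for (int i = 0; i < m; i++)
            queries.push_back(readFeatures(f));
        vd theta(f + 1, 0.0);  // start from theta = 0
        solve(train, theta);
        for (const vd &x : queries)
            printf("%.2f\n", H(x, theta));
    }
    return 0;
}
```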