Gradient Descent: Principles and Linear Regression Implementations (Python/Java/C++)

链巨人

Article link: https://blog.csdn.net/liangyihuai/article/details/77341551

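As the name suggests, gradient descent approaches the desired result iteratively, one step at a time. It stops only when a required precision is reached or the maximum number of iterations is exceeded, so the result it produces is an approximation. Most blog posts explain it with the same analogy: walking down from a mountaintop step by step. The following concepts are used below:
- Step size: the length of a single move.
- Dimension: one axis of the space being searched; usually each model parameter corresponds to one dimension. For example, (x, y) describes a 2-dimensional space.
- Gradient: the steepest direction, obtained by differentiation. The gradient along a single dimension is the direction of fastest change in that dimension, and is obtained by taking the partial derivative with respect to that dimension (parameter). The gradient vector made up of all the parameters' partial derivatives points in the steepest uphill direction at that point in the space, which is why the descent moves against it. On the question of steepness, [see this reference]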

Gradient descent is commonly used to solve linear fitting problems.

A linear equation can be written as:

$f(\theta_0, \theta_1, \theta_2, \ldots, \theta_n) = h_\theta + J_\theta$

where the hypothesis function is:

$h_\theta = h_\theta(x_1, x_2, \ldots, x_n) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n$

Here $\theta_i\ (i = 0, 1, 2, \ldots, n)$ are the model parameters, and $x_1, x_2, \ldots, x_n$ together with $y$ are the sample data.
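
As a minimal sketch of the hypothesis function above, the following Python evaluates $h_\theta$ for a single sample. The function name `hypothesis` and the sample values are assumptions chosen for illustration; they do not come from the original code.

```python
def hypothesis(thetas, features):
    """Return theta_0 + theta_1*x_1 + ... + theta_n*x_n for one sample.

    thetas has n + 1 entries (theta_0 is the intercept);
    features has n entries (x_1 ... x_n).
    """
    if len(thetas) != len(features) + 1:
        raise ValueError("expected len(thetas) == len(features) + 1")
    return thetas[0] + sum(t * x for t, x in zip(thetas[1:], features))


# Illustrative values: the line h = 1 + 4*x evaluated at x = 2.
print(hypothesis([1.0, 4.0], [2.0]))  # prints 9.0
```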

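The error function:

$J_\theta = J_\theta(x_1, x_2, \ldots, x_n) = \frac{1}{2m}\sum_{k=1}^{m}\left(h_\theta(x^{(k)}) - y^{(k)}\right)^2$

where $m$ is the number of samples and $(x^{(k)}, y^{(k)})$ is the $k$-th sample. The factor $\frac{1}{2m}$ only rescales the error and does not move its minimum; it is the form that is differentiated in the gradient formula. As the formula shows, the smaller the error function, the more accurate $f(\theta_0, \theta_1, \theta_2, \ldots, \theta_n)$ is. Linear fitting therefore means minimizing the error function, and our goal is to solve for the model parameters $\theta_i\ (i = 0, 1, 2, \ldots, n)$.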
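
To make the error function concrete, here is a small sketch for the one-feature case $h_\theta = \theta_0 + \theta_1 x$, using the sample points that appear in the article's code (x = 1, 2, 3 and y = 5, 9, 13). The name `cost` is an assumption, and the sketch uses the $\frac{1}{2m}$ scaling that the gradient derivation differentiates.

```python
def cost(thetas, xs, ys):
    """Error J for h = theta_0 + theta_1 * x, scaled by 1/(2m)."""
    m = len(xs)
    squared_errors = ((thetas[0] + thetas[1] * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2.0 * m)


xs = [1, 2, 3]
ys = [5, 9, 13]
print(cost([0.0, 0.0], xs, ys))  # a poor fit gives a large error
print(cost([1.0, 4.0], xs, ys))  # the exact line y = 4x + 1 gives 0.0
```

The second call returns zero because every sample lies exactly on the line $y = 4x + 1$, so these are the parameters the descent should approach.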

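The overall approach: first set the model parameters to random initial values (like placing a person somewhere on the mountain), then take partial derivatives to get the gradient vector (the direction of movement, i.e. the slope). The gradient vector multiplied by the step size is the amount moved in one step. Subtract that amount from the parameters to get the new model parameters. Keep updating the parameters until a stopping condition is met (the person keeps walking down in the steepest direction).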
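
The loop described above can be sketched in one dimension before moving to the multi-parameter case. The function being minimized, $(t - 3)^2$, the helper name `descend_1d`, and the default step size are all assumptions picked for illustration.

```python
def descend_1d(gradient, start, step_size=0.1, tolerance=1e-9, max_iter=10000):
    """Repeatedly move against the gradient until the move is tiny."""
    position = start
    for _ in range(max_iter):
        move = step_size * gradient(position)  # step size times slope
        position -= move                       # walk downhill
        if abs(move) < tolerance:              # precision reached
            break
    return position


# Minimize (t - 3)^2, whose derivative is 2 * (t - 3); the minimum is at t = 3.
minimum = descend_1d(lambda t: 2.0 * (t - 3.0), start=0.0)
print(round(minimum, 6))  # prints 3.0
```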

The gradient for the $i$-th parameter is obtained by taking the partial derivative with respect to that parameter:

$\frac{\partial}{\partial\theta_i}J_\theta = \frac{\partial}{\partial\theta_i}\left(\frac{1}{2m}\sum_{k=1}^{m}\left(h_\theta(x^{(k)}) - y^{(k)}\right)^2\right) = \frac{1}{m}\sum_{k=1}^{m}\left(h_\theta(x^{(k)}) - y^{(k)}\right)x_i^{(k)}$

Here $k$ runs over the $m$ samples, $x_i^{(k)}$ is the $i$-th feature of the $k$-th sample, and $x_0^{(k)} = 1$ so that the formula also covers $\theta_0$.
Taking the partial derivative with respect to every model parameter (that is, $\theta_0, \theta_1, \theta_2, \ldots, \theta_n$) gives the gradient vector:

$\nabla_\theta J = \Biggl[ \frac{\partial}{\partial\theta_0}J_\theta, \frac{\partial}{\partial\theta_1}J_\theta, \ldots, \frac{\partial}{\partial\theta_n}J_\theta \Biggr]^T$
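
One way to gain confidence in the partial derivatives is to compare them with a finite-difference estimate. The sketch below does this for the one-feature model $h_\theta = \theta_0 + \theta_1 x$; the function names and the perturbation size `h` are assumptions for illustration.

```python
def cost(thetas, xs, ys):
    """Error J for h = theta_0 + theta_1 * x, scaled by 1/(2m)."""
    m = len(xs)
    return sum((thetas[0] + thetas[1] * x - y) ** 2 for x, y in zip(xs, ys)) / (2.0 * m)


def analytic_gradient(thetas, xs, ys):
    """Gradient from the closed-form partial derivatives."""
    m = len(xs)
    residuals = [thetas[0] + thetas[1] * x - y for x, y in zip(xs, ys)]
    grad0 = sum(residuals) / m                             # x_0 is 1 for theta_0
    grad1 = sum(r * x for r, x in zip(residuals, xs)) / m
    return [grad0, grad1]


def numeric_gradient(thetas, xs, ys, h=1e-6):
    """Central-difference estimate of each partial derivative."""
    grads = []
    for i in range(len(thetas)):
        up = list(thetas)
        down = list(thetas)
        up[i] += h
        down[i] -= h
        grads.append((cost(up, xs, ys) - cost(down, xs, ys)) / (2.0 * h))
    return grads


xs, ys = [1, 2, 3], [5, 9, 13]
print(analytic_gradient([0.0, 0.0], xs, ys))
print(numeric_gradient([0.0, 0.0], xs, ys))  # should agree to several decimals
```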

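The geometric meaning of the gradient for a single parameter is as follows: since each parameter stands for one dimension of the space, the partial derivative with respect to that parameter describes, at a given point in the multi-dimensional space, the direction of fastest change along that dimension. The gradient vector as a whole therefore gives the direction of fastest change at that point.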

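If $S$ denotes the step size, then $S\frac{\partial}{\partial\theta_i}J_\theta$ is the amount moved in one step along the $i$-th dimension, and $S\nabla_\theta J$ is the amount moved in one step in the whole space.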

Knowing the amount moved in one step along a dimension, the value of the $i$-th dimension (that is, the $i$-th model parameter) after the move is given by:

$\theta_i = \theta_i - S \frac{\partial}{\partial\theta_i}J_\theta = \theta_i - S \frac{\partial}{\partial\theta_i}\left(\frac{1}{2m}\sum_{k=1}^{m}\left(h_\theta(x^{(k)}) - y^{(k)}\right)^2\right) = \theta_i - S \frac{1}{m}\sum_{k=1}^{m}\left(h_\theta(x^{(k)}) - y^{(k)}\right)x_i^{(k)}$

where $S$ is the step size, $k$ runs over the $m$ samples, and $x_i^{(k)}$ is the $i$-th feature of the $k$-th sample (with $x_0^{(k)} = 1$).
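
As a worked instance of the update rule, the sketch below performs a single update by hand for $h_\theta = \theta_0 + \theta_1 x$, using the sample points and the step size 0.001 from the article's code and starting from $\theta_0 = \theta_1 = 0$. The variable names are assumptions for illustration.

```python
xs, ys = [1, 2, 3], [5, 9, 13]
thetas = [0.0, 0.0]
S = 0.001
m = len(xs)

# Residuals h(x) - y at the starting point: -5, -9 and -13.
residuals = [thetas[0] + thetas[1] * x - y for x, y in zip(xs, ys)]

grad0 = sum(residuals) / m                             # (-5 - 9 - 13) / 3 = -9
grad1 = sum(r * x for r, x in zip(residuals, xs)) / m  # (-5 - 18 - 39) / 3 = -62/3

# theta_i = theta_i - S * gradient_i
new_thetas = [thetas[0] - S * grad0, thetas[1] - S * grad1]
print(new_thetas)  # roughly [0.009, 0.0207]
```

Both gradients are negative at the start, so subtracting them increases both parameters, moving toward the line that fits the samples.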

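Because $(\theta_0, \theta_1, \theta_2, \ldots, \theta_n)$ all start from initial values that we choose ourselves, and $(x_1, x_2, \ldots, x_n)$ and $y$ are the sample data, the right-hand side of the formula above contains no unknowns. In code, this formula can be evaluated repeatedly; the iteration ends when the length of one step falls below the required precision or when the maximum number of iterations is reached.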
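
Whether the iteration ever reaches the required precision depends on the step size. The sketch below runs a fixed number of updates with two step sizes on the sample points used later in the article; the values 0.1 and 0.5 and the helper names are assumptions for illustration, and the step size at which divergence starts depends on the data.

```python
def cost(thetas, xs, ys):
    """Error J for h = theta_0 + theta_1 * x, scaled by 1/(2m)."""
    m = len(xs)
    return sum((thetas[0] + thetas[1] * x - y) ** 2 for x, y in zip(xs, ys)) / (2.0 * m)


def run(step_size, iterations, xs, ys):
    """Apply the update rule a fixed number of times, starting from zero."""
    t0, t1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        residuals = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        g0 = sum(residuals) / m
        g1 = sum(r * x for r, x in zip(residuals, xs)) / m
        t0, t1 = t0 - step_size * g0, t1 - step_size * g1
    return [t0, t1]


xs, ys = [1, 2, 3], [5, 9, 13]
start_cost = cost([0.0, 0.0], xs, ys)
small_cost = cost(run(0.1, 50, xs, ys), xs, ys)  # error shrinks
large_cost = cost(run(0.5, 50, xs, ys), xs, ys)  # each step overshoots, error grows
print(start_cost, small_cost, large_cost)
```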

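The code below uses gradient descent to fit a 2-dimensional linear regression equation. It comes in three versions: Python, Java, and C++. See the Python version for detailed comments.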

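The hypothesis function is $h_\theta = \theta_0+\theta_1x_1$, where $\theta_i\ (i = 0, 1)$ are the model parameters, so the code below computes the model parameters $\theta_i\ (i = 0, 1)$ from the coordinates of the sample points.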

Python code

# -*- coding=utf8 -*-

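import math


def sum_of_gradient(x, y, thetas):
    """Compute the gradient vector from the x and y coordinates of the sample points and the model parameters."""
    m = len(x)
    # Partial derivative with respect to theta_0: mean of the residuals h(x_i) - y_i
    grad0 = 1.0 / m * sum([(thetas[0] + thetas[1] * x[i] - y[i]) for i in range(m)])
    # Partial derivative with respect to theta_1: mean of residual * x_i
    grad1 = 1.0 / m * sum([(thetas[0] + thetas[1] * x[i] - y[i]) * x[i] for i in range(m)])
    return [grad0, grad1]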


def step(thetas, direction, step_size):
    """move step_size in the direction from thetas"""
    return [thetas_i + step_size * direction_i
            for thetas_i, direction_i in zip(thetas, direction)]


def distance(v, w):
    """Euclidean distance between two points."""
    return math.sqrt(squared_distance(v, w))


def squared_distance(v, w):
    """Sum of the squared component-wise differences between v and w."""
    return sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w))


def gradient_descent(step_size, x, y, tolerance=0.000000001, max_iter=100000):
    """Gradient descent. step() adds step_size * gradient, so pass a negative step_size to move downhill."""
    iteration = 0
    # initial theta
    thetas = [0, 0]
    # Iterate Loop
    while True:
        gradient = sum_of_gradient(x, y, thetas)

        next_thetas = step(thetas, gradient, step_size)

        if distance(next_thetas, thetas) < tolerance:  # stop if we're converging
            break
        thetas = next_thetas  # continue if we're not

        iteration += 1  # update the iteration counter

        if iteration == max_iter:
            print('Max iterations exceeded!')
            break

    return thetas


x = [1, 2, 3]
y = [5, 9, 13]
stepSize = 0.001
# step() adds, so the step size is negated to move against the gradient
t0, t1 = gradient_descent(-stepSize, x, y)
print(t0, " ", t1)
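
For this small data set the result of the descent can be checked against the exact least-squares line. The sketch below uses the standard closed-form formulas for simple linear regression; the function name `least_squares_line` is an assumption, and this check is not part of the original code.

```python
def least_squares_line(xs, ys):
    """Exact least-squares intercept and slope for one feature."""
    m = len(xs)
    mean_x = sum(xs) / float(m)
    mean_y = sum(ys) / float(m)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


print(least_squares_line([1, 2, 3], [5, 9, 13]))  # prints (1.0, 4.0)
```

The sample points lie exactly on $y = 4x + 1$, so the gradient descent output should come out close to $\theta_0 = 1$ and $\theta_1 = 4$, with a small gap that depends on the tolerance and the step size.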

C++ code

#ifndef GRADIENTDESCENT_H
#define GRADIENTDESCENT_H

#include <cmath>     // pow, sqrt
#include <iostream>
#include <vector>

using namespace std;

class GradientDescent {
public:
    vector<double> sumOfGradient(const vector<double> &x, const vector<double> &y, const vector<double>&thetas);

    vector<double> step(const vector<double>&thetas, const vector<double> &direction, double stepSize);

    double distance(const vector<double> &v, const vector<double> &w);

    vector<double> gradientDescent(const double stepSize, const vector<double> &x,
        const vector<double> &y, double tolerance, int maxIter);

};
#endif // !GRADIENTDESCENT_H

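// Implementation (assumed here to live in a separate GradientDescent.cpp)
#include <cmath>
#include "GradientDescent.h"

vector<double> GradientDescent::sumOfGradient(const vector<double> &x, const vector<double> &y, const vector<double>&thetas) {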
    int m = x.size();

    double sum = 0;
    double sum1 = 0;
    for (int i = 0; i < m; ++i) {
        sum += thetas[0] + thetas[1] * x[i] - y[i];
        sum1 += (thetas[0] + thetas[1] * x[i] - y[i])*x[i];
    }
    double grad0 = 1.0 / m * sum;
    double grad1 = 1.0 / m * sum1;

    vector<double> result;
    result.push_back(grad0);
    result.push_back(grad1);
    return result;
}

vector<double> GradientDescent::step(const vector<double>&thetas, const vector<double> &direction, double stepSize) {
    vector<double> result;
    for (int i = 0; i < direction.size(); ++i) {
        result.push_back(thetas[i] + stepSize * direction[i]);
    }
    return result;
}

double GradientDescent::distance(const vector<double> &v, const vector<double> &w) {
    // Euclidean distance: square root of the summed squared differences
    double sum = 0;
    for (size_t i = 0; i < v.size(); ++i) {
        sum += pow(v[i] - w[i], 2);
    }
    return sqrt(sum);
}

vector<double> GradientDescent::gradientDescent(const double stepSize, const vector<double> &x,
    const vector<double> &y, double tolerance = 0.0000001, int maxIter = 10000000) {
    int iterNum = 0;
    // Two parameters (theta0, theta1), both starting at zero
    vector<double> thetas(2, 0.0);
    while (true) {
        vector<double> gradients = sumOfGradient(x, y, thetas);
        // step() adds stepSize * gradient, so the caller passes a negative step size
        vector<double> nextThetas = step(thetas, gradients, stepSize);
        // Stop once a single step is shorter than the tolerance
        if (distance(nextThetas, thetas) < tolerance)
            break;
        thetas = nextThetas;
        iterNum += 1;

        if (iterNum == maxIter) {
            cout << "Max iterations exceeded!" << endl;
            break;
        }
    }
    return thetas;
}

main function

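#include <iostream>
#include <vector>
#include "GradientDescent.h"

int main() {
    GradientDescent gradientDescent;
    vector<double> x;
    x.push_back(1);
    x.push_back(2);
    x.push_back(3);
    vector<double> y;
    y.push_back(5);
    y.push_back(9);
    y.push_back(13);

    double stepSize = 0.001;
    // step() adds stepSize * gradient, so a negative step size moves downhill.
    // tolerance and maxIter are passed explicitly because the default arguments
    // sit on the out-of-class definition and may not be visible from this file.
    vector<double> result = gradientDescent.gradientDescent(-stepSize, x, y, 0.0000001, 10000000);
    cout << "theta0 = " << result[0] << "; theta1 = " << result[1] << endl;

    return 0;
}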

Java code

import java.util.ArrayList;
import java.util.List;

/**
 * Created by liangyh on 2017-08-17.
 */
public class GradientDescent {
    public List<Double> sumOfGradient(final List<Double> x,
                               final List<Double>y,
                               final List<Double>thetas){
        int m = x.size();
        double sum = 0;
        double sum1 = 0;
        for (int i = 0; i < m; ++i) {
            sum += thetas.get(0) + thetas.get(1) * x.get(i) - y.get(i);
            sum1 += (thetas.get(0) + thetas.get(1) * x.get(i) - y.get(i))*x.get(i);
        }
        double grad0 = 1.0 / m * sum;
        double grad1 = 1.0 / m * sum1;

        List<Double> result = new ArrayList<>();
        result.add(grad0);
        result.add(grad1);
        return result;
    }

    public List<Double> step(final List<Double> thetas,
                      final List<Double> direction,
                      double stepSize){
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < direction.size(); ++i) {
            result.add(thetas.get(i) + stepSize * direction.get(i));
        }
        return result;
    }

    public double distance(final List<Double> v, final List<Double> w){
        // Euclidean distance: square root of the summed squared differences
        double sum = 0;
        for (int i = 0; i < v.size(); ++i) {
            sum += Math.pow(v.get(i) - w.get(i), 2);
        }
        return Math.sqrt(sum);
    }

    public List<Double> gradientDescent(double stepSize,
                                 final List<Double> x,
                                 final List<Double> y,
                                 double tolerance, int maxIter){
        int iterNum = 0;
        // Two parameters (theta0, theta1), both starting at zero
        List<Double> thetas = new ArrayList<>();
        thetas.add(0D);
        thetas.add(0D);
        while(true){
            List<Double> gradients = sumOfGradient(x, y, thetas);
            List<Double> nextThetas = step(thetas, gradients, stepSize);
            // Stop once a single step is shorter than the tolerance
            if(distance(nextThetas, thetas) < tolerance){
                break;
            }
            thetas = nextThetas;
            iterNum += 1;

            if(iterNum == maxIter){
                System.out.println("Max iterations exceeded!");
                break;
            }
        }
        return thetas;
    }

    public static void main(String[] args) {
        GradientDescent gradientDescent = new GradientDescent();
        List<Double> x = new ArrayList<>();
        x.add(1d);
        x.add(2d);
        x.add(3d);

        List<Double> y = new ArrayList<>();
        y.add(5d);
        y.add(9d);
        y.add(13d);

        double stepSize = 0.001;
        // step() adds stepSize * gradient, so a negative step size moves downhill
        List<Double> result = gradientDescent.gradientDescent(-stepSize, x, y,  0.0000001, 10000000);
        System.out.println("theta0 = "+result.get(0) +"; theta1 = "+result.get(1));
    }
}