Linear Regression - Theory
flyfish
This article covers the underlying principles, a Python implementation, and a C++ implementation.
TensorFlow implementation of linear regression (three examples)
PyTorch implementation of linear regression (the recommended version, covering model training, saving, and inference)
Download address for the PyTorch version of the linear regression source code
Knowledge outline
Gradient descent ⇔ gradient ⇔ directional derivative ⇔ partial derivative ⇔ trigonometric functions ⇔ law of cosines ⇔ Pythagorean theorem
"回归"一词是怎么来的
本文中描述的高中教科书是 人民教育出版社B版,高中教科书 《数学 选修 1-2》的第一章《“回归”一词的由来》22页
英国的统计学家弗朗西斯·高尔顿(Francis Galton,1822—1911)用统计方法研究两个变量之间关系。他研究父母身高与子女身高之间的关系,“回归”这个词由他引入的。
What kind of problem does it solve
Suppose you are a business owner who wants to open another branch in a different city, and you have data pairing city population with profit. If a city has a population of 8 million, what profit can the new branch expect?
The data is as follows:
Population (millions) | Profit (100 million yuan)
---|---
1 | 6
2 | 5
3 | 7
4 | 10
Regression formula
Slope (image from Wikipedia)
k = \tan \theta = \frac{y_2 - y_1}{x_2 - x_1} = \frac{\Delta y}{\Delta x}
With a change of letters:
h_\theta(x) = \theta_0 + \theta_1 x
h stands for hypothesis; it is a function that maps x to y
m is the number of samples, the 4 samples in the example
x is the feature (input variable, independent variable), the population in the example
y is the target (output variable, dependent variable), the profit in the example
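As a quick sanity check, a minimal sketch that evaluates the hypothesis on the example data (the θ values are picked arbitrarily, purely for illustration):

import numpy as np

x = np.array([1, 2, 3, 4])      # population (millions), m = 4 samples
y = np.array([6, 5, 7, 10])     # profit (100 million yuan)

theta0, theta1 = 3.0, 1.5       # arbitrary illustrative parameters
h = theta0 + theta1 * x         # h_theta(x) for every sample
print(h)                        # [4.5 6.  7.5 9. ]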
The overall procedure
1 Hypothesis
h_\theta(x) = \theta_0 + \theta_1 x
2 Parameters
\theta_0, \theta_1
3 Cost function (also called the loss function or error function in some texts)
J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
4 Goal
(\theta_0,\theta_1) = \arg\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)
With only one feature, this is called a univariate linear regression problem (simple linear regression).
Visual understanding
The case \theta_0 = 0:
import numpy as np
import matplotlib.pyplot as plt

# Suppose there are three data points: (1,1), (2,2), (3,3)
x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
# Three candidate predictions with different theta1
y1 = 0.5 * x
y2 = 1 * x
y3 = -0.5 * x
print((1 / (2 * len(x))) * sum(np.power(y1 - y, 2)))  # 0.58
print((1 / (2 * len(x))) * sum(np.power(y2 - y, 2)))  # 0
print((1 / (2 * len(x))) * sum(np.power(y3 - y, 2)))  # 5.25
# Plot J(theta1) as a function of theta1
# a = np.linspace(-2, 2, 9)  # [-2. -1.5 -1. -0.5 0. 0.5 1. 1.5 2.]
theta1 = np.linspace(-2, 4, 17)
j = np.array([])
for i in theta1:
    y_ = i * x
    b = (1 / (2 * len(x))) * sum(np.power(y_ - y, 2))
    j = np.append(j, b)
print(j)
plt.xlabel('theta1')
plt.ylabel('J(theta1)')
plt.plot(theta1, j)
plt.show()
This curve looks very much like a parabola (see the parabola on page 59 of the senior high textbook Mathematics, Elective 2-1).
Three-dimensional visualization
import numpy as np
import matplotlib.pyplot as plt

simple_count = 30
x = np.arange(1, simple_count + 1, 1)
y = 2 * x + 1
k = np.arange(-20, simple_count - 20, 1)
b = np.arange(-20, simple_count - 20, 1)
theta_1, theta_0 = np.meshgrid(k, b)
J_theta = np.zeros([simple_count, simple_count])
for i in range(simple_count):
    for j in range(simple_count):
        J_theta[i][j] = ((k[j] * x + b[i] - y) ** 2).sum() / (2 * simple_count)  # 1/(2m) factor as in J above
fig = plt.figure()
plt3d = fig.add_subplot(projection='3d')   # 3D axes (replaces the deprecated Axes3D(fig) usage)
plt.title("cost function three-dimensional visualization")
plt3d.set_xlabel("theta_1")
plt3d.set_ylabel("theta_0")
plt3d.set_zlabel("J_theta")
plt3d.plot_surface(theta_1, theta_0, J_theta, cmap=plt.cm.summer)
plt.show()
If there are multiple features, not only the population x_1 but also, say, a city grade x_2, GDP x_3 and so on, the problem is called a multivariate linear regression problem (multiple linear regression).
Regression formula
h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n = \theta^T x
Matrix form of the formula
With the convention x_0 = 1 (so the intercept \theta_0 is absorbed into the sum), \theta^T x is a row vector times a column vector:
\theta^T x = \begin{bmatrix} \theta_0 & \theta_1 & \cdots & \theta_n \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_{i=0}^{n} \theta_i x_i = h_\theta(x)
The T in the formula above denotes the transpose: rows become columns and columns become rows.
{\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}}^{\mathrm{T}} = {\begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}}
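A minimal numpy sketch of the two operations above, the inner product \theta^T x and the transpose (the numbers are made up; x_0 = 1 and the population is 4):

import numpy as np

theta = np.array([3.0, 1.5])            # [theta_0, theta_1], illustrative values
x = np.array([1.0, 4.0])                # [x_0, x_1] with x_0 = 1, population = 4
print(theta @ x)                        # theta^T x = 3.0 + 1.5*4 = 9.0
print(np.sum(theta * x))                # the same value written as a sum

A = np.array([[1, 2], [3, 4], [5, 6]])
print(A.T)                              # [[1 3 5]
                                        #  [2 4 6]]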
Cost function
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2, \quad m \text{ is the number of samples}
Matrix form
J(\theta) = \frac{1}{2m}(X\theta - y)^T(X\theta - y)
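To see that the summation form and the matrix form give the same number, a minimal sketch on the four-sample data above (the θ values are arbitrary):

import numpy as np

X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]], dtype=float)  # column of ones + population
y = np.array([6, 5, 7, 10], dtype=float)
theta = np.array([3.0, 1.5])                                 # arbitrary parameters
m = len(y)

# summation form
J_sum = np.sum((X @ theta - y) ** 2) / (2 * m)

# matrix form: (X*theta - y)^T (X*theta - y) / (2m)
r = X @ theta - y
J_mat = (r @ r) / (2 * m)

print(J_sum, J_mat)   # identical values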
Gradient descent
The textbook treatment is not easy to follow, so let's start from the simple end; to make things easier to understand I will begin with the Pythagorean theorem and trigonometric functions.
Trigonometric functions
\sin A = \frac{\text{opposite}}{\text{hypotenuse}}
\cos A = \frac{\text{adjacent}}{\text{hypotenuse}}
\tan A = \frac{\sin A}{\cos A}
Taylor's formula
Volume 3 of The Princeton Companion to Mathematics notes that Taylor was not the first to discover the theorem that bears his name, but he was the first to grasp its significance and put it to use.
The idea behind Taylor's formula is local approximation: use a polynomial to approximate a complicated function, much like the saying, "If I have seen the world you have seen and walked the roads you have walked, would I be a little closer to you?"
Approximating an arc with its tangent line
Approximation formulas
f(x) \approx f(x_0) + f'(x_0)(x - x_0)
f(x) \approx f(x_0) + f'(x_0)(x - x_0) + a_2(x - x_0)^2
Approximating f(x) with an n-th degree polynomial:
P(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots + a_n(x - x_0)^n
P_n(x) = P(x_0) + P'(x_0)(x - x_0) + \frac{P''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{P^{(n)}(x_0)}{n!}(x - x_0)^n
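Gradient descent rests on exactly this local picture: near the current point a function behaves like its first-order Taylor approximation, so a small step against the derivative lowers its value. A minimal sketch comparing first- and second-order approximations (the function e^x and the expansion point are chosen arbitrarily):

import numpy as np

def f(x):
    return np.exp(x)

x0 = 0.0   # expansion point, chosen arbitrarily
x = 0.5

# first- and second-order Taylor approximations of e^x around x0
# (every derivative of e^x is e^x, so f(x0) appears in each coefficient)
t1 = f(x0) + f(x0) * (x - x0)
t2 = t1 + f(x0) / 2 * (x - x0) ** 2

print(f(x))   # 1.6487...
print(t1)     # 1.5
print(t2)     # 1.625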
Algorithm
\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta), \quad \alpha \text{ is the learning rate}
\text{repeat until convergence} \ \{ \quad \theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1) \quad \}
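Before applying this rule to linear regression, a minimal sketch of the update rule on a one-variable toy cost J(t) = (t - 3)^2 (the target value, starting point, learning rate, and iteration count are all illustrative):

# Gradient descent on a toy one-variable cost J(t) = (t - 3)^2
def dJ(t):
    return 2 * (t - 3)   # dJ/dt

theta = 0.0
alpha = 0.1
for _ in range(50):
    theta = theta - alpha * dJ(theta)
print(theta)   # converges toward 3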
Differentiating the cost function:
\frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1) = \frac{\partial}{\partial\theta_j}\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
For j = 0:
\frac{\partial}{\partial\theta_0}J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)
For j = 1:
\frac{\partial}{\partial\theta_1}J(\theta_0, \theta_1) = \frac{1}{m}\sum_{i=1}^{m}\left(\left(h_\theta(x^{(i)}) - y^{(i)}\right)\cdot x^{(i)}\right)
\begin{aligned} \text{repeat} \ \{ \quad & \theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) \\ & \theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\cdot x^{(i)} \\ \} \end{aligned}
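As a quick check of the two partial derivatives, a minimal sketch comparing the analytic formulas with numerical (finite-difference) derivatives on the example data (the θ values and the step eps are arbitrary):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 5.0, 7.0, 10.0])
m = len(x)

def J(t0, t1):
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

t0, t1, eps = 3.0, 1.5, 1e-6
analytic_0 = np.sum(t0 + t1 * x - y) / m              # derivative w.r.t. theta_0
analytic_1 = np.sum((t0 + t1 * x - y) * x) / m        # derivative w.r.t. theta_1
numeric_0 = (J(t0 + eps, t1) - J(t0 - eps, t1)) / (2 * eps)
numeric_1 = (J(t0, t1 + eps) - J(t0, t1 - eps)) / (2 * eps)
print(analytic_0, numeric_0)   # the two values should match closely
print(analytic_1, numeric_1)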
Code
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

# SimSun font so the Chinese axis labels display correctly on Windows
zhfont = matplotlib.font_manager.FontProperties(fname=r'C:\Windows\Fonts\simsun.ttc')

# theta0 is multiplied by a constant 1, so a column of ones is prepended
def getDataSet():
    dataset_x = np.array([[1, 1, 1, 1], [1, 2, 3, 4]]).T
    y = np.array([[6], [5], [7], [10]])
    print(dataset_x)
    print(y)
    return dataset_x, y

def getCost(dataset_x, y, theta):
    temp = np.power((dataset_x * theta.T) - y, 2)
    return np.sum(temp) / (2 * len(dataset_x))

'''
theta: the parameters to be updated
alpha: learning rate
iters: number of iterations
'''
def gradientDescent(dataset_x, y, theta, alpha, iters):
    temp = np.mat(np.zeros(theta.shape))
    cost = np.zeros(iters)
    parameters = int(theta.shape[1])
    for i in range(iters):
        error = dataset_x * theta.T - y
        for j in range(parameters):
            term = np.multiply(error, dataset_x[:, j])
            temp[0, j] = theta[0, j] - alpha / len(dataset_x) * np.sum(term)
        theta = temp.copy()   # simultaneous update of theta0 and theta1
        cost[i] = getCost(dataset_x, y, theta)
    return theta, cost

if __name__ == '__main__':
    dataset_x, y = getDataSet()
    alpha = 0.01
    theta = np.mat(np.array([0, 0]))
    print(theta)
    dataset_x = np.mat(dataset_x)
    y = np.mat(y)
    print(dataset_x)
    print(y)
    iters = 100
    theta, cost = gradientDescent(dataset_x, y, theta, alpha, iters)
    x = np.linspace(dataset_x[:, 1].min(), dataset_x[:, 1].max(), 100)
    h = theta[0, 0] + (theta[0, 1] * x)
    plt.scatter(np.array(dataset_x[:, 1]), np.array(y[:, 0]))
    plt.xlabel('人口数', fontproperties=zhfont)
    plt.ylabel('收益', fontproperties=zhfont)
    plt.plot(x, h)
    plt.show()
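With the learned θ, the opening question (the expected profit for a city with a population of 8 million) is answered by evaluating the hypothesis; these lines can be appended at the end of the __main__ block above:

    # Predict the profit for a city with a population of 8 (million),
    # using the theta learned by gradient descent above
    population = 8
    profit = theta[0, 0] + theta[0, 1] * population
    print(profit)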
L1-norm (LASSO regression)
J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}|\theta_j|, \quad \lambda > 0
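A minimal numpy sketch of the L1-penalized cost on the four-sample data (θ and λ are arbitrary; following the formula, the intercept θ_0 is not penalized):

import numpy as np

X = np.array([[1, 1], [1, 2], [1, 3], [1, 4]], dtype=float)   # column of ones + population
y = np.array([6, 5, 7, 10], dtype=float)
theta = np.array([3.0, 1.5])   # theta_0, theta_1 (illustrative)
lam = 0.1                      # lambda > 0

residual = X @ theta - y
J_lasso = 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(theta[1:]))
print(J_lasso)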
L2-norm (Ridge regression)
J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2, \quad \lambda > 0
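Continuing the same sketch, only the penalty term changes for ridge regression:

J_ridge = 0.5 * np.sum(residual ** 2) + lam * np.sum(theta[1:] ** 2)
print(J_ridge)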
Combining the L1-norm and the L2-norm (Elastic Net)
J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\left(p\sum_{j=1}^{n}\theta_j^2 + (1 - p)\sum_{j=1}^{n}|\theta_j|\right), \quad \lambda > 0,\ p \in [0, 1]
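And the elastic net weighs the two penalties with p ∈ [0, 1] (continuing the same sketch; the value of p is arbitrary):

p = 0.5
J_enet = 0.5 * np.sum(residual ** 2) + lam * (p * np.sum(theta[1:] ** 2)
                                              + (1 - p) * np.sum(np.abs(theta[1:])))
print(J_enet)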
C++ implementation
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <numeric>
#include <functional>   // std::minus, std::multiplies
#include <cmath>        // std::pow
#include <cstdlib>      // system
class VectorAssist
{
public:
static std::vector<double> diff(std::vector<double>* predictions, std::vector<double>* y)
{
std::vector<double> diff(predictions->size(), 0);
std::transform(predictions->begin(), predictions->end(), y->begin(), diff.begin(), std::minus<double>());
return diff;
}
static std::vector<double> multiplication(std::vector<double>* diff, std::vector<double>* x)
{
std::vector<double> differror(diff->size(), 0);
std::transform(diff->begin(), diff->end(), x->begin(), differror.begin(), std::multiplies<double>());
return differror;
}
static double sum(std::vector<double>* error)
{
return std::accumulate(error->begin(), error->end(), 0.0);
}
static std::vector<double> square(std::vector<double>* error)
{
std::vector<double> square_errors;
std::for_each(error->begin(), error->end(), [&square_errors](double i) {
square_errors.push_back(std::pow(static_cast<double>(i), 2)); });
return square_errors;
}
};
class LinearRegression
{
public:
std::vector<double>* x_;
std::vector<double>* y_;
int sample_count_;
double *theta_;
public:
LinearRegression(std::vector<double>* x, std::vector<double>* y)
{
this->x_ = x;
this->y_ = y;
sample_count_ = x->size();
}
// Gradient descent
double *gradient_descent(double alpha, int iters, double *J)
{
double *theta = new double[2];
theta[0] = 1;
theta[1] = 1;
for (int i = 0; i < iters; i++)
{
std::vector<double> predictions = calculate_predictions(x_, theta);
std::vector<double> diff = VectorAssist::diff(&predictions, y_);
std::vector<double> error_x1 = diff;
std::vector<double> error_x2 = VectorAssist::multiplication(&diff, x_);
theta[0] = theta[0] - alpha * (1.0 / sample_count_) * VectorAssist::sum(&error_x1);
theta[1] = theta[1] - alpha * (1.0 / sample_count_) * VectorAssist::sum(&error_x2);
J[i] = compute_cost(x_, y_, theta);
}
return theta;
}
// Training
void train(double alpha, int iterations)
{
double *J = new double[iterations];
this->theta_ = gradient_descent(alpha, iterations, J);
std::cout << "Theta: " << theta_[0] << ", " << theta_[1] << std::endl;
delete[] J;
}
// Prediction
double predict(double x)
{
return h(x, theta_);
}
// Cost function
double compute_cost(std::vector<double>* x, std::vector<double>* y, double theta[])
{
std::vector<double> predictions = calculate_predictions(x, theta);
std::vector<double> diff = VectorAssist::diff(&predictions, y);
std::vector<double> square_errors = VectorAssist::square(&diff);
return (1.0 / (2 * sample_count_)) * VectorAssist::sum(&square_errors );
}
// Hypothesis: h(x) = theta0 + theta1 * x
double h(double x, double theta[])
{
return theta[0] + theta[1] * x;
}
std::vector<double> calculate_predictions(std::vector<double>* x, double theta[])
{
std::vector<double> predictions;
for (int i = 0; i < sample_count_; i++)
{
predictions.push_back(h(x->at(i), theta));
}
return predictions;
}
};
int main()
{
std::cout << "Hello World!\n";
double alpha = 0.01;
int iterations = 20;
double x_predict = 6;
double y_predict;
std::vector<double> X;
std::vector<double> Y;
for (int i = 0; i < 10; i++)
{
X.push_back(i);
Y.push_back(i * 2 + 1);
}
LinearRegression lr(&X, &Y);
lr.train(alpha, iterations);
y_predict = lr.predict(x_predict);
std::cout << y_predict << std::endl;
system("pause");
}