Week 1 Linear regression

修罗_GUAN

Article link: https://blog.csdn.net/weixin_42515443/article/details/80768841

I have been studying statistics for about nine months. Nine months ago, I did not believe I could survive my statistics classes, because I knew so little about the subject. For example, I did not even know that "Gaussian distribution" is another name for "normal distribution". I also did not expect that I could learn something outside civil engineering. I am using this summer break to summarize what I have learned over this year, which also serves as a wrap-up of my minor in statistics.

What I write here is a summary in my own words. The wording may not always be precise, since I am an engineering student rather than a statistics major. I wrote these notes purely out of interest, and I hope they help others who, like me, start with little background in statistics but want to learn it.

———————————————– starts from here ———————————————————

Background:

Assume that X and Y are both N-by-1 vectors. X is called the predictor (or independent variable), and Y is called the response (or dependent variable).
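As a small aside on what "N-by-1" looks like in NumPy: `np.linspace` returns a one-dimensional array of shape `(N,)`, which is fine for the plotting done in this post. If an explicit column vector is needed, `reshape(-1, 1)` produces one. This is only a sketch; the value `N = 5` and the name `X_col` are my own choices for illustration.

```python
import numpy as np

N = 5  # small illustrative size (an assumption, not the post's 1000)

# np.linspace returns a 1-D array of shape (N,)
X = np.linspace(-5, 5, N)

# Reshape into an explicit N-by-1 column vector
X_col = X.reshape(-1, 1)

print(X.shape)      # shape of the 1-D array
print(X_col.shape)  # shape of the column vector
```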

The relationship between X and Y is shown in the figure below (the Python code that generates it is provided):

# IPython/Jupyter magic that renders plots inline in the notebook.
# Remove this line when running the code as a plain Python script.
%matplotlib inline

# Import necessary packages
import matplotlib.pyplot as plt  # Package used for plotting the figure
import numpy as np  # Package for matrix operation

# Define the number of observations
# This number is also the length of the X and Y vectors
number_observation = 1000

# Generate X as 1000 evenly spaced values from -5 to 5 (both ends included)
X = np.linspace(-5, 5, number_observation)

# Generate Y
# Assume Y is generated by the equation: Y = 2X + 3 + e
# e is the noise, which may be caused by measurement error.
# e is assumed to follow a normal distribution with mean 0 and standard deviation 1
e = np.random.normal(0, 1, number_observation)
Y = 2*X + 3 + e

# Plot the X and Y
plt.figure(figsize=(16,8))
plt.scatter(X, Y, color='r')
font = {'family': 'Times New Roman', 'weight': 'normal', 'size': 16}
plt.xlabel('X', font)
plt.ylabel('Y', font)
plt.xticks(fontsize=16, fontname='Times New Roman')
plt.yticks(fontsize=16, fontname='Times New Roman')
plt.title('The relationship between X and Y', font)
plt.show()

[Figure: scatter plot of Y against X, titled "The relationship between X and Y"]

We now have a rough picture of the relationship between X and Y: the plot suggests that Y depends linearly on X. We therefore fit these scattered data points with a linear function.
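Besides looking at the plot, one quick numerical check of "Y seems to depend linearly on X" is the Pearson correlation coefficient, which is close to 1 or -1 when the points lie near a straight line. The sketch below regenerates data in the same way as the code above. The fixed seed and the use of `np.random.default_rng` are my own assumptions, added only so the result is reproducible.

```python
import numpy as np

# Fixed seed (an assumption) so the sketch gives the same numbers every run
rng = np.random.default_rng(0)

# Same data-generating recipe as in the post: Y = 2X + 3 + e
number_observation = 1000
X = np.linspace(-5, 5, number_observation)
e = rng.normal(0, 1, number_observation)
Y = 2 * X + 3 + e

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between X and Y
r = np.corrcoef(X, Y)[0, 1]
print(f"Pearson correlation between X and Y: {r:.3f}")
```

A value near 1 supports the choice of a linear function, although a high correlation alone does not prove that the relationship is linear.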

The linear function has the following form:

$h(X_i)=\beta_1 X_i + \beta_0$

where $h(X_i)$ is the predicted response (the predicted value of Y) for a given $X_i$.
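The linear function can be written as a small Python function. This is just a sketch: the names `h`, `beta_1`, `beta_0` and `x_new` are my own, and the values 2 and 3 are used only because they are the ones that generated the data in this post. In a real problem they are unknown and have to be estimated.

```python
import numpy as np

def h(x, beta_1, beta_0):
    """Predicted response for input x under the line beta_1 * x + beta_0."""
    return beta_1 * np.asarray(x, dtype=float) + beta_0

# Evaluate the line at a few illustrative inputs
x_new = np.array([-1.0, 0.0, 2.5])
predictions = h(x_new, beta_1=2.0, beta_0=3.0)
print(predictions)
```

Because the function works on whole NumPy arrays, the predictions for all 1000 observations can be computed in a single call.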

Now we only need to determine the values of $\beta_1$ and $\beta_0$ such that each prediction $h(X_i)$ is as close as possible to the true value $Y_i$.
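To get a feeling for what "as close as possible" means, we can compare two candidate pairs of $\beta_1$ and $\beta_0$ by how far their predictions fall from the observed Y. The sketch below uses the mean of the squared differences as the measure of closeness. This is one common choice and other measures are possible. The fixed seed, the helper name `mean_squared_error` and the two candidate pairs are assumptions made for illustration.

```python
import numpy as np

# Fixed seed (an assumption) so the sketch is reproducible
rng = np.random.default_rng(0)

# Same data-generating recipe as in the post: Y = 2X + 3 + e
X = np.linspace(-5, 5, 1000)
Y = 2 * X + 3 + rng.normal(0, 1, 1000)

def mean_squared_error(beta_1, beta_0):
    """Average squared gap between the line's predictions and the observed Y."""
    residuals = (beta_1 * X + beta_0) - Y
    return np.mean(residuals ** 2)

# Candidate 1: the parameters that actually generated the data
mse_good = mean_squared_error(2.0, 3.0)
# Candidate 2: an arbitrary, deliberately poor guess
mse_poor = mean_squared_error(0.5, 0.0)

print(f"beta_1=2.0, beta_0=3.0 -> MSE = {mse_good:.3f}")
print(f"beta_1=0.5, beta_0=0.0 -> MSE = {mse_poor:.3f}")
```

The first candidate gives a much smaller value than the second, so a single number like this can tell us which line fits the data better.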

Therefore, we need to create an indicator t
