Week 1 Linear regression

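This blog post introduces the basics of linear regression, including the concepts of the predictor X and the response Y, and how to fit data with a linear function. Using Python code, the author illustrates the loss function (mean squared error) and the importance of its global minimum, and mentions optimizing the parameters with the gradient descent algorithm. The post is aimed at beginners and is meant to help readers without a statistics background learn linear regression.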

I have been studying statistics for about nine months. Nine months ago, I did not believe I could survive statistics classes, because I knew very little about the subject. For example, I did not even know that the Gaussian distribution is another name for the normal distribution. My main field is civil engineering, so I am glad to find that I can learn something outside it. I am using the summer break to summarize what I have learned this year, which also serves as a wrap-up of my minor in statistics.

What I write here is summarized in my own words. These words may not be fully accurate, as I am an engineering student rather than a statistics major. I wrote all of this out of personal interest. I hope these notes help someone who, like me, does not have much background in statistics but wants to learn some.

———————————————– starts from here ———————————————————

Background:

Assume that X and Y are both N-by-1 vectors. X is called the predictor and Y is called the response.

The relationship between X and Y is shown in the figure below (Python code is provided):

# Jupyter/IPython magic that renders plots inline in the notebook
# It is not valid in a plain Python script, so remove it when running outside a notebook
%matplotlib inline

# Import necessary packages
import matplotlib.pyplot as plt  # Package used for plotting the figure
import numpy as np  # Package for matrix operation

# Define the number of observations
# This number is also the length of the X and Y vectors
number_observation = 1000

# Generate X as evenly spaced values increasing from -5 to 5,
# with a constant step chosen so that there are exactly 1000 values
X = np.linspace(-5, 5, number_observation)

# Generate Y
# Assume Y is generated using the equation: Y = 2X + 3 + e
# e is the noise, which may be caused by measurement error
# e is assumed to follow a normal distribution with mean 0 and standard deviation 1
e = np.random.normal(0, 1, number_observation)
Y = 2*X + 3 + e

# Plot Y against X
plt.figure(figsize=(16,8))
plt.scatter(X, Y, color='r')
font = {'family': 'Times New Roman', 'weight': 'normal', 'size': 16}
plt.xlabel('X', font)
plt.ylabel('Y', font)
plt.xticks(fontsize=16, fontname='Times New Roman')
plt.yticks(fontsize=16, fontname='Times New Roman')
plt.title('The relationship between X and Y', font)
plt.show()

[Figure: The relationship between X and Y]

Now we roughly know the relationship between X and Y. From the figure, Y appears to depend linearly on X, so we try to fit these scattered data points with a linear function.

The linear function is given below:

$h(x_i) = \beta_1 x_i + \beta_0$

where $h(x_i)$ is the predicted response when $x_i$ is given.
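As a small illustration, the linear function can be written as a Python function. This is only a sketch: the function name h mirrors the notation above, and the coefficient values passed in are picked for illustration, not fitted from the data.

```python
import numpy as np

def h(x, beta1, beta0):
    """Predicted response for predictor value(s) x under the linear function."""
    return beta1 * x + beta0

# Illustrative coefficients only; they are assumptions, not estimated values
x_values = np.array([-5.0, 0.0, 5.0])
predictions = h(x_values, beta1=2.0, beta0=3.0)
print(predictions)
```

With these illustrative coefficients, the three predictions are -7, 3 and 13. Because the function uses only NumPy arithmetic, it works on a single number or on the whole X vector at once.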

Now we only need to determine the values of $\beta_1$ and $\beta_0$ such that each predicted $h(x_i)$ is as close as possible to the true value $y_i$.
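To make "as close as possible" concrete, the sketch below compares two candidate pairs of coefficients on data generated with the same recipe as above. The helper name mean_squared_gap, the fixed random seed, and the two candidate pairs are my own assumptions for illustration, and the mean of the squared gaps is used here only as one simple way to summarize closeness.

```python
import numpy as np

# Same data recipe as above; the seed is fixed only to make the sketch repeatable
rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 1000)
Y = 2 * X + 3 + rng.normal(0, 1, X.size)

def mean_squared_gap(beta1, beta0, X, Y):
    """Average of (h(x_i) - y_i)^2 over all observations."""
    predictions = beta1 * X + beta0
    return np.mean((predictions - Y) ** 2)

# Candidate equal to the values used to generate the data
gap_close = mean_squared_gap(2.0, 3.0, X, Y)
# An arbitrary, deliberately poor guess
gap_far = mean_squared_gap(0.5, 0.0, X, Y)

print(gap_close, gap_far)
```

The first candidate should give a much smaller average gap than the second, since only the noise e separates its predictions from Y. Choosing the coefficients then amounts to finding the pair with the smallest such gap.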

Therefore, we need to create an indicator t
