Machine Learning -- Basic Algorithms -- Linear Regression

1 Simple Linear Regression

Characteristics of the linear regression algorithm:
1. Solves regression problems
2. Simple idea, easy to implement
3. The foundation of many powerful nonlinear models
4. Produces highly interpretable results
5. Embodies many important ideas in machine learning

When each sample has only one feature, the method is called simple linear regression.
When samples have multiple features, it is called multiple linear regression.
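In symbols (writing the multiple case with coefficients $\theta_0, \dots, \theta_n$, a common notation):

$$\text{simple: } \hat{y} = ax + b, \qquad \text{multiple: } \hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$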

Suppose we have found the best-fitting line $y = ax + b$. Then for each sample point $x^{(i)}$, the line predicts $\hat{y}^{(i)} = a x^{(i)} + b$, while the true value is $y^{(i)}$.
We want the gap between $y^{(i)}$ and $\hat{y}^{(i)}$ to be as small as possible, and we measure that gap as $\left(y^{(i)} - \hat{y}^{(i)}\right)^2$.
Summing over all $m$ samples, the goal is to find $a$ and $b$ that minimize the loss

$$\sum_{i=1}^{m} \left(y^{(i)} - a x^{(i)} - b\right)^2$$

This is a least-squares problem, and its closed-form solution is

$$a = \frac{\sum_{i=1}^{m} \left(x^{(i)} - \bar{x}\right)\left(y^{(i)} - \bar{y}\right)}{\sum_{i=1}^{m} \left(x^{(i)} - \bar{x}\right)^2}, \qquad b = \bar{y} - a\bar{x}$$
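A quick sketch of where these formulas come from: set the partial derivatives of the loss with respect to $b$ and $a$ to zero and solve.

$$\frac{\partial}{\partial b} \sum_{i=1}^{m} \left(y^{(i)} - a x^{(i)} - b\right)^2 = -2 \sum_{i=1}^{m} \left(y^{(i)} - a x^{(i)} - b\right) = 0 \;\Longrightarrow\; b = \bar{y} - a\bar{x}$$

$$\frac{\partial}{\partial a} \sum_{i=1}^{m} \left(y^{(i)} - a x^{(i)} - b\right)^2 = -2 \sum_{i=1}^{m} \left(y^{(i)} - a x^{(i)} - b\right) x^{(i)} = 0$$

Substituting $b = \bar{y} - a\bar{x}$ into the second equation and rearranging gives the expression for $a$ above.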

2 Implementing Simple Linear Regression

Implementing Simple Linear Regression:

import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 3, 2, 3, 5])
plt.scatter(x, y)
plt.axis([0, 6, 0, 6])
plt.show()

Output:
[Figure: scatter plot of the five sample points]

# compute the means of x and y
x_mean = np.mean(x)
y_mean = np.mean(y)
num = 0  # numerator of a
d = 0    # denominator of a
for x_i, y_i in zip(x, y):
    num += (x_i - x_mean) * (y_i - y_mean)
    d += (x_i - x_mean) ** 2
a = num / d
b = y_mean - a * x_mean
print(a)
print(b)

Output:

0.8
0.39999999999999947

Plot the fitted line:

# plot the fitted line over the data
y_hat = a * x + b
plt.scatter(x, y)
plt.plot(x, y_hat, color='r')
plt.axis([0, 6, 0, 6])
plt.show()

Output:
[Figure: the sample points with the fitted line drawn in red]

# predict for a new sample using the fitted model
x_predict = 6
y_predict = a * x_predict + b
y_predict
>>>5.2

Organizing the SimpleLinearRegression algorithm into a class in PyCharm:

import numpy as np

class SimpleLinearRegression1:

    def __init__(self):
        self.a_ = None
        self.b_ = None

    def fit(self, x_train, y_train):
        """根据x_train, y_train训练SimpleLinearRegression模型"""
        assert x_train.ndim == 1, 'simple linear regression can only solve single feature training data'
        assert len(x_train) == len(y_train), 'the size of x_train must be equal to y_train'

        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)
        num = 0
        d = 0
        for x_i, y_i in zip(x_train, y_train):
            num += (x_i - x_mean) * (y_i - y_mean)
            d += (x_i - x_mean) ** 2
        self.a_ = num / d
        self.b_ = y_mean - self.a_ * x_mean

        return self

    def predict(self, x_predict):
        """给定待预测的数据集x_predict,返回表示结果的向量"""
        assert x_predict.ndim == 1, 'simple linear regression can only solve single feature training data'
        assert self.a_ is not None and self.b_ is not None, 'must fit before predict'

        return np.array([self._predict(x_i) for x_i in x_predict])

    def _predict(self, x_i):
        """Predict the value for a single sample x_i"""
        return self.a_ * x_i + self.b_

    def __repr__(self):

        return 'SimpleLinearRegression1()'

Then use the hand-written SimpleLinearRegression in a Jupyter notebook:

from simple_linear_regression.SimpleLinearRegression import SimpleLinearRegression1
reg = SimpleLinearRegression1()
reg.fit(x, y)
>>>SimpleLinearRegression1()
reg.predict(np.array([x_predict]))
>>>array([5.2])
# inspect the fitted parameters
print(reg.a_)
print(reg.b_)

Output:

0.8
0.39999999999999947
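As a quick sanity check, the same coefficients can also be recovered with NumPy's built-in np.polyfit, which for degree 1 returns the slope and intercept (highest power first):

slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # expect values close to a_ = 0.8 and b_ = 0.4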

3 Vectorization

This section replaces the for loop that computes a and b with vectorized operations. Copy the SimpleLinearRegression1 class from above, rename the copy SimpleLinearRegression2, and change its fit method as follows:

    def fit(self, x_train, y_train):
        """根据x_train, y_train训练SimpleLinearRegression模型"""
        assert x_train.ndim == 1, 'simple linear regression can only solve single feature training data'
        assert len(x_train) == len(y_train), 'the size of x_train must be equal to y_train'

        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)

        # vectorized: compute num and d as inner products instead of a Python loop
        num = (x_train - x_mean).dot(y_train - y_mean)
        d = (x_train - x_mean).dot(x_train - x_mean)

        self.a_ = num / d
        self.b_ = y_mean - self.a_ * x_mean

        return self
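For 1-D NumPy arrays, .dot computes the inner product $\sum_i u_i v_i$, so num and d above are exactly the loop sums from SimpleLinearRegression1. A minimal check:

import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
# inner product: 1*4 + 2*5 + 3*6 = 32.0
print(u.dot(v))
print(sum(u_i * v_i for u_i, v_i in zip(u, v)))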

Call SimpleLinearRegression2 in the Jupyter notebook, continuing in the notebook from the previous section:

from simple_linear_regression.SimpleLinearRegression import SimpleLinearRegression2
reg2 = SimpleLinearRegression2()
reg2.fit(x, y)
>>>SimpleLinearRegression2()
print(reg2.a_)
print(reg2.b_)

Output:

0.8
0.39999999999999947

Next, compare the performance of the vectorized implementation against the loop-based one:

m = 1000000
big_x = np.random.random(size=m)
big_y = big_x * 2.0 + 3.0 + np.random.normal(size=m)  # assumed: a noisy linear target
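With the data in place, Jupyter's %timeit magic can time both fit implementations; exact numbers depend on the machine, but the vectorized version is typically faster by an order of magnitude or more:

%timeit SimpleLinearRegression1().fit(big_x, big_y)
%timeit SimpleLinearRegression2().fit(big_x, big_y)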