Andrew Ng's Coursera Course, Part 1: Supervised Learning, Week 1

jqqjrr123

Last modified 2023-01-03 16:04:45


First published 2022-11-02 20:45:28

Article link: https://blog.csdn.net/jqqjrr123/article/details/127655460


I am a complete ML beginner. These notes also draw on material from several more experienced writers. Let's keep at it together!

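I. Machine Learning

1. Definition

Tom Mitchell's definition: a computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

Supervised learning: we teach the computer how to do a task by giving it examples with correct answers.
Unsupervised learning: we let the computer learn to do a task by itself.
Examples:

Healthcare, autonomous driving, reading sign language, music generation, and natural language processing.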

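2. Supervised Learning

The algorithm is given a dataset that includes the correct answers, and its goal is to produce correct answers for new inputs.

Supervised learning covers two kinds of problems:

Regression: predict a continuous output value, e.g. a house price.
Classification: predict a discrete value, e.g. whether a tumor is benign or malignant.

3. Unsupervised Learning

The algorithm is given a dataset without the correct answers (labels) and has to group the data by itself.

A typical example is clustering.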

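II. Univariate Linear Regression

2.1 Model Representation

Example: house price prediction.

We use a dataset of housing prices for some area and want to estimate the price of a house from its size, so we build a model. This is a regression problem: the word "regression" means that we predict a continuous value (here, the price of the house) from previous data. The data table is our dataset, i.e. the training set.

Classification is the other kind of problem, where we want to predict a discrete output. For example, when examining a tumor we want to decide whether it is benign or malignant, which is a 0/1 discrete-output problem.

The notation we use to describe this regression problem is:

Hypothesis (model) function: f(x) = wx + b

m: the number of examples in the training set

x: the feature / input variable

y: the target / output variable

(x, y): one example in the training set

(x^(i), y^(i)): the i-th training example

f: the function produced by the learning algorithm, also called the hypothesis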
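As a minimal sketch of this notation, the snippet below stores a two-example training set in NumPy arrays and evaluates f(x) = wx + b on it. The data values are the ones used in the gradient descent code later in this post; the helper name `predict` and the parameter values w = 200, b = 100 are assumptions chosen only for illustration.

```python
import numpy as np

# Training set: x is the feature (size in 1000 sqft), y is the target (price in $1000s).
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

m = x_train.shape[0]  # number of training examples


def predict(x, w, b):
    """Linear model f(x) = w * x + b (illustrative helper name)."""
    return w * x + b


# One training example (x^(i), y^(i)); note that Python indexes from 0.
i = 0
print(f"m = {m}, first example = ({x_train[i]}, {y_train[i]})")

# Illustrative parameter values, not learned here.
w, b = 200.0, 100.0
print("predictions:", predict(x_train, w, b))
```

Because `predict` works on whole arrays, the same call handles a single size or the entire training set.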

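2.2 Cost Function (Loss Function)

This cost function is also called the squared error function or the squared error cost function. For linear regression, squared error is the most commonly used choice:

J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f(x^{(i)}) - y^{(i)} \right)^2

f(x^(i)) is the predicted value and y^(i) is the actual value; take their difference and square it. Other bloggers mention that this is related to the least squares method, which I don't fully understand yet, so for now I just think of it as the prediction error.

Goal: choose the parameters w and b that minimize the cost function, i.e. minimize J(w, b).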

# Compute the cost function J(w, b) by summing the squared errors over all examples.
def compute_cost(x, y, w, b): 
    """
    Computes the cost function for linear regression.
    
    Args:
      x (ndarray (m,)): Data, m examples 
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters  
    
    Returns
        total_cost (float): The cost of using w,b as the parameters for linear regression
               to fit the data points in x and y
    """
    # number of training examples
    m = x.shape[0] 
    
    cost_sum = 0 
    for i in range(m): 
        f_wb = w * x[i] + b  # predicted value; y[i] is the actual value
        cost = (f_wb - y[i]) ** 2  
        cost_sum = cost_sum + cost  
    total_cost = (1 / (2 * m)) * cost_sum  

    return total_cost

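Cost function intuition 1:

The figure above is a two-dimensional plot involving only the single parameter w.

On the left, the × markers are the actual training values, and the straight line is the prediction f determined by w and b.

The right side explores how w affects J.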
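To make the effect of w on J concrete, here is a small sketch that holds b fixed and evaluates the cost on a grid of w values. It reuses the two training points from the gradient descent code later in this post; the fixed value b = 100 and the grid of w values are assumptions picked for illustration.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])


def cost(x, y, w, b):
    """Squared error cost J(w, b) = 1/(2m) * sum((w*x + b - y)^2)."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)


b_fixed = 100.0  # assumed fixed so that J depends on w only
w_values = np.arange(0.0, 401.0, 50.0)  # 0, 50, ..., 400
costs = np.array([cost(x_train, y_train, w, b_fixed) for w in w_values])

for w, j in zip(w_values, costs):
    print(f"w = {w:5.0f}  J = {j:10.1f}")

best_w = w_values[np.argmin(costs)]
print("w with the lowest cost on this grid:", best_w)
```

The printed costs fall as w approaches the best value and rise again on the other side, which is the bowl shape seen in the right-hand plot.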

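Cost function intuition 2:

With both w and b varying, J(w, b) is a three-dimensional surface. For the squared error cost of linear regression this surface is bowl-shaped, so it always has a single lowest point.

In short, what we really need is an efficient algorithm that automatically finds the parameters w and b that minimize the cost function J.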

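2.3 Gradient Descent

Gradient descent is an algorithm for finding the minimum of a function. We will use it to find the minimum of the cost function.

Algorithm outline

Choose initial values for w and b, usually w = 0, b = 0.
Keep changing w and b so that J(w, b) keeps decreasing.
Stop when a minimum (or a local minimum) is reached.

Gradient: the gradient of a function at a point gives the direction in which the function increases fastest at that point; gradient descent steps in the opposite direction.
(For a general function, starting from a different point, i.e. a different initial parameter combination, may lead to a different local minimum.)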

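Gradient descent update rule (repeat until convergence):

w := w - \alpha \frac{\partial J(w,b)}{\partial w}, \qquad \frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum_{i=1}^{m} \left( f(x^{(i)}) - y^{(i)} \right) x^{(i)}

b := b - \alpha \frac{\partial J(w,b)}{\partial b}, \qquad \frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( f(x^{(i)}) - y^{(i)} \right)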

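Notes:

The values on the right-hand side of the second formula are the not-yet-updated ones.
The index i starts from the first training example.
w and b must be updated simultaneously: compute both partial derivatives from the current w and b, then update both. If w is updated first, b ends up being updated from the already-updated w.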
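The difference between a simultaneous and a sequential update can be sketched in a few lines. The single training example (2, 500), the learning rate 0.1, and the helper name `gradient` are assumptions made up for this illustration.

```python
def gradient(w, b):
    """Gradient of the squared error cost for one example (x, y) = (2, 500)."""
    x, y = 2.0, 500.0
    err = w * x + b - y
    return err * x, err  # dJ/dw, dJ/db


alpha = 0.1
w, b = 0.0, 0.0

# Correct: compute both derivatives from the old (w, b), then update both.
dj_dw, dj_db = gradient(w, b)
w_sim = w - alpha * dj_dw
b_sim = b - alpha * dj_db

# Incorrect: b is updated with a gradient evaluated at the already-updated w.
w_seq = w - alpha * gradient(w, b)[0]
b_seq = b - alpha * gradient(w_seq, b)[1]

print("simultaneous:", w_sim, b_sim)
print("sequential:  ", w_seq, b_seq)
```

Both versions produce the same new w, but the new b differs, so the sequential version is no longer taking a step along the gradient at the current point.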

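About the learning rate α:

If the learning rate is too small, progress is very slow: each step is tiny, so it takes many steps to reach the local minimum.
If the learning rate is too large, a few steps can overshoot the local minimum, and the algorithm may even fail to converge.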
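Both failure modes can be seen on the two-point training set used later in this post. The sketch below runs ten steps of batch gradient descent from w = b = 0 with three learning rates; the specific values 1e-4, 1e-1 and 8e-1 and the helper names `cost` and `run_gd` are assumptions chosen to show the behaviour on this data, not recommended settings.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])


def cost(x, y, w, b):
    """Squared error cost J(w, b)."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)


def run_gd(x, y, alpha, num_iters):
    """Run batch gradient descent from w = b = 0 and return the cost history."""
    m = x.shape[0]
    w, b = 0.0, 0.0
    history = [cost(x, y, w, b)]
    for _ in range(num_iters):
        err = w * x + b - y
        dj_dw = np.sum(err * x) / m
        dj_db = np.sum(err) / m
        w, b = w - alpha * dj_dw, b - alpha * dj_db  # simultaneous update
        history.append(cost(x, y, w, b))
    return history


for alpha in (1e-4, 1e-1, 8e-1):
    hist = run_gd(x_train, y_train, alpha, num_iters=10)
    print(f"alpha = {alpha:g}: cost {hist[0]:.1f} -> {hist[-1]:.1f}")
```

On this data the smallest rate barely reduces the cost in ten steps, the middle one reduces it quickly, and the largest one makes the cost grow instead of shrink.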

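But don't worry!

Even with a fixed learning rate, the steps get smaller automatically as gradient descent runs. The derivative is zero at a local minimum, so as we approach it the derivative shrinks, and each update α · (derivative) shrinks with it. Eventually the steps become tiny and the algorithm has converged to a local minimum.

In other words, with a suitably small learning rate and enough iterations, the partial derivatives tend to 0, w and b almost stop changing, and the model settles down (converges). At that point we have reached a local minimum and found suitable values of w and b.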
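The shrinking step size is easy to observe on a one-parameter example. The cost J(w) = (w - 3)^2, the starting point w = 0 and the learning rate 0.1 below are assumptions invented for this sketch; the learning rate is never changed inside the loop.

```python
def dJ_dw(w):
    """Derivative of the illustrative cost J(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


alpha = 0.1  # fixed learning rate, never changed below
w = 0.0
step_sizes = []
for _ in range(20):
    step = alpha * dJ_dw(w)
    w = w - step
    step_sizes.append(abs(step))

print("first steps:", [round(s, 4) for s in step_sizes[:5]])
print("last step:  ", step_sizes[-1])
print("final w:    ", w)
```

Each step is smaller than the previous one purely because the derivative shrinks as w approaches the minimum at w = 3.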

Complete gradient descent code:

# Automate the optimization of w and b using gradient descent.
import math, copy
import numpy as np
import matplotlib.pyplot as plt
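# The style file and the lab_utils_uni helpers are local files from the course lab, not pip packages.
plt.style.use('./deeplearning.mplstyle')
from my_Function.lab_utils_uni import plt_house_x, plt_contour_wgrad, plt_divergence, plt_gradients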
# Load our data set
x_train = np.array([1.0, 2.0])   #features
y_train = np.array([300.0, 500.0])   #target value


# Function to calculate the cost
# Cost function J(w, b): sum of squared errors over the training set.
# J(w,b) = 1/(2m) * sum_{i=1..m} (f_wb(x^(i)) - y^(i))^2
def compute_cost(x, y, w, b):
    """
       Computes the cost function for linear regression.
       Args:
         x (ndarray (m,)): Data, m examples
         y (ndarray (m,)): target values
         w,b (scalar)    : model parameters

       Returns
           total_cost (float): The cost of using w,b as the parameters for linear regression
                  to fit the data points in x and y
       """
    # number of training examples
    m = x.shape[0]
    cost = 0
    for i in range(m):
        f_wb = w * x[i] + b
        cost = cost + (f_wb - y[i]) ** 2  # y[i] is the actual value
    total_cost = 1 / (2 * m) * cost

    return total_cost


# Compute the partial derivatives of the cost
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameters w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b
     """

    # Number of training examples
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0

    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = f_wb - y[i]
        dj_db += dj_db_i  # partial derivative w.r.t. b: summed over all m examples
        dj_dw += dj_dw_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_dw, dj_db

plt_gradients(x_train,y_train, compute_cost, compute_gradient)
plt.show()

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking
    num_iters gradient steps with learning rate alpha

    Args:
      x (ndarray (m,))  : Data, m examples
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters
      alpha (float):     Learning rate
      num_iters (int):   number of iterations to run gradient descent
      cost_function:     function to call to produce cost
      gradient_function: function to call to produce gradient

    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (List): History of cost values
      p_history (list): History of parameters [w,b]

      Note on lists: the list is the most commonly used Python data type. It is written
      as comma-separated items inside square brackets, and the items do not need to
      have the same type. For example:

      list1 = ['physics', 'chemistry', 1997, 2000]
      list2 = [1, 2, 3, 4, 5 ]
      list3 = ["a", "b", "c", "d"]
      """
    # copy.deepcopy() makes a deep copy: a fully independent duplicate of the input, so
    # changing the new variable never affects the original.
    # Plain assignment with = is different: it only attaches another name (a label) to the same object.
    w = copy.deepcopy(w_in)  # avoid modifying global w_in
    b = b_in
    # Lists to store cost J and the parameters at each iteration, primarily for graphing later
    J_history = []
    p_history = []

    for i in range(num_iters):
        # Calculate the gradient using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)  # the derivative terms

        # Update both parameters from the gradients computed above (simultaneous update)
        b = b - alpha * dj_db
        w = w - alpha * dj_dw

        # Save cost J at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])  # record the w, b produced by every iteration
        # Print the cost 10 times at regular intervals, or at every iteration if num_iters < 10
        # math.ceil rounds up to the next integer (it is not round-half-up).
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")

    return w, b, J_history, p_history  # return final w, b and the J and [w, b] histories for graphing

# initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 20000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train ,y_train, w_init, b_init, tmp_alpha,
                                                    iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")
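# After many iterations with a small learning rate, the partial derivatives tend to 0,
# w and b almost stop changing, and the model converges to the minimum, which gives suitable values of w and b.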

# plot cost versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12,4))
ax1.plot(J_hist[:100])
ax2.plot(1000 + np.arange(len(J_hist[1000:])), J_hist[1000:])
ax1.set_title("Cost vs. iteration (start)");  ax2.set_title("Cost vs. iteration (end)")
ax1.set_ylabel('Cost')            ;  ax2.set_ylabel('Cost')
ax1.set_xlabel('iteration step')  ;  ax2.set_xlabel('iteration step')
plt.show()

print(f"1000 sqft house prediction {w_final*1.0 + b_final:0.1f} Thousand dollars")
print(f"1200 sqft house prediction {w_final*1.2 + b_final:0.1f} Thousand dollars")
print(f"2000 sqft house prediction {w_final*2.0 + b_final:0.1f} Thousand dollars")
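As a sanity check on the values found by gradient descent, the same two points can be fitted with a closed-form least-squares line. This uses `np.polyfit`, which is not part of the original notes and is swapped in here only for comparison; the variable names `w_ls` and `b_ls` are made up for this sketch.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Degree-1 least-squares fit; coefficients come back highest power first.
w_ls, b_ls = np.polyfit(x_train, y_train, 1)
print(f"least-squares fit: w = {w_ls:.4f}, b = {b_ls:.4f}")
```

With only two training points the line passes through both of them exactly, so the gradient descent result should approach the same w and b as the number of iterations grows.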