Machine Learning Study Notes

Mujin6843

Last modified 2023-10-24 16:19:41

Tags: study notes, machine learning

First published 2023-10-24 16:19:19

Article link: https://blog.csdn.net/Mujin6843/article/details/133814308

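Week 1:

Supervised learning: algorithms that learn a mapping from x to y, i.e. from input to output. The key point is that the learning algorithm is given examples to learn from (inputs paired with the correct outputs).

Supervised learning algorithms: regression (predicts continuous values) and classification (predicts discrete values).

Unsupervised learning: uses data without labels to find structure or patterns in the data.

Unsupervised learning algorithms: clustering (takes unlabeled data and tries to group it into clusters automatically).

Linear regression model:

x denotes the input variable/feature (here, the size of the house)
y denotes the output target variable (here, the price of the house)
m denotes the total number of training examples (47 in this example)
(x, y) denotes a single training example
(x^(i), y^(i)) denotes the i-th training example

f is the function, i.e. the trained model. y-hat is the model's prediction of y; in the house size and price example above, it is the predicted house price.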
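Written out, the single-feature model that the code later in these notes computes as `f_wb = w * x[i] + b` can be sketched as follows (the notation is assumed from the definitions above):

```latex
\[
  f_{w,b}(x) = w x + b, \qquad \hat{y}^{(i)} = f_{w,b}\bigl(x^{(i)}\bigr)
\]
```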

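Cost Function

The cost function J measures the difference between the model's predictions and the actual values of y. Linear regression tries to find values of w and b that make J(w, b) as small as possible.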
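For reference, the squared-error cost that the `compute_cost` code in these notes evaluates (sum of squared errors divided by 2m) corresponds to:

```latex
\[
  J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \Bigl( f_{w,b}\bigl(x^{(i)}\bigr) - y^{(i)} \Bigr)^{2}
\]
```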

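A 3D surface plot can be used to show how w and b jointly affect J.

Gradient Descent

Starting from a chosen initial position, take a small step in the direction of steepest descent from the current position, then repeat the process from the new position until you end up in a valley. The bottom of that valley is a local minimum. Different starting points may lead to different local minima.

Gradient descent formula:

α is the learning rate.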
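The update rule can be written out as below. This is a reconstruction that matches the `compute_gradient` and `gradient_descent` code later in these notes, assuming both parameters are updated simultaneously from gradients evaluated at the current w and b:

```latex
\[
  w := w - \alpha \frac{\partial J(w,b)}{\partial w}, \qquad
  b := b - \alpha \frac{\partial J(w,b)}{\partial b}
\]
\[
  \frac{\partial J(w,b)}{\partial w}
    = \frac{1}{m} \sum_{i=1}^{m} \Bigl( f_{w,b}\bigl(x^{(i)}\bigr) - y^{(i)} \Bigr) x^{(i)}, \qquad
  \frac{\partial J(w,b)}{\partial b}
    = \frac{1}{m} \sum_{i=1}^{m} \Bigl( f_{w,b}\bigl(x^{(i)}\bigr) - y^{(i)} \Bigr)
\]
```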

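Considering only w: when the slope is positive, w gradually decreases toward the minimum; when the slope is negative, w gradually increases toward the minimum.

While approaching the minimum, the slope is usually large for the first steps of gradient descent and then gradually shrinks, so the updates become smaller as well.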
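The two observations above can be checked on a toy problem. The sketch below is not from the original notes: it assumes a hypothetical one-parameter cost J(w) = (w - 3)^2 with its minimum at w = 3, and the names `slope` and `descend` are made up for illustration.

```python
def slope(w):
    """Derivative of the toy cost J(w) = (w - 3)**2 (hypothetical example)."""
    return 2 * (w - 3)


def descend(w, alpha=0.1, steps=5):
    """Run a few gradient descent steps and return every visited w."""
    history = [w]
    for _ in range(steps):
        w = w - alpha * slope(w)   # the gradient descent update for w only
        history.append(w)
    return history


from_right = descend(5.0)   # starts where the slope is positive
from_left = descend(0.0)    # starts where the slope is negative

# Size of each update along the first run
step_sizes = [abs(b - a) for a, b in zip(from_right, from_right[1:])]

print("start right of minimum:", from_right)
print("start left of minimum: ", from_left)
print("step sizes:            ", step_sizes)
```

Starting to the right of the minimum, w moves down toward 3; starting to the left, it moves up toward 3; and in both cases the steps get smaller as the slope flattens, even though the learning rate stays fixed.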

Code implementation:

Implementing the cost function:

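def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) for single-feature linear regression."""
    m = x.shape[0]   # number of training examples
    cost = 0

    for i in range(m):
        f_wb = w * x[i] + b                  # model prediction for example i
        cost = cost + (f_wb - y[i])**2       # accumulate the squared error
    total_cost = 1 / (2 * m) * cost          # divide the sum by 2m

    return total_cost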

Implementing the partial derivatives of the cost function with respect to w and b:

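def compute_gradient(x, y, w, b):
    """Partial derivatives of J(w, b) with respect to w and b."""
    m = x.shape[0]   # number of training examples
    dj_dw = 0
    dj_db = 0

    for i in range(m):
        f_wb = w * x[i] + b                  # model prediction for example i
        dj_dw_i = (f_wb - y[i]) * x[i]       # contribution of example i to dJ/dw
        dj_db_i = f_wb - y[i]                # contribution of example i to dJ/db
        dj_db += dj_db_i
        dj_dw += dj_dw_i
    dj_dw = dj_dw / m                        # average over all examples
    dj_db = dj_db / m

    return dj_dw, dj_db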

Compute w and b at each iteration and return the values of w and b that minimize the cost function:

import copy
import math

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      x (ndarray (m,))  : Data, m examples 
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters  
      alpha (float):     Learning rate
      num_iters (int):   number of iterations to run gradient descent
      cost_function:     function to call to produce cost
      gradient_function: function to call to produce gradient
      
    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (List): History of cost values
      p_history (list): History of parameters [w,b] 
      """
    
    # Lists to store cost J and the parameters at each iteration, primarily for graphing later
    J_history = []
    p_history = []
    b = b_in
    w = copy.deepcopy(w_in)  # copy so the caller's w_in is not modified
    
    for i in range(num_iters):
        # Compute the gradient of the cost at the current w, b using gradient_function;
        # it is used below to step w, b toward the values that minimize the cost
        dj_dw, dj_db = gradient_function(x, y, w , b)     
        
        # Update the parameters using the gradient descent update rule
        b = b - alpha * dj_db                            
        w = w - alpha * dj_dw                            
        
        # Save cost J and the parameters [w, b] at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(x, y, w , b))
            p_history.append([w,b])
        # Print the cost about 10 times over the run (every iteration if num_iters < 10)
        if i% math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")
 
    return w, b, J_history, p_history  # final w, b plus the J and [w, b] histories for graphing
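To see the three pieces work together, here is a compact, self-contained sketch. It is not the notes' exact code: the functions are condensed NumPy versions of the ones above, and the two-point dataset, the learning rate `alpha=0.1`, and the iteration count are hypothetical choices made so that the exact fit (w = 200, b = 100) is easy to verify by hand.

```python
import numpy as np


def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = sum((w*x + b - y)^2) / (2m)."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)


def compute_gradient(x, y, w, b):
    """Return (dJ/dw, dJ/db), averaged over all examples."""
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)


def gradient_descent(x, y, w, b, alpha, num_iters):
    """Repeat the simultaneous update of w and b num_iters times."""
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b


# Hypothetical toy data lying exactly on the line y = 200 * x + 100
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

w_final, b_final = gradient_descent(x_train, y_train, w=0.0, b=0.0,
                                    alpha=0.1, num_iters=10000)
final_cost = compute_cost(x_train, y_train, w_final, b_final)
print(f"w = {w_final:.4f}, b = {b_final:.4f}, cost = {final_cost:.3e}")
```

Because the toy data lie on a straight line, the cost can be driven essentially to zero; with noisy real data the minimum cost would stay positive.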

Week 2:

Multiple features:

Multiple features means that several input values x_1, ..., x_n together influence y.

Model with multiple features:
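The multi-feature model, as computed by `np.dot(x, w) + b` in the code later in these notes, can be written as follows (n is the number of features; the vector notation is an assumption):

```latex
\[
  f_{\mathbf{w},b}(\mathbf{x})
    = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b
    = \mathbf{w} \cdot \mathbf{x} + b
\]
```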

Vectorization and NumPy functions

Comparing vectorized and non-vectorized code:
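A minimal sketch of the comparison, assuming randomly generated vectors (the names `predict_loop` and `predict_vectorized`, the vector length, and the seed are illustrative choices, not from the original notes). Both versions compute the same dot product; the vectorized one hands the whole sum to NumPy in a single call, which is typically much faster on long vectors.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
n = 1000                         # hypothetical number of features
w = rng.random(n)
x = rng.random(n)
b = 0.5


def predict_loop(x, w, b):
    """Non-vectorized: add up w[j] * x[j] one term at a time."""
    p = 0.0
    for j in range(x.shape[0]):
        p = p + w[j] * x[j]
    return p + b


def predict_vectorized(x, w, b):
    """Vectorized: let np.dot compute the whole sum at once."""
    return np.dot(w, x) + b


p_loop = predict_loop(x, w, b)
p_vec = predict_vectorized(x, w, b)
print("results agree:", np.isclose(p_loop, p_vec))
```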

Code for computing w, b, and J with multiple features:

Set the training inputs X to [2104,5,1,45], [1416,3,2,40], [852,2,1,35] and the training targets y to [460,232,178]:

import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

For demonstration, 𝐰 and 𝑏 are initialized to values close to the optimum.

b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])

Non-vectorized implementation of f_w,b(x):

def predict_single_loop(x, w, b):
    n = x.shape[0]  # number of features; x is a single example of shape (n,), here n = 4
    p = 0
    for i in range(n):
        p_i = x[i] * w[i]   # the w_i * x_i term of the model
        p = p + p_i
    p = p + b
    return p

Vectorized implementation of f_w,b(x):

def predict(x, w, b): 
    p = np.dot(x, w) + b     # np.dot computes the whole sum of w_i * x_i in one call
    return p
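As a quick check, the near-optimal initial parameters given above should reproduce the first training target (460) almost exactly. The sketch below repeats the data and the vectorized `predict` so that it runs on its own; the variable name `p_first` is made up for illustration.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
b_init = 785.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])


def predict(x, w, b):
    """Vectorized prediction for a single example x of shape (n,)."""
    return np.dot(x, w) + b


p_first = predict(X_train[0], w_init, b_init)   # prediction for the first example
print(f"prediction: {p_first:.6f}, target: {y_train[0]}")
```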

Implementation of J(w,b):

def compute_cost(X, y, w, b): 
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           
        cost = cost + (f_wb_i - y[i])**2       
    cost = cost / (2 * m)                       
    return cost

Implementation of the partial derivatives of J(w,b) with respect to w and b:

def compute_gradient(X, y, w, b): 
    m,n = X.shape             #(number of examples, number of features); here m = 3, n = 4
    dj_dw = np.zeros((n,))    # one partial derivative per feature
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw
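The double loop above can also be expressed with matrix operations. The following is a sketch, not part of the original notes: `compute_gradient_vectorized` is a hypothetical name, and the loop version is repeated only so the two results can be compared on the training data. Both return the gradients in the same order as above, `(dj_db, dj_dw)`.

```python
import numpy as np


def compute_gradient_loop(X, y, w, b):
    """Loop version, same logic as the function above."""
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.0
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    return dj_db / m, dj_dw / m


def compute_gradient_vectorized(X, y, w, b):
    """Vectorized version: all m errors at once, then one matrix product."""
    m = X.shape[0]
    err = X @ w + b - y        # shape (m,), one error per example
    dj_dw = X.T @ err / m      # shape (n,), one entry per feature
    dj_db = np.sum(err) / m
    return dj_db, dj_dw


X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]], dtype=float)
y_train = np.array([460, 232, 178], dtype=float)
w0 = np.zeros(4)               # arbitrary starting point for the comparison
b0 = 0.0

db_loop, dw_loop = compute_gradient_loop(X_train, y_train, w0, b0)
db_vec, dw_vec = compute_gradient_vectorized(X_train, y_train, w0, b0)
print("gradients agree:", np.isclose(db_loop, db_vec) and np.allclose(dw_loop, dw_vec))
```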

Finding the values of w and b that minimize J(w,b):

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in    
    for i in range(num_iters): # num_iters is the number of iterations
        dj_db,dj_dw = gradient_function(X, y, w, b)   
        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw              
        b = b - alpha * dj_db                    
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(X, y, w, b))
        # Print the cost about 10 times over the run (every iteration if num_iters < 10)
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing

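Feature scaling:

When one feature takes large values, like the house size mentioned earlier (300-2000), while another, such as the number of bedrooms (0-5), takes only small values, the best-fitting w1 is usually small (for example 0.1) and w2 comparatively large (for example 50); with parameters like these the predictions are reasonably accurate. Because the feature ranges differ so much, rescaling the features to comparable ranges generally helps gradient descent converge faster.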
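One common way to bring features onto comparable ranges is z-score normalization: subtract each feature's mean and divide by its standard deviation. The notes do not say which scaling method is intended, so the sketch below is an assumption, and `zscore_normalize` is a hypothetical helper name.

```python
import numpy as np


def zscore_normalize(X):
    """Scale each column (feature) of X to zero mean and unit standard deviation."""
    mu = np.mean(X, axis=0)      # per-feature mean, shape (n,)
    sigma = np.std(X, axis=0)    # per-feature standard deviation, shape (n,)
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma


X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]], dtype=float)
X_norm, mu, sigma = zscore_normalize(X_train)

# Peak-to-peak range (max - min) of each feature before and after scaling
print("range before:", np.ptp(X_train, axis=0))
print("range after: ", np.ptp(X_norm, axis=0))
```

New inputs need to be scaled with the same `mu` and `sigma` before predicting, which is why the helper returns them alongside the normalized data.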