Table of Contents
- Week one
- 1.1、Supervised learning-part-01(regression)
- 1.2、Supervised learning-part-02(classification)
- 1.3、Unsupervised learning-part-01
- 1.4、Unsupervised learning-part-02
- 1.5、Jupyter notebooks
- 2.1、Linear regression model part 01 & 02
- 2.2、Cost Function Formula
- 2.3、Visualizing The Cost Function
- 2.4、Visualization Examples
- 3.1、Gradient Descent
- 3.2、Gradient Descent for linear regression
Week one
1.1、Supervised learning-part-01(regression)
Learns from being given the "right answers."
Given inputs x and their outputs y, the machine learns to predict the correct output y for a new input x.
1.2、Supervised learning-part-02(classification)
Classification predicts categories; the key question is how to find the decision boundary between them.
1.3、Unsupervised learning-part-01
We do not give the algorithm the "right answers" in advance; it must find structure in the data on its own.
eg: Clustering algorithm
1.4、Unsupervised learning-part-02
Anomaly detection
Dimensionality reduction
1.5、Jupyter notebooks
2.1、Linear regression model part 01 & 02
- ŷ (y-hat) is the model's prediction, and y is the actual output or "target" variable.
- Linear regression with one variable is called univariate linear regression.
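As a minimal sketch of this model (the toy numbers below are illustrative, not from the course labs):

```python
import numpy as np

def predict(x, w, b):
    """Univariate linear regression model: f_wb(x) = w * x + b."""
    return w * x + b

x_train = np.array([1.0, 2.0])   # input feature for two examples
w, b = 200, 100                  # example parameter values
y_hat = predict(x_train, w, b)   # y-hat: the model's predictions
print(y_hat)                     # [300. 500.]
```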
2.2、Cost Function Formula
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2 \tag{1}$$
The extra division by 2 just makes the later derivative calculations neater: differentiating the square produces a factor of 2 that cancels it.
```python
def compute_cost(x, y, w, b):
    """Computes the cost J(w,b) over all m training examples."""
    m = x.shape[0]
    cost = 0
    for i in range(m):
        f_wb = w * x[i] + b              # model prediction f_wb(x^(i))
        cost = cost + (f_wb - y[i])**2   # squared error for example i
    total_cost = 1 / (2 * m) * cost      # average and halve, per equation (1)
    return total_cost
```
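A quick sanity check of `compute_cost` on a tiny made-up dataset; with w = 200 and b = 100 the line passes through both points exactly, so the cost should be zero:

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
print(compute_cost(x_train, y_train, 200, 100))  # 0.0: a perfect fit
```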
In this course we first work with a simplified version that fixes b = 0, so we only need to minimize $J(w)$ over $w$.
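One way to picture the simplified problem, sketched here with made-up data generated by y = 100x: fix b = 0, scan candidate values of w, and keep the one with the lowest cost:

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([100.0, 200.0])    # generated by y = 100 * x, so the best w is 100

w_grid = np.arange(0, 201, 25)        # candidate values of w
costs = [compute_cost(x_train, y_train, w, 0) for w in w_grid]
print(w_grid[int(np.argmin(costs))])  # 100: the w that minimizes J(w)
```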
2.3、Visualizing The Cost Function
With both w and b free, the cost function J(w,b) is plotted as a 3D bowl-shaped surface.
2.4、Visualization Examples
Different choices of w and b correspond to different points on the cost surface, and each point corresponds to a different fit line on the data.
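A minimal sketch of how such a picture could be produced with numpy and matplotlib, reusing `compute_cost` from above (the grid ranges and data are arbitrary choices of mine):

```python
import numpy as np
import matplotlib.pyplot as plt

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

ws = np.linspace(0, 400, 100)     # candidate w values
bs = np.linspace(-100, 300, 100)  # candidate b values
J = np.array([[compute_cost(x_train, y_train, w, b) for w in ws] for b in bs])

plt.contour(ws, bs, J, levels=30) # contour view of the 3D cost bowl
plt.xlabel("w")
plt.ylabel("b")
plt.show()
```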
3.1、Gradient Descent
Intuition: stand at a point on the cost surface, look around, and take a small step in the direction of steepest descent.
For the calculation:
$$w = w - \alpha \frac{\partial J(w,b)}{\partial w}, \qquad b = b - \alpha \frac{\partial J(w,b)}{\partial b}$$
- $\alpha$ is the learning rate, which is a small positive number.
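A single hand-computed update step, with made-up values standing in for the partial derivatives:

```python
alpha = 0.01             # learning rate
w, b = 3.0, 1.0          # current parameters
dj_dw, dj_db = 2.0, 0.5  # pretend partial derivatives at (w, b)

# simultaneous update: both new values are computed from the old ones
w, b = w - alpha * dj_dw, b - alpha * dj_db
print(w, b)  # 2.98 0.995
```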
3.1.1 The relation between slope and w
When the point is to the right of the minimum, the slope (for example, 2/1) is positive, so $\frac{\partial J(w,b)}{\partial w}$ is positive. The update therefore decreases $w$; that is how $w$ moves toward the correct value.
3.1.2 The learning rate
If $\alpha$ is too small, gradient descent may be very slow.
If $\alpha$ is too big, gradient descent may overshoot and fail to converge, as the lecture's illustration shows.
Note also that updating $w$ never makes the point leave the cost curve: the update only changes the horizontal coordinate, and the vertical coordinate $J(w)$ is recomputed from the new $w$.
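Both failure modes can be seen on the toy cost J(w) = w², whose derivative is 2w (this toy function is my own example, not from the course):

```python
def run_gd(alpha, steps=10, w=10.0):
    """Run gradient descent on J(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - alpha * 2 * w
    return w

print(run_gd(alpha=0.001))  # ~9.8: too small, w has barely moved
print(run_gd(alpha=1.1))    # ~61.9 in magnitude: too big, w overshoots and diverges
print(run_gd(alpha=0.1))    # ~1.07: a reasonable rate, steadily approaching the minimum at 0
```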
3.2、Gradient Descent for linear regression
Depending on where you initialize the parameters w and b, you can end up at different local minima.
But it turns out that when you use the squared error cost function with linear regression, the cost function is convex (bowl-shaped), so it has a single global minimum and no other local minima.
- $\frac{\partial J(w,b)}{\partial w} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)}$
- $\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})$
```python
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    # Number of training examples
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = f_wb - y[i]
        dj_db += dj_db_i
        dj_dw += dj_dw_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_dw, dj_db
```
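One way to build confidence in these formulas is to compare the analytic gradient against a numerical finite-difference estimate; this check is my own addition, not part of the lab:

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
w, b, eps = 150.0, 50.0, 1e-6

dj_dw, dj_db = compute_gradient(x_train, y_train, w, b)
num_dw = (compute_cost(x_train, y_train, w + eps, b)
          - compute_cost(x_train, y_train, w - eps, b)) / (2 * eps)
num_db = (compute_cost(x_train, y_train, w, b + eps)
          - compute_cost(x_train, y_train, w, b - eps)) / (2 * eps)
print(dj_dw, num_dw)  # the pairs should agree to several decimal places
print(dj_db, num_db)
```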
The overall implementation:
```python
import copy
import math

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      x (ndarray (m,))  : Data, m examples
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters
      alpha (float)     : Learning rate
      num_iters (int)   : number of iterations to run gradient descent
      cost_function     : function to call to produce cost
      gradient_function : function to call to produce gradient
    Returns:
      w (scalar)      : Updated value of parameter after running gradient descent
      b (scalar)      : Updated value of parameter after running gradient descent
      J_history (list): History of cost values
      p_history (list): History of parameters [w,b]
    """
    w = copy.deepcopy(w_in)  # avoid modifying global w_in
    # Lists to store cost J and parameters at each iteration, primarily for graphing later
    J_history = []
    p_history = []
    b = b_in
    w = w_in
    for i in range(num_iters):
        # Calculate the gradient using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)
        # Update parameters using the update rule above
        b = b - alpha * dj_db
        w = w - alpha * dj_dw
        # Save cost J at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e} ",
                  f"w: {w: 0.3e}, b: {b: 0.5e}")
    return w, b, J_history, p_history  # return w, b and the histories for graphing
```
```python
# initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 10000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w_init, b_init, tmp_alpha,
                                                    iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")
```
Finally, to be more precise, this gradient descent process is called batch gradient descent. The term "batch" refers to the fact that on every step of gradient descent, we look at all of the training examples, instead of just a subset of the training data.
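For contrast with "all of the training examples," here is a sketch of what a subset-based step would look like; subset (mini-batch) methods are not part of this week's material, so this is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
m = x_train.shape[0]
idx = rng.choice(m, size=max(1, m // 2), replace=False)  # a random subset of examples

# batch gradient descent: the gradient uses all m examples
dj_dw_full, dj_db_full = compute_gradient(x_train, y_train, w_final, b_final)
# a subset-based step would use only the sampled examples instead
dj_dw_sub, dj_db_sub = compute_gradient(x_train[idx], y_train[idx], w_final, b_final)
```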