【Machine Learning】9. The Logistic Loss in Logistic Regression


This post focuses on the loss and cost of logistic regression, combining theory and practice.

1. Imports

import numpy as np
%matplotlib widget
import matplotlib.pyplot as plt
from plt_logistic_loss import  plt_logistic_cost, plt_two_logistic_loss_curves, plt_simple_example
from plt_logistic_loss import soup_bowl, plt_logistic_squared_error
plt.style.use('./deeplearning.mplstyle')

2. The Cost of Logistic Regression

Recall the squared error cost function. The equation for the squared error cost with one variable is:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2 \tag{1}$$

where
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{2}$$

x_train = np.array([0., 1, 2, 3, 4, 5],dtype=np.longdouble)
y_train = np.array([0,  0, 0, 1, 1, 1],dtype=np.longdouble)
plt_simple_example(x_train, y_train)

(Figure: the one-variable training data with categorical targets, output of plt_simple_example)
Now, let’s get a surface plot of the cost using a squared error cost:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)^2$$

where
$$f_{w,b}(x^{(i)}) = \mathrm{sigmoid}(wx^{(i)} + b)$$
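As a minimal, self-contained sketch (not part of the lab utilities, and separate from the plotting helper called next), this squared error cost with a sigmoid model can be evaluated directly:

import numpy as np

def sigmoid_manual(z):
    # plain sigmoid; the lab notebooks import their own `sigmoid` helper
    return 1 / (1 + np.exp(-z))

def squared_error_cost_logistic(x, y, w, b):
    # J(w,b) = (1/2m) * sum((f - y)^2), with f = sigmoid(w*x + b)
    f = sigmoid_manual(w * x + b)
    return np.mean((f - y) ** 2) / 2

# same toy data as x_train / y_train above
x_tmp = np.array([0., 1, 2, 3, 4, 5])
y_tmp = np.array([0., 0, 0, 1, 1, 1])
print(squared_error_cost_logistic(x_tmp, y_tmp, w=1.0, b=-2.5))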

plt.close('all')
plt_logistic_squared_error(x_train,y_train)
plt.show()

(Figure: surface plot of the squared error cost using the sigmoid model, output of plt_logistic_squared_error)
For comparison, the surface below is the squared error cost for linear regression:
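Judging from the otherwise unused import at the top of the post, this figure is presumably produced by the soup_bowl helper:

plt.close('all')
soup_bowl()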
(Figure: the bowl-shaped squared error cost surface of linear regression)
The loss function used by logistic regression is better suited to classification tasks, where the target is 0 or 1 rather than an arbitrary number.

Definition Note: In this course, these definitions are used:
Loss is a measure of the difference of a single example to its target value.
Cost is a measure of the losses over the training set.

3. The Loss Function

Loss measures the difference between a single example and its target value, while cost measures the losses over the entire training set.

This is defined:

  • $loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is:

$$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = \begin{cases} -\log\left(f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) & \text{if } y^{(i)} = 1 \\ -\log\left(1 - f_{\mathbf{w},b}\left(\mathbf{x}^{(i)}\right)\right) & \text{if } y^{(i)} = 0 \end{cases}$$

  • $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value.

  • $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b)$, where the function $g$ is the sigmoid function.

The defining feature of this loss function is that it uses two separate curves: one for the case where the target is zero ($y=0$), and another for when the target is one ($y=1$). Combined, these curves provide the behavior useful for a loss function: zero when the prediction matches the target, and rapidly increasing in value when the prediction differs from the target. Consider the curves below.
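They are presumably produced by the plt_two_logistic_loss_curves helper imported at the top of the post:

plt_two_logistic_loss_curves()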
(Figure: the two logistic loss curves, for the y = 1 and y = 0 cases)
The loss function above can be rewritten in a simpler, single-expression form:
$$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)$$

This expression looks a bit intimidating, but it becomes simple once you remember that $y^{(i)}$ can take only the two values 0 and 1.
When $y^{(i)} = 0$, it reduces to:

$$\begin{align} loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), 0) &= -(0) \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - 0\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \\ &= -\log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \end{align}$$

When $y^{(i)} = 1$, it reduces to:

$$\begin{align} loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), 1) &= -(1) \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - 1\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \\ &= -\log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \end{align}$$
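As a quick numerical check of the two cases (an illustrative sketch, not lab code), the combined expression gives a small loss when the prediction is close to the target and a large loss when it is not:

import numpy as np

def logistic_loss(f, y):
    # combined form: -y*log(f) - (1-y)*log(1-f)
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

f = 0.9                       # example prediction f_wb(x)
print(logistic_loss(f, 1))    # y=1: equals -log(0.9), a small loss
print(logistic_loss(f, 0))    # y=0: equals -log(0.1), a large loss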

From this loss function, we can construct a cost function that incorporates the loss over all the examples; this is discussed below.

plt.close('all')
cst = plt_logistic_cost(x_train,y_train)

(Figure: the logistic cost for this training set, plotted as both the cost and the log of the cost, output of plt_logistic_cost)
This curve is well suited to gradient descent! It does not have plateaus, local minima, or discontinuities. Note that it is not a bowl as in the squared error case. Both the cost and the log of the cost are plotted to illustrate the fact that, even when the cost is small, the curve still has a slope and continues to decline.

4. From the Loss Function to the Cost Function

4.1 Imports

import numpy as np
%matplotlib widget
import matplotlib.pyplot as plt
from lab_utils_common import  plot_data, sigmoid, dlc
plt.style.use('./deeplearning.mplstyle')

4.2 Loading and Examining the Data

X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])  #(m,n)
y_train = np.array([0, 0, 0, 1, 1, 1])   

Plot the data:

fig,ax = plt.subplots(1,1,figsize=(4,4))
plot_data(X_train, y_train, ax)

# Set the plot limits
ax.axis([0, 4, 0, 3.5])
ax.set_ylabel('$x_1$', fontsize=12)
ax.set_xlabel('$x_0$', fontsize=12)
plt.show()

(Figure: the two-feature training data with positive and negative examples, output of plot_data)

4.3 Computing the Cost for Logistic Regression

Here the loss of every example is combined into the cost over the whole training set:

$$J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) \right] \tag{1}$$

where

  • $loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)})$ is the cost for a single data point, which is:

    $$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \tag{2}$$

  • where m is the number of training examples in the data set and:
    $$\begin{align} f_{\mathbf{w},b}(\mathbf{x}^{(i)}) &= g(z^{(i)}) \\ z^{(i)} &= \mathbf{w} \cdot \mathbf{x}^{(i)} + b \\ g(z^{(i)}) &= \frac{1}{1+e^{-z^{(i)}}} \end{align}$$
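As a worked example, with the parameters used later ($\mathbf{w} = (1,1)$, $b = -3$) and the first training example $\mathbf{x}^{(0)} = (0.5, 1.5)$, $y^{(0)} = 0$: $z^{(0)} = 0.5 + 1.5 - 3 = -1$, so $f_{\mathbf{w},b}(\mathbf{x}^{(0)}) = g(-1) \approx 0.269$, and the loss is $-\log(1 - 0.269) \approx 0.313$.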

Implementation:

def compute_cost_logistic(X, y, w, b):
    """
    Computes cost

    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """

    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i],w) + b
        f_wb_i = sigmoid(z_i)
        cost +=  -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)
             
    cost = cost / m
    return cost
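For reference, the same cost can also be computed without the explicit loop. This vectorized variant is an illustrative sketch (relying on numpy and the sigmoid imported above), not part of the lab code:

def compute_cost_logistic_vec(X, y, w, b):
    # Vectorized version of compute_cost_logistic; returns the same scalar cost
    z = X @ w + b                                   # (m,) linear scores
    f_wb = sigmoid(z)                               # (m,) predictions
    return np.mean(-y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb))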

Call the function above:

w_tmp = np.array([1,1])
b_tmp = -3
print(compute_cost_logistic(X_train, y_train, w_tmp, b_tmp))
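For this choice of parameters the printed cost is roughly 0.3669; the same value appears again as the $b = -3$ case in the comparison below.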

4.4 A Worked Example

Now apply the code above to a concrete example, and see what the cost function outputs for different parameter values.

  • In a previous lab, you plotted the decision boundary for $b = -3$, $w_0 = 1$, $w_1 = 1$. That is, you had b = -3 with w = np.array([1,1]).

  • Let's say you want to see if $b = -4$, $w_0 = 1$, $w_1 = 1$, i.e. b = -4 with w = np.array([1,1]), provides a better model.

Let's first plot the decision boundary for these two different $b$ values to see which one fits the data better.

  • For $b = -3$, $w_0 = 1$, $w_1 = 1$, we'll plot $-3 + x_0 + x_1 = 0$ (shown in blue)
  • For $b = -4$, $w_0 = 1$, $w_1 = 1$, we'll plot $-4 + x_0 + x_1 = 0$ (shown in magenta)

import matplotlib.pyplot as plt

# Choose values between 0 and 6
x0 = np.arange(0,6)

# Plot the two decision boundaries
x1 = 3 - x0
x1_other = 4 - x0

fig,ax = plt.subplots(1, 1, figsize=(4,4))
# Plot the decision boundary
ax.plot(x0,x1, c=dlc["dlblue"], label="$b$=-3")
ax.plot(x0,x1_other, c=dlc["dlmagenta"], label="$b$=-4")
ax.axis([0, 4, 0, 4])

# Plot the original data
plot_data(X_train,y_train,ax)
ax.axis([0, 4, 0, 4])
ax.set_ylabel('$x_1$', fontsize=12)
ax.set_xlabel('$x_0$', fontsize=12)
plt.legend(loc="upper right")
plt.title("Decision Boundary")
plt.show()

(Figure: the two decision boundaries, b = -3 in blue and b = -4 in magenta, plotted over the training data)

w_array1 = np.array([1,1])
b_1 = -3
w_array2 = np.array([1,1])
b_2 = -4

print("Cost for b = -3 : ", compute_cost_logistic(X_train, y_train, w_array1, b_1))
print("Cost for b = -4 : ", compute_cost_logistic(X_train, y_train, w_array2, b_2))


Cost for b = -3 :  0.36686678640551745
Cost for b = -4 :  0.5036808636748461
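As expected, the cost for $b = -4$ is higher than for $b = -3$: that boundary misclassifies some of the positive examples, so the earlier model with $b = -3$ fits the data better.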

5. Exercises

(Figures: the end-of-lesson quiz questions, included as images)
