Training an MLP from Scratch Using Backpropagation for Solving Mathematical Equations

This post demonstrates the concept and use of backpropagation for solving a mathematical equation.

First of all, why should we even care about Neural Networks?

Well, we do have a biological inspiration for that! Rigorous studies in biology have revealed that the human brain consists of connected structures of neurons. Each neuron performs a specific task, and millions of such neurons together are capable of performing highly complex tasks. The first and simplest neural network model, called the Perceptron, was built in 1958 by Rosenblatt. The Perceptron is an algorithm for learning a binary classifier called a threshold function: a function that maps its input x (a real-valued vector) to an output value f(x) (a single binary value). The value of f(x) (0 or 1) is used to classify x as either a positive or a negative instance. The Perceptron is a single neuron that is loosely inspired by the biological neuron; it is not an exact replica, but it is powerful enough to solve many interesting problems.
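
As a small illustration (not from the original post), the Perceptron's threshold function can be sketched as follows; the weights and bias here are hypothetical:

import numpy as np

def perceptron_predict(x, w, b):
    # threshold function: maps a real-valued vector x to a single binary value
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.4, -0.2, 0.7])   # hypothetical learned weights for a 3-d input
b = -0.1                         # hypothetical bias
print(perceptron_predict(np.array([1.0, 0.5, 0.3]), w, b))  # 1 -> positive instance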

But for performing complex tasks, a bunch of such Perceptrons need to work together.

Multi-Layered Perceptron

In Multi-Layered Perceptrons, a bunch of single neurons is stacked to form a layer, and many such layers are stacked to form a network of neurons. This can be thought of as a function of functions, analogous to the composite functions we learnt in high school. Thus, with an MLP we can have complex functions act on the inputs to get the output. MLPs can easily overfit, so regularizers are applied to avoid overfitting. An MLP can be thought of as a graphical way of representing functional compositions, as sketched below.
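
To make the "function of functions" idea concrete, here is a minimal sketch of a two-layer MLP written as a composition of per-layer functions. This is my own illustration (the layer sizes and random weights are hypothetical), not code from the original post:

import numpy as np

def layer(W, b, activation):
    # returns a function that maps x -> activation(W @ x + b)
    return lambda x: activation(W @ x + b)

relu = lambda z: np.maximum(0, z)
identity = lambda z: z

rng = np.random.default_rng(0)
layer1 = layer(rng.normal(size=(4, 3)), np.zeros(4), relu)      # maps 3-d input to 4-d
layer2 = layer(rng.normal(size=(1, 4)), np.zeros(1), identity)  # maps 4-d to 1-d output

x = np.array([1.0, 2.0, 3.0])
y = layer2(layer1(x))  # composite function: y = f2(f1(x))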

Consider a simple equation where we perform some basic arithmetic operations:

y = (x1 + x2) * (x3 - x4)

Here we are using 3 arithmetic operations: addition, subtraction and multiplication. Let's write 3 functions add(), sub() and mul() which perform addition, subtraction and multiplication respectively. The above equation can now be written as follows:

y = mul(add(x1, x2), sub(x3, x4))

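A minimal sketch of these three helper functions and the composed call (the sample input values are mine, just for illustration):

def add(p, q):
    return p + q

def sub(p, q):
    return p - q

def mul(p, q):
    return p * q

# example: y = (x1 + x2) * (x3 - x4) expressed as a composition of the three functions
x1, x2, x3, x4 = 1, 2, 5, 3
y = mul(add(x1, x2), sub(x3, x4))
print(y)  # (1 + 2) * (5 - 3) = 6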

The computational graph for this equation can be constructed as shown below:

[Image: computational graph for y = mul(add(x1, x2), sub(x3, x4))]

Backpropagation

Backpropagation can simply be thought of as Chain Rule + Memoization. Chain rule refers to the one we generally use in differentiation. Memoization refers to storing frequently used values, which reduces the effort of performing the same computations again and again.
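
As a toy illustration of "chain rule + memoization" (my own example, not from the original post), consider y = sin(x^2): the intermediate value x^2 is computed once during the forward pass, stored, and reused when applying the chain rule dy/dx = cos(x^2) * 2x:

import numpy as np

x = 1.5
u = x ** 2          # intermediate value, stored (memoized) in the forward pass
y = np.sin(u)       # forward: y = sin(x^2)
dydu = np.cos(u)    # reuses the stored u instead of recomputing x ** 2
dudx = 2 * x
dydx = dydu * dudx  # chain rule: dy/dx = cos(x^2) * 2x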

As per Wikipedia, “Backpropagation is a widely used algorithm in training feed forward neural networks for supervised learning. The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight by the chain rule, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule; this is an example of dynamic programming.”

One very important thing to note is that for backpropagation to work, our activation functions must be differentiable. Functions which are easy/fast to differentiate speed up the overall process, as computing their derivatives takes less time.
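
For example, the sigmoid activation used later in this post is cheap to differentiate: its derivative is sigmoid(z) * (1 - sigmoid(z)), so the value already computed in the forward pass can be reused. A small sketch (the input value is arbitrary):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.e ** (-z))

z = 0.8
s = sigmoid(z)       # computed once in the forward pass
ds_dz = s * (1 - s)  # the derivative reuses the stored forward value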

Having got a taste of some key concepts, let's jump in to see how the idea of backpropagation can be used for solving a complex mathematical equation.

Consider the equation shown below:

[Image: the equation for q that we will model as a computational graph]

As previously seen, we can write the above equation as:

q = add(exp(sin(sub(mul(add(w1, f1), add(w2, f2)), add(add(w1, f1), add(w2, f2))))), mul(exp(add(sigmoid(mul(w3, f3)), w5)), exp(add(sigmoid(add(w4, f4)), w6))))

Let’s just say y_pred = q.

Phew! This seems complex. But fortunately, it's not :) Refer to the computational graph below for a better understanding.

[Image: computational graph for the equation above]

We can think of each node or neuron as one of the functions we are using in this equation.

Here we have 4-dimensional inputs and a weight vector of size 6. The loss used is squared loss. We need to find the best possible solution by reducing the loss as much as possible. The flow of the gradient descent algorithm (one can use GD, SGD or mini-batch SGD as per their choice, although mini-batch SGD is preferred by many) for solving this optimization problem is shown below.

Given D = {x_i, y_i}  // x_i is the input data point and y_i is the corresponding class label
1. Initialize weights → randomly (there are effective initialization methods available)
2. For each x_i:
   → Pass x_i forward through the network:
   — → Get y_i_hat
   — → This is forward propagation
   → Compute loss
   → Compute derivatives using chain rule and memoization
   → Update weights from end to start
3. Repeat step 2 until convergence, i.e. W_new ~ W_old

Forward propagation → using x_i to calculate y_i_hat and L
Backward propagation → using L to update the weights
Both combine to form an epoch.

We will be using numpy which can be imported as follows:

import numpy as np

Let’s have a look at the forward pass or forward propagation.

Forward propagation can be considered as the process of moving from left to right. We calculate y_pred and Loss from x_i in this pass.

def forward_pass(x, y, w):
    '''In this function, we will compute the forward propagation.'''
    # x: input data point; in the current equation we are having 4-d data points
    # y: output variable
    # w: weight array of length 6; w[0] corresponds to w1 in the
    #    computational graph, w[1] corresponds to w2, ..., w[5] corresponds to w6
    a = w[0] + x[0]
    b = w[1] + x[1]
    c = a * b
    d = a + b
    e = c - d
    f = np.sin(e)
    g = np.e ** f
    h = w[2] * x[2]
    i = sigmoid(h)
    k = w[4] + i
    m = np.e ** k
    j = w[3] + x[3]
    l = sigmoid(j)
    n = w[5] + l
    o = np.e ** n
    p = m * o
    q = g + p
    y_pred = q
    L = (y - y_pred) ** 2
    dldy_pred = -2 * (y - y_pred)   # dL/dy_pred, reused in the backward pass
    dictionary = {
        "a": a, "b": b, "c": c, "d": d, "e": e, "f": f, "g": g, "h": h,
        "i": i, "j": j, "k": k, "l": l, "m": m, "n": n, "o": o, "p": p,
        "q": q, "y_pred": y_pred, "loss": L, "dl": dldy_pred
    }
    return dictionary

The sigmoid function we used above can be written as follows:

def sigmoid(z):
    '''In this function, we will compute sigmoid(z).'''
    # we can use this function in both forward and backward propagation
    return 1 / (1 + np.e ** (-z))
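
With sigmoid() in place, here is a quick usage example of forward_pass; the sample point and weights are made up for illustration:

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 5.0
w = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

outputs = forward_pass(x, y, w)
print(outputs["y_pred"], outputs["loss"])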

Now let's look at the backward pass or backward propagation.

Backward propagation can be considered as the process of moving from right to left, that is, backwards, to update the weights using the loss L. Remember, we need to find the optimal weights to get the best result.

Notation: dydx = differentiation of y w.r.t. x.

Here, we differentiate L w.r.t. all 6 weights, that is, we compute dLdw1, dLdw2, dLdw3, dLdw4, dLdw5 and dLdw6. We can easily do it as follows:
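
Each dLdwi is obtained from the corresponding dqdwi below via one more chain-rule step (this is why forward_pass stores dldy_pred):

dLdwi = dLdy_pred * dqdwi,  where  dLdy_pred = -2 * (y - y_pred)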

dqdg = 1
dqdf = dqdg * dgdf = 1 * np.e**(f) = np.e**(f)
dqde = dqdf * dfde = np.e**(f) * np.cos(e)
dqdc = dqde * dedc = [np.e**(f) * np.cos(e)] * 1 = np.e**(f) * np.cos(e)
dqdd = dqde * dedd = [np.e**(f) * np.cos(e)] * (-1) = -[np.e**(f) * np.cos(e)]
dqda = dqde * (dedc * dcda + dedd * ddda) = [np.e**(f) * np.cos(e)] * (b - 1)
dqdb = dqde * (dedc * dcdb + dedd * dddb) = [np.e**(f) * np.cos(e)] * (a - 1)
dqdw1 = dqda * dadw1 = {[np.e**(f) * np.cos(e)] * (b - 1)} * 1 = [np.e**(f) * np.cos(e)] * (b - 1)
dqdw2 = dqdb * dbdw2 = {[np.e**(f) * np.cos(e)] * (a - 1)} * f2
dqdp = 1
dqdm = dqdp * dpdm = 1 * o = o
dqdk = dqdm * dmdk = o * np.e**(k)
dqdi = dqdk * dkdi = (o * np.e**(k)) * 1 = o * np.e**(k)
dqdh = dqdi * didh = (o * np.e**(k)) * [sigmoid(h) * (1 - sigmoid(h))]
dqdw3 = dqdh * dhdw3 = (o * np.e**(k)) * [sigmoid(h) * (1 - sigmoid(h))] * f3
dqdw5 = dqdk * dkdw5 = (o * np.e**(k)) * 1 = o * np.e**(k)
dqdo = dqdp * dpdo = 1 * m = m
dqdn = dqdo * dodn = m * np.e**(n)
dqdw6 = dqdn * dndw6 = (m * np.e**(n)) * 1 = m * np.e**(n)
dqdl = dqdn * dndl = (m * np.e**(n)) * 1 = m * np.e**(n)
dqdj = dqdl * dldj = (m * np.e**(n)) * [sigmoid(j) * (1 - sigmoid(j))] * 1 = (m * np.e**(n)) * [sigmoid(j) * (1 - sigmoid(j))]

Thus, we can write the backward_pass function as:

def backward_pass(x, w, d):
    '''In this function, we will compute the backward propagation.'''
    # x: input data point
    # w: weight array [w1, w2, ..., w6]
    # d: the dictionary returned by the forward_pass() function
    # code to compute the gradients of each weight [w1, w2, w3, ..., w6]
    dqdg = 1
    dqdf = np.e ** d["f"]
    dqde = np.e ** d["f"] * np.cos(d["e"])
    dqdc = np.e ** d["f"] * np.cos(d["e"])
    dqdd = -(np.e ** d["f"] * np.cos(d["e"]))
    dqda = (np.e ** d["f"] * np.cos(d["e"])) * (d["b"] - 1)
    dqdb = (np.e ** d["f"] * np.cos(d["e"])) * (d["a"] - 1)
    dqdw1 = (np.e ** d["f"] * np.cos(d["e"])) * (d["b"] - 1)
    dqdw2 = (np.e ** d["f"] * np.cos(d["e"])) * (d["a"] - 1) * x[1]
    dqdp = 1
    dqdm = d["o"]
    dqdk = d["o"] * np.e ** d["k"]
    dqdi = d["o"] * np.e ** d["k"]
    dqdh = (d["o"] * np.e ** d["k"]) * (sigmoid(d["h"]) * (1 - sigmoid(d["h"])))
    dqdw3 = (d["o"] * np.e ** d["k"]) * (sigmoid(d["h"]) * (1 - sigmoid(d["h"]))) * x[2]
    dqdw5 = d["o"] * np.e ** d["k"]
    dqdo = d["m"]
    dqdn = d["m"] * np.e ** d["n"]
    dqdw6 = d["m"] * np.e ** d["n"]
    dqdl = d["m"] * np.e ** d["n"]
    dqdj = (d["m"] * np.e ** d["n"]) * (sigmoid(d["j"]) * (1 - sigmoid(d["j"])))
    dqdw4 = dqdj  # j = w4 + f4, so djdw4 = 1 and dqdw4 = dqdj * 1
    dLdw1 = d["dl"] * dqdw1
    dLdw2 = d["dl"] * dqdw2
    dLdw3 = d["dl"] * dqdw3
    dLdw4 = d["dl"] * dqdw4
    dLdw5 = d["dl"] * dqdw5
    dLdw6 = d["dl"] * dqdw6
    # dW is a dictionary with the gradients of all the weights
    dW = {
        "dw1": dLdw1, "dw2": dLdw2, "dw3": dLdw3,
        "dw4": dLdw4, "dw5": dLdw5, "dw6": dLdw6
    }
    return dW
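
Putting the two passes together, a minimal sketch of the training loop from the algorithm above might look like this. The dataset D (a list of (x_i, y_i) pairs), the learning rate and the number of epochs are hypothetical placeholders, not values from the original post:

lr = 0.01                            # learning rate (hypothetical)
w = np.random.normal(size=6)         # step 1: random weight initialization

for epoch in range(100):             # in practice, repeat until W_new ~ W_old
    for x_i, y_i in D:               # D is assumed to be a list of (x_i, y_i) pairs
        d = forward_pass(x_i, y_i, w)    # forward propagation
        dW = backward_pass(x_i, w, d)    # backward propagation
        # update each weight in the opposite direction of its gradient
        for idx, key in enumerate(["dw1", "dw2", "dw3", "dw4", "dw5", "dw6"]):
            w[idx] = w[idx] - lr * dW[key]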

We can do a sanity check by performing gradient checking and inspecting the resulting differences between the numerical and analytic gradients; they have to be (close to) zero. We haven't discussed the details of gradient checking in this blog.
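
For reference, here is a minimal sketch of such a check (my own, not from the original post): perturb one weight at a time, approximate the gradient numerically, and compare it with the analytic gradient from backward_pass. It assumes w is a numpy array:

def gradient_check(x, y, w, eps=1e-6):
    d = forward_pass(x, y, w)
    dW = backward_pass(x, w, d)
    diffs = []
    for idx in range(6):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        # numerical gradient of the loss w.r.t. this weight
        num_grad = (forward_pass(x, y, w_plus)["loss"] -
                    forward_pass(x, y, w_minus)["loss"]) / (2 * eps)
        diffs.append(abs(num_grad - dW["dw" + str(idx + 1)]))
    return diffs  # each entry should be close to zero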

Thus, this is how we can convert a mathematical equation into a computational graph, write it as a function of functions (i.e. using the concept of composite functions), and then find the optimal edge weights using backpropagation.

Thank you!

Translated from: https://medium.com/analytics-vidhya/training-an-mlp-from-scratch-using-backpropagation-for-solving-mathematical-equations-91b523c24748
