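This blog post is a translation of, and commentary on, this paper.
The introduction and everything before Section 2.2 are fairly easy to follow, so I start from Section 2.2, quoting the original English paragraph by paragraph and following each with my own paraphrase or notes.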
2.2 Spring Model Analogy
An analogy to a particular mechanical spring system is given to provide an intuition of what is happening when the loss function is minimized. The outputs of $G_W$ can be thought of as masses attracting and repelling each other with springs. Consider the equation of a spring:
$F = -KX$
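In other words, the spring system is used to understand how the loss function is minimized. The outputs $G_W$ of the whole network can be viewed as masses acted on by the attractive and repulsive forces of the springs between them, and each spring obeys $F = -KX$.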
where F is the force, K is the spring constant, and X is the displacement of the spring from its rest length.
That is, F is the spring's elastic force, K is its stiffness coefficient (a constant), and X is the spring's displacement.
The analogy between the spring system and the loss function is as follows.
1. A spring is attract-only if its rest length is equal to zero. Thus any positive displacement X will result in an attractive force between its ends (this corresponds to stretching the spring, so the displacement is positive).
2. A spring is said to be m-repulse-only if its rest length is equal to m. Thus two points that are connected with an m-repulse-only spring will be pushed apart if X is less than m.
3. However, this spring has a special property: if it is stretched by a length X > m, no attractive force brings it back to its rest length.
4. Each point is connected to other points using these two kinds of springs.
5. Seen in the light of the loss function, each point is connected by attract-only springs to similar points, and by m-repulse-only springs to dissimilar points.
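In short: a spring whose rest length is zero is attract-only, so any positive displacement produces an attractive force between its two ends. A spring whose rest length is m is m-repulse-only, so two endpoints joined by it are pushed apart whenever X is less than m.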
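The two spring types can be written down directly from $F = -KX$. The following is a minimal Python sketch, not code from the paper. It assumes the force is a signed scalar along the line joining the two ends: negative pulls them together, positive pushes them apart. The default spring constant is K = 1, and the function names are hypothetical.

```python
def attract_only_force(distance, k=1.0):
    """Attract-only spring: rest length 0, so X = distance and F = -K X.
    A negative value means the force pulls the two ends together."""
    displacement = distance - 0.0  # rest length is zero
    return -k * displacement


def m_repulse_only_force(distance, m, k=1.0):
    """m-repulse-only spring: rest length m, F = -K X with X = distance - m,
    but only while the spring is compressed (distance < m).
    A positive value means the force pushes the two ends apart."""
    if distance >= m:
        return 0.0  # stretched past m: no force pulls the ends back
    displacement = distance - m  # negative while compressed
    return -k * displacement  # equals k * (m - distance) > 0


if __name__ == "__main__":
    m = 2.0
    for d in (0.25, 1.0, 1.75, 2.0, 3.0):
        print(f"distance={d}: attract-only F={attract_only_force(d):+.2f}, "
              f"m-repulse-only F={m_repulse_only_force(d, m):+.2f}")
```

The m-repulse-only force is largest when the two ends coincide and vanishes once they are m or more apart. The attract-only force grows with the distance.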
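Let me first recall why we are analyzing this loss function. Each pair of input images is passed through the network to get two outputs $G_W$. The Euclidean distance between these two outputs enters the loss function, and minimizing that loss adjusts the weights W. When training converges, the network with the learned W maps its inputs into a lower-dimensional space. With that in mind, let us continue.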
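The goal above can be made concrete for a single pair. The following is a minimal sketch under stated assumptions. $G_W$ is a hypothetical linear map $G_W(\vec{X}) = W\vec{X}$. The two partial losses discussed next, $L_S = \frac{1}{2}D_W^2$ for similar pairs (Y = 0) and $L_D = \frac{1}{2}\max(0, m - D_W)^2$ for dissimilar pairs (Y = 1), are combined through the pair label Y. The function names and the default margin m = 1 are illustrative choices.

```python
import math


def linear_map(W, x):
    """Hypothetical G_W: a linear map y = W x, with W stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def euclidean_distance(a, b):
    """D_W = ||a - b||_2 between two output vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(W, x1, x2, y, m=1.0):
    """Loss for one pair: y = 0 (similar) uses L_S = 0.5 * D_W^2,
    y = 1 (dissimilar) uses L_D = 0.5 * max(0, m - D_W)^2."""
    d = euclidean_distance(linear_map(W, x1), linear_map(W, x2))
    l_s = 0.5 * d ** 2
    l_d = 0.5 * max(0.0, m - d) ** 2
    return (1 - y) * l_s + y * l_d


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keeps only the first two coordinates
    x1, x2 = [0.0, 0.0, 5.0], [0.3, 0.4, -5.0]
    print(contrastive_loss(W, x1, x2, y=0), contrastive_loss(W, x1, x2, y=1, m=2.0))
```

Here the two inputs differ mostly in the third coordinate, which this W discards. Their outputs are therefore only 0.5 apart. That is cheap if the pair is labeled similar and costly if it is labeled dissimilar with a margin of 2.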
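1. Consider the loss function
$$L_S(W, \vec{X}_1, \vec{X}_2) = \frac{1}{2}(D_W)^2$$
The loss function L is minimized using the stochastic gradient descent algorithm. The gradient of $L_S$ is:
$$\frac{\partial L_S}{\partial W} = D_W \frac{\partial D_W}{\partial W}$$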
Compare this with the spring equation $F = -KX$.
It is clear that the gradient $\frac{\partial L_S}{\partial W}$ of $L_S$ gives the attractive force between the two points. $\frac{\partial D_W}{\partial W}$ defines the spring constant K of the spring, and $D_W$, the distance between the two points, gives the perturbation X of the spring from its rest length.
Clearly, even a small value of $D_W$ will generate a gradient (force) to decrease $D_W$.
(My understanding: the gradient step lowers $L_S$, and since $L_S = \frac{1}{2}(D_W)^2$, lowering $L_S$ is the same as decreasing $D_W$.)
Thus the similar loss function corresponds to the attract-only spring.
2. Now consider the partial loss function $L_D$:
$$L_D(W, \vec{X}_1, \vec{X}_2) = \frac{1}{2}\left(\max\{0, m - D_W\}\right)^2$$
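When $D_W > m$, $\frac{\partial L_D}{\partial W} = 0$. Thus there is no gradient (force) on two dissimilar points that are at a distance $D_W > m$.
If $D_W < m$, then
$$\frac{\partial L_D}{\partial W} = -(m - D_W)\frac{\partial D_W}{\partial W}$$
Compare this with the spring equation $F = -KX$.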
Again, comparing the two equations above, it is clear that the dissimilar loss function $L_D$ corresponds to the m-repulse-only spring. Its gradient gives the force of the spring, $\frac{\partial D_W}{\partial W}$ gives the spring constant K, and $(m - D_W)$ gives the perturbation X. The negative sign denotes the fact that the force is repulsive only.
Clearly, the force is maximum when $D_W = 0$ (if this is unclear, see panel (d) of Figure 2 in the paper) and absent when $D_W = m$.
3. Here, especially in the case of $L_S$, one might think that simply making $D_W = 0$ for all attract-only springs would put the system in equilibrium. Consider, however, Figure 2.
4. Suppose b1 is connected to b2 and b3 with attract-only springs. Then decreasing $D_W$ between b1 and b2 will increase $D_W$ between b1 and b3. Thus, by minimizing the global loss function over all springs, one would ultimately drive the system to its equilibrium state.
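Points 1 to 4 can be checked numerically with a toy setup. This is my own construction, not Figure 2 of the paper. As a simplification, each output is a free 2-D point rather than the output of $G_W$. The force on each point is the gradient of $L_S$ or $L_D$ with respect to its position, using the chain rule through $D_W$. As in point 4, b1 is tied to b2 and b3 by attract-only springs. As an extra assumption, b2 and b3 are joined by an m-repulse-only spring (m = 3) so they cannot collapse onto each other. All names and settings are hypothetical.

```python
import math


def similar_force_coeff(d):
    """dL_S/dD_W for L_S = 0.5 * D_W^2: just D_W (the spring's displacement X)."""
    return d


def dissimilar_force_coeff(d, m):
    """dL_D/dD_W for L_D = 0.5 * max(0, m - D_W)^2: -(m - D_W) below m, else 0."""
    return -(m - d) if d < m else 0.0


def simulate(m=3.0, lr=0.05, steps=2000):
    """Gradient descent on three free 2-D points (a simplification: no G_W here).
    b1-b2 and b1-b3 are attract-only springs; b2-b3 is an m-repulse-only spring."""
    pts = {"b1": [0.0, 1.0], "b2": [0.1, 0.0], "b3": [-0.1, 0.0]}
    springs = [("b1", "b2", "attract"), ("b1", "b3", "attract"),
               ("b2", "b3", "repulse")]
    for _ in range(steps):
        grads = {name: [0.0, 0.0] for name in pts}
        for a, b, kind in springs:
            d = math.dist(pts[a], pts[b])
            if d == 0.0:
                continue  # direction undefined; this sketch applies no force
            if kind == "attract":
                c = similar_force_coeff(d)
            else:
                c = dissimilar_force_coeff(d, m)
            for i in range(2):
                # chain rule: dL/dp_a = dL/dD * (p_a - p_b) / D, opposite for p_b
                g = c * (pts[a][i] - pts[b][i]) / d
                grads[a][i] += g
                grads[b][i] -= g
        for name in pts:  # apply all updates after computing every gradient
            for i in range(2):
                pts[name][i] -= lr * grads[name][i]
    return pts


if __name__ == "__main__":
    pts = simulate()
    for a, b in (("b1", "b2"), ("b1", "b3"), ("b2", "b3")):
        print(a, b, round(math.dist(pts[a], pts[b]), 4))
```

With these settings the points settle with b2 and b3 about 2m/3 = 2 apart and b1 midway between them, at distance 1 from each. Neither attract-only spring reaches $D_W = 0$, yet this configuration minimizes the total loss. The system is in equilibrium because the forces balance, not because every attract-only spring has collapsed.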
2.3 The Algorithm
The algorithm first generates the training set, then trains the machine.
Step 1: For each input sample $\vec{X}_i$, do the following:
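a. Using prior knowledge, find the set of samples $S_{\vec{X}_i} = \{\vec{X}_j\}_{j=1}^{p}$ such that $\vec{X}_j$ is deemed similar to $\vec{X}_i$.
b. Pair the sample $\vec{X}_i$ with all the other training samples and label the pairs so that $Y_{ij} = 0$ if $\vec{X}_j \in S_{\vec{X}_i}$, and $Y_{ij} = 1$ otherwise.
Combine all the pairs to form the labeled training set.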
Step 2: Repeat until convergence:
For each pair $(\vec{X}_i, \vec{X}_j)$ in the training set, do
If $Y_{ij} = 0$, then update W to decrease $D_W = \|G_W(\vec{X}_i) - G_W(\vec{X}_j)\|_2$
If $Y_{ij} = 1$, then update W to increase $D_W = \|G_W(\vec{X}_i) - G_W(\vec{X}_j)\|_2$
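Steps 1 and 2 can be put together in a minimal runnable sketch in plain Python. Several parts are assumptions for illustration rather than the paper's setup:

- $G_W$ is a hypothetical linear map $G_W(\vec{X}) = W\vec{X}$. Then, with $d = \vec{X}_i - \vec{X}_j$ and $u = Wd$, we have $\partial D_W/\partial W = u\,d^\top / D_W$.
- Each unordered pair is used once.
- The "prior knowledge" is a toy group label.
- A fixed number of shuffled SGD epochs stands in for "repeat until convergence".

The per-pair updates use $\partial L_S/\partial W = D_W\,\partial D_W/\partial W$ and $\partial L_D/\partial W = -(m - D_W)\,\partial D_W/\partial W$.

```python
import itertools
import math
import random


def forward(W, x):
    """Hypothetical G_W: a linear map y = W x, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def make_pairs(samples, is_similar):
    """Step 1: pair each sample with every other sample (each unordered pair once)
    and label it Y = 0 if prior knowledge says "similar", otherwise Y = 1."""
    pairs = []
    for i, j in itertools.combinations(range(len(samples)), 2):
        y = 0 if is_similar(i, j) else 1
        pairs.append((samples[i], samples[j], y))
    return pairs


def pair_gradient(W, x1, x2, y, m):
    """Gradient of the pair loss w.r.t. W, or None if no force acts.
    With d = x1 - x2 and u = W d (= G_W(x1) - G_W(x2) for a linear map),
    dD_W/dW = u d^T / D_W."""
    d = [a - b for a, b in zip(x1, x2)]
    u = forward(W, d)
    dist = math.sqrt(sum(v * v for v in u))
    if y == 0:
        scale = 1.0  # L_S: D_W * dD_W/dW = u d^T (attract-only spring)
    elif 0.0 < dist < m:
        scale = -(m - dist) / dist  # L_D: -(m - D_W) * dD_W/dW (repulsive)
    else:
        # no force: pair already >= m apart, or D_W = 0 where direction is undefined
        return None
    return [[scale * ua * db for db in d] for ua in u]


def train(pairs, W, m=1.0, lr=0.1, epochs=300, seed=0):
    """Step 2: SGD over the labeled pairs; a fixed number of epochs stands in
    for "repeat until convergence"."""
    rng = random.Random(seed)
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x1, x2, y in pairs:
            grad = pair_gradient(W, x1, x2, y, m)
            if grad is not None:
                W = [[w - lr * g for w, g in zip(w_row, g_row)]
                     for w_row, g_row in zip(W, grad)]
    return W


if __name__ == "__main__":
    # Toy data (an assumption): coordinate 0 encodes the group, coordinate 1 is a nuisance.
    samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    groups = [0, 0, 1, 1]
    pairs = make_pairs(samples, lambda i, j: groups[i] == groups[j])
    W = train(pairs, W=[[0.3, 0.2]])
    print("learned W:", W)
    print("1-D outputs:", [forward(W, x) for x in samples])
```

On this toy data, training should drive the weight on the nuisance coordinate toward 0, so similar pairs collapse together. At the same time it should push the two groups roughly m = 1 apart in the 1-D output. This is the decrease/increase of $D_W$ that Step 2 describes.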