Questions and Understanding from the GCN Paper (Part 2)

Q&A

The non-linear activation function ReLU()

$$\text{ReLU}(x) = (x)^+ = \max(0, x)$$

Purpose

	With a linear activation function, the model's output is nothing more than a linear combination of the input features x.
	With a non-linear activation function, the neural network can approximate any non-linear function arbitrarily well, so it can be applied to a wide range of non-linear models.

$$output^{(1)} = w^{(1)}x + bias^{(1)}$$
$$output^{(2)} = w^{(2)}output^{(1)} + bias^{(2)} = w^{(2)}w^{(1)}x + w^{(2)}bias^{(1)} + bias^{(2)}$$

Letting $w' = w^{(2)}w^{(1)}$ and $bias' = w^{(2)}bias^{(1)} + bias^{(2)}$:

$$output^{(2)} = w'x + bias'$$
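A quick numerical check of this collapse (a minimal NumPy sketch; the layer sizes and random values are arbitrary, and a row-vector convention x·W is used instead of the W·x above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                      # 5 samples, 4 features (arbitrary)
w1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
w2, b2 = rng.normal(size=(3, 2)), rng.normal(size=2)

# Two stacked linear layers...
out2 = (x @ w1 + b1) @ w2 + b2
# ...collapse to one linear layer: w' = w1 w2, bias' = b1 w2 + b2.
w_p, b_p = w1 @ w2, b1 @ w2 + b2
print(np.allclose(out2, x @ w_p + b_p))          # True

# A ReLU in between breaks the collapse: the output is no longer linear in x.
relu = lambda z: np.maximum(0, z)
out_nl = relu(x @ w1 + b1) @ w2 + b2
print(np.allclose(out_nl, x @ w_p + b_p))        # False in general
```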

On the second layer of the GCN

$$Z = f(X, A) = \mathrm{softmax}\big(\widehat{A}\,\mathrm{ReLU}(\widehat{A}XW^{(0)})\,W^{(1)}\big) \quad (9)$$

$$H^1 = \mathrm{ReLU}(\widehat{A}XW^{(0)})$$

$$output = \widehat{A}H^1 W^{(1)}$$

$$output\ (\text{first layer}) = \widehat{A} \cdot support =
\begin{bmatrix}
a_{1,1}x_{1}w_{1}+\cdots+a_{1,2708}x_{2708}w_{1} & \cdots & a_{1,1}x_{1}w_{16}+\cdots+a_{1,2708}x_{2708}w_{16}\\
a_{2,1}x_{1}w_{1}+\cdots+a_{2,2708}x_{2708}w_{1} & \cdots & a_{2,1}x_{1}w_{16}+\cdots+a_{2,2708}x_{2708}w_{16}\\
\vdots & \ddots & \vdots\\
a_{2708,1}x_{1}w_{1}+\cdots+a_{2708,2708}x_{2708}w_{1} & \cdots & a_{2708,1}x_{1}w_{16}+\cdots+a_{2708,2708}x_{2708}w_{16}
\end{bmatrix}$$

Define $a_{i,1}x_{1}w_{j}+\cdots+a_{i,2708}x_{2708}w_{j}$ as $h_{i,j}$, where $i\in[1,2708],\ j\in[1,16]$.

$$H^1 = output =
\begin{bmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,16}\\
h_{2,1} & h_{2,2} & \cdots & h_{2,16}\\
\vdots & \vdots & \ddots & \vdots\\
h_{2708,1} & h_{2708,2} & \cdots & h_{2708,16}
\end{bmatrix}
\quad (\text{input to the second layer})$$

$$W^{1} =
\begin{bmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,7}\\
w_{2,1} & w_{2,2} & \cdots & w_{2,7}\\
\vdots & \vdots & \ddots & \vdots\\
w_{16,1} & w_{16,2} & \cdots & w_{16,7}
\end{bmatrix}
\quad (\text{weight matrix of the second layer})$$

$$\widehat{A} =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,2708}\\
a_{2,1} & a_{2,2} & \cdots & a_{2,2708}\\
\vdots & \vdots & \ddots & \vdots\\
a_{2708,1} & a_{2708,2} & \cdots & a_{2708,2708}
\end{bmatrix}
\quad (\text{the normalized symmetric matrix for the Cora dataset})$$

$$support = H^1 W^{1} =
\begin{bmatrix}
h_{1,1} & h_{1,2} & \cdots & h_{1,16}\\
h_{2,1} & h_{2,2} & \cdots & h_{2,16}\\
\vdots & \vdots & \ddots & \vdots\\
h_{2708,1} & h_{2708,2} & \cdots & h_{2708,16}
\end{bmatrix}
\begin{bmatrix}
w_{1,1} & w_{1,2} & \cdots & w_{1,7}\\
w_{2,1} & w_{2,2} & \cdots & w_{2,7}\\
\vdots & \vdots & \ddots & \vdots\\
w_{16,1} & w_{16,2} & \cdots & w_{16,7}
\end{bmatrix}$$

$$H^1 W^{1} =
\begin{bmatrix}
h_{1,1}w_{1,1}+h_{1,2}w_{2,1}+\cdots+h_{1,16}w_{16,1} & \cdots & h_{1,1}w_{1,7}+h_{1,2}w_{2,7}+\cdots+h_{1,16}w_{16,7}\\
h_{2,1}w_{1,1}+h_{2,2}w_{2,1}+\cdots+h_{2,16}w_{16,1} & \cdots & h_{2,1}w_{1,7}+h_{2,2}w_{2,7}+\cdots+h_{2,16}w_{16,7}\\
\vdots & \ddots & \vdots\\
h_{2708,1}w_{1,1}+h_{2708,2}w_{2,1}+\cdots+h_{2708,16}w_{16,1} & \cdots & h_{2708,1}w_{1,7}+h_{2708,2}w_{2,7}+\cdots+h_{2708,16}w_{16,7}
\end{bmatrix}
\quad (\mathrm{shape}=[2708,7])$$

Write $h_{i,1}w_{1,j}+h_{i,2}w_{2,j}+\cdots+h_{i,16}w_{16,j}$ as $\vec{h_{i}}\vec{w_{j}}$, where
$$\vec{h_{i}}=[h_{i,1},h_{i,2},\cdots,h_{i,16}],\ i\in[1,2708];\qquad \vec{w_{j}}=[w_{1,j},w_{2,j},\cdots,w_{16,j}]^{T},\ j\in[1,7].$$

$$output = \widehat{A} \cdot support =
\begin{bmatrix}
a_{1,1}\vec{h_{1}}\vec{w_{1}}+\cdots+a_{1,2708}\vec{h_{2708}}\vec{w_{1}} & \cdots & a_{1,1}\vec{h_{1}}\vec{w_{7}}+\cdots+a_{1,2708}\vec{h_{2708}}\vec{w_{7}}\\
a_{2,1}\vec{h_{1}}\vec{w_{1}}+\cdots+a_{2,2708}\vec{h_{2708}}\vec{w_{1}} & \cdots & a_{2,1}\vec{h_{1}}\vec{w_{7}}+\cdots+a_{2,2708}\vec{h_{2708}}\vec{w_{7}}\\
\vdots & \ddots & \vdots\\
a_{2708,1}\vec{h_{1}}\vec{w_{1}}+\cdots+a_{2708,2708}\vec{h_{2708}}\vec{w_{1}} & \cdots & a_{2708,1}\vec{h_{1}}\vec{w_{7}}+\cdots+a_{2708,2708}\vec{h_{2708}}\vec{w_{7}}
\end{bmatrix}$$
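Putting the pieces above together, here is a minimal PyTorch sketch of the full two-layer forward pass of Eq. (9), with the Cora dimensions used throughout (2708 nodes, 1433 features, 16 hidden units, 7 classes); the random tensors are stand-ins for the real normalized adjacency, features, and trained weights:

```python
import torch
import torch.nn.functional as F

N, F_in, F_hid, F_out = 2708, 1433, 16, 7  # Cora dimensions used above

A_hat = torch.rand(N, N)        # stand-in for the normalized adjacency matrix Â
X = torch.rand(N, F_in)         # stand-in for the node feature matrix
W0 = torch.randn(F_in, F_hid)   # first-layer weight matrix W^(0)
W1 = torch.randn(F_hid, F_out)  # second-layer weight matrix W^(1)

H1 = F.relu(A_hat @ X @ W0)     # H^1 = ReLU(Â X W^(0)), shape [2708, 16]
support = H1 @ W1               # support = H^1 W^(1),   shape [2708, 7]
Z = F.softmax(A_hat @ support, dim=1)  # Z = softmax(Â H^1 W^(1)), Eq. (9)
print(Z.shape)                  # torch.Size([2708, 7])
```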

Backpropagation through the first GCN layer

$$\frac{\partial Loss}{\partial w_{1,1}} = \frac{\partial Loss}{\partial \ln Z} \cdot \frac{\partial \ln Z}{\partial H^1} \cdot \frac{\partial H^1}{\partial w_{1,1}}$$

$$X\ (\text{input node features}) = [\vec{x}_1[1433],\ \vec{x}_2[1433],\ \vec{x}_3[1433],\ \ldots,\ \vec{x}_{2708}[1433]]^T$$
where the bracketed number is the length of each vector: every $\vec{x}_i$ is a 1433-dimensional feature vector.

$$XW^0 = [\vec{x}'_1[16],\ \vec{x}'_2[16],\ \vec{x}'_3[16],\ \ldots,\ \vec{x}'_{2708}[16]]^T$$

$$\vec{x}'_1[0] = x_{1,1}w_{1,1} + x_{1,2}w_{2,1} + \cdots + x_{1,1433}w_{1433,1} + bias$$
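This chain of partial derivatives is exactly what autograd applies; a hedged sketch on a tiny graph (the 4-node sizes, random tensors, and negative-log-likelihood loss are assumptions for illustration, not the Cora setup):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
A_hat = torch.rand(4, 4)                    # tiny stand-in for Â
X = torch.rand(4, 6)                        # tiny stand-in for the features
W0 = torch.randn(6, 3, requires_grad=True)  # first-layer weights
W1 = torch.randn(3, 2, requires_grad=True)  # second-layer weights

H1 = F.relu(A_hat @ X @ W0)
logZ = F.log_softmax(A_hat @ H1 @ W1, dim=1)   # ln Z, as in the chain rule above
labels = torch.tensor([0, 1, 0, 1])
loss = F.nll_loss(logZ, labels)                # a Loss built from ln Z

loss.backward()       # autograd walks the chain ∂Loss/∂lnZ · ∂lnZ/∂H¹ · ∂H¹/∂w
print(W0.grad[0, 0])  # ∂Loss/∂w_{1,1} of the first layer
```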

Symmetric and asymmetric normalization

$$\widehat{A} = \overline{D}^{-1}\overline{A}, \quad \text{where } \widehat{a_{ij}} = \frac{\overline{a_{ij}}}{\overline{d_{ii}}}$$
Here $\overline{A}$ is the original matrix $A$ symmetrized, with the identity matrix (self-loops) then added.

$$A = \begin{bmatrix} 0&1&1&0\\ 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{bmatrix} \quad (\text{the original adjacency matrix})$$
$$\widetilde{A} = \begin{bmatrix} 1&1&1&0\\ 1&1&0&0\\ 0&1&1&0\\ 0&0&1&1 \end{bmatrix}, \quad \widetilde{D} = \begin{bmatrix} 3&0&0&0\\ 0&2&0&0\\ 0&0&2&0\\ 0&0&0&2 \end{bmatrix}$$
$$\overline{A} = \begin{bmatrix} 1&1&1&0\\ 1&1&1&0\\ 1&1&1&1\\ 0&0&1&1 \end{bmatrix}, \quad \overline{D} = \begin{bmatrix} 3&0&0&0\\ 0&3&0&0\\ 0&0&4&0\\ 0&0&0&2 \end{bmatrix}$$

$$\widehat{A} = \overline{D}^{-1}\overline{A} =
\begin{bmatrix}
1/3 & 1/3 & 1/3 & 0\\
1/3 & 1/3 & 1/3 & 0\\
1/4 & 1/4 & 1/4 & 1/4\\
0 & 0 & 1/2 & 1/2
\end{bmatrix}$$

$$\widehat{A} = \widetilde{D}^{-\frac{1}{2}}\widetilde{A}\widetilde{D}^{-\frac{1}{2}}, \quad \text{where } \widehat{a_{ij}} = \frac{\widetilde{a_{ij}}}{\sqrt{d_{ii}}\sqrt{d_{jj}}}, \qquad \widetilde{A} = A + I_N$$

$$\widehat{A} = \widetilde{D}^{-\frac{1}{2}}\widetilde{A}\widetilde{D}^{-\frac{1}{2}} =
\begin{bmatrix}
\frac{1}{\sqrt{3}} & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{2}} & 0\\
0 & 0 & 0 & \frac{1}{\sqrt{2}}
\end{bmatrix}
\begin{bmatrix}
1 & 1 & 1 & 0\\
1 & 1 & 0 & 0\\
0 & 1 & 1 & 0\\
0 & 0 & 1 & 1
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{3}} & 0 & 0 & 0\\
0 & \frac{1}{\sqrt{2}} & 0 & 0\\
0 & 0 & \frac{1}{\sqrt{2}} & 0\\
0 & 0 & 0 & \frac{1}{\sqrt{2}}
\end{bmatrix}
=
\begin{bmatrix}
\frac{1}{3} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & 0\\
\frac{1}{\sqrt{6}} & \frac{1}{2} & 0 & 0\\
0 & \frac{1}{2} & \frac{1}{2} & 0\\
0 & 0 & \frac{1}{2} & \frac{1}{2}
\end{bmatrix}$$
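Both normalizations are easy to reproduce numerically; a minimal NumPy sketch over the same 4-node toy graph:

```python
import numpy as np

# The original (directed) adjacency matrix A from above
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)

# Ã = A + I_N (self-loops only) and its degree matrix D̃ = diag(3, 2, 2, 2)
A_tilde = A + np.eye(4)
D_tilde = np.diag(A_tilde.sum(axis=1))

# Ā symmetrizes A first, then adds self-loops; D̄ = diag(3, 3, 4, 2)
A_bar = ((A + A.T) > 0).astype(float) + np.eye(4)
D_bar = np.diag(A_bar.sum(axis=1))

# Asymmetric (row) normalization: Â = D̄⁻¹ Ā
A_hat_row = np.linalg.inv(D_bar) @ A_bar

# Symmetric normalization: Â = D̃^(-1/2) Ã D̃^(-1/2)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_tilde)))
A_hat_sym = D_inv_sqrt @ A_tilde @ D_inv_sqrt

print(A_hat_row)  # matches the D̄⁻¹Ā matrix above
print(A_hat_sym)  # matches the symmetric result above
```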

Fully connected layer

$$H^{(i)} = \alpha\big(X^{(i)}W^{(i-1)} + bias^{(i)}\big), \quad i \geqslant 1$$

where $i$ indexes the hidden layers; $\alpha$ is the activation function; $X^{(i)}$ is the matrix input to layer $i$; $W^{(i-1)}$ is the weight matrix of layer $i$; $bias^{(i)}$ is the bias of layer $i$; and $X^{(i+1)} = H^{(i)}$.

$$H^{(1)} = \alpha(XW^{(0)} + bias^{(1)})$$

$$XW^{0} =
\begin{bmatrix}
x_{1,1}w_{1,1}+x_{1,2}w_{2,1}+\cdots+x_{1,1433}w_{1433,1} & \cdots & x_{1,1}w_{1,16}+x_{1,2}w_{2,16}+\cdots+x_{1,1433}w_{1433,16}\\
x_{2,1}w_{1,1}+x_{2,2}w_{2,1}+\cdots+x_{2,1433}w_{1433,1} & \cdots & x_{2,1}w_{1,16}+x_{2,2}w_{2,16}+\cdots+x_{2,1433}w_{1433,16}\\
\vdots & \ddots & \vdots\\
x_{2708,1}w_{1,1}+x_{2708,2}w_{2,1}+\cdots+x_{2708,1433}w_{1433,1} & \cdots & x_{2708,1}w_{1,16}+x_{2708,2}w_{2,16}+\cdots+x_{2708,1433}w_{1433,16}
\end{bmatrix}
\quad (\mathrm{shape}=[2708,16])$$

Write $x_{i,1}w_{1,j}+x_{i,2}w_{2,j}+\cdots+x_{i,1433}w_{1433,j}$ as $\vec{x_{i}}\vec{w_{j}}$, where
$$\vec{x_{i}}=[x_{i,1},x_{i,2},\cdots,x_{i,1433}],\ i\in[1,2708];\qquad \vec{w_{j}}=[w_{1,j},w_{2,j},\cdots,w_{1433,j}]^{T},\ j\in[1,16].$$

$$bias^{(1)} = [bias_1, bias_2, \ldots, bias_{16}]^{T}$$

$$\alpha(XW^{0} + bias^{(1)}) =
\begin{bmatrix}
\alpha(\vec{x_{1}}\vec{w_{1}}+bias_1) & \alpha(\vec{x_{1}}\vec{w_{2}}+bias_2) & \cdots & \alpha(\vec{x_{1}}\vec{w_{16}}+bias_{16})\\
\alpha(\vec{x_{2}}\vec{w_{1}}+bias_1) & \alpha(\vec{x_{2}}\vec{w_{2}}+bias_2) & \cdots & \alpha(\vec{x_{2}}\vec{w_{16}}+bias_{16})\\
\vdots & \vdots & \ddots & \vdots\\
\alpha(\vec{x_{2708}}\vec{w_{1}}+bias_1) & \alpha(\vec{x_{2708}}\vec{w_{2}}+bias_2) & \cdots & \alpha(\vec{x_{2708}}\vec{w_{16}}+bias_{16})
\end{bmatrix}
\quad (\mathrm{shape}=[2708,16])$$
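This is precisely the computation wrapped by torch.nn.Linear plus an activation; a minimal sketch with the Cora-sized dimensions used above (the random input stands in for the real feature matrix):

```python
import torch
import torch.nn as nn

fc = nn.Linear(1433, 16)    # stores W^(0) (as a [16, 1433] tensor) and bias^(1) ([16])
alpha = nn.ReLU()           # the activation function α

X = torch.rand(2708, 1433)  # stand-in for the Cora feature matrix
H1 = alpha(fc(X))           # H^(1) = α(X W^(0) + bias^(1))
print(H1.shape)             # torch.Size([2708, 16])
```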

