TensorFlow: notes from reading paper code

村头陶员外

Category: Deep Learning - Neural Networks. Tags: tensorflow, tensor transformations, code summary

Article link: https://blog.csdn.net/Mr_tyting/article/details/80148019

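Introduction to tensors

In TensorFlow, the tensors we work with are usually two-, three-, or even four-dimensional. How do we determine the number of dimensions of a given tensor, and the size of each dimension?

Put simply, start at the outermost bracket and count the bracket pairs (or, at the innermost level, the elements) that sit directly inside it. If there are $n$ of them, $n$ is the size of the first dimension. Then repeat the same procedure one level further in for each remaining dimension.

For example, $[[1, 2, 3], [4, 5, 6]]$ has shape $(2, 3)$, and $[[[1, 2, 3], [4, 5, 6]]]$ has shape $(1, 2, 3)$.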
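The bracket-counting rule can be sketched in plain Python on nested lists. This is only an illustration of the rule, not how TensorFlow computes shapes; the helper name `shape_of` is made up for this note, and it assumes the nested list is regular (every sub-list at a given level has the same length).

```python
def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    dims = []
    level = nested
    while isinstance(level, list):
        dims.append(len(level))  # number of items directly inside this bracket
        if not level:            # empty list: nothing further inside
            break
        level = level[0]         # step one bracket level inward
    return tuple(dims)


print(shape_of([[1, 2, 3], [4, 5, 6]]))    # (2, 3)
print(shape_of([[[1, 2, 3], [4, 5, 6]]]))  # (1, 2, 3)
```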

The tf.slice function

The official TensorFlow documentation for tf.slice describes this function in detail, but I find the description not very intuitive.

Here is a direct example:

t = tf.constant([[[1, 1, 1], [2, 2, 2]],
                 [[3, 3, 3], [4, 4, 4]],
                 [[5, 5, 5], [6, 6, 6]]])
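# Here t has shape (3, 2, 3).
# tf.slice(input, begin, size): the first argument is the input tensor. The second
# argument, begin, can be read as a coordinate that locates one element of input
# as the starting element. The third argument, size, says how many items to take
# along each dimension, starting from that element. The shape of the result
# equals the third argument, size.

tf.slice(t, [1, 0, 0], [1, 1, 3])  # [[[3, 3, 3]]]
# In this example begin is [1, 0, 0]. The first dimension of t has size 3:
# t[0, :, :] = [[1, 1, 1], [2, 2, 2]]
# t[1, :, :] = [[3, 3, 3], [4, 4, 4]]
# t[2, :, :] = [[5, 5, 5], [6, 6, 6]]
# so t[1, 0, :] = [3, 3, 3] and t[1, 0, 0] = 3.
# The starting element is therefore that 3. Now look at the third argument,
# size = [1, 1, 3]. Its first entry is 1, so we take 1 item along the first
# dimension, starting at the position of that 3. An "item" here is a whole
# sub-tensor along that dimension, not a single number (the first dimension has
# size 3). This selects the block [[3, 3, 3], [4, 4, 4]]. The second entry is
# also 1 (the second dimension has size 2, and we keep only the first item
# starting at the 3), which narrows the selection to the row [3, 3, 3]. The
# third entry is 3, so all three numbers are kept and the result is [[[3, 3, 3]]].

# The following calls work the same way: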
tf.slice(t, [1, 0, 0], [1, 2, 3])  # [[[3, 3, 3],
                                   #   [4, 4, 4]]]
tf.slice(t, [1, 0, 0], [2, 1, 3])  # [[[3, 3, 3]],
                                   #  [[5, 5, 5]]]
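The begin/size semantics can be reproduced on nested lists with a few lines of plain Python. This is a sketch for intuition only; `slice_nested` is a hypothetical helper, not part of TensorFlow, and it leaves out the special value -1 that tf.slice accepts in size.

```python
def slice_nested(t, begin, size):
    """Plain-Python sketch of tf.slice(t, begin, size) on a nested list."""
    if not begin:  # no dimensions left: t is a single element
        return t
    start, count = begin[0], size[0]
    # Take `count` items along this dimension, then recurse into each of them.
    return [slice_nested(sub, begin[1:], size[1:])
            for sub in t[start:start + count]]


t = [[[1, 1, 1], [2, 2, 2]],
     [[3, 3, 3], [4, 4, 4]],
     [[5, 5, 5], [6, 6, 6]]]

print(slice_nested(t, [1, 0, 0], [1, 1, 3]))  # [[[3, 3, 3]]]
print(slice_nested(t, [1, 0, 0], [1, 2, 3]))  # [[[3, 3, 3], [4, 4, 4]]]
print(slice_nested(t, [1, 0, 0], [2, 1, 3]))  # [[[3, 3, 3]], [[5, 5, 5]]]
```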

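The tf.tile function

See tf.tile in the official TensorFlow API documentation.

tile(
    input,      # a tensor of any rank
    multiples,  # a 1-D tensor whose length must equal the rank of input
    name=None
)

A very simple function: it replicates input multiples[i] times along dimension i.
For example:
a = tf.constant([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = tf.constant([2, 3])  # repeat 2 times along the first dimension and 3 times along the second
Running tf.tile(a, b) gives: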
[[1 2 3 1 2 3 1 2 3]
 [4 5 6 4 5 6 4 5 6]
 [1 2 3 1 2 3 1 2 3]
 [4 5 6 4 5 6 4 5 6]]

Likewise, for this example:
a = tf.constant([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = tf.constant([2, 2])
Running tf.tile(a, b) gives:
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [1 2 3 1 2 3]
 [4 5 6 4 5 6]]
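For the 2-D case, the tiling rule can be mimicked in plain Python. This is a minimal sketch, assuming a rectangular list of rows; `tile_2d` is a made-up name and it only handles rank-2 inputs.

```python
def tile_2d(a, multiples):
    """Plain-Python sketch of tf.tile for a 2-D nested list."""
    row_reps, col_reps = multiples
    widened = [row * col_reps for row in a]  # repeat each row along dimension 1
    # Repeat the whole block of rows along dimension 0 (copying each row).
    return [list(row) for _ in range(row_reps) for row in widened]


a = [[1, 2, 3], [4, 5, 6]]
for row in tile_2d(a, [2, 2]):
    print(row)
```

With multiples [2, 2] the result has shape (4, 6), and with [2, 3] it has shape (4, 9), matching the outputs shown above.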

Difference between sess.run and eval

This one is simple.

If you have a Tensor t, calling t.eval() is equivalent to calling tf.get_default_session().run(t).

The most important difference is that you can use sess.run() to fetch the values of many tensors in the same step:

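import tensorflow as tf

sess = tf.Session()
t = tf.constant(42.0)
u = tf.constant(37.0)
tu = tf.multiply(t, u)  # tf.mul was renamed to tf.multiply in TensorFlow 1.0
ut = tf.multiply(u, t)
with sess.as_default():
    tu.eval()  # runs one step
    ut.eval()  # runs one step
    sess.run([tu, ut])  # evaluates both tensors in a single step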

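Difference between tf.Variable and tf.get_variable()

With tf.Variable, if a name conflict is detected the system resolves it automatically. With tf.get_variable(), the system does not resolve the conflict and raises an error instead.

So when variables need to be shared, use tf.get_variable(). The same variable name can be defined in different variable_scopes; the name of each variable_scope is automatically prepended to the variable name to tell them apart. tf.get_variable is the recommended choice.

Difference between tf.variable_scope and tf.name_scope

The difference between tf.Variable and tf.get_variable above leads on to this question.
Inside a tf.name_scope, the names of variables created with tf.get_variable() are not affected by the name_scope, and if sharing has not been enabled, a duplicate name raises an error. tf.Variable() checks automatically whether the name is already taken and, if so, resolves the clash by itself.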

import tensorflow as tf

with tf.name_scope('name_scope_x'):
    var1 = tf.get_variable(name='var1', shape=[1], dtype=tf.float32)
    var3 = tf.Variable(name='var2', initial_value=[2], dtype=tf.float32)
    var4 = tf.Variable(name='var2', initial_value=[2], dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(var1.name, sess.run(var1))
    print(var3.name, sess.run(var3))
    print(var4.name, sess.run(var4))
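# Output:
# var1:0 [-0.30036557]   note that there is no 'name_scope_x' prefix
# name_scope_x/var2:0 [ 2.]
# name_scope_x/var2_1:0 [ 2.]  the name was changed to 'var2_1' automatically to avoid clashing with 'var2'

If a variable is created with tf.get_variable() and sharing has not been enabled, a duplicate name raises an error: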

import tensorflow as tf

with tf.name_scope('name_scope_1'):
    var1 = tf.get_variable(name='var1', shape=[1], dtype=tf.float32)
    var2 = tf.get_variable(name='var1', shape=[1], dtype=tf.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(var1.name, sess.run(var1))
    print(var2.name, sess.run(var2))

# ValueError: Variable var1 already exists, disallowed. Did you mean 
# to set reuse=True in VarScope? Originally defined at:
# var1 = tf.get_variable(name='var1', shape=[1], dtype=tf.float32)

So to share variables, use tf.variable_scope():

import tensorflow as tf

with tf.variable_scope('variable_scope_y') as scope:
    var1 = tf.get_variable(name='var1', shape=[1], dtype=tf.float32)
    scope.reuse_variables()  # enable variable sharing
    var1_reuse = tf.get_variable(name='var1')
    var2 = tf.Variable(initial_value=[2.], name='var2', dtype=tf.float32)
    var2_reuse = tf.Variable(initial_value=[2.], name='var2', dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(var1.name, sess.run(var1))
    print(var1_reuse.name, sess.run(var1_reuse))
    print(var2.name, sess.run(var2))
    print(var2_reuse.name, sess.run(var2_reuse))
# Output:
# variable_scope_y/var1:0 [-1.59682846]
# variable_scope_y/var1:0 [-1.59682846]   var1_reuse reuses var1
# variable_scope_y/var2:0 [ 2.]
# variable_scope_y/var2_1:0 [ 2.]

Alternatively:

with tf.variable_scope('foo') as foo_scope:
    v = tf.get_variable('v', [1])
with tf.variable_scope('foo', reuse=True):
    v1 = tf.get_variable('v')
assert v1 == v

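When training deep networks, variables often need to be shared, for example to reduce the number of trainable parameters (as in an LSTM model with a Siamese structure) or to parallelize the training of large models on large data across machines and GPUs (as in data parallelism). In addition, once a deep learning model becomes very complex it contains a large number of variables and operations, so keeping variable and operation names unique while maintaining a clearly organized graph is very important. TensorFlow provides tf.Variable(), tf.get_variable(), tf.variable_scope(), and tf.name_scope() for this:

tf.Variable(<variable_name>) versus tf.get_variable(<variable_name>): tf.Variable(<variable_name>) detects name conflicts and resolves them itself, whereas tf.get_variable(<variable_name>) raises an error when asked to create a variable whose name already exists and sharing has not been enabled.

tf.Variable(<variable_name>) and tf.get_variable(<variable_name>) are two ways of getting or creating a variable under a name_scope. The difference is that tf.Variable(<variable_name>) always creates a new variable. Within one name_scope you can create variables with the same name; the underlying implementation automatically gives them distinct names, so two calls produce two different variables.

tf.get_variable(<variable_name>) gets a variable and is not constrained by name_scope. When reuse is enabled in the enclosing variable_scope, the existing variable is fetched; when it is not, a new variable is created (and a duplicate name is an error, as shown above).

tf.name_scope(<scope_name>) versus tf.variable_scope(<scope_name>): tf.name_scope(<scope_name>) is mainly used to manage the ops in a graph and returns a context manager named scope_name. A graph maintains a stack of name scopes; ops or nested name scopes can be defined under each one, which gives a hierarchical, orderly organization and avoids name clashes between ops. tf.variable_scope(<scope_name>) is usually used together with tf.name_scope() to manage the names of the variables in a graph and avoid clashes between them. tf.variable_scope(<scope_name>) also allows variables to be shared within a variable_scope.

Note: when creating a new variable_scope there is no need to set reuse to False. Just set it to True at the point where the variables are to be reused.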

Difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits

The difference is simple:

 1. For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
 2. For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.
 
Labels used in softmax_cross_entropy_with_logits are the one hot version of labels used in sparse_softmax_cross_entropy_with_logits.

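Another tiny difference that is sometimes cited is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to have loss 0 on this label. Do not rely on this: the TensorFlow documentation states that labels outside [0, num_classes) raise an exception on CPU and return NaN for the corresponding loss and gradient rows on GPU.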
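The relationship between the two label formats can be checked numerically. The following is a plain-Python sketch of the underlying math for a single example, not the TensorFlow ops themselves; the helper names and the logits are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_xent(logits, label):
    """Cross-entropy with an integer class index as the label."""
    return -math.log(softmax(logits)[label])


def dense_xent(logits, label_dist):
    """Cross-entropy with a probability distribution (e.g. one-hot) as the label."""
    probs = softmax(logits)
    return -sum(p_true * math.log(p) for p_true, p in zip(label_dist, probs))


logits = [2.0, 1.0, 0.1]                           # hypothetical scores for 3 classes
sparse_loss = sparse_xent(logits, 1)               # label given as a class index
dense_loss = dense_xent(logits, [0.0, 1.0, 0.0])   # same label, one-hot encoded
print(sparse_loss, dense_loss)
```

Both calls give the same loss, which is the sense in which the dense labels are "the one hot version" of the sparse ones.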

tf.train.exponential_decay

tf.train.exponential_decay(
    learning_rate,
    global_step,
    decay_steps,
    decay_rate,
    staircase=False,
    name=None
)

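When training a model we want the learning rate to shrink as the number of training steps grows. Put plainly, we want to get near the optimum quickly at the start of training and then approach it slowly, so that an overly large learning rate does not overshoot it. That is all there is to it.

The formula this function computes: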
decayed_learning_rate = learning_rate *
                        decay_rate ^ (global_step / decay_steps)
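The formula is easy to try out in plain Python. This sketch mirrors the signature above but is not the TensorFlow implementation; with staircase=True the exponent uses integer division, so the rate drops in discrete steps. The numeric values (0.1, 1000, 0.96) are arbitrary choices for illustration.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Plain-Python sketch of the exponential decay formula."""
    if staircase:
        exponent = global_step // decay_steps  # integer division: stepwise decay
    else:
        exponent = global_step / decay_steps   # smooth decay
    return learning_rate * decay_rate ** exponent


for step in (0, 500, 1000, 2000):
    smooth = exponential_decay(0.1, step, 1000, 0.96)
    stepped = exponential_decay(0.1, step, 1000, 0.96, staircase=True)
    print(step, smooth, stepped)
```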

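Using tf.add_to_collection, tf.get_collection, and tf.add_n

tf.add_to_collection: adds a value to a named collection, so that many values are gathered into one list.

tf.get_collection: returns everything in a collection as a list.

tf.add_n: adds up all the tensors in a list element-wise.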
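A common use of these three together is collecting several loss terms and summing them. The following is only a plain-Python analogue of that pattern: the dictionary stands in for the graph's collections, floats stand in for tensors, and the collection name "losses" and the numbers are assumptions for illustration.

```python
from collections import defaultdict

_collections = defaultdict(list)  # stand-in for the graph's named collections


def add_to_collection(name, value):
    """Append a value to the collection called `name`."""
    _collections[name].append(value)


def get_collection(name):
    """Return a copy of everything stored under `name` as a list."""
    return list(_collections[name])


def add_n(values):
    """Add up all items of a non-empty list, one after another."""
    total = values[0]
    for v in values[1:]:
        total = total + v
    return total


add_to_collection("losses", 0.5)   # e.g. the data loss
add_to_collection("losses", 0.02)  # e.g. a weight penalty for one layer
add_to_collection("losses", 0.03)  # e.g. a weight penalty for another layer
total_loss = add_n(get_collection("losses"))
print(total_loss)
```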

tf.sparse_tensor_to_dense

Converts a multi-dimensional SparseTensor into a dense Tensor.

tf.sparse_tensor_to_dense(
    sp_input,  # the SparseTensor
    default_value=0,
    validate_indices=True,
    name=None
)

A direct example:

import tensorflow as tf


sparseTensor = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4]) 
denseTensor = tf.sparse_tensor_to_dense(sparseTensor, 0)

with tf.Session() as sess:
    print (sess.run(denseTensor))


Output:
[[1 0 0 0]
 [0 0 2 0]
 [0 0 0 0]]

tf.sparse_to_dense

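This function is rather powerful: in natural language processing it can be used to build mask matrices quickly.

tf.sparse_to_dense(
    sparse_indices,  # like the indices of a SparseTensor: the positions of the non-default elements
    output_shape,
    sparse_values,
    default_value=0,
    validate_indices=True,
    name=None
)

A direct example: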

import tensorflow as tf
import numpy as np

batch_size = 10

Y = tf.placeholder("int32", [None, 1])
# If batch_size is not known in advance, it can be obtained with batch_size = tf.shape(Y)[0]
reshape = tf.reshape(tf.range(0, batch_size, 1), [batch_size, 1])  # row ids 0..batch_size-1

tmp = tf.concat([reshape, Y], axis=1)  # one [row, class] index pair per example
onehotY = tf.sparse_to_dense(tmp, [batch_size, 10], 1.0, 0.0)

with tf.Session() as sess:
    y = np.arange(10).reshape((10, 1))
    print(sess.run(reshape))
    print(sess.run(tmp, feed_dict={Y: y}))
    print(sess.run(onehotY, feed_dict={Y: y}))

Output:
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
[[0 0]  # indices of the non-zero elements of the sparse matrix
 [1 1]
 [2 2]
 [3 3]
 [4 4]
 [5 5]
 [6 6]
 [7 7]
 [8 8]
 [9 9]]
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

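This is an efficient way to build a one-hot matrix. The sparse_indices argument deserves a closer look. It can be:

A single value
A 1-D vector
In these two cases the return value is a 1-D vector, and output_shape must be a list of length 1 whose only element is the length of the returned vector. Each integer in sparse_indices marks a position in the returned vector that is set to sparse_values; every other position is default_value.
A 2-D matrix with two columns per row
In this case the return value is a 2-D matrix, and output_shape is a list of length 2 giving the shape of the matrix. For each row $[i, j]$ of sparse_indices, the entry $K_{ij}$ of the returned matrix $K$ is set to sparse_values.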
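The 2-D case can be sketched in plain Python to make the index-to-entry mapping concrete. `sparse_to_dense_2d` is a hypothetical helper that handles only a scalar sparse value and two-column indices; the labels are made up for illustration.

```python
def sparse_to_dense_2d(sparse_indices, output_shape, sparse_value, default_value=0):
    """Plain-Python sketch of the 2-D case: set K[i][j] for every [i, j] pair."""
    rows, cols = output_shape
    dense = [[default_value] * cols for _ in range(rows)]
    for i, j in sparse_indices:
        dense[i][j] = sparse_value
    return dense


labels = [2, 0, 1]                                            # class id of each example
indices = [[row, label] for row, label in enumerate(labels)]  # [row, class] pairs
one_hot = sparse_to_dense_2d(indices, [len(labels), 3], 1.0, 0.0)
for row in one_hot:
    print(row)
# [0.0, 0.0, 1.0]
# [1.0, 0.0, 0.0]
# [0.0, 1.0, 0.0]
```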

tf.unsorted_segment_sum

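This op is very handy.

unsorted_segment_sum(
    data,
    segment_ids,
    num_segments,  # the number of segments, i.e. the length of the output
    name=None
)

[Figure: unsorted_segment_sum applied to data with its segment_ids]

In some natural language processing tasks (cloze tests, for example, where a particular word has to be predicted), the model ends up with a probability distribution over the words of a sequence. That distribution is data in the figure, and the corresponding word ids are segment_ids. The probability for the word with id 0 is then $5 + 1 + 3 = 9$.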
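The accumulation can be sketched in plain Python. The original figure is not available here, so the data and segment_ids below are hypothetical values chosen so that id 0 again sums to 5 + 1 + 3 = 9; the function is an illustration for 1-D data, not the TensorFlow op.

```python
def unsorted_segment_sum(data, segment_ids, num_segments):
    """Plain-Python sketch: add each data[k] into out[segment_ids[k]]."""
    out = [0] * num_segments
    for value, seg in zip(data, segment_ids):
        out[seg] += value
    return out


data = [5, 1, 7, 2, 3, 6]         # hypothetical scores, one per position
segment_ids = [0, 0, 1, 1, 0, 2]  # hypothetical word id at each position
print(unsorted_segment_sum(data, segment_ids, 3))  # [9, 9, 6]
```

The ids do not need to be sorted or contiguous, which is what makes the op convenient when the same word appears at several positions.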

tf.while_loop

See the tf.while_loop documentation.
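As a rough intuition, tf.while_loop(cond, body, loop_vars) keeps applying body to the loop variables while cond holds. The plain-Python function below sketches only that control flow; it ignores everything graph-related, and the summing example is made up for illustration.

```python
def while_loop(cond, body, loop_vars):
    """Plain-Python sketch of the cond/body/loop_vars control flow."""
    state = tuple(loop_vars)
    while cond(*state):
        state = tuple(body(*state))  # body returns the updated loop variables
    return state


# Sum the integers 0..4: the loop variables are (i, total).
i, total = while_loop(lambda i, total: i < 5,
                      lambda i, total: (i + 1, total + i),
                      (0, 0))
print(i, total)  # 5 10
```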

tf.clip_by_global_norm

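tf.clip_by_global_norm(
    t_list,         # a list of tensors
    clip_norm,      # the clipping threshold: the maximum allowed global norm
    use_norm=None,  # a precomputed global_norm can be passed in
    name=None
)

The formula:

t_list[i] * clip_norm / max(global_norm, clip_norm)

global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))

The formula is easy to read. During backpropagation we compute a gradient for every parameter; these gradients are t_list above. Taking the square root of the sum of their squared L2 norms gives global_norm, which indicates how badly the gradients are exploding or vanishing. When global_norm > clip_norm, the gradients are scaled down so that their global norm equals clip_norm; otherwise they are left unchanged.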
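The two formulas can be checked with a small plain-Python sketch. Each "tensor" is a flat list of floats here, and the gradient values are hypothetical, chosen so that the global norm comes out as exactly 13. This illustrates the arithmetic only, not the TensorFlow op.

```python
import math


def clip_by_global_norm(t_list, clip_norm):
    """Plain-Python sketch: scale all tensors by clip_norm / max(global_norm, clip_norm)."""
    global_norm = math.sqrt(sum(x * x for t in t_list for x in t))
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[x * scale for x in t] for t in t_list]
    return clipped, global_norm


grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=6.5)
print(norm)     # 13.0
print(clipped)  # [[1.5, 2.0], [6.0]]
```

Because every tensor is multiplied by the same factor, the direction of the overall gradient is preserved; only its length changes.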

Multiplying high-dimensional tensors (computing the context_vector)
In seq2seq models with an attention mechanism we often need to compute the context_vector. In the figure below, $c^0$ is the context_vector at time step 0:

[Figure: attention computing the context vector $c^0$ from the encoder states]

In the figure, $[0.5, 0.5, 0.0, 0.0]$ is attn_dist, that is, the weights $[\alpha_0^1, \alpha_0^2, \alpha_0^3, \alpha_0^4]$, and $[h^1, h^2, h^3, h^4]$ are the encoder_states. Each $h^i$ is a vector of length attn_size, so each product $\alpha_0^i * h^i$ in the figure is a scalar times a vector, i.e. a broadcast operation.

Note that in batched computation attn_dist has shape [batch_size, attn_len], where attn_len is the number of encoder time steps, and encoder_states has shape [batch_size, attn_len, attn_size], where attn_size is the encoder's hidden_size. How should these two tensors be multiplied to get the effect shown in the figure?

Set batch_size aside and look at a single example: attn_dist is a tensor of length attn_len and encoder_states has shape [attn_len, attn_size], i.e. each of the attn_len positions holds a vector of length attn_size. Multiplying attn_dist by encoder_states should therefore use broadcasting.

import tensorflow as tf
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import math_ops
import numpy as np

attn_dist = tf.placeholder(tf.float32, [5, 5])  # shape [batch_size, attn_len]
# shape [batch_size, attn_len, 1, attn_size]: the [batch_size, attn_len, attn_size]
# encoder states after tf.expand_dims(axis=2)
encoder_states = tf.placeholder(tf.float32, [5, 5, 1, 6])

# [batch_size, attn_len, 1, 1] * [batch_size, attn_len, 1, attn_size]
context_vector = array_ops.reshape(attn_dist, [5, -1, 1, 1]) * encoder_states
# This is a broadcast multiplication: the first three dimensions of the two tensors
# match, and only the last dimension differs (1 versus attn_size).

final_context_vector = math_ops.reduce_sum(context_vector, [1, 2])  # [batch_size, attn_size]

with tf.Session() as sess:
    attn = np.ones((5, 5), dtype=np.float32) * 2  # np.float was removed from recent NumPy
    print("====================attn=======================")
    print(sess.run(array_ops.reshape(attn_dist, [5, -1, 1, 1]), feed_dict={attn_dist: attn}))
    print("=============================encode================")
    encode = np.ones((5, 5, 1, 6), dtype=np.float32) * 2
    print(encode)
    print("====================context_vector=========================")
    print(sess.run(context_vector, feed_dict={attn_dist: attn, encoder_states: encode}))

    print("===============final_context_vector====================")
    print(sess.run(final_context_vector, feed_dict={attn_dist: attn, encoder_states: encode}))

Output of the code:

====================attn=======================
[[[[2.]]

  [[2.]]

  [[2.]]

  [[2.]]

  [[2.]]]


 [[[2.]]

  [[2.]]

  [[2.]]

  [[2.]]

  [[2.]]]


 [[[2.]]

  [[2.]]

  [[2.]]

  [[2.]]

  [[2.]]]


 [[[2.]]

  [[2.]]

  [[2.]]

  [[2.]]

  [[2.]]]


 [[[2.]]

  [[2.]]

  [[2.]]

  [[2.]]

  [[2.]]]]
=============================encode================
[[[[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]]


 [[[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]]


 [[[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]]


 [[[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]]


 [[[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]

  [[2. 2. 2. 2. 2. 2.]]]]
====================context_vector=========================
[[[[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]]


 [[[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]]


 [[[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]]


 [[[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]]


 [[[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]

  [[4. 4. 4. 4. 4. 4.]]]]
===============final_context_vector====================
[[20. 20. 20. 20. 20. 20.]
 [20. 20. 20. 20. 20. 20.]
 [20. 20. 20. 20. 20. 20.]
 [20. 20. 20. 20. 20. 20.]
 [20. 20. 20. 20. 20. 20.]]
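For a single example, the broadcast-and-sum above reduces to a weighted sum of the encoder state vectors. The plain-Python sketch below shows just that arithmetic; the weights are the $[0.5, 0.5, 0.0, 0.0]$ mentioned in the text, while the encoder state values are made up for illustration.

```python
def context_vector(attn_dist, encoder_states):
    """Weighted sum over time steps: c[k] = sum_i attn_dist[i] * encoder_states[i][k]."""
    attn_size = len(encoder_states[0])
    return [sum(w * h[k] for w, h in zip(attn_dist, encoder_states))
            for k in range(attn_size)]


attn_dist = [0.5, 0.5, 0.0, 0.0]                                   # one weight per time step
encoder_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # attn_len=4, attn_size=2
print(context_vector(attn_dist, encoder_states))  # [2.0, 3.0]
```

With five time steps whose weights and state entries are all 2, each output entry is 5 * 2 * 2 = 20, which matches the final_context_vector printed above.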

Computing the context_vector with a convolution

In some papers, the attention mechanism brings the decoder_state into the computation of the context_vector.

$e_i^t = v^T \tanh(W_h h_i + W_s s_t + b)$
Here we only compute $e = W_h h_i + W_s s_t$.

Clearly $h_i$ is encoder_states, with shape [batch_size, attn_len (time_steps), attn_size_vec (number of hidden units)].

$W_h$ is a parameter to be learned.
$s_t$ is the decoder_state at some time step $t$, so its shape is [batch_size, attn_size_vec (number of hidden units)].

The key point is how to implement $W_h h_i$.

import tensorflow as tf
from tensorflow.python.ops import nn_ops
import numpy as np

# [batch_size, attn_len (time_steps), attn_size_vec (number of hidden units)]
encoder_states_inp = tf.placeholder(tf.float32, [5, 5, 6])
encoder_states = tf.expand_dims(encoder_states_inp, axis=2)
# [batch_size, attn_len, 1, attn_size_vec], which can be read as
# [number of images, image height, image width, image channels]
W_h = tf.get_variable("W_h", [1, 1, 6, 6])
# [filter height, filter width, input feature maps, output feature maps]

encoder_states_W = nn_ops.conv2d(encoder_states, W_h, [1, 1, 1, 1], 'SAME')
# [batch_size, attn_len, 1, attn_size_vec]: the input and output channel counts are equal,
# so multiplying by W_h leaves the shape unchanged, but the convolution has applied
# W_h to every time step.
decoder_states_inp = tf.placeholder(tf.float32, [5, 6])
decoder_states = tf.expand_dims(tf.expand_dims(decoder_states_inp, axis=1), axis=2)

# W_s is not applied in this demo; the decoder state is added as-is and broadcast over attn_len.
e = encoder_states_W + decoder_states

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    encoder = np.ones((5, 5, 6), dtype=np.float32) * 2
    print("==================encoder_states=====================")
    print(sess.run(encoder_states_W, feed_dict={encoder_states_inp: encoder}).shape)
    decoder = np.ones((5, 6), dtype=np.float32) * 2
    print("-----------------decoder_states-----------------------")
    print(sess.run(decoder_states, feed_dict={decoder_states_inp: decoder}).shape)
    print("+++++++++++++++++++++++e++++++++++++++++++++++++++++++")
    print(sess.run(e, feed_dict={encoder_states_inp: encoder, decoder_states_inp: decoder}).shape)

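In the code above, $W_h$ is a $1 * 1$ convolution kernel, and its convolution can be understood as a fully connected operation applied at every time step (for the reasoning, see the article on the meaning of 1*1 convolution kernels). That is a really neat trick!

A $1 * 1$ convolution kernel is a common way to implement linear transforms and fully connected layers!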
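To see why a $1 * 1$ kernel acts like a fully connected layer, note that at every position it only mixes the channels at that position, using the same weights everywhere. The plain-Python sketch below applies one weight matrix to each time step of a single example; the helper names and the numbers are made up, and the sketch ignores batching.

```python
def matvec(W, h):
    """Multiply a row vector h (length in_dim) by W (in_dim x out_dim)."""
    out_dim = len(W[0])
    return [sum(h[i] * W[i][o] for i in range(len(h))) for o in range(out_dim)]


def conv1x1(states, W):
    """Apply the same W at every time step, as a 1x1 convolution does."""
    return [matvec(W, h) for h in states]


W = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]            # 3 input channels -> 2 output channels
states = [[1.0, 2.0, 3.0],  # time step 0
          [0.0, 1.0, 0.0]]  # time step 1
print(conv1x1(states, W))   # [[4.0, 7.0], [0.0, 2.0]]
```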
To be continued.
