The Theano tutorial has documentation on scan: http://deeplearning.net/software/theano/library/scan.html
First, two simple examples.
Example 1: loop for a given number of steps n_steps
Compute A**k.
import theano
import theano.tensor as T

k = T.iscalar("k")
A = T.vector("A")

# Symbolic description of the result
result, updates = theano.scan(fn=lambda prior_result, A: prior_result * A,
                              outputs_info=T.ones_like(A),
                              non_sequences=A,
                              n_steps=k)

# We only care about A**k, but scan has provided us with A**1 through A**k.
# Discard the values that we don't care about. Scan is smart enough to
# notice this and not waste memory saving them.
final_result = result[-1]

# compiled function that returns A**k
power = theano.function(inputs=[A, k], outputs=final_result, updates=updates)

print(power(range(10), 2))
print(power(range(10), 4))
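As a mental model, here is a plain NumPy sketch of the loop that scan performs above. It only illustrates the computation, not how Theano actually executes it, and the helper name power_loop is mine:

import numpy

def power_loop(A, k):
    # prior_result starts as ones_like(A), exactly like outputs_info above
    prior_result = numpy.ones_like(A)
    for _ in range(k):
        prior_result = prior_result * A  # one scan step
    return prior_result                  # corresponds to result[-1]

print(power_loop(numpy.arange(10, dtype='float32'), 2))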
Example 2: summing up the terms of a polynomial
import numpy
import theano
import theano.tensor as T

coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x")
max_coefficients_supported = 10000

# Generate the components of the polynomial
components, updates = theano.scan(fn=lambda coefficient, power, free_variable: coefficient * (free_variable ** power),
                                  outputs_info=None,
                                  sequences=[coefficients, theano.tensor.arange(max_coefficients_supported)],
                                  non_sequences=x)
# Sum them up
polynomial = components.sum()

# Compile a function
calculate_polynomial = theano.function(inputs=[coefficients, x], outputs=polynomial)

# Test
test_coefficients = numpy.asarray([1, 0, 2], dtype=numpy.float32)
test_value = 3
print(calculate_polynomial(test_coefficients, test_value))
print(1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2))
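Again, a plain-Python sketch of the loop scan runs here (illustration only, the helper name polynomial_loop is mine): each step receives one coefficient and one power zipped from the two sequences, plus the unchanged non_sequence x.

import numpy

def polynomial_loop(coefficients, x):
    components = []
    # scan zips the two sequences element-wise and stops at the shorter one,
    # so in effect it runs len(coefficients) steps
    for coefficient, power in zip(coefficients, range(len(coefficients))):
        components.append(coefficient * (x ** power))  # one call to fn
    return numpy.sum(components)                       # components.sum()

print(polynomial_loop([1.0, 0.0, 2.0], 3.0))  # 1*3**0 + 0*3**1 + 2*3**2 = 19.0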
Then, since scan is how you write loops in Theano, keep three things in mind:
1. what changes at each step
2. what stays fixed
3. what each iteration returns
The tutorial sums it up in one sentence:
The general order of function parameters to fn is: sequences (if any), prior result(s) (if needed), non-sequences (if any).
Correspondingly, fn is called once per loop step, and its parameters arrive in exactly that order: the things that change, then the previous result(s), then the things that stay fixed.
And what does "the previous result" look like? That is what outputs_info specifies.
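A minimal sketch of that parameter order (the names seq, const, and step are mine, just for illustration): fn receives the current sequence element first, then the previous output, then the non_sequence.

import theano
import theano.tensor as T

seq = T.vector("seq")      # what changes: one element per step
const = T.scalar("const")  # what stays fixed

def step(s_t, prev, c):
    # argument order: sequence element, previous result, non_sequence
    return prev + s_t * c

out, _ = theano.scan(fn=step,
                     sequences=seq,
                     outputs_info=T.constant(0.0),  # shape/start of the "previous result"
                     non_sequences=const)

f = theano.function([seq, const], out[-1], allow_input_downcast=True)
print(f([1, 2, 3], 2.0))  # 2 * (1 + 2 + 3) = 12.0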
2016.12.17
Today I got stumped by a question from Xiaohang.
The question:
import os, time
import sys
import timeit

import numpy
import theano
import theano.typed_list
import theano.tensor as T

X = T.fmatrix("X")

emb = theano.shared(name='embeddings',
                    value=0.2 * numpy.random.uniform(-1.0, 1.0, (50 + 1, 10)).astype('float32'))
wx = theano.shared(name='wx',
                   value=0.2 * numpy.random.uniform(-1.0, 1.0, (10, 5)).astype('float32'))
wh = theano.shared(name='wh',
                   value=0.2 * numpy.random.uniform(-1.0, 1.0, (5, 5)).astype('float32'))
w = theano.shared(name='w',
                  value=0.2 * numpy.random.uniform(-1.0, 1.0, (5, 15)).astype('float32'))
bh = theano.shared(name='bh', value=numpy.zeros(5, dtype='float32'))
b = theano.shared(name='b', value=numpy.zeros(15, dtype='float32'))
h0 = theano.shared(name='h0', value=numpy.zeros(5, dtype='float32'))

def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, wx) + T.dot(h_tm1, wh) + bh)
    s_t = T.nnet.softmax(T.dot(h_t, w) + b)
    return [h_t, s_t]

[h, s], _ = theano.scan(fn=recurrence, sequences=X, outputs_info=[h0, None], n_steps=X.shape[0])

fn = theano.function(inputs=[X], outputs=[s], allow_input_downcast=True, on_unused_input='ignore')
The input X is a 2-D matrix, numpy.random.randint(1., 20., (3, 10)).astype('float32'). What is the shape of the output s?
At first glance it looks simple: the input is a (3, 10) matrix, each scan step produces a hidden representation and a softmax output, so after 3 steps the output should be s --> (3, 15).
But the actual output is s --> (3, 1, 15). His question: what is that middle dimension, and how did it get there?
Where does the extra dimension come from?
1. I remembered that in earlier experiments no such extra dimension appeared, so I was curious whether the hidden representation also picked up an extra dimension.
(The code is the same as above; the only change is that the compiled function now also returns the hidden states:)

fn = theano.function(inputs=[X], outputs=[h, s], allow_input_downcast=True, on_unused_input='ignore')
The result: h --> (3, 5), s --> (3, 1, 15).
The hidden states did not gain the extra dimension, so scan itself is not adding it; my understanding of scan is fine.
2. Is softmax the problem?
(Again the same code, with a single change in recurrence: the output softmax is replaced by a sigmoid:)

def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, wx) + T.dot(h_tm1, wh) + bh)
    s_t = T.nnet.sigmoid(T.dot(h_t, w) + b)   # was T.nnet.softmax(T.dot(h_t, w) + b)
    return [h_t, s_t]

[h, s], _ = theano.scan(fn=recurrence, sequences=X, outputs_info=[h0, None], n_steps=X.shape[0])
fn = theano.function(inputs=[X], outputs=[h, s], allow_input_downcast=True, on_unused_input='ignore')
3. Keep digging into Theano's softmax function: http://deeplearning.net/software/theano/library/tensor/nnet/nnet.html
The documentation states that the return value of softmax is:
Returns: a symbolic 2D tensor
Experiment 1:

X = T.fvector("X")
out = T.nnet.softmax(X)
fn = theano.function(inputs=[X], outputs=[out], allow_input_downcast=True, on_unused_input='ignore')

b = numpy.zeros(15, dtype='float32')
c = fn(b)[0]
print(numpy.shape(b))
print(numpy.shape(c))

Result: b --> (15,), c --> (1, 15).
Experiment 2:

X = T.fmatrix("X")
out = T.nnet.softmax(X)
fn = theano.function(inputs=[X], outputs=[out], allow_input_downcast=True, on_unused_input='ignore')

b = numpy.zeros([2, 5], dtype='float32')
c = fn(b)[0]
print(numpy.shape(b))
print(numpy.shape(c))

Result: b --> (2, 5), c --> (2, 5).
Sure enough, softmax is the culprit.
The softmax documentation says: "The softmax function will, when applied to a matrix, compute the softmax values row-wise."
That is, softmax operates on each row.
If the input to softmax is a vector of shape (n,), as in Experiment 1, softmax treats it as a single row and returns a (1, n) matrix. In recurrence, h_t has shape (5,), so T.dot(h_t, w) + b has shape (15,), and softmax turns it into (1, 15); scan then stacks one such result per step along a new leading axis, which is exactly how s ends up with shape (3, 1, 15).
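If the goal is to get s --> (3, 15) directly, one possible fix (my own sketch, not part of the original question) is to drop the singleton row inside recurrence:

def recurrence(x_t, h_tm1):
    h_t = T.nnet.sigmoid(T.dot(x_t, wx) + T.dot(h_tm1, wh) + bh)
    # softmax returns a (1, 15) matrix here; take row 0 so each step yields a (15,) vector
    s_t = T.nnet.softmax(T.dot(h_t, w) + b)[0]
    return [h_t, s_t]

[h, s], _ = theano.scan(fn=recurrence, sequences=X, outputs_info=[h0, None], n_steps=X.shape[0])
# now s has shape (3, 15): scan stacks one (15,) vector per step

Alternatively, you could keep recurrence unchanged and reshape s after scan, but indexing inside the step keeps each per-step output a plain vector.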