# TensorFlow Day 17: Sparse Autoencoder

## Today's Goals

• Understand the sparse autoencoder
• Understand KL divergence & L2 loss
• Implement a sparse autoencoder

GitHub IPython Notebook (full, easier-to-read version)

• Sparsity Regularization
• L2 Regularization

## Sparsity Regularization

$$\hat{\rho}_i = \frac{1}{n} \sum_{j=1}^{n} h\left(w_i^T x_j + b_i\right)$$

• $\hat{\rho}_i$: average output activation value of neuron $i$
• $n$: total number of training examples
• $x_j$: $j$th training example
• $w_i^T$: $i$th row of the weight matrix $W$
• $b_i$: $i$th entry of the bias vector
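As a concrete illustration of this formula, here is a minimal NumPy sketch (the matrices `X`, `W`, `b` are made-up toy values, not the post's TensorFlow model; `h` is the sigmoid used throughout the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: n = 4 training examples with 3 features, 2 hidden neurons
X = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.5],
              [0.2, 0.8, 0.1],
              [0.9, 0.1, 0.4]])
W = np.array([[ 1.0, -1.0, 0.5],    # row i = weights of hidden neuron i
              [-0.5,  0.5, 1.0]])
b = np.array([0.0, 0.1])

# h(W x_j + b) for every example, then average over the n examples
activations = sigmoid(X @ W.T + b)   # shape (n, hidden)
rho_hat = activations.mean(axis=0)   # one average activation per neuron
print(rho_hat)
```

Each entry of `rho_hat` is the average activation of one hidden neuron over the whole (toy) training set; sparsity regularization pushes these averages toward a small target value.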

### Kullback-Leibler divergence (relative entropy)

$$\Omega_{sparsity} = \sum_{i=1}^{D}\left[\rho\log\left(\frac{\rho}{\hat{\rho}_i}\right)+(1-\rho)\log\left(\frac{1-\rho}{1-\hat{\rho}_i}\right)\right]$$

• $\hat{\rho}_i$: average output activation value of neuron $i$
• $\rho$: desired (target) sparsity value
• $D$: number of hidden neurons

The Kullback-Leibler divergence measures how close two probability distributions are; it equals 0 when the two distributions are identical. In the example below, we fix rho = 0.2 and sweep rho_hat over (0, 1): kl_div = 0 when rho_hat = 0.2, and kl_div > 0 for every other value.

```python
%matplotlib inline
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot = True)

rho_hat = np.linspace(0 + 1e-2, 1 - 1e-2, 100)
rho = 0.2
kl_div = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

plt.plot(rho_hat, kl_div)
plt.xlabel("rho_hat")
plt.ylabel("kl_div")
```

### L2 Regularization

$$\Omega_{weights} = \frac{1}{2}\sum_{l}^{L}\sum_{j}^{n}\sum_{i}^{k}\left(w_{ji}^{(l)}\right)^{2}$$

• $L$: number of hidden layers
• $n$: number of observations
• $k$: number of variables in the training data
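This quantity is easy to check numerically. A small NumPy sketch with made-up weight matrices (note that `tf.nn.l2_loss`, used later in the post, computes exactly this `sum(w ** 2) / 2` per tensor):

```python
import numpy as np

# made-up weights for two layers
weights = [np.array([[0.5, -1.0],
                     [2.0,  0.0]]),
           np.array([[1.0],
                     [-2.0]])]

# half the sum of all squared weights across all layers
omega_weights = 0.5 * sum(np.sum(w ** 2) for w in weights)
print(omega_weights)  # 0.5 * (0.25 + 1 + 4 + 0 + 1 + 4) = 5.125
```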

### Cost function

The cost function is simply the sum of these three terms, and training minimizes it.

$$E = \Omega_{mse} + \beta \, \Omega_{sparsity} + \lambda \, \Omega_{weights}$$
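Putting the three terms together, here is a minimal NumPy sketch mirroring the loss built later in the post (the function name `total_cost` and all input values are toy assumptions for illustration):

```python
import numpy as np

def total_cost(x, x_rec, rho, rho_hat, weights, beta, lam):
    mse = np.mean((x_rec - x) ** 2)                       # reconstruction error
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # sparsity term
    l2 = 0.5 * sum(np.sum(w ** 2) for w in weights)       # weight decay term
    return mse + beta * kl + lam * l2

x = np.array([0.0, 1.0, 0.5])
x_rec = np.array([0.1, 0.9, 0.5])
rho_hat = np.array([0.1, 0.3])
w = [np.ones((2, 2))]
print(total_cost(x, x_rec, 0.2, rho_hat, w, beta=7.5e-5, lam=5e-6))
```

When `rho_hat` equals the target `rho` everywhere, the KL term vanishes and the cost reduces to the MSE plus the weight penalty.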

## Implementation

### Normal Autoencoder

```python
# weight/bias helper constructors (not shown in the post; standard
# TF-tutorial-style initializers are assumed here)
def weight_variable(shape, name):
    return tf.Variable(tf.truncated_normal(shape, stddev = 0.1), name = name)

def bias_variable(shape, name):
    return tf.Variable(tf.constant(0.1, shape = shape), name = name)

def build_sae():
    # encoder: 784 -> 300 -> 30
    W_e_1 = weight_variable([784, 300], "w_e_1")
    b_e_1 = bias_variable([300], "b_e_1")
    h_e_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, W_e_1), b_e_1))

    W_e_2 = weight_variable([300, 30], "w_e_2")
    b_e_2 = bias_variable([30], "b_e_2")
    h_e_2 = tf.nn.sigmoid(tf.add(tf.matmul(h_e_1, W_e_2), b_e_2))

    # decoder: 30 -> 300 -> 784
    W_d_1 = weight_variable([30, 300], "w_d_1")
    b_d_1 = bias_variable([300], "b_d_1")
    h_d_1 = tf.nn.sigmoid(tf.add(tf.matmul(h_e_2, W_d_1), b_d_1))

    W_d_2 = weight_variable([300, 784], "w_d_2")
    b_d_2 = bias_variable([784], "b_d_2")
    h_d_2 = tf.nn.sigmoid(tf.add(tf.matmul(h_d_1, W_d_2), b_d_2))

    # hidden activations, all weights, and the reconstruction
    return [h_e_1, h_e_2], [W_e_1, W_e_2, W_d_1, W_d_2], h_d_2
```
```python
tf.reset_default_graph()
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, shape = [None, 784])
h, w, x_reconstruct = build_sae()

loss = tf.reduce_mean(tf.pow(x_reconstruct - x, 2))
optimizer = tf.train.AdamOptimizer(0.01).minimize(loss)

init_op = tf.global_variables_initializer()
sess.run(init_op)

for i in range(20000):
    batch = mnist.train.next_batch(60)
    if i % 100 == 0:
        print("step %d, loss %g" % (i, loss.eval(feed_dict={x: batch[0]})))
    optimizer.run(feed_dict={x: batch[0]})

print("final loss %g" % loss.eval(feed_dict={x: mnist.test.images}))
```
```
step 0, loss 0.259796
step 100, loss 0.0712686
step 200, loss 0.056199
step 300, loss 0.0586076
step 400, loss 0.0488305
step 500, loss 0.0377571
step 600, loss 0.0372789
step 700, loss 0.0319157
step 800, loss 0.0314859
step 900, loss 0.0278508
step 1000, loss 0.0256422
step 1100, loss 0.0272346
step 1200, loss 0.0241254
step 1300, loss 0.023016
step 1400, loss 0.0212343
step 1500, loss 0.0179811
step 2000, loss 0.0155893
step 3000, loss 0.0145139
step 4000, loss 0.0117702
step 5000, loss 0.0119975
step 6000, loss 0.0106937
step 7000, loss 0.0113036
step 8000, loss 0.00997475
step 9000, loss 0.0116126
step 10000, loss 0.0104301
step 11000, loss 0.00969182
step 12000, loss 0.00969755
step 13000, loss 0.0104931
step 14000, loss 0.00950653
step 15000, loss 0.00963279
step 16000, loss 0.0098329
step 17000, loss 0.00817896
step 18000, loss 0.00903721
step 19000, loss 0.00828982
final loss 0.00885361
```


#### average output activation value

```python
for h_i in h:
    print("average output activation value %g"
          % tf.reduce_mean(h_i).eval(feed_dict={x: mnist.test.images}))
```

```
average output activation value 0.191295
average output activation value 0.378384
```


## Sparse Autoencoder

### KL divergence function

```python
# note: tf.sub / tf.mul / tf.div were renamed in TensorFlow 1.0;
# the current names tf.subtract / tf.multiply / tf.divide are used here
def kl_div(rho, rho_hat):
    invrho = tf.subtract(tf.constant(1.), rho)
    invrhohat = tf.subtract(tf.constant(1.), rho_hat)
    logrho = tf.add(logfunc(rho, rho_hat), logfunc(invrho, invrhohat))
    return logrho

def logfunc(x, x2):
    return tf.multiply(x, tf.log(tf.divide(x, x2)))
```
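A quick NumPy sanity check of the same formula (not part of the original post) confirms the two key properties used here: the divergence is exactly zero at `rho_hat = rho` and positive everywhere else:

```python
import numpy as np

def kl_div_np(rho, rho_hat):
    # KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)
    return (rho * np.log(rho / rho_hat)
            + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

print(kl_div_np(0.2, 0.2))   # 0.0 at the target
print(kl_div_np(0.2, 0.5))   # positive away from it
```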

### loss function

```python
from functools import reduce  # needed in Python 3

tf.reset_default_graph()
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, shape = [None, 784])
h, w, x_reconstruct = build_sae()

alpha = 5e-6
beta = 7.5e-5
# sparsity penalty over every hidden layer's average activation
kl_div_loss = reduce(lambda x, y: x + y,
                     map(lambda x: tf.reduce_sum(kl_div(0.02, tf.reduce_mean(x, 0))), h))
#kl_div_loss = tf.reduce_sum(kl_div(0.02, tf.reduce_mean(h[0], 0)))
l2_loss = reduce(lambda x, y: x + y, map(lambda x: tf.nn.l2_loss(x), w))
loss = tf.reduce_mean(tf.pow(x_reconstruct - x, 2)) + alpha * l2_loss + beta * kl_div_loss
optimizer = tf.train.AdamOptimizer(0.01).minimize(loss)

init_op = tf.global_variables_initializer()
sess.run(init_op)

for i in range(20000):
    batch = mnist.train.next_batch(60)
    if i % 100 == 0:
        print("step %d, loss %g" % (i, loss.eval(feed_dict={x: batch[0]})))
    optimizer.run(feed_dict={x: batch[0]})

print("final loss %g" % loss.eval(feed_dict={x: mnist.test.images}))
```
```
step 0, loss 0.283789
step 100, loss 0.0673799
step 200, loss 0.061653
step 300, loss 0.0575306
step 400, loss 0.0549822
step 500, loss 0.0485821
step 600, loss 0.0470816
step 700, loss 0.0441757
step 800, loss 0.042368
step 900, loss 0.0441069
step 1000, loss 0.0419031
step 1100, loss 0.0435174
step 1200, loss 0.0414619
step 1300, loss 0.0423286
step 1400, loss 0.0394959
step 1500, loss 0.0423292
step 2000, loss 0.0399037
step 3000, loss 0.0394368
step 4000, loss 0.0379597
step 5000, loss 0.035319
step 6000, loss 0.0351442
step 7000, loss 0.0376415
step 8000, loss 0.0366516
step 9000, loss 0.0382368
step 10000, loss 0.0357169
step 11000, loss 0.0366914
step 12000, loss 0.0382858
step 13000, loss 0.0349964
step 14000, loss 0.0370025
step 15000, loss 0.036228
step 16000, loss 0.0367592
step 17000, loss 0.0356757
step 18000, loss 0.0369231
step 19000, loss 0.0345381
final loss 0.0355583
```


#### average output activation value

```python
for h_i in h:
    print("average output activation value %g"
          % tf.reduce_mean(h_i).eval(feed_dict={x: mnist.test.images}))
```

```
average output activation value 0.0529726
average output activation value 0.398633
```


## Today's Thoughts

#### Questions

• What would the result look like with an L1 loss instead?
• Is there a better way to choose the hyperparameters?
• All the activation functions here are sigmoid; what if we used ReLU?
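On the first question, a quick hedged sketch (toy NumPy values; not something the post actually ran): an L1 penalty on the hidden activations, `beta * sum(|h|)`, also drives activations toward zero, but it penalizes them linearly instead of through the KL curve, and it needs no activation clipping:

```python
import numpy as np

h = np.array([0.05, 0.0, 0.9, 0.1])   # made-up hidden activations

# L1 penalty: linear in the activations, defined even at exactly 0
l1_penalty = np.sum(np.abs(h))

# KL penalty: requires clipping rho_hat away from 0 and 1
rho = 0.2
rho_hat = np.clip(h, 1e-6, 1 - 1e-6)
kl_penalty = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

print(l1_penalty, kl_penalty)
```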
