Memory-augmented neural networks (MANNs) are another approach to one-shot learning. Their predecessor is the Neural Turing Machine (NTM), which can store and retrieve information in memory. The idea is to augment a neural network with external memory: rather than relying on the hidden state as its memory, an NTM stores and retrieves information in an external memory. Its architecture is shown in the figure below.
Key components of an NTM:
Controller: essentially a feed-forward or recurrent neural network that reads from and writes to the memory.
Memory: the memory matrix, memory bank, or simply memory is where information is stored. The memory matrix is basically a two-dimensional matrix of memory cells, with N rows and M columns. Content in memory is accessed through the controller: the controller receives input from the external environment and produces a response by interacting with the memory matrix.
Read and write heads: pointers containing the memory addresses from which to read and to which to write.
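To make the read head concrete, here is a minimal NumPy sketch of content-based reading, the core addressing mechanism an NTM head uses: the head emits a key vector, compares it against every memory row by cosine similarity, sharpens the similarities with a key strength `beta`, normalizes them into read weights with a softmax, and returns the weighted sum of memory rows as the read vector. The function name `content_read` is my own, not from the chapter's code.

```python
import numpy as np

def content_read(memory, key, beta):
    # cosine similarity between the key and each memory row (N rows, M columns)
    eps = 1e-8
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    # sharpen with key strength beta, then softmax -> read weights over the N rows
    e = np.exp(beta * sim)
    w = e / e.sum()
    # read vector: weighted sum of memory rows
    return w @ memory, w

memory = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])
r, w = content_read(memory, key=np.array([1., 0., 0.]), beta=10.0)
```

With a large `beta` the weighting becomes nearly one-hot, so the read vector is almost exactly the best-matching row; with `beta` near zero the head reads a blend of all rows.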
Using an NTM for the copy task
The goal of the copy task is to observe how an NTM stores and recalls a sequence of arbitrary length. The network is fed a random sequence together with a marker indicating the end of the sequence, and it must learn to output the given input sequence. To do so, it stores the input sequence in memory and then reads it back from memory.
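The task setup above can be sketched as a small data generator: a random binary sequence, an extra channel carrying the end-of-sequence marker, and blank timesteps during which the network must reproduce the sequence. The function name `make_copy_batch` and the exact layout are illustrative assumptions, not the chapter's code.

```python
import numpy as np

def make_copy_batch(batch_size, seq_len, vector_dim, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    # random binary sequence to be copied
    seq = rng.integers(0, 2, size=(batch_size, seq_len, vector_dim)).astype(np.float32)
    # input: the sequence, then an end-of-sequence marker on an extra channel,
    # then blank timesteps during which the network must emit the copy
    x = np.zeros((batch_size, 2 * seq_len + 1, vector_dim + 1), dtype=np.float32)
    x[:, :seq_len, :vector_dim] = seq
    x[:, seq_len, vector_dim] = 1.0  # end-of-sequence flag
    # target: the original sequence, expected after the marker
    y = seq
    return x, y

x, y = make_copy_batch(batch_size=2, seq_len=5, vector_dim=8)
```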
# coding=utf-8
import tensorflow as tf
import numpy as np
import os
import argparse
from PIL import Image
from PIL import ImageOps
import matplotlib.pyplot as plt


class NTMCell():
    def __init__(self, rnn_size, memory_size, memory_vector_dim, read_head_num, write_head_num,
                 addressing_mode='content_and_location', shift_range=1, reuse=False, output_dim=None):
        # initialize all the variables
        self.rnn_size = rnn_size
        self.memory_size = memory_size
        self.memory_vector_dim = memory_vector_dim
        self.read_head_num = read_head_num
        self.write_head_num = write_head_num
        self.addressing_mode = addressing_mode
        self.reuse = reuse
        self.step = 0
        self.output_dim = output_dim
        self.shift_range = shift_range
        # initialize controller as the basic rnn cell
        self.controller = tf.nn.rnn_cell.BasicRNNCell(self.rnn_size)

    def __call__(self, x, prev_state):
        # perform one NTM step: combine the input x with the previously read
        # vectors to form the controller input
        prev_read_vector_list = prev_state['read_vector_list']
        prev_controller_state = prev_state['controller_state']
        controller_input = tf.concat([x] + prev_read_vector_list, axis=1)
        # feed controller_input and prev_controller_state to the RNN controller
        with tf.variable_scope('controller', reuse=self.reuse):
            controller_output, controller_state = self.controller(controller_input, prev_controller_state)
        # initialize the read and write heads; per head the parameters are:
        # key k (memory_vector_dim) + key strength beta (1) + gate g (1)
        # + shift weighting s (shift_range * 2 + 1) + sharpening gamma (1)
        num_parameters_per_head = self.memory_vector_dim + 1 + 1 + (self.shift_range * 2 + 1) + 1
        num_heads = self.read_head_num + self.write_head_num
        total_parameter_num = num_parameters_per_head * num_heads + self.memory_vector_dim * 2 * self.write_head_num
        # initialize the weight matrix and bias, and compute the head
        # parameters with a feed-forward (linear) layer
        with tf.variable_scope("o2p", reuse=(self.step > 0) or self.reuse):
            o2p_w = tf.get_variable('o2p_w', [controller_output.get_shape()[1], total_parameter_num],
                                    initializer=tf.random_normal_initializer(mean=0.0, stddev=0.5))
            o2p_b = tf.get_variable('o2p_b', [total_parameter_num],
                                    initializer=tf.random_normal_initializer(mean=0.0, stddev=0.5))
            parameters = tf.nn.xw_plus_b(controller_output, o2p_w, o2p_b)
        # split the flat parameter vector: one chunk per head, then the
        # erase/add vectors (2 per write head) from the remainder
        head_parameter_list = tf.split(parameters[:, :num_parameters_per_head * num_heads], num_heads, axis=1)
        erase_add_list = tf.split(parameters[:, num_parameters_per_head * num_heads:],
                                  2 * self.write_head_num, axis=1)
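To clarify what one chunk of `head_parameter_list` contains, here is a NumPy sketch of how a single head's flat parameter vector can be parsed into the addressing quantities (k, beta, g, s, gamma), matching the layout `num_parameters_per_head = M + 1 + 1 + (2 * shift_range + 1) + 1` above. The function name and the particular activation functions (tanh, softplus, sigmoid, softmax) are common choices in NTM implementations, shown here as assumptions rather than the chapter's exact code.

```python
import numpy as np

def parse_head_params(p, memory_vector_dim, shift_range):
    # layout: key k (M) | strength beta (1) | gate g (1) | shifts s (2*shift_range+1) | gamma (1)
    M = memory_vector_dim
    S = 2 * shift_range + 1
    k = np.tanh(p[:M])                              # key vector for content addressing
    beta = np.log(np.exp(p[M]) + 1)                 # key strength (softplus, >= 0)
    g = 1.0 / (1.0 + np.exp(-p[M + 1]))             # interpolation gate in (0, 1)
    s_logits = p[M + 2:M + 2 + S]
    s = np.exp(s_logits) / np.exp(s_logits).sum()   # shift weighting (softmax, sums to 1)
    gamma = np.log(np.exp(p[M + 2 + S]) + 1) + 1    # sharpening factor >= 1
    return k, beta, g, s, gamma

# a head parameter vector for memory_vector_dim=4, shift_range=1 (length 4+1+1+3+1)
p = np.zeros(10)
k, beta, g, s, gamma = parse_head_params(p, memory_vector_dim=4, shift_range=1)
```

The activations constrain each quantity to its valid range: the gate stays in (0, 1) so it can blend content-based and location-based weights, the shift weighting is a proper distribution over allowed shifts, and gamma stays at least 1 so sharpening never flattens the weighting.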