ELMo debugging

Element简

Article link: https://blog.csdn.net/yanyiting666/article/details/93656937

The model code consists mainly of the following parts: 1. building the word embedding; 2. preparing the word_char embedding; 3. the language model (a bidirectional LSTM).

2.1 Building the word embedding
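Note: in the ELMo language model, both the word embedding and the word_char embedding are trained and updated together with the model!
For the word embedding, we only need to define a variable of the same dimension for every word in the vocabulary, apply embedding_lookup to the input word ids, and feed the result into the language model, so that the embeddings are trained along with it. The corresponding code is as follows: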

```python
# Method of the language-model class. Assumes `import tensorflow as tf` (TF 1.x API)
# and module constants assumed here to be DTYPE = 'float32', DTYPE_INT = 'int64'.
def _build_word_embeddings(self):
    n_tokens_vocab = self.options['n_tokens_vocab']
    batch_size = self.options['batch_size']
    unroll_steps = self.options['unroll_steps']

    # LSTM options
    projection_dim = self.options['lstm']['projection_dim']

    # the input token_ids and word embeddings
    # placeholder for the word ids of the input sentences
    self.token_ids = tf.placeholder(DTYPE_INT,
                                    shape=(batch_size, unroll_steps),
                                    name='token_ids')
    # the word embeddings
    # map the sentences' word ids to a matrix of word vectors; the table is an
    # ordinary trainable variable, so it is updated together with the LM
    with tf.device("/cpu:0"):
        self.embedding_weights = tf.get_variable(
            "embedding", [n_tokens_vocab, projection_dim],
            dtype=DTYPE,
        )
        self.embedding = tf.nn.embedding_lookup(self.embedding_weights,
                                                self.token_ids)

    # if a bidirectional LM then make placeholders for reverse
    # model and embeddings; the reverse direction looks up the SAME table
    if self.bidirectional:
        self.token_ids_reverse = tf.placeholder(DTYPE_INT,
                                                shape=(batch_size, unroll_steps),
                                                name='token_ids_reverse')
        with tf.device("/cpu:0"):
            self.embedding_reverse = tf.nn.embedding_lookup(
                self.embedding_weights, self.token_ids_reverse)
```
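To make the shapes and the weight sharing concrete, here is a minimal NumPy sketch (not the original TensorFlow code) of what the lookup does. The toy sizes are hypothetical, and so is the way the reverse ids are built: they are simply the forward ids flipped along the time axis, an assumption for illustration; the real data pipeline may prepare the reversed batch differently. The hypothetical helper `embedding_table_grad` illustrates why the table is "trained along with the model": the gradient of a lookup only reaches the rows that were looked up, accumulated once per occurrence.

```python
import numpy as np

def embedding_lookup(table, ids):
    # Gather one row of the table per token id (NumPy analogue of
    # tf.nn.embedding_lookup with a single table).
    return table[ids]

def embedding_table_grad(table_shape, ids, upstream_grad):
    # Hypothetical helper: gradient of the lookup w.r.t. the table.
    # Each row receives the upstream gradient of every position where its
    # id occurs; rows that were never looked up stay zero.
    grad = np.zeros(table_shape)
    np.add.at(grad, ids, upstream_grad)
    return grad

# Hypothetical toy sizes, not the ELMo defaults.
n_tokens_vocab, projection_dim = 10, 4
batch_size, unroll_steps = 2, 5

rng = np.random.default_rng(0)
embedding_weights = rng.normal(size=(n_tokens_vocab, projection_dim))
token_ids = rng.integers(0, n_tokens_vocab, size=(batch_size, unroll_steps))

embedding = embedding_lookup(embedding_weights, token_ids)
print(embedding.shape)  # (2, 5, 4)

# Assumed reverse input: forward ids flipped along time, same table.
token_ids_reverse = token_ids[:, ::-1]
embedding_reverse = embedding_lookup(embedding_weights, token_ids_reverse)
print(np.allclose(embedding_reverse, embedding[:, ::-1]))  # True

# With an all-ones upstream gradient, row i of the table gradient equals
# the number of times id i occurs in the batch.
grad = embedding_table_grad(embedding_weights.shape, token_ids,
                            np.ones_like(embedding))
counts = np.bincount(token_ids.ravel(), minlength=n_tokens_vocab)
print(np.allclose(grad, counts[:, None]))  # True
```

Because both directions index the same `embedding_weights`, gradients from the forward and backward LMs accumulate into one shared table.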

2.2 Building the word_char_embedding representation
Building the word_char_embedding is comparatively more complex: it is learned with a character-level CNN. The network is shown in the two figures below:

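The first figure shows training on a single sentence; in practice, a batch of sentences is trained in parallel. For a single sentence, the training sample has shape [n_token, max_char, char_dim], i.e., the sentence length, the maximum number of characters per word, and the dimension of each character embedding. We convolve it with a kernel of size [1, n_width, char_dim], that is, a kernel with height 1, width n_width and char_dim channels, so the convolution runs row by row (one row per word) over windows of n_width characters. The convolution produces a feature map of shape [n_token, max_char-n_width+1]; we then max-pool each row of the feature map, so each kernel finally yields [n_token] values. With m=n_filters kernels in total, concatenating the results of all kernels gives an output of shape [n_token, m].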
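The shape bookkeeping above can be checked with a small NumPy sketch. It assumes one sentence (no batch axis) and omits the bias and activation; the function `char_cnn_features` and all sizes are hypothetical. Each kernel has height 1, so it only slides along the characters of one word, and max pooling over each row collapses the [n_token, max_char-n_width+1] feature map to [n_token].

```python
import numpy as np

def char_cnn_features(chars, kernels):
    """Hypothetical per-word char-CNN features for one sentence.

    chars:   [n_token, max_char, char_dim] character embeddings
    kernels: list of [n_width, char_dim] kernels (height 1 is implicit)
    returns: [n_token, m] with m = len(kernels)
    """
    n_token, max_char, char_dim = chars.shape
    features = []
    for k in kernels:
        n_width = k.shape[0]
        # Slide the kernel along each word's characters (cross-correlation,
        # as CNN layers compute): fmap[t, j] = sum(chars[t, j:j+n_width] * k)
        feature_map = np.stack(
            [np.sum(chars[:, j:j + n_width, :] * k, axis=(1, 2))
             for j in range(max_char - n_width + 1)],
            axis=1)                                   # [n_token, max_char - n_width + 1]
        features.append(feature_map.max(axis=1))      # max-pool each row -> [n_token]
    return np.stack(features, axis=1)                 # concatenate m kernels -> [n_token, m]

rng = np.random.default_rng(0)
n_token, max_char, char_dim = 7, 10, 16               # toy sizes
chars = rng.normal(size=(n_token, max_char, char_dim))
kernels = [rng.normal(size=(w, char_dim)) for w in (1, 2, 3, 3)]  # m = 4
print(char_cnn_features(chars, kernels).shape)  # (7, 4)
```

Kernels of different widths can be mixed freely, since every kernel is pooled down to one value per word before concatenation.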

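The second figure shows the next step: once each word has the m=n_filters features produced by convolution and max pooling, these features pass through several highway layers for feature selection, and a final projection layer maps the dimension from m to p=proj_dim. Both the highway layers and the projection layer are optional.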
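A minimal NumPy sketch of this second stage, assuming the standard highway formulation y = t * relu(x W_H + b_H) + (1 - t) * x with transform gate t = sigmoid(x W_T + b_T). The ReLU choice, the helper names and the sizes are assumptions, not taken from the original code. Passing an empty list of highway layers or `w_p=None` skips the corresponding optional stage.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, w_h, b_h, w_t, b_t):
    # Standard highway layer: the gate t decides, per feature, how much of
    # the nonlinear transform h to pass versus carrying x through unchanged.
    h = np.maximum(0.0, x @ w_h + b_h)   # ReLU transform (assumed activation)
    t = sigmoid(x @ w_t + b_t)           # transform gate in (0, 1)
    return t * h + (1.0 - t) * x         # same shape as x: [n_token, m]

def highway_then_project(x, highway_params, w_p, b_p):
    # Optional stack of highway layers, then an optional projection m -> p.
    for params in highway_params:
        x = highway_layer(x, *params)
    if w_p is None:
        return x
    return x @ w_p + b_p                 # [n_token, p]

rng = np.random.default_rng(0)
n_token, m, p, n_highway = 7, 8, 4, 2    # toy sizes
x = rng.normal(size=(n_token, m))
# Negative gate bias biases each layer toward carrying x through.
highway_params = [(rng.normal(size=(m, m)), np.zeros(m),
                   rng.normal(size=(m, m)), np.full(m, -2.0))
                  for _ in range(n_highway)]
w_p, b_p = rng.normal(size=(m, p)), np.zeros(p)
print(highway_then_project(x, highway_params, w_p, b_p).shape)  # (7, 4)
```

Because a highway layer keeps the input width m, any number of them can be stacked before the projection changes the width to p.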
The code for this part is as follows:

```python
def _build_word_char_embeddings(self):
    '''
    options contains key 'char_cnn': {
```