Efficient estimation of word representations in vector space

Sharp tools make good work.


Today I’ll explore the word vectors presented by Mikolov et al. in the paper “Efficient estimation of word representations in vector space”. The paper proposes two novel model architectures for learning vector representations of words that significantly improve the quality of word vectors at a lower computational cost. The quality of the resulting vectors is measured in a word similarity task using a word offset technique, in which simple algebraic operations are performed on the word vectors.

In this paper, Mikolov et al. give a short summary of previously proposed model architectures, including the well-known NNLM and RNNLM, and propose two new log-linear models called CBOW and Skip-gram.

The CBOW architecture is similar to the feedforward NNLM, except that the non-linear hidden layer is removed and the projection layer is shared for all words. The objective of this model is to use words from both the history and the future simultaneously to correctly classify the middle word. Unlike a standard bag-of-words model, it uses a continuous distributed representation of the context.

The Skip-gram architecture is similar to CBOW, but instead of predicting the current word from its context, it tries to maximize classification of a word based on another word in the same sentence. More precisely, each current word is used as input to a log-linear classifier with a continuous projection layer, and the output is used to predict words within a certain range before and after the current word. Note that increasing the range improves the quality of the resulting word vectors, but it also increases the computational cost.
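
To make the notion of a context window concrete, here is a minimal sketch (my own illustration, not code from the paper) of how (center, context) training pairs for Skip-gram can be generated from a tokenized sentence with window size C. For simplicity it uses the full window on both sides; the paper actually samples a smaller range R ≤ C for each word to reduce computation.

def skipgram_pairs(tokens, C):
    """Yield (center, context) index pairs for a window of C words on each side."""
    pairs = []
    for i, center in enumerate(tokens):
        # context positions: up to C words before and C words after the center word
        for j in range(max(0, i - C), min(len(tokens), i + C + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

# toy example with word indices; a larger C yields more pairs (and more computation)
print(skipgram_pairs([0, 1, 2, 3], C=1))
# [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]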

The architectures of the two models are shown below.
[Figure: CBOW and Skip-gram model architectures]

Below is my PyTorch code implementing the Skip-gram and CBOW models.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters (example values; adjust to your corpus)
MAX_VOCAB_SIZE = 30000              # vocabulary size
EMBEDDING_SIZE = 100                # dimensionality of the word vectors
INIT_RANGE = 0.5 / EMBEDDING_SIZE   # range for uniform weight initialization
C = 2                               # context window size (C words on each side)


class Skip_gram(nn.Module):
    def __init__(self):
        super(Skip_gram, self).__init__()
        # input word embeddings (the word vectors we ultimately want)
        self.embedding = nn.Embedding(MAX_VOCAB_SIZE, EMBEDDING_SIZE)
        self.embedding.weight.data.uniform_(-INIT_RANGE, INIT_RANGE)

        # projects an embedding to a score for every word in the vocabulary
        self.outLayer = nn.Linear(EMBEDDING_SIZE, MAX_VOCAB_SIZE)

    def forward(self, X):
        # X -> B (a batch of center-word indices)
        embedded = self.embedding(X)  # B x EMBEDDING_SIZE

        output = self.outLayer(embedded)  # B x MAX_VOCAB_SIZE

        # log-probabilities over the vocabulary (to be used with nn.NLLLoss)
        return F.log_softmax(output, dim=-1)


class CBOW(nn.Module):
    def __init__(self):
        super(CBOW, self).__init__()
        # input word embeddings, shared by all context positions
        self.embedding = nn.Embedding(MAX_VOCAB_SIZE, EMBEDDING_SIZE)
        self.embedding.weight.data.uniform_(-INIT_RANGE, INIT_RANGE)

        # projects the averaged context embedding to a score for every word
        self.outLayer = nn.Linear(EMBEDDING_SIZE, MAX_VOCAB_SIZE)

    def forward(self, X):
        # X -> B x 2C (a batch of context-word indices, C on each side)
        embedded = self.embedding(X)     # B x 2C x EMBEDDING_SIZE
        embedded = embedded.mean(dim=1)  # B x EMBEDDING_SIZE, average over the 2C context words

        output = self.outLayer(embedded)  # B x MAX_VOCAB_SIZE
        # log-probabilities over the vocabulary (to be used with nn.NLLLoss)
        return F.log_softmax(output, dim=-1)
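
For completeness, here is a minimal training-loop sketch for the models above, feeding in pairs such as those produced by the skipgram_pairs sketch earlier. The batch iterator, learning rate, and epoch count are my own assumptions, not part of the original code; since the models return log-probabilities, nn.NLLLoss is used.

# minimal training-loop sketch; batch_iterator(), lr, and epoch count are assumptions
model = Skip_gram()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.NLLLoss()  # expects the log-probabilities returned by forward()

for epoch in range(5):
    for center, context in batch_iterator():  # hypothetical: yields LongTensors of shape B, B
        optimizer.zero_grad()
        log_probs = model(center)            # B x MAX_VOCAB_SIZE
        loss = loss_fn(log_probs, context)   # context holds the target word indices
        loss.backward()
        optimizer.step()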

To compare the quality of different versions of word vectors, previous papers typically show a table of example words and their most similar words, which are then judged intuitively. However, it has been observed that there can be many different types of similarities between words; for example, big is similar to bigger in the same sense that small is similar to smaller. Mikolov et al. therefore ask how to find a word that is similar to small in the same sense as biggest is similar to big. The question can be answered by simply computing the vector X = vector(“biggest”) - vector(“big”) + vector(“small”) and then searching the vector space for the word closest to X, measured by cosine distance. When the word vectors are well trained, it is possible to find the correct answer (smallest) using this method.
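
Here is a minimal sketch of this word-offset technique, assuming a trained model as above together with word2idx / idx2word vocabulary mappings (both of which are my own assumptions, not defined in this post):

# word-analogy sketch: find the word closest to
# vector("biggest") - vector("big") + vector("small") by cosine similarity
def analogy(model, word2idx, idx2word, a, b, c, topk=1):
    emb = model.embedding.weight.data                 # MAX_VOCAB_SIZE x EMBEDDING_SIZE
    x = emb[word2idx[a]] - emb[word2idx[b]] + emb[word2idx[c]]
    sims = F.cosine_similarity(x.unsqueeze(0), emb)   # cosine similarity to every word vector
    best = sims.topk(topk + 3).indices.tolist()       # a few extra in case a, b, c rank highest
    return [idx2word[i] for i in best if idx2word[i] not in (a, b, c)][:topk]

# expected to return ["smallest"] when the vectors are well trained
# print(analogy(model, word2idx, idx2word, "biggest", "big", "small"))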

From my perspective, the most valuable contribution of this paper is that it proposes two novel and computationally efficient model architectures for obtaining high-quality word vectors. Many existing NLP applications, such as machine translation, information retrieval, and question answering systems, can benefit from these architectures, and they may also enable future applications yet to be invented.
