PyTorch Exercise: Computing Word Embeddings: Continuous Bag-of-Words

PyTorch Tutorial

The exercise on training word embeddings from the PyTorch tutorial is described as follows:

The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep learning. It is a model that tries to predict words given the context of a few words before and a few words after the target word. This is distinct from language modeling, since CBOW is not sequential and does not have to be probabilistic. Typically, CBOW is used to quickly train word embeddings, and these embeddings are used to initialize the embeddings of some more complicated model. Usually, this is referred to as pretraining embeddings. It almost always helps performance by a couple of percent.

The CBOW model is as follows. Given a target word $w_i$ and an $N$-word context window on each side, $w_{i-1}, \ldots, w_{i-N}$ and $w_{i+1}, \ldots, w_{i+N}$, referring to all context words collectively as $C$, CBOW tries to minimize

$$-\log p(w_i \mid C) = -\log \mathrm{Softmax}\left(A\left(\sum_{w \in C} q_w\right) + b\right)$$

where $q_w$ is the embedding for word $w$.
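To make the notation concrete, here is a minimal sketch (with illustrative sizes that are not part of the exercise) of how the sum of context embeddings, the affine map $A$, $b$, and the log-softmax fit together:

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, embedding_dim = 49, 20                 # illustrative sizes only
embeddings = nn.Embedding(vocab_size, embedding_dim)  # rows are the q_w vectors
linear = nn.Linear(embedding_dim, vocab_size)          # holds A and b

context_idxs = torch.LongTensor([3, 7, 11, 2])     # 2N = 4 made-up context word indices
q = embeddings(context_idxs)                       # shape (4, embedding_dim)
summed = torch.sum(q, dim=0).view(1, -1)           # shape (1, embedding_dim)
log_probs = F.log_softmax(linear(summed), dim=1)   # shape (1, vocab_size)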

Implement this model in PyTorch by filling in the class below. Some tips:

  • Think about which parameters you need to define.
  • Make sure you know what shape each operation expects. Use .view() if you need to reshape.
The code is as follows:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

CONTEXT_SIZE = 2  # 2 words to the left, 2 to the right
raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

# By deriving a set from `raw_text`, we deduplicate the array
vocab = set(raw_text)
vocab_size = len(vocab)

word_to_ix = {word: i for i, word in enumerate(vocab)}
data = []
for i in range(2, len(raw_text) - 2):
    context = [raw_text[i - 2], raw_text[i - 1],
               raw_text[i + 1], raw_text[i + 2]]
    target = raw_text[i]
    data.append((context, target))
print(data[:5])


class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(CBOW, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)  # trainable embedding table (the q_w vectors)
        self.linear1 = nn.Linear(embedding_dim, vocab_size)  # trainable affine map (A and b in the formula)


    def forward(self, inputs):
        embeds = self.embeddings(inputs)
        add_embeds = torch.sum(embeds, dim=0).view(1, -1)  # sum the context embeddings, reshape to (1, embedding_dim)
        out = self.linear1(add_embeds)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs

# create your model and train.  here are some functions to help you make
# the data ready for use by your module


def make_context_vector(context, word_to_ix):
    idxs = [word_to_ix[w] for w in context]
    tensor = torch.LongTensor(idxs)
    return Variable(tensor)


make_context_vector(data[0][0], word_to_ix)  # example

# declare the loss function, model, and optimizer
losses = []
loss_function = nn.NLLLoss()  # expects log-probabilities, which the model's log_softmax provides
model = CBOW(vocab_size, embedding_dim=20)  # __init__ takes only vocab_size and embedding_dim
optimizer = optim.SGD(model.parameters(), lr=0.001)

# train for 10 epochs
for epoch in range(10):
    total_loss = torch.FloatTensor([0])
    for context, target in data:
        context_idxs = [word_to_ix[w] for w in context]
        target_idx = word_to_ix[target]
        context_var = Variable(torch.LongTensor(context_idxs))
        target_var = Variable(torch.LongTensor([target_idx]))
        model.zero_grad()
        log_probs = model(context_var)

        loss = loss_function(log_probs, target_var)
        loss.backward()
        optimizer.step()

        total_loss += loss.data
    losses.append(total_loss)
print(losses)

Output:

[(['We', 'are', 'to', 'study'], 'about'), (['are', 'about', 'study', 'the'], 'to'), (['about', 'to', 'the', 'idea'], 'study'), (['to', 'study', 'idea', 'of'], 'the'), (['study', 'the', 'of', 'a'], 'idea')]
[260.2805, 255.0300, 249.8967, 244.8781, 239.9720, 235.1766, 230.4900, 225.9105, 221.4367, 217.0672]
(the ten per-epoch total losses; each was originally printed as a FloatTensor of size 1)
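With the model trained, one way to sanity-check it (not part of the original exercise; the ix_to_word helper and the example context are assumptions added here) is to predict the target word for a context and to look up a word's learned embedding. After only 10 epochs at this learning rate the prediction may still be wrong:

# sanity-check sketch: predict a target and inspect an embedding
ix_to_word = {i: w for w, i in word_to_ix.items()}  # helper built here for illustration

context = ['People', 'create', 'to', 'direct']       # surrounds the target 'programs' in raw_text
context_var = make_context_vector(context, word_to_ix)
log_probs = model(context_var)
predicted_ix = int(torch.argmax(log_probs, dim=1))
print(ix_to_word[predicted_ix])                      # may or may not be 'programs' after 10 epochs

word_vec = model.embeddings.weight[word_to_ix['process']]  # learned vector for one word
print(word_vec.shape)                                # torch.Size([20])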

