word2vec Code Walkthrough (4)

cy冲鸭


Article link: https://blog.csdn.net/weixin_41841797/article/details/84260568


# Step 5: Begin training.
num_steps = 100001

with tf.Session(graph=graph) as session:
    # We must initialize all variables before we use them.
    init.run()
    print('Initialized')

    average_loss = 0
    for step in range(num_steps):
        batch_inputs, batch_labels = generate_batch(batch_size, num_skips,
                                                skip_window)
        feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}

        # We perform one update step by evaluating the optimizer op (including it
        # in the list of returned values for session.run()).
        _, loss_val = session.run(
            [optimizer, loss],
            feed_dict=feed_dict)
        average_loss += loss_val

        # Every 2000 steps, report the average loss.
        if step % 2000 == 0:
            if step > 0:
                average_loss /= 2000
            # The average loss is an estimate of the loss over the last 2000 batches.
            print('Average loss at step ', step, ': ', average_loss)
            average_loss = 0

        # Note that this is expensive (~20% slowdown if computed every 500 steps)
        if step % 10000 == 0:
            sim = similarity.eval()
            for i in range(valid_size):
                valid_word = reverse_dictionary[valid_examples[i]]
                top_k = 8  # number of nearest neighbors
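                # argsort() returns the indices that sort the array in ascending
                # order, so negating sim ranks words by descending similarity.
                # Slicing from 1 skips the word itself, which ranks first.
                nearest = (-sim[i, :]).argsort()[1:top_k + 1]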
                log_str = 'Nearest to %s:' % valid_word
                for k in range(top_k):
                    close_word = reverse_dictionary[nearest[k]]
                    log_str = '%s %s,' % (log_str, close_word)
                print(log_str)
    final_embeddings = normalized_embeddings.eval()
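
The nearest-neighbor lookup inside the loop above can be tried on its own with NumPy. The following is only a minimal sketch: the helper `nearest_neighbors`, the four-word vocabulary, and the 2-D embeddings are made up for illustration and are not part of the original code.

```python
import numpy as np

def nearest_neighbors(sim_row, top_k):
    """Return indices of the top_k most similar words, skipping the query itself."""
    # argsort sorts ascending, so negate the scores to rank by descending similarity.
    order = (-sim_row).argsort()
    # Position 0 is the query word itself (cosine similarity 1.0), so start at 1.
    return order[1:top_k + 1]

# Hypothetical unit-length embeddings for a toy 4-word vocabulary.
embeddings = np.array([
    [1.0, 0.0],
    [0.8, 0.6],
    [0.0, 1.0],
    [-1.0, 0.0],
])
vocab = ["king", "queen", "apple", "void"]

# Rows are already normalized, so a dot product gives the cosine similarity
# of word 0 against every word in the vocabulary.
sim_row = embeddings @ embeddings[0]
neighbors = nearest_neighbors(sim_row, top_k=2)
print([vocab[i] for i in neighbors])
```

Here `sim_row` plays the role of `sim[i, :]` in the training code, and `vocab` stands in for `reverse_dictionary`.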

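Step 5: begin training. In each training iteration, we first call generate_batch to produce one batch of inputs and labels and use them to build feed_dict. We then call session.run() to execute the optimizer op once (that is, one parameter update) and compute the loss. Every 2000 steps the average loss is printed, and every 10000 steps the nearest neighbors of the validation words are printed.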
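
The loss-reporting pattern in this loop can be separated from TensorFlow. The sketch below assumes a hypothetical `train` function and a `run_step` callback that stands in for `session.run([optimizer, loss], ...)`; neither name comes from the original code, and the constant loss is only there to make the result easy to check.

```python
def train(run_step, num_steps, report_every):
    """Run num_steps updates and collect (step, average_loss) reports."""
    reports = []
    total_loss = 0.0
    for step in range(num_steps):
        # run_step is a stand-in for one optimizer update that returns the loss.
        total_loss += run_step(step)
        if step % report_every == 0:
            # At step 0 only one batch has been seen, so there is nothing to divide.
            avg = total_loss / report_every if step > 0 else total_loss
            reports.append((step, avg))
            # Reset the accumulator so the next report covers a fresh window.
            total_loss = 0.0
    return reports

# Hypothetical step function with a constant loss of 2.0.
reports = train(lambda step: 2.0, num_steps=5, report_every=2)
print(reports)
```

As in the original loop, the value reported at step 0 is the loss of a single batch, while every later report averages the `report_every` batches since the previous one.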