Stanford NLP Course CS224N, Assignment 1, Part 3 (Second Half)

Latest recommended article published on 2019-06-14 12:26:49

pvop


Category: CS224N Assignments. Tags: SGD, stochastic gradient descent, CS224N

Article link: https://blog.csdn.net/qwe1257/article/details/84139109


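In the previous post we implemented word2vec. This time there are two things to do: implement stochastic gradient descent, and use word2vec to train real word embeddings.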

SGD (Stochastic Gradient Descent)

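We have a loss function J and want to adjust its parameters so that J is as small as possible. For many machine learning models, however, we cannot simply set the derivative to zero and solve for the minimum in closed form. Instead, for each parameter we compute the gradient (that is, the derivative) of J with respect to it, and then move the parameter a small step in the direction opposite to the gradient, which decreases J.
Parameter update rule:
$\theta = \theta - \alpha \frac{\partial J(x;\theta)}{\partial \theta}$
Here $\alpha$ is the learning rate, which controls how much the parameters change at each step.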
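To make the update rule concrete, here is a minimal sketch on a toy loss. The loss $J(\theta) = (\theta - 3)^2$, the function names, and the values of alpha and num_steps are all made up for illustration; none of them come from the assignment.

```python
def J(theta):
    """Toy loss (an assumption for illustration): minimized at theta = 3."""
    return (theta - 3.0) ** 2


def dJ(theta):
    """Analytic derivative of the toy loss with respect to theta."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta, alpha, num_steps):
    """Apply theta = theta - alpha * dJ/dtheta for num_steps steps."""
    for _ in range(num_steps):
        theta = theta - alpha * dJ(theta)
    return theta


# One step from theta = 0 moves toward the minimum at 3.
theta_one_step = gradient_descent(theta=0.0, alpha=0.1, num_steps=1)

# Many steps get very close to the minimum.
theta_final = gradient_descent(theta=0.0, alpha=0.1, num_steps=100)
print(theta_one_step, theta_final)
```

Each step here shrinks the distance to the minimum by a constant factor. A larger alpha moves faster, but for this loss any alpha above 1.0 would make the iterates diverge, which is why the learning rate has to be chosen with some care.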
Let's look at the function we need to implement:

def sgd(f, x0, step, iterations, postprocessing=None, useSaved=False,
        PRINT_EVERY=10):
    """ Stochastic Gradient Descent

    Implement the stochastic gradient descent method in this function.

    Arguments:
    f -- the function to optimize, it should take a single
         argument and yield two outputs, a cost and the gradient
         with respect to the arguments
    x0 -- the initial point to start SGD from
    step -- the step size for SGD
    iterations -- total iterations to run SGD for
    postprocessing -- postprocessing function for the parameters
                      if necessary. In the case of word2vec we will need to
                      normalize the word vectors to have unit length.
    PRINT_EVERY -- specifies how many iterations to output loss

    Return:
    x -- the parameter value after SGD finishes
    """

    # Anneal learning rate every several iterations
    ANNEAL_EVERY = 20000

    if useSaved:
        start_iter, oldx, state = load_saved_params()
        if start_iter > 0:
            x0 = oldx
            step *= 0.5 ** (start_iter / ANNEAL_EVERY)

        if state:
            random.setstate(state)
    else:
        start_iter = 0

    x = x0

    if not postprocessing:
        postprocessing = lambda x: x

    expcost = None

    for iter in xrange(start_iter + 1, iterations + 1):
        # Don't forget to apply the postprocessing after every iteration!
        # You might want to print the progress every few iterations.

        cost = None
        ### YOUR CODE HERE
        raise NotImplementedError
        ### END YOUR CODE

        if iter % PRINT_EVERY == 0:
            if not expcost:
                expcost = cost
            else:
                expcost = .95 * expcost + .05 * cost
            print "iter %d: %f" % (iter, expcost)

        if iter % SAVE_PARAMS_EVERY == 0 and useSaved:
            save_params(iter, x)

        if iter % ANNEAL_EVERY == 0:
            step *= 0.5

    return x

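The function looks long, but all we actually have to do is update the parameters using the gradient. Both the cost and the gradient grad come from calling f, so only three lines need to be added:

cost, grad = f(x)
x = x - step * grad
x = postprocessing(x)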
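To see the three lines in context, here is a stripped-down Python 3 sketch of the same loop. It is not the assignment file: the name sgd_sketch, the keyword arguments anneal_every and print_every, and the toy objective quad are assumptions for illustration, and the checkpointing (useSaved, save_params) is left out entirely.

```python
def sgd_sketch(f, x0, step, iterations, postprocessing=None,
               anneal_every=20000, print_every=None):
    """Simplified loop: same update rule as above, no saving or loading."""
    if postprocessing is None:
        postprocessing = lambda x: x

    x = x0
    expcost = None
    for it in range(1, iterations + 1):
        cost, grad = f(x)        # f returns (cost, gradient) evaluated at x
        x = x - step * grad      # move against the gradient
        x = postprocessing(x)    # hook for e.g. normalizing word vectors

        if print_every and it % print_every == 0:
            # Exponentially smoothed cost, as in the skeleton's printout.
            if expcost is None:
                expcost = cost
            else:
                expcost = 0.95 * expcost + 0.05 * cost
            print("iter %d: %f" % (it, expcost))

        if it % anneal_every == 0:
            step *= 0.5          # halve the step size periodically

    return x


def quad(x):
    """Toy objective (an assumption): cost x^2 with gradient 2x."""
    return x ** 2, 2 * x


result = sgd_sketch(quad, 0.5, 0.01, 1000, print_every=100)
print("result:", result)
```

The toy objective is deterministic, so this run is really plain gradient descent. The "stochastic" part presumably comes from f itself drawing a random batch of training examples on each call, which the loop does not need to know about. Because the update only uses subtraction and scalar multiplication, the same loop should also work when x is a NumPy array.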

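Then run q3_sgd.py to check that the implementation is correct.
q3_run.py trains a skip-gram model, so it needs a corpus. Running get_datasets.sh downloads the corpus automatically.
Finally, q3_run.py uses the skip-gram model we wrote earlier to train word embeddings. No code needs to be added and it can be run directly, but because Python 2 and Python 3 differ in a few small ways, you may have to fix some minor bugs yourself. I fixed every bug I ran into, and they were all small. If you hit one you cannot resolve, leave a comment and I will help.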
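Three Python 2 idioms that appear in the sgd skeleton itself are likely candidates for such fixes: xrange, the print statement, and integer division with /. The sketch below shows the Python 3 form of each. It is not a complete list of what q3_run.py needs, and the values used are made up for illustration.

```python
ANNEAL_EVERY = 20000

# Python 2: for iter in xrange(start_iter + 1, iterations + 1)
# Python 3: xrange no longer exists; range is already lazy.
iters = list(range(1, 4))

# Python 2: print "iter %d: %f" % (iter, expcost)
# Python 3: print is a function and needs parentheses.
line = "iter %d: %f" % (10, 0.5)
print(line)

# Python 2: 30000 / 20000 evaluates to 1 (integer division on ints).
# Python 3: / is true division; use // to keep the integer result.
start_iter = 30000
true_div = start_iter / ANNEAL_EVERY     # 1.5 in Python 3
floor_div = start_iter // ANNEAL_EVERY   # 1 in both versions

# The resume branch of the skeleton scales the step by 0.5 ** (that quotient),
# so the two divisions give different step sizes after a restart.
step_with_true_div = 0.3 * 0.5 ** true_div
step_with_floor_div = 0.3 * 0.5 ** floor_div
print(step_with_true_div, step_with_floor_div)
```

The first two differences fail loudly with a NameError or SyntaxError. The division one does not: the code still runs and only the resumed step size changes, so it is worth checking by hand. Using // there assumes the Python 2 behavior is the intended one.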
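Questions and discussion are welcome in the comments, and every question will get an answer.
Feel free to follow as well: I will be writing up all of the CS224N assignments as blog posts.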