Stanford NLP Course CS224N: Assignment 1, Part 4


pvop


Original article: https://blog.csdn.net/qwe1257/article/details/84139565


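Happily, we have reached the last part of Assignment 1. This part builds sentence vectors from word vectors and then uses them to classify the sentiment of each sentence.
We need to implement q4_sentiment.py. The file looks long, but only three functions actually need to be written, so let's go through them one at a time.
The data was already downloaded earlier, before running q3_run.py.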

1. getSentenceFeatures()

def getSentenceFeatures(tokens, wordVectors, sentence):
    """
    Obtain the sentence feature for sentiment analysis by averaging its
    word vectors
    """

    # Implement computation for the sentence features given a sentence.

    # Inputs:
    # tokens -- a dictionary that maps words to their indices in
    #           the word vector list
    # wordVectors -- word vectors (each row) for all tokens
    # sentence -- a list of words in the sentence of interest

    # Output:
    # - sentVector: feature vector for the sentence

    sentVector = np.zeros((wordVectors.shape[1],))

    ### YOUR CODE HERE
    raise NotImplementedError
    ### END YOUR CODE

    assert sentVector.shape == (wordVectors.shape[1],)
    return sentVector

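The docstring tells us that this function builds the sentence vector by averaging the word vectors of the words in the sentence. The method is simple and crude, but it works reasonably well. Here is the implementation: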

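sentVector = np.zeros((wordVectors.shape[1],))
num_words = len(sentence)
for word in sentence:
    # Add each word's vector scaled by 1/num_words, so the sum is the mean.
    # An empty sentence skips the loop and keeps the zero vector.
    sentVector += wordVectors[tokens[word]] / num_words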
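The same average, s = (1/n) Σ v_w over the n words of the sentence, can also be computed without an explicit loop by indexing all the needed rows at once. The sketch below is a hypothetical vectorized variant (the name `get_sentence_features_vectorized` and the toy vocabulary are made up for illustration). It assumes, like the loop above, that every word in the sentence appears in `tokens`, and it returns a zero vector for an empty sentence, because calling `mean` over zero rows would give NaN instead.

```python
import numpy as np


def get_sentence_features_vectorized(tokens, word_vectors, sentence):
    """Hypothetical helper: average the word vectors of the words in `sentence`.

    tokens       -- dict mapping each word to its row index in word_vectors
    word_vectors -- 2-D array, one row per token
    sentence     -- list of words
    """
    if not sentence:
        # Same result as the loop version: no words, so a zero vector.
        return np.zeros(word_vectors.shape[1])
    # Gather the row index of every word, then average those rows.
    rows = [tokens[word] for word in sentence]
    return word_vectors[rows].mean(axis=0)


# Toy vocabulary with 2-dimensional vectors, made up for illustration.
tokens = {"good": 0, "movie": 1, "bad": 2}
word_vectors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [-1.0, 0.0]])

print(get_sentence_features_vectorized(tokens, word_vectors, ["good", "movie"]))
# → [0.5 0.5]
```

A repeated word produces a repeated index, so it is counted once per occurrence, exactly as in the loop version.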

2. getRegularizationValues()

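The model used later in the file is scikit-learn's LogisticRegression, whose parameter C controls regularization (in scikit-learn, C is the inverse of the regularization strength, so a smaller C means stronger regularization). The regularization value is a hyperparameter, so the script tries a range of values and keeps the one that performs best: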

def getRegularizationValues():
    """Try different regularizations

    Return a sorted list of values to try.
    """
    values = None   # Assign a list of floats in the block below
    ### YOUR CODE HERE
    # An earlier attempt with a hand-picked list:
    # values = [0.0,0.5,1,5,10,15,20,25,30,50,100,1000,10000,10000000]
    # 100 values spaced evenly on a log scale, from 1e-4 to 1e2
    values = np.logspace(-4, 2, num=100, base=10)
    ### END YOUR CODE
    return sorted(values)
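`np.logspace(-4, 2, num=100, base=10)` spaces the candidates evenly in the exponent rather than in the value, so every decade between 1e-4 and 1e2 gets roughly the same number of trials. A linear grid over the same range would put almost all of its points above 1. A quick sanity check of both grids:

```python
import numpy as np

# The same grid as in getRegularizationValues() above.
values = sorted(np.logspace(-4, 2, num=100, base=10))

# Consecutive candidates differ by a constant factor of 10 ** (6 / 99).
ratios = np.array(values[1:]) / np.array(values[:-1])

# For contrast: a linear grid over the same range.
linear = np.linspace(1e-4, 1e2, num=100)

print(len(values), values[0], values[-1])
print(ratios.min(), ratios.max())
print(int(np.sum(linear > 1)), "of", len(linear), "linear points exceed 1")
```

With the log grid, only about a third of the candidates are above 1, so small regularization values are explored as thoroughly as large ones.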

3. chooseBestModel()

Each regularization value produces one trained model together with its train, dev, and test accuracies. As the docstring says, the best model should be chosen by dev-set accuracy: choosing by test accuracy would let the test set influence model selection, and the reported test accuracy would then no longer be an unbiased estimate. We therefore return the whole result dictionary with the highest dev accuracy:

def chooseBestModel(results):
    """Choose the best model based on dev set performance.

    Arguments:
    results -- A list of python dictionaries of the following format:
        {
            "reg": regularization,
            "clf": classifier,
            "train": trainAccuracy,
            "dev": devAccuracy,
            "test": testAccuracy
        }

    Each dictionary represents the performance of one model.

    Returns:
    Your chosen result dictionary.
    """
    bestResult = None

    ### YOUR CODE HERE
    # Pick the result with the highest dev accuracy and return the whole
    # dictionary (reg, clf, train, dev, test), as the docstring asks.
    bestResult = max(results, key=lambda result: result["dev"])
    ### END YOUR CODE

    return bestResult
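Because `results` is a list of dictionaries, selecting by dev accuracy is a one-liner with Python's built-in `max` and a `key` function. The sketch below runs that selection on made-up numbers (not output from the assignment) to show why the key matters: selecting by test accuracy and selecting by dev accuracy can return different models. The `"clf"` field is set to `None` to keep the example self-contained.

```python
# Made-up results for three regularization values, in the format
# described in chooseBestModel's docstring ("clf" left as None).
results = [
    {"reg": 0.01, "clf": None, "train": 0.45, "dev": 0.33, "test": 0.35},
    {"reg": 0.1,  "clf": None, "train": 0.42, "dev": 0.36, "test": 0.34},
    {"reg": 1.0,  "clf": None, "train": 0.38, "dev": 0.34, "test": 0.33},
]

# Model selection uses only the dev set, as the docstring asks.
best_by_dev = max(results, key=lambda result: result["dev"])

# Selecting on test would let the test set choose the model.
best_by_test = max(results, key=lambda result: result["test"])

print("chosen by dev: reg =", best_by_dev["reg"], "test =", best_by_dev["test"])
print("chosen by test: reg =", best_by_test["reg"])
```

`max` returns the first maximal element it encounters, so if `results` is in the same order as the sorted values from getRegularizationValues(), a tie in dev accuracy goes to the smallest regularization value. Since it returns the dictionary itself, the caller keeps access to every field, including `clf` and `test`.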

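When running the script, don't forget to specify whether to use the pretrained GloVe word embeddings or the vectors you trained earlier with q3_run.py. The flags are --pretrained and --yourvectors, respectively.
My results with --pretrained: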

Best regularization value: 8.11E+00
Test accuracy (%): 37.556561

My results with --yourvectors:

Best regularization value: 2.66E-02
Test accuracy (%): 28.506787

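Finally, a few errors still came up because of Python version differences. I was able to fix all the ones I ran into; if you hit an error you can't resolve, feel free to leave a comment.
Questions and discussion are welcome in the comments, and I will answer every one. Follows are welcome too.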
