Original problem
https://web.stanford.edu/class/cs224n/assignments/a1_preview/exploring_word_vectors.html
# Question 1.1: Implement distinct_words [code] (2 points)
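Input:
- corpus: a corpus made up of multiple sentences, each given as a list of words
Output:
- corpus_words: the list of words after removing duplicates and then sorting
- num_corpus_words: the number of distinct words left after removing duplicates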
```python
def distinct_words(corpus):
    """ Determine a list of distinct words for the corpus.
        Params:
            corpus (list of list of strings): corpus of documents
        Return:
            corpus_words (list of strings): sorted list of distinct words across the corpus
            num_corpus_words (integer): number of distinct words across the corpus
    """
    corpus_words = []
    num_corpus_words = -1

    # ------------------
    # Write your implementation here.
    # Collect every word from every sentence into a set to drop duplicates.
    distinct_words_set = set()
    for sentence in corpus:
        distinct_words_set.update(sentence)
    # sorted() accepts any iterable and returns a new list.
    corpus_words = sorted(distinct_words_set)
    num_corpus_words = len(corpus_words)
    # ------------------

    return corpus_words, num_corpus_words
```
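As a quick sanity check on the deduplicate-then-sort idea, here is a minimal sketch run on a tiny corpus. The helper name `distinct_words_compact` and the two-sentence `toy_corpus` are made up for illustration and are not part of the assignment. The helper is a one-expression variant that should behave the same as the loop-based solution.

```python
def distinct_words_compact(corpus):
    """Hypothetical compact variant: deduplicate with a set comprehension, then sort."""
    corpus_words = sorted({word for sentence in corpus for word in sentence})
    return corpus_words, len(corpus_words)


# Made-up toy corpus: each document is a list of tokens.
toy_corpus = [
    "<START> all that glitters is not gold <END>".split(),
    "<START> all is well that ends well <END>".split(),
]

words, num_words = distinct_words_compact(toy_corpus)
print(words)
# ['<END>', '<START>', 'all', 'ends', 'glitters', 'gold', 'is', 'not', 'that', 'well']
print(num_words)
# 10
```

Note that `<END>` and `<START>` sort first because `<` comes before the letters in Python's default string ordering.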
# Question 1.2: Implement compute_co_occurrence_matrix [code] (3 points)