Generating Word Vectors with Word Embeddings
AI Creative Assistant
With new and more complex language models coming out all the time, taking your first steps into the world of natural language processing can feel overwhelming.
The purpose of this guide is to provide a fairly simple project, a poem generator, so you can get hands-on experience with NLP using just one technique: word embeddings.
So what are word embeddings?
Word embeddings are n-dimensional vector representations of words that capture something of their meaning from the contexts in which they appear in a corpus. Put another way: words that are used in similar ways in a specific corpus (a collection of texts) will have similar vectors.
The fun part of having a numerical representation of words is that we can now use them in math operations such as computing similarity, a measure we will use in our poem generator.
Similarity (here, cosine similarity) quantifies how alike two vectors are. It computes the cosine of the angle between them, which gives a value of 1 for vectors with the same orientation and 0 when the angle is 90°.
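To make the measure concrete, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the three-dimensional vectors in `toy_vectors` are made up for illustration. Real embeddings are learned from a corpus and usually have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings", invented for illustration only.
toy_vectors = {
    "sea": [0.9, 0.1, 0.3],
    "ocean": [0.8, 0.2, 0.4],
    "tax": [-0.2, 0.9, -0.5],
}

# Same orientation, different length: the cosine is (approximately) 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors (90° apart): the cosine is 0.
right_angle = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# With the toy vectors, "sea" scores higher against "ocean" than against "tax".
sea_ocean = cosine_similarity(toy_vectors["sea"], toy_vectors["ocean"])
sea_tax = cosine_similarity(toy_vectors["sea"], toy_vectors["tax"])
print(same_direction, right_angle, sea_ocean, sea_tax)
```

Because the measure depends only on the angle, scaling a vector does not change its similarity to any other vector. Only its direction matters.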