![3bebd81015bab7848c04cbae174afab3.png](https://i-blog.csdnimg.cn/blog_migrate/33c07e2145cbdb539d6f607194216e37.jpeg)
This article is based on the gensim word2vec tutorial (https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#sphx-glr-download-auto-examples-tutorials-run-word2vec-py) and a Zhihu column post (https://zhuanlan.zhihu.com/p/26306795).
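Bag-of-words model
The bag-of-words model converts each piece of text into a fixed-length vector of integers. Take these two documents as an example: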
John likes to watch movies. Mary likes movies too.
John also likes to watch football games. Mary hates football.
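The model first collects the vocabulary (the "bag" of words) from all documents. The order of the words in the bag is arbitrary; one possible ordering is: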
["John", "likes", "to", "watch", "movies", "Mary", "too", "also", "football", "games", "hates"]
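To make the idea concrete, here is a minimal sketch in plain Python that builds the vocabulary from the two example sentences and turns each sentence into a vector of word counts. The helper names `tokenize` and `to_vector` are illustrative assumptions, not part of gensim, and the tokenizer is deliberately naive: it keeps alphabetic runs and preserves case.

```python
import re
from collections import Counter

docs = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games. Mary hates football.",
]

def tokenize(text):
    # Naive tokenizer (an assumption for this sketch): alphabetic runs only,
    # case preserved, punctuation dropped.
    return re.findall(r"[A-Za-z]+", text)

# Build the vocabulary in order of first appearance across all documents.
vocab = []
for doc in docs:
    for token in tokenize(doc):
        if token not in vocab:
            vocab.append(token)

def to_vector(text):
    # One integer per vocabulary word: how often that word occurs in the text.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

vectors = [to_vector(doc) for doc in docs]

print(vocab)
for vec in vectors:
    print(vec)
```

Every vector has the same length as the vocabulary (11 here), regardless of how long the original sentence is, and the word order of the sentence is lost. A real pipeline would more likely use a library tokenizer and a dictionary mapping words to indices rather than a list lookup.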
Thus,