Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\CDAer\AppData\Local\Temp\jieba.cache
Loading model cost 1.233 seconds.
Prefix dict has been built succesfully.
cutword[:5]
['第一回', '\n', '灵根育孕', '源流', '出']
# Remove the '\n' tokens
cutwords = []
for word in cutword:
    if word != '\n':
        cutwords.append(word)
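
The same newline filtering can also be written as a list comprehension. A minimal sketch, assuming the token list looks like the cutword[:5] sample shown above; the names sample_cutword and sample_cutwords are illustrative and not part of the original notebook.

```python
# Sample tokens copied from the cutword[:5] output above
sample_cutword = ['第一回', '\n', '灵根育孕', '源流', '出']

# Keep every token except the bare newline
sample_cutwords = [word for word in sample_cutword if word != '\n']

print(sample_cutwords)
```

The comprehension builds the filtered list in one pass and leaves the original token list untouched.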
pd.read_csv?
# Load the stop word list
import pandas as pd
stopwords = pd.read_csv(r'C:\Users\CDAer\Desktop\data\stopwords.txt', sep='\t',
                        quoting=3, names=['stopword'])
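
The quoting=3 argument is the numeric value of csv.QUOTE_NONE, which keeps quote characters in the stop word file as ordinary text. Once the list is loaded, the usual next step is to drop stop words from the token list. A hedged sketch of that step follows; the stop words in stopword_set are hypothetical stand-ins for the 'stopword' column, and the notebook's own filtering code is not shown in this section.

```python
import csv

# quoting=3 in the read_csv call is the numeric value of csv.QUOTE_NONE
print(csv.QUOTE_NONE)

# Hypothetical stop words; with the real DataFrame this set could be
# built as set(stopwords['stopword'])
stopword_set = {'出', '的', '了'}

# Tokens from the sample output above, with '\n' already removed
tokens = ['第一回', '灵根育孕', '源流', '出']

# Set membership keeps each lookup cheap even for a long stop word list
filtered = [word for word in tokens if word not in stopword_set]

print(filtered)
```

Using a set rather than a list or a DataFrame column for the membership test avoids rescanning the whole stop word list for every token.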
D:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: DeprecationWarning: Call to deprecated `similarity` (Method will be removed in 4.0.0, use self.wv.similarity() instead).
D:\ProgramData\Anaconda3\lib\site-packages\gensim\matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int32 == np.dtype(int).type`.
if np.issubdtype(vec.dtype, np.int):
0.8457024
# List the words most similar to a given word
model.most_similar('八戒',topn=10)
D:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: DeprecationWarning: Call to deprecated `most_similar` (Method will be removed in 4.0.0, use self.wv.most_similar() instead).
D:\ProgramData\Anaconda3\lib\site-packages\gensim\matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int32 == np.dtype(int).type`.
if np.issubdtype(vec.dtype, np.int):
[('沙和尚', 0.9313569664955139),
('扯住', 0.9298043847084045),
('师兄', 0.9273734092712402),
('呆子', 0.9261776804924011),
('哥哥', 0.9243035316467285),
('挑着', 0.9239184856414795),
('挑', 0.9199938178062439),
('老猪', 0.9184861183166504),
('沙僧', 0.9174877405166626),
('哥', 0.9134185910224915)]
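
The scores above are cosine similarities between word vectors, and the list is ordered from highest to lowest. The sketch below illustrates that ranking idea in plain Python. It is not gensim's implementation: the 3-dimensional vectors are hypothetical toy values rather than vectors from the trained model, and toy_most_similar is an illustrative helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, NOT taken from the trained model
toy_vectors = {
    '八戒': [1.0, 0.0, 0.0],
    '沙僧': [1.0, 1.0, 0.0],
    '源流': [0.0, 0.0, 1.0],
}

def toy_most_similar(word, vectors, topn=10):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first, mirroring the ordering of the output above
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

ranking = toy_most_similar('八戒', toy_vectors, topn=2)
print(ranking)
```

With real embeddings the vectors have many more dimensions, so frequent co-occurrence in the text (for example '八戒' with '沙和尚' or '呆子') is what pushes the cosine close to 1.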