NLP: Training Chinese Word Vectors

Word2vec

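Word2vec, released by Google, is one of the most widely used techniques for learning word embeddings. Its main purpose is to obtain word vectors efficiently, and it is commonly used as a feature-engineering step in many NLP tasks. Given a corpus, Word2vec uses an optimized training model to map each word to a vector quickly and effectively, which gave NLP research and applications a new tool. It builds word embeddings with two models, Skip-gram and CBOW (Continuous Bag of Words): Skip-gram predicts the context words (the surrounding words) from the current word, while CBOW predicts the current word from its context words.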

1) Skip-gram

Skip-gram predicts the context words (the words before and after) from the current word. Its structure is shown below:

[Figure: Skip-gram model structure]

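For example, take the sentence 呼伦贝尔大草原 (Hulunbuir Grassland), processed character by character. With 贝, 尔 and 大 in turn as the center character, the blanks mark the context characters to be predicted: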

_ _贝_ _草原
呼_ _尔_ _原
呼伦_ _大_ _

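The number of words predicted on each side of the center word is called window_size (in the example above, window_size is 2). Training amounts to maximizing the following probabilities: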

P(呼|贝)P(伦|贝)P(尔|贝)P(大|贝)
P(伦|尔)P(贝|尔)P(大|尔)P(草|尔)
P(贝|大)P(尔|大)P(草|大)P(原|大)
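The (center, context) pairs behind these products can be enumerated mechanically. Below is a minimal sketch in plain Python; `skipgram_pairs` is a hypothetical helper name, and the sentence is split into single characters as in the example:

```python
def skipgram_pairs(tokens, window_size):
    """Return (center, context) pairs: every token within window_size
    positions of the center, excluding the center itself."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = list("呼伦贝尔大草原")  # character-level tokens, as in the example
pairs = skipgram_pairs(tokens, window_size=2)
for center in "贝尔大":
    print(center, [c for w, c in pairs if w == center])
# 贝 ['呼', '伦', '尔', '大']
# 尔 ['伦', '贝', '大', '草']
# 大 ['贝', '尔', '草', '原']
```

Each printed row corresponds to one line of the product above: the center character is the conditioning word and each listed character is one predicted context word.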

This can be written in general form. Let Text be a text consisting of N words:
$$Text = \{w_1, w_2, w_3, \ldots, w_N\}$$
The objective function can then be written as:
$$\arg\max_{\theta} \prod_{w \in Text} \prod_{c \in c(w)} P(c \mid w; \theta)$$
where $w$ is the current (center) word, $c$ is a context word of $w$ taken from its context set $c(w)$, and $\theta$ denotes the parameters to optimize, namely a dense vector for every word (here, every character), with one $n$-dimensional row per character:

$$
\theta = \begin{bmatrix}
\text{呼}: & \theta_{11} & \theta_{12} & \theta_{13} & \cdots & \theta_{1n} \\
\text{伦}: & \theta_{21} & \theta_{22} & \theta_{23} & \cdots & \theta_{2n} \\
\text{贝}: & \theta_{31} & \theta_{32} & \theta_{33} & \cdots & \theta_{3n} \\
\text{尔}: & \theta_{41} & \theta_{42} & \theta_{43} & \cdots & \theta_{4n} \\
\text{大}: & \theta_{51} & \theta_{52} & \theta_{53} & \cdots & \theta_{5n} \\
\text{草}: & \theta_{61} & \theta_{62} & \theta_{63} & \cdots & \theta_{6n} \\
\text{原}: & \theta_{71} & \theta_{72} & \theta_{73} & \cdots & \theta_{7n}
\end{bmatrix}
$$

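The parameters $\theta$ are chosen to maximize this objective. Every factor is a probability between 0 and 1, so the product of many such factors quickly underflows to zero in floating point. Because the logarithm is monotonically increasing, taking the log does not change the argmax and turns the product into a sum:
$$\arg\max_{\theta} \sum_{w \in Text} \sum_{c \in c(w)} \log P(c \mid w; \theta)$$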
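A quick numeric illustration of why the log is taken, using made-up probabilities (1000 factors of 0.01): the running product underflows to exactly 0.0 in double precision, while the sum of logs stays finite and still ranks parameter settings correctly.

```python
import math

probs = [0.01] * 1000  # hypothetical: 1000 context predictions, each with P = 0.01

product = 1.0
for p in probs:
    product *= p  # underflows to 0.0 long before the loop ends

log_sum = sum(math.log(p) for p in probs)

print(product)  # 0.0
print(log_sum)  # about -4605.17
```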
Expressing $P(c \mid w; \theta)$ with softmax gives:
$$\arg\max_{\theta} \sum_{w \in Text} \sum_{c \in c(w)} \log \frac{e^{u_c \cdot v_w}}{\sum_{c' \in vocab} e^{u_{c'} \cdot v_w}}$$
where $U$ is the context-word matrix and $V$ is the center-word matrix of the same size (every word can act both as a context word and as a center word); $u_c$ is the row of $U$ for word $c$ and $v_w$ is the row of $V$ for word $w$. The inner product $u_c \cdot v_w$ measures how similar the two vectors are: the larger it is, the higher the probability. The denominator sums $e^{u_{c'} \cdot v_w}$ over every word $c'$ in the vocabulary, with $w$ as the center word, which normalizes the scores into a probability distribution. Simplifying the formula above gives:
$$
\begin{aligned}
&\arg\max_{\theta} \sum_{w \in Text} \sum_{c \in c(w)} \Big(\log e^{u_c \cdot v_w} - \log \sum_{c' \in vocab} e^{u_{c'} \cdot v_w}\Big) \\
=\ &\arg\max_{\theta} \sum_{w \in Text} \sum_{c \in c(w)} \Big(u_c \cdot v_w - \log \sum_{c' \in vocab} e^{u_{c'} \cdot v_w}\Big)
\end{aligned}
$$
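The full-softmax term can be computed directly to see where the cost comes from. Below is a minimal pure-Python sketch with small random vectors (the dimension 4 and the seed are arbitrary assumptions, and `log_softmax_prob` is a hypothetical helper, not a gensim function). It evaluates $u_c \cdot v_w - \log\sum_{c'} e^{u_{c'} \cdot v_w}$; the normalizer loops over every vocabulary word, so each (w, c) pair costs one dot product per vocabulary entry. It also subtracts the largest score before exponentiating (the usual log-sum-exp trick), which leaves the value unchanged but avoids overflow.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_prob(center, context, U, V):
    """log P(context | center) = u_c . v_w - log sum_{c'} exp(u_{c'} . v_w).
    The normalizer visits every word in the vocabulary."""
    v_w = V[center]
    scores = {c: dot(U[c], v_w) for c in U}  # one score per vocabulary word
    m = max(scores.values())                 # log-sum-exp trick for stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[context] - log_z


random.seed(0)
vocab = list("呼伦贝尔大草原")
dim = 4  # hypothetical embedding size
U = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}  # context vectors u
V = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}  # center vectors v

total = sum(math.exp(log_softmax_prob("贝", c, U, V)) for c in vocab)
print(round(total, 6))  # 1.0: probabilities over the whole vocabulary sum to one
```

With a real vocabulary of hundreds of thousands of words, that inner loop runs for every training pair, which is the inefficiency the next paragraph addresses.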
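Evaluating the second term requires a pass over the entire vocabulary for every (w, c) pair, which is very slow when the vocabulary is large. In practice, training therefore optimizes a different objective built from positive and negative samples. Consider the following corpus:

Text: 呼伦贝尔大草原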

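Set window_size to 1 and build a set of positive pairs and a set of negative pairs (in general the negative set is much larger than the positive set):

Positive samples: D = {(呼,伦), (伦,呼), (伦,贝), (贝,伦), (贝,尔), (尔,贝), (尔,大), (大,尔), (大,草), (草,大), (草,原), (原,草)}

Negative samples: D' = {(呼,贝), (呼,尔), (呼,大), (呼,草), (呼,原), (伦,尔), (伦,大), (伦,草), (伦,原), (贝,呼), (贝,大), (贝,草), (贝,原), (尔,呼), (尔,伦), (尔,草), (尔,原), (大,呼), (大,伦), (大,贝), (大,原), (草,呼), (草,伦), (草,贝), (草,尔), (原,呼), (原,伦), (原,贝), (原,尔), (原,大)}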
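The two sets can be generated rather than listed by hand. A minimal sketch, assuming the tokens are distinct characters as in this example (`build_pair_sets` is a hypothetical helper): the negative set is every ordered pair of distinct tokens that is not a positive pair. For 7 characters there are 7 × 6 = 42 ordered pairs, so with 12 positive pairs the full negative set has 30, including (大,贝) and (草,尔).

```python
from itertools import permutations


def build_pair_sets(tokens, window_size):
    """Positive set D: (center, context) pairs within the window.
    Negative set D': every other ordered pair of distinct tokens."""
    positive = set()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window_size), min(len(tokens), i + window_size + 1)):
            if j != i:
                positive.add((w, tokens[j]))
    negative = set(permutations(tokens, 2)) - positive
    return positive, negative


D, D_neg = build_pair_sets(list("呼伦贝尔大草原"), window_size=1)
print(len(D), len(D_neg))  # 12 30
```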

The objective for the word vectors maximizes the joint probability of the positive and negative samples. Here $D=1$ means the pair $(w,c)$ comes from the positive set, $D=0$ means it comes from the negative set, and $P(D=1 \mid w,c;\theta) = \sigma(u_c \cdot v_w)$ with $\sigma$ the sigmoid function. Taking the log and using $1-\sigma(x)=\sigma(-x)$ gives the last line:
$$
\begin{aligned}
&\arg\max_{\theta} \Big(\prod_{(w,c) \in D} P(D=1 \mid w,c;\theta) \prod_{(w,c) \in D'} P(D=0 \mid w,c;\theta)\Big) \\
=\ &\arg\max_{\theta} \Big(\prod_{(w,c) \in D} \frac{1}{1+\exp(-u_c \cdot v_w)} \prod_{(w,c) \in D'} \Big[1-\frac{1}{1+\exp(-u_c \cdot v_w)}\Big]\Big) \\
=\ &\arg\max_{\theta} \Big(\sum_{(w,c) \in D} \log \sigma(u_c \cdot v_w) + \sum_{(w,c) \in D'} \log \sigma(-u_c \cdot v_w)\Big)
\end{aligned}
$$
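A minimal sketch of the terms this objective contributes for one positive pair $(w, c)$: it adds $\log\sigma(u_c \cdot v_w)$ and, for each negative context vector $u_k$ paired with $w$, $\log\sigma(-u_k \cdot v_w)$. The vectors are made-up values and `neg_sampling_objective` is a hypothetical helper, not gensim's implementation. The loop of asserts checks the identity $1-\sigma(x)=\sigma(-x)$ used in the last step of the derivation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_objective(v_w, u_pos, u_negs):
    """log sigma(u_c . v_w) + sum_k log sigma(-u_k . v_w) for one positive pair.
    Training maximizes this (equivalently, minimizes its negative)."""
    obj = math.log(sigmoid(dot(u_pos, v_w)))
    for u_k in u_negs:
        obj += math.log(sigmoid(-dot(u_k, v_w)))
    return obj


# 1 - sigma(x) == sigma(-x) turns the [1 - ...] factor into sigma(-u_c . v_w)
for x in (-3.0, 0.0, 0.7, 5.0):
    assert abs((1 - sigmoid(x)) - sigmoid(-x)) < 1e-12

v_w = [0.2, -0.1, 0.4]                          # hypothetical center vector
u_pos = [0.3, 0.0, 0.5]                         # hypothetical true context vector
u_negs = [[-0.4, 0.2, -0.3], [0.1, 0.5, -0.2]]  # hypothetical negative context vectors
print(neg_sampling_objective(v_w, u_pos, u_negs))
```

Every term is the log of a value in (0, 1), so the result is negative; gradient ascent pushes it toward 0 by raising $u_c \cdot v_w$ and lowering each $u_k \cdot v_w$.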
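During actual training, only a small number of negative pairs is sampled for each positive pair (this is called "negative sampling"), which greatly reduces the amount of computation. In other words, the word vectors are obtained by training this language-model-style prediction task.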
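How the negatives are drawn is not specified above. The original word2vec (Mikolov et al., 2013) draws them from the unigram distribution raised to the 3/4 power, and the training log later in this post shows gensim using negative=5 samples per positive pair. A minimal sketch of such a sampler, assuming toy counts; `make_negative_sampler` is a hypothetical helper, not gensim's API:

```python
import random
from collections import Counter


def make_negative_sampler(counts, power=0.75, seed=None):
    """Return a function that draws negatives with P(w) proportional to count(w) ** power."""
    words = list(counts)
    weights = [counts[w] ** power for w in words]
    rng = random.Random(seed)

    def sample(k, exclude=()):
        out = []
        while len(out) < k:
            w = rng.choices(words, weights=weights, k=1)[0]
            if w not in exclude:  # skip the center word and its true context words
                out.append(w)
        return out

    return sample


counts = Counter("呼伦贝尔大草原呼伦贝尔草原草原")  # hypothetical toy corpus counts
sample = make_negative_sampler(counts, seed=42)
print(sample(5, exclude={"贝", "伦", "尔"}))
```

The 3/4 power flattens the distribution, so rare words are drawn somewhat more often than their raw frequency would suggest. The retry loop assumes at least one word remains outside `exclude`.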

2) CBOW model

CBOW stands for Continuous Bag of Words. Its core idea is to predict the center word from its context, for example by filling in the blank in:

呼伦贝_大草原

The model structure is shown below:

[Figure: CBOW model structure]

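  • Input: a $C \times V$ matrix, where $C$ is the number of context words and $V$ the vocabulary size (each row is the one-hot vector of one context word)
  • Hidden layer: a $V \times N$ weight matrix, usually called the word embedding, where $N$ is the length of each word vector. Multiplying the input by it gives a $C \times N$ matrix. To use the information of all context words when predicting the center word, the $C$ rows are summed (or averaged) into a $1 \times N$ vector
  • Output layer: an $N \times V$ weight matrix. Multiplying the hidden vector by it gives a $1 \times V$ vector, which softmax turns into a probability for every word in the vocabulary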
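The three layers above can be traced with a tiny forward pass. This is a pure-Python sketch with random, untrained weights (N = 3 and the seed are arbitrary assumptions; the helper names are hypothetical) for the example 呼伦贝_大草原 with a window of 2, so the context is 伦, 贝, 大, 草. It shows that multiplying a one-hot row by the V × N matrix simply selects that word's embedding row, and it averages the C rows, which matches gensim's default `cbow_mean=1` (summing is the alternative). Because the weights are untrained, the predicted word is arbitrary.

```python
import math
import random


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def matvec_row(row, matrix):
    """Row vector (1 x rows) times matrix (rows x cols) -> 1 x cols."""
    cols = len(matrix[0])
    return [sum(row[i] * matrix[i][j] for i in range(len(row))) for j in range(cols)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


vocab = list("呼伦贝尔大草原")
idx = {w: i for i, w in enumerate(vocab)}
V_size, N = len(vocab), 3  # N: hypothetical embedding length
random.seed(1)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V_size)]   # V x N
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V_size)] for _ in range(N)]  # N x V

context = ["伦", "贝", "大", "草"]                 # C = 4 context words
X = [one_hot(idx[w], V_size) for w in context]    # C x V input
H = [matvec_row(x, W_in) for x in X]              # C x N
h = [sum(col) / len(H) for col in zip(*H)]        # 1 x N (average of the C rows)
probs = softmax(matvec_row(h, W_out))             # 1 x V probabilities
print(vocab[probs.index(max(probs))], round(sum(probs), 6))
```

Training would adjust `W_in` and `W_out` so that 尔 receives the highest probability for this context; the rows of `W_in` are the word vectors that are kept afterwards.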
3) Example: training word vectors

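Dataset: Chinese Wikipedia articles, available on AIStudio as the dataset 中文维基百科语料库 (Chinese Wikipedia corpus)

Code: running it on AIStudio is recommended

  • Install gensim
!pip install gensim==3.8.1  # "!" runs a shell command from a notebook cell (e.g. AIStudio); drop it when running in a terminal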

Output:

Looking in indexes: https://mirror.baidu.com/pypi/simple/, https://mirrors.aliyun.com/pypi/simple/
Collecting gensim==3.8.1
  Downloading https://mirrors.aliyun.com/pypi/packages/44/93/c6011037f24e3106d13f3be55297bf84ece2bf15b278cc4776339dc52db5/gensim-3.8.1-cp37-cp37m-manylinux1_x86_64.whl (24.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.2/24.2 MB 4.4 MB/s eta 0:00:00
Collecting smart-open>=1.8.1
  Downloading https://mirrors.aliyun.com/pypi/packages/ad/08/dcd19850b79f72e3717c98b2088f8a24b549b29ce66849cd6b7f44679683/smart_open-7.0.1-py3-none-any.whl (60 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.8/60.8 kB 5.0 MB/s eta 0:00:00
Requirement already satisfied: scipy>=0.18.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gensim==3.8.1) (1.6.3)
Requirement already satisfied: numpy>=1.11.3 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gensim==3.8.1) (1.19.5)
Requirement already satisfied: six>=1.5.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from gensim==3.8.1) (1.16.0)
Requirement already satisfied: wrapt in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from smart-open>=1.8.1->gensim==3.8.1) (1.12.1)
Installing collected packages: smart-open, gensim
Successfully installed gensim-3.8.1 smart-open-7.0.1

[notice] A new release of pip available: 22.1.2 -> 24.0
[notice] To update, run: pip install --upgrade pip
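The code in this example targets the gensim 3.8.1 API pinned above. gensim 4.0 renamed some `Word2Vec` arguments, notably `size` to `vector_size` (and `iter` to `epochs`). If the training code might run under either major version, a small helper can pick the right keyword; `word2vec_dim_kwarg` is a hypothetical name, not part of gensim, and it only handles the vector-size argument:

```python
def word2vec_dim_kwarg(gensim_version, dim):
    """Return the keyword that sets the vector size for Word2Vec:
    'size' in gensim 3.x, renamed to 'vector_size' in gensim 4.0."""
    major = int(gensim_version.split(".")[0])
    return {"vector_size": dim} if major >= 4 else {"size": dim}


print(word2vec_dim_kwarg("3.8.1", 100))  # {'size': 100}
print(word2vec_dim_kwarg("4.3.2", 100))  # {'vector_size': 100}
```

For example: `Word2Vec(LineSentence(in_file), **word2vec_dim_kwarg(gensim.__version__, 100), window=3, min_count=5)`.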
  • Parse the XML dump: read the article text from the compressed XML file and write it to a new text file
# Train word vectors on the Chinese Wikipedia corpus

#####################   Extract the corpus   ###################
from gensim.corpora import WikiCorpus

# Input file (compressed Wikipedia XML dump)
in_file = "data/data104767/articles.xml.bz2"
# Output file: one article per line, tokens separated by spaces
out_path = "wiki.zh.text"

count = 0
# lemmatize: whether to lemmatize tokens
# dictionary={}: passing a dictionary skips the extra pass over the whole dump
# that WikiCorpus would otherwise make to build a gensim Dictionary
wiki = WikiCorpus(in_file, lemmatize=False, dictionary={})

with open(out_path, "w", encoding="utf-8") as out_file:
    for text in wiki.get_texts():  # iterate over the articles
        out_file.write(" ".join(text) + "\n")  # write one article per line
        count += 1

        if count % 200 == 0:  # print progress every 200 articles
            print("Processed:", count)

        if count >= 20000:  # keep only the first 20,000 articles
            break

Output:

Processed: 200
Processed: 400
Processed: 600
...
Processed: 20000
  • Generate the word-segmented file
#####################   Word segmentation   ###################
import jieba


def process_wiki_text(src_file, dest_file):  # source file, destination file
    with open(src_file, "r", encoding="utf-8") as f_in, \
            open(dest_file, "w", encoding="utf-8") as f_out:
        num = 0
        for line in f_in:  # stream line by line instead of loading the whole file
            line_seg = " ".join(jieba.cut(line.strip()))  # segment the line with jieba
            f_out.write(line_seg + "\n")  # keep one article per line
            num += 1

            if num % 200 == 0:  # print progress every 200 lines
                print("Done:", num)


process_wiki_text("wiki.zh.text", "wiki.zh.text.seg")

Output:

Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.835 seconds.
Prefix dict has been built successfully.

Done: 200
Done: 400
Done: 600
...
Done: 20000
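`LineSentence`, used in the training step, expects what the segmentation step produces: one sentence per line, tokens separated by whitespace. The sketch below is a simplified stand-in for illustration only (it is not gensim's code and ignores details such as gensim's splitting of very long lines), and the two-line sample file is made up. It also shows what `min_count` does: words occurring fewer than `min_count` times are dropped from the vocabulary, which is why the training log reports that `effective_min_count=5` keeps only 240560 of 1282620 word types.

```python
import os
import tempfile
from collections import Counter


def iter_sentences(path):
    """Yield one token list per line, split on whitespace."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:  # skip empty lines
                yield tokens


def build_vocab(path, min_count):
    """Count tokens and keep only those seen at least min_count times."""
    counts = Counter(tok for sent in iter_sentences(path) for tok in sent)
    kept = {w: n for w, n in counts.items() if n >= min_count}
    return counts, kept


# hypothetical two-line segmented file
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".seg", delete=False) as f:
    f.write("数学 是 研究 数量 的 学科\n")
    f.write("数学 的 研究 对象\n")
    path = f.name

counts, kept = build_vocab(path, min_count=2)
os.remove(path)
print(sorted(kept))  # ['数学', '的', '研究']
```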
  • Train
#####################   Training   ###################
import logging
import multiprocessing
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence  # reads the input file line by line

# format: layout of each log record
# %(asctime)s: time of the log record
# %(levelname)s: log level name
# %(message)s: the log message
logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
logging.root.setLevel(level=logging.INFO)

in_file = "wiki.zh.text.seg"  # input file (segmented text)
out_file1 = "wiki.zh.text.model"  # full model
out_file2 = "wiki.zh.text.vector"  # weights (word vectors)

# gensim 3.x argument names; gensim 4.0 renamed size -> vector_size
# Defaults not set here: sg=0 (CBOW), negative=5, iter=5 epochs
model = Word2Vec(LineSentence(in_file),  # input: one sentence per line
                 size=100,  # dimensionality of the word vectors (50-300 is common)
                 window=3,  # context window size
                 min_count=5,  # ignore words that occur fewer than 5 times
                 workers=multiprocessing.cpu_count())  # worker threads (one per CPU core)
model.save(out_file1)  # save the full model
model.wv.save_word2vec_format(out_file2,  # word-vector file
                              binary=False)  # plain-text format, not binary

Output:

2024-02-29 18:46:07,204: INFO: collecting all words and their counts
2024-02-29 18:46:07,206: INFO: PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2024-02-29 18:46:12,568: INFO: PROGRESS: at sentence #10000, processed 12880963 words, keeping 865015 word types
2024-02-29 18:46:16,906: INFO: PROGRESS: at sentence #20000, processed 22396155 words, keeping 1278795 word types
2024-02-29 18:46:16,946: INFO: collected 1282620 word types from a corpus of 22481838 raw words and 20090 sentences
2024-02-29 18:46:16,947: INFO: Loading a fresh vocabulary
2024-02-29 18:46:18,289: INFO: effective_min_count=5 retains 240560 unique words (18% of original 1282620, drops 1042060)
2024-02-29 18:46:18,290: INFO: effective_min_count=5 leaves 20963673 word corpus (93% of original 22481838, drops 1518165)
2024-02-29 18:46:19,098: INFO: deleting the raw counts dictionary of 1282620 items
2024-02-29 18:46:19,158: INFO: sample=0.001 downsamples 17 most-common words
2024-02-29 18:46:19,159: INFO: downsampling leaves estimated 19623071 word corpus (93.6% of prior 20963673)
2024-02-29 18:46:20,367: INFO: estimated required memory for 240560 words and 100 dimensions: 312728000 bytes
2024-02-29 18:46:20,368: INFO: resetting layer weights
2024-02-29 18:47:00,223: INFO: training model with 24 workers on 240560 vocabulary and 100 features, using sg=0 hs=0 sample=0.001 negative=5 window=3
2024-02-29 18:47:01,239: INFO: EPOCH 1 - PROGRESS: at 0.69% examples, 331735 words/s, in_qsize 0, out_qsize 0
...
2024-02-29 18:47:45,518: INFO: EPOCH 1 - PROGRESS: at 98.36% examples, 427010 words/s, in_qsize 37, out_qsize 0
2024-02-29 18:47:45,798: INFO: worker thread finished; awaiting finish of 23 more threads
...
2024-02-29 18:47:46,112: INFO: worker thread finished; awaiting finish of 0 more threads
2024-02-29 18:47:46,113: INFO: EPOCH - 1 : training on 22481838 raw words (19622451 effective words) took 45.9s, 427691 effective words/s
...
2024-02-29 18:48:31,426: INFO: EPOCH - 2 : training on 22481838 raw words (19623552 effective words) took 45.2s, 433810 effective words/s
...
2024-02-29 18:49:16,604: INFO: EPOCH - 3 : training on 22481838 raw words (19623091 effective words) took 45.2s, 434423 effective words/s
2024-02-29 18:49:17,620: INFO: EPOCH 4 - PROGRESS: at 0.98% examples, 429701 words/s, in_qsize 0, out_qsize 0
2024-02-29 18:49:18,622: INFO: EPOCH 4 - PROGRESS: at 2.10% examples, 430190 words/s, in_qsize 1, out_qsize 0
2024-02-29 18:49:19,635: INFO: EPOCH 4 - PROGRESS: at 3.71% examples, 437753 words/s, in_qsize 10, out_qsize 0
2024-02-29 18:49:20,693: INFO: EPOCH 4 - PROGRESS: at 5.06% examples, 436125 words/s, in_qsize 15, out_qsize 0
2024-02-29 18:49:21,701: INFO: EPOCH 4 - PROGRESS: at 6.59% examples, 432637 words/s, in_qsize 13, out_qsize 1
2024-02-29 18:49:22,707: INFO: EPOCH 4 - PROGRESS: at 8.38% examples, 432137 words/s, in_qsize 23, out_qsize 4
2024-02-29 18:49:23,731: INFO: EPOCH 4 - PROGRESS: at 10.00% examples, 433981 words/s, in_qsize 40, out_qsize 2
2024-02-29 18:49:24,794: INFO: EPOCH 4 - PROGRESS: at 11.84% examples, 432224 words/s, in_qsize 43, out_qsize 0
2024-02-29 18:49:25,798: INFO: EPOCH 4 - PROGRESS: at 13.42% examples, 431804 words/s, in_qsize 44, out_qsize 3
2024-02-29 18:49:26,844: INFO: EPOCH 4 - PROGRESS: at 15.42% examples, 431292 words/s, in_qsize 35, out_qsize 0
2024-02-29 18:49:27,892: INFO: EPOCH 4 - PROGRESS: at 17.81% examples, 429704 words/s, in_qsize 41, out_qsize 2
2024-02-29 18:49:28,893: INFO: EPOCH 4 - PROGRESS: at 19.74% examples, 431790 words/s, in_qsize 41, out_qsize 0
2024-02-29 18:49:29,898: INFO: EPOCH 4 - PROGRESS: at 21.87% examples, 432649 words/s, in_qsize 42, out_qsize 0
2024-02-29 18:49:30,909: INFO: EPOCH 4 - PROGRESS: at 23.74% examples, 434112 words/s, in_qsize 42, out_qsize 0
2024-02-29 18:49:31,991: INFO: EPOCH 4 - PROGRESS: at 25.82% examples, 432283 words/s, in_qsize 45, out_qsize 2
2024-02-29 18:49:33,001: INFO: EPOCH 4 - PROGRESS: at 28.64% examples, 434800 words/s, in_qsize 46, out_qsize 0
2024-02-29 18:49:34,103: INFO: EPOCH 4 - PROGRESS: at 30.87% examples, 433345 words/s, in_qsize 43, out_qsize 2
2024-02-29 18:49:35,112: INFO: EPOCH 4 - PROGRESS: at 33.13% examples, 435904 words/s, in_qsize 41, out_qsize 0
2024-02-29 18:49:36,128: INFO: EPOCH 4 - PROGRESS: at 35.30% examples, 433737 words/s, in_qsize 47, out_qsize 0
2024-02-29 18:49:37,212: INFO: EPOCH 4 - PROGRESS: at 37.32% examples, 430731 words/s, in_qsize 40, out_qsize 6
2024-02-29 18:49:38,213: INFO: EPOCH 4 - PROGRESS: at 39.66% examples, 431553 words/s, in_qsize 42, out_qsize 5
2024-02-29 18:49:39,293: INFO: EPOCH 4 - PROGRESS: at 42.19% examples, 431437 words/s, in_qsize 44, out_qsize 3
2024-02-29 18:49:40,304: INFO: EPOCH 4 - PROGRESS: at 44.65% examples, 432954 words/s, in_qsize 45, out_qsize 0
2024-02-29 18:49:41,333: INFO: EPOCH 4 - PROGRESS: at 47.12% examples, 432816 words/s, in_qsize 37, out_qsize 3
2024-02-29 18:49:42,410: INFO: EPOCH 4 - PROGRESS: at 49.09% examples, 431043 words/s, in_qsize 39, out_qsize 8
2024-02-29 18:49:43,429: INFO: EPOCH 4 - PROGRESS: at 51.39% examples, 433326 words/s, in_qsize 44, out_qsize 1
2024-02-29 18:49:44,440: INFO: EPOCH 4 - PROGRESS: at 53.24% examples, 431354 words/s, in_qsize 47, out_qsize 0
2024-02-29 18:49:45,491: INFO: EPOCH 4 - PROGRESS: at 55.48% examples, 428766 words/s, in_qsize 38, out_qsize 9
2024-02-29 18:49:46,559: INFO: EPOCH 4 - PROGRESS: at 57.71% examples, 426289 words/s, in_qsize 35, out_qsize 12
2024-02-29 18:49:47,608: INFO: EPOCH 4 - PROGRESS: at 60.59% examples, 427738 words/s, in_qsize 45, out_qsize 2
2024-02-29 18:49:48,625: INFO: EPOCH 4 - PROGRESS: at 63.18% examples, 427658 words/s, in_qsize 39, out_qsize 1
2024-02-29 18:49:49,637: INFO: EPOCH 4 - PROGRESS: at 65.67% examples, 426155 words/s, in_qsize 37, out_qsize 1
2024-02-29 18:49:50,664: INFO: EPOCH 4 - PROGRESS: at 67.88% examples, 424348 words/s, in_qsize 44, out_qsize 0
2024-02-29 18:49:51,688: INFO: EPOCH 4 - PROGRESS: at 70.41% examples, 423689 words/s, in_qsize 42, out_qsize 3
2024-02-29 18:49:52,705: INFO: EPOCH 4 - PROGRESS: at 72.83% examples, 423343 words/s, in_qsize 40, out_qsize 4
2024-02-29 18:49:53,715: INFO: EPOCH 4 - PROGRESS: at 75.34% examples, 424144 words/s, in_qsize 45, out_qsize 1
2024-02-29 18:49:54,785: INFO: EPOCH 4 - PROGRESS: at 78.02% examples, 423646 words/s, in_qsize 37, out_qsize 0
2024-02-29 18:49:55,797: INFO: EPOCH 4 - PROGRESS: at 81.04% examples, 424429 words/s, in_qsize 44, out_qsize 0
2024-02-29 18:49:56,821: INFO: EPOCH 4 - PROGRESS: at 83.55% examples, 424469 words/s, in_qsize 39, out_qsize 1
2024-02-29 18:49:57,893: INFO: EPOCH 4 - PROGRESS: at 86.34% examples, 424022 words/s, in_qsize 43, out_qsize 2
2024-02-29 18:49:58,894: INFO: EPOCH 4 - PROGRESS: at 89.27% examples, 424351 words/s, in_qsize 47, out_qsize 0
2024-02-29 18:49:59,911: INFO: EPOCH 4 - PROGRESS: at 92.15% examples, 424126 words/s, in_qsize 39, out_qsize 1
2024-02-29 18:50:00,919: INFO: EPOCH 4 - PROGRESS: at 95.19% examples, 424954 words/s, in_qsize 38, out_qsize 3
2024-02-29 18:50:02,003: INFO: EPOCH 4 - PROGRESS: at 97.84% examples, 423982 words/s, in_qsize 46, out_qsize 3
2024-02-29 18:50:02,395: INFO: worker thread finished; awaiting finish of 23 more threads
2024-02-29 18:50:02,417: INFO: worker thread finished; awaiting finish of 22 more threads
2024-02-29 18:50:02,418: INFO: worker thread finished; awaiting finish of 21 more threads
2024-02-29 18:50:02,505: INFO: worker thread finished; awaiting finish of 20 more threads
2024-02-29 18:50:02,508: INFO: worker thread finished; awaiting finish of 19 more threads
2024-02-29 18:50:02,509: INFO: worker thread finished; awaiting finish of 18 more threads
2024-02-29 18:50:02,519: INFO: worker thread finished; awaiting finish of 17 more threads
2024-02-29 18:50:02,520: INFO: worker thread finished; awaiting finish of 16 more threads
2024-02-29 18:50:02,536: INFO: worker thread finished; awaiting finish of 15 more threads
2024-02-29 18:50:02,603: INFO: worker thread finished; awaiting finish of 14 more threads
2024-02-29 18:50:02,615: INFO: worker thread finished; awaiting finish of 13 more threads
2024-02-29 18:50:02,617: INFO: worker thread finished; awaiting finish of 12 more threads
2024-02-29 18:50:02,620: INFO: worker thread finished; awaiting finish of 11 more threads
2024-02-29 18:50:02,621: INFO: worker thread finished; awaiting finish of 10 more threads
2024-02-29 18:50:02,629: INFO: worker thread finished; awaiting finish of 9 more threads
2024-02-29 18:50:02,631: INFO: worker thread finished; awaiting finish of 8 more threads
2024-02-29 18:50:02,632: INFO: worker thread finished; awaiting finish of 7 more threads
2024-02-29 18:50:02,694: INFO: worker thread finished; awaiting finish of 6 more threads
2024-02-29 18:50:02,697: INFO: worker thread finished; awaiting finish of 5 more threads
2024-02-29 18:50:02,699: INFO: worker thread finished; awaiting finish of 4 more threads
2024-02-29 18:50:02,701: INFO: worker thread finished; awaiting finish of 3 more threads
2024-02-29 18:50:02,701: INFO: worker thread finished; awaiting finish of 2 more threads
2024-02-29 18:50:02,702: INFO: worker thread finished; awaiting finish of 1 more threads
2024-02-29 18:50:02,702: INFO: worker thread finished; awaiting finish of 0 more threads
2024-02-29 18:50:02,703: INFO: EPOCH - 4 : training on 22481838 raw words (19624175 effective words) took 46.1s, 425765 effective words/s
2024-02-29 18:50:03,718: INFO: EPOCH 5 - PROGRESS: at 0.99% examples, 448165 words/s, in_qsize 0, out_qsize 0
2024-02-29 18:50:04,724: INFO: EPOCH 5 - PROGRESS: at 2.21% examples, 445807 words/s, in_qsize 3, out_qsize 2
2024-02-29 18:50:05,726: INFO: EPOCH 5 - PROGRESS: at 3.81% examples, 451942 words/s, in_qsize 0, out_qsize 1
2024-02-29 18:50:06,806: INFO: EPOCH 5 - PROGRESS: at 5.07% examples, 433831 words/s, in_qsize 9, out_qsize 7
2024-02-29 18:50:07,816: INFO: EPOCH 5 - PROGRESS: at 6.80% examples, 440929 words/s, in_qsize 16, out_qsize 3
2024-02-29 18:50:08,850: INFO: EPOCH 5 - PROGRESS: at 8.61% examples, 437993 words/s, in_qsize 22, out_qsize 0
2024-02-29 18:50:09,891: INFO: EPOCH 5 - PROGRESS: at 10.26% examples, 438057 words/s, in_qsize 39, out_qsize 0
2024-02-29 18:50:10,893: INFO: EPOCH 5 - PROGRESS: at 11.96% examples, 436458 words/s, in_qsize 42, out_qsize 1
2024-02-29 18:50:11,921: INFO: EPOCH 5 - PROGRESS: at 13.59% examples, 435171 words/s, in_qsize 41, out_qsize 0
2024-02-29 18:50:12,937: INFO: EPOCH 5 - PROGRESS: at 15.48% examples, 434585 words/s, in_qsize 41, out_qsize 0
2024-02-29 18:50:14,065: INFO: EPOCH 5 - PROGRESS: at 17.99% examples, 430844 words/s, in_qsize 43, out_qsize 0
2024-02-29 18:50:15,097: INFO: EPOCH 5 - PROGRESS: at 19.98% examples, 431863 words/s, in_qsize 47, out_qsize 0
2024-02-29 18:50:16,109: INFO: EPOCH 5 - PROGRESS: at 21.77% examples, 427576 words/s, in_qsize 42, out_qsize 5
2024-02-29 18:50:17,119: INFO: EPOCH 5 - PROGRESS: at 23.64% examples, 429198 words/s, in_qsize 38, out_qsize 2
2024-02-29 18:50:18,194: INFO: EPOCH 5 - PROGRESS: at 25.58% examples, 425934 words/s, in_qsize 42, out_qsize 4
2024-02-29 18:50:19,198: INFO: EPOCH 5 - PROGRESS: at 28.08% examples, 426551 words/s, in_qsize 44, out_qsize 0
2024-02-29 18:50:20,211: INFO: EPOCH 5 - PROGRESS: at 30.47% examples, 428039 words/s, in_qsize 47, out_qsize 0
2024-02-29 18:50:21,246: INFO: EPOCH 5 - PROGRESS: at 32.64% examples, 428557 words/s, in_qsize 47, out_qsize 0
2024-02-29 18:50:22,249: INFO: EPOCH 5 - PROGRESS: at 34.85% examples, 429321 words/s, in_qsize 42, out_qsize 0
2024-02-29 18:50:23,330: INFO: EPOCH 5 - PROGRESS: at 37.07% examples, 428671 words/s, in_qsize 48, out_qsize 0
2024-02-29 18:50:24,394: INFO: EPOCH 5 - PROGRESS: at 39.49% examples, 428834 words/s, in_qsize 38, out_qsize 0
2024-02-29 18:50:25,396: INFO: EPOCH 5 - PROGRESS: at 41.68% examples, 427864 words/s, in_qsize 41, out_qsize 3
2024-02-29 18:50:26,400: INFO: EPOCH 5 - PROGRESS: at 43.97% examples, 427630 words/s, in_qsize 47, out_qsize 0
2024-02-29 18:50:27,412: INFO: EPOCH 5 - PROGRESS: at 46.74% examples, 429013 words/s, in_qsize 39, out_qsize 4
2024-02-29 18:50:28,488: INFO: EPOCH 5 - PROGRESS: at 48.66% examples, 428027 words/s, in_qsize 41, out_qsize 3
2024-02-29 18:50:29,507: INFO: EPOCH 5 - PROGRESS: at 50.84% examples, 428856 words/s, in_qsize 45, out_qsize 2
2024-02-29 18:50:30,512: INFO: EPOCH 5 - PROGRESS: at 52.94% examples, 429820 words/s, in_qsize 44, out_qsize 1
2024-02-29 18:50:31,546: INFO: EPOCH 5 - PROGRESS: at 55.59% examples, 430131 words/s, in_qsize 44, out_qsize 0
2024-02-29 18:50:32,548: INFO: EPOCH 5 - PROGRESS: at 58.32% examples, 430904 words/s, in_qsize 36, out_qsize 2
2024-02-29 18:50:33,599: INFO: EPOCH 5 - PROGRESS: at 60.84% examples, 430905 words/s, in_qsize 43, out_qsize 1
2024-02-29 18:50:34,610: INFO: EPOCH 5 - PROGRESS: at 63.49% examples, 430790 words/s, in_qsize 43, out_qsize 4
2024-02-29 18:50:35,695: INFO: EPOCH 5 - PROGRESS: at 66.64% examples, 431336 words/s, in_qsize 42, out_qsize 4
2024-02-29 18:50:36,700: INFO: EPOCH 5 - PROGRESS: at 69.35% examples, 432565 words/s, in_qsize 44, out_qsize 1
2024-02-29 18:50:37,702: INFO: EPOCH 5 - PROGRESS: at 72.20% examples, 433320 words/s, in_qsize 44, out_qsize 0
2024-02-29 18:50:38,702: INFO: EPOCH 5 - PROGRESS: at 74.68% examples, 432933 words/s, in_qsize 40, out_qsize 1
2024-02-29 18:50:39,731: INFO: EPOCH 5 - PROGRESS: at 77.04% examples, 433221 words/s, in_qsize 44, out_qsize 0
2024-02-29 18:50:40,794: INFO: EPOCH 5 - PROGRESS: at 80.27% examples, 433750 words/s, in_qsize 39, out_qsize 0
2024-02-29 18:50:41,813: INFO: EPOCH 5 - PROGRESS: at 82.70% examples, 432642 words/s, in_qsize 39, out_qsize 4
2024-02-29 18:50:42,822: INFO: EPOCH 5 - PROGRESS: at 85.57% examples, 433342 words/s, in_qsize 36, out_qsize 2
2024-02-29 18:50:43,823: INFO: EPOCH 5 - PROGRESS: at 88.48% examples, 433558 words/s, in_qsize 45, out_qsize 0
2024-02-29 18:50:44,850: INFO: EPOCH 5 - PROGRESS: at 91.58% examples, 433633 words/s, in_qsize 48, out_qsize 0
2024-02-29 18:50:45,909: INFO: EPOCH 5 - PROGRESS: at 94.50% examples, 433724 words/s, in_qsize 39, out_qsize 0
2024-02-29 18:50:46,918: INFO: EPOCH 5 - PROGRESS: at 97.38% examples, 433581 words/s, in_qsize 46, out_qsize 1
2024-02-29 18:50:47,544: INFO: worker thread finished; awaiting finish of 23 more threads
2024-02-29 18:50:47,589: INFO: worker thread finished; awaiting finish of 22 more threads
2024-02-29 18:50:47,591: INFO: worker thread finished; awaiting finish of 21 more threads
2024-02-29 18:50:47,612: INFO: worker thread finished; awaiting finish of 20 more threads
2024-02-29 18:50:47,823: INFO: worker thread finished; awaiting finish of 19 more threads
2024-02-29 18:50:47,825: INFO: worker thread finished; awaiting finish of 18 more threads
2024-02-29 18:50:47,829: INFO: worker thread finished; awaiting finish of 17 more threads
2024-02-29 18:50:47,833: INFO: worker thread finished; awaiting finish of 16 more threads
2024-02-29 18:50:47,836: INFO: worker thread finished; awaiting finish of 15 more threads
2024-02-29 18:50:47,839: INFO: worker thread finished; awaiting finish of 14 more threads
2024-02-29 18:50:47,843: INFO: worker thread finished; awaiting finish of 13 more threads
2024-02-29 18:50:47,846: INFO: worker thread finished; awaiting finish of 12 more threads
2024-02-29 18:50:47,888: INFO: worker thread finished; awaiting finish of 11 more threads
2024-02-29 18:50:47,892: INFO: worker thread finished; awaiting finish of 10 more threads
2024-02-29 18:50:47,893: INFO: worker thread finished; awaiting finish of 9 more threads
2024-02-29 18:50:47,898: INFO: worker thread finished; awaiting finish of 8 more threads
2024-02-29 18:50:47,900: INFO: worker thread finished; awaiting finish of 7 more threads
2024-02-29 18:50:47,901: INFO: worker thread finished; awaiting finish of 6 more threads
2024-02-29 18:50:47,902: INFO: worker thread finished; awaiting finish of 5 more threads
2024-02-29 18:50:47,986: INFO: EPOCH 5 - PROGRESS: at 99.79% examples, 432737 words/s, in_qsize 4, out_qsize 1
2024-02-29 18:50:47,988: INFO: worker thread finished; awaiting finish of 4 more threads
2024-02-29 18:50:47,989: INFO: worker thread finished; awaiting finish of 3 more threads
2024-02-29 18:50:47,989: INFO: worker thread finished; awaiting finish of 2 more threads
2024-02-29 18:50:47,995: INFO: worker thread finished; awaiting finish of 1 more threads
2024-02-29 18:50:48,000: INFO: worker thread finished; awaiting finish of 0 more threads
2024-02-29 18:50:48,001: INFO: EPOCH - 5 : training on 22481838 raw words (19624912 effective words) took 45.3s, 433313 effective words/s
2024-02-29 18:50:48,002: INFO: training on a 112409190 raw words (98118181 effective words) took 227.8s, 430764 effective words/s
2024-02-29 18:50:48,002: INFO: saving Word2Vec object under wiki.zh.text.model, separately None
2024-02-29 18:50:48,003: INFO: storing np array 'vectors' to wiki.zh.text.model.wv.vectors.npy
2024-02-29 18:50:48,099: INFO: not storing attribute vectors_norm
2024-02-29 18:50:48,100: INFO: storing np array 'syn1neg' to wiki.zh.text.model.trainables.syn1neg.npy
2024-02-29 18:50:48,189: INFO: not storing attribute cum_table
2024-02-29 18:50:48,857: INFO: saved wiki.zh.text.model
2024-02-29 18:50:48,859: INFO: storing 240560x100 projection weights into wiki.zh.text.vector
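The epoch summaries in the log can be checked with some simple arithmetic. The sketch below is an illustration only and was not part of the original workflow. It parses the three `EPOCH - n : training on ...` lines shown in this section (epochs 1 and 2 fall outside it) using a regular expression. It then confirms that the overall figure of 112409190 raw words equals five epochs of 22481838 raw words each. Roughly 87% of raw words count as "effective" in each epoch. This gap is most likely because gensim does not train on words removed by the vocabulary cutoff (`min_count`) or skipped by frequent-word downsampling (`sample`). The parameter values actually used for this model do not appear here.

```python
import re

# Epoch summary lines copied verbatim from the training log (epochs 3-5 only).
LOG = """\
2024-02-29 18:49:16,604: INFO: EPOCH - 3 : training on 22481838 raw words (19623091 effective words) took 45.2s, 434423 effective words/s
2024-02-29 18:50:02,703: INFO: EPOCH - 4 : training on 22481838 raw words (19624175 effective words) took 46.1s, 425765 effective words/s
2024-02-29 18:50:48,001: INFO: EPOCH - 5 : training on 22481838 raw words (19624912 effective words) took 45.3s, 433313 effective words/s
"""

EPOCH_RE = re.compile(
    r"EPOCH - (\d+) : training on (\d+) raw words \((\d+) effective words\) "
    r"took ([\d.]+)s, (\d+) effective words/s"
)


def parse_epochs(text):
    """Return one dict per epoch summary line found in `text`."""
    stats = []
    for m in EPOCH_RE.finditer(text):
        epoch, raw, eff, secs, rate = m.groups()
        stats.append({"epoch": int(epoch), "raw": int(raw), "effective": int(eff),
                      "seconds": float(secs), "rate": int(rate)})
    return stats


epochs = parse_epochs(LOG)
for e in epochs:
    # Share of raw tokens that actually entered training in this epoch.
    retained = e["effective"] / e["raw"]
    print(f"epoch {e['epoch']}: {retained:.1%} of raw words used")  # 87.3% each

# The final summary reports 112409190 raw words for the whole 5-epoch run.
TOTAL_RAW = 112409190
print(TOTAL_RAW == 5 * epochs[0]["raw"])  # → True: every epoch sees the same corpus
```

Dividing effective words by the logged seconds gives back the logged words/s to within a fraction of a percent. The small remaining gap comes from the durations being rounded to 0.1 s.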
  • Testing
#####################   Testing  ###################
from gensim.models import Word2Vec

model = Word2Vec.load("wiki.zh.text.model")  # load the trained model

# gensim 4.0 renamed index2word to index_to_key; fall back for gensim 3.x
vocab = getattr(model.wv, "index_to_key", None) or model.wv.index2word

# Print the first 10 word vectors
for word in vocab[:10]:
    print(word, '[', model.wv[word], ']')  # print the vector for each word
print("")

# For each query word, print the words with the highest similarity to it.
# Calling these through model.wv avoids the deprecated model[...] / model.most_similar.
for query in ["铁路", "中药", "普京"]:
    result = model.wv.most_similar(query)
    for r in result:
        print(r)
    print("")

Output (training output omitted). The DeprecationWarning lines were emitted because the run called `model[word]` and `model.most_similar()` directly on the model. As the warnings themselves say, these calls should go through `model.wv` instead:

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel_launcher.py:9: DeprecationWarning: Call to deprecated `__getitem__` (Method will be removed in 4.0.0, use self.wv.__getitem__() instead).
  if __name__ == '__main__':
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel_launcher.py:15: DeprecationWarning: Call to deprecated `most_similar` (Method will be removed in 4.0.0, use self.wv.most_similar() instead).
  from ipykernel import kernelapp as app
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel_launcher.py:20: DeprecationWarning: Call to deprecated `most_similar` (Method will be removed in 4.0.0, use self.wv.most_similar() instead).

的 [ [ 0.05327865 -0.53783256  0.4011491   1.0377467  -0.37736186 -1.3369029
  1.0646014  -0.1514761   1.5452628  -1.0377175   0.0472149  -0.8363005
 -0.10084558 -0.48065135 -0.3601034   0.94604933  0.8394771  -0.6131299
  1.5977417  -2.3976393   0.4921375   0.7588338  -0.32347357 -0.1854495
 -0.5323685  -0.5173837   0.51421505 -0.52293605 -0.36417717 -0.90888894
 -0.13158794  1.4198147  -0.8280145  -0.3019174   2.1073706  -0.760788
  1.2119099   0.3257739  -0.8752619   0.13358122 -1.3738849  -0.57065696
  0.5482845  -0.44655856 -0.2226158  -0.274896    0.6865202  -0.0033878
 -0.2763316  -0.36525854 -0.70503396 -0.64678556 -0.29910237  0.38098514
 -1.4872898   0.03365944 -0.82742816 -0.43514818 -1.2130035  -0.11949506
 -0.29346514  2.0837998   0.17063631  0.17331794  0.33808464 -0.4261683
  1.0569005  -1.4714183   0.33974665 -1.5394073  -0.9799224  -0.54741603
  0.48417726  0.51358485 -0.5715329  -0.12952082  0.4293968  -1.0172116
  1.1407273  -0.88506544 -0.5702839  -1.3481365  -0.39994067 -2.0000238
  1.2328296  -1.1719111  -0.9050281   1.1634829   0.07408974 -1.2275641
  0.27946717 -1.2653685   1.3553606  -1.6927024  -0.7033033   0.2693373
  1.253629    0.6496037   0.2191684  -0.78412926] ]
在 [ [-0.42463258 -2.4437954   0.53217393  0.9996975   0.37987134 -0.88477343
  1.0971447   1.5294083   1.0648928  -1.558458   -0.36555728 -0.42607346
 -1.3693268  -0.6986576   0.5938768   0.0783068   0.758885   -0.83025175
  1.2274495  -0.9937148  -0.33092266  2.093802   -0.33651614  0.21035707
  0.63450843  1.1645601  -1.1898849  -0.07593375 -0.88322973 -0.19563046
  0.9769286   0.74906284 -0.70940083  1.4368366   0.9067723  -0.44570225
  1.0358443   1.1667545   1.0775245  -0.9527623  -0.95970166 -0.70979124
 -0.5310172   0.36139968 -0.18026341  1.4736971   1.6084458  -1.705582
 -0.10648517 -0.1105919  -0.25159562 -0.00873835 -0.26249817  2.1622958
 -1.5742291   0.14910135 -1.2894114   0.2511249  -1.1792454  -0.72360325
 -0.07263664  3.9779882  -0.82457787  0.02922174 -0.57287693  0.34086442
  1.1984884   1.0886639   0.8197982  -0.77552193  0.70042676 -1.4123865
 -0.54220575 -1.0212927   0.2889944   0.24239118  0.1648594  -0.7769327
  0.16848186  1.2421886  -1.6019603  -1.6944915   1.0919546  -0.3363789
 -0.82180744  1.2388902  -1.4864118   0.7689477  -1.170416   -1.4182153
  1.2648599  -2.2608922   1.408338   -1.6608157  -0.6797598   1.494119
  1.197311    1.7018986  -1.045801   -0.14495309] ]
是 [ [ 2.0627701  -1.1885465   1.3413388  -0.63310647 -1.0378323   0.191146
 -1.0518003  -1.0862353   1.4891928   0.37143752 -0.70622504  0.0780537
  0.02657107 -1.2314191  -0.09800842  0.7191717   1.0831716  -0.6678763
  2.553883   -1.2007903  -0.73821676  0.3530014   1.721368    1.3866593
  0.7923605  -1.3692601   0.41672337 -0.96140575 -2.3858385  -0.26267844
  1.114126    1.1806949   1.0037898   1.5768572   2.0220714   0.5763852
  1.6764815   1.9266133  -1.358343    0.15191413 -0.5946121  -2.6195357
  1.6187361  -0.7356461  -1.473615   -0.76726705  2.1406848   0.30505633
  2.9442768   0.3789943  -1.1278807   0.3917617   0.770161    0.73717993
 -0.9430313  -0.2679599  -2.0000083  -0.7008843  -1.0499135  -0.6178511
  2.1850324   2.9828587   0.6661941   2.053006   -1.086323    0.04475425
  2.53099    -0.8302406   0.64353186 -1.8250593   1.1532271  -1.144374
  0.589161   -1.6655053  -0.9623368   1.7211503   0.9463574  -1.9714044
 -1.3936982  -1.1316496   0.17267536 -0.4145161  -0.3099934   0.256562
  1.015426    0.3788599  -1.9755847   0.81467634  1.02809    -1.4076906
 -1.868017   -1.1592947  -0.26673457 -1.2610213   0.5215924  -1.102127
 -0.27354524 -1.1902376  -0.36140302  0.69883007] ]
年 [ [-0.7686671   0.08463874  1.32112     0.90194964  1.3142313   0.2762654
 -0.92944527  0.7551384   1.0904723  -1.7634455  -0.14772087  0.9270074
  0.70371234  0.0248665   2.1429276   0.3456471   1.2979926   1.3280556
 -1.0709156  -1.4325314  -2.0553591   1.9210931  -0.2635952   0.89939356
 -0.24535367  0.12382335 -0.34222543 -1.4257516  -0.16413423  2.005949
  1.1495656   0.9052044  -1.0064452   1.6927723   1.2470132   0.85299474
  1.7945793  -2.3215466  -1.4006611   0.16407704 -0.17039333  0.59470963
 -1.1873173   1.6482116  -0.9101744   2.4193552   0.07334835 -1.1106066
 -0.32891974 -0.10809796 -1.2178708   1.6356549  -0.46528345  1.185685
 -1.0507497  -0.16127113 -0.40930077 -1.7317686  -1.3865246   1.4947401
  0.03928837  1.915953    0.2928778   1.169346   -1.2584093   0.75220495
 -1.5323597   0.6966527   1.0286992  -3.1477675  -0.01011731  0.47440663
 -0.01000467 -1.3776034  -0.16935979 -0.2138399   0.5649436  -2.1339822
 -1.8523834  -0.82796097  0.86329234 -4.4993134   0.31964254 -0.16166072
  1.3355536  -1.6848409  -0.40493208 -0.1744769  -0.0065302  -1.1696439
 -0.29254827  2.644544    1.0520754  -2.0030859   1.9328154  -1.7976549
  1.9641755  -0.9358227  -2.0198176   1.7621045 ] ]
和 [ [-1.7936664  -0.60258466 -0.33815056  1.7664076   1.1888294  -0.5124309
 -0.982421   -1.5381647   1.3690406  -0.34940612 -0.18159316  0.34924024
 -2.1175427  -0.39139643 -0.97609144  0.7091031   1.2836043   0.59916985
 -0.44169012 -1.2265047   0.25998732  0.9211792  -0.4099178   0.11590376
 -0.28670695  2.7602255  -0.77220744 -0.7016491  -1.6584926  -1.2257215
  0.88776666 -0.25778687  0.49061418  0.48738685 -0.56769925 -2.2035594
  2.6515436  -0.37563872 -0.08108984  0.17916249 -1.954872   -0.32587513
 -0.2813556  -0.71491474 -0.55605733  0.27773035  0.22445679 -0.1038675
 -0.66065305  1.0714678  -1.412449   -0.14055876  0.17481208  0.6475259
  1.9205297  -0.85978746 -1.7288083  -0.92688423 -0.1334583   0.09569005
 -1.290859    0.77196133 -0.01910409  0.5789173  -0.51498264 -0.8700445
  2.5325863  -0.4368111   0.79267114 -0.28794852  0.7135503   0.00821535
 -0.13613094  0.7516194  -0.8653614   1.2800741  -0.51343066 -1.5026168
  0.54250664 -1.3580089  -1.1880492  -1.5932618   1.1894258  -1.4418019
 -0.41850865 -1.1452755  -0.9339831  -0.12613311  0.7358218   0.08830387
 -0.5067824   0.03577109 -2.0329843  -0.35796794 -1.0161774  -1.0131522
 -0.43137753  0.33253494  0.4018132  -1.41702   ] ]
了 [ [-1.9436138e+00  1.6483014e+00  7.3397523e-01  2.5087681e+00
  1.2167963e-01 -4.3142447e+00  1.8856712e-01 -5.9046990e-01
  8.0753857e-01 -1.4349873e+00 -2.4201753e+00 -9.8747307e-01
 -1.1762297e+00 -4.0771633e-01 -4.7250494e-01  1.4274366e+00
  4.2959139e-01  1.1849896e+00  1.4658893e+00 -2.1643031e+00
 -4.9282961e-02 -2.5011623e-01 -7.6726717e-01  2.0264297e+00
 -9.4920123e-01 -1.2521985e+00 -2.0591247e+00 -1.2519429e+00
 -1.5353471e+00  9.2382416e-02  4.3579984e-01  3.0063396e+00
 -2.4839160e-01  8.4310241e-02  2.0635054e+00 -1.1391885e+00
  2.2873564e+00 -1.3363756e+00  2.0226948e+00 -1.0125631e+00
  5.2646023e-01 -2.0331869e+00 -1.9216677e+00  4.7612253e-01
 -9.8945385e-01  6.6175139e-01  1.1421987e+00  2.1541378e-01
  7.3244750e-01  2.6114142e-01  1.7791729e-01 -2.2847609e-01
  4.1287571e-01  8.8611461e-02 -6.1350155e-01  2.7324381e+00
 -2.9383843e+00  1.7865591e+00  1.0036942e+00 -2.0990545e-01
  2.7161896e-01  2.4153254e+00  1.5154275e-01 -1.2099750e+00
 -1.5965548e+00  1.6759452e+00 -8.3815706e-01  9.7805393e-01
  1.5085987e+00  3.0611422e-02  1.8509774e+00  7.3120952e-01
  1.6457441e+00 -2.7104132e+00  1.2034345e+00 -2.1080136e+00
 -7.5097762e-02 -6.3763016e-01  1.1206281e+00  6.5688306e-01
 -8.1922483e-01 -6.7665690e-01 -9.2754817e-01 -2.1539629e+00
  3.3879298e-01 -8.6143786e-01  3.0885071e-01  1.6986367e-03
 -2.8715498e+00 -2.4140685e+00 -7.0239681e-01 -3.5281119e-01
 -1.1388317e+00 -2.9193931e+00 -8.3260250e-01  1.1267102e+00
  6.9696531e-02  7.8351122e-01 -1.2417021e+00  5.3507799e-01] ]
於 [ [-0.53269184 -0.36012843 -0.90692663 -0.362973    1.6366956   0.43958563
  3.5067036   2.6491318   2.0490243  -2.5787504  -0.21314327  0.4410392
 -1.6150179  -1.46432    -1.2484831   0.1407568   1.9192587  -2.6820233
  1.0737547  -0.24800494 -1.0269834   1.7373953   1.3810781  -0.4585215
  0.2519634   3.2757533  -0.6035296   0.35779628 -0.948003   -0.16447543
  0.31204602  0.15876536  1.2921379   0.35225153  2.5887783   0.41650772
  0.35305426  2.0292234   0.10431505  1.3056135  -2.4140575  -2.002597
 -1.6638165   2.0507886   0.98914206 -0.43331012  1.8916605   0.05047855
  1.0437186  -1.1184938  -2.9102814  -0.44862002 -1.5645779   2.8235185
  1.924463   -3.3811007  -0.35813704 -1.3784553  -0.13405009 -0.46785113
  0.43813846 -0.22884116 -2.258125    1.3256868   0.9471638  -0.7640105
  1.5571808   3.67068    -1.6710087  -0.7790608   2.6240816  -0.45023972
  1.9663687   1.0383086  -1.4122825  -1.4562843   0.91329277  1.5664321
 -2.1609735   0.69369364  0.80576926 -2.248431   -0.3148186   3.7729433
  1.9283202   1.3789957   1.7161353   0.41128302 -3.1765096  -2.251166
  0.8560807  -2.5485353   1.842208   -0.95322704 -0.38343704  1.9791752
  1.5500413   1.2364751  -2.791452    0.86852646] ]
有 [ [ 0.18258221 -2.3049097   1.497023   -1.3487074  -3.727378    0.50723153
  0.24911746 -0.9672969  -0.48191318 -1.9050581  -0.43654805  2.3170543
  1.1036432  -0.10333104 -0.600115    1.1932797  -0.2998131  -1.5992122
  1.2433572  -2.4837353  -2.0780206   0.5371243  -0.26808724  0.8863078
  0.1829582  -1.0246236  -1.1198604  -0.9830677   0.32930338 -1.9471537
 -0.50897574  1.2545011  -0.21091224  0.8305599   1.4226604   0.5430306
  2.6515467   0.5407714   0.9081938   0.97991794  2.4480078  -0.37713084
 -0.6813183   2.2026594  -0.98280716  0.64117205  0.84968406  2.2401826
  0.03101416  0.7823304  -0.7782107  -2.7762618   0.13561708  2.07437
 -2.5257614   0.05108101 -1.4901428  -1.2402513   0.8309426  -0.7211121
  1.9019513   3.960054   -1.8887383   0.39035076 -0.8641255  -0.61770415
  1.5876379  -2.6439407   0.40260366 -0.29260564  2.5874152  -0.9009424
 -1.1908774  -0.9699979   0.5119676   0.45808062 -1.3735437  -0.36920473
  1.3141601  -2.0569677  -0.38173217  0.6845369   0.42968374  0.6309501
 -0.107759   -1.9057575  -1.5358291  -1.5482831  -1.9485159  -1.0988526
  0.13084021 -2.2655997   1.5260206  -2.5642633  -0.5744566  -0.08428011
 -1.4260695   0.19865772  0.4512534  -0.9715485 ] ]
為 [ [ 0.1896151  -1.8181373   3.3572814  -1.0685782   0.49208143  2.378115
  0.20101228  1.4529976   0.9294902  -2.4987757  -1.0333834   1.4453149
  0.16965653  1.0906299   0.0800195   0.340483    0.4134813  -1.2748445
 -0.36895636 -2.4484384   2.7829435   2.1415136   1.1581845   1.3086056
  2.269279   -3.1014345  -1.1449586  -1.3106853   0.72433496  1.2913316
  0.58723754  0.7778022  -0.15846896 -0.8188704   3.476332   -0.39220434
  0.4255195   1.1091534   0.5255888   1.3774976   1.6827629  -1.4297134
  3.231139   -1.0098424   2.2370512   1.4140384  -1.3761084  -0.91320014
  3.5050528   0.13942066 -2.3848357  -0.13616315 -0.7808771  -2.407467
  0.86087775 -1.0559365  -0.48088276 -0.06485796 -1.6015757  -0.09015828
  1.7890335   1.3084754  -0.24928983  0.4265427   0.5235788  -2.5840218
  3.446129   -0.38306347 -1.2760977  -1.5464348   0.5212346  -0.13336264
  2.2953007   2.7280948  -0.96069217  2.5137131   0.42497525 -0.70072883
 -2.267927    1.168881   -0.9020308  -0.09913182  1.2413316  -0.3555878
  2.8514225  -0.75041753  1.0707642  -0.24892485  0.1871088   2.4870126
  0.85072696 -0.09363261 -1.7077832  -1.4241735   2.1220694   0.6601661
  2.066149    0.6744381  -1.6973271   1.0169114 ] ]
中 [ [ 1.4874371   1.2361885  -0.65242714 -0.34126204  0.57889384 -0.23008534
 -0.30447042 -0.57755953 -1.7507975  -0.81123376 -1.793821    1.1634924
 -1.3117273  -2.5925312  -0.09789076 -0.9175423  -1.2740778  -0.10426655
 -1.0767089  -2.4226363  -2.5593858  -0.3632854  -2.9014764  -1.2972332
 -0.47120202 -0.43429497  1.5338455  -0.04679739 -0.6710644  -0.3594158
  0.19282657  3.6521492  -0.92375535  1.4474547   3.8261755  -1.9815593
  0.447895    0.19661807 -0.46812743  1.6137925  -5.175673    3.2903085
 -1.316274   -3.0216594   1.1939027  -0.09521949 -0.5766235  -2.2970293
  1.2614003   2.533164   -3.493826    1.6417836   1.1442548   1.4973636
 -2.3305657   1.1097547  -1.1043872   1.1209316  -1.3346034  -2.21357
  3.2227323   1.248244   -1.1044198  -0.26365292 -0.5852314  -1.1633244
  2.2887418   1.0944943   0.03403454 -3.1973522  -0.07695059 -1.8952466
 -2.0021343  -0.9144443   2.0054252  -1.1020823  -1.0487053   3.0473692
  1.7026799  -0.8561669  -0.61132085  0.48333165  0.35812032  0.08411325
 -0.31317002 -1.3281633  -1.668745    0.05250288  1.6916655  -2.6553738
 -0.35415223 -0.31771886  0.18930124  0.3642429   0.06295905 -1.6075228
  1.0787768  -1.5568794  -1.8582122   2.0176907 ] ]
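The ten vectors printed first belong to very frequent words such as 的, 在, 是 and 了. This is expected. By default gensim's `Word2Vec` sorts its vocabulary by descending corpus frequency (`sorted_vocab=1`), so the head of `index2word` / `index_to_key` always holds the most common tokens. The sketch below mimics that ordering with `collections.Counter` on a tiny, hypothetical, already word-segmented corpus. The function name `build_index_to_key` and the `min_count=2` value were chosen only for this toy example. They are not gensim internals or the setting used for the Wikipedia model.

```python
from collections import Counter

# Hypothetical, already word-segmented toy corpus (one token list per sentence).
SENTENCES = [
    ["北京", "是", "中国", "的", "首都"],
    ["铁路", "是", "重要", "的", "交通", "工具"],
    ["中药", "在", "中国", "有", "悠久", "的", "历史"],
    ["高铁", "在", "中国", "的", "发展", "很", "快"],
]


def build_index_to_key(sentences, min_count=2):
    """Drop words rarer than min_count, then order the rest by descending frequency."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return [word for word, n in counts.most_common() if n >= min_count]


print(build_index_to_key(SENTENCES))  # → ['的', '中国', '是', '在']
```

As in the real model, the particle 的 comes first simply because it occurs most often. The rest of the output lists the ten nearest neighbours returned for 铁路, 中药 and 普京, in that order.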

('高铁', 0.8286420106887817)
('客运专线', 0.8204636573791504)
('高速铁路', 0.8059936761856079)
('城际', 0.803480327129364)
('支线', 0.8015587329864502)
('枢纽', 0.7872340679168701)
('环线', 0.7759756445884705)
('干线', 0.7744237184524536)
('通车', 0.774136483669281)
('胶济', 0.7733167409896851)

('草药', 0.8472081422805786)
('气功', 0.8342577219009399)
('生药', 0.8325771689414978)
('矿物', 0.8319119215011597)
('药用', 0.829121470451355)
('调味', 0.8272895812988281)
('中药材', 0.8242448568344116)
('有机合成', 0.8207097053527832)
('名贵', 0.8184367418289185)
('工艺品', 0.8182870149612427)

('布什', 0.8476501703262329)
('李光耀', 0.79795241355896)
('尼克森', 0.7966246604919434)
('里根', 0.7949744462966919)
('歐巴馬', 0.784071683883667)
('奥巴马', 0.7817918658256531)
('希特勒', 0.779426097869873)
('希拉里', 0.7774235010147095)
('巴拉克', 0.7772145867347717)
('雷根', 0.7767146229743958)

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel_launcher.py:25: DeprecationWarning: Call to deprecated `most_similar` (Method will be removed in 4.0.0, use self.wv.most_similar() instead).
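The scores in the three lists above are cosine similarities. gensim's `most_similar` compares unit-normalized word vectors and leaves the query word out of its own results, so every score lies between -1 and 1. The pure-Python sketch below reproduces that ranking on a small hypothetical vocabulary. The 3-dimensional vectors are invented for illustration and do not come from the trained model, whose vectors have 100 dimensions. gensim performs the same computation with NumPy over all 240560 words.

```python
import math

# Hypothetical toy vectors; the real model stores 100-dimensional vectors.
TOY_VECTORS = {
    "铁路": [0.9, 0.1, 0.0],
    "高铁": [0.8, 0.2, 0.1],
    "支线": [0.7, 0.0, 0.3],
    "中药": [0.0, 0.9, 0.2],
    "草药": [0.1, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=10):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]  # skip the query itself
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


for word, score in most_similar("铁路", TOY_VECTORS, topn=3):
    print(word, round(score, 3))
```

With these toy vectors, 高铁 (about 0.984) and 支线 (about 0.914) rank far above 草药 (about 0.218). The real model's neighbours for 铁路 show the same pattern: words used in railway contexts score highest.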
