乌云tail - CSDN Blog

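Original: Splitting an NMT parallel corpus into datasets

Goal: split the dataset into train, test, and val by ratio. After processing, the parallel corpus looks as shown in the figure below. Steps: randomly shuffle the dataset; split the dataset; split the parallel corpus. The code is as follows: import os import random def data_split(config, file, train_ratio=0.98, shuffle=True): """ :param config: name of the folder that contains the data file :param file: name of the data file to process (full name) :param ...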

2021-12-31 15:10:01 1072
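
The three steps listed in the entry above (shuffle, split, keep the two sides of the corpus aligned) can be sketched as follows. This is not the post's `data_split` function, whose body is cut off in the excerpt. The name `split_parallel`, the `seed` parameter, and the choice to divide the non-training remainder evenly between val and test are all assumptions made for illustration.

```python
import random


def split_parallel(src_lines, tgt_lines, train_ratio=0.98, shuffle=True, seed=0):
    """Split aligned source/target sentences into train, val, and test lists.

    Hypothetical helper: the remainder after the training share is divided
    evenly between val and test, which is an assumption, not the post's rule.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")

    # Pair the sentences first so shuffling never breaks the alignment.
    pairs = list(zip(src_lines, tgt_lines))
    if shuffle:
        random.Random(seed).shuffle(pairs)

    n_train = int(len(pairs) * train_ratio)
    n_val = (len(pairs) - n_train) // 2

    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:]
    return train, val, test
```

Each returned list holds `(source, target)` tuples, so writing them back out as two files per split is a matter of unzipping each list.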

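Original: python -m spacy download XXX fails with requests.exceptions.ConnectionError

Problem: downloading a spaCy model online fails with a network connection error. Cause: in most cases, a network problem. Solution: download the .tar.gz file from the official site (Releases · explosion/spacy-models · GitHub), then...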

2021-12-11 14:41:25 934

Original: GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation.

Problem: GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. Cause: CUDA and cuDNN are missing or were not installed correctly. Solution: based on your own GPU...

2021-12-07 16:49:29 3264
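
To make the message in the entry above easier to read, the sketch below compares a device's compute capability with the architecture list a PyTorch build reports. The helper names are hypothetical, and the rule used (a build covers a device if it contains an `sm_` target with the same major version) is a simplifying assumption, not a statement of PyTorch's exact check. `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list` are real PyTorch calls; they are imported lazily so the comparison itself runs without PyTorch.

```python
def build_covers_device(device_capability, arch_list):
    """Return True if arch_list has an sm_ target with the device's major version.

    Assumption for illustration: same-major targets are treated as compatible.
    """
    major, _minor = device_capability
    built_majors = set()
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # ignore PTX-only entries such as "compute_37"
        digits = "".join(ch for ch in arch[3:] if ch.isdigit())
        if digits:
            built_majors.add(int(digits) // 10)
    return major in built_majors


def describe_current_setup():
    """Report the local GPU and build; requires PyTorch to be installed."""
    import torch  # imported here so the helper above has no dependencies

    if not torch.cuda.is_available():
        return "CUDA is not available to this PyTorch install"
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    covered = build_covers_device(capability, archs)
    return f"device sm_{capability[0]}{capability[1]}, build {archs}, covered: {covered}"
```

With the architecture list quoted in the error message, an RTX 3060 (capability 8.6) is not covered, which matches the warning.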

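Original: Implementing Word2Vec in PyTorch (skip-gram + negative sampling)

Contents: Introduction to word2vec; Corpus processing; Data preprocessing; Training the model; Approximate training methods; Parameter settings; Prediction and visualization. Introduction to word2vec: In 2013, a team at Google released the word2vec tool. It mainly comprises two models, skip-gram and continuous bag of words (CBOW), and two efficient training methods, negative sampling and hierarchical softmax. Similar to f(x)->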

2021-12-07 16:08:00 5568 6
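
As a rough illustration of the two ideas named in the entry above, the sketch below builds skip-gram (center, context) pairs from a token list and draws negative samples from the unigram distribution raised to the 0.75 power, the weighting commonly used with word2vec. It is plain Python rather than the post's PyTorch code, and the function names, the `window` default, and the `exclude` argument are assumptions for illustration.

```python
import random
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def make_negative_sampler(tokens, power=0.75, seed=None):
    """Build a sampler that draws words with probability proportional to count**power."""
    counts = Counter(tokens)
    vocab = list(counts)
    weights = [counts[word] ** power for word in vocab]
    rng = random.Random(seed)

    def sample(k, exclude=()):
        # Drop the excluded words (for example the true context word).
        kept = [(w, p) for w, p in zip(vocab, weights) if w not in exclude]
        if not kept:
            return []
        words, probs = zip(*kept)
        return rng.choices(words, weights=probs, k=k)

    return sample
```

A training loop would then score each positive pair against its `k` sampled negatives instead of normalizing over the whole vocabulary.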

Original: No module named 'legacy'; Cannot find reference 'legacy' in '__init__.py'

Problem: torchtext has no module named "legacy"; the reference "legacy" cannot be found in "__init__.py". Cause: in v0.9.0, the following "legacy" code was moved to torchtext.legacy: torchtext.legacy.data.field, torchtext.legacy.data.batch, torchtext.legacy.data.example, torchtext.legacy.data.iterator, torchtext.legacy.d

2021-11-25 20:27:53 3188 3
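
Because the location of these classes depends on the installed torchtext version, one defensive option is to try the candidate module paths in order. The sketch below is only that, a sketch: `first_module_with` and `load_field_module` are hypothetical helper names, and it assumes `Field` lives in either `torchtext.legacy.data` (v0.9.0 and after, per the entry above) or `torchtext.data` (before the move). It returns `None` when neither location is available.

```python
import importlib


def first_module_with(attr, candidates):
    """Return the first importable module in candidates that defines attr."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module path does not exist in this installation
        if hasattr(module, attr):
            return module
    return None


def load_field_module():
    """Look for Field under the post-0.9.0 path first, then the older one."""
    return first_module_with("Field", ("torchtext.legacy.data", "torchtext.data"))
```

Calling `load_field_module()` and checking for `None` makes the version mismatch explicit instead of failing at import time.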

Original: TypeError: annotate() missing 1 required positional argument: 'text'

TypeError: annotate() missing 1 required positional argument: 'text'

2021-11-23 16:51:06 10945 5

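Original: Visualizing word vectors (2D)

How do we visualize the word vectors (a .txt file) obtained from a word2vec model? The steps are: 1) import matplotlib.pyplot, KMeans, and PCA; 2) read the word-vector file to get an array of all words and a dict mapping each word to its vector; 3) use a for loop to collect the vectors of the selected words into an array; 4) reduce the high-dimensional vectors to two dimensions and use them as the X and Y coordinates of the plot; 5) set the dimensions, colors, and fonts, draw the plot, and finally annotate each word. The code is as follows: import matplotlib.py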

2021-11-23 16:33:00 1500
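
The five steps in the entry above can be sketched roughly as follows. The post's own code is cut off in the excerpt, so this is an independent sketch under stated assumptions: the vector file is assumed to hold one `word v1 v2 ...` entry per line, optionally preceded by a two-field header line; the function names and `out_path` default are hypothetical; and the KMeans coloring step is left out for brevity. `plot_words_2d` needs matplotlib and scikit-learn and at least two selected words.

```python
def load_word_vectors(lines):
    """Parse 'word v1 v2 ...' lines into (list of words, word -> vector dict)."""
    words, vectors = [], {}
    for line in lines:
        parts = line.split()
        # Skip blank lines and a possible 'vocab_size dim' header line.
        if len(parts) < 3:
            continue
        word = parts[0]
        words.append(word)
        vectors[word] = [float(x) for x in parts[1:]]
    return words, vectors


def plot_words_2d(vectors, selected, out_path="word_vectors_2d.png"):
    """Project the selected words to 2D with PCA and save an annotated scatter plot."""
    # Imported lazily so load_word_vectors works without these packages.
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    chosen = [w for w in selected if w in vectors]
    points = PCA(n_components=2).fit_transform([vectors[w] for w in chosen])

    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(points[:, 0], points[:, 1], c="tab:blue")
    for word, (x, y) in zip(chosen, points):
        # The label is passed positionally, so the keyword name does not matter.
        ax.annotate(word, xy=(x, y), fontsize=10)
    fig.savefig(out_path)
    plt.close(fig)
    return points
```

A typical call reads the file with `open(path, encoding="utf-8")`, passes the file object to `load_word_vectors`, and then hands the resulting dict and a list of words of interest to `plot_words_2d`.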

qq_24668285's blog