Getting started with OpenKE (PyTorch), Tsinghua's open-source knowledge graph toolkit, from GitHub: bugs and fixes. Setup: Linux, Python 3.5...

Author: 安新宇

Machine setup: Python 3.5, with a virtual environment

First, install the OpenKE package

git clone -b OpenKE-PyTorch https://github.com/thunlp/OpenKE

remote: Enumerating objects: 1033, done.

error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.

fatal: The remote end hung up unexpectedly

fatal: early EOF

fatal: index-pack failed

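The clone failed with the errors above. After searching, the explanation I found was that git's HTTP buffer was too small for a repository of this size.

Fix: raise the buffer to 500 MiB (524288000 bytes) and clone again: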

git config --global http.postBuffer 524288000

remote: Enumerating objects: 1033, done.

remote: Total 1033 (delta 0), reused 0 (delta 0), pack-reused 1033

Receiving objects: 100% (1033/1033), 276.84 MiB | 84.00 KiB/s, done.

Resolving deltas: 100% (501/501), done.

This time there were no errors.
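If the full clone keeps failing on a slow link, a shallow clone is another option worth trying, since `git clone --depth` fetches only recent history and can reduce the amount of data transferred. The sketch below only builds the command. `build_clone_command` is a hypothetical helper name, and I have not measured how much a depth of 1 saves for this repository.

```python
import subprocess

def build_clone_command(url, branch, depth=1):
    """Return a git command that fetches only the latest `depth` commits of one branch."""
    return ["git", "clone", "--depth", str(depth), "-b", branch, url]

cmd = build_clone_command("https://github.com/thunlp/OpenKE", "OpenKE-PyTorch")
print(" ".join(cmd))
# To actually run it (needs network access): subprocess.run(cmd, check=True)
```

The command is kept as a list so it can be passed to `subprocess.run` without shell quoting issues.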

cd OpenKE # enter the OpenKE directory

bash make.sh

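Step one, installing the OpenKE module, is complete.

Training a knowledge graph model: entering the training code from the Linux shell

I followed a Jianshu post that translates the content of the OpenKE home page. Here is the link: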

https://www.jianshu.com/p/ac603764b2a3

import config

import models

import json

import numpy as np

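Error 1: an error caused by the virtual environment

My PyTorch is installed inside a virtual environment, which caused problems during training.

Running import config directly failed with

"ModuleNotFoundError: No module named 'torch'"

because the Python I was running could not see the torch installed in the virtual environment. So I copied the whole OpenKE folder into the virtual environment's directory (plain copy and paste is enough). After activating the virtual environment, import config works directly.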
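Before moving any folders, it can help to confirm which interpreter is actually running and whether it can see torch. This is a minimal sketch using only the standard library; `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys

def module_available(name):
    """True if `name` can be found by the interpreter that is running this code."""
    return importlib.util.find_spec(name) is not None

# Path of the running interpreter; inside an activated virtual environment
# this should point into that environment's directory.
print("interpreter:", sys.executable)
print("torch importable:", module_available("torch"))
```

If the interpreter path points outside the virtual environment, the environment is probably not activated in that shell.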

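Error 2: import config picks up a different config module

Calling con.set_in_path("./benchmarks/FB15K/") failed with a message saying that set_in_path does not exist.

My Python 3.5 environment already provided a module named config, so by default import config loaded that module instead of OpenKE's config.

Solution 1: add the OpenKE folder to the module search path

Before import config, add

import sys

sys.path.insert(0, "./OpenKE")  # put OpenKE first so its config is found before any other

Solution 2: inside the virtual environment, change into the OpenKE folder

Type

cd OpenKE

This resolves both of the errors above.

Then type python

to enter the Python interpreter.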
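Where a folder lands on sys.path decides which module of a given name gets imported, because Python searches the entries in order. The sketch below shows this with two temporary folders that each hold a module of the same name. `demo_config` and both folders are made up for the demonstration and have nothing to do with OpenKE's real files.

```python
import importlib
import os
import sys
import tempfile

def write_module(directory, name, body):
    """Create <directory>/<name>.py containing `body`."""
    with open(os.path.join(directory, name + ".py"), "w") as f:
        f.write(body)

def import_fresh(name):
    """Import `name`, discarding any cached copy so sys.path is searched again."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    return importlib.import_module(name)

other_dir = tempfile.mkdtemp()   # stands in for wherever the unrelated module lives
openke_dir = tempfile.mkdtemp()  # stands in for the OpenKE folder
write_module(other_dir, "demo_config", "SOURCE = 'other'\n")
write_module(openke_dir, "demo_config", "SOURCE = 'openke'\n")

sys.path.insert(0, other_dir)
sys.path.append(openke_dir)      # appended: searched last, so the other copy wins
appended_result = import_fresh("demo_config").SOURCE

sys.path.insert(0, openke_dir)   # inserted at the front: searched first
inserted_result = import_fresh("demo_config").SOURCE

print(appended_result, inserted_result)  # other openke
```

Starting python from inside the OpenKE folder has a similar effect, since an interactive session searches the current directory first.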

Error 3: many parameter names in the code on the GitHub page were changed slightly in the config source

import config

import models

import json

import numpy as np

con = config.Config()

con.set_in_path('./benchmarks/FB15K/')

con.set_work_threads(4)

con.set_train_times(500)

con.set_nbatches(100)

con.set_alpha(0.001)

con.set_margin(1)

con.set_bern(0)

con.set_export_files("./res/model.vec.tf", 0)

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

AttributeError: 'Config' object has no attribute 'set_export_files'
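When a setter is missing like this, listing the methods the object really has is quicker than guessing names. Here is a minimal sketch: `list_setters` is a hypothetical helper, and `DemoConfig` is a stand-in class so the example runs without OpenKE. On a real setup you would pass the object returned by `config.Config()` instead.

```python
def list_setters(obj):
    """Names of all callables on `obj` whose name starts with 'set_'."""
    return sorted(
        name for name in dir(obj)
        if name.startswith("set_") and callable(getattr(obj, name))
    )

class DemoConfig:
    """Hypothetical stand-in for config.Config, only for illustration."""
    def set_in_path(self, path):
        self.in_path = path

    def set_bern(self, bern):
        self.bern = bern

print(list_setters(DemoConfig()))  # ['set_bern', 'set_in_path']
```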

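Cause: many of the parameter names in the code on the GitHub page were changed slightly in the config source.

The adjusted training code follows.

It can be run in Jupyter Notebook, Spyder, or PyCharm.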

import config

from models import *

import json

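## My server has no CUDA, so I commented out the os lines. On a machine with CUDA installed, uncomment them.

# import os

# os.environ['CUDA_VISIBLE_DEVICES']='5'

con = config.Config()

con.set_use_gpu(False)  # no CUDA on my server; set to True if a GPU is available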

con.set_in_path("./benchmarks/FB15K/")

con.set_work_threads(8)

con.set_train_times(1000)

con.set_nbatches(100)

con.set_alpha(0.001)

con.set_bern(0)

con.set_dimension(100)

con.set_margin(1.0)

con.set_ent_neg_rate(1)

con.set_rel_neg_rate(0)

con.set_opt_method("SGD")

con.set_save_steps(100)

con.set_valid_steps(100)

con.set_early_stopping_patience(10)

con.set_checkpoint_dir("./checkpoint")

con.set_result_dir("./result")

con.set_test_link(True)

con.set_test_triple(True)

con.init()

con.set_train_model(TransE)

con.train()

The test code is:

import config

from models import *

import json

## My server has no CUDA, so I commented out the os lines. On a machine with CUDA installed, uncomment them.

# import os

# os.environ['CUDA_VISIBLE_DEVICES']='6'

con = config.Config()

con.set_use_gpu(False)

#Input training files from benchmarks/FB15K/ folder.

con.set_in_path("./benchmarks/FB15K/")

#True: Input test files from the same folder.

con.set_result_dir("./result/")

con.set_test_link(True)

con.set_test_triple(True)

con.init()

con.set_test_model(TransE)

con.test()

Error 4: deprecation warning under PyTorch 1.0

/home/chenmengyuan/py35/OpenKE/models/TransE.py:19: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.

nn.init.xavier_uniform(self.ent_embeddings.weight.data)

/home/chenmengyuan/py35/OpenKE/models/TransE.py:20: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.

nn.init.xavier_uniform(self.rel_embeddings.weight.data)

Solution: open TransE.py in the models folder, find lines 19 and 20, and add a trailing underscore to nn.init.xavier_uniform so that it reads nn.init.xavier_uniform_.
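If other model files trigger the same warning, the rename can be scripted instead of edited by hand. The sketch below patches source text with a regular expression. `patch_source` is a hypothetical helper, and I am assuming the deprecated call always appears spelled as `nn.init.xavier_uniform(`.

```python
import re

# Matches the deprecated name only when it is directly followed by "(",
# so calls already written as xavier_uniform_( are left untouched.
DEPRECATED_CALL = re.compile(r"\bnn\.init\.xavier_uniform\(")

def patch_source(source):
    """Return `source` with nn.init.xavier_uniform( rewritten to nn.init.xavier_uniform_(."""
    return DEPRECATED_CALL.sub("nn.init.xavier_uniform_(", source)

old_line = "nn.init.xavier_uniform(self.ent_embeddings.weight.data)"
print(patch_source(old_line))
```

To apply it to a file, read the file's text, pass it through `patch_source`, and write the result back, ideally after making a backup copy.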

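Results:

Time cost:

Training one TransE model takes roughly an hour, and testing takes under half an hour. That is slow, but the wait is a good time to read the source code.

A small suggestion:

I hope a mini FB15K dataset appears someday, something that finishes in about 10 minutes. The point is not the results but checking that all the code runs end to end. I fell into exactly this trap:

training finished, then testing raised an error, and nothing from the training run had been saved, so after fixing the error I had to run everything again...

Of course, you can set the number of epochs to 1 for a very fast run to check that the code works. I did not think of testing the code first, and it cost me.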
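One way to approximate a mini dataset yourself is to truncate the training triples. The sketch below assumes a file layout in which the first line holds the number of triples and every following line is one triple, which is how I understand OpenKE's train2id.txt to be laid out. `truncate_triples` is a hypothetical helper, the sample data is made up, and I have not verified that OpenKE trains correctly on a truncated file. The other benchmark files may need matching treatment.

```python
def truncate_triples(lines, keep):
    """Keep the first `keep` triples and rewrite the count header to match.

    Assumes lines[0] is the triple count and each later line is one triple.
    """
    triples = [line for line in lines[1:] if line.strip()][:keep]
    return [str(len(triples))] + triples

sample = ["4", "0 1 0", "2 3 1", "4 5 0", "6 7 2"]  # made-up data
print(truncate_triples(sample, 2))  # ['2', '0 1 0', '2 3 1']
```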

Training results:

The trained model does not seem very good.

Training log:

Epoch 989 | loss: 1045.272685

Epoch 990 | loss: 980.651575

Epoch 991 | loss: 1002.198379

Epoch 992 | loss: 980.045353

Epoch 993 | loss: 959.496778

Epoch 994 | loss: 985.212145

Epoch 995 | loss: 1008.031349

Epoch 996 | loss: 1005.073647

Epoch 997 | loss: 990.986805

Epoch 998 | loss: 945.992249

Epoch 999 | loss: 981.005116
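In these last epochs the loss moves between roughly 946 and 1045 with no clear downward trend, so a single epoch says little. Averaging over a window of the log gives a steadier picture. This sketch parses lines in the format shown above; `parse_losses` is a hypothetical helper, and the three log lines are copied from the output above.

```python
import re

# One log line looks like: "Epoch 999 | loss: 981.005116"
LOG_LINE = re.compile(r"Epoch\s+(\d+)\s*\|\s*loss:\s*([0-9.]+)")

def parse_losses(text):
    """Return a list of (epoch, loss) pairs found in the training log text."""
    return [(int(epoch), float(loss)) for epoch, loss in LOG_LINE.findall(text)]

log = """Epoch 997 | loss: 990.986805
Epoch 998 | loss: 945.992249
Epoch 999 | loss: 981.005116"""

pairs = parse_losses(log)
mean_loss = sum(loss for _, loss in pairs) / len(pairs)
print("last epoch:", pairs[-1][0], "mean loss:", round(mean_loss, 2))
```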

TransE model result: 68%

The test results were fairly good: the best hits@10 reached 80%.
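For readers new to the metric, hits@10 is the fraction of test triples for which the correct entity is ranked among the top 10 candidates. The sketch below computes it from a list of ranks. `hits_at_k` is a hypothetical helper and the ranks are made up, not OpenKE output.

```python
def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked within the top k (ranks start at 1)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)

example_ranks = [1, 3, 12, 7, 150]  # made-up ranks for illustration
print(hits_at_k(example_ranks))     # 0.6
```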

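That concludes part one, covering the technical details and bugs.

Part two will cover the code details, the underlying principles, and how to interpret the results.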
