Running the code for the paper "Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings"

Where to download the paper and code

GitHub source code
Paper PDF

Trying out the code

Downloading the pretrained models

After downloading the source code from GitHub, you will notice that the trained models are missing. Open README.md in the root directory and you will see the following content:

EmbedKGQA

This is the code for our ACL 2020 paper Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings

Video

[Video thumbnail image]

Instructions

In order to run the code, first download data.zip and pretrained_model.zip from https://drive.google.com/drive/folders/1RlqGBMo45lTmWz9MUPTq-0KcjSd3ujxc?usp=sharing. Unzip these files in the main directory.

MetaQA

Change to directory ./KGQA/LSTM. Following is an example command to run the QA training code

python3 main.py --mode train --relation_dim 200 --hidden_dim 256 \
--gpu 2 --freeze 0 --batch_size 128 --validate_every 5 --hops 2 --lr 0.0005 --entdrop 0.1 --reldrop 0.2  --scoredrop 0.2 \
--decay 1.0 --model ComplEx --patience 5 --ls 0.0 --kg_type half
WebQuestionsSP

Change to directory ./KGQA/RoBERTa. Following is an example command to run the QA training code

python3 main.py --mode train --relation_dim 200 --do_batch_norm 0 \
--gpu 2 --freeze 1 --batch_size 16 --validate_every 10 --hops webqsp_half --lr 0.00002 --entdrop 0.0 --reldrop 0.0 --scoredrop 0.0 \
--decay 1.0 --model ComplEx --patience 20 --ls 0.0 --l3_reg 0.001 --nb_epochs 200 --outfile half_fbwq

Note: This will run the code in the vanilla setting without relation matching; relation matching has to be done separately.

Also, please note that this implementation uses embeddings created through libkge (https://github.com/uma-pi1/kge). This is a very helpful library, and I would suggest training embeddings through it, since it supports sparse embeddings and shared negative sampling to speed up learning for large KGs like Freebase.

Dataset creation

MetaQA
KG dataset

There are 2 datasets: MetaQA_full and MetaQA_half. The full dataset contains the original kb.txt as train.txt with duplicate triples removed. The half dataset contains only 50% of the triples (randomly selected without replacement).

There are some lines like 'entity NOOP entity' in the half dataset's train.txt. This is because, when the triples were removed, some entities lost all of their triples, so any KG embedding implementation would fail to produce an embedding vector for them from the train.txt file alone. Including such 'NOOP' triples adds no extra information about these entities from the KG; they are there only so that any embedding implementation can be used directly and will generate some random vector for them.
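As an illustration (this is not part of the repository), here is a minimal Python sketch of how such NOOP padding could be generated, assuming tab-separated train.txt files at the paths shown:

```python
# Illustrative sketch only (not the authors' script): pad the halved KG with
# 'entity NOOP entity' triples so every entity of the full KG still appears
# in train.txt. File paths are assumptions.
def read_triples(path):
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

def entity_set(triples):
    return {e for h, _, t in triples for e in (h, t)}

full = read_triples("MetaQA_full/train.txt")
half = read_triples("MetaQA_half/train.txt")
missing = entity_set(full) - entity_set(half)  # entities that lost all triples

with open("MetaQA_half/train.txt", "a", encoding="utf-8") as f:
    for e in sorted(missing):
        # NOOP carries no KG information; it only guarantees the embedding
        # implementation assigns (an essentially random) vector to e.
        f.write(f"{e}\tNOOP\t{e}\n")
```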

QA Dataset

There are 5 files for each dataset (1, 2 and 3 hop)

  • qa_train_{n}hop_train.txt
  • qa_train_{n}hop_train_half.txt
  • qa_train_{n}hop_train_old.txt
  • qa_dev_{n}hop.txt
  • qa_test_{n}hop.txt

Out of these, qa_dev_{n}hop.txt, qa_test_{n}hop.txt and qa_train_{n}hop_train_old.txt are exactly the same as the original MetaQA dev, test and train files respectively.

For qa_train_{n}hop_train.txt and qa_train_{n}hop_train_half.txt, we have added each triple (h, r, t) in the form (head entity, question, answer). This is to prevent the model from 'forgetting' the entity embeddings while it is training the QA model on the QA dataset. qa_train_{n}hop_train.txt contains all triples, while qa_train_{n}hop_train_half.txt contains only the triples from MetaQA_half.
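For concreteness, a minimal sketch of this augmentation (again, not the authors' script). MetaQA QA files use the format of a question with the head entity in brackets, a tab, then the answer; the exact question template used for the added triples is an assumption here:

```python
# Illustrative sketch only: append each KG triple (h, r, t) to the QA
# training file in MetaQA's "question<TAB>answer" format, with the head
# entity in brackets. The question template is an assumption.
with open("MetaQA_half/train.txt", encoding="utf-8") as kg, \
     open("qa_train_2hop_train_half.txt", "a", encoding="utf-8") as qa:
    for line in kg:
        h, r, t = line.rstrip("\n").split("\t")
        qa.write(f"[{h}] {r}\t{t}\n")
```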

WebQuestionsSP
KG dataset

There are 2 datasets: fbwq_full and fbwq_half

Creating fbwq_full: We restrict the KB to be a subset of Freebase which contains all facts that are within 2-hops of any entity mentioned in the questions of WebQuestionsSP. We further prune it to contain only those relations that are mentioned in the dataset. This smaller KB has 1.8 million entities and 5.7 million triples.
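For concreteness, a rough sketch of this restriction, assuming "within 2 hops" means reachable over undirected edges and that both endpoints of a kept fact must be reachable (the authors' exact procedure may differ):

```python
# Illustrative sketch only: restrict a KB to facts within 2 hops of the
# question entities, then prune to relations mentioned in the dataset.
from collections import defaultdict

def restrict_kb(triples, seed_entities, allowed_relations, hops=2):
    adj = defaultdict(set)                    # undirected adjacency
    for h, _, t in triples:
        adj[h].add(t)
        adj[t].add(h)
    reachable = set(seed_entities)
    frontier = set(seed_entities)
    for _ in range(hops):                     # breadth-first expansion
        frontier = {n for e in frontier for n in adj[e]} - reachable
        reachable |= frontier
    return [(h, r, t) for h, r, t in triples
            if h in reachable and t in reachable and r in allowed_relations]
```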

Creating fbwq_half: We randomly sample 50% of the edges from fbwq_full.
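A minimal sketch of this step (the authors' random seed is unknown):

```python
# Illustrative sketch only: build fbwq_half by sampling 50% of the edges
# of fbwq_full without replacement.
import random

random.seed(0)  # for reproducibility; the original seed is not published
with open("fbwq_full/train.txt", encoding="utf-8") as f:
    edges = f.readlines()

with open("fbwq_half/train.txt", "w", encoding="utf-8") as f:
    f.writelines(random.sample(edges, len(edges) // 2))
```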

QA Dataset

Same as the original WebQuestionsSP QA dataset.

Citation:

Please cite the following paper if you use this code in your work.

@inproceedings{saxena-etal-2020-improving,
    title = "Improving Multi-hop Question Answering over Knowledge Graphs using Knowledge Base Embeddings",
    author = "Saxena, Apoorv  and
      Tripathi, Aditay  and
      Talukdar, Partha",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.412",
    doi = "10.18653/v1/2020.acl-main.412",
    pages = "4498--4507"
}

For any clarification, comments, or suggestions please create an issue or contact Apoorv.

From the above, we should download the pretrained models from https://drive.google.com/drive/folders/1RlqGBMo45lTmWz9MUPTq-0KcjSd3ujxc?usp=sharing and unzip them into the EmbedKGQA-master directory.
Note 1: This is a Google Drive folder close to 10 GB in size, so downloading it from mainland China requires a VPN.
Note 2: The README says to unzip the files in the main directory, but inspecting the code shows that they should actually be placed under EmbedKGQA-master.
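If you want to script the extraction, a minimal sketch (assuming both zip files sit next to the extracted repository):

```python
# Minimal sketch: extract both downloaded archives into EmbedKGQA-master/.
import zipfile

for name in ("data.zip", "pretrained_model.zip"):
    with zipfile.ZipFile(name) as zf:
        zf.extractall("EmbedKGQA-master")
```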

Running the model training code (LSTM)

Running the model for the first time, you are told that an up-to-date version of the PyTorch package is missing. Out of habit, I ran in the background:

pip install torch -i <mirror URL>

The download fails, however, with the following error:

ModuleNotFoundError: No module named 'tools.nnwrap'

After some searching, it turns out this error appears when pip cannot find a matching binary wheel and falls back to an ancient placeholder torch package whose setup script fails. The fix is to go to the official site https://pytorch.org/, select the version you want, and let the page generate the pip command automatically.
[Screenshot: the install-command selector on pytorch.org]
Then run the generated pip command in your console. For example, at the time of writing, the selector produces something like `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` for a Linux CUDA 11.8 build; your command will differ by platform and version.

Running the model usage code

No problems so far; this section may be updated later.
