2021SC@SDUSC
Work Summary
My two teammates and I have now finished analyzing all of the key and core code of the knowledge-graph-to-text generation algorithm. At a meeting early in the project, we agreed that the algorithm roughly divides into three stages: training → generation → evaluation, corresponding to the source files train.py → generator.py → eval.py. The three scripts must be run separately, in order, to obtain the final result.
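The three-stage run order can be sketched as the following command sequence. This is illustrative only: the actual command-line arguments (checkpoint paths, data paths, etc.) depend on how each script's argument parser is set up in the repository.

```shell
# Stage 1: train the graph-to-text model and save checkpoints.
python train.py

# Stage 2: load a trained checkpoint and decode one text per input knowledge graph.
python generator.py

# Stage 3: score the generated texts against the reference abstracts.
python eval.py
```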
I am mainly responsible for analyzing the training stage on the dataset. To summarize the logic and structure of the code, I refer to the description in the paper:
“To effect our study, we use a collection of abstracts from a corpus of scientific articles (Ammar et al., 2018). We extract entity, coreference, and relation annotations for each abstract with a state-of-the-art information extraction system (Luan et al., 2018), and represent the annotations as a knowledge graph which collapses co-referential entities. An example of a text and graph are shown in Figure 1. We use these graph/text pairs to train a novel attention-based encoder-decoder model for knowledge-graph-to-text generation. Our model, GraphWriter, extends the successful Transformer for text encoding (Vaswani et al., 2017) to graph-structured inputs, building on the recent Graph Attention Network architecture.”
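To make the "attention over graph-structured inputs" idea concrete before digging into train.py, here is a minimal single-head sketch in the spirit of a Graph Attention Network layer. This is not the GraphWriter code itself: the shapes, the shared projection `W`, and the attention vector `a` are assumptions chosen for illustration, and real implementations add multiple heads, edge/relation features, and learned parameters.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    """LeakyReLU, as used on GAT attention scores."""
    return np.where(x > 0, x, slope * x)

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_attention(h, adj, W, a):
    """One simplified, single-head graph-attention layer.

    h:   (N, F)  node features
    adj: (N, N)  0/1 adjacency matrix (self-loops included)
    W:   (F, F') shared linear projection
    a:   (2*F',) attention vector
    """
    z = h @ W                                   # project every node
    out = np.zeros_like(z)
    for i in range(z.shape[0]):
        nbrs = np.nonzero(adj[i])[0]            # neighborhood of node i
        # unnormalized score for each neighbor, from concatenated features
        scores = leaky_relu(
            np.array([np.concatenate([z[i], z[j]]) @ a for j in nbrs])
        )
        alpha = softmax(scores)                 # normalize over the neighborhood
        out[i] = alpha @ z[nbrs]                # attention-weighted aggregation
    return out
```

Note the key difference from the Transformer's self-attention: each node attends only over its graph neighborhood (the nonzero entries of its adjacency row), rather than over all positions. With a self-loops-only adjacency, the layer degenerates to the plain projection `h @ W`, since each softmax has a single entry.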