2019.10.18 note

Quaternion Knowledge Graph Embeddings

In this work, the authors move beyond traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embedding. More specifically, quaternion embeddings (hypercomplex-valued embeddings with three imaginary components) are used to represent entities, and relations are modeled as rotations in quaternion space. Experimental results demonstrate that their method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks.
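
A minimal NumPy sketch of this kind of quaternion scoring: normalize the relation to a unit quaternion, rotate the head via the Hamilton product, and score against the tail with a quaternion inner product. The function names and the per-coordinate normalization are my own choices for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def hamilton_product(q, p):
    """Component-wise Hamilton product of two quaternion embeddings.
    q and p are tuples (real, i, j, k) of arrays with shape (d,)."""
    a0, a1, a2, a3 = q
    b0, b1, b2, b3 = p
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def score(head, relation, tail):
    """Rotate the head by the unit relation quaternion, then score the
    triple with a quaternion inner product against the tail."""
    norm = np.sqrt(sum(x**2 for x in relation)) + 1e-9  # per-coordinate norm
    r_unit = tuple(x / norm for x in relation)          # unit quaternions
    rotated = hamilton_product(head, r_unit)
    return sum(float(c @ t) for c, t in zip(rotated, tail))

rng = np.random.default_rng(0)
d = 8
h, r, t = (tuple(rng.normal(size=d) for _ in range(4)) for _ in range(3))
print(score(h, r, t))  # higher score = more plausible (head, relation, tail)
```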

Code: github

MixMatch: A Holistic Approach to Semi-Supervised Learning

In this work, they propose a semi-supervised learning algorithm that combines three methods: consistency regularization (the probabilities predicted by the model should be consistent under stochastic data augmentation across two runs, i.e., two random seeds of the augmentation), entropy minimization, and traditional regularization (weight decay).
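
A minimal sketch of the label-guessing step that realizes the first two ideas: average the model's predictions over K augmentations of the same unlabeled example, then sharpen the average with a temperature (T = 0.5 and K = 2 are the paper's reported defaults; the function name is mine).

```python
import numpy as np

def guess_label(probs_per_aug, T=0.5):
    """Average the model's class probabilities over K augmentations of one
    unlabeled example (consistency), then sharpen with temperature T
    (entropy minimization)."""
    p = np.mean(probs_per_aug, axis=0)  # shape (L,)
    p = p ** (1.0 / T)
    return p / p.sum()                  # renormalize to a distribution

# predictions from K = 2 stochastic augmentations of the same input
preds = np.array([[0.2, 0.5, 0.3],
                  [0.1, 0.7, 0.2]])
print(guess_label(preds))  # peakier (lower-entropy) than the plain average
```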

Given a batch $\mathcal{X}$ of labeled examples with corresponding one-hot targets (representing one of $L$ possible labels) and an equally sized batch $\mathcal{U}$ of unlabeled examples, MixMatch produces a processed batch of augmented labeled examples $\mathcal{X}'$ and a batch of augmented unlabeled examples with "guessed" labels $\mathcal{U}'$. These batches are then used to compute separate labeled and unlabeled loss terms. More formally, the combined loss $\mathcal{L}$ for semi-supervised learning is computed as:
$\mathcal{X}', \mathcal{U}' = \text{MixMatch}(\mathcal{X}, \mathcal{U}, T, K, \alpha)$
$\mathcal{L}_{\mathcal{X}} = \frac{1}{|\mathcal{X}'|}\sum_{x,p \in \mathcal{X}'} H(p,\ p_{\text{model}}(y \mid x; \theta))$
$\mathcal{L}_{\mathcal{U}} = \frac{1}{L|\mathcal{U}'|}\sum_{u,q \in \mathcal{U}'} \|q - p_{\text{model}}(y \mid u; \theta)\|_2^2$
$\mathcal{L} = \mathcal{L}_{\mathcal{X}} + \lambda_{\mathcal{U}} \mathcal{L}_{\mathcal{U}}$

where $H(\cdot,\cdot)$ is the cross-entropy and $\lambda_{\mathcal{U}}$ weights the unlabeled loss.
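
A minimal NumPy sketch of this combined loss, assuming the class probabilities have already been computed; $\lambda_{\mathcal{U}}$ is a tuned hyperparameter (75 here is only a placeholder) and the toy numbers are made up.

```python
import numpy as np

def mixmatch_loss(targets_x, probs_x, guesses_u, probs_u, num_classes, lam_u=75.0):
    """L = L_X + lambda_U * L_U.
    L_X: cross-entropy between targets p and predictions on X'.
    L_U: squared L2 distance between guessed labels q and predictions on U',
         divided by the number of classes L (as in the formula above)."""
    eps = 1e-12
    l_x = -np.mean(np.sum(targets_x * np.log(probs_x + eps), axis=1))
    l_u = np.mean(np.sum((guesses_u - probs_u) ** 2, axis=1)) / num_classes
    return l_x + lam_u * l_u

# toy batch: 2 labeled and 2 unlabeled examples, L = 3 classes (numbers made up)
tx = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
px = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
qu = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
pu = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]])
print(mixmatch_loss(tx, px, qu, pu, num_classes=3))
```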

MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

  1. GCN layer: $H^{(i+1)}=\sigma(\hat{A}H^{(i)}W^{(i)})$, with $\hat{A}=D^{-1/2}(A+I)D^{-1/2}$, where $D$ is the diagonal node degree matrix and $A$ is the adjacency matrix describing the graph structure. Their proposed layer mixes powers of the adjacency matrix: $H^{(i+1)}=\textbf{concat}_j\left[\sigma(\hat{A}^{j}H^{(i)}W^{(i)}_{j})\right]$ (see the sketch after this list).
  2. They prove that standard GCNs are not capable of representing general layer-wise neighborhood mixing, whereas GCNs built from their proposed layer are.
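
A minimal NumPy sketch of one such layer; the power set (0, 1, 2) and the ReLU nonlinearity are illustrative choices on my part, not necessarily the paper's configuration.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, with self-loops added."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def mixhop_layer(A_hat, H, weights, powers=(0, 1, 2)):
    """Concatenate sigma(A_hat^j @ H @ W_j) over adjacency powers j."""
    outs = []
    for j, W in zip(powers, weights):
        out = np.linalg.matrix_power(A_hat, j) @ H @ W
        outs.append(np.maximum(out, 0.0))  # ReLU as sigma
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric adjacency, no self-loops
H = rng.normal(size=(5, 8))                  # 5 nodes, 8 input features
Ws = [rng.normal(size=(8, 4)) for _ in range(3)]
print(mixhop_layer(normalized_adjacency(A), H, Ws).shape)  # (5, 12)
```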

TransSent: Towards Generation of Structured Sentences with Discourse Marker

This paper focuses on the task of generating long structured sentences with explicit discourse markers, proposing a new task, Sentence Transfer, and a novel model architecture, TransSent. For example, "I like apples because they are sweet" decomposes as: head -> "I like apples", relation -> "because", tail -> "they are sweet".

Their assumption is similar to TransE's. They introduce three loss terms: a reconstruction loss, a distance loss, and a ratio loss. The distance loss encourages the prediction to be close to the tail, and the ratio loss encourages the prediction to end up relatively closer to the tail than to the head, i.e., it pushes the term dis(prediction, tail)/dis(prediction, head) to be small (a sketch of these two terms follows).
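
A minimal sketch of the distance and ratio terms under this TransE-style reading; the Euclidean dis() and the exact loss forms are my assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def transsent_losses(head_emb, rel_emb, tail_emb, eps=1e-9):
    """TransE-style reading: prediction = head + relation should land near
    the tail. Both the Euclidean dis() and the exact forms are assumptions."""
    pred = head_emb + rel_emb
    dist_to_tail = np.linalg.norm(pred - tail_emb)
    dist_to_head = np.linalg.norm(pred - head_emb)
    distance_loss = dist_to_tail                      # pull prediction toward the tail
    ratio_loss = dist_to_tail / (dist_to_head + eps)  # keep it away from the head
    return distance_loss, ratio_loss

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 16))  # toy sentence-part embeddings
print(transsent_losses(h, r, t))
```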

The dataset: github

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

  1. In Figure 1, they compare autoregressive models, non-autoregressive models, and their proposed model. Non-autoregressive seq2seq models generate all tokens in one pass, which improves efficiency through parallel processing on hardware such as GPUs.

[Figure 1: comparison of autoregressive, non-autoregressive, and FlowSeq generation]

  2. The model architecture is shown in Figure 2.

[Figure 2: the FlowSeq model architecture]

  3. This work also makes use of: variational inference for training (an ELBO with a reconstruction term and a KL-divergence term), a normal distribution for the latent variables, actnorm in the decoder, invertible multi-head linear layers in the decoder, affine coupling layers, a neural network for predicting the target sequence length, noisy parallel decoding, and importance weighted decoding (a sketch of an affine coupling step follows this list).
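
As an illustration of one of these components, here is a minimal NumPy sketch of an affine coupling step: half of the latent vector is transformed conditioned on the other half, and the inverse is exact. The toy linear conditioner stands in for the real networks; names and shapes are placeholders, not FlowSeq's actual implementation.

```python
import numpy as np

def coupling_forward(z, conditioner):
    """Affine coupling: transform one half of z conditioned on the other.
    The Jacobian log-determinant is simply log_scale.sum()."""
    z_a, z_b = np.split(z, 2)
    log_scale, shift = conditioner(z_a)       # any neural network fits here
    z_b = z_b * np.exp(log_scale) + shift
    return np.concatenate([z_a, z_b]), log_scale.sum()

def coupling_inverse(y, conditioner):
    """Exact inverse of the forward pass, which keeps the flow invertible."""
    y_a, y_b = np.split(y, 2)
    log_scale, shift = conditioner(y_a)
    y_b = (y_b - shift) * np.exp(-log_scale)
    return np.concatenate([y_a, y_b])

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(8, 4))                  # toy linear conditioner
conditioner = lambda half: np.split(W @ half, 2)   # -> (log_scale, shift)

z = rng.normal(size=8)
y, logdet = coupling_forward(z, conditioner)
print(np.allclose(coupling_inverse(y, conditioner), z))  # True
```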

Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

  1. This work first presents a heterogeneous information network (HIN) framework for modeling short texts.

[Figure: the HIN framework for modeling short texts]

  2. Unfortunately, GCN cannot be directly applied to this HIN because of node heterogeneity: the HIN contains three types of nodes (documents, topics, and entities), each with a different feature space. To address this, they propose a heterogeneous graph convolution, which accounts for the differences between the types of information and projects each type into an implicit common space with its own transformation matrix: $H^{(l+1)}=\sigma\left(\sum_{t}A_{t}H^{(l)}_{t}W^{(l)}_{t}\right)$, where $t$ denotes the node-type index.

  3. They also propose a dual-level attention mechanism (type-level attention and node-level attention) and replace $A_{t}$ with an attention weight matrix $B_{t}$ (see the sketch after this list).
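
A minimal NumPy sketch of the heterogeneous graph convolution above; node counts and per-type feature dimensions are made up. Swapping each $A_t$ for the learned attention matrix $B_t$ would give the dual-level-attention variant.

```python
import numpy as np

def hetero_gcn_layer(A, H, W):
    """H^{(l+1)} = sigma(sum_t A_t H_t W_t): each node type t is projected
    into a common space by its own W_t before the contributions are summed."""
    out = sum(A[t] @ H[t] @ W[t] for t in A)
    return np.maximum(out, 0.0)  # ReLU as sigma

rng = np.random.default_rng(0)
n = 6                                                  # total nodes in the HIN
counts = {"document": 3, "topic": 1, "entity": 2}      # nodes per type (made up)
dims = {"document": 300, "topic": 20, "entity": 100}   # feature sizes (made up)
d_out = 16
A = {t: rng.random((n, c)) for t, c in counts.items()}  # type-t adjacency slices
H = {t: rng.normal(size=(c, dims[t])) for t, c in counts.items()}
W = {t: rng.normal(size=(dims[t], d_out)) for t in counts}
print(hetero_gcn_layer(A, H, W).shape)  # (6, 16)
```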
