2019.10.18 note

Quaternion Knowledge Graph Embeddings

In this work, the authors move beyond traditional complex-valued representations, introducing more expressive hypercomplex representations to model entities and relations for knowledge graph embedding. More specifically, quaternion embeddings (hypercomplex-valued embeddings with three imaginary components) are used to represent entities, and relations are modeled as rotations in quaternion space. Experimental results demonstrate that their method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks.
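
A minimal NumPy sketch of this kind of quaternion scoring: normalize the relation to a unit quaternion, rotate the head via the Hamilton product, and score against the tail with a quaternion inner product. The function names and the per-coordinate normalization are my own choices for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def hamilton_product(q, p):
    """Component-wise Hamilton product of two quaternion embeddings.
    q and p are tuples (real, i, j, k) of arrays with shape (d,)."""
    a0, a1, a2, a3 = q
    b0, b1, b2, b3 = p
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def score(head, relation, tail):
    """Rotate the head by the unit relation quaternion, then score the
    triple with a quaternion inner product against the tail."""
    norm = np.sqrt(sum(x**2 for x in relation)) + 1e-9  # per-coordinate norm
    r_unit = tuple(x / norm for x in relation)          # unit quaternions
    rotated = hamilton_product(head, r_unit)
    return sum(float(c @ t) for c, t in zip(rotated, tail))

rng = np.random.default_rng(0)
d = 8
h, r, t = (tuple(rng.normal(size=d) for _ in range(4)) for _ in range(3))
print(score(h, r, t))  # higher score = more plausible (head, relation, tail)
```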

Code: github

MixMatch: A Holistic Approach to Semi-Supervised Learning

In this work, they propose a semi-supervised learning algorithm that combines three methods: consistency regularization (the probabilities predicted by the model should be consistent under stochastic data augmentation across two runs, i.e., two random seeds of the augmentation), entropy minimization, and traditional regularization (weight decay).
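
A minimal sketch of the label-guessing step that realizes the first two ideas: average the model's predictions over K augmentations of the same unlabeled example, then sharpen the average with a temperature (T = 0.5 and K = 2 are the paper's reported defaults; the function name is mine).

```python
import numpy as np

def guess_label(probs_per_aug, T=0.5):
    """Average the model's class probabilities over K augmentations of one
    unlabeled example (consistency), then sharpen with temperature T
    (entropy minimization)."""
    p = np.mean(probs_per_aug, axis=0)  # shape (L,)
    p = p ** (1.0 / T)
    return p / p.sum()                  # renormalize to a distribution

# predictions from K = 2 stochastic augmentations of the same input
preds = np.array([[0.2, 0.5, 0.3],
                  [0.1, 0.7, 0.2]])
print(guess_label(preds))  # peakier (lower-entropy) than the plain average
```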

Given a batch $\mathcal{X}$ of labeled examples with corresponding one-hot targets (representing one of $L$ possible labels) and an equally sized batch $\mathcal{U}$ of unlabeled examples, MixMatch produces a processed batch of augmented labeled examples $\mathcal{X}'$ and a batch of augmented unlabeled examples with "guessed" labels $\mathcal{U}'$. These batches are then used to compute separate labeled and unlabeled loss terms. More formally, the combined loss $\mathcal{L}$ for semi-supervised learning is computed as:
$\mathcal{X}', \mathcal{U}' = \text{MixMatch}(\mathcal{X}, \mathcal{U}, T, K, \alpha)$
$\mathcal{L}_{\mathcal{X}} = \frac{1}{|\mathcal{X}'|}\sum_{x,p \in \mathcal{X}'} H(p,\ p_{\text{model}}(y \mid x; \theta))$
$\mathcal{L}_{\mathcal{U}} = \frac{1}{L|\mathcal{U}'|}\sum_{u,q \in \mathcal{U}'} \|q - p_{\text{model}}(y \mid u; \theta)\|_2^2$
$\mathcal{L} = \mathcal{L}_{\mathcal{X}} + \lambda_{\mathcal{U}} \mathcal{L}_{\mathcal{U}}$

where $H(\cdot,\cdot)$ is the cross-entropy and $\lambda_{\mathcal{U}}$ weights the unlabeled loss.
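
A minimal NumPy sketch of this combined loss, assuming the class probabilities have already been computed; $\lambda_{\mathcal{U}}$ is a tuned hyperparameter (75 here is only a placeholder) and the toy numbers are made up.

```python
import numpy as np

def mixmatch_loss(targets_x, probs_x, guesses_u, probs_u, num_classes, lam_u=75.0):
    """L = L_X + lambda_U * L_U.
    L_X: cross-entropy between targets p and predictions on X'.
    L_U: squared L2 distance between guessed labels q and predictions on U',
         divided by the number of classes L (as in the formula above)."""
    eps = 1e-12
    l_x = -np.mean(np.sum(targets_x * np.log(probs_x + eps), axis=1))
    l_u = np.mean(np.sum((guesses_u - probs_u) ** 2, axis=1)) / num_classes
    return l_x + lam_u * l_u

# toy batch: 2 labeled and 2 unlabeled examples, L = 3 classes (numbers made up)
tx = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
px = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
qu = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
pu = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2]])
print(mixmatch_loss(tx, px, qu, pu, num_classes=3))
```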

MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing

  1. GCN layer: $H^{(i+1)}=\sigma(\hat{A}H^{(i)}W^{(i)})$, with $\hat{A}=D^{-1/2}(A+I)D^{-1/2}$, where $D$ is the diagonal node degree matrix and $A$ is the adjacency matrix describing the graph structure. Their proposed layer mixes powers of the adjacency matrix: $H^{(i+1)}=\textbf{concat}_j\left[\sigma(\hat{A}^{j}H^{(i)}W^{(i)}_{j})\right]$ (see the sketch after this list).
  2. They prove that standard GCNs are not capable of representing general layer-wise neighborhood mixing, whereas GCNs built from their proposed layer are.
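
A minimal NumPy sketch of one such layer; the power set (0, 1, 2) and the ReLU nonlinearity are illustrative choices on my part, not necessarily the paper's configuration.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, with self-loops added."""
    A = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def mixhop_layer(A_hat, H, weights, powers=(0, 1, 2)):
    """Concatenate sigma(A_hat^j @ H @ W_j) over adjacency powers j."""
    outs = []
    for j, W in zip(powers, weights):
        out = np.linalg.matrix_power(A_hat, j) @ H @ W
        outs.append(np.maximum(out, 0.0))  # ReLU as sigma
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric adjacency, no self-loops
H = rng.normal(size=(5, 8))                  # 5 nodes, 8 input features
Ws = [rng.normal(size=(8, 4)) for _ in range(3)]
print(mixhop_layer(normalized_adjacency(A), H, Ws).shape)  # (5, 12)
```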

TransSent: Towards Generation of Structured Sentences with Discourse Marker

This paper focuses on the task of generating long structured sentences with explicit discourse markers, proposing a new task, Sentence Transfer, and a novel model architecture, TransSent. For example, "I like apples because they are sweet" decomposes as: head -> "I like apples", relation -> "because", tail -> "they are sweet".

Their assumption is similar to TransE's. They introduce three loss terms: a reconstruction loss, a distance loss, and a ratio loss. The distance loss encourages the prediction to be close to the tail, and the ratio loss encourages the prediction to end up relatively closer to the tail than to the head, i.e., it pushes the term dis(prediction, tail)/dis(prediction, head) to be small (a sketch of these two terms follows).
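
A minimal sketch of the distance and ratio terms under this TransE-style reading; the Euclidean dis() and the exact loss forms are my assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def transsent_losses(head_emb, rel_emb, tail_emb, eps=1e-9):
    """TransE-style reading: prediction = head + relation should land near
    the tail. Both the Euclidean dis() and the exact forms are assumptions."""
    pred = head_emb + rel_emb
    dist_to_tail = np.linalg.norm(pred - tail_emb)
    dist_to_head = np.linalg.norm(pred - head_emb)
    distance_loss = dist_to_tail                      # pull prediction toward the tail
    ratio_loss = dist_to_tail / (dist_to_head + eps)  # keep it away from the head
    return distance_loss, ratio_loss

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 16))  # toy sentence-part embeddings
print(transsent_losses(h, r, t))
```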

The dataset: github

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

  1. In Figure 1, they compare autoregressive models, non-autoregressive models, and their proposed model. Non-autoregressive seq2seq models generate all tokens in one pass, which improves efficiency through parallel processing on hardware such as GPUs.

[Figure 1: comparison of autoregressive, non-autoregressive, and FlowSeq generation]

  2. The model architecture is shown in Figure 2.

[Figure 2: the FlowSeq model architecture]

  3. This work also makes use of: variational inference for training (an ELBO with a reconstruction term and a KL-divergence term), a normal distribution for the latent variables, actnorm in the decoder, invertible multi-head linear layers in the decoder, affine coupling layers, a neural network for predicting the target sequence length, noisy parallel decoding, and importance weighted decoding (a sketch of an affine coupling step follows this list).
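
As an illustration of one of these components, here is a minimal NumPy sketch of an affine coupling step: half of the latent vector is transformed conditioned on the other half, and the inverse is exact. The toy linear conditioner stands in for the real networks; names and shapes are placeholders, not FlowSeq's actual implementation.

```python
import numpy as np

def coupling_forward(z, conditioner):
    """Affine coupling: transform one half of z conditioned on the other.
    The Jacobian log-determinant is simply log_scale.sum()."""
    z_a, z_b = np.split(z, 2)
    log_scale, shift = conditioner(z_a)       # any neural network fits here
    z_b = z_b * np.exp(log_scale) + shift
    return np.concatenate([z_a, z_b]), log_scale.sum()

def coupling_inverse(y, conditioner):
    """Exact inverse of the forward pass, which keeps the flow invertible."""
    y_a, y_b = np.split(y, 2)
    log_scale, shift = conditioner(y_a)
    y_b = (y_b - shift) * np.exp(-log_scale)
    return np.concatenate([y_a, y_b])

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(8, 4))                  # toy linear conditioner
conditioner = lambda half: np.split(W @ half, 2)   # -> (log_scale, shift)

z = rng.normal(size=8)
y, logdet = coupling_forward(z, conditioner)
print(np.allclose(coupling_inverse(y, conditioner), z))  # True
```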

Heterogeneous Graph Attention Networks for Semi-supervised Short Text Classification

  1. This work first presents a heterogeneous information network (HIN) framework for modeling short texts.

[Figure: the HIN framework for modeling short texts]

  2. Unfortunately, GCN cannot be directly applied to this HIN because of node heterogeneity: the HIN contains three types of nodes (documents, topics, and entities), each with a different feature space. To address this, they propose a heterogeneous graph convolution, which accounts for the differences between the types of information and projects each type into an implicit common space with its own transformation matrix: $H^{(l+1)}=\sigma\left(\sum_{t}A_{t}H^{(l)}_{t}W^{(l)}_{t}\right)$, where $t$ denotes the node-type index.

  3. They also propose a dual-level attention mechanism (type-level attention and node-level attention) and replace $A_{t}$ with an attention weight matrix $B_{t}$ (see the sketch after this list).
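
A minimal NumPy sketch of the heterogeneous graph convolution above; node counts and per-type feature dimensions are made up. Swapping each $A_t$ for the learned attention matrix $B_t$ would give the dual-level-attention variant.

```python
import numpy as np

def hetero_gcn_layer(A, H, W):
    """H^{(l+1)} = sigma(sum_t A_t H_t W_t): each node type t is projected
    into a common space by its own W_t before the contributions are summed."""
    out = sum(A[t] @ H[t] @ W[t] for t in A)
    return np.maximum(out, 0.0)  # ReLU as sigma

rng = np.random.default_rng(0)
n = 6                                                  # total nodes in the HIN
counts = {"document": 3, "topic": 1, "entity": 2}      # nodes per type (made up)
dims = {"document": 300, "topic": 20, "entity": 100}   # feature sizes (made up)
d_out = 16
A = {t: rng.random((n, c)) for t, c in counts.items()}  # type-t adjacency slices
H = {t: rng.normal(size=(c, dims[t])) for t, c in counts.items()}
W = {t: rng.normal(size=(dims[t], d_out)) for t in counts}
print(hetero_gcn_layer(A, H, W).shape)  # (6, 16)
```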
