2019.9.4 note

A Simple Theoretical Model of Importance for Summarization

  1. Define Redundancy, Relevance, and Informativeness (a rough sketch of these quantities follows this list).

  2. Derive the formulation of the theoretical model of importance from a small set of assumptions, with proofs.

  3. Conduct experiments showing that the model correlates well with human judgments.
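
A rough sketch of the three quantities in point 1, assuming the paper's entropy/cross-entropy style definitions over semantic-unit (here, unigram) distributions; the exact combination into a single importance score is in the paper:

```python
import numpy as np

# P_S, P_D, P_K: unigram distributions of the summary, the source document,
# and the user's background knowledge, over a shared vocabulary (names are mine).
def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))

def redundancy(P_S):                 # low-entropy (repetitive) summaries are redundant
    return -entropy(P_S)             # the paper's H_max - H(S), up to a constant

def relevance(P_S, P_D):             # stay close to the source distribution
    return -cross_entropy(P_S, P_D)

def informativeness(P_S, P_K):       # differ from what the user already knows
    return cross_entropy(P_S, P_K)
```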

LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules

It encodes logical rules into knowledge graph embeddings by adding regularization terms to the optimization objective.
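
A generic sketch of what such a regularization term can look like for a grounded implication rule $r_1(x, y) \Rightarrow r_2(x, y)$ (an assumption: `phi_r1` and `phi_r2` are hypothetical score tensors for the rule's groundings; LogicENN's actual per-rule-type penalties differ in form):

```python
import torch

def rule_regularizer(phi_r1, phi_r2):
    # Penalize groundings where the premise r1(x, y) scores higher than
    # the conclusion r2(x, y), i.e. violations of r1 => r2.
    return torch.relu(phi_r1 - phi_r2).mean()

def total_loss(triple_loss, phi_r1, phi_r2, lam=0.1):
    # Base embedding loss plus the logic-rule penalty as a regularization term.
    return triple_loss + lam * rule_regularizer(phi_r1, phi_r2)
```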

Norm-Preservation

  1. It analyzes the effect of skip connections and proves that ResNets can be so deep because residual blocks preserve the norm of the gradient during back-propagation (a quick check follows this list).
  2. Norm preservation becomes stronger as more residual blocks are stacked.
  3. It enforces extra norm preservation by regularizing the singular values of the weight matrices.
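
A quick numerical check of the norm-preservation claim for a single residual block $y = x + f(x)$ (a toy sketch with a linear/ReLU branch; the paper works with convolutional ResNet blocks):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))  # residual branch

x = torch.randn(128, 64, requires_grad=True)
y = x + f(x)                       # residual block: y = x + f(x)
g = torch.randn_like(y)
y.backward(g)                      # back-propagate an arbitrary upstream gradient
print((x.grad.norm() / g.norm()).item())  # stays near 1: the identity path preserves norm
```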

Squeeze-and-Excitation Networks

$\text{SE-block}(x) = x \odot \text{scale}(x)$, where $\text{scale}(x)$ is computed as: [H, W, C] -> global pooling -> [1, 1, C] -> FC + ReLU -> [1, 1, C/r] -> FC + sigmoid -> [1, 1, C] -> broadcast -> [H, W, C].

The SE-block can be placed before, after, or in parallel to other blocks.
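
A minimal PyTorch sketch of the scale branch described above (`reduction` is the ratio r):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                  # x: [N, C, H, W]
        s = x.mean(dim=(2, 3))             # global average pooling -> [N, C]
        s = torch.relu(self.fc1(s))        # squeeze -> [N, C/r]
        s = torch.sigmoid(self.fc2(s))     # excite -> [N, C], values in (0, 1)
        return x * s[:, :, None, None]     # rescale channels, broadcast over H, W

se = SEBlock(64)
print(se(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```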

ON THE VALIDITY OF SELF-ATTENTION AS EXPLANATION IN TRANSFORMER MODELS

  1. In Transformers, the hidden state at position i is a mixture of all word embeddings, and word i itself plays only a small role in the hidden state at position i in intermediate layers.
  2. However, the contribution of word i (defined in the paper via gradients) to the hidden state at position i in intermediate layers is still the largest among all words (a rough probe follows this list).
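
A rough probe of point 2, using gradient norms as a stand-in for the paper's contribution measure (an assumption: this toy `nn.TransformerEncoder` and the gradient-norm proxy are mine, not the paper's exact setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 16, 5
enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=d, nhead=4), num_layers=2)
enc.eval()                                       # disable dropout for a deterministic probe

emb = torch.randn(n, 1, d, requires_grad=True)   # [seq_len, batch, dim] "word embeddings"
hidden = enc(emb)

i = 2
hidden[i, 0].norm().backward()                   # scalar function of position i's hidden state
contrib = emb.grad.norm(dim=-1).squeeze()        # gradient magnitude per input position
print(contrib)                                   # position i tends to dominate
```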

ONE MODEL TO RULE THEM ALL

It presents a new flavor of Variational Autoencoder (VAE) that interpolates seamlessly between the unsupervised, semi-supervised, and fully supervised learning regimes.
The VAE maps $x \to (\pi, \mu, \sigma) \to x_{recon}$ and is trained with $L = L_{ELBO} + L_{cl}$, where $\pi$ is a one-hot vector for classification and $L_{cl}$ is the cross-entropy classification loss (applied only to labeled data). For semi-supervised learning, $(\pi, \mu, \sigma)$ is treated as the latent state. For a supervised classifier, $\pi$ is treated as the input and $x_{recon}$ as the output. For an unsupervised anomaly detector, $(\mu, \sigma)$ is treated as the latent state.
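
A minimal sketch of the combined loss for the labeled/unlabeled cases, assuming a Gaussian latent with the usual ELBO terms ($L_{cl}$ added only when a label is present; names here are illustrative):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar, pi_logits, y=None):
    # L_ELBO: reconstruction term + KL(q(z|x) || N(0, I)), to be minimized
    rec = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = rec + kl
    if y is not None:                      # labeled data: add L_cl on the pi head
        loss = loss + F.cross_entropy(pi_logits, y)
    return loss
```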

Smaller Models, Better Generalization

It analyzes network complexity via an upper bound on the VC dimension. Extending the idea of minimal complexity machines, it learns the weights of a neural network by minimizing the empirical error plus an upper bound on the VC dimension. It also proposes a pruning method and analyzes quantization. The authors observe that pruning and then quantizing the models achieves comparable or better weight sparsity while giving better generalization.
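
A generic prune-then-quantize sketch on a weight array (assumptions: simple magnitude pruning to a target sparsity plus uniform quantization of the surviving weights; the paper's actual procedure is driven by its VC-bound objective):

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.9, bits=4):
    w = w.copy()
    thresh = np.quantile(np.abs(w), sparsity)
    w[np.abs(w) < thresh] = 0.0                            # magnitude pruning
    nz = w != 0
    if nz.any():                                           # uniform quantization
        levels = 2 ** bits - 1
        lo, hi = w[nz].min(), w[nz].max()
        step = (hi - lo) / levels if hi > lo else 1.0
        w[nz] = np.round((w[nz] - lo) / step) * step + lo
    return w

w = np.random.randn(1000)
wq = prune_and_quantize(w)
print((wq == 0).mean(), np.unique(wq).size)  # ~0.9 sparsity, few distinct weight values
```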

Testing Robustness Against Unforeseen Adversaries

It proposes some novel adversaries: $L_p$-JPEG, Fog, Gabor, and Snow.

The adversarial attack means: for a target label $y' \ne y$, find an $x'$ under some constraint such that $\ell(f(x'), y')$ is minimized. (The optimization method is interesting.)

It also proposes a methodology for assessing robustness against unforeseen adversaries, including an adversarial robustness metric named $UAR$. It further analyzes adversarial training against a single distortion type as well as joint adversarial training.
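
A minimal sketch of the targeted attack above as $L_\infty$-constrained PGD (an assumption: the paper's attacks use distortion-specific constraint sets such as JPEG, fog, or snow parameters, not just an $L_\infty$ ball; `model` is any differentiable classifier on inputs in [0, 1]):

```python
import torch

def targeted_pgd(model, x, y_target, eps=8/255, alpha=2/255, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y_target)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()        # descend: push f(x') toward y'
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                           # stay in valid image range
    return x_adv
```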
