2019.9.4 note

A Simple Theoretical Model of Importance for Summarization

  1. Define Redundancy, Relevance, and Informativeness (a rough sketch of these quantities follows this list).

  2. Derive the formulation of the theoretical model of importance from a small set of assumptions, with proofs.

  3. Conduct experiments showing that the model correlates well with human judgments.
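
A rough sketch of the three quantities in point 1, assuming the paper's entropy/cross-entropy style definitions over semantic-unit (here, unigram) distributions; the exact combination into a single importance score is in the paper:

```python
import numpy as np

# P_S, P_D, P_K: unigram distributions of the summary, the source document,
# and the user's background knowledge, over a shared vocabulary (names are mine).
def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))

def redundancy(P_S):                 # low-entropy (repetitive) summaries are redundant
    return -entropy(P_S)             # the paper's H_max - H(S), up to a constant

def relevance(P_S, P_D):             # stay close to the source distribution
    return -cross_entropy(P_S, P_D)

def informativeness(P_S, P_K):       # differ from what the user already knows
    return cross_entropy(P_S, P_K)
```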

LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules

It encodes logical rules into knowledge graph embeddings by adding regularization terms to the optimization objective.
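
A generic sketch of what such a regularization term can look like for a grounded implication rule $r_1(x, y) \Rightarrow r_2(x, y)$ (an assumption: `phi_r1` and `phi_r2` are hypothetical score tensors for the rule's groundings; LogicENN's actual per-rule-type penalties differ in form):

```python
import torch

def rule_regularizer(phi_r1, phi_r2):
    # Penalize groundings where the premise r1(x, y) scores higher than
    # the conclusion r2(x, y), i.e. violations of r1 => r2.
    return torch.relu(phi_r1 - phi_r2).mean()

def total_loss(triple_loss, phi_r1, phi_r2, lam=0.1):
    # Base embedding loss plus the logic-rule penalty as a regularization term.
    return triple_loss + lam * rule_regularizer(phi_r1, phi_r2)
```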

Norm-Preservation

  1. It analyzes the effect of skip connections and proves that ResNets can be so deep because residual blocks preserve the norm of the gradient during back-propagation (a quick check follows this list).
  2. Norm preservation becomes stronger as more residual blocks are stacked.
  3. It enforces extra norm preservation by regularizing the singular values of the weight matrices.
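
A quick numerical check of the norm-preservation claim for a single residual block $y = x + f(x)$ (a toy sketch with a linear/ReLU branch; the paper works with convolutional ResNet blocks):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))  # residual branch

x = torch.randn(128, 64, requires_grad=True)
y = x + f(x)                       # residual block: y = x + f(x)
g = torch.randn_like(y)
y.backward(g)                      # back-propagate an arbitrary upstream gradient
print((x.grad.norm() / g.norm()).item())  # stays near 1: the identity path preserves norm
```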

Squeeze-and-Excitation Networks

$\text{SE-block}(x) = x \odot \text{scale}(x)$, where $\text{scale}(x)$ is computed as: [H, W, C] -> global pooling -> [1, 1, C] -> FC + ReLU -> [1, 1, C/r] -> FC + sigmoid -> [1, 1, C] -> broadcast -> [H, W, C].

The SE-block can be placed before, after, or in parallel to other blocks.
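
A minimal PyTorch sketch of the scale branch described above (`reduction` is the ratio r):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                  # x: [N, C, H, W]
        s = x.mean(dim=(2, 3))             # global average pooling -> [N, C]
        s = torch.relu(self.fc1(s))        # squeeze -> [N, C/r]
        s = torch.sigmoid(self.fc2(s))     # excite -> [N, C], values in (0, 1)
        return x * s[:, :, None, None]     # rescale channels, broadcast over H, W

se = SEBlock(64)
print(se(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```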

ON THE VALIDITY OF SELF-ATTENTION AS EXPLANATION IN TRANSFORMER MODELS

  1. In Transformers, the hidden state at position i is a mixture of all word embeddings, and word i itself plays only a small role in the hidden state at position i in intermediate layers.
  2. However, the contribution of word i (defined in the paper via gradients) to the hidden state at position i in intermediate layers is still the largest among all words (a rough probe follows this list).
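
A rough probe of point 2, using gradient norms as a stand-in for the paper's contribution measure (an assumption: this toy `nn.TransformerEncoder` and the gradient-norm proxy are mine, not the paper's exact setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 16, 5
enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=d, nhead=4), num_layers=2)
enc.eval()                                       # disable dropout for a deterministic probe

emb = torch.randn(n, 1, d, requires_grad=True)   # [seq_len, batch, dim] "word embeddings"
hidden = enc(emb)

i = 2
hidden[i, 0].norm().backward()                   # scalar function of position i's hidden state
contrib = emb.grad.norm(dim=-1).squeeze()        # gradient magnitude per input position
print(contrib)                                   # position i tends to dominate
```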

ONE MODEL TO RULE THEM ALL

It presents a new flavor of Variational Autoencoder (VAE) that interpolates seamlessly between the unsupervised, semi-supervised, and fully supervised learning regimes.
The VAE maps $x \to (\pi, \mu, \sigma) \to x_{recon}$ and is trained with $L = L_{ELBO} + L_{cl}$, where $\pi$ is a one-hot vector for classification and $L_{cl}$ is the cross-entropy classification loss (applied only to labeled data). For semi-supervised learning, $(\pi, \mu, \sigma)$ is treated as the latent state. For a supervised classifier, $\pi$ is treated as the input and $x_{recon}$ as the output. For an unsupervised anomaly detector, $(\mu, \sigma)$ is treated as the latent state.
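
A minimal sketch of the combined loss for the labeled/unlabeled cases, assuming a Gaussian latent with the usual ELBO terms ($L_{cl}$ added only when a label is present; names here are illustrative):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar, pi_logits, y=None):
    # L_ELBO: reconstruction term + KL(q(z|x) || N(0, I)), to be minimized
    rec = F.mse_loss(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = rec + kl
    if y is not None:                      # labeled data: add L_cl on the pi head
        loss = loss + F.cross_entropy(pi_logits, y)
    return loss
```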

Smaller Models, Better Generalization

It analyzes network complexity via an upper bound on the VC dimension. Extending the idea of minimal complexity machines, it learns the weights of a neural network by minimizing the empirical error plus an upper bound on the VC dimension. It also proposes a pruning method and analyzes quantization. The authors observe that pruning and then quantizing the models achieves comparable or better weight sparsity while giving better generalization.
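
A generic prune-then-quantize sketch on a weight array (assumptions: simple magnitude pruning to a target sparsity plus uniform quantization of the surviving weights; the paper's actual procedure is driven by its VC-bound objective):

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.9, bits=4):
    w = w.copy()
    thresh = np.quantile(np.abs(w), sparsity)
    w[np.abs(w) < thresh] = 0.0                            # magnitude pruning
    nz = w != 0
    if nz.any():                                           # uniform quantization
        levels = 2 ** bits - 1
        lo, hi = w[nz].min(), w[nz].max()
        step = (hi - lo) / levels if hi > lo else 1.0
        w[nz] = np.round((w[nz] - lo) / step) * step + lo
    return w

w = np.random.randn(1000)
wq = prune_and_quantize(w)
print((wq == 0).mean(), np.unique(wq).size)  # ~0.9 sparsity, few distinct weight values
```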

Testing Robustness Against Unforeseen Adversaries

It proposes some novel adversaries: $L_p$-JPEG, Fog, Gabor, and Snow.

The adversarial attack means: for a target label $y' \ne y$, find an $x'$ under some constraint such that $\ell(f(x'), y')$ is minimized. (The optimization method is interesting.)

It also proposes a methodology for assessing robustness against unforeseen adversaries, including an adversarial robustness metric named $UAR$. It further analyzes adversarial training against a single distortion type as well as joint adversarial training.
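
A minimal sketch of the targeted attack above as $L_\infty$-constrained PGD (an assumption: the paper's attacks use distortion-specific constraint sets such as JPEG, fog, or snow parameters, not just an $L_\infty$ ball; `model` is any differentiable classifier on inputs in [0, 1]):

```python
import torch

def targeted_pgd(model, x, y_target, eps=8/255, alpha=2/255, steps=10):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y_target)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()        # descend: push f(x') toward y'
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1)                           # stay in valid image range
    return x_adv
```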
