Optimal Brain Damage
— LeCun, Denker and Solla, 1989, Advances in Neural Information Processing Systems
- We introduce OBD for reducing the size of a learning network by selectively deleting weights based on second-derivative information (saliency; a sketch of the formula follows this list).
- We show that OBD can be used both a) as an automatic network minimization procedure and b) as an interactive tool to suggest better architectures, in contrast with the view of weight deletion as a more-or-less automatic procedure.
- It is possible to take a perfectly reasonable network, delete half or more of the weights, and wind up with a network that works just as well, or better (seems like a precursor to the Lottery Ticket Hypothesis). But we emphasize that the starting point was a state-of-the-art network.
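For orientation, the second-derivative quantity OBD uses is a per-parameter saliency built from a diagonal approximation of the Hessian of the objective $E$ (notation as in the paper, with parameters $u_k$):

$$
s_k = \frac{1}{2} h_{kk} u_k^2,
\qquad
h_{kk} = \frac{\partial^2 E}{\partial u_k^2}
$$

Parameters with the smallest saliency $s_k$ are the ones deleted first.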
Motivation
The traditional way to trade off complexity against accuracy is to add a regularization cost based on some measure of complexity (e.g. VC-dimension, description length, number of non-zero free parameters, etc.).
A simple strategy consists of deleting parameters with "small saliency". Other things being equal, small-magnitude parameters will have the least saliency, so a reasonable initial strategy is to train the network, delete small-magnitude parameters in order, then retrain the network. This procedure can be iterated; in the limit, it reduces to continuous weight decay during training using disproportionately rapid decay of small-magnitude parameters. Two drawbacks are a) it requires fine-tuning of the pruning/decay coefficients to avoid catastrophic effects, and b) the learning process is significantly slowed down.
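As a concrete reference for this magnitude-based baseline (not OBD itself, since it uses no second-derivative information), here is a minimal numpy sketch; `magnitude_prune`, `train`, and `retrain` are illustrative names assumed for this example, not anything from the paper.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, fraction: float):
    """Zero out the given fraction of weights with the smallest magnitude.

    Returns the pruned weights and a boolean mask of surviving weights,
    so a retraining loop can keep the deleted weights at zero.
    """
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

# Iterated prune/retrain loop; `train` and `retrain` are placeholders for
# whatever gradient-based training procedure is used:
#
#   w = train(w_init)
#   for _ in range(rounds):
#       w, mask = magnitude_prune(w, fraction=0.2)
#       w = retrain(w, mask)  # mask gradients so pruned weights stay zero
```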