Deep Learning
布纸所云
[Deep Learning] Pointer Network
Pointer Network paper: https://arxiv.org/pdf/1506.03134.pdf. Overview: a traditional seq2seq model fixes the output vocabulary in advance, so it cannot handle problems where the output vocabulary changes with the length of the input sequence, such as finding a convex hull. For problems of this kind, the output is typically a subset of the input set; the convex-hull problem pictured in the figure is one example. The main features of Pointer Network: its outputs are discrete tokens corresponding to positions in the input sequence; at each output step, the target c…
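A minimal NumPy sketch of the pointing mechanism the excerpt describes, using the paper's scoring form $u_j = v^\top \tanh(W_1 e_j + W_2 d)$; the array names and shapes here are illustrative, not taken from any reference implementation:

```python
import numpy as np

def pointer_attention(encoder_states, decoder_state, W1, W2, v):
    """Score every input position and return a distribution over them.

    Instead of projecting onto a fixed output vocabulary, the "vocabulary"
    at each decoding step is the set of input positions themselves.
    """
    # u_j = v^T tanh(W1 e_j + W2 d) -- one score per input position j
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ decoder_state)
                       for e in encoder_states])
    # Softmax over input positions; the emitted token is the argmax position.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()
```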
[Deep Learning] Residual Neural Networks
Paper: Deep Residual Learning for Image Recognition. The degradation problem: given that a network can still converge, as depth increases its performance first improves up to a saturation point and then degrades rapidly. The figure plots the training error and test error obtained with networks of different depths. Residual block: $x_{l+1} = x_l + F(x_l, W_l)$…
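A tiny NumPy sketch of the residual rule $x_{l+1} = x_l + F(x_l, W_l)$ above, assuming the common two-weight-layer form of $F$ with ReLU; the weight names are illustrative:

```python
import numpy as np

def residual_block(x, W1, W2):
    """x_{l+1} = x_l + F(x_l, W_l), with F a two-layer mapping."""
    relu = lambda z: np.maximum(z, 0.0)
    F = W2 @ relu(W1 @ x)   # the residual mapping F(x, {W1, W2})
    return relu(x + F)      # identity shortcut, then activation
```

The shortcut adds the input unchanged, so in the worst case the block only has to learn $F \approx 0$ to preserve the shallower network's performance.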
Hung-yi Lee: ELMO, BERT, GPT
Hung-yi Lee: ELMO, BERT, GPT. Reference material: notes, lecture videos, slides. Introduction: one-hot encoding leaves a vocabulary gap (no two distinct words are ever similar); word embeddings place semantically similar words close together in vector space. The same word can also carry different meanings: "Have you paid that money to the bank yet?" "It is safest to deposit your money in the bank." The …
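A toy check of the two claims above: one-hot vectors make every distinct word pair orthogonal, while embedding vectors (hand-picked here, purely illustrative) can place related words close together:

```python
import numpy as np

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot: every pair of distinct words is orthogonal -- similarity is always 0.
cat, dog = np.eye(5)[0], np.eye(5)[1]
print(cos(cat, dog))        # 0.0  (the "vocabulary gap")

# Toy 3-d embeddings: semantically related words end up nearby.
cat_e = np.array([0.8, 0.1, 0.30])
dog_e = np.array([0.7, 0.2, 0.35])
print(cos(cat_e, dog_e))    # close to 1
```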
Neural Networks and Deep Learning
3.6 Activation Functions. Sigmoid: $a=\frac{1}{1+e^{-z}}$ takes values in (0, 1); apart from the output layer of a binary classifier it is rarely chosen, because tanh generally performs better than sigmoid. Tanh: $a=\frac{e^z-e^{-z}}{e^z+e^{-z}}$…
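The two activations in NumPy, for concreteness (printed values are approximate):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # values in (0, 1)

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))  # values in (-1, 1)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # [0.119 0.5   0.881]
print(tanh(z))     # [-0.964  0.     0.964]
```

Because tanh is zero-centered, its outputs keep the next layer's inputs roughly centered, which is the main reason it tends to outperform sigmoid in hidden layers.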
Improving Deep Neural Networks
1.10 Vanishing/Exploding Gradients. This article is particularly well written: 详解机器学习中的梯度消失、爆炸原因及其解决方法 (a detailed account of why gradients vanish or explode and how to address it). One of the problems of training neural networks, especially very deep neural networks, is vanishing and exploding gradients…
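A one-line illustration of why depth causes this: if each layer scales the signal by roughly a factor w, then after L layers the gradient behaves like w**L (the numbers below are rough):

```python
# Growth/decay of w**L across depth L -- the root of exploding/vanishing gradients.
for w in (1.5, 0.5):
    print(w, [w ** L for L in (10, 50, 100)])
# 1.5 -> [57.7, 6.4e8, 4.1e17]       (explodes)
# 0.5 -> [9.8e-4, 8.9e-16, 7.9e-31]  (vanishes)
```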
Sequence Model (2)
A vanilla RNN has no way to capture long-term dependencies. And we said that, if this is a very deep neural network, then the gradient from the output y would have a very hard time propagating back to affect the…
Sequence Model (3)
Word Representations; Using Word Embeddings; Properties of Word Embeddings; Embedding Matrix; Learning Word Embeddings; word2vec; Negative Sampling. Detailed notes: Week 2, Natural Language Processing and Word Embeddings (Natural Language Processi…
Convolutional Neural Networks
This article explains it in a particularly simple and accessible way: How do Convolutional Neural Networks work? (Chinese translation: 图解CNN:通过100张图一步步理解CNN). The goal of a CNN: ideally, we want the computer to still recognize the "X" and the "O" in images that have merely undergone simple transformations such as translation, scaling, rotation, or slight deformation. For cases like the ones below, we want the computer to recognize them quickly and accurately. This also…
Sequence Model (4)
Bleu; Attention Model Intuition; Attention Model. Bleu: one of the challenges of machine translation is that, given a French sentence, there could be multiple English translations that are equa…
Fixing the Keras plot_model Problem on Windows
How to fix the Keras plot_model problem: install pydot_ng (`pip install pydot_ng`); download graphviz.msi from http://www.graphviz.org/Download_windows.php; edit pydot_ng's `__init__.py`; edit `keras/utils/vis_utils.py` so that it does `import pydot_ng as pydot`…
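After the patches above, a quick way to check that visualization works; this assumes the Keras 2.x-era API that the post targets, and the toy model here is illustrative:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.utils.vis_utils import plot_model  # the module patched above

# Any small model will do for the check.
model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                    Dense(1, activation='sigmoid')])
plot_model(model, to_file='model.png', show_shapes=True)
```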
Sequence Models (1)
Task: recognizing people's names. Given $x$, output $y$, which marks whether each word is part of a person's name. Here the input has 9 words, so the final output also has length 9, each entry indicating whether the corresponding word is part of a name. Notation: $x: x^{\langle 1\rangle}, x^{\langle 2\rangle}, \cdots, x^{\langle t\rangle}, \cdots, x^{\langle T_x\rangle}$…
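Written out, this is the course's nine-word "Harry Potter" example, with $y^{\langle t\rangle}=1$ marking words that belong to a name:

```python
# x<1> ... x<9>, T_x = T_y = 9
x = ["Harry", "Potter", "and", "Hermione", "Granger",
     "invented", "a", "new", "spell"]
y = [1, 1, 0, 1, 1, 0, 0, 0, 0]  # 1 = part of a person's name
```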
3.2 Using an Appropriate Scale to Pick Hyperparameters
Sampling uniformly is not appropriate for every hyperparameter; sometimes a log scale is called for. In the last video, you saw how sampling at random over the range of hyperparameters can allow you to search the space of hyperparameters more effici…
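A sketch of the log-scale sampling the excerpt alludes to, in the spirit of the course: sample the exponent uniformly, not the value itself (the ranges below are the course's examples):

```python
import numpy as np

# Learning rate alpha between 1e-4 and 1e0: sample the exponent uniformly.
r = -4 * np.random.rand()   # r in [-4, 0]
alpha = 10 ** r             # alpha in [1e-4, 1], log-uniform

# Momentum beta in [0.9, 0.999]: sample (1 - beta) on a log scale instead,
# since beta's effect is far more sensitive near 1.
r = np.random.uniform(-3, -1)
beta = 1 - 10 ** r          # beta in [0.9, 0.999]
```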
2.8 Adam
This article is especially thorough: 深度学习优化算法解析 (Momentum, RMSProp, Adam). Adam (Adaptive Moment Estimation). Initialize: $v_{dW}=0,\ v_{db}=0,\ S_{dW}=0,\ S_{db}=0$. On iteration t: comp…
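Continuing from the initialization above, a minimal single-parameter Adam step in NumPy; the default constants are the usual recommendations, and the function shape is a sketch rather than the course's exact notation:

```python
import numpy as np

def adam_step(w, dw, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter array w with gradient dw (t starts at 1)."""
    v = beta1 * v + (1 - beta1) * dw          # momentum term v_dW
    s = beta2 * s + (1 - beta2) * dw ** 2     # RMSprop term S_dW
    v_hat = v / (1 - beta1 ** t)              # bias correction
    s_hat = s / (1 - beta2 ** t)
    w = w - lr * v_hat / (np.sqrt(s_hat) + eps)
    return w, v, s
```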
Batch Normalization
References: Batch Normalization in Neural Networks; Deeplearning.ai: Why Does Batch Norm Work? (C2W3L06). Fitting Batch Norm into a Neural Network: normalize each hidden layer's inputs $z^{(l)}, a^{(l)}$ (subtract the mean, divide by the standard deviation), then use $\beta$, …
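A minimal forward pass for the normalization just described, with learnable scale gamma and shift beta; the axis convention (examples along axis 0) is an assumption of this sketch:

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Normalize z over the mini-batch, then scale and shift with gamma, beta."""
    mu = z.mean(axis=0)                     # per-unit batch mean
    var = z.var(axis=0)                     # per-unit batch variance
    z_norm = (z - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * z_norm + beta            # z_tilde: learned mean/variance
```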
2.1 Mini-batch Gradient Descent
Applying machine learning is a highly empirical, highly iterative process, in which you just have to train a lot of models to find one that works really well. So, it really helps to really…
2.2 Understanding Mini-batch Gradient Descent
Batch gradient descent: with batch gradient descent, on every iteration you go through the entire training set, and you'd expect the cost to go down on every single iteration. So if we've had the co…
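For contrast with the batch case, a sketch of how mini-batches are typically formed (shuffle, then slice), following the course's convention that examples are the columns of X:

```python
import numpy as np

def make_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle (X, Y) in unison and split into mini-batches of batch_size."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]                 # number of examples (columns)
    perm = rng.permutation(m)
    X, Y = X[:, perm], Y[:, perm]  # same shuffle for inputs and labels
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size])
            for k in range(0, m, batch_size)]
```

Because each step now sees a different subset, the cost oscillates from step to step instead of decreasing monotonically, while trending downward overall.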
3.1 Tuning Process
One of the painful things about training deep networks is the sheer number of hyperparameters you have to deal with, ranging from the learning rate alpha to the momentum term beta, if using momentum, or th…
2.3 Exponentially Weighted Averages
I want to show you a few optimization algorithms that are faster than gradient descent. In order to understand those algorithms, you need to be able to use something called exponentially weighted…
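The recursion itself is one line; a minimal sketch:

```python
import numpy as np

def exponentially_weighted_average(thetas, beta=0.9):
    """v_t = beta * v_{t-1} + (1 - beta) * theta_t.

    Roughly averages over the last 1 / (1 - beta) values:
    about 10 for beta = 0.9, about 50 for beta = 0.98.
    """
    v, out = 0.0, []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return np.array(out)
```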
2.4 Understanding Exponentially Weighted Averages
If beta equals 0.9, you get the red line. If it were much closer to one, say 0.98, you get the green line. And if it's much smaller, maybe 0.5, you get the yellow line. Let's look a bit more …
2.5 Bias Correction in Exponentially Weighted Averages
In a previous video, you saw this figure for beta = 0.9 and this figure for beta = 0.98. But it turns out that if you implement the formula as written here, you won't actually get the green curve when,…
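A tiny numeric check of the correction: because $v_0 = 0$, the first few estimates are biased low, and dividing by $(1 - \beta^t)$ removes exactly that bias (the factor tends to 1 as t grows, so the correction fades out):

```python
# Bias correction for the early steps of an exponentially weighted average.
beta, v = 0.98, 0.0
for t, theta in enumerate([40.0, 41.0, 39.0], start=1):
    v = beta * v + (1 - beta) * theta
    print(t, v, v / (1 - beta ** t))
# t=1: v = 0.8 (far too low), corrected = 40.0 (matches the data)
```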
2.6 Gradient Descent with Momentum
Gradient Descent with Momentum: in one sentence, the basic idea is to compute an exponentially weighted average of your gradients, and then use that average to update your weights instead. As an exam…
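A minimal momentum step, reusing the weighted-average recursion from 2.3 (names are illustrative):

```python
def momentum_step(w, dw, v, lr=0.01, beta=0.9):
    """Update with the exponentially weighted average of the gradient."""
    v = beta * v + (1 - beta) * dw  # v_dW: smoothed gradient
    w = w - lr * v                  # step along v instead of the raw gradient
    return w, v
```

Averaging damps the oscillating components of the gradient while letting the consistent direction accumulate, which is why momentum speeds up convergence.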
2.7 RMSprop
There's another algorithm called RMSprop, which stands for root mean square prop, that can also speed up gradient descent. Let's see how it works. Recall our example from before: if you impl…
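And a matching RMSprop step for comparison: here the running average is of the squared gradient, and the update divides by its square root, damping the directions that oscillate most (epsilon guards against division by zero; the constants are typical choices, not prescriptive):

```python
import numpy as np

def rmsprop_step(w, dw, s, lr=0.01, beta=0.9, eps=1e-8):
    """Divide the step by the root of the running mean of squared gradients."""
    s = beta * s + (1 - beta) * dw ** 2   # S_dW: smoothed squared gradient
    w = w - lr * dw / (np.sqrt(s) + eps)  # small steps where gradients are large
    return w, s
```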