2021-03-15

晋升之道

Published 2021-05-11 20:47:20

Category: Deep Learning

Article link: https://blog.csdn.net/qq_28400629/article/details/114854370

Deep Learning: Hung-yi Lee's Course

Course Introduction

Rule Of ML

Regression

Basic Concept

Gradient Descent

Optimization for Deep Learning

Classification

Logistic Regression

Brief Introduction of Deep Learning

Backpropagation

Tips for Training DNN

Why Deep Learning

Convolutional Neural Network

Graph Neural Network

Recurrent Neural Network

Semi-supervised Learning

un-supervised Learning

Explainable ML

Attack ML Models

Network Compression

Conditional Generation by RNN

Recursive

Transformer

New Architecture

Unsupervised Learning

Auto Encoder

Introduction of ELMO, BERT, GPT

Anomaly Detection

Generative Adversarial Network

Meta Learning

Life Long

Deep Reinforcement Learning

Course Introduction

1. Machine learning: automatically finding a function

Speech Recognition
Image Recognition
Playing Go
Dialogue System

2. Course assignments

Address: http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML20.html

3. Some details on submitting assignments

Regression

1. example of application

Estimating the combat power (cp) of a pokemon after evolution

1) find a model

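2) Goodness of function

collect the data

Loss function: $L(f)=L(w,b) = \sum_{n=1}^{10}\left(\hat{y}^{n}-(b + w\cdot x_{cp}^{n})\right)^{2}$

Best function: $f^{*}=\arg\min_{f} L(f)$

Gradient Descent:

In general, gradient descent can easily get stuck in a local optimum (the linear regression loss above is convex, so it has no local optima)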
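A minimal sketch of gradient descent on the loss $L(w,b)$ above. The ten data points are synthetic (generated from y = 2x + 1), not the course's Pokémon data, and the learning rate and step count are arbitrary assumptions.

```python
# Gradient descent on L(w, b) = sum_n (y_n - (b + w * x_n))^2.
# In the notes' notation, y_n plays the role of the target value y-hat.
# The data are synthetic (y = 2x + 1), NOT the course data.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
ys = [2.0 * x + 1.0 for x in xs]


def loss(w, b):
    return sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys))


def gradient(w, b):
    # dL/dw = -2 * sum (y - (b + w x)) * x ;  dL/db = -2 * sum (y - (b + w x))
    dw = sum(-2.0 * (y - (b + w * x)) * x for x, y in zip(xs, ys))
    db = sum(-2.0 * (y - (b + w * x)) for x, y in zip(xs, ys))
    return dw, db


def gradient_descent(lr=0.005, steps=5000):
    w, b = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        dw, db = gradient(w, b)
        w -= lr * dw
        b -= lr * db
    return w, b


w_fit, b_fit = gradient_descent()
```

On this noise-free data the fit should recover w close to 2 and b close to 1; a much larger learning rate would make the updates diverge.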

Linear model: poor results

Polynomial model: higher-order terms (quadratic, cubic) may cause overfitting

Introduce another factor: the Pokémon species

Choose a suitable model for each species

Weights -> Regularization

$y=b+\sum w_{i}x_{i}$

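$L=\sum_{n}\left(\hat{y}^{n}-\left(b+\sum_{i} w_{i} x_{i}^{n}\right)\right)^{2}+\lambda \sum_{i} w_{i}^{2}$

reason:

Smaller weights make the output less sensitive to changes in the input, i.e. the function is smoother

The bias needs no regularization -> it does not affect smoothness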
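A small sketch of the regularized loss with one feature, to show the weight shrinking as $\lambda$ grows. The five data points, the learning rate, and the two $\lambda$ values are made up for illustration.

```python
# Regularized loss for a single feature:
#   L(w, b) = sum_n (y_n - (b + w * x_n))^2 + lam * w^2   (bias b is not penalized)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # made-up values, roughly y = 2x + 1


def fit(lam, lr=0.01, steps=5000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - (b + w * x) for x, y in zip(xs, ys)]
        dw = sum(-2.0 * r * x for r, x in zip(residuals, xs)) + 2.0 * lam * w
        db = sum(-2.0 * r for r in residuals)  # no lam term: b is not regularized
        w -= lr * dw
        b -= lr * db
    return w, b


w_plain, _ = fit(lam=0.0)   # ordinary least squares slope
w_reg, _ = fit(lam=10.0)    # same data, penalized weight
```

With the penalty the fitted slope is roughly half the unregularized one, which is the "smoother function" effect described above.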

Basic Concept

1. Bias and Variance of Estimator

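Explanation: a simple model has smaller variance than a complex model --> a simple model is less affected by the sampled data. For example, if f(x)=c, every output is c and the variance is 0

Explanation: the functions a simple model can represent are concentrated in a small region; if that region misses the target, every estimate misses it, so the bias is large. A complex model is the opposite

This leads to underfitting (large bias) and overfitting (large variance)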

2. Cross Validation

N-fold Cross Validation
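A rough sketch of how the N folds can be formed. The helper name `n_fold_splits` is hypothetical, and the data are split in their given order without shuffling, which is an assumption made to keep the example short.

```python
def n_fold_splits(num_samples, n_folds):
    """Yield (train_indices, val_indices) for each of the n folds."""
    indices = list(range(num_samples))
    fold_sizes = [num_samples // n_folds] * n_folds
    for i in range(num_samples % n_folds):
        fold_sizes[i] += 1  # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


splits = list(n_fold_splits(num_samples=10, n_folds=3))
```

Each candidate model is trained on every `train` part and scored on the matching `val` part; the model with the best average validation error is then retrained on the whole training set.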

Gradient Descent

skills:

1. Adaptive Learning Rate

The learning rate should get smaller as the number of iterations grows

Introduce -> Adagrad

First derivative, second derivative
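A minimal Adagrad sketch: every parameter's step is divided by the root of the sum of its past squared gradients. The toy loss, learning rate, and step count are assumptions chosen for illustration.

```python
import math


def grad(w):
    # Gradient of the toy loss f(w) = w0^2 + 100 * w1^2 (an illustrative assumption).
    return [2.0 * w[0], 200.0 * w[1]]


def adagrad(w, lr=0.5, steps=200, eps=1e-8):
    sum_sq = [0.0 for _ in w]  # running sum of squared gradients, per parameter
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            sum_sq[i] += g[i] ** 2
            w[i] -= lr / (math.sqrt(sum_sq[i]) + eps) * g[i]
    return w


w_final = adagrad([1.0, 1.0])
```

Although the two directions differ in curvature by a factor of 100, the per-parameter normalization makes both coordinates shrink at nearly the same pace, which plain gradient descent with a single learning rate cannot do.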

Stochastic Gradient Descent

Feature Scaling
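One common way to do feature scaling is standardization, sketched below: subtract each feature's mean and divide by its standard deviation. The sample rows are made up, and the sketch assumes no feature is constant (a zero standard deviation would divide by zero).

```python
import math


def standardize(rows):
    """Scale every feature (column) to zero mean and unit standard deviation."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(dims)]
    stds = [math.sqrt(sum((r[i] - means[i]) ** 2 for r in rows) / n)
            for i in range(dims)]
    return [[(r[i] - means[i]) / stds[i] for i in range(dims)] for r in rows]


data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]  # second feature is 100x larger
scaled = standardize(data)
```

After scaling, both columns cover the same range, so the loss surface is closer to circular and one learning rate suits both parameters.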

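Question:

Answer: incorrect

With multiple parameters: in the figure above, the loss is lower straight ahead of the current position (parameter 1) and lower to the right (parameter 2), so the next step moves diagonally to the front right, yet the loss there can be larger

How to stand at one point, look all around, and find the direction with the largest gradient -> Taylor Series

Condition: what follows holds only when the learning rate is infinitesimally small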

21/3/21

Optimization for Deep Learning

1. Optimization methods

SGD 1847
SGD with momentum 1986
Adagrad 2011
RMSProp 2013
Adam 2015

2. On-line vs off-line

3. Details

SGD

SGD with Momentum(SGDM)

Adagrad

RMSProp

Adam
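The update rules of two of the optimizers listed above, as a sketch on a one-dimensional toy loss. The loss, the hyperparameter values, and the step counts are assumptions; the Adam update follows the commonly published form with bias correction.

```python
import math


def grad(theta):
    # Gradient of the toy loss f(theta) = (theta - 3)^2 (an illustrative assumption).
    return 2.0 * (theta - 3.0)


def sgdm(theta=0.0, lr=0.1, momentum=0.9, steps=300):
    velocity = 0.0
    for _ in range(steps):
        # The movement is the previous movement (scaled) minus the current gradient step.
        velocity = momentum * velocity - lr * grad(theta)
        theta += velocity
    return theta


def adam(theta=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g        # momentum-like first moment
        v = beta2 * v + (1.0 - beta2) * g * g    # RMSProp-like second moment
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta
```

Both functions should end close to the minimum at 3; Adam can be read as SGDM's momentum combined with RMSProp's per-parameter scaling.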

4. Application

Adam vs SGDM

Introduce SWATS (2017): begin with Adam, end with SGDM

5. Adam

Warm-up: start training with a small learning rate, switch to a large learning rate soon after, and then apply the usual decay
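One possible warm-up schedule, as a sketch: the linear ramp, the exponential decay, and all constants are assumptions, since the notes do not fix a particular shape.

```python
def warmup_then_decay(step, peak_lr=1e-3, warmup_steps=100, decay_rate=0.999):
    """Linear warm-up to peak_lr, then exponential decay (one possible choice)."""
    if step < warmup_steps:
        # Small learning rate at first, growing linearly to the peak.
        return peak_lr * (step + 1) / warmup_steps
    # After warm-up, the usual decay takes over.
    return peak_lr * decay_rate ** (step - warmup_steps + 1)
```

The returned value would be passed to the optimizer as its learning rate at each training step.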

6. kinds of optimizer

Classification

2. Two Boxes: introducing Bayes' rule

Even with all the features used, the prediction accuracy is still poor

Modifying Model

steps

2. Posterior Probability

Logistic Regression

Cross entropy

2. Cross Entropy vs Square Error
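A small numeric sketch of why cross entropy is preferred over square error for logistic regression: when the prediction is confidently wrong, the cross-entropy gradient is large while the square-error gradient almost vanishes. The input value z = -10 is an arbitrary assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy_grad(z, y):
    # d/dz of -[y ln f + (1 - y) ln(1 - f)] with f = sigmoid(z)
    return sigmoid(z) - y


def square_error_grad(z, y):
    # d/dz of (f - y)^2 with f = sigmoid(z)
    f = sigmoid(z)
    return 2.0 * (f - y) * f * (1.0 - f)


# Target y = 1 but the model is confidently wrong (z = -10, so f is close to 0).
ce = cross_entropy_grad(-10.0, 1.0)
se = square_error_grad(-10.0, 1.0)
```

Here `ce` is close to -1 while `se` is close to 0, so with square error the update stalls far away from the target.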

3. Discriminative vs Generative

example:

conclusion

Multi-class Classification

Introduce one-hot encoding
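A short sketch of how one-hot targets pair with a softmax output and a cross-entropy loss in multi-class classification. The three logits are made-up numbers.

```python
import math


def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    # Only the class marked 1 in the one-hot target contributes.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


probs = softmax([3.0, 1.0, -3.0])
loss = cross_entropy(one_hot(0, 3), probs)
```

One-hot targets avoid implying that class 1 is "closer" to class 2 than to class 3, which an integer label would suggest.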

Limitation of Logistic Regression

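Figure below: no single line can separate the two classes

~~If logistic regression must be used, map the features to a higher dimension with a kernel function~~

Cascading logistic regression models --> neural network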

21/3/22

Brief Introduction of Deep Learning

1. develop

2. steps

3. How many layers? How many neurons for each layer

Trial and Error + Intuition

21/3/23

Backpropagation

Tips for Training DNN

2. Do not always blame Overfitting

3. tips

3.1 Vanishing Gradient Problem

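With the sigmoid function, even a large Δw has only a small effect on the output; the effect is attenuated layer by layer, which makes the gradient vanish -> change the activation function to ReLU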

-> Leaky ReLU -> Maxout

Learnable activation function -- Maxout
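The three activation functions named above, as a sketch for scalar inputs. The leaky slope of 0.01 and the example weights are assumptions.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def leaky_relu(z, slope=0.01):
    return z if z > 0.0 else slope * z


def maxout(x, weights, biases):
    """Maxout unit: the max over k linear pieces w_k . x + b_k."""
    return max(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
               for w, b in zip(weights, biases))


# With pieces (w=1, b=0) and (w=0, b=0), a 1-D Maxout unit reproduces ReLU.
maxout_as_relu = [maxout([z], weights=[[1.0], [0.0]], biases=[0.0, 0.0])
                  for z in (-2.0, 3.0)]
```

Because the pieces' weights are trained, Maxout can learn ReLU as a special case, or any other piecewise-linear convex activation.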

3.2

3.2.1 L2 and L1

L2: dense

L1: sparse
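A sketch of the two update rules that explains the dense/sparse difference: L2 shrinks every weight by a fixed proportion, L1 by a fixed amount. The learning rate and $\lambda$ are arbitrary, and the data gradient is set to zero to isolate the penalty.

```python
def l2_step(w, grad, lr=0.1, lam=0.5):
    # w <- (1 - lr * lam) * w - lr * grad   (weight decay: shrink proportionally)
    return (1.0 - lr * lam) * w - lr * grad


def sign(w):
    return (w > 0.0) - (w < 0.0)


def l1_step(w, grad, lr=0.1, lam=0.5):
    # w <- w - lr * grad - lr * lam * sign(w)   (shrink by a constant amount)
    return w - lr * grad - lr * lam * sign(w)


# With a zero data gradient, compare how a large and a small weight shrink.
big_l2, small_l2 = l2_step(10.0, 0.0), l2_step(0.1, 0.0)
big_l1, small_l1 = l1_step(10.0, 0.0), l1_step(0.1, 0.0)
```

L2 barely moves small weights, so many small nonzero values survive (dense); L1 removes the same amount from every weight, so small ones are driven to zero while large ones stay large (sparse).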

Why Deep Learning

1. Modularization -> effective

2. Analogy: logic circuits compared with deep learning

example: paper-cut window decorations

3. End-to-End learning

4. Complex Task

very similar input, different output
very different input, similar output

CNN

1. Why CNN for Image

2. The Whole CNN
2.1 Convolution

Weight sharing
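A plain-Python sketch of one convolution filter sliding over an image, to make weight sharing concrete. The 4x4 image and 2x2 filter are made up; stride 1 and no padding are assumptions.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (stride 1, no padding), as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 0, 0, 1],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [1, 0, 0, 1]]
kernel = [[1, -1],
          [-1, 1]]  # the same 4 weights are reused at every position
feature_map = conv2d(image, kernel)
```

The nine outputs are computed from only four parameters, whereas a fully connected layer mapping 16 inputs to 9 outputs would need 144 weights.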

2.2 what does CNN learn

2.2.1 convolution neural network

2.2.2 fully connected layer

Feature combination

Application: Deep Dream

21/3/24

Graph Neural Network -> GNN

1. rb-tree -> graph

2. the function GNN

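example: inferring the murderer

Simple inference:

Using graph information:

Problems with graphs today:

So a CNN cannot be applied directly; a transformation is needed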

3. OutLine

Semi-Supervised Learning

3.1 Tasks,Dataset,and Benchmark

3.1.1 Spatial-based Convolution

1) NN4G (Neural Networks for Graph)

2) DCNN(Diffusion-Convolution Neural Network)

3) DGC(Diffusion Graph Convolution)

4) MoNet(Mixture Model Networks)

5) GAT (Graph Attention Networks): learns the weights by itself

3.1.2 Graph Signal Processing and Spectral-based GNN

4. ChebNet

5. GCN

RNN

1. example

2. RNN

2.1 Base

2.2 Elman Network & Jordan Network

2.3 Bidirectional RNN

2.4 LSTM

example:

3. RNN Loss

Solution (clipping): threshold if gradient > threshold else gradient
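The clipping rule above as runnable code, with a norm-based variant added for comparison. Handling negative gradients symmetrically and the norm-based version are both assumptions beyond the one-line rule in the notes.

```python
import math


def clip_by_value(g, threshold):
    # The rule above, extended symmetrically to negative gradients.
    return max(-threshold, min(threshold, g))


def clip_by_norm(grads, threshold):
    # Rescale the whole gradient vector if its length exceeds the threshold.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= threshold:
        return list(grads)
    return [g * threshold / norm for g in grads]
```

Clipping by norm keeps the direction of the gradient, while clipping by value can change it when only some components are cut.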

LSTM mitigates the loss problem (it deals with vanishing gradients)

2.5 GRU Gated Recurrent Unit

2.6 Many To One

Sentiment analysis

question:

Introduce CTC (Connectionist Temporal Classification) for speech recognition

machine translation

syntactic parsing

sequence-to-sequence

4. Attention Model

21/3/25

Semi-supervised Learning

outline:

Semi-supervised Learning for Generative Model
Low-density Separation Assumption
Smoothness Assumption
Better Representation

1. Semi-supervised Learning for Generative Model

2. Low-density Separation Assumption: black or white

Cross entropy:

3. Smoothness Assumption

In machine learning: cluster and then label

4. Graph-based Approach

How to use: quantitatively

5. Better Representation

un-supervised Learning

1. Word embedding belongs to unsupervised learning

auto-encoder doesn't work

2. Count based

3. Prediction-based

Training:

4. Word Embedding

Explainable ML

2. some model interpretable and powerful at the same time

example: Decision Tree -> Forest

3. Local Explanation

3.1 Basic Idea

example:

Experiment: stack a few convolutional layers arbitrarily

However... it actually learned nothing

3.2 Activation Maximization

Add a regularization term R

3. Global Explanation: control it to generate the desired image

4. Using A Model Explain Another

4.1 Linear Model

4.2 Decision Tree

5. Attribution

6. Probing

7. Generative CNN

21/3/30

Attack ML Models

1. Attack

example:

attack ways:

Training

Non-targeted Attack
Targeted Attack

21/3/31

2. how to attack

example:

3. what happened

4. reference

5. attack approaches

5.1 FGSM
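A minimal FGSM sketch against a two-feature logistic-regression classifier. The weights, the input, and eps are hypothetical numbers chosen so that a single step flips the prediction; a real attack would take the gradient from the trained network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0.0) - (v < 0.0)


def predict(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(x, y, w, b, eps):
    """One FGSM step: move each input dimension by eps in the direction
    that increases the cross-entropy loss C, where dC/dx = (f(x) - y) * w."""
    f = predict(x, w, b)
    grad_x = [(f - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


w, b = [2.0, -1.0], 0.0    # hypothetical trained weights
x, y = [0.3, 0.2], 1.0     # correctly classified: predict(x) > 0.5
x_adv = fgsm(x, y, w, b, eps=0.3)
```

Only the sign of the gradient is used, so every dimension changes by exactly eps; here the perturbed input is pushed to the other side of the decision boundary.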

5.2 white box vs black box

example:

5.3 Universal Adversarial Attack

5.4 Adversarial Reprogramming

5.5 Attack in the real world

Face recognition system:

Traffic signs:

6. Defense

6.1

example:

6.2

7. Attack on Image

7.1 one pixel attack

how do we find the exact pixel and value

No need for the best solution; a successful attack is enough:

Introduce: Differential Evolution

example:

first step

second step

third step

fourth step

start from the second step
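A generic differential evolution loop, as a sketch. The four commented steps follow the common textbook formulation (initialize, mutate, crossover, select) and may not match the slides one-to-one; the sphere objective stands in for the attack objective, and all hyperparameters are assumptions.

```python
import random


def sphere(x):
    # Stand-in objective; a one pixel attack would score a candidate pixel instead.
    return sum(v * v for v in x)


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    initial_best = min(scores)
    for _ in range(generations):
        for i in range(pop_size):
            # Step 2: mutation built from three distinct other members.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + f * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Step 3: crossover, keeping at least one mutant coordinate.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == forced) else pop[i][d]
                     for d in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Step 4: selection, keep the trial only if it is not worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best], initial_best


best_x, best_score, initial_best = differential_evolution(
    sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

No gradient is needed, which suits the black-box setting, and the loop can stop as soon as any candidate makes the attack succeed.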

21/4/7

Network Compression

1. Network to Mobile

2. Network Pruning

2.1

2.2 Why Pruning

example:

2.3 issue

3. Knowledge Distillation

paper:

3.1

4. Parameter Quantization

5. low rank approximation

Introduce Depthwise Separable Convolution
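A parameter-count comparison, as a sketch, showing why depthwise separable convolution compresses a layer. The channel counts and kernel size are example values, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    # Every output channel has its own k x k filter over all input channels.
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


std = standard_conv_params(c_in=2, c_out=4, k=3)
sep = depthwise_separable_params(c_in=2, c_out=4, k=3)
```

The ratio `sep / std` equals 1/c_out + 1/k^2, so the saving grows with the number of output channels.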

6. dynamic computation

21/4/13

Conditional Generation By RNN

1. Generation

generating a structured object component-by-component

example:

1) sentences are composed of characters/words

2) Images are composed of pixels

2. Attention

3. Tips for Generation

21/4/14

4. Pointer Network

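One of the problems it addresses: given a set of points, select some of them and connect them into a loop that encloses all the remaining points (the convex hull problem)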

Recursive

example:

Transformer

1. Seq2seq model with "self-attention"

self-attention: paper "Attention Is All You Need", arxiv.org/abs/1706.03762
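A bare-bones sketch of the scaled dot-product attention at the core of self-attention. The learned projections that produce queries, keys, and values from the input sequence are omitted (an assumption to keep it short), so the three matrices are passed in directly.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


out = self_attention([[1.0, 0.0]],
                     [[1.0, 1.0], [1.0, 1.0]],
                     [[0.0, 2.0], [4.0, 6.0]])
```

Every output position is computed independently of the others, which is what lets self-attention run in parallel where an RNN has to process the sequence step by step.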

21/4/15

New Architecture

1. FCN CNN RNN

Follow up SOTA structure

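SOTA stands for state of the art: the method or model that currently performs best on a specific task.

Benchmark and baseline both serve as basic references for comparison: the baseline is the reference method, and the benchmark is the standard task or dataset on which methods are compared

3. Transformer improvements

3.1 Change only the layer ordering: swap the order of the feed-forward (FFN) and self-attention sublayers

--> Sandwich Transformer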

3.2

3.3

4. BERT (Bidirectional Encoder Representations from Transformers)

4.1 ALBERT

5. REFORMER

6. Style GAN

Unsupervised Learning

1.1

1.2

21/4/20

3.1 PCA

SVD

example:

PCA-Pokémon

Recommender systems:

Unsupervised Learning -- Neighbor Embedding

4. t-SNE

Divergence formula (KL divergence)

Auto-encoder

3. Auto-encoder Text Retrieval

4. Similar Image Search

21/4/27

Putting Words into Computers: Introduction of ELMO,BERT,GPT

1. 1-of-N Encoding vs Word Embedding

A word can have different meanings in different contexts --> introduce ELMO (Embeddings from Language Model)

2. ELMO (Embeddings from Language Model)

3. BERT (Bidirectional Encoder Representations from Transformers)

Training methods:

GPT(Generative Pre-Training)

https://talktotransformer.com GPT

Self-Supervised Learning

Anomaly Detection

fraud detection

attack detection

21/4/28

Task setup

4. Maximum likelihood estimation
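A one-dimensional sketch of anomaly detection with a Gaussian fitted by maximum likelihood. The training values and the density threshold are made up, and a single 1-D Gaussian is an assumption (real features would use a multivariate Gaussian or a mixture).

```python
import math


def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)  # MLE uses 1/N, not 1/(N-1)
    return mu, var


def density(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def is_anomaly(x, mu, var, threshold):
    # Low density under the fitted model means the point looks unlike the training data.
    return density(x, mu, var) < threshold


mu, var = fit_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
```

The threshold trades false alarms against missed anomalies and would normally be tuned on a labeled validation set.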

GMM(Gaussian Mixture Model)

21/5/7

Generative Adversarial Network

1. the gan zoo github

basic idea of gan
gan as structured learning
gan generator learn by itself
can discriminator generate
a little bit theory

3.1

3.2 step

4. Structured Learning Approach

21/5/9

7. Text-to-Image

8. Image-to-Image

Unsupervised Conditional Generation

3. Theory behind GAN

3.2 f-GAN

21/5/10

Meta Learning

1. Meta learning = learn to learn

2.1 step one: Define a set of learning algorithm

2.2 find the best function F*

3. example:Omniglot

4.1 MAML

4.1 EXAMPLE

5. Math

5. Reptile

21/5/11

Life Long

1. define:

example

--> Catastrophic Forgetting

solve:

Deep Reinforcement Learning

example

2.1

MuZero
