2021-03-15

Deep Learning - Hung-yi Lee's Course (李宏毅)

Course Introduction

Rule Of ML

Regression

Basic Concept

Gradient Descent

Optimization for Deep Learning

Classification

Logistic Regression

Brief Introduction of Deep Learning

Backpropagation

Tips for Training DNN

Why Deep Learning

Convolutional Neural Network

Graph Neural Network

Recurrent Neural Network 

Semi-supervised Learning

Unsupervised Learning

Explainable ML

Attack ML Models

Network Compression

Conditional Generation by RNN

Recursive

Transformer

New Architecture

Unsupervised Learning 

Auto Encoder

Introduction of ELMO, BERT, GPT

Anomaly Detection

Generative Adversarial Network

Meta Learning

Life-Long Learning

Deep Reinforcement Learning

Introduction

1. Machine learning: automatically finding a function

  • Speech Recognition
  • Image Recognition
  • Playing Go
  • Dialogue System

2. Course assignments

Address: http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML20.html

3. Details about submitting assignments

Regression

1. example of application

Estimating the combat power (cp) of a pokemon after evolution

1) find a model

2) goodness of Function     

collect the data

Loss Function:   L(f)=L(w,b)=\sum_{n=1}^{10}\left(\hat{y}^{n}-(b+w\cdot x_{cp}^{n})\right)^{2}

Best Function:   f^{*}=\arg\min_{f} L(f)

Gradient Descent:

Gradient descent can easily get stuck in a local optimum.
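A minimal numpy sketch of these steps, with made-up (x, y) CP pairs standing in for the collected Pokemon data:

```python
import numpy as np

# Made-up CP pairs standing in for the collected data (x: before, y: after evolution).
x = np.array([10., 20., 30., 40., 50.])
y = np.array([15., 32., 44., 61., 75.])

w, b = 0.0, 0.0   # initial parameters
lr = 1e-4         # learning rate
for step in range(100000):
    err = y - (b + w * x)               # residuals y_hat - (b + w*x)
    w -= lr * (-2.0 * np.sum(err * x))  # dL/dw
    b -= lr * (-2.0 * np.sum(err))      # dL/db

print(w, b)  # fitted slope and intercept
```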

Linear model: poor results

Polynomial model: adding quadratic or cubic terms may cause overfitting

Introduce another factor: the Pokemon's species

Choose a suitable model for each species.

Weights -> Regularization

y=b+\sum_{i} w_{i}x_{i}

L=\sum_{n}\left(\hat{y}^{n}-(b+\sum_{i} w_{i} x_{i}^{n})\right)^{2}+\lambda \sum_{i} w_{i}^{2}

Reason:

With smaller weights, the output changes less when the input varies, i.e. the function is smoother.

The bias needs no regularization --> it does not affect smoothness.
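Continuing the sketch above (x, y, w, b, lr as before), the penalty only enters the gradient of w; the bias update is untouched:

```python
lam = 0.1  # lambda, the regularization strength
for step in range(100000):
    err = y - (b + w * x)
    w -= lr * (-2.0 * np.sum(err * x) + 2.0 * lam * w)  # extra 2*lambda*w term
    b -= lr * (-2.0 * np.sum(err))                      # bias is not penalized
```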

Basic Concept

1. Bias and Variance of Estimator


Explanation: a simple model has smaller variance than a complex one --> a simple model is less affected by the sampled data; e.g. with f(x)=c every output is c, so the variance is 0.

Explanation: the simpler the model, the more its learned functions cluster together; if that cluster is off target, every prediction is off, so the bias is large. Complex models behave the opposite way.

This leads to underfitting and overfitting.

2. Cross Validation

N-fold Cross Validation
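A minimal sketch of N-fold cross validation; train_and_eval is a hypothetical callback that trains a model on one split and returns its validation error:

```python
import numpy as np

def n_fold_cv(X, y, train_and_eval, n_folds=3):
    """Each of the n folds serves once as the validation set."""
    idx = np.random.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    errors = []
    for i in range(n_folds):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        errors.append(train_and_eval(X[train], y[train], X[val], y[val]))
    return np.mean(errors)  # average validation error decides the model
```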

Gradient Descent

skills:

1. Adaptive Learning Rate

The learning rate should shrink as the number of iterations grows.

This motivates Adagrad.


First derivative vs. second derivative: the accumulated first derivatives in Adagrad's denominator act as a stand-in for the second derivative.
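A sketch of the Adagrad update, where dividing by the root of the accumulated squared gradients gives each parameter its own shrinking learning rate:

```python
import numpy as np

def adagrad_step(w, grad, acc, lr=0.1, eps=1e-8):
    """One Adagrad step: the per-parameter learning rate shrinks with gradient history."""
    acc = acc + grad ** 2                    # accumulate squared gradients
    w = w - lr * grad / (np.sqrt(acc) + eps)
    return w, acc
```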

Stochastic Gradient Descent

Feature Scaling
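A sketch of the usual standardization: give every feature zero mean and unit variance, so the loss contours become rounder and gradient descent heads straight for the minimum:

```python
import numpy as np

def feature_scale(X):
    """Standardize each column (feature) of X to mean 0 and variance 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8   # guard against constant features
    return (X - mean) / std
```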

Question:

Answer: incorrect.

With multiple parameters, the figure's current point has lower loss straight ahead (along parameter 1) and to the right (along parameter 2); stepping toward the front-right can still make the loss larger.

Standing at one point, how do we survey the neighborhood and find the steepest descent direction? -> Taylor series


Condition: this only holds when the learning rate is infinitesimally small.
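The step being referenced is the first-order Taylor expansion of L around the current point (a, b):

L(\theta_{1},\theta_{2}) \approx L(a,b)+\frac{\partial L(a,b)}{\partial \theta_{1}}(\theta_{1}-a)+\frac{\partial L(a,b)}{\partial \theta_{2}}(\theta_{2}-b)

Minimizing this linear approximation inside a small circle around (a, b) means stepping opposite the gradient, which is exactly the gradient descent update; the approximation (and hence the guarantee) only holds when the circle, i.e. the learning rate, is small enough.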

21/3/21

Optimization for Deep Learning

1. Optimization methods

  • SGD    1847
  • SGD with momentum        1986
  • Adagrad           2011
  • RMSProp          2013
  • Adam              2015

2. On-line vs off-line

3. Details

SGD

SGD with Momentum(SGDM)


Adagrad

RMSProp

Adam
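A numpy sketch of one Adam step: SGDM's first moment m plus RMSProp's second moment v, with bias correction:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step (t counts from 1): momentum + adaptive scaling."""
    m = b1 * m + (1 - b1) * grad          # first moment (SGDM part)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (RMSProp part)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```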

4. Application

Adam vs SGDM

This motivates SWATS (2017): begin with Adam, switch to SGDM at the end.

5. Adam 

Warm up: start training with a small learning rate, switch quickly to a large one, and then apply the usual decay.
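A sketch of one such schedule (linear warm-up, then decay); the exact shape here is an illustrative assumption:

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Small lr at the very start, ramp up to base_lr, then decay as 1/sqrt(step)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps       # warm-up phase
    return base_lr * (warmup_steps / (step + 1)) ** 0.5  # usual decay
```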

6. kinds of optimizer


Classification

1. 

2. The two-boxes example introduces Bayes' rule
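The rule the two-box example introduces: the probability that a ball x was drawn from box B_1 is

P(B_{1} \mid x)=\frac{P(x \mid B_{1})P(B_{1})}{P(x \mid B_{1})P(B_{1})+P(x \mid B_{2})P(B_{2})}

Replacing boxes with classes gives the generative classification model of this section.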


Even using all the features, the prediction is still poor.

Modifying Model

steps

2. Posterior Probability
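Writing the posterior out and substituting z for the log ratio turns Bayes' rule into a sigmoid, which is where logistic regression comes from:

P(C_{1} \mid x)=\frac{P(x \mid C_{1})P(C_{1})}{P(x \mid C_{1})P(C_{1})+P(x \mid C_{2})P(C_{2})}=\frac{1}{1+e^{-z}}=\sigma(z),\qquad z=\ln\frac{P(x \mid C_{1})P(C_{1})}{P(x \mid C_{2})P(C_{2})}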


Logistic Regression

1.


Cross entropy


2. Cross Entropy vs Square Error
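For binary labels \hat{y}^{n}\in\{0,1\}, the cross-entropy loss being compared is

L(f)=-\sum_{n}\left[\hat{y}^{n}\ln f(x^{n})+(1-\hat{y}^{n})\ln\left(1-f(x^{n})\right)\right]

The point of the comparison: with square error the gradient is also tiny when the prediction is completely wrong, so training stalls far from the target, while cross entropy keeps the gradient large there.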

3. Discriminative vs Generative


example:


conclusion

Multi-class Classification

Introduce one-hot encoding.
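A minimal sketch of one-hot encoding:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot row vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

print(one_hot([0, 2, 1], 3))  # rows: [1,0,0], [0,0,1], [0,1,0]
```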

Limitation of Logistic Regression

Figure: no single straight line can separate the two classes.

If logistic regression must be used: apply a kernel function, i.e. map the features into a higher-dimensional space.

Cascading logistic regression models    -->    神经网络


21/3/22

Brief Introduction of Deep Learning

1. develop

2. steps

3. How many layers? How many neurons for each layer?

Trial and Error + Intuition

21/3/23

Backpropagation

Tips for Training DNN

1. 

2. Do not always blame Overfitting

3. tips

3.1 Vanishing Gradient Problem

With the sigmoid function, even a large Δw has little effect on the output; the effect decays layer by layer, causing the gradient to vanish -> change the activation function to ReLU

-> Leaky ReLU  -> Maxout

 Learnable activation function   --  Maxout
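Sketches of the three activations; Maxout is "learnable" because each group of k pre-activations has its own weights and the max picks among them:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def maxout(z, k=2):
    """Group every k pre-activations and keep each group's maximum."""
    return z.reshape(-1, k).max(axis=1)
```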


3.2 

3.2.1 L2 and L1

L2: dense weights

L1: sparse weights
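The update rules show where dense vs. sparse comes from: with learning rate \eta, the L2 penalty rescales w by a factor slightly below 1 each step (weights shrink toward zero but rarely reach it), while L1 always subtracts the same \eta\lambda\,\mathrm{sgn}(w) (weights get driven exactly to zero):

L_{2}:\; w \leftarrow (1-\eta\lambda)w-\eta\frac{\partial L}{\partial w}\qquad L_{1}:\; w \leftarrow w-\eta\lambda\,\mathrm{sgn}(w)-\eta\frac{\partial L}{\partial w}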

Why Deep Learning

1. Modularization -> effective

2. Analogy: logic circuits vs. deep learning

Example: paper cutting (window flowers)

3. End-to-End learning

4. Complex Task

  • very similar input, different output
  • very different input, similar output


CNN

1. Why CNN for Image

2. The Whole CNN
2.1 Convolution

Weight sharing
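A sketch of what weight sharing means: one small kernel is reused at every position of the image, instead of a separate weight per pixel as in a fully connected layer:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid convolution with a single shared kernel (CNN-style, no flipping)."""
    H, W = img.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)  # same weights everywhere
    return out
```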

2.2 what does CNN learn

2.2.1 convolution neural network

2.2.2 fully connected layer

Feature combination

Application: Deep Dream


21/3/24

Graph Neural Network    ->     GNN

1. rb-tree   ->  graph


2. The function of a GNN

Example: inferring the culprit

A naive inference:

Using the graph information:

Problems with real-world graphs today:

So CNN cannot be used directly; a transformation is needed.

3. Outline

Semi-Supervised Learning

3.1 Tasks, Dataset, and Benchmark

3.1.1 Spatial-based Convolution

1) NN4G (Neural Networks for Graph)


2) DCNN(Diffusion-Convolution Neural Network)


3) DGC(Diffusion Graph Convolution)

4) MoNet(Mixture Model Networks)

5) GAT (Graph Attention Networks): learns the attention weights itself

6)

3.1.2 Graph Signal Processing and Spectral-based GNN

4. ChebNet


5. GCN

RNN

1. example

2. RNN

2.1 Base

2.2 Elman Network & Jordan Network

2.3 Bidirectional RNN

2.4 LSTM
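For reference, the standard LSTM cell: three sigmoid gates (forget, input, output) control a separate cell state c_t, which is why gradients survive across many steps:

f_{t}=\sigma(W_{f}[h_{t-1},x_{t}]+b_{f}),\quad i_{t}=\sigma(W_{i}[h_{t-1},x_{t}]+b_{i}),\quad o_{t}=\sigma(W_{o}[h_{t-1},x_{t}]+b_{o})

c_{t}=f_{t}\odot c_{t-1}+i_{t}\odot\tanh(W_{c}[h_{t-1},x_{t}]+b_{c}),\qquad h_{t}=o_{t}\odot\tanh(c_{t})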


example:

3. RNN Loss

Solution: clip the gradient - gradient = threshold if gradient > threshold else gradient
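A sketch of that rule applied element-wise:

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    """Cap every entry at +/- threshold so one cliff in the loss cannot blow up the step."""
    return np.clip(grad, -threshold, threshold)
```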

LSTM as a remedy for the loss problem (it mitigates gradient vanishing)

2.5 GRU  Gated Recurrent Unit

2.6 Many To One

Sentiment analysis

question: 

Introduce CTC for speech recognition

machine translation

syntactic parsing

sequence-to-sequence

4. Attention Model


21/3/25

Semi-supervised Learning

outline:

  • Semi-supervised Learning for Generative Model
  • Low-density Separation Assumption
  • Smoothness Assumption
  • Better Representation

1. Semi-supervised Learning for Generative Model

2. Low-density Separation Assumption: "black or white"


Cross entropy:
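The lecture formalizes "black or white" with an entropy penalty on the unlabeled predictions, which is presumably what the missing formula here showed: small entropy means a confident, one-class output.

E(y^{u})=-\sum_{m} y_{m}^{u}\ln y_{m}^{u},\qquad L=\sum_{x^{r}} C(y^{r},\hat{y}^{r})+\lambda\sum_{x^{u}} E(y^{u})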

3. Smoothness Assumption

The classic machine-learning approach: cluster and then label.

4. Graph-based Approach


How to use it: define smoothness quantitatively.

5. Better Representation

Unsupervised Learning

1. Word embedding belongs to unsupervised learning

auto-encoder doesn't work

2. Count based

3. Prediction-based

Training:

4. Word Embedding

Explainable ML

1. 


2. Some models are interpretable and powerful at the same time

example: Decision Tree  -> Forest

3. Local Explanation

3.1 Basic Idea

example:

Experiment: stack a few convolutional layers at random


However... it actually learned nothing.

3.2 Activation Maximization

Add a regularization term R.

4. Global Explanation: steer the model to generate the image you want

5. Using a Model to Explain Another

5.1 Linear Model


5.2 Decision Tree

6. Attribution

7. Probing

8. Generative CNN

21/3/30

Attack ML Models

1. Attack

example:

Ways to attack:

Training

  • Non-targeted Attack
  • Targeted Attack

21/3/31

2. how to attack

example:

3. what happened

4. reference

5. attack approaches

5.1 FGSM
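A PyTorch sketch of FGSM, assuming a differentiable model and loss: a single step of size eps along the sign of the input gradient:

```python
import torch

def fgsm(model, loss_fn, x, y, eps=0.03):
    """Fast Gradient Sign Method: x' = x + eps * sign(dL/dx)."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)   # loss w.r.t. the true label (non-targeted case)
    loss.backward()
    # For a targeted attack, step to *decrease* the loss of the target label instead.
    return (x + eps * x.grad.sign()).detach()
```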

5.2 white box vs black box

example:

5.3 Universal Adversarial Attack

5.4 Adversarial Reprogramming

5.5 Attack in the real world

Face recognition systems:

Traffic signs:

6. Defense

6.1 

example:

6.2 

7. Attack on Image

7.1 One-pixel attack

How do we find the exact pixel and value?

We do not need the optimal perturbation; one that makes the attack succeed is enough:

Introduce: Differential Evolution

example:

first step

second step

third step

fourth step

start from the second step
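A minimal sketch of the differential-evolution loop those steps describe (crossover omitted for brevity); fitness is a hypothetical function scoring a candidate (pixel position, RGB value) vector, and only fitness values, never gradients, are needed:

```python
import numpy as np

def differential_evolution(fitness, bounds, pop_size=20, F=0.5, iters=100):
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    pop = lo + np.random.rand(pop_size, dim) * (hi - lo)   # step 1: init population
    for _ in range(iters):                                 # loop from the second step
        for i in range(pop_size):
            a, b, c = pop[np.random.choice(pop_size, 3, replace=False)]
            child = np.clip(a + F * (b - c), lo, hi)       # step 2: mutate
            if fitness(child) < fitness(pop[i]):           # steps 3-4: keep the fitter
                pop[i] = child
    return min(pop, key=fitness)                           # best candidate found
```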

21/4/7

Network Compression

1. Network to Mobile

2. Network Pruning

2.1 

2.2 Why Pruning

example:

2.3 issue

3. Knowledge Distillation

paper: 

3.1 
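A sketch of the usual distillation loss (in the style of Hinton et al.), assuming PyTorch logits from the trained teacher and the small student; the temperature T softens both distributions so the student can learn from the teacher's "dark knowledge":

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL between softened teacher and student distributions, scaled by T^2."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * T * T
```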

4. Parameter Quantization

5. Low-rank approximation

Introduce Depthwise Separable Convolution
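A PyTorch sketch of the layer: a per-channel k x k depthwise filter followed by a 1 x 1 pointwise mix, cutting the parameter count from k^2 * C_in * C_out to k^2 * C_in + C_in * C_out:

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, k=3):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),  # depthwise: one filter per channel
        nn.Conv2d(in_ch, out_ch, 1),                               # pointwise: 1x1 channel mix
    )
```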

6. dynamic computation

21/4/13

Conditional Generation By RNN

1. Generation

generating a structured object component-by-component

example:

1) sentences are composed of characters/words

2) Images are composed of pixels


2. Attention

3. Tips for Generation


21/4/14

4. Pointer Network

One problem it solves: given a set of data points, pick some of them and connect them into a loop that encloses the remaining points (a convex-hull-style task).

Recursive

example:

1)

2)

3)


Transformer

1. Seq2seq model with "self-attention"

self-attention: paper -- arxiv.org/abs/1706.03762, "Attention Is All You Need"
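A numpy sketch of the paper's scaled dot-product self-attention for a single head; X holds one row per sequence position and Wq, Wk, Wv are the learned projections:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """softmax(Q K^T / sqrt(d_k)) V for a single head."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ V
```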

2. 


3. 

4. 

5. 

6.

21/4/15

New Architecture

1. FCN   CNN     RNN

2. 

Follow up SOTA structure

SOTA stands for "state of the art": the method or model that currently performs best on a given task.

Benchmark and baseline both refer to basic points of comparison.

3. Transformer improvements

3.1 Only reorder the sublayers: swap the order of the FFN and self-attention sublayers

--> Sandwich Transformer

3.2 

3.3

4. BERT: Bidirectional Encoder Representations from Transformers

4.1 ALBERT

5. REFORMER

6. Style GAN

Unsupervised Learning

1.

1.1 

1.2

2.

3. 

21/4/20

3.1 PCA


SVD
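A sketch of PCA computed through the SVD of the centered data matrix; the rows of Vt are the principal directions:

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto the top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T   # coordinates in the reduced space
```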

example:

PCA - Pokemon

Recommender systems:

Unsupervised Learning -- Neighbor Embedding

1.

2. 

3.

4. t-SNE

The divergence formula:
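What t-SNE minimizes: for every point, the KL divergence between the neighbor distribution in the original space x and the one in the embedded space z:

L=\sum_{i}KL\left(P(\cdot \mid x^{i})\,\|\,Q(\cdot \mid z^{i})\right)=\sum_{i}\sum_{j}P(x^{j} \mid x^{i})\ln\frac{P(x^{j} \mid x^{i})}{Q(z^{j} \mid z^{i})}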

Auto-encoder

1.

2. 
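A minimal PyTorch sketch of an auto-encoder; the 784/32 sizes are illustrative (e.g. flattened MNIST digits), and training minimizes the reconstruction error against the input itself:

```python
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

def reconstruct(x):
    return decoder(encoder(x))   # train with MSE(reconstruct(x), x)
```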


3. Auto-encoder Text Retrieval


4. Similar Image Search

21/4/27

5. 

Putting Words into Computers: Introduction of ELMO, BERT, GPT

1. 1-of-N Encoding vs Word Embedding


Each word has a different meaning in different contexts --> introduce ELMO (Embeddings from Language Model)

2. ELMO (Embeddings from Language Model)

3. BERT (Bidirectional Encoder Representations from Transformers)

Training methods:


GPT (Generative Pre-Training)

GPT demo: https://talktotransformer.com

Self-Supervised Learning

1.

2.

Anomaly Detection

1.

2.

fraud detection

attack detection

3.

21/4/28

Task setup

4. Maximum likelihood estimation
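For a single Gaussian the maximum-likelihood estimates are closed-form; a sample's likelihood under the fitted density then serves as its anomaly score:

\mu^{*}=\frac{1}{N}\sum_{n=1}^{N}x^{n},\qquad \Sigma^{*}=\frac{1}{N}\sum_{n=1}^{N}(x^{n}-\mu^{*})(x^{n}-\mu^{*})^{T}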

GMM(Gaussian Mixture Model)

5. 

21/5/7

Generative Adversarial Network

1. The GAN Zoo (on GitHub)

2.

  • basic idea of GAN
  • GAN as structured learning
  • GAN: can the generator learn by itself?
  • can the discriminator generate?
  • a little bit of theory

3.

3.1


3.2 step

1) 

2)

3)
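A PyTorch sketch of one training iteration matching these steps, assuming G maps noise to samples and D ends in a sigmoid with output shape (N, 1):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, real, opt_d, opt_g, z_dim=100):
    n = real.size(0)
    z = torch.randn(n, z_dim)
    fake = G(z)
    # 1) fix G, update D: label real as 1, fake as 0
    d_loss = bce(D(real), torch.ones(n, 1)) + bce(D(fake.detach()), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) fix D, update G: make D label the fakes as real
    g_loss = bce(D(fake), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```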

4. Structured Learning Approach

5.

6.

21/5/9

7. Text-to-Image

8. Image-to-Image

Unsupervised Conditional Generation

1. 

2.

3. Theory behind GAN

3.2 f-GAN

21/5/10

Meta Learning

1. Meta learning = learn to learn

2. 

2.1 Step one: define a set of learning algorithms

2.2 Step two: find the best function F*

3. Example: Omniglot

4. 

4.1 MAML
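MAML in two equations: every task n takes one inner gradient step of size \varepsilon from the shared initialization \phi, and \phi is then updated using the post-step losses:

\hat{\theta}^{n}=\phi-\varepsilon\nabla_{\phi}l^{n}(\phi)\qquad \phi\leftarrow\phi-\eta\nabla_{\phi}\sum_{n}l^{n}(\hat{\theta}^{n})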

4.2 Example

5. Math

6. Reptile

7.

21/5/11

Life-Long Learning

1. Definition:

example

--> Catastrophic Forgetting

Solutions:

2.

3.

4.

Deep Reinforcement Learning

1.

example

2.

2.1

3.

4.

MuZero
