Week 1 Prerequisites
Performance metrics
Accuracy
As the name suggests, the proportion of all predictions (both positive and negative classes) that are correct, out of the total.
Recall
The proportion of actual positives that are correctly predicted as positive. Intuitively: of everything that is truly positive, how much did the model catch?
Precision
The proportion of predicted positives that are actually positive. Intuitively: of everything the model predicted as positive, how much is truly positive?
F-score
The harmonic mean of precision and recall. Formula: $F_1=\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision}+\text{Recall}}$
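The four metrics above can be sketched from raw confusion-matrix counts. A minimal sketch; the counts (TP=8, FP=2, FN=4, TN=6) are made-up illustrative values:

```python
# Minimal sketch: accuracy, precision, recall, and F1 from raw counts.
# TP/FP/FN/TN values below are hypothetical.
def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # correct positives / predicted positives
    recall = tp / (tp + fn)     # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, p, r, f1 = metrics(tp=8, fp=2, fn=4, tn=6)
print(acc, p, r, f1)  # 0.7 0.8 0.666... 0.727...
```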
Two main tasks in machine learning
Classification
Regression
Discriminative machine learning
Activation functions
Softmax
Used to transform real-valued discriminative scores (ranging over $(-\infty,+\infty)$) into probability values (ranging over $[0,1]$) while preserving their order.
Formula: $\operatorname{Softmax}(x)_{i}=\frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}}$
eg:
Assume that we do sentiment classification.
Data point $x$ is a sentence (e.g., $x$ = "it is a beautiful day today"); we need to predict the sentiment of this sentence. Three sentiment labels: positive (happy, class 1), negative (sad, class 2), neutral (neither happy nor sad, class 3). The model gives three discriminative scores for $x$.
- $h_1=2, h_2=-1, h_3=1$ means the highest likelihood is to classify $x$ into class 1 with the label positive.
We apply softmax function on the discriminative scores ℎ.
$p=\operatorname{softmax}(h)$:
$p_{1}=\frac{\exp \{2\}}{\exp \{2\}+\exp \{-1\}+\exp \{1\}} \approx 0.705$
$p_{2}=\frac{\exp \{-1\}}{\exp \{2\}+\exp \{-1\}+\exp \{1\}} \approx 0.035$
$p_{3}=\frac{\exp \{1\}}{\exp \{2\}+\exp \{-1\}+\exp \{1\}} \approx 0.259$
$p=[0.705, 0.035, 0.259]$ are the probabilities of classifying $x$ into classes 1, 2, 3 respectively.
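The computation above can be sketched directly; a minimal softmax on the example scores $h=[2,-1,1]$:

```python
import math

# Minimal sketch of softmax: exponentiate each score, normalize by the sum.
def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([2, -1, 1])
print([round(v, 3) for v in p])  # [0.705, 0.035, 0.259]
```

Note that the order of the scores is preserved: the largest score (2) gets the largest probability.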
Sigmoid
Formula: $S(x)=\frac{1}{1+e^{-x}}$
Derivative of the sigmoid function: $S^{\prime}(x)=\frac{e^{-x}}{\left(1+e^{-x}\right)^{2}}=S(x)(1-S(x))$
Properties, advantages, and disadvantages of the sigmoid function:
- The output range is 0 to 1. Because outputs are bounded between 0 and 1, it normalizes the output of each neuron.
- Well suited to models that output a predicted probability, since probabilities also lie between 0 and 1.
- Smooth gradient, avoiding jumps in the output value.
- The function is differentiable, meaning the slope of the sigmoid curve can be found at any point.
- Clear predictions, i.e., values very close to 1 or 0.
- The output is not zero-centered, which reduces the efficiency of weight updates.
- Sigmoid performs an exponential operation, which is slow for computers.
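The sigmoid formula and the derivative identity $S'(x)=S(x)(1-S(x))$ can be sketched as:

```python
import math

# Minimal sketch: sigmoid and its derivative S'(x) = S(x)(1 - S(x)).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0))       # 0.5  (midpoint of the (0, 1) output range)
print(sigmoid_grad(0))  # 0.25 (the derivative's maximum)
```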
Tanh
It solves the problem that the sigmoid output is not zero-centered; however, the vanishing-gradient problem and the cost of the exponential operation remain.
Formula: $\tanh (x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$
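A minimal sketch of tanh from its definition, showing the zero-centered (odd-symmetric) output that sigmoid lacks:

```python
import math

# Minimal sketch: tanh from its definition; compare with math.tanh.
def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(tanh(0))               # 0.0
print(round(tanh(1), 4))     # matches round(math.tanh(1), 4)
print(tanh(-1) == -tanh(1))  # odd symmetry: output is centered at 0
```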
ReLU
Formula: $\operatorname{ReLU}(x)=\max (0, x)$
Derivative: $\sigma^{\prime}(t)=\begin{cases} 1 & \text{if } t \geq 0 \\ 0 & \text{otherwise} \end{cases}$
Properties of the rectified linear unit (ReLU):
- No gradient-saturation problem when the input is positive.
- Much faster to compute. ReLU involves only a linear relation, so it is faster than the sigmoid and tanh functions.
- The dead ReLU problem: when the input is negative, ReLU fails completely. During forward propagation this is not a problem (some regions are sensitive, others are not), but during backpropagation the gradient is exactly zero for negative inputs; the sigmoid and tanh functions suffer from a similar problem.
- The output of ReLU is 0 or positive, which means ReLU is not a zero-centered function.
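ReLU and its (sub)gradient from the piecewise definition above, as a minimal sketch:

```python
# Minimal sketch: ReLU and its piecewise derivative.
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Gradient is exactly 0 for negative inputs: the "dead ReLU" region.
    return 1.0 if x >= 0 else 0.0

print(relu(3.0), relu(-2.0))            # 3.0 0.0
print(relu_grad(3.0), relu_grad(-2.0))  # 1.0 0.0
```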
Loss
Cross-entropy loss
Formula: $l(y, p)=\operatorname{CE}\left(1_{y}, p\right)=-\log p_{y}$
eg1:
Given a sentence $x$ with the label positive/happy (class 1), assume that our model predicts it with probabilities $p=[0.3, 0.4, 0.3]$; the cross-entropy loss for this prediction is:
$l(y, p)=\operatorname{CE}([1,0,0],[0.3,0.4,0.3])=-1 \cdot \log 0.3-0 \cdot \log 0.4-0 \cdot \log 0.3=-\log 0.3 \approx 1.204$
eg2:
| Prediction | Ground truth | Correct? |
| --- | --- | --- |
| 0.1 0.2 0.7 | 0 0 1 (pig) | correct |
| 0.1 0.7 0.2 | 0 1 0 (dog) | correct |
| 0.3 0.4 0.3 | 1 0 0 (cat) | wrong |
$\text{sample 1 loss}=-(0 \cdot \log 0.1+0 \cdot \log 0.2+1 \cdot \log 0.7) \approx 0.36$
$\text{sample 2 loss}=-(0 \cdot \log 0.1+1 \cdot \log 0.7+0 \cdot \log 0.2) \approx 0.36$
$\text{sample 3 loss}=-(1 \cdot \log 0.3+0 \cdot \log 0.4+0 \cdot \log 0.3) \approx 1.20$
Average the loss over all samples:
$L=\frac{0.36+0.36+1.20}{3} \approx 0.64$
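The three-sample computation can be sketched end to end (natural log, as in the worked numbers):

```python
import math

# Minimal sketch: per-sample cross-entropy and the mean over the batch.
def cross_entropy(one_hot, probs):
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

samples = [
    ([0, 0, 1], [0.1, 0.2, 0.7]),  # pig, predicted correctly
    ([0, 1, 0], [0.1, 0.7, 0.2]),  # dog, predicted correctly
    ([1, 0, 0], [0.3, 0.4, 0.3]),  # cat, predicted wrongly
]
losses = [cross_entropy(y, p) for y, p in samples]
mean_loss = sum(losses) / len(losses)
print([round(l, 2) for l in losses], round(mean_loss, 2))  # [0.36, 0.36, 1.2] 0.64
```

Note that the wrong prediction (low probability on the true class) dominates the average loss.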
L1 loss
Formula: $S=\sum_{i=1}^{n}\left|Y_{i}-f\left(x_{i}\right)\right|$
L2 loss
Formula: $S=\sum_{i=1}^{n}\left(Y_{i}-f\left(x_{i}\right)\right)^{2}$
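Both regression losses can be sketched in a few lines; the target and prediction values below are made-up illustrative numbers:

```python
# Minimal sketch: L1 (sum of absolute errors) and L2 (sum of squared errors).
def l1_loss(y_true, y_pred):
    return sum(abs(y - f) for y, f in zip(y_true, y_pred))

def l2_loss(y_true, y_pred):
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred))

y_true = [3.0, -0.5, 2.0]  # hypothetical targets Y_i
y_pred = [2.5, 0.0, 2.0]   # hypothetical predictions f(x_i)
print(l1_loss(y_true, y_pred))  # 1.0
print(l2_loss(y_true, y_pred))  # 0.5
```

L2 squares each error, so it penalizes large errors more heavily than L1.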
Forward propagation
Classification
eg: Example of spam email detection. From emails, assume that we extract three features.
$x=[x_1, x_2, x_3]$. There are two classes and labels: spam ($y=1$) and non-spam ($y=2$).
Network diagram:
Each hidden layer uses the sigmoid function; the final output layer uses the softmax function.
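One forward pass for this spam example can be sketched as below. This is a minimal sketch: the hidden-layer size (2), all weights, biases, and the input features are made-up illustrative values, not anything from the course:

```python
import math

# Minimal sketch of forward propagation for the spam example:
# 3 features -> sigmoid hidden layer (hypothetical size 2) -> softmax over 2 classes.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 0.5, -0.2]                        # features of one email (hypothetical)
W1 = [[0.2, -0.1, 0.4], [0.3, 0.8, -0.5]]   # hidden weights, 2 x 3 (hypothetical)
b1 = [0.1, -0.2]
W2 = [[0.7, -0.3], [-0.6, 0.9]]             # output weights, 2 x 2 (hypothetical)
b2 = [0.0, 0.0]

# Hidden layer: sigmoid of the affine transform W1 x + b1.
h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
# Output layer: softmax of the affine transform W2 h + b2.
scores = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
p = softmax(scores)  # [P(spam), P(non-spam)]
print(p)
```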
Regression
Training deep nets
Deep model parameters: $\theta:=\left\{\left(W^{l}, b^{l}\right)\right\}_{l=1}^{L}$
Find model parameters (weight matrices and biases) so that the model predictions fit the training set as well as possible.
$\min _{\theta} L(D ; \theta):=-\frac{1}{N} \sum_{i=1}^{N} \log p_{y_{i}}\left(x_{i}\right)$ (minimize the negative log-likelihood)
Use optimizers such as SGD, Adagrad, Adam, or RMSProp to update the model parameters gradually.
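All of these optimizers build on the same gradient-descent update. A minimal sketch using a toy loss $L(w)=(w-3)^2$ in place of the negative log-likelihood (real training computes the gradient via backpropagation):

```python
# Minimal sketch of the gradient-descent update behind these optimizers:
# repeatedly move the parameter against the gradient of the loss.
# Toy loss L(w) = (w - 3)^2, whose minimum is at w = 3.
def grad(w):
    return 2 * (w - 3)  # dL/dw

w = 0.0   # initial parameter (hypothetical)
lr = 0.1  # learning rate (hypothetical)
for _ in range(100):
    w -= lr * grad(w)  # the plain SGD update rule

print(round(w, 4))  # converges to ~3.0
```

Adagrad, Adam, and RMSProp refine this rule by adapting the learning rate per parameter, but the gradient step itself is the same idea.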