CNN Notes
- detection: bounding box
- segmentation: pixel by pixel
Convolution Layer
- convolve the filter with the image (dot products)
- the filter extends through the full depth of the input volume
- first stretch the filter into a vector (5*5*3 -> 1*75), then take dot products
- in practice, the filter is laid over a patch of the image, an element-wise product is summed, and the result becomes the value at the patch's center
- this is not the convolution from signal processing (the filter is not flipped)
- a set of multiple filters (N) produces N activation maps
- filters in deeper layers are longer (when stretched out) because their depth must match the depth of the input volume
e.g.:
32x32x3 -> 28x28x6 (6 feature maps, from six 5x5x3 filters with stride 1 and no padding)
CONV->ReLU->CONV->ReLU->POOLING->CONV->ReLU->CONV->ReLU->POOLING->CONV->ReLU->CONV->ReLU->POOLING->FULL CONNECT
A ConvNet is a sequence of convolution layers, interspersed with activation functions
- output size: (N - F)/stride + 1; common: zero pad the border (to preserve the spatial size)
- parameters: each filter always has 1 bias term, in addition to its F x F x depth weights
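A minimal numeric sketch of the output-size rule and the per-filter parameter count above; the 5x5x3 filters, stride 1, and no padding match the 32x32x3 -> 28x28x6 example.

    def conv_output_size(N, F, stride=1, pad=0):
        # (N - F + 2*pad) / stride + 1; with pad=0 this is the (N - F)/stride + 1 rule above
        return (N - F + 2 * pad) // stride + 1

    print(conv_output_size(32, 5))            # 28, matching 32x32x3 -> 28x28
    F, depth, num_filters = 5, 3, 6           # six 5x5x3 filters
    print(num_filters * (F * F * depth + 1))  # 456 parameters: 75 weights + 1 bias per filter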
Pooling Layer
- makes the representations smaller and more manageable
- invariance over a given region
- downsampling; does not operate on the depth dimension
- MAX POOLING
- commonly non-overlapping windows (e.g. 2x2 with stride 2)
- works better than average pooling in practice
- commonly no zero-padding
- stride in the conv layer can also be used for downsampling instead of pooling (a max-pooling sketch follows this list)
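A minimal max-pooling sketch (2x2 window, stride 2, no overlap, no padding), assuming the input height and width are even; it shows that pooling shrinks H and W while leaving the depth untouched.

    import numpy as np

    def max_pool_2x2(x):
        # x: activation volume of shape (H, W, D); pool over H and W only
        H, W, D = x.shape
        return x.reshape(H // 2, 2, W // 2, 2, D).max(axis=(1, 3))

    x = np.random.randn(4, 4, 3)
    print(max_pool_2x2(x).shape)  # (2, 2, 3): spatial size halves, depth unchanged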
Fully Connected Layer(FC layer)
typical arch:
[(CONV-RELU)*N - POOL?]*M - (FC-RELU)*K, SOFTMAX
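A small sketch that expands the pattern above into an explicit layer list; the default N, M, K are illustrative choices and roughly correspond to the example network above (N = 2, M = 3, K = 1).

    def conv_net_pattern(N=2, M=3, K=1, pool=True):
        # expand [(CONV-RELU)*N - POOL?]*M - (FC-RELU)*K - SOFTMAX into a layer list
        layers = []
        for _ in range(M):
            layers += ["CONV", "RELU"] * N
            if pool:                      # the "POOL?" part is optional
                layers.append("POOL")
        layers += ["FC", "RELU"] * K
        layers.append("SOFTMAX")
        return layers

    print(" -> ".join(conv_net_pattern()))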
Useful Notes
- CS231n neural network notes (three parts)
Preprocessing
- Mean subtraction:
X -= np.mean(X, axis=0)
- Normalization:
X /= np.std(X, axis=0)
- PCA: reduce dimensionality, saving space and time
- whitening: scale the decorrelated data by the eigenvalues so every dimension has comparable variance (see the sketch after this list)
- any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data.
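A minimal PCA / whitening sketch in the same numpy style; X is assumed to be a [num_examples x num_features] training matrix, and the stand-in data and number of kept components are illustrative.

    import numpy as np

    X = np.random.randn(100, 50)            # stand-in training data
    X -= np.mean(X, axis=0)                 # zero-center first (statistics from training data only)
    cov = np.dot(X.T, X) / X.shape[0]       # covariance matrix
    U, S, V = np.linalg.svd(cov)
    Xrot = np.dot(X, U)                     # decorrelate the data
    Xrot_reduced = np.dot(X, U[:, :10])     # PCA: keep the top 10 components to save space and time
    Xwhite = Xrot / np.sqrt(S + 1e-5)       # whitening: scale every dimension by its eigenvalue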
Weight Initialization
- small random numbers:
W = 0.01 * np.random.randn(D, H)
- calibrating the variances with 1/sqrt(n) (see the sketch after this list)
- Batch Normalization:
- prevents vanishing gradients
- speeds up training
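A minimal sketch of the initializations above; D is the fan-in (number of inputs to a neuron), H the fan-out, and the sizes are illustrative.

    import numpy as np

    D, H = 512, 256
    W_small = 0.01 * np.random.randn(D, H)              # small random numbers
    W_calib = np.random.randn(D, H) / np.sqrt(D)        # calibrate the variance with 1/sqrt(n)
    W_relu  = np.random.randn(D, H) * np.sqrt(2.0 / D)  # sqrt(2/n) variant for ReLU units (see Summary)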
Regularization
- L2 regularization
- Max norm constraints
- Dropout
- practice:
- use a single, global L2 regularization strength
- with dropout (p = 0.5); see the inverted-dropout sketch after this list
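A minimal inverted-dropout sketch with p = 0.5: units are dropped and rescaled at training time so the test-time forward pass needs no change.

    import numpy as np

    def dropout_train(x, p=0.5):
        mask = (np.random.rand(*x.shape) < p) / p  # drop units and rescale in one step
        return x * mask

    def dropout_test(x):
        return x  # nothing to do: the scaling was folded into training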
Loss
classification
- hinge loss
- cross-entropy loss (see the sketch after this list)
- for a large number of classes: Hierarchical Softmax
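A single-example sketch of the two classification losses above, on made-up class scores s with correct class index y.

    import numpy as np

    s = np.array([3.2, 5.1, -1.7])   # raw class scores (illustrative)
    y = 0                            # index of the correct class

    # hinge loss (multiclass SVM): sum over j != y of max(0, s_j - s_y + 1)
    margins = np.maximum(0, s - s[y] + 1)
    margins[y] = 0
    hinge = np.sum(margins)          # 2.9

    # cross-entropy loss (softmax): -log of the normalized probability of the correct class
    p = np.exp(s - np.max(s)); p /= np.sum(p)
    xent = -np.log(p[y])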
Attribute classification
build a binary classifier for every single attribute independently
L_i = \sum_j \max(0, 1 - y_{ij} f_j)
- y_{ij} is either +1 or -1 depending on whether the i-th example is labeled with the j-th attribute
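A numeric sketch of this per-example attribute hinge loss, with made-up scores f_j and labels y_ij in {+1, -1}.

    import numpy as np

    f = np.array([0.7, -1.2, 0.1])   # scores for 3 attributes of one example (illustrative)
    y = np.array([1, -1, 1])         # y_ij: +1 / -1 attribute labels
    L_i = np.sum(np.maximum(0, 1 - y * f))
    print(L_i)                       # 0.3 + 0.0 + 0.9 = 1.2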
or train a logistic regression classifier for every attribute independently
P(y = 1 \mid x; w, b) = \frac{1}{1 + e^{-(w^T x + b)}} = \sigma(w^T x + b)
L_i = \sum_j y_{ij} \log(\sigma(f_j)) + (1 - y_{ij}) \log(1 - \sigma(f_j))
- the gradient is \partial L_i / \partial f_j = y_{ij} - \sigma(f_j)
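A sketch of the per-attribute logistic loss and its gradient as written above; the labels y_ij are assumed here to be 0/1, and since this formulation is a log-likelihood, its negative is what gets minimized in practice. Values are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    f = np.array([0.7, -1.2, 0.1])   # scores f_j for one example (illustrative)
    y = np.array([1.0, 0.0, 1.0])    # y_ij, assumed 0 or 1 here
    L_i = np.sum(y * np.log(sigmoid(f)) + (1 - y) * np.log(1 - sigmoid(f)))
    grad = y - sigmoid(f)            # dL_i/df_j = y_ij - sigma(f_j)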
regression
L_i = \| f - y_i \|_2^2
- harder to optimize and less stable than a softmax loss
- when the output can be discretized into bins, a softmax (classification) loss is usually preferable
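A tiny sketch of the L2 regression loss for one example, with made-up predictions and targets.

    import numpy as np

    f  = np.array([0.5, 2.0])        # predictions (illustrative)
    yi = np.array([0.0, 1.5])        # targets
    L_i = np.sum((f - yi) ** 2)      # 0.25 + 0.25 = 0.5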
Summary
- center the data so each feature has zero mean, and normalize its scale to [-1, 1]
- initialize W from a Gaussian distribution with standard deviation \sqrt{2/n}, where n is the number of inputs to the neuron
- L2 regularization and dropout
- batch normalization
Later
notes3
Todos
- reading batch normalization
- reading notes3