coursera-CNN: problems encountered in the programming assignments
The first week
First assignment
Exercise 3 - conv_forward
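When computing vert_start, vert_end, horiz_start and horiz_end, remember to multiply the loop index by stride. Otherwise the windows are laid out as if stride = 1, and the grader fails with an error saying that a global variable was used.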
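The stride rule above can be sketched with NumPy. This is a minimal illustration, not the notebook's conv_forward: the helper name window_corners, the 6x6 input and the filter size f = 2 are assumptions made for the example.

```python
import numpy as np

def window_corners(h, w, f, stride):
    """Corners of the input window that feeds output position (h, w).

    Hypothetical helper: the output index is multiplied by the stride,
    and the window extends f pixels from that start.
    """
    vert_start = h * stride
    vert_end = vert_start + f
    horiz_start = w * stride
    horiz_end = horiz_start + f
    return vert_start, vert_end, horiz_start, horiz_end

# Assumed toy input: one 6x6 channel, 2x2 window, stride 2.
a_prev_pad = np.arange(36).reshape(6, 6)
vs, ve, hs, he = window_corners(h=1, w=2, f=2, stride=2)
a_slice = a_prev_pad[vs:ve, hs:he]
print((vs, ve, hs, he))
print(a_slice)
```

Dropping the multiplication by stride would make neighbouring output positions read windows only one pixel apart, which is correct only when stride = 1.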
4.1 - Forward Pooling
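The outermost loop of the template should carry the hint a_prev = None, i.e. the i-th training example has to be selected there:
# for i in range(None):  # loop over the training examples
#     a_prev = None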
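A minimal sketch of forward pooling that shows where the per-example selection sits. The signature pool_forward(A_prev, f, stride, mode) is a simplification assumed for this example and does not reproduce the notebook's own function signature or its cache.

```python
import numpy as np

def pool_forward(A_prev, f, stride, mode="max"):
    """Max or average pooling over a batch of shape (m, n_H_prev, n_W_prev, n_C)."""
    m, n_H_prev, n_W_prev, n_C = A_prev.shape
    n_H = (n_H_prev - f) // stride + 1
    n_W = (n_W_prev - f) // stride + 1
    A = np.zeros((m, n_H, n_W, n_C))

    for i in range(m):                      # loop over the training examples
        a_prev = A_prev[i]                  # select the i-th example here
        for h in range(n_H):
            vert_start = h * stride
            vert_end = vert_start + f
            for w in range(n_W):
                horiz_start = w * stride
                horiz_end = horiz_start + f
                for c in range(n_C):
                    window = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                    A[i, h, w, c] = window.max() if mode == "max" else window.mean()
    return A

# Assumed toy batch: one 4x4 single-channel image.
A_prev = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
print(pool_forward(A_prev, f=2, stride=2, mode="max")[0, :, :, 0])
print(pool_forward(A_prev, f=2, stride=2, mode="average")[0, :, :, 0])
```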
Second assignment
Exercise 1 - happyModel
import tensorflow as tf
import tensorflow.keras.layers as tfl

# GRADED FUNCTION: happyModel
def happyModel():
    """
    Implements the forward propagation for the binary classification model:
    ZEROPAD2D -> CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> FLATTEN -> DENSE

    Note that for simplicity and grading purposes, you'll hard-code all the values
    such as the stride and kernel (filter) sizes.
    Normally, functions should take these values as function parameters.

    Arguments:
    None

    Returns:
    model -- TF Keras model (object containing the information for the entire training process)
    """
    model = tf.keras.Sequential([
        ## ZeroPadding2D with padding 3, input shape of 64 x 64 x 3
        ## Conv2D with 32 7x7 filters and stride of 1
        ## BatchNormalization for axis 3
        ## ReLU
        ## Max Pooling 2D with default parameters
        ## Flatten layer
        ## Dense layer with 1 unit for output & 'sigmoid' activation
        # YOUR CODE STARTS HERE
        # (m, n_H_prev, n_W_prev, n_C_prev) = X_train.shape
        tfl.ZeroPadding2D(padding=3, input_shape=(64, 64, 3)),
        tfl.Conv2D(32, 7, strides=(1, 1)),
        tfl.BatchNormalization(axis=3),
        tfl.ReLU(),
        tfl.MaxPool2D(),
        tfl.Flatten(),
        tfl.Dense(1, activation='sigmoid')
        # YOUR CODE ENDS HERE
    ])
    return model
The second week
Quiz
Depthwise convolution
Assume a standard convolution kernel of shape $(D_{K}, D_{K}, M, N)$.
The standard kernel is split into a depthwise convolution with kernel shape $(D_{K}, D_{K}, 1, M)$.
The standard kernel is also split into a pointwise convolution with kernel shape $(1, 1, M, N)$.
The standard convolution operation is thus split into two steps:
- The first step calculates an intermediate result by convolving on each of the channels independently. This is the depthwise convolution.
- In the second step, another convolution merges the outputs of the previous step into one. This gets a single result from a single feature at a time, and then is applied to all the filters in the output layer. This is the pointwise convolution.
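The depthwise layer changes only the spatial size of the feature map, not the number of channels. The pointwise layer does the opposite: it changes only the number of channels, not the size. The work of a regular convolution (changing both size and channel count) is thus done in two steps.
The benefit is a large reduction in parameter count and computation with little loss of accuracy. However, although depthwise separable convolution reduces the number of parameters, in practice it uses a lot of memory when training on a GPU and is much slower than an ordinary 3×3 convolution (measured in my own runs).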
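The parameter saving can be checked by counting weights for the three kernel shapes given above. This is a minimal sketch that ignores biases; the function names and the example values $D_{K}=3$, $M=32$, $N=64$ are assumptions chosen for illustration.

```python
def standard_conv_params(d_k, m, n):
    """Weights in a standard kernel of shape (D_K, D_K, M, N), biases ignored."""
    return d_k * d_k * m * n

def separable_conv_params(d_k, m, n):
    """Weights in a depthwise (D_K, D_K, 1, M) plus a pointwise (1, 1, M, N) kernel."""
    depthwise = d_k * d_k * 1 * m
    pointwise = 1 * 1 * m * n
    return depthwise + pointwise

# Assumed example: 3x3 kernels, 32 input channels, 64 output channels.
d_k, m, n = 3, 32, 64
standard = standard_conv_params(d_k, m, n)
separable = separable_conv_params(d_k, m, n)
print(standard, separable, separable / standard)
```

The ratio equals $1/N + 1/D_{K}^{2}$, so with 3×3 kernels the separable version needs roughly one eighth to one ninth of the weights. The count says nothing about wall-clock speed or memory, which is the caveat raised above.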
First assignment
Identity Block
Convolutional Block
The convolutional block is a transition block introduced to handle the case where the input and output dimensions of the block differ.
Second assignment
transfer learning: MobileNetV2
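An idea: a pretrained model has already been trained on a large dataset; MobileNetV2, for example, was trained on ImageNet. When it is applied to my own small dataset, most images share similar low-level features and differ only in their high-level features. So could I freeze the bottom layers of the pretrained model (make them un-trainable), train from the higher layers on my dataset, update only the higher-layer parameters, and then make predictions?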
The third week
Quiz
semantic segmentation
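In the figure, (3) combines the property of (2), a lower resolution but high-level semantic information, with the property of (1), low-level but more detailed texture, so that it can better decide whether a given pixel is part of the cat.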
First assignment
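- YOLO architecture: Image $(m, 608, 608, 3)$ $\to$ Deep CNN $\to$ Encoding $(m, 19, 19, 5, 85)$
- tf.math.reduce_max: computes the maximum of a tensor's elements along a given axis.
- np.max([]): note the square brackets inside the parentheses; np.max takes a single array-like argument, so several values must be wrapped in a list first.
- Convert YOLO box coordinates $(x, y, w, h)$ to corner coordinates $(x_{1}, y_{1}, x_{2}, y_{2})$.
- score threshold: boxes whose best class score falls below the threshold are discarded.
- NMS (Faster-NMS): keep the highest-scoring box, drop the remaining boxes whose IoU with it exceeds the threshold, and repeat.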
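The box conversion, score filtering and NMS steps listed above can be sketched in plain Python. This is an illustration under assumptions, not the assignment's TensorFlow code: $(x, y)$ is taken to be the box centre, and all function names, boxes, scores and thresholds below are hypothetical.

```python
def box_to_corners(x, y, w, h):
    """Convert a centre/size box (x, y, w, h) to corners (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

def iou(box1, box2):
    """Intersection over union of two corner-format boxes."""
    inter_w = max(min(box1[2], box2[2]) - max(box1[0], box2[0]), 0.0)
    inter_h = max(min(box1[3], box2[3]) - max(box1[1], box2[1]), 0.0)
    inter = inter_w * inter_h
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

def filter_by_score(boxes, scores, threshold):
    """Keep only the boxes whose score reaches the threshold."""
    kept = [(b, s) for b, s in zip(boxes, scores) if s >= threshold]
    return [b for b, _ in kept], [s for _, s in kept]

def nms(boxes, scores, iou_threshold):
    """Greedy non-max suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Assumed toy detections in (x, y, w, h) form with their scores.
raw = [(2, 2, 2, 2), (2.5, 2, 2, 2), (6, 6, 2, 2), (10, 10, 2, 2)]
scores = [0.9, 0.8, 0.7, 0.3]
boxes = [box_to_corners(*b) for b in raw]
boxes, scores = filter_by_score(boxes, scores, threshold=0.5)
print(nms(boxes, scores, iou_threshold=0.5))
```

In this toy case the fourth box fails the score threshold and the second box is suppressed because its IoU with the first is 0.6.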
Second assignment
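The role of skip connections: they make use of information from the lower layers, because the features extracted there are finer and help decide whether a pixel belongs to a given class. This ensures that the finally recovered feature map fuses more low-level features, and the 4 upsampling steps also let the segmentation map recover edges and similar details more finely.
The concatenate function joins two tensors together.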
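What concatenation does to the shapes in a skip connection can be sketched with NumPy, whose np.concatenate behaves like the Keras concatenate layer for this purpose. The tensor names and the shape (1, 8, 8, 32) are assumptions for the example.

```python
import numpy as np

# Assumed shapes: an upsampled decoder tensor and the matching encoder tensor.
upsampled = np.zeros((1, 8, 8, 32))
skip = np.ones((1, 8, 8, 32))

# Joining along the channel axis (axis 3) keeps the spatial size
# and stacks the channels of both tensors.
merged = np.concatenate([upsampled, skip], axis=3)
print(merged.shape)
```

The first 32 channels of merged come from the upsampled tensor and the last 32 from the skip tensor, so the following convolution sees both the coarse and the fine features.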
The fourth week
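Quiz
Face verification: "Is this the claimed person?" This is a 1:1 comparison; typical uses are phone unlocking and identity checks at airports or stations.
Face recognition: "Who is this person?" This is a 1:N comparison.
In a convolutional network, deeper layers are better at capturing the complex shapes of objects in the image (for example, a cat).
Question 10 is still unclear to me.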
First assignment
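tf.subtract(x, y, name=None): tensor subtraction x - y; the shapes of x and y must match (or be broadcastable to each other);
tf.add(x, y, name=None): tensor addition;
tf.square(x, name=None): squares every element of tensor x;
tf.reduce_sum(x, axis=None): sums the elements of x along axis (over all elements when axis is None);
np.linalg.norm(x): computes the norm of x.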
- A pretrained InceptionNet model is used for the face verification and face recognition tasks;
- triplet loss:
$$\mathcal{J}=\sum_{i=1}^{m}\left[\left\|f\left(A^{(i)}\right)-f\left(P^{(i)}\right)\right\|_{2}^{2}-\left\|f\left(A^{(i)}\right)-f\left(N^{(i)}\right)\right\|_{2}^{2}+\alpha\right]_{+}$$
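The triplet loss formula can be written out in NumPy, mirroring the subtract, square and reduce_sum steps listed above. This is a minimal sketch rather than the assignment's TensorFlow solution; the function name, the toy 2-dimensional encodings and the margin alpha = 0.2 are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum over examples of [ ||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha ]_+ .

    Each argument is an array of encodings with shape (m, d).
    """
    pos_dist = np.sum(np.square(anchor - positive), axis=-1)
    neg_dist = np.sum(np.square(anchor - negative), axis=-1)
    basic_loss = pos_dist - neg_dist + alpha
    # [.]_+ clamps negative terms to zero before summing over the batch.
    return float(np.sum(np.maximum(basic_loss, 0.0)))

# Assumed toy encodings for m = 2 triplets.
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [0.0, 0.0]])
negative = np.array([[0.0, 1.0], [3.0, 0.0]])
print(triplet_loss(anchor, positive, negative))
```

The second triplet contributes nothing because its negative is already far enough from the anchor; only the first, where positive and negative are equally distant, is penalised by the margin.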
Second assignment
- Content cost function:
$$J_{content}^{[l]}(C,G)=\frac{1}{4 \times n_{H}\times n_{W}\times n_{C}}\sum_{\text{all entries}}\left(a^{C}-a^{G}\right)^{2}$$
- Style cost function:
$$J_{style}^{[l]}(S, G)=\frac{1}{4 \times n_{C}^{2} \times\left(n_{H} \times n_{W}\right)^{2}} \sum_{i=1}^{n_{C}} \sum_{j=1}^{n_{C}}\left(G_{(gram) i, j}^{(S)}-G_{(gram) i, j}^{(G)}\right)^{2}$$
- unroll 3D $\to$ 2D
code: unroll_matrix = tf.reshape(tensor, shape=(n_H * n_W, n_C))
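- Measuring how correlated two vectors are:
For any vectors $\mu$, $\nu$, define $G_{ij}=\mu^{T}\nu=\text{np.dot}(\mu,\nu)$. $G_{ij}$ then measures the similarity between $\mu$ and $\nu$: the larger $G_{ij}$, the more similar they are.
- Gram matrix:
The Gram matrix is symmetric. In a feature map, every number comes from the convolution of one particular filter at one particular position, so each number represents the strength of a feature. The Gram matrix computes the correlation between every pair of features: which two features appear together, which one grows as the other shrinks, and so on. Its diagonal entries additionally show how much of each type of feature appears in the image. The Gram matrix therefore helps capture the overall style of the image. Given a Gram matrix that represents style, measuring the style difference between two images only requires comparing their Gram matrices.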
tf.matmul(a, b): matrix multiplication of the two tensors a and b;
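The unrolling, Gram matrix and the two cost formulas can be tied together in a short NumPy sketch. It is an illustration under assumptions, not the assignment's TensorFlow code: activations are taken as single images of shape $(n_{H}, n_{W}, n_{C})$, the unrolled matrix is transposed to $(n_{C}, n_{H} \times n_{W})$ so that the Gram matrix is $n_{C} \times n_{C}$, and all function names are hypothetical.

```python
import numpy as np

def unroll(a):
    """Reshape (n_H, n_W, n_C) activations to (n_C, n_H * n_W)."""
    n_H, n_W, n_C = a.shape
    return a.reshape(n_H * n_W, n_C).T

def gram_matrix(A):
    """Gram matrix A A^T: entry (i, j) is the dot product of channels i and j."""
    return A @ A.T

def content_cost(a_C, a_G):
    """Squared difference of activations, scaled by 1 / (4 n_H n_W n_C)."""
    n_H, n_W, n_C = a_C.shape
    return float(np.sum((a_C - a_G) ** 2) / (4 * n_H * n_W * n_C))

def layer_style_cost(a_S, a_G):
    """Squared difference of Gram matrices, scaled by 1 / (4 n_C^2 (n_H n_W)^2)."""
    n_H, n_W, n_C = a_S.shape
    GS = gram_matrix(unroll(a_S))
    GG = gram_matrix(unroll(a_G))
    return float(np.sum((GS - GG) ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2))

# Assumed toy activations: all ones versus all zeros, 2x2 spatial, 3 channels.
a_S = np.ones((2, 2, 3))
a_G = np.zeros((2, 2, 3))
print(gram_matrix(unroll(a_S)))
print(content_cost(a_S, a_G), layer_style_cost(a_S, a_G))
```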
- Extracting the output activation tensors of specified layers from the pretrained model:
def get_layer_outputs(vgg, layer_names):
    """ Creates a vgg model that returns a list of intermediate output values."""
    outputs = [vgg.get_layer(layer[0]).output for layer in layer_names]
    print(outputs)
    ## print([vgg.input])
    model = tf.keras.Model([vgg.input], outputs)
    return model

content_layer = [('block5_conv4', 1)]
vgg_model_outputs = get_layer_outputs(vgg, STYLE_LAYERS + content_layer)
content_target = vgg_model_outputs(content_image)
Here outputs holds the outputs of the specified vgg layers, i.e. their activation tensors; content_target contains 6 tensors, one for each specified layer.
where:
STYLE_LAYERS + content_layer = [('block1_conv2', 0.2),
                                ('block2_conv2', 0.2),
                                ('block3_conv2', 0.2),
                                ('block4_conv2', 0.2),
                                ('block5_conv2', 0.2),
                                ('block5_conv4', 1)]
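- Initialization of the generated image: add noise (drawn from a uniform distribution) to the content image, which helps the content of the generated image match the content image quickly.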
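The initialization step can be sketched in NumPy as follows. The function name, the noise range of ±0.25, the fixed seed and the assumption that pixel values lie in [0, 1] are all choices made for this example, not values confirmed by the notes above.

```python
import numpy as np

def init_generated_image(content_image, noise_scale=0.25, seed=0):
    """Start from the content image plus uniform noise, clipped to [0, 1].

    noise_scale and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-noise_scale, noise_scale, size=content_image.shape)
    return np.clip(content_image + noise, 0.0, 1.0)

# Assumed toy content image: a mid-grey 4x4 RGB image with a batch axis.
content_image = np.full((1, 4, 4, 3), 0.5)
generated_image = init_generated_image(content_image)
print(generated_image.shape, generated_image.min(), generated_image.max())
```

Starting near the content image instead of from pure noise means the content cost is already small at the first optimization step.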