[Convolutional Neural Network] - CNN Output Calculation

Convolution Calculation

In a convolutional neural network, there are 3 main parameters that need to be tweaked to modify the behavior of a convolutional layer.

These parameters are filter size, stride and zero padding.

The size of the output feature map generated depends on the above 3 important parameters.

The size of the filter plays an important role in finding the key features, and it is difficult to select an optimal filter size.

It all depends on the application. A larger kernel can overlook features and skip essential details in the images, whereas a smaller kernel could provide more information, leading to more confusion. Thus there is a need to determine the most suitable kernel/filter size.

Methods like Gaussian pyramids (sets of differently sized kernels) can be used to test the efficiency of feature extraction and to determine an appropriate kernel size. In addition to the filter size, it is very important to understand and decide the stride and the padding.

Stride controls the number of steps that you move the filter over the input image. When the stride is 1, we move the filter one pixel at a time. When we set the stride to 2 or 3 (uncommon), we move the filter 2 or 3 pixels at a time, depending on the stride. The value of the stride also controls the size of the output volume generated by the convolutional layer: the bigger the stride, the smaller the output volume. For example, if the input image is 7×7 and the stride is 1 (with a 3×3 filter and no padding), the output volume will be 5×5. On the other hand, if we increase the stride to 2, the output volume reduces to 3×3.

The stride is normally set so that the output volume size is an integer and not a fraction.

The next important parameter is zero padding. Zero padding refers to padding the input volume with zeros around the border. Zero padding also allows us to control the spatial size of the output volume. If we do not add any zero padding and we use a stride of 1, the spatial size of the output volume is reduced. However, in the first few convolutional layers of the network we often want to preserve the spatial size and make sure that the output volume is equal to the input volume. This is where zero padding is helpful. In the 7×7 input image example, if we use a stride of 1 and a zero padding of 1 (again with a 3×3 filter), then the output volume is also 7×7.

The formula for the padding value that keeps the spatial size of the input and output volumes the same with stride 1 is

$P = \frac{K - 1}{2}$

where K is the filter size.

Finally, the formula to calculate the output size is

$O = \frac{W - K + 2P}{S} + 1$

where O is the output height/width, W is the input height/width, K is the filter size, P is the padding, and S is the stride.

For example, if we take S = 1 and P = 2, with W = 200, K = 5, and 40 filters, then the output size will be 200×200×40 using the above formula. On the other hand, if we use S = 1 and P = 1, then the output size would be 198×198×40.
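As a quick sanity check, here is a minimal Python sketch (the helper names are our own, not from any library) that evaluates the two formulas above on the examples in this answer:

```python
def conv_output_size(w, k, p, s):
    """Spatial output size of a conv layer: O = (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1

def same_padding(k):
    """Padding that preserves spatial size at stride 1: P = (K - 1) / 2 (odd K)."""
    return (k - 1) // 2

print(conv_output_size(7, 3, 0, 1))    # 5   -> 7x7 input, 3x3 filter, stride 1
print(conv_output_size(7, 3, 0, 2))    # 3   -> same setup with stride 2
print(conv_output_size(200, 5, 2, 1))  # 200 -> with 40 filters: 200x200x40
print(conv_output_size(200, 5, 1, 1))  # 198 -> 198x198x40
print(same_padding(5))                 # 2   -> padding that preserves size at K = 5
```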

I know it is a lengthy answer, but I just wanted to clarify how filter size, stride, and padding work together to control the output size in a convolutional layer.

Activation Functions

Here is how to look at it.

You get a patch of size k×k centered at (i, j) from a feature map, do a dot product with the kernel weights to compute the result at that position, and then move to the next position and repeat.

Note: that describes a cross-correlation. For a convolution we just flip the weights before the dot product, which makes no difference at all for the argument below.
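For instance, a small sketch (using SciPy, our own addition) showing that cross-correlation is just convolution with a flipped kernel:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))   # feature map
w = rng.standard_normal((3, 3))   # kernel weights

# Cross-correlation with w equals convolution with w flipped on both axes
print(np.allclose(correlate2d(x, w, mode="valid"),
                  convolve2d(x, np.flip(w), mode="valid")))   # True
```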

So how is that non-linear?

The dot product is linear and all you are doing is scanning it over an input feature map at different locations. So we can assume the result of that scanning is also linear.

In mathematical terms, we can check whether the convolution, as applied in image processing, computer vision (CV), and machine learning (ML), is in general linear, as follows.

A linear system H must obey the following:

$H(x + y) = H(x) + H(y)$

$H(\alpha x) = \alpha H(x)$

Let’s denote the convolution operation as f(⋅):

$f_{i,j}(x) = \sum_n \sum_m x_{i-n,\,j-m}\, w_{n,m}$

where x is the feature map and w is the kernel weights.

So we can check if the convolution operation obeys the linear system test.

Consider another feature map y of similar shape to x.

$f_{i,j}(x + y) = \sum_n \sum_m (x + y)_{i-n,\,j-m}\, w_{n,m}$

Then clearly:

$f_{i,j}(x + y) = \sum_n \sum_m x_{i-n,\,j-m}\, w_{n,m} + \sum_n \sum_m y_{i-n,\,j-m}\, w_{n,m}$

Thus the convolution satisfies the first test.

Also

$f_{i,j}(\alpha x) = \sum_n \sum_m \alpha\, x_{i-n,\,j-m}\, w_{n,m}$

Which can easily be reduced to:

$f_{i,j}(\alpha x) = \alpha \sum_n \sum_m x_{i-n,\,j-m}\, w_{n,m}$

Which proves that the convolution operation in convolutional neural networks (CNN) is itself linear.
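As a quick numerical check (our own sketch using SciPy, not part of the original answer), we can verify both properties on random feature maps:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))   # feature map
y = rng.standard_normal((8, 8))   # another feature map of the same shape
w = rng.standard_normal((3, 3))   # kernel weights
alpha = 2.5

f = lambda a: convolve2d(a, w, mode="valid")   # the convolution operation f(.)

# Additivity: f(x + y) == f(x) + f(y)
print(np.allclose(f(x + y), f(x) + f(y)))        # True
# Homogeneity: f(alpha * x) == alpha * f(x)
print(np.allclose(f(alpha * x), alpha * f(x)))   # True
```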

Thus, without non-linear activation functions, the whole multi-layer (deep) CNN would collapse into a single equivalent convolutional layer.

Those activation functions play the critical role of introducing non-linearity in CNNs.

Hope this helps.

Backward gradient

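As a minimal illustration (our own sketch, not from the original post), PyTorch's autograd computes the backward gradient of a convolution with respect to both its input and its kernel weights:

```python
import torch
import torch.nn.functional as F

# A small input feature map and a 3x3 kernel, both tracking gradients
x = torch.randn(1, 1, 5, 5, requires_grad=True)   # (batch, channels, H, W)
w = torch.randn(1, 1, 3, 3, requires_grad=True)   # (out_ch, in_ch, kH, kW)

out = F.conv2d(x, w, stride=1, padding=0)   # forward pass -> shape (1, 1, 3, 3)
loss = out.sum()                            # simple scalar loss
loss.backward()                             # backward pass

print(x.grad.shape)   # torch.Size([1, 1, 5, 5]) -> dLoss/dInput
print(w.grad.shape)   # torch.Size([1, 1, 3, 3]) -> dLoss/dWeights
```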

PyTorch and TensorFlow Differences

I use PyTorch at home and TensorFlow at work. The other way around would also be great, which kind of gives you a hint.

There are two “general use cases”: training and inference. Each of them has its own challenges, but if you do only training (students and researchers) or mostly inference and implementation (developers), you start focusing on different things.

TensorFlow is built around the concept of a Static Computational Graph (SCG). That means you first define everything that is going to happen inside your framework, and then you run it. It has two great upsides:

When a model becomes obscenely huge, it’s easier to understand it, because everything is basically a giant function that never changes.
It’s always easier to optimize a static computational graph, because it allows all kinds of tricks: preallocating buffers, fusing layers, precompiling the functions.
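To make the static-graph idea concrete, here is a hedged sketch of our own (using TensorFlow 2's tf.function, which traces Python code into a static graph), not anything from the original answer:

```python
import tensorflow as tf

@tf.function  # traces this Python function into a static, optimizable graph
def conv_step(x, w):
    return tf.nn.conv2d(x, w, strides=1, padding="SAME")

x = tf.random.normal([1, 28, 28, 1])   # NHWC input
w = tf.random.normal([3, 3, 1, 8])     # 3x3 kernel, 1 -> 8 channels
y = conv_step(x, w)                    # first call builds and compiles the graph
print(y.shape)                         # (1, 28, 28, 8)
```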

If this is what matters most for you, then your choice is probably TensorFlow.

A network written in PyTorch is a Dynamic Computational Graph (DCG). It allows you to do any crazy thing you want to do.

Dynamic data structures inside the network. You can have any number of inputs at any given point of training in PyTorch. Lists? Stacks? No problem.
Networks are modular. This is my favorite feature, actually. Each part is implemented separately, and you can debug it separately, unlike a monolithic TF construction.
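By contrast, here is a minimal PyTorch sketch (again our own example) of a dynamic graph: the forward pass contains ordinary, data-dependent Python control flow, and the graph is rebuilt on every call:

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """A toy network whose depth depends on the input at run time."""
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 16)
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        # Plain Python control flow: the number of applied layers can
        # differ from one input batch to the next.
        steps = int(x.abs().mean().item() * 4) + 1
        for _ in range(steps):
            x = torch.relu(self.layer(x))
        return self.head(x)

net = DynamicNet()
out = net(torch.randn(8, 16))   # each call may trace a different graph
out.sum().backward()            # autograd still works through the loop
```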

TensorFlow has very wide support for parallel and distributed training. If you have 100 GPUs… well, if you have 100 GPUs, stop wasting time here and go check your networks; something must already be finished.

I like the many interesting ways to optimize different processes in TF, from parallel training with queues to almost-built-in weight quantization [1]. It's a great choice for our team and works marvelously in production.

PyTorch allows you to write a lot of things very quickly without visible losses in performance during training. It's hard to overestimate the importance of this, especially when you have strict deadlines and a lot of ideas to validate.

Try a few things with both and see the difference:

Skip some layers
Remove a few layers from a pretrained model (see the sketch after this list)
Mix activation functions / change them
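For the second item, a hedged sketch (assuming torchvision is installed; the model choice is arbitrary) of chopping the last layers off a pretrained-style ResNet in PyTorch:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18; weights=None keeps this sketch offline-friendly
backbone = models.resnet18(weights=None)

# Drop the final average-pool and fully connected layers,
# keeping only the convolutional feature extractor
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])

features = feature_extractor(torch.randn(1, 3, 224, 224))
print(features.shape)   # torch.Size([1, 512, 7, 7])
```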

There’s no compelling reason to do everything in one framework and miss out on a great deal of opportunities and convenient features. Different research groups will keep publishing code written in both, because there’s a place for each paradigm.

Backpropagation

Link
