Identifying Nendoroids Using Convolutional Neural Networks

weixin_26756255

Original article: https://towardsdatascience.com/identifying-nendoroids-using-convolutional-neural-networks-80aa08291aef

Nendoroids are a brand of action figures owned by the Good Smile Company. They are usually short, chibi-sized figures and cover characters from all sorts of media such as movies, games, TV shows, animations, and books. They are very popular among anime and manga fans because most of the figures are based on popular anime, manga, and light novels. Each one consists of around 30–40 movable parts, and some are known to be made by hand to this day.

If you want to know more about their production, I recommend watching this 30-minute video of how they are made.

In this blog, we’re going to classify pictures of Nendoroids against other action figure brands such as Figmas and Gunplas using deep learning. I’ll try my best to explain the concepts in layman’s terms. I picked this subject because yet another image classification project on cats, dogs, or fruits doesn’t sound so exciting.

Creating an image dataset

Collecting the images was fairly easy. I downloaded (manually, or through a script or other software) Nendoroid and non-Nendoroid images from Google Images, Pinterest, and online stores that sell these figures, and placed them into their respective folders (Nendoroid and non-Nendoroid). There’s no set number of images for a dataset, but my rule of thumb is around 1000 images in total.

One thing I had to keep in mind was that each image should contain only one or two figures, and that the figures should be facing the camera.

For labeling, I renamed the files in both folders to something like nendo_(insert number) and nonendo_(insert number) using the Python os library.
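The renaming step could look roughly like the sketch below. The helper name rename_with_prefix and the numbering scheme are my own illustration, not the exact script used for this project, and it assumes no file in the folder already carries one of the target names.

```python
import os


def rename_with_prefix(folder, prefix):
    """Rename every file in `folder` to `<prefix>_<n><ext>`, numbering from 1.

    Hypothetical helper for illustration; assumes the target names are not
    already taken by other files in the folder.
    """
    # Only look at regular files, in a stable (sorted) order
    names = sorted(
        n for n in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, n))
    )
    new_names = []
    for n, name in enumerate(names, start=1):
        ext = os.path.splitext(name)[1]      # keep the original extension
        new_name = f"{prefix}_{n}{ext}"
        os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
        new_names.append(new_name)
    return new_names
```

Calling rename_with_prefix on the Nendoroid folder with the prefix "nendo", and on the other folder with "nonendo", gives file names from which the class can be read directly.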

A much better way to label the images is to use an annotator like pigeon (https://github.com/agermanidis/pigeon), which I found thanks to this blog on collecting images for a dataset.

Here are some examples of Nendoroids. They have a very miniature build, are quite expressive, and their poses can be customized.

Image for post — Photo by Tengyart on Unsplash

Here are other examples of action figures that are not Nendoroids, such as Gunplas (owned by Bandai Hobby Center) and Figmas (which are also owned by the Good Smile Company).

Components of an image

Before we get into preprocessing the data, we need to understand what an image consists of. A digital image consists of pixels arranged in a grid of a given width and height.

An example would be an image with a resolution of 1920 x 1080. It contains 2,073,600 pixels (1920 × 1080), arranged 1920 pixels across and 1080 pixels down.

Every pixel holds a value. For grayscale images, the values range from 0 (black) to 255 (white), and each value in between represents a different shade of gray.

What gives an image its color are the RGB (Red, Green, Blue) channels. Each channel has its own pixel value from 0 to 255, representing the intensity of that color. The pixel value of a color image is a vector of the 3 values, one from each channel.
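To make the pixel layout concrete, here is a small NumPy sketch. The arrays are blank placeholder images created purely for illustration, not images from the dataset.

```python
import numpy as np

height, width = 1080, 1920

# A greyscale image: one value (0-255) per pixel
gray = np.zeros((height, width), dtype=np.uint8)

# A colour image: three values (R, G, B) per pixel
rgb = np.zeros((height, width, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)          # make the top-left pixel pure red

pixel_count = gray.size          # 1920 * 1080 = 2,073,600 pixels
top_left = rgb[0, 0].tolist()    # [255, 0, 0]
```

Note that NumPy (and OpenCV) index images as rows first, so the shape reads height by width rather than width by height.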

Preprocessing the images

Preprocessing our data takes 3 steps:

1. Read the image and convert it to greyscale. This is because we are more focused on the structure of the figures than on their color.
2. Resize it to a fixed resolution, because all the images must have the same dimensions.
3. Convert the image to an n-dimensional array and flatten it to a 1-dimensional array.

Do that for both image classes, attach the labels, and we have our dataset to feed the model with.

import os
import cv2

# img_size (the target width and height in pixels) must be defined before this loop
os.chdir("Insert file link to dataset")
files = os.listdir()
data = []
for i in files:
    # Read the image and convert it to greyscale
    ig = cv2.imread(os.path.join(os.getcwd(), i), cv2.IMREAD_GRAYSCALE)
    # Resize the image to a uniform dimension
    new_ig = cv2.resize(ig, dsize=(img_size, img_size))
    # Append the image together with its label
    data.append([new_ig, 'Nendoroid'])
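The loop above stores image and label pairs, while the later steps work with separate x and y collections. One way to get from one to the other is sketched below. The helper name split_features_labels and the "Non-Nendoroid" label string are assumptions made for this example. Since binary cross-entropy expects numeric targets, the sketch maps the labels to 1 and 0.

```python
import numpy as np


def split_features_labels(data, positive_label="Nendoroid"):
    """Split [image, label] pairs into flattened images and 0/1 labels.

    Hypothetical helper: 1 means Nendoroid, 0 means anything else.
    """
    x = [image.flatten() for image, _ in data]            # 2-D image -> 1-D array
    y = [1 if label == positive_label else 0 for _, label in data]
    return x, y


# Tiny stand-in dataset: two fake 2x2 "images"
example = [[np.ones((2, 2)), "Nendoroid"], [np.zeros((2, 2)), "Non-Nendoroid"]]
x_example, y_example = split_features_labels(example)
```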

Next, we split our dataset into training, validation, and test sets. Two successive 80/20 splits leave 64% of the images for training, 16% for validation, and 20% for testing.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42)

The first thing is to normalize our images by dividing them by 255, the highest value a pixel can take in the 0–255 range. Doing so helps the Convolutional Neural Network during training, since all the input values then lie between 0 and 1.

import numpy as np

x_train = np.array(x_train)/255
x_test = np.array(x_test)/255
x_val = np.array(x_val)/255

Next, we reshape them back to their image dimensions (with a single greyscale channel) so they can be sent as input to the Convolutional Neural Network.

x_train = x_train.reshape(-1, img_size, img_size, 1)
y_train = np.array(y_train)
x_test = x_test.reshape(-1, img_size, img_size, 1)
y_test = np.array(y_test)
x_val = x_val.reshape(-1, img_size, img_size, 1)
y_val = np.array(y_val)

Convolutional Neural Network

To put it simply, a convolutional neural network (CNN) breaks the images down into a much smaller representation before that representation is sent as input to a regular neural network. How does it break down the image? It goes like this.

Step 1: A matrix of a given dimension, also known as a kernel, slides over the image step by step, left to right and top to bottom. At each position it is multiplied element-wise with the patch of the image it covers, and the products are summed. The result is a single value that represents the kernel’s response at that position. This is known as a convolution.

How far the kernel moves at each step is determined by a value called the stride.
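Here is a minimal NumPy sketch of that sliding-window operation, including the stride. The function name convolve2d is just for this example. Like Keras, it does not flip the kernel, and it uses no padding.

```python
import numpy as np


def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride          # top-left corner of the patch
            patch = image[r:r + kh, c:c + kw]
            out[i, j] = np.sum(patch * kernel)     # one output value per position
    return out


image = np.arange(16).reshape(4, 4)    # a toy 4x4 "image" with values 0..15
kernel = np.ones((2, 2))               # a 2x2 kernel that simply sums its patch
result = convolve2d(image, kernel, stride=2)
# result is [[10, 18], [42, 50]]
```

With a stride of 1 the same input gives a 3x3 output, so a larger stride shrinks the result faster.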

Step 2: The convolved matrix then gets reduced even further through a process called pooling. It is similar to convolution in that a window slides over the matrix, but it returns a matrix containing the maximum or average value found in each window.
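Pooling can be sketched in a few lines of NumPy as well. The function name pool2d is my own, and the sketch assumes non-overlapping windows, which is how a 2x2 max-pooling layer behaves by default in Keras.

```python
import numpy as np


def pool2d(feature_map, size=2, mode="max"):
    """Reduce each non-overlapping size x size window to its max or mean."""
    h, w = feature_map.shape
    out_h, out_w = h // size, w // size
    # Drop leftover rows/columns that do not fill a whole window
    trimmed = feature_map[:out_h * size, :out_w * size]
    windows = trimmed.reshape(out_h, size, out_w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))


feature_map = np.arange(16).reshape(4, 4)
max_pooled = pool2d(feature_map, mode="max")     # [[5, 7], [13, 15]]
avg_pooled = pool2d(feature_map, mode="mean")    # [[2.5, 4.5], [10.5, 12.5]]
```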

Step 3: This goes on until the image has been reduced to very small dimensions.

How does a CNN find patterns and classify the images?

The convolution layer contains filters, which are sets of weights. During training, the filters are multiplied with the layer’s input, and as the model goes through all the images, the weights keep being adjusted. When the model later predicts on an image that contains a pattern very similar to those seen in the training dataset, the filters that learned that pattern return high values.

Building the Convolutional Neural Network

Now it’s time to build the Convolutional Neural Network using Keras via TensorFlow. Start by creating a Sequential() model to stack our layers, then add a convolutional layer (Conv2D) with a given number of filters and the kernel size to use. After that, add a max-pooling layer of a given window size, and keep repeating this until the spatial dimensions of the feature maps are close to one. Once that’s done, the feature maps are flattened to a 1-dimensional array, which goes through the dense layers at the last stage.

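from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

cnn = Sequential()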
cnn.add(Conv2D(32,(3,3),activation='relu',input_shape=(img_size,img_size,1)))
cnn.add(MaxPooling2D((2,2)))
cnn.add(Dropout(0.1))
cnn.add(Conv2D(64,(3,3),activation='relu'))
cnn.add(MaxPooling2D((2,2)))
cnn.add(Dropout(0.1))
cnn.add(Conv2D(64,(3,3),activation='relu'))
cnn.add(MaxPooling2D((2,2)))
cnn.add(Dropout(0.1))
cnn.add(Conv2D(64,(3,3),activation='relu'))
cnn.add(MaxPooling2D((2,2)))
cnn.add(Dropout(0.1))
cnn.add(Conv2D(128,(3,3),activation='relu'))
cnn.add(MaxPooling2D((2,2)))
cnn.add(Dropout(0.1))
cnn.add(Conv2D(128,(3,3),activation='relu'))
cnn.add(MaxPooling2D((2,2)))
cnn.add(Dropout(0.1))
cnn.add(Flatten())
cnn.add(Dense(128,activation='relu'))
cnn.add(Dropout(0.1))
cnn.add(Dense(1,activation='sigmoid'))
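It helps to check how quickly the six Conv2D and MaxPooling2D pairs shrink the image. The sketch below does the arithmetic in plain Python, assuming the Keras defaults used above (no padding, stride 1 for the convolution, and non-overlapping 2x2 pooling). The input sizes 200, 190, and 189 are example values only, since this post does not state img_size.

```python
def spatial_size_after_blocks(img_size, n_blocks=6, kernel=3, pool=2):
    """Return the feature-map side length after each conv + pool block."""
    size = img_size
    sizes = []
    for _ in range(n_blocks):
        size = size - kernel + 1    # 3x3 convolution without padding
        size = size // pool         # 2x2 max pooling rounds down
        sizes.append(size)
    return sizes


sizes_200 = spatial_size_after_blocks(200)   # [99, 48, 23, 10, 4, 1]
sizes_190 = spatial_size_after_blocks(190)   # [94, 46, 22, 10, 4, 1]
sizes_189 = spatial_size_after_blocks(189)   # ends in 0, so the image is too small
```

Under these assumptions, an input needs to be at least 190 pixels on each side for this stack of layers to leave a 1x1 feature map before Flatten.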

Activation functions transform the output of a node before it is sent as input to the nodes of the next layer. Except for the last layer, the ReLU activation function is used, since it is cheap to compute and is far less prone to the ‘vanishing gradient’ problem than the Sigmoid function.

The last layer has ‘Sigmoid’ as the activation function and it outputs a value between 0 and 1.
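Both activation functions are short enough to write out by hand. This is a plain-Python sketch of the math for a single value, not the Keras implementation.

```python
import math


def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Sigmoid: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


relu_examples = [relu(-2.0), relu(3.0)]    # [0.0, 3.0]
sigmoid_mid = sigmoid(0.0)                 # 0.5
```

Because the sigmoid output lies between 0 and 1, the last layer’s value can be read as the model’s confidence that the image shows a Nendoroid.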

Dropout layers are added to lessen the chances of overfitting the model. Here’s a visual representation of the Dropout process.
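In code, dropout during training amounts to randomly zeroing a fraction of the values and scaling up the rest so the expected total stays the same. The NumPy sketch below illustrates the idea with a rate of 0.1, as in the model above. The function name and the random seed are choices made for this example.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Zero out roughly `rate` of the values and rescale the survivors."""
    keep = rng.random(activations.shape) >= rate    # True where the value is kept
    return activations * keep / (1.0 - rate)


rng = np.random.default_rng(0)
dropped = dropout(np.ones(1000), rate=0.1, rng=rng)
# Each entry is now either 0 (dropped) or 1 / 0.9 (kept and rescaled)
```

At prediction time dropout is switched off, so every value passes through unchanged.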

For compiling the neural network, the optimizer chosen (which updates the weights and biases in the neural network at a given learning rate) is RMSprop, and the loss function is binary cross-entropy because we have only 2 classes to classify. The Convolutional Neural Network is trained for 35 rounds, or epochs, with the validation data used to check that it isn’t just ‘memorizing’ the training images and can classify unseen data well.

cnn.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
history = cnn.fit(x_train, y_train, epochs=35, validation_data=(x_val, y_val))
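To get a feel for what the binary cross-entropy loss measures, here is a plain-Python sketch of the formula. It is an illustration of the math rather than the Keras code, and the clipping value eps is an assumption of this example.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Average of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)    # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0]
confident_right = binary_cross_entropy(labels, [0.9, 0.1])   # small loss
unsure = binary_cross_entropy(labels, [0.5, 0.5])            # ln(2), about 0.693
confident_wrong = binary_cross_entropy(labels, [0.1, 0.9])   # large loss
```

The loss grows sharply when the model is confidently wrong, which is what pushes the weights toward better predictions during training.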

Comparing the accuracy on the training and validation data over 20 epochs with the accuracy on the test data, there seems to be some overfitting. This could be reduced through trial and error by changing the learning rate or dropout value, removing layers, and so on, but I’ll stop here.

Looking at the confusion matrix for the test data, the model classified the test data quite well, albeit with some false negatives.
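For readers new to the term, a confusion matrix simply counts the four possible outcomes of a binary prediction. The sketch below computes them in plain Python on made-up labels, not on the actual test results of this project.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),  # Nendoroid, predicted Nendoroid
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),  # not Nendoroid, predicted not
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),  # not Nendoroid, predicted Nendoroid
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),  # Nendoroid, predicted not
    }


# Made-up example labels for illustration only
counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# counts == {"tp": 2, "tn": 1, "fp": 1, "fn": 1}
```

A false negative here means a Nendoroid image that the model labeled as a non-Nendoroid.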

The moment of truth

Now for the moment of truth. It’s time to use our classifier on images outside of the dataset we used. What’s a classifier if it can’t classify anything outside of its own dataset? There are 40 images in this set (20 Nendoroid and 20 non-Nendoroid images), and we’re going to see how well the model classifies them.

Like before, we preprocess the images and then use the model.predict_classes() function to predict the class of each image.
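A note for anyone following along with a recent TensorFlow: as far as I know, predict_classes() was removed from Keras Sequential models in TensorFlow 2.6. For a single sigmoid output, the usual replacement is to threshold the probabilities returned by predict() at 0.5. The NumPy sketch below shows that thresholding step on made-up probabilities. The helper name to_class_labels is my own.

```python
import numpy as np


def to_class_labels(probabilities, threshold=0.5):
    """Turn sigmoid outputs into 0/1 class labels."""
    return (np.asarray(probabilities) > threshold).astype("int32").ravel()


# Made-up model outputs with shape (n_images, 1), as predict() would return
fake_probabilities = [[0.1], [0.9], [0.5]]
predicted = to_class_labels(fake_probabilities)    # array([0, 1, 0])

# With the trained model this would be, for example:
# predicted = to_class_labels(cnn.predict(x_new))  # x_new: preprocessed images (hypothetical name)
```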

Looking at the accuracy and the confusion matrix, there are far fewer false positives and false negatives, so I’d say it did its job.

Here are some of the images that were classified correctly and incorrectly (0 indicates it’s not a Nendoroid and 1 indicates it is a Nendoroid).

Conclusion

In short, we were able to build a classifier that differentiates between Nendoroid and non-Nendoroid images using a Convolutional Neural Network. Even though it could have been done better with some slight tuning and modifications, we now know that the approach is effective.

Overall, this was a fun experiment for learning about image preprocessing and classification as well as the inner workings of the Convolutional Neural Network. Here’s a reward for going through all of this.

If you like the Nendoroids, Gunplas, and Figmas in this article and want to buy some for yourself, you can go to their official sites to buy them.

If you like this article, please share it with others. I’ll be grateful if you do. Feel free to give feedback; the notebook for this project will be on my GitHub page.

Translated from: https://towardsdatascience.com/identifying-nendoroids-using-convolutional-neural-networks-80aa08291aef
