Once upon a time I was sitting in on an online rescue dog adoption event hosted by a rescue organization that rescues dogs from the dog meat trade in Korea. Whenever a dog is shown on camera, someone inevitably asks, "What breed is that dog?" and the usual response is "We don't know exactly." The key word is "exactly," and the reason for this (as they explain many times) is that the dogs are rescued from multiple farms all over Korea.
This inspired me to try to create a way to use a single image of a dog to correctly identify its breed. After doing some research, it turns out people have already created breed classifiers, but of course they don't cover all of the breeds. Therefore, I ended up trying to find a way to identify two breeds, the Shiba Inu and the Jindo, two common breeds I observed at the rescue adoption event that weren't in any existing, available models.
Since the goal was to use an image to do a classification, I knew I was going to end up working with images as my data points and eventually build a convolutional neural network (CNN).
Data Collection
The first step in this process was collecting data, which in this case meant images. To do this I searched Google Images with the queries "Jindo dog" and "Shiba dog". Afterwards I created an image scraper that first used Selenium to scroll through the entire search and find the links to all the images. Each of the images had a distinct class tag, so I was able to build a big list of image links that I eventually downloaded using the urllib3 library. I had to look through the images to make sure they were pictures of actual dogs, and in the end I was able to gather about 300 images for each breed.
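As a rough sketch of what such a scraper might look like (the CSS selector `img.rg_i`, the search URL, and the function names here are placeholder assumptions, not the exact code I used — Google changes its markup frequently, so you would need to inspect the page yourself):

```python
def dedupe_urls(urls):
    """Drop duplicate links while preserving their original order."""
    seen = set()
    return [u for u in urls if not (u in seen or seen.add(u))]


def collect_image_urls(query, max_scrolls=30):
    """Scroll through a Google Images search and harvest image links.

    'img.rg_i' is a placeholder selector -- inspect the results page
    to find the class tag actually used for thumbnails.
    """
    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get('https://www.google.com/search?tbm=isch&q=' + query)
    for _ in range(max_scrolls):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(1)  # give the lazy-loaded results time to appear
    links = [img.get_attribute('src')
             for img in driver.find_elements(By.CSS_SELECTOR, 'img.rg_i')]
    driver.quit()
    return dedupe_urls([link for link in links if link])


def download_images(urls, prefix):
    """Fetch each link with urllib3 and write it to disk."""
    import urllib3

    http = urllib3.PoolManager()
    for i, url in enumerate(urls):
        resp = http.request('GET', url)
        with open(f'{prefix}_{i}.jpg', 'wb') as f:
            f.write(resp.data)
```

The manual review pass afterwards is what catches cartoons, merchandise photos, and pictures with multiple dogs that a scraper like this inevitably picks up.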
Data Augmentation
Since making a CNN from scratch would require at least 900 images for each class, I had to conduct offline augmentation. This augmentation is done by taking the original images and randomly applying a combination of rotating, cropping, zooming and horizontal flipping to create new images. To actually do the data augmentation, I used the ImageDataGenerator class from the Keras library three times on a selection of the images from the directory holding all the images. Below is an example of the code I used.
from keras.preprocessing.image import ImageDataGenerator

img_gen1 = ImageDataGenerator(rescale=1./255,
                              rotation_range=20,
                              zoom_range=0.2,
                              width_shift_range=0.2,
                              horizontal_flip=True,
                              fill_mode='reflect')

data_generation_1 = img_gen1.flow_from_directory(
    '.',
    target_size=(224, 224),
    batch_size=604,
    classes=['jindo', 'shiba'],
    class_mode='binary',
    seed=123,
)
After doing this and saving it all, I ended up with a balanced data set of about 1800 images in total. Even though an augmented image shows the same dog as its original, to the CNN the new image might as well be a picture of a different dog. Basically, the end result is showing the CNN more examples of a dog together with the proper classification. To give a sense of the data, some augmented images are shown below.
Modeling
After gathering enough images, I was finally able to use Keras to make some CNN models. I started with only four layers at first and got really overfit results, or the model guessing a single class, when training for a few epochs. This made me realize that identifying dog breeds is much more complex and I would need more layers.
I knew the eventual best thing to do time-wise was to apply transfer learning, but before doing that, I built a model that followed the AlexNet structure (a famous CNN design with 5 convolutional layers and 3 fully connected ones) to get a taste of building one. I had some help by following the code here, but I adjusted the dropout and learning rates to help with the overfitting. Despite this, the model was still overfit, and so it was time to apply transfer learning.
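To give an idea of the structure, here is a minimal Keras sketch of an AlexNet-style network with 5 convolutional layers followed by 3 fully connected ones. The exact filter counts, dropout rate, and optimizer here are illustrative, not the values I tuned:

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_alexnet_style(input_shape=(224, 224, 3), dropout=0.5):
    """AlexNet-like CNN: 5 conv layers followed by 3 dense layers."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(96, 11, strides=4, activation='relu'),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(256, 5, padding='same', activation='relu'),
        layers.MaxPooling2D(3, strides=2),
        layers.Conv2D(384, 3, padding='same', activation='relu'),
        layers.Conv2D(384, 3, padding='same', activation='relu'),
        layers.Conv2D(256, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(3, strides=2),
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(dropout),  # dropout helps fight the overfitting
        layers.Dense(4096, activation='relu'),
        layers.Dropout(dropout),
        layers.Dense(1, activation='sigmoid'),  # binary: Jindo vs. Shiba
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```

With only ~1800 images feeding tens of millions of parameters, it is not surprising a network of this size overfits.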
Transfer Learning
First off, the benefit of transfer learning is not having to train your model from scratch, which would take a long time if you wanted a very accurate model. Transfer learning means taking a model that has already been trained for one purpose, like classifying hot dogs, and reusing it for a different purpose, like classifying dogs. There are two methods of transfer learning. The first is feature extraction, which keeps the features the previous model learned and only trains a new output on top of them. The second is fine tuning, in which you pick the layers you want to adapt and continue training them on the new data. For this project, I ended up doing feature extraction by loading the VGG16 model from Keras and freezing each layer with:
model = VGG16(include_top=False, input_shape=(224, 224, 3))

# mark loaded layers as not trainable
for layer in model.layers:
    layer.trainable = False
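To turn the frozen base into a classifier, a new head gets stacked on top and only that head is trained. A sketch of how that might look — the 128-unit head size is an assumption for illustration, not necessarily the head I used:

```python
from tensorflow import keras
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten


def build_transfer_model(weights='imagenet'):
    """Frozen VGG16 base with a small trainable classifier head."""
    base = VGG16(include_top=False, weights=weights,
                 input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False              # keep pretrained features fixed
    x = Flatten()(base.output)
    x = Dense(128, activation='relu')(x)     # head size is illustrative
    out = Dense(1, activation='sigmoid')(x)  # binary: Jindo vs. Shiba
    model = keras.Model(base.inputs, out)
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```

Since only the two Dense layers have trainable weights, each epoch is far cheaper than training the whole VGG16 stack.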
Results
I ended up with a final model with an accuracy of 0.8333 and an F1 score of 0.8148.
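For reference, both numbers come straight from the confusion-matrix counts. The counts below are hypothetical, chosen only to reproduce the reported accuracy and F1 for illustration — the real confusion matrix may differ:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy and F1 score from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# hypothetical counts, only to show the arithmetic
acc, f1 = binary_metrics(tp=44, fp=6, fn=14, tn=56)
```

F1 is worth reporting alongside accuracy here because it penalizes the model for favoring one breed over the other, which plain accuracy can hide.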
Some of the properly classified images are below, and it doesn't seem like the model is doing too bad a job. It does show, however, that the data images I have aren't exactly clean, which makes sense since I only cleaned the data very quickly to make sure every image contained a single dog.
Some of the misclassified images are here:
From looking at the misclassified images and the confusion matrix, it seems the model is having trouble identifying Shibas. Perhaps one potential cause is that there isn't enough activation in the CNN for it to recognize a Shiba.
Conclusion
Unfortunately, since a CNN isn't very transparent, one can't directly see what the model is looking at when it makes a classification. As a next step, Grad-CAM can overlay a heat map showing which parts of the image or which shapes are being activated. After trying for a while, I figured this would involve hooking in somewhere before the output layer, but I haven't successfully implemented it yet. Another next step would be to not limit this model to only Shiba and Jindo; since there are already plenty of breed classification models out there, a cool thing to do would be to combine this with other models to make them more thorough.
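For what it's worth, Grad-CAM doesn't require inserting a new layer — it can be computed after training from the gradients of the prediction score with respect to the last convolutional layer's activations. A rough sketch of that standard approach, assuming a binary sigmoid output like the model above:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def grad_cam(model, image, conv_layer_name):
    """Heat map of which spatial regions drive the prediction.

    `conv_layer_name` should name the last convolutional layer
    (for VGG16 that would be 'block5_conv3').
    """
    # Model mapping the input to both the conv activations and the output.
    grad_model = keras.Model(model.inputs,
                             [model.get_layer(conv_layer_name).output,
                              model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        score = preds[:, 0]  # score for the positive class (binary output)
    # How much each conv channel influenced the score.
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of channels, then clip negatives and scale to [0, 1].
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam = cam / (tf.reduce_max(cam) + 1e-8)
    return cam.numpy()
```

The returned map has the spatial size of the chosen conv layer; upsampling it to 224×224 and overlaying it on the input image gives the usual Grad-CAM visualization.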
Originally published at: https://medium.com/@richard.mei97/is-it-a-shiba-is-it-a-jindo-8b88a5f22387