Getting Started on GCP AutoML Vision

Want to benchmark your image classification model with the data you have before even writing code?

I was developing a model to classify images of dog breeds and ran into a few questions: how good is my data, what ratio should I use to split it into training and test sets, and which values should I pick for a few other hyperparameters? So I chose GCP AutoML Vision, a cloud service that runs the end-to-end pipeline, from preprocessing the data to deploying the model. Below are the steps to benchmark your model using GCP AutoML Vision.

Step1: Data Cleaning

Since the data I gathered through AWS was already in a structured format (each class of images in its own folder, with the folder name as the class name), I did not need to do much data preparation. However, to pass GCP's upload validation for images, I had to make the following changes to my data:

  • Remove periods from all file names
  • Limit file names to at most 32 characters
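
With thousands of files, renaming them by hand is tedious. Below is a minimal Python sketch that applies the two rules above to every file under a folder. It assumes two things the post does not specify: the 32-character limit counts the whole name including the extension, and the extension's own period should be kept. Adjust `clean_filename` if GCP's validation message says otherwise. If GCP flags folder names rather than file names, the same rules may need to be applied to those as well.

```python
import os
from pathlib import Path

# Assumption: the limit covers the full name, extension included.
MAX_NAME_LEN = 32


def clean_filename(name: str, max_len: int = MAX_NAME_LEN) -> str:
    """Remove periods from the stem and truncate so the full name fits in max_len."""
    path = Path(name)
    ext = path.suffix  # e.g. ".jpg"; the extension keeps its own period
    stem = path.stem.replace(".", "")
    return stem[: max_len - len(ext)] + ext


def clean_directory(root: str) -> list:
    """Rename every file under root in place; return the (old, new) path pairs."""
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            new_name = clean_filename(name)
            if new_name == name:
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, new_name)
            # Truncation can make two names collide; refuse to overwrite.
            if os.path.exists(dst):
                raise FileExistsError(f"renaming {src!r} would overwrite {dst!r}")
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed


if __name__ == "__main__":
    print(clean_filename("Greater.Swiss.Mountain.Dog.photo.0001.jpg"))
```

The example prints `GreaterSwissMountainDogphoto.jpg`, which is exactly 32 characters. Call `clean_directory("path/to/your/data")` to rename the files in place. Run it on a copy first, because truncation can merge names that were previously distinct.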

You can also simply upload the data with minimal structure, and GCP will tell you whether it needs cleaning.

Step2: Go to https://console.cloud.google.com/ and from the menu bar navigate to Artificial Intelligence -> Vision

Step3: Import images into GCP

Create a dataset and give it a name. From the IMPORT tab, upload all the image data and choose a Google Cloud Storage bucket as the destination. If you don't have a bucket yet, create one with the option provided.

I had around 6,500 images, and this step took about 2–3 hours.

We get an email notification once the images are uploaded successfully, or if there are any errors in the data format.
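
The steps above upload the images through the console. AutoML Vision can also import from a CSV file in Cloud Storage that lists each image's `gs://` path and its label, which may be handier when the images are already in a bucket. Below is a minimal sketch that builds such rows from the folder-per-class layout described in Step1. Several details are assumptions for illustration: the bucket name `my-dog-bucket`, the `dogs/` prefix, and the two-column `path,label` row shape. Check the current AutoML Vision documentation for the exact CSV format, which may include an optional first column that assigns an image to a specific split.

```python
import csv
import io
from pathlib import Path

# Hypothetical bucket and prefix; replace with your own.
BUCKET = "my-dog-bucket"
PREFIX = "dogs"


def automl_csv_rows(image_paths, bucket=BUCKET, prefix=PREFIX):
    """Yield [gcs_uri, label] rows; the label is the image's parent folder name."""
    for p in image_paths:
        path = Path(p)
        label = path.parent.name
        uri = f"gs://{bucket}/{prefix}/{label}/{path.name}"
        yield [uri, label]


def to_csv(rows) -> str:
    """Serialize rows as CSV text with Unix line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    local = ["Greyhound/Greyhound_00001.jpg", "Entlebucher/Entlebucher_00002.jpg"]
    print(to_csv(automl_csv_rows(local)), end="")
```

This sketch assumes the images were copied to the bucket with the same folder layout, for example using `gsutil -m cp -r`.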

Step4: Verify your data

After a successful upload, go to the IMAGES tab to check that the images were imported correctly.
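
A quick check is to count the images per label locally and compare the numbers with what the console reports. Below is a small sketch that assumes the folder-per-class layout from Step1. The `dogImages` folder name and the set of image extensions are assumptions.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; extend if your data has others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def images_per_label(paths) -> Counter:
    """Count image files per label, using each file's parent folder as its label."""
    return Counter(
        Path(p).parent.name
        for p in paths
        if Path(p).suffix.lower() in IMAGE_EXTS
    )


if __name__ == "__main__":
    root = Path("dogImages")  # hypothetical local copy of the dataset
    if root.is_dir():
        counts = images_per_label(p for p in root.rglob("*") if p.is_file())
        for label, n in sorted(counts.items()):
            print(f"{label}: {n}")
        print("total:", sum(counts.values()))
    else:
        print(f"{root} not found")
```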

Here we can select any unlabeled images and assign each of them one of the existing labels using the “Assign Labels” option, as shown below.

[Figure: the “Assign Labels” option in the IMAGES tab]

GCP automatically splits our dataset into training, validation, and test sets.

We can see the breakdown using the “LABEL STATS” option. For my dataset, GCP used 80% of the images for training, 10% for validation, and 10% for testing.
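
GCP picks the split for you, and the post doesn't describe how it does so. For intuition, here is a minimal sketch of a plain random 80/10/10 split. It is an illustration only, not GCP's actual algorithm. With about 6,500 images, it yields 5,200 training, 650 validation, and 650 test images.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle a copy of items and cut it into 80% train, 10% validation, 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = len(items) * 8 // 10  # integer arithmetic avoids float rounding
    n_valid = len(items) // 10
    train = items[:n_train]
    valid = items[n_train:n_train + n_valid]
    test = items[n_train + n_valid:]  # the remainder goes to test
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = split_80_10_10(range(6500))
    print(len(train), len(valid), len(test))  # 5200 650 650
```

A per-label (stratified) split would guarantee that every breed appears in each set. This sketch does not do that, so a breed with very few images could, by chance, be missing from the validation or test set.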

Step5: Train your model

Once the data is uploaded, click the TRAIN tab to start training.

The training process took me around 6–7 hours.

Results:

The model finished training successfully. On the test data (the split GCP created by itself), it achieved a 97% accuracy score, with a precision of 93.43% and a recall of 90.39% at a confidence threshold of 0.5.

We can adjust the confidence threshold to see how precision and recall vary.
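
To see why moving the threshold trades precision against recall, here is a small sketch. It assumes each prediction is a `(score, is_correct)` pair and counts a label as predicted when its score is at or above the threshold. The toy scores are made up, and AutoML's exact bookkeeping may differ.

```python
def precision_recall(predictions, threshold):
    """Return (precision, recall) for (score, is_correct) pairs at a confidence threshold."""
    tp = fp = fn = 0
    for score, is_correct in predictions:
        if score >= threshold:
            if is_correct:
                tp += 1  # predicted and right
            else:
                fp += 1  # predicted but wrong
        elif is_correct:
            fn += 1  # right label, but its score fell below the threshold
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    toy = [(0.9, True), (0.8, False), (0.6, True), (0.4, True), (0.3, False)]
    for t in (0.2, 0.5, 0.85):
        p, r = precision_recall(toy, t)
        print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.2 to 0.85 raises precision from 0.60 to 1.00, while recall drops from 1.00 to 0.33.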

Below is the resulting confusion matrix.

[Figure: confusion matrix]

Some findings from the matrix above:

The model classified dogs of the Greyhound class with 100% accuracy, but it confused the Entlebucher Mountain Dog and Greater Swiss Mountain Dog breeds with each other.

[Figures: Greater Swiss Mountain Dog; Entlebucher Mountain Dog]

Hmmm! Looks like even I can't figure out the difference exactly. No wonder the machine got confused!
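
Findings like these can also be read off a confusion matrix programmatically. Each row's diagonal share is that class's accuracy (its recall), and the largest off-diagonal cell is the most confused pair. Below is a minimal sketch. The counts are made up for illustration and are not the values from my matrix.

```python
def per_class_accuracy(matrix, labels):
    """Diagonal share of each row (rows = true label, columns = predicted label)."""
    result = {}
    for i, (label, row) in enumerate(zip(labels, matrix)):
        total = sum(row)
        result[label] = row[i] / total if total else 0.0
    return result


def most_confused(matrix, labels):
    """Return (true_label, predicted_label, count) for the largest off-diagonal cell."""
    best = None
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j and (best is None or count > best[2]):
                best = (labels[i], labels[j], count)
    return best


if __name__ == "__main__":
    labels = ["Greyhound", "Entlebucher mountain dog", "Greater swiss mountain dog"]
    toy = [
        [10, 0, 0],
        [0, 7, 3],
        [0, 2, 8],
    ]  # hypothetical counts, not the post's results
    print(per_class_accuracy(toy, labels))
    print(most_confused(toy, labels))
```

The sketch works the same whether the cells hold raw counts or row percentages, because each row is normalized by its own total.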

GCP AutoML also offers the option to deploy the model, but be aware of the costs listed in the pricing guide.

I did not do the deployment part, and I was able to do everything above at no charge.

Summary

In this article, we outlined some of the key steps of using the AutoML Vision service: cleaning up the data to meet the service's requirements, how long training takes, and how to analyze the results the service reports. GCP AutoML services are a great way to train models on large datasets, and our next step will be to deploy this model.

Hope this helps someone.

Originally published at: https://medium.com/@sadhana.paladugu/getting-started-on-gcp-automml-vision-aa944a9ff307
