Want to benchmark your image classification model on the data you have before even writing code?
I was developing a model to classify images of dog breeds and ran into a few questions: how good is my data, what ratio should I use to split it into train and test sets, and which other hyperparameters do I need to decide on? So I chose the GCP AutoML Vision cloud service, which handles the end-to-end pipeline from pre-processing the data to deploying the model. Below are the steps to follow to benchmark your model using GCP AutoML Vision.
Step 1: Data Cleaning
The data I gathered through AWS was already in a structured format (one folder per class, with the folder name serving as the class name), so I did not need to do much data preparation. However, to pass GCP's upload validation for images, I had to make the following changes to my data:
- Remove periods from all file names
- Limit file names to 32 characters
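The two clean-up rules above can be sketched in a few lines of Python. This is only an illustration, not the script I actually ran: the helper names `clean_name` and `plan_renames` are made up, and it assumes the final file extension is kept and that the 32-character limit applies to the whole file name including that extension.

```python
from pathlib import Path

MAX_NAME_LEN = 32  # limit from the list above (assumed to include the extension)


def clean_name(filename, max_len=MAX_NAME_LEN):
    """Drop periods from the stem and truncate so the full name fits max_len."""
    suffix = Path(filename).suffix  # final extension, e.g. ".jpg" (kept as-is)
    stem = filename[: len(filename) - len(suffix)].replace(".", "")
    keep = max(1, max_len - len(suffix))  # characters left for the stem
    return stem[:keep] + suffix


def plan_renames(root):
    """Map each file under root to its cleaned path, without touching the disk."""
    plan = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            target = path.with_name(clean_name(path.name))
            if target != path:
                plan[path] = target
    return plan
```

Reviewing the output of `plan_renames` before renaming anything is worthwhile, because truncation can make two different files collide on the same name.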
You can also just upload the data with minimal structure, and GCP will tell you whether it needs cleaning.
Step 2: Go to https://console.cloud.google.com/ and from the menu bar navigate to Artificial Intelligence -> Vision
Step 3: Import images into GCP
Create a dataset and give it a name. From the import tab, upload the folders containing your images and choose a Google Cloud Storage bucket as the destination. If you do not have a bucket yet, create one using the option provided.
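If the images are already in a Cloud Storage bucket, the same folder layout can be described in an import file instead of being uploaded through the browser. The sketch below assumes the import accepts a CSV with one `gs://` image path and one label per row, which is my understanding of the format and worth checking against the current AutoML Vision documentation. The bucket name `gs://my-dog-breeds` and the helper names are hypothetical.

```python
import csv
from pathlib import PurePosixPath

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of image types to include


def rows_from_paths(relative_paths, bucket_prefix):
    """Turn 'class_name/file.jpg' paths into (gs_uri, label) rows."""
    prefix = bucket_prefix.rstrip("/")
    rows = []
    for rel in sorted(relative_paths):
        path = PurePosixPath(rel)
        if len(path.parts) != 2 or path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip anything that is not class_name/image
        label = path.parts[0]  # folder name doubles as the class label
        rows.append((f"{prefix}/{label}/{path.name}", label))
    return rows


def write_import_csv(rows, out_path):
    """Write the rows as a plain two-column CSV file."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
```

For example, `rows_from_paths(["pug/a.jpg"], "gs://my-dog-breeds")` yields a single row pairing `gs://my-dog-breeds/pug/a.jpg` with the label `pug`.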
I had around 6,500 images, and this step took around 2–3 hours.
GCP sends an email notification when the images have been uploaded successfully or when there are errors in the data format.
Step 4: Verify your data
After a successful upload, go to the IMAGES tab to check that the images were imported correctly.
Here we can select any images that are not labeled and assign them one of the listed labels using the “Assign Labels” option.
GCP automatically divides the dataset into training, validation, and test sets.
We can see the statistics using the “LABEL STATS” option. My dataset was divided into 80% training data, 10% validation data, and 10% test data.
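To get a feel for what an 80/10/10 split means in practice, here is a minimal sketch of a per-class random split. It is not how GCP performs its split (I do not know the internals); the function name `split_dataset` and its parameters are my own.

```python
import random
from collections import defaultdict


def split_dataset(items, train=0.8, valid=0.1, seed=0):
    """Split (image_id, label) pairs into train/valid/test, class by class."""
    by_label = defaultdict(list)
    for image_id, label in items:
        by_label[label].append(image_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "valid": [], "test": []}
    for label in sorted(by_label):
        ids = sorted(by_label[label])
        rng.shuffle(ids)
        n_train = int(len(ids) * train)
        n_valid = int(len(ids) * valid)
        splits["train"] += [(i, label) for i in ids[:n_train]]
        splits["valid"] += [(i, label) for i in ids[n_train:n_train + n_valid]]
        splits["test"] += [(i, label) for i in ids[n_train + n_valid:]]  # remainder
    return splits
```

Splitting within each class keeps every breed represented in all three sets, which matters when some classes have only a few images.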
Step 5: Train your model
Once the data has been uploaded successfully, click on the TRAIN tab to train the model.
Training took around 6–7 hours for me.
Results:
Training completed successfully. On the test data (which GCP split off by itself), the model scored 97% accuracy, with a precision of 93.43% and a recall of 90.39% at a confidence threshold of 0.5.
We can adjust the confidence threshold to see how precision and recall vary.
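The trade-off can be illustrated with a small sketch. It uses a simplified single-label definition: a prediction below the threshold is discarded, which lowers recall, while precision is measured only over the predictions that are kept. The service's exact computation may differ, and the `SAMPLE` predictions are made-up numbers, not my results.

```python
# Made-up (true_label, predicted_label, confidence) triples for illustration only.
SAMPLE = [
    ("beagle", "beagle", 0.95),
    ("beagle", "beagle", 0.60),
    ("pug", "pug", 0.90),
    ("pug", "beagle", 0.55),
    ("collie", "collie", 0.40),
]


def precision_recall(predictions, threshold):
    """Precision and recall when only predictions >= threshold are kept."""
    kept = [(t, p) for t, p, conf in predictions if conf >= threshold]
    true_pos = sum(1 for t, p in kept if t == p)
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / len(predictions) if predictions else 0.0
    return precision, recall


for threshold in (0.5, 0.8):
    p, r = precision_recall(SAMPLE, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.5: precision=0.75, recall=0.60
# threshold=0.8: precision=1.00, recall=0.40
```

Raising the threshold here removes the one wrong prediction, so precision goes up, but it also drops a correct low-confidence prediction, so recall goes down.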
Below is the confusion matrix obtained.
Some findings from the above matrix:
The model categorizes dogs of the Greyhound class with 100% accuracy, whereas it confuses the Entlebucher Mountain Dog breed with the Greater Swiss Mountain Dog breed.
Hmmm! Looks like even I can't tell the difference exactly. No wonder the machine got confused!
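Findings like these can be read off a confusion matrix programmatically. The sketch below assumes rows are true labels and columns are predicted labels; the counts in `MATRIX` are invented to mirror the pattern described above and are not the numbers from my model.

```python
LABELS = ["Greyhound", "Entlebucher", "Greater Swiss"]
# Invented counts: MATRIX[i][j] = images of LABELS[i] predicted as LABELS[j].
MATRIX = [
    [10, 0, 0],
    [0, 7, 3],
    [0, 2, 8],
]


def per_class_recall(matrix, labels):
    """Fraction of each true class that landed on the diagonal."""
    result = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        result[label] = matrix[i][i] / total if total else 0.0
    return result


def most_confused(matrix, labels, top=3):
    """Off-diagonal cells with the highest counts, as (count, true, predicted)."""
    pairs = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j and matrix[i][j] > 0
    ]
    pairs.sort(key=lambda pair: -pair[0])
    return pairs[:top]
```

With the invented counts, `most_confused(MATRIX, LABELS)` puts the Entlebucher and Greater Swiss pair at the top, while Greyhound has a per-class recall of 1.0.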
With GCP AutoML, we also have the option to deploy the model, but be aware of the costs listed in the pricing guide.
I have not done the deployment part, and I was able to do everything above at no charge.
Summary
In this article, we outlined some of the key steps of using the AutoML Vision service: how to clean up the data to match the service requirements, how long it takes to train the model, and how to analyze the results the service returns. GCP AutoML services are a great way to train models on large datasets, and our next step will be to deploy this model.
Hope this helps someone.
Translated from: https://medium.com/@sadhana.paladugu/getting-started-on-gcp-automml-vision-aa944a9ff307