How to build an image recognition iOS app with Apple's CoreML and Vision APIs


by Mark Mansur


How to build an image recognition iOS app with Apple's CoreML and Vision APIs

With the release of CoreML and new Vision APIs at this year’s Apple World Wide Developers Conference, machine learning has never been easier to get into. Today I’m going to show you how to build a simple image recognition app.


We will learn how to gain access to the iPhone’s camera and how to pass what the camera is seeing into a machine learning model for analysis. We’ll do all this programmatically, without the use of storyboards! Crazy, I know.


Here is a look at what we are going to accomplish today:


Step 1: Create a new project.

Fire up Xcode and create a new single view application. Give it a name, perhaps “ImageRecognition.” Choose Swift as the main language and save your new project.


Step 2: Say goodbye to the storyboard.

For this tutorial, we are going to do everything programmatically, without the need for the storyboard. Maybe I’ll explain why in another article.


Delete Main.storyboard.


Navigate to Info.plist and scroll down to Deployment Info. We need to tell Xcode we are no longer using the storyboard.


Delete the main interface.


Without the storyboard we need to manually create the app window and root view controller.


Add the following to the application() function in AppDelegate.swift:
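A minimal version of that setup might look like this (a sketch, assuming your view controller class is named ViewController):

```swift
func application(_ application: UIApplication,
                 didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    // With no storyboard, we create the window and root view controller ourselves.
    window = UIWindow(frame: UIScreen.main.bounds)
    window?.rootViewController = ViewController()
    window?.makeKeyAndVisible()
    return true
}
```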


We manually create the app window with UIWindow(), create our view controller, and tell the window to use it as its root view controller.


The app should now build and run without the storyboard.


Step 3: Set up AVCaptureSession.

Before we start, import UIKit, AVFoundation and Vision. The AVCaptureSession object handles capture activity and manages the flow of data between input devices (such as the rear camera) and outputs.


We are going to start by creating a function to set up our capture session.


Create setupCaptureSession() inside ViewController.swift and instantiate a new AVCaptureSession.


Don’t forget to call this new function from viewDidLoad().
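Putting those two pieces together, ViewController.swift might start out like this (a sketch; here the session is stored as a property so the later steps can reach it):

```swift
import UIKit
import AVFoundation
import Vision

class ViewController: UIViewController {
    // Manages capture activity and the flow of data from inputs to outputs.
    let captureSession = AVCaptureSession()

    override func viewDidLoad() {
        super.viewDidLoad()
        setupCaptureSession()
    }

    func setupCaptureSession() {
        // Input, output, and preview-layer setup will go here.
    }
}
```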


Next, we are going to need a reference to the rear-facing camera. We can use a DiscoverySession to query available capture devices based on our search criteria.


Add the following code:
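A minimal version of that query, placed inside setupCaptureSession(), might look like this:

```swift
// Search for built-in wide-angle cameras on the back of the device.
let availableDevices = AVCaptureDevice.DiscoverySession(
    deviceTypes: [.builtInWideAngleCamera],
    mediaType: .video,
    position: .back
).devices
```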


availableDevices now contains a list of available devices matching our search criteria.


We now need to gain access to our captureDevice and add it as an input to our captureSession.


Add an input to the capture session.
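One way to do that (a sketch, assuming the availableDevices list and the captureSession property from the earlier snippets):

```swift
do {
    // The first matching device will be the rear-facing camera.
    if let captureDevice = availableDevices.first {
        let captureDeviceInput = try AVCaptureDeviceInput(device: captureDevice)
        captureSession.addInput(captureDeviceInput)
    }
} catch {
    print(error.localizedDescription)
}
```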


The first available device will be the rear facing camera. We create a new AVCaptureDeviceInput using our capture device and add it to the capture session.


Now that we have our input setup, we can get started on how to output what the camera is capturing.


Add a video output to our capture session.
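That can be as simple as the following sketch (the variable name dataOutput is my own choice):

```swift
// Vends captured video frames; later we'll also pull frames from it for CoreML.
let dataOutput = AVCaptureVideoDataOutput()
captureSession.addOutput(dataOutput)
```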


AVCaptureVideoDataOutput is an output that captures video. It also provides us access to the frames being captured for processing with a delegate method we will see later.


Next, we need to add the capture session’s output as a sublayer to our view.


Add capture session output as a sublayer to the view controllers’ view.
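A sketch of that, still inside setupCaptureSession():

```swift
// Render the camera feed into our view, then start the session running.
let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewLayer.frame = view.frame
view.layer.addSublayer(previewLayer)
captureSession.startRunning()
```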


We create a layer based on our capture session and add this layer as a sublayer to our view. captureSession.startRunning() starts the flow from the inputs to the outputs that we connected earlier.


Step 4: Permission to use the camera? Permission granted.

Nearly everyone has opened an app for the first time and has been prompted to allow the app to use the camera. Starting in iOS 10, our app will crash if we don’t prompt the user before attempting to access the camera.


Navigate to info.plist and add a new key named NSCameraUsageDescription. In the value column, simply explain to the user why your app needs camera access.


Now, when the user launches the app for the first time they will be prompted to allow access to the camera.


Step 5: Getting the model.

The heart of this project is the machine learning model. The model must be able to take in an image and give us back a prediction of what the image is. You can find free, pre-trained models here. The one I chose is ResNet50.


Once you obtain your model, drag and drop it into Xcode. It will automatically generate the necessary classes, providing you an interface to interact with your model.


Step 6: Image analysis.

To analyze what the camera is seeing, we need to somehow gain access to the frames being captured by the camera.


Conforming to the AVCaptureVideoDataOutputSampleBufferDelegate gives us an interface to interact with and be notified every time a frame is captured by the camera.


Conform ViewController to the AVCaptureVideoDataOutputSampleBufferDelegate protocol.
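The class declaration just gains the protocol:

```swift
class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    // ...
}
```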


We need to tell our video output that ViewController is its sample buffer delegate.


Add the following line in setupCaptureSession():
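That wiring might look like this (assuming the video output from step 3 is named dataOutput):

```swift
// Deliver captured frames to ViewController on a background queue.
dataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
```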


Add the following function:
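A sketch of the delegate method, assuming the Xcode-generated model class is named Resnet50 and the on-screen label from step 7 is named label:

```swift
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // Wrap the CoreML model for use with the Vision framework.
    guard let model = try? VNCoreMLModel(for: Resnet50().model) else { return }

    // Ask Vision to classify the frame and show the top prediction on screen.
    let request = VNCoreMLRequest(model: model) { finishedRequest, _ in
        guard let results = finishedRequest.results as? [VNClassificationObservation],
              let topResult = results.first else { return }
        DispatchQueue.main.async {
            self.label.text = topResult.identifier
        }
    }

    // Convert the frame from a CMSampleBuffer to the CVPixelBuffer Vision expects.
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}
```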


Each time a frame is captured, the delegate is notified by calling captureOutput(). This is a perfect place to do our image analysis with CoreML.


First, we create a VNCoreMLModel, which is essentially a CoreML model used with the Vision framework. We create it with a Resnet50 model.


Next, we create our Vision request. In the completion handler, we update the onscreen UILabel with the identifier returned by the model. We then convert the frame passed to us from a CMSampleBuffer to a CVPixelBuffer, which is the format our model needs for analysis.


Lastly, we perform the Vision request with a VNImageRequestHandler.


Step 7: Create a label.

The last step is to create a UILabel containing the model’s prediction.


Create a new UILabel and position it using constraints.


Don’t forget to add the label as a subview and call setupLabel() from within viewDidLoad().
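Both pieces together might look like this sketch (the specific text, color, and constraint values are my own choices):

```swift
let label: UILabel = {
    let label = UILabel()
    label.translatesAutoresizingMaskIntoConstraints = false
    label.text = "Label"
    label.textColor = .white
    label.font = label.font.withSize(30)
    return label
}()

func setupLabel() {
    // Pin the label to the bottom centre of the screen.
    view.addSubview(label)
    label.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
    label.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -50).isActive = true
}
```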


You can download the completed project from GitHub here.


Like what you see? Give this post a thumbs up, follow me on Twitter or GitHub, or check out my personal page.


