Sign Language Translation System
The author selected Code Org to receive a donation as part of the Write for DOnations program.
Introduction
Computer vision is a subfield of computer science that aims to extract a higher-order understanding from images and videos. This powers technologies such as fun video chat filters, your mobile device’s face authenticator, and self-driving cars.
In this tutorial, you’ll use computer vision to build an American Sign Language translator for your webcam. As you work through the tutorial, you’ll use OpenCV, a computer-vision library, PyTorch to build a deep neural network, and onnx to export your neural network. You’ll also apply the following concepts as you build a computer-vision application:
- You’ll use the same three-step method as in the How To Apply Computer Vision to Build an Emotion-Based Dog Filter tutorial: preprocess a dataset, train a model, and evaluate the model.
- You’ll also expand each of these steps: employ data augmentation to address rotated or non-centered hands, change learning rate schedules to improve model accuracy, and export models for faster inference speed.
- Along the way, you’ll also explore related concepts in machine learning.
By the end of this tutorial, you’ll have both an American Sign Language translator and foundational deep learning know-how. You can also access the complete source code for this project.
Prerequisites
To complete this tutorial, you will need the following:
- A local development environment for Python 3 with at least 1GB of RAM. You can follow How to Install and Set Up a Local Programming Environment for Python 3 to configure everything you need.
- A working webcam to do real-time image detection.
- (Recommended) Build an Emotion-Based Dog Filter; that tutorial is not used directly here, but this one reinforces and builds on the same ideas.
Step 1: Creating the Project and Installing Dependencies
Let’s create a workspace for this project and install the dependencies we’ll need.
On Linux distributions, start by preparing your system package manager and installing the python3-venv package, which provides Python 3 virtual environments:
- apt-get update
- apt-get upgrade
- apt-get install python3-venv
We’ll call our workspace SignLanguage:
- mkdir ~/SignLanguage
Navigate to the SignLanguage directory:
- cd ~/SignLanguage
Then create a new virtual environment for the project:
python3 -m venv signlanguage
Activate your environment:
source signlanguage/bin/activate
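If you want to confirm that the activation worked, one option is a quick check from Python itself. This is a minimal sketch, assuming an environment created with the standard venv module as above; the helper name in_virtualenv is hypothetical and only used for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should end in the signlanguage directory created above.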
Then install PyTorch, a deep-learning framework for Python that we’ll use in this tutorial.
On macOS, install PyTorch with the following command:
- python -m pip install torch==1.2.0 torchvision==0.4.0
On Linux and Windows, use the following commands for a CPU-only build:
- pip install torch==1.2.0+cpu torchvision==0.4.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
- pip install torchvision
Now install prepackaged binaries for OpenCV, numpy, onnx, and onnxruntime, which are libraries for computer vision, linear algebra, AI model exporting, and AI model execution, respectively. OpenCV offers utilities such as image rotations, and numpy offers linear algebra utilities such as matrix inversion:
- python -m pip install opencv-python==3.4.3.18 numpy==1.14.5 onnx==1.6.0 onnxruntime==1.0.0
On Linux distributions, you will need to install libSM.so:
- apt-get install libsm6 libxext6 libxrender-dev
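Before moving on, you may want to check that every package from this step can be found by your interpreter. The following is a minimal sketch using only the standard library; the helper name find_missing is hypothetical. Note that opencv-python is imported as cv2, so the list holds import names rather than pip package names.

```python
import importlib.util

# Import names (not pip package names) for the libraries installed in this step.
REQUIRED_MODULES = ["torch", "torchvision", "cv2", "numpy", "onnx", "onnxruntime"]


def find_missing(modules):
    """Return the subset of top-level module names that cannot be located."""
    missing = []
    for name in modules:
        # find_spec returns None when no top-level module with this name exists.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies were found.")
```

This only checks that the modules can be located; it does not verify the pinned versions.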
With the dependencies installed, let’s build the first version of our sign language translator: a sign language classifier.
Step 2: Preparing the Sign Language Classification Dataset
In these next three sections, you’ll build a sign language classifier using a neural network. Your goal is to produce a model that accepts a picture of a hand as input and outputs a letter.
The following three steps are required to build a machine learning classification model:
- Preprocess the data: Apply one-hot encoding to your labels and wrap your data in PyTorch Tensors. Train your model on augmented data to prepare it for “unusual” input, like an off-center or rotated hand.
- Specify and train the model: Set up a neural network using PyTorch. Define training hyper-parameters, such as how long to train for, and run stochastic gradient descent. You’ll also vary one specific training hyper-parameter, the learning rate schedule, to boost model accuracy.
- Run a prediction using the model: Evaluate the neural network on your validation data to understand its accuracy. Then, export the model to a format called ONNX for faster inference speeds.
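To make the one-hot encoding mentioned in the first step concrete, here is a minimal sketch in plain Python. The helpers one_hot and decode are hypothetical names, and the mapping of A to index 0 and the class count of 26 are assumptions for illustration; the dataset’s actual label layout may differ.

```python
import string


def one_hot(label, num_classes):
    """Encode an integer class label as a vector with a single 1.0 entry."""
    if not 0 <= label < num_classes:
        raise ValueError("label must satisfy 0 <= label < num_classes")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def decode(vector):
    """Recover the class index as the position of the largest entry."""
    return max(range(len(vector)), key=vector.__getitem__)


if __name__ == "__main__":
    # Assumed mapping for illustration only: 'A' -> 0, 'B' -> 1, 'C' -> 2, ...
    index = string.ascii_uppercase.index("C")
    encoded = one_hot(index, num_classes=26)
    print(encoded[:5])
    print(string.ascii_uppercase[decode(encoded)])
```

The same decode step, taking the index of the largest entry, is also how a classifier’s output vector is typically turned back into a predicted letter.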
In this section of the tutorial, you will accomplish step 1 of 3. You will download the data, create a Dataset object to iterate over your data, and finally apply data augmentation. At the end of this step, you will have a programmatic way of accessing images and labels in your dataset to feed to your model.
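PyTorch’s map-style Dataset interface comes down to two methods, __len__ and __getitem__. The sketch below mimics that shape in plain Python, without importing torch, and plugs in a simple augmentation that shifts an image sideways to imitate an off-center hand. ToySignDataset and random_shift are hypothetical names, not the classes this tutorial builds, and the tiny 2x3 “image” is made-up data.

```python
import random


def random_shift(image, max_shift=2, rng=random):
    """Shift a 2D image (a list of rows) left or right, padding with zeros."""
    width = len(image[0])
    limit = min(max_shift, width)  # never shift further than the image is wide
    shift = rng.randint(-limit, limit)
    shifted = []
    for row in image:
        if shift > 0:    # move pixels right, pad on the left
            shifted.append([0] * shift + row[:width - shift])
        elif shift < 0:  # move pixels left, pad on the right
            shifted.append(row[-shift:] + [0] * (-shift))
        else:
            shifted.append(list(row))
    return shifted


class ToySignDataset:
    """Plain-Python stand-in with the same shape as a map-style dataset."""

    def __init__(self, images, labels, transform=None):
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)  # augmentation is applied on access
        return {"image": image, "label": self.labels[idx]}


if __name__ == "__main__":
    dataset = ToySignDataset([[[1, 2, 3], [4, 5, 6]]], [7], transform=random_shift)
    print(len(dataset), dataset[0])
```

Applying the transform inside __getitem__ means each access can yield a slightly different sample, which is the usual reason to augment at load time rather than once up front.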
First, download the dataset to your current working directory:
Note: On macOS, wget is not available by default. To install it, first install Homebrew by following this DigitalOcean tutorial. Then, run brew install wget.
- wget https://assets.digitalocean.com/articles/signlanguage_data/sign-language-mnist.tar.gz
Extract the archive, which contains a data/ directory:
- tar -xzf sign-language-mnist.tar.gz
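If you would like to see what the archive holds without relying on the tar command, Python’s standard tarfile module can list its members. This is an optional sketch, assuming the archive sits in your current working directory under the filename used above; list_archive is a hypothetical helper name.

```python
import os
import tarfile

ARCHIVE = "sign-language-mnist.tar.gz"  # filename from the wget command above


def list_archive(path):
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    if os.path.exists(ARCHIVE):
        for name in list_archive(ARCHIVE):
            print(name)
    else:
        print("Archive not found; download it first:", ARCHIVE)
```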
Create a new file, named step_2_dataset.py:
- nano step_2_dataset.py
As before, import the necessary utilities and create the class that will hold your data. For data processing here, you will create the train and test datasets. You’ll implement PyTorch’s Dataset interface, allowing you to load and use PyTorch’s built-in data pipeline for your sign language classification dataset:
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
impo