by Kevin Scott
How to deal with MNIST image data in Tensorflow.js
There’s the joke that 80 percent of data science is cleaning the data and 20 percent is complaining about cleaning the data … data cleaning is a much higher proportion of data science than an outsider would expect. Actually training models is typically a relatively small proportion (less than 10 percent) of what a machine learner or data scientist does.
Manipulating data is a crucial step for any machine learning problem. This article will take the MNIST example for Tensorflow.js (0.11.1), and walk through the code that handles the data loading line-by-line.
MNIST example
    18  import * as tf from '@tensorflow/tfjs';
    19
    20  const IMAGE_SIZE = 784;
    21  const NUM_CLASSES = 10;
    22  const NUM_DATASET_ELEMENTS = 65000;
    23
    24  const NUM_TRAIN_ELEMENTS = 55000;
    25  const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;
    26
    27  const MNIST_IMAGES_SPRITE_PATH =
    28      'https://storage.googleapis.com/learnjs-data/model-builder/mnist_images.png';
    29  const MNIST_LABELS_PATH =
    30      'https://storage.googleapis.com/learnjs-data/model-builder/mnist_labels_uint8';
First, the code imports Tensorflow (make sure you’re transpiling your code!), and establishes some constants, including:
IMAGE_SIZE – the number of pixels in an image (28 x 28 = 784)
NUM_CLASSES – number of label categories (a digit can be 0-9, so there are 10 classes)
NUM_DATASET_ELEMENTS – number of images total (65,000)
NUM_TRAIN_ELEMENTS – number of training images (55,000)
NUM_TEST_ELEMENTS – number of test images (10,000, aka the remainder)
MNIST_IMAGES_SPRITE_PATH & MNIST_LABELS_PATH – paths to the images and the labels
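As a small sanity check on these constants, the derived values can be computed rather than hard-coded. This is a minimal sketch; IMAGE_WIDTH and IMAGE_HEIGHT are names introduced here for illustration and do not appear in the original example.

```javascript
// Hypothetical helper constants (not in the original example).
const IMAGE_WIDTH = 28;
const IMAGE_HEIGHT = 28;

const IMAGE_SIZE = IMAGE_WIDTH * IMAGE_HEIGHT; // 784 pixels per digit
const NUM_CLASSES = 10;                        // digits 0 through 9
const NUM_DATASET_ELEMENTS = 65000;
const NUM_TRAIN_ELEMENTS = 55000;
const NUM_TEST_ELEMENTS = NUM_DATASET_ELEMENTS - NUM_TRAIN_ELEMENTS;

console.log(IMAGE_SIZE);        // 784
console.log(NUM_TEST_ELEMENTS); // 10000
```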
The images are concatenated into one huge sprite image, in which each row of pixels holds one flattened 784-pixel digit.
MnistData
Next up, starting on line 38, is MnistData, a class that exposes the following functions:
load – responsible for asynchronously loading the image and label data
nextTrainBatch – load the next training batch
nextTestBatch – load the next test batch
nextBatch – a generic function to return the next batch, depending on whether it is in the training set or test set
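The batching functions are not covered in this article, but the idea behind returning a batch can be sketched as follows. This is only an illustration, assuming the images are stored back to back in one flat Float32Array; sliceBatch is a hypothetical helper, not a function from the MNIST example.

```javascript
const IMAGE_SIZE = 784;

// Hypothetical helper: returns a view over `batchSize` images starting at
// image index `startIndex`. `subarray` does not copy; it shares memory with
// the source array.
function sliceBatch(images, startIndex, batchSize) {
  const begin = startIndex * IMAGE_SIZE;
  const end = begin + batchSize * IMAGE_SIZE;
  return images.subarray(begin, end);
}

// Five fake images; every pixel of image k is set to k.
const images = new Float32Array(5 * IMAGE_SIZE);
for (let k = 0; k < 5; k++) {
  images.fill(k, k * IMAGE_SIZE, (k + 1) * IMAGE_SIZE);
}

const batch = sliceBatch(images, 1, 2);   // images 1 and 2
console.log(batch.length);                // 1568
console.log(batch[0], batch[IMAGE_SIZE]); // 1 2
```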
For the purposes of getting started, this article will only go through the load function.
load
    44  async load() {
    45    // Make a request for the MNIST sprited image.
    46    const img = new Image();
    47    const canvas = document.createElement('canvas');
    48    const ctx = canvas.getContext('2d');
async is a relatively new language feature in JavaScript, for which you may need a transpiler if you are targeting older environments.
Image is a native DOM constructor that creates an image element in memory. The resulting object provides callbacks for when the image has loaded, along with access to the image's attributes. canvas is another DOM element; by way of its context, it provides easy access to pixel arrays and image processing.
Since both of these are DOM elements, if you’re working in Node.js (or a Web Worker) you won’t have access to these elements. For an alternative approach, see below.
imgRequest
    49    const imgRequest = new Promise((resolve, reject) => {
    50      img.crossOrigin = '';
    51      img.onload = () => {
    52        img.width = img.naturalWidth;
    53        img.height = img.naturalHeight;
The code initializes a new promise that will be resolved once the image is loaded successfully. This example does not explicitly handle the error state.
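If you do want to handle the error state, one possible approach is to wire the error callback to the Promise's reject. The following is a minimal sketch, not the example's code: loadImage is a hypothetical helper, and a plain object stands in for the image so the sketch can run outside a browser.

```javascript
// Hypothetical helper: wraps an image-like object in a Promise and, unlike
// the example, rejects when loading fails.
function loadImage(img, src) {
  return new Promise((resolve, reject) => {
    img.onload = () => resolve(img);
    img.onerror = () => reject(new Error(`Failed to load ${src}`));
    img.src = src; // assigning src is what starts the request
  });
}

// In a browser you would pass `new Image()`. A plain object stands in here
// so the sketch also runs outside the DOM; it never fires onload or onerror.
const fakeImg = {};
const request = loadImage(fakeImg, 'mnist_images.png');
console.log(request instanceof Promise); // true
console.log(typeof fakeImg.onerror);     // function
```

The handlers are attached before src is assigned, so neither callback can be missed once loading begins.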
crossOrigin is an img attribute that controls how images from other domains are requested. Setting it (an empty string is treated as anonymous) makes the browser request the image using CORS (cross-origin resource sharing), which, provided the server allows it, lets the canvas read the pixels back rather than refusing for security reasons. naturalWidth and naturalHeight refer to the original dimensions of the loaded image, and serve to ensure that the image's size is correct when performing calculations.
    55        const datasetBytesBuffer =
    56            new ArrayBuffer(NUM_DATASET_ELEMENTS * IMAGE_SIZE * 4);
    57
    58        const chunkSize = 5000;
    59        canvas.width = img.width;
    60        canvas.height = chunkSize;
The code initializes a new buffer to contain every pixel of every image. It multiplies the total number of images by the size of each image by 4, the number of bytes in a 32-bit float. The buffer holds one float per pixel (on line 63 it is viewed as a Float32Array), so the 4 is a byte count rather than a count of color channels.
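The size arithmetic can be checked with a short sketch. The scaled-down buffer of 10 images is an assumption made here to keep the illustration light; the example itself allocates space for all 65,000 images.

```javascript
const IMAGE_SIZE = 784;
const NUM_DATASET_ELEMENTS = 65000;

// One 32-bit float per pixel, so 4 bytes per pixel.
const bytesPerPixel = Float32Array.BYTES_PER_ELEMENT; // 4
const totalBytes = NUM_DATASET_ELEMENTS * IMAGE_SIZE * bytesPerPixel;
console.log(totalBytes); // 203840000

// A scaled-down buffer (10 images instead of 65,000) to show the
// relationship between byte length and element count.
const smallBuffer = new ArrayBuffer(10 * IMAGE_SIZE * bytesPerPixel);
const smallView = new Float32Array(smallBuffer);
console.log(smallBuffer.byteLength); // 31360
console.log(smallView.length);       // 7840
```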
I believe that chunkSize is used to prevent the UI from loading too much data into memory at once, though I'm not 100% sure.
    62        for (let i = 0; i < NUM_DATASET_ELEMENTS / chunkSize; i++) {
    63          const datasetBytesView = new Float32Array(
    64              datasetBytesBuffer, i * IMAGE_SIZE * chunkSize * 4,
    65              IMAGE_SIZE * chunkSize);
    66          ctx.drawImage(
    67              img, 0, i * chunkSize, img.width, chunkSize, 0, 0, img.width,
    68              chunkSize);
    69
    70          const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
This code loops through the sprite in chunks of chunkSize images (13 chunks of 5,000 images each) and initializes a new TypedArray view for each iteration. Then a chunk of the sprite is drawn onto the canvas through the context. Finally, the drawn chunk is turned into image data using the context's getImageData function, which returns an object representing the underlying pixel data.
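The way each iteration gets its own window onto the shared buffer can be sketched without the DOM. The sizes below are scaled-down stand-ins chosen for illustration, and fill replaces the canvas read; only the offset arithmetic mirrors the example.

```javascript
// Scaled-down stand-ins for the example's constants (assumed values).
const IMAGE_SIZE = 4; // pixels per image
const NUM_IMAGES = 6; // images in the dataset
const CHUNK_SIZE = 2; // images processed per iteration

const buffer = new ArrayBuffer(NUM_IMAGES * IMAGE_SIZE * 4);

for (let i = 0; i < NUM_IMAGES / CHUNK_SIZE; i++) {
  // The second argument is a byte offset; the third is a length in elements.
  const view = new Float32Array(
      buffer, i * IMAGE_SIZE * CHUNK_SIZE * 4, IMAGE_SIZE * CHUNK_SIZE);
  view.fill(i + 1); // mark every pixel in this chunk with its chunk number
}

const all = new Float32Array(buffer);
console.log(Array.from(all).join(' '));
// 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
```

Note the asymmetry in the constructor: the offset is multiplied by 4 because it is measured in bytes, while the length is not because it is measured in floats.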
    72          for (let j = 0; j < imageData.data.length / 4; j++) {
    73            // All channels hold an equal value since the image is grayscale, so
    74            // just read the red channel.
    75            datasetBytesView[j] = imageData.data[j * 4] / 255;
    76          }
    77        }
We loop through the pixels and divide each by 255 (the maximum possible value of a pixel) to scale the values into the range 0 to 1. Only the red channel is necessary, since it's a grayscale image.
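The same conversion can be tried on a hand-made pixel array. This is a minimal sketch: the rgba values are made up for illustration and stand in for the data array that getImageData returns.

```javascript
// Two fake grayscale pixels in RGBA order (values chosen for illustration):
// one black, one white, both fully opaque.
const rgba = new Uint8ClampedArray([0, 0, 0, 255, 255, 255, 255, 255]);

const pixels = new Float32Array(rgba.length / 4);
for (let j = 0; j < pixels.length; j++) {
  // Red, green, and blue are equal in a grayscale image; read red only.
  pixels[j] = rgba[j * 4] / 255;
}
console.log(Array.from(pixels)); // [ 0, 1 ]
```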
    78        this.datasetImages = new Float32Array(datasetBytesBuffer);
    79
    80        resolve();
    81      };
    82      img.src = MNIST_IMAGES_SPRITE_PATH;
    83    });
Line 78 takes the buffer and wraps it in a new TypedArray that holds our pixel data, after which the Promise is resolved. The last line (setting the src) is what actually begins loading the image; once the image has loaded, the onload function runs.
One thing that confused me at first was the behavior of TypedArray in relation to its underlying data buffer. You might notice that datasetBytesView is set within the loop, but is never returned.
Under the hood, datasetBytesView references the buffer datasetBytesBuffer (with which it is initialized). When the code updates the pixel data, it is indirectly editing the values of the buffer itself, which in turn is wrapped in a new Float32Array on line 78.
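This behavior is easy to verify in isolation. The following minimal sketch uses a tiny buffer and made-up values; the names partial and whole are introduced here for illustration.

```javascript
const buffer = new ArrayBuffer(4 * Float32Array.BYTES_PER_ELEMENT);

// A view over the second half of the buffer (byte offset 8, two elements).
const partial = new Float32Array(buffer, 8, 2);
partial[0] = 0.5;
partial[1] = 1;

// A second view over the whole buffer sees the writes made through the first.
const whole = new Float32Array(buffer);
console.log(Array.from(whole)); // [ 0, 0, 0.5, 1 ]
```

Neither view owns the data; both are windows onto the same bytes, which is why the example never needs to return datasetBytesView.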
Fetching image data outside of the DOM
If you're in the DOM, you should use the DOM. The browser (through canvas) takes care of figuring out the format of images and translating buffer data into pixels. But if you're working outside the DOM (say, in Node.js, or a Web Worker), you'll need an alternative approach.
fetch provides a mechanism, response.arrayBuffer, which gives you access to a file's underlying buffer. We can use this to read the bytes manually, avoiding the DOM entirely. Here's an alternative approach to writing the above code (this code requires fetch, which can be polyfilled in Node with something like isomorphic-fetch):
    const imgRequest = fetch(MNIST_IMAGES_SPRITE_PATH)
      .then(resp => resp.arrayBuffer())
      .then(buffer => {
        return new Promise(resolve => {
          const reader = new PNGReader(buffer);
          return reader.parse((err, png) => {
            const pixels = Float32Array.from(png.pixels).map(pixel => {
              return pixel / 255;
            });
            this.datasetImages = pixels;
            resolve();
          });
        });
      });
This returns an array buffer for the particular image. When writing this, I first attempted to parse the incoming buffer myself, which I wouldn't recommend. (If you are interested in doing that, here's some information on how to read an array buffer for a png.) Instead, I elected to use pngjs, which handles the png parsing for you. When dealing with other image formats, you'll have to figure out the parsing functions yourself.
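To give a feel for what reading the bytes manually involves, here is a minimal sketch that inspects only the start of a PNG buffer: the fixed 8-byte signature and the width and height stored in the first (IHDR) chunk. readPngHeader is a hypothetical helper, the header is built by hand with dimensions assumed for illustration, and decoding the actual pixels is far more involved, which is why a library is the better choice.

```javascript
// Hypothetical helper: checks the 8-byte PNG signature and reads the image
// dimensions from the IHDR chunk that follows it.
function readPngHeader(buffer) {
  const signature = [137, 80, 78, 71, 13, 10, 26, 10];
  const bytes = new Uint8Array(buffer);
  if (bytes.length < 24 || !signature.every((b, i) => bytes[i] === b)) {
    throw new Error('Not a PNG file');
  }
  const view = new DataView(buffer);
  // Bytes 8-15 hold the IHDR chunk length and type; width and height follow
  // as big-endian 32-bit integers (DataView reads big-endian by default).
  return {width: view.getUint32(16), height: view.getUint32(20)};
}

// Build a fake 24-byte header; the 784 x 65000 dimensions are assumed here.
const header = new ArrayBuffer(24);
new Uint8Array(header).set([137, 80, 78, 71, 13, 10, 26, 10]);
const dv = new DataView(header);
dv.setUint32(8, 13);          // IHDR data length
dv.setUint32(12, 0x49484452); // the ASCII characters 'IHDR'
dv.setUint32(16, 784);        // width
dv.setUint32(20, 65000);      // height

console.log(readPngHeader(header)); // { width: 784, height: 65000 }
```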
Just scratching the surface
Understanding data manipulation is a crucial component of machine learning in JavaScript. By understanding our use cases and requirements, we can use a few key functions to elegantly format our data correctly for our needs.
The Tensorflow.js team is continuously changing the underlying data API in Tensorflow.js. This can help accommodate more of our needs as the API evolves. This also means that it’s worth staying abreast of developments to the API as Tensorflow.js continues to grow and be improved.
Originally published at thekevinscott.com
Special thanks to Ari Zilnik.
Translated from: https://www.freecodecamp.org/news/how-to-deal-with-mnist-image-data-in-tensorflow-js-169a2d6941dd/