Fundamentals of Deep Learning Networks
Un-crumpling paper balls is what machine learning is all about: it seeks to find neat representations for complex, highly folded data manifolds.
Deep Learning takes the approach of incrementally decomposing a complicated geometric transformation into a long chain of elementary ones, which is essentially the strategy a human would follow to un-crumple a paper ball.
A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters — the layers.
What are layers?
The core building block of a neural network is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form. Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation.
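As a minimal sketch of this idea (assuming the tensorflow.keras API and 784-dimensional input vectors, such as flattened 28 × 28 images), two such layers can be chained like this:

from tensorflow.keras import models, layers

# A small stack of layers: each Dense layer acts as one data filter
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(784,)))
network.add(layers.Dense(10, activation='softmax'))

Here the first layer distils the 784 raw input values into 512 intermediate features, and the second distils those into 10 output scores.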
Each layer in a deep network applies a transformation that disentangles the data a little — and a deep stack of layers makes tractable an extremely complicated disentanglement process.
Some important terminology before we move on…
Tensors: At its core, a tensor is a container for data — almost always numerical data. Tensors are a generalisation of matrices to an arbitrary number of dimensions (in the context of tensors, a dimension is often called an axis). In general, all current machine-learning systems use tensors as their basic data structure. Tensors are fundamental to the field — so fundamental that Google’s TensorFlow was named after them!
A tensor is defined by three key attributes:
1. Number of axes — For instance, a 3D tensor has three axes. This is also called the tensor’s ndim in Python libraries such as NumPy.
2. Shape — This is a tuple of integers that describes how many dimensions the tensor has along each axis. For example, a matrix (2D tensor) could be of shape (3, 5) and a 3D tensor could be of shape (3, 3, 5). A vector has a shape with a single element, such as (5,), whereas a scalar has an empty shape, ().
3. Data type (usually called dtype in Python libraries) — This is the type of the data contained in the tensor; for instance, a tensor’s type could be float32, uint8, float64, and so on. Note that string tensors don’t exist in NumPy (or in most other libraries). This is because tensors live in preallocated, contiguous memory segments and strings are of variable length, which would preclude this implementation.
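A minimal NumPy sketch illustrating all three attributes:

import numpy as np

x = np.zeros((3, 3, 5), dtype='float32')  # a 3D tensor
print(x.ndim)   # 3, the number of axes
print(x.shape)  # (3, 3, 5)
print(x.dtype)  # float32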
The Dot Operation: The dot operation (also called a tensor product) is the most common and useful tensor operation. Contrary to element-wise operations, it combines entries in the input tensors.
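For instance, a sketch in NumPy of the dot product between a matrix and a vector:

import numpy as np

W = np.random.random((2, 3))  # a matrix of shape (2, 3)
x = np.random.random((3,))    # a vector of shape (3,)

y = np.dot(W, x)  # combines entries: y[i] is the sum over j of W[i, j] * x[j]
print(y.shape)    # (2,)

Note that the shapes must be compatible: the last axis of W must match the first axis of x.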
Tensor Reshaping: Before feeding data into a network for training, it is advised that we pass the data through a pre-processing step. Often this involves reshaping the tensor, which means rearranging its rows and columns to match a target shape. Naturally, the reshaped tensor has the same total number of coefficients as the initial tensor. A special case of reshaping that’s commonly encountered is transposition. Transposing a matrix means exchanging its rows and its columns, so that x[i, :] becomes x[:, i]:
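A short NumPy sketch of both operations:

import numpy as np

x = np.arange(6).reshape((2, 3))  # the same 6 coefficients, rearranged to shape (2, 3)
y = np.transpose(x)               # rows and columns exchanged
print(y.shape)                    # (3, 2); x[i, :] has become y[:, i]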
Neural networks consist entirely of chains of tensor operations and all of these tensor operations are just geometric transformations of the input data. It follows that you can interpret a neural network as a very complex geometric transformation in a high-dimensional space, implemented via a long series of simple steps.
Training, Learning, Weights and a brief introduction to Loss Functions, Optimisers and Metrics.
Preparing a network for training:
To make a network ready for training, we need to pick three important things as part of the compilation step (a Keras sketch follows this list):
1. A loss function — How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
2. An optimizer — The mechanism through which the network will update itself based on the data it sees and its loss function.
3. Metrics to monitor during training and testing — An example of a metric is accuracy.
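In Keras these three choices are made in a single compile call. A sketch, assuming the network defined earlier and a multi-class classification task (the optimizer and loss shown are illustrative choices, not the only options):

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])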
We’re now ready to train the network. In Keras, for example, this is done via a call to the network’s fit method: we fit the model to its training data.
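A sketch of such a call, assuming train_images and train_labels hold the pre-processed training data described below, and that 5 epochs with batches of 128 samples are reasonable illustrative settings:

network.fit(train_images, train_labels, epochs=5, batch_size=128)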
However, before training, we’ll pre-process the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval (a process called normalisation).
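A minimal sketch of this pre-processing, assuming 60,000 MNIST-style 28 × 28 grayscale images stored as uint8 values in [0, 255]:

# Flatten each 28x28 image into a 784-dimensional vector
train_images = train_images.reshape((60000, 28 * 28))
# Convert to float and scale into the [0, 1] interval (normalisation)
train_images = train_images.astype('float32') / 255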
Keep in mind that during training, two important quantities are displayed:
- The loss of the network over the training data.
- The accuracy of the network over the training data.
output = relu(dot(W, input) + b)
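In NumPy terms, this expression can be sketched as follows (dense_forward is a hypothetical helper name, not a library function):

import numpy as np

def dense_forward(W, b, x):
    # Affine transformation followed by relu: max(0, dot(W, x) + b)
    return np.maximum(0., np.dot(W, x) + b)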
In the above expression, W and b are tensors that are attributes of the layer. They’re called the weights or trainable parameters of the layer (the kernel and bias attributes, respectively). These weights contain the information learned by the network from exposure to training data.
Initially, these weight matrices are filled with small random values (a step called random initialisation). Of course, there’s no reason to expect that relu(dot(W, input) + b), when W and b are random, will yield any useful representations. The resulting representations are meaningless — but they’re a starting point.
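One sketch of such a random initialisation in NumPy (the 0.1 scale and the layer sizes are illustrative assumptions; real libraries use more careful schemes):

import numpy as np

input_dim, output_dim = 784, 512                      # illustrative layer sizes
W = 0.1 * np.random.random((output_dim, input_dim))   # small random values
b = np.zeros((output_dim,))                           # biases commonly start at zero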
What comes next is to gradually adjust these weights, based on a feedback signal from the loss function. The process in which this learning happens is called the training loop.
The Training Loop in Steps:
1. Draw a batch of training samples x and corresponding targets y.
2. Run the network on x (a step called the forward pass) to obtain predictions y_pred.
3. Compute the loss of the network on the batch, a measure of the mismatch between y_pred and y.
4. Update all weights of the network in a way that slightly reduces the loss on this batch.
5. Repeat until the model starts to show results.
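A minimal runnable NumPy sketch of these steps on a toy linear problem (the data, model and hyper-parameters are illustrative assumptions, with the weight update written out by hand rather than delegated to an optimizer):

import numpy as np

# Toy problem: learn y = 2x with a single weight
x_train = np.random.random((100, 1)).astype('float32')
y_train = 2.0 * x_train

W = np.random.random((1, 1))   # random initialisation
lr = 0.1                       # learning rate

for epoch in range(20):
    idx = np.random.choice(len(x_train), 16)         # 1. draw a batch
    x, y = x_train[idx], y_train[idx]
    y_pred = np.dot(x, W)                            # 2. forward pass
    loss = np.mean((y_pred - y) ** 2)                # 3. loss: mean squared error
    grad = 2.0 * np.dot(x.T, y_pred - y) / len(x)    # gradient of the loss w.r.t. W
    W -= lr * grad                                   # 4. update weights to reduce the loss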
After repeating steps 1 to 4 for a predefined number of epochs, you’ll eventually end up with a network that has a very low loss on its training data: therefore a low mismatch between predictions y_pred and expected targets y. This would confirm that the network has “learned” to map its inputs to the correct targets.