MiniFlow: Understanding a Key TensorFlow Concept, the Graph

芥末的无奈

Article link: https://blog.csdn.net/weiwei9363/article/details/78700152

MiniFlow

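Before learning TensorFlow, let's study differentiable graphs, the basic abstraction TensorFlow uses to run and train networks. We will build a small library called MiniFlow, and in the process gradually come to understand differentiable graphs.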

The complete code is linked from the original post.

Graph

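A neural network can be viewed as a graph made up of mathematical functions, such as linear combinations and activation functions. The graph contains nodes and edges.

A node can be viewed as a mathematical function that takes the outputs of the previous layer as its inputs. For example, a node might represent $f(x,y) = x+y$, where $x$ and $y$ are outputs of the previous layer that serve as inputs to this node.

An edge is a connection between two nodes; edges let values propagate through the graph.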
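
As a small illustration of nodes and edges, the sketch below stores a graph as plain Python dictionaries and evaluates $f(x, y) = x + y$. The names `edges`, `values`, and `evaluate_add` are made up for this example and are not part of MiniFlow.

```python
# A tiny graph: two input nodes feed one "add" node.
# Edges are stored as: node name -> names of the nodes it reads from.
edges = {
    "x": [],
    "y": [],
    "add": ["x", "y"],
}

# Values known before evaluation (the inputs).
values = {"x": 2.0, "y": 3.0}


def evaluate_add(node, edges, values):
    """Compute an add node by summing the values carried by its incoming edges."""
    return sum(values[parent] for parent in edges[node])


values["add"] = evaluate_add("add", edges, values)
print(values["add"])  # 5.0
```

MiniFlow keeps the same information, but stores it on the node objects themselves instead of in a separate dictionary.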

MiniFlow Architecture

Let's start building MiniFlow. We use Node to represent a node: a Node accepts a set of inputs and computes a value.

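class Node(object):
    def __init__(self, inbound_nodes=None):
        # Nodes this node receives values from
        # (None instead of [] avoids sharing one default list between instances)
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        # Nodes this node sends its value to
        self.outbound_nodes = []
        # Wire up the graph: this node is an outbound node of each of its inputs
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

        # The computed value
        self.value = None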

Every Node should be able to propagate forward and backward. We start with forward().

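class Node(object):
    def __init__(self, inbound_nodes=None):
        # Nodes this node receives values from
        # (None instead of [] avoids sharing one default list between instances)
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []
        # Nodes this node sends its value to
        self.outbound_nodes = []
        # Wire up the graph: this node is an outbound node of each of its inputs
        for n in self.inbound_nodes:
            n.outbound_nodes.append(self)

        # The computed value
        self.value = None

    def forward(self):
        """
        Forward propagation.

        Compute the output value from the values of inbound_nodes.
        """
        raise NotImplementedError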

The Input Subclass

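Node defines the basic attributes and operations of a node, but different kinds of nodes perform different operations. As an example, let's implement Input, a subclass of Node.

Unlike the other subclasses of Node, Input does not compute anything. It only stores a value, which can be a data feature or a weight.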

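class Input(Node):
    def __init__(self):
        # An Input node has no inbound nodes (it is itself an input),
        # so the inbound list is empty
        Node.__init__(self)

    # Input is the only kind of node that needs no inbound nodes; every other
    # kind of node uses the previous layer's values in forward()
    def forward(self, value=None):
        if value is not None:
            self.value = value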

The Add Subclass

The Add node computes the sum of several numbers.

class Add(Node):
    def __init__(self, *inputs):
        Node.__init__(self, list(inputs))

    def forward(self):
        self.value = 0
        for x in self.inbound_nodes:
            self.value += x.value

Forward propagation

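Now let's test the code we have written so far. Before that, a word about topological sort.

To define our neural network, we have to define the order in which the graph's operations run. Because nodes depend on one another, we need to "flatten" the graph into a linear order, which is exactly what topological sorting does (the original post shows a figure here). How topological sort is implemented is not important; we only need to know what it does.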
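
To make the idea of "flattening" concrete, here is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). This is not how MiniFlow sorts its nodes; the node names are placeholders chosen for illustration.

```python
from graphlib import TopologicalSorter

# Map each node to the set of nodes it depends on (its inputs).
dependencies = {
    "linear": {"X", "W", "b"},
    "sigmoid": {"linear"},
    "cost": {"sigmoid", "y"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# Whatever the exact order, an input always precedes the node that uses it.
position = {name: i for i, name in enumerate(order)}
```

Several orders can be valid for the same graph; the only guarantee is that each node appears after everything it depends on.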

We will use topological_sort() to flatten the graph. It accepts a feed_dict, implemented as a Python dictionary. The following shows how to use it.

def topological_sort(feed_dict):
    """
    Sort generic nodes in topological order using Kahn's Algorithm.

    `feed_dict`: A dictionary where the key is an `Input` node and the value is the value fed to that node.

    Returns a list of sorted nodes.
    """

    input_nodes = [n for n in feed_dict.keys()]

    G = {}
    nodes = [n for n in input_nodes]
    while len(nodes) > 0:
        n = nodes.pop(0)
        if n not in G:
            G[n] = {'in': set(), 'out': set()}
        for m in n.outbound_nodes:
            if m not in G:
                G[m] = {'in': set(), 'out': set()}
            G[n]['out'].add(m)
            G[m]['in'].add(n)
            nodes.append(m)

    L = []
    S = set(input_nodes)
    while len(S) > 0:
        n = S.pop()

        if isinstance(n, Input):
            n.value = feed_dict[n]

        L.append(n)
        for m in n.outbound_nodes:
            G[n]['out'].remove(m)
            G[m]['in'].remove(n)
            # if no other incoming edges add to S
            if len(G[m]['in']) == 0:
                S.add(m)
    return L

An example of feed_dict

x, y = Input(), Input()
add = Add(x, y)
feed_dict = {x:10, y:20}
sorted_nodes = topological_sort(feed_dict)
sorted_nodes

[<__main__.Input at 0x21ac5c64dd8>,
 <__main__.Input at 0x21ac5c64a58>,
 <__main__.Add at 0x21ac5c649b0>]

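forward_pass() is the other function that works with sorted_nodes; it is what actually makes the network "run".

sorted_nodes is the list of nodes after topological sorting. The output node, that is, the node in sorted_nodes whose output we want, is not passed in; we simply read its value after the pass.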

def forward_pass(sorted_nodes):

    for n in sorted_nodes:
        n.forward()

Passing Values Forward

The forward half of the network is now roughly in place, so let's test it.

Here we define three inputs, x, y, and z, and compute their sum.

x,y,z = Input(), Input(),Input()

f = Add(x, y, z)

feed_dict = {x:10, y:5, z:15}

sorted_nodes = topological_sort(feed_dict)

# forward_pass has not run yet, so the network has not "run"; only the input nodes have values
print(f.value, x.value, y.value, z.value)

# Run the network forward
forward_pass(sorted_nodes)
print(f.value)

None 10 5 15
30

In the same way, we can now easily write a Mul class that computes the product of its input nodes.

class Mul(Node):
    def __init__(self, *inputs):
        Node.__init__(self, list(inputs))

    def forward(self):
        self.value = 1
        for n in self.inbound_nodes:
            self.value *= n.value

x,y,z = Input(), Input(),Input()

f = Mul(x, y, z)

feed_dict = {x:10, y:5, z:15}

sorted_nodes = topological_sort(feed_dict)
forward_pass(sorted_nodes)
print(f.value)

Linear Function

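Next, we build a more complex and more useful node: Linear.

In a neural network, the linear function is written $y = \sum w_i x_i + b$, where $x_i$ are the inputs, $w_i$ are the weights, and $b$ is the bias.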

import numpy as np
class Linear(Node):
    def __init__(self, inputs, weights, bias):
        Node.__init__(self, [inputs, weights, bias])

    def forward(self):
        self.value = 0
        inputs = self.inbound_nodes[0].value
        weights = self.inbound_nodes[1].value
        bias = self.inbound_nodes[2].value

        self.value = np.dot(inputs, weights) + bias

X, W, b = Input(), Input(), Input()

f = Linear(X, W, b)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2., -3], [2., -3]])
b_ = np.array([-3., -5])

feed_dict = {X: X_, W: W_, b: b_}

graph = topological_sort(feed_dict)
forward_pass(graph)

"""
Output should be:
[[-9., 4.],
[-9., 4.]]
"""
print(f.value)

[[-9.  4.]
 [-9.  4.]]
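
To connect the Linear node back to the formula $y = \sum w_i x_i + b$, the following sketch evaluates the sum term by term and compares it with `np.dot`. The numbers are arbitrary and chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])    # inputs x_i
w = np.array([0.5, -1.0, 2.0])   # weights w_i
b = 0.25                         # bias

# Explicit sum over i, exactly as in the formula.
y_loop = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# The same quantity as a dot product.
y_dot = np.dot(x, w) + b

print(y_loop, y_dot)
```

With a 2-D `X`, `np.dot(X, W)` applies the same sum to every row of the batch at once, which is what Linear.forward relies on.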

Sigmoid Function

The sigmoid function is defined as $\sigma(x) = \frac{1}{1+e^{-x}}$, and its derivative can be expressed in terms of itself: $\sigma'(x)=\sigma(x)(1 - \sigma(x))$.
Next, we implement the Sigmoid node.

class Sigmoid(Node):
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        return 1./(1 + np.exp(-x))

    def forward(self):
        input_value = self.inbound_nodes[0].value
        self.value = self._sigmoid(input_value)

X, W, b = Input(), Input(), Input()

f = Linear(X, W, b)
g = Sigmoid(f)

X_ = np.array([[-1., -2.], [-1., -2.]]) # 2x2
W_ = np.array([[2., -3.], [2., -3.]]) # 2x2
b_ = np.array([[-3., -5.]])

feed_dict = {X:X_, W:W_, b:b_}

graph = topological_sort(feed_dict)
forward_pass(graph)
"""
Output should be:
[[  1.23394576e-04   9.82013790e-01]
 [  1.23394576e-04   9.82013790e-01]]
"""
print(g.value)

[[  1.23394576e-04   9.82013790e-01]
 [  1.23394576e-04   9.82013790e-01]]
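
The derivative identity $\sigma'(x)=\sigma(x)(1 - \sigma(x))$ is used later in backward propagation, so it is worth a quick sanity check. The sketch below compares it with a central finite difference; the test points and the step `eps` are arbitrary choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-2.0, 0.0, 1.5])
eps = 1e-6

# Central finite difference approximation of the derivative.
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)

# Closed form: sigma'(x) = sigma(x) * (1 - sigma(x)).
analytic = sigmoid(x) * (1.0 - sigmoid(x))

print(np.allclose(numeric, analytic, atol=1e-8))
```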

Cost

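Training happens during backward propagation; training is the process of making the cost as small as possible.

For a cost such as the mean squared error (MSE), the definition is: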

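$C(w, b) = \frac{1}{m}\sum_x \Vert y(x) - a\Vert^2$

where $w$ denotes all the weights, $b$ all the biases, $m$ the total number of samples, $a$ the output of the whole network, and $y(x)$ the label of $x$.

Learning is the process of adjusting the weights and biases to make the cost smaller. This adjustment requires the partial derivatives of the cost with respect to $w$ and $b$.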
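
As a rough illustration of "adjusting the weights to make the cost smaller", here is a sketch with a single weight and no bias. The data, the learning rate, and the number of steps are assumptions made for this example only.

```python
import numpy as np

# Toy data generated by y = 3 * x; the single weight w should approach 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0
learning_rate = 0.05

for step in range(100):
    a = w * x                               # network output
    cost = np.mean((y - a) ** 2)            # MSE
    grad_w = np.mean(2.0 * (a - y) * x)     # dC/dw
    w -= learning_rate * grad_w             # move against the gradient

print(w, cost)
```

The rest of this post does the same thing, except that the partial derivatives are computed node by node through the graph rather than written out by hand.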

Next, we design an MSE node to compute the MSE.

class MSE(Node):
    def __init__(self, y, a):
        Node.__init__(self, [y, a])

    def forward(self):
        self.value = 0
        y = self.inbound_nodes[0].value.reshape(-1, 1)
        a = self.inbound_nodes[1].value.reshape(-1, 1)
        m = a.size

        diff = y - a

        self.value = np.mean(diff**2)

y, a = Input(), Input()
cost = MSE(y, a)

y_ = np.array([[1, 2, 3]])
a_ = np.array([[4.5, 5, 10]])

feed_dict = {y:y_, a:a_}

graph = topological_sort(feed_dict)
forward_pass(graph)
print(cost.value)

23.4166666667

Backward propagation

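Forward propagation is roughly done, so next we consider how to implement backward propagation.

Anyone with some exposure to neural networks knows that backward propagation is essentially a process of computing partial derivatives. As an example, suppose we have the following network (figure in the original post: forward_pass).

Written in our current framework, this network looks like this: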

X, y = Input(), Input()
W1, b1 = Input(), Input()
W2, b2 = Input(), Input()

l1 = Linear(X, W1, b1)
s = Sigmoid(l1)
l2 = Linear(s, W2, b2)
cost = MSE(l2, y)

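Backward propagation through this network proceeds like this (figure in the original post: backward_pass).

More concretely, in mathematical notation, it can be written as: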

$\frac{\partial C}{\partial w_2} = \frac{\partial C}{\partial l_2} \frac{\partial l_2}{\partial w_2}$

This is the chain rule of differentiation, and our code implements backward propagation according to it.
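
The chain rule can be checked numerically on a scalar version of the last two layers. In the sketch below, `s`, `w2`, `b2`, and `y` are made-up scalar values, and the cost is the squared error of a single sample.

```python
# Scalar version of the last two layers: l2 = w2 * s + b2, C = (y - l2)**2.
s, w2, b2, y = 0.7, 1.5, -0.2, 1.0


def cost(w2_value):
    l2 = w2_value * s + b2
    return (y - l2) ** 2


l2 = w2 * s + b2
dC_dl2 = 2.0 * (l2 - y)     # derivative of (y - l2)**2 with respect to l2
dl2_dw2 = s                 # derivative of w2 * s + b2 with respect to w2
chain = dC_dl2 * dl2_dw2    # chain rule

# Independent check with a central finite difference.
eps = 1e-6
numeric = (cost(w2 + eps) - cost(w2 - eps)) / (2 * eps)

print(chain, numeric)
```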

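At this point we can make a rough analysis. Suppose a node sits in layer $L$. In forward(), the node receives the nodes of layer $L-1$ as inputs (inbound_nodes in the code), performs its computation on their values, and ends up with a value. So what propagates forward is the result of computation: values.

In backward propagation, a node in layer $L$ instead takes the nodes of layer $L+1$ as its input, that is, its own outbound_nodes. It performs its computation on the gradients of those outbound_nodes and ends up with a gradient. So what propagates backward is gradients; more precisely, the gradient of the objective function with respect to the current node.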
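
One consequence worth spelling out: when a node has several outbound nodes, the gradients coming back from them are added together. The scalar sketch below uses a hypothetical node `x` that feeds two downstream nodes, $u = 2x$ and $v = x^2$, with $C = u + v$.

```python
x = 3.0

# x feeds two downstream nodes: u = 2 * x and v = x ** 2, and C = u + v.
dC_du = 1.0
dC_dv = 1.0

# Each outbound node sends back its own contribution to dC/dx.
from_u = dC_du * 2.0        # du/dx = 2
from_v = dC_dv * 2.0 * x    # dv/dx = 2x

grad_x = 0.0
for contribution in (from_u, from_v):
    grad_x += contribution  # accumulate, do not overwrite

print(grad_x)  # 8.0
```

This is why the backward methods start from zeros and add one contribution per outbound node.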

Based on this idea, we add backward to the existing code:


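class Node(object):
    def __init__(self, inbound_nodes=None):
        # None instead of [] avoids sharing one default list between instances
        self.inbound_nodes = inbound_nodes if inbound_nodes is not None else []

        self.value = None

        self.outbound_nodes = []

        # Maps each inbound node to the gradient of the cost with respect to it
        self.gradients = {}

        for node in self.inbound_nodes:
            node.outbound_nodes.append(self)

    def forward(self):
        """
        Every subclass must implement this method.
        """
        raise NotImplementedError

    def backward(self):
        """
        Every subclass must implement this method.
        """
        raise NotImplementedError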

def forward_and_backward(graph):
    # Forward propagation, in topological order
    for n in graph:
        n.forward()

    # Backward propagation, in reverse topological order
    for n in graph[::-1]:
        n.backward()

Next, we rewrite every subclass of Node defined earlier.

MSE

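For the mean squared error (MSE), to make differentiation easier we rewrite its expression in matrix/vector form. All variables below are vectors or matrices.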

$Cost(W,b) = \frac{1}{m}\Vert y - a\Vert_2^2$

If you are familiar with matrix and vector calculus, you can work out that:

$\frac{\partial C}{\partial y} = \frac{2}{m}(y - a) \\ \frac{\partial C}{\partial a} = \frac{2}{m}(a - y)$

class MSE(Node):
    def __init__(self, y, a):
        Node.__init__(self, [y, a])

    def forward(self):
        y = self.inbound_nodes[0].value.reshape(-1,1)
        a = self.inbound_nodes[1].value.reshape(-1,1)
        # Number of samples, taken after the reshape so it matches np.mean below
        self.m = y.shape[0]
        self.diff = y - a
        self.value = np.mean(self.diff**2)

    def backward(self):
        self.gradients = {n:np.zeros_like(n.value) for n in self.inbound_nodes}

        # Partial derivative of the MSE with respect to y (worth working out by hand)
        self.gradients[self.inbound_nodes[0]] = (2 / self.m) * self.diff
        # Partial derivative of the MSE with respect to a
        self.gradients[self.inbound_nodes[1]] = (-2 / self.m) * self.diff
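
If you would rather not do the derivation by hand, the formula $\frac{\partial C}{\partial a} = \frac{2}{m}(a - y)$ can be checked numerically. The sketch below is independent of MiniFlow; the helper `mse` and the sample values are introduced only for this check.

```python
import numpy as np

y = np.array([[1.0], [2.0], [3.0]])
a = np.array([[4.5], [5.0], [10.0]])
m = y.shape[0]


def mse(y, a):
    return np.mean((y - a) ** 2)


analytic = (2.0 / m) * (a - y)   # dC/da from the formula above

# Perturb one entry of a at a time and measure the change in the cost.
eps = 1e-6
numeric = np.zeros_like(a)
for i in range(m):
    step = np.zeros_like(a)
    step[i, 0] = eps
    numeric[i, 0] = (mse(y, a + step) - mse(y, a - step)) / (2 * eps)

print(analytic.ravel())
print(np.allclose(analytic, numeric, atol=1e-6))
```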

Linear

class Linear(Node):
    def __init__(self, X, W, b):
        Node.__init__(self, [X, W, b])

    def forward(self):
        self.value = 0

        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value

        self.value = np.dot(X, W) + b

    def backward(self):
        self.gradients = { n:np.zeros_like(n.value) for n in self.inbound_nodes}

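        for n in self.outbound_nodes:
            # Gradient of the cost with respect to this node, from the outbound node
            grad_cost = n.gradients[self]

            # Partial derivative with respect to X
            self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
            # Partial derivative with respect to W
            self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
            # Partial derivative with respect to b: b is broadcast across every row
            # of the batch, so its gradient is grad_cost summed over the batch axis
            self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)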
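
The bias gradient is the least obvious of the three. Because `b` is added to every row of `np.dot(X, W)`, each entry of `b` affects one output column in every sample, so its gradient is the column-wise sum of the upstream gradient. The sketch below checks this numerically; the random shapes and the stand-in cost `scalar_cost` are assumptions made for the test, not part of MiniFlow.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))           # batch of 4 samples, 3 features
W = rng.normal(size=(3, 2))
b = rng.normal(size=2)
grad_cost = rng.normal(size=(4, 2))   # stand-in for the upstream gradient


def scalar_cost(b_value):
    # A cost whose gradient with respect to the Linear output equals grad_cost.
    return np.sum((X @ W + b_value) * grad_cost)


analytic = np.sum(grad_cost, axis=0)  # the expression used in Linear.backward for b

eps = 1e-6
numeric = np.zeros_like(b)
for j in range(b.size):
    step = np.zeros_like(b)
    step[j] = eps
    numeric[j] = (scalar_cost(b + step) - scalar_cost(b - step)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```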

Sigmoid

The sigmoid function is defined as $\sigma(x) = \frac{1}{1+e^{-x}}$, and its derivative can be expressed in terms of itself: $\sigma'(x)=\sigma(x)(1 - \sigma(x))$.

class Sigmoid(Node):
    def __init__(self, node):
        Node.__init__(self, [node])

    def _sigmoid(self, x):
        return 1. / (1 + np.exp(-x))

    def forward(self):
        self.value = 0

        x = self.inbound_nodes[0].value
        self.value = self._sigmoid(x)

    def backward(self):
        self.gradients = {n:np.zeros_like(n.value) for n in self.inbound_nodes}

        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]

            sigmoid_value = self.value
            # Accumulate (+=) so contributions from several outbound nodes add up
            self.gradients[self.inbound_nodes[0]] += grad_cost * sigmoid_value * (1 - sigmoid_value)

class Input(Node):
    def __init__(self):
        Node.__init__(self)

    def forward(self):
        pass

    def backward(self):
        # An Input node has no inputs, so its gradient starts at 0
        self.gradients = {self:0}
        #Weights and bias may be inputs, so you need to sum
        #the gradient from output gradients.
        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]
            self.gradients[self] += grad_cost*1

class Add(Node):
    def __init__(self, *input):
        Node.__init__(self, list(input))

    def forward(self):
        self.value = 0

        for n in self.inbound_nodes:
            self.value += n.value

    def backward(self):
        self.gradients = {n:np.zeros_like(n.value,dtype='float32') for n in self.inbound_nodes}

        for n in self.outbound_nodes:
            grad_cost = n.gradients[self]

            for i in self.inbound_nodes:
                # d(sum)/d(input) = 1, so each input receives grad_cost unchanged;
                # reshape it to the input's shape (assumes equal element counts)
                self.gradients[i] += grad_cost.reshape(i.value.shape)

x,y,z = Input(), Input(),Input()
yy = Input()

f = Add(x, y, z)

x_ = np.array([2, 2])
y_ = np.array([3, 3])
z_ = np.array([5, 5])
yy_ = np.array([7, 7])

feed_dict = {x:x_, y:y_, z:z_,yy:yy_}
cost = MSE(yy, f)

sorted_nodes = topological_sort(feed_dict)

# forward_pass has not run yet, so only the input nodes have values
#print(f.value, x.value, y.value, z.value)

# Run the network forward and backward
forward_and_backward(sorted_nodes)
print(cost.gradients)
print(f.gradients)

{<__main__.Input object at 0x0000021AC7514828>: array([[-3.],
       [-3.]]), <__main__.Add object at 0x0000021AC7514DA0>: array([[ 3.],
       [ 3.]])}
{<__main__.Input object at 0x0000021AC7514860>: array([ 3.,  3.], dtype=float32), <__main__.Input object at 0x0000021AC7514E48>: array([ 3.,  3.], dtype=float32), <__main__.Input object at 0x0000021AC75149B0>: array([ 3.,  3.], dtype=float32)}

Backward propagation is now written as well, so we can test it.

X, W, b = Input(), Input(), Input()
y = Input()
f = Linear(X, W, b)
a = Sigmoid(f)
cost = MSE(y, a)

X_ = np.array([[-1., -2.], [-1, -2]])
W_ = np.array([[2.], [3.]])
b_ = np.array([-3.])
y_ = np.array([1, 2])

feed_dict = {
    X: X_,
    y: y_,
    W: W_,
    b: b_,
}

graph = topological_sort(feed_dict)
forward_and_backward(graph)
# return the gradients for each Input
gradients = [t.gradients[t] for t in [X, y, W, b]]
"""
Expected output

[array([[ -3.34017280e-05,  -5.01025919e-05],
       [ -6.68040138e-05,  -1.00206021e-04]]), array([[ 0.9999833],
       [ 1.9999833]]), array([[  5.01028709e-05],
       [  1.00205742e-04]]), array([ -5.01028709e-05])]
"""
print(gradients)

[array([[ -3.34017280e-05,  -5.01025919e-05],
       [ -6.68040138e-05,  -1.00206021e-04]]), array([[ 0.9999833],
       [ 1.9999833]]), array([[  5.01028709e-05],
       [  1.00205742e-04]]), array([ -5.01028709e-05])]

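SGD (stochastic gradient descent)

Next we implement SGD and use an example (Boston house price prediction) that brings together all the code above.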

def sgd_update(trainables, learning_rate=1e-2):
    for t in trainables:
        partial = t.gradients[t]
        t.value -= learning_rate * partial


import numpy as np
from sklearn.datasets import load_boston
from sklearn.utils import shuffle, resample

# Load data
# Note: load_boston was removed in scikit-learn 1.2, so this script needs an older release
data = load_boston()
X_ = data['data']
y_ = data['target']

# Normalize data
X_ = (X_ - np.mean(X_, axis=0)) / np.std(X_, axis=0)

n_features = X_.shape[1]
n_hidden = 10
W1_ = np.random.randn(n_features, n_hidden)
b1_ = np.zeros(n_hidden)
W2_ = np.random.randn(n_hidden, 1)
b2_ = np.zeros(1)

# Neural network
X, y = Input(), Input()
W1, b1 = Input(), Input()
W2, b2 = Input(), Input()

l1 = Linear(X, W1, b1)
s1 = Sigmoid(l1)
l2 = Linear(s1, W2, b2)
cost = MSE(y, l2)

feed_dict = {
    X: X_,
    y: y_,
    W1: W1_,
    b1: b1_,
    W2: W2_,
    b2: b2_
}

epochs = 10
# Total number of examples
m = X_.shape[0]
batch_size = 11
steps_per_epoch = m // batch_size

graph = topological_sort(feed_dict)
trainables = [W1, b1, W2, b2]

print("Total number of examples = {}".format(m))

# Step 4
for i in range(epochs):
    loss = 0
    for j in range(steps_per_epoch):
        # Step 1
        # Randomly sample a batch of examples
        X_batch, y_batch = resample(X_, y_, n_samples=batch_size)

        # Reset value of X and y Inputs
        X.value = X_batch
        y.value = y_batch

        # Step 2
        forward_and_backward(graph)

        # Step 3
        sgd_update(trainables)

        loss += graph[-1].value

    print("Epoch: {}, Loss: {:.3f}".format(i+1, loss/steps_per_epoch))
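
For readers who cannot run the Boston example, the same training loop can be sketched in plain NumPy on synthetic data. This is not the MiniFlow code above: it fits a single linear layer, and the data, learning rate, batch size, and step count are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ true_w + true_b plus a little noise.
m, n_features = 200, 3
true_w = np.array([1.5, -2.0, 0.5])
true_b = 0.7
X_ = rng.normal(size=(m, n_features))
y_ = X_ @ true_w + true_b + 0.01 * rng.normal(size=m)

w = np.zeros(n_features)
b = 0.0
learning_rate = 0.05
batch_size = 20

for step in range(500):
    # Sample a batch with replacement, mirroring what resample() does by default.
    idx = rng.integers(0, m, size=batch_size)
    X_batch, y_batch = X_[idx], y_[idx]

    a = X_batch @ w + b                               # forward pass
    diff = a - y_batch
    grad_w = (2.0 / batch_size) * X_batch.T @ diff    # dC/dw for the MSE cost
    grad_b = (2.0 / batch_size) * np.sum(diff)        # dC/db for the MSE cost

    w -= learning_rate * grad_w                       # SGD update
    b -= learning_rate * grad_b

final_cost = np.mean((X_ @ w + b - y_) ** 2)
print(w, b, final_cost)
```

The three steps inside the loop (sample a batch, compute gradients, update the trainables) correspond to Steps 1 to 3 in the MiniFlow script.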
