TensorFlow Study Notes (2): Neural Networks and Fully Connected Layers
Loading Datasets
(1) keras.datasets
Common datasets: boston housing, mnist/fashion mnist, cifar10/100, imdb
MNIST
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets
(x, y), (x_test, y_test) = keras.datasets.mnist.load_data()
x.shape #(60000, 28, 28)
y.shape #(60000,)
x.min(), x.max(), x.mean() #(0, 255, 33.318421449829934)
y[:4] #array([5, 0, 4, 1], dtype=uint8)
y_onehot = tf.one_hot(y, depth=10)
y_onehot[:2]
#<tf.Tensor: id=646, shape=(2, 10), dtype=float32, numpy=
#array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
# [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>
CIFAR10/100
(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
x.shape, y.shape, x_test.shape, y_test.shape
#((50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3), (10000, 1))
x.min(), x.max() #(0, 255)
y[:4]
#array([[6],
# [9],
# [9],
# [4]], dtype=uint8)
(2) tf.data.Dataset.from_tensor_slices
Slices the tensor along its first dimension and returns a dataset of N samples, where N is the size of that dimension.
(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
db = tf.data.Dataset.from_tensor_slices(x_test)
next(iter(db)).shape #TensorShape([32, 32, 3])
# .shuffle: randomize the sample order
db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db = db.shuffle(10000)
# .map: preprocessing
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets
(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32)/255  # scale pixels to [0, 1]
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)  # y has shape [1], so the result has shape [1, 10]
    return x, y
db2 = db.map(preprocess)
res = next(iter(db2))
res[0].shape, res[1].shape
#(TensorShape([32, 32, 3]), TensorShape([1, 10]))
# .batch
db3 = db2.batch(32)
res = next(iter(db3))
res[0].shape, res[1].shape
#(TensorShape([32, 32, 32, 3]), TensorShape([32, 1, 10]))
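# .repeat: restart the dataset from the beginning once its last element has been read
db4 = db3.repeat()   # repeat indefinitely
db4 = db3.repeat(2)  # iterate over the dataset twice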
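Note 1:
- batchsize: the batch size. Deep learning models are generally trained with (mini-batch) SGD, where each training step draws batchsize samples from the training set;
- iteration: 1 iteration is one training step on batchsize samples;
- epoch: 1 epoch is one pass over every sample in the training set; in plain terms, the number of epochs is how many times the whole dataset is cycled through.
(For example, with 500 training samples and batchsize = 10, one pass over the training set takes iteration=50, epoch=1.)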
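The arithmetic in Note 1 can be checked with a few lines of plain Python. This is only a sketch: the helper name `iterations_per_epoch` is made up for illustration, and it assumes the final, smaller batch is kept when the dataset size is not a multiple of batchsize.

```python
import math

def iterations_per_epoch(num_samples, batch_size):
    """Number of training steps needed to see every sample once."""
    # ceil keeps the final partial batch (assumption: it is not dropped)
    return math.ceil(num_samples / batch_size)

steps = iterations_per_epoch(500, 10)
print(steps)        # 50 iterations make up 1 epoch
print(steps * 3)    # 150 iterations for 3 epochs
```

With 505 samples the same helper gives 51, because the last batch holds only 5 samples.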
Note 2: about tf.one_hot
tf.one_hot(
indices,
depth,
on_value=None,
off_value=None,
axis=None,
dtype=None,
name=None
)
Returns a one-hot tensor.
The locations represented by indices in indices take value on_value, while all other locations take value off_value.
on_value and off_value must have matching data types. If dtype is also provided, they must be the same data type as specified by dtype.
If on_value is not provided, it will default to the value 1 with type dtype.
If off_value is not provided, it will default to the value 0 with type dtype.
If the input indices is rank N, the output will have rank N+1. The new axis is created at dimension axis (default: the new axis is appended at the end).
If indices is a scalar the output shape will be a vector of length depth.
If indices is a vector of length features, the output shape will be:
features x depth if axis == -1 (the default)
depth x features if axis == 0
If indices is a matrix (batch) with shape [batch, features], the output shape will be:
batch x features x depth if axis == -1 (the default)
batch x depth x features if axis == 1
depth x batch x features if axis == 0
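The shape rules for a vector of indices can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in, not the tf.one_hot implementation; the helpers `one_hot` and `transpose` are hypothetical names, and only the vector case (axis=-1 versus axis=0) is covered.

```python
def one_hot(indices, depth, on_value=1.0, off_value=0.0):
    """One-hot encode a list of ints; the result is features x depth (axis=-1)."""
    return [[on_value if k == i else off_value for k in range(depth)]
            for i in indices]

def transpose(rows):
    """Swap the two axes of a nested list, giving the axis=0 layout."""
    return [list(col) for col in zip(*rows)]

labels = [5, 0, 4, 1]                 # the first four MNIST labels
encoded = one_hot(labels, depth=10)   # features x depth -> 4 x 10
encoded_axis0 = transpose(encoded)    # depth x features -> 10 x 4

print(len(encoded), len(encoded[0]))              # 4 10
print(len(encoded_axis0), len(encoded_axis0[0]))  # 10 4
print(encoded[0])  # 1.0 at position 5, 0.0 elsewhere
```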
Fully Connected Layers
Single layer
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets
x = tf.random.normal([4,784])
net = tf.keras.layers.Dense(512) #[x,784]->[x,512]
out = net(x)
out.shape, net.kernel.shape, net.bias.shape
#(TensorShape([4, 512]), TensorShape([784, 512]), TensorShape([512]))
net = tf.keras.layers.Dense(10)
net.build(input_shape=(None, 4))
net.kernel.shape, net.bias.shape
#(TensorShape([4, 10]), TensorShape([10]))
net.build(input_shape=(None, 20))
net.kernel.shape, net.bias.shape
#(TensorShape([20, 10]), TensorShape([10]))
net.build(input_shape=(2, 4))
net.kernel.shape, net.bias
#(TensorShape([4, 10]),
# <tf.Variable 'bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>)
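To make the shapes above concrete, here is a framework-free sketch of what a Dense layer computes when it has no activation: out = x @ kernel + bias, with kernel of shape [in_features, units] and bias of shape [units]. The helper `dense_forward` and the toy numbers are illustrative assumptions, not Keras internals.

```python
def dense_forward(x, kernel, bias):
    """Compute x @ kernel + bias for nested lists.

    x: [batch, in_features], kernel: [in_features, units], bias: [units].
    """
    units = len(bias)
    return [
        [sum(row[i] * kernel[i][j] for i in range(len(row))) + bias[j]
         for j in range(units)]
        for row in x
    ]

x = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]     # batch of 2 samples, 3 features each
kernel = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]     # [3, 2]: 3 inputs -> 2 units
bias = [0.5, -0.5]        # [2]

out = dense_forward(x, kernel, bias)
print(out)                # [[4.5, 4.5], [0.5, 0.5]]
```

The output has shape [batch, units] = [2, 2], mirroring the [4, 784] -> [4, 512] mapping in the first example.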
keras.Sequential([layer1, layer2, layer3]): a container that chains layers
#Sequential
x = tf.random.normal([2, 3])
model = keras.Sequential([
    keras.layers.Dense(2, activation='relu'),
    keras.layers.Dense(2, activation='relu'),
    keras.layers.Dense(2)
])
model.build(input_shape=[None, 3])
model.summary()
for p in model.trainable_variables:
    print(p.name, p.shape)
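As a sanity check on what the loop over trainable_variables prints, the parameter count of the model above can be worked out by hand. This is a sketch in plain Python; `dense_param_count` is a hypothetical helper, and it assumes every Dense layer uses a bias (the Keras default).

```python
def dense_param_count(in_features, units):
    """Weights (in_features * units) plus one bias per unit."""
    return in_features * units + units

layer_units = [2, 2, 2]   # the three Dense layers in the model above
in_features = 3           # from input_shape=[None, 3]

total = 0
for units in layer_units:
    count = dense_param_count(in_features, units)
    print(f"[{in_features}, {units}] kernel + [{units}] bias = {count}")
    total += count
    in_features = units   # this layer's output feeds the next layer

print(total)              # 8 + 6 + 6 = 20
```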
Output types
(1) $y \in R^d$
- linear regression
- naive classification with MSE
- other general prediction
- $out = relu(X@W+b)$
- logits: the output with no activation function applied
(2) $y_i \in [0,1]$
- binary classification
  - $y>0.5 \rightarrow 1$
  - $y<0.5 \rightarrow 0$
- Image Generation
  - rgb
- sigmoid function
  $f(x)=\frac{1}{1+e^{-x}}$
- tf.sigmoid
a = tf.linspace(-6., 6., 10) # 10 evenly spaced samples in [-6, 6]
tf.sigmoid(a)
x = tf.random.normal([1,28,28])*5
x = tf.sigmoid(x)
tf.reduce_min(x), tf.reduce_max(x)
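The sigmoid formula is easy to evaluate by hand, which helps when checking what tf.sigmoid returns. The following is a minimal sketch using only the standard library; the function name `sigmoid` and the sample points are chosen here for illustration.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

xs = [-6.0, -2.0, 0.0, 2.0, 6.0]
ys = [sigmoid(x) for x in xs]
for x, y in zip(xs, ys):
    print(f"{x:5.1f} -> {y:.4f}")

print(sigmoid(0.0))                      # 0.5
print(all(0.0 < y < 1.0 for y in ys))    # True
```

Every output lies strictly between 0 and 1, which is why sigmoid suits binary classification and pixel intensities.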
(3) $y_i \in [0,1], \sum_i y_i=1$
Note: sigmoid squashes each output independently, so the outputs are not guaranteed to sum to 1
- softmax
$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j=1,...,K.$
a = tf.linspace(-2., 2., 5)
tf.nn.softmax(a)
- Classification example
logits = tf.random.uniform([1,10], minval=-2, maxval=2)
prob = tf.nn.softmax(logits, axis=1)
tf.reduce_sum(prob, axis=1)
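The softmax formula can be spelled out in plain Python to confirm that the outputs form a probability distribution. This is a sketch, not what tf.nn.softmax does internally; subtracting the maximum logit is a common numerical-stability trick added here as an assumption, and it leaves the result unchanged.

```python
import math

def softmax(logits):
    """Softmax e^{z_j} / sum_k e^{z_k}, computed after subtracting max(z)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-2.0, -1.0, 0.0, 1.0, 2.0]   # five evenly spaced points in [-2, 2]
probs = softmax(logits)
print([round(p, 4) for p in probs])
print(sum(probs))   # 1.0 up to floating-point rounding
```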
(4) $y_i \in [-1,1]$
- Tanh
$tanh(x) = sinh(x)/cosh(x) = (e^x-e^{-x})/(e^x+e^{-x})$
a = tf.linspace(-2., 2., 5)
tf.tanh(a)
#<tf.Tensor: id=520, shape=(5,), dtype=float32, numpy=
#array([-0.9640276, -0.7615942, 0. , 0.7615942, 0.9640276],
# dtype=float32)>
Loss functions
- MSE
- $loss = \frac{1}{N}\sum(y-out)^2$
- $L_{2-norm} = \sqrt{\sum(y-out)^2}$
y = tf.constant([1,2,3,0,2])
y = tf.one_hot(y, depth=4)
y = tf.cast(y , dtype=tf.float32)
out = tf.random.normal([5, 4])
loss1 = tf.reduce_mean(tf.square(y-out))
loss2 = tf.square(tf.norm(y-out))/(5*4)
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))
loss1, loss2, loss3
#(<tf.Tensor: id=593, shape=(), dtype=float32, numpy=1.2126634>,
# <tf.Tensor: id=602, shape=(), dtype=float32, numpy=1.2126634>,
# <tf.Tensor: id=607, shape=(), dtype=float32, numpy=1.2126634>)
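The three losses agree because the squared L2 norm of the error is exactly the sum of squared errors, so dividing it by the element count N gives the MSE. The sketch below checks this in plain Python with made-up toy values.

```python
import math

y   = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]   # one-hot targets (toy values)
out = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]   # predictions (toy values)

# flatten the element-wise errors y - out
diffs = [t - o for y_row, o_row in zip(y, out) for t, o in zip(y_row, o_row)]
n = len(diffs)                                   # 2 * 3 = 6 elements

mse = sum(d * d for d in diffs) / n              # mean of squared errors
l2_norm = math.sqrt(sum(d * d for d in diffs))   # L2 norm of the error
mse_from_norm = l2_norm ** 2 / n                 # squared norm divided by N

print(mse, mse_from_norm)   # equal up to floating-point rounding
```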
- Cross Entropy
- Entropy
- uncertainty
- measure of surprise
- lower entropy $\rightarrow$ more certainty
- Entropy
$Entropy = -\sum_{i}P(i)\log P(i)$
a = tf.fill([4], 0.25)
a*tf.math.log(a)/tf.math.log(2.)
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.)) #<tf.Tensor: id=631, shape=(), dtype=float32, numpy=2.0>
a = tf.constant([0.1, 0.1, 0.1, 0.7])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.)) #<tf.Tensor: id=640, shape=(), dtype=float32, numpy=1.3567796>
a = tf.constant([0.01, 0.01, 0.01, 0.97])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.)) #<tf.Tensor: id=649, shape=(), dtype=float32, numpy=0.24194068>
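The same entropies can be reproduced with the standard library, which makes the base-2 logarithm explicit. This is a small sketch; `entropy_bits` is a hypothetical helper name.

```python
import math

def entropy_bits(p):
    """Shannon entropy -sum p_i * log2(p_i), in bits; zero terms are skipped."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy_bits([0.25, 0.25, 0.25, 0.25]))             # 2.0
print(round(entropy_bits([0.1, 0.1, 0.1, 0.7]), 4))       # 1.3568
print(round(entropy_bits([0.01, 0.01, 0.01, 0.97]), 4))   # 0.2419
```

The more the probability mass concentrates on one outcome, the lower the entropy.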
Cross Entropy
$H(p,q) = -\sum_x p(x)\log q(x)$
$H(p,q) = H(p)+D_{KL}(p\|q)$
* for $p=q$
Minima: $H(p,q) = H(p)$
* for $p$: one-hot encoding
$H(p: [0,1,0]) = -1\log 1 = 0$
$H([0,1,0],[q_0,q_1,q_2]) = 0 + D_{KL}(p\|q) = -1\log q_1$
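The decomposition H(p,q) = H(p) + D_KL(p||q) and its one-hot special case can be verified numerically. The sketch below uses natural logarithms and made-up values for q; the three helper functions are hypothetical names written for this check.

```python
import math

def entropy(p):
    """H(p) = -sum p_i log p_i, skipping zero-probability terms."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p_i log q_i."""
    return sum(-pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.0, 1.0, 0.0]   # one-hot target
q = [0.2, 0.7, 0.1]   # predicted distribution (toy values)

print(entropy(p))                             # H(p) is zero for a one-hot p
print(cross_entropy(p, q), -math.log(q[1]))   # both are -log q_1
print(entropy(p) + kl_divergence(p, q))       # H(p) + D_KL(p||q), same value
```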
Binary Classification
Two cases: single output
$H(P,Q) = -P(cat)\log Q(cat) - (1-P(cat))\log (1-Q(cat))$
where $P(dog) = 1-P(cat)$
$H(P,Q) = -\sum_{i=(cat,dog)} P(i)\log Q(i)$
$= -P(cat)\log Q(cat)-P(dog)\log Q(dog)$
$= -[y\log (p) + (1-y)\log(1-p)]$
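The last line of the derivation reads y as P(cat) and p as Q(cat). Under that reading, the single-output form and the two-class sum should give the same number, which the sketch below checks with toy values. Both function names are hypothetical.

```python
import math

def binary_cross_entropy(y, p):
    """-[y*log(p) + (1-y)*log(1-p)] for a label y in {0, 1} and 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def two_class_cross_entropy(P, Q):
    """General form -sum_i P(i) log Q(i) over the classes (cat, dog)."""
    return sum(-pi * math.log(qi) for pi, qi in zip(P, Q) if pi > 0)

y, p = 1, 0.9   # true label "cat", predicted Q(cat) = 0.9 (toy values)
print(binary_cross_entropy(y, p))
print(two_class_cross_entropy([y, 1 - y], [p, 1 - p]))   # same value
```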
tf.losses.categorical_crossentropy([0,1,0,0], [0.25,0.25,0.25,0.25])
#<tf.Tensor: id=666, shape=(), dtype=float32, numpy=1.3862944>
tf.losses.categorical_crossentropy([0,1,0,0], [0.1,0.1,0.7,0.1])
#<tf.Tensor: id=700, shape=(), dtype=float32, numpy=2.3025851>
tf.losses.categorical_crossentropy([0,1,0,0], [0.1,0.7,0.1,0.1])
#<tf.Tensor: id=751, shape=(), dtype=float32, numpy=0.35667497>
tf.losses.categorical_crossentropy([0,1,0,0], [0.01,0.97,0.01,0.01])
#<tf.Tensor: id=819, shape=(), dtype=float32, numpy=0.030459179>
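For a one-hot target the cross entropy reduces to minus the log of the probability assigned to the true class, so the four values above can be reproduced by hand. This is a plain-Python sketch with a hypothetical helper `one_hot_cross_entropy`; it assumes the predictions already sum to 1 and does not replicate any extra normalization or clipping the Keras function may perform.

```python
import math

def one_hot_cross_entropy(y_true, y_pred):
    """-sum_i y_i * log(q_i); for one-hot y_true this is -log q_true."""
    return sum(-t * math.log(q) for t, q in zip(y_true, y_pred) if t > 0)

target = [0, 1, 0, 0]
results = []
for pred in ([0.25, 0.25, 0.25, 0.25],
             [0.1, 0.1, 0.7, 0.1],
             [0.1, 0.7, 0.1, 0.1],
             [0.01, 0.97, 0.01, 0.01]):
    results.append(round(one_hot_cross_entropy(target, pred), 4))

print(results)   # [1.3863, 2.3026, 0.3567, 0.0305]
```

The loss shrinks as the predicted probability of the true class (index 1) grows, and it is largest when the model is confident in the wrong class.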
- Hinge Loss
$\sum_{i}\max(0,1-y_i \cdot h_\theta(x_i))$
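The hinge loss formula can be evaluated directly. The sketch below assumes the usual convention that labels are -1 or +1 and that h(x_i) is a raw, unbounded score; the notes do not state this, and the toy numbers are made up.

```python
def hinge_loss(labels, scores):
    """sum_i max(0, 1 - y_i * h(x_i)), with labels y_i in {-1, +1}."""
    return sum(max(0.0, 1.0 - y * s) for y, s in zip(labels, scores))

labels = [+1, -1, +1]        # toy labels
scores = [2.0, -0.5, -1.0]   # toy raw model outputs h(x_i)

# per-sample terms: max(0, 1-2)=0, max(0, 1-0.5)=0.5, max(0, 1+1)=2
print(hinge_loss(labels, scores))   # 2.5
```

A sample contributes nothing once it is on the correct side with a margin of at least 1; the second sample is correct but inside the margin, and the third is misclassified.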