TensorFlow Learning Notes (2): Neural Networks and Fully Connected Layers

Loading Datasets

(1) keras.datasets

Common datasets: Boston Housing, MNIST/Fashion-MNIST, CIFAR-10/100, IMDB

MNIST

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets
(x, y), (x_test, y_test) = keras.datasets.mnist.load_data()
x.shape	#(60000, 28, 28)
y.shape	#(60000,)
x.min(), x.max(), x.mean()	#(0, 255, 33.318421449829934)
y[:4]	#array([5, 0, 4, 1], dtype=uint8)
y_onehot = tf.one_hot(y, depth=10)
y_onehot[:2]

#<tf.Tensor: id=646, shape=(2, 10), dtype=float32, numpy=
#array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
#       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>

CIFAR10/100

(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
x.shape, y.shape, x_test.shape, y_test.shape
#((50000, 32, 32, 3), (50000, 1), (10000, 32, 32, 3), (10000, 1))

x.min(), x.max()	#(0, 255)
y[:4]	
#array([[6],
#       [9],
#       [9],
#       [4]], dtype=uint8)

(2) tf.data.Dataset.from_tensor_slices

Slices a tensor along its first dimension, returning a dataset containing N samples.

(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
db = tf.data.Dataset.from_tensor_slices(x_test)
next(iter(db)).shape	#TensorShape([32, 32, 3])
# .shuffle: shuffle the sample order
db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db = db.shuffle(10000)
# .map: apply a preprocessing function
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets
(x, y), (x_test, y_test) = keras.datasets.cifar10.load_data()
db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32)/255
    y = tf.cast(y, dtype=tf.int32)
    y = tf.one_hot(y, depth=10)
    return x,y
db2 = db.map(preprocess)
res = next(iter(db2))
res[0].shape, res[1].shape
#(TensorShape([32, 32, 3]), TensorShape([1, 10]))
# .batch
db3 = db2.batch(32)
res = next(iter(db3))
res[0].shape, res[1].shape
#(TensorShape([32, 32, 32, 3]), TensorShape([32, 1, 10]))
# .repeat: restart the dataset once all of its elements have been read
db4 = db3.repeat()
db4 = db3.repeat(2)
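Putting shuffle, map, batch, and repeat together, the pipeline can be consumed directly in a Python for loop. This is a minimal sketch based on the db and preprocess defined above (batch size 32 and 2 epochs are arbitrary choices):

db_train = db.shuffle(10000).map(preprocess).batch(32).repeat(2)   # 2 full passes over the data
for step, (x_batch, y_batch) in enumerate(db_train):
    if step % 100 == 0:
        print(step, x_batch.shape, y_batch.shape)   # typically (32, 32, 32, 3) (32, 1, 10)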

Note 1:

  • batch size: the number of samples in one batch. In deep learning, training generally uses SGD, i.e. each training step takes batchsize samples from the training set;

  • iteration: one iteration means training once on batchsize samples;

  • epoch: one epoch means training once on every sample in the training set; informally, the epoch count is how many times the whole dataset has been passed through.

(For example, if the training set has 500 samples and batchsize = 10, then training over the whole set takes iteration = 50, epoch = 1.)
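As a quick sanity check of the example above (a minimal sketch, just restating the 500-sample case in code):

num_samples, batch_size = 500, 10
iterations_per_epoch = num_samples // batch_size
iterations_per_epoch    # 50 iterations of batch size 10 make up 1 epoch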

Note 2: about tf.one_hot

tf.one_hot(
    indices,
    depth,
    on_value=None,
    off_value=None,
    axis=None,
    dtype=None,
    name=None
)
Returns a one-hot tensor.

The locations represented by indices in indices take value on_value, while all other locations take value off_value.

on_value and off_value must have matching data types. If dtype is also provided, they must be the same data type as specified by dtype.

If on_value is not provided, it will default to the value 1 with type dtype.

If off_value is not provided, it will default to the value 0 with type dtype.

If the input indices is rank N, the output will have rank N+1. The new axis is created at dimension axis (default: the new axis is appended at the end).

If indices is a scalar, the output shape will be a vector of length depth.

If indices is a vector of length features, the output shape will be:

  features x depth if axis == -1 (the default)
  depth x features if axis == 0

If indices is a matrix (batch) with shape [batch, features], the output shape will be:

  batch x features x depth if axis == -1 (the default)
  batch x depth x features if axis == 1
  depth x batch x features if axis == 0
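A short sketch of the shape rules above (the indices vector here is my own example):

import tensorflow as tf

indices = tf.constant([0, 2, 1])               # a vector of length features = 3
tf.one_hot(indices, depth=4).shape             # axis=-1 (default): features x depth -> (3, 4)
tf.one_hot(indices, depth=4, axis=0).shape     # axis=0: depth x features -> (4, 3)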

Fully Connected Layers

Single Layer

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets

x = tf.random.normal([4,784])
net = tf.keras.layers.Dense(512)    #[x,784]->[x,512]
out = net(x)
out.shape, net.kernel.shape, net.bias.shape
#(TensorShape([4, 512]), TensorShape([784, 512]), TensorShape([512]))

net = tf.keras.layers.Dense(10)
net.build(input_shape=(None, 4))
net.kernel.shape, net.bias.shape
#(TensorShape([4, 10]), TensorShape([10]))

net.build(input_shape=(None, 20))
net.kernel.shape, net.bias.shape
#(TensorShape([20, 10]), TensorShape([10]))

net.build(input_shape=(2, 4))
net.kernel.shape, net.bias
#(TensorShape([4, 10]),
# <tf.Variable 'bias:0' shape=(10,) dtype=float32, numpy=array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>)

keras.Sequential([layer1, layer2, layer3]): a container for stacking layers

#Sequential
x = tf.random.normal([2, 3])
model = keras.Sequential([
    keras.layers.Dense(2, activation='relu'),
    keras.layers.Dense(2, activation='relu'),
    keras.layers.Dense(2)
])
model.build(input_shape=[None, 3])
model.summary()

for p in model.trainable_variables:
    print(p.name, p.shape)
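As a quick usage check (continuing the snippet above), feeding the random input x through the container returns the output of the last Dense(2) layer:

out = model(x)
out.shape   # TensorShape([2, 2])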

Output Types

(1) $y \in R^d$

  • linear regression
  • naive classification with MSE
  • other general prediction
  • $out = relu(X@W+b)$
    • logits: the raw output before any activation function is applied (see the sketch below)
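A minimal sketch of the logits idea (my own example, not from the original notes): a Dense layer with no activation produces raw logits, and softmax/sigmoid is applied afterwards only when probabilities are needed.

import tensorflow as tf

x = tf.random.normal([4, 784])
logits = tf.keras.layers.Dense(10)(x)    # no activation -> raw logits, shape [4, 10]
probs = tf.nn.softmax(logits, axis=1)    # convert logits to probabilities when needed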

(2) $y_i \in [0,1]$

  • binary classification

    • $y > 0.5 \rightarrow 1$

    • $y < 0.5 \rightarrow 0$

  • Image Generation

  • rgb

  • sigmoid function

    • tf.sigmoid
      $f(x)=\frac{1}{1+e^{-x}}$
a = tf.linspace(-6., 6, 10)   # return 10 evenly spaced samples over [-6, 6]
tf.sigmoid(a)
x = tf.random.normal([1, 28, 28]) * 5
x = tf.sigmoid(x)
tf.reduce_min(x), tf.reduce_max(x)   # after sigmoid, all values lie in (0, 1)

(3) $y_i \in [0,1], \sum y_i = 1$

Note: sigmoid does not make the outputs sum to 1.

  • softmax

$\displaystyle\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j=1,...,K.$

a = tf.linspace(-2., 2, 5)
tf.nn.softmax(a)
  • Classification example
logits = tf.random.uniform([1, 10], minval=-2, maxval=2)
prob = tf.nn.softmax(logits, axis=1)
tf.reduce_sum(prob, axis=1)   # each row of prob sums to 1

(4) $y_i \in [-1,1]$

  • Tanh

$tanh(x) = sinh(x)/cosh(x) = (e^x-e^{-x})/(e^x+e^{-x})$

a = tf.linspace(-2., 2, 5)
tf.tanh(a)
#<tf.Tensor: id=520, shape=(5,), dtype=float32, numpy=
#array([-0.9640276, -0.7615942,  0.       ,  0.7615942,  0.9640276],
#      dtype=float32)>

Loss Functions

  • MSE
    • $loss = \frac{1}{N}\sum(y-out)^2$
    • $L_{2\text{-}norm} = \sqrt{\sum(y-out)^2}$
y = tf.constant([1,2,3,0,2])
y = tf.one_hot(y, depth=4)
y = tf.cast(y , dtype=tf.float32)
out = tf.random.normal([5, 4])
loss1 = tf.reduce_mean(tf.square(y-out))
loss2 = tf.square(tf.norm(y-out))/(5*4)
loss3 = tf.reduce_mean(tf.losses.MSE(y, out))
loss1, loss2, loss3

#(<tf.Tensor: id=593, shape=(), dtype=float32, numpy=1.2126634>,
# <tf.Tensor: id=602, shape=(), dtype=float32, numpy=1.2126634>,
# <tf.Tensor: id=607, shape=(), dtype=float32, numpy=1.2126634>)
  • Cross Entropy
    • Entropy
      • uncertainty
      • measure of surprise
      • lower entropy $\rightarrow$ more certainty

$\displaystyle Entropy = -\sum_{i}P(i)\log P(i)$

a = tf.fill([4], 0.25)
a*tf.math.log(a)/tf.math.log(2.)
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))	#<tf.Tensor: id=631, shape=(), dtype=float32, numpy=2.0>

a = tf.constant([0.1, 0.1, 0.1, 0.7])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))	#<tf.Tensor: id=640, shape=(), dtype=float32, numpy=1.3567796>

a = tf.constant([0.01, 0.01, 0.01, 0.97])
-tf.reduce_sum(a*tf.math.log(a)/tf.math.log(2.))	#<tf.Tensor: id=649, shape=(), dtype=float32, numpy=0.24194068>

Cross Entropy

$\displaystyle H(p,q) = -\sum_x p(x)\log q(x)$

$\displaystyle H(p,q) = H(p)+D_{KL}(p|q)$

  • for $p = q$:

    Minimum: $H(p,q) = H(p)$

  • for $p$: one-hot encoding

    $H(p{:}[0,1,0]) = -1\log 1 = 0$

    $H([0,1,0],[q_0,q_1,q_2]) = 0 + D_{KL}(p|q) = -1\log q_1$

Binary Classification

Two cases, single output:

$H(P,Q) = -P(cat)\log Q(cat) - (1-P(cat))\log (1-Q(cat))$

where $P(dog) = 1-P(cat)$, so

$\displaystyle H(P,Q) = -\sum_{i\in\{cat,dog\}} P(i)\log Q(i) = -P(cat)\log Q(cat)-P(dog)\log Q(dog) = -[y\log (p) + (1-y)\log(1-p)]$
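For this single-output binary case, tf.losses.binary_crossentropy computes the expression above (a small sketch; the label/probability values are arbitrary):

tf.losses.binary_crossentropy([1.], [0.9])   # y=1, p=0.9 -> small loss (about 0.105)
tf.losses.binary_crossentropy([1.], [0.1])   # y=1, p=0.1 -> large loss (about 2.303)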

tf.losses.categorical_crossentropy([0,1,0,0], [0.25,0.25,0.25,0.25])
#<tf.Tensor: id=666, shape=(), dtype=float32, numpy=1.3862944>

tf.losses.categorical_crossentropy([0,1,0,0], [0.1,0.1,0.7,0.1])
#<tf.Tensor: id=700, shape=(), dtype=float32, numpy=2.3025851>

tf.losses.categorical_crossentropy([0,1,0,0], [0.1,0.7,0.1,0.1])
#<tf.Tensor: id=751, shape=(), dtype=float32, numpy=0.35667497>

tf.losses.categorical_crossentropy([0,1,0,0], [0.01,0.97,0.01,0.01])
#<tf.Tensor: id=819, shape=(), dtype=float32, numpy=0.030459179>
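In practice it is usually more numerically stable to pass raw logits and let the loss apply softmax internally via from_logits=True (a sketch with arbitrary logits):

logits = tf.constant([[-2., 5., -1., 0.]])
tf.losses.categorical_crossentropy([[0, 1, 0, 0]], logits, from_logits=True)   # small loss: class 1 has the largest logit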
  • Hinge Loss

$\displaystyle\sum_{i}\max(0,\,1-y_i \cdot h_\theta(x_i))$
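A minimal sketch of the hinge loss using tf.losses.hinge, assuming labels in {-1, +1} as in the formula (the values are arbitrary):

y_true = tf.constant([[1., -1., 1.]])
y_pred = tf.constant([[0.8, -0.3, 2.0]])
tf.losses.hinge(y_true, y_pred)   # mean of max(0, 1 - y_true * y_pred) = 0.3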
