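TensorFlow Convolutional Neural Networks

Preface

Keras documentation (Chinese)

* marks an optional step: it is not required for building the CNN that decides whether an image shows a cat or a dog.

Dogs vs. Cats Classification

Dataset download link

Preprocessing

Unzip the data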

import os
import zipfile

local_zip = '/tmp/cats_and_dogs_filtered.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp')
zip_ref.close()
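As a small variation, here is a sketch of the same extraction written with a `with` block, which closes the archive even if `extractall` raises. The helper name `unzip_to` is hypothetical and not part of the original code.

```python
import os
import zipfile


def unzip_to(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    # The context manager closes the archive automatically, even on error.
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(dest_dir)
        return zf.namelist()
```

For the dataset above it would be called as `unzip_to('/tmp/cats_and_dogs_filtered.zip', '/tmp')`.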

Define the directories

base_dir = '/tmp/cats_and_dogs_filtered'

# Join the paths
train_dir = os.path.join(base_dir, 'train') 
validation_dir = os.path.join(base_dir, 'validation')

# Directory with our training cat/dog pictures
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')

# Directory with our validation cat/dog pictures
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

* Print the first 10 file names

train_cat_fnames = os.listdir( train_cats_dir ) # os.listdir() returns a list of the names of the files and folders in the given directory
train_dog_fnames = os.listdir( train_dogs_dir )

print(train_cat_fnames[:10])
print(train_dog_fnames[:10])

"""
Output:
['cat.277.jpg', 'cat.424.jpg', 'cat.809.jpg', 'cat.148.jpg', 'cat.114.jpg', 'cat.264.jpg', 'cat.392.jpg', 'cat.355.jpg', 'cat.671.jpg', 'cat.562.jpg']
['dog.928.jpg', 'dog.88.jpg', 'dog.470.jpg', 'dog.782.jpg', 'dog.346.jpg', 'dog.383.jpg', 'dog.728.jpg', 'dog.979.jpg', 'dog.389.jpg', 'dog.71.jpg']
"""

* Print the number of images

print('total training cat images :', len(os.listdir(      train_cats_dir ) ))
print('total training dog images :', len(os.listdir(      train_dogs_dir ) ))

print('total validation cat images :', len(os.listdir( validation_cats_dir ) ))
print('total validation dog images :', len(os.listdir( validation_dogs_dir ) ))

"""
Output:
total training cat images : 1000
total training dog images : 1000
total validation cat images : 500
total validation dog images : 500
"""

* Display some cat and dog images

%matplotlib inline

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4

pic_index = 0 # Index for iterating over images


# Set up matplotlib fig, and size it to fit 4x4 pics
fig = plt.gcf()  # Get the current figure
fig.set_size_inches(ncols*4, nrows*4)

pic_index+=8

next_cat_pix = [os.path.join(train_cats_dir, fname) 
                for fname in train_cat_fnames[ pic_index-8:pic_index] 
               ]

next_dog_pix = [os.path.join(train_dogs_dir, fname) 
                for fname in train_dog_fnames[ pic_index-8:pic_index]
               ]

for i, img_path in enumerate(next_cat_pix+next_dog_pix):
  # Set up subplot; subplot indices start at 1
  sp = plt.subplot(nrows, ncols, i + 1)
  sp.axis('Off') # Don't show axes (or gridlines)

  img = mpimg.imread(img_path)
  plt.imshow(img)

plt.show()

Build the model

import tensorflow as tf

model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 150x150 with 3 bytes color
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2), 
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'), 
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(), 
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'), 
    # Only 1 output neuron. It will contain a value from 0-1 where 0 for 1 class ('cats') and 1 for the other ('dogs')
    tf.keras.layers.Dense(1, activation='sigmoid')  
])
    
model.summary()

Model summary output:

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_6 (Conv2D)            (None, 148, 148, 16)      448       
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 74, 74, 16)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 72, 72, 32)        4640      
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 36, 36, 32)        0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 34, 34, 64)        18496     
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 17, 17, 64)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 18496)             0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               9470464   
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 513       
=================================================================
Total params: 9,494,561
Trainable params: 9,494,561
Non-trainable params: 0
_________________________________________________________________
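The numbers in the summary can be reproduced by hand: a 3x3 convolution without padding shrinks each spatial side by 2, 2x2 max pooling halves it (rounding down), a Conv2D layer has `3*3*in_channels*filters + filters` parameters, and a Dense layer has `inputs*units + units`. The sketch below assumes exactly the layer stack defined above; `cnn_param_counts` is a hypothetical helper.

```python
def cnn_param_counts(input_size=150, in_channels=3, conv_filters=(16, 32, 64), hidden=512):
    """Recompute the per-layer parameter counts of the Sequential model above."""
    size, ch = input_size, in_channels
    counts = []
    for filters in conv_filters:
        size -= 2                                      # 3x3 'valid' conv: output = input - 2
        counts.append(3 * 3 * ch * filters + filters)  # kernel weights + one bias per filter
        size //= 2                                     # 2x2 max pooling: floor(size / 2)
        ch = filters
    flat = size * size * ch                            # length of the Flatten output
    counts.append(flat * hidden + hidden)              # Dense(512)
    counts.append(hidden * 1 + 1)                      # Dense(1)
    return flat, counts, sum(counts)


flat, counts, total = cnn_param_counts()
print(flat, counts, total)
# 18496 [448, 4640, 18496, 9470464, 513] 9494561
```

Almost all of the 9.5 million parameters sit in the Dense(512) layer, because it receives the 18,496-value flattened feature map.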

Compile the model

from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics = ['accuracy'])
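`binary_crossentropy` penalizes the sigmoid output p for a label y in {0, 1} with -[y·log(p) + (1-y)·log(1-p)], averaged over the batch. A plain-Python sketch of that formula follows; the clipping constant `eps` is an assumption added to avoid log(0), not something taken from this post.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over a batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1], [0.5]))          # an undecided prediction costs log(2)
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident, correct predictions cost little
```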

Data preprocessing

Create the image generators

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255.
train_datagen = ImageDataGenerator( rescale = 1.0/255. )
test_datagen  = ImageDataGenerator( rescale = 1.0/255. )

# --------------------
# Flow training images in batches of 20 using train_datagen generator
# --------------------
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size=20,
                                                    class_mode='binary',
                                                    target_size=(150, 150))     
# --------------------
# Flow validation images in batches of 20 using test_datagen generator
# --------------------
validation_generator =  test_datagen.flow_from_directory(validation_dir,
                                                         batch_size=20,
                                                         class_mode  = 'binary',
                                                         target_size = (150, 150))
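`flow_from_directory` derives the labels from the subdirectory names: when `classes` is not given, the class names are sorted alphanumerically and numbered from 0, and the resulting mapping is exposed as `train_generator.class_indices`. That is why 'cats' is 0 and 'dogs' is 1 here. The sketch below imitates only that mapping step; `infer_class_indices` is a hypothetical name.

```python
import os


def infer_class_indices(directory):
    """Mimic the label mapping: sorted subdirectory names -> 0, 1, 2, ..."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))  # only folders count as classes
    )
    return {name: idx for idx, name in enumerate(classes)}
```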

Train the model

history = model.fit(train_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=100,
                    epochs=15,
                    validation_steps=50,
                    verbose=2)

Output:

Epoch 1/15
100/100 - 55s - loss: 0.0300 - accuracy: 0.9925 - val_loss: 1.9040 - val_accuracy: 0.7120
Epoch 2/15
100/100 - 55s - loss: 0.0350 - accuracy: 0.9890 - val_loss: 1.7071 - val_accuracy: 0.7210
Epoch 3/15
100/100 - 55s - loss: 0.0188 - accuracy: 0.9925 - val_loss: 2.0725 - val_accuracy: 0.6950
Epoch 4/15
100/100 - 55s - loss: 0.0291 - accuracy: 0.9915 - val_loss: 2.3252 - val_accuracy: 0.7080
Epoch 5/15
100/100 - 55s - loss: 0.0303 - accuracy: 0.9940 - val_loss: 2.1299 - val_accuracy: 0.7340
Epoch 6/15
100/100 - 55s - loss: 0.0193 - accuracy: 0.9930 - val_loss: 2.0416 - val_accuracy: 0.7400
Epoch 7/15
100/100 - 55s - loss: 0.0197 - accuracy: 0.9925 - val_loss: 2.1636 - val_accuracy: 0.7300
Epoch 8/15
100/100 - 55s - loss: 0.0251 - accuracy: 0.9925 - val_loss: 2.4540 - val_accuracy: 0.7360
Epoch 9/15
100/100 - 55s - loss: 0.0180 - accuracy: 0.9955 - val_loss: 2.7262 - val_accuracy: 0.7260
Epoch 10/15
100/100 - 60s - loss: 0.0163 - accuracy: 0.9960 - val_loss: 2.8085 - val_accuracy: 0.7040
Epoch 11/15
100/100 - 55s - loss: 0.0400 - accuracy: 0.9905 - val_loss: 2.6529 - val_accuracy: 0.7270
Epoch 12/15
100/100 - 55s - loss: 0.0193 - accuracy: 0.9945 - val_loss: 2.8958 - val_accuracy: 0.7220
Epoch 13/15
100/100 - 55s - loss: 0.0204 - accuracy: 0.9950 - val_loss: 3.2313 - val_accuracy: 0.7250
Epoch 14/15
100/100 - 55s - loss: 9.9441e-05 - accuracy: 1.0000 - val_loss: 3.5450 - val_accuracy: 0.7270
Epoch 15/15
100/100 - 55s - loss: 0.0781 - accuracy: 0.9905 - val_loss: 3.2831 - val_accuracy: 0.6940
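The `steps_per_epoch` and `validation_steps` values in the `model.fit` call follow from the dataset size and the batch size of 20: 2,000 training images / 20 = 100 batches, and 1,000 validation images / 20 = 50 batches, so each epoch sees every image once. A small sketch of that arithmetic; the hypothetical helper `steps_for` rounds up so a partial last batch is still counted.

```python
import math


def steps_for(num_images, batch_size):
    """Batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)


train_steps = steps_for(1000 + 1000, 20)  # 2000 training images (cats + dogs)
val_steps = steps_for(500 + 500, 20)      # 1000 validation images
print(train_steps, val_steps)  # 100 50
```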

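Run the model (predict on new images)

import numpy as np

from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():

  # predicting images
  path = '/content/' + fn
  img = image.load_img(path, target_size=(150, 150))

  x = image.img_to_array(img) / 255.0  # rescale like the training generators (rescale=1./255)
  x = np.expand_dims(x, axis=0)        # add the batch dimension: (1, 150, 150, 3)
  images = np.vstack([x])

  classes = model.predict(images, batch_size=10)

  print(classes[0])

  # The sigmoid output lies in (0, 1): values near 0 mean cat, near 1 mean dog
  if classes[0] > 0.5:
    print(fn + " is a dog")
  else:
    print(fn + " is a cat")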

Cropping the image can change the classification result.

Evaluating the model's accuracy and loss

#-----------------------------------------------------------
# Retrieve a list of list results on training and test data
# sets for each training epoch
#-----------------------------------------------------------
acc      = history.history[     'accuracy' ]
val_acc  = history.history[ 'val_accuracy' ]
loss     = history.history[    'loss' ]
val_loss = history.history['val_loss' ]

epochs   = range(len(acc)) # Get number of epochs

#------------------------------------------------
# Plot training and validation accuracy per epoch
#------------------------------------------------
plt.plot  ( epochs,     acc )
plt.plot  ( epochs, val_acc )
plt.title ('Training and validation accuracy')
plt.figure()

#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot  ( epochs,     loss )
plt.plot  ( epochs, val_loss )
plt.title ('Training and validation loss'   )

[Figure: training and validation accuracy and loss curves]

Summary

**How to build a Dogs vs. Cats classifier?**

  • Unzip the data.
  • Define the directories.
  • Build the model (tf.keras.models.Sequential).
  • Compile the model.
  • Create the image generators (rescale the data, read the files with flow_from_directory).
  • Train the model (model.fit accepts the generators directly; model.fit_generator is deprecated in TensorFlow 2.x).
  • Run the model (test it on new images).
  • Evaluate the model's accuracy and loss.

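Augmentation

As the figure shows, the training accuracy approaches 1 while the validation accuracy stops improving (it hovers around 0.70 to 0.74 in the log above), which means the model is overfitting.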
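The overfitting can also be read straight off the 15-epoch training log above. The sketch copies the logged `accuracy`, `val_accuracy` and `val_loss` values into lists (no new results) and locates the best epochs and the train/validation accuracy gap.

```python
# Values copied from the 15-epoch training log above.
train_acc = [0.9925, 0.9890, 0.9925, 0.9915, 0.9940, 0.9930, 0.9925, 0.9925,
             0.9955, 0.9960, 0.9905, 0.9945, 0.9950, 1.0000, 0.9905]
val_acc = [0.7120, 0.7210, 0.6950, 0.7080, 0.7340, 0.7400, 0.7300, 0.7360,
           0.7260, 0.7040, 0.7270, 0.7220, 0.7250, 0.7270, 0.6940]
val_loss = [1.9040, 1.7071, 2.0725, 2.3252, 2.1299, 2.0416, 2.1636, 2.4540,
            2.7262, 2.8085, 2.6529, 2.8958, 3.2313, 3.5450, 3.2831]

best_val_epoch = val_acc.index(max(val_acc)) + 1     # 1-indexed epoch number
best_loss_epoch = val_loss.index(min(val_loss)) + 1
gaps = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]  # train minus validation
print(best_val_epoch, best_loss_epoch, min(gaps))  # 6 2 0.253
```

Validation loss is lowest at epoch 2 and then mostly rises while training accuracy stays near 0.99, and the accuracy gap never drops below about 0.25.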

ImageDataGenerator parameters

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=(1./255),
    rotation_range=40,  
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)
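  • rescale: rescaling factor. Defaults to None. If None or 0, no rescaling is applied; otherwise the data is multiplied by the value provided (before any other transformation is applied).

  • rotation_range: int. Degree range for random rotations.

  • width_shift_range: float, 1-D array-like, or int

    • float: fraction of the total width if < 1, or pixels if >= 1.
    • 1-D array-like: random elements from the array.
    • int: integer number of pixels from the interval (-width_shift_range, +width_shift_range).
    • With width_shift_range=2 the possible values are the integers [-1, 0, +1], the same as with width_shift_range=[-1, 0, +1]; with width_shift_range=1.0 the possible values are floats in the interval [-1.0, +1.0).
  • height_shift_range: float, 1-D array-like, or int

    • float: fraction of the total height if < 1, or pixels if >= 1.
    • 1-D array-like: random elements from the array.
    • int: integer number of pixels from the interval (-height_shift_range, +height_shift_range).
    • With height_shift_range=2 the possible values are the integers [-1, 0, +1], the same as with height_shift_range=[-1, 0, +1]; with height_shift_range=1.0 the possible values are floats in the interval [-1.0, +1.0).
  • shear_range: float. Shear intensity (shear angle in the counter-clockwise direction, in degrees).

  • zoom_range: float or [lower, upper]. Range for random zoom. If a float, [lower, upper] = [1-zoom_range, 1+zoom_range].

  • horizontal_flip: boolean. Randomly flips inputs horizontally.

  • fill_mode: one of {"constant", "nearest", "reflect", "wrap"}. Defaults to 'nearest'. Points outside the boundaries of the input are filled according to the given mode:

    • 'constant': kkkkkkkk|abcd|kkkkkkkk (cval=k)

    • 'nearest': aaaaaaaa|abcd|dddddddd

    • 'reflect': abcddcba|abcd|dcbaabcd

    • 'wrap': abcdabcd|abcd|abcdabcd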
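To make the four `fill_mode` diagrams concrete, here is a one-dimensional sketch that pads the string `abcd` with 8 positions on each side using each rule. It only illustrates the index arithmetic; it is not Keras's own implementation (which fills 2-D pixel grids), and `pad_1d` is a hypothetical name.

```python
def pad_1d(seq, pad, mode="nearest", cval="k"):
    """Pad a 1-D sequence on both sides the way each fill_mode fills points
    outside the input boundary (illustration only)."""
    n = len(seq)

    def pick(i):
        if 0 <= i < n:
            return seq[i]
        if mode == "constant":
            return cval                          # fixed fill value
        if mode == "nearest":
            return seq[min(max(i, 0), n - 1)]    # clamp to the closest edge
        if mode == "reflect":
            m = i % (2 * n)                      # a mirrored copy repeats every 2n
            return seq[m if m < n else 2 * n - 1 - m]
        if mode == "wrap":
            return seq[i % n]                    # repeat the sequence periodically
        raise ValueError(mode)

    left = "".join(pick(i) for i in range(-pad, 0))
    right = "".join(pick(i) for i in range(n, n + pad))
    return f"{left}|{seq}|{right}"


for mode in ("constant", "nearest", "reflect", "wrap"):
    print(mode, pad_1d("abcd", 8, mode))
# constant kkkkkkkk|abcd|kkkkkkkk
# nearest aaaaaaaa|abcd|dddddddd
# reflect abcddcba|abcd|dcbaabcd
# wrap abcdabcd|abcd|abcdabcd
```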

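After modifying the ImageDataGenerator:

If only the training ImageDataGenerator is modified, the validation accuracy fluctuates strongly, so I applied the same changes to both the training and the validation ImageDataGenerator. Note, however, that the transfer-learning code below augments only the training data, with the comment that validation data should not be augmented: validation images are normally only rescaled, so that they measure performance on unmodified images.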


Transfer learning

References:

Rethinking the Inception Architecture for Computer Vision

Image datasets

InceptionV3

Download link

import os
import zipfile
from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.preprocessing.image import ImageDataGenerator

local_weights_file = 'inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

### Pre-trained model
pre_trained_model = InceptionV3(input_shape = (150, 150, 3), 
                                include_top = False, 
                                weights = None)

### Load the weights
pre_trained_model.load_weights(local_weights_file)

### Freeze the layers
for layer in pre_trained_model.layers:
  layer.trainable = False

# pre_trained_model.summary()  # commented out because the summary is very long

### Inspect the output shape of the layer we build on
last_layer = pre_trained_model.get_layer('mixed7')  # the mixed7 block of InceptionV3
print('last layer output shape: ', last_layer.output_shape)
last_output = last_layer.output

Output:

last layer output shape:  (None, 7, 7, 768)

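Notes:

  • include_top: boolean, whether to include the fully connected layer at the top of the network. Defaults to True.

    weights: one of None (random initialization), 'imagenet' (pre-training on ImageNet), or the path to a weights file to load. Defaults to 'imagenet'.

  • keras.models.load_model(): loads the whole model (architecture and weights).
    model.load_weights(): loads only the weights into an existing model.

  • Setting layer.trainable to False moves all of the layer's weights from trainable to non-trainable. This is called "freezing" the layer: the state of a frozen layer is not updated during training (either when training with fit() or with any custom loop that relies on trainable_weights to apply gradient updates).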

Model

from tensorflow.keras.optimizers import RMSprop

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)                
# Add a final sigmoid layer for classification
x = layers.Dense  (1, activation='sigmoid')(x)           

model = Model( pre_trained_model.input, x) 

model.compile(optimizer = RMSprop(learning_rate=0.0001), 
              loss = 'binary_crossentropy', 
              metrics = ['accuracy'])
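Because every InceptionV3 layer was frozen, only the new head is trained. Its size follows from the `(None, 7, 7, 768)` shape printed for mixed7; a sketch of the arithmetic, using the same `inputs*units + units` rule for Dense layers. As far as I can tell this should match the trainable-parameter count that `model.summary()` reports for this model.

```python
h, w, c = 7, 7, 768               # mixed7 output shape printed above
flat = h * w * c                  # Flatten -> one long vector
dense_1024 = flat * 1024 + 1024   # Dense(1024): weights + biases
dense_out = 1024 * 1 + 1          # Dense(1, sigmoid)
print(flat, dense_1024, dense_out, dense_1024 + dense_out)
# 37632 38536192 1025 38537217
```

So the trainable part is about 38.5 million weights, almost all of them in the Dense(1024) layer fed by the 37,632-value flattened feature map.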

ImageDataGenerator

local_zip = 'cats_and_dogs_filtered.zip'

zip_ref = zipfile.ZipFile(local_zip, 'r')

zip_ref.extractall()
zip_ref.close()

# Define our example directories and files
base_dir = 'cats_and_dogs_filtered'

train_dir = os.path.join( base_dir, 'train')
validation_dir = os.path.join( base_dir, 'validation')


train_cats_dir = os.path.join(train_dir, 'cats') # Directory with our training cat pictures
train_dogs_dir = os.path.join(train_dir, 'dogs') # Directory with our training dog pictures
validation_cats_dir = os.path.join(validation_dir, 'cats') # Directory with our validation cat pictures
validation_dogs_dir = os.path.join(validation_dir, 'dogs')# Directory with our validation dog pictures

train_cat_fnames = os.listdir(train_cats_dir)
train_dog_fnames = os.listdir(train_dogs_dir)

# Add our data-augmentation parameters to ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255.,
                                   rotation_range = 40,
                                   width_shift_range = 0.2,
                                   height_shift_range = 0.2,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator( rescale = 1.0/255. )

# Flow training images in batches of 20 using train_datagen generator
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size = 20,
                                                    class_mode = 'binary', 
                                                    target_size = (150, 150))     

# Flow validation images in batches of 20 using test_datagen generator
validation_generator =  test_datagen.flow_from_directory( validation_dir,
                                                          batch_size  = 20,
                                                          class_mode  = 'binary', 
                                                          target_size = (150, 150))

Fit the model

history = model.fit(
            train_generator,
            validation_data = validation_generator,
            steps_per_epoch = 100,
            epochs = 20,
            validation_steps = 50,
            verbose = 2)
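
  • The fit() method of a Keras Model returns a History object. Its History.history attribute is a dictionary recording the training loss and metric values at successive epochs, as well as the validation loss and metric values (if applicable).
  • plt.legend(loc='location'): loc=0 means 'best'.
  • plt.figure(): creates a new figure window.

The validation accuracy fluctuates, which indicates that the model is overfitting. Next, we use Dropout to reduce overfitting.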

Dropout

from tensorflow.keras.optimizers import RMSprop

# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)                  
# Add a final sigmoid layer for classification
x = layers.Dense  (1, activation='sigmoid')(x)           

model = Model( pre_trained_model.input, x) 

model.compile(optimizer = RMSprop(learning_rate=0.0001), 
              loss = 'binary_crossentropy', 
              metrics = ['acc'])
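`Dropout(0.2)` is only active during training: each input is zeroed with probability 0.2 and the surviving values are scaled by 1/(1 - 0.2), so the expected sum stays the same; at inference the layer passes its input through unchanged. A minimal pure-Python sketch of this inverted-dropout rule (the function name and the seeded `random.Random` are assumptions for illustration):

```python
import random


def dropout(values, rate, training, rng=random):
    """Inverted dropout: in training, zero each value with probability `rate`
    and scale the survivors by 1 / (1 - rate); at inference, return values unchanged."""
    if not training or rate == 0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # seeded so the run is repeatable
print(dropout([1.0] * 10, rate=0.2, training=True, rng=rng))   # mix of 0.0 and 1.25
print(dropout([1.0] * 10, rate=0.2, training=False))           # unchanged
```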

Multi-class classification

Rock Paper Scissors

ImageDataGenerator

train_datagen = ImageDataGenerator(
      rescale = 1./255,
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest'
)
test_datagen = ImageDataGenerator(
      rescale = 1./255,
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest'
)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(300, 300),
    batch_size=126,
    class_mode='categorical'    # no longer binary classification, so class_mode becomes 'categorical'
)
validation_generator = test_datagen.flow_from_directory(
    validation_dir,             # validate on the held-out folder, not on train_dir
    target_size=(300, 300),
    batch_size=126,
    class_mode='categorical'    # no longer binary classification, so class_mode becomes 'categorical'
)

Model

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPool2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')   # one probability per class; they sum to 1
])

Compile the model

model.compile(loss='categorical_crossentropy',  # the loss must also change to categorical_crossentropy
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['acc'])
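With three classes the model ends in a softmax, which turns the three raw scores into probabilities that sum to 1, and `categorical_crossentropy` compares them with the one-hot labels produced by `class_mode='categorical'`. With a one-hot label only the true class term survives, so the loss is -log(probability of the true class). A pure-Python sketch of both formulas; the example logits are made up for illustration and the `eps` clipping is an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """-sum(y_i * log(p_i)); with one-hot labels only the true class term survives."""
    return -sum(y * math.log(min(max(p, eps), 1 - eps)) for y, p in zip(one_hot, probs))


probs = softmax([2.0, 1.0, 0.1])                    # hypothetical scores for 3 classes
loss = categorical_crossentropy([1, 0, 0], probs)   # true class: index 0
print(probs, loss)
```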

Fit the model

history = model.fit(
    train_generator, 
    epochs=25, 
    steps_per_epoch=20, 
    validation_data = validation_generator,
    verbose = 1, 
    validation_steps=3)

Q&A

Week2 Exercise1

Split the data

import os
import random
from shutil import copyfile

def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
# YOUR CODE STARTS HERE
    dataset = []

    for unitData in os.listdir(SOURCE):
        data = os.path.join(SOURCE, unitData)  # full path of each file
        if os.path.getsize(data) > 0:
            dataset.append(unitData)
        else:
            print('Skipped ' + unitData)
            print('Invalid file size! i.e Zero length.')

    train_data_length = int(len(dataset) * SPLIT_SIZE)  # number of images for the training set
    shuffled_set = random.sample(dataset, len(dataset))
    train_set = shuffled_set[:train_data_length]
    test_set = shuffled_set[train_data_length:]         # the remaining images form the test set

    for unitData in train_set:
        copyfile(os.path.join(SOURCE, unitData), os.path.join(TRAINING, unitData))

    for unitData in test_set:
        # copy each test image from its own source path
        copyfile(os.path.join(SOURCE, unitData), os.path.join(TESTING, unitData))
# YOUR CODE ENDS HERE

CAT_SOURCE_DIR = "/tmp/PetImages/Cat/"
TRAINING_CATS_DIR = "/tmp/cats-v-dogs/training/cats/"
TESTING_CATS_DIR = "/tmp/cats-v-dogs/testing/cats/"
DOG_SOURCE_DIR = "/tmp/PetImages/Dog/"
TRAINING_DOGS_DIR = "/tmp/cats-v-dogs/training/dogs/"
TESTING_DOGS_DIR = "/tmp/cats-v-dogs/testing/dogs/"

split_size = .9
split_data(CAT_SOURCE_DIR, TRAINING_CATS_DIR, TESTING_CATS_DIR, split_size)
split_data(DOG_SOURCE_DIR, TRAINING_DOGS_DIR, TESTING_DOGS_DIR, split_size)
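  • List all files and folders in a directory

    for file in os.listdir(path):
        print(file)

  • Return the size of a file in bytes; raises an error (OSError) if the file does not exist

    os.path.getsize(path)

  • To draw a sample of N distinct elements for further processing, use random.sample()

    >>> values = [1, 2, 3, 4, 5, 6]
    >>> random.sample(values, 2)
    [6, 2]
    >>> random.sample(values, 2)
    [4, 3]
    >>> random.sample(values, 3)
    [4, 3, 1]
    >>> random.sample(values, 3)
    [5, 4, 1]

    If you only want to shuffle the order of the elements in a sequence, use random.shuffle()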

    >>> random.shuffle(values)
    >>> values
    [2, 4, 6, 5, 3, 1]
    >>> random.shuffle(values)
    >>> values
    [3, 5, 2, 1, 6, 4]
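The key practical difference is that `random.sample` returns a new list and leaves its input alone, while `random.shuffle` reorders the list in place and returns `None`. A short sketch with a seeded generator (the seed is arbitrary, so no particular order is claimed):

```python
import random

values = [1, 2, 3, 4, 5, 6]
rng = random.Random(42)  # seeded so the run is repeatable

sampled = rng.sample(values, len(values))  # new shuffled list; `values` is untouched
print(values, sampled)

result = rng.shuffle(values)               # shuffles `values` in place
print(result, values)                      # result is None
```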
    

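Summary

  • Build the full path of each image file from SOURCE and the file name
  • Add the file to dataset only if its size is greater than zero: os.path.getsize(path)
  • Compute how many images go to the training set: int(len(dataset)*SPLIT_SIZE); the rest go to the test set
  • Shuffle the dataset with random.sample(); random.shuffle could be used instead
  • Split the shuffled list into the training and test sets by slicing it at train_data_length
  • Copy the images from the source folder into the training and testing folders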
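The same steps, minus the file copying, can be sketched as a pure function over a list of file names. `split_names` is a hypothetical helper; it slices at the training count on both sides, because a slice such as `shuffled[-0:]` returns the whole list when the test share rounds to zero.

```python
import random


def split_names(names, split_size, seed=None):
    """Shuffle file names and split them into (train, test) lists."""
    shuffled = random.Random(seed).sample(names, len(names))
    n_train = int(len(shuffled) * split_size)
    # Slicing at n_train stays correct even when the test part is empty.
    return shuffled[:n_train], shuffled[n_train:]


names = [f"{i}.jpg" for i in range(10)]
train, test = split_names(names, 0.9, seed=1)
print(len(train), len(test))  # 9 1
```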

Plot the loss and accuracy curves

%matplotlib inline

import matplotlib.image  as mpimg
import matplotlib.pyplot as plt

acc=history.history['acc']
val_acc=history.history['val_acc']
loss=history.history['loss']
val_loss=history.history['val_loss']

epochs=range(len(acc)) # Get number of epochs

#------------------------------------------------
# Plot training and validation accuracy per epoch
#------------------------------------------------
plt.plot(epochs, acc, 'r', label="Training Accuracy")        # x, y, color, label
plt.plot(epochs, val_acc, 'b', label="Validation Accuracy")  # x, y, color, label
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

#------------------------------------------------
# Plot training and validation loss per epoch
#------------------------------------------------
plt.plot(epochs, loss, 'r', label="Training Loss")
plt.plot(epochs, val_loss, 'b', label="Validation Loss")
plt.title('Training and validation loss')
plt.legend()

# Desired output. Charts with training and validation metrics. No crash :)
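
  • Whether the key is 'acc' or 'accuracy' depends on the metrics argument of model.compile: with model.compile(optimizer=RMSprop(learning_rate=0.001), loss='binary_crossentropy', metrics=['acc']) the key is 'acc', and with metrics=['accuracy'] it is 'accuracy'.

  • Colors: the supported color abbreviations are single-letter codes

    Character  Color
    'b'        blue
    'g'        green
    'r'        red
    'c'        cyan
    'm'        magenta
    'y'        yellow
    'k'        black
    'w'        white

    Alternatively, this also works: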

    epochs=range(len(acc))
    plt.plot(epochs,acc)
    plt.plot(epochs,val_acc)
    plt.title('Training and validation accuracy')
    plt.figure()
    
    plt.plot(epochs,loss )
    plt.plot(epochs,val_loss)
    plt.title('Training and validation loss')
    plt.figure()
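If the same plotting cell has to work with either metric name, a small lookup helper can hide the 'acc'/'accuracy' difference. `get_metric` is a hypothetical helper, sketched under the assumption that `history.history` is the plain dict returned by fit().

```python
def get_metric(history_dict, name):
    """Return the per-epoch values for `name`, accepting either 'acc' or 'accuracy'
    (and their 'val_' variants) as the key stored in History.history."""
    aliases = {"acc": "accuracy", "accuracy": "acc",
               "val_acc": "val_accuracy", "val_accuracy": "val_acc"}
    if name in history_dict:
        return history_dict[name]
    if aliases.get(name) in history_dict:
        return history_dict[aliases[name]]
    raise KeyError(f"{name!r} not found; available keys: {sorted(history_dict)}")
```

For example, `acc = get_metric(history.history, 'accuracy')` works whether the model was compiled with `metrics=['acc']` or `metrics=['accuracy']`.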
    

Week2 Exercise4

Read data in CSV format

import csv
import numpy as np
from os import getcwd

def get_data(filename):
    '''
    You will need to write code that will read the file passed into this function.
    The first line contains the column headers so you should ignore it.
    Each successive line contains 785 comma separated values between 0 and 255.
    The first value is the label. The rest are the pixel values for that picture.
    The function will return 2 np.array types. One with all the labels. One with all the images.
    Tips: If you read a full line (as 'row') then row[0] has the label, and row[1:785] has the 784 pixel values.
    Take a look at np.array_split to turn the 784 pixels into 28x28.
    You are reading in strings, but need the values to be floats. Check out np.array().astype for a conversion.
    '''
    with open(filename) as training_file:
    # Your code starts here
        csv_reader = csv.reader(training_file, delimiter=',')
        first_line = True
        temp_labels= []
        temp_images = []

        for row in csv_reader:
        # Makes first iteration of loop for first row do nothing.
        # That's how you skip the first row with headers.
            if first_line: 
                first_line = False
            else:
                temp_labels.append(row[0])
                image_data = row[1:785] 
                image_array = np.array_split(image_data, 28) # Make 28 x 28
                temp_images.append(image_array)
        
        images = np.array(temp_images).astype('float')
        labels = np.array(temp_labels).astype('float')
      # Your code ends here
    return images, labels

path_sign_mnist_train = f"{getcwd()}/../tmp2/sign_mnist_train.csv"
path_sign_mnist_test = f"{getcwd()}/../tmp2/sign_mnist_test.csv"
training_images, training_labels = get_data(path_sign_mnist_train)
testing_images, testing_labels = get_data(path_sign_mnist_test)

print(training_images.shape)
print(training_labels.shape)
print(testing_images.shape)
print(testing_labels.shape)
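
  • with open() as f: a context manager that closes the file automatically when the block ends.

  • csv_reader = csv.reader(...): reads a CSV file row by row.

    Dialect.delimiter: a one-character string used to separate fields. Defaults to ','.

  • first_line is used to skip the first row; next(csv_reader, None) also skips the header.

    The first row is usually the header, so it has to be skipped.

  • numpy.array_split: splits an array into multiple sub-arrays (unlike np.split, the pieces do not have to be equal in size); here it cuts 784 pixels into 28 rows of 28.

  • numpy.array(temp_array).astype(): converts the data type.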

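The CSV file is laid out like this (one image per line after the header):

Label     Pixel 1   Pixel 2   ...   Pixel 784
row[0]    row[1]    row[2]    ...   row[784]    <- image 1
row[0]    row[1]    row[2]    ...   row[784]    <- image 2
...
row[0]    row[1]    row[2]    ...   row[784]    <- image n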

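Summary

  • Open the CSV file with csv.reader()
  • Read the file row by row, remembering to skip the first (header) row
  • Append the first value of each row to temp_labels; the rest is image_data, which is split into 28 rows (image_array) and appended to temp_images
  • Convert the data type with numpy.array(temp_array).astype()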
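The parsing steps above can be tried without the real dataset. The sketch below uses a toy CSV whose images are 2x2 instead of 28x28, reads it with `csv.reader` from an in-memory string, skips the header with `next()`, and cuts each pixel list into rows with plain slicing instead of `np.array_split`; `parse_rows` and the sample data are hypothetical.

```python
import csv
import io


def parse_rows(text, side):
    """Parse 'label,p1,...,pN' rows (header skipped) into labels and side x side images."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)                      # skip the header line
    labels, images = [], []
    for row in reader:
        labels.append(float(row[0]))        # first value: the label
        pixels = [float(v) for v in row[1:1 + side * side]]
        # Cut the flat pixel list into `side` rows of `side` values each.
        images.append([pixels[r * side:(r + 1) * side] for r in range(side)])
    return labels, images


sample = "label,p1,p2,p3,p4\n3,0,255,128,64\n7,10,20,30,40\n"
labels, images = parse_rows(sample, side=2)
print(labels)     # [3.0, 7.0]
print(images[0])  # [[0.0, 255.0], [128.0, 64.0]]
```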

Add a dimension

# In this section you will have to add another dimension to the data
# So, for example, if your array is (10000, 28, 28)
# You will need to make it (10000, 28, 28, 1)
# Hint: np.expand_dims

training_images = np.expand_dims(training_images, axis=-1)
testing_images =  np.expand_dims(testing_images, axis=-1)
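
  • np.expand_dims: inserts a new axis of length 1 at position axis. axis=0 adds a new leading axis, axis=1 inserts it at position 1, and axis=-1 appends it as the last axis.

    Why add a dimension? Conv2D layers expect 4-D input of shape (batch, height, width, channels). These grayscale images have a single channel, so (N, 28, 28) has to become (N, 28, 28, 1).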
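A quick shape check shows where `np.expand_dims` puts the new axis; the array of zeros stands in for 10 grayscale 28x28 images.

```python
import numpy as np

images = np.zeros((10, 28, 28), dtype="float32")  # 10 grayscale 28x28 images

last = np.expand_dims(images, axis=-1)   # channel axis appended
first = np.expand_dims(images, axis=0)   # new leading axis
second = np.expand_dims(images, axis=1)  # inserted at position 1
print(last.shape)    # (10, 28, 28, 1)
print(first.shape)   # (1, 10, 28, 28)
print(second.shape)  # (10, 1, 28, 28)
```

`images[..., np.newaxis]` gives the same `(10, 28, 28, 1)` result as `axis=-1`.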
