"TensorFlow for Machine Intelligence": complete fix for the StanfordDog example

Article link: https://blog.csdn.net/fnhc462354756/article/details/79872994

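"TensorFlow for Machine Intelligence" explains things in an accessible way and makes many TensorFlow concepts very clear, so it is well suited to TensorFlow beginners. The book's complete code can be downloaded from https://github.com/backstopmedia/tensorflowbook.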

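While working through the Stanford Dogs project, I found that many blog posts did not properly resolve the accuracy problem at the end of the project. After studying it carefully I found the cause. For details, see my GitHub: https://github.com/Alex-AI-Du/Tensorflow-Tutorial/tree/master/standford_dog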

If you have any questions, feel free to contact me.
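The grouping and 80/20 splitting done by the script can be tried without TensorFlow or the dataset. The following is a minimal stdlib-only sketch. It assumes Windows-style paths laid out like the `G:\AI\Images\n02*\*.jpg` pattern used in the script. The helpers `breed_of` and `split_by_breed` and the sample filenames are hypothetical. Parsing paths with `ntpath` instead of `split("\\")`, and sorting before `groupby`, are substitutions made for illustration; they are not the book's or the author's code.

```python
import ntpath
from collections import defaultdict
from itertools import groupby


def breed_of(path):
    """Return the breed folder name, e.g. 'n02085620-Chihuahua', of an image path (hypothetical helper)."""
    # ntpath treats both '\\' and '/' as separators on every OS,
    # so this works the same on Windows and Linux.
    return ntpath.basename(ntpath.dirname(path))


def split_by_breed(filenames, test_every=5):
    """Send every `test_every`-th image of each breed to the test set, the rest to training (hypothetical helper)."""
    training, testing = defaultdict(list), defaultdict(list)
    pairs = [(breed_of(name), name) for name in filenames]
    # groupby only merges *consecutive* items with equal keys, so sort by breed first.
    # The sort is stable, so each breed keeps its original file order.
    pairs.sort(key=lambda pair: pair[0])
    for breed, group in groupby(pairs, key=lambda pair: pair[0]):
        for i, (_, name) in enumerate(group):
            if i % test_every == 0:
                testing[breed].append(name)
            else:
                training[breed].append(name)
    return training, testing


if __name__ == "__main__":
    # Hypothetical filenames in the Stanford Dogs folder layout: 7 + 3 images.
    files = ([rf"G:\AI\Images\n02085620-Chihuahua\n02085620_{k}.jpg" for k in range(7)]
             + [rf"G:\AI\Images\n02085782-Japanese_spaniel\n02085782_{k}.jpg" for k in range(3)])
    training, testing = split_by_breed(files)
    for breed in sorted(testing):
        n_train, n_test = len(training[breed]), len(testing[breed])
        # Same check as the script: the test share must exceed 18%.
        assert round(n_test / (n_train + n_test), 2) > 0.18, "Not enough testing images."
        print(breed, n_train, n_test)
    # Prints:
    # n02085620-Chihuahua 5 2
    # n02085782-Japanese_spaniel 2 1
```

For a breed with n images, `i % 5 == 0` selects ⌈n/5⌉ of them. The test share ⌈n/5⌉/n is therefore never below 0.2, so the `> 0.18` assertion always passes. Sorting matters because `groupby` only merges consecutive items with the same key. The script appears to get away without sorting because `glob` lists the files of each matching directory together.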

"""
Note:2018.3.30
"""

import tensorflow as tf
import glob
from itertools import groupby
from collections import defaultdict
from PIL import Image
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # hide TensorFlow's noisy INFO and WARNING log messages
IMAGE_WIDTH = 256
IMAGE_HEIGHT = 256

sess = tf.InteractiveSession()

# Find all files that match the pattern and return their names as a list.
#image_filenames = glob.glob(r"G:\AI\Images\n02110*\*.jpg")
image_filenames = glob.glob(r"G:\AI\Images\n02*\*.jpg")

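# I added the line below. The paths returned look like './imagenet-dogs\\n02085620-Chihuahua\\n02085620_10074.jpg':
# every separator except the first is shown as two backslashes (the repr of a single backslash), which differs from the book's example.
# The line below replaces the backslashes with forward slashes; if you enable it, split on "/" instead of "\\" in the statement further down.
#image_filenames = list(map(lambda image: image.replace('\\', '/'), image_filenames))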

# Initialize the training and testing sets as defaultdict(list); defaultdict supplies a default value (an empty list) for keys that are not yet in the dict.
training_dataset = defaultdict(list)
testing_dataset = defaultdict(list)

# Split the breed name out of each filename. map returns an iterator, so list() turns it into a list whose elements look like ('n02085620-Chihuahua', './imagenet-dogs/n02085620-Chihuahua/n02085620_10131.jpg').
image_filename_with_breed = list(map(lambda filename: (filename.split("\\")[-2], filename), image_filenames))

## Group each image by the breed which is the 0th element in the tuple returned above
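# groupby returns an iterator whose elements look like ('n02085620-Chihuahua', <itertools._grouper object at 0xd5892e8>): the first element is the breed and the second is an iterator over that breed's files. They correspond to dog_breed and breed_images in the for loop. Note that groupby only groups consecutive elements with the same key.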
for dog_breed, breed_images in groupby(image_filename_with_breed,
                                       lambda x: x[0]):

    # enumerate walks through breed_images and yields each index together with its element (i and breed_image).
    # The last values of i and breed_image are 168 and ('n02116738-African_hunting_dog', './imagenet-dogs/
    # n02116738-African_hunting_dog/n02116738_9924.jpg')
    for i, breed_image in enumerate(breed_images):

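        # Because breed_images holds the images of one breed at a time, the code below puts every 5th image
        # (about 20% of each breed) into the test set and the rest (about 80%) into the training set.
        # testing_dataset and training_dataset are dicts; the first entry of testing_dataset
        # is 'n02085620-Chihuahua': ['./imagenet-dogs/n02085620-Chihuahua/
        # n02085620_10074.jpg', './imagenet-dogs/n02085620-Chihuahua/
        # n02085620_11140.jpg', .....]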
        if i % 5 == 0:
            testing_dataset[dog_breed].append(breed_image[1])
        else:
            training_dataset[dog_breed].append(breed_image[1])

    # Check that each breed's test set holds more than 18% of that breed's images
    breed_training_count = len(training_dataset[dog_breed])
    breed_testing_count = len(testing_dataset[dog_breed])

    assert round(breed_testing_count /
                 (breed_training_count + breed_testing_count),
                 2) > 0.18, "Not enough testing images."


def write_records_file(dataset, record_location):
    """
    Fill a TFRecords file with the images found in `dataset` and include their category.
    Parameters
    ----------
    dataset : dict(list)
      Dictionary mapping each label (key) to the list of image filenames with that label (value).
    record_location : str
      Location to store the TFRecord output.
    ""