flow_from_directory is a method of the ImageDataGenerator class. As the name suggests, it reads images from a directory.
Definition
def flow_from_directory(self,
                        directory,
                        target_size=(256, 256),
                        color_mode='rgb',
                        classes=None,
                        class_mode='categorical',
                        batch_size=32,
                        shuffle=True,
                        seed=None,
                        save_to_dir=None,
                        save_prefix='',
                        save_format='png',
                        follow_links=False,
                        subset=None,
                        interpolation='nearest'):
    """Takes the path to a directory & generates batches of augmented data.

    Args:
        directory: string, path to the target directory. It should contain one
          subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images inside
          each of the subdirectories directory tree will be included in the
          generator. See [this script](
          https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d)
          for more details.
        target_size: Tuple of integers `(height, width)`, defaults to `(256,
          256)`. The dimensions to which all images found will be resized.
        color_mode: One of "grayscale", "rgb", "rgba". Default: "rgb". Whether
          the images will be converted to have 1, 3, or 4 channels.
        classes: Optional list of class subdirectories
          (e.g. `['dogs', 'cats']`). Default: None. If not provided, the list
          of classes will be automatically inferred from the subdirectory
          names/structure under `directory`, where each subdirectory will be
          treated as a different class (and the order of the classes, which
          will map to the label indices, will be alphanumeric). The
          dictionary containing the mapping from class names to class
          indices can be obtained via the attribute `class_indices`.
        class_mode: One of "categorical", "binary", "sparse",
          "input", or None. Default: "categorical".
          Determines the type of label arrays that are returned:
          - "categorical" will be 2D one-hot encoded labels,
          - "binary" will be 1D binary labels,
          - "sparse" will be 1D integer labels,
          - "input" will be images identical to input images (mainly used to
            work with autoencoders).
          - If None, no labels are returned (the generator will only yield
            batches of image data, which is useful to use with
            `model.predict()`).
          Please note that in case of class_mode None, the data still needs to
          reside in a subdirectory of `directory` for it to work correctly.
        batch_size: Size of the batches of data (default: 32).
        shuffle: Whether to shuffle the data (default: True) If set to False,
          sorts the data in alphanumeric order.
        seed: Optional random seed for shuffling and transformations.
        save_to_dir: None or str (default: None). This allows you to optionally
          specify a directory to which to save the augmented pictures being
          generated (useful for visualizing what you are doing).
        save_prefix: Str. Prefix to use for filenames of saved pictures (only
          relevant if `save_to_dir` is set).
        save_format: one of "png", "jpeg", "bmp", "pdf", "ppm", "gif",
          "tif", "jpg"
          (only relevant if `save_to_dir` is set). Default: "png".
        follow_links: Whether to follow symlinks inside
          class subdirectories (default: False).
        subset: Subset of data (`"training"` or `"validation"`) if
          `validation_split` is set in `ImageDataGenerator`.
        interpolation: Interpolation method used to resample the image if the
          target size is different from that of the loaded image. Supported
          methods are `"nearest"`, `"bilinear"`, and `"bicubic"`. If PIL version
          1.1.3 or newer is installed, `"lanczos"` is also supported. If PIL
          version 3.4.0 or newer is installed, `"box"` and `"hamming"` are also
          supported. By default, `"nearest"` is used.

    Returns:
        A `DirectoryIterator` yielding tuples of `(x, y)`
          where `x` is a numpy array containing a batch
          of images with shape `(batch_size, *target_size, channels)`
          and `y` is a numpy array of corresponding labels.
    """
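To make the Returns description concrete, here is a minimal pure-Python sketch of the batch shapes one would expect. The helpers batch_shape and batch_sizes are hypothetical and not part of Keras. The channel counts come from the color_mode description in the docstring; the channels-last layout and the smaller final batch are assumptions about the default behavior, not something the docstring states.

```python
import math

# Channel counts per color_mode, as stated in the docstring.
CHANNELS = {"grayscale": 1, "rgb": 3, "rgba": 4}


def batch_shape(batch_size, target_size=(256, 256), color_mode="rgb"):
    """Shape of one full image batch: (batch_size, height, width, channels)."""
    return (batch_size, *target_size, CHANNELS[color_mode])


def batch_sizes(num_images, batch_size=32):
    """Sizes of the batches in one pass over the data (assumed behavior)."""
    num_batches = math.ceil(num_images / batch_size)
    sizes = [batch_size] * num_batches
    remainder = num_images % batch_size
    if remainder:
        sizes[-1] = remainder  # the final batch holds the leftover images
    return sizes


print(batch_shape(32))                        # (32, 256, 256, 3)
print(batch_shape(8, (64, 64), "grayscale"))  # (8, 64, 64, 1)
print(batch_sizes(70, batch_size=32))         # [32, 32, 6]
```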
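Meaning of the flow_from_directory parameters:
directory: path to the target directory. It must contain one subdirectory per class.
target_size: tuple of integers (height, width), default (256, 256). All images are resized to this size.
color_mode: one of "grayscale", "rgb", or "rgba", default "rgb". Determines whether the images are converted to 1, 3, or 4 channels.
classes: optional list of class subdirectories, e.g. ['cat', 'dog'], default None. If not provided, the class list is inferred automatically from the subdirectory names/structure under directory, and each subdirectory is treated as a separate class. The classes are mapped to label indices in alphanumeric order; the mapping from class names to indices is available through the class_indices attribute.
class_mode: one of "categorical", "binary", "sparse", "input", or None, default "categorical". Determines the form of the returned label array: "categorical" returns 2D one-hot encoded labels, "binary" returns 1D binary labels, "sparse" returns 1D integer labels, and "input" returns the input images themselves as the targets (mainly for autoencoders). With None, no labels are returned and the generator yields only batches of image data.
batch_size: size of each batch, default 32.
shuffle: whether to shuffle the data, default True. If False, the data is yielded in alphanumeric order.
seed: optional random seed for shuffling and transformations.
save_to_dir: None or a string, default None. Lets you save the augmented images to a directory for visualization.
save_prefix: string, prefix for the filenames of saved augmented images. Only takes effect when save_to_dir is set.
save_format: format of the saved images, one of "png", "jpeg", "bmp", "pdf", "ppm", "gif", "tif", "jpg", default "png". Only takes effect when save_to_dir is set.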
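The class_mode options are easier to compare side by side. Below is a small pure-Python sketch of the label layout each mode produces for a batch whose class indices are [0, 1, 1]. The helper encode_labels is hypothetical and not part of Keras, and it models only the values, not the array dtypes. The "input" mode is left out because it returns the images themselves rather than labels.

```python
def encode_labels(class_ids, num_classes, class_mode):
    """Illustrate the label layout for each class_mode (values only)."""
    if class_mode == "categorical":
        # 2D one-hot: one row per sample, one column per class.
        return [[1 if k == c else 0 for k in range(num_classes)]
                for c in class_ids]
    if class_mode == "binary":
        # 1D labels of 0 or 1; intended for two-class problems.
        return list(class_ids)
    if class_mode == "sparse":
        # 1D integer class indices, usable with any number of classes.
        return list(class_ids)
    if class_mode is None:
        # No labels at all: the generator yields image batches only.
        return None
    raise ValueError(f"class_mode not covered by this sketch: {class_mode!r}")


batch = [0, 1, 1]  # class indices of three images, e.g. cat, dog, dog
print(encode_labels(batch, 2, "categorical"))  # [[1, 0], [0, 1], [0, 1]]
print(encode_labels(batch, 2, "binary"))       # [0, 1, 1]
print(encode_labels(batch, 2, "sparse"))       # [0, 1, 1]
print(encode_labels(batch, 2, None))           # None
```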
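Of these parameters, directory is the one to get right: it is the parent folder of the class folders, not a class folder itself. In this dataset the class folders are cat and dog and their parent folder is train, so directory is r"D://Learning//tensorflow_2.0//animal//data//train"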
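As a hedged illustration of that layout, the sketch below builds a throwaway train/cat and train/dog tree in a temporary directory and infers the class-to-index mapping by sorting the subdirectory names, which is how the docstring describes the default behavior. infer_class_indices is a hypothetical helper, not a Keras function, and the temporary path only stands in for the real dataset path.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each class subfolder of `directory` to an index, in sorted order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # `train` is the parent folder that would be passed as `directory`.
    train_dir = os.path.join(root, "train")
    for class_name in ("dog", "cat"):  # creation order does not matter
        os.makedirs(os.path.join(train_dir, class_name))
    class_indices = infer_class_indices(train_dir)

print(class_indices)  # {'cat': 0, 'dog': 1}
```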
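The classes parameter also deserves attention. Usually we leave it at the default None, in which case the subdirectories of directory serve as the labels. In this classification task the labels are cat and dog.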
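Putting the pieces together, a call that leaves classes at its default None might look like the sketch below. It assumes TensorFlow 2.x with Pillow installed and a directory laid out with one subfolder per class. make_train_iterator is a hypothetical wrapper, and rescale=1.0 / 255 and seed=42 are illustrative choices rather than values from the text above. TensorFlow is imported inside the function so that the definition itself loads without it.

```python
def make_train_iterator(directory, batch_size=32, target_size=(256, 256)):
    """Build a DirectoryIterator over `directory` (one subfolder per class)."""
    # Imported lazily: requires TensorFlow 2.x and Pillow at call time.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # rescale maps pixel values from [0, 255] to [0, 1]; an illustrative choice.
    datagen = ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(
        directory,
        target_size=target_size,
        color_mode="rgb",
        classes=None,              # infer classes from the subfolder names
        class_mode="categorical",  # one-hot labels
        batch_size=batch_size,
        shuffle=True,
        seed=42,                   # arbitrary seed for reproducible shuffling
    )
```

Calling train_iter = make_train_iterator(train_dir) and then x, y = next(train_iter) should yield one batch of images and labels, and train_iter.class_indices holds the mapping from folder names to label indices.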