imgaug (Part 1) -- 2. Multicore Augmentation

2. Multicore Augmentation

2.1 Background

Augmentation can be a slow process, especially when working with large images and when combining many different augmentation techniques.

One way to improve performance is to augment simultaneously on multiple CPU cores.


imgaug offers a native system for this, based on the following steps:

(1) Split the dataset into batches.
Each batch contains one or more images.

(2) Start one or more child processes.
Each of them runs on its own CPU core.

(3) Send batches to the child processes.
Try to distribute them equally over the child processes so that each of them has a similar amount of work to do.

(4) Let the child processes augment the data.

(5) Receive the augmented batches from the child processes.
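Step (1), splitting the dataset into batches, can be sketched in plain Python; `split_into_batches` is a hypothetical helper written for this sketch, not an imgaug function:

```python
def split_into_batches(dataset, batch_size):
    """Split a dataset into consecutive batches of at most batch_size items."""
    return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

dataset = list(range(50))  # stand-ins for 50 images
dataset_batches = split_into_batches(dataset, 16)
print(len(dataset_batches))               # 4
print([len(b) for b in dataset_batches])  # [16, 16, 16, 2]
```

Each of these batches would then be wrapped in an imgaug batch object and distributed over the child processes.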


Important points:

(1) The data has to be split into batches.

(2) Combining all data into one batch and using multicore augmentation is pointless, as each individual batch is augmented by exactly one core.

(3) Using multicore augmentation for small amounts of data can also be pointless as starting the child processes might take up more time than simply augmenting the dataset on a single CPU core.

(4) imgaug ships its own multicore features, and it is recommended to use them for multicore augmentation.
It is not recommended to wrap imgaug in a custom multicore routine based on e.g. Python's multiprocessing library or the multiprocessing support of some deep learning libraries.



2.2 Example: augment_batches(…, background=True)

import imageio
import imgaug as ia
from imgaug import augmenters as iaa
from imgaug.augmentables.batches import UnnormalizedBatch

import numpy as np
import time

%matplotlib inline
image = imageio.imread("img/cat3.jpeg")
ia.imshow(image)

Output:


BATCH_SIZE = 16
images = [image for _ in range(BATCH_SIZE)]


NB_BATCHES = 100
# combine images to UnnormalizedBatch instances
batches = [UnnormalizedBatch(images=images) for _ in range(NB_BATCHES)]


seq = iaa.Sequential([
    iaa.PiecewiseAffine(scale=0.05, nb_cols=6, nb_rows=6),  # very slow
    iaa.Fliplr(0.5),  # very fast
    iaa.CropAndPad(px=(-10, 10))  # very fast
])

2.2.1 A single CPU core

time_start = time.time()
batches_aug = list(seq.augment_batches(batches=batches, background=False)) # list() converts generator to list
time_end = time.time()

print("Augmentation done in %.2fs" % (time_end-time_start))

Output:

Augmentation done in 110.15s

print(image.shape)
print(len(batches_aug))
print(len(batches_aug[0].images_aug))

ia.imshow(batches_aug[0].images_aug[0])
ia.imshow(batches_aug[0].images_aug[1])
ia.imshow(batches_aug[1].images_aug[1])

Output:

(225, 225, 3)
100
16

Roughly 110 seconds for 100 batches, each containing 16 images of size 225x225.
That's about 0.07s per image, which is not very fast: a GPU would most likely train faster than the images can be augmented here.
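As a quick sanity check, the per-image cost follows directly from the numbers above:

```python
total_seconds = 110.15      # measured single-core time from above
n_images = 100 * 16         # NB_BATCHES * BATCH_SIZE
per_image = total_seconds / n_images
print(round(per_image, 3))  # 0.069
```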


2.2.2 Multiple CPU cores

time_start = time.time()
batches_aug = list(seq.augment_batches(batches=batches, background=True))
time_end = time.time()

print("Augmentation done in %.2fs" % (time_end-time_start))

Output:

Augmentation done in 65.23s


2.3 Batches with Non-Image Data (Keypoints)

image = ia.quokka(size=0.2)
keypoints = ia.quokka_keypoints(size=0.2)

ia.imshow(image)
keypoints

Output:

KeypointsOnImage([Keypoint(x=32.59999847, y=15.64852238), Keypoint(x=84.40000153, y=11.23483658), Keypoint(x=49.59999847, y=41.12752533), Keypoint(x=73.19999695, y=38.92068481), Keypoint(x=62.59999847, y=54.16796112), Keypoint(x=48.79999924, y=107.33281708), Keypoint(x=58.00000000, y=107.53343964)], shape=(129, 192, 3))

BATCH_SIZE = 16
NB_BATCHES = 100

images = [image for _ in range(BATCH_SIZE)]
keypoints = [keypoints.deepcopy() for _ in range(BATCH_SIZE)]

batches = [UnnormalizedBatch(images=images, keypoints=keypoints) for _ in range(NB_BATCHES)]

seq = iaa.Sequential([
    iaa.PiecewiseAffine(scale=0.05, nb_cols=6, nb_rows=6),  # very slow
    iaa.Fliplr(0.5),  # very fast
    iaa.CropAndPad(px=(-10, 10))  # very fast
])
time_start = time.time()
batches_aug = list(seq.augment_batches(batches, background=True))  # background=True for multicore aug
time_end = time.time()

print("Augmentation done in %.2fs" % (time_end - time_start))

Output:

Augmentation done in 110.15s

ia.imshow(
    batches_aug[0].keypoints_aug[0].draw_on_image(
        batches_aug[0].images_aug[0]
    )
)

Output:



2.4 Using Pool

If you want to, e.g.:

  • control the number of CPU cores used
  • control the random number seed

then augmenter.pool() is a simple alternative (it is also the backend that augment_batches() uses).


The example below configures the pool to:
(1) use all CPU cores except one (processes=-1)
(2) restart child processes after 20 tasks (maxtasksperchild=20)
(3) start with a random number seed of 1

The argument maxtasksperchild can be useful if you have to deal with memory leaks that lead to ever-increasing memory consumption over time (using it costs some performance).

with seq.pool(processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.map_batches(batches)
    
ia.imshow(batches_aug[0].images_aug[0])

Output:

We called map_batches() here exactly once to augment the input batches.

In practice, map_batches() can be called many times on the same pool, each time with different input batches.
This is recommended, because creating a new pool requires respawning the child processes, which costs some time.


augmenter.pool() is a shortcut that creates an instance of imgaug.multicore.Pool, which is a wrapper around Python's multiprocessing.Pool.

The wrapper deals mainly with the correct management of random states between child processes.

The example below shows the usage of imgaug.multicore.Pool, using the same seed as in the augmenter.pool() example above and hence generating the same output.

from imgaug import multicore

with multicore.Pool(seq, processes=-1, maxtasksperchild=20, seed=1) as pool:
    batches_aug = pool.map_batches(batches)

ia.imshow(batches_aug[0].images_aug[0])

Output:



2.5 Using Pool with Generators

The two previous examples showed how to use lists with imgaug’s Pool.

For large datasets, using generators can be more appropriate to avoid having to store the whole dataset in memory.

This is trivially done by replacing map_batches(<list>) with imap_batches(<generator>). The output of that function is also a generator.

def create_generator(lst):
    for list_entry in lst:
        yield list_entry
        
my_generator = create_generator(batches)

with seq.pool(processes=-1, seed=1) as pool:
    batches_aug = pool.imap_batches(my_generator)
    
    for i, batch_aug in enumerate(batches_aug):
        if i == 0:
            ia.imshow(batch_aug.images_aug[0])

Output:

Note that if you don’t need your batches to be returned in the same order as you provided them, you can use imap_batches_unordered() instead of imap_batches().

The unordered method tends to be faster.



2.6 Rate-Limiting Pool to Decrease Maximum RAM Requirements

By default, the pool will greedily load (and augment) as many batches as possible from a generator.

There is no rate limiting that restricts how many augmented batches are allowed to "wait" in the pipeline.

That means that in the worst case (the model trains very slowly while the augmentation is very fast) the whole augmented dataset could end up waiting to be retrieved, which could lead to high RAM usage, depending on how large the dataset is.


To fix this problem, the argument output_buffer_size can be used.

The value controls how many batches may exist within the whole augmentation pipeline at any time, i.e. imap_batches(gen) will load new batches from gen until output_buffer_size batches have been reached, and from then on only load another batch from gen whenever it has successfully yielded an augmented batch.

The code below shows an example.

It is similar to the one above, but uses an augmentation pipeline that produces batches faster than they are consumed.

Messages are printed that show exactly when batches are loaded and when they are requested from the augmentation pipeline.

To limit the RAM requirements of this fast pipeline, output_buffer_size=5 is used, restricting the allowed number of waiting batches to five.

Note that batches in imgaug contain the images before and after augmentation, so the effective RAM requirement here is 5 * 2 * I, where I is the size of a single batch of images.

In practice, that value should be viewed as a lower bound for the actual RAM demand, as e.g. copying data to the background processes can temporarily double the requirements.

"""
Use a single, very fast augmenter here to show that 
batches are only loaded once there is space again in the buffer.
"""
pipeline = iaa.Fliplr(0.5)

def create_generator(lst):
    for list_entry in lst:
        print("Loading next unaugmented batch...")
        yield list_entry

# only use 25 batches here, which is enough to show the effect
my_generator = create_generator(batches[0:25])

with pipeline.pool(processes=-1, seed=1) as pool:
    batches_aug = pool.imap_batches(my_generator, output_buffer_size=5)
    print("Requesting next augmented batch...")
    
    for i, batch_aug in enumerate(batches_aug):
        # sleep here for a while to simulate a slowly training model
        time.sleep(0.1)

        if i < len(batches)-1:
            print("Requesting next augmented batch...")

Output:

Requesting next augmented batch...
Loading next unaugmented batch...
Loading next unaugmented batch...
Loading next unaugmented batch...
Loading next unaugmented batch...
Loading next unaugmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Loading next unaugmented batch...
Requesting next augmented batch...
Requesting next augmented batch...
Requesting next augmented batch...
Requesting next augmented batch...
Requesting next augmented batch...
Requesting next augmented batch...

The method imap_batches_unordered() also supports output_buffer_size. However, map_batches() does not support the argument and always augments the whole input list.



References

imgaug repository: https://github.com/aleju/imgaug
imgaug documentation, "Multicore Augmentation" notebook
