Tensorpack DataFlow
Tensorpack DataFlow is an efficient and flexible data loading pipeline for deep learning, written in pure Python.
Its main features are:
Highly optimized for speed. Parallelization in Python is hard. DataFlow implements highly optimized parallel building blocks that give you an easy interface to parallelize your workload.
Written in pure Python. This allows it to be used together with any other Python-based library.
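Because a DataFlow is just a pure-Python iterable, pipelines compose by wrapping one another. The sketch below illustrates this design with toy classes; it is an illustration of the idea only, not the dataflow library's actual API (the class names `FromList`, `Map`, and `Batch` here are hypothetical):

```python
# Conceptual sketch of the DataFlow idea (illustration only, not the real
# dataflow API): a pipeline is an iterable, and stages wrap other iterables.

class FromList:
    """Produce datapoints from an in-memory sequence."""
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        yield from self.data

class Map:
    """Apply a function to every datapoint from an upstream pipeline."""
    def __init__(self, upstream, func):
        self.upstream = upstream
        self.func = func
    def __iter__(self):
        for dp in self.upstream:
            yield self.func(dp)

class Batch:
    """Group consecutive datapoints into fixed-size batches."""
    def __init__(self, upstream, size):
        self.upstream = upstream
        self.size = size
    def __iter__(self):
        batch = []
        for dp in self.upstream:
            batch.append(dp)
            if len(batch) == self.size:
                yield batch
                batch = []

# Compose stages, then iterate like any Python iterable.
df = Batch(Map(FromList(range(6)), lambda x: x * x), size=2)
print(list(df))  # [[0, 1], [4, 9], [16, 25]]
```

Since the whole pipeline is plain Python iteration, any Python library can consume it; the real library adds optimized parallel versions of such stages.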
DataFlow was originally part of the tensorpack library and has been through 3 years of active development. Given its independence from the rest of the tensorpack library, and high demand from users, it is now a separate library whose source code is synced with tensorpack.
Why would you want to use DataFlow instead of a platform-specific data loading solution? We recommend reading Why DataFlow?.
Install:
pip install --upgrade git+https://gi