An error like the following usually means the data itself is at fault.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 429, in _worker_fn
batch = batchify_fn([_worker_dataset[i] for i in samples])
File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 429, in <listcomp>
batch = batchify_fn([_worker_dataset[i] for i in samples])
File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/paper_dataset.py", line 375, in __getitem__
data_dict = self._transforms(data_dict)
File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/transforms_paper.py", line 13, in __call__
args = trans(args)
File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/transforms_paper.py", line 468, in __call__
dst_points = np.array([[rdw(), rdh()], [w-1-rdw(), rdh()], [w-1-rdw(), h-1-rdh()], [rdw(), h-1-rdh()]])
File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/transforms_paper.py", line 466, in <lambda>
rdh = lambda: np.random.randint(0, self.max_affine_xy_ratio * h)
File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
ValueError: low >= high
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/scripts/train_gluon_testpaper.py", line 233, in <module>
for batch_cnt, data_batch in enumerate(tqdm.tqdm(train_loader)):
File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 484, in __next__
batch = pickle.loads(ret.get(self._timeout))
File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
ValueError: low >= high
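The last frames of the traceback point at `np.random.randint(0, self.max_affine_xy_ratio * h)`. That call presumably fails when the upper bound truncates to an integer below 1, for example for an empty or very small image. The following is a minimal sketch for flagging such samples ahead of training. It assumes that image sizes are available as `(height, width)` pairs and that `rdw` is defined analogously on `w`. The helper name `degenerate_sizes` and the ratio `0.05` are hypothetical and do not come from the original code.

```python
def degenerate_sizes(sizes, ratio):
    """Return indices of (height, width) pairs whose random-offset range is empty.

    A range [0, int(ratio * length)) is empty when int(ratio * length) < 1,
    which is the situation in which randint reports "low >= high".
    """
    bad = []
    for idx, (h, w) in enumerate(sizes):
        if int(ratio * h) < 1 or int(ratio * w) < 1:
            bad.append(idx)
    return bad


# Hypothetical sizes: one normal image, one very thin image, one empty image.
sizes = [(480, 640), (12, 640), (0, 0)]
print(degenerate_sizes(sizes, ratio=0.05))  # prints [1, 2]
```

Samples flagged this way can be removed from the dataset, or the transform can skip the perturbation whenever the range is empty.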
Approach to a fix:
Set thread_pool=True in mx.gluon.data.DataLoader. What does this parameter do? Its documentation reads:
If True, use threading pool instead of multiprocessing pool. Using threadpool can avoid shared memory usage. If DataLoader is more IO bounded or GIL is not a killing problem, threadpool version may achieve better performance than multiprocessing.
In other words, the workers run as threads inside the training process instead of as separate processes, so batches do not have to go through shared memory.
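import mxnet as mx

# train_dataset, config and batch_fn are defined elsewhere in the training script.
train_loader = mx.gluon.data.DataLoader(train_dataset, batch_size=config.TRAIN.batch_size,
                                        shuffle=True, num_workers=2, thread_pool=True,
                                        last_batch="discard", batchify_fn=batch_fn)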
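Since the data is the likely culprit, it can also help to index the dataset directly in the main process and record which samples raise. The sketch below assumes only that the dataset supports `len()` and integer indexing, as Gluon datasets do. The helper name `find_failing_indices` is hypothetical. Because the transforms are random, a single pass may not trigger every failure.

```python
def find_failing_indices(dataset, max_report=10):
    """Index every sample in the current process and collect the ones that raise."""
    failures = []
    for i in range(len(dataset)):
        try:
            dataset[i]  # runs __getitem__, including the transforms
        except Exception as exc:
            failures.append((i, repr(exc)))
            if len(failures) >= max_report:
                break  # stop early once enough bad samples are known
    return failures
```

Calling `find_failing_indices(train_dataset)` returns a list of `(index, error)` pairs that can be inspected one by one.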