MindSpore dataset loading: the "'DictIterator' has no attribute 'get_next'" error

skytier

Article link: https://blog.csdn.net/skytttttt9394/article/details/129127830

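When a MindSpore Dataset calls the create_dict_iterator() interface, it starts the data loading and processing pipeline, so some errors surface at that moment. The error may not originate in this interface itself, however; users need to analyze the error log further to locate the real cause.

The following describes two kinds of errors that occur when calling create_dict_iterator: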

1. 'DictIterator' has no attribute 'get_next'

  # Take one sample from the test set and feed it to the model for prediction
  test_ = ds_test.create_dict_iterator().get_next()
  # Select the sample by its key
  test = Tensor(test_['x'], mindspore.float32)

  AttributeError: 'DictIterator' object has no attribute 'get_next'

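Cause analysis:

create_dict_iterator() returns a DictIterator object, which inherits from the internal Iterator class. Iterator implements the iterator protocol through the two special methods __iter__() and __next__().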

  1 class Iterator:
  2     """
  3     General Iterator over a dataset.
  4
  5     Attributes:
  6         dataset: Dataset to be iterated over
  7     """
  8
  9     def __init__(self, dataset, num_epochs=-1, output_numpy=False, do_copy=True):
 10     ......
 11
 12     def __iter__(self):
 13         return self
 14
 15     def __next__(self):
 16         if not self._runtime_context:
 17             logger.warning("Iterator does not have a running C++ pipeline." +
 18                            "It might because Iterator stop() had been called, or C++ pipeline crashed silently.")
 19             raise RuntimeError("Iterator does not have a running C++ pipeline.")
 20
 21         data = self._get_next()
 22         if not data:
 23             if self.__index == 0:
 24                 logger.warning("No records available.")
 25             if self.__ori_dataset.dataset_size is None:
 26                 self.__ori_dataset.dataset_size = self.__index
 27             raise StopIteration
 28         self.__index += 1
 29
 30         if self.offload_model is not None:
 31             data = offload.apply_offload_iterators(data, self.offload_model)
 32
 33         return data

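The Iterator definition shows that calling __next__ returns the next record. Line 21, data = self._get_next(), indicates that the actual retrieval is implemented in the _get_next() method of the subclasses DictIterator or TupleIterator. Users can fetch the next record from the iterator in two ways: next(ds_test.create_dict_iterator()) or for item in ds_test.create_dict_iterator():.

In earlier versions get_next() was a public method, so ds_test.create_dict_iterator().get_next() returned the next record as a dict.

Solution: use the iterator protocol to obtain the processed data.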

1. next(ds_test.create_dict_iterator())

2. for item in ds_test.create_dict_iterator():
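The relationship described above can be sketched without MindSpore at all. The classes below are hypothetical stand-ins (ToyIterator and ToyDictIterator are not MindSpore names, and the "x" column is just an example key); they only mirror the structure of the quoted source, where __next__ delegates to a private _get_next() hook, to show why next() and a for loop work while a call to .get_next() does not.

```python
class ToyIterator:
    """Hypothetical stand-in for the internal Iterator base class."""

    def __init__(self, records):
        self._records = list(records)
        self._index = 0

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop.
        return self

    def __next__(self):
        # The public protocol method delegates to the private subclass hook.
        data = self._get_next()
        if data is None:
            raise StopIteration
        self._index += 1
        return data

    def _get_next(self):
        raise NotImplementedError


class ToyDictIterator(ToyIterator):
    """Hypothetical stand-in for DictIterator: yields one dict per record."""

    def _get_next(self):
        if self._index >= len(self._records):
            return None
        return {"x": self._records[self._index]}


# Option 1: take a single record with the built-in next().
first = next(ToyDictIterator([10, 20, 30]))

# Option 2: walk over every record with a for loop.
all_x = [item["x"] for item in ToyDictIterator([10, 20, 30])]

# There is no public get_next attribute, which matches the reported error.
has_public_get_next = hasattr(ToyDictIterator([10]), "get_next")
```

With a real dataset the same two call patterns apply to the object returned by create_dict_iterator(); note that every call to create_dict_iterator() builds a new iterator, so next() on a fresh iterator always starts from the first record.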

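2. "Invalid data type" error when calling create_dict_iterator

The user loads data with GeneratorDataset and has defined a custom random-access data source.

The error message is as follows: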

  1 E:\anaconda\envs\mindspore\lib\site-packages\mindspore\dataset\engine\datasets.py:3533: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  2   yield tuple([np.array(x, copy=False) for x in val])
  3 Traceback (most recent call last):
  4   File "C:/Users/wkml996/Desktop/hypertext/mindspore/project/main.py", line 18, in <module>
  5     for data in train_iter.create_dict_iterator():
  6   File "E:\anaconda\envs\mindspore\lib\site-packages\mindspore\dataset\engine\iterators.py", line 122, in __next__
  7     data = self._get_next()
  8   File "E:\anaconda\envs\mindspore\lib\site-packages\mindspore\dataset\engine\iterators.py", line 173, in _get_next
  9     raise err
 10   File "E:\anaconda\envs\mindspore\lib\site-packages\mindspore\dataset\engine\iterators.py", line 166, in _get_next
 11     return {k: self._transform_tensor(t) for k, t in self._iterator.GetNextAsMap().items()}
 12 RuntimeError: Unexpected error. Invalid data type.
 13 Line of code : 114
 14 File         : D:\jenkins\agent-working-dir\workspace\Compile_CPU_Windows\mindspore\mindspore\ccsrc\minddata\dataset\core\tensor.cc
 15
 16 WARNING: Logging before InitGoogleLogging() is written to STDERR
 17 [ERROR] MD(23788,2,?):2021-9-27 18:49:1 [mindspore\ccsrc\minddata\dataset\core\data_type.cc:159] FromNpArray] Cannot convert from numpy type. Unknown data type is returned!
 18 [ERROR] MD(23788,2,?):2021-9-27 18:49:1 [mindspore\ccsrc\minddata\dataset\core\data_type.cc:159] FromNpArray] Cannot convert from numpy type. Unknown data type is returned!
 19 [ERROR] MD(23788,2,?):2021-9-27 18:49:1 [mindspore\ccsrc\minddata\dataset\util\task_manager.cc:217] InterruptMaster] Task is terminated with err msg(more detail in info level log):Unexpected error. Invalid data type.
 20 Line of code : 114
 21 File         : D:\jenkins\agent-working-dir\workspace\Compile_CPU_Windows\mindspore\mindspore\ccsrc\minddata\dataset\core\tensor.cc
 22
 23
 24 Process finished with exit code 1

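Cause analysis: line 12 of the error message reports "Invalid data type", which means the dtype of the input NumPy array is not what MindSpore expects. MindSpore accepts NumPy arrays whose dtype is int, float, or str.

When the user script converts data with np.array() and the input is a list or tuple made up of lists, tuples, or ndarrays of different lengths or shapes, the resulting NumPy array has the unexpected dtype object, which causes MindSpore to fail when loading the data.

For example, suppose one_sample[0] is a list of ndarrays: one_sample[0] = [np.array([1,2]), np.array([1,2,3])], where each element has dtype int64. After the np.array() conversion, data1 has dtype object, and MindSpore raises the "Invalid data type" exception when it tries to convert it to a Tensor.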
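The dtype problem can be reproduced with NumPy alone. The sketch below is illustrative: to_array, pad_to_common_length, and pad_value are hypothetical names, and padding to a common length is only one possible workaround that the original article does not prescribe. Older NumPy releases emit the VisibleDeprecationWarning shown in the log and return an object array, while newer releases raise ValueError unless dtype=object is passed explicitly, so the helper handles both cases.

```python
import warnings

import numpy as np


def to_array(sample):
    """Convert like np.array(); fall back to an explicit object array
    on NumPy releases that reject ragged input with ValueError."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence the ragged-input warning
            return np.array(sample)
    except ValueError:
        return np.array(sample, dtype=object)


def pad_to_common_length(arrays, pad_value=0):
    """Assumed workaround: right-pad 1-D arrays so they share one length."""
    max_len = max(len(a) for a in arrays)
    padded = np.full((len(arrays), max_len), pad_value, dtype=arrays[0].dtype)
    for row, a in enumerate(arrays):
        padded[row, :len(a)] = a
    return padded


ragged = [np.array([1, 2]), np.array([1, 2, 3])]

# Ragged input ends up with dtype object, which MindSpore cannot convert.
data1 = to_array(ragged)

# After padding, the array is a regular 2-D integer array.
fixed = pad_to_common_length(ragged)
```

Checking the dtype of each column returned by the custom data source before handing it to GeneratorDataset makes this kind of problem visible in Python, ahead of the C++ pipeline error.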