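Converting MATLAB's Cell data type into Python types is quite tedious. For the basic case, see the CSDN blog post
"Python 之 h5py 读取 matlab 中 .mat 文件 cell 方法浅析" (a brief look at reading a cell from a MATLAB .mat file with h5py).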
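For the simple, non-nested case, a minimal sketch with h5py might look like the following. It assumes the file was saved in v7.3 (HDF5) format and that every cell holds a plain numeric array; `read_flat_cell`, `mat_path` and `var_name` are hypothetical names chosen for illustration and do not come from the linked post.

```python
import h5py
import numpy as np


def read_flat_cell(mat_path: str, var_name: str) -> list:
    """Read a non-nested cell array of numeric arrays from a v7.3 .mat file.

    Hypothetical helper: assumes `var_name` is a top-level cell whose
    elements are all numeric arrays (no further nesting).
    """
    with h5py.File(mat_path, "r") as f:
        # A MATLAB cell is stored as a dataset of HDF5 object references.
        refs = f[var_name][()]
        # Indexing the file with a reference returns the dataset it points to.
        # h5py shows MATLAB arrays with their axes reversed, hence the .T.
        return [f[ref][()].T for ref in refs.flatten()]
```

Calling `read_flat_cell("data.mat", "my_cell")` would then return one NumPy array per cell, in the order the references are stored in the file.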
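A trickier case, however, is a Cell nested inside a Cell, for example:
Take all_to_all_indexes in the figure: it is a 1*26 Cell, and opening it shows that each of its cells is in turn an n*26 Cell, where the row count n differs from cell to cell. This is what is meant by a Cell nested in a Cell.
The content inside each of those inner Cells is non-uniform as well.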
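To make the target of the conversion concrete, the nesting described above could be pictured in Python as lists of lists. The sketch below is purely illustrative: it uses 2 sessions instead of 26, the numbers are invented, and `cell_shapes` is a hypothetical helper rather than anything provided by CellReg.

```python
# Toy stand-in for all_to_all_indexes (invented values, 2 sessions instead of 26).
# Indexing order: toy_all_to_all[session][neuron][other_session] -> list of values.
toy_all_to_all = [
    # Outer cell 0: a 2 x 2 inner cell (2 neurons in this session)
    [
        [[], [3]],
        [[], [1, 4]],
    ],
    # Outer cell 1: a 3 x 2 inner cell (3 neurons in this session)
    [
        [[2], []],
        [[], []],
        [[1], []],
    ],
]


def cell_shapes(nested: list) -> list:
    """Return (n_rows, n_cols) of every inner cell (hypothetical helper)."""
    return [(len(cell), len(cell[0]) if cell else 0) for cell in nested]


# The row count differs between outer cells, and the innermost lists
# differ in length, so the data cannot be packed into one regular array.
shapes = cell_shapes(toy_all_to_all)
```

Here `shapes` holds one `(n_rows, n_cols)` pair per outer cell, which makes the irregular row counts easy to inspect before attempting a full conversion.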
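Converting such a complex data structure into Python data types is a real headache.
I did not know how to do it either. GPT-4 suggested the following approach, which I modified slightly: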
import os

import h5py
import numpy as np


def read_all_to_all_indixes(dir_name: str, open_type: str = 'h5py') -> list[list[list[list]]]:
    """read_all_to_all_indixes: Read the all-to-all indices from the MATLAB file saved by CellReg.

    Parameters
    ----------
    dir_name : str
        Path to the MATLAB file.
    open_type : str, optional
        The method used to open the MATLAB file, depending on the file signature.
        Generally 'h5py' or 'scipy', by default 'h5py'.
        'h5py' is needed when the file was saved with the v7.3 signature.
        'scipy' is needed when 'h5py' does not work, i.e. for older-version MATLAB files.

    Returns
    -------
    all_to_all_indexes, a deeply nested list hierarchy.
        The original data structure is a 1*n_sessions MATLAB Cell; each cell contains an
        n_neurons*n_sessions MATLAB Cell, and each of the latter contains a list.

    Raises
    ------
    FileNotFoundError
        If dir_name does not exist.
    """
    if not os.path.exists(dir_name):
        raise FileNotFoundError(dir_name)

    if open_type == 'h5py':
        with h5py.File(dir_name, 'r') as f:
            modeled_data_struct = f['modeled_data_struct']
            dataset_a = modeled_data_struct['all_to_all_indexes']
            # Initialize an empty list to hold the fully dereferenced data
            fully_dereferenced_data = []
            # Iterate over each reference in the dataset
            for ref in dataset_a:
                # Dereference and read the data
                # (read_matlab_data is a separate helper, not shown here)
                nested_data = read_matlab_data(ref[0], f)
                # Further dereference if the data contains more HDF5 references
                if isinstance(nested_data, np.ndarray) and nested_data.dtype == object:
                    fully_dereferenced_nested_data = [
                        read_matlab_data(sub_ref, f) if isinstance(sub_ref, h5py.h5r.Reference) else sub_ref
                        for sub_ref in nested_data.flatten()
                    ]
                    fully_dereferenced_data.append(fully_dereferenced_nested_data)
                else:
                    fully_dereferenced_data.append(nested_data)
        return fully_dereferenced_data