How to read the dimensions of a four-dimensional matrix. Below is a four-dimensional matrix (tensor) with shape (4,3,2,5):
tensor([[[[-0.3037, 0.5017, 0.2698, 0.7181, 0.7317],
[ 0.2041, -2.1096, 2.0244, -2.0167, -0.4281]],
[[ 0.0483, 0.0486, -0.9498, -0.1509, -0.9814],
[-0.0850, -0.8587, 1.4044, 0.6037, -0.9757]],
[[ 0.8890, -0.4697, -0.8317, -0.8372, 1.5004],
[-0.4455, 0.9579, -2.2947, 1.9205, 0.3391]]],
[[[-0.0613, -0.3637, 2.9507, 1.0946, 1.9070],
[ 1.9137, -0.8911, -0.3479, 0.9942, -1.7937]],
[[-1.1416, -1.5116, -0.6588, 1.0701, -0.3426],
[-1.3269, 0.5529, -0.6450, 0.0393, 1.5098]],
[[ 1.4184, -0.7669, -1.0354, -0.7632, 0.4466],
[-1.9081, -0.1087, 0.0065, -0.3233, -0.7373]]],
[[[-0.8933, -1.6219, 1.0330, -1.1950, -1.0639],
[ 0.7363, 0.4656, -0.8343, 1.3202, 0.7524]],
[[-0.4619, -0.4536, 0.6632, 0.2840, 1.1670],
[-0.4915, -0.8160, -1.3393, -1.4767, -0.1640]],
[[-1.4513, -0.3191, 0.2184, 1.7790, -1.0141],
[-1.8744, -1.0015, -0.8923, 1.4281, -0.3708]]],
[[[-0.4469, 0.7219, -0.7110, 0.3740, 0.0478],
[-0.4859, 0.3819, -0.4086, 1.0739, 0.0245]],
[[ 0.7192, 2.0502, 0.0091, -1.4356, 1.1417],
[ 0.5979, 0.4916, 0.3360, 0.3793, -0.5015]],
[[-0.5521, 0.0183, -0.0687, -0.5918, 0.1760],
[-1.7947, -0.9572, -0.3511, -0.0038, 1.1138]]]])
The innermost unit is a 2*5 matrix (2 rows of 5 numbers), with shape (2,5):
[[-0.3037, 0.5017, 0.2698, 0.7181, 0.7317],
[ 0.2041, -2.1096, 2.0244, -2.0167, -0.4281]]
One level further out, there are 3 of these 2*5 matrices, giving shape (3,2,5):
[[[-0.3037, 0.5017, 0.2698, 0.7181, 0.7317],
[ 0.2041, -2.1096, 2.0244, -2.0167, -0.4281]],
[[ 0.0483, 0.0486, -0.9498, -0.1509, -0.9814],
[-0.0850, -0.8587, 1.4044, 0.6037, -0.9757]],
[[ 0.8890, -0.4697, -0.8317, -0.8372, 1.5004],
[-0.4455, 0.9579, -2.2947, 1.9205, 0.3391]]]
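Finally, there are 4 of these (3,2,5) blocks, so the final shape is (4,3,2,5). In other words, the shape is read from the outermost bracket inward.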
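The counting procedure above can be sketched in plain Python. This is only an illustration under assumptions: `nested_shape` is a hypothetical helper (not a PyTorch function), and the zero-filled nested lists merely stand in for the tensor printed above.

```python
def nested_shape(data):
    """Return the shape of a regularly nested list, outermost dimension first."""
    shape = []
    level = data
    # Walk down through the first element of each level, recording its length.
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    return tuple(shape)


# Hypothetical stand-in data, laid out like the (4,3,2,5) tensor above.
matrix = [[0.0] * 5 for _ in range(2)]                       # 2 rows of 5 numbers
block = [[[0.0] * 5 for _ in range(2)] for _ in range(3)]    # 3 such matrices
tensor = [[[[0.0] * 5 for _ in range(2)] for _ in range(3)]
          for _ in range(4)]                                 # 4 such blocks

print(nested_shape(matrix))  # (2, 5)
print(nested_shape(block))   # (3, 2, 5)
print(nested_shape(tensor))  # (4, 3, 2, 5)
```

With an actual PyTorch tensor the same information is available directly from its `.shape` attribute; the helper only makes the bracket counting explicit.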
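To make it clearer how a four-dimensional matrix is used and what each dimension means, take audio as an example. Suppose the input is preprocessed audio with shape (B,C,T,F), where by the usual convention B is the batch, C the channel, T the time frame, and F the frequency bin.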
Indexing: DC = d[:,0,:,:]
Result:
tensor([[[ 0.1216, 0.7147, -0.5788, 0.2586, -0.6516],
[-2.0123, -0.5601, 2.0672, 0.7074, -1.2746]],
[[-0.1756, 0.1880, -1.5323, -0.0562, -2.0827],
[ 0.4644, -0.5828, 0.0791, 0.2199, 0.9643]],
[[-0.3476, 1.2914, -1.0125, -1.8291, -2.1911],
[ 0.2787, 1.5103, 0.9466, 1.1055, 0.7242]],
[[ 1.5504, -0.1351, -0.2468, -0.7308, 1.2519],
[ 0.6720, -1.0799, 0.2046, 0.3113, -1.6933]]])
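This takes the first channel, i.e. the real part of the audio, and the result is three-dimensional data, here with shape (4,2,5). The rule: the dimension that is indexed with a specific number is fixed to that position and drops out of the shape, while the other dimensions stay unchanged.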
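The indexing rule can be checked with a small pure-Python sketch. The helpers `nested_shape` and `select` are hypothetical names introduced for illustration, and the data is made up so that every value encodes its own position; with a PyTorch tensor, `select(d, axis=1, index=0)` corresponds to `d[:,0,:,:]`.

```python
def nested_shape(data):
    """Return the shape of a regularly nested list, outermost dimension first."""
    shape = []
    level = data
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    return tuple(shape)


def select(data, axis, index):
    """Fix position `index` along `axis` of a nested list, dropping that axis."""
    if axis == 0:
        return data[index]
    # Keep every entry of the current level and recurse one level deeper.
    return [select(item, axis - 1, index) for item in data]


# Made-up (B,C,T,F) = (4,3,2,5) data: value = 1000*b + 100*c + 10*t + f.
d = [[[[1000 * b + 100 * c + 10 * t + f for f in range(5)]
       for t in range(2)]
      for c in range(3)]
     for b in range(4)]

dc = select(d, axis=1, index=0)   # same idea as d[:,0,:,:]

print(nested_shape(d))    # (4, 3, 2, 5)
print(nested_shape(dc))   # (4, 2, 5)
print(dc[2][1][3])        # 2013, i.e. d[2][0][1][3]
```

The channel dimension of size 3 disappears from the shape, and each remaining element keeps its batch, time, and frequency position.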