Large Model Fundamentals (Part 2): Tensors

Article link: https://blog.csdn.net/LYS00Q/article/details/147422664

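Tensor Basics

Concepts

A tensor is a general representation of a multidimensional array and is the basic data structure PyTorch uses to store and operate on data.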

Dimensions	Name	Example
0-D	Scalar	`7`
1-D	Vector	`[7, 7]`
2-D	Matrix	`[[7, 8], [9, 10]]`
3-D and above	Higher-dimensional tensor	`[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]`

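Creating Tensors

import torch

# Create a tensor from a list
a = torch.tensor([1, 2, 3])

# Create all-zeros and all-ones tensors of a given shape
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)

# Fill a tensor with a specified value
full_tensor = torch.full((2, 3), 9)
print(full_tensor)

# Random tensors
rand = torch.rand(2, 2)       # uniform distribution on [0, 1)
randn = torch.randn(2, 2)     # standard normal distribution
randint = torch.randint(0, 10, (2, 2))  # random integers in [0, 10)

# Evenly spaced values
linear = torch.linspace(0, 1, steps=5)

# Create a tensor with the same shape as an existing tensor
like = torch.ones_like(rand)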

Number of dimensions: use .ndim or .dim()
Shape: use .shape
Value: when a tensor holds exactly one element, use .item() to get it as a Python number
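As a quick illustration of the three accessors just listed, here is a minimal sketch. The variable names (`scalar`, `matrix`, and so on) are chosen only for this example.

```python
import torch

scalar = torch.tensor(7)                   # 0-dimensional tensor
matrix = torch.tensor([[7, 8], [9, 10]])   # 2-dimensional tensor

scalar_ndim = scalar.ndim            # number of dimensions as an attribute
matrix_ndim = matrix.dim()           # the same information via a method call
matrix_shape = tuple(matrix.shape)   # torch.Size converted to a plain tuple
scalar_value = scalar.item()         # single element as a Python number

print(scalar_ndim, matrix_ndim, matrix_shape, scalar_value)
```

Calling `.item()` on a tensor with more than one element raises an error, so it is only suitable for single-element tensors.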

Random tensors depend on the random seed:

print(torch.random.initial_seed()) # differs on every run unless a seed has been set

torch.random.manual_seed(1145141919810) # set the random seed manually
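To see what fixing the seed buys you, the sketch below resets the seed twice and checks that the same random tensor comes out both times. The seed value 42 is an arbitrary choice for this example, and `torch.manual_seed` is used as the common shorthand for seeding the default generator.

```python
import torch

torch.manual_seed(42)        # 42 is an arbitrary example seed
first = torch.rand(2, 2)

torch.manual_seed(42)        # reset to the same seed
second = torch.rand(2, 2)

# With the same seed, the same sequence of random numbers is generated
same = torch.equal(first, second)
print(same)
```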

Type Conversion

Numeric types:

# Create a floating-point tensor
t = torch.tensor([1.5, 2.5], dtype=torch.float32)

# Convert to 32-bit integer (the fractional part is truncated)
t_int = t.int()

# Convert to 64-bit floating point
t_double = t.double()

# Check the data types
print(t.dtype, t_int.dtype, t_double.dtype)

data = torch.full([2, 3], 10)
print(data.dtype)

# Convert the elements of data to 64-bit floating point
data = data.type(torch.DoubleTensor)
print(data.dtype)

# Convert to other types
data = data.type(torch.ShortTensor)
data = data.type(torch.IntTensor)
data = data.type(torch.LongTensor)
data = data.type(torch.FloatTensor)
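The same conversions can also be written with `.to(dtype)`, which takes a `torch.dtype` instead of a legacy tensor class. The sketch below is an alternative spelling rather than a replacement for the code above; it assumes a recent PyTorch version, where `torch.full` with an integer fill value produces an `int64` tensor.

```python
import torch

data = torch.full([2, 3], 10)           # integer fill value -> torch.int64

as_double = data.to(torch.float64)      # same dtype as data.double()
as_short = data.to(torch.int16)         # same dtype as data.short()
legacy_double = data.type(torch.DoubleTensor)  # legacy spelling used above

print(data.dtype, as_double.dtype, as_short.dtype)
```

None of these calls modifies `data` itself; each returns a new tensor with the requested dtype.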

Converting to a NumPy array:

import torch

data_tensor = torch.tensor([2, 3, 4])
# Convert with the .numpy() method
data_numpy = data_tensor.numpy()

print(type(data_tensor))  # <class 'torch.Tensor'>
print(type(data_numpy))   # <class 'numpy.ndarray'>

# data_tensor and data_numpy share memory, so a change to one shows up in the other
data_numpy[0] = 100

print(data_tensor)  # tensor([100,   3,   4])
print(data_numpy)   # [100   3   4]

Converting with a copy:

import torch

# Create a PyTorch tensor
data_tensor = torch.tensor([2, 3, 4])

# Convert to NumPy without sharing memory: clone, then detach, then .numpy()
data_numpy = data_tensor.clone().detach().numpy()

print(type(data_tensor))  # <class 'torch.Tensor'>
print(type(data_numpy))   # <class 'numpy.ndarray'>

# No shared memory: modifying one does not change the other
data_tensor[0] = 100
data_numpy[0] = 999

print("data_tensor:", data_tensor)  # tensor([100,   3,   4])
print("data_numpy :", data_numpy)   # [999   3   4]
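The reverse direction, from NumPy to PyTorch, has the same shared-versus-copied distinction. The sketch below is not part of the original walkthrough; it uses `torch.from_numpy` (shares memory with the array) and `torch.tensor` (copies the data) on a small example array.

```python
import numpy as np
import torch

arr = np.array([2, 3, 4])

shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # copies the data into a new tensor

arr[0] = 100                     # modify the NumPy array afterwards

print(shared)   # reflects the change
print(copied)   # keeps the original values
```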

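Basic Operations

Operation	Operator form	Out-of-place (original unchanged)	In-place (original modified)
Addition	`a + b`	`a.add(b)`	`a.add_(b)`
Subtraction	`a - b`	`a.sub(b)`	`a.sub_(b)`
Multiplication	`a * b`	`a.mul(b)`	`a.mul_(b)`
Division	`a / b`	`a.div(b)`	`a.div_(b)`
Negation	`-a`	`a.neg()`	`a.neg_()`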

import torch

# Create two tensors
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])

# -------- Operator form (recommended and concise) --------
print("Addition:", a + b)
print("Subtraction:", a - b)
print("Multiplication:", a * b)
print("Division:", a / b)
print("Negation:", -a)

# -------- Method form (does not modify the original tensor) --------
print("Addition:", a.add(b))        # same as a + b
print("Subtraction:", a.sub(b))     # same as a - b
print("Multiplication:", a.mul(b))  # same as a * b or torch.mul(a, b)
print("Division:", a.div(b))        # same as a / b
print("Negation:", a.neg())         # same as -a

# The original tensor is unchanged
print("a is unchanged:", a)

# -------- Method form (trailing underscore: modifies the original tensor) --------
a.add_(b)
print("a after a += b:", a)

a.sub_(b)
print("a after a -= b:", a)

a.mul_(b)
print("a after a *= b:", a)

a.div_(b)
print("a after a /= b:", a)

a.neg_()
print("a after negation:", a)
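One way to check the out-of-place versus in-place distinction from the table is to compare the underlying storage addresses with `data_ptr()`. This is only an illustrative sketch; the helper variable names are made up for the example.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([10.0, 20.0, 30.0])

out = a.add(b)     # out-of-place: allocates a new tensor, a is untouched
a_before = a.clone()

ret = a.add_(b)    # in-place: writes the result into a's own memory

out_is_new = out.data_ptr() != a.data_ptr()     # different storage
ret_shares = ret.data_ptr() == a.data_ptr()     # same storage as a

print(a_before, a, out_is_new, ret_shares)
```

In-place operations save memory, but they overwrite values that other code (including autograd) may still need, so the out-of-place forms are the safer default.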

Dot Product

Definition of the dot product:
$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$

Expanded form (n = 3):
$\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3$

Matrix multiplication form:
$\mathbf{a} \cdot \mathbf{b} = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \sum_{i=1}^{n} a_i b_i$

Relationship between the dot product and the angle between the vectors:
$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos{\theta}$
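The formulas above can be checked numerically. The sketch below uses `torch.dot` for the sum of products and then recovers the angle from the cosine relation; the two vectors are example values chosen so that the angle is 45 degrees.

```python
import math
import torch

a = torch.tensor([1.0, 0.0])
b = torch.tensor([1.0, 1.0])

# Dot product: 1*1 + 0*1
dot = torch.dot(a, b)

# cos(theta) = (a . b) / (||a|| * ||b||)
cos_theta = dot / (a.norm() * b.norm())
theta_deg = math.degrees(torch.acos(cos_theta).item())

print(dot.item(), theta_deg)
```

`torch.dot` only accepts 1-D tensors; for matrices, use the `@` operator as in the matrix example.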

For example, let:
$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 7 & 8 & 9 \\ 10 & 11 & 12 \\ 13 & 14 & 15 \end{bmatrix}$

Matrix product (each entry is the dot product of a row of A with a column of B):

$AB = \begin{bmatrix} 1\cdot7 + 2\cdot10 + 3\cdot13 & 1\cdot8 + 2\cdot11 + 3\cdot14 & 1\cdot9 + 2\cdot12 + 3\cdot15 \\ 4\cdot7 + 5\cdot10 + 6\cdot13 & 4\cdot8 + 5\cdot11 + 6\cdot14 & 4\cdot9 + 5\cdot12 + 6\cdot15 \\ \end{bmatrix}= \begin{bmatrix} 66 & 72 & 78 \\ 156 & 171 & 186 \end{bmatrix}$

Using PyTorch's @ operator (matrix multiplication):

import torch

# Define two matrices
a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

b = torch.tensor([[7, 8, 9],
                  [10, 11, 12],
                  [13, 14, 15]])

# Matrix multiplication with the @ operator
result = a @ b

print("a @ b =\n", result)

Using NumPy's @ operator:

import numpy as np

# Define two NumPy matrices
a = np.array([[1, 2, 3],
              [4, 5, 6]])

b = np.array([[7, 8, 9],
              [10, 11, 12],
              [13, 14, 15]])

# Matrix multiplication with the @ operator
result = a @ b

print("a @ b =\n", result)
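For 2-D tensors, the `@` operator can also be spelled `torch.matmul` or `torch.mm`. The sketch below compares all three against the hand-computed product from the worked example; the name `expected` is introduced only for this check.

```python
import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])          # shape (2, 3)
b = torch.tensor([[7, 8, 9],
                  [10, 11, 12],
                  [13, 14, 15]])       # shape (3, 3)

r_operator = a @ b                 # operator form
r_matmul = torch.matmul(a, b)      # function form, also handles batched inputs
r_mm = torch.mm(a, b)              # strictly 2-D matrix multiplication

# Hand-computed result from the worked example
expected = torch.tensor([[66, 72, 78],
                         [156, 171, 186]])

print(torch.equal(r_operator, expected), r_operator.shape)
```

The inner dimensions must match: a (2, 3) matrix times a (3, 3) matrix gives a (2, 3) result.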

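Row and Column Indexing

Operation	Code	Description
Get row 0	`a[0]`	Returns `[10, 20, 30]`
Get the value at row 1, column 2	`a[1][2]` or `a[1, 2]`	Returns `60`
Get column 2	`a[:, 2]`	Returns `[30, 60, 90]` (column 2 of every row)
Get column 0	`a[:, 0]`	Returns `[10, 40, 70]`
Get rows 0 to 1	`a[0:2]`	Returns the first two rows
Get rows 1 to 2, columns 0 to 1	`a[1:3, 0:2]`	Returns a submatrix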

import torch

a = torch.tensor([
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90]
])

print("Original tensor:\n", a)
print("Row 0:", a[0])
print("Element at row 1, column 2:", a[1, 2])
print("Column 2:", a[:, 2])
print("Rows 0 to 1:\n", a[0:2])
print("Rows 1 to 2, columns 0 to 1:\n", a[1:3, 0:2])

import torch

tensor_3d = torch.tensor([
    [  # matrix 0 (layer 0)
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]
    ],
    [  # matrix 1 (layer 1)
        [13, 14, 15, 16],
        [17, 18, 19, 20],
        [21, 22, 23, 24]
    ]
])

print("Layer 0:\n", tensor_3d[0])
print("Layer 1, row 2:", tensor_3d[1, 2])
print("Element at layer 0, row 1, column 2:", tensor_3d[0, 1, 2])
print("Row 2 of every layer:\n", tensor_3d[:, 2])
print("Values at row 1, column 0 of every layer:", tensor_3d[:, 1, 0])
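A useful rule of thumb when indexing is that each integer index removes one dimension, while each slice (`:`) keeps it. The sketch below rebuilds the same 2x3x4 values with `torch.arange` (an assumption made for brevity) and records the shape of each indexing result.

```python
import torch

# Same values as the 3-D tensor above: 1..24 arranged as 2 layers x 3 rows x 4 columns
t = torch.arange(1, 25).reshape(2, 3, 4)

layer0_shape = tuple(t[0].shape)        # one integer index  -> 2-D result
row_shape = tuple(t[1, 2].shape)        # two integer indices -> 1-D result
element = t[0, 1, 2]                    # three integer indices -> 0-D result
rows2_shape = tuple(t[:, 2].shape)      # slice keeps the layer dimension
col_values = t[:, 1, 0].tolist()        # row 1, column 0 of every layer

print(layer0_shape, row_shape, element.item(), rows2_shape, col_values)
```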

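Shape Operations

Function	Description	Notes
`reshape()`	Changes the shape and returns a tensor with the new shape	Returns a view when possible and a copy otherwise; generally the recommended choice
`view()`	Changes the shape and returns a view	Requires a compatible (typically contiguous) memory layout
`squeeze()`	Removes dimensions of size 1	The tensor gets "thinner"
`unsqueeze()`	Adds a dimension of size 1	The tensor gets "fatter"
`transpose()`	Swaps two dimensions	Works on 2-D or higher-dimensional tensors
`permute()`	Reorders dimensions arbitrarily	More flexible than `transpose()`
`contiguous()`	Makes the tensor contiguous in memory	Often used together with `view()`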

import torch

# Reshape into a new shape
x = torch.arange(6)         # [0, 1, 2, 3, 4, 5]
y = x.reshape(2, 3)         # becomes a 2x3 tensor
print(y)

# Return a view with a new shape
x = torch.arange(6)
y = x.view(2, 3)            # requires x to be contiguous
print(y)

# Make a tensor contiguous in memory
x = torch.tensor([[1, 2], [3, 4]])
y = x.t()                  # non-contiguous after the transpose
z = y.contiguous().view(4) # call contiguous() before view() to avoid an error
print(z)

# Remove dimensions of size 1
x = torch.randn(1, 3, 1, 5)
print("Original shape:", x.shape)
y = x.squeeze()
print("After squeeze():", y.shape)

# Remove only dimension 0 (if its size is 1)
z = x.squeeze(0)
print("After squeeze(0):", z.shape)

# Add a dimension (the inverse of squeeze)
x = torch.tensor([1, 2, 3])     # shape: [3]
y = x.unsqueeze(0)              # shape: [1, 3]
z = x.unsqueeze(1)              # shape: [3, 1]
print("Original:", x.shape, "unsqueeze(0):", y.shape, "unsqueeze(1):", z.shape)

# Swap two dimensions
x = torch.randn(2, 3)
print("Original shape:", x.shape)
y = x.transpose(0, 1)
print("After transpose:", y.shape)

# Reorder dimensions arbitrarily
x = torch.randn(2, 3, 4)          # [Batch, Channel, Width]
y = x.permute(0, 2, 1)            # becomes [Batch, Width, Channel]
print("Original:", x.shape, "After permute:", y.shape)
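The table notes that `view()` needs contiguous memory while `reshape()` does not. The sketch below demonstrates the difference on a transposed tensor: `view()` raises a `RuntimeError`, whereas `reshape()` quietly makes a copy. The flag name `view_failed` is just for this example.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
y = x.t()                           # shape (3, 2), non-contiguous in memory

is_contiguous = y.is_contiguous()

# view() cannot reinterpret a non-contiguous layout as a flat tensor
try:
    y.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape() falls back to copying the data when a view is not possible
flat = y.reshape(6)

print(is_contiguous, view_failed, flat)
```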

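Concatenating Tensors

Method	Adds a dimension?	Recommended use
`torch.cat()`	No	Concatenate along an existing dimension
`torch.stack()`	Yes	Create a new dimension, e.g. to build a batch
`torch.hstack()`	No	Horizontal (column-wise) concatenation
`torch.vstack()`	No (1-D inputs are first promoted to 2-D)	Vertical (row-wise) concatenation
`torch.dstack()`	Yes for 1-D and 2-D inputs	Stack along the third dimension (e.g. images)

Concatenating multiple tensors along a given dimension: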

import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

# Concatenate along dim=0 (adds rows)
cat0 = torch.cat((a, b), dim=0)
print("dim=0:\n", cat0)

# Concatenate along dim=1 (adds columns)
cat1 = torch.cat((a, b), dim=1)
print("dim=1:\n", cat1)

Stacking tensors along a new dimension:

a = torch.tensor([1, 2])
b = torch.tensor([3, 4])

# Stack along a new dimension at dim=0
stack0 = torch.stack((a, b), dim=0)
print("stack0:\n", stack0)  # shape: (2, 2)

# Stack along a new dimension at dim=1
stack1 = torch.stack((a, b), dim=1)
print("stack1:\n", stack1)  # shape: (2, 2)

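torch.hstack() and torch.vstack() (similar to NumPy):

a = torch.tensor([1, 2])
b = torch.tensor([3, 4])

# Horizontal concatenation (along dim=0 for 1-D tensors, dim=1 otherwise)
print(torch.hstack((a, b)))  # values: [1, 2, 3, 4]

# Vertical concatenation (along dim=0; 1-D tensors are first promoted to 2-D)
print(torch.vstack((a, b)))  # values: [[1, 2], [3, 4]]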

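torch.dstack(): stacking along the third dimension:

a = torch.tensor([[1, 2]])
b = torch.tensor([[3, 4]])

# dstack: inputs are promoted to 3-D, then concatenated along the last dimension
print(torch.dstack((a, b)).shape)  # torch.Size([1, 2, 2])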
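To compare the functions in this section side by side, the sketch below applies them to the same pair of 2x3 tensors and records only the resulting shapes. The inputs are example values chosen for this comparison.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

cat0_shape = tuple(torch.cat((a, b), dim=0).shape)      # rows are appended
cat1_shape = tuple(torch.cat((a, b), dim=1).shape)      # columns are appended
stack_shape = tuple(torch.stack((a, b), dim=0).shape)   # new leading dimension
dstack_shape = tuple(torch.dstack((a, b)).shape)        # new trailing dimension

print(cat0_shape, cat1_shape, stack_shape, dstack_shape)
```

`torch.cat` keeps the number of dimensions and grows one of them, while `torch.stack` requires identically shaped inputs and adds a dimension whose size equals the number of tensors.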