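I. Concepts: Tensors and Operators
1. Tensors
A tensor is a multilinear map defined on the Cartesian product of some vector spaces and some dual spaces. In coordinates, a tensor of order r in an n-dimensional space is a quantity with n^r components, each of which is a function of the coordinates; under a change of coordinates, the components transform linearly according to specific rules. r is called the rank or order of the tensor (it is unrelated to the rank or order of a matrix).
The tensor concept generalizes the vector concept: a scalar is a zeroth-order tensor and a vector is a first-order tensor. A tensor is a multilinear function that can express linear relationships among vectors, scalars, and other tensors. The concept covers scalars, vectors, and linear operators. A tensor can be expressed in a coordinate system as an array of scalars, but it is defined to be independent of the choice of reference frame.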
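To make the phrase "multilinear function" concrete, here is a minimal sketch that treats a second-order tensor as a bilinear map on two vectors and checks linearity in the first argument. The helper name `bilinear` and the component values are illustrative assumptions, not part of the original text.

```python
# A second-order tensor in n = 2 dimensions has n**r = 2**2 = 4 components.
T = [[1.0, 2.0],
     [3.0, 4.0]]  # illustrative component values


def bilinear(T, u, v):
    """Evaluate T(u, v) = sum over i, j of T[i][j] * u[i] * v[j]."""
    return sum(T[i][j] * u[i] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


u, w, v = [1.0, 0.0], [0.0, 2.0], [1.0, 1.0]
a = 3.0

# Linearity in the first argument: T(a*u + w, v) == a*T(u, v) + T(w, v)
lhs = bilinear(T, [a * ui + wi for ui, wi in zip(u, w)], v)
rhs = a * bilinear(T, u, v) + bilinear(T, w, v)
print(lhs, rhs)  # both sides evaluate to 23.0
```

The same check can be repeated for the second argument; a map that is linear in each argument separately is what "multilinear" means here.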
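2. Operators
An operator is a mapping O: X → X from a function space to a function space. In a broader sense, operators can be defined on any space. Broadly speaking, any operation applied to a function can be regarded as an operator, including raising to a power or taking a root. In that sense an operator is no different from the f in f(x), or even from the basic arithmetic symbols for addition, subtraction, multiplication, and division; the only distinction is that it can act on a single object, whereas some symbols, such as the greater-than and less-than signs, must act on several objects.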
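As a small illustration of an operator that maps functions to functions, the sketch below defines a numerical differentiation operator. The name `derivative` and the step size `h` are assumptions chosen for this example; a central difference is only an approximation of the true derivative.

```python
import math


def derivative(f, h=1e-6):
    """Return a new function that approximates f' by a central difference."""
    def df(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return df


d_sin = derivative(math.sin)  # the operator maps the function sin to a new function
print(d_sin(0.0))             # close to cos(0) = 1
```

The input and the output of `derivative` are both functions, which is the sense in which an operator acts on a function space.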
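II. Implementing Tensor Operations with PyTorch
1.2 Tensors
1.2.1 Creating Tensors
import torch # import the torch package
1.2.1.1 Creating a tensor from given data
One-dimensional tensor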
data = torch.Tensor([2.0,3.0,4.0])
print(data)
Output:
tensor([2., 3., 4.])
Two-dimensional tensor
t = torch.tensor([[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0]])
print(t)
Output:
tensor([[1., 2., 3.],
[4., 5., 6.]])
Multi-dimensional tensor
t = torch.tensor([[[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10]],
[[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]]])
print(t)
Output:
tensor([[[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10]],
[[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]]])
1.2.1.2 Creating a tensor with a given shape
import torch
m, n = 2, 3
zeros_Tensor = torch.zeros([m, n]) # Tensor of shape [m, n] filled with 0
ones_Tensor = torch.ones([m, n]) # Tensor of shape [m, n] filled with 1
full_Tensor = torch.full([m, n], 10) # Tensor of shape [m, n] filled with the given value 10
print('zeros Tensor: ', zeros_Tensor)
print('ones Tensor: ', ones_Tensor)
print('full Tensor: ', full_Tensor)
Output:
zeros Tensor: tensor([[0., 0., 0.],
[0., 0., 0.]])
ones Tensor: tensor([[1., 1., 1.],
[1., 1., 1.]])
full Tensor: tensor([[10, 10, 10],
[10, 10, 10]])
1.2.1.3 Creating a tensor over a given interval
import torch
arange_Tensor = torch.arange(1, 6, 2) # 1-D Tensor of values taken from the half-open interval [start=1, end=6) with step=2
linspace_Tensor = torch.linspace(1, 5, 5) # Tensor of num=5 evenly spaced values over the closed interval [start=1, stop=5]
print('arange Tensor: ', arange_Tensor)
print('linspace Tensor: ', linspace_Tensor)
Output:
arange Tensor: tensor([1, 3, 5])
linspace Tensor: tensor([1., 2., 3., 4., 5.])
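The difference between the two interval conventions can be sketched in plain Python. The helpers `arange_values` and `linspace_values` are hypothetical names that mirror the documented behaviour for a positive step; they are not how PyTorch implements these functions.

```python
import math


def arange_values(start, end, step):
    """Values start, start + step, ... strictly below end (half-open interval)."""
    count = max(0, math.ceil((end - start) / step))
    return [start + i * step for i in range(count)]


def linspace_values(start, stop, num):
    """num evenly spaced values with both endpoints included (num >= 2)."""
    gap = (stop - start) / (num - 1)
    return [start + i * gap for i in range(num)]


print(arange_values(1, 6, 2))    # [1, 3, 5]
print(linspace_values(1, 5, 5))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

With arange the caller fixes the step and the number of elements follows from it; with linspace the caller fixes the number of elements and the step follows.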
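1.2.2 Tensor Attributes
1.2.2.1 Tensor shape
A tensor has the following shape attributes:
<1>tensor.ndim: the number of dimensions of the tensor; for example, a vector has 1 dimension and a matrix has 2.
<2>tensor.shape: the number of elements along each dimension of the tensor.
<3>tensor.shape[n]: the size of the n-th dimension. The n-th dimension is also called an axis.
<4>tensor.size(): a method that returns the shape of the tensor; the return value is a torch.Size, which is a subclass of tuple.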
import torch
ndim_4_Tensor = torch.ones([2, 3, 4, 5])
print("Number of dimensions:", ndim_4_Tensor.ndim)
print("Shape of Tensor:", ndim_4_Tensor.shape)
print("Elements number along axis 0 of Tensor:", ndim_4_Tensor.shape[0])
print("Elements number along the last axis of Tensor:", ndim_4_Tensor.shape[-1])
print('Size of Tensor: ', ndim_4_Tensor.size()) # size() is a method and must be called; it returns the shape
print('Number of elements in Tensor: ', ndim_4_Tensor.numel()) # total number of elements
Output:
Number of dimensions: 4
Shape of Tensor: torch.Size([2, 3, 4, 5])
Elements number along axis 0 of Tensor: 2
Elements number along the last axis of Tensor: 5
Size of Tensor: torch.Size([2, 3, 4, 5])
Number of elements in Tensor: 120
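A short sketch of how these shape queries relate to one another. It assumes Python 3.8 or later for `math.prod`; the variable name `t` is illustrative.

```python
import math
import torch

t = torch.ones([2, 3, 4, 5])

# size() is a method, shape is an attribute; both give the same torch.Size.
print(t.size() == t.shape)              # True
print(isinstance(t.shape, tuple))       # True: torch.Size subclasses tuple
print(t.size(1))                        # size of axis 1 -> 3
print(math.prod(t.shape) == t.numel())  # True: 2 * 3 * 4 * 5 = 120
```

Writing `t.size` without parentheses only refers to the bound method and does not return the shape.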
1.2.2.2 Changing the shape
import torch
ndim_3_Tensor = torch.tensor([[[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10]],
[[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]],
[[21, 22, 23, 24, 25],
[26, 27, 28, 29, 30]]]) # define a 3-D Tensor with shape [3, 2, 5]
print("the shape of ndim_3_Tensor:", ndim_3_Tensor.shape)
reshape_Tensor = torch.reshape(ndim_3_Tensor, [2, 5, 3]) # torch.reshape changes the shape while leaving the data unchanged
print("After reshape:", reshape_Tensor)
Output:
the shape of ndim_3_Tensor: torch.Size([3, 2, 5])
After reshape: tensor([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]],
[[16, 17, 18],
[19, 20, 21],
[22, 23, 24],
[25, 26, 27],
[28, 29, 30]]])
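Two properties of reshape are worth seeing in code: a dimension given as -1 is inferred from the element count, and the row-major element order is preserved. The sketch below rebuilds the same values with `torch.arange` rather than a literal; that shortcut is an assumption made for brevity.

```python
import torch

t = torch.arange(1, 31).reshape(3, 2, 5)  # same values as ndim_3_Tensor

# -1 asks PyTorch to infer that dimension from the total element count.
r = t.reshape(-1, 6)
print(r.shape)  # torch.Size([5, 6])

# Reshaping keeps the row-major element order.
print(torch.equal(t.flatten(), r.flatten()))  # True

# The element count must match, otherwise reshape raises an error.
try:
    t.reshape(4, 8)  # 32 elements requested, only 30 available
except RuntimeError as err:
    print("reshape failed:", type(err).__name__)
```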
1.2.2.3 Tensor data type
print("Tensor dtype from Python integers:", torch.tensor(1).dtype)
print("Tensor dtype from Python floating point:", torch.tensor(1.0).dtype)
Output:
Tensor dtype from Python integers: torch.int64
Tensor dtype from Python floating point: torch.float32
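Beyond the inferred defaults, the dtype can be set explicitly or converted afterwards. A minimal sketch, assuming the default floating-point dtype has not been changed from torch.float32:

```python
import torch

a = torch.tensor([1, 2, 3], dtype=torch.float64)  # request a dtype explicitly
b = torch.tensor([1, 2, 3]).to(torch.float32)     # convert an existing tensor
c = torch.tensor([1, 2, 3]) + 0.5                 # int64 tensor + Python float promotes to float32

print(a.dtype, b.dtype, c.dtype)
```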
1.2.2.4 Tensor device
import torch
ndim_4_Tensor = torch.tensor([2, 3, 4, 5])
print(ndim_4_Tensor.device)
Output:
cpu
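A tensor can be moved to another device with `.to(...)`. The sketch below picks a GPU only when one is available, so the printed device depends on the machine it runs on.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.tensor([2, 3, 4, 5])
t_dev = t.to(device)      # returns a tensor on the target device
print(t_dev.device.type)  # "cuda" or "cpu", depending on the machine
```

Tensors that take part in the same operation generally need to live on the same device.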
1.2.3 Converting between Tensors and NumPy Arrays
import torch
ndim_1_Tensor = torch.tensor([1., 2.])
print('Tensor to convert: ', ndim_1_Tensor.numpy()) # convert the Tensor to a numpy.ndarray
Output:
Tensor to convert: [1. 2.]
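The reverse direction, from a NumPy array to a tensor, can either share memory or copy. A minimal sketch of the difference (the variable names are illustrative):

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0])
t = torch.from_numpy(arr)   # shares memory with arr; dtype float64 is kept
arr[0] = 10.0               # the change is visible through the tensor
print(t)

t_copy = torch.tensor(arr)  # torch.tensor copies the data instead
arr[1] = 20.0               # does not affect the copy
print(t_copy)
```

On CPU tensors, `.numpy()` shares memory in the same way, so an in-place change on either side shows up on the other.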
1.2.4 Accessing Tensors
1.2.4.1 Indexing and slicing
import torch
data = torch.Tensor([[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0]])
print(data[1]) # indexing
print(data[0:1]) # slicing
Output:
tensor([4., 5., 6.])
tensor([[1., 2., 3.]])
1.2.4.2 Accessing elements
print(data[0:2, 1:3])
Output:
tensor([[2., 3.],
[5., 6.]])
1.2.4.3 Modifying a tensor
data[1,0]=7
print(data)
Output:
tensor([[1., 2., 3.],
[7., 5., 6.]])
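One detail that matters when modifying tensors is that basic indexing and slicing return views that share storage with the original. A minimal sketch, using `clone()` when an independent copy is wanted:

```python
import torch

data = torch.tensor([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

row = data[0]           # basic indexing returns a view, not a copy
row[0] = 100.0          # so this also changes data[0, 0]
print(data[0, 0])

safe = data[1].clone()  # clone() makes an independent copy
safe[0] = -1.0
print(data[1, 0])       # still 4.0
```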
1.2.5 Tensor Operations
1.2.5.1 Mathematical operations
import torch
x = torch.tensor([[1., 2.], [3., 4.]])
y = torch.tensor([[5., 6.], [7., 8.]])
print("Addition:", x + y)
print("Subtraction:", x - y)
print("Element-wise multiplication:", x * y)
print("Division:", x / y)
print("Power:", x**y)
print("Logarithm:", torch.log(x))
print("Square root:", torch.sqrt(y))
Output:
Addition: tensor([[ 6., 8.],
[10., 12.]])
Subtraction: tensor([[-4., -4.],
[-4., -4.]])
Element-wise multiplication: tensor([[ 5., 12.],
[21., 32.]])
Division: tensor([[0.2000, 0.3333],
[0.4286, 0.5000]])
Power: tensor([[1.0000e+00, 6.4000e+01],
[2.1870e+03, 6.5536e+04]])
Logarithm: tensor([[0.0000, 0.6931],
[1.0986, 1.3863]])
Square root: tensor([[2.2361, 2.4495],
[2.6458, 2.8284]])
1.2.5.2 Logical operations
import torch
x = torch.tensor([[True, True], [True, True]])
y = torch.tensor([[False, False], [False, False]])
print(x & y) # AND
print(x | y) # OR
print(~x) # NOT
print(x ^ y) # XOR
print(torch.eq(x, y)) # element-wise equality
print(torch.equal(x, y)) # whether the two tensors are equal as a whole
Output:
tensor([[False, False],
[False, False]])
tensor([[True, True],
[True, True]])
tensor([[False, False],
[False, False]])
tensor([[True, True],
[True, True]])
tensor([[False, False],
[False, False]])
False
1.2.5.3 Matrix operations
import torch
x = torch.arange(12, dtype=torch.float32).reshape((4, 3))
y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
print("Matrix product:", x.matmul(y))
Output:
Matrix product: tensor([[ 9., 8., 7., 6.],
[30., 26., 34., 30.],
[51., 44., 61., 54.],
[72., 62., 88., 78.]])
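To show what the matrix product computes, the sketch below re-implements it with nested lists and reproduces the result for the same input values. The helper `matmul` is a hypothetical plain-Python function written for this illustration, not the PyTorch implementation.

```python
def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix given as nested lists."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(len(b[0]))]
            for row in a]


x = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]  # arange(12) reshaped to (4, 3)
y = [[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]]      # shape (3, 4)

product = matmul(x, y)
print(product[0])  # first row: [9, 8, 7, 6]
```

A (4, 3) matrix times a (3, 4) matrix gives a (4, 4) result; each entry is the dot product of a row of x with a column of y.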
1.2.5.4 Broadcasting
import torch
x = torch.ones((2, 3, 4))
y = torch.ones((2, 3, 4)) # two Tensors with identical shapes can always be broadcast
z = x + y
print('broadcasting with two same shape tensor: ', z.shape)
x = torch.ones((2, 3, 1, 5))
y = torch.ones((3, 4, 1))
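# Compare the dimensions one by one, from the last to the first:
# 1st comparison: y has size 1 in this dimension
# 2nd comparison: x has size 1 in this dimension
# 3rd comparison: x and y have the same size, 3
# 4th comparison: y has no such dimension
# So x and y can be broadcast
z = x + y
print('broadcasting with two different shape tensor:', z.shape)
Output: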
broadcasting with two same shape tensor: torch.Size([2, 3, 4])
broadcasting with two different shape tensor: torch.Size([2, 3, 4, 5])
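The comparison rule can be written as a short function that computes the broadcast shape without creating any tensors. `broadcast_shape` is a hypothetical helper for illustration and ignores edge cases such as zero-sized dimensions.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the last axis; a missing axis counts as size 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        result.append(max(a, b))
    return tuple(reversed(result))


print(broadcast_shape((2, 3, 1, 5), (3, 4, 1)))  # (2, 3, 4, 5)
```

Two dimensions are compatible when they are equal or when one of them is 1; any other pair makes the shapes incompatible.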
III. Data Preprocessing with PyTorch
1. Reading the datasets house_tiny.csv, boston_house_prices.csv, and Iris.csv
import pandas as pd
house_tiny_path = './house_tiny.csv'
boston_house_prices_path = './boston_house_prices.csv'
Iris_path = './Iris.csv'
house_tiny_data = pd.read_csv(house_tiny_path)
boston_house_prices_data = pd.read_csv(boston_house_prices_path)
Iris_data = pd.read_csv(Iris_path)
print(house_tiny_data)
print(boston_house_prices_data)
Output:
   NumRooms Alley   Price
0       NaN  Pave  127500
1       2.0   NaN  106000
2       4.0   NaN  178100
3       NaN   NaN  140000
        CRIM    ZN  INDUS  CHAS    NOX  ...  RAD  TAX  PTRATIO  LSTAT  MEDV
0    0.00632  18.0   2.31     0  0.538  ...    1  296     15.3   4.98  24.0
1    0.02731   0.0   7.07     0  0.469  ...    2  242     17.8   9.14  21.6
2    0.02729   0.0   7.07     0  0.469  ...    2  242     17.8   4.03  34.7
3    0.03237   0.0   2.18     0  0.458  ...    3  222     18.7   2.94  33.4
4    0.06905   0.0   2.18     0  0.458  ...    3  222     18.7   5.33  36.2
..       ...   ...    ...   ...    ...  ...  ...  ...      ...    ...   ...
501  0.06263   0.0  11.93     0  0.573  ...    1  273     21.0   9.67  22.4
502  0.04527   0.0  11.93     0  0.573  ...    1  273     21.0   9.08  20.6
503  0.06076   0.0  11.93     0  0.573  ...    1  273     21.0   5.64  23.9
504  0.10959   0.0  11.93     0  0.573  ...    1  273     21.0   6.48  22.0
505  0.04741   0.0  11.93     0  0.573  ...    1  273     21.0   7.88  11.9
[506 rows x 13 columns]
2. Handling missing values
import pandas as pd
house_tiny_path = './house_tiny.csv'
boston_house_prices_path = './boston_house_prices.csv'
Iris_path = './Iris.csv'
house_tiny_data = pd.read_csv(house_tiny_path)
boston_house_prices_data = pd.read_csv(boston_house_prices_path)
Iris_data = pd.read_csv(Iris_path)
X = house_tiny_data.iloc[:, 0:2]
print(X)
y = house_tiny_data.iloc[:, 2]
print(y)
X = X.fillna(X.mean(numeric_only=True)) # fill numeric NaNs with the column mean; numeric_only skips the text column Alley
print(X)
Output:
   NumRooms Alley
0       NaN  Pave
1       2.0   NaN
2       4.0   NaN
3       NaN   NaN
0    127500
1    106000
2    178100
3    140000
Name: Price, dtype: int64
   NumRooms Alley
0       3.0  Pave
1       2.0   NaN
2       4.0   NaN
3       3.0   NaN
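Mean imputation only fills the numeric column, so Alley still contains NaN afterwards. One possible way to treat both columns is sketched below. The CSV text is an assumption reconstructed from the printed DataFrame, and the placeholder category "Unknown" is an illustrative choice rather than part of the original experiment.

```python
import io
import pandas as pd

# Assumed file contents, reconstructed from the printed DataFrame.
csv_text = """NumRooms,Alley,Price
NA,Pave,127500
2,NA,106000
4,NA,178100
NA,NA,140000
"""
data = pd.read_csv(io.StringIO(csv_text))
X = data.iloc[:, 0:2]

# Numeric column: fill with the column mean.
X = X.fillna({"NumRooms": X["NumRooms"].mean()})
# Categorical column: fill with an explicit placeholder category (an assumption).
X = X.fillna({"Alley": "Unknown"})
print(X)
```

Passing a dictionary to `fillna` applies a different fill value to each named column.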
3. Converting to tensor format
import pandas as pd
import torch
import numpy as np
house_tiny_path = './house_tiny.csv'
boston_house_prices_path = './boston_house_prices.csv'
Iris_path = './Iris.csv'
house_tiny_data = pd.read_csv(house_tiny_path)
boston_house_prices_data = pd.read_csv(boston_house_prices_path)
Iris_data = pd.read_csv(Iris_path)
X = house_tiny_data.iloc[:, 0:2]
y = house_tiny_data.iloc[:, 2]
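# NumRooms still contains NaN here because the fillna step is not repeated in this script
X = pd.get_dummies(X, dummy_na=True, dtype=float) # one-hot encode Alley; dtype=float keeps X.values numeric on recent pandas versions
X_tensor, y_tensor = torch.Tensor(X.values), torch.Tensor(y.values)
print(X_tensor, y_tensor)
Output: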
tensor([[nan, 1., 0.],
[2., 0., 1.],
[4., 0., 1.],
[nan, 0., 1.]]) tensor([127500., 106000., 178100., 140000.])
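The nan entries appear because this script converts the data without first filling the missing NumRooms values. The sketch below chains both steps, imputation and then one-hot encoding, before building the tensors. The inline DataFrame is an assumption that stands in for house_tiny.csv, reconstructed from the output shown earlier.

```python
import pandas as pd
import torch

# Assumed contents of house_tiny.csv, reconstructed from the printed output.
data = pd.DataFrame({
    "NumRooms": [float("nan"), 2.0, 4.0, float("nan")],
    "Alley": ["Pave", None, None, None],
    "Price": [127500, 106000, 178100, 140000],
})
X, y = data.iloc[:, 0:2], data.iloc[:, 2]

X = X.fillna(X.mean(numeric_only=True))            # NumRooms NaN -> mean of 2.0 and 4.0
X = pd.get_dummies(X, dummy_na=True, dtype=float)  # Alley -> Alley_Pave, Alley_nan

X_tensor = torch.tensor(X.to_numpy(), dtype=torch.float32)
y_tensor = torch.tensor(y.to_numpy(), dtype=torch.float32)
print(X_tensor)
print(y_tensor)
```

Requesting `dtype=torch.float32` explicitly avoids relying on the implicit conversion that `torch.Tensor(...)` performs.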
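Summary
This experiment aims to build an understanding of tensors and operators, and to practise using PyTorch flexibly for tensor operations and data preprocessing. The focus is on understanding, mastering, and applying the Tensor.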