Deep Learning and Neural Networks Study Notes (1): Data Manipulation and Preprocessing

Latest recommended article published on 2023-04-13 10:53:04

AC_maker

Category: Deep Learning and Neural Networks. Tags: deep learning, neural networks, learning, python

Article link: https://blog.csdn.net/GodWeiJia/article/details/126173571

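This article walks through data manipulation with PyTorch (creating tensors, indexing, and element-wise and matrix-style operations) and a data preprocessing workflow: creating and reading a CSV file, handling missing values, encoding categorical values, and converting the result to tensors. The focus is on getting data into a form that a deep learning model can be trained on.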

(1) Data Manipulation

1. Creating tensors

import torch

x0 = torch.arange(12)  # integers 0 to 11
print(x0)
print(x0.shape)  # shape of the tensor
print(x0.numel())  # total number of elements
x1 = x0.reshape(3, 4)  # change the shape to 3 rows by 4 columns
print(x1)

Output

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

torch.Size([12])

12

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
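As a small addition to the reshape example, one dimension can be given as -1 so that PyTorch infers it from the total number of elements. The sketch below is illustrative only; the variable names are not from the original notes.

```python
import torch

x = torch.arange(12)

# -1 asks PyTorch to infer that dimension: 12 elements / 4 columns = 3 rows
rows_inferred = x.reshape(-1, 4)
# Likewise, 12 elements / 3 rows = 4 columns
cols_inferred = x.reshape(3, -1)

print(rows_inferred.shape)
print(torch.equal(rows_inferred, cols_inferred))
```

Both calls produce the same 3 by 4 tensor, so -1 is mainly a convenience when only one of the two sizes is known in advance.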

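2. Indexing and slicing

print(x1[-1])  # last row
print(x1[1:3])  # rows 1 and 2 (start inclusive, end exclusive)
x1[1, 2] = 9  # assign a single element; same as x1[1][2] = 9
print(x1)
x1[0:2, :] = 12  # assign one value to a whole region (rows 0 and 1)
print(x1)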

Output

tensor([ 8,  9, 10, 11])

tensor([[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  9,  7],
        [ 8,  9, 10, 11]])

tensor([[12, 12, 12, 12],
        [12, 12, 12, 12],
        [ 8,  9, 10, 11]])
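One thing worth keeping in mind with in-place assignment: for a contiguous tensor such as the result of `torch.arange`, `reshape` returns a view where possible, so writing into the reshaped tensor also changes the original. The following is a minimal sketch of that behaviour; the names `base`, `view`, and `independent` are illustrative.

```python
import torch

base = torch.arange(12)
view = base.reshape(3, 4)  # a view of base for this contiguous tensor

view[1, 2] = 9  # flat index 1 * 4 + 2 = 6 in base
print(base[6])  # base has changed as well

# clone() makes an independent copy, so later writes do not touch base
independent = base.clone().reshape(3, 4)
independent[0, 0] = -1
print(base[0])
```

If the original values are still needed after slicing and assigning, cloning first is the safer choice.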

3. Creating tensors of zeros and ones

a0 = torch.zeros(3, 3, 4)  # three 3x4 matrices with all elements 0
a1 = torch.ones(2, 3, 4)  # two 3x4 matrices with all elements 1
print(a0)
print(a1)

Output

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
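Besides `torch.zeros` and `torch.ones`, a few related constructors may be useful in the same situations. This is an illustrative sketch, not part of the original notes; the variable names are made up for the example.

```python
import torch

template = torch.ones(2, 3, 4)

# Same shape and dtype as an existing tensor, filled with zeros
zeros_same_shape = torch.zeros_like(template)

# A tensor filled with an arbitrary constant
sevens = torch.full((2, 3), 7.0)

# Samples from a standard normal distribution (values differ on every run)
noise = torch.randn(3, 4)

print(zeros_same_shape.shape, sevens, noise.shape)
```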

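4. Element-wise operations

b = torch.tensor([1, 2, 3])  # create a tensor from given values
c = torch.tensor([2, 3, 4])
print(b + c)  # element-wise addition
print(b * c)  # element-wise multiplication
print(b == c)  # comparison operators produce a boolean tensor
print(b.sum())  # summing all elements yields a tensor holding a single value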

Output

tensor([3, 5, 7])

tensor([ 2,  6, 12])

tensor([False, False, False])

tensor(6)
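The element-wise operations above use tensors of the same shape. When shapes differ but are compatible, PyTorch applies broadcasting. The sketch below assumes a column of shape (3, 1) and a row of shape (1, 2), which are illustrative choices rather than values from the original notes.

```python
import torch

col = torch.arange(3).reshape(3, 1)  # [[0], [1], [2]]
row = torch.arange(2).reshape(1, 2)  # [[0, 1]]

# Both operands are expanded to shape (3, 2) before the addition
total = col + row
print(total)
```

Each entry of `total` is the sum of the corresponding row value of `col` and column value of `row`.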

5. Concatenating tensors

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [2, 2, 2, 3]])
X1 = torch.cat((X, Y), dim=0)  # dim=0 stacks the tensors along rows (result is 6x4)
X2 = torch.cat((X, Y), dim=1)  # dim=1 joins the tensors along columns (result is 3x8)
print(X1)
print(X2)

Output

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [ 2.,  1.,  4.,  3.],
        [ 1.,  2.,  3.,  4.],
        [ 2.,  2.,  2.,  3.]])

tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
        [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
        [ 8.,  9., 10., 11.,  2.,  2.,  2.,  3.]])
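`torch.cat` joins tensors along an existing dimension. A related function, `torch.stack`, joins them along a new dimension instead. The comparison below is a small sketch; `Y` here is a tensor of ones chosen only for illustration.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Y = torch.ones(3, 4)

joined = torch.cat((X, Y), dim=0)     # existing dimension grows: (6, 4)
stacked = torch.stack((X, Y), dim=0)  # new leading dimension: (2, 3, 4)

print(joined.shape)
print(stacked.shape)
```

`cat` suits appending more rows or columns, while `stack` suits grouping equally shaped tensors into a batch.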

6. Converting between torch and NumPy

A = X.numpy()  # convert the tensor to a NumPy array
B = torch.tensor(A)  # convert the NumPy array back to a tensor
print(type(A))
print(type(B))

Output

<class 'numpy.ndarray'>

<class 'torch.Tensor'>
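A detail that is easy to miss in these conversions is whether memory is shared. For CPU tensors, `Tensor.numpy()` and `torch.from_numpy` share the underlying buffer, while `torch.tensor(...)` makes a copy. The sketch below illustrates this with made-up variable names.

```python
import torch

t = torch.zeros(3)

shared = t.numpy()                  # shares memory with t (CPU tensor)
copied = torch.tensor(shared)       # independent copy of the data
from_np = torch.from_numpy(shared)  # shares memory with the array

shared[0] = 5.0  # write through the NumPy array

print(t)        # reflects the change
print(from_np)  # reflects the change
print(copied)   # unchanged
```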

7. Converting a tensor to a scalar

a = torch.tensor([3.5])  # a tensor of size 1 can be converted to a Python scalar
print(a)
print(a.item())
print(float(a))
print(int(a))

Output

tensor([3.5000])

3.5

3.5

3
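`item()` applies to tensors with exactly one element. For tensors with more elements, `tolist()` is one way to get plain Python values. This is a short illustrative sketch with values chosen for the example.

```python
import torch

values = torch.tensor([1.5, 2.5, 4.0])

as_list = values.tolist()     # nested Python lists / floats
single = values.sum().item()  # reduce to one element first, then item()

print(as_list)
print(single)
```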

(2) Data Preprocessing

1. Creating an artificial dataset (.csv)

import os

# Create a small artificial dataset and store it in a CSV (comma-separated values) file
os.makedirs(os.path.join('..', 'data'), exist_ok=True)
data_file = os.path.join('..', 'data', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # column names
    f.write('NA,Pave,127500\n')  # each following row is one sample
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')

2. Reading the raw .csv dataset

import pandas as pd

data = pd.read_csv(data_file)  # load the raw dataset from the CSV file created above
print(data)

Output

   NumRooms Alley   Price
0       NaN  Pave  127500
1       2.0   NaN  106000
2       4.0   NaN  178100
3       NaN   NaN  140000
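Before deciding how to treat the gaps, it can help to see how pandas parsed each column and how many entries are missing. The sketch below reads the same CSV text from an in-memory string (an assumption made so that the example runs without the file on disk).

```python
import io

import pandas as pd

# Same content as house_tiny.csv, held in memory for this example
csv_text = (
    "NumRooms,Alley,Price\n"
    "NA,Pave,127500\n"
    "2,NA,106000\n"
    "4,NA,178100\n"
    "NA,NA,140000\n"
)
data = pd.read_csv(io.StringIO(csv_text))

print(data.dtypes)  # NumRooms is parsed as float because it contains NaN
missing_counts = data.isna().sum()  # number of missing entries per column
print(missing_counts)
```

By default `read_csv` treats the string `NA` as a missing value, which is why it shows up as `NaN` in the DataFrame.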

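3. Handling missing data

# Typical ways to handle missing data are imputation and deletion; here we use imputation
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
# Fill missing numeric entries with the mean of the valid values in that column;
# numeric_only=True skips the string column Alley, which newer pandas would otherwise reject
inputs = inputs.fillna(inputs.mean(numeric_only=True))
print(inputs)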

Output

   NumRooms Alley
0       3.0  Pave
1       2.0   NaN
2       4.0   NaN
3       3.0   NaN
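The notes use imputation. For comparison, the deletion approach could look roughly like the sketch below, which drops the column with the most missing values. The data is recreated in memory so the example is self-contained, and the choice of "drop the worst column" is an assumption made for illustration.

```python
import io

import pandas as pd

csv_text = (
    "NumRooms,Alley,Price\n"
    "NA,Pave,127500\n"
    "2,NA,106000\n"
    "4,NA,178100\n"
    "NA,NA,140000\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Deleting every row that has a missing value would leave nothing here
rows_left = len(data.dropna())

# Alternative: delete only the column with the most missing entries
worst_column = data.isna().sum().idxmax()
reduced = data.drop(columns=worst_column)

print(rows_left)
print(worst_column)
print(reduced)
```

On such a small dataset row-wise deletion discards every sample, which is one reason imputation is the more practical choice in this walkthrough.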

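4. Handling categorical values

# One-hot encode the categorical column; dummy_na=True adds an indicator column for missing values
# dtype=int keeps the indicators as 0/1 (newer pandas defaults to True/False)
inputs = pd.get_dummies(inputs, dummy_na=True, dtype=int)
print(inputs)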

Output

   NumRooms  Alley_Pave  Alley_nan
0       3.0           1          0
1       2.0           0          1
2       4.0           0          1
3       3.0           0          1
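To see what `dummy_na=True` contributes, the sketch below encodes a tiny column with and without it. The second category `Grvl` is a hypothetical value added only for this illustration; it does not appear in the dataset above.

```python
import numpy as np
import pandas as pd

# "Grvl" is a made-up second category for illustration
frame = pd.DataFrame({"Alley": ["Pave", np.nan, "Grvl"]})

without_na = pd.get_dummies(frame, dtype=int)                # missing value becomes an all-zero row
with_na = pd.get_dummies(frame, dummy_na=True, dtype=int)    # missing value gets its own column

print(without_na)
print(with_na)
```

With `dummy_na=True`, "missing" is treated as one more category, so the model can distinguish a missing entry from any known value.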

5. Converting to tensors

# All entries are numeric now, so the data can be converted to tensors
X, Y = torch.tensor(inputs.to_numpy(dtype=float)), torch.tensor(outputs.to_numpy())
print(X)
print(Y)

Output

tensor([[3., 1., 0.],
        [2., 0., 1.],
        [4., 0., 1.],
        [3., 0., 1.]], dtype=torch.float64)

tensor([127500, 106000, 178100, 140000])
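The features come out as `torch.float64` because NumPy and pandas default to 64-bit floats, whereas PyTorch layers typically use `torch.float32` parameters. A possible follow-up step, sketched here with a small hand-written array standing in for the preprocessed features, is to cast the dtype explicitly.

```python
import numpy as np
import torch

# Stand-in for the preprocessed feature matrix (NumPy defaults to float64)
features = np.array([[3.0, 1.0, 0.0], [2.0, 0.0, 1.0]])

X64 = torch.tensor(features)                       # keeps float64
X32 = torch.tensor(features, dtype=torch.float32)  # cast while converting
X32_alt = X64.to(torch.float32)                    # cast an existing tensor

print(X64.dtype, X32.dtype, X32_alt.dtype)
```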
