Data Processing (April 28, 2024)

stdio10t

Last modified: 2024-05-13 09:50:23

Category: pytorch入门 (Getting Started with PyTorch). Tags: pytorch

First published: 2024-04-28 17:04:32

Article link: https://blog.csdn.net/weixin_45772086/article/details/138283761

Reading the dataset

Creating the path

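import os
# os.path.join builds a path by joining its arguments with the platform's separator (/ or \),
# so the same code works on any operating system
# os.makedirs creates the directory (and any missing parents); exist_ok=True means no error if it already exists
os.makedirs(os.path.join('..', 'data'), exist_ok=True)
data_file = os.path.join('..', 'data', 'house_tiny.csv')
# Open the file for writing; the with block closes it automatically
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # column names
    f.write('NA,Pave,127500\n')  # each row is one data sample
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')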
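As a quick check of the steps above, here is a minimal sketch that repeats them inside a temporary directory and reads the file back with the standard `csv` module. The temporary directory and the helper names `write_house_csv` and `read_back` are assumptions made for illustration; they are not part of the original code.

```python
import csv
import os
import tempfile


def write_house_csv(base_dir):
    """Create <base_dir>/data and write the tiny housing CSV into it."""
    data_dir = os.path.join(base_dir, 'data')
    os.makedirs(data_dir, exist_ok=True)
    # A second call is harmless because exist_ok=True suppresses the error.
    os.makedirs(data_dir, exist_ok=True)
    data_file = os.path.join(data_dir, 'house_tiny.csv')
    with open(data_file, 'w') as f:
        f.write('NumRooms,Alley,Price\n')
        f.write('NA,Pave,127500\n')
        f.write('2,NA,106000\n')
        f.write('4,NA,178100\n')
        f.write('NA,NA,140000\n')
    return data_file


def read_back(path):
    """Return the CSV contents as a list of rows (each row a list of strings)."""
    with open(path, newline='') as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = write_house_csv(tmp)
    rows = read_back(path)

print(rows[0])
print(len(rows) - 1, 'data rows')
```

Without `exist_ok=True`, the second `os.makedirs` call would raise `FileExistsError`.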

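Handling missing values

import pandas as pd
# Load the CSV written above; pandas parses 'NA' as NaN by default
data = pd.read_csv(data_file)

# iloc = integer-location indexing: the first two columns are the inputs, the third is the target
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
# Fill missing numeric values with the column mean; numeric_only=True skips the text column Alley
inputs = inputs.fillna(inputs.mean(numeric_only=True))
print(inputs)

# get_dummies maps categorical columns to one-hot encoding; dummy_na=True adds a separate column for NaN
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)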
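To see what the two steps do to the four sample rows, here is a self-contained sketch that reads the same CSV text from memory rather than from disk. The use of `io.StringIO` and the `dtype=float` argument to `get_dummies` are choices made for this illustration, not something the original code does.

```python
import io

import pandas as pd

csv_text = (
    'NumRooms,Alley,Price\n'
    'NA,Pave,127500\n'
    '2,NA,106000\n'
    '4,NA,178100\n'
    'NA,NA,140000\n'
)
data = pd.read_csv(io.StringIO(csv_text))  # 'NA' is parsed as NaN by default

inputs = data.iloc[:, 0:2]

# Only NumRooms is numeric, so only it is filled: mean of 2 and 4 is 3.0.
inputs = inputs.fillna(inputs.mean(numeric_only=True))

# Alley is text, so it is one-hot encoded; dummy_na=True adds the Alley_nan column.
inputs = pd.get_dummies(inputs, dummy_na=True, dtype=float)

print(inputs)
```

After these steps every column is numeric, which is what the tensor conversion needs.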

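Converting to tensors

import torch
# to_numpy converts the DataFrame/Series to a NumPy array; torch.tensor builds a tensor from it
X = torch.tensor(inputs.to_numpy(dtype=float))
y = torch.tensor(outputs.to_numpy(dtype=float))
X, y  # in a notebook, this displays both tensors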
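One detail worth knowing about this conversion: a NumPy array created with `dtype=float` is 64-bit, and `torch.tensor` keeps that dtype, whereas PyTorch's own default floating-point type is 32-bit. The sketch below shows the difference on a small made-up array (the values are illustrative only); whether you need the cast depends on what you feed the tensor into.

```python
import numpy as np
import torch

# Hypothetical feature matrix, standing in for inputs.to_numpy(dtype=float).
features = np.array([[3.0, 1.0, 0.0],
                     [2.0, 0.0, 1.0]], dtype=float)

X64 = torch.tensor(features)                       # keeps NumPy's float64
X32 = torch.tensor(features, dtype=torch.float32)  # explicit cast to float32

print(X64.dtype, X32.dtype, X32.shape)
```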

Functions covered

os.makedirs
os.path.join
with open(data_file, 'w') as f:
	f.write
data.iloc[ ]
fillna
pd.get_dummies
torch.tensor
inputs.to_numpy

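Error encountered: AttributeError: module 'numpy.random' has no attribute 'Generator'

Cause: I installed pandas and numpy with a plain pip install, without pinning versions. A recent pandas needs a correspondingly recent numpy; numpy.random.Generator only exists in NumPy 1.17 and later, so this error means the installed numpy was older than pandas expected. In my Python 3.9 environment I could not install the numpy version I specified, so I had to roll back both pandas and numpy to matching versions. The lesson: before installing a package, check which versions are compatible with your Python version and with each other.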
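A quick way to check for this kind of mismatch is to print the installed versions before importing anything. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper names `parse_version`, `installed_version`, and `has_generator` are hypothetical, and the version parser is deliberately simple rather than a full implementation of Python's version rules.

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_generator(numpy_version):
    """numpy.random.Generator was added in NumPy 1.17."""
    return parse_version(numpy_version) >= (1, 17)


for name in ('numpy', 'pandas'):
    print(name, installed_version(name))
```

If `has_generator(installed_version('numpy'))` is false, upgrading numpy (or installing an older pandas that matches it) is the likely fix.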
