PyTorch Notes: Data Preprocessing with pandas (work in progress)
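Test code
import pandas as pd
import torch

# Assumption: train_data and test_data are DataFrames already loaded from the train/test CSV files.
# The two shape prints correspond to the first two lines of the output below.
print(train_data.shape)
print(test_data.shape)
print(train_data.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])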
# (※1) test_data is sliced with 1: instead of 1:-1 because it has no SalePrice column
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
print('-----------------------------------------------')
print(all_features.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])
# (※2) Get the index of the numeric columns (dtype is not object)
numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
# print('++++++++++++++++++++++++')
# (※3) print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
# print('----------------------')
all_features[numeric_features] = all_features[numeric_features].apply(lambda x: (x - x.mean()) / (x.std()))
# print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
# input()
# (※4) After standardization every numeric feature has mean 0, so filling missing values with 0 is the same as filling with the mean
all_features[numeric_features] = all_features[numeric_features].fillna(0)
# (※5) One-hot encode with get_dummies, then convert the DataFrame to a tensor
print('++++++++++ demo test dummies +++++++++++')
test = pd.DataFrame({'“x”':[1,2,3,4,5, 6], "seasion":['here', 'over', '', 'next', '', 'here']})
print(test)
print('-------------------------------')
test = pd.get_dummies(test, dummy_na=True)
print(test)
# Multiply by 1 to turn the boolean dummy columns into 0/1 integers
test = test*1
print(test)
print('++++++++++ test trans to tensor +++++++++++')
# test1 = torch.tensor(test)
# Convert the whole DataFrame
test1 = torch.tensor(test.values, dtype=torch.float32)
print(test1.shape)
print(test1)
print('-------------------------------')
# Without iloc, a plain slice selects rows only
test2 = torch.tensor(test[:3].values, dtype=torch.float32)
print(test2.shape)
print(test2)
print('-------------------------------')
# Converting specific rows and columns requires iloc
test3 = torch.tensor(test.iloc[:2, :-1].values, dtype=torch.float32)
print(test3.shape)
print(test3)
input()
output-begin:
(1460, 81)
(1459, 80)
   Id  MSSubClass  MSZoning  LotFrontage  SaleType  SaleCondition  SalePrice
0   1          60        RL         65.0        WD         Normal     208500
1   2          20        RL         80.0        WD         Normal     181500
2   3          60        RL         68.0        WD         Normal     223500
3   4          70        RL         60.0        WD        Abnorml     140000
-----------------------------------------------
   MSSubClass  MSZoning  LotFrontage  LotArea  YrSold  SaleType  SaleCondition
0          60        RL         65.0     8450    2008        WD         Normal
1          20        RL         80.0     9600    2007        WD         Normal
2          60        RL         68.0    11250    2008        WD         Normal
3          70        RL         60.0     9550    2006        WD        Abnorml
++++++++++ demo test dummies +++++++++++
   “x”  seasion
0    1     here
1    2     over
2    3
3    4     next
4    5
5    6     here
-------------------------------
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1     False          True         False         False        False
1    2     False         False         False          True        False
2    3      True         False         False         False        False
3    4     False         False          True         False        False
4    5      True         False         False         False        False
5    6     False          True         False         False        False
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1         0             1             0             0            0
1    2         0             0             0             1            0
2    3         1             0             0             0            0
3    4         0             0             1             0            0
4    5         1             0             0             0            0
5    6         0             1             0             0            0
++++++++++ test trans to tensor +++++++++++
torch.Size([6, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.],
        [4., 0., 0., 1., 0., 0.],
        [5., 1., 0., 0., 0., 0.],
        [6., 0., 1., 0., 0., 0.]])
-------------------------------
torch.Size([3, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.]])
-------------------------------
torch.Size([2, 5])
tensor([[1., 0., 1., 0., 0.],
        [2., 0., 0., 0., 1.]])
output-end
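One detail in the output above: the empty strings get their own seasion_ column while seasion_nan stays all zeros, because pandas treats '' as an ordinary category and only real NaN values count as missing. The following is a minimal sketch of that difference, assuming the intent was to treat empty strings as missing. The toy frame and the names demo, as_is, cleaned and merged are made up for illustration and are not part of the original script.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one real category, one empty string, one true NaN
demo = pd.DataFrame({"x": [1, 2, 3], "season": ["here", "", np.nan]})

# As in the notes: '' becomes its own dummy column, only NaN lands in season_nan
as_is = pd.get_dummies(demo, dummy_na=True, dtype=int)
print(as_is)

# Assumption: empty strings should count as missing, so mask them to NaN first
cleaned = demo.assign(season=demo["season"].mask(demo["season"] == ""))
merged = pd.get_dummies(cleaned, dummy_na=True, dtype=int)
print(merged)
```

Passing dtype=int asks get_dummies for 0/1 integers directly, which is one alternative to the test*1 step used in the notes.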
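Related operations
concat: concatenates DataFrames (here, the train and test features).
iloc: selects rows and columns by integer position.
apply: applies a function to each column.
fillna: fills in missing values.
get_dummies: one-hot encoding (see the demo output above).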
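The first four operations can be seen working together on a tiny example. This is only a sketch under stated assumptions: the two toy frames and the column names area, zone and price are invented for illustration, and select_dtypes(include="number") is used here in place of the dtypes != 'object' test from the notes to pick the numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for train_data / test_data (test has no price column)
train = pd.DataFrame({"Id": [1, 2], "area": [10.0, 20.0],
                      "zone": ["RL", "RM"], "price": [100, 200]})
test = pd.DataFrame({"Id": [3, 4], "area": [30.0, np.nan],
                     "zone": ["RL", "RL"]})

# concat + iloc: drop Id (and price for train), then stack the feature rows
features = pd.concat((train.iloc[:, 1:-1], test.iloc[:, 1:]))

# Pick the numeric columns (alternative to the dtypes != 'object' test)
numeric_cols = features.select_dtypes(include="number").columns

# apply: standardize each numeric column to mean 0 and unit sample std
features[numeric_cols] = features[numeric_cols].apply(
    lambda col: (col - col.mean()) / col.std())

# fillna: after standardization the mean is 0, so 0 stands in for missing values
features[numeric_cols] = features[numeric_cols].fillna(0)
print(features)
```

Here the non-missing area values 10, 20, 30 have mean 20 and sample standard deviation 10, so they become -1, 0, 1, and the missing entry is filled with 0.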