pytorch笔记篇:pandas之数据预处理(更新中)

pytorch笔记篇:pandas之数据预处理(更新中)

测试例代码

print(train_data.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])
# (1) 为什么test_data的列最后不是-1,是因为test_data没有价格这个列项
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
print('-----------------------------------------------')
print(all_features.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]])

# (2) 获取到不是数值的列index]
numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index

# print('++++++++++++++++++++++++')
# (3) print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
# print('----------------------')
all_features[numeric_features] = all_features[numeric_features].apply(lambda x: (x - x.mean()) / (x.std()))
# print(all_features[numeric_features].iloc[0:3, [0,1,2,3,-3,-2,-1]])
# input()

# (4) 在标准化数据之后,所有均值消失,因此我们可以将缺失值设置为0
all_features[numeric_features] = all_features[numeric_features].fillna(0)

# (5) dummies & pd to tensor
print('++++++++++  demo test dummies  +++++++++++')
test = pd.DataFrame({'“x”':[1,2,3,4,5, 6], "seasion":['here', 'over', '', 'next', '', 'here']})
print(test)
print('-------------------------------')
test = pd.get_dummies(test, dummy_na=True)
print(test)
test = test*1
print(test)
print('++++++++++  test trans to tensor  +++++++++++')
# test1 = torch.tensor(test)
# 全部转化
test1 = torch.tensor(test.values, dtype=torch.float32)
print(test1.shape)
print(test1)
print('-------------------------------')
# 不用iloc的话就是光是行处理
test2 = torch.tensor(test[:3].values, dtype=torch.float32)
print(test2.shape)
print(test2)
print('-------------------------------')
# 特定行列转化需要熟练运动iloc
test3 = torch.tensor(test.iloc[:2, :-1].values, dtype=torch.float32)
print(test3.shape)
print(test3)
input()

output-begin:
(1460, 81)
(1459, 80)
   Id  MSSubClass MSZoning  LotFrontage SaleType SaleCondition  SalePrice
0   1          60       RL         65.0       WD        Normal     208500
1   2          20       RL         80.0       WD        Normal     181500
2   3          60       RL         68.0       WD        Normal     223500
3   4          70       RL         60.0       WD       Abnorml     140000
-----------------------------------------------
   MSSubClass MSZoning  LotFrontage  LotArea  YrSold SaleType SaleCondition
0          60       RL         65.0     8450    2008       WD        Normal
1          20       RL         80.0     9600    2007       WD        Normal
2          60       RL         68.0    11250    2008       WD        Normal
3          70       RL         60.0     9550    2006       WD       Abnorml
++++++++++  demo test dummies  +++++++++++
   “x” seasion
0    1    here
1    2    over
2    3        
3    4    next
4    5        
5    6    here
-------------------------------
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1     False          True         False         False        False
1    2     False         False         False          True        False
2    3      True         False         False         False        False
3    4     False         False          True         False        False
4    5      True         False         False         False        False
5    6     False          True         False         False        False
   “x”  seasion_  seasion_here  seasion_next  seasion_over  seasion_nan
0    1         0             1             0             0            0
1    2         0             0             0             1            0
2    3         1             0             0             0            0
3    4         0             0             1             0            0
4    5         1             0             0             0            0
5    6         0             1             0             0            0
++++++++++  test trans to tensor  +++++++++++
torch.Size([6, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.],
        [4., 0., 0., 1., 0., 0.],
        [5., 1., 0., 0., 0., 0.],
        [6., 0., 1., 0., 0., 0.]])
-------------------------------
torch.Size([3, 6])
tensor([[1., 0., 1., 0., 0., 0.],
        [2., 0., 0., 0., 1., 0.],
        [3., 1., 0., 0., 0., 0.]])
-------------------------------
torch.Size([2, 5])
tensor([[1., 0., 1., 0., 0.],
        [2., 0., 0., 0., 1.]])
output-end

相关的算子

concat — 合并.
iloc — 筛选行列.
apply — 处理列数据.
fillna — 填补数值空缺.
get_dummies — 独热编码(自行测试显示)

PS: 略。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值