Using .stack() in pandas

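Sometimes column (feature) names need to become the values of a variable, that is, the dataset has to be reshaped from wide to long format, loosely a kind of transpose. A typical use case: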

# The dataset (the None entries are the string 'None', not missing values)
In [5]: test
Out[5]: 
             tweet_id doggo floofer  pupper puppo
0  675003128568291329  None    None    None  None
1  786233965241827333  None    None    None  None
2  683481228088049664  None    None  pupper  None
3  675497103322386432  None    None    None  None
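
To follow along, the sample frame can be rebuilt by hand. This is a minimal sketch that assumes the `None` entries are the literal string `'None'`, which is what the later filter `s2.stage != 'None'` suggests; the source does not show how `test` was originally created.

```python
import pandas as pd

# Rebuild the four-row sample shown above.
# Assumption: 'None' is stored as a string, not as Python's None object.
test = pd.DataFrame({
    "tweet_id": [675003128568291329, 786233965241827333,
                 683481228088049664, 675497103322386432],
    "doggo": ["None"] * 4,
    "floofer": ["None"] * 4,
    "pupper": ["None", "None", "pupper", "None"],
    "puppo": ["None"] * 4,
})
print(test)
```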

# Set tweet_id as the index, call .stack() to go from wide to long, and name the resulting Series 'stage'
In [6]: s1 = test.set_index('tweet_id').stack().rename('stage')

In [7]: s1
Out[7]: 
tweet_id                   
675003128568291329  doggo        None
                    floofer      None
                    pupper       None
                    puppo        None
786233965241827333  doggo        None
                    floofer      None
                    pupper       None
                    puppo        None
683481228088049664  doggo        None
                    floofer      None
                    pupper     pupper
                    puppo        None
675497103322386432  doggo        None
                    floofer      None
                    pupper       None
                    puppo        None
Name: stage, dtype: object

# Reset the MultiIndex back into ordinary columns
In [8]: s2 = s1.reset_index()

In [9]: s2
Out[9]: 
              tweet_id  level_1   stage
0   675003128568291329    doggo    None
1   675003128568291329  floofer    None
2   675003128568291329   pupper    None
3   675003128568291329    puppo    None
4   786233965241827333    doggo    None
5   786233965241827333  floofer    None
6   786233965241827333   pupper    None
7   786233965241827333    puppo    None
8   683481228088049664    doggo    None
9   683481228088049664  floofer    None
10  683481228088049664   pupper  pupper
11  683481228088049664    puppo    None
12  675497103322386432    doggo    None
13  675497103322386432  floofer    None
14  675497103322386432   pupper    None
15  675497103322386432    puppo    None
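
The `level_1` column appears because the inner index level created by `.stack()` has no name. If that level is not needed, one option is to drop it during the reset instead of removing the column afterwards. The sketch below uses a small made-up frame (the two-row data is hypothetical, not from the post) to keep the output short.

```python
import pandas as pd

# Hypothetical two-row stand-in for the real dataset
test = pd.DataFrame({
    "tweet_id": [1, 2],
    "doggo": ["None", "doggo"],
    "pupper": ["pupper", "None"],
})

s1 = test.set_index("tweet_id").stack().rename("stage")

# Drop the unnamed inner level (the former column names),
# then turn tweet_id back into a regular column.
s2 = s1.reset_index(level=1, drop=True).reset_index()
print(s2)
```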

# Drop the level_1 column, then keep only the rows whose stage is not the string 'None'
In [10]: s2.drop(['level_1'], axis=1, inplace=True)

In [11]: s3 = s2[s2.stage != 'None']

In [12]: s3
Out[12]: 
              tweet_id   stage
10  683481228088049664  pupper

# Merge back into the original dataset
In [14]: result = pd.merge(test, s3, how='left', on='tweet_id')

In [15]: result
Out[15]: 
             tweet_id doggo floofer  pupper puppo   stage
0  675003128568291329  None    None    None  None     NaN
1  786233965241827333  None    None    None  None     NaN
2  683481228088049664  None    None  pupper  None  pupper
3  675497103322386432  None    None    None  None     NaN
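
One thing to watch with this left merge: in the sample each tweet has at most one stage, but if a tweet carried two labels the filtered frame would hold two rows for it and the merge would duplicate that tweet. The sketch below illustrates this with hypothetical data and shows one possible way to collapse the labels first; joining them with a comma is an arbitrary choice made for the example.

```python
import pandas as pd

# Hypothetical data: tweet 2 carries two stage labels at once
test = pd.DataFrame({
    "tweet_id": [1, 2],
    "doggo": ["None", "doggo"],
    "pupper": ["None", "pupper"],
})

stages = (
    test.set_index("tweet_id").stack().rename("stage")
    .reset_index(level=1, drop=True).reset_index()
)
stages = stages[stages["stage"] != "None"]

# tweet_id 2 appears twice here, once per matching stage
result = pd.merge(test, stages, how="left", on="tweet_id")
print(result)

# One possible fix: combine the labels per tweet before merging
combined = stages.groupby("tweet_id")["stage"].agg(", ".join).reset_index()
result_unique = pd.merge(test, combined, how="left", on="tweet_id")
print(result_unique)
```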

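# Drop the intermediate columns to get the final result
# (the call below is not assigned back, so result and test themselves stay unchanged)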
In [16]: result.drop(['doggo','floofer','pupper','puppo'], axis=1)
Out[16]: 
             tweet_id   stage
0  675003128568291329     NaN
1  786233965241827333     NaN
2  683481228088049664  pupper
3  675497103322386432     NaN

In [17]: test
Out[17]: 
             tweet_id doggo floofer  pupper puppo
0  675003128568291329  None    None    None  None
1  786233965241827333  None    None    None  None
2  683481228088049664  None    None  pupper  None
3  675497103322386432  None    None    None  None

There is probably a simpler way to do this. To be added later.
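
One candidate for a shorter route is `DataFrame.melt`, which performs the wide-to-long reshape in a single call and leaves no MultiIndex to reset. This is only a sketch under the same assumption as before (the placeholders are the string `'None'`), and not necessarily the method the author had in mind; the names `stage_cols`, `long_df`, and `stages` are introduced here for illustration.

```python
import pandas as pd

test = pd.DataFrame({
    "tweet_id": [675003128568291329, 786233965241827333,
                 683481228088049664, 675497103322386432],
    "doggo": ["None"] * 4,
    "floofer": ["None"] * 4,
    "pupper": ["None", "None", "pupper", "None"],
    "puppo": ["None"] * 4,
})

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Wide to long: one row per (tweet_id, original column) pair
long_df = test.melt(id_vars="tweet_id", value_vars=stage_cols, value_name="stage")

# Keep only real stage labels, then attach them to the list of tweet ids
stages = long_df.loc[long_df["stage"] != "None", ["tweet_id", "stage"]]
result = test[["tweet_id"]].merge(stages, how="left", on="tweet_id")
print(result)
```

Starting the merge from `test[["tweet_id"]]` rather than from `test` means the four intermediate columns never need to be dropped afterwards.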
