使用cut分箱操作,创建二值响应变量

import pandas as pd
d=pd.read_csv('D:/pandas活用/pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)
print('@'*66)
print(d.head())
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
       'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
       'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
       'HeatingFuel', 'Insurance', 'Language'],
      dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
  Acres  FamilyIncome   FamilyType  NumBedrooms  NumChildren  NumPeople  \
0  1-10           150      Married            4            1          3   
1  1-10           180  Female Head            3            2          4   
2  1-10           280  Female Head            4            0          2   
3  1-10           330  Female Head            2            1          2   
4  1-10           330    Male Head            3            1          2   

   NumRooms         NumUnits  NumVehicles  NumWorkers   OwnRent    YearBuilt  \
0         9  Single detached            1           0  Mortgage    1950-1959   
1         6  Single detached            2           0    Rented  Before 1939   
2         8  Single detached            3           1  Mortgage    2000-2004   
3         4  Single detached            1           0    Rented    1950-1959   
4         5  Single attached            1           0  Mortgage  Before 1939   

   HouseCosts  ElectricBill FoodStamp HeatingFuel  Insurance        Language  
0        1800            90        No         Gas       2500         English  
1         850            90        No         Oil          0         English  
2        2600           260        No         Oil       6600  Other European  
3        1800           140        No         Oil          0         English  
4         860           150        No         Gas        660         Spanish  

以下对FamilyIncome 进行分箱操作:

#其中指定要进行分箱操作的列,指定收入在范围为0-150000的为0,150000到收入的最大值范围之间的为1,标签labels使用列表传入值,也可以指定字符串作为标签
d['income_15w']=pd.cut(d['FamilyIncome'],[0,150000,d['FamilyIncome'].max()],labels=[0,1])
print(d.info())
print(d['income_15w'].value_counts())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22745 entries, 0 to 22744
Data columns (total 19 columns):
 #   Column        Non-Null Count  Dtype   
---  ------        --------------  -----   
 0   Acres         22745 non-null  object  
 1   FamilyIncome  22745 non-null  int64   
 2   FamilyType    22745 non-null  object  
 3   NumBedrooms   22745 non-null  int64   
 4   NumChildren   22745 non-null  int64   
 5   NumPeople     22745 non-null  int64   
 6   NumRooms      22745 non-null  int64   
 7   NumUnits      22745 non-null  object  
 8   NumVehicles   22745 non-null  int64   
 9   NumWorkers    22745 non-null  int64   
 10  OwnRent       22745 non-null  object  
 11  YearBuilt     22745 non-null  object  
 12  HouseCosts    22745 non-null  int64   
 13  ElectricBill  22745 non-null  int64   
 14  FoodStamp     22745 non-null  object  
 15  HeatingFuel   22745 non-null  object  
 16  Insurance     22745 non-null  int64   
 17  Language      22745 non-null  object  
 18  income_15w    22745 non-null  category
dtypes: category(1), int64(10), object(8)
memory usage: 3.1+ MB
None
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
0    18294
1     4451
Name: income_15w, dtype: int64
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

爱打羽毛球的小怪兽

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值