python – Assigning qcut output as a new column

In the pandas notebook here

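I see the result of qcut being assigned as a new column of a DataFrame. The DataFrame has two columns, yet somehow assigning the qcut output to a new column magically finds the correct level for the "var" variable, without the other variable being checked. What are the pandas semantics here? Example output below: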

In [2]:
from pandas import *
from statsmodels.formula.api import logit
from statsmodels.nonparametric.kde import KDEUnivariate  # not used below
from patsy import dmatrix, dmatrices

In [3]:
df = read_csv('wells.dat', sep=' ', header=0, index_col=0)
print(df.head())

   switch  arsenic       dist  assoc  educ
1       1     2.36  16.826000      0     0
2       1     0.71  47.321999      0     0
3       0     2.07  20.966999      0    10
4       1     1.15  21.486000      0    12
5       1     1.10  40.874001      1    14

In [4]:
model_form = ('switch ~ center(I(dist / 100.)) + center(arsenic) + ' +
              'center(I(educ / 4.)) + ' +
              'center(I(dist / 100.)) : center(arsenic) + ' +
              'center(I(dist / 100.)) : center(I(educ / 4.)) + ' +
              'center(arsenic) : center(I(educ / 4.))'
              )
# the formula API takes the DataFrame through the `data` keyword
model4 = logit(model_form, data=df).fit()

In [20]:
# deviance residuals (older statsmodels exposed these as model4.resid)
resid_df = DataFrame({'var': df['arsenic'], 'resid': model4.resid_dev})
resid_df[:10]

Out [20]:
       resid   var
1   0.842596  2.36
2   1.281417  0.71
3  -1.613751  2.07
4   0.996195  1.15
5   1.005102  1.10
6   0.592056  3.90
7   0.941372  2.97
8   0.640139  3.24
9   0.886626  3.28
10  1.130149  2.52

In [15]:
qcut(df['arsenic'], 40)

Out [15]:
Categorical: arsenic
array([(2.327, 2.47], (0.68, 0.71], (1.953, 2.07], ..., [0.51, 0.53],
       (0.62, 0.64], (0.64, 0.68]], dtype=object)
Levels (40): Index([[0.51, 0.53], (0.53, 0.56], (0.56, 0.59],
                    (0.59, 0.62], (0.62, 0.64], (0.64, 0.68],
                    (0.68, 0.71], (0.71, 0.75], (0.75, 0.78],
                    (0.78, 0.82], (0.82, 0.86], (0.86, 0.9], (0.9, 0.95],
                    (0.95, 1.0065], (1.0065, 1.0513], (1.0513, 1.1],
                    (1.1, 1.15], (1.15, 1.2], (1.2, 1.25], (1.25, 1.3],
                    (1.3, 1.36], (1.36, 1.42], (1.42, 1.49],
                    (1.49, 1.57], (1.57, 1.66], (1.66, 1.76],
                    (1.76, 1.858], (1.858, 1.953], (1.953, 2.07],
                    (2.07, 2.2], (2.2, 2.327], (2.327, 2.47],
                    (2.47, 2.61], (2.61, 2.81], (2.81, 2.98],
                    (2.98, 3.21], (3.21, 3.42], (3.42, 3.791],
                    (3.791, 4.475], (4.475, 9.65]], dtype=object)
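The output above already hints at the mechanism: the first label, (2.327, 2.47], is the bin of the first arsenic value, 2.36. Here is a minimal sketch against current pandas, using a hypothetical eight-value stand-in for df['arsenic'] rather than the real wells data. It checks that qcut returns one bin per input value, in input order, and that the result keeps the input's index.

```python
import pandas as pd

# Hypothetical stand-in for df['arsenic'], with a 1-based index like wells.dat.
arsenic = pd.Series([2.36, 0.71, 2.07, 1.15, 1.10, 3.90, 2.97, 3.24],
                    index=range(1, 9), name="arsenic")

# Four quantile bins; element i of the result is the Interval that
# contains the value at position i of the input.
bins = pd.qcut(arsenic, 4)

for value, interval in zip(arsenic, bins):
    # Every value falls inside the interval reported for its own position.
    assert value in interval

print(bins.index.equals(arsenic.index))  # True: the result keeps the input's index
```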

In [17]:
resid_df['bins'] = qcut(df['arsenic'], 40)
resid_df[:20]

Out [17]:
      resid   var            bins
1  0.842596  2.36   (2.327, 2.47]
2  1.281417  0.71    (0.68, 0.71]
3 -1.613751  2.07   (1.953, 2.07]
4  0.996195  1.15     (1.1, 1.15]
5  1.005102  1.10   (1.0513, 1.1]
6  0.592056  3.90  (3.791, 4.475]
7  0.941372  2.97    (2.81, 2.98]
8  0.640139  3.24    (3.21, 3.42]

Each row gets the correct bin for its "var" value, and the assignment pays no attention to "resid".
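One way to see what is going on, as a sketch assuming current pandas: qcut never looks at resid_df['var']; it only bins df['arsenic']. The rows line up because resid_df was built from df['arsenic'] and therefore shares df's index. Assigning a Series to a column aligns it on those index labels. A bare array is placed purely by position. The older pandas in the transcript appears to return a bare Categorical (see the Out [15] repr), which has no index, so that assignment was presumably positional and lined up only because resid_df's rows are in df's order. The data below is a hypothetical stand-in. `labels=False` is used only so the bins are plain integers that compare easily.

```python
import pandas as pd

# Hypothetical stand-in for the wells data, with a 1-based index like wells.dat.
df = pd.DataFrame(
    {"arsenic": [2.36, 0.71, 2.07, 1.15, 1.10, 3.90, 2.97, 3.24]},
    index=range(1, 9),
)

# Like resid_df in the question: built from df's column, so it shares df's index.
resid_df = pd.DataFrame({"var": df["arsenic"]})

# qcut only ever sees df['arsenic']. labels=False returns integer bin ids
# (0..3) as a Series that carries df's index.
binned = pd.qcut(df["arsenic"], 4, labels=False)

# Assigning a Series matches rows by index label, so the order of the
# right-hand side does not matter: the reversed Series is re-aligned.
resid_df["bins"] = binned
resid_df["bins_rev"] = binned[::-1]

# A bare NumPy array has no index, so it is assigned purely by position.
resid_df["bins_pos"] = binned.to_numpy()[::-1]

print((resid_df["bins"] == resid_df["bins_rev"]).all())  # True
print((resid_df["bins"] == resid_df["bins_pos"]).all())  # False
```

If resid_df had been built with a different row order or a different index than df, the Series assignment would still match rows by label. A positional assignment (a bare array or an old-style Categorical) would silently mislabel the rows.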
