python corrwith_pandas.DataFrame corrwith()方法

这篇博客探讨了在使用pandas的DataFrame `.corrwith()` 方法时,传入Series和DataFrame作为参数时的不同行为。通过示例,解释了为何在某些情况下,结果会返回全为NaN的表格,并提供了正确计算相关性的方法。
摘要由CSDN通过智能技术生成

I recently start working with pandas. Can anyone explain me difference in behaviour of function .corrwith() with Series and DataFrame?

Suppose i have one DataFrame:

frame = pd.DataFrame(data={'a':[1,2,3], 'b':[-1,-2,-3], 'c':[10, -10, 10]})

And i want calculate correlation between features 'a' and all other features.

I can do it in the following way:

frame.drop(labels='a', axis=1).corrwith(frame['a'])

And result will be:

b -1.0

c 0.0

But very similar code:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

Generate absolutely different and unacceptable table:

a NaN

b NaN

c NaN

So, my question is: why in case of DataFrame as second argument we get such strange output?

解决方案

What I think you're looking for:

Let's say your frame is:

frame = pd.DataFrame(np.random.rand(10, 6), columns=['cost', 'amount', 'day', 'month', 'is_sale', 'hour'])

You want the 'cost' and 'amount' columns to be correlated with all other columns in every combination.

focus_cols = ['cost', 'amount']

frame.corr().filter(focus_cols).drop(focus_cols)

Answering what you asked:

Compute pairwise

correlation between rows or columns of two DataFrame objects.

Parameters:

other : DataFrame

axis : {0 or ‘index’, 1 or ‘columns’},

default 0 0 or ‘index’ to compute column-wise, 1 or ‘columns’ for row-wise drop : boolean, default False Drop missing indices from

result, default returns union of all Returns: correls : Series

corrwith is behaving similarly to add, sub, mul, div in that it expects to find a DataFrame or a Series being passed in other despite the documentation saying just DataFrame.

When other is a Series it broadcast that series and matches along the axis specified by axis, default is 0. This is why the following worked:

frame.drop(labels='a', axis=1).corrwith(frame.a)

b -1.0

c 0.0

dtype: float64

When other is a DataFrame it will match the axis specified by axis and correlate each pair identified by the other axis. If we did:

frame.drop('a', axis=1).corrwith(frame.drop('b', axis=1))

a NaN

b NaN

c 1.0

dtype: float64

Only c was in common and only c had its correlation calculated.

In the case you specified:

frame.drop(labels='a', axis=1).corrwith(frame[['a']])

frame[['a']] is a DataFrame because of the [['a']] and now plays by the DataFrame rules in which its columns must match up with what its being correlated with. But you explicitly drop a from the first frame then correlate with a DataFrame with nothing but a. The result is NaN for every column.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值