Python: Outputting a Feature Correlation Matrix / Grouped Feature Matrix in Python #2

Latest recommended article published 2022-03-27 16:42:38

Mr.棱恩

Article link: https://blog.csdn.net/weixin_36330238/article/details/111905403

It's not too different from before. We can start with the sample data:

DataFrame1:

Name  No.      Comment
Bob   2123320  Doesn't Matter
Joe   2832883  Whatever
John  2139300  Irrelevant
Bob   2123320  Something
John  2234903  Regardless

DataFrame2:

Name  No.      Report
Bob   2123320  Great
Joe   2832883  Solid
John  2139300  Awesome
Bob   2123320  Good
John  2234903  Perfect
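
For anyone who wants to reproduce the question, the two tables above can be built as follows. This is only a minimal sketch: the variable names `df1` and `df2` are assumptions, and the values are copied from the sample data.

```python
import pandas as pd

# Sample data copied from the tables above; df1 and df2 are assumed names.
df1 = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
    "Comment": ["Doesn't Matter", "Whatever", "Irrelevant",
                "Something", "Regardless"],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
    "Report": ["Great", "Solid", "Awesome", "Good", "Perfect"],
})

print(df1)
print(df2)
```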

I am looking for a way to make a new Excel file that looks like this (Expected Outcome):

      ------------------------ 2139300 ------------------------
Name  Irrelevant  Whatever  Regardless  Awesome  Solid  Perfect  \
John           1         0           0        1      0        0

      ------------------------ 2234903 ------------------------
Name  Irrelevant  Whatever  Regardless  Awesome  Solid  Perfect
John           0         0           1        0      0        1

(Note: The output doesn't need to have the No. header titles; I only added them for clarity and for the explanation below.)

Basically, and very much like the other question, this looks at each name and counts how many distinct No.'s it has. It then selects the people who have a certain number of distinct No.'s. Now, I have a set of "Comments" and "Reports" I wish to look for ({Irrelevant, Whatever, Regardless} and {Awesome, Solid, Perfect} respectively; note that this is only a subset of the Comments/Reports), and for these I want a 1 or 0 depending on whether each one appears, but evaluated separately for each No. Put another way, I want each No. to have a "group" of columns titled {Irrelevant, Whatever, Regardless} and {Awesome, Solid, Perfect}, and for each value I want a 1 if it appeared for the person for that specific No. and a 0 if it didn't.
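
The selection step described above (keep only the names that have enough distinct No.'s) can be sketched on its own. The names `pairs`, `distinct_no`, `min_distinct` and `selected` are made up for illustration, and the threshold of 2 reflects the "more than 1 distinct No." rule used in this question.

```python
import pandas as pd

# Name / No. pairs from the sample data in the question.
pairs = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
})

# Number of distinct No. values per name, broadcast back to every row.
distinct_no = pairs.groupby("Name")["No."].transform("nunique")

# min_distinct is a hypothetical threshold; the question uses "more than 1".
min_distinct = 2
selected = pairs[distinct_no >= min_distinct]

print(pairs.groupby("Name")["No."].nunique())
print(selected)
```

Bob appears twice but always with the same No., so only John passes the filter.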

In this matrix, for example, we only see John because he is the only one with more than 1 distinct No. In the first group of columns only Irrelevant and Awesome have values of 1 and the rest have 0; in the second group only Regardless and Perfect have 1s. In other words, all of my desired Comments/Reports ({Irrelevant, Whatever, Regardless} and {Awesome, Solid, Perfect}) are listed for one No., each marked 1 or 0 according to whether it appeared. The same desired Comments/Reports are then repeated in a new "group" of columns for the next No., marked according to what appeared for that No.
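
The layout described above can be sketched end to end as follows. This is one possible approach and not necessarily what the answer below does: it assumes the two frames are stacked into a single long table, uses `pd.crosstab` for the counting, and introduces the hypothetical names `wanted`, `long`, `Label` and `matrix`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
    "Comment": ["Doesn't Matter", "Whatever", "Irrelevant",
                "Something", "Regardless"],
})
df2 = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
    "Report": ["Great", "Solid", "Awesome", "Good", "Perfect"],
})

# The subset of Comments and Reports named in the question, in display order.
wanted = ["Irrelevant", "Whatever", "Regardless", "Awesome", "Solid", "Perfect"]

# Stack comments and reports into one long table of (Name, No., Label).
long = pd.concat([
    df1.rename(columns={"Comment": "Label"}),
    df2.rename(columns={"Report": "Label"}),
], ignore_index=True)

# Keep names with more than one distinct No., and only the wanted labels.
multi = long.groupby("Name")["No."].transform("nunique") > 1
long = long[multi & long["Label"].isin(wanted)]

# Count occurrences per (Name, No., Label) and cap at 1 to get 0/1 flags.
counts = pd.crosstab(long["Name"], [long["No."], long["Label"]]).clip(upper=1)

# Give every No. the full group of wanted columns, filling absences with 0.
numbers = sorted(long["No."].unique())
columns = pd.MultiIndex.from_product([numbers, wanted], names=["No.", "Label"])
matrix = counts.reindex(columns=columns, fill_value=0)

print(matrix)
```

Reindexing against the full product of No.'s and wanted labels is what guarantees that each group has the same six columns, even for labels that never occur under a given No.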

Let me know if anything is unclear and I truly do appreciate your help.

Thank you.

Solution

Try:

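import pandas as pd

# Assumption: the snippet as posted used df_out before defining it. Here df1 and
# df2 hold DataFrame1 and DataFrame2, and df_out is taken to be the two stacked
# with Report renamed to Comment (consistent with the "Comment" level in the output).
df_out = pd.concat([df1, df2.rename(columns={'Report': 'Comment'})])

# True for rows whose Name has more than one distinct No.
multi_no = df_out.groupby('Name')['No.'].transform('nunique') > 1

df_out = (df_out[multi_no]
          .set_index(['Name', 'No.'])['Comment'].str.get_dummies()  # 0/1 column per comment
          .reindex(df_out.Comment, fill_value=0, axis=1)  # a column for every comment seen
          .groupby(level=[0, 1]).sum()   # was .sum(level=[0,1]), removed in pandas 2.0
          .unstack()                     # move No. from the index into the columns
          .swaplevel(0, 1, axis=1)       # put No. above Comment
          .sort_index(axis=1))           # was .sort_index(1); axis is keyword-only in pandas 2.0

print(df_out)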

Output:

No.     2139300                                                               \
Comment Awesome Doesn't Matter Good Great Irrelevant Perfect Regardless Solid
Name
John          1              0    0     0          1       0          0     0

No.                        2234903                                       \
Comment Something Whatever Awesome Doesn't Matter Good Great Irrelevant
Name
John            0        0       0              0    0     0          0

No.
Comment Perfect Regardless Solid Something Whatever
Name
John          1          1     0         0        0
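
To see how the chained call arrives at this shape, it can help to run the reshaping steps one at a time on John's rows only. The sketch below is illustrative: the frame `rows` and the intermediate names are made up, Report is assumed to have been renamed to Comment already, and `groupby(level=[0, 1]).sum()` is used as the spelling that current pandas accepts in place of `.sum(level=[0,1])`.

```python
import pandas as pd

# John's rows only, with reports already stored under "Comment" (an assumption).
rows = pd.DataFrame({
    "Name": ["John", "John", "John", "John"],
    "No.": [2139300, 2234903, 2139300, 2234903],
    "Comment": ["Irrelevant", "Regardless", "Awesome", "Perfect"],
})

# Step 1: one 0/1 column per distinct comment, indexed by (Name, No.).
dummies = rows.set_index(["Name", "No."])["Comment"].str.get_dummies()

# Step 2: collapse repeated (Name, No.) rows by summing their indicators.
per_pair = dummies.groupby(level=[0, 1]).sum()

# Step 3: pivot No. into the columns, then move it above the comment level.
wide = per_pair.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)

print(dummies.shape, per_pair.shape, wide.shape)
print(wide)
```

The row count shrinks from one row per record, to one row per (Name, No.) pair, to one row per name, while the columns grow into one group per No.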
