excel多条件计数python,使用多个条件和多个数据框的python pandas countifs

Trying to create--in Python using multiple data frames--the equivalent of a countifs in Excel that would span multiple sheets.

I need a new column count of records on another data frame based on criteria from the current data frame.

See Excel impression of what I want to do in python, also here.

My goal?

Count exams on students data frame

by Student ID

with exam date >= enroll date

with exam date <= detail date

with exam grade >= 70

Basically the Excel equivalent would be...

=COUNTIFS(Summary!$B$1:$B$11, ">="&Detail!B2,

Summary!$B$1:$B$11, "<="&Detail!C2,

Summary!$C$1:$C$11, ">="&70,

Summary!$A$1:$A$11, "="&Detail!A2)

...where Summary is the primary data frame and Detail is the secondary data frame where I want to count records.

Found these answers in my research:

Not quite what I'm looking for, because they don't span multiple data frames. I was able to create a basic countifs for a singular data frame:

sum(1 for x in students['Student ID'] if x == 1)

sum(1 for x in exams['Exam Grade'] if x >= 70)

解决方案

Basically what you'll want to do is set up two dataframes, say df1 for the "exams passed" information and df2 for the marks on each exam.

To get yourself started, you can read in your excel files like this:

df1 = pd.read_excel('filename1.xlsx')

df2 = pd.read_excel('filename2.xlsx')

Then for each row in df1 you want to segment df2 and get the length of the segmented dataframe.

First though you might want to make list of information for each row in df1, which could be done like this:

student_info = df1[['Student ID', 'Enrollment Date', 'Qualification Date']].values

Then you can iterate through the rows like this:

N_exams_passed = [] # Store counts for each student in a list

for s_id, s_enroll, s_qual in student_info:

N_exams_passed.append(len(df2[(df2['Student ID']==s_id) &

(df2['Exam Date']>=s_enroll) &

(df2['Exam Date']<=s_qual) &

(df2['Grade']>=70)])

)

Then add/replace the column in df1:

df1['Exams Passed'] = N_exams_passed

In order to compare the dates properly you will need to convert them to datetime objects in each pandas dataframe, and I will leave this up to you. Hint: you can use the pd.to_datetime() function.

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值