I have a pandas dataframe, df, like:
name | grade | grade_type
---------------------------
sarah | B | letter
alice | A | letter
eliza | C | letter
beth | 76 | numeral
jones | 90 | numeral
All values in df are strings, including the numbers. I want to convert the numeric grade values into letters, checking the grade_type column to decide which rows to convert, to get:
name | grade | grade_type
---------------------------
sarah | B | letter
alice | A | letter
eliza | C | letter
beth | B | numeral
jones | A | numeral
For completeness, the numeral-to-letter grade conversions are:
A: grade > 80
B: 70 < grade <= 80
C: 60 < grade <= 70
Why doesn't this work?
for index, row in df.iterrows():
    if row.grade_type == "numeral":
        grade_val = int(row.grade.values[0])
        if grade_val > 80:
            row.grade = "A"  # This assignment doesn't update row.grade!
        elif...
The alternative is using df.apply(...lambda:...), but I'm not too sure how to pull that off, since we have to check the grade_type column before deciding whether or not to update the grade value.
Solution
Your DataFrame doesn't update because the rows returned by iterrows() are generally copies, not views, so the assignment changes the copy and leaves df untouched.
Instead, use the index returned by iterrows() and write to the DataFrame directly with df.loc:
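for index, row in df.iterrows():
    if row.grade_type != 'numeral':
        continue  # letter grades cannot be passed to int()
    grade_val = int(row.grade)  # row.grade is already a plain string
    if grade_val > 80:
        df.loc[index, 'grade'] = 'A'
    ...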
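For reference, here is a self-contained sketch of that loop with the full set of thresholds from the question. The sample DataFrame is rebuilt from the question's table. The question gives no rule for grades of 60 or below, so this sketch assumes they are simply left unchanged.

```python
import pandas as pd

# Sample data rebuilt from the question; every value is a string.
df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})

for index, row in df.iterrows():
    if row["grade_type"] != "numeral":
        continue  # letter grades need no conversion
    grade_val = int(row["grade"])
    # Write through df.loc so the change lands in df, not in the row object.
    if grade_val > 80:
        df.loc[index, "grade"] = "A"
    elif grade_val > 70:
        df.loc[index, "grade"] = "B"
    elif grade_val > 60:
        df.loc[index, "grade"] = "C"
    # Assumption: grades of 60 or below are left as-is (no rule was given).

print(df)
```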
Or, as you said, you can use df.apply() with axis=1 and pass it a custom function that receives each row:
def get_grades(x):
    if x['grade_type'] == 'letter':
        return x['grade']  # already a letter, keep it
    grade_val = int(x['grade'])  # grades are stored as strings
    if grade_val > 80:
        return "A"
    ...
df['grade'] = df.apply(get_grades, axis=1)
You can also put an if/else in the lambda to check whether x['grade_type'] is 'numeral', as follows. Use whichever version you find easier to read.
def get_grades(grade_val):
    grade_val = int(grade_val)  # grades are stored as strings
    if grade_val > 80:
        return "A"
    ...
df['grade'] = df.apply(lambda x: get_grades(x['grade'])
                       if x['grade_type'] == 'numeral' else x['grade'], axis=1)
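Putting the apply() version together, a minimal runnable sketch might look like the following. The helper name numeral_to_letter is hypothetical, the DataFrame is rebuilt from the question's table, and returning grades of 60 or below unchanged is an assumption, since the question defines no letter for them.

```python
import pandas as pd


def numeral_to_letter(grade):
    """Hypothetical helper: map a numeric grade (given as a string) to a letter."""
    value = int(grade)
    if value > 80:
        return "A"
    if value > 70:
        return "B"
    if value > 60:
        return "C"
    # Assumption: no rule exists for 60 or below, so return the input unchanged.
    return grade


# Sample data rebuilt from the question; every value is a string.
df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})

# axis=1 passes each row to the lambda; only "numeral" rows are converted.
df["grade"] = df.apply(
    lambda x: numeral_to_letter(x["grade"])
    if x["grade_type"] == "numeral"
    else x["grade"],
    axis=1,
)

print(df)
```

Because apply() builds a new column that is assigned back in one step, nothing depends on whether the row objects are copies or views.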