Let's say I have the following pandas DataFrame:
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13], ['Bob', '#'], ['Bob', '#'], ['Bob', '#']]
df = pd.DataFrame(data,columns=['Name','Age'], dtype=float)
print(df)
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
3 Bob #
4 Bob #
5 Bob #
So, there are odd rows in the DataFrame for Bob, namely rows 3, 4, and 5. These values are consistently #, not 12. Row 1 shows that Bob should be 12, not #.
In this example, it's straightforward to fix this with replace():
df = df.replace("#", 12)
print(df)
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
3 Bob 12
4 Bob 12
5 Bob 12
However, this wouldn't work for larger dataframes, e.g.
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
3 Bob #
4 Bob #
5 Bob #
6 Clarke #
whereby row 6 should be 6 Clarke 13.
How does one replace any row in Age with # with the correct integer as given in other rows, based on Name? If # exists, check other rows with the same Name value and replace #.
解决方案
try this,
d= df[df['Age']!='#'].set_index('Name')['Age']
df['Age']=df['Name'].replace(d)
O/P:
Name Age
0 Alex 10
1 Bob 12
2 Clarke 13
3 Bob 12
4 Bob 12
5 Bob 12
6 Clarke 13