Using the interpolate method
Function overview
interpolate is an interpolation function: it fills NaN values using the chosen interpolation method.
Series.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, downcast=None)
Parameters
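- method : str, default 'linear'
Available methods:
'linear' - ignores the index and treats the values as equally spaced
'pad' - fills NaN using existing values (the previous value is carried forward)
'index', 'values' - uses the actual numerical values of the index
'time' - works on daily and higher-resolution data
…
- axis : {0 or 'index', 1 or 'columns', None}, default None
Axis to interpolate along
- limit : int, optional
Maximum number of consecutive NaN values to fill; must be greater than 0
- inplace : bool, default False
If True, update the data in place
- limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaN values are filled in this direction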
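To make the method, limit and limit_direction parameters concrete, here is a minimal sketch. The two small Series below are made up for illustration and are not part of the original example; the first has a deliberately uneven integer index so that 'linear' and 'index' give different answers.

```python
import numpy as np
import pandas as pd

# Illustrative Series with an uneven integer index (assumed data, not from the post).
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index and treats the three rows as equally spaced.
linear = s.interpolate(method="linear")
# 'index' uses the index values: label 1 lies one tenth of the way from 0 to 10.
by_index = s.interpolate(method="index")

# A run of three consecutive NaN values to show limit and limit_direction.
gap = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
# Fill at most one NaN per run, working forward from the last valid value.
limited_fwd = gap.interpolate(limit=1)
# Fill at most one NaN per run, working backward from the next valid value.
limited_bwd = gap.interpolate(limit=1, limit_direction="backward")

print(linear.tolist())    # [0.0, 5.0, 10.0]
print(by_index.tolist())  # [0.0, 1.0, 10.0]
print(limited_fwd)
print(limited_bwd)
```

With limit=1, only the NaN next to the starting side of the gap is filled (2.0 going forward, 4.0 going backward); the rest of the run stays NaN.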
Usage example
import numpy as np
import pandas as pd

# Sample data with missing values in both columns
data = {"grammer": ["Python", "C", "Java", "Go", np.nan, "SQL", "PHP", "Python"],
        "popularity": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan, 10.0]}
df = pd.DataFrame(data)
df
| | grammer | popularity |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C | 2.0 |
| 2 | Java | NaN |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | NaN |
| 7 | Python | 10.0 |
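1. Linear, equally spaced interpolation: each NaN in the popularity column is filled from the values above and below it (for a single missing value, this is their average)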
df['popularity'] = df['popularity'].fillna(df['popularity'].interpolate())
df
| | grammer | popularity |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C | 2.0 |
| 2 | Java | 3.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 8.0 |
| 7 | Python | 10.0 |
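The filled values 3.0 and 8.0 can be checked by hand with the straight-line formula. The sketch below does that; linear_fill is a hypothetical helper written for this illustration, not a pandas function, and the Series repeats the popularity column from the example.

```python
import numpy as np
import pandas as pd

# The popularity column from the example above.
popularity = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan, 10.0])

def linear_fill(y0, y1, i0, i1, i):
    """Value at position i on the straight line through (i0, y0) and (i1, y1).

    Hypothetical helper for illustration only.
    """
    return y0 + (y1 - y0) * (i - i0) / (i1 - i0)

# Row 2 sits between row 1 (2.0) and row 3 (4.0).
manual_2 = linear_fill(2.0, 4.0, 1, 3, 2)
# Row 6 sits between row 5 (6.0) and row 7 (10.0).
manual_6 = linear_fill(6.0, 10.0, 5, 7, 6)

filled = popularity.interpolate()  # method='linear' is the default

print(manual_2, manual_6)   # 3.0 8.0
print(filled.tolist())      # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
```

Because each gap here is a single row, the result coincides with the average of the two neighbors; for a longer run of NaN values the same formula spaces the filled values evenly between the two end points.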
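2. Padding (forward fill): each NaN is filled with the previous value. Run this on the original df, before step 1 has filled the NaN values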
df['popularity'] = df['popularity'].fillna(df['popularity'].interpolate(method="pad"))
df
| | grammer | popularity |
|---|---|---|
| 0 | Python | 1.0 |
| 1 | C | 2.0 |
| 2 | Java | 2.0 |
| 3 | Go | 4.0 |
| 4 | NaN | 5.0 |
| 5 | SQL | 6.0 |
| 6 | PHP | 6.0 |
| 7 | Python | 10.0 |
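As far as I know, pandas 2.1 and later emit a deprecation warning for interpolate(method="pad") and point to ffill() instead. Assuming that applies to your version, the same table can be reproduced with Series.ffill(), as in this sketch; the second Series is made up to show one edge case.

```python
import numpy as np
import pandas as pd

# The popularity column from the original (unfilled) example.
popularity = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan, 10.0])

# Forward fill: each NaN takes the last valid value before it.
padded = popularity.ffill()
print(padded.tolist())  # [1.0, 2.0, 2.0, 4.0, 5.0, 6.0, 6.0, 10.0]

# Illustrative edge case: a leading NaN has no previous value, so it stays NaN.
leading = pd.Series([np.nan, 1.0, np.nan]).ffill()
print(leading)
```

The leading NaN case is worth keeping in mind: forward filling never invents a value for rows that come before the first valid observation.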