文章目录
引言
在进行数据分析与建模的过程中,大量的时间都花在数据的准备上:加载、清洗、转换与重新排列。
8.1处理缺失值
pandas中使用np.nan来表示缺失值,python内建None值在对象数组中也被当成缺失值。
处理缺失值的处理方法:
8.1过滤缺失值
可以使用isnull()方法结合布尔值索引来手动过滤缺失值,同时,dropna在过滤缺失值中也是非常有用的。
在Series上使用dropna会返回Series中所有的非空数据及其索引值。
在DataFrame上使用dropna时,默认情况下会删除包含缺失值的行。该方法官方文档如下:
pd.DataFrame.dropna(
self,
axis=0,
how='any',
thresh=None,
subset=None,
inplace=False,
)
Docstring:
Remove missing values.
See the :ref:`User Guide <missing_data>` for more on which values are
considered missing, and how to work with missing data.
Parameters
----------
axis : {
0 or 'index', 1 or 'columns'}, default 0
Determine if rows or columns which contain missing values are
removed.
* 0, or 'index' : Drop rows which contain missing values.
* 1, or 'columns' : Drop columns which contain missing value.
.. versionchanged:: 1.0.0
Pass tuple or list to drop on multiple axes.
Only a single axis is allowed.
how : {
'any', 'all'}, default 'any'
Determine if row or column is removed from DataFrame, when we have
at least one NA or all NA.
* 'any' : If any NA values are present, drop that row or column.
* 'all' : If all values are NA, drop that row or column.
thresh : int, optional
Require that many non-NA values.
subset : array-like, optional
Labels along other axis to consider, e.g. if you are dropping rows
these would be a list of columns to include.
inplace : bool, default False
If True, do operation inplace and return None.
Returns
-------
DataFrame
DataFrame with NA entries dropped from it.
See Also
--------
DataFrame.isna: Indicate missing values.
DataFrame.not