Detecting null values
pandas data structures have two useful methods for detecting null data:
isnull() and notnull(). Either one will return a boolean mask over the data.
For example:
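A minimal sketch, assuming an illustrative Series named data whose values are made up for this demonstration:

```python
import numpy as np
import pandas as pd

# Illustrative Series (not from the original text) mixing valid values with NaN and None
data = pd.Series([1, np.nan, "hello", None])

null_mask = data.isnull()     # True wherever a value is missing
valid_mask = data.notnull()   # element-wise complement of null_mask

print(null_mask)
print(valid_mask)
```

Both NaN and None are treated as null here, so the two masks are exact opposites of each other.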
Boolean masks can be used directly as a Series or DataFrame index:
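As a rough illustration, reusing a made-up Series (the name data and its values are assumptions), indexing with the notnull() mask keeps only the valid entries:

```python
import numpy as np
import pandas as pd

# Illustrative Series (values are assumptions for this sketch)
data = pd.Series([1, np.nan, "hello", None])

# Index with the boolean mask: only rows where the mask is True survive
valid = data[data.notnull()]

print(valid)
```

The original index labels are preserved, so the result is indexed by 0 and 2 rather than renumbered.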
Dropping null values
dropna() removes null values. For a Series, the result is straightforward:
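A small sketch of dropna() on a Series, using made-up values (the variable names are assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative Series (values are assumptions for this sketch)
data = pd.Series([1, np.nan, "hello", None])

# dropna() returns a new Series without the null entries;
# the original Series is left unchanged
cleaned = data.dropna()

print(cleaned)
```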
For a DataFrame, there are more options. Consider the following DataFrame:
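The original table is not reproduced here; a small stand-in, with values chosen purely for illustration, might look like this:

```python
import numpy as np
import pandas as pd

# Illustrative 3x3 DataFrame (assumed values) with two missing entries
df = pd.DataFrame([[1,      np.nan, 2],
                   [2,      3,      5],
                   [np.nan, 4,      6]])

print(df)
```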
We cannot drop a single value from a DataFrame; we can only drop full rows or full columns. Depending on the application, you might want one or the other. By default, dropna() drops every row that contains any null value.
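To see the row-wise behavior, here is a sketch that calls dropna() with no arguments on an illustrative DataFrame (the values are assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (assumed values) with two missing entries
df = pd.DataFrame([[1,      np.nan, 2],
                   [2,      3,      5],
                   [np.nan, 4,      6]])

# With no arguments, dropna() removes every row that contains a null
rows_dropped = df.dropna()

print(rows_dropped)
```

Only the middle row has no missing entries, so it is the only row that remains.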
Alternatively, you can drop NA values along a different axis: axis=1 drops all columns containing a null value.
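A sketch of column-wise dropping on an illustrative DataFrame (the values are assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (assumed values) with two missing entries
df = pd.DataFrame([[1,      np.nan, 2],
                   [2,      3,      5],
                   [np.nan, 4,      6]])

# axis=1 (equivalently axis='columns') removes every column containing a null
cols_dropped = df.dropna(axis=1)

print(cols_dropped)
```

Columns 0 and 1 each contain one NaN, so only column 2 is kept.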
But this drops some good data as well. We might instead want to drop only the rows or columns in which all values, or a majority of values, are NA. This can be specified through the how or thresh parameters, which allow fine control over the number of nulls to allow through.
The default is how='any', such that any row or column (depending on the axis keyword) containing a null value will be dropped. Specifying how='all' instead drops only the rows or columns whose values are all null.
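The difference between the two settings of how can be sketched as follows, assuming an illustrative DataFrame to which an all-null column (labelled 3) has been added:

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (assumed values) plus a column that is entirely null
df = pd.DataFrame([[1,      np.nan, 2],
                   [2,      3,      5],
                   [np.nan, 4,      6]])
df[3] = np.nan

# Default how='any': a column is dropped if it has at least one null
any_dropped = df.dropna(axis="columns")

# how='all': a column is dropped only if every value in it is null
all_dropped = df.dropna(axis="columns", how="all")

print(any_dropped)
print(all_dropped)
```

With how='all', only column 3 is removed, while the partially filled columns 0 and 1 are kept.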
For finer-grained control, the thresh parameter specifies the minimum number of non-null values a row or column must have in order to be kept.
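A sketch of thresh on an illustrative DataFrame (the values and the threshold of 3 are assumptions chosen for the demonstration):

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame (assumed values) plus a column that is entirely null
df = pd.DataFrame([[1,      np.nan, 2],
                   [2,      3,      5],
                   [np.nan, 4,      6]])
df[3] = np.nan

# Keep only rows that have at least 3 non-null values
thresh_dropped = df.dropna(axis="rows", thresh=3)

print(thresh_dropped)
```

The first and last rows have only two non-null values each, so they are dropped; the middle row has three and is kept.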
Filling null values
Sometimes, rather than dropping NA values, we would prefer to replace them with a valid value. This value might be a single number like zero, or it might be some sort of imputation from the good values.
Consider the following Series:
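The original Series is not reproduced here; a stand-in with assumed values and letter labels could be:

```python
import numpy as np
import pandas as pd

# Illustrative Series (assumed values); None is converted to NaN in a float Series
data = pd.Series([1, np.nan, 2, None, 3], index=list("abcde"))

print(data)
```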
We can fill NA entries with a single value, such as zero:
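A sketch using fillna() with a constant, on an illustrative Series (the values are assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative Series (assumed values) with two missing entries
data = pd.Series([1, np.nan, 2, None, 3], index=list("abcde"))

# Replace every null with the constant 0
filled_zero = data.fillna(0)

print(filled_zero)
```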
We can also specify a forward-fill to propagate the previous value forward:
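A sketch of forward-filling on an illustrative Series (the values are assumptions). It uses the ffill() method; older code often spells this as fillna(method='ffill'), which recent pandas releases deprecate.

```python
import numpy as np
import pandas as pd

# Illustrative Series (assumed values) with two missing entries
data = pd.Series([1, np.nan, 2, None, 3], index=list("abcde"))

# Each null takes the most recent non-null value that precedes it
forward_filled = data.ffill()

print(forward_filled)
```

A null at the very start of the Series would stay null, since there is no earlier value to propagate.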
Or we can specify a back-fill to propagate the next value backward:
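A sketch of back-filling on an illustrative Series (the values are assumptions), using the bfill() method:

```python
import numpy as np
import pandas as pd

# Illustrative Series (assumed values) with two missing entries
data = pd.Series([1, np.nan, 2, None, 3], index=list("abcde"))

# Each null takes the next non-null value that follows it
back_filled = data.bfill()

print(back_filled)
```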