1. Pandas DataFrame dropna() Function
The pandas DataFrame dropna() function removes rows or columns that contain null (NaN) values. By default, it returns a new DataFrame and leaves the source DataFrame unchanged.
Missing values can be written as None, pandas.NaT, or numpy.nan, and pandas treats all three as null.
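This quick check confirms that pandas treats all three markers as missing. The sample Series is made-up illustrative data. Series.isna() reports which entries are null, and dropna() removes them.

```python
import numpy as np
import pandas as pd

# An object column mixing a real value with the three missing-value markers
s = pd.Series(['x', None, pd.NaT, np.nan])

# isna() flags None, pd.NaT and np.nan alike as missing
mask = s.isna()
print(mask.tolist())  # [False, True, True, True]

# dropna() removes all three, leaving only the real value
print(s.dropna().tolist())  # ['x']
```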
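The dropna() function syntax is shown below. In pandas 2.0 and later, all parameters are keyword-only, and an additional ignore_index parameter is available.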
dropna(self, axis=0, how="any", thresh=None, subset=None, inplace=False)
- axis: possible values are {0 or 'index', 1 or 'columns'}, default 0. If 0, drop rows that contain null values. If 1, drop columns that contain missing values.
- how: possible values are {'any', 'all'}, default 'any'. If 'any', drop the row/column if any of its values is null. If 'all', drop the row/column only if all of its values are missing.
- thresh: an int. Keep only the rows (or columns, with axis=1) that have at least thresh non-null values. Recent pandas versions do not allow combining it with how.
- subset: labels along the other axis to check for null values. For example, when dropping rows, pass a list of column names.
- inplace: a boolean value, default False. If True, the source DataFrame is modified and None is returned.
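To make the how parameter concrete, here is a small sketch that compares dropna() with an explicit boolean mask built from notna(). The DataFrame is made-up sample data. The mask expressions are an equivalent formulation for illustration, not how pandas implements dropna() internally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, np.nan],
                   'b': [2.0, 3.0, np.nan]})

# how='any' (default): keep a row only if every value is present
any_result = df.dropna(how='any')
any_mask = df[df.notna().all(axis=1)]

# how='all': drop a row only if every value is missing
all_result = df.dropna(how='all')
all_mask = df[df.notna().any(axis=1)]

print(any_result.index.tolist())  # [0]
print(all_result.index.tolist())  # [0, 1]
```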
Let's look at some examples of using the dropna() function.
2. Pandas Drop All Rows with Any Null/NaN/NaT Values
This is the default behavior of the dropna() function.
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'], 'ID': [1, 2, 3, 4], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': ['CEO', None, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
# drop all rows with any NaN and NaT values
df1 = df.dropna()
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 CEO
1 Meghna 2 200 None
2 David 3 NaN NaT
3 Lisa 4 NaT NaT
Name ID Salary Role
0 Pankaj 1 100 CEO
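As a sanity check on the claim that this is the default behavior, the sketch below compares the no-argument call with the defaults spelled out. It also confirms that the source DataFrame is left untouched. It reuses the same data as the example above.

```python
import numpy as np
import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'], 'ID': [1, 2, 3, 4],
      'Salary': [100, 200, np.nan, pd.NaT],
      'Role': ['CEO', None, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)

# Calling dropna() with no arguments is the same as spelling out the defaults
default_result = df.dropna()
explicit_result = df.dropna(axis=0, how='any')

# The source DataFrame is untouched: it still has all four rows
print(len(df), len(default_result))  # 4 1
```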
3. Drop All Columns with Any Missing Value
We can pass axis=1 to drop columns with missing values.
df1 = df.dropna(axis=1)
print(df1)
Output:
Name ID
0 Pankaj 1
1 Meghna 2
2 David 3
3 Lisa 4
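The axis argument also accepts the string 'columns', and the column-wise drop can be written as a column mask. The sketch below shows both on a small made-up DataFrame. The mask form is an illustrative equivalent, not pandas internals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Pankaj', 'Meghna'],
                   'ID': [1, 2],
                   'Salary': [100, np.nan]})

# axis=1 and axis='columns' are interchangeable
by_number = df.dropna(axis=1)
by_name = df.dropna(axis='columns')

# Equivalent column mask: keep columns with no missing values
by_mask = df.loc[:, df.notna().all()]

print(by_number.columns.tolist())  # ['Name', 'ID']
```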
4. Drop Row/Column Only if All the Values are Null
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', pd.NaT], 'ID': [1, 2, 3, pd.NaT], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': [np.nan, np.nan, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df1 = df.dropna(how='all')
print(df1)
df1 = df.dropna(how='all', axis=1)
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
2 David 3 NaN NaT
3 NaT NaT NaT NaT
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
2 David 3 NaN NaT
Name ID Salary
0 Pankaj 1 100
1 Meghna 2 200
2 David 3 NaN
3 NaT NaT NaT
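Each dropna() call works along a single axis. To remove both fully empty rows and fully empty columns, one option is to chain two calls with how='all'. The DataFrame below is made-up sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Pankaj', 'Meghna', np.nan],
                   'ID': [1, 2, np.nan],
                   'Role': [np.nan, np.nan, np.nan]})

# First drop rows where everything is missing,
# then drop columns where everything is missing
cleaned = df.dropna(how='all').dropna(axis=1, how='all')

print(cleaned)
```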
5. DataFrame Drop Rows/Columns with Fewer Than thresh Non-Null Values
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', pd.NaT], 'ID': [1, 2, pd.NaT, pd.NaT], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': [np.nan, np.nan, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df1 = df.dropna(thresh=2)
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
2 David NaT NaN NaT
3 NaT NaT NaT NaT
Name ID Salary Role
0 Pankaj 1 100 NaT
1 Meghna 2 200 NaT
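Rows with fewer than 2 non-null values are dropped. Row 2 has only one non-null value (David), and row 3 has none. Note that thresh counts non-null values, not null values. With these 4 columns, a row with exactly 2 nulls would still have 2 non-null values and would be kept.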
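The sketch below shows that thresh counts non-null values, using made-up data. Row 1 has two nulls but still has two non-null values, so thresh=2 keeps it. The explicit count-based mask is an illustrative equivalent.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 1.0, np.nan],
                   'b': [2.0, np.nan, np.nan],
                   'c': [3.0, np.nan, np.nan],
                   'd': [4.0, 4.0, 4.0]})

# thresh=2 keeps rows that have at least 2 non-null values
result = df.dropna(thresh=2)

# Same rule written as an explicit mask over non-null counts per row
by_count = df[df.notna().sum(axis=1) >= 2]

print(result.index.tolist())  # [0, 1]
```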
6. Define Labels to Look for Null Values
import pandas as pd
import numpy as np
d1 = {'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'], 'ID': [1, 2, 3, pd.NaT], 'Salary': [100, 200, np.nan, pd.NaT],
'Role': ['CEO', np.nan, pd.NaT, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df1 = df.dropna(subset=['ID'])
print(df1)
Output:
Name ID Salary Role
0 Pankaj 1 100 CEO
1 Meghna 2 200 NaN
2 David 3 NaN NaT
3 Lisa NaT NaT NaT
Name ID Salary Role
0 Pankaj 1 100 CEO
1 Meghna 2 200 NaN
2 David 3 NaN NaT
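The subset parameter can list several columns, and it combines with how. The sketch below uses made-up data. With the default how='any', a row is dropped if any listed column is null. With how='all', a row is dropped only if every listed column is null. Columns outside subset are ignored either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'],
                   'ID': [1, 2, np.nan, np.nan],
                   'Salary': [100, np.nan, 300, np.nan]})

# Only 'ID' and 'Salary' are inspected; 'Name' is ignored
any_missing = df.dropna(subset=['ID', 'Salary'])             # drop if either is missing
all_missing = df.dropna(subset=['ID', 'Salary'], how='all')  # drop only if both are missing

print(any_missing.index.tolist())  # [0]
print(all_missing.index.tolist())  # [0, 1, 2]
```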
When dropping columns (axis=1), subset takes index labels instead, that is, the rows in which to look for missing values.
df1 = df.dropna(subset=[1, 2], axis=1)
print(df1)
Output:
Name ID
0 Pankaj 1
1 Meghna 2
2 David 3
3 Lisa NaT
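The 'ID' column is kept even though it has a missing value in row 3, because only the rows with index labels 1 and 2 are checked. The 'Salary' and 'Role' columns are dropped because each has a missing value in one of those rows.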
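The column-wise subset rule can also be written as a mask over the selected rows. The sketch below uses a trimmed, made-up version of the data above. The mask expression is an illustrative equivalent, not the pandas implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Pankaj', 'Meghna', 'David', 'Lisa'],
                   'ID': [1, 2, 3, np.nan],
                   'Salary': [100, 200, np.nan, np.nan]})

# With axis=1, subset lists row labels: only rows 1 and 2 are checked
result = df.dropna(subset=[1, 2], axis=1)

# Same rule as a mask: keep columns with no missing value in rows 1 and 2
by_mask = df.loc[:, df.loc[[1, 2]].notna().all()]

print(result.columns.tolist())  # ['Name', 'ID']
```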
7. Dropping Rows with NA in Place
We can pass inplace=True to modify the source DataFrame itself instead of getting a new one back. This avoids keeping a second DataFrame variable around. It does not necessarily save memory, however, because pandas may still build the filtered result internally before replacing the original data.
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna'], 'ID': [1, 2], 'Salary': [100, pd.NaT]}
df = pd.DataFrame(d1)
print(df)
df.dropna(inplace=True)
print(df)
Output:
Name ID Salary
0 Pankaj 1 100.0
1 Meghna 2 NaN
Name ID Salary
0 Pankaj 1 100.0
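A common pitfall follows from inplace=True returning None. The sketch below uses made-up data to show that the call mutates the DataFrame and returns None. It also shows that the non-mutating style, reassigning the returned copy, yields the same result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Pankaj', 'Meghna'], 'ID': [1, 2],
                   'Salary': [100.0, np.nan]})

# With inplace=True the method mutates df and returns None
returned = df.dropna(inplace=True)
print(returned)  # None
print(len(df))   # 1

# The equivalent non-mutating style reassigns the returned copy
df2 = pd.DataFrame({'Name': ['Pankaj', 'Meghna'], 'ID': [1, 2],
                    'Salary': [100.0, np.nan]})
df2 = df2.dropna()
```

Writing df = df.dropna(inplace=True) would therefore rebind df to None. Use either inplace=True on its own or plain reassignment, not both.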
Source: https://www.journaldev.com/33492/pandas-dropna-drop-null-na-values-from-dataframe