Python: removing NaN, deleting entries with NaN values from a Python dictionary

I have the following dictionary in Python:

OrderedDict([(30, ('A1', 55.0)), (31, ('A2', 125.0)), (32, ('A3', 180.0)), (43, ('A4', nan))])

Is there a way to remove the entries where any of the values is NaN? I tried this:

{k: dict_cg[k] for k in dict_cg.values() if not np.isnan(k)}

It would be great if the solution works for both Python 2 and Python 3.

Solution

Since you have pandas, you can leverage the pd.Series.notna method here (an alias of pd.Series.notnull), which works with mixed dtypes.

>>> import pandas as pd

>>> {k: v for k, v in dict_cg.items() if pd.Series(v).notna().all()}

{30: ('A1', 55.0), 31: ('A2', 125.0), 32: ('A3', 180.0)}
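For anyone who wants to run the one-liner above on its own, here is a minimal self-contained sketch. It assumes the dictionary from the question, with float("nan") standing in for the bare nan; the helper name drop_nan_entries is made up for illustration and is not part of pandas.

```python
from collections import OrderedDict

import pandas as pd

# Rebuild the dictionary from the question; float("nan") stands in for nan.
dict_cg = OrderedDict([
    (30, ("A1", 55.0)),
    (31, ("A2", 125.0)),
    (32, ("A3", 180.0)),
    (43, ("A4", float("nan"))),
])


def drop_nan_entries(d):
    """Hypothetical helper: keep only entries whose tuple has no NaN."""
    # pd.Series(v) turns each tuple into an object-dtype Series, so
    # notna() checks every element, strings and floats alike.
    return OrderedDict(
        (k, v) for k, v in d.items() if pd.Series(v).notna().all()
    )


cleaned = drop_nan_entries(dict_cg)
print(list(cleaned.keys()))  # [30, 31, 32]
```

Returning an OrderedDict rather than a plain dict keeps the original key order on Python versions where dict ordering is not guaranteed.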

This is not part of the answer, but may help you understand how I've arrived at the solution. I came across some weird behaviour when trying to solve this question, using pd.notnull directly.

Take dict_cg[43].

>>> dict_cg[43]

('A4', nan)

Calling pd.notnull on the tuple directly does not work.

>>> pd.notnull(dict_cg[43])

True

It treats the tuple as a single value (rather than an iterable of values). Furthermore, converting this to a list and then testing also gives an incorrect answer.

>>> pd.notnull(list(dict_cg[43]))

array([ True, True])

Since the second value is nan, the result I'm looking for should be [True, False]. It finally works when you pre-convert to a Series:

>>> pd.Series(dict_cg[43]).notnull()

0 True

1 False

dtype: bool

So, the solution is to Series-ify it and then test the values.

Along similar lines, another (admittedly roundabout) solution is to pre-convert to an object dtype numpy array, and pd.notnull will work directly:

>>> import numpy as np

>>> pd.notnull(np.array(dict_cg[43], dtype=object))

array([ True, False])

I imagine that pd.notnull converts the list to a string array under the covers, which renders the NaN as the string "nan", so it is no longer treated as a null value.
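The string-conversion guess can be checked against NumPy directly. The sketch below only shows what np.array does with mixed str/float input, with and without dtype=object. It does not show what a particular pandas version does internally with a list, which is an assumption here and may differ between releases.

```python
import numpy as np
import pandas as pd

nan = float("nan")

# Without an explicit dtype, NumPy coerces mixed str/float input to strings,
# so the NaN becomes the three-character text "nan".
as_strings = np.array(["A4", nan])
print(as_strings.dtype.kind)      # U
print(as_strings[1] == "nan")     # True

# A string array has no missing values as far as pandas is concerned.
print(pd.notnull(as_strings).all())  # True

# With dtype=object the float NaN survives, so pd.notnull can detect it.
as_objects = np.array(["A4", nan], dtype=object)
print(pd.notnull(as_objects).tolist())  # [True, False]
```

This is why both the Series route and the dtype=object route work: each keeps the NaN as a float instead of turning it into text.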